* Lockheed Martin Information Technology (LMIT), Research Triangle Park, North Carolina 27709; Lilly Research Laboratory, Greenfield, Indiana 46140;
Natural Environment Research Council (NERC), University of Manchester, Manchester M13 9PL, U.K.;
Alpha-Gamma Technologies, Inc., Raleigh, North Carolina 27609; ¶ U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711; || U.S. National Center for Toxicological Research (NCTR), Jefferson, Arkansas 72079; ||| U.S. National Center for Toxicogenomics (NCT), Research Triangle Park, North Carolina 27709; |||| Glaxo-SmithKline, Inc., Research Triangle Park, North Carolina 27709; # U.S. National Toxicology Program, Research Triangle Park, North Carolina 27709; ** Xybion, Inc., Cedar Knolls, New Jersey;
The European Bioinformatics Institute (EMBL-EBI) Hinxton, Wellcome Trust Genome Campus, Cambridge CB10, U.K.;
U.S. Food and Drug Administration (FDA) Center for Drug Evaluation and Research, Rockville, Maryland 20857; and
Leadscope, Inc., Columbus, Ohio 43215
1 To whom correspondence should be addressed at National Center for Toxicogenomics, PO Box 12233 Mail Drop F1-05, 111 Alexander Drive, Research Triangle Park NC 27709-2233. Fax: (919) 541-1460. E-mail: fostel{at}niehs.nih.gov.
Received July 19, 2005; accepted September 2, 2005
ABSTRACT
Key Words: Chemical Effects in Biological Systems (CEBS) Knowledgebase; toxicogenomics study protocols; toxicity endpoint data; acetaminophen; phenotypic anchor.
INTRODUCTION
The CEBS Knowledgebase is being developed at the National Center for Toxicogenomics (NCT). Although still early in development, CEBS will become a public toxicogenomics resource that integrates traditional toxicology and pathology phenotype data with data from highly parallel technologies, such as microarray or proteomics studies, in biological context using the Study Design Description (Waters, 2004; Waters et al., 2003). To accomplish this, CEBS captures the relevant characteristics of Study Subjects and methods (Protocols), and the Study Timeline on which Events such as treatment, animal care, and exit (euthanasia) occur. These characteristics are collectively termed the Study Design Description. The CEBS Data Dictionary (CEBS-DD) includes the terms, definitions, and relationships to support the accurate capture of elements of the Study Design Description by CEBS. Once the Study Design Description has been captured, it can be used to organize and annotate the data derived from Study Subjects and to display the data in a meaningful biological context within CEBS.
The challenges inherent in building the CEBS-DD are two-fold. First, the minimal information needed to interpret a toxicogenomics Study must be identified to ensure that data deposited in CEBS meet a common minimum standard. This need is satisfied by CEBS-DD, which extends the original Minimal Information about a Microarray Experiment/Toxicology (MIAME/Tox) standard developed by the NCT, the European Bioinformatics Institute (EBI) and the International Life Sciences Institute Health and Environmental Sciences Institute (ILSI-HESI) (www.mged.org/MIAME1.1-DenverDraft.DOC). The minimal information requirement is highly dependent upon the biological conduct of the experiment, and has been extended in the CEBS-DD primarily within the framework of an acute toxicity Study. CEBS will offer a graphical user interface (GUI) to capture minimal Study information (see Figure 1).
The CEBS-DD is focused on defining data fields typically encountered in a standard acute toxicity Study, since this type of Study makes up the majority of the data currently in CEBS. However, the design of the CEBS-DD is such that it can be extended to capture relevant information from other areas of toxicology such as reproductive toxicology, neurotoxicology, or carcinogenesis studies by including additional terms specific to these disciplines. Terms in the CEBS-DD were defined to be as broadly applicable as possible, so that their use could be extended to descriptions of other complex biological investigations in sciences including neurobiology, infectious diseases, behavior, development, and reproduction. Therefore, the concepts defined within CEBS-DD have been shared with the Microarray Gene Expression Data (MGED) Society, Reporting Structures for Biological Investigations (RSBI) Working Groups with the aim of making them as broadly applicable and available as possible. A standard, shared data structure will permit CEBS to accept data from a wide variety of disciplines and sources, incorporate the available information into the CEBS Knowledgebase, and present the data to the user in an organized, uniform context.
Description of the CEBS Data Dictionary
Construction of the CEBS-DD
The CEBS-DD began as a list of data fields deemed useful to maintain in a queryable form within CEBS; in other words, information that might affect biological response or data quality and should therefore remain accessible to users' queries. For example, rather than keeping the name and availability of a particular diet regimen in an unstructured text file, retaining the diet name, composition, and availability in a structured form in CEBS makes it possible for the user to query for effects of the diet in cross-Study integrated data analysis. However, this means that the details of the diet must be deposited in CEBS using specified terms and vocabulary, for example, (Diet = {NCT2000, Purina certified rodent chow 5002C}; FeedAvailability = {ad lib, calorie restricted, time restricted}), and these details must be associated with the correct animals and Studies in CEBS.
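As an illustration only, the idea of depositing diet details with specified terms can be sketched in Python. The vocabularies below reuse the example terms given above; the function names, the record layout, and the subject identifiers are hypothetical and are not part of CEBS.

```python
# Hypothetical controlled vocabularies, reusing the example terms from the text.
CONTROLLED_VOCABULARY = {
    "Diet": {"NCT2000", "Purina certified rodent chow 5002C"},
    "FeedAvailability": {"ad lib", "calorie restricted", "time restricted"},
}


def validate_record(record):
    """Return a list of problems found in a structured record (empty if valid)."""
    problems = []
    for field, value in record.items():
        allowed = CONTROLLED_VOCABULARY.get(field)
        if allowed is None:
            problems.append(f"unknown field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


def subjects_with(records, field, value):
    """Query: IDs of subjects whose record holds the given controlled term."""
    return sorted(sid for sid, rec in records.items() if rec.get(field) == value)


# Illustrative records keyed by a made-up subject ID.
diet_by_subject = {
    "rat-01": {"Diet": "NCT2000", "FeedAvailability": "ad lib"},
    "rat-02": {"Diet": "NCT2000", "FeedAvailability": "calorie restricted"},
}
```

Because every value is drawn from a fixed vocabulary, a query such as `subjects_with(diet_by_subject, "FeedAvailability", "ad lib")` can match records across Studies without free-text parsing.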
The list of desired fields in the CEBS-DD expanded following conversations with several different groups of potential users, such as pharmaceutical toxicologists, pathologists, chemists, bioinformaticians, and end-users with a specific query list. As the CEBS-DD developed, a number of data exchange formats and databases were examined in order to ensure that the CEBS-DD was comprehensive and that the organization, definitions, and relationships among terms in the CEBS-DD were consistent with those of other efforts. These external sources included (1) the In Vivo Data Warehouse (IVDW) from the Lilly Research Laboratories and the PATH/TOX System from XYBION Medical Systems, both proprietary solutions to housing data gathered in compliance with federal regulations such as the FDA's 21 Code of Federal Regulations (CFR), Part 11, concerning electronic records; (2) the Standards for Exchange of Nonclinical Data (SEND) Consortium draft ver. 1.6 exchange format developed for the electronic deposition of non-clinical data to regulators; (3) the TDMSU database format used by the National Toxicology Program (NTP) to house pathology findings; (4) DSS-Tox and Tox-ML, two formats designed to permit exchange and integration of chemical toxicity data; and (5) Toxicology Samples and Protocols (TSP) v. 1.0, a public Lab Information Management System (LIMS) designed for three omics technologies (Bao, 2005). These external sources are characterized more fully in Table 1.
The Lilly IVDW data model and data dictionary were used in developing the toxicity data domains of the CEBS-DD since the IVDW relationships permit several types of data to be readily archived and retrieved by users at Lilly. However, because CEBS will be a public resource, it must be able to accept Study Design Descriptions from different institutions, performed under different scientific protocols. Thus the section of the Lilly IVDW describing the Study itself was expanded within the CEBS-DD to permit the flexible description of any Study Design, with an initial focus on acute toxicity Studies with parallel Design Type (a Study with parallel Design Type includes two or more treatment groups treated differently and sampled over time).
The central feature of CEBS-DD is the Study, an experiment covering a defined period of time, and having experimental Subjects, and experimental methods, or Protocols. The CEBS-DD defines the Study Timeline as a series of Events, where at a given Event, a particular Protocol is applied to one or more Groups of experimental Subjects. A Group consists of biological replicates, i.e., Subjects exposed to similar experimental factors or conditions. This is diagrammed in Figure 3, illustrating these core components of the CEBS-DD and the relationships between them. The Study Timeline, Events, and Protocols are described below.
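The relationships among these core components can be sketched as a small set of Python data classes. This is a minimal illustration under stated assumptions, not the CEBS schema: the class and attribute names, the use of hours as the unit of Study time, and the example protocol names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protocol:
    name: str
    protocol_type: str  # e.g. "Subject Treatment" or "Subject Disposition"


@dataclass
class Group:
    group_id: str
    subject_ids: List[str]  # Subjects acting as biological replicates


@dataclass
class Event:
    study_time_h: float  # position on the Study Timeline, in hours (assumed unit)
    protocol: Protocol   # the Protocol applied at this Event
    groups: List[Group]  # the Groups the Protocol is applied to


@dataclass
class Study:
    title: str
    events: List[Event] = field(default_factory=list)

    def timeline(self):
        """Return the Events ordered by Study time."""
        return sorted(self.events, key=lambda e: e.study_time_h)

    def duration_h(self):
        """Return the span between the first and last Event, in hours."""
        times = [e.study_time_h for e in self.events]
        return max(times) - min(times) if times else 0.0


# A made-up two-Event Study: treatment at time 0, exit at 24 h.
treated = Group("high-dose", ["rat-01", "rat-02"])
study = Study("example", [
    Event(24.0, Protocol("euthanasia", "Subject Disposition"), [treated]),
    Event(0.0, Protocol("oral gavage", "Subject Treatment"), [treated]),
])
```

Keeping the Protocol on the Event, rather than on the Group, mirrors the description above: the same Group can meet different Protocols at different points on the Timeline.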
A Subject is the most complex biological unit within a Study Group. This definition allows organs to be considered part of a Subject when lab animals are used, or to be the Subject themselves if an in vitro organ culture is used in the experiment. A Study Group is composed of one or more Subjects and distinguished from other Groups within the Study by one or more factors under investigation. If a Group consists of more than one Subject, then they are, by definition, biological replicates. A Study Group generally has one or more comparator Groups, for instance a vehicle control or an untreated control. The concept of Group is put forward to facilitate comparisons between data derived from Subjects within one Group and data from Subjects within its designated comparator Group. This is of particular importance when integrating toxicity endpoint data with microarray data derived from a two-color platform using RNA pooled from a group of control Subjects as a comparator. Knowing the identity of the correct comparator Subjects permits the toxicology data to be transformed into a format similar to that of the microarray data (e.g., ratios of a toxicity measure in a single treated animal to the average of the measures made in untreated comparator animals).
During a Study, non-invasive or invasive observations can be made on a Subject which produce numeric or textual data. These are termed Observations to distinguish them from Specimens, which are biological tissue samples taken during a Study. A Specimen can be obtained from a Subject prior to the Subject leaving the Study, for instance, a blood or urine sample taken mid-way through the Study. The Specimen is prepared and preserved, and usually stored in a time-independent way, for example in a freezer or in preservative. At this point the Specimen is considered to have left the Study, and any subsequent work performed on the Specimen is considered to be an Assay or test, outside the Study Timeline. Specimens from different Subjects can be pooled to obtain sufficient material for an Assay, for instance in the case of pooling serum samples so that drug levels can be measured. This new specimen is termed a Pool, to distinguish it from a Group. A Group consists of individual Subjects acting as biological replicates, while a Pool is a single Specimen derived by combining Specimens from individual Subjects.
Representations of Time and Events within a Study
The concept of Study used in CEBS is based on an experiment which occurs at a specific point in time. Often conditions (light, temperature, feeding, etc.) are controlled; thus one 24-hour day is essentially like any other. For this reason the actual date on which an event occurs is less important than the timing within the Study Timeline. The term "clock time" will be used to refer to "a date and time" and "Study time" will refer to a point on the Study Timeline (for example, Clock time: "9 AM on Wednesday May 11, 2005" as opposed to Study time: "9 AM on Study Day 1"). This difference between clock time and Study time is diagrammed in Figure 4. Figure 4A shows a simple Study consisting of "treat, wait a day, then exit". Viewed in this time framework, the Study duration would be 24 h, with two Events occurring, one (treatment) at time = 0 and the other (exit) at time = 24 h.
In addition to Study time and clock time, the timing of events with respect to the Subject's experience is also important. For example, treating an animal during the day (hours with light) leads to a different response than the same treatment applied at night (Boorman, 2005). Thus the concept of time must also capture the Subject's experience, for example time of day or time of estrus. This is another example of relative time, but in this case it is relative to the Subject rather than relative to the Study. "Subject time" can be incorporated into Study time by indicating an Event coinciding with the light-to-dark change or onset of estrus. The CEBS-DD permits the description of clock time of Events by permitting a date/time stamp to be associated with the Event, and also permits the description of time relative to the Study by permitting the user to describe events as "Study day 3," for example. The time relative to the light/dark cycle can be captured using CEBS-DD terms for the number of hours of light that had occurred prior to the event, and the number of hours of light per 24 h. Thus the CEBS-DD is suited for both Studies performed in a rigidly controlled laboratory and Studies carried out in environments where recording the actual date and time of an Event is important.
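The three views of time described above (clock time, Study time, and time relative to the light cycle) can be related with a few lines of Python. This is only a sketch: the function names are hypothetical, Study Day 1 is assumed to begin at a supplied reference instant, and the light period is assumed to be a single block starting at a fixed hour each day.

```python
from datetime import datetime, timedelta


def study_time(event_clock, study_start_clock):
    """Convert clock time to (Study day, hours into that day).

    Assumes Study Day 1 begins at study_start_clock.
    """
    elapsed = event_clock - study_start_clock
    day = elapsed // timedelta(days=1) + 1
    hours = (elapsed % timedelta(days=1)) / timedelta(hours=1)
    return day, hours


def hours_of_light_before(event_clock, lights_on_hour, light_hours_per_24h):
    """Hours of light elapsed in the current cycle before the Event.

    Returns None when the Event falls in the dark period.
    """
    hour_of_day = event_clock.hour + event_clock.minute / 60
    since_lights_on = (hour_of_day - lights_on_hour) % 24
    if since_lights_on < light_hours_per_24h:
        return since_lights_on
    return None
```

With the reference instant set to midnight on May 11, 2005, the clock time "9 AM on Wednesday May 11, 2005" maps to 9 h into Study Day 1, matching the example in the text.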
The CEBS-DD defines five Event Types in a Study: Subject Treatment, Subject Disposition, Observations made on the Subject, Subject Care, and Specimen Preparation. Each Event Type has an associated Protocol Type, and minimal information needed for the particular Protocol. Furthermore, the minimal information for a Protocol depends on the Subject Type, since an Exit Event for a lab animal is very different from the Exit Event for a patient. Events and Observations made prior to the start of the Study constitute Subject History.
Each Event Type will necessarily have associated methods, termed Protocols. The CEBS-DD permits many conditions of the Protocol to be captured, and defines a few critical parameters that are dependent on the Subject Type and Stressor Type. These are shown in Table 2. For example, a critical parameter in a lab animal Study is the diet provided to the animals and whether the animal was fed ad libitum. This information is included in the animal Care Protocol. Additionally the user could provide the source of the animal feed, the cage size and type, and other factors. Currently these additional factors are components of the "optional" section of the animal Care Protocol in the CEBS-DD.
These two examples illustrate a key concept: The definition of "minimal" information needed for a Study will depend on the Study Subject Type, Stressor Type, and other important experimental details. The CEBS-DD contains seven classifiers useful in categorizing a Study (see Supplemental Materials). These terms allow the user to rapidly classify the Study on deposition, thereby permitting CEBS to collect the corresponding minimal information and also to associate other relevant concepts with the Study Design Description. These classifiers also support rapid computational access to the Study when users query CEBS and the formation of experimental design for analysis of the data.
A prototype Study entry user interface has been developed and is undergoing testing within the NCT. Figure 5 shows the prototype Study entry page, listing the minimal information proposed to describe and classify a Study. Also given in Figure 5 are prototype Group and Timeline representations. The Event/Protocol terms in the Timeline (euthanasia, animal husbandry, and in-life observations) are specific for the Subject Type (lab animals). This information is managed by CEBS, using the terms in the CEBS-DD. Terms for each classification type are represented in the CEBS-DD although the level of detail is most highly developed for a Study of an acute toxicity Discipline Type, in a controlled lab environment, with lab animals as Subjects, using a chemical Stressor and a parallel Study Design Type.
Organizing Data Derived from Subject and Specimens
Often the experimenter makes observations of the Subjects during the Study timeline. These could include measures of a lab animal's weight and/or food consumption, of a patient's temperature and blood pressure, or of a culture's growth rate and average viability. The CEBS-DD terms such measures "Observations" to distinguish them from data derived from an Assay of biological Specimens collected during a Study. Assay data are derived from tests performed on a Specimen independently of Study time, but are linked in CEBS to the Event within the Study when the Specimen was prepared.
Because Specimens are linked to a Subject and anchored to an Event within the Study Timeline, Specimens collected sequentially from a given Subject, such as blood draws over a period of time, can be easily identified within a database such as CEBS. Organizing the data in this manner makes it straightforward to identify the biological responses exhibited by Subjects over time, or to identify non-responder Subjects or other individual behaviors that would indicate that a given Subject might not be behaving as a biological replicate to others in their Group. Furthermore, by identifying the appropriate comparator Group when the Study is deposited, the database or an application can compute "change" relative to the response seen in the comparator Group, thereby providing additional biological context for interpreting the responses.
The range of potential tests continues to evolve with technological advancements, making it impossible to enumerate all the possible Observations and Assays. However, the CEBS-DD provides a structure for deposition of these data. Each Observation will be performed on a Subject at a particular Event in the Study Timeline, using a given Protocol (either a Standard Protocol, already residing within CEBS, or a newly defined Protocol entered by the depositor). Thus Observation data can be entered into CEBS in a text file with headers: Observation Name, Observation Value, Observation Units, Subject ID, Event ID, Date/time, Protocol (for Observations that do not use a Standard Protocol). For Observations made on a Group (such as the weight of all animals within a cage) the Group ID is used in place of the Subject ID.
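A reader for such an Observation file might look like the following sketch. The header names are those listed above; the tab delimiter, the function name, and the example rows are assumptions made for illustration, since the text does not specify a delimiter or give sample data.

```python
import csv
import io

# Header names as listed in the text.
REQUIRED_HEADERS = [
    "Observation Name", "Observation Value", "Observation Units",
    "Subject ID", "Event ID", "Date/time", "Protocol",
]


def read_observations(text, delimiter="\t"):
    """Parse a delimited Observation file into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [h for h in REQUIRED_HEADERS if h not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing headers: {missing}")
    rows = []
    for row in reader:
        # Observations may be numeric or textual; convert only when possible.
        try:
            row["Observation Value"] = float(row["Observation Value"])
        except ValueError:
            pass
        rows.append(row)
    return rows


# Made-up example rows; the Protocol column is empty for a Standard Protocol.
example = (
    "Observation Name\tObservation Value\tObservation Units\t"
    "Subject ID\tEvent ID\tDate/time\tProtocol\n"
    "body weight\t251.4\tg\trat-01\tE1\t2005-05-11T09:00\t\n"
    "coat condition\tnormal\t\trat-02\tE1\t2005-05-11T09:05\t\n"
)
observations = read_observations(example)
```

For an Observation made on a Group, the same reader would apply with a Group ID column in place of Subject ID, after adjusting `REQUIRED_HEADERS` accordingly.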
The case of Assay data is similar to that for Observation data. Assay data can be associated with the Event in the Study at which the Specimen used in the Assay was prepared. Examples of Assays would be a microarray experiment, a histopathology examination, a clinical pathology panel, an ELISA (Enzyme-Linked ImmunoSorbent Assay) or in situ hybridization. Each Assay will have an associated Protocol, and possibly its own Design depending on the Assay Protocol (e.g., the standard used for comparison or how the data were derived), and capture of this information is facilitated by the CEBS-DD. Models from their respective fields cover the Assay Protocol and Design; for instance, the MAGE-OM covers microarray experiments, and MISFISHIE (Deutsch, 2004) is the standard proposed for in situ hybridizations. Work is underway to convert the Path Code tables used at the NTP to describe histopathology findings into a mouse pathology ontology, and the NCT is currently working with KEVRIC Laboratories to construct a prototype toxicology ontology. The Assay data themselves are exchanged via data files with a format similar to that used for Observations, with Specimen ID used in place of Subject ID.
Sample Quality Documentation: Data Documentation
CEBS will provide Data Documentation, a means for users to make judgments about the technical quality of data within CEBS. At one end of the spectrum, it will be nearly impossible to define a universally accepted "quality metric" for rapidly evolving technologies such as microarray and proteomics studies until the field converges on a standard set of platforms and methods. Studies covered by 21 CFR, Part 11 fall at the opposite end of the spectrum: many of the validation details captured to ensure accuracy of the measuring instruments are not needed by users of a knowledgebase such as CEBS. Thus, CEBS takes a middle path in defining Data Documentation. The Data Documentation in CEBS will mirror the data standards used in laboratories to assess the quality of samples during manipulations. In the case of RNA isolation, these quality assurance measurements include the starting sample integrity measured by gel electrophoresis or BioAnalyzer trace, the median length after fragmentation, and the efficiency of the label incorporation. This information is captured by TSP and other LIMS, and linking to such a system would permit CEBS to capture this information electronically.
In the case of pathology data, quality assurance measures might include the number of pathologists involved in reporting the finding or the lexicon used for making the finding, the availability of micrographs (Irwin et al., 2002), or the availability of historic data from the clinical pathology laboratory for comparison with the data from the current experiment. These data are often available in publications, either as text comments or included implicitly in the publication figures, and are often used by the scientist reading the paper to assign weight to the interpretations. CEBS will permit the depositor to include ancillary information that may speak to the technical quality of the data deposited in CEBS, and Data Documentation fields are therefore part of the CEBS-DD and will be available to the CEBS user.
CASE EXAMPLE: USE OF THE CEBS-DD TO INTEGRATE OMICS DATA AND CLINICAL CHEMISTRY DATA IN THE CONTEXT OF BIOLOGY
MATERIALS AND METHODS
Analysis of protein expression was performed using 2D gel separation. Liver samples from each rat were processed individually. Samples were thawed, homogenized in 9 M urea buffer containing ampholytes and dithiothreitol, centrifuged to remove debris, and subjected to isoelectric focusing using a pH 3 to 10 gradient. Proteins were then separated by mass using SDS-PAGE in 6–18% acrylamide, fixed, and stained with Sypro Ruby (Molecular Probes). Gels were scanned, the background was adjusted, and streaky regions were omitted. Each gel was then assigned to a match set in which a representative control gel was selected as the standard image, and spot intensities were quantified by densitometry.
Differential gene expression analysis was performed using total RNA isolated using RNeasy kits (QIAGEN, Valencia, CA). Equal amounts of RNA from each vehicle-only control animal were pooled for control gene expression at each dose and time period, and compared with individual rats by hybridization to printed cDNA rat genome arrays as described by Heinloth et al. (2004). The samples were hybridized in duplicate with fluor reversal for each individual rat.
Data Preprocessing
Data from two-color cDNA arrays.
Intensity values below 300 intensity units were set to a threshold value of 300 to stabilize the computed ratios. Ratios were converted to log2 prior to integration. Transcripts with missing signals in more than two-thirds of the arrays were excluded. After this filtering, 6500 genes remained.
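The preprocessing steps for the two-color data can be sketched as follows. This is an illustrative reading of the description, not the authors' code: it assumes the 300-unit floor is applied to both channels before the ratio is taken, and that a missing signal is represented as `None`. The function names are hypothetical.

```python
import math

FLOOR = 300.0  # intensity threshold stated in the text


def log2_ratio(treated_intensity, control_intensity, floor=FLOOR):
    """Floor both channels at the threshold, then return log2(treated / control)."""
    treated = max(treated_intensity, floor)
    control = max(control_intensity, floor)
    return math.log2(treated / control)


def keep_transcript(values):
    """Keep a transcript unless more than two-thirds of its values are missing."""
    missing = sum(1 for v in values if v is None)
    # Integer comparison avoids floating-point error in the two-thirds cutoff.
    return 3 * missing <= 2 * len(values)
```

Flooring before dividing keeps a near-zero control channel from producing very large ratios, which is the stabilizing effect the text refers to.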
Proteomics data.
The proteomics data consisted of individual intensity values for 2D gel-separated proteins from individual animals, indexed by spot numbers from a master list for the experiment but otherwise not identified. Protein spots observed in fewer than one third of the gel images were excluded from the analysis. Values below 1000, and spots not present in the treated sample, were replaced with a threshold value of 1000 prior to computing the ratio to the intensity seen in untreated animals. Ratios were converted to log10 after integration. In total, 1832 protein spots from liver homogenate and 838 protein spots from serum were used in this analysis.
Clinical chemistry data.
Levels of liver enzymes ALT (alanine aminotransferase) and AST (aspartate aminotransferase) in serum were used to assess the toxic response of each treated animal. The ratio for each treated animal to the average value for the control animals at the same time point was computed, and converted to log10 after integration.
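The ratio calculation described for the clinical chemistry data can be sketched as below. The data layout, the function name, and the numbers in the usage example are hypothetical; only the computation (ratio of each treated animal to the mean of time-matched controls, then log10) follows the text.

```python
import math
from statistics import mean


def log10_ratio_to_controls(treated, controls):
    """Return log10(value / mean of time-matched control values) per subject.

    treated:  {subject_id: (time_h, value)}
    controls: {time_h: [control values at that time point]}
    """
    result = {}
    for subject_id, (time_h, value) in treated.items():
        control_mean = mean(controls[time_h])
        result[subject_id] = math.log10(value / control_mean)
    return result


# Made-up enzyme values, for illustration only.
treated_alt = {"rat-07": (24, 500.0)}
control_alt = {24: [40.0, 60.0]}
alt_ratios = log10_ratio_to_controls(treated_alt, control_alt)
```

Expressing the enzyme levels this way puts them on the same "change relative to control" footing as the two-color microarray ratios.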
Data Integration and Clustering
The animals were grouped by dose-time or by phenotypic response, as diagrammed in Figures 8A and 8B. Animals were aligned by dose and time (see Fig. 8A) by grouping animals treated with equivalent doses and recovery times together to create the nine groups indicated by circles in Figure 8A. These "DT" dose-time groups are 6-h-0-dose through 48-h-1500 mg/kg dose. Animals were also aligned by phenotype, in this case by ALT and AST values. The scatter plot of ALT vs. AST in Figure 8B indicates the grouping of the animals by phenotype, to create 15 "phenotype groups" or PG. The aim was to include at least one animal from each study in each group, and to have approximately equal numbers of animals in each group overall.
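Alignment by dose and time amounts to keying each animal on its (time, dose) pair. A minimal sketch follows, with made-up animal identifiers and a hypothetical function name; the phenotype grouping is not shown because the text does not give the ALT/AST boundaries used.

```python
from collections import defaultdict


def dose_time_groups(animals):
    """Group animal IDs by (time_h, dose_mg_per_kg).

    animals: iterable of (animal_id, dose_mg_per_kg, time_h) tuples.
    """
    groups = defaultdict(list)
    for animal_id, dose_mg_per_kg, time_h in animals:
        groups[(time_h, dose_mg_per_kg)].append(animal_id)
    return dict(groups)


# Illustrative animals drawn from two hypothetical studies.
animals = [
    ("study1-rat-01", 0, 6),
    ("study2-rat-01", 0, 6),
    ("study1-rat-09", 1500, 48),
]
dt_groups = dose_time_groups(animals)
```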
As this is a prototyping exercise for features not yet implemented within CEBS, the matrix was imported into Spotfire Decision Site for Genomics, where profiles were filtered by first eliminating those well-correlated with a straight line across all treatment conditions, then eliminating those which neither rose above 0.5 log units nor fell below -0.5 log units. Following these computational filters, the remaining profiles were examined visually, eliminating those that did not pass "biology-like" filters. Such "biology" filters removed flat profiles with a spike at a single animal (e.g., lacking concordant responses among biological replicates), jagged profiles alternating between increased and decreased over multiple points, and profiles judged to be too close to the zero line to be significant based only on microarray data. This process left profiles consistent with measurable, biologically relevant responses.
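The magnitude filter can be written as a one-line predicate. This sketch assumes the thresholds are symmetric at +0.5 and -0.5 log units; the straight-line correlation filter and the visual "biology-like" filters are not reproduced because the text does not give their criteria. The function name is hypothetical.

```python
def passes_magnitude_filter(profile, threshold=0.5):
    """Keep a profile if any value rises above +threshold or falls below -threshold."""
    return max(profile) > threshold or min(profile) < -threshold


# Made-up log-ratio profiles across treatment conditions.
profiles = {
    "flat": [0.1, -0.2, 0.05],
    "induced": [0.0, 0.3, 1.2],
    "repressed": [0.1, -0.9, -0.4],
}
kept = sorted(name for name, p in profiles.items() if passes_magnitude_filter(p))
```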
RESULTS
Three Specimens were prepared from each Subject during necropsy. Information about the Specimens from the 1500 mg/kg, 48 h groups in each of the two studies (transcriptomics and proteomics) is shown in Figure 6, to illustrate how Specimens are associated with Subjects. The data derived from Assays performed on these Specimens (omics, histopathology findings and clinical chemistry) are annotated using the Study Design and the CEBS-DD. All data derived from each Subject can be associated through this annotation, which permits omics data to be associated within CEBS with the dose-time exposure and with the phenotypic response (clinical chemistry and histopathology) of the Subject.
The microarray data in this example were from two-color arrays, thus the transcriptomics data are in the form of a ratio of expression in a treated animal to expression in a pool of control animals. Because the microarray data were in the form of ratios, the other data were also pre-processed and converted to ratios as described in the Methods. This produced six DT groups (150 mg/kg, sampled at 6, 24, and 48 h, and 1500 mg/kg, sampled at 6, 24, and 48 h), and the nine PG groups identified in Figure 8. The virtual rat profiles are composed of data from four sources: 6500 liver transcript measures, 1832 protein spots from liver homogenate, 838 protein spots from serum, and four measures of liver enzymes ALT and AST. The integrated data from the 15 virtual rats were clustered using unsupervised bi-directional clustering using Pearson correlation as a metric. The resulting heat map is shown in Figure 9. In this figure, green indicates elements with decreased intensity relative to control, and red indicates profiles with increased intensity relative to control. The dendrograms at the top and side indicate the similarity between clusters of correlated virtual rats and data elements, respectively.
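The similarity measure behind the clustering, Pearson correlation, can be illustrated with a short standard-library sketch. It computes the coefficient for two profiles and finds the most strongly correlated pair, which is the first merge an agglomerative clustering would make under this metric. It is not the Spotfire procedure used in the study, and the profile names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm


def closest_pair(profiles):
    """Return the pair of profile names with the highest Pearson correlation."""
    names = sorted(profiles)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return max(pairs, key=lambda p: pearson(profiles[p[0]], profiles[p[1]]))


# Illustrative log-ratio profiles for three hypothetical virtual subjects.
example_profiles = {
    "A": [0.0, 1.0, 2.0],
    "B": [0.1, 2.0, 4.1],
    "C": [2.0, 1.0, 0.0],
}
```

Because Pearson correlation ignores scale, profiles "A" and "B" above count as similar even though "B" changes roughly twice as much, which suits the comparison of data types measured in different units.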
The dendrogram of virtual rats (at the top of Fig. 9) has three branches, the rightmost of which is enclosed in a yellow outline. The profiles in this cluster come from the virtual rats DT-H24 and H48, and PG-1, 2, 3, and 4. These correspond to the virtual rats with elevated serum enzyme levels in Figure 8. The expression of many of the data elements appears to be more changed in these virtual rats than in the others, as indicated by the more intense red and green colors in their profiles.
The dendrogram to the left of Figure 9, describing the correlation of different data elements, reveals a number of relatively equal-sized clusters. The clustering is driven by the expression in the six virtual rats outlined in Figure 9 (and named above, those with elevated ALT and AST). In order to understand how the profiles of transcripts, liver proteins, and serum proteins were behaving in this analysis, the heat map shown in Figure 9 was split into four sub-plots in Figure 10, segregated by data type. Each sub-plot in Figure 10 contains data from one source: liver transcriptomics, liver proteomics, serum proteomics, or serum enzymes. There are only four serum enzyme measures, and these are highly correlated, so these data appear as a single line in the serum enzyme sub-plot. This view of the data permits the investigator to identify which cluster in the left-hand dendrogram contains the serum enzymes, and the genes and proteins with expression levels highly correlated with the levels of serum enzymes. The data elements are clearly interdigitated in the heat map; thus the integrated data give a new, biologically richer picture than can be derived from data from individual domains alone.
DISCUSSION
The CEBS-DD is an evolving compendium of terms, definitions, relationships, and controlled vocabularies used to describe a toxicogenomics Study and the associated Phenotype data (clinical pathology and histopathology findings). The CEBS-DD is aligned with developing standards for the exchange of toxicity data, and will be used to develop parsers and user interfaces for data from different sources, using current standards for data exchange, to support the facile electronic transfer of data to CEBS. The CEBS-DD incorporates the language and terms in use in the toxicology literature and in data exchange initiatives. Thus the CEBS-DD uses terms with established meaning and relationships as well as controlled vocabularies taken from lexicons presently in use to record the Study Design Description. This is anticipated to facilitate the description of a variety of disparate Studies accurately and intuitively. Additionally, the terms and definitions within the CEBS-DD were selected to have meaning beyond acute toxicity Studies, thereby permitting easy extension of the CEBS-DD to other disciplines. This is anticipated to permit CEBS the flexibility to accept data obtained in a variety of Studies, correctly interpret the key features of the Study Design, and use this to provide the CEBS user an accurate view of the Study Timeline, Events, Subject Groups, experimental factors and Protocols.
There are several public repositories for microarray data, but none that capture the full Study Timeline, toxicity data and data from highly parallel transcriptomics and proteomics. CEBS is such a public toxicogenomics repository. Because of this role, the CEBS-DD was developed, first with the aim of facilitating the accurate capture of data from a variety of sources, and second, of providing the foundation for a standard set of terms, relationships, definitions and controlled vocabularies that may facilitate other institutions in developing their own data repositories. The exchange of data is greatly facilitated if a common set of terms is used by the community. The NCT continues to track evolving public data exchange initiatives to ensure that the CEBS-DD accurately reflects developing community standards.
The CEBS Knowledgebase aims to permit the user to integrate data from different studies. Once in CEBS, the Study Design Description, Toxicity Phenotype Data and data from highly parallel assays such as transcriptomics or proteomics can be brought together within CEBS, and also provided for download using standard formats for use in other applications. An example of the use of terms and relationships in the CEBS-DD to integrate data from a proteomics Study and a transcriptomics Study of acetaminophen responses in rats was provided in order to illustrate the utility of this approach. The CEBS-DD terms and controlled vocabularies permit fields of interest to be defined in CEBS, for instance experimental factors such as dose and time, or protocol components such as dose range, route of administration, feed, or method of disposition. Resulting toxicological measures, histopathological findings and omics data can then be computationally associated with the appropriate descriptors, and the resulting data subjected to supervised or unsupervised pattern finding. Additionally, following the example used here, data from similar but not identical animals can be combined to form a virtual subject for analysis. This approach can be used within CEBS to identify associations of molecular responses with diet, disposition methods, or with toxicological responses of different severity, occurring over time, exposure level and stressor.
The NCT invites journal editors to use CEBS to manage omics data published in their journals. There is an increasing need for full documentation of omics experimental data sets, which are too large to include in printed form. The tools contained in CEBS would allow independent evaluation of the data in an omics study in the context of toxicology, and we feel such a database would aid in advancing the field of toxicogenomics.
![]() |
SUPPLEMENTARY DATA |
---|
![]() |
NOTES |
---|
![]() |
ACKNOWLEDGMENTS |
---|
![]() |
REFERENCES |
---|
Benz, R. D., Arvidson, K., Cheeseman, M., Fostel, J., Hollingshaus, G., Johnson, D., Johnson, W., Kemper, R., Lee, P., Mathews, E., et al. (2005). ToxML, An Endpoint Specific Database Ontology For Linking Toxicology To Chemistry. Submitted.
Boorman, G., Irwin, R. D., Vallant, M. K., Gerken, D. K., Lobenhofer, E. K., Hejtmancik, M. R., Hurban, P., Brys, A. M., Travlos, G. S., Parker, J. S., and Portier, C. J. (2005). Variation in the hepatic gene expression in individual male Fischer rats. Toxicol. Pathol. 33, 102–110.
Deutsch, E. (2004). MISFISHIE... Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). scgap.systemsbiology.net/standards/misfishie/
Heinloth, A. N., Boorman, G. A., Nettesheim, P., Fannin, R. D., Sieber, S. O., Snell, M. L., Tucker, C. J., Li, L., Travlos, G. S., Vansant, G., et al. (2004). Gene expression profiling of rat livers reveals indicators of potential adverse effects. Toxicol. Sci. 80, 193–202.
Irwin, R., Boorman, G. A., Waters, M. D., Hardisty, J. F., and Sills, R. C. (2002). Quality review procedures necessary for rodent pathology databases and toxicogenomic studies: The National Toxicology Program experience. Toxicol. Pathol. 30, 88–92.
Richard, A., and Williams, C. R. (2002). Distributed structure-searchable toxicity (DSSTox) public database network: A proposal. Mutat. Res. 499, 27–52.
Tong, W., Harris, S., Sun, H., Fang, H., Fuscoe, J., Harris, A., Hong, H., Xie, Q., Perkins, R., Shi, L., et al. (2003). ArrayTrack – supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. Toxicogenomics 111, 1819–1826.
Tong, W., Harris, S., Cao, X., Fang, H., Shi, L., Sun, H., Fuscoe, J., Harris, A., Hong, H., and Xie, Q. (2004). Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res./Fundam. Mol. Mechanisms Mutagen. 549, 241–253.
Waters, M., Boorman, G., Bushel, P., Cunningham, M., Irwin, R., Merrick, A., Olden, K., Paules, R., Selkirk, J., Stasiewicz, S., Weis, B., Van Houten, B., Walker, N., and Tennant, R. (2003). Systems toxicology and the Chemical Effects in Biological Systems (CEBS) knowledge base. Environ. Health Perspect. Toxicogenomics 111, 15–28.
Waters, M. D. (2004). The CEBS Knowledge Base: The Integration of Molecular Profiling, Toxicology and Pathology Datasets for Knowledge Discovery. Presented at the 43rd Annual Meeting of the Society of Toxicology, Baltimore, MD, March 21–25, 2004.
S. Consortium Standard for Exchange of Non-clinical Data. (Abstract)