Chemical Effects in Biological Systems—Data Dictionary (CEBS-DD): A Compendium of Terms for the Capture and Integration of Biological Study Design Description, Conventional Phenotypes, and ‘Omics Data

Jennifer Fostel*,1, Danielle Choi*, Craig Zwickl†, Norman Morrison‡, Asif Rashid*,§, Atif Hasan§, Wenjun Bao, Ann Richard, Weida Tong||, Pierre R. Bushel|||, Roger Brown||||, Maribel Bruno|||, Michael L. Cunningham#, David Dix, William Eastin#, Carlos Frade**, Alex Garcia††, Alexandra Heinloth|||, Rick Irwin#, Jennifer Madenspacher|||, B. Alex Merrick|||, Thomas Papoian⋈⋈, Richard Paules|||, Philippe Rocca-Serra††, Assunta-Susanna Sansone††, James Stevens†, Kenneth Tomer|||, Chihae Yang⋈⋈⋈ and Michael Waters|||

* Lockheed Martin Information Technology (LMIT), Research Triangle Park, North Carolina 27709; † Lilly Research Laboratory, Greenfield, Indiana 46140; ‡ National Environmental Research Council (NERC), University of Manchester, Manchester M13 9PL, U.K.; § Alpha-Gamma Technologies, Inc., Raleigh, North Carolina 27609; U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711; || U.S. National Center for Toxicological Research (NCTR), Jefferson, Arkansas 72079; ||| U.S. National Center for Toxicogenomics (NCT), Research Triangle Park, North Carolina 27709; |||| Glaxo-SmithKline, Inc., Research Triangle Park, North Carolina 27709; # U.S. National Toxicology Program, Research Triangle Park, North Carolina 27709; ** Xybion, Inc., Cedar Knolls, New Jersey; †† The European Bioinformatics Institute (EMBL-EBI) Hinxton, Wellcome Trust Genome Campus, Cambridge CB10, U.K.; ⋈⋈ U.S. Food and Drug Administration (FDA) Center for Drug Evaluation and Research, Rockville, Maryland 20857; and ⋈⋈⋈ Leadscope, Inc., Columbus, Ohio 43215

1 To whom correspondence should be addressed at National Center for Toxicogenomics, PO Box 12233 Mail Drop F1-05, 111 Alexander Drive, Research Triangle Park NC 27709-2233. Fax: (919) 541-1460. E-mail: fostel@niehs.nih.gov.

Received July 19, 2005; accepted September 2, 2005


    ABSTRACT
 
A critical component in the design of the Chemical Effects in Biological Systems (CEBS) Knowledgebase is a strategy to capture toxicogenomics study protocols and the toxicity endpoint data (clinical pathology and histopathology). A Study is generally an experiment carried out during a period of time for the purpose of obtaining data, and the Study Design Description captures the methods, timing, and organization of the Study. The CEBS Data Dictionary (CEBS-DD) has been designed to define and organize terms in an attempt to standardize nomenclature needed to describe a toxicogenomics Study in a structured yet intuitive format and provide a flexible means to describe a Study as conceptualized by the investigator. The CEBS-DD will organize and annotate information from a variety of sources, thereby facilitating the capture and display of toxicogenomics data in biological context in CEBS, i.e., associating molecular events detected in highly-parallel data with the toxicology/pathology phenotype as observed in the individual Study Subjects and linked to the experimental treatments. The CEBS-DD has been developed with a focus on acute toxicity studies, but with a design that will permit it to be extended to other areas of toxicology and biology with the addition of domain-specific terms. To illustrate the utility of the CEBS-DD, we present an example of integrating data from two proteomics and transcriptomics studies of the response to acute acetaminophen toxicity (A. N. Heinloth et al., 2004, Toxicol. Sci. 80, 193–202).

Key Words: Chemical Effects in Biological Systems (CEBS) Knowledgebase; toxicogenomics study protocols; toxicity endpoint data; acetaminophen; phenotypic anchor.


    INTRODUCTION
 
The ability to archive, retrieve, and exchange high content data sets, including transcript profiles, among laboratories, industries, and government agencies is a crucial step in exploiting the power of high content technologies to describe the response of an organism to the environment. A key step in achieving this end is to develop a publicly accessible database and associated standards for the exchange of data with associated metadata that provide experimental context so that the data can be mined efficiently and intuitively. Currently no national or international standard provides the necessary nucleus of metadata standards around which such a database can be organized to facilitate the electronic exchange of information among interested stakeholders. Therefore, a consortium of stakeholders from the private and public sectors has contributed to the development of a data dictionary containing terms, definitions, relationships, and controlled vocabularies for the Chemical Effects in Biological Systems (CEBS) Knowledgebase.

The CEBS Knowledgebase is being developed at the National Center for Toxicogenomics (NCT). Currently still early in the development process, CEBS will become a public toxicogenomics resource integrating traditional toxicology and pathology phenotype data with data from highly parallel technologies, such as from microarray or proteomics studies, in biological context using the Study Design Description (Waters, 2004; Waters et al., 2003). To accomplish this, CEBS captures the relevant characteristics of Study Subjects and methods (Protocols), and the Study Timeline on which Events such as treatment, animal care, and exit (euthanasia) occur. These characteristics are collectively termed the Study Design Description. The CEBS Data Dictionary (CEBS-DD) includes the terms, definitions, and relationships to support the accurate capture of elements of the Study Design Description by CEBS. Once the Study Design Description has been captured, it can be used to organize and annotate the data derived from Study Subjects and to display the data in a meaningful biological context within CEBS.

The challenges inherent in building the CEBS-DD are two-fold. First, the minimal information needed to interpret a toxicogenomics Study must be identified to ensure that data deposited in CEBS meet a common minimum standard. This need is satisfied by CEBS-DD, which extends the original Minimal Information about a Microarray Experiment/Toxicology (MIAME/Tox) standard developed by the NCT, the European Bioinformatics Institute (EBI) and the International Life Sciences Institute – Health and Environmental Sciences Institute (ILSI-HESI) (www.mged.org/MIAME1.1-DenverDraft.DOC). The minimal information requirement is highly dependent upon the biological conduct of the experiment, and has been extended in the CEBS-DD primarily within the framework of an acute toxicity Study. CEBS will offer a graphical user interface (GUI) to capture minimal Study information (see Figure 1).



 
FIG. 1. Diagram of the architecture of CEBS and the role of the CEBS-DD. CEBS internal architecture is composed of object models and data repositories. The users access CEBS for query through a web interface. The prototype GUI is shown using the CEBS-DD as a reference to define data entered by the user. The CEBS-DD defines the minimum information needed to describe and interpret a toxicogenomics experiment, which has been the basis for developing a graphical user interface on the Research CEBS prototype for use by depositors with a single study to enter. It is likely that the SEND format will become a standard for exchange of richly annotated toxicity data between regulatory agencies and sponsors. Aligning the CEBS-DD with SEND will permit the writing of a SEND parser, to permit CEBS to incorporate SEND-formatted files without the need for manual re-entry of the information. The Xybion Path/Tox Medical System is an example of a 21CFR Part 11-compliant data repository, which has also been mapped by the CEBS-DD. The National Toxicology Program uses the Toxicology Data Management System (TDMS) to store results of studies, and the CEBS-DD is being used to guide development of SQL views into TDMS from CEBS. The Distributed Structure-Searchable Toxicity Database System (DSSTox) defines a set of common data fields (both chemical and toxicological) to be added to public chemical toxicity databases that will permit facile interoperability, aggregation and querying of chemically induced toxicology-related data [5]. CEBS-DD uses the fields defined by DSSTox, making CEBS part of the DSSTox System.

 
The second challenge, also met by the CEBS-DD, is to define the "maximum" information that can be provided by a data depositor. CEBS must be able to accurately capture any and all relevant pieces of information about the Study and then interpret and present the data in a way that permits querying by the CEBS user. In most cases well-annotated sources are already in an electronic format; thus, it is anticipated that transfer to CEBS of data from richly annotated studies will occur electronically rather than through a manual input web interface, and that the CEBS-DD will facilitate the writing of parsers by supplying annotation and synonyms for different data formats. This electronic parsing would occur apart from the CEBS web interface designed to support manual entry of minimal information (see Figure 1).

The CEBS-DD is focused on defining data fields typically encountered in a standard acute toxicity Study, since this type of Study makes up the majority of the data currently in CEBS. However, the design of the CEBS-DD is such that it can be extended to capture relevant information from other areas of toxicology such as reproductive toxicology, neurotoxicology, or carcinogenesis studies by including additional terms specific to these disciplines. Terms in the CEBS-DD were defined to be as broadly applicable as possible, so that their use could be extended to descriptions of other complex biological investigations in sciences including neurobiology, infectious diseases, behavior, development, and reproduction. Therefore, the concepts defined within CEBS-DD have been shared with the Microarray Gene Expression Data (MGED) Society, Reporting Structures for Biological Investigations (RSBI) Working Groups with the aim of making them as broadly applicable and available as possible. A standard, shared data structure will permit CEBS to accept data from a wide variety of disciplines and sources, incorporate the available information into the CEBS Knowledgebase, and present the data to the user in an organized, uniform context.

Description of the CEBS Data Dictionary
Construction of the CEBS-DD
The CEBS-DD began as a list of data fields deemed useful to keep in a queryable form within CEBS; in other words, information that might affect biological response or data quality and should therefore remain accessible to users' queries. For example, rather than keeping the name and availability of a particular diet regimen in an unstructured text file, retaining the diet name, composition, and availability in a structured form in CEBS makes it possible for the user to query for effects of the diet in cross-Study integrated data analysis. However, this means that the details of the diet must be deposited in CEBS using specified terms and vocabulary, for example, (Diet = {NCT2000, Purina certified rodent chow 5002C}; FeedAvailability = {ad lib, calorie restricted, time restricted}), and these details must be associated with the correct animals and Studies in CEBS.
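The requirement to deposit details "using specified terms and vocabulary" can be illustrated with a minimal Python sketch. The two vocabularies are taken from the example above; the function names and the record layout are assumptions made for illustration and do not describe how CEBS itself is implemented.

```python
# Controlled vocabularies from the example in the text. The dictionary layout
# and function names are hypothetical, not part of CEBS or the CEBS-DD.
CONTROLLED_VOCABULARY = {
    "Diet": {"NCT2000", "Purina certified rodent chow 5002C"},
    "FeedAvailability": {"ad lib", "calorie restricted", "time restricted"},
}


def validate_record(record):
    """Return a list of problems found in a deposited record (empty if valid)."""
    problems = []
    for field, value in record.items():
        allowed = CONTROLLED_VOCABULARY.get(field)
        if allowed is None:
            problems.append(f"unknown field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not a permitted term")
    return problems


def subjects_with(records, field, value):
    """Query helper: sorted IDs of Subjects whose record carries the given term."""
    return sorted(sid for sid, rec in records.items() if rec.get(field) == value)
```

Because every deposited value is drawn from a fixed list, a cross-Study query such as `subjects_with(records, "Diet", "NCT2000")` reduces to an exact match rather than a free-text search.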

The list of desired fields in the CEBS-DD expanded following conversations with each of several different groups of potential users, such as pharmaceutical toxicologists, pathologists, chemists, bioinformaticians, and end-users with a specific query list. As the CEBS-DD developed, a number of data exchange formats and databases were examined in order to ensure that the CEBS-DD was comprehensive and that the organization, definitions, and relationships among terms in the CEBS-DD were consistent with those of other efforts. These external sources included (1) the In Vivo Data Warehouse from the Lilly Research Laboratories and the PATH/TOX System from XYBION Medical Systems, both proprietary solutions to housing data gathered in compliance with federal regulations such as the FDA's 21 Code of Federal Regulations (CFR), Part 11, concerning electronic records; (2) the Standards for Exchange of Nonclinical Data (SEND) Consortium draft ver. 1.6 exchange format developed for the electronic deposition of non-clinical data to regulators; (3) the TDMS database format used by the National Toxicology Program (NTP) to house pathology findings; (4) DSSTox and Tox-ML, two formats designed to permit exchange and integration of chemical toxicity data; and (5) Toxicology Samples and Protocols (TSP) v. 1.0, a public Lab Information Management System (LIMS) designed for three ‘omics technologies (Bao, 2005). These external sources are characterized more fully in Table 1.


 
TABLE 1 Characterization of the External Data Sources Examined in Preparing the CEBS-DD

 
Since CEBS will be a repository for a correspondingly wide range of data and Study types, the CEBS-DD was developed by incorporating concepts from a variety of standardized data sources. Concepts common to CEBS-DD and external source(s) serve as the beginning of a synonym list for the exchange of toxicological data. A single row of the CEBS-DD, showing the synonym alignment of terms from external sources, is reproduced in Figure 2. The terms and definitions from the full CEBS-DD are available at http://www.niehs.nih.gov/cebs-df/. Synonyms are not included there at this time since some contain proprietary information.



 
FIG. 2. One row from the CEBS-DD. Blocks of different colors indicate different external data sources. Text on the same row as a CEBS-DD term indicates that the concept is found in that particular source, and provides the name used for the concept within the source. The terms and definitions in the full CEBS-DD are available at www.niehs.nih.gov/cebs-df/.
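The synonym alignment shown in a CEBS-DD row can be sketched as a lookup table from which a parser derives a field-name translator. The Python sketch below is illustrative only: the CEBS-DD term names and every source-specific synonym are invented, since the actual synonyms are not published.

```python
# Hypothetical CEBS-DD rows: each gives the CEBS-DD term and the name used for
# the same concept in each external source. All names here are invented for
# illustration; None means the source does not contain the concept.
SYNONYM_ROWS = [
    {"cebs_dd": "SubjectSex", "SEND": "SEX", "IVDW": "animal_sex", "TDMS": None},
    {"cebs_dd": "SubjectStrain", "SEND": "STRAIN", "IVDW": "strain_name", "TDMS": "STRAIN_CD"},
]


def build_translator(rows, source):
    """Map one source's field names onto CEBS-DD terms, skipping absent concepts."""
    return {row[source]: row["cebs_dd"] for row in rows if row.get(source)}


def translate_record(record, translator):
    """Rename known fields to CEBS-DD terms; return unmapped fields separately."""
    mapped, unmapped = {}, {}
    for name, value in record.items():
        if name in translator:
            mapped[translator[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped
```

Returning unmapped fields separately, rather than discarding them, lets a curator see which concepts in an incoming file still lack a CEBS-DD synonym.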

 
Components and Relationships within CEBS-DD
The Lilly Research Laboratory In Vivo Data Warehouse (IVDW) was built to permit the integration of legacy and real-time safety data from different groups within the company. Because this is an active database designed for the capture and integration of toxicology Study data, the IVDW served as the foundation for the CEBS-DD. The schema for the IVDW is available at www.niehs.nih.gov/cebs-df/. The IVDW database contains a central table to hold clinical pathology results, and supporting tables identifying the Study, the animals, and the groups. These are supported in turn by additional tables containing the details of the Study.

The Lilly IVDW data model and data dictionary were used in developing the toxicity data domains of the CEBS-DD since the IVDW relationships permit several types of data to be readily archived and retrieved by users at Lilly. However, because CEBS will be a public resource, it must be able to accept Study Design Descriptions from different institutions, performed under different scientific protocols. Thus the section of the Lilly IVDW describing the Study itself was expanded within the CEBS-DD to permit the flexible description of any Study Design, with an initial focus on acute toxicity Studies with parallel Design Type (a Study with parallel Design Type includes two or more treatment groups treated differently and sampled over time).

The central feature of CEBS-DD is the Study, an experiment covering a defined period of time, and having experimental Subjects, and experimental methods, or Protocols. The CEBS-DD defines the Study Timeline as a series of Events, where at a given Event, a particular Protocol is applied to one or more Groups of experimental Subjects. A Group consists of biological replicates, i.e., Subjects exposed to similar experimental factors or conditions. This is diagrammed in Figure 3, illustrating these core components of the CEBS-DD and the relationships between them. The Study Timeline, Events, and Protocols are described below.



 
FIG. 3. Relationships between concepts making up the central Study structure in the CEBS-DD. An Investigation contains one or more Studies. Within a Study, Subjects are organized into Groups. Groups are treated according to Protocols at particular Events in the Study Timeline (not shown in the Figure). Subjects produce Specimens, which can be combined into a Pool.

 
An Investigation, considered to be a self-contained unit of scientific inquiry, includes one or more related Studies, has a hypothesis or central area of focus, and a named principal investigator (PI) responsible for the information within the Investigation. A Study resides in CEBS within the context of an Investigation, and also has a purpose, title and PI, duration and a start date, and is classified by the Study PI using a series of classifiers described in the next section. Several related Studies may be conducted within a single Investigation.

A Subject is the most complex biological unit within a Study Group. This definition allows organs to be considered part of a Subject when lab animals are used, or to be the Subject themselves if an in vitro organ culture is used in the experiment. A Study Group is composed of one or more Subjects and distinguished from other Groups within the Study by one or more factors under investigation. If a Group consists of more than one Subject, then they are, by definition, biological replicates. A Study Group generally has one or more comparator Groups, for instance a vehicle control, or an untreated control. The concept of Group is put forward to facilitate making the comparisons of data derived from Subjects within one Group and those within what is designated as a comparator Group. This is of particular importance when integrating toxicity endpoint data with microarray data derived from a two-color platform using RNA pooled from a group of control Subjects as a comparator. Knowing the identity of the correct comparator Subjects permits the toxicology data to be transformed into a similar format as the microarray data (e.g., ratios of a toxicity measure in a single treated animal to the average of the measures made in untreated comparator animals).
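The transformation described above, a ratio of each treated animal's measure to the average of the comparator animals, can be written out as a short Python sketch. The function name and the numeric values are invented for illustration.

```python
from statistics import mean


def ratios_to_comparator(treated, comparator):
    """Divide each treated Subject's measure by the comparator Group mean."""
    if not comparator:
        raise ValueError("comparator Group has no Subjects")
    baseline = mean(comparator.values())
    if baseline == 0:
        raise ValueError("comparator mean is zero; ratio undefined")
    return {subject: value / baseline for subject, value in treated.items()}


# Invented toxicity measures (arbitrary units), keyed by Subject ID.
control = {"C1": 40.0, "C2": 50.0, "C3": 60.0}
treated = {"T1": 100.0, "T2": 150.0}
ratios = ratios_to_comparator(treated, control)
print(ratios)
```

The resulting per-Subject ratios have the same form as two-color microarray ratios measured against pooled control RNA, which is what makes the two data types comparable.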

During a Study, non-invasive or invasive observations can be made on a Subject which produce numeric or textual data. These are termed Observations to distinguish them from Specimens, which are biological tissue samples taken during a Study. A Specimen can be obtained from a Subject prior to the Subject leaving the Study, for instance, a blood or urine sample taken mid-way through the Study. The Specimen is prepared and preserved, and usually stored in a time-independent way, for example in a freezer or in preservative. At this point the Specimen is considered to have left the Study, and any subsequent work performed on the Specimen is considered to be an Assay or test, outside the Study Timeline. Specimens from different Subjects can be pooled to obtain sufficient material for an Assay, for instance in the case of pooling serum samples so that drug levels can be measured. This new specimen is termed a Pool, to distinguish it from a Group. A Group consists of individual Subjects acting as biological replicates, while a Pool is a single Specimen derived by combining Specimens from individual Subjects.
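The relationships among Study, Group, Subject, Specimen, and Pool can be summarized in a small data-model sketch. The Python classes below are an assumption about one possible representation; the attribute names are hypothetical and are not taken from the CEBS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Specimen:
    specimen_id: str
    subject_id: str  # the Subject that produced the Specimen
    event_id: str    # the Event at which the Specimen was prepared


@dataclass
class Pool:
    pool_id: str
    specimens: List[Specimen]  # combined into a single Specimen, not replicates

    def source_subjects(self):
        """Sorted IDs of the Subjects whose Specimens were combined."""
        return sorted({s.subject_id for s in self.specimens})


@dataclass
class Group:
    group_id: str
    subject_ids: List[str]               # biological replicates
    comparator_id: Optional[str] = None  # e.g. a vehicle control Group


@dataclass
class Study:
    title: str
    groups: List[Group] = field(default_factory=list)

    def comparator_of(self, group_id):
        """Return the comparator Group of a Group, or None if it has none."""
        by_id = {g.group_id: g for g in self.groups}
        comparator_id = by_id[group_id].comparator_id
        return by_id[comparator_id] if comparator_id is not None else None
```

Keeping `Group` and `Pool` as separate types mirrors the distinction drawn above: a Group holds individual replicate Subjects, whereas a Pool is one physical Specimen with several source Subjects.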

Representations of Time and Events within a Study
The concept of Study used in CEBS is based on an experiment carried out over a defined period of time. Often conditions (light, temperature, feeding, etc.) are controlled; thus one 24-hour day is essentially like any other. For this reason the actual date on which an event occurs is less important than the timing within the Study Timeline. The term "clock time" will be used to refer to "a date and time" and "Study time" will refer to a point on the Study Timeline (for example, Clock time: "9 AM on Wednesday May 11, 2005" as opposed to Study time: "9 AM on Study Day 1"). This difference between clock time and Study time is diagrammed in Figure 4. Figure 4A shows a simple Study consisting of "treat, wait a day, then exit". Viewed in this time framework, the Study duration would be 24 h, with two Events occurring, one (treatment) at time = 0 and the other (exit) at time = 24 h.



 
FIG. 4. Differentiating Study time and clock time. (A) A simple Study design: "treat, wait a day, then exit". The Study timeline is 24 h in duration, with two "events" occurring, one (treat) at time = 0 and the other (exit) at time = 24 h. (B) Shows a "clock time" diagram of the same Study, in which the four groups A, B, C, and D are treated, each for one 24-h period with two events (treat and euthanasia), but each treatment commencing at a different time. This diagram could represent the actual "clock time" corresponding to the Study shown in (A).

 
Figure 4B shows a clock time diagram of the same Study, in which the four Groups A, B, C and D are treated, each for one 24-h period with two Events (treat and exit), but with each treatment commencing at a different time. This diagram could represent the actual clock time corresponding to a Study with design as shown in Figure 4A, but which actually required three consecutive days to carry out. In this representation, the duration of the Study was three days. Under less controlled laboratory/environmental conditions, the clock time of events is of great importance, since environmental variables beyond the control of the investigator may be important to the interpretation of the data. Additionally, the time-of-day, time-of-year or phase-of-tide can be an important environmental variable. In this case the representation of the Study Timeline shown in Figure 4B is more appropriate.
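The distinction between Study time and clock time can be made concrete with a short Python sketch. The design (treat at 0 h, exit at 24 h) follows Figure 4A; the clock-time start of each Group is invented, chosen only so that the four Groups are staggered as in Figure 4B.

```python
from datetime import datetime, timedelta

# Study-time design from Figure 4A: treat at 0 h, exit at 24 h.
DESIGN = {"treat": timedelta(hours=0), "exit": timedelta(hours=24)}

# Invented clock-time starts for the four Groups; only the staggering matters.
GROUP_START = {
    "A": datetime(2005, 5, 11, 9, 0),
    "B": datetime(2005, 5, 11, 15, 0),
    "C": datetime(2005, 5, 12, 9, 0),
    "D": datetime(2005, 5, 12, 15, 0),
}


def clock_schedule(design, starts):
    """Place each Group's Study-time Events on the calendar."""
    return {
        group: {event: start + offset for event, offset in design.items()}
        for group, start in starts.items()
    }


def study_time_hours(clock, group_start):
    """Convert a clock time back to hours on the Study Timeline."""
    return (clock - group_start).total_seconds() / 3600


schedule = clock_schedule(DESIGN, GROUP_START)
all_times = [t for events in schedule.values() for t in events.values()]
study_duration = max(DESIGN.values()) - min(DESIGN.values())
clock_span = max(all_times) - min(all_times)
calendar_days = len({t.date() for t in all_times})
print(study_duration, clock_span, calendar_days)
```

Every Group sees the same 24-h Study Timeline, yet the Events fall on three calendar days, which is the difference between the two representations.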

In addition to Study time and clock time, the timing of events with respect to the Subject's experience is also important. For example, treating an animal during the day (hours with light) leads to a different response than the same treatment applied at night (Boorman, 2005). Thus the concept of time must also capture the Subject's experience, for example time of day or time of estrus. This is another example of relative time, but in this case it is relative to the Subject rather than relative to the Study. "Subject time" can be incorporated into Study time, by indicating an Event coinciding with the light to dark change or onset of estrus. The CEBS-DD permits the description of clock time of Events by permitting a date/time stamp to be associated with the Event, and also permits the description of time relative to the Study by permitting the user to describe events as "Study day 3," for example. The time relative to light/dark cycle can be captured using CEBS-DD terms for the number of hours of light that had occurred prior to the event, and the number of hours of light per 24 h. Thus the CEBS-DD is suited for both Studies performed in a rigidly controlled laboratory and Studies carried out in environments where recording the actual date and time of an Event is important.

The CEBS-DD defines five Event Types in a Study: Subject Treatment, Subject Disposition, Observations made on the Subject, Subject Care, and Specimen Preparation. Each Event Type has an associated Protocol Type, and minimal information needed for the particular Protocol. Furthermore, the minimal information for a Protocol depends on the Subject Type, since an Exit Event for a lab animal is very different from the Exit Event for a patient. Events and Observations made prior to the start of the Study constitute Subject History.

Each Event Type will necessarily have associated methods, termed Protocols. The CEBS-DD permits many conditions of the Protocol to be captured, and defines a few critical parameters that are dependent on the Subject Type and Stressor Type. These are shown in Table 2. For example, a critical parameter in a lab animal Study is the diet provided to the animals and whether the animal was fed ad libitum. This information is included in the animal Care Protocol. Additionally the user could provide the source of the animal feed, the cage size and type, and other factors. Currently these additional factors are components of the "optional" section of the animal Care Protocol in the CEBS-DD.


 
TABLE 2 Event Types and Examples of Required Information Associated with Each

 

 
TABLE 4 Required Treatment Parameters, Dependent on Subject Type and Stressor Type

 
The Subject Treatment is probably the most complex protocol, and is subdivided into Stressor Characteristics and Treatment Protocol in the CEBS-DD. Stressor Characteristics are properties true for the Stressor even when applied in different laboratories. In the case of a chemical Stressor, Stressor Characteristics include the structure, CAS number, various chemical descriptors, source, and other structural features defined by the DSSTox conventions (Richard and Williams, 2002). The Chemical Treatment Protocol includes method-specific details such as the vehicle used, the purity, the route and administration regimen, as well as the dose per administration. For a Genetic Stressor, Stressor Characteristics include the locus targeted and the construct used, while the Stressor Protocol includes the genotype of the recipient strain and any means needed to induce expression. Stressor Types and their key Characteristics are listed in Table 3. Suggested required fields for Treatment Protocols of different types are listed in Table 4.


 
TABLE 3 Stressor Types and Required Information Associated with the Stressor

 
Context for Identifying Minimal Necessary Descriptions
At a minimum, to interpret a toxicogenomics Study, one needs to know what was done to the Subjects, and how the Subjects responded. Of course, the more details provided, the clearer the picture that can be formed of the Study and the data, but a minimum description would cover the treatment applied to the Subject, a basic characterization of the Subjects, and a relevant phenotype recorded for each individual Subject. For a Chemical Stressor applied to Subject Type lab animal and causing liver toxicity, minimal information includes chemical name, route of application, dose and dose regimen, the species, strain and sex of animal used, and a measure of response such as liver pathology or serum levels of liver enzymes. For a Genetic Stressor applied to an in vitro cell line Subject Type the details would be different, for instance the locus impacted, the details of the genetic construct used, the growth medium used, the cell type and passage number, and a measure of effect such as perturbation of the cell cycle, impact on viability of the culture, or altered expression of a relevant gene or change in developmental profile. A Study can also contain multiple Stressor Types, such as a study of the effects of a particular chemical agent in two genetic backgrounds.

These two examples illustrate a key concept: The definition of "minimal" information needed for a Study will depend on the Study Subject Type, Stressor Type, and other important experimental details. The CEBS-DD contains seven classifiers useful in categorizing a Study (see Supplemental Materials). These terms allow the user to rapidly classify the Study on deposition, thereby permitting CEBS to collect the corresponding minimal information and also to associate other relevant concepts with the Study Design Description. These classifiers also support rapid computational access to the Study when users query CEBS and the formation of experimental design for analysis of the data.
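The dependence of "minimal" information on Subject Type and Stressor Type can be sketched as a lookup keyed by the two classifiers. In the Python sketch below, the two field lists paraphrase the chemical/lab animal and genetic/cell line examples given above; the field names are hypothetical labels rather than CEBS-DD terms, and the sample values are invented.

```python
# Minimal fields per (Stressor Type, Subject Type), paraphrased from the two
# examples in the text. Field names are hypothetical, not CEBS-DD terms.
MINIMAL_INFO = {
    ("chemical", "lab animal"): [
        "chemical_name", "route", "dose", "dose_regimen",
        "species", "strain", "sex", "response_measure",
    ],
    ("genetic", "in vitro cell line"): [
        "locus", "construct", "growth_medium",
        "cell_type", "passage_number", "effect_measure",
    ],
}


def missing_minimal_info(stressor_type, subject_type, description):
    """List required fields that are absent or empty in a Study description."""
    key = (stressor_type, subject_type)
    if key not in MINIMAL_INFO:
        raise KeyError(f"no minimal-information list defined for {key}")
    return [f for f in MINIMAL_INFO[key] if description.get(f) in (None, "")]


# Invented partial description of a chemical Study in lab animals.
description = {
    "chemical_name": "acetaminophen",
    "route": "oral gavage",
    "dose": 150,
    "species": "rat",
}
print(missing_minimal_info("chemical", "lab animal", description))
```

A deposition interface could use such a check to prompt for exactly the fields that the chosen classifiers make mandatory.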

A prototype Study entry user interface has been developed and is undergoing testing within the NCT. Figure 5 shows the prototype Study entry page, listing the minimal information proposed to describe and classify a Study. Also given in Figure 5 are prototype Group and Timeline representations. The Event/Protocol terms in the Timeline (euthanasia, animal husbandry, and in-life observations) are specific for the Subject Type (lab animals). This information is managed by CEBS, using the terms in the CEBS-DD. Terms for each classification type are represented in the CEBS-DD although the level of detail is most highly developed for a Study of an acute toxicity Discipline Type, in a controlled lab environment, with lab animals as Subjects, using a chemical Stressor and a parallel Study Design Type.



 
FIG. 5. Examples of the prototype Study entry user interface developed for Research CEBS within the NCT. The prototype Study entry page, Group capture and Study Timeline are shown. The prototype GUI captures the minimal information needed to describe a toxicogenomics Study. The depositor is asked to characterize the Study using the classifiers listed in Table 5, then to define the Study Groups, corresponding Subjects, and relationship to comparator Group(s). Generally two main study factors are used to define the Groups. After Groups are defined, the depositor indicates when particular Events occurred on the Study Timeline, and associates Groups and Protocols with each Event. Once the Study is fully defined, the depositor can associate data with Subjects and data generating events (Observations and Specimen Preparation).

 

 
TABLE 5 Study Design Definition Details of Two Acetaminophen Studies, Given in Terms Defined by the CEBS-DD

 
A local relational database has been developed by the NCT to capture Study information using this interface, and to describe the relationships between the concepts and terms in the CEBS-DD. This database is based on the schema of the ArrayTrack database (Tong et al., 2003, 2004) developed at the National Center for Toxicological Research (NCTR), but has been extended to capture the Study Design Description terms used by the CEBS-DD as well as toxicology data. Both the database schema and the CEBS-DD are available at www.niehs.nih.gov/cebs-df/. CEBS and ArrayTrack will become interoperable in future development, both incorporating key features of this schema.

Organizing Data Derived from Subject and Specimens
Often the experimenter makes observations of the Subjects during the Study Timeline. These could include measures of a lab animal's weight and/or food consumption, of a patient's temperature and blood pressure, or of a culture's growth rate and average viability. The CEBS-DD terms such measures "Observations" to distinguish them from data derived from an Assay of biological Specimens collected during a Study. Assay data are derived from tests performed on a Specimen independently of Study time, but are linked in CEBS to the Event within the Study when the Specimen was prepared.

Because Specimens are linked to a Subject and anchored to an Event within the Study Timeline, Specimens collected sequentially from a given Subject, such as blood draws over a period of time, can be easily identified within a database such as CEBS. Organizing the data in this manner makes it straightforward to identify the biological responses exhibited by Subjects over time, or to identify non-responder Subjects or other individual behaviors that would indicate that a given Subject might not be behaving as a biological replicate to others in their Group. Furthermore, by identifying the appropriate comparator Group when the Study is deposited, the database or an application can compute "change" relative to the response seen in the comparator Group, thereby providing additional biological context for interpreting the responses.
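Identifying Specimens collected sequentially from one Subject follows directly from the Subject and Event links. The Python sketch below assumes each Specimen is a dictionary with `specimen_id`, `subject_id`, and `event_id` keys, and that each Event has a known Study time in hours; both layouts are hypothetical.

```python
from collections import defaultdict


def specimens_by_subject(specimens, event_hours):
    """Group Specimen IDs by Subject, ordered by the Study time of their Event."""
    grouped = defaultdict(list)
    for specimen in specimens:
        grouped[specimen["subject_id"]].append(specimen)
    return {
        subject: [
            s["specimen_id"]
            for s in sorted(items, key=lambda s: event_hours[s["event_id"]])
        ]
        for subject, items in grouped.items()
    }


# Invented example: two blood draws from Subject A1, one from Subject A2.
specimens = [
    {"specimen_id": "S3", "subject_id": "A1", "event_id": "E3"},
    {"specimen_id": "S1", "subject_id": "A1", "event_id": "E1"},
    {"specimen_id": "S2", "subject_id": "A2", "event_id": "E1"},
]
event_hours = {"E1": 0, "E3": 24}
series = specimens_by_subject(specimens, event_hours)
print(series)
```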

The range of potential tests continues to evolve with technological advancements, making it impossible to enumerate all the possible Observations and Assays. However, the CEBS-DD provides a structure for deposition of these data. Each Observation will be performed on a Subject at a particular Event in the Study Timeline, using a given Protocol (either a Standard Protocol, already residing within CEBS, or a newly defined Protocol entered by the depositor). Thus Observation data can be entered into CEBS in a text file with headers: Observation Name, Observation Value, Observation Units, Subject ID, Event ID, Date/time, Protocol (for Observations that do not use a Standard Protocol). For Observations made on a Group (such as the weight of all animals within a cage) the Group ID is used in place of the Subject ID.
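As an illustration of the deposition format described above, the following minimal sketch reads an Observation text file using the headers listed in the text. The tab delimiter, the function name `read_observations`, and the sample values are assumptions made for this example; they are not taken from CEBS itself.

```python
import csv
import io

# Headers named in the text. "Subject ID" may be replaced by "Group ID"
# for Observations made on a Group. The tab delimiter is an assumption.
COMMON_HEADERS = [
    "Observation Name", "Observation Value", "Observation Units",
    "Event ID", "Date/time", "Protocol",
]

def read_observations(handle):
    """Parse a tab-delimited Observation file into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    missing = [h for h in COMMON_HEADERS if h not in fields]
    if "Subject ID" not in fields and "Group ID" not in fields:
        missing.append("Subject ID or Group ID")
    if missing:
        raise ValueError(f"missing headers: {missing}")
    rows = []
    for row in reader:
        # Store the measured value as a number; other fields stay as text.
        row["Observation Value"] = float(row["Observation Value"])
        rows.append(row)
    return rows

# Hypothetical file content (the Protocol column is left empty here,
# as it would be for an Observation that uses a Standard Protocol).
header = ["Observation Name", "Observation Value", "Observation Units",
          "Subject ID", "Event ID", "Date/time", "Protocol"]
sample = (
    "\t".join(header) + "\n"
    "Body weight\t212.5\tg\tSUBJ-001\tEVT-01\t2004-01-15T09:00\t\n"
    "Body weight\t208.0\tg\tSUBJ-002\tEVT-01\t2004-01-15T09:05\t\n"
)
observations = read_observations(io.StringIO(sample))
```

Because each row carries both a Subject ID and an Event ID, the parsed records can be joined directly to the Study Timeline.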

The case of Assay data is similar to that for Observation data. Assay data can be associated with the Event in the Study at which the Specimen used in the Assay was prepared. Examples of Assays would be a microarray experiment, a histopathology examination, a clinical pathology panel, an ELISA (Enzyme-Linked ImmunoSorbent Assay) or in situ hybridization. Each Assay will have an associated Protocol, and possibly its own Design depending on the Assay Protocol (e.g., the standard used for comparison or how the data were derived), and capture of this information is facilitated by the CEBS-DD. Models from their respective fields cover the Assay Protocol and Design; for instance, the MAGE-OM covers microarray experiments, and MISFISHIE (Deutsch, 2004) is the standard proposed for in situ hybridizations. Work is underway to convert the Path Code tables used at the NTP to describe histopathology findings into a mouse pathology ontology, and the NCT (National Center for Toxicogenomics) is currently working with KEVRIC Laboratories to construct a prototype toxicology ontology. The Assay data themselves are exchanged via data files similar in format to those used for Observations, with Specimen ID used in place of Subject ID.

Sample Quality Documentation: Data Documentation
CEBS will provide Data Documentation, a means for users to make judgments about the technical quality of data within CEBS. At one end of the spectrum, it will be nearly impossible to define a universally accepted "quality metric" for rapidly evolving technologies such as microarray and proteomics studies until the field converges on a standard set of platforms and methods. Studies covered by 21 CFR, Part 11 fall at the opposite end of the spectrum—many of the validation details captured to ensure accuracy of the measuring instruments are not needed by users of a knowledgebase such as CEBS. Thus, CEBS takes a middle path in defining Data Documentation. The Data Documentation in CEBS will mirror the data standards used in laboratories to assess the quality of samples during manipulations. In the case of RNA isolation these quality assurance measurements include the starting sample integrity measured by gel electrophoresis or BioAnalyzer trace, the median length after fragmentation, and the efficiency of the label incorporation. This information is captured by TSP and other LIMS systems, and linking to such a system would permit CEBS to capture this information electronically.

In the case of pathology data, quality assurance measures might include the number of pathologists involved in reporting the finding or the lexicon used for making the finding, the availability of micrographs (Irwin et al., 2002), or the availability of historic data from the clinical pathology laboratory for comparison with the data from the current experiment. These data are often available in publications, either as text comments or included implicitly in the publication figures, and are often used by the scientist reading the paper to assign weight to the interpretations. CEBS will permit the depositor to include ancillary information that may speak to the technical quality of the data deposited in CEBS, and Data Documentation fields are therefore part of the CEBS-DD and will be available to the CEBS user.


    CASE EXAMPLE: USE OF THE CEBS-DD TO INTEGRATE ‘OMICS DATA AND CLINICAL CHEMISTRY DATA IN THE CONTEXT OF BIOLOGY
MATERIALS AND METHODS
Study Designs
Fischer 344/N male rats were dosed by oral gavage with acetaminophen at the start of the study, and then permitted to recover for 6, 24, or 48 h. Rats received a total dose of either 150 or 1500 mg/kg body weight, in an aqueous vehicle. Control rats were included for each dose and recovery time, and received the vehicle only. Minor differences between the two studies included the vehicle (0.25% ethylcellulose or 0.25% carboxymethyl cellulose), the volume used to dose (one dose of 10 ml/kg or two doses of 15 ml/kg), and whether food was withheld 16 h prior to dosing. Animals were euthanized with carbon dioxide from a regulated source. Liver sections were taken, flash frozen and stored at –80°C until prepared for transcriptomics or proteomics analysis. All rats received the highest standard of humane care in accordance with protocols approved by the NIEHS committee on animal care and the NIH Guide for the Care and Use of Laboratory Animals.

Analysis of protein expression was performed using 2D gel separation. Liver samples from each rat were processed individually. Samples were thawed, homogenized in 9 M urea buffer containing ampholytes and dithiothreitol, centrifuged to remove debris and subjected to isoelectric focusing using a pH 3 to 10 gradient. Proteins were then separated by mass using SDS-PAGE in 6–18% acrylamide, fixed and stained with Sypro Ruby (Molecular Probes). Gels were scanned, the background was adjusted and streaky regions were omitted. Each gel was then assigned to a match set in which a representative control gel was selected as the standard image, and spot intensities were quantified by densitometry.

Differential gene expression analysis was performed using total RNA isolated using RNeasy kits (QIAGEN, Valencia, CA). Equal amounts of RNA from each vehicle-only control animal were pooled for control gene expression at each dose and time period, and compared with individual rats by hybridization to printed cDNA rat genome arrays as described by Heinloth et al. (2004). The samples were hybridized in duplicate with fluor reversal for each individual rat.

Data Preprocessing
Data from two-color cDNA arrays.
Intensity values below 300 intensity units were edited to a threshold value of 300 to stabilize the computed ratios. Ratios were converted to log2 prior to integration. Transcripts with missing signals in more than two-thirds of the arrays were excluded. A total of 6500 genes remained.
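The preprocessing of the two-color data can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes that the floor of 300 is applied to both channels before the ratio is taken, and that a missing signal is represented as `None`. The function names are hypothetical.

```python
import math

FLOOR = 300.0  # intensity floor stated in the text

def log2_ratio(treated, control, floor=FLOOR):
    """Floor both channel intensities, then return log2(treated / control)."""
    return math.log2(max(treated, floor) / max(control, floor))

def keep_transcript(signals):
    """Keep a transcript unless more than two-thirds of its signals are missing.

    Integer arithmetic avoids rounding problems at exactly two-thirds.
    """
    missing = sum(1 for s in signals if s is None)
    return 3 * missing <= 2 * len(signals)

# Hypothetical intensities: the low control value is raised to the floor.
example_ratio = log2_ratio(1200.0, 150.0)        # log2(1200 / 300)
kept = keep_transcript([0.4, None, None])        # exactly two-thirds missing
dropped = not keep_transcript([None, None, None, 1.1])
```

Flooring low intensities keeps a very small denominator from producing an arbitrarily large ratio, which is the stabilization referred to above.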

Proteomics data.
Individual intensity values were obtained for 2D gel-separated proteins from individual animals. Spots were indexed by spot numbers from a master list for the experiment, but were otherwise not identified. Protein spots observed in fewer than one third of the gel images were excluded from the analysis. Values below 1000 and spots not present in the treated sample were replaced with a threshold value of 1000 prior to computing the ratio to the intensity seen in untreated animals. Ratios were converted to log10 after integration. A total of 1832 protein spots from liver homogenate and 838 protein spots from serum were used in this analysis.

Clinical chemistry data.
Levels of liver enzymes ALT (alanine aminotransferase) and AST (aspartate aminotransferase) in serum were used to assess the toxic response of each treated animal. The ratio for each treated animal to the average value for the control animals at the same time point was computed, and converted to log10 after integration.
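The ratio calculation for the clinical chemistry data can be illustrated with a short sketch. The ALT values, animal identifiers and function name below are hypothetical. For brevity the log10 conversion is applied immediately, whereas the text states that it was applied after integration.

```python
import math
from statistics import mean

def log10_ratio_to_controls(treated_value, control_values):
    """Return log10 of a treated animal's value over the control-group mean."""
    return math.log10(treated_value / mean(control_values))

# Hypothetical serum ALT values (arbitrary units) at a single time point.
control_alt = [40.0, 50.0, 60.0]
treated_alt = {"RAT-101": 500.0, "RAT-102": 50.0}

alt_log_ratios = {
    rat: log10_ratio_to_controls(value, control_alt)
    for rat, value in treated_alt.items()
}
```

In this made-up example, a tenfold elevation over the control mean gives a log10 ratio of 1, and an animal whose ALT equals the control mean gives 0.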

Data Integration and Clustering
The animals were grouped by dose-time or by phenotypic response, as diagrammed in Figures 8A and 8B. Animals were aligned by dose and time (see Fig. 8A) by grouping animals treated with equivalent doses and recovery times together to create the nine groups indicated by circles in Figure 8A. These "DT" dose-time groups are 6-h-0-dose through 48-h-1500 mg/kg dose. Animals were also aligned by phenotype, in this case by ALT and AST values. The scatter plot of ALT vs. AST in Figure 8B indicates the grouping of the animals by phenotype, to create nine "phenotype groups" or PG. The aim was to include at least one animal from each study in each group, and to have approximately equal numbers of animals in each group overall.



FIG. 8. Diagram of two possible ways to group Subjects prior to integrating data from different studies. Subjects can be grouped by study factors such as dose and time (A) or grouped by phenotype (B). In this figure, the IDs of the Subjects from both the Transcriptomics (Heinloth et al., 2004) and Proteomics (Merrick, manuscript in preparation; Wetmore, manuscript in preparation) Studies are listed, and the individual animals are also diagrammed graphically to illustrate the grouping mechanism. In the diagrams, square symbols represent animals from the Proteomics Study and circles represent animals from the Transcriptomics Study. Data within each group were combined to form a "virtual rat" corresponding to the group.

 
Transcript data, proteomics data, and clinical chemistry data from the animals in each group were averaged by microarray feature, protein spot, or enzyme measure. The resulting data columns were concatenated to produce a composite profile containing transcriptomics (liver), proteomics (liver and serum), and clinical chemistry data. A merged matrix was created, containing 15 columns (9 PG columns and 6 columns for DT groups) and 9174 rows (a liver transcriptomics response of 6500 elements, a proteomics response of 1832 elements from liver homogenate proteins, a serum proteomics response of 838 elements, and four serum enzyme measures).
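The averaging and merging step can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the procedure used for the published analysis (which was carried out manually): profiles are held as dictionaries keyed by feature name, each feature is averaged over whichever animals in the group carry it, and all identifiers and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def virtual_profiles(animal_profiles, group_of):
    """Average each feature across the animals assigned to each group.

    Animals from different studies carry different features (transcripts
    or protein spots); a feature is averaged over the animals that have it,
    so the group profile merges data types from both studies.
    """
    by_group = defaultdict(list)
    for animal_id, profile in animal_profiles.items():
        by_group[group_of[animal_id]].append(profile)
    virtual = {}
    for group, profiles in by_group.items():
        features = sorted({f for p in profiles for f in p})
        virtual[group] = {
            f: mean(p[f] for p in profiles if f in p) for f in features
        }
    return virtual

# Hypothetical log ratios: T* animals come from a transcriptomics study,
# P1 from a proteomics study; all share a clinical chemistry measure.
animal_profiles = {
    "T1": {"gene:A": 1.0, "ALT": 0.5},
    "T2": {"gene:A": 2.0, "ALT": 1.0},
    "P1": {"spot:17": 0.5, "ALT": 1.5},
}
group_of = {"T1": "PG-1", "T2": "PG-1", "P1": "PG-1"}
virtual = virtual_profiles(animal_profiles, group_of)
```

Each resulting dictionary corresponds to one column of the merged matrix, that is, one "virtual rat".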

As this is a prototyping exercise for features not yet implemented within CEBS, the matrix was imported into Spotfire Decision Site for Genomics, where profiles were filtered by first eliminating those well-correlated with a straight line across all treatment conditions, then eliminating those which did not surpass 0.5 log units or fall below –0.5 log units. Following these computational filters, the remaining profiles were examined visually, eliminating those that did not pass "biology-like" filters. Such "biology" filters removed flat profiles with a spike at a single animal (e.g., lacking concordant responses among biological replicates), jagged profiles alternating between increased and decreased over multiple points, and profiles judged to be too close to the zero line to be significant based only on microarray data. This process left profiles consistent with measurable, biologically relevant responses.
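The magnitude filter described above can be expressed in a few lines. This sketch covers only the ±0.5 log unit criterion; the straight-line correlation filter and the visual "biology" filters are not modeled. The function name and example profiles are hypothetical.

```python
def passes_magnitude_filter(profile, threshold=0.5):
    """True if any value rises above +threshold or falls below -threshold."""
    return any(abs(value) > threshold for value in profile)

# Hypothetical profiles across virtual rats (log units).
profiles = {
    "gene:A": [0.1, -0.2, 0.3],    # never leaves the +/-0.5 band
    "spot:17": [0.1, -0.7, 0.0],   # falls below -0.5 once
}
retained = [name for name, values in profiles.items()
            if passes_magnitude_filter(values)]
```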


    RESULTS
One main objective of the CEBS-DD is to permit Study data to be integrated within CEBS in order to gain increased biological understanding and share this with users of CEBS. It is likely that studies in different laboratories will be carried out under different conditions, using different Study Designs, and yet still produce data that can be compared in a wider analysis. In order to test the capability of the CEBS-DD to represent studies in a form suited to integration, a case study of integration approaches applied to two studies on the effects of acetaminophen in Fischer 344 rats (transcriptomics, Heinloth et al., 2004, and proteomics, Merrick, manuscript in preparation; Wetmore, manuscript in preparation) is provided here. Data from both Studies are housed in CEBS and will be made available to the public. The Study Design of the two studies is similar: groups of rats were treated with vehicle, or with acetaminophen at 150 mg/kg or 1500 mg/kg. Groups of each treatment type were euthanized at 6, 24, or 48 h following the acetaminophen exposure. The Study Timeline in Figure 5 represented both of these Studies. Some additional details of the two studies, given in terms defined by the CEBS-DD, are provided in Table 5.

Three Specimens were prepared from each Subject during necropsy. Information about the Specimens from the 1500 mg/kg–48 h groups in each of the two studies (transcriptomics and proteomics) is shown in Figure 6, to illustrate how Specimens are associated with Subjects. The data derived from Assays performed on these Specimens (‘omics, histopathology findings and clinical chemistry) are annotated using the Study Design and the CEBS-DD. All data derived from each Subject can be associated through this annotation, which permits ‘omics data to be associated within CEBS with the dose-time exposure and with the phenotypic response (clinical chemistry and histopathology) of the Subject.



FIG. 6. Relationship between data from two Acetaminophen Studies and the Study Timeline, based on the CEBS-DD. The timeline of these two Studies shows treatment (acetaminophen or vehicle, administered by gavage) at time = 0, and the Euthanasia and Specimen Preparation Events at Time = 6, 24, and 48. Study animal husbandry and in-life observations also began at time = 0, as indicated. The Events in the two Studies are identical.

 
The Study Timeline of these two Studies is represented in Figure 7. In each case, a Treatment Event (vehicle or acetaminophen at one of two doses, administered by gavage) occurred at time = 0, and Exit (Euthanasia) and Specimen Preparation Events occurred at Time = 6, 24, and 48. Subject Care and in-life Observations also began at time = 0, as indicated in Figure 7. Subject IDs are aligned with their Exit Event in Figure 7. The Events in the two Studies are identical, but the Groups used and the Protocols applied are slightly different, as indicated in Figure 7 (Groups) and Table 5 (Protocols).



FIG. 7. Link between Subjects and Specimen IDs, organized by the CEBS-DD and the prototype database. The microarray, proteomics and clinical chemistry data shown in Figures 9 and 10 are derived from these Specimens.

 


FIG. 9. Heatmap showing the fold-change of individual genes/enzymes in each of the virtual rats derived from Figure 8. Green indicates a decrease in expression relative to the comparator sample of one half log unit, and red is a corresponding increase. Proteomics samples and clinical chemistry results are log base 10 and transcriptomics samples are log base 2. The dendrogram at the top indicates the degree of similarity of particular virtual rats, and the dendrogram at the side indicates the similarity of individual genes/enzymes.

 


FIG. 10. Heatmap from Figure 9, subdivided into the four types of gene or enzyme signal. Gene expression levels, measured by transcriptomics, are separated from gene expression measured in liver by proteomics, or in serum by proteomics. Serum enzyme levels are also shown.

 
The most straightforward method to align microarray and proteomics data would be to obtain the data from Specimens from the same animals. However, since the individual rats used in the two Studies in the example were different, it is impossible to make a direct animal-by-animal alignment of the microarray and proteomics data. The two different studies were selected for the CEBS-DD case study for this reason, as they exemplify data from different depositors. Thus before the microarray and proteomics data can be integrated, the rats must be grouped, the corresponding data aggregated within the group, and merged into a "virtual rat" (alternatively termed an "in silico rat") representing the rats in the group. The rats can be grouped based on dose and time (see Fig. 8A; these groups are identified as "DT" in later figures) or based on their phenotypic response group (see Fig. 8B; termed "PG" in later figures). The level of liver enzymes ALT (alanine aminotransferase) and AST (aspartate aminotransferase) in the serum Specimens was used to estimate the phenotypic response of each rat. Figure 8B gives a plot of ALT and AST for individual treated rats, with solid circles indicating PG groups of animals with similar phenotype and the dotted circle around rats treated with acetaminophen but with levels of ALT and AST virtually unchanged. These were divided into four PG groups to maintain approximately the same number of animals per group.

The microarray data in this example were from two-color arrays, thus the transcriptomics data are in the form of a ratio of expression in a treated animal to expression in a pool of control animals. Because the microarray data were in the form of ratios, the other data were also pre-processed and converted to ratios as described in the Methods. This produced six DT groups (150 mg/kg, sampled at 6, 24, and 48 h, and 1500 mg/kg, sampled at 6, 24, and 48 h), and the nine PG groups identified in Figure 8. The virtual rat profiles are composed of data from four sources: 6500 liver transcript measures, 1832 protein spots from liver homogenate, 838 protein spots from serum, and four measures of liver enzymes ALT and AST. The integrated data from the 15 virtual rats were clustered using unsupervised bi-directional clustering using Pearson correlation as a metric. The resulting heat map is shown in Figure 9. In this figure, green indicates elements with decreased intensity relative to control, and red indicates profiles with increased intensity relative to control. The dendrograms at the top and side indicate the similarity between clusters of correlated virtual rats and data elements, respectively.
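The similarity metric used for clustering can be illustrated with a short sketch. Pearson correlation is named in the text; converting it to a distance as 1 − r, as done here, is a common convention and is an assumption, as the linkage method and distance transformation used in the analysis are not stated. The clustering itself is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Distance in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(x, y)

# Hypothetical profiles across three virtual rats.
same_shape = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because Pearson correlation is insensitive to scale, profiles expressed in log2 and log10 units can be compared by shape, which is relevant when transcript and protein data are clustered together.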

The dendrogram of virtual rats (at the top of Fig. 9) has three branches, the rightmost of which is enclosed in a yellow outline. The profiles in this cluster come from the virtual rats DT-H24 and H48, and PG-1, 2, 3, and 4. These correspond to the virtual rats with elevated serum enzyme levels in Figure 8. The expression of many of the data elements appears to be more changed in these virtual rats compared to the others, as indicated by the more intense red and green colors in their profiles.

The dendrogram to the left of Figure 9, describing the correlation of different data elements, reveals a number of relatively equal-sized clusters. The clustering is driven by the expression in the six virtual rats outlined in Figure 9 (and named above, those with elevated ALT and AST). In order to understand how the profiles of transcripts, liver proteins, and serum proteins were behaving in this analysis, the heat map shown in Figure 9 was split into four sub-plots in Figure 10, segregated by data type. Each sub-plot in Figure 10 contains data from one source: liver transcriptomics, liver proteomics, serum proteomics, or serum enzymes. There are only four serum enzymes, and these are highly correlated, so these data appear as a single line in the serum enzyme sub-plot. This view of the data permits the investigator to identify which cluster in the left-hand dendrogram contains the serum enzymes, and the genes and proteins with expression levels highly correlated with the levels of serum enzymes. It is clear that the data elements are interdigitated in the heat map; thus the integrated data give a new, biologically richer picture than can be derived from data from individual domains alone.


    DISCUSSION
The CEBS-DD terms and controlled vocabularies permit fields of interest to be defined in CEBS, for instance dose and time, or toxicological endpoint such as ALT and AST, and associated with an individual Study Subject. Once captured, this information can be used computationally to associate like animals to form a virtual subject for further analysis, such as biomarker discovery or investigation of the mechanism of toxicity. This method can be applied to any kind of numeric or categorical data derived from individual animals, and used to synthesize an integrated, "virtual animal" that can be subjected to supervised or unsupervised data mining to explore the biological response of interest. For example, the profiles seen in the heat map can be used to identify potential protein or RNA biomarkers associated with acetaminophen toxicity (Waters, 2004). The integration exercise described here was performed manually, based on the information captured using terms in the CEBS-DD. The next steps in CEBS development will be to explore computational methods to permit the user to carry out such an analysis within the CEBS Knowledgebase.

The CEBS-DD is an evolving compendium of terms, definitions, relationships, and controlled vocabularies used to describe a toxicogenomics Study and the associated Phenotype data (clinical pathology and histopathology findings). The CEBS-DD is aligned with developing standards for the exchange of toxicity data, and will be used to develop parsers and user interfaces for data from different sources, using current standards for data exchange, to support the facile electronic transfer of data to CEBS. The CEBS-DD incorporates the language and terms in use in the toxicology literature and in data exchange initiatives. Thus the CEBS-DD uses terms with established meaning and relationships as well as controlled vocabularies taken from lexicons presently in use to record the Study Design Description. This is anticipated to facilitate the description of a variety of disparate Studies accurately and intuitively. Additionally, the terms and definitions within the CEBS-DD were selected to have meaning beyond acute toxicity Studies, thereby permitting easy extension of the CEBS-DD to other disciplines. This is anticipated to permit CEBS the flexibility to accept data obtained in a variety of Studies, correctly interpret the key features of the Study Design, and use this to provide the CEBS user an accurate view of the Study Timeline, Events, Subject Groups, experimental factors and Protocols.

There are several public repositories for microarray data, but none that capture the full Study Timeline, toxicity data and data from highly parallel transcriptomics and proteomics. CEBS is such a public toxicogenomics repository. Because of this role, the CEBS-DD was developed, first with the aim of facilitating the accurate capture of data from a variety of sources, and second, of providing the foundation for a standard set of terms, relationships, definitions and controlled vocabularies that may facilitate other institutions in developing their own data repositories. The exchange of data is greatly facilitated if a common set of terms is used by the community. The NCT continues to track evolving public data exchange initiatives to ensure that the CEBS-DD accurately reflects developing community standards.

The CEBS Knowledgebase aims to permit the user to integrate data from different studies. Once in CEBS, the Study Design Description, Toxicity Phenotype Data and data from highly parallel assays such as transcriptomics or proteomics can be brought together within CEBS, and also provided for download using standard formats for use in other applications. An example of the use of terms and relationships in the CEBS-DD to integrate data from a proteomics Study and a transcriptomics Study of acetaminophen responses in rats was provided in order to illustrate the utility of this approach. The CEBS-DD terms and controlled vocabularies permit fields of interest to be defined in CEBS, for instance experimental factors such as dose and time, protocol components such as dose range and route of administration, or feed or method of disposition. Resulting toxicological measures, histopathological findings and ‘omics data can then be computationally associated with the appropriate descriptors, and the resulting data subjected to supervised or unsupervised pattern finding. Additionally, following the example used here, data from similar but not identical animals can be combined to form a virtual subject for analysis. This approach can be used within CEBS to identify associations of molecular responses with diet, disposition methods, or with toxicological responses of different severity, occurring over time, exposure level and stressor.

The NCT extends to Journal Editors an invitation to use CEBS to manage ‘omics data published in their journals. Increasingly there is a need for full documentation of ‘omics experimental data sets which, due to their large size, are unsuited for inclusion in printed form. The tools contained in CEBS would allow independent evaluation of data contained in an ‘omics study, in context of toxicology, and we feel such a database would aid in advancing the field of toxicogenomics.


    SUPPLEMENTARY DATA
Supplementary data are available online at www.toxsci.oupjournals.org.


    NOTES
 
The information in this document has been funded in part by the National Institute of Environmental Health Sciences, the National Center for Toxicogenomics, and the U.S. Environmental Protection Agency. It has been reviewed by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.


    ACKNOWLEDGMENTS
 
As part of the process of keeping in touch with developing standards, the authors participate in a number of discussion forums: NCT Path/Tox Working Group, SEND Consortium and the SEND Controlled Vocabulary Working Group, MGED RSBI, HL7/CDISC/I3C Pharmacogenomics Tracks 1 and 3, HESI Genomics Committee, and IEEE Bioinformatics Standards. Additionally many fruitful discussions have been held with members of the Science Applications International Corp. (SAIC) CEBS Development Team. We gratefully acknowledge the contributions these discussion participants have made to the development of the CEBS-DD.


    REFERENCES
Bao, W., Schmid, J. E., Goetz, A. K., Ren, H., and Dix, D. J. (2005). A database for tracking toxicogenomics samples and procedures. Reprod. Toxicol. 19, 411–419.

Benz, R. D., Arvidson, K., Cheeseman, M., Fostel, J., Hollingshaus, G., Johnson, D., Johnson, W., Kemper, R., Lee, P., Mathews, E., et al. (2005). ToxML, An Endpoint Specific Database Ontology For Linking Toxicology To Chemistry. Submitted.

Boorman, G., Irwin, R. D., Vallant, M. K., Gerken, D. K., Lobenhofer, E. K., Hejtmancik, M. R., Hurban, P., Brys, A. M., Travlos, G. S., Parker, J. S., and Portier, C. J. (2005). Variation in the hepatic gene expression in individual male Fischer rats. Toxicol. Pathol. 33, 102–110.

Deutsch, E. (2004). MISFISHIE... Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). scgap.systemsbiology.net/standards/misfishie/

Heinloth, A. N., Boorman, G. A., Nettesheim, P., Fannin, R. D., Sieber, S. O., Snell, M. L., Tucker, C. J., Li, L., Travlos, G. S., Vansant, G., et al. (2004). Gene expression profiling of rat livers reveals indicators of potential adverse effects. Toxicol. Sci. 80, 193–202.

Irwin, R., Boorman, G. A., Waters, M. D., Hardisty, J. F., and Sills, R. C. (2002). Quality review procedures necessary for rodent pathology databases and toxicogenomic studies: The National Toxicology Program experience. Toxicol. Pathol. 30, 88–92.

Richard, A., and Williams, C. R. (2002). Distributed structure-searchable toxicity (DSSTox) public database network: A proposal. Mutat. Res. 499, 27–52.

Tong, W., Harris, S., Sun, H., Fang, H., Fuscoe, J., Harris, A., Hong, H., Xie, Q., Perkins, R., Shi, L., et al. (2003). ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. Toxicogenomics 111, 1819–1826.

Tong, W., Harris, S., Cao, X., Fang, H., Shi, L., Sun, H., Fuscoe, J., Harris, A., Hong, H., and Xie, Q. (2004). Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res./Fundam. Mol. Mechanisms Mutagen. 549, 241–253.

Waters, M., Boorman, G., Bushel, P., Cunningham, M., Irwin, R., Merrick, A., Olden, K., Paules, R., Selkirk, J., Stasiewicz, S., Weis, B., Van Houten, B., Walker, N., and Tennant, R. (2003). Systems toxicology and the Chemical Effects in Biological Systems (CEBS) knowledge base. Environ. Health Perspect. Toxicogenomics 111, 15–28.

Waters, M. D. (2004). The CEBS Knowledge Base: The Integration of Molecular Profiling, Toxicology and Pathology Datasets for Knowledge Discovery. Presented at the 43rd Annual Meeting of the Society of Toxicology, Baltimore, MD, March 21–25, 2004.

S. Consortium Standard for Exchange of Non-clinical Data. (Abstract)