Neurobehavioral Assessment: A Survey of Use and Value in Safety Assessment Studies

Lawrence D. Middaugh*,1, Diana Dow-Edwards†, Abby A. Li‡, J. David Sandler§, Jennifer Seed, Larry P. Sheets||, Dana L. Shuey|||, William Slikker, Jr.||||, Walter P. Weisenburger#, L. David Wise** and Murray R. Selwyn††,2

* Department of Psychiatry and Behavioral Science, Medical University of South Carolina, Charleston, South Carolina 29425; † Department of Physiology and Pharmacology, SUNY-Brooklyn, Brooklyn, New York 11203; ‡ Exponent, Inc., San Francisco, California 94114; § International Life Sciences Institute, Health and Environmental Sciences Institute, Washington, District of Columbia 20005; Environmental Protection Agency, Washington, District of Columbia 20460; || Toxicology Department, Bayer CropScience, Stilwell, Kansas 66085; ||| Endo Pharmaceuticals, Inc., Chadds Ford, Pennsylvania 19317; |||| Division of Neurotoxicology, National Center for Toxicological Research, Jefferson, Arkansas 72079; # Central Research Division, Drug Safety Evaluation, Pfizer, Inc., Groton, Connecticut 06340; ** Merck Research Laboratories, West Point, Pennsylvania 19486; and †† PAREXEL International, Durham, North Carolina 27713

Received March 28, 2003; accepted July 28, 2003


    ABSTRACT
 
This report describes the results of a survey designed to evaluate the contribution of F1 neurobehavioral testing to hazard identification and characterization in safety assessment studies. (To review the details of the distributed survey, please see the supplementary data for this article on the journal’s Web site.) The survey provided information about studies completed in industrial laboratories in the United States, Europe, and Japan since 1990 on 174 compounds. The compounds included were pharmaceutical (81%), agricultural (7%), industrial (1%), or undefined (10%). Information collected included the intended use of the test agent, general study design and methodology, the types and characteristics of F1 behavioral evaluations, and the frequency with which agents affected neurobehavioral parameters in comparison to other F0 and F1 generation parameters. F1 general toxicology parameters such as mortality, pre- and postweaning body weight, and food intake were assessed in most studies and were affected more frequently than other parameters by the test agents. F1 behavioral parameters were assessed less consistently across studies, and were less frequently affected by the agents tested. Although affected by agents less often than general toxicology parameters, F1 behavioral parameters along with other parameters defined the no-observed-effect level (NOEL) in 17/113 (15%) of studies and solely defined the NOEL in 3/113 (2.6%) of studies. Thus, F1 behavioral parameters sometimes improved on the standard toxicological measures of hazard identification. While not detecting agent effects as readily as some measures, the F1 behavioral parameters provide information about agent effects on specialized functions of developing offspring not provided by other standard measures of toxicity. The survey results emphasize the need for further research into the methods of behavioral assessment as well as the mechanisms underlying the neurobehavioral alterations.

Key Words: developmental toxicology; neurobehavioral toxicology; F1 generation assessment; perinatal drug exposure.


    INTRODUCTION
 
Early studies on animals established that maternal exposure to tranquilizers (Werboff and Dembicki, 1962), methyl mercury (Haddad et al., 1969), stimulants (Middaugh et al., 1974), anticonvulsants (Middaugh et al., 1975; Vorhees, 1983), ethanol (Abel, 1975), excess vitamin A (Hutchings et al., 1973), and pesticides (see Mactutus and Tilson, 1986 for review) can have behavioral consequences for the offspring. Reports such as these, as well as recognition of the behavioral dysfunction associated with fetal alcohol syndrome (Jones and Smith, 1973), indicated that the developing nervous system is particularly vulnerable to insult by external agents and could be assessed behaviorally. Evidence that the effects of early exposure to various agents could be detected by behavioral evaluation led Great Britain (Barlow, 1985) and Japan (Tanimura, 1985) to adopt regulatory guidelines in 1975, which included behavioral assessment of the F1 generation. Although need for such testing was also recognized by the United States Environmental Protection Agency (USEPA), the absence of reliable and valid standardized neurobehavioral batteries delayed promulgation of regulatory guidelines for routine testing (Kimmel and Buelke-Sam, 1985). This deficiency prompted a search for standardized, validated, behavioral procedures to assess effects of agents on pregnant animals (F0 generation) and their offspring (F1 generation).

Early efforts to establish neurobehavioral testing in hazard identification and risk assessment focused on establishing reliable and valid methodology, and several necessary requirements for an effective screening battery of behavioral tests were proposed (Buelke-Sam and Kimmel, 1979). Among the many requirements, it was noted that for behavioral evaluation to become a meaningful aspect of a routine screening system, "methods must be identified which (1) yield reproducible results within and across laboratories, and (2) are sensitive to the effects of a range of agents...." An attempt to identify such tests was the subject of the Collaborative Behavioral Teratology Study (CBTS). This study was conceived in 1978 and culminated with a symposium/workshop and collection of publications in 1985 (Adams et al., 1985a,b; Barlow, 1985; Butcher, 1985; Butcher and Nelson, 1985; Geyer and Reiter, 1985; Hutchings, 1985; Kimmel and Buelke-Sam, 1985; Kimmel et al., 1985; Kutscher and Nelson, 1985; Nolen, 1985; Riley et al., 1985; Sobotka and Vorhees, 1985; Tilson and Wright, 1985). Among other things, the CBTS concluded that F1 screening batteries should include tests that evaluate sensory systems, neuromotor development, locomotor activity, learning and memory, reactivity and/or habituation, and reproductive behavior (Kimmel and Buelke-Sam, 1985). Importantly, the study established that behavioral assessment techniques conducted on the F1 generation according to the CBTS protocol were reliable, both within and across laboratories, in detecting effects of CNS-targeted agents. Furthermore, it established that litter, gender, and test history contributed to significant variation and required control.

An initial effort toward establishing the effectiveness of F1 generation behavioral evaluation in safety assessment underscored the impact of variability in specific test protocols (e.g., age at testing) on interlaboratory reliability and test validity (Lochry, 1987). Another assessment of the use and effectiveness of F1 generation behavioral evaluation was a 1991 survey of pharmaceutical laboratories in the United States and Europe completed by the Middle Atlantic Reproduction and Teratology Association (MARTA). Results of the MARTA survey (Lochry et al., 1994) indicated that behavioral testing paradigms were used extensively for hazard identification and characterization, and that there was remarkable similarity across laboratories in the types of behavior evaluated. Finally, Ulbrich and Palmer (1996) reported that neurobehavioral changes were commonly observed in offspring during neurotoxicity testing in reproduction, embryotoxicity, and peripostnatal risk assessment studies. Moreover, behavioral changes, as evidenced by effects on various motor activity and avoidance learning tests, were either the only adverse effect detected (i.e., alone defined the NOEL) or occurred at the lowest-observed-adverse-effect level (LOAEL) together with other signs of developmental toxicity (i.e., contributed to defining the NOEL) in a number of these studies.

Aside from the Ulbrich and Palmer report (1996), which included studies of varying study design, exposure periods, and evaluation types, the overall contribution of F1 generation behavioral evaluations toward detecting and characterizing the effects of potential toxicants in industrial laboratory studies has not been reported. Thus, the primary objective of the present study was to evaluate the contribution of such neurobehavioral testing to hazard identification and characterization in safety assessment studies of a wide variety of compounds under currently used study designs and testing regimens. Toward this end, the Developmental and Reproductive Toxicology (DART) Technical Committee of the Health and Environmental Sciences Institute (HESI) within the International Life Sciences Institute (ILSI) conducted a retrospective analysis of information obtained through a survey of organizations conducting risk assessment studies throughout the United States, Europe, and Japan. The survey provided information about the characteristics and outcome of studies completed since 1990. Analyses focused on the frequency with which the individual F1 behavioral parameters were evaluated and were affected by the agents tested, and on the frequency with which they defined the overall F1 NOEL within the study, either alone or together with other parameters.


    MATERIALS AND METHODS
 
Presurvey considerations.
To determine the feasibility of this endeavor and the number of studies potentially available for consideration, a presurvey letter was sent to 380 individuals from various organizations internationally. The organizations included pharmaceutical and chemical companies and contract research organizations. To prevent potential duplication of information, contract research organizations were requested to obtain the approval of sponsors who themselves might be submitting identical information. A commitment of participation and an estimate of the number of studies each organization would provide were requested of the correspondents.

Survey development.
A detailed survey (see supplementary data on this journal’s Web site) was developed by a Steering Committee in collaboration with the full HESI DART committee to collect the information considered of importance based on the historical goals of F1 behavioral testing in industrial settings. Hence, the survey was designed to gather data about (1) the general design of each study, (2) the intended use of the agent evaluated, (3) the types of parameters evaluated with emphasis on neurobehavioral evaluation, and (4) the NOELs for each parameter evaluated and the frequency with which each defined the overall NOEL for the F0 and F1 generations. General design information included sample size, the number and range of doses, the route of administration, and the developmental period and duration of exposure. Additionally, information was requested about the therapeutic indication or intended use of the test article, as well as the parameters and general methodology used for neurobehavioral assessment. Study outcome data were collected by requesting that the respondents provide the NOEL for each parameter included in a given study as well as the overall NOEL determined for both the F0 and F1 generation parameters. NOELs were defined by each contributing laboratory. The overall NOEL was intended to reflect the most sensitive parameter of any type, including standard measures of toxicity and the battery of behavioral tests used in the particular study. Information requested was confined to a given study with no attempt to collect information on a particular test article across studies. The survey was distributed to those organizations that indicated in the presurvey an interest and ability to contribute study data to the project.

Survey implementation and follow-up.
The statistical consulting firm of Statistics Unlimited, Inc. (SUI) implemented the survey by producing a survey instrument in Visual Basic to run under the Windows operating system. An executable module was created on diskette and these diskettes were mailed to all organizations indicating a willingness to participate. To maintain confidentiality, a unique three-digit hexadecimal identification number was assigned to each organization. This identification number was known only to the organization and to SUI, and the database contained no other identifying information about the organization. Responding organizations provided the completed surveys to SUI either by diskette or by e-mail. After a reasonable length of time, the organizations that had not responded were contacted by telephone and/or e-mail to determine if they anticipated completing the survey.

Data analysis.
Data from the survey were tabulated by SUI using SAS software, version 8.1, running on the OpenVMS operating system on a MicroVAX 3100-85 computer. Standard summary statistics of means, medians, standard deviations, minima, and maxima were calculated for quantitative variables. Discrete data were typically characterized by frequency distributions. For some F1 generation parameters, separate results were determined for males and females. To determine if the data from males and females could be pooled for descriptive analyses, two statistical tests were performed. McNemar’s test was used to determine if the proportion of studies with missing parameter-specific results was different for males and females. A Wilcoxon signed rank test was used to determine if the parameter-specific NOELs were significantly different for males and females. If either test was statistically significant at p < 0.05, then the results from males and females were analyzed separately. Otherwise, results of the two genders were combined for descriptive statistical analyses. In the case of a discrepancy for the NOEL of a given parameter between males and females, the more sensitive (lower) NOEL was used. If a given study had a missing NOEL for one gender, then the NOEL from the other was used.
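
The pooling rule described above can be illustrated with a short sketch. This is not the SAS code used by SUI. The function names, the ordinal coding of dose categories (0 = <low, 1 = low, 2 = medium, 3 = high), and the use of an exact binomial form of McNemar's test are assumptions made for the example. The Wilcoxon signed rank p-value is taken as an input rather than computed.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact (binomial) McNemar p-value.

    b: studies with a missing result for males only (assumed coding)
    c: studies with a missing result for females only (assumed coding)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pool_sexes(p_mcnemar, p_wilcoxon, alpha=0.05):
    """Pool male and female data only if neither test is significant."""
    return p_mcnemar >= alpha and p_wilcoxon >= alpha


def combined_noel(male, female):
    """Combine sex-specific NOELs for one parameter in one study.

    NOELs use a hypothetical ordinal coding (0 = <low, 1 = low,
    2 = medium, 3 = high); None marks a missing value.
    """
    if male is None:
        return female
    if female is None:
        return male
    return min(male, female)  # the more sensitive (lower) NOEL
```

Under these assumptions, a study reporting a medium NOEL for males and a low NOEL for females would contribute the low NOEL to the pooled tabulation.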


    RESULTS
 
General Characteristics of the Database
Original contact with 380 individuals established that 65 organizations would participate in the survey to provide the results of 668 studies. Of the original response, 33 of the 65 organizations (51%) completed and submitted survey summaries for 174 studies (26% of the studies originally committed). Participant organizations included 17 from the United States, 10 from Japan, and six from Europe. Follow-up with nonresponding organizations via e-mail and phone established that lack of resources was the major reason for not completing the survey.

The limited sample (51% of the organizations and 26% of the known available studies) as well as the voluntary, nonrandom nature of participation might qualify interpretation of the analyses. This qualification, although real, should not negate the value of the data set. It should be noted that the contributing organizations were global and included private industry laboratories (pharmaceutical and chemical) as well as contract research organizations. The compounds tested represented a broad variety of chemical and pharmacological classes. The studies submitted were also reasonably well distributed across the organizations (i.e., there was not a disproportionate contribution from a small number of organizations). Moreover, the size of the data set reduces any impact that specific studies from a contributor might have on the total data set. Although no positive control experiments were submitted, the information collected on study design and methodology, though limited, revealed general adherence to currently accepted designs and methods. All but one of the studies were conducted under Good Laboratory Practice (GLP) standards to support drug development or chemical registration and, therefore, followed currently accepted guidelines for studies from regulatory agencies. Collectively, these factors suggest that, while the response rate was low, the data set was representative of the universe of studies that have been conducted using these assessments.

Table 1 provides a summary of the types of agents included in the submitted studies. The table includes 172 of the 174 submitted studies. Two studies were omitted because the agents evaluated were not specified. The table is organized according to decreasing number of studies completed on agents within each of the major categories and subcategories. As noted in Table 1, 141 (81%) of the submitted studies were conducted on pharmaceutical agents. Within this category, the predominant subcategories of agents studied were cardiovascular, neuropharmacological, and hormonal agents. Agricultural and industrial agents had minimal representation in the studies submitted. The category "Other" includes studies for which the agent category was not specified, perhaps because the agent did not fit into one of the listed categories.


TABLE 1 Studies Reported According to Functional Category
 
Various other characteristics of the submitted studies were identified. The species tested was predominantly the rat (99%), with only two of the 174 studies conducted on mice. The route of administration for most of the agents was oral gavage (67%). In other studies, the route of administration was bolus IV injection (20%), subcutaneous injection (6%), diet (3%), continuous intravenous infusion (2%), dermal (occluded) application (1%), intraperitoneal injection (<1%), and drinking water (<1%). General experimental characteristics were similar across the 174 reported studies. For example, sample size was 20–30 per group for 84% of the reported studies, with only 5% of the studies using fewer than 20 per group. Administration of the agent was generally initiated during gestation (92% of the studies), predominantly at the beginning of the second week of gestation (74%), and continued throughout lactation (20–23 days postnatally) in 79% of the studies. The number of doses examined was four, including the vehicle control, for most of the studies (90%). To provide information about the extent of the dose range in the reported studies, we examined the number of studies in which the ratio of the high to low dose was within a given range. As expected for studies across widely varying agent categories, dose ranges examined varied considerably. The ratio of the high to low dose was ≤5 for 19% of the studies, >5 to 10 for 40%, >10 to 20 for 19%, and >20 for 22% of the studies.
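
The dose-range tabulation described above amounts to binning each study's high-to-low dose ratio. A minimal sketch follows; the function names and the input format (one low-dose and high-dose pair per study) are hypothetical, and the dose pairs in the test are illustrative values rather than survey data.

```python
from collections import Counter

# Bin labels follow the four ranges reported in the text.
BINS = ("<=5", ">5 to 10", ">10 to 20", ">20")


def dose_ratio_bin(low_dose, high_dose):
    """Return the bin label for the ratio of the high dose to the low dose."""
    ratio = high_dose / low_dose
    if ratio <= 5:
        return "<=5"
    if ratio <= 10:
        return ">5 to 10"
    if ratio <= 20:
        return ">10 to 20"
    return ">20"


def bin_percentages(dose_pairs):
    """Percentage of studies in each bin; dose_pairs holds (low, high) tuples."""
    total = len(dose_pairs)
    if total == 0:
        return {label: 0.0 for label in BINS}
    counts = Counter(dose_ratio_bin(low, high) for low, high in dose_pairs)
    return {label: 100 * counts[label] / total for label in BINS}
```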

F0 Generation
Table 2 summarizes the information collected from the 174 studies on the maternal F0 generation parameters listed in the first column. The first line of the table provides information about the F0 parameters when all parameters in a study were considered in establishing NOEL (overall F0 parameters). Subsequent lines provide information about each individual F0 parameter evaluated. The frequency and percentage with which each parameter was assessed are noted in column 2 of Table 2. Indicants of study outcome across the parameters are noted in the remaining columns of the table, as described below.


TABLE 2 Maternal F0 Parameters Evaluated and Contribution to Study Outcome
 
F0 parameters: Frequency of evaluation.
As is evident in Table 2, mortality, body weight, clinical signs, pathology, food consumption, and maternal behavior of the dams, organized in descending order of evaluation, were routinely assessed (i.e., in >80% of the reported studies). Water consumption, on the other hand, was infrequently examined. The survey further characterized clinical signs into three categories, according to the conditions under which they were evaluated. Across the three categories, observations were made in the home cage during mortality/morbidity checks, outside the home cage using standard terms, and in a standardized arena using a formal predetermined scoring system. The categories were not mutually exclusive since observations were made under more than one of the conditions in some studies. The three categories were used in 58%, 45%, and 4% of the reported studies, respectively. Although the survey attempted to obtain information about whether or not neurobehavioral observations (e.g., FOB-type evaluations) were obtained during assessments of clinical signs, apparent confusion about the question resulted in too many missing values to assess the results.

F0 parameters: Study outcome.
Two types of study outcome information were obtained. First, the respondents provided the agent dose category (<low, low, medium, high) that determined the NOEL for each individual parameter, as well as the overall F0 NOEL, which was based on the most sensitive parameter included in each study. These data are summarized in columns 3–6 of Table 2 as the percentage of studies in which the NOEL was defined at the different dose level categories with reference to all studies that evaluated the parameter (i.e., excludes studies that did not evaluate the parameter). Column 7 summarizes the percentage and number of studies in which an agent effect was detected. The percentage is a summation of the three lower doses noted in columns 3–5 and provides an index for comparing parameters according to the extent to which they detected an agent effect. The high dose category studies (column 6) were excluded from this sum because an overall NOEL at the highest dose indicates the absence of an agent effect at any dose tested.
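
The first outcome measure can be sketched as a simple tabulation over the studies that evaluated a parameter. The function name and the input format (one NOEL category per study, with None for studies that did not evaluate the parameter) are assumptions made for this illustration, not the survey's actual data layout.

```python
from collections import Counter

# NOEL dose categories in increasing order, as used in the survey.
CATEGORIES = ("<low", "low", "medium", "high")


def noel_summary(noels):
    """Summarize NOEL categories for one parameter across studies.

    Returns (percent per category, number of studies with an effect,
    percent of studies with an effect). Studies that did not evaluate
    the parameter (None) are excluded from the denominator.
    """
    evaluated = [n for n in noels if n is not None]
    total = len(evaluated)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}, 0, 0.0
    counts = Counter(evaluated)
    percents = {c: 100 * counts[c] / total for c in CATEGORIES}
    # A NOEL below the high dose means an effect was seen at some dose.
    with_effect = sum(counts[c] for c in CATEGORIES[:3])
    return percents, with_effect, 100 * with_effect / total
```

With a NOEL at the high dose counted as no effect, the effect percentage is the sum of the three lower categories, which mirrors how column 7 is derived from columns 3–5.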

The second type of study outcome information was the frequency with which each parameter NOEL established the overall F0 NOEL (i.e., F0 parameter NOEL = overall F0 NOEL). This information is summarized in the last column of Table 2. Column 8 expresses this frequency as a percentage, with the number of studies in which the individual parameter NOEL equaled the overall NOEL as the numerator and the total number of studies that included the parameter and detected an agent effect as the denominator. The numbers associated with the percentages are provided in the same column. Again, the high dose category studies were excluded for the reason noted above, and the percentage is based on only those studies that assessed the parameter and also detected an effect of the agent evaluated. Comparing the percentage of studies that detected agent effects (Table 2, column 7) and the extent to which the different parameters contributed to the overall NOEL (Table 2, column 8) across parameters provides information about their relative sensitivity in establishing safe levels of exposure for the particular agent evaluated.
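
The second outcome measure can be sketched in the same way. The study records below use a hypothetical structure (an overall NOEL plus a mapping of parameter names to NOELs) and a hypothetical ordinal coding of dose categories (0 = <low, 1 = low, 2 = medium, 3 = high); neither is taken from the survey instrument.

```python
HIGH = 3  # assumed code for a NOEL at the highest dose tested (no effect)


def defines_overall_noel(studies, parameter):
    """Count how often a parameter's NOEL equals the overall NOEL.

    Returns (numerator, denominator). The denominator counts studies that
    evaluated the parameter and detected an agent effect overall, that is,
    studies with an overall NOEL below the high dose.
    """
    numerator = 0
    denominator = 0
    for study in studies:
        noel = study["params"].get(parameter)
        if noel is None or study["overall"] >= HIGH:
            continue  # parameter not evaluated, or no effect detected
        denominator += 1
        if noel == study["overall"]:
            numerator += 1
    return numerator, denominator
```

A result of (4, 13), for example, would correspond to a column 8 entry of 31%.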

As noted in the first line of Table 2, an overall NOEL was not observed at any of the doses tested for 16% of the reported studies (i.e., the NOEL was below the lowest dose examined; effects were observed at all doses). The overall NOEL was determined to be the lowest dose tested for 25% of the studies (effects were observed at the medium dose); the medium dose for 34% of the studies (effects were observed at the high dose); and the high dose for 24% of the studies (no effects of the agent were observed). Thus, when all F0 parameters were considered, 75% of the reported studies noted an effect of the agent evaluated, as noted in column 7 of Table 2 (132 of the 174 reported studies noted an effect of the agent evaluated). In comparison, inspection of this column indicates that the dam’s food consumption and body weight were affected by the tested agents in >50% of the reported studies, suggesting that these parameters were relatively sensitive indicators of agent effects. Pathology, mortality, and maternal behavior, on the other hand, were much less frequently affected.

Finally, comparison of the values in the last column of Table 2 across the individual parameters indicates that the body weight and food consumption parameters had NOELs = the overall F0 NOEL in a high percentage of the reported studies. Again, there were relatively few studies for which pathology, mortality, and maternal behavior had individual parameter NOELs = the overall F0 NOEL. Although the analysis suggests that water consumption was comparable to clinical signs on these measures, the fact that only 10% of the reported studies evaluated water consumption raises questions about the reliability of this interpretation.

F1 Generation
Table 3 summarizes the F1 parameters evaluated in the studies submitted in the survey. The individual parameters are grouped, somewhat arbitrarily, according to the following categories: general toxicology, physical developmental landmarks, preweaning reflex landmarks, mating and reproduction, and behavioral assessment. Data associated with frequency of use for the different parameters and study outcome are organized and presented as noted above for Table 2. The survey contained information for both male and female offspring; however, unless effects differed according to gender, data are combined for presentation in Table 3. Analyses of data for males and females, as described in Materials and Methods, established statistically significant male versus female differences (p < 0.01) for only postweaning body weight in the general toxicology category and for the mating and fecundity measures in the mating and reproduction category. Data for these parameters are presented separately for males and females in Table 3.


TABLE 3 F1 Parameters Evaluated and Their Contribution to the Overall F1 NOEL
 
F1 parameters: Frequency of evaluation.
The individual parameters identified in the first column of Table 3 are organized according to decreasing frequency of evaluation within each of the major categories. Most of the parameters listed in the general toxicology category (i.e., mortality, body weight, food consumption, and clinical signs) and the mating and reproduction category of Table 3 were measured in most of the reported studies. Notable exceptions were food and water consumption. Assessment of the physical development category parameters was variable. Vaginal opening, eye opening, and preputial separation were noted in 70% or more of the studies. The remaining parameters in this category, as well as those listed in the reflex landmarks category, were evaluated in 52% or less of the studies. The frequency for measuring the parameters listed in the behavioral assessment category was also variable. Locomotion and cognition were evaluated in 60% or more of the studies, while the other parameters were examined in fewer than 40% of the studies.

The survey obtained more detailed information about the characteristics of the clinical signs and the behavioral parameters than is summarized in Table 3. Clinical signs were evaluated under the varying conditions noted above for the F0 generation. They were evaluated in the home cage during mortality/morbidity checks in 55% of the studies, outside of the home cage using standard terms in 42% of the studies, and/or in a standard arena using a predetermined formal scoring system in 10% of the studies. In some cases, more than one of the procedures was used in a single study.

Behavioral assessments included evaluations of motor activity, cognition, sensory function, and auditory startle habituation, although the methodological characteristics varied across laboratories and not all methodological details were captured by the survey. For most studies, one pup of each sex per litter was evaluated and the average sample size was near 20 per dose group (range 10–30); one study reported evaluation of all pups in the litter for the learning tests with a sample size of eight litters per group. How data from this study were analyzed to maintain the litter as the unit of analysis was not revealed by the survey.

Motor activity was assessed by methods classified as either locomotion/ambulation (82% of the studies) or motor activity/other (32% of the studies), as noted in Table 3. For some studies, evaluations of both types were performed. Regardless of assessment type, offspring were commonly evaluated shortly after weaning (22–30 days old) and/or at 31–50 days of age. Locomotion/ambulation evaluations were also frequently performed in adult F1 animals (>50 days). The most common method for assessing locomotion/ambulation was monitoring activity in a novel environment (e.g., open field) using a photocell detection system (84% of studies in this category). Other methods used for the locomotion/ambulation category included the figure-eight maze, the running wheel, and home cage evaluation, which were used in 8%, 4%, and 4% of the studies, respectively. The average duration of the activity tests was approximately 2 h but varied from approximately 2 min (presumably a home cage observational evaluation) to 24 h. Methods classified as motor activity/other included any type of activity other than locomotion. The most common method comprising this category of activity was the monitoring of rearings, which was reported for 49% of the studies. Other methods, such as video-based systems, infrared detection systems, and home-cage observation ratings, were used in 49% of the studies noting this category of behavior. Stereotypy, in the context of activity assessment, was measured in a single study (2%).

Cognitive evaluation noted in Table 3 was classified as learning/memory, learning/acquisition, or short- and long-term memory based on information from the survey that indicated whether learning/acquisition or memory was emphasized. Some studies included multiple assessments with different emphases. Methods used in all three categories included various types of water mazes, various versions of passive-avoidance, and various operant techniques. These three types of test were used, respectively, in 67%, 29%, and 45% of the studies for the assessment of learning/memory; in 62%, 31%, and 8% of the studies for the assessment of learning/acquisition; and in 50%, 46%, and 4% of the studies for the assessment of short- and long-term memory. For studies emphasizing memory, the hiatus between acquisition and memory trials was most often 7 days (range < 1–7 days). Responses to the question regarding the hiatus between acquisition and testing, however, were not sufficiently consistent to allow a time distinction between short- and long-term memory.

For sensory function assessments, the visual and auditory systems were most frequently evaluated (50% and 42% of the studies assessing sensory function, respectively). Tactile sensation was less commonly evaluated (8% of the studies). Sensory system function was typically evaluated both during development and in adulthood.

Auditory startle habituation was examined in both developing and adult offspring. This test was generally considered to be an index of central nervous system processing and reactivity, and not considered to be a test of hearing or audition.

F1 parameters: Study outcome.
The two types of study outcome information described above for the F0 parameters were also obtained for the F1 measures. Thus, columns 3–7 of Table 3 provide information about whether the agents tested produced effects on the individual parameters, and column 8 provides information about the extent to which each parameter NOEL defined the overall F1 NOEL (i.e., F1 parameter NOEL = overall F1 NOEL).

As described for the F0 parameters in Table 2, the first line of Table 3 provides information about outcome when all parameters included in the study were considered (i.e., overall F1 parameters). The overall F1 NOEL was not observed at any of the doses tested for 11% of the reported studies (i.e., the NOEL was below the lowest dose examined; effects were observed at all doses). The overall F1 NOEL was established by the lowest dose tested for 20% of the studies (effects were observed at the medium dose); by the medium dose for 33% of the studies (effects were observed at the high dose); and by the high dose for 35% of the studies (no effects of the agent were observed). Thus, for the overall F1 parameters, 111 of the 174 (64%) reported studies noted an effect of the agent evaluated (Table 3, column 7).

Inspection of column 7 in Table 3, which summarizes the percentage and number of studies in which an agent affected the parameters, indicates that offspring mortality, pre- and postweaning body weights, and food consumption were the most frequently affected F1 parameters. Although food consumption was affected in a relatively high percentage of the studies in which it was evaluated, it was assessed in only 52 (29%) of the reported studies. In comparison, mortality and body weights were evaluated in virtually all studies. As noted in these columns, each of the remaining parameters, regardless of major category, was much less frequently affected (range 3–17%) except for eye opening, which was in the medium range (i.e., affected by agents in 24% of the studies).

The second outcome measure, the relative frequency with which the individual parameters defined the overall F1 NOEL (i.e., F1 parameter NOEL = overall F1 NOEL), in general provided a pattern across individual parameters similar to the frequency of effects measure noted above (see column 8 of Table 3). As expected, the mortality, body weight, and food consumption parameters had NOELs equal to the overall F1 NOEL in a high percentage of the reported studies, as was observed for the "frequency of effects" measure (column 7). In addition, the parameters for which there were relatively few studies where the individual parameter NOEL equaled the overall NOEL were also those that were infrequently affected by agents. Possible exceptions to the consistency between the two measures are noted for hair growth (physical developmental landmarks), acoustic startle (preweaning reflex landmarks), and sensory function (behavioral parameters). The reasons for the discrepancies between the two measures for these parameters are unknown but could reflect variance associated with the relatively few studies included in the sample. For example, of the 13 reported studies that detected an agent effect and also included hair growth as a measure, in only four of these studies (31%) was the NOEL for hair growth equal to the overall NOEL.

Comparison of F1 behavioral parameters to other F1 parameters.
The frequency with which the individual parameters contributed to defining the overall F1 NOEL (i.e., F1 parameter NOEL = overall F1 NOEL) provides a comparison of the relative sensitivity of F1 behavioral parameters (i.e., activity measures, learning and/or memory assessments, quantitative auditory startle and habituation, and sensory function) with other F1 parameters in hazard identification. Of the 174 studies, 61 (35% as in Table 3, column 6, high dose/no effect) were excluded from this analysis because the highest dose tested produced no effect (overall F1 NOEL at the high dose). The remaining 113 studies (i.e., those in which agent effects were observed) were categorized according to whether the NOEL for the combined behavioral parameters was defined by doses lower than, equivalent to, or greater than doses required for the other F1 parameters. Of particular interest are the categories in which F1 behavioral parameter NOELs were established by doses lower than or equivalent to doses required to establish NOELs for other F1 parameters. For three of the 113 reported studies (2.6%), the NOELs for behavioral parameters were established at lower doses than those for other F1 parameters. In two of these studies, higher doses produced no additional effects on other F1 parameters (i.e., the behavioral effects were the only effects observed in F1 animals). For another 15% of these 113 studies, the NOELs for F1 behavioral parameters were established at doses equivalent to those for other F1 parameters. Thus, behavioral parameters alone established the overall F1 NOEL for three studies. The behavioral parameters that defined the NOEL in these three studies were auditory startle habituation in one study and cognitive assessments (learning/memory, learning acquisition) for the other two studies.
How the particular study characteristics (e.g., study design, methodology, and test characteristics) might have influenced the outcome of these studies was not determined because of the few studies in this category.
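The three-way categorization used in this comparison can be illustrated with a short sketch. It assumes that each study supplies one NOEL for the combined behavioral parameters and one for the other F1 parameters, expressed in the same dose units; the function name `compare_noels` and the example values are hypothetical and are not survey data.

```python
from collections import Counter


def compare_noels(behavioral_noel: float, other_noel: float) -> str:
    """Classify a study by where the behavioral NOEL falls relative to
    the NOEL for the other F1 parameters (same dose units assumed)."""
    if behavioral_noel < other_noel:
        return "lower"  # behavioral parameters alone define the F1 NOEL
    if behavioral_noel == other_noel:
        return "equivalent"  # behavioral and other parameters share the NOEL
    return "greater"  # other parameters are affected at lower doses


# Hypothetical (behavioral NOEL, other-parameter NOEL) pairs, for illustration.
example_studies = [(10, 30), (30, 30), (100, 30), (100, 10)]
tally = Counter(compare_noels(b, o) for b, o in example_studies)
```

In the survey's terms, the "lower" category corresponds to the three studies in which behavioral parameters alone established the overall F1 NOEL.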

For the additional studies in which the overall F1 NOEL was established by both F1 behavioral parameters and other F1 parameters, each of the behavioral parameters listed in Table 3 defined the overall F1 NOEL in one or more of the 17 studies. Among this group, locomotion/ambulation, auditory startle, learning/memory, and learning/acquisition contributed toward defining the F1 NOEL in 29–53% of the studies. Other F1 parameters that defined the overall F1 NOEL, in terms of percentage of studies, were preweaning body weight (82%), postweaning body weight (76%), mortality (53%), clinical signs (47%), food consumption (35%), postweaning pathology (24%), and preweaning pathology (18%).

F1 parameter study outcome: Comparison to F0 study outcome.
Studies involving prenatal and early postnatal administration of drugs or other agents provide the potential for a confounding influence of maternal toxicity on the offspring. Therefore, as a means to evaluate the relationship between F1 outcomes and maternal effects, the ratios between the overall F0 (maternal) NOEL and the overall and individual parameter F1 NOELs were examined. Exclusion of 15 studies that failed to establish a NOEL in either generation left 159 studies available for comparison. The F1: F0 NOEL ratio was 1 for 83 (52%) of the studies, >1 in 52 (33%) of the studies, and <1 in 24 (15%) of the studies. Thus, developmental parameters were more sensitive than maternal parameters in 15% of the studies. Moreover, F1 behavioral parameters were affected by agents in the absence of maternal toxicity in 5 (3%) of the studies (i.e., the NOEL for F1 behavioral assessment was less than the F0 NOEL).
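The ratio comparison can be sketched as follows. This is an illustration under stated assumptions: NOELs are positive doses in the same units, and the function name `ratio_category` and its labels are hypothetical. The counts are those reported in the paragraph above.

```python
def ratio_category(f1_noel: float, f0_noel: float) -> str:
    """Classify a study by its F1:F0 NOEL ratio (positive doses assumed)."""
    ratio = f1_noel / f0_noel
    if ratio < 1:
        return "ratio < 1"  # F1 NOEL is lower: offspring more sensitive
    if ratio > 1:
        return "ratio > 1"  # F0 NOEL is lower: maternal animals more sensitive
    return "ratio = 1"  # both generations share the same NOEL


# Counts reported in the text for the 159 studies with a NOEL in a generation.
reported_counts = {"ratio = 1": 83, "ratio > 1": 52, "ratio < 1": 24}
total = sum(reported_counts.values())
percentages = {k: round(100 * n / total) for k, n in reported_counts.items()}
```

With the reported counts, `total` is 159 and `percentages` reproduces the 52%, 33%, and 15% figures given in the text.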


    DISCUSSION
 
Potential Study Biases/Limitations
As noted above, survey data collected by voluntary submission from multiple organizations can have several potential data set biases that can influence interpretation and conclusions. Although the rate of return was somewhat low (51%), it was similar to that reported for the MARTA survey (Lochry et al., 1994). Thus, the data sets are susceptible to selection bias resulting from the particular organizations that chose to submit studies, as well as from the studies on the particular agents they chose to submit. As discussed in the above section, however, the Committee believes that the organizations that ultimately submitted studies were a representative sample of all organizations conducting evaluations of this type, and the studies included a wide range of pharmaceutical and chemical classes.

Another potential bias could derive from interlaboratory methodological differences (e.g., the extent of experimental control, personnel training, experimental protocols, and equipment), which can impact the outcome and interpretation of studies (Kimmel and Buelke-Sam, 1985; Lochry, 1987). Categorizing data from different laboratories according to functional domains (learning/memory, motor activity, etc.) can also have an impact on the interpretations, since the procedures used to assess these functions can differ substantially in their reliability and sensitivity. For example, data from the different procedures used to assess learning/memory (e.g., Biel maze vs. Morris maze spatial navigation tasks) or to assess motor activity (e.g., open field vs. figure-eight activity measures) may differ in their sensitivity and reliability but are combined into common categories. The absence of a positive control in survey data prevents identifying the extent to which these methodological differences across laboratories might affect interpretation and conclusions. However, it should be noted that all but one of the studies were conducted under GLP by laboratories that frequently conduct positive control experiments to evaluate their techniques and train technical personnel.

Finally, bias related to how the different contributing laboratories defined NOEL and what they considered to be true effects is a possibility. The absence of a clustering of positive studies to a small number of contributing organizations, however, suggests that this is unlikely to be a significant concern in interpreting survey results.

In spite of these potential problems, which are common to all surveys using voluntarily submitted material without random selection, the authors believe that the survey yields a substantial amount of useful information regarding the role of behavioral assessment in hazard identification and characterization as conducted by industrial laboratories today.

Database and Experimental Design Characteristics
The current survey was limited to studies conducted since 1990 and, thus, represents current procedures conducted under GLP guidelines. The survey contained a predominance of pharmaceutical studies (81%) as was reported for the 1991 MARTA survey (88%; Lochry et al., 1994). Experimental design characteristics were generally consistent with recommendations provided by the CBTS study (Kimmel and Buelke-Sam, 1985). The survey revealed that the rat is the predominant species used for the reported studies. The studies included sufficient sample size and number of agent doses. Agents were typically administered by oral gavage throughout the presumed period of central nervous system development in the rodent. Behavioral evaluations were generally performed at similar stages of development (shortly after weaning and/or during early adulthood), and usually included one pup per sex per litter. General maternal and offspring toxicity parameters were consistent across studies. Hence, several aspects of experimental design that could impact study outcome had some consistency across the submitted studies.

F1 Generation Behavioral Parameters: Frequency of Evaluation
One goal of the survey was to ascertain the frequency with which the different F1 generation behavioral parameters were employed in hazard identification and characterization. Motor activity and learning/memory assessments of some form were included in most studies (60–82% of the studies), albeit not as consistently as the general toxicology measures that were assessed in virtually every submitted study. Some form of locomotion/ambulation was evaluated in 82% of the studies, a frequency comparable to the general toxicology and the mating/reproduction parameters as well as several parameters in the physical development category. Some form of learning and/or memory was assessed in slightly more than half of the studies. Finally, sensory function and auditory startle habituation were evaluated about as often as reflex development parameters (in one-quarter to half of the studies).

The similar frequencies with which studies evaluated motor activity and cognition in this and the MARTA survey (Lochry et al., 1994) suggest some standardization of the general categories of behavior to be evaluated in toxicity studies, which is necessary for interlaboratory reliability (Adams, 1986; Adams et al., 1985a, 1985b; Buelke-Sam and Kimmel, 1979; Kimmel and Buelke-Sam, 1985; Kimmel et al., 1985). A caveat to this interpretation, however, is that quite different protocols were combined into the single behavioral categories of learning/memory or motor activity.

Relative Sensitivity of F1 Generation Behavioral Parameters in Hazard Identification
The percentage of studies in which the individual parameters were affected by agents (column 7 of Tables 2 and 3) or defined the overall F1 NOEL within a given study (column 8) provides an index of their relative sensitivity for detecting toxicity of the agent evaluated.

Relative Sensitivity of Parameters within the Behavioral Assessment Category
The F1 behavioral parameters included in the submitted studies had roughly equivalent sensitivity in hazard detection. Comparison of the behavioral category parameters against other F1 parameters (see Results above) also failed to reveal behavioral parameters distinguished by their relative sensitivity. Although distinguishably sensitive neurobehavioral parameters were not identified by the survey, the long-/short-term memory and the sensory function parameters were relatively insensitive. This was established by both sensitivity measures for the memory category and by the parameter defining the overall F1 NOEL with other parameters for the sensory function test (see Table 3, columns 7 and 8). These tests would likely benefit from more attention to the details of behavioral assessment as outlined in a recent methods report for behavioral screening batteries (Cory-Slechta et al., 2001).

The absence of differences in the relative sensitivity of the behavioral parameters in our study is consistent with a recent symposium report on cognitive evaluation in neurotoxicity screening (Moser et al., 2000). This report noted that the functional observational battery (FOB) and motor activity were more sensitive than cognitive measures in some experiments, while in other experiments they did not differ. The lack of consistency among different measures of a particular function is not particularly surprising given the large range of assessment techniques employed and the diversity of compounds tested in the different laboratories. This analysis suggests that a single "magic bullet" parameter, or a single behavioral category, capable of serving as a biomarker for all types of developmental neurotoxicity is unlikely to exist.

Relative Sensitivity of Behavioral Parameters in Comparison to Other F1 Parameters
As noted in the Results section, the comparisons of F1 behavioral parameters with other F1 parameters indicated that NOEL doses were higher for the behavioral parameters than for other F1 parameters in 82% of the qualified studies, with preweaning body weight being the most frequently affected parameter. Other parameters assigned to the general toxicology category (e.g., postweaning body weight, mortality, and food consumption), although less sensitive than preweaning body weight, were more sensitive than most other F1 measures including behavioral parameters. Although behavioral parameters were relatively less sensitive in detecting toxicity in the diverse compounds included in the survey, it must be emphasized that they were equivalent to other parameters in 17/113 (15%) of the reported studies and were more sensitive than other parameters in 3/113 (2.6%) of the studies. Furthermore, without information about the behavioral parameters affected by exposure to compounds during CNS development, characterization of their toxic effects, such as distinguishing developmental neurotoxicity from systemic, general, or maternal toxicity, would not be possible.

The contribution of F1 behavioral parameters toward hazard identification in the present survey differs somewhat from that reported by Ulbrich and Palmer (1996). Behavioral effects, sometimes in combination with other parameters, defined the overall F1 NOEL in 18% of the reported studies in our survey. In comparison, Ulbrich and Palmer reported that motor activity and avoidance learning tests were affected in 28% of the studies they reviewed, but that these effects were noted at lower or equivalent doses to those for other F1 parameters in 5% of studies. Thus, although behavioral effects were observed in a lower percentage of studies in the current survey, the behavioral parameters were relatively more sensitive in hazard identification than noted in the earlier report. Differences in the particular agents sampled or in categorization of the behavioral parameters could contribute to these differences in outcome.

The results of our survey, together with the Ulbrich and Palmer report, suggest that neurobehavioral assessments are generally less sensitive measures of F1 developmental toxicity than the parameters categorized as general toxicology in our survey. Several interpretations of this apparent reduced sensitivity are possible. Firstly, the developing nervous system might be relatively insensitive to general xenobiotic effects, although this seems unlikely based on numerous early reports suggesting that the developing nervous system is particularly vulnerable to insult (Buelke-Sam and Kimmel, 1979; Butcher et al., 1975; Rodier, 1976; Spyker, 1976). Secondly, particularly sensitive behavioral measures may have been masked by combining them with less sensitive measures into a common category. Thirdly, the currently used behavioral tests may be generally insensitive measures of developmental neurotoxicity. As proposed 25–30 years ago (Buelke-Sam and Kimmel, 1979; Butcher, 1976; Geyer and Reiter, 1985) and noted in the Ulbrich and Palmer report (1996), more apical measures might be more readily impacted by the widely diverse agents included in surveys. It is possible that the particular behavioral techniques commonly used are less apical and, therefore, less likely to be affected by the wide variety of unknown agents in screening programs.

Most likely a combination of these interpretations, and perhaps other possibilities, contributes to the apparent relative insensitivity of the neurobehavioral measures in comparison to the general toxicology measures (e.g., body weight) in this survey. Clearly, a better understanding of normal nervous system development, mechanisms of developmental neurotoxicity, and test methodologies is needed. However, we would emphasize that both reports indicate that neurobehavioral effects do occur in exposed F1 offspring and at times prove to be the most sensitive indicator of developmental effects. Importantly, these tests also help to characterize an agent’s potential hazard by providing information about whether exposure during development produces a behavioral dysfunction in the animal model. This information is, thus, available for comparison with drug exposure at other time periods. Given that no particular parameter was distinguishably more sensitive than others, it remains important to include numerous paradigms for a full characterization of potential behavioral effects of an agent to better understand its potential risk. Continued examination of multiple endpoints in regulatory studies seems warranted.

F0 and F1 NOEL: A Comparison
The F1: F0 NOEL ratio comparisons indicated that agent effects were detected more frequently by F0 than F1 parameters in these studies (i.e., ratio > 1 in 33% of studies considered). In spite of their apparent insensitivity, the F1 parameters, both general and behavioral, were relatively more sensitive than F0 maternal parameters in 15% of the studies (ratio < 1). Furthermore, F1 behavioral parameters were affected by agents in the absence of maternal toxicity in five (3%) of the studies (i.e., the NOEL for F1 behavioral assessment was less than the F0 NOEL). The survey does not provide sufficient detail to establish a basis for the differences in sensitivity of the F0 and F1 parameters.

Summary and Conclusions
The general toxicology parameters (mortality, pre- and postweaning body weight, and food consumption) were evaluated in virtually every study included in the survey. In general, the agents tested affected these parameters more frequently than other F1 generation parameters. Furthermore, the agents evaluated affected F0 parameters more frequently than the F1 measures. An important conclusion from the survey is that hazard identification was improved by the inclusion of multiple tests and neurobehavioral assessments. Using multiple test measures, agent effects were noted in 113/174 (64%) of the studies in comparison to only 85/174 (49%) for the best individual measure. Thus, isolated use of the best single measure in these studies would have missed possible toxic effects of 16% of the agents evaluated.
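The arithmetic behind the 16% figure can be checked directly from the counts given above. The sketch below uses only those reported counts; the variable names are illustrative.

```python
total_studies = 174
detected_with_all_measures = 113  # studies with an effect on any F1 measure
detected_with_best_single_measure = 85  # studies flagged by the best measure

# Studies whose agent effects would go unnoticed with the single best measure.
missed = detected_with_all_measures - detected_with_best_single_measure
missed_share = 100 * missed / total_studies

print(f"{missed} studies ({missed_share:.0f}% of all studies)")
```

This gives 28 studies, or about 16% of the 174 studies surveyed.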

Of the F1 neurobehavioral parameters, motor activity and cognition were the most frequently assessed; however, they were evaluated less frequently, and affected by the agents tested less often, than were the general toxicology parameters. F1 behavioral parameters, however, in spite of their apparent lower sensitivity, were equivalent to other parameters in 17/113 (15%) of the reported studies. Also, they were relatively more sensitive than other parameters in 3/113 studies (2.6%), indicating that toxicity would have been defined at a higher dose or could have been missed entirely in the absence of these parameters. While not detecting agent effects as readily as some measures, the F1 behavioral parameters provide information about agent effects on specialized functions of developing offspring not provided by other standard measures of toxicity. Thus, the inclusion of the F1 neurobehavioral assessments provides value to the risk assessment process by their unique contribution toward detecting and characterizing neurobehavioral toxicity from agent exposure during CNS development, a period of potentially heightened vulnerability. Further research into both the methods of assessment and the mechanisms mediating behavior will improve the contribution of these measures toward hazard identification and characterization.


    ACKNOWLEDGMENTS
 
The authors acknowledge the helpful critical reviews provided by Drs. J. Buelke-Sam and E. Faustman. The study was supported by the ILSI/HESI DART Committee.


    NOTES
 
1 To whom correspondence should be addressed at the Medical University of South Carolina, Center for Drug and Alcohol Programs, Department of Psychiatry and Behavioral Sciences, 67 President Street, P.O. Box 250861, Charleston, SC 29425. Fax: (843) 792-7353. E-mail: middauld@musc.edu.

2 The views expressed in this report are those of the authors and not of the organizations that they represent.


    REFERENCES
 
Abel, E. L. (1975). Emotionality in offspring of rats fed alcohol while nursing. J. Stud. Alcohol 36, 654–658.

Adams, J. (1986). Clinical relevance of experimental behavioral teratology. Neurotoxicology 7, 19–34.

Adams, J., Buelke-Sam, J., Kimmel, C. A., Nelson, C. J., Reiter, L. W., Sobotka, T. J., Tilson, H. A., and Nelson, B. K. (1985a). Collaborative behavioral teratology study: Protocol design and testing procedures. Neurobehav. Toxicol. Teratol. 7, 579–586.

Adams, J., Oglesby, D. M., Ozemek, H. S., Rath, J., Kimmel, C. A., and Buelke-Sam, J. (1985b). Collaborative behavioral teratology study: Programmed data entry and automated test systems. Neurobehav. Toxicol. Teratol. 7, 547–559.

Barlow, S. M. (1985). United Kingdom: Regulatory attitudes toward behavioral teratology testing. Neurobehav. Toxicol. Teratol. 7, 643–646.

Buelke-Sam, J., and Kimmel, C. A. (1979). Development and standardization of screening methods for behavioral teratology. Teratology 20, 17–30.

Butcher, R. E. (1976). Behavioral testing methods for assessing risk. Environ. Health Perspect. 18, 75–78.

Butcher, R. E. (1985). An historical perspective on behavioral teratology. Neurobehav. Toxicol. Teratol. 7, 537–540.

Butcher, R. E., Howver, K., Burbacher, J., and Scott, W. (1975). Behavioral effects from antenatal exposure to teratogens. In Aberrant Development in Infancy: Human and Animal Studies (N. R. Ellis, Ed.), pp. 161–167. Erlbaum, Hillsdale, NJ.

Butcher, R. E., and Nelson, C. J. (1985). Design and analysis issues in behavioral teratology testing. Neurobehav. Toxicol. Teratol. 7, 659.

Cory-Slechta, D. A., Crofton, K. M., Foran, J. A., Ross, J. F., Sheets, L. P., Weiss, B., and Mileson, B. (2001). Methods to identify and characterize developmental neurotoxicity for human health risk assessment. I: Behavioral effects. Environ. Health Perspect. 109(Suppl. 1), 79–91.

Cranmer, J. M., and Tilson, H. A. (1986). Neurotoxicology in the fetus and child: Introduction and overview. Neurotoxicology 7, 1–2.

Garman, R. H., Fix, A. S., Jortner, B. S., Jensen, K. F., Hardisty, J. F., and Claudio, L. (2001). Methods to identify and characterize developmental neurotoxicity for human health risk assessment. II: Neuropathology. Environ. Health Perspect. 109, 93–100.

Geyer, M. A., and Reiter, L. W. (1985). Strategies for the selection of test methods. Neurobehav. Toxicol. Teratol. 7, 662.

Haddad, R. K., Rabe, A., Laqueur, G. L., Spatz, M., and Valsamis, M. P. (1969). Intellectual deficit associated with transplacentally induced microcephaly in the rat. Science 163, 88–90.

Hutchings, D. E. (1985). Issues of methodology and interpretation in clinical and animal behavioral teratology studies. Neurobehav. Toxicol. Teratol. 7, 639–642.

Hutchings, D. E., Gibbon, J., and Kaufman, M. A. (1973). Maternal vitamin A excess during the early fetal period: Effects on learning and development in the offspring. Dev. Psychobiol. 6, 445–457.

Jones, K. L., and Smith, D. W. (1973). Recognition of the fetal alcohol syndrome in early infancy. Lancet 2, 999–1001.

Kimmel, C. A., and Buelke-Sam, J. (1985). Collaborative behavioral teratology study: Background and overview. Neurobehav. Toxicol. Teratol. 7, 541–545.

Kimmel, C. A., Buelke-Sam, J., and Adams, J. (1985). Collaborative behavioral teratology study: Implications, current applications and future directions. Neurobehav. Toxicol. Teratol. 7, 669–673.

Kutscher, C. L., and Nelson, B. K. (1985). Dosing considerations in behavioral teratology testing. Neurobehav. Toxicol. Teratol. 7, 663–664.

Lochry, E. A. (1987). Concurrent use of behavioral/functional testing in existing reproductive and developmental toxicity screens: Practical considerations. J. Am. Coll. Toxicol. 6, 433–439.

Lochry, E. A., Johnson, C., and Wier, P. J. (1994). Behavioral evaluations in developmental toxicity testing: MARTA survey results. Neurotoxicol. Teratol. 16, 55–63.

Mactutus, C. F., and Tilson, H. A. (1986). Psychogenic and neurogenic abnormalities after perinatal insecticide exposure: A critical review. In Handbook of Behavioral Teratology (E. P. Riley and C. V. Vorhees, Eds.), pp. 335–390. Plenum Press, New York.

Middaugh, L. D., Blackwell, L. A., Santos, C. A., and Zemp, J. W. (1974). Effects of d-amphetamine sulfate given to pregnant mice on activity and on catecholamines in the brains of offspring. Dev. Psychobiol. 7, 429–438.

Middaugh, L. D., Santos, C. A., and Zemp, J. W. (1975). Phenobarbital during pregnancy alters operant behavior of offspring in C57BL/6J mice. Pharmacol. Biochem. Behav. 3, 1137–1139.

Mileson, B. E., and Ferenc, S. A. (2001). Methods to identify and characterize developmental neurotoxicity for human health risk assessment: Overview. Environ. Health Perspect. 109, 77–78.

Moser, V. C., Bowen, S. E., Li, A. A., Sette, W. S., and Weisenburger, W. P. (2000). Cognitive evaluation: is it needed in neurotoxicity screening? Symposium presented at the Annual Behavioral Toxicology Society meeting. Neurotoxicol. Teratol. 22, 785–798.

Nolen, G. A. (1985). An industrial developmental toxicologist’s view of behavioral teratology and possible guidelines. Neurobehav. Toxicol. Teratol. 7, 653–657.

Riley, E. P., Hannigan, J. H., and Balaz-Hannigan, M. A. (1985). Behavioral teratology as the study of early brain damage: Considerations for the assessment of neonates. Neurobehav. Toxicol. Teratol. 7, 635–638.

Rodier, P. M. (1976). Postnatal functional evaluations. In Handbook of Behavioral Teratology (J. Wilson and F. Fraser, Eds.), pp. 185–209. Plenum Press, New York.

Sobotka, T. J., and Vorhees, C. V. (1985). Application of behavioral teratology data. Neurobehav. Toxicol. Teratol. 7, 665.

Spyker, J. M. (1976). Behavioral teratology and toxicology. In Behavioral Toxicology (B. Weiss and V. Laties, Eds.), pp. 311–349. Plenum Press, New York.

Tanimura, T. (1985). Guidelines for developmental toxicity testing of chemicals in Japan. Neurobehav. Toxicol. Teratol. 7, 647–652.

Tilson, H. A., and Wright, D. C. (1985). Interpretation of behavioral teratology data. Neurobehav. Toxicol. Teratol. 7, 667–668.

Ulbrich, B., and Palmer, A. K. (1996). Neurobehavioral aspects of developmental toxicity testing. Environ. Health Perspect. 104(Suppl. 2), 407–412.

Vorhees, C. V. (1983). Fetal anticonvulsant syndrome in rats: Dose- and period-response relationships of prenatal diphenylhydantoin, trimethadione, and phenobarbital exposure on the structural and functional development of offspring. J. Pharmacol. Exp. Ther. 227, 274–287.

Werboff, J., and Dembicki, E. L. (1962). Toxic effects of tranquilizers administered to gravid rats. J. Neuropsychiatry 4, 87–91.