* Statistics and Information Sciences and
Nonclinical Safety Assessment, Lilly Research Laboratories, Eli Lilly and Company, Drop Code GL43, 2001 West Main Street, Greenfield, Indiana 46140
Received August 1, 2001; accepted December 11, 2001
ABSTRACT
To evaluate compound-related effects on the growth of rodents, body weight and food consumption data are commonly collected either weekly or biweekly in toxicology studies. Body weight gain, food consumption relative to body weight, and efficiency of food utilization can be derived from body weight and food consumption for each animal in an attempt to better understand the compound-related effects. These five parameters are commonly analyzed in toxicology studies for each sex using a one-factor analysis of variance (ANOVA) at each collection point. The objective of this manuscript is to present an alternative approach to the evaluation of compound-related effects on body weight and food consumption data from both subchronic and chronic rodent toxicology studies. This approach is to perform a repeated-measures ANOVA on a selected set of parameters and analysis intervals. Compared with a standard one-factor ANOVA, this approach uses a statistical analysis method that has greater power and reduces the number of false-positive claims, and consequently provides a succinct yet comprehensive summary of the compound-related effects. Data from a mouse carcinogenicity study are included to illustrate this repeated-measures ANOVA approach to analyzing growth data in contrast with the one-factor ANOVA approach.
Key Words: one-factor analysis of variance; repeated-measures analysis of variance; type I error; carcinogenicity study; growth phase; maintenance phase.
Measures of animal growth are routinely evaluated in toxicology studies and are key to interpretation of compound-related effects. Growth data based on body weight and food consumption are collected, analyzed, and interpreted for most rodent toxicology studies. One way to evaluate the growth data is to analyze a few profile features from these typically nonlinear curves (Winer, 1962). For example, the linear, quadratic, and cubic trends of the profile curves across time can be approximated by polynomials of the 1st, 2nd and 3rd degrees, respectively. Because it is often important in toxicology studies to interpret compound-related effects with reference to the specific time period, the profile analysis is not ideal. At Eli Lilly and Company, three additional parameters are derived for each animal: body weight gain, food consumption relative to body weight, and efficiency of food utilization (EFU). To smooth the data and lessen the fluctuation in food consumption and EFU parameters, cumulative food consumption and cumulative EFU values are calculated based on data collected since the beginning of the study. Therefore, the five parameters evaluated in rodent toxicology studies are:
Body weight;
Body weight gain from initial body weight;
Cumulative daily food consumption (total food consumed since study initiation divided by days on study);
Cumulative daily food consumption relative to body weight (cumulative daily food consumption divided by average body weight);
Cumulative EFU (body weight gain per 100 g food consumed).
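As an illustration of the definitions listed above, the five parameters can be sketched for a single animal at a single collection point. This is a minimal sketch, not the authors' software: the function name, argument names, and the example values are hypothetical, and the average body weight is assumed to be supplied by the caller because the text does not specify how it is averaged.

```python
from dataclasses import dataclass


@dataclass
class GrowthSummary:
    body_weight: float            # g
    body_weight_gain: float       # g, relative to initial body weight
    cum_daily_food: float         # g/day since study initiation
    cum_daily_food_rel_bw: float  # g food per day per g body weight
    cum_efu: float                # g body weight gained per 100 g food


def five_parameters(initial_bw, current_bw, total_food, days_on_study, average_bw):
    """Derive the five growth parameters for one animal at one collection point.

    All names are illustrative assumptions; weights and food are in grams.
    """
    gain = current_bw - initial_bw
    cum_daily_food = total_food / days_on_study
    return GrowthSummary(
        body_weight=current_bw,
        body_weight_gain=gain,
        cum_daily_food=cum_daily_food,
        cum_daily_food_rel_bw=cum_daily_food / average_bw,
        cum_efu=100.0 * gain / total_food,
    )


# Hypothetical mouse: 25 g at start, 30 g on Day 28, 140 g of food consumed.
summary = five_parameters(initial_bw=25.0, current_bw=30.0, total_food=140.0,
                          days_on_study=28, average_bw=27.5)
print(summary)
```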
Depending on the duration of the study and phase of growth, body weight and food consumption data are collected either weekly or biweekly. Each parameter is analyzed using a one-factor analysis of variance (ANOVA) at each time point for each sex.
Performing a one-factor ANOVA on each of the five parameters for each sex at each collection point leads to an inflated type I error rate (the probability of declaring a finding positive when it is in fact false). Higher than expected false-positive claims could cloud the interpretation of compound-related effects. The practice of obtaining cumulative food consumption and cumulative efficiency of food utilization data is an attempt to "smooth" the data to facilitate evaluation of the overall effect of the compound on these parameters. However, potentially meaningful increases or decreases are diluted by all the previous measures. These cumulative quantities, although reasonably smooth, have a limited ability to reflect temporal effects associated with the treatment. In light of these considerations, the practice of statistically analyzing five parameters based on body weight and food consumption is re-evaluated. The objective of this paper is to present an alternative approach to the evaluation of rodent growth data from both subchronic and chronic toxicology studies. An example of a 21-month mouse carcinogenicity study is included to illustrate these methods of analysis.
METHODS
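In the evaluation of statistical methods for rodent growth data, four key points are discussed: collection intervals, analysis intervals and phases of growth, analysis parameters, and statistical analyses.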
The following criteria are included in the considerations: effects on body weight and food consumption should be associated with appropriate periods of time, the number of parameters should be minimized to include only the essential parameters needed for data interpretation, the statistical methods should be robust and powerful, and inflation of type I error rate should be minimized.
Collection Intervals
The frequency of data collection is often determined by considerations for feeder capacity and dose calculation. In repeat-dose toxicology studies conducted at Eli Lilly and Company, body weight and food consumption data are collected weekly for up to 14 weeks and biweekly thereafter. These intervals are chosen based on the limitations of feeder capacity and the need to adjust doses based on recent body weight.
Analysis Intervals and Phases of Growth
The number of analyses is kept to a minimum by defining biologically relevant analysis intervals. Each analysis interval is defined as a collection of one or more time intervals and is used for the statistical analysis. For example, an analysis interval may consist of three time intervals, Weeks 6, 7, and 8. Rodents used for toxicology studies typically begin study at 5–7 weeks of age. In general, growth is rapid in the first 3 months and slows down thereafter for both rats and mice (Fig. 1). Mouse body weights are typically more variable than rat body weights. Upon review of many mouse and rat studies, the transition point between growth and maintenance phases is determined to be around the end of the 14th week of the study. Data in the growth phase and maintenance phase are analyzed separately.
In calculating the analysis interval averages, missing values will lead to missing observations for the intervals. Toward the latter part of a carcinogenicity study, animals die from a variety of age-related causes and tumors. If an animal dies in an analysis interval, then the animal will be represented in all analysis intervals up to the one in which it died. Although data for the animal will not be represented in the analysis interval in which it died, the loss of information is likely to be inconsequential, as the impact of the moribundity of individual animals in the examination of body weight effects is of less interest than the generalized effect across surviving animals.
Analysis Parameters
Two parameters are defined for evaluating the effects of a compound on body weight and food consumption. One is interval body weight (IntBW), defined as the weekly measured body weight during the first 4 weeks or the average body weight in each analysis interval thereafter. The other is interval daily food consumption adjusted for body weight (IntFCD_BW), a derived quantity defined as the food consumed in the analysis interval divided by the number of days and by the average body weight in that analysis interval. In addition, changes in body weight gain at specific time points are sometimes needed to assist in the interpretation of compound-related effects. For example, in dose selection for carcinogenicity studies conducted for drug safety evaluation (International Conference on Harmonisation, 1995), a 10% change in body weight gain is specified as an important criterion. Descriptive statistics on body weight gain can therefore be calculated and reported without performing inferential statistical tests, because the appropriate statistical analyses are conducted on IntBW and IntFCD_BW. For comparison, the five parameters commonly analyzed using a one-factor ANOVA, as defined in the introduction, are also discussed here to contrast with IntBW and IntFCD_BW.
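The two interval parameters defined above can be sketched for one animal and one analysis interval as follows. This is an illustrative sketch only: the function name and data layout are hypothetical, each weekly food value is assumed to be the food consumed during that 7-day week, and a missing weekly value is treated as making the whole interval missing, as described for the interval averages.

```python
def interval_parameters(weekly_bw, weekly_food, weeks, days_per_week=7):
    """Return (IntBW, IntFCD_BW) for one animal over one analysis interval.

    weekly_bw   : dict mapping week -> body weight (g)
    weekly_food : dict mapping week -> food consumed during that week (g)
    weeks       : weeks that make up the analysis interval, e.g. [6, 7, 8]
    Returns (None, None) if any weekly value in the interval is missing.
    """
    if any(w not in weekly_bw or w not in weekly_food for w in weeks):
        return None, None
    int_bw = sum(weekly_bw[w] for w in weeks) / len(weeks)
    food_in_interval = sum(weekly_food[w] for w in weeks)
    days = days_per_week * len(weeks)
    int_fcd_bw = food_in_interval / days / int_bw
    return int_bw, int_fcd_bw


# Hypothetical animal, analysis interval made up of Weeks 6, 7, and 8.
bw = {6: 28.0, 7: 29.0, 8: 30.0}
food = {6: 35.0, 7: 35.0, 8: 35.0}
int_bw, int_fcd_bw = interval_parameters(bw, food, [6, 7, 8])
missing = interval_parameters(bw, food, [8, 9, 10])  # Weeks 9 and 10 absent
print(int_bw, int_fcd_bw, missing)
```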
Body weight.
Because body weights of an animal across time can be a function of the animal's initial body weight, randomization by body weight stratification at the beginning of the study is routinely carried out. This practice minimizes the bias in group means across treatment groups, but maximizes variability within each treatment group. Maximized variability will contribute to a loss of statistical power unless the variability is accounted for in the analysis. In an attempt to adjust for this variability, body weight gain has historically been statistically analyzed. However, this adjustment is performed under the assumption that there is a linear relationship between the initial body weight and subsequent body weights and that the slope of the linear relationship is 1. For data that do not exhibit this relationship, analyzing weight gain can potentially induce a misleading effect and increase the variability of the analysis. Therefore, the average body weight obtained in an analysis interval (IntBW), along with the initial body weight as a covariate, is more appropriate than either body weight alone or body weight gain for the evaluation of compound-related effects on body weight.
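The slope-of-1 assumption behind the analysis of body weight gain can be made explicit by writing the two models side by side. The symbols below are assumptions introduced for illustration and do not come from the original text: Y is a post-treatment body weight, X is the initial body weight of the same animal, and tau is the treatment effect.

```latex
% Assumed notation: Y_{ij} = body weight of animal j in group i at a given time,
% X_{ij} = initial body weight of that animal, \tau_i = effect of treatment i.
\begin{align}
\text{Gain analysis:}\quad
  & Y_{ij} - X_{ij} = \mu + \tau_i + \varepsilon_{ij}
  \;\Longleftrightarrow\;
  Y_{ij} = \mu + \tau_i + 1 \cdot X_{ij} + \varepsilon_{ij} \\
\text{Covariate analysis:}\quad
  & Y_{ij} = \mu + \tau_i + \beta X_{ij} + \varepsilon_{ij}
\end{align}
```

Read this way, the gain analysis is the covariate analysis with the slope fixed at 1 rather than estimated from the data, which is why it can add variability when the true slope differs from 1.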
Food consumption.
Because food consumption may be affected by the animal's body weight, comparing daily food consumption among treatment groups without accounting for the body weight differential may be misleading. Therefore, compound-related effects on food consumption are evaluated by calculating relative daily food consumption, which is defined as daily food consumption divided by average body weight.
As food consumption tends to fluctuate across time, one option to smooth the data is to calculate the cumulative daily food consumption and cumulative relative daily food consumption. However, if these cumulative quantities include data from the beginning of the study up to the time point of interest, any increases or decreases are diluted by all the previous measures. For example, for a 26-week study, the actual daily food consumption in the analysis interval of the last 2 weeks accounts for only 1/13 of the cumulative daily food consumption and cumulative relative daily food consumption calculated for that interval, thereby blunting the 6-month effects. These cumulative quantities, although reasonably smooth, lose the ability to reflect temporal effects associated with the treatment. Therefore, the relative daily food consumption obtained in each analysis interval (IntFCD_BW) is more appropriate than either of the cumulative food consumption parameters for the evaluation of compound-related effects on food consumption.
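The dilution described above can be checked with a small numerical sketch. The values are hypothetical: a 26-week study is split into thirteen 2-week intervals, and the treated group is assumed to eat 20% less than control in the last interval only.

```python
def cumulative_daily_food(interval_daily_food):
    """Cumulative daily food consumption after each equal-length interval.

    With equal-length intervals this is the running mean of the
    per-interval daily consumption values.
    """
    cumulative = []
    total = 0.0
    for k, value in enumerate(interval_daily_food, start=1):
        total += value
        cumulative.append(total / k)
    return cumulative


# Hypothetical daily consumption (g/day) in thirteen 2-week intervals.
control = [5.0] * 13
treated = [5.0] * 12 + [4.0]  # 20% decrease in the last interval only

cum_control = cumulative_daily_food(control)
cum_treated = cumulative_daily_food(treated)

# Relative difference from control in the final interval itself ...
interval_change = (treated[-1] - control[-1]) / control[-1]
# ... and in the cumulative quantity reported at the same time point.
cumulative_change = (cum_treated[-1] - cum_control[-1]) / cum_control[-1]

print(f"interval change:   {interval_change:.1%}")
print(f"cumulative change: {cumulative_change:.1%}")
```

In this sketch the 20% decrease in the last interval shrinks to one thirteenth of that size in the cumulative value, in line with the 1/13 weighting noted in the text.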
Statistical Analyses
Repeated-measures ANOVA is performed on the two analysis parameters, IntBW and IntFCD_BW. This analysis evaluates the effects of the fixed factors: treatment, time, and the interaction between treatment and time. The initial body weight is included as a covariate to reduce variability and improve precision in the analysis of body weights. The individual animal is the experimental unit nested within the treatment groups as a random effect in the statistical model. Compound symmetry is assumed as the default covariance structure for each animal across time. Other covariance structures may be selected based on current or historical data. The compound symmetry covariance structure is also called an exchangeable covariance structure, in that the correlation coefficient is the same between any two time points. For body weight data, both the initial body weight and the random animal effect are important to account for animal-to-animal variability. The initial body weight accounts for the variability in animals that exists prior to treatment, whereas the random animal effect accounts for the variability that exists during the treatment. For food consumption data, body weight information is included in the derivation of IntFCD_BW, and no covariate adjustments are made in the repeated-measures ANOVA. Compound-related effects are evaluated based on least squares means of treatment groups, which control other factors in the model. In the absence of any significant treatment by time interactions, the evaluation of treatment effects is simply performed on the results pooled across analysis intervals. However, in the presence of a significant treatment by time interaction, the treatment effects will be evaluated in each analysis interval to describe the changing treatment effects across time. For example, consider a study with a control and three treatment groups of increasing doses. 
A contrast can then be constructed to test for a linear trend in the treatment means within a single analysis interval, for instance the second of five time intervals.
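One way such a contrast could be written is sketched below. The notation and the coefficients are assumptions, not the authors' published formula: the dose groups are indexed 0 (control) to 3, the intervals 1 to 5, and equally spaced linear-trend scores (-3, -1, 1, 3) are used; a study with unequally spaced doses might call for different scores.

```latex
% Assumed notation (not from the source):
% \mu_{ij} = least squares mean of dose group i (i = 0, 1, 2, 3) in interval j (j = 1, ..., 5)
% c = (-3, -1, 1, 3) = equally spaced linear-trend scores
\begin{align}
L_2 &= -3\,\mu_{02} - \mu_{12} + \mu_{22} + 3\,\mu_{32},
       \qquad H_0 : L_2 = 0 \\
\mu_{ij} &= \mu + \tau_i + \gamma_j + (\tau\gamma)_{ij} \\
L_2 &= \sum_{i=0}^{3} c_i \left[ \tau_i + (\tau\gamma)_{i2} \right]
\end{align}
```

Because the scores sum to zero, the overall mean, the time effect for the second interval, and any covariate term evaluated at a common initial body weight cancel, leaving only the treatment and treatment-by-time terms.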
RESULTS
Female data from a 21-month mouse carcinogenicity study were selected to illustrate the repeated-measures ANOVA methods for analysis of the growth data. The statistical analyses were performed using PROC MIXED in SAS 6.12 (SAS Institute Inc., 1996). This study had one control group and three treatment groups, each with 60 animals per sex. Although body weight and food consumption data are usually collected weekly for up to 14 weeks and biweekly thereafter, growth data were collected weekly for this sample study. For IntBW and IntFCD_BW, a total of nine values were obtained for each animal for each parameter during the growth phase. These nine values consist of four weekly values measured or derived for the first 4 weeks and five 3-week moving averages calculated at Weeks 5, 7, 9, 11, and 13. As the duration of this mouse study is only 21 months, slight modification was applied to the data collected during the maintenance phase. Eight interval averages instead of a typical nine for 24-month studies were obtained for each animal for each parameter. These eight interval averages consisted of three 5-week moving averages at Weeks 16, 20, and 24, and five 14-week moving averages (there were only 12 weeks available for the last average) for the rest of the study. Results of repeated-measures ANOVAs performed on IntBW and IntFCD_BW summarized for each analysis interval are compared with the results from one-factor ANOVAs on weekly data (Tables 2 and 3).
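The nine growth-phase values described above can be enumerated with a short sketch. It assumes that each 3-week moving average is centered on the stated week (for example, Weeks 6, 7, and 8 for the average at Week 7, in line with the example given for analysis intervals); the function name is hypothetical, and the maintenance-phase windows are not reproduced because their exact boundaries are not fully specified in the text.

```python
def growth_phase_intervals():
    """List the weeks contributing to each of the nine growth-phase values.

    Assumption: 3-week moving averages are centered on Weeks 5, 7, 9, 11, 13.
    """
    intervals = [[week] for week in range(1, 5)]  # weekly values, Weeks 1-4
    for center in (5, 7, 9, 11, 13):              # 3-week moving averages
        intervals.append([center - 1, center, center + 1])
    return intervals


intervals = growth_phase_intervals()
for number, weeks in enumerate(intervals, start=1):
    print(f"growth-phase value {number}: weeks {weeks}")
```

Under this assumption the last window ends at Week 14, which matches the transition point between the growth and maintenance phases.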
DISCUSSION
In contrast to the approach of performing a one-factor ANOVA at each collection point, the repeated-measures ANOVA alternative is to perform one repeated-measures ANOVA for the growth phase and another for the maintenance phase. The numbers of analyses performed on body weight and food consumption data for each sex for one rodent study are presented in Table 4. The repeated-measures approach of taking into account information in each phase should provide a more succinct yet comprehensive picture in the evaluation of compound-related effects. For example, for a typical 2-year carcinogenicity study, the practice of analyzing weekly or biweekly data will result in 296 one-factor ANOVAs for each sex: one for each of the 59 collection points for each of the five parameters and one for the initial body weight collected on Day 0. If these body weights collected at 60 time points are independent, then a 0.05 type I error rate will be inflated to 0.95 if 60 one-factor ANOVAs are performed on the data; i.e., there is a 95% chance of making a false-positive claim in at least one of the 60 independent tests. As each of the 60 tests is performed at the 0.05 type I error rate, for each test there is a 5% chance of making a false-positive claim and a 95% chance of not making that mistake. Therefore, the chance of not making any false-positive claims in these 60 tests is 4.6%, calculated as $0.046 = (1 - 0.05)^{60}$, and the chance of making at least one false-positive claim is 95.4%, calculated as $0.954 = 1 - 0.046$. Because the interpretation of compound-related effects is based on scientific judgment, extra effort is required to sort out and dismiss spurious findings due to the high incidence of false-positive claims. Although growth data collected from the same animal are not expected to be independent, weak dependency can still drive the type I error rate much higher than 0.05.
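The arithmetic in this paragraph can be reproduced with a few lines of code. This is a minimal sketch under the same independence assumption stated in the text; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false-positive claim in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One-factor ANOVAs per sex in a typical 2-year study:
# 59 collection points x 5 parameters, plus the initial body weight.
n_anovas = 59 * 5 + 1

# Body weight alone: 60 time points, each tested at the 0.05 level.
no_false_positive = (1.0 - 0.05) ** 60
fwer_60 = familywise_error_rate(60)

print(n_anovas)
print(round(no_false_positive, 3), round(fwer_60, 3))
```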
In addition, when the type I error rate is held at the same level, there is a loss of power by performing a one-factor ANOVA on partial data at each time point instead of performing a more comprehensive repeated-measures ANOVA using the full set of information available in each phase of growth. The alternative analysis approach for IntBW and IntFCD_BW includes two repeated-measures ANOVAs each on eight or nine analysis interval averages. Depending on the significance of the treatment and time interaction, the compound-related effects could be evaluated either for each analysis interval or for the entire phase. For the former case, the number of analyses performed using the repeated-measures ANOVA approach is only 13% of that of the one-factor ANOVA approach. The exact reduction in the type I error rate could be estimated through simulations, but it suffices to say that the reduction would be dramatic for each of the two parameters.
Efficiency of food utilization is not calculated or analyzed on a regular basis. Although it may assist in toxicological interpretation for molecules with certain mechanisms of action (e.g., certain metabolic perturbations), it is quite variable and is not needed in a majority of rodent toxicology studies.
In summary, for evaluation of growth data in rodent toxicology studies, we illustrated an approach that uses a powerful statistical analysis method, reduces the number of false-positive claims, and consequently provides a succinct yet comprehensive summary of the compound-related effects. Only two key parameters are statistically examined: body weight (IntBW) and body weight-normalized daily food consumption (IntFCD_BW) obtained for each predetermined analysis interval. The preselected intervals for IntBW and IntFCD_BW are defined as weekly intervals for the first 4 weeks; 2-week intervals (3-week moving averages) for up to Week 14; 4-week intervals (5-week moving averages) for up to Week 26; and 14-week intervals to the end of a 2-year study. Repeated-measures ANOVA is performed on data for the growth and maintenance phases separately for each parameter for each sex. In general, similar overall conclusions are expected from the one-factor ANOVA and repeated-measures ANOVA approaches, but the latter streamlines interpretation of results and is a better-suited statistical approach for growth data.
The authors gratefully acknowledge Ms. Cindy Lee for computing support, Drs. Wendell Smith, Michael Dorato, Gerald Long, Mary Jeanne Kallman, Lorrene Buckley, Judy Henck, Mr. James Hoffman, Ms. Kathy Piroozi, Ms. Judith Hoyt, Ms. Susan Christopher, and Mr. Patrick Cocke for their helpful insight into this analysis strategy. We also thank Dr. Karl Lin (U.S. FDA) and Professor Raymond Carroll (Texas A&M) for their review of the statistical analysis approaches.
NOTES
1 To whom correspondence should be addressed. Fax: (317) 277-4783. E-mail: hoffman_wherly_p{at}lilly.com.
REFERENCES
International Conference on Harmonisation (1995). Guideline on Dose Selection for Carcinogenicity Studies of Pharmaceuticals. Federal Register 60, 1278–1281.
SAS Institute Inc. (1996). The MIXED procedure. In: SAS/STAT Software: Changes and Enhancements through Release 6.12, pp. 571–702. SAS Institute Inc., Cary, NC.
Tukey, J. W., Ciminera, J. L., and Heyse, J. F. (1985). Testing the statistical certainty of a response to increasing doses of a drug. Biometrics 41, 295–301.
Vonesh, E. F., and Chinchilli, V. M. (1997). Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York.
Winer, B. J. (1962). Statistical Principles in Experimental Design, 2nd ed. McGraw-Hill, New York.