1 Department of Mathematics & Statistics, University of Windsor, Windsor, Ontario N9B 3P4, Canada.
2 McLaughlin Centre for Population Health Risk Assessment, Institute of Population Health, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada.
3 Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada.
4 Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario K1A 0L2, Canada.
Correspondence: Karen Fung, Department of Mathematics & Statistics, University of Windsor, Windsor, Ontario N9B 3P4, Canada. E-mail: kfung{at}uwindsor.ca
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Methods The objective of this paper is to evaluate the performance of different variations of these two procedures using computer simulation. Hospital admission data were generated under realistic models with known parameters permitting estimates based on time series and case-crossover analyses to be compared with these known values.
Results While accurate estimates can be achieved with both methods, both methods require some decisions to be made by the researcher during the course of the analysis. With time series analysis, it is necessary to choose the time span in the LOESS smoothing process, or degrees of freedom when using natural cubic splines. For case-crossover studies using either uni- or bi-directional control selection strategies, the choice of time intervals needs to be made.
Conclusions We prefer the times series approach because the best estimates of risk that can be obtained with time series analysis are more precise than the best estimates based on case-crossover analysis.
Accepted 22 May 2003
An abundance of epidemiological evidence indicates that increased counts of adverse health events are associated with elevated ambient air pollutant levels. Much of this evidenceis obtained through relative risk regression of Poisson time series data.13 Recently, increasing rigorous statistical time series modelling techniques have been used to better control for potential confounders in the analysis of such data.4 Furthermore, sophisticated analytical techniques have been introduced to adjust for seasonal trends in the data, culminating in the introduction of generalized additive models (GAM) for time series analysis.5 Cakmak et al.6 examined a number of filtering techniques designed to eliminate temporal cycles in time series data. Although temporal trends can be explicitly included in the model, non-parametric local smoothing methods (LOESS) based on generalized additive models (GAM) were widely used to take into account such trends in the analysis.7 Recently, it was discovered that when there is substantial concurvity (reflecting a high degree of multicollinearity) among variables in the data, lack of convergence in the GAM algorithm can lead to unstable estimates of risk, and understatement of the variance of risk estimates.8,9 As a consequence, Dominici et al.10 suggested another approach using parametric natural cubic splines in the generalized linear model (GLM) instead of the LOESS.
Some critics have argued that results based on times series analysis are not robust against the selection of the technique used to control for secular trends such as seasonal effects.11,12 It is further argued that the sensitivity of time series analysis may depend on the correlation between the air pollutant of interest and weather, which can vary from city to city. Hence, results obtained in different locations may not be robust against model specification.
The case-crossover design described by Maclure13 has recently been used as an alternative to time series analysis.1416 The case-crossover design is essentially a case-control design in which cases serve as their own controls. Risk estimates are based on within-subject comparisons of exposures at failure times with exposures at times prior to failure, using matched case-control methods. This procedure can be used to investigate whether a recent exposure has triggered the occurrence of a particular adverse health outcome, such as admission to hospital. The case-crossover design was originally conceived as a unidirectional retrospective design in which control time precedes the outcome. Navidi17 pointed out that uni-directional control sampling can subject the case-crossover design to bias from time trends in exposure prior to the occurrence of an adverse event.
Control times can also be assessed bi-directionally,18 sampling symmetrically from times before and after the outcome of interest. When the outcome does not affect subsequent exposure (a reasonable assumption for air pollution studies), it is also possible to determine the level of exposure at times post-failure. Such bi-directional sampling expands the temporal sampling frame and permits a greater number of referents to be chosen. Simulation analyses show that relative risk estimates based on the case-control design are more resistant to confounding by time trends.17
Recently, Navidi and Weinhand19 proposed a semi-symmetric bi-directional design. They pointed out that the bi-directional design is non-informative in that given the knowledge of the risk set (the case and the two controls), the case is necessarily bracketed in time by the two controls, and may lead to biased estimates of risk. The semi-symmetric bi-directional case-crossover design chooses one of the two controls selected with the bi-directional design at random. Since it is not possible to determine which of the two observations is the case and which is the control, this design may reduce or eliminate bias.
Case-crossover designs are particularly useful for estimating effects that are acute or transient. Otherwise, carry-over effects can result in bias. Because each subject serves as his/her own control, the case-crossover approach controls for the effects of stable subject-specific covariates such as gender and race. If the case and control periods in each risk set are relatively close in time, this method also controls for potential time-varying confounders such as seasonal effects or personal habits by design.
The purpose of this paper is to evaluate the performance of different methods of assessing the association between short-term exposure to outdoor air pollution and hospital admission for a specific health outcome. It is clear that the choice of statistical methods will affect the conclusions reached, as demonstrated through re-analyses of previous data sets using the case-crossover design.14,15 Even after adopting a time seriesor case-crossover approach, there are still a number of analytical choices to be made that may affect the conclusion reached. Since the accuracy of each method cannot be evaluated using empirical data, computer simulation under known conditions was used to evaluate alternative analytical approaches that have been used in practice. In particular, the hospital admission data were generated under specific models with known parameters, so that estimates derived from the different methods of analysis can be compared with these known values.
For this study, we have selected a particular pollutant (fine particulate matter <2.5 µm in aerodynamic diameter, PM2.5), and a particular health event (asthma), to evaluate the accuracy of risk estimates obtained by the different methods of analysis. In order to make our simulation as realistic as possible, we obtained the actual number of hospital admissions from the Ontario Ministry of Health for asthma in boys 612 years in the Toronto metropolitan area from 1981 to 1993. Daily values of PM2.5 were obtained from the Ontario Ministry of Environment and Energy (OMEE) for the same period.
![]() |
Simulation study |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() | (1) |
Here, Dt is a time series consisting of a repetition of seven values representing the ratio of the average number of admissions on each of the 7 days of the week to the average daily admission rate. Starting with Monday, the seven values used to represent day of the week effects are: 1.16, 1.07, 1.0, 1.01, 1.01, 0.84, and 0.89. (Note that the series Dt reflects the fact that there are more admissions on Mondays and fewer on weekends.) St is a time series representing seasonality in the daily number of admissions. We applied a 19-day symmetric linear filter
![]() |
proposed by Shumway et al.21 to the admissions series to remove temporal trends and slow-moving serial correlation, where the symmetrical weights 0,
1,...,
9 are given by 0.0874, 0.0857, 0.0807, 0.0729, 0.0629, 0.0518, 0.0404, 0.0296, 0.0200, and 0.0123. Here, ut denotes the number of asthma admissions for boys aged 612 years in Toronto on the tth day, and
is the average number of admissions per day over the entire period.
The mean and interquartile range of PM2.5 in this dataset are 18 and 9.3 µg/m3, respectively. The odds ratio (OR) was fixed at 1.05 for an increase of 9.3 µg/m3 in PM2.5 (a value within the central portion of the range 1.001.09 of risk estimates reported in a recent paper by Lin et al.16), yielding
![]() |
The average daily admissions over the entire period was 0.97. Equating this value to the model in (1), assuming Dt and St are close to 1, and substituting the mean PM2.5 value for xt, we obtain
![]() |
Five-hundred sets of time series data were generated using the model given in equation (1). Each data set consists of 4748 days of simulated admission data and 4748 days of real PM2.5 data. These data were analysed using both time series and case-crossover methods.
Statistical analysis
Time series analysis
Two basic approaches to time series analysis were considered: co-adjustment and pre-adjustment. The co-adjustment method examines temporal trends and the air pollution predictor simultaneously in a GAM. The pre-adjustment approach, on the other hand, removes temporal trends from both the health and air pollution time series separately prior to linking them together. All analyses were run using the statistical software package S-Plus (Insightful, Seattle, WA22) to obtain estimates of the risk coefficient ß1 and the corresponding OR.
In modelling time series of adverse health events and air pollution concentrations, it is important to take into account the strong temporal trends present in the data due to seasonality and weather. The usual technique for removing such temporal trends is to apply a non-parametric smoothed function (LOESS in S-Plus) of day of study with the span chosen such that the autocorrelation in the residuals is minimal. A GAM (ref. 23, GAM in S-Plus) was then fitted to the hospital admissions data adjusting for air pollutant (PM2.5), day of the week, and a LOESS smooth function of day of study with a span of 13 months. It was recently found that the GAM backfitting algorithm is very unstable because of convergence problems with this algorithm,10,23 determined by the degree of concurvity (multicollinearity) among the parametric and non-parametric components of the model. The LOESS function in the S-Plus GAM employs certain approximations in calculating standard errors of the risk estimates because the full covariance matrix is difficult to calculate. As a consequence, the estimated standard errors obtained from GAM models are biased and too small.810 Ramsey et al.9 tried to estimate the variance by bootstrapping but found that the GAM variance estimates produced by the bootstrap are also biased downward. They then gave an exact method for variance estimation, but the number of steps required to do the computation is proportional to the cube of the number of observations in the data set. In our case, the computation is simply too large to accommodate the correct method, especially in a simulation setting. Instead, we will estimate the variance by a large set of parameter estimates derived from these simulated data sets and we call it the empirical variance.
Dominici et al.10 suggested an alternative smoother using GLM with parametric natural cubic splines (GLM NS) instead of the non-parametric smoothers LOESS. This smoother again requires the specification of a degrees of freedom (number of knots + 2) so that the autocorrelation in the residuals is minimized. In this study, we want to compare both LOESS and NS. For LOESS, we chose spans of 93, 62, and 31 days, which correspond to 54, 80, and 158 degrees of freedom in NS, respectively. To illustrate, if the span in LOESS is 31 days, we will have 12 cycles in a year and with 13 years of data, the degrees of freedom is 12 x 13 + 2 = 158. A convergence criterion of 10-8 was used in both LOESS and GLM NS.
With the pre-adjustment approach, we first removed temporal trends from the admission and air pollution time series by separately fitting GAM to adjust for day of the week and a LOESS or NS function of day of study. The two pre-filtered series (admission/predicted admission, PM2.5-predicted PM2.5) were then linked together using a GAM with s given by the predicted admission data. These weights were selected in order to ensure that the variance of the admissions is equal to their expected value, reflecting Poisson variation in the data.
Case-crossover analysis
The case-crossover approach essentially compares exposures at the time of hospital admission (the case period) with one or more periods when the admission did not occur (the control periods). With this approach, cases serve as their own controls. Excess risk is then evaluated using a pair-matched design and conditional logistic regression analysis.
For uni-directional designs, we sampled controls either 2 weeks before (pre) hospital admission, or 2 weeks after (post) hospital admission. This time interval was chosen because it represents the minimum period outside which correlation imposed by the 19-day Shumway filter used in the generation of the data can be ignored. (Other choices of the time interval for control selection will be discussed later.) Let x1k denote the covariate vector for the case and x0k the covariate vector for the control in the kth stratum (pair). The conditional likelihood for the kth stratum is then
![]() |
This likelihood function is identical to that for an unconditional logistic regression model with a constant term of zero and covariate x* = x1k - x0k. This allows us to use standard logistic regression software to compute the conditional maximum likelihood estimates and to obtain standard errors of the estimated coefficients.
For bi-directional designs, we sampled both pre- and post-admission to hospital (2 weeks in each direction) to effectively obtain a two-to-one matched case-control study. If more than one event occurs on the same day (a tie), computation of the exact likelihood becomes complex. Conditional logistic regression analysis can be performed via the Cox proportional-hazards function routine in S-Plus. Assuming that we identify the c cases in the kth stratum first, followed by the (nc) controls, the conditional likelihood for the kth stratum is then given by
![]() |
where P(xj) is the probability of obtaining the data for the jth case, and 1P(xj) is the probability of obtaining the data for the jth control. The denominator sums the joint probabilities for all possible permutations of the n observations comprised of c cases and (nc) controls in the kth stratum. Each permutation is indexed by the subscript i in equation (3). The full conditional likelihood is the product of the Lk over all the strata. The maximum likelihood estimates of the parameters can be obtained by choosing the exact option in S-Plus or the discrete option in SAS.
Early recognition of computational difficulties with the exact analysis, which requires a substantial amount of computer time for large data sets with ties, has led to development of more tractable approximations. We included two options for handling for ties in S-Plus due to Breslow24 and Efron25 in our comparisons. Breslows approximation works well when the number of ties is relatively small. Efrons approximation gives results that are much closer to the exact result when there are more ties. A detailed comparison of these methods of correction for ties has been given by Allison.26
For the semi-symmetric bi-directional case-crossover design, controls were selected at 2 weeks before (pre) or after (post) hospital admission with probability 0.5.
Criteria for evaluation
The mean of 1 and the mean of the odds ratios were calculated across the 500 simulated data sets. The model standard error is the square root of the mean of the 500 approximated variances of the LOESS estimates given by the GAM routine. To check if each of these model standard errors is underestimated, an empirical standard deviation was calculated from the 500 estimates of
1. The proportion of times the 95% CI for the OR included the true OR was evaluated (coverage), and the proportion of times the hypothesis that ß1 = 0 (power) was rejected was recorded. Results are presented in Tables 1
and 2
. Since the case-crossover analysis requires that controls be selected 2 weeks earlier or 2 weeks later than the case, we only considered cases between days 15 and 4734 in the 4748 day time series. These data were analysed using both time series and case-crossover methods.
|
|
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Comparing the empirical standard errors of the LOESS estimates to the approximated standard errors from the GAM routine, we see that the approximated standard errors are indeed slightly smaller in the co-adjustment cases, indicating that some concurvity may exist. However, we do not see the same phenomenon in the pre-adjustment cases. The approximated standard errors are very close and sometimes even slightly larger than the empirical ones. The latter is also true when using natural splines.
Results in Table 2 show that the bi-directional method and the semi-bi-directional method in the case-crossover design outperform the uni-directional approach, because they can control for different patterns of time trends in exposures and outcomes, confirming previous findings by Navidi17 and Lee and Schwartz.14 In this study, the uni-directional pre-admission control approach yields estimates that are biased downwards, whereas the post-admission control approach gives estimates that are biased upward. For the bi-directional design with different methods of handling ties, the exact method appears to work best, with the approximate methods tending to yield coefficients that are biased toward zero. As expected, Efrons method of correcting for ties gives results that are much closer to the exact results than Breslows approximation, which only works well when there are relatively few ties. Estimates based on the semi-bi-directional method are midway between the pre and post uni-directional methods. This method did not perform as well as the exact bi-directional method in our simulations. According to Navidi and Weinhand,19 the semi-bi-directional method should have led to less-biased estimates. Unfortunately, it did not in this case.
We now compare the best time series method (pre-adjustment with 31 days span using LOESS with convergence level of 10-8 or co-adjustment with 158 degrees of freedom using the natural splines) with the best case-crossover analysis (exact bi-directional). (Note that there is no underestimating of variances in these time series cases.) While the mean value of the estimates 1 produced by all these methods are close to the true value, the standard error of
1 is much larger with the case-crossover approach. This explains why the power of the time-series analysis is higher than the power of the case-crossover analysis. The two approaches are similar in terms of the mean value coverage probability of the confidence interval for the OR. However, the length of the CI is much longer with the case-crossover approach because of the larger standard error of the estimated OR.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Since the data used in this simulation study were generated using a times series model, it is conceivable that a time series analysis may be expected to perform better than the case-crossover analysis, which does not make explicit use of the temporal stochastic structure of the data. On the other hand, since the case-crossover analysis requires no specific assumptions concerning the longitudinal structure of the data, the essentially non-parametric case-crossover analysis may be expected to be insensitive to the underlying time series model. In any event, a stochastic time series model is necessary for computer simulation of time series data of the type evaluated here.15,17
The time series approach involves fitting a log-linear regression model of the expected hospital admission counts on air pollution. This method is flexible in that many explanatory or confounding variables (such as weather or day of week) can be included in the model. The method allows for co-adjustment by including the temporal trends with the predictors, or pre-adjustment by filtering out temporal trends prior to analysis. The predictors can be chosen so that only the significant covariates are included in the model. Inferences on the unknown regression parameters can be made using existing statistical methods available in standard statistical packages. Autocorrelation among the observations can be handled using generalized estimating equations. The cumulative periodogram, acf function, and Bartletts test can be used to determine what span one should choose in the LOESS function or degrees of freedom in natural splines to minimize the autocorrelation. The fit of the regression model to the data can also be evaluated using Akaikes information criteria (AIC). Notice that we did not optimize the admission span for each generated sample in our experiment here because it requires inspection of graphs as well as personal judgement. In a simulation study, we cannot examine each simulated dataset individually. Instead, we used a fixed span of 13 months which should provide some idea of the most suitable span to be used across the 500 simulated datasets.
In the case-crossover analysis, the length of time for which the data exhibits autocorrelation (determined for example, by the acf function) needs to be determined. The optimal choice of time interval for controls is then the minimum number of weeks outside the autocorrelation period. (The day of the week for case and controls is assumed to be the same here.) It is known that when the controls are too distant from the case, selection bias due to seasonal patterns in either exposure or outcome can occur, or there may be possible confounding due to the changing conditions at different points in time. On the other hand, when controls are spaced too close to the case, autocorrelation among daily exposure might introduce bias. As in the times series analysis, we used a fixed time interval and did not attempt to find the optimal interval for each simulated sample. Case-crossover designs are not suitable for cumulative effect estimation because of the difficulty in control selection in the presence of lag effects of air pollution.
In this simulation study, pre-unidirectional case-crossover analysis was found to underestimate risk while post-unidirectional methods overestimate the OR for air pollution in relation to asthma hospitalization. Note that these are crude estimates with no adjustments of covariates. Our results are consistent with those from previous simulation studies that suggested that the bi-directional case-crossover design, in which control information is assessed both before and after the event, should be preferred because it can control for different patterns of time trends in exposures and outcomes.18 However, our findings are not consistent with the most recent results given by Navidi and Weinhand,19 who reported that the semi-bi-directional design led to more accurate results than the bi-directional design. In the present study, the mean estimate of risk (reflected either by the regression coefficient, 1, or the corresponding OR) is, as expected, midway between the mean values of the estimates of risk based the two (pre and post) uni-directional designs.
A major advantage of the case-crossover design is that since each individual serves as his/her own control, potential confounding caused by subject characteristics can be eliminated. However, such self-control may not adequately control for a characteristic that changes over time. For example, a smoker may be more likely to smoke outdoors, and is therefore more likely to be exposed to outdoor air pollution at the same time.
Another advantage of the case-crossover analysis is that it is relatively easy to implement. The bi-directional case-crossover design is also able to control for time trends in both exposures and outcomes.13,27 Our study indicates that the bi-directional case-crossover design using the Efron and exact methods provides reasonably good estimates of the OR under simple time series models.
The case-crossover design also has certain disadvantages. First, the estimates are less precise than those based on time series analysis. This may be due to the use of less information in case-crossover as compared with time series analysis. Second, selecting a time interval for controls is crucial. And third, the case-crossover design is not suitable for cumulative effect estimation.
In conclusion, it is difficult to choose between the two methods of analysis (time series and case-crossover) examined in this study. Both methods can provide reasonable estimates of the risk of adverse health outcomes associated with short-term exposure to elevated concentrations of ambient air pollutants. Both methods may be implemented in a number of different ways, depending on choices made about temporal smoothing in time series analysis and control selection and handling of ties in case-crossover analysis. However, the best time series approach considered here does lead to more precise estimates of risk than the best case-crossover approach.
KEY MESSAGES
|
![]() |
Acknowledgments |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
2 Burnett R, Dales R, Krewski D, Vincent R, Dann R, Brook JR. Associations between ambient particular sulfate and admissions to Ontario hospitals for cardiac and respiratory diseases. Am J Epidemiol 1995;142:1522.[Abstract]
3 Schwartz J. Air pollution and hospital admissions for heart disease in eight US counties. Epidemiology 1999;10:1722.[ISI][Medline]
4 Burnett R, Bartlett S, Krewski D, Roberts G, Raad-Young M. Air pollution effects on hospital admissions: a statistical analysis of parallel time series. Environ Ecologic Stat 1994;1:32532.
5 Schwartz J. Air pollution and daily mortality in Birmingham, Alabama. Am J Epidemiol 1993;137:113647.[Abstract]
6 Cakmak S, Burnett R, Krewski D. Adjusting for temporal variation in the analysis of parallel time series of health and environmental variables. J Expos Anal Environ Epidemiol 1998;8:12944.[ISI][Medline]
7 Schwartz, J. Nonparametric smoothing in the analysis of air pollution and respiratory illness. Can J Stat 1994;22:47188.[ISI]
8 Samet JM, Dominici F, McDermott A, Zeger SL. New problems for an old design: time-series analyses of air pollution and health. Epidemiology In press.
9 Ramsay T, Burnett RT, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology In press.
10 Dominici F, McDermott A, Zeger SL, Samet JM. On the use of generalized additive models in time series of air pollution and health. Am J Epidemiol 2002;156:193203.
11 Li Y, Roth HD. Daily mortality analysis by using different regression models in Philadelphia County, 19731990. Inhal Toxicol 1995;7:4558.[ISI]
12 Moolgavkar S, Luebeck EG, Hall TA, Anderson EL. Air pollution and daily mortality in Philadelphia. Epidemiology 1995;6:47684.[ISI][Medline]
13 Maclure M. The case-crossover design: A method for studying transient effects on the risk of acute events. Am J Epidemiology 1991;133:14453.[Abstract]
14 Lee JT, Schwartz J. Reanalysis of the effects of air pollution on daily mortality in Seoul, Korea: a case-crossover design. Environ Health Perspect 1999;107:63336.[ISI][Medline]
15 Checkoway H, Levy D, Sheppard L, Kaufman J, Koenig J, Siscovick D. A case-crossover analysis of fine particulate matter air pollution and out-of-hospital sudden cardiac arrest. Research Report, Health Effects Institute, 2000; No. 99.
16 Lin M, Chen Y, Burnett RT, Villeneuve P, Krewski D. The association of estimated ambient concentrations of coarse particulate matter on asthma hospitalization of children in Toronto, Canada: case-crossover and time-series analyses. Environ Health Perspect 2002;110:57581.[ISI][Medline]
17 Navidi W. Bidirectional case-crossover designs for exposures with time trends. Biometrics 1998;54:596605.[ISI][Medline]
18 Bateson TF, Schwartz J. Control for seasonal variation and time trend in case-crossover studies of acute effects of environmental exposures. Epidemiology 1999;10:53944.[ISI][Medline]
19 Navidi W, Weinhand E. Risk set sampling for case-crossover designs. Epidemiology 2002;13:10005.[CrossRef][ISI][Medline]
20 National Research Council. Research Priorities for Airborne Particulate Matter I: Immediate Priorities and a Long-Term Research Portfolio. Washington, DC: National Academy Press, 1998.
21 Shumway RH, Tai R, Tai L, Pawitan Y. Modeling mortality fluctuations in Los Angeles as functions of pollution and weather patterns. Environ Res 1988;45:22441.[ISI][Medline]
22 S-Plus 6, Insightful Corporation. 1700 Westlake Ave. N, Suite 500, Seattle, WA, 2001.
23 Hastie TJ, Tibshirani RJ. Generalized Additive Models. Chapman & Hall, London, 1990.
24 Breslow NE. Covariance analysis of censored survival data. Biometrics 1974;30:8999.[ISI][Medline]
25 Efron B. The efficiency of Coxs likelihood function for censored data. J Am Stat Assoc 1977;76:31219.
26 Allison PD. Survival Analysis Using the SAS System: A Practical Guide. SAS, Institute Inc., Cary, NC, 1995.
27 Redelmeier DA, Tibshirani RJ. Interpretation and bias in case-crossover studies. J Clin Epidemiol 1997;50:128187.[CrossRef][ISI][Medline]