1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD.
2 Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD.
Received for publication February 5, 2001; accepted for publication October 12, 2001.
ABSTRACT
air pollution; Fourier analysis; hierarchical model; mortality; Poisson distribution; time factors; time series
Abbreviations: CI, confidence interval; PM10, particulate matter with an aerodynamic diameter ≤10 µm.
INTRODUCTION
A number of studies over the last decade have shown an association between particle concentrations in outdoor air and daily mortality counts in urban locations (1–3). These associations have been estimated through the use of Poisson regression methods, and the findings have been reported as log relative rates of mortality associated with air pollution levels on recent days. These associations have been widely interpreted as reflecting the effect of air pollution on persons who have heightened susceptibility because of chronic heart or lung diseases (4).
Thus, the increased mortality associated with higher pollution levels may be restricted to very frail people whose life expectancy would have been short even without air pollution. This possibility is termed the "mortality displacement" or "harvesting" hypothesis (5). If an effect is evident only at short timescales, pollution-related deaths are advanced by only a few days, and in fact, the days of life lost might arguably be of low quality for the frail individuals at risk of dying. Consequently, the public health relevance of the findings of the daily time-series studies has been questioned (6). The mortality displacement hypothesis received specific discussion in the 1996 Staff Paper on Particulate Matter prepared by the US Environmental Protection Agency because of its policy implications (4). The findings of two long-term prospective cohort studies of air pollution and mortality, the Harvard Six Cities Study (7) and the American Cancer Society's Cancer Prevention Study II (8), were considered to offer critical evidence counter to the mortality displacement hypothesis.
Several investigators have approached the problem of mortality displacement using analytical models for daily time-series data (9–12). If the association between air pollution and mortality does reflect the existence of a pool of frail individuals in the population, episodes of high pollution that lead to increased mortality might reduce the size of this pool, and days subsequent to high-pollution days would then be expected to show a reduced effect of air pollution. Therefore, the occurrence of this phenomenon can be investigated by assessing interaction between prior high-pollution days and the effects of subsequent pollution exposure on mortality counts; under the mortality displacement hypothesis, a negative interaction is predicted (13, 14).
Recently, Kelsall et al. (15) and Schwartz (11) developed related methods for analysis of daily time-series data, both offering approaches to estimating air pollution-mortality associations at varying timescales. More specifically, Kelsall et al.'s methodology gives a continuous smooth estimate of relative risk as a function of timescale (frequency domain log-linear regression). Zeger et al. (10) applied the frequency domain log-linear regression to previously analyzed data for Philadelphia, Pennsylvania, from 1973–1988. Schwartz (11, 12) used a filtering algorithm (16) to separate the time series of daily deaths, air pollution, and weather into long-wavelength components, midscale components, and residual, very short-term components and applied this method to data on Boston, Massachusetts, from 1979–1986 and Chicago, Illinois, from 1988–1993. Note that both of these methods (10, 11) analyze both pollution and mortality on the same timescales, i.e., shorter-term to longer-term. Both sets of analyses found effects on longer timescales.
In this paper, we extend the work by Zeger et al. (10) and Schwartz (11) in the methodological, substantive, and computational arenas. More specifically, we develop a timescale decomposition of a time series based on the discrete Fourier transform; we introduce a two-stage model for combining evidence across locations for estimation of pooled timescale-specific air pollution effects on mortality; and we provide the software for decomposing a time series into a set of desired timescale components. At the first stage of the model, we use Fourier series analyses (17, 18) to decompose the daily time series of the air pollution variable into distinct timescale components. This decomposition leads to a set of orthogonal predictors, each representing a specific timescale of variation in the exposure. We then use this set of predictors in Poisson regression models to estimate a relative rate of mortality corresponding to each timescale exposure while controlling for other covariates such as temperature. A comparison between our approach and the frequency domain log-linear regression analysis is provided below in the section "Sensitivity analysis and model comparison."
The method is applied to concentrations of particulate matter, based on measurements of particles with an aerodynamic diameter less than or equal to 10 µm (PM10), and daily mortality counts from four US cities: Pittsburgh, Pennsylvania; Minneapolis, Minnesota; Chicago, Illinois; and Seattle, Washington. These were four cities with daily PM10 measurements that were among the 90 largest US cities used in the National Morbidity, Mortality, and Air Pollution Study (19, 20). The analyses are restricted to these cities because they are the only US locations with daily air pollution concentrations available in this database for this time interval, while in most other locations, PM10 levels were measured only every 6 days as required by the Environmental Protection Agency. Our approach is not suitable for every-sixth-day PM10 data, for two reasons: 1) no information is available from the data for estimation of the short-term effects of air pollution on mortality and 2) because of the "aliasing" phenomenon, the effects of air pollution at the longer timescales are distorted. In our context, the aliasing phenomenon occurs when the sampling interval is larger than 1 day, so that variations in the daily time series at the shortest timescales produce an apparent effect at the longer timescales.
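The aliasing phenomenon can be illustrated with a small numerical sketch. The series below is hypothetical (a pure 5-day cycle over an assumed 180 days) and is not drawn from the study data; it only shows that, once every sixth day is retained, a short-timescale cycle becomes indistinguishable from a 30-day cycle.

```python
import math

STEP = 6          # every-sixth-day sampling
N_DAYS = 180      # hypothetical series length, chosen for illustration

# Hypothetical daily series containing only a 5-day cycle (a short timescale).
daily = [math.cos(2 * math.pi * t / 5) for t in range(N_DAYS)]

# Keep every sixth observation, as an every-sixth-day monitor would.
sampled = daily[::STEP]

# A 30-day cycle evaluated on the same sampling days.
slow = [math.cos(2 * math.pi * t / 30) for t in range(0, N_DAYS, STEP)]

# Largest disagreement between the two sampled series (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(sampled, slow))
print(f"max |sampled 5-day cycle - 30-day cycle| = {max_gap:.2e}")
```

On the sampling days the two series agree up to rounding error, so a regression on the sampled data cannot tell whether the variation belongs to the 5-day or the 30-day timescale.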
MATERIALS AND METHODS
For each location c, the timescale-specific Poisson regression model for the daily mortality counts $Y_t^c$ is

$$\log E[Y_t^c] = \sum_{k=1}^{K} \beta_k^c X_{kt}^c + \text{confounders}$$

and

$$\operatorname{Var}(Y_t^c) = \phi^c E[Y_t^c], \qquad (1)$$

where $\phi^c$ denotes the overdispersion parameter and the $\beta_k^c$s, the parameters of interest, denote the log relative rate of daily mortality for each 10-unit increase in the air pollution level in location c on a timescale k. Our modeling approach replaces the term $\beta^c X_t^c$, where $X_t^c$ is the air pollution time series and $\beta^c$ is the city-specific log relative rate of mortality, with the sum $\sum_k \beta_k^c X_{kt}^c$, where $X_t^c = \sum_k X_{kt}^c$ and $X_{1t}^c, \ldots, X_{kt}^c, \ldots, X_{Kt}^c$ are a set of orthogonal predictors. This model estimates relative rates of mortality at different timescales and characterizes the timescale variation in the air pollution time series that contributes to the estimate of the overall effect $\beta^c$. Here we expect that under a short-term mortality displacement scenario, mortality would be mainly associated with a short-term effect of air pollution.
To protect the pollution relative rates from confounding by longer-term trends and seasonality, we also remove the variation in the time series at timescales approximately longer than 2 months by including a smooth function of time with 7 degrees of freedom per year. A sensitivity analysis with respect to selection of the number of degrees of freedom in the smooth function of time is discussed below. Smooth functions of temperature and dew point temperature are used to control for potential confounding by temperature and humidity. The rationale for and details on the selected smooth functions are provided by Samet et al. (23–25), Kelsall et al. (26), and Dominici et al. (27).
Figure 2 illustrates the decomposition of $X_t$ into six timescales. From the top of the panel to the bottom are displayed time series ranging from series that comprise only the more smooth fluctuations (low frequency components) to time series that comprise only the less smooth variations (high frequency components). The actual value of $X_t$ (the last time series at the bottom) on day t is obtained by summing the values of the six component series on each day. Details on the Fourier series decomposition and the URL for downloading the software for its implementation are provided in the Appendix. Using the decomposed time series, we can estimate timescale coefficients denoting the relative change in mortality per 10-µg/m3 increase in the corresponding timescale components of $X_t^c$. We estimate a vector of regression coefficients $\hat{\beta}^c = (\hat{\beta}_1^c, \ldots, \hat{\beta}_K^c)$ and their covariance matrix $V^c$. Although one might want to allow a latency time for the effect of pollution, in equation 1 we regress $Y_t^c$ on $X_{kt}^c$ rather than use lagged pollution series $X_{k,t-l_k}^c$ for some lag $l_k$ > 0. We investigate whether a lagged predictor is needed in the sensitivity analysis.
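A first-stage fit of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the software used for the analyses: the function names (`solve`, `fit_quasi_poisson`) are hypothetical, the two exposure components are synthetic sinusoids, the smooth confounder terms are omitted, and the "counts" are noise-free expected values so that the fit is exactly checkable.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def mean_and_information(X, beta):
    """Return fitted means mu_t = exp(x_t' beta) and the matrix X' diag(mu) X."""
    n, p = len(X), len(beta)
    mu = [math.exp(sum(X[t][j] * beta[j] for j in range(p))) for t in range(n)]
    info = [[sum(X[t][a] * X[t][b] * mu[t] for t in range(n)) for b in range(p)]
            for a in range(p)]
    return mu, info

def fit_quasi_poisson(X, y, iters=25):
    """Newton (IRLS) fit of log E[y_t] = x_t' beta; column 0 of X is the intercept.

    Returns (beta, phi, V): coefficients, Pearson overdispersion estimate,
    and covariance matrix V = phi * (X' diag(mu) X)^{-1}.
    """
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)
    for _ in range(iters):
        mu, info = mean_and_information(X, beta)
        score = [sum(X[t][j] * (y[t] - mu[t]) for t in range(n)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    mu, info = mean_and_information(X, beta)
    phi = sum((y[t] - mu[t]) ** 2 / mu[t] for t in range(n)) / (n - p)
    inv_cols = [solve(info, [float(i == j) for i in range(p)]) for j in range(p)]
    V = [[phi * inv_cols[j][i] for j in range(p)] for i in range(p)]
    return beta, phi, V

# Synthetic example: one long-timescale and one short-timescale exposure component.
T = 360
x_long = [10 * math.sin(2 * math.pi * t / 90) for t in range(T)]
x_short = [10 * math.cos(2 * math.pi * t / 4) for t in range(T)]
X = [[1.0, x_long[t], x_short[t]] for t in range(T)]
true_beta = [math.log(50), 0.0013, 0.0002]   # hypothetical values
y = [math.exp(sum(X[t][j] * true_beta[j] for j in range(3))) for t in range(T)]

beta, phi, V = fit_quasi_poisson(X, y)
# Percent change in mortality per 10-unit increase at each timescale.
print([100 * (math.exp(10 * b) - 1) for b in beta[1:]])
```

Because the synthetic outcome is noise-free, the fitted coefficients reproduce the assumed values and the overdispersion estimate is essentially zero; with real daily counts, `phi` and `V` would carry the sampling uncertainty that enters the second-stage pooling.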
RESULTS
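The pooled estimate of the timescale-specific log relative rates is the weighted average $\hat{\beta} = \left(\sum_c W^c\right)^{-1} \sum_c W^c \hat{\beta}^c$, where $W^c = (V^c)^{-1}$, with variance $\left(\sum_c W^c\right)^{-1}$.
An alternative approach would be to use as weights $W^c = (D + V^c)^{-1}$, where D is a diagonal between-city covariance matrix with diagonal element $\tau^2$. Because of the limited number of cities in the present analysis, we cannot estimate $\tau^2$ reliably and have assumed $\tau$ = 0. A sensitivity analysis of our results with respect to different values of $\tau^2$ obtained from hierarchical analyses of data from 20 cities (27) and 88 cities (28) is discussed below.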
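The weighting scheme can be sketched for a single timescale. This is a simplified scalar illustration: the function name `pool` and the numbers are hypothetical, and the sketch assumes that only the variance of each city-specific estimate is used, ignoring covariances between timescale estimates within a city.

```python
def pool(estimates, variances, tau2=0.0):
    """Inverse-variance pooled estimate and its variance for one timescale.

    estimates: city-specific log relative rates
    variances: their within-city variances
    tau2:      assumed between-city variance (0 means homogeneous effects)
    """
    weights = [1.0 / (tau2 + v) for v in variances]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, 1.0 / total

# Hypothetical estimates (percent per 10 units) and variances for four cities.
city_estimates = [1.2, 0.9, 1.6, 1.5]
city_variances = [0.25, 0.16, 0.36, 0.49]

fixed = pool(city_estimates, city_variances)               # tau = 0
random = pool(city_estimates, city_variances, tau2=0.10)   # heterogeneity allowed
print(fixed, random)
```

Setting `tau2` above zero pulls the weights toward equality across cities and widens the variance of the pooled estimate, which is the comparison the sensitivity analysis examines.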
We estimated city-specific and pooled log relative rates of mortality for the following six timescale variations of PM10: ≥60 days, 30–59 days, 14–29 days, 7–13 days, 3.5–6 days, and <3.5 days. Figure 3 shows the pooled estimates of the log relative rates of mortality at different timescales for total mortality, cardiovascular and respiratory mortality, and mortality due to other causes. At the far right are the plotted estimates of the log relative rate of mortality obtained using the nondecomposed time series $X_t$. For all causes and for cause-specific mortality, we found that estimates of the association between air pollution and mortality obtained using the smoother variations in the time series (10 days to 1 month) are somewhat larger than those obtained using the less smooth variations (1–3 days). The largest effects occurred at timescales greater than 2 months for total mortality (1.35 percent per 10 µg/m3; 95 percent confidence interval (CI): 0.52, 2.17), cardiovascular and respiratory mortality (1.87 percent per 10 µg/m3; 95 percent CI: 0.75, 2.99), and other-cause mortality (0.72 percent per 10 µg/m3; 95 percent CI: −0.55, 1.95). To test the hypothesis that estimated effects at the longer timescales are larger than those at the shortest timescales, we linearly regressed the pooled estimates on the timescales and calculated weighted least squares estimates. The solid lines in figure 3 represent the fitted linear regressions. In all cases, the estimated slopes are negative, with t statistics close to the significance level.
We first test the sensitivity of the log relative rate estimates to the choice of lag for component exposure series at timescales shorter than 1 month. We assume that the lag lk is 0 for timescales greater than 1 month, since lags of 4 days will have little effect on the results for large timescales. We fit several different lags for each component exposure series and choose the best lag lk (the one with the largest t statistic) rather than assume that lk is 0. The optimal lags were obtained by including all timescale components in the models; they are summarized in table 2. Results for total mortality with an optimal lag compared with the original model with a zero lag are shown in the upper half of figure 4. Although the estimates differ at particular timescales, the overall shape of the curves remains similar and remains inconsistent with the short-term mortality displacement hypothesis.
Our strategy for investigating the impact of the assumption of homogeneity ($\tau$ = 0) of the pollution effects on our results is based on inspecting the pooled timescale estimates for total mortality under four alternative values for $\tau$. These were extracted from Bayesian hierarchical analyses of 20 cities (27) and the 88 largest cities in the United States (28). The posterior mean values of $\tau$ and the corresponding prior distributions are summarized in table 3.
DISCUSSION
The larger relative rates at longer timescales may partly reflect a greater biologic impact of chronic exposures than of acute exposures. In fact, estimated relative risks from the Harvard Six Cities (7) and American Cancer Society (8) cohort studies, which address chronic exposures, are larger than estimates from time-series models (28), which are constrained to estimate the effects of shorter-timescale exposures.
The estimated relative rate of total mortality for the longest timescale (≥2 months) was 1.35 percent per 10-unit increase in PM10 for the four cities considered. While 1.35 percent is approximately 8 times larger than the overall pooled estimate of 0.17 percent, it is still an order of magnitude smaller than the estimated relative risks from the cohort studies (7, 8). Thus, the time-series relative rates, even when restricted to longer-term exposures, are much smaller than those from the major cohort studies. This difference might indicate that the most harmful exposures occur over much larger timescales than can be studied with time-series methods. However, relative rate estimates at the longer timescales should be interpreted with caution because of the confounding effects of seasonality and trend.
Our results are consistent with findings from previous reports for Philadelphia (10), Boston (11), and Chicago (12) that have used harvesting-resistant estimators. These methods are based on a conceptually straightforward stratification of the air pollution time series into different frequency bands, allowing assessment of associations on timescales with differing implications.
Our approach and the approaches proposed by Zeger et al. (10) and Schwartz (11, 12) address related but different questions. Zeger et al. (10) and Schwartz (11, 12) decompose both the air pollution time series and the mortality time series into different timescales of variation (Xkt and Ykt) and then aim to identify the timescale component that leads to the strongest association between time-averaged air pollution and time-averaged mortality. The timescale analysis proposed in this paper decomposes only the air pollution time series into different timescales (Xkt) and then characterizes the timescale variation of the effect of exposure on daily mortality. For linear models, these two approaches will provide the same results. In Poisson regression, with small effects such as those that occur with air pollution variables, the differences between results from the two approaches will probably be small. Our approach, however, is applicable over the range of Poisson or other generalized linear model applications.
The timescale decomposition shown in figure 2 could have been performed using wavelet methods. Wavelets are a natural extension of Fourier analysis; however, in wavelet analysis, the window or "scale" with which we look at the information stream is selected automatically. In our context, this automatic selection of the timescales is not particularly desirable. One of the advantages of using wavelets is that functions with discontinuities and functions with sharp spikes can be represented using substantially fewer wavelet basis elements than sine-cosine basis elements. Because a common characteristic of time series of mortality, air pollution, and weather data is their periodicity without large discontinuities, Fourier analysis is adequate for our purpose.
The mortality displacement problem that motivated the development of this method is not unique to air pollution; it has also been discussed in relation to heat waves and influenza. The statistical approach proposed in this paper is suitable for these or other epidemiologic analyses with the focus of differentiating short-term effects from long-term effects of a time-varying exposure on a health outcome. The set of timescales selected should match hypotheses concerning relations between exposure time and response. We also provide an alternative strategy with which to control for temporal confounding, since it is likely that confounding may vary with the timescale.
The timescale estimates from model 1 lead to specific patterns for the coefficients of a distributed lag model (9, 29, 30). A large effect at timescale k corresponds to an increased number of deaths for k/2 days after an air pollution episode, followed by a rebound below the baseline level for another k/2 days, owing to the depletion of the pool of susceptible people.
Unlike the distributed lag model, our approach is symmetric in time; that is, we use a symmetric time window ($t - l_k$, $t + l_k$) to estimate $X_{kt}$. The temporal symmetry of our approach does not complicate our inferences, for two reasons. First, and most importantly, it is not plausible that mortality causes air pollution; it is only reasonable to consider the possibility that air pollution causes mortality. Second, we use a symmetric time window simply to better estimate the smooth variations of air pollution $X_{kt}$.
Other key methodological issues in time-series studies of air pollution and mortality are the nonlinearity in the dose-response curves, the effect of copollutants, and the effect of measurement error. These issues are discussed elsewhere and remain a topic of investigation (25, 28, 31, 32). In the context of mismeasurement of exposure, it is expected that the relative rate of mortality corresponding to the short timescales might be more attenuated by the measurement error than the relative rate of mortality corresponding to longer timescales. This is because more of the short-timescale signal is actually error, whereas the longer-timescale measure has effectively been smoothed so that measurement error is less of a contributor and hence less a source of bias. However, measurement error will not reverse the sign of an estimated coefficient or reverse the shape of the curve in figure 3. Therefore, even in the presence of measurement error, our results still do not support the "harvesting hypothesis" that the association between particle concentrations and mortality is entirely due to mortality among very frail persons who lose a few days of life.
ACKNOWLEDGMENTS
The authors thank Drs. Rafael Irizarry, Giovanni Parmigiani, and Marina Vannucci for comments on an earlier draft of the paper.
APPENDIX
Here we outline the approach to decomposing a daily time series $X_t$ into timescale components $\{X_{kt} : k = 1, \ldots, K\}$ through the use of the discrete Fourier transform. The discrete Fourier transform is defined as

$$d(\omega_j) = \sum_{t=1}^{T} X_t \exp(-i\omega_j t),$$

where T is the length of the series $X_t$ and $\omega_j = 2\pi j/T$ is the jth Fourier frequency with j cycles in the length of the data. Note that if j = 1, then $\omega_1 = 2\pi/T$ is a Fourier frequency with one cycle in the length of the data and describes the longest-term fluctuations. We note that when j ≥ T/2, we have $d(\omega_{T-j}) = \overline{d(\omega_j)}$, where $\overline{d(\omega_j)}$ denotes the complex conjugate of $d(\omega_j)$. If T is even and j = T/2, then $\omega_{T/2} = \pi$ is a Fourier frequency with a cycle for 2 days and describes the shortest-term fluctuations. Similarly, if T is odd and j = (T − 1)/2, then $\omega_{(T-1)/2} = \pi(T-1)/T$ is the Fourier frequency describing the shortest-term fluctuations.

Let $[0, \omega_1, \ldots, \omega_k, \ldots, \omega_K, \pi]$ be a partition of the interval $[0, \pi]$, and we define $I_k = (\omega_{k-1}, \omega_k] \cup [\omega_{T-k}, \omega_{T-k+1})$. The following holds:

$$X_t = \sum_{k} X_{kt}, \qquad X_{kt} = \frac{1}{T} \sum_{\omega_j \in I_k} d(\omega_j) \exp(i\omega_j t).$$

We can decompose $X_t$ into the $X_{kt}$s by implementing the following algorithm. For k = 1, ..., K:
1) Taper the data $X_t$ and get $\tilde{X}_t$.
2) Calculate the discrete Fourier transform of $\tilde{X}_t$ and get $d(\omega_j)$.
3) Set $d^*(\omega_j) = d(\omega_j)$ if $\omega_j \in I_k$ and $d^*(\omega_j) = 0$ otherwise.
4) Get $X_{kt}$ by applying the inverse of the discrete Fourier transform to $d^*(\omega_j)$, j = 1, ..., T/2.
SAS, S-Plus, and R software for decomposing a time series into a desired set of frequency components can be downloaded at http://www.ihapss.jhsph.edu/software/fd/software_fd.htm.
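As a rough illustration of the decomposition outlined in this appendix, the following sketch uses only the Python standard library. It is not the downloadable software: the function names (`dft`, `decompose`) are hypothetical, the tapering step is omitted, the series is synthetic, and the bands are specified by period cutoffs in days (an assumed interface) rather than by frequency intervals.

```python
import cmath
import math

def dft(x):
    """Unnormalized discrete Fourier transform of a real series x."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / T) for t in range(T))
            for f in range(T)]

def decompose(x, cutoffs):
    """Split x into len(cutoffs) + 1 timescale components that sum to x.

    cutoffs: period boundaries in days, in decreasing order; for example
    [60, 30, 14, 7, 3.5] gives bands >=60, 30-60, 14-30, 7-14, 3.5-7, <3.5 days.
    """
    T = len(x)
    d = dft(x)
    bands = [[] for _ in range(len(cutoffs) + 1)]
    for f in range(T):
        m = min(f, T - f)                      # fold f and T - f into one band
        period = math.inf if m == 0 else T / m
        k = sum(period < c for c in cutoffs)   # number of cutoffs this period falls below
        bands[k].append(f)
    components = []
    for idx in bands:
        # Inverse transform restricted to the frequencies of this band.
        comp = [sum(d[f] * cmath.exp(2j * math.pi * f * t / T) for f in idx).real / T
                for t in range(T)]
        components.append(comp)
    return components

# Synthetic daily series: mean level plus 90-day, 10-day, and 3-day cycles.
T = 180
x = [20 + 5 * math.sin(2 * math.pi * t / 90)
        + 3 * math.sin(2 * math.pi * t / 10)
        + 2 * math.cos(2 * math.pi * t / 3) for t in range(T)]

comps = decompose(x, [60, 30, 14, 7, 3.5])
print(len(comps), max(abs(sum(c[t] for c in comps) - x[t]) for t in range(T)))
```

Folding the frequencies j and T − j into the same band keeps each component real-valued, and because the bands are disjoint, the components are mutually orthogonal and add back to the original series. The direct O(T²) transform is used here only for transparency; a fast Fourier transform routine would be preferable for series spanning several years.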
Related articles in Am. J. Epidemiol.: