GRASPA, Research Group for Statistical Applications to Environmental Problems, Research Unit of Palermo, Dipartimento di Scienze Statistiche e Matematiche S Vianelli, Facoltà di Economia, Università di Palermo, 90128 Palermo, Italy. E-mail: vito.muggeo{at}giustizia.it
Sirs, There is considerable and increasing interest in evaluating the effects of temperature on health, because the growing concentration of greenhouse gases is causing average temperatures to rise. Excess deaths are observed at both low and high temperatures; it is therefore of interest to quantify both the cold- and the heat-related risks in order to gain insight into the possible consequences of global warming.
Taking a cue from the recent paper by Gouveia et al. which appeared in this journal,1 I briefly discuss some drawbacks of the methodology used in that paper.
A brief introduction on the topic comes first.
The Poisson regression has become the standard means to analyse daily time-series mortality data. Such a model relates the log-expected death count (logE[Y]) to explanatory variables, including temperature and confounders known to explain, to some extent, variability in daily deaths.
It is well known that the mortality-temperature relationship is V-shaped; a possible way to account for such a non-linear relationship is therefore to define two complementary variables, TEMP1 = min(TEMP - $\psi$, 0) and TEMP2 = max(TEMP - $\psi$, 0). TEMP is the average temperature and $\psi$, usually assumed known, is the value at which mortality reaches its minimum. This approach is useful since it allows direct estimation of the per cent change in mortality associated with a 1°C increase in cold and heat.1,2
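The construction of the two temperature terms can be sketched as follows. This is only an illustration: the break-point is written as `psi`, and the function name, the temperatures and the break-point value are hypothetical, not taken from any of the cited analyses.

```python
def split_temperature(temp, psi):
    """Return (TEMP1, TEMP2) for one daily mean temperature and break-point psi."""
    temp1 = min(temp - psi, 0.0)  # cold term: negative below psi, zero above it
    temp2 = max(temp - psi, 0.0)  # heat term: zero below psi, positive above it
    return temp1, temp2


# Hypothetical daily mean temperatures (deg C) and an assumed break-point.
temps = [8.0, 14.5, 20.0, 26.0, 31.5]
psi = 20.0

pairs = [split_temperature(t, psi) for t in temps]
for t, (temp1, temp2) in zip(temps, pairs):
    # The two terms always add up to the centred temperature TEMP - psi.
    print(f"TEMP={t:5.1f}  TEMP1={temp1:6.1f}  TEMP2={temp2:5.1f}")
```

On any given day at most one of the two terms is non-zero, which is why each coefficient can be read as a separate slope on its own side of the break-point.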
Among the confounders, seasonality is undoubtedly the most important factor as mortality time-series are always characterized by a periodic pattern with peaks during the winters. Several methods have been discussed and employed to control for long-term trend, including dummy variables for months and harmonic components. However, recently non-parametric smoothing terms have become quite popular, since they are able to account for non-regular cycles.
Let f(TIME,df) be the non-parametric smoother of seasonality (TIME = 1,2,...) with df degrees of freedom and x the other confounders (such as influenza epidemics, day-of-week, holiday, air pollution) with linear parameters $\delta$. Omitting a possible smoothing term for relative humidity, the semi-parametric model is:

$$\log E[Y] = f(\mathrm{TIME}, df) + x^{\top}\delta + \beta_1\,\mathrm{TEMP1} + \beta_2\,\mathrm{TEMP2} \qquad (1)$$
The backfitting algorithm is usually employed to estimate the parameters in the so-called generalized additive model in equation (1); S-plus, for instance, uses the backfitting by means of its function gam().
In air-pollution effect assessments, it has recently been pointed out that backfitting can lead to bias in the linear parameter estimates ($\delta$, $\beta_1$, $\beta_2$ in the equation above) and underestimation of the corresponding standard errors.3,4
Simulations have shown that bias is essentially due to different factors, the most important being the degree of concurvity, namely, roughly speaking, the correlation between the non-parametric smoothed variable (TIME) and the parametric variables whose coefficients are of interest. These findings concern the effect of air pollutants whose pattern in time (that is the correlation with TIME variable) is usually moderate.
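To make the notion of concurvity concrete, the sketch below compares how strongly a cold-temperature term and a pollutant track a smooth seasonal curve. Everything in it is an assumption made for illustration: the data are simulated, a single annual cosine stands in for the smooth function of TIME, and a plain correlation is used as a rough proxy for concurvity.

```python
import math
import random


def pearson(u, v):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


rng = random.Random(1)   # fixed seed so the sketch is reproducible
days = range(3 * 365)    # three years of daily data
# Annual cycle standing in for the smooth function of TIME (an assumption).
season = [math.cos(2 * math.pi * (t - 200) / 365) for t in days]

# Simulated temperature is strongly seasonal; simulated pollutant only weakly.
temp = [18 + 8 * s + rng.gauss(0, 2) for s in season]
pm10 = [40 + 3 * s + rng.gauss(0, 15) for s in season]

psi = 20.0               # assumed break-point
temp1 = [min(t - psi, 0.0) for t in temp]

r_temp1 = pearson(season, temp1)
r_pm10 = pearson(season, pm10)
print(f"corr(season, TEMP1) = {r_temp1:.2f}")
print(f"corr(season, PM10)  = {r_pm10:.2f}")
```

Under these simulated settings the cold term is far more closely tied to the seasonal curve than the pollutant is, which is the situation in which backfitting bias is expected to matter most.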
The problem to be emphasized when studying temperature effects is that temperature is much more strongly related to time than any pollutant, which leads to a higher degree of concurvity; as a consequence, the bias in the estimates of $\beta_1$ and $\beta_2$ is expected to be larger. To illustrate, Table 1 shows the estimates for all natural causes (ICD-IX 1–799) mortality data, 1997–1999, in Palermo (southern Italy, approximately 700 000 residents). Estimates are expressed as per cent risk change, i.e. $100 \times (\exp(\beta) - 1)$, obtained by backfitting (specifically the S-plus gam() function) and by a possible alternative approach based on parametric regression splines, which have been shown to yield unbiased estimates of the linear parameters.3 The model includes day-of-week, holidays, influenza epidemics, air pollution (PM10) evaluated as the mean of lags 0–1, and temperature, also included as the mean of lags 0–1.
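As a small worked illustration of the per cent risk change formula, using made-up coefficient values (not the Palermo estimates):

$$100\,(e^{0.02} - 1) \approx 2.02, \qquad 100\,(e^{-0.03} - 1) \approx -2.96$$

That is, a coefficient of 0.02 per 1°C on the log scale corresponds to roughly a 2.02% increase in expected deaths, and a coefficient of -0.03 to roughly a 2.96% decrease.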
Another point should be discussed here: estimation of model (1) is usually carried out conditional on the break-point, $\psi$, which is assumed to be fixed.1,2 Whatever the estimation approach, assuming $\psi$ known can lead to underestimating the standard errors of the other parameters (including $\beta_1$ and $\beta_2$), since the uncertainty in $\psi$ is neglected. To obtain correct standard errors, one should fit a model for each fixed break-point $\psi$ in the range of the observed temperature values and apply the conditional variance formula, averaging over all the $\psi$ values selected.
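One way to read the conditional variance step is as the law of total variance: the overall variance of a coefficient is the average of its conditional variances plus the variance of its conditional estimates across break-points. The sketch below assumes equal weights for all candidate break-points, which is a simplification; the function name and the numbers are hypothetical.

```python
def pooled_over_breakpoints(estimates, variances):
    """Combine fits made at several fixed break-points, using equal weights.

    estimates[k] is the coefficient estimate and variances[k] its squared
    standard error from the model fitted with the k-th fixed break-point.
    """
    k = len(estimates)
    mean_est = sum(estimates) / k
    within = sum(variances) / k                                 # mean conditional variance
    between = sum((b - mean_est) ** 2 for b in estimates) / k   # variance of conditional estimates
    return mean_est, within + between


# Hypothetical results from three models, each with a different fixed break-point.
est = [0.010, 0.020, 0.030]
var = [0.0001, 0.0001, 0.0001]

pooled_est, pooled_var = pooled_over_breakpoints(est, var)
print(pooled_est, pooled_var)
```

The pooled variance can never be smaller than the average conditional variance, which is the sense in which fixing the break-point overstates precision.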
Alternatively, one could use a recently proposed method that allows joint estimation of all the parameters of the model.5 Such a method could be very useful when one is interested in estimating a three-segment relationship. As pointed out by both referees, it may be more plausible to assume a rather wide range of moderate temperatures over which the risk is negligible. In principle, estimation of such multiple break-point models can be carried out quite straightforwardly according to the method outlined in reference 5, through a parameterization similar to the one used in the single break-point case. However, temperature-mortality curves often exhibit high heterogeneity, making estimation of two break-point patterns quite difficult.
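A three-segment relationship with a flat middle range could, for instance, be coded by analogy with the single break-point case. The sketch below is one assumed parameterization, not necessarily the one in reference 5: `psi1` and `psi2` are hypothetical lower and upper break-points, and the risk is taken to be constant between them.

```python
def three_segment_terms(temp, psi1, psi2):
    """Return (cold, heat) terms for break-points psi1 < psi2."""
    if not psi1 < psi2:
        raise ValueError("psi1 must be smaller than psi2")
    cold = min(temp - psi1, 0.0)  # non-zero only below the lower break-point
    heat = max(temp - psi2, 0.0)  # non-zero only above the upper break-point
    return cold, heat


# Hypothetical break-points delimiting the range of moderate temperatures.
psi1, psi2 = 10.0, 20.0
for t in (5.0, 15.0, 25.0):
    print(t, three_segment_terms(t, psi1, psi2))
```

Both terms are zero for any temperature between the two break-points, so the fitted log-risk is flat over that range.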
In conclusion, ignoring the side-effect of backfitting and leaving out uncertainty in break-point detection is very likely to cause underestimation in standard errors and therefore overestimation of the precision of relative risks. While bias is expected in any linear parameter when backfitting is used, the bias in the estimate of the parameters for temperature can be very severe and can lead to misleading findings and conclusions. In particular, due to high concurvity between TIME and TEMP1, the cold effect is likely to be seriously overestimated with the heat-related risk remaining substantially unchanged as shown in Table 1. Thus care must be taken in fitting data according to model (1) and in interpreting the relevant results about temperature effects.
References
1 Gouveia N, Hajat S, Armstrong B. Socioeconomic differentials in the temperature-mortality relationship in São Paulo, Brazil. Int J Epidemiol 2003;32:390–97.
2 Kunst AE, Looman CWN, Mackenbach JP. Outdoor air temperature and mortality in the Netherlands: a time series analysis. Am J Epidemiol 1993;137:331–41.
3 Dominici F, McDermott A, Zeger SL, Samet JM. On the use of generalized additive models in time-series studies of air pollution and health. Am J Epidemiol 2002;156:193–203.
4 Ramsay TO, Burnett RT, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology 2003;14:18–23.
5 Muggeo VMR. Estimating regression models with unknown break-points. Statist Med 2003;22:3055–71.