From the Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Room E4138, Baltimore, MD 21205 (e-mail: fdominic{at}jhsph.edu).
Abbreviations: NMMAPS, National Morbidity, Mortality, and Air Pollution Study; PM10, particulate matter ≤10 µm in diameter.
INTRODUCTION
In the past, critics of single-site studies questioned the validity of the data used and the statistical techniques applied to them. The critics noted inconsistencies in findings among studies and even in the same city upon independent reanalysis (5, 6). They questioned the choice of particular cities and asked whether models had been selected that gave estimates of effect that were biased upwards. These criticisms have since been addressed by the use of multisite studies (8, 9) in which site-specific data on air pollution and health are assembled under a common framework. Hierarchical models, which combine information across locations, have provided a statistical approach for analyzing multisite studies (10).
The work by Hwang and Chan (11), published in this issue of the Journal, is one of the latest contributions on this topic. Their study illustrates the utility of using hierarchical models to analyze data on the relation between air pollution concentrations and clinic visits for treatment of lower respiratory tract illness. Hwang and Chan analyzed such data (as well as data on temperature and dew point levels) for 50 sites in Taiwan in 1998. Here I discuss the advantages of using hierarchical models to analyze multisite time-series data on air pollution and health, provide perspective on the results of Hwang and Chan (11), and address the problem of publication bias in meta-analyses.
HIERARCHICAL MODELS
The use of Bayesian hierarchical models to analyze multisite time-series data on air pollution and health provides an appropriate approach for combining evidence across studies, quantifying the sources of variability, and identifying effect modification. For example, Hwang and Chan (11) assume a two-stage hierarchical model with the following structure.
Stage I: site
Given a time series of daily mortality counts at a given site, the association between air pollution and health within that site is described using a regression model, which takes into account potentially confounding factors such as trend, season, and climate. Among the output of the stage I analysis are the point estimate ($\hat{\beta}^s$) and the statistical variance ($v^s$) of the estimated mortality rate associated with each air pollutant at each site.
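
As a rough illustration of the stage I output, the sketch below fits a Poisson log-linear regression of daily counts on a single pollutant series by Newton-Raphson and returns the slope estimate together with its variance. It is a minimal sketch under stated assumptions, not the model used by Hwang and Chan (11): the adjustment for trend, season, and climate is omitted, and the function names and the toy data are hypothetical.

```python
import math


def _information(x, mu):
    """Fisher information terms for the model log E[y] = b0 + b1 * x."""
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return i00, i01, i11


def fit_site_model(x, y, max_iter=100, tol=1e-12):
    """Return (slope estimate, variance of the slope) for one site.

    x: daily pollutant concentrations (ideally centered); y: daily counts.
    Confounders (trend, season, weather) are deliberately left out here.
    """
    b0 = math.log(sum(y) / len(y))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01, i11 = _information(x, mu)
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information matrix times the score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Variance of the slope: its diagonal entry in the inverse information.
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    i00, i01, i11 = _information(x, mu)
    return b1, i00 / (i00 * i11 - i01 * i01)


# Toy data: noise-free expected counts with a true slope of 0.05.
x_toy = [(t % 7) - 3 for t in range(70)]
y_toy = [math.exp(3.0 + 0.05 * xi) for xi in x_toy]
beta_hat, v_hat = fit_site_model(x_toy, y_toy)
```

Repeating such a fit for every site would yield the pairs of estimates and variances that feed the second stage.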
Stage II: the Taiwan region
Data from the 50 sites of the Taiwan Ambient Air Quality Network are combined by using a linear regression model, where the outcome variable is the true relative mortality rate associated with air pollution indexes within each site and the explanatory variables ($X_j^s$) are the site-specific characteristics (population density, yearly averages of the pollutants and of temperature) that may modify the relative rate. Formally,
$$\beta^s = \alpha_0 + \sum_j \alpha_j X_j^s + \varepsilon^s, \qquad \operatorname{Var}(\varepsilon^s) = \tau^2.$$
If the predictors $X_j^s$ are centered about their mean values, the intercept ($\alpha_0$) can be interpreted as the pooled coefficient for a hypothetical site with mean predictors. The regression parameters ($\alpha_j$) measure the change in the true relative rate of mortality associated with a 1-unit change in the corresponding site-specific variable.
The sources of variation in the estimation of the health effects of air pollution are specified by the levels of the hierarchical model. Under a two-stage hierarchical model, the difference between the estimated site-specific relative rate ($\hat{\beta}^s$) and the true pooled relative rate ($\alpha_0$) can be broken down as
$$\hat{\beta}^s - \alpha_0 = (\hat{\beta}^s - \beta^s) + (\beta^s - \alpha_0).$$
The variation of $\hat{\beta}^s$ about $\beta^s$ is described by the within-site variance ($v^s$), which depends on the number of days with available air pollution data and on the predictive power of the site-specific regression model. The variation of $\beta^s$ about $\alpha_0$ is described by the between-site variance ($\tau^2$), which measures the heterogeneity of the air pollution effects across cities. The specification of a Bayesian hierarchical model is completed with the selection of the prior distributions for the parameters at the highest level of the hierarchy. If there is no desire to incorporate prior information into the analysis, then vague prior distributions are the default choice. However, it is important to complete the Bayesian analysis by investigating the sensitivity of the substantive findings to the choice of the prior distributions.
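
The decomposition can be made concrete under extra assumptions that are not stated in the text: normal errors at both stages, within-site variances treated as known, and no site-level covariates. Writing $\hat{\beta}^s$ for the site estimate, $\beta^s$ for the true site-specific rate, $\alpha_0$ for the pooled rate, $v^s$ for the within-site variance, and $\tau^2$ for the between-site variance (symbols chosen here for illustration), a sketch of the implied variance split and shrinkage is:

```latex
\begin{aligned}
\hat{\beta}^s \mid \beta^s &\sim N(\beta^s, v^s), \qquad
\beta^s \mid \alpha_0, \tau^2 \sim N(\alpha_0, \tau^2) \\
\operatorname{Var}(\hat{\beta}^s - \alpha_0) &= v^s + \tau^2 \\
E[\beta^s \mid \hat{\beta}^s, \alpha_0, \tau^2] &= \lambda^s \hat{\beta}^s + (1 - \lambda^s)\,\alpha_0,
\qquad \lambda^s = \frac{\tau^2}{\tau^2 + v^s}
\end{aligned}
```

Under these assumptions, sites with noisy estimates (large $v^s$) are pulled more strongly toward the pooled rate, while precisely estimated sites stay close to their own data.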
Posterior distributions of the pooled estimate ($\alpha_0$), of the between-site variance ($\tau^2$), and of the second-stage regression parameters ($\alpha_j$) provide an appropriate summary of the site-specific relative rates of mortality; a full characterization of the heterogeneity of the air pollution effects across the 50 locations; and the identification of site-specific characteristics that modify the association between air pollution and health. The two-stage hierarchical approach used by Hwang and Chan (11) can be extended to include additional levels of the hierarchical models (for example, sites within geographic regions, geographic regions within nations, etc.), which lead to the estimation of additional sources of variability (within sites, between sites within regions, and between regions) and potential effect modifiers at the site or regional level (for example, see Dominici et al. (18)).
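
For a quick numerical feel for the pooled estimate and the between-site variance, the sketch below uses the DerSimonian-Laird method-of-moments estimator. This is a non-Bayesian stand-in, not the posterior analysis described above, and it ignores second-stage covariates; the function name `pool_random_effects` and the input values are hypothetical.

```python
def pool_random_effects(beta_hat, v):
    """Return (pooled estimate, its variance, between-site variance tau2).

    beta_hat: site-specific estimates; v: their within-site variances.
    Requires at least two sites.
    """
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * bi for wi, bi in zip(w, beta_hat)) / sw
    # Cochran's Q: scatter of the estimates around the fixed-effect mean.
    q = sum(wi * (bi - fixed) ** 2 for wi, bi in zip(w, beta_hat))
    c = sw - sum(wi * wi for wi in w) / sw
    # Method-of-moments estimate, truncated at zero.
    tau2 = max(0.0, (q - (len(beta_hat) - 1)) / c)
    # Re-weight each site by its total (within plus between) variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * bi for wi, bi in zip(w_star, beta_hat)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2


# Hypothetical log relative rates and variances for two sites.
pooled, pooled_var, tau2 = pool_random_effects([0.1, 0.3], [0.01, 0.01])
```

A full Bayesian fit would instead return posterior distributions for these quantities, which also carry the uncertainty in the between-site variance.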
Complex hierarchical models can be fitted using simulation-based methods (17, 19), which provide samples from the posterior distributions of all parameters of interest. Several software packages for this process are now available (for example, see http://www.mrc-bsu.cam.ac.uk/bugs/). One of the appealing features of simulation-based approaches is that site ranking with respect to the magnitude of the health effects of air pollution is straightforward. For example, in the paper by Hwang and Chan (11), the posterior probability that a particular site is the worst location in terms of the health effects of air pollution can be estimated easily by determining the empirical frequency with which the relative rate of hospital admissions at that particular site is the largest.
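
The ranking calculation can be sketched in a few lines once posterior draws are available. The function name and the hand-made draws below are hypothetical and only illustrate the counting step; real draws would come from the fitted hierarchical model.

```python
def prob_site_is_largest(draws):
    """draws[k][s] is the k-th posterior draw of the relative rate at site s.

    Returns, for each site, the fraction of draws in which it ranks highest.
    Ties are credited to the lowest-numbered site.
    """
    n_sites = len(draws[0])
    counts = [0] * n_sites
    for draw in draws:
        top = max(range(n_sites), key=lambda s: draw[s])
        counts[top] += 1
    return [c / len(draws) for c in counts]


# Four hand-made draws for three hypothetical sites (not real output).
toy_draws = [
    [0.1, 0.3, 0.2],
    [0.4, 0.1, 0.2],
    [0.0, 0.5, 0.1],
    [0.2, 0.6, 0.3],
]
print(prob_site_is_largest(toy_draws))  # [0.25, 0.75, 0.0]
```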
HETEROGENEITY AND EFFECT MODIFICATION
Hwang and Chan (11) reported that rates of daily clinic visits were positively associated with current-day concentrations of nitrogen dioxide, carbon monoxide, sulfur dioxide, and particulate matter ≤10 µm in diameter (PM10). Overall, they found that a 10-unit (ppb) increase in current-day nitrogen dioxide concentrations was associated with approximately a 5.8 percent increase (95 percent posterior interval: 4.9, 6.8) in daily clinic visits for respiratory illness (11). This is a much larger estimate than the 0.2 percent increase in mortality (95 percent posterior interval: -0.25, 0.7) reported in the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) (21). Hwang and Chan also found that a 10-unit (µg/m³) increase in current-day PM10 concentrations was associated with approximately a 0.84 percent increase (95 percent posterior interval: 0.35, 1.31) in daily clinic visits for respiratory illness (11). This pooled PM10 result is slightly lower than the morbidity results of the multisite NMMAPS, which reported a 1.4 percent increase (95 percent confidence interval: 0.34, 2.45) in hospital admissions for chronic obstructive pulmonary disease associated with current-day PM10 (21).
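
Percent increases of this kind are commonly derived from a log-linear regression coefficient. That the studies cited here follow this convention is an assumption of the sketch below, and the function names are hypothetical. Under that assumption, a coefficient per unit of pollutant maps to a percent increase per 10 units as follows:

```python
import math


def percent_increase(beta, delta=10.0):
    """Percent change in the rate for a `delta`-unit rise, log-linear model."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


def coefficient_from_percent(pct, delta=10.0):
    """Inverse mapping: log relative rate per unit implied by a percent change."""
    return math.log(1.0 + pct / 100.0) / delta


# Coefficient implied by a 5.8 percent increase per 10 units.
beta_no2 = coefficient_from_percent(5.8)
```

Working on the coefficient scale is convenient because estimates from different sites can then be averaged or compared before converting back to percentages.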
With respect to effect modification, Hwang and Chan (11) found that the individual pollutant coefficient for nitrogen dioxide was modified by yearly average PM10, in a direction implying smaller short-term (acute) effects of nitrogen dioxide at higher long-term (chronic) levels of PM10. Similarly, the effects of sulfur dioxide and carbon monoxide exposure were negatively associated with a community's annual PM10 concentrations.
In recent papers (18, 22, 23), researchers have explored and discussed effect modification of the health effects of air pollution but have focused mainly on the identification of modifiers of the short-term effects of PM10 instead of modifiers of the short-term effects of nitrogen dioxide, as Hwang and Chan did (11). In the NMMAPS, Dominici et al. (18) found that short-term PM10 effects are modified by long-term average PM10 level, indicating greater short-term (acute) PM10 effects at lower long-term (chronic) levels of PM10. Negative associations between short-term effects of air pollution (nitrogen dioxide and PM10) and long-term average PM10 level might indicate that the pool of susceptible individuals is smaller in cities with higher average PM10 concentrations. If this were the case, the short-term effects of air pollution on mortality would be lower in cities with a large particulate matter average than in cities with a lower particulate matter average and a relatively larger pool of susceptible individuals. On the other hand, Katsouyanni et al. (22), within the APHEA Study [Air Pollution and Health, European Approach], found that long-term average nitrogen dioxide concentration was an effect modifier of the short-term effect of PM10, but in the opposite direction: the higher the average nitrogen dioxide level, the larger the particle effect. These findings suggest that the acute effects of PM10 on mortality might be greater in locations with a greater long-term average of air pollution originating from vehicle exhaust (nitrogen dioxide is considered an indicator of traffic-derived pollution) as compared with pollution from other sources.
In these multisite studies, there has not been a consistent pattern of effect modification, probably because of difficult methodological problems. These difficulties include the following: 1) sociodemographic characteristics vary spatially within each site, leading to serious concerns about ecologic bias (19); 2) the predictors included in the second stage of the model represent a crude proxy for the desired site-specific characteristics, and they are very highly correlated; and 3) the number of sites is limited, which restricts the number of potential effect modifiers that can be investigated jointly.
PUBLICATION BIAS
Figure 2 shows the posterior distribution of the pooled effect of PM10 on mortality ($\alpha_0$) obtained under the meta-analysis (dotted line) and under the NMMAPS (solid line). Substantial overlap between the two curves indicates that the results obtained under the NMMAPS (21) and the results obtained under the Levy et al. meta-analysis (23) are relatively consistent. However, the posterior distribution of the pooled effect has a larger mean, and it is much more concentrated under the Levy et al. meta-analysis (23) than under the NMMAPS (21). For example, the posterior probability that the pooled PM10 effect estimated with the meta-analysis is larger than the pooled PM10 effect estimated under the NMMAPS is close to 80 percent. This indicates that meta-analyses of published studies that selectively favor findings with significant effects might seriously overestimate the pooled effect and underestimate the corresponding statistical uncertainty. However, this difference might also be explained by methodological differences between the NMMAPS and the time-series studies used in the meta-analysis.
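
A probability such as the one quoted above can be approximated once each posterior distribution is summarized. The sketch below assumes, purely for illustration, that the two posteriors are independent and approximately normal; the means and standard deviations are hypothetical placeholders, not the values behind Figure 2.

```python
from statistics import NormalDist


def prob_a_exceeds_b(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) for independent, normally distributed A and B."""
    # A - B is normal with the difference of means and the summed variances.
    diff_sd = (sd_a ** 2 + sd_b ** 2) ** 0.5
    return NormalDist().cdf((mean_a - mean_b) / diff_sd)


# Hypothetical posterior summaries (percent increase per 10 units of PM10).
p = prob_a_exceeds_b(mean_a=0.7, sd_a=0.1, mean_b=0.5, sd_b=0.25)
```

With posterior samples in hand, the same quantity could instead be estimated as the share of paired draws in which one effect exceeds the other, which avoids the normality assumption.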
CONCLUSIONS
Multisite studies and Bayesian hierarchical models provide a unified framework for 1) estimating individual pollutant effects for particular sites, pooled effects, and components of variation; 2) investigating effect modification; and, more generally, 3) producing more credible results by appropriately taking into account all sources of uncertainty and by overcoming the problem of publication bias.
Heterogeneity is potentially a key issue for the policy implications of epidemiologic studies of air pollution and health, because, in the presence of substantial heterogeneity across sites, the overall effect may have less public health relevance than site-specific estimates, and the characterization of effect modification becomes of primary scientific and public health interest (27). Clearly, interpretation of findings on effect modification is a difficult component of these analyses. The methods are weakened by gaps in the publicly available data on air pollution, mortality, and site-specific characteristics and by the inherent limitations of these data.
A promising approach to the investigation of effect modification might rely on the integration of results from small-area studies and global studies (that is, studies in which information from several multisite studies is combined). These two approaches are complementary with respect to their advantages and limitations. Small-area studies collect data in more homogeneous and restricted geographic areas. Therefore, they are less prone to ecologic bias, and the potential modifiers are more precisely measured. However, results of small-area studies are less generalizable to other locations. Global studies aim to combine information on air pollution and health within and between multisite studies from different countries and regions (the United States, Europe, Canada, and Asia), and therefore they are likely to provide a more complete characterization of the heterogeneity. However, they are still sensitive to the problem of ecologic bias, and additional work in these areas is necessary.
With repeated application of hierarchical models to multisite data, current statistical methods offer an approach for tracking the health effects of air pollution over time as control measures are implemented. These techniques might contribute to the development of a global surveillance system for measuring the health effects of air pollution and for documenting improvements attributable to changes in air quality regulation.