Department of Public Health, Wellington School of Medicine, University of Otago, PO Box 7343, Wellington, New Zealand. E-mail: tblakely{at}wnmeds.ac.nz
A classic finding of the Whitehall Study was that only a third of the association of occupational grade (a socioeconomic ranking of occupations within the British civil service) with coronary heart disease mortality was explained after adjusting for known cardiovascular risk factors.1 In current terminology, a third of the association of socioeconomic position with coronary heart disease mortality was estimated as indirect via known risk factors, and two-thirds (the unexplained remainder) was estimated as direct. This direct effect, it is assumed (e.g. ref. 2), represents the mediating or indirect effects of other factors (e.g. unmeasured and/or unknown dietary and lifestyle behaviours, psychosocial factors). For example, subsequent research on the Whitehall Study has suggested that workplace characteristics of control and demand explained much of the remaining two-thirds,3 although this analysis has been criticized for conflating measures of socioeconomic position.4,5 The point here, though, is that this original finding from the Whitehall Study is just one example of a widespread (almost universal) practice within epidemiology that aims to describe and quantify causal pathways by controlling for possible mediating variables using standard epidemiological methods.
Enter Cole and Hernán6 who, in this edition of the International Journal of Epidemiology, build on prior work of Robins and Greenland (1992)7 and Poole and Kaufman (2000)8 to demonstrate that such standard epidemiological practice may be misleading. In brief, using counterfactual models and causal graphs they demonstrate that if an unknown variable (e.g. genotype) confounds the association of the mediating variable with the outcome, then stratifying the exposure-disease association by the mediating variable may not accurately partition the total effect into its direct and indirect components. Previous methodological work demonstrating this fallibility has used completely hypothetical examples. The Cole and Hernán example starts with actual data on randomized aspirin (exposure) and subsequent myocardial infarction (MI; outcome). However, the distribution of the potential mediating variable (platelet aggregation) and the genotype confounder remain constructed. Their data distribution is consistent with a causal and protective effect of aspirin on MI (relative risk of 0.6) being entirely due to platelet aggregation, yet when they stratify the aspirin-MI association by platelet aggregation the relative risk is unchanged. According to standard epidemiological expectation the stratified relative risk should have been 1.0.
What happened? To help understand Cole and Hernán's example, I have rearranged the data to determine RR_UM|E, the relative risk for the association between the confounder (U) and high platelet aggregation (M), stratified by the randomized aspirin exposure E (Table 1). It is striking that, of those people exposed to the confounder U, 95% had high platelet aggregation within the unexposed (E = 0; no aspirin on a randomized intervention) compared to 50% within the exposed (E = 1). The difference in the percentages among those not exposed to the confounder U is also striking: 50% compared to 5%. Thus, the confounder U is strongly associated with the intermediary variable M.
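The stratified relative risks implied by these percentages can be worked out directly. The sketch below is illustrative only: it assumes the four percentages quoted above are the conditional probabilities of high platelet aggregation given U and E, and the names `p_m_given_ue` and `rr_um_given_e` are introduced here for convenience rather than taken from Cole and Hernán.

```python
# P(M = 1 | U, E): probability of high platelet aggregation, keyed by (U, E).
# Values are the percentages quoted in the text, read as conditional
# probabilities (an assumption of this sketch).
p_m_given_ue = {
    (1, 0): 0.95,  # confounder present, no aspirin
    (1, 1): 0.50,  # confounder present, aspirin
    (0, 0): 0.50,  # confounder absent, no aspirin
    (0, 1): 0.05,  # confounder absent, aspirin
}


def rr_um_given_e(e):
    """Relative risk of M = 1 comparing U = 1 with U = 0, within stratum E = e."""
    return p_m_given_ue[(1, e)] / p_m_given_ue[(0, e)]


rr_um_e0 = rr_um_given_e(0)
rr_um_e1 = rr_um_given_e(1)
print(f"RR of high platelet aggregation for U, no aspirin: {rr_um_e0:.1f}")
print(f"RR of high platelet aggregation for U, aspirin:    {rr_um_e1:.1f}")
```

Under this reading the confounder roughly doubles the probability of high platelet aggregation among those not given aspirin (0.95/0.50 = 1.9) and raises it tenfold among those given aspirin (0.50/0.05 = 10).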
This type of situation described by Cole and Hernán is extreme (purposely so, for illustrative purposes), but nevertheless it is possible. For example, smoking is associated with MI, but smoking may also result in a decreased body mass index that (of itself) is associated with a decreased risk of MI. The key issues are the direction and the magnitude with which this bias tends to occur in the real world. Regarding direction, we would usually expect the crude association of an unmeasured confounder with an outcome to at least be in the same direction as that mediated by some pathway variable. For example, a bad lifestyle (U) is associated with the constellation of high blood pressure, smoking, and being overweight. Consequently, controlling for just blood pressure (M) in the association between, say, socioeconomic status (E) and coronary heart disease (D) will probably overestimate the indirect effect via blood pressure alone, as blood pressure is positively correlated with these other risk factors. (This overestimation of indirect effects is opposite to Cole and Hernán's example, where the indirect effect was [completely] underestimated.)
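The dependence of the bias on the direction of the confounder's own effect can be illustrated with a small exact calculation. This is a sketch under stated assumptions, not a reconstruction of Cole and Hernán's data: the mediator probabilities are the percentages quoted earlier in this commentary, while the confounder prevalence (`P_U`), the outcome risks in `risk`, and the multiplier `u_multiplier` are hypothetical values chosen for illustration. By construction E has no direct effect on D, so a valid analysis should return a relative risk of 1.0 within strata of M.

```python
from itertools import product

P_U = 0.5  # hypothetical prevalence of the unmeasured confounder U
# P(M = 1 | U, E), keyed by (U, E); percentages quoted in the commentary.
P_M = {(1, 0): 0.95, (1, 1): 0.50, (0, 0): 0.50, (0, 1): 0.05}


def risk(m, u, u_multiplier):
    """Hypothetical P(D = 1 | M = m, U = u). E does not appear: no direct effect."""
    base = 0.20 if m == 1 else 0.10
    return base * (u_multiplier if u == 1 else 1.0)


def stratum_risk(e, m, u_multiplier):
    """P(D = 1 | E = e, M = m), averaged over U; m=None gives the crude risk."""
    num = den = 0.0
    for u, mm in product((0, 1), (0, 1)):
        if m is not None and mm != m:
            continue
        p_u = P_U if u == 1 else 1 - P_U  # E is randomized, so U is independent of E
        p_m = P_M[(u, e)] if mm == 1 else 1 - P_M[(u, e)]
        weight = p_u * p_m
        num += weight * risk(mm, u, u_multiplier)
        den += weight
    return num / den


def relative_risks(u_multiplier):
    """Crude and M-stratified relative risks for E = 1 versus E = 0."""
    def rr(m):
        return stratum_risk(1, m, u_multiplier) / stratum_risk(0, m, u_multiplier)
    return {"crude": rr(None), "M=1": rr(1), "M=0": rr(0)}


# U lowers risk directly, opposite to its effect through M.
protective_u = relative_risks(0.5)
# U raises risk directly, in the same direction as its effect through M.
harmful_u = relative_risks(2.0)

for label, rrs in (("U protective", protective_u), ("U harmful", harmful_u)):
    print(label, {k: round(v, 2) for k, v in rrs.items()})
```

With these hypothetical inputs, a confounder that acts against the mediated pathway leaves the stratified relative risks below 1.0, so part of the indirect effect is wrongly attributed to a direct effect (the direction of Cole and Hernán's example). A confounder that acts in the same direction as the mediated pathway pushes the stratified relative risks above 1.0, which corresponds to an overestimated indirect effect.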
At least as important as the source of error described by Cole and Hernán is measurement error of potential mediating variables, which will usually cause indirect effects to be underestimated. For example, it is highly probable that if multiple (including across the life-course) and more accurate measurements of the known cardiovascular disease risk factors had been available in the Whitehall Study,1 more than a third of the association of socioeconomic position with coronary heart disease would have been explained (e.g. ref. 9).
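A minimal sketch of this attenuation follows, assuming a single binary mediator that is misclassified nondifferentially. All numbers are hypothetical and chosen only for illustration; `P_M`, `RISK`, `stratified_rr`, and `proportion_explained` are names introduced here. The effect of E on D is entirely through M, so stratifying on the true mediator gives a relative risk of 1.0. The proportion explained is computed on the relative risk scale as (crude − adjusted)/(crude − 1), which is one common convention and not necessarily the one used in the Whitehall analyses.

```python
P_M = {0: 0.60, 1: 0.20}   # hypothetical P(M = 1 | E = e)
RISK = {1: 0.20, 0: 0.10}  # hypothetical P(D = 1 | M = m); no direct effect of E


def stratified_rr(m_star, sensitivity, specificity):
    """RR for E = 1 vs E = 0 among people whose MEASURED mediator equals m_star."""
    risks = {}
    for e in (0, 1):
        num = den = 0.0
        for m in (0, 1):
            p_m = P_M[e] if m == 1 else 1 - P_M[e]
            # P(measured = 1 | true M = m), the same for every E and D
            p_pos = sensitivity if m == 1 else 1 - specificity
            p_star = p_pos if m_star == 1 else 1 - p_pos
            num += p_m * p_star * RISK[m]
            den += p_m * p_star
        risks[e] = num / den
    return risks[1] / risks[0]


def proportion_explained(crude, adjusted):
    """Share of the crude association removed by adjustment (RR scale)."""
    return (crude - adjusted) / (crude - 1)


crude_rr = (P_M[1] * RISK[1] + (1 - P_M[1]) * RISK[0]) / (
    P_M[0] * RISK[1] + (1 - P_M[0]) * RISK[0]
)
perfect = stratified_rr(1, 1.0, 1.0)  # mediator measured without error
noisy = stratified_rr(1, 0.8, 0.8)    # 80% sensitivity and specificity

print(f"crude RR: {crude_rr:.2f}")
print(f"adjusted RR, perfect measurement: {perfect:.2f}")
print(f"adjusted RR, noisy measurement:   {noisy:.2f}")
print(f"proportion explained, noisy: {proportion_explained(crude_rr, noisy):.0%}")
```

In this hypothetical setting the mediator truly explains all of the association, yet adjusting for the error-prone measurement moves the relative risk only from 0.75 to about 0.81, so roughly a quarter of the association appears to be explained.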
There is an urgent need for further methodological research that determines the likely magnitude and direction of bias in the estimation of direct and indirect effects, both by the use of real data sets and sensitivity analyses that determine if less extreme hypothetical examples than that presented by Cole and Hernán still produce noteworthy bias. (Ideally, this methodological research should also incorporate bias from measurement error.) In the meantime, it would be foolish to ignore the alarming findings of researchers such as Cole and Hernán that seriously question the validity of a common epidemiological practice, but neither should we abandon (yet) our standard methods of estimating direct and indirect effects. Accordingly, the three recommendations issued by Cole and Hernán are sensible: plan to collect information on potential confounders of the mediator-outcome association, include these potential confounders in the analysis, and clearly state the implicit assumptions of the standard method used to measure direct and indirect effects.
Acknowledgments
Tony Blakely is funded by the Health Research Council of New Zealand. Useful comments on drafts of this paper were received from Charlie Poole, Stephen Cole and Miguel Hernán.
References
1 Marmot MG, Shipley M, Rose G. Inequalities in death - specific explanations of a general pattern? Lancet 1984;ii:1003–06.
2 Victora C, Huttly S, Fuchs S, Olinto M. The role of conceptual frameworks in epidemiological analysis: a hierarchical approach. Int J Epidemiol 1997;26:224–27.
3 Marmot MG, Bosma H, Hemingway H, Brunner E, Stansfeld S. Contribution of job control and other risk factors to social variations in coronary heart disease incidence. Lancet 1997;350:235–39.
4 White M. Contribution of job control to social gradient in coronary disease [letter]. Lancet 1997;350:1404–05.
5 Lynch J, Kaplan G. Socioeconomic position. In: Berkman L, Kawachi I (eds). Social Epidemiology. New York: Oxford University Press, 2000, pp. 13–35.
6 Cole SR, Hernán MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163–65.
7 Robins J, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3:143–55.
8 Poole C, Kaufman J. What does standard adjustment for downstream mediators tell us about social effect pathways? [Conference poster, Society for Epidemiologic Research, June 15–17 2000, Seattle.] Am J Epidemiol 2000;151(11): Abstract 208.
9 Lynch J, Kaplan G, Cohen R, Tuomilehto J, Salonen J. Do cardiovascular risk factors explain the relation between socioeconomic status, risk of all-cause mortality, cardiovascular mortality, and acute myocardial infarction? Am J Epidemiol 1996;144:934–42.