a Department of Epidemiology, University of North Carolina School of Public Health, Chapel Hill, NC 27599-7400, USA.
b Carolina Population Center, University of North Carolina at Chapel Hill, 123 West Franklin Street, Chapel Hill, NC 27516-3997, USA.
c Department of Otolaryngology, University at Buffalo, 3435 Main Street, Buffalo, NY 14214, USA.
Jay S Kaufman, Department of Epidemiology (CB#7400), University of North Carolina School of Public Health, McGavran-Greenberg Hall, Pittsboro Road, Chapel Hill, NC 27599-7400, USA. E-mail: Jay_Kaufman@unc.edu
Maldonado and Greenland have provided a great service to our field in crafting this broadly accessible and eminently readable review of causal principles in epidemiological research.1 Attention to these issues yields substantial benefits in study design, analysis, and interpretation, and this new elucidation promises to raise the quality of epidemiological thought and practice widely, by introducing the concepts to a new generation of researchers and by clarifying them further for the rest of us. Indeed, it is fitting that this review should appear in this journal, as Greenland and Robins' seminal article on this topic appeared in these very pages in 1986.2 The authors have achieved an admirable level of clarity and simplicity in their presentation. Some of the devices used to obtain this conceptual simplicity, however, risk obscuring other important issues, and we comment on a few of these below. This is not to suggest that an alternative presentation would have been preferable, but merely to explore briefly a few of the many questions that the paper understandably sets aside.
The authors organize their presentation around an aggregate model, rather than the individual causal model that dominates elsewhere.2,3 While this choice leads most directly to the comprehension of epidemiological contrasts, it also sidesteps several considerations. To begin with, because causation ultimately operates at the individual level, an elucidation at that level, via potential-response variables, helps to demystify the 'black box' causal behaviour of a total population. Potential-response variables indicate, for each individual and for each exposure level under consideration, the disease response that the individual would have had under that exposure. With this framework in hand, one can attach a clearer meaning to the paper's stated assumption that disease occurrence is deterministic (ref. 1, p. 1036), namely that individual potential responses are fixed rather than random quantities. In other words, the characteristics identifying an individual are sufficient to determine uniquely that individual's response to any given level of exposure.
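As a minimal illustration of deterministic potential responses, the Python sketch below (ours, not the authors') represents each individual by two fixed responses for binary exposure and binary disease: `y1` if exposed and `y0` if not exposed. The class name `Individual` is hypothetical, and the four response-type labels follow the conventional typology for this setting. Because the responses are stored values rather than draws from a distribution, the same exposure always returns the same response, which is what the determinism assumption asserts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Hypothetical individual with fixed potential responses (1 = disease, 0 = no disease)."""
    y1: int  # response the individual would have if exposed
    y0: int  # response the individual would have if not exposed

    def response(self, exposed: bool) -> int:
        # Deterministic: the response is looked up, never sampled.
        return self.y1 if exposed else self.y0

    def response_type(self) -> str:
        # Conventional labels for the four possible (y1, y0) combinations.
        return {
            (1, 1): "doomed",      # disease whether exposed or not
            (1, 0): "causal",      # exposure causes disease
            (0, 1): "preventive",  # exposure prevents disease
            (0, 0): "immune",      # no disease whether exposed or not
        }[(self.y1, self.y0)]


population = [Individual(1, 1), Individual(1, 0), Individual(0, 1), Individual(0, 0), Individual(1, 0)]
for person in population:
    print(person.response_type(), person.response(True), person.response(False))
```

Under this representation, an individual's causal effect is simply the comparison of `y1` with `y0`, a quantity that is defined even though only one of the two responses can ever be observed.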
Following the authors, we consider the simplified case of only two levels of exposure, exposed and not exposed, and two levels of response, disease and no disease. If one were to assume, further, that exposure distributions 1 and 0 are 'everyone exposed' and 'everyone not exposed', respectively, then the 'numbers of new cases', A1 and A0, are easily seen to be the numbers of individuals in the target population whose potential response is disease if exposed and if not exposed, respectively. The authors actually consider more general exposure distributions, characterized by the per cent (or proportion) of the population subjected to each exposure level (allowing for possibly more than two levels). This generalization may be problematic in that the proportions alone are insufficient to determine the aggregate numbers A1 and A0 unless one makes the highly unrealistic assumption that all individuals have the same potential responses. There is, however, another way out of this impasse, which is to assume that the different exposure levels in a mixed exposure distribution are assigned by random partition of the target population into subsets of the appropriate size. One might consider such randomization to arise by design, as in experimental studies,4 or by nature, as in observational studies.5 A consequence is that A1 and A0 are then also random, and the quantities of interest become their expected values (ref. 1, p. 1036).
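The point about mixed exposure distributions can be checked on a toy example. The following Python sketch is our own illustration under stated assumptions: a hypothetical target population of six individuals with fixed (y1, y0) potential responses, a mixed distribution with half the population exposed, and a random partition in which every subset of the required size is equally likely. Within the sketch, A1 and A0 denote the case counts under 'everyone exposed' and 'everyone not exposed'; the function name `cases` is hypothetical. Enumerating all partitions shows that the same proportion exposed can yield different case counts, while the expected count under random partition reduces to (k/N)A1 + (1 - k/N)A0.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical target population: one (y1, y0) pair of fixed potential responses per person.
population = [(1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0)]
N = len(population)

# Exposure distributions 'everyone exposed' and 'everyone not exposed'.
A1 = sum(y1 for y1, _ in population)
A0 = sum(y0 for _, y0 in population)


def cases(exposed):
    """Number of new cases when exactly the individuals indexed by `exposed` are exposed."""
    return sum(y1 if i in exposed else y0 for i, (y1, y0) in enumerate(population))


# A mixed exposure distribution: k of the N individuals exposed (here 50%).
k = 3
counts = [cases(set(subset)) for subset in combinations(range(N), k)]

# The proportion alone does not fix the count: it depends on who is exposed.
print(min(counts), max(counts))  # prints: 1 4

# Under a random partition every subset of size k is equally likely, so the
# expected count is the average over subsets, i.e. (k/N)A1 + (1 - k/N)A0.
expected = Fraction(sum(counts), len(counts))
print(expected, Fraction(k, N) * A1 + Fraction(N - k, N) * A0)  # prints: 5/2 5/2
```

Exact fractions are used so that the agreement between the enumerated average and the closed-form expression is not blurred by floating-point rounding.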
Another important source of variability is the sampling of the study population from the target population. The authors clearly subsume this under the rubric of confounding, inasmuch as bias in the estimation of the causal effect due to sampling variability arises from use of a substitute population whose outcome experience does not precisely correspond to the outcome experience (actual and counterfactual) of the target population. While this approach has many advantages, it also risks some confusion. Later in the paper the authors state that 'various epidemiological study designs represent different ways of choosing substitutes and sampling subjects from target and substitutes into the study...' (ref. 1, p. 1036), re-establishing a distinction between the two concepts that they had just wed. Furthermore, many readers may understandably be uncomfortable with the resulting definition of a confounder as 'a variable that partly explains why confounding is present' (ref. 1, p. 1036), since they may attribute explanation to causal confounders only. If we subsume sampling variability under the general category of confounding, then we may indeed reduce confounding by conditioning on some covariates even when these covariates explain nothing, in that they are causally irrelevant to the etiological process linking the exposure of interest to disease.6 This discomfort may be heightened by the apparent inconsistency of taking expected values to deal with a stochastic potential-response model (ref. 1, p. 1036) or with random assignments in mixed exposure distributions (a necessary aspect of such distributions, as noted above), but not taking expected values when sampling variability is involved. Finally, we must accept that many other authors distinguish confounding from sampling variability. Stone, for example, asserts that confounding pertains to distributions in the total population from which the sample was taken, and that confounding is present only if there exist unmeasured covariates which affect outcome, and are not independent of exposure, conditional on the measured covariates.7 For the time being at least, these conflicting approaches are sure to generate continued confusion in our field, and in our interactions with statisticians and social scientists.
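The sense in which chance variability can masquerade as confounding may also be illustrated by enumeration. The sketch below is our own; for ease of enumeration it uses a random partition of a hypothetical six-person population into exposed and unexposed halves, rather than sampling from a larger target population, and the helper `mean` and all values are assumptions. The unexposed group serves as the substitute for the counterfactual unexposed experience of the exposed group. Averaged over all equally likely partitions the discrepancy is exactly zero, which is the expected-value view; any single realized partition, however, can leave a nonzero discrepancy, which is what one ends up calling confounding if chance variability is placed under that rubric.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of (y1, y0) fixed potential responses.
population = [(1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0)]
N, k = len(population), 3


def mean(values):
    """Exact arithmetic mean of a finite iterable of numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))


biases = []
for exposed in combinations(range(N), k):
    unexposed = [i for i in range(N) if i not in exposed]
    # Target quantity: the risk the exposed group would have had, had it not been exposed.
    counterfactual_risk = mean(population[i][1] for i in exposed)
    # Substitute: the actual risk observed in the unexposed group.
    substitute_risk = mean(population[i][1] for i in unexposed)
    biases.append(substitute_risk - counterfactual_risk)

print(mean(biases))              # prints: 0
print(min(biases), max(biases))  # prints: -2/3 2/3
```

Whether the realized discrepancy in a single study is labelled confounding or random error is precisely the terminological choice discussed above; the arithmetic is the same either way.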
As a final point, we note that the individual model of causation reinforces an appreciation of the implications of the choice of exposures to study and of the interpretation of their effect estimates. Causal inference is contingent on the manipulability of the exposure, which provides some plausible basis for accepting the substitute population as even remotely adequate for the estimation of the counterfactual quantity of interest. This is particularly relevant to social epidemiology, because when the exposure is an individual attribute, such as race or sex, any choice of substitute population can generally be rejected as grossly inadequate. For example, a team of epidemiologists recently claimed to have found evidence of racial differences in tumour virulence between black and white men with prostate cancer, based on an observational study of mortality in an equal-access medical care setting.8 The choice to locate the study in the 'equal access' setting was motivated by the desire to have the conditional (i.e. covariate-adjusted) mortality experience of white prostate cancer patients serve as a reasonable substitute for the counterfactual experience of black patients, had they been white. The authors' assertion that this study reveals some innate biological feature of black race rests on this premise. The discussion by Maldonado and Greenland helps to clarify exactly why we may be left perplexed by such an assertion. It requires not only that we imagine what it means for there to be such a counterfactual outcome distribution (i.e. the number of deaths that would have occurred among blacks, had they been white), but also that we accept that this quantity is reasonably estimated by the chosen substitute population, a particular group of white men. The approach appears to be quite problematic on both counts.9
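The mechanics of using a covariate-adjusted substitute can be sketched as indirect standardization: stratum-specific risks from white patients are applied to the stratum sizes of black patients to produce an 'expected' number of deaths. This is our illustration only; we do not claim it is the analysis used in the cited study, and the strata, counts, and dictionary names below are all hypothetical. The sketch is meant to show that the arithmetic always yields a number; whether that number may be read as the count of deaths 'had they been white' depends entirely on the premise questioned above, which no calculation can supply.

```python
from fractions import Fraction

# Hypothetical stratified data (strata of some prognostic covariate).
# These numbers are invented for illustration and are NOT taken from reference 8.
white = {"early": {"deaths": 30, "patients": 300}, "late": {"deaths": 40, "patients": 100}}
black = {"early": {"deaths": 15, "patients": 120}, "late": {"deaths": 35, "patients": 80}}

# Stratum-specific mortality risks in the substitute (white) population.
white_risk = {s: Fraction(d["deaths"], d["patients"]) for s, d in white.items()}

# Substitute for the counterfactual count: deaths expected among black patients
# if they experienced the white stratum-specific risks.
expected_deaths = sum(black[s]["patients"] * white_risk[s] for s in black)
observed_deaths = sum(d["deaths"] for d in black.values())

ratio = observed_deaths / expected_deaths  # standardized mortality ratio
print(expected_deaths, observed_deaths, ratio)  # prints: 44 50 25/22
```

A ratio above 1 here would conventionally be reported as excess mortality, but the interpretation as an effect of race is only as credible as the exchangeability of the substitute population.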
In closing, we express our congratulations to Maldonado and Greenland for this contribution to the literature. Awareness of the foundations of causal inference in epidemiology has increased in recent years, and this is due in large part to the diligent efforts of Sander Greenland, James Robins, and their students. The present paper serves to provoke further discussion and insight, and to instruct a wider audience of epidemiologists. Through this ongoing process we deepen our understanding, thereby improving our science and, thus, our capacity to intervene upon and improve human health.
References
1 Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol 2001;30:1035-42.
2 Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol 1986;15:433-39.
3 Greenland S. Interpretation and choice of effect measures in epidemiologic analysis. Am J Epidemiol 1987;125:761-68.
4 Copas JB. Randomization models for the matched and unmatched 2 x 2 tables. Biometrika 1973;60:467-76.
5 Robins JM. Confidence intervals for causal parameters. Stat Med 1988;7:773-85.
6 Robins JM, Morgenstern H. The foundations of confounding in epidemiology. Comput Math Appl 1987;14:869-916.
7 Stone R. The assumptions on which causal inferences rest. J R Stat Soc B 1993;55:455-66.
8 Robbins AS, Whittemore AS, van den Eeden SK. Race, prostate cancer survival, and membership in a large health maintenance organization. J Natl Cancer Inst 1998;90:986-90.
9 Kaufman JS, Cooper RS. Seeking causal explanations in social epidemiology. Am J Epidemiol 1999;150:113-20.