Commentary: Counterfactuals: help or hindrance?

AP Dawid

University College London, Department of Statistical Science, Gower Street, London WC1E 6BT, UK.

I welcome this attempt to clarify some of the often perplexing issues, both definitional and philosophical, underlying the formulation and estimation of causal quantities of interest. At the same time I am a little disappointed that the authors' case1 has not been made with deeper analysis and greater clarity. In particular, I believe that their emphasis on a counterfactual understanding of causality is mostly superfluous and, at some points, misleading. See Dawid2—henceforth CIWC—for a detailed account of this position, as well as some dissenting views.

In the terminology of their paper, Maldonado and Greenland are considering, as their experimental unit u, a specified population, in given circumstances, studied over a given etiologic time period (§1 of CIWC echoes the authors' valuable emphasis on the need for absolute clarity in the definitions and external referents of the theoretical terms employed). Their treatment t is the ‘exposure distribution’ applied to the population. Although Maldonado and Greenland insist on a clear definition of exposure at the individual level, at the desired population level this is less precisely specified (e.g. 20% of the population smoke); this could, and ideally should, be described in greater detail (exactly who smoked, and for how long).

However, it appears implicit in the authors' account that populations may be regarded as sufficiently large and homogeneous that such individual-level detail can be ‘averaged out’ over the population, so that we can neglect the effect, on the observed overall proportion, of sampling variability and other such phenomena. Such a ‘large population’ assumption must also underlie their working assumption that ‘response’, as measured by population proportion affected, is ‘deterministic’. It is not clear to me exactly what else is intended by this description. In particular, does it imply that, were we to study two different populations, we would expect to observe identical responses to the same exposure distribution?—the property termed ‘uniformity’ in CIWC. This property is a very strong one that I would not normally expect to hold, but it is at least empirically testable. When it does hold, we can find a perfect substitute population u0 for a given population u1. On applying, say, exposure distribution 1 to u1 and exposure distribution 0 to u0, we could then observe, in effect, both R1 and R0 (where, as in the paper, Ri, or more fully Ri(u1), denotes the disease frequency, if the target population u1 had experienced exposure distribution i)—and thus directly measure any causal contrast. So, when the above uniformity property can be taken to hold, ‘counterfactuals’ become observable and unproblematic.
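In symbols (a minimal restatement in the Ri(u) notation just introduced, not a formula taken from the authors' paper): uniformity asserts that Ri(u0) = Ri(u1) for each exposure distribution i, so in particular

\[
R_0(u_1) \;=\; R_0(u_0)
\qquad\text{and hence}\qquad
R_1(u_1) - R_0(u_1) \;=\; R_1(u_1) - R_0(u_0),
\]

where the right-hand side involves only quantities actually observed once exposure distribution 1 is applied to u1 and exposure distribution 0 to u0; the same substitution handles a ratio contrast.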

A weaker form of uniformity, which we may term ‘conditional uniformity’, might apply when there are covariates that affect response. Conditional uniformity asserts that, if two different populations have identical values for the covariates, and identical exposure distributions, then they will deliver identical responses—still a strong assumption, and again testable (for a specified set of covariates). When this holds, it is once again, in principle, possible to find a perfect substitute, by matching on all the relevant covariates. The authors appear to be mainly concerned with the practical difficulty that the chosen substitute u0 might not be perfectly matched, leading to R0(u0) ≠ R0(u1), and so typically ‘biasing’ the substitute causal contrast. This important point is well made and helpful, but its connection with ‘confounding’ as usually understood is far from clear.

Things become much murkier if no uniformity assumption can be made, since then no perfect substitute exists. Even if two populations u1 and u0 could be regarded as a priori exchangeable, so that R0(u0) and R0(u1) initially have the same distribution, R0(u0) may no longer be an appropriate (unbiased) substitute for R0(u1) after exposing population 1 to exposure distribution 1 and observing R1(u1), since that observation might carry some information about the level of immunity in population 1, so changing the distribution of R0(u1) but not that of R0(u0). As pointed out in §11 of CIWC, such effects are highly sensitive to untestable assumptions made about the joint distribution of [R1(u1), R0(u1)] (although bounds are available, which become tighter as the situation approaches that of uniformity).
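The flavour of the difficulty, and of such bounds, can be conveyed by a deliberately simplified individual-level analogue (my own illustration, not the population-level bounds of CIWC, §11). Suppose each individual has binary potential responses Y1 and Y0 under the two exposure conditions, with marginal frequencies p1 = P(Y1 = 1) and p0 = P(Y0 = 1). The proportion who would fall ill if exposed but not otherwise is then constrained by the marginals only through the Fréchet bounds

\[
\max(0,\; p_1 - p_0) \;\le\; P(Y_1 = 1,\, Y_0 = 0) \;\le\; \min(p_1,\; 1 - p_0),
\]

an interval that no amount of data on the marginals alone can narrow, and that shrinks to a point only in degenerate cases (p1 or p0 equal to 0 or 1). Untestable assumptions about the joint behaviour of (Y1, Y0) do all the remaining work.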

We can eliminate some of these difficulties by using data of the form R1(u1) and R0(u0) (for a number of distinct but exchangeable populations) to estimate the marginal probability distributions of each of R1(u*) and R0(u*) for some new, as yet unexposed, ‘test population’ u*, exchangeable with those studied. (Analogues of the authors' cautions against bias will continue to apply if the populations u1, u0 and u* are not perfectly exchangeable; but the analysis can then be modified accordingly, so as to take into account differences in observed [CIWC, §8] and/or unobserved [CIWC, §6] concomitant variables.) I argued in CIWC that comparison of these estimated marginal distributions for R1(u*) and R0(u*) is all that is required for causal inference about the effects of switching between treatments (exposure distributions) on the test population u*. Note particularly that, in contrast to inference about a causal contrast such as R1(u1)/R0(u1), such a comparison is not affected by untestable assumptions about the joint distribution of [R1(u1), R0(u1)]. So in this setting the assumption of coexisting potential responses is unnecessary, and can indeed be positively harmful. (The issue is not exactly that either response is ‘counterfactual’—before the exposure decision for u*, each of R1(u*) and R0(u*) can in principle still be observed, and so is ‘hypothetical’ rather than counterfactual; rather, the point is that, for any population u, we can never, even in principle, observe both R1(u) and R0(u) together—they are ‘complementary’—and so we can never learn about their dependence structure.)
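To make the decision-theoretic point concrete, the following toy simulation (entirely my own construction; the data-generating process, variable names and numbers are illustrative assumptions, not anything taken from the paper or from CIWC) estimates and compares the two marginal expectations for a new test population, even though no population is ever observed under both exposure distributions:

import numpy as np

rng = np.random.default_rng(0)

def simulate_population(rng):
    # Hypothetical data-generating process: a latent susceptibility s drives the
    # population disease frequency under either exposure distribution.
    s = rng.beta(2.0, 5.0)
    r1 = float(np.clip(s + 0.10 + rng.normal(0.0, 0.02), 0.0, 1.0))  # frequency under exposure distribution 1
    r0 = float(np.clip(s + rng.normal(0.0, 0.02), 0.0, 1.0))         # frequency under exposure distribution 0
    return r1, r0

# Many exchangeable populations, each observed under ONE exposure distribution,
# so only one member of the pair (R1, R0) is ever seen for any given population.
observed_r1, observed_r0 = [], []
for k in range(2000):
    r1, r0 = simulate_population(rng)
    if k % 2 == 0:
        observed_r1.append(r1)   # this population received exposure distribution 1
    else:
        observed_r0.append(r0)   # this population received exposure distribution 0

# For a new, exchangeable test population u*, estimate the two marginal
# (predictive) expectations and compare them; the joint distribution of
# (R1, R0) within any single population never enters the calculation.
e_r1 = float(np.mean(observed_r1))
e_r0 = float(np.mean(observed_r0))
print(f"estimated E[R1(u*)] = {e_r1:.3f}")
print(f"estimated E[R0(u*)] = {e_r0:.3f}")
print(f"estimated causal risk difference for u* = {e_r1 - e_r0:.3f}")

Any re-coupling of R1 and R0 within a population that preserved these two marginal laws would leave the printed comparison unchanged, which is precisely the sense in which the joint distribution is irrelevant here.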

I do not find the authors' treatment of ‘effect-measure modifier’ helpful, since it is phrased in terms of quantities I find I cannot meaningfully relate to. Their Pdoomed is very much a feature of the empirically unknowable joint distribution of [R1(u1), R0(u1)], and as such I regard it as pointlessly metaphysical. On purely commonsense grounds, at the individual level, response to either exposure will normally be dependent on a host of further stochastic factors, as well as on exposure, so that I find it difficult to accept that this individual response somehow already existed prior to its realization (an attitude I dubbed ‘fatalism’ in §7 of CIWC). How then can I compare the actual realized response with a counterfactual response under a different exposure, which, even if allowed as a proper subject of discourse, should still be regarded as stochastic? But if there is no predetermined value of this comparison, there can be no such thing as a ‘doomed’ patient—any more than there can be a penny that, when tossed, will land tails up.

It is significant that Pdoomed disappears in the expression for RDcausal, but not in that for RRcausal. This is a reflection of the fact that, in the terminology of §9 of CIWC, RDcausal is a ‘sheep’, having also a perfectly good non-counterfactual interpretation; while RRcausal is a ‘goat’, and simply not an appropriate subject of discourse.
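For readers who wish to see the cancellation explicitly, the standard response-type algebra runs as follows (this is my own sketch of what I take the authors' notation to mean, not a quotation from their paper, and for simplicity it takes exposure distribution 1 to expose everybody and exposure distribution 0 nobody). Writing Pdoomed, Pcausal, Ppreventive and Pimmune for the proportions of the target population who would fall ill under both exposure conditions, only if exposed, only if unexposed, or under neither, we have

\[
R_1 = P_{\text{doomed}} + P_{\text{causal}},
\qquad
R_0 = P_{\text{doomed}} + P_{\text{preventive}},
\]

and hence

\[
RD_{\text{causal}} = R_1 - R_0 = P_{\text{causal}} - P_{\text{preventive}},
\qquad
RR_{\text{causal}} = \frac{R_1}{R_0} = \frac{P_{\text{doomed}} + P_{\text{causal}}}{P_{\text{doomed}} + P_{\text{preventive}}}.
\]

The doomed proportion cancels from the difference but not from the ratio.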

In CIWC I emphasized the importance of the distinction between inference about ‘Effects of Causes’, referring to predictions about a new population u* under various hypothetical exposure distributions; and ‘Causes of Effects’, referring to a comparison of an observed R1(u1), for a population u1 already subjected to exposure distribution 1, with R0(u1), the purely counterfactual response it would have displayed had it actually been subjected to exposure distribution 0. An application of inference about Causes of Effects might arise in a legal liability suit, in which an ex-soldier sues the army for having caused his leukaemia through exposing him to depleted uranium contained in anti-tank shells. Epidemiological evidence about the expected consequences of such exposure, and about the natural incidence of leukaemia, would clearly be of relevance; but since such evidence can only directly address the question of Effects of Causes, its correct incorporation and analysis in this context raises some very subtle issues—for example, how to allow for the fact that some individuals might be more susceptible than others, irrespective of exposure?

I argued in CIWC that inference about Effects of Causes is reasonably straightforward, and does not require any recourse to counterfactuals; while inference about Causes of Effects is beset by ambiguities that are compounded, rather than resolved, by being set in a counterfactual framework. Although it is not always clear from the way they are phrased, the problems considered in the paper currently under discussion are largely concerned with the simpler problem of Effects of Causes—where I do not see a counterfactual analysis contributing much beyond unnecessary complication of concepts and notation. Moreover, there is a danger that readers may be misled into thinking that the paper supplies tools for valid analysis of problems involving Causes of Effects. Caveat emptor!

In summary, while I value the authors' emphasis on clarity of definition and their discussion of the problem of bias, I do not consider that they have proved their case that ‘counterfactual analysis can cut through some of the fog in epidemiology’. In my own view, such analysis is more likely to obscure the clarity of the view.

References

1 Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol 2001;30:1035–42.

2 Dawid AP. Causal inference without counterfactuals (with Discussion). J Am Statist Assoc 2000;95:407–48.