Interval estimation by simulation as an alternative to and extension of confidence intervals

Sander Greenland

Departments of Epidemiology and Statistics, University of California, Los Angeles, Los Angeles, CA 90095–1772, USA. E-mail: lesdomes{at}ucla.edu


    Abstract
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
There are numerous techniques for constructing confidence intervals, most of which are unavailable in standard software. Modern computing power allows one to replace these techniques with relatively simple, general simulation methods. These methods extend easily to incorporate sources of uncertainty beyond random error. The simulation concepts are explained in an example of estimating a population attributable fraction, a problem for which analytical formulas can be quite unwieldy. First, simulation of conventional intervals is illustrated and compared to bootstrapping. The simulation is then extended to include sampling of bias parameters from prior distributions. It is argued that the use of almost any neutral or survey-based prior that allows non-zero values for bias parameters will produce an interval estimate less misleading than a conventional confidence interval. Along with simplicity and generality, the ease with which simulation can incorporate these priors is a key advantage over conventional methods.


Keywords Attributable fraction, attributable risk, Bayesian methods, bias, bootstrapping, confidence intervals, confounding, meta-analysis, Monte Carlo methods, relative risk, risk analysis, simulation

Accepted 2 June 2004

Adequate reporting of epidemiological results requires interval estimates, and to this end there are numerous techniques for constructing confidence intervals. The variety is compounded by having separate techniques for each combination of effect measure (risk, rate, and odds ratios, risk and rate differences, attributable fractions, etc.), exposure and disease (binary, polytomous, etc.), estimation method or weighting (maximum likelihood, Mantel-Haenszel, standardized, etc.), and study design (cohort, case-control, cross-sectional, etc.). While the variety may appear daunting, just a few principles underpin the techniques. Furthermore, modern computing power allows one to replace these techniques with relatively simple, general simulation methods that require only standard software packages. The methods also provide P-values.

Simulation methods (also known as Monte Carlo methods) extend easily to incorporate sources of uncertainty beyond random error, such as hypotheses about biases. To fix ideas I will focus on estimating attributable fractions, which illustrates the simplicity of simulation methods relative to traditional analytical formulas. In the example (which has been used in policy discussions regarding power-system modifications), refinements in accounting for random error make little difference, which should be unsurprising given the large numbers involved. Large numbers can create an illusion of precision, however, because of the small standard errors they produce. Yet uncertainties about inestimable biases (such as that caused by differential subject co-operation) are not reduced by large numbers, and thus become relatively more important as the numbers grow larger. Simulation methods provide a straightforward extension of conventional confidence intervals to intervals that account for proposed distributions for bias sources.


    Confidence intervals for an adjusted attributable fraction
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
The health effects of magnetic fields remain controversial, but there is a consistent association of residential fields above 3 milliGauss (3 mG, equal to 0.3 microtesla) with childhood leukaemia. For example, one pooled analysis produced an age–sex–study adjusted Mantel-Haenszel odds ratio across 11 case-control studies of 1.68, with 95% confidence limits (CL) of 1.23, 2.29, and no pattern of variation in results across location or design of study.1 Nearly the same result obtains when further studies are added.2 Details of the analyses and the studies have been given elsewhere and will not be repeated here, as only the summary statistics will be needed.

The positive summary results have led to demands for remediation by citizens groups and some health officials in the US. Remediation can be costly, however, and it has been asked what degree of incidence reduction could be expected even if the observed associations were causal and all exposure above 3 mG could be removed. One answer to this question is provided by estimating the population attributable fraction for the exposure. This task requires an estimate of the exposure prevalence in the target population. A survey of magnetic fields in 987 US homes produced an estimate of 4.6% (45 homes) with 95% limits of 3.5%, 6.6% for the prevalence of fields above 3 mG in the US (High Voltage Transmission Research Center, 1993). The survey was within utilities serving over half the US population, and similar results would be expected for the remainder of the US and also for Canada (given the similarities among their interconnected power systems). The prevalence information can be combined with a risk ratio estimate to get an attributable fraction estimate for the US + Canadian population (about 300 million people served by 120v 60 Hz power).

Adjusted attributable fractions
It is common practice for authors to take a prevalence point estimate or guess and an adjusted risk ratio estimate and limits and plug them into the Levin's3 unadjusted (crude) attributable fraction formula. To examine this and other procedures, let RRa be the adjusted risk ratio estimate, let P0 be the estimated exposure prevalence in the target population, let O0 be the estimated prevalence odds P0/(1 – P0), and let AFp be the estimated attributable fraction. Levin's formula is then

(1)
which assumes that RRa estimates the effect in the target population and there is no confounding in the target population; that is, it assumes there is no bias, and no change in the ratio effect moving from the study to the target population. Due to leukaemia rarity, we can ignore the distinction among odds, rate, and risk ratios. Hence, in the example, O0 = 45/942 = 0.0478 and RRa = 1.68 with 95% limits of 1.23, 2.29, so formula (1) yields AFp = 0.68/(1.68 + 1/0.478) = 3.0% and naïve 'plug-in' 95% limits of 0.23/(1.23 + 1/0.0478) = 1.0%, 1.29/(2.29 + 1/0.0478) = 5.6%.

Wald and simulated Wald intervals
First consider the problem of incorporating uncertainty about the prevalence odds O0 into the interval estimate when one has exposure survey data from the target, as in the present example. Analytical formulas can incorporate survey data directly into an attributable fraction estimate,4 but the formulas can be complex and are not in packaged programs. Among the alternatives are simulation methods (such as basic bootstrapping) that reduce interval estimation to repeated random sampling and point estimation. If both RRa and O0 are based on large numbers of exposed and unexposed cases and non-cases, one can use a rapid, simple simulation based on the same normal (Gaussian) approximations used to construct ordinary confidence intervals. All one needs are valid point estimate formulas and a normal random number generator (which is available in major software).

In the present example, the estimated sampling distribution of ln(RRa) is approximately normal with estimated mean ln(1.68) = 0.519 and standard deviation SDR = ln(2.29/1.23)/2 (1.96) = 0.159, while that of ln(O0) is approximately normal with estimated mean ln(45/942) = –3.041 and standard deviation SD0 = (1/45 + 1/942)1/2 = 0.153. These estimates were used to construct the confidence limits for the risk ratio and the prevalence odds given above, by the Wald method (taking the estimated mean ±1.96 standard deviations). They can also be used to construct Wald limits for the attributable fraction. First, one estimates an approximate standard deviation SDLp for the complementary log transformation Lp {equiv} ln(1 – AFp):4,5

where T is the survey sample size (T = 987). Here, Lp = ln(1 – 0.0301) = –0.0305 and SDLp = 0.0127, so the Wald AF limits are 1 – exp{–0.0305 ±1.96(0.0127)} = 0.6%, 5.4%.

The standard deviation formula for Lp is unwieldy and becomes even more complex when RRa or O0 are derived from a regression model. Fortunately, we can simulate rather than compute SDLp. All one needs is a normal random number generator and some transformations using natural logs and exponentials. We first generate N versions of the complementary log attributable fraction estimate, with version i (i = 1, ..., N) computed by the following steps:

  1. draw two standard normal numbers zai and z0i and convert them into simulated random errors eai = 0.159zai and e0i = 0.153z0i for ln(RRa) and ln(O0);
  2. transform these errors into 'corrected' estimates RRai = exp(0.519 – eai) and O0i = exp(–3.041 – e0i);
  3. transform RRai and O0i into a simulated estimate AFpi using formula (1);
  4. transform the AFpi to Lpi = ln(1 – AFpi).
Having gone through this cycle N times, we compute the standard deviation SDLs of the Lpi, then transform back into Wald AF limits via 1 – exp(Lp ± 1.96SDLs). N can be made so large that simulation error in SDLs is negligible. Using N = 250 000 in the example, the process runs in about a second on a 2 GHz laptop computer, and yields 0.5%, 5.5% for the 95% limits, very close to the limits based on SDLp. For comparison, the 2.5th and 97.5th percentiles of the simulated AFpi are 1.0% and 5.9%; I shall refer to these figures as the naïve percentiles.


    Some ideas from bootstrapping
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
Step (1) above uses familiar normal approximations to the distributions of ln(RRa) and ln(O0) and the final interval involves another normal approximation for Lp. Various refinements provided by bootstrap theory can avoid these approximations and improve the small sample behaviour of confidence intervals.6,7 This topic is vast; I here sketch only a few points that are not well illustrated by the example (due to the large sample size) but can be relevant in other settings. This section may be skipped without loss of continuity.

Resampling
Resampling involves repeated generation of a new data set instead of normal numbers at step (1) of each simulation cycle. One resampling approach replaces step (1) with

(1a) regenerate each case-control data set by randomly sampling (with replacement) from each set the number of cases and controls observed, and regenerate the survey data set by randomly sampling from the survey data;
(1b) recompute the estimates from these resamplings to get RRai* and O0i*;
(1c) estimate the random errors on the log scale as eai = ln(RRai*) – ln(RRa) and e0i = ln(O0i*) – ln(O0).
Steps (2)–(4) may then remain the same as above. A problem with the simple resampling in step (1a), however, is that data cells that are zero remain so upon resampling. This problem can be avoided by smoothing the observed counts before resampling. Adding 1/2 to each cell is a naive version of smoothing but is unrealistic because it treats all categories as equally probable (in the example, exposed categories are far less probable than unexposed). More accurate smoothing methods average the observed counts with those expected under a reasonable statistical model, such as the counts expected under the common odds ratio model.8,9 Another approach is to regenerate the data at each cycle from the fitted models used for the original point estimates (parametric bootstrapping), rather than resampling the observed data.

A disadvantage of regenerating the data is that if RRa or Oa is derived from a fitted model, one must refit the entire model to each data set, which can increase the computing time dramatically. Furthermore, convergence problems are likely to arise in some resamples, although these can be avoided by smoothing both the original and the resampled data. An advantage of resampling, however, is that it allows one to simulate the entire modelling process. For example, conventional methods applied after variable selection algorithms (such as stepwise regression) produce overly small P-values and overly narrow confidence intervals because they do not account for the selection;10 by applying the selection algorithm to each resample, one can mitigate this problem.

Percentile methods
By using percentiles of the simulated distribution one can avoid the normal approximation to Lp and the need to compute a standard deviation for it. Using the naïve percentiles of 1.0% and 5.9% given above as confidence limits would be an example. When coupled with resampling, however, care must be taken to use percentiles based on correction of estimates for random error, as in step (2) above. Unfortunately, one often sees uncorrected attributable fraction estimates computed directly from the resampled data, and percentiles of these estimates used as confidence limits (e.g. Kooperberg and Pettiti).11 In the present example this use would be equivalent to computing the AFpi directly from RRai* and O0i* in step (1b). Such direct use of simulated estimates is conceptually backward: The uncorrected simulated estimates ln(RRai*) and ln(O0i*) equal ln(RRa) + eai and ln(O0) + e0i ; hence, they incorporate the small sample bias and skewness of the original estimators ln(RRa) and ln(O0), plus they add in similar layers of bias and skewness in eai and e0i. To avoid this problem one may flip (pivot) the simulated errors around the original estimates by subtracting those errors from ln(RRa) and ln(O0), as in step (2) above, so that the bias and skewness in the simulated errors eai and e0i will tend to cancel the errors in the original estimators.7,12 This step will be essential if the random errors have nonzero mean or are asymmetric (skewed), as is often the case for epidemiological estimates.13

Bootstrap intervals can be refined in other ways that are especially valuable if strategies such as model selection are simulated.6,7 Nonetheless, these refinements require recomputation of variance estimates at each cycle as well as other complications, and so remove the simplicity advantage of simulation. Furthermore, if the numbers involved are so small that these refinements make a difference comparable to other sources of uncertainty, the total uncertainty is likely to be so large that little should be inferred from the data.


    Adding sources of uncertainty beyond random error
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
Sophisticated epidemiologists understand that confidence limits reflect only uncertainty due to random error, and thus may reflect only a fraction of the actual uncertainty one should have about effects. In formula (1), for example, RRa and O0 refer to the target population (US + Canada), not to the study populations (which are smaller populations from around the world). Nonetheless, like all textbook formulas, formula (1) assumes RRa applies to the target as well as to the study population, and that O0 is known or can be estimated directly from the study data that produced RRa.(ref. 10, ch. 16) Plugging the RRa limits into the above formulas neglects uncertainties about the target population relative to the study populations. This results in 95% intervals that are too narrow to contain the target parameter with 95% probability (i.e. invalid intervals).

Another source of uncertainty is uncontrolled bias. When bias is present, conventional confidence intervals and significance tests will not be valid. Because some degree of bias can be expected in all epidemiological studies, adherence to conventional intervals and tests seems a dubious if not deceptive practice, especially in very large studies or in meta-analyses (which tend to produce small standard errors).

Recently, techniques from the policy literature14–16 have been brought into epidemiology in order to construct intervals that account for uncertainty sources beyond random error, such as confounding, selection bias, and misclassification.2,17–21 These techniques (sometimes called Monte Carlo risk analysis or Monte Carlo sensitivity analysis) produce intervals that tend to be wider and may be shifted relative to ordinary interval estimates, and as a result may be expected to have a higher probability of containing the true effect. The techniques provide a natural bridge from conventional (random error only) analyses to advanced uncertainty analyses based on Bayesian techniques.2,20,22–26 They will be illustrated here for what is thought a minor problem in the example, uncontrolled confounding, and then for what is thought a major problem, response bias.

The techniques can be applied to estimate any epidemiological measure (Appendix 1), and have been illustrated in various forms for estimation of relative risks.2,18–21 The case of the attributable fraction is more complex in two respects: first, it involves combining two epidemiological estimates, one for the risk ratio and one for the exposure prevalence; second, it involves (or should involve) explicit projection from study populations to a target population.

Adding uncertainty about age–sex confounding in the target
Let RRt be the unconfounded risk ratio representing the true effect of high fields, which for now I will assume is the same in the target and all the study populations, and let A be the set of factors used for adjustment in RRa; A comprises age and sex in the example. Even if RRa validly estimates RRt (i.e. no bias in RRa from the original studies), formula (1) remains biased if confounding by A is present in the target population.10,27 To deal with this problem, let P1 be the estimated exposure prevalence among the cases in the target population, let RRu be the unadjusted (crude) risk ratio in the target population, and let Cu be the relative impact that adjustment for A would have on the crude target population risk ratio. If Cu is validly estimated by RRa/RRu, two formulas that properly adjust AFp for A are

(2)

(3)
(Appendix 2). Formula (1) is just the special case of these formulas when there is no confounding by A in the target (so that Cu = 1); if this condition is implausible we should prefer formulas (2) and (3).

The vast majority of adjusted attributable fraction methods use formula (2) or generalizations, with P1 taken from the study data.(e.g. refs 11,28–30) Such usage assumes the source and target populations have identical case exposure prevalences. This assumption is usually not justified. In the example, we cannot validly estimate P1 from the study data because the case series were international, and even the US/Canadian cases were derived from special populations restricted in ways that may be related to exposure prevalence. Nor can we validly estimate Cu by comparing the unadjusted and adjusted risk ratio estimates in the study data, because most of those data were age-sex matched, and matching alters the crude odds ratio (i.e. we could not validly estimate RRu from the matched studies, even if the target and study-source populations were identical).

The intervals based on formula (1) tend to be too narrow because they do not account for our uncertainty about P1 or Cu. One might approach the problem by a sensitivity analysis,10 in which for example the factor Cu in formula (3) is varied over plausible values. This approach can be unwieldy, however, because each choice for the factor generates a new estimate and limits. It can also be misleading, insofar as the display effectively gives equal weight to each value of Cu, even though the values vary greatly in plausibility.20,31

Another way to deal with the problem is to attempt to summarize vague background information in a prior distribution for Cu. In the present example, age and sex are thought unlikely to be confounders because they appear to be unassociated with residential magnetic fields (most of the children are below school age, and the sex distribution of children is uniformly about 1/2 male).1 Thus, suppose we summarize our opinion about age and sex confounding with a prior for ln(Cu) that is normal with mean zero and standard deviation ln(1.05) = 0.0488, corresponding to 95% prior certainty that Cu is between 0.91 and 1.10 (roughly 20:1 odds that age–sex confounding in the target is under 10%). We may incorporate this prior into our interval estimate as follows: at step (1), draw a third random normal (0,1) number zui; at step (2) compute Cui = exp(0.0488zui); at step (3) use Cui along with RRai and O0i in formula (3) to compute AFpi, rather than formula (1). With these modifications, the simulated percentiles remain 1.0%, 5.9%. Thus, use of the given prior in formula (3), rather than formula (1), leads to a miniscule change in the simulated percentiles.

Recognizing that the simulation percentiles are in part based on sampling from an informative prior distribution for an unknown parameter, Cu, they have been called Monte Carlo uncertainty limits. They can be viewed as approximate Bayesian posterior probability limits under the specified prior for Cu, given independent and very diffuse priors for RRa and O0.2,20,26 Given these priors, other features of the simulated distribution of the AFpi also have approximate Bayesian interpretations. For example, the percentage of AFpi below zero (which is only 0.05% in the above simulation) approximates the posterior probability that the net exposure effect is beneficial. I will discuss this shift in interpretation below.

Adding uncertainty about uncontrolled confounding in the studies
The preceding extension addresses only the confounding by the study adjustment factors A (age and sex) left by using formula (1) rather than the more general formulas (2) or (3). Because our prior stated that age-sex confounding in the target population was probably small, accounting for it made negligible difference in the attributable fraction. Nonetheless, because RRa is adjusted for age and sex only, there is potential for confounding by other, possibly unknown factors.

Let Cr be the residual confounding that would be left after age–sex adjustment of the target population risk ratio. If the latter is validly estimated by the age–sex adjusted RRa from the studies, Cr would be validly estimated by RRa/RRt, but of course RRt is unknown. Because RRa/Cr estimates RRt, however, if we were given Cr we could use

(4)
(Appendix 2). The major problem is that Cr is also unknown. Nonetheless, there are studies of the relation of fields to potential confounders, (e.g. ref 32) and formulas that give Cr as a function of these relations and possible confounder effects. These formulas have been used to do sensitivity analyses,33 although again such analyses can be misleading.20,31

One may instead develop a distribution for Cr based on the same studies and formulas used to construct the sensitivity analysis, then at each resampling draw Cr from that distribution and use it in computing the resampled estimate AFpi. Assume the following simplifications hold:

  1. The residual confounding Cr could be removed by stratification on some unknown binary confounder summary B, so that adjustment by A (age and sex, the factors used for adjustment) and B would yield a valid estimate of RRt. In other words, the A–B adjusted estimate RRab would validly estimate RRt and hence Cr could be estimated by RRa/RRab.
  2. Adjustment by B alone would in expectation change the odds ratio by the same amount as adjustment for B after adjustment for A (order of adjustment does not matter).
These are plausible approximations for the present example.20 Let ORdb be the leukaemia-B odds ratio given exposure, let OReb be the exposure-B odds ratio given leukaemia, and let Ob be the odds of B = 1 among unexposed non-cases. Then34

For common outcomes one may use an analogous risk ratio formula.35

From the Bracken et al.32 data I proposed a prior for ln(OReb) that was normal with mean zero and standard deviation ln(6)/1.645 = 1.089, which places 90% probability for OReb between 1/6 and 6.20 Based on current confounding hypotheses36 I also used the same prior for ln(ORdb). Finally, I used a normal prior for ln(Ob) with mean zero and standard deviation 2, which produces a roughly uniform prior for the prevalence of B = 1. The simulation is then modified as follows: At step (1), draw three more random normal (0,1) numbers zdbi, zebi, zbi; at step (2) compute ORdbi = exp(1.089zdbi), ORebi = exp (1.089zebi), Obi = exp(2zbi), then compute Cri from these parameter values; and at step (3) use formula (4) to get AFpi.

Given the above assumptions, Cu could still be estimated by RRa/RRu, although uncertainties about possible divergence in the degree of age–sex confounding between the target and study suggest that the Cu prior should be widened. There is no reason to expect this divergence to be large, however, since most of the exposed study subjects contributing to RRa are from population-based US and Canadian studies. Thus I will only modestly increase the prior standard deviation for ln(Cu), to ln(1.10) = 0.0953, which assigns 95% prior probability to Cu being between 0.83 and 1.21 (roughly 20:1 odds that RRu and RRa are within 20% of each other). This leads to modifying step (2) by computing Cui = exp(0.0953zui).

Adding in Cr and expanding the prior for Cu as just described, the simulation yields 2.5th and 97.5th percentiles of 0.4%, 6.3%, with 1.7% of the AFpi below zero; nearly all of the change from the earlier results is due to adding Cr to the simulation. For illustration, Cu and Cr were handled differently: Cu was given a direct normal prior, whereas Cr was decomposed into constituent bias parameters ORdb, OReb, and Ob which were assigned priors, and the Cri were computed using parameter values sampled from those priors. One could develop priors for constituents of Cu and then compute the Cui from parameter samples, as was done for the Cri. Conversely, if one had direct information on the size of Cr, one could develop a direct prior for Cr and sample from that, instead of computing Cri from constituent parameters. The choice depends on what information is available to assign priors; here, using constituents of Cr allowed a prior for OReb based on survey data.32,37 One can also combine direct and constituent information, although that involves more complicated formulas than presented here. Finally, the bias parameters in Cr may be allowed to vary across studies or strata.2,20

Refusal bias
Refusal (response) bias has been a major issue in studies that require subject co-operation and effort (which is most of the studies). Suppose that, in addition to residual confounding Cr, the estimate of RRa is biased by factor Sr due to differential refusal (refusal rates varying by exposure and disease), so that If Sr were known, we would divide it into RRa to remove the refusal bias. For formula (4) this yields

(5)
Sr is unknown; using data from the largest study, however, Hatch et al.38 examined variations in estimates based on inclusion or exclusion of partially co-operative subjects and found some evidence of upward bias. Based on these results and others39 I proposed a prior distribution for ln(Sr) among studies requiring co-operation, with mean zero and standard deviation ln(1.5)/ 1.645 = 0.246, corresponding to 90% prior probability that Sr is between 2/3 and 1.5.20 For simplicity, I will apply this prior to the summary estimate rather than only to the studies requiring co-operation. Because the latter studies contribute about 90% of exposed cases, this simplification makes little difference. Sr can however be allowed to vary across studies or strata.20

To incorporate the ln(Sr) prior into the simulation, at step (1) draw another standard normal number zsi, at step (2) compute Sri = exp(0.246zsi), and at steps (3) and (4) plug this into formula (5) along with the other generated quantities, to get AFpi. The 2.5th and 97.5th simulation percentiles that result from adding this bias factor to all the other modifications are –0.7% and 8.8%, with 6.0% of the AFpi below zero (implying that values near or below the null are no longer very improbable). For comparison, Table 1 summarizes the simulations given thus far. The initial refinement of moving from formula (1) to (3) had no impact, but the subsequent accountings for uncontrolled confounding and for selection bias expanded the simulation distribution, reflecting the uncertainty about these potential biases implied by the priors.


View this table:
[in this window]
[in a new window]
 
Table 1 Estimating the population attributable fraction (AFp) for the effect of magnetic-field exposure above 3 mG on childhood leukaemia in the US and Canada, under various bias priors: Summary of results from 250 000 simulation trials (AFpi for i = 1 to 250 000; see text for details)

 
Further extensions
One can include factors for other sources of uncertainty. For example, the factor Sr could be expanded to include other forms of selection bias beyond refusal bias. Instead of assuming the survey and target population prevalences are the same, one could introduce a factor C0 for the ratio of the prevalence odds O0s in the surveyed population to the prevalence odds O0t in the target population, so that O0t = O0s/C0. Refusal bias or other selection bias in the survey could be included as a factor S0 that multiplies survey prevalence odds O0s to yield the observed prevalence odds O0; S0 is the ratio of the net selection rates (after sampling and refusals) for the exposed and unexposed in the survey, and the selection-corrected survey odds is O0/S0 = O0s. Including both bias sources would result in replacing O0 in formula (5) with O0t = O0/C0S0. Possible modification of the true (causal) risk ratio between the study and target population could be represented by a factor Cm for the ratio of the true study and target risk ratios, which would result in replacing RRa in formula (5) with RRa/Cm. With all three factors formula (5) becomes

(6)
To use formula (6) one could develop prior distributions for each of the new factors and sample from them in the simulation, as done for the other factors. I will not illustrate these steps because none of the additional factors is thought important in the present example (e.g. measurements were obtained on 99% of the homes sampled for the survey, so refusal bias cannot be large in the survey).

Misclassification (and more generally, measurement error) is often the largest source of uncertainty. Unfortunately, because it depends on disease-specific and study-specific prevalence, the general form of misclassification bias cannot be represented as a simple multiplier of the original estimates and the other bias factors.10,25 For a more thorough analysis of the risk ratio estimate from the above data, which includes misclassification bias, further studies, and statistical theory for the methods, see Greenland.2 The simplified analysis presented here captures the fact that one should not be too sure intervention on magnetic fields would yield any benefit, and that any benefit is likely to be small in population terms. Appendix 1 presents some statistical theory for simulation estimation of epidemiological measures from general risk regression and survey models.

Estimating exposure prevalence in the absence of target population data
In the example, survey data are available for conventional estimation of O0 (target population prevalence odds). If instead we had survey estimate of P1 (exposure prevalence in target population cases) it could be used in a simulation which at step (1) generates a normal random number z1i, at step (2) computes P1i from the estimate of logit(P1) plus its standard deviation times z1i, and at step (3) plugs P1i along with RRai into formula (2) for AFp. Unfortunately, there is rarely a target survey estimate of P1, and there is usually no survey estimate of O0. Often however there is some background information on which to base a prior distribution for O0, and this distribution can be used in place of the estimated sampling distribution when generating O0i in bootstrap step (2).

With no survey data, it is tempting to use the cases to estimate P1, or else use the controls (or, in a cohort study, the study cohort) to estimate O0. Both approaches can suffer bias because the source population for a study is rarely representative of the intended target population, and may be an entirely separate population. Use of study data to estimate exposure prevalence would thus best be accompanied by inclusion of a factor for non-representativeness, like C0 above.

A more technical problem arises from the use the same subjects to estimate the exposure prevalence and the risk ratio. Computation of O0 and RRa from the same controls creates a negative correlation of O0 and RRa, which if not accounted for in resampling the estimates will lead to intervals that are too wide. On the other hand, computation of P1 and RRa from the same cases creates a positive correlation of P1 and RRa, which if not accounted for in resampling the estimates will lead to intervals that are too narrow. These correlations contribute to the complexity of valid analytic formulas for AFp confidence intervals. Nonetheless, resampling the data and recomputing the estimates at each resampling (rather than resampling the estimates) will automatically account for these correlations.

Use of controls to estimate O0 can suffer another bias when (as in the example) the studies are matched. To the extent that the matching factors are related to the exposure, matching shifts the control exposure prevalence from that in the source population toward that in the cases. If exposure is causal, the case prevalence will be higher than the population prevalence, and so the matching will make the exposure look more prevalent than it is (i.e. it will inflate the O0 estimate) and hence upwardly bias the AFp estimate from formulas (1) or (3). If exposure is preventive, the case prevalence will be higher than the population prevalence, and so matching will make the exposure look less prevalent than it is (i.e., it will deflate the O0 estimate) and hence downwardly bias the AFp estimate from formulas (1) or (3). Given the distribution of the matching factors in the target (e.g. the age–sex distribution), one can reduce or avoid these biases by standardizing the estimated prevalence proportion P0 to this distribution and the using the result to compute O0 = P0/(1 – P0).


    Discussion
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
Are conventional statistics anything other than misleading?
One conclusion from the above analyses above is that, under the given prior distributions for the sources of bias (bias parameters), we should be much less than 95% certain that the true attributable fraction falls between the original confidence limits of 1.0%, 5.6%. In the final simulation above, 16% of estimates fall below 1.0% and 16% fall above 5.6%, leaving a mere 68% within the original limits; on the ln(RR) scale, the original 95% interval is only half the width of the final simulation interval.

Formally, an interval estimation method provides a valid 95% confidence interval for a causal effect if the true effect is within at least 95% of the intervals computed by the method when exposure is randomized and the data are randomly sampled from the target (possibly within strata of controlled factors). This concept, like all frequentist concepts, involves an enormous set of counterfactuals—it is based on what would happen under each possible exposure allocation and sampling outcome, and the probability of each allocation and sampling combination. Only one of these allocation-sampling combinations gets observed; its relation to the unobserved combinations is then evaluated using the probabilities. A study design with randomized allocation and sampling provides known probabilities for evaluating the place of the single observed combination among all possibilities. In an observational study, however, the probabilities of the combinations are unknown. Because the probabilities depend on the actual (and highly non-random) allocation and sampling mechanisms, a credible statistical evaluation of the observed results must take account of major influences on those mechanisms.

Without successful randomization and random sampling, the grounds for validity of conventional estimation and testing methods are lost, and we have only speculative ideas about the distribution of all allocation-sampling combinations. It is indefensible that conventional methods get treated as 'objective' even though there is no objective way to determine what confidence or certainty we should place in them. Realistic determinations require consideration of possible bias sources within studies, as well as publication biases and differences between study and target populations. Methodological problems will substantially reduce below 95% the coverage of conventional intervals, to an unknown degree; our confidence in those intervals should be reduced accordingly. Experts will disagree, however, on just how much that reduction should be.

Another interpretation of the conventional intervals is that they give some idea of the sensitivity of estimates to small shifts of numbers in the data. While this may be a defensible heuristic, it is not foolproof: Anyone familiar with influence analysis and leverage artefacts will recognize that this interpretation can mislead when the data are sparse in crucial sectors, even if the total sample size is large. Inferential statistics are no substitute for direct inspection of the basic data. Introduction of explicit priors does further remove statistics from contact with the data at hand, but if those priors are well informed they will improve contact with other valid observations and so bring intervals closer to the parameter of interest. The latter should be a primary goal of estimation. Contact with the data is best maintained by presenting transparent descriptive statistics—tables, plots, etc.—rather than a pointless clutter of P-values for irrelevant comparisons (e.g. comparisons of age distributions across exposure or disease levels).

Simulation methods can be used to generate conventional confidence intervals and P-values, as well intervals that go beyond these statistics. To avoid the labour of moving into this beyond, it is sometimes asserted that conventional intervals, unsullied by a prior distribution, still represent the potential random error in the results. Unfortunately, this assertion is almost never supported by any evidence that the probability distributions used to derive the intervals (hypergeometric, binomial, Poisson, etc.) correctly represent the random errors. In the absence of randomization and purely random sampling, those distributions are hypothetical, much like the priors introduced above.40,41 In fact, from a Bayesian perspective they are priors: specifically, they are prior distributions for how the data will appear, given the parameters.42 Furthermore, conventional intervals approximate posterior probability intervals under a prior in which all the parameters that govern the bias—ln(Cu), ln(Cr), ln(Sr), etc.—are set to zero or result in biases that cancel each other perfectly. In other words, conventional confidence limits correspond to posterior (Bayesian) limits under the absurd fantasy that the data are from a perfect randomized trial conducted on a random sample of the target population, and that any deviations from this ideal are inconsequential. Thus, almost any neutral or survey-based prior that allows non-zero as well as zero values for bias parameters will produce an interval less misleading than a conventional confidence interval.

Nonetheless, it may be fair to treat a conventional confidence interval as a summary of the uncertainty allowed for random error—provided one is clear that this uncertainty refers not to the effect of interest, but to uncertainty about the biased association that the conventional interval is estimating, under a hypothetical random-error model. When wide it will also caution savvy users against reaching a conclusion (other than 'more research is needed').43 Of course, foolish users will take wide intervals as evidence favouring the null hypothesis if those intervals include the null. Although this kind of error is common, an interval estimate will have even more potential to mislead when it appears narrow or decisive. When the latter occurs, the interval reflects in part unsupported assumptions about the form of random error and the inconsequential size of bias. These problems can be addressed by examining the entire distribution produced by a simulation, not just a pair of limits (since the choice of 95% limits is nothing more than a social convention), and by asking what other sources of uncertainty and information ought to be incorporated into the distribution before taking it seriously.


KEY MESSAGES

  • Interval estimation is an essential part of epidemiological analysis and presentation.
  • Interval estimation for compound effect measures (such as attributable fractions) has been hampered by the variety and complexity of variance formulas required for different study designs and regression models.
  • All these variance formulas can be replaced by basic simulation methods, such as bootstrapping.
  • Simulation methods have the advantage of being easily extended to account for information or models about bias sources. In many circumstances they also can supply approximate posterior probabilities and posterior probability intervals.
  • Given their advantages, simulation methods should be introduced into basic epidemiological analysis training.

 


    Appendix 1
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
Theory for simulation estimation
The simulation technique extends to any measure of association or effect, including general effect measures based on regressions that have complex covariance structure.2,20 Suppose we have a target-population structure parameterized by the vector {alpha}, and are interested in the measure {theta} = {theta}({alpha}). We observe a data set y with distribution p(y | {alpha}, {eta}), where {eta} is a vector of observation-process parameters (e.g. selection, coarsening, and classification parameters). Ordinarily, {alpha} and hence {theta} is not identified by y without further restrictions on {eta}. Ordinary confidence limits are obtained from a null setting for h (e.g. {eta} = 0) that corresponds to random selection and exposure assignment within levels of controlled covariates, with no coarsening or misclassification. Given a prior p({alpha},{eta}) a Bayesian could sample {alpha},{eta} from the posterior density p(y | {alpha}, {eta})p({alpha}, {eta})/p(y) and base inferences on percentiles of the resulting {theta}({alpha}).23 Suppose however {eta}t is the true value of {eta}, and we have an estimator t(y; {eta}t) of {theta} that is consistent asymptotically normal efficient (or nearly so), such as that derived from maximizing p(y | {alpha},{eta}t) over {alpha}. The simulation approach repeatedly samples {eta} from p({eta}), computes t(y; {eta}), and presents percentiles of the latter. Given p({alpha}, {eta}) < p({alpha})p({eta}) with p({alpha}) diffuse, these percentiles approximate the corresponding posterior percentiles for {theta}.2 In the final example simulation, y comprises the data from the 11 case-control studies and the survey, {alpha} = (RRt,O0), {eta} = ln(Cu,ORdb,OReb,Ob,Sr), and {theta} is given by formula (5). The full likelihood thus contains 12 distinct factors (one for each study) which share components of {alpha} and h. More generally, the factors may have distinct parameters related through hierarchical covariates.2,20


    Appendix 2
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
Derivations of formulas (3)–(6)
Let R1u and R0u be the unadjusted average risks (incidence proportions) in the target population at exposure level j, and let R0s be the average risk of the exposed if the exposure effect were absent. Then RRt = R1u/R0s, RRu = R1u/R0u, R0u/R0s = RRt/RRu, and

(A1)
Let Cu be the ratio change in RRu that would be produced by adjustment for A in the target. When there is no further confounding in the target and RRt = RRa, we have Cu = RRt/RRu = RRa/RRu and formula (3) follows from (A1). When there is further (residual) confounding and RRt = RRa/Cr = CuRRu/Cr, we have Cu = CrRRt/RRu and formula (4) follows from (A1). Formula (5) follows when RRt = RRa/CrSr = CuRRu/Cr, and formula (6) follows when RRt = RRa/CmCrSr = CuRRu/Cr and O0 = O0s/C0S0.


    Acknowledgments
 
I would like to thank Katherine Hoggatt and the referees for helpful comments.


    References
 Top
 Abstract
 Confidence intervals for an...
 Some ideas from bootstrapping
 Adding sources of uncertainty...
 Discussion
 Appendix 1
 Appendix 2
 References
 
1 Greenland S., Sheppard AR, Kaune WT, Poole C, Kelsh MA. A pooled analysis of magnetic fields, wire codes, and childhood leukemia. Epidemiology 2000;11:624–63.[CrossRef][ISI][Medline]

2 Greenland S. Multiple-bias modeling for observational studies (with discussion). J R Statist Soc Ser A 2005 (in press).

3 Levin ML. The causes of lung cancer in man. Acta Unio Int Contra Cancrum 1953;9:531–41.

4 Greenland S. Estimating population attributable fractions from fitted incidence ratios and exposure survey data, with an application to electromagnetic fields and childhood leukemia. Biometrics 2001;57:182–88.[CrossRef][ISI][Medline]

5 Holt JD, Darlington GA. A note on the estimation of population attributable fractions. Biometrics 2004;60: (in press).

6 Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.

7 Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, and what? Stat Med 2000;19:1141–64.[CrossRef][ISI][Medline]

8 Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press, 1975.

9 Greenland S. Smoothing epidemiologic data. In: Armitage P, Colton T (eds). Encyclopedia of Biostatistics, 2nd Edn. New York, Wiley, 2004.

10 Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edn. Philadelphia: Lippincott, 1998.

11 Kooperberg C, Pettiti DB. Using logistic regression to estimate the adjusted attributable risk of low birthweight in an unmatched case-control study. Epidemiology 1991;2:363–66.[ISI][Medline]

12 Hall P, Wilson SR. Response to Becher. Biometrics 1995;49:1271–72.[ISI]

13 Greenland S, Schwartzbaum JA, Finkle WD. Problems from small samples and sparse data in conditional logistic regression analysis. Am J Epidemiol 2000;151:531–39.[Abstract]

14 Morgan MG, Henrion M. Uncertainty. New York: Cambridge, 1990.

15 Eddy DM, Hasselblad V, Schachter R. Meta-Analysis by the Confidence Profile Method. New York: Academic Press, 1992.

16 Vose D. Risk Analysis. New York: Wiley, 2000.

17 Lash TL, Silliman RA. A sensitivity analysis to separate bias due to confounding from bias due to predicting misclassification by a variable that does both. Epidemiology 2000;11:544–49.[CrossRef][ISI][Medline]

18 Lash TL, Fink AK. Semi-automated sensitivity analysis to assess systematic errors in observational epidemiologic data. Epidemiology 2003;14:451–58.[CrossRef][ISI][Medline]

19 Phillips CV. Quantifying and reporting uncertainty from systematic errors. Epidemiology 2003;14:459–66.[CrossRef][ISI][Medline]

20 Greenland S. The impact of prior distributions for uncontrolled confounding and response bias. J Am Stat Assoc 2003;98:47–54.[CrossRef][ISI]

21 Steenland K, Greenland S. Monte-Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol 2004;160: (in press).

22 Leamer EE. False models and post-data model construction. J Am Stat Assoc 1974;69:122–31.[ISI]

23 Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis, 2nd Edn. New York: Chapman and Hall/CRC, 2003.

24 Graham P. Bayesian inference for a generalized population attributable fraction. Stat Med 2000;19:937–56.[CrossRef][ISI][Medline]

25 Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology. New York: Chapman and Hall, 2003.

26 Greenland S. Sensitivity analysis, Monte-Carlo risk analysis, and Bayesian uncertainty assessment. Risk Analysis 2001;21:579–83.[CrossRef][ISI][Medline]

27 Greenland S. Bias in methods for deriving standardized morbidity ratios and attributable fraction estimates. Stat Med 1984;3:131–41.[ISI][Medline]

28 Deubner DC, Wilkinson WE, Helms MJ. Logistic model estimation of deaths attributable to risk factors for cardiovascular disease in Evan County, Georgia. Am J Epidemiol 1980;112:135–43.[Abstract]

29 Greenland S. Variance estimators for attributable fraction estimates consistent in both large-strata and sparse data. Stat Med 1987;6:701–08.[ISI][Medline]

30 Greenland S, Drescher K. Maximum-likelihood estimation of attributable fractions from logistic models. Biometrics 1993;49:865–72.[ISI][Medline]

31 Greenland S. The sensitivity of a sensitivity analysis. In: 1997 Proceedings of the Biometrics Section. Alexandria, VA: American Statistical Association 1998, pp. 19–21.

32 Bracken MB, Belanger K, Hellebrand K et al. Correlates of residential wiring configurations. Am J Epidemiol 1998;148:467–74.[Abstract]

33 Langholz B. Factors that explain the power line configuration wiring code-childhood leukemia association: what would they look like (with discussion). Bioelectromagnetics Suppl 2001;5:S19–S31.

34 Yanagawa T. Case-control studies: assessing the effect of a confounding factor. Biometrika 1984;71:191–94.[ISI]

35 Flanders WD, Khoury MJ. Indirect assessment of confounding. Epidemiology 1990;1:199–246.

36 Kavet R, Zaffanella LE. Contact voltage measured in residences: implications for the association between magnetic fields and childhood leukemia. Bioelectromagnetics 2002;23:464–74.[CrossRef][ISI][Medline]

37 Brain JD, Kavet R, McCormick DL et al. Childhood leukemia: Electric and magnetic fields as possible risk factors. Environ Health Perspect 2003;111:962–70.[ISI][Medline]

38 Hatch EE, Kleinerman RA, Linet MS et al. Do confounding or selection factors of residential wire coes and magnetic fields distort findings of electromagnetic fields studies? Epidemiology 2000;11:189–98.[CrossRef][ISI][Medline]

39 EPRI. Selection Bias in Epidemiologic Studies of EMF and Childhood Leukemia. EPRI Report 1008149. Geneva: World Health Organization; 2003.

40 Greenland S. Randomization, statistics, and causal inference. Epidemiology 1990;1:421–29.[Medline]

41 Berk RA. Regression Analysis: A Constructive Critique. Thousand Oaks, CA: Sage Publications, 2004.

42 Greenland S. Probability logic and probabilistic induction. Epidemiology 1998;9:322–32.[CrossRef][ISI][Medline]

43 Poole C. Low P-values or narrow confidence intervals: which are more durable? Epidemiology 2001;12:291–94.[CrossRef][ISI][Medline]