From the Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway.
Received for publication December 9, 2001; accepted for publication January 8, 2003.
ABSTRACT |
---|
additivity; case-control studies; epidemiologic methods; interaction
Abbreviations: AP, attributable proportion due to interaction; RERI, relative excess risk due to interaction; S, synergy index.
INTRODUCTION |
---|
In cohort studies, the desired interaction assessment can easily be accomplished by fitting linear rate or risk models. However, the parameters of linear models cannot be validly estimated for case-control studies unless the sampling fractions for cases and controls are known or can be estimated. On the other hand, it is well known that odds ratios can be estimated in case-control studies. Furthermore, relative risks are often well approximated by odds ratios in case-control studies.
On the basis of these observations, Rothman (1, 2) suggested a synergy index (S) which can be used in case-control studies to measure interaction as departure from additive risks. Moreover, Rothman considered statistical inference for the index, deriving confidence intervals using the delta method. Rothman presented several additional measures of interaction (3), including the relative excess risk due to interaction (RERI), renamed the ICR by Rothman and Greenland (6), and the attributable proportion due to interaction (AP), which is the focus in Rothman's latest book (7). Rothman furthermore pointed out (3, p. 324) that estimates of RERI, AP, and S are easily obtained from logistic regression analysis, as are Wald tests and confidence intervals (9). Alternatively, a likelihood ratio test of additive risks could be performed in the logistic regression model. Although this test would be expected to have better properties than the Wald test, it would be much harder to implement.
Discussion of the measures advocated by Rothman is typically confined to the somewhat unrealistic situation in which there are two exposures but no additional covariates to control for confounding. An exception is Flanders and Rothman (10), who suggested a likelihood approach to estimating S from stratified case-control data. As Rothman acknowledged (3), their approach only handles one or possibly two additional covariates, because otherwise data in each stratum become too sparse. Hence, Rothman suggests invoking "multivariate methods" in estimating RERI, AP, and S when there are additional covariates. Specifically, Rothman states, "Confounding factors can be controlled by including terms for those factors in the multiple logistic model" (3, p. 324). This suggestion has been adhered to by epidemiologists (for instance, see Olsen et al. (11)).
There has been a paucity of studies investigating the performance of RERI, AP, and S. The only paper I am aware of is that of Assmann et al. (12), where the investigation was limited to coverage of confidence intervals for RERI and AP in models without additional covariates. The primary concern in this article is the extent to which RERI, AP, and S are useful summary measures of interaction as departure from additive risks. In addition to the conventional approach based on logistic regression, I also suggest an alternative approach based on linear odds models. Attention is focused on the more realistic setting in which there are additional covariates. However, the concepts are best introduced in a setting with two exposures and no additional covariates.
MODELS FOR TWO EXPOSURES |
---|
Let Rjk ≡ P(Y = 1 | x1, x2) be the conditional risk or probability that the outcome variable Y takes the value 1 given the values of the exposures, x1 = j and x2 = k. For all j and k, define risk differences as RDjk ≡ Rjk − R00, relative risks as RRjk ≡ Rjk/R00, odds as Ojk ≡ Rjk/(1 − Rjk), and odds ratios as ORjk ≡ Ojk/O00.
The linear risk model
A linear risk model is now specified as

$$R_{jk} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2, \tag{1}$$

where it is assumed that a > 0, b1 > 0, and b2 > 0. It follows that a = R00, b1 = R10 − R00 = RD10, and b2 = R01 − R00 = RD01. Hence, a is interpreted as the risk when there is no exposure, b1 as the excess risk under exposure x1 (compared with no exposure whatsoever), and b2 as the excess risk under exposure x2. The parameter b3 can be expressed as

$$b_3 = RD_{11} - RD_{10} - RD_{01} = R_{11} - R_{10} - R_{01} + R_{00},$$

representing the excess risk due to interaction of the exposures. If b3 = 0, RD11 = RD01 + RD10, which is risk-difference additivity. According to Rothman (3, p. 320), b3 is the most fundamental epidemiologic measure of interaction.
Unfortunately, the linear risk model cannot in general be validly estimated from case-control designs unless the sampling fractions of cases and controls are known or can be estimated. Since this is rarely the case, direct inference regarding the fundamental interaction parameter b3 cannot be performed in such designs. This was the impetus for the development of the surrogate interaction measures RERI, AP, and S.
The logistic risk model
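A logistic risk model is specified as

$$R_{jk} = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2)}. \tag{2}$$

Note that the parameters α, β1, β2, and β3 are different from the corresponding parameters a, b1, b2, and b3 in the linear risk model. The model can alternatively be expressed as

$$\ln O_{jk} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2.$$

Often RRjk ≈ ORjk, giving RR10 ≈ exp(β1), RR01 ≈ exp(β2), and RR11 ≈ exp(β1 + β2 + β3). If β3 = 0, RR11 = RR01 × RR10 is obtained, which is relative-risk multiplicativity.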
Importantly, the logistic model can be employed for case-control designs under reasonable assumptions (13). Regarding the parameters, the only difference is that the intercept now becomes

$$\alpha^* = \alpha + \ln(\tau_1/\tau_0),$$

where τ1 and τ0 are the sampling fractions of cases and controls, respectively.
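The statement that case-control sampling affects only the intercept of the logistic model can be checked numerically. The following is a minimal sketch, not part of the original analysis: the parameter values and the names `tau1` and `tau0` (sampling fractions of cases and controls) are hypothetical and chosen only for illustration.

```python
import math

def logistic_risk(alpha, beta1, beta2, beta3, x1, x2):
    """Risk under a logistic risk model with two exposures and their product."""
    eta = alpha + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2
    return 1.0 / (1.0 + math.exp(-eta))

def sampled_log_odds(risk, tau1, tau0):
    """Log odds of being a case among the subjects sampled into the study."""
    return math.log((tau1 * risk) / (tau0 * (1.0 - risk)))

# Hypothetical population parameters and sampling fractions (assumptions).
ALPHA, BETA1, BETA2, BETA3 = -7.0, 0.9, 0.5, 0.3
TAU1, TAU0 = 0.8, 0.001

log_odds = {
    (j, k): sampled_log_odds(
        logistic_risk(ALPHA, BETA1, BETA2, BETA3, j, k), TAU1, TAU0
    )
    for j in (0, 1)
    for k in (0, 1)
}

# Parameters implied by the log odds in the sampled (case-control) data.
intercept_cc = log_odds[(0, 0)]
beta1_cc = log_odds[(1, 0)] - log_odds[(0, 0)]
beta2_cc = log_odds[(0, 1)] - log_odds[(0, 0)]
beta3_cc = (log_odds[(1, 1)] - log_odds[(1, 0)]
            - log_odds[(0, 1)] + log_odds[(0, 0)])

print("intercept shift:", intercept_cc - ALPHA)
print("slopes:", beta1_cc, beta2_cc, beta3_cc)
```

In this sketch the exposure coefficients are unchanged by the sampling, while the intercept is shifted by the log of the ratio of the two sampling fractions.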
MEASURES OF INTERACTION |
---|
Relative excess risk due to interaction
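Rothman defines RERI (3, p. 323) as

$$\mathrm{RERI} = RR_{11} - RR_{10} - RR_{01} + 1.$$

RERI can be interpreted as the excess risk due to interaction relative to the risk without exposure. Rothman suggests substituting estimated approximate risk ratios $\widehat{RR}_{10} = e^{\hat{\beta}_1}$, $\widehat{RR}_{01} = e^{\hat{\beta}_2}$, and $\widehat{RR}_{11} = e^{\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3}$ from the logistic risk model. Under our parameterization of the logistic risk model (equation 2), this leads to

$$\widehat{\mathrm{RERI}} = e^{\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3} - e^{\hat{\beta}_1} - e^{\hat{\beta}_2} + 1. \tag{3}$$
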
Attributable proportion due to interaction
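Rothman defines AP (3, p. 321) as

$$\mathrm{AP} = \frac{RR_{11} - RR_{10} - RR_{01} + 1}{RR_{11}}.$$

AP is interpreted as the attributable proportion of disease which is due to interaction among persons with both exposures. However, this interpretation does not make sense under negative interaction (b3 < 0), since the proportion would then be negative.
Substituting the estimated approximate risk ratios from the logistic risk model gives us

$$\widehat{\mathrm{AP}} = \frac{e^{\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3} - e^{\hat{\beta}_1} - e^{\hat{\beta}_2} + 1}{e^{\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3}}. \tag{4}$$
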
Synergy index
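Rothman defines S (3, p. 322) as

$$S = \frac{RR_{11} - 1}{RR_{10} + RR_{01} - 2}.$$

S can be interpreted as the excess risk from exposure (to both exposures) when there is interaction relative to the excess risk from exposure (to both exposures) without interaction.
Substituting the estimated approximate risk ratios from the logistic risk model (equation 2) gives us

$$\hat{S} = \frac{e^{\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3} - 1}{e^{\hat{\beta}_1} + e^{\hat{\beta}_2} - 2}. \tag{5}$$
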
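The three surrogate measures are simple functions of the exposure coefficients of a fitted logistic model. The following is a minimal sketch of that substitution, treating exp(β1), exp(β2), and exp(β1 + β2 + β3) as approximate relative risks; the function name `interaction_measures` and the numeric inputs are hypothetical and serve only as an illustration.

```python
import math

def interaction_measures(beta1, beta2, beta3):
    """Return (RERI, AP, S) from logistic coefficients for x1, x2 and x1*x2.

    Odds ratios are used as approximate relative risks.
    """
    rr10 = math.exp(beta1)
    rr01 = math.exp(beta2)
    rr11 = math.exp(beta1 + beta2 + beta3)

    reri = rr11 - rr10 - rr01 + 1.0
    ap = reri / rr11
    denominator = rr10 + rr01 - 2.0
    # S is undefined when neither exposure has an effect on its own.
    s = (rr11 - 1.0) / denominator if denominator != 0.0 else float("nan")
    return reri, ap, s

# Hypothetical coefficients giving RR10 = 2, RR01 = 3, RR11 = 8.
reri, ap, s = interaction_measures(math.log(2), math.log(3), math.log(8 / 6))
print(f"RERI = {reri:.3f}, AP = {ap:.3f}, S = {s:.3f}")
```

Because S is a ratio whose denominator can be close to zero, small changes in the coefficients can produce very large absolute values of the estimate.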
MODELS INCLUDING ADDITIONAL COVARIATES |
---|
Let Rjkz ≡ P(Y = 1 | x1, x2, z) be the conditional risk of Y taking the value 1 given covariates. Define stratum-specific risk differences as RDjkz ≡ Rjkz − R00z, relative risks as RRjkz ≡ Rjkz/R00z, odds as Ojkz ≡ Rjkz/(1 − Rjkz), and odds ratios as ORjkz ≡ Ojkz/O00z.
The linear risk model
Consider a linear risk model with an additional covariate, where there is interaction among exposures but not between the exposures and the additional covariate:

$$R_{jkz} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + gz, \tag{6}$$

where a > 0, b1 > 0, b2 > 0, and gz > 0. a = R000 and g = R001 − R000; a is the risk under no exposure when z = 0, whereas g represents the excess risk when z = 1 (compared with z = 0). Hence, the risk when there is no exposure can be expressed as a + gz; note that it depends on the value taken by the additional covariate. Irrespective of the value of z, it follows that b1 = R10z − R00z = RD10z, b2 = R01z − R00z = RD01z, and b3 = R11z − R10z − R01z + R00z = RD11z − RD10z − RD01z. It also follows that

$$RR_{10z} = 1 + \frac{b_1}{a + gz}, \quad RR_{01z} = 1 + \frac{b_2}{a + gz}, \quad RR_{11z} = 1 + \frac{b_1 + b_2 + b_3}{a + gz}. \tag{7}$$

Note that RR10z, RR01z, and RR11z are functions of the covariate z, in contrast to the risk differences.
The logistic risk model
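A logistic risk model with an additional covariate, where there is interaction among exposures but not between exposures and the covariate, is specified as

$$R_{jkz} = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \gamma z)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \gamma z)}. \tag{8}$$

It follows that OR00z = 1, OR10z = exp(β1), OR01z = exp(β2), and OR11z = exp(β1 + β2 + β3). When RRjkz ≈ ORjkz,

$$RR_{10z} \approx e^{\beta_1}, \quad RR_{01z} \approx e^{\beta_2}, \quad RR_{11z} \approx e^{\beta_1 + \beta_2 + \beta_3}. \tag{9}$$

Hence, the relative risks implied by the logistic risk model do not depend on the covariate z, in contrast to the linear case. On the other hand, risk differences depend on the covariates, unlike the case in the linear risk model.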
PROBLEMS WITH ADDITIONAL COVARIATES |
---|
The uniqueness problem
Noting that the interaction parameter of interest b3 is invariant across the strata defined by the covariates z, I investigate whether this also applies for the surrogate measures.
Consider RERI for a given value of the covariates z. Substituting for the relative risk from the true linear risk model (equation 7) gives us

$$\mathrm{RERI}_z = RR_{11z} - RR_{10z} - RR_{01z} + 1 = \frac{b_3}{a + gz},$$

demonstrating that the magnitude of RERI generally depends on the values of z. In contrast, Rothman's suggestion of including additional covariates in the logistic model would produce a single $\widehat{\mathrm{RERI}}$, given in equation 3, where $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$ are now estimates from the logistic model (equation 8) including the covariate but no interactions between the covariate and either of the exposures or their product. Hence, there is clearly a tension between the suggested estimator, based on the implicit assumption that there is one measure to be estimated, and the fact that there are several unknown measures. The exception is when there is no interaction, b3 = 0, since RERI = 0 in this case, whatever the value of z. Also note that RERI retains the sign of b3, since a + gz > 0.
Regarding AP, substitution for the relative risk from the true linear risk model (equation 7) produces

$$\mathrm{AP}_z = \frac{RR_{11z} - RR_{10z} - RR_{01z} + 1}{RR_{11z}} = \frac{b_3}{a + b_1 + b_2 + b_3 + gz},$$

and there is a different AP for each stratum defined by the covariates, unless b3 = 0. Following Rothman's strategy, on the other hand, a single AP would be estimated as in equation 4, with estimates substituted from the logistic model with covariate (equation 8).
For S, substituting for the relative risk from the linear risk model (equation 7) gives us a unique measure

$$S = \frac{RR_{11z} - 1}{RR_{10z} + RR_{01z} - 2} = \frac{b_1 + b_2 + b_3}{b_1 + b_2},$$

which does not depend on the covariate z. Analogous to the case without additional covariates, Rothman suggests estimating S using equation 5, with estimates substituted from equation 8. S does not suffer from the uniqueness problem when additional covariates are included, in contrast to RERI and AP, which suggests that S is the surrogate measure of choice.
The misspecification problem
If a logistic model is used in estimation of the surrogate interaction measures, specified with interaction among exposures (but not between exposures and additional covariates), the model is misspecified in the sense that it does not produce a relative risk identical to that of the corresponding true linear model when there are additional covariates. This is evident from noting that the relative risk from the logistic model (equation 9) does not depend on the value of the covariate z, whereas the relative risk from the linear model in equation 7 does. Hence, RERI, AP, and S based on the logistic risk model with an additional covariate (equation 8) only approximate the true measures from the corresponding linear risk model (equation 6). This stands in contrast to the case with solely two exposures, where the logistic and linear models are both "saturated" (both have as many parameters as conditional probabilities) and produce identical relative risks (and hence RERI, AP, and S). An important implication is that the estimated logistic model cannot be used to check the validity of the linear model, since a linear model without interaction between exposures and covariate implies interaction in the logistic model.
RECTIFYING THE MISSPECIFICATION PROBLEM |
---|
The linear odds model

$$O_{jkz} = a^* + b_1^* x_1 + b_2^* x_2 + b_3^* x_1 x_2 + g^* z \tag{10}$$

enables us to estimate a* = ka, b1* = kb1, b2* = kb2, b3* = kb3, and g* = kg based on a case-control study (6, pp. 418–419; 14). The linear odds model is a misspecified version of the linear risk model (equation 6) in the sense that the parameters a, b1, b2, b3, and g of the latter model are recovered up to a proportionality factor k. This proportionality misspecification has two important implications: First, it follows that hypotheses specifying that parameters of linear risk models are zero can be tested, particularly the hypothesis of no departure from additive risks b3 = 0 by testing b3* = 0 in the model shown by equation 10. Second, the surrogate measures of interaction as departure from additivity can be validly estimated from the linear odds model. Considering S,

$$S = \frac{b_1^* + b_2^* + b_3^*}{b_1^* + b_2^*} = \frac{k b_1 + k b_2 + k b_3}{k b_1 + k b_2} = \frac{b_1 + b_2 + b_3}{b_1 + b_2}. \tag{11}$$

Note that the unknown proportionality factor cancels out. Thus, although the linear odds model is a misspecified version of the linear risk model, no misspecification problem is involved in obtaining S (or RERIz and APz), in contrast to the approach based on logistic regression. However, the uniqueness problem involving RERI and AP persists, suggesting that linear odds modeling of S is the method of choice in assessing interaction as departure from additivity in case-control studies with additional covariates. The linear odds model can be fitted in software packages such as STATA, EPICURE, and SAS (a reparameterization is available in EGRET).
EXAMPLE |
---|
To ease presentation, I now let risk and risk difference be expressed as number of cases per 100,000. That is, a risk of 0.0004 is written as 40. Remember that only (approximate) relative risks are generally available from case-control studies, and inference must hence be based on these.
I let z be coded 1 if male and 0 if female. A linear risk model with interaction between exposures but no interactions between the covariate and the exposures is specified:

$$R_{jkz} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + gz = 10 + 100 x_1 + 40 x_2 + 40 x_1 x_2 + 90 z.$$

This setup is exhibited in table 1.
|
The example illustrates the problems previously uncovered. Although the fundamental interaction parameter b3 is invariant over gender, the surrogates RERI and AP both vary across gender. S is the only adequate measure, attaining the same value in both gender strata. Regarding the misspecification problem, the relative risks for males in table 1 do not equal those for females, as would be the case for a logistic model without interaction between exposures and covariate. A hypothetical case-control study can be obtained from the table by letting the figures reported in the "Risk" column represent cases and considering 500 controls in each group. If logistic regression were used, the estimates $\widehat{\mathrm{RERI}}$ = 0.80, $\widehat{\mathrm{AP}}$ = 0.18, and $\hat{S}$ = 1.31 would be obtained, whereas using the linear odds approach produces $\hat{S}$ = 1.29.
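The stratum-specific measures in this example follow directly from the stated linear risk model, so they can be recomputed in a few lines. The sketch below does this and also derives S from the odds of the hypothetical case-control study (risk figures as cases, 500 controls per group). It is an illustration only: function names are hypothetical, and the linear odds parameters are read off the exact cell odds rather than obtained by fitting a model.

```python
# Linear risk model of the example, in cases per 100,000.
A, B1, B2, B3, G = 10, 100, 40, 40, 90
CONTROLS_PER_GROUP = 500  # from the hypothetical case-control study

def risk(x1, x2, z):
    """Risk (cases per 100,000) under the example's linear risk model."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2 + G * z

def true_measures(z):
    """RERI, AP and S in the stratum defined by z (0 = female, 1 = male)."""
    r00 = risk(0, 0, z)
    rr10 = risk(1, 0, z) / r00
    rr01 = risk(0, 1, z) / r00
    rr11 = risk(1, 1, z) / r00
    reri = rr11 - rr10 - rr01 + 1
    return {"RERI": reri, "AP": reri / rr11,
            "S": (rr11 - 1) / (rr10 + rr01 - 2)}

def linear_odds_s(z):
    """S from case-control odds: cases / controls in each exposure group."""
    odds = {(j, k): risk(j, k, z) / CONTROLS_PER_GROUP
            for j in (0, 1) for k in (0, 1)}
    b1_star = odds[(1, 0)] - odds[(0, 0)]
    b2_star = odds[(0, 1)] - odds[(0, 0)]
    b3_star = odds[(1, 1)] - odds[(1, 0)] - odds[(0, 1)] + odds[(0, 0)]
    # The unknown proportionality factor cancels in the ratio.
    return (b1_star + b2_star + b3_star) / (b1_star + b2_star)

female, male = true_measures(0), true_measures(1)
for label, m in (("female", female), ("male", male)):
    print(label, {name: round(value, 3) for name, value in m.items()})
print("S from linear odds:", round(linear_odds_s(0), 3), round(linear_odds_s(1), 3))
```

In this sketch RERI and AP differ between the two strata, while S takes the same value in both and agrees with the value obtained from the case-control odds.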
SIMULATION STUDY |
---|
On the basis of the resulting case-control data, I first consider the approach advocated by Rothman, basing inference regarding RERI, AP, and S on the logistic risk model (equation 8). Ninety-five percent confidence intervals for all measures are obtained as described by Hosmer and Lemeshow (9). I then consider the performance of the alternative approach based on fitting the linear odds model (equation 10). The Wald test of H0: b3* = 0, which is also a test of the hypothesis that the fundamental interaction parameter b3 is zero, is investigated. The actual rejection probability at the nominal level of 5 percent represents the actual significance level when H0 is true and the power of the test otherwise. The performance of point estimates of S and corresponding 95 percent confidence intervals obtained via the delta method are also studied. Confidence intervals for S are not part of the standard output from linear odds modeling; therefore, I demonstrate in the Appendix how a calculator or spreadsheet can be used to obtain these. Since RERI and AP suffer from the uniqueness problem when there are additional covariates, I do not consider inference regarding these measures based on the linear odds model.
The nine scenarios investigated are presented in the left-hand portion of table 2. Throughout, I specify a = b1 = b2 = 0.0001 but consider several scenarios for the interaction parameter b3 and the covariate effect g. Regarding the magnitude of interaction, no interaction (b3 = 0), a moderate positive interaction (b3 = 0.0001), and a strong positive interaction (b3 = 0.001) are studied. Regarding the covariate effect, I consider no effect (g = 0), a moderate effect (g = 0.0001), and a strong effect (g = 0.001) on disease. The corresponding values of the interaction measures are given, where RERI and AP are given subscripts designating the strata defined by z.
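The true values of the interaction measures in each scenario follow from the linear risk model parameters alone. The sketch below enumerates the nine combinations of b3 and g and computes RERI and AP in each stratum (z = 0 and z = 1) together with S. The scenario numbering used here (b3 varying slowest, so that scenarios 1, 4, and 7 have no covariate effect) is an assumption inferred from the text, and the function and variable names are hypothetical.

```python
from itertools import product

A = B1 = B2 = 0.0001
B3_VALUES = (0.0, 0.0001, 0.001)   # none, moderate, strong interaction
G_VALUES = (0.0, 0.0001, 0.001)    # none, moderate, strong covariate effect

def true_measures(b3, g, z):
    """True RERI_z, AP_z and S under the linear risk model with covariate z."""
    baseline = A + g * z                      # risk without exposure
    reri = b3 / baseline
    ap = b3 / (A + B1 + B2 + b3 + g * z)
    s = (B1 + B2 + b3) / (B1 + B2)            # does not involve z
    return reri, ap, s

scenarios = {}
for number, (b3, g) in enumerate(product(B3_VALUES, G_VALUES), start=1):
    reri0, ap0, s = true_measures(b3, g, z=0)
    reri1, ap1, _ = true_measures(b3, g, z=1)
    scenarios[number] = {"b3": b3, "g": g, "RERI0": reri0, "RERI1": reri1,
                         "AP0": ap0, "AP1": ap1, "S": s}

for number, row in scenarios.items():
    print(number, {key: round(value, 4) for key, value in row.items()})
```

RERI and AP differ between strata only when both b3 and g are nonzero, which under this numbering corresponds to scenarios 5, 6, 8, and 9.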
|
the variance as
and coverage as the fraction of the 1,000 95 percent confidence intervals including the true S. Analogous definitions apply for RERI and AP, but note that coverage cannot be defined when these measures vary across strata. For each scenario, the mean estimates and variances are reported in the right-hand portion of table 2, and the coverage of the 95 percent confidence intervals is reported when applicable.
Considering the performance of inference based on logistic regression, it is evident that RERI and AP are very problematic under scenarios 5, 6, 8, and 9, where there is not a unique measure. The evidence for bias in estimating RERI and AP for the remaining scenarios is statistically significant, except for scenarios 4 and 7, respectively, where p > 0.05. Bias in estimating S is significant for scenarios 1, 4, 5, and 6. However, the estimated bias is fairly tolerable in magnitude for all unique measures, apart from scenarios 6 and 9 for S. Regarding precision, $\hat{S}$ did not perform satisfactorily for scenarios 6, 7, 8, and 9. This is due to its construction as a fraction, often producing very large absolute values when the denominator by chance approaches zero. Coverage was generally quite dismal, and it grew worse (more discrepant from 95) as the interaction and the magnitude of the covariate effect increased.
From a theoretical point of view, the linear odds model is the model of choice for estimating S. Interestingly, the results from the simulations are somewhat mixed. Regarding coverage, the performance of the linear odds approach is good, and it clearly outperforms the logistic approach. When it comes to estimation, the evidence of bias in $\hat{S}$ from the linear odds model is significant (p < 0.05) for all scenarios except 3, 6, and 9 (lack of significance for the latter is due to extreme imprecision). Disappointingly, the estimated bias is generally somewhat more pronounced than for the logistic approach. The variances of the estimates are also generally higher for the linear odds model than for the logistic model, leading to larger mean squared errors. The nominal significance level for testing the fundamental interaction parameter b3 in the linear odds model is reasonably well recovered. Observe that the power is low when the interaction parameter is of the same magnitude as the main effects, notwithstanding that there are as many as 500 cases and 500 controls. The power also appears to decrease as the covariate effect increases.
As expected, all measures perform fairly well in terms of bias when there is no covariate effect (scenarios 1, 4, and 7). Results based on the logistic and linear odds models differ because the estimated models are misspecified by inclusion of the covariate z. Identical results would be obtained for both models if the estimated models were correctly specified by omitting the covariate.
A simulation study with smaller samples, 250 cases and 250 controls, was also conducted. The results were similar but a bit more pronounced and are not reported here.
CONCLUSION |
---|
I conclude that considerable caution should be exercised in assessing interaction as departure from additivity in case-control studies with additional covariates.
ACKNOWLEDGMENTS |
---|
APPENDIX |
---|
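S based on the linear odds model was given in equation 11 as

$$S = \frac{b_1^* + b_2^* + b_3^*}{b_1^* + b_2^*},$$

from which it follows that

$$\ln S = \ln(b_1^* + b_2^* + b_3^*) - \ln(b_1^* + b_2^*).$$

Since S is a fraction, the coverage properties of a confidence interval for ln S are likely to be superior. Estimated standard errors of $\ln \hat{S}$, $\widehat{SE}(\ln \hat{S})$, can be obtained using the multivariate delta method (15) as

$$\widehat{SE}(\ln \hat{S}) = \sqrt{h_1^2 (\hat{\sigma}_{11} + \hat{\sigma}_{22} + 2\hat{\sigma}_{12}) + h_3^2 \hat{\sigma}_{33} + 2 h_1 h_3 (\hat{\sigma}_{13} + \hat{\sigma}_{23})},$$

where $\hat{\sigma}_{ij}$ denotes the estimated covariance of $\hat{b}_i^*$ and $\hat{b}_j^*$,

$$h_1 = \frac{1}{\hat{b}_1^* + \hat{b}_2^* + \hat{b}_3^*} - \frac{1}{\hat{b}_1^* + \hat{b}_2^*},$$

and

$$h_3 = \frac{1}{\hat{b}_1^* + \hat{b}_2^* + \hat{b}_3^*}.$$

An approximate 95 percent confidence interval for S will then have the lower confidence limit

$$\exp\left[\ln \hat{S} - 1.96\, \widehat{SE}(\ln \hat{S})\right]$$

and the upper confidence limit

$$\exp\left[\ln \hat{S} + 1.96\, \widehat{SE}(\ln \hat{S})\right].$$
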
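The calculation described in this Appendix can also be scripted. The following is a minimal sketch of a delta-method confidence interval for S on the log scale, assuming that estimates of the three exposure parameters of the linear odds model and their 3 × 3 covariance matrix are available from the fitting software. The function name `synergy_index_ci` and the numeric inputs are hypothetical and are not taken from the study.

```python
import math

def synergy_index_ci(b_star, cov, z_crit=1.96):
    """Point estimate and approximate CI for S from linear odds estimates.

    b_star: estimates (b1*, b2*, b3*) of the exposure parameters.
    cov:    3x3 covariance matrix of those estimates (nested sequences).
    """
    b1, b2, b3 = b_star
    numerator = b1 + b2 + b3
    denominator = b1 + b2
    if numerator <= 0 or denominator <= 0:
        raise ValueError("ln S is undefined unless both sums are positive")

    ln_s = math.log(numerator) - math.log(denominator)
    # Gradient of ln S with respect to (b1*, b2*, b3*).
    h1 = 1.0 / numerator - 1.0 / denominator
    h3 = 1.0 / numerator
    grad = (h1, h1, h3)
    variance = sum(grad[i] * cov[i][j] * grad[j]
                   for i in range(3) for j in range(3))
    se = math.sqrt(variance)
    return (math.exp(ln_s),
            math.exp(ln_s - z_crit * se),
            math.exp(ln_s + z_crit * se))

# Hypothetical estimates and covariance matrix, for illustration only.
estimates = (0.2, 0.1, 0.1)
covariance = [[0.0001, 0.0, 0.0],
              [0.0, 0.0001, 0.0],
              [0.0, 0.0, 0.0004]]
s_hat, lower, upper = synergy_index_ci(estimates, covariance)
print(f"S = {s_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the point estimate and its limits are always positive.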
NOTES |
---|
REFERENCES |
---|