1 Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
2 Division of Preventive Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Correspondence to Dr. Til Stürmer, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA 02120 (e-mail: til.sturmer{at}post.harvard.edu).
Received for publication July 29, 2004. Accepted for publication December 14, 2004.
![]() |
ABSTRACT |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
anti-inflammatory agents, non-steroidal; bias (epidemiology); cohort studies; confounding factors (epidemiology); epidemiologic methods; research design
![]() |
INTRODUCTION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
So far, we know that EPSs and DRSs give nonnominal p values under the null hypothesis in situations in which exposure-confounder associations are very strong and likely unrealistic (4). Omitting an important confounder from analysis leads to a similar magnitude and direction of bias when using EPSs compared with outcome models (5
). EPSs have a reduced efficiency compared with outcome models (6
). EPSs have been shown to perform better than outcome models if fewer than eight events are observed per covariate that we want or need to control for (7
). Little is known, however, about the effect of different ways of using EPSs and DRSs with respect to control for confounding and statistical efficiency, especially in small studies.
In this paper, we illustrate the different ways that both methods can be used to control for confounding in a large cohort study based on claims data, evaluating the association between nonsteroidal antiinflammatory drug (NSAID) use and 1-year mortality in an elderly population. We chose the specific empirical example of the previously observed inverse association between NSAIDs and all-cause mortality (8, 9
) since there is no known biologic reason to expect that NSAID use would reduce the risk of short-term mortality. Even if such an association existed, the observed magnitude of about a 26 percent risk reduction (8
) is implausible. Instead, the apparent association is likely spurious because of patient selection leading to confounding bias: physicians are more likely to treat symptomatic pain with narcotic agents rather than NSAIDs in patients close to death (8
). We also randomly resampled smaller subcohorts to assess the effect of study size on the performance of these methods.
![]() |
MATERIALS AND METHODS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Disease risk scores
DRSs reduce the number of covariates by summarizing the predictive information for disease risk of all potential confounders in a multivariable model, conditional on nonexposure (3, 4
). The DRS can then be used as a summary confounder and can be controlled for by using stratification or a multivariable outcome model. Just as the purpose of randomization is to create groups for comparison that have the same underlying risk of disease, apart from the effect of exposure, the DRS achieves this comparability based on the multivariate distribution of identified confounders.
Study population
The study population was assembled to analyze pain medication use by the elderly, and it consists of all New Jersey residents aged 65 years or older who filled prescriptions (as defined below) through Medicaid or the Pharmaceutical Assistance to the Aged and Disabled program and were hospitalized any time between January 1, 1995, and December 31, 1997. Patients discharged after December 31, 1997, as well as those residing in a nursing home before hospitalization, were excluded. For persons with more than one such hospitalization, one hospitalization was selected at random to permit valid estimation of the 1-year risk of all-cause mortality in this population at the time of a hospital admission. Eligible persons were those who filled at least one prescription within 120 days before hospitalization and at least one prescription more than 365 days before hospitalization, since covariates were assessed during that time period. The index date was the date of hospitalization.
The exposure of interest, NSAID use, was defined as having filled at least one prescription for any NSAID during the 120 days before hospital admission. For all subjects, the following covariates were then extracted: age (in years), sex, race (Caucasian, African American, other), and 17 diagnoses based on inpatient and outpatient visits that are part of the Charlson comorbidity index (10) within 365 days before the index date (acquired immunodeficiency syndrome, congestive heart failure, chronic obstructive pulmonary disease, dementia, hematologic disease, cancer, metastatic cancer, myocardial infarction, diabetes (with and without complications), liver disease (mild and severe), peripheral vascular disease, peptic ulcer, renal disease, arthritis (rheumatoid arthritis or osteoarthritis), and stroke).
Further covariates were indicators for prescriptions of distinct generic entities filled within 120 days before the index date, including those for narcotics, other analgesics, angiotensin-converting enzyme inhibitors, beta blockers, calcium channel blockers, thiazide diuretics, other antihypertensives, lipid-lowering drugs, antiarrhythmics, coumadin, digitalis, rheologic drugs, oral antidiabetics, insulin, antidiarrheals, histamine2 receptor blockers, other antiulcer drugs, anticonvulsants, beta agonists, xanthines, steroids, other bronchodilators, loop diuretics, potassium, anxiolytics, antidepressants, phenobarbital, other antipsychotics, sedatives, stimulants, penicillins, cephalosporins, macrolide antibiotics, quinolones, sulfonamides, folic acid, influenza vaccines, glaucoma drugs, topical antibiotics, topical sulfonamides, and topical enzymes.
We computed number of hospitalizations (three categories), number of physician visits (three categories), and screening examinations, including cholesterol, electrocardiogram, mammography, Papanicolaou smear, and prostate-specific antigen during that 1-year time interval.
All 71 covariates were available for inclusion in the analyses. In the absence of a record for a specific diagnosis, procedure, or prescription, patients were coded as free of these characteristics. As a result of this coding rule, there were no subjects for whom exposure, confounder, or outcome information was missing (with the exception of unknown race, which was classified as other than White or Black).
We assessed time until death or study end at 365 days of follow-up (whichever came first), starting from the date of hospital admission, based on exact linkage to Medicare claims data (11). The study was approved by the Center for Medicare and Medicaid Services and the Institutional Review Board of the Brigham and Women's Hospital.
Analytic strategies
EPS stratification, modeling, and matching.
We estimated the EPS for NSAID use (yes/no) during the last 4 months before hospitalization by using logistic regression and a forward variable selection with an alpha value of 0.3. The value of 0.3 was less stringent than the value of 0.2 used in the "conventional" disease models to allow more variables to be entered into the EPS models compared with the "conventional" models. The estimated EPS was used in several ways. We first adjusted the multivariate Cox proportional hazards outcome model including the EPSs as categories (quintiles), as linear splines (12), or as a linear predictor (continuous). Second, we matched every exposed participant to one unexposed participant on the EPS (1:1 matching) and used a stratified Cox proportional hazards model in the matched sample to control for confounding. We used two different matching algorithms: 1) greedy matching, using calipers of the estimated EPS with increasing width to find matches (13
); and 2) fixed one-digit matching, using a fixed width of 0.1 (i.e., matching on EPS ± 0.05).
Inverse probability of treatment weighting (IPTW).
This strategy uses the estimated EPS to assign individual weights to all observations resulting in an altered composition of the study population (14). The altered study population is then analyzed by using a Cox proportional hazards model with NSAID use as the only covariate (15
). We used "stabilized" weights that take the marginal prevalence of the exposure into account to maximize efficiency and to obtain a reweighted study population of equal size (14
, 15
). The weights for exposed participants are obtained by dividing P, the marginal prevalence of exposure, by the individual EPS, and those in unexposed participants by dividing (1 P) by (1 EPS).
"Conventional" multivariable disease model.
In addition to an unadjusted estimate of the association between NSAID use and short-term mortality, we also used a "conventional" multivariable Cox proportional hazards model to adjust for confounding. Variables were included by forward selection by using an alpha value of 0.2, a value that has been shown to perform well with respect to control for confounding (16). To avoid overfitting, the number of variables was restricted to allow at least eight outcomes per variable in the model (7
).
Disease risk score.
We then estimated the DRS for 1-year mortality from all causes by using a Cox proportional hazards model and a forward variable selection with an alpha value of 0.3, including all 71 covariates described above at the beginning of the selection algorithm as well as the primary exposure, NSAID use. The value of 0.3 was again chosen to allow more variables to be entered into the DRS model than into the disease model. For the same reason, the overall number of variables was not restricted. The regression coefficients from this model were then multiplied by the individual covariate values of the variables entered into the model, except for NSAID use, which was set to 0 (nonuse) for all participants (3, 4
). The sum of these products gave the subject-specific DRS, which was then used to control for confounding in separate Cox proportional hazards models of the study outcome. We included the DRSs as categories (5 or 10) or as a continuous linear predictor together with the primary exposure, NSAID use.
Combination of methods.
Finally, we combined some of these methods by simultaneously adjusting for EPS and DRS and by adding a selection of risk indicators to the EPS (again using forward variable selection). Although this ad hoc approach is not equivalent to "doubly robust" estimation (17), it might nevertheless offer advantages if either the EPS model or the "conventional" disease model is misspecified.
Random resampling of subcohorts
From the total cohort of 103,133 elderly, we created 1,000 randomly sampled subcohorts of 10,000, 1,000, or 500 persons, with replacement, and applied the analytic strategies described above within each resampled subcohort (3,000 cohorts overall). Using this approach, we obtained the empirical distribution of the compositions of the cohorts and selected model characteristics as well as parameter estimates for each of the 13 analytic strategies.
![]() |
RESULTS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
Despite the decreasing number of covariates used to estimate the EPS with decreasing size of the subcohorts, the median area under the receiver operating characteristic curve, which estimates the ability of the EPS model to discriminate exposure status (18), increased from 0.68 (n = 10,000) to 0.79 (n = 500). This area can range from 0.5 (chance prediction) to 1.0 (perfect prediction).
In the full cohort, 99.6 percent of all NSAID users could be matched to nonusers when we used either greedy matching or fixed-width-caliper, one-digit matching. Despite this large proportion, both matching strategies resulted in a loss of over 70 percent of all events (71.2 percent and 70.9 percent, respectively) because a large proportion of those unexposed were not included. With decreasing size of the subcohorts, the proportion of successfully matched exposed subjects decreased. When we considered the decreasing absolute number of events with decreasing size of the subcohorts, the decline in the number of events on which final analyses were based was even more pronounced. These results were essentially the same for both matching techniques.
Effect of analytic strategy and study size on NSAID effect estimates
Table 3 describes the association between NSAID use and 1-year mortality derived from Cox proportional hazards models using various approaches to control for confounding. Without any control for confounding, NSAID use appeared to be associated with more than a 30 percent reduction in mortality risk. With decreasing size of the subcohorts, this estimate remained stable while the empirical 95 percent confidence interval became wider.
|
Using the EPS in various ways to adjust for confounding in the full cohort resulted in estimates for the NSAID-mortality association ranging from 0.81 (quintiles) to 0.83 (continuous). The same point estimate was obtained when fixed one-digit matching was used, whereas the point estimate using greedy matching was slightly closer to the null value (0.85). Owing to the loss of information, both matched estimates were slightly less precise. Using IPTW based on the estimated EPS also resulted in a point estimate of 0.85. All outcome modelsincluding the "conventional" model, all DRS models, and the model including the EPS in combination with risk indicatorsresulted in essentially identical estimates of between 0.80 and 0.81.
Results from subcohorts in which n = 10,000 were nearly identical to those from the full cohort (with the exception of slightly wider, now empirical confidence intervals). With decreasing size of the subcohorts, however, estimates from all analytic strategies moved further away from the null and closer to the crude value, indicating increasing residual confounding. Despite the fact that the number of variables used in the corresponding models decreased much more sharply with decreasing study size in the outcome models than in the EPS and DRS, this decrease was not reflected in any substantial differences between these strategies. Specifically, use of DRSs did not translate into any gain compared with the "conventional" outcome model with respect to either point estimate or precision (results stratifying on 10 rather than five categories of the DRS were virtually identical and are therefore not presented here). Taking the absence of major differences into account, greedy matching resulted in the estimate closest to the null value (0.80), and IPTW resulted in an estimate (0.72) very close to the age- and sex-adjusted estimate (0.71) in the smallest subcohort. Both analyses had wide confidence intervals compared with the other analytic strategies.
![]() |
DISCUSSION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Use of the EPS (but not the DRS) comes at the price of losing potentially useful information about predictors of the outcome (since the covariates are not included in the disease model), and we know much less about variable selection and model-building strategies for EPS compared with "conventional" disease models (16). Therefore, it seems desirable to use the EPS only if bias could be reduced or efficiency improved.
Cook and Goldman (4) compared the performance of tests of significance under the null hypothesis (i.e., assuming no difference between treatments) for EPS, DRS, and "conventional" multivariable outcome models by using simulations. EPSs appeared to produce nominal results in most circumstances, but not in situations with very strong treatment-confounder associations. This result was even more pronounced for DRSs. Such a constellation was not present in our realistic example.
In some practical situations, the choice of analytic method will be limited. Because 10 events per covariate is usually considered a minimum requirement for stable estimates in multivariable models (18, 20
), EPS analyses combining multiple covariates into a single score are especially desirable if the outcome is rare (19
, 21
). A recent simulation study comparing EPS with multivariable outcome models concluded that the EPS performed better in situations with fewer than eight outcomes per covariate (7
). We therefore used this proportion to limit the maximum number of variables available for selection in all of our outcome models. Since precision of estimates is likely to be of minor importance for the EPS and DRS, we used a value of alpha = 0.3 in these models to allow more variables to be entered than in the disease model, for which a value of 0.2 has been shown to perform well for control of confounding (16
). Neither the restriction of the absolute number of variables nor the more stringent p value requirement handicapped the performance of "conventional" disease models compared with the EPS or DRS.
We chose the example of NSAIDs and all-cause mortality since NSAIDs are unlikely to affect mortality substantially in an elderly population (8). The Physicians' Health Study trial reported an equal number of deaths in both the aspirin and the placebo arms after 5 years of (low-dose) treatment (22
), although the overall number of deaths was too small to rule out a meaningful difference. Even if NSAIDs were protective for some cancers, including colorectal (23
, 24
), chronic use in the elderly is associated with several adverse outcomes, including increased risk of gastrointestinal hemorrhage (25
28
), impaired kidney function (29
31
), hypertension (32
, 33
), and perhaps even cardiovascular disease, stemming from their possible antagonism of the preventive effect of aspirin (34
) (because cohort enrollment ended in 1997, NSAID use does not include cyclooxygenase-2 inhibitors). Therefore, either no association with mortality or, if anything, a slightly increased risk of mortality seems biologically more plausible than a reduced risk of death.
Glynn et al. (8) have argued that in an elderly population, selected drug classes, including lipid-lowering drugs, NSAIDs, or antiglaucoma drugs, are more likely to be prescribed to healthier subjects. Drugs with a preventive component are less likely to be prescribed if death seems near based on the assessment of the prescribing physician (9
). Thus, even if we do not know the precise relation between NSAIDs and mortality, our conclusions are valid for a wide range of possible effects, including a reduction of up to 15 percent in risk. More pronounced risk reductions in mortality from all causes seem biologically implausible. The generally similar estimates resulting from applying the various analytic strategies might indicate that we were able to control adequately for observed confounding, although we do not know what the best estimate of the NSAID-mortality association would be, given the observed covariates.
Our results are limited to one specific setting with essentially the same prevalence of exposure and cumulative incidence of disease (about 20 percent each). This restriction might explain why we did not find differences between the EPS and the DRS. However, it would not explain why we did not observe any of these methods that combine multiple variables into a single score to perform better than "conventional" disease models. Generally speaking, the data structure at hand is likely to influence the choice of the preferred method. EPSs are likely to perform better than "conventional" outcome models or DRSs with respect to control for confounding when the exposure is prevalent and the disease is rare, since it may be possible to build a richer model of the exposure than of the disease, and vice versa (21). We suppose that DRSs might have an advantage over "conventional" outcome models with respect to bias and precision if the disease is rare, because 1) they allow truncation of the risk score distribution so that only the range of scores common to both exposed and unexposed subjects is included in the analysis; and 2) the final disease model can be fit with only two variables (i.e., the exposure of interest and the risk score).
The set of variables available in this claims database might not be broad enough to sufficiently predict exposure or outcome. However, this limitation would invalidate our comparisons only if a small subset of variables included in all models were responsible for all of the confounding, with additional variables showing no confounding above and beyond these "core" confounders. It is nevertheless intriguing that differences between the methods are minor compared with the remaining residual confounding, assuming that there is no protective effect of NSAIDs on short-term, all-cause mortality. Incorporating additional information on factors strongly associated with the prescribing of NSAIDs and on short-term mortality not available in claims data (e.g., measures of over- and underweight or activities of daily living (35, 36
)) seems more promising than using different strategies to analyze the available data in our specific example.
The parameters estimated by using individual matching on the EPS and IPTW are not exactly the same as the ones from the other analytic strategies (37, 38
). The population-averaged interpretation of these estimates might explain why they were closest to the null value in the full cohort but not why the IPTW estimate seemed furthest away from the null value in the smallest studies. The latter finding might be due to influential weights attributed to observations with "wrong" exposure status, that is, exposed observations with a very low estimated propensity for exposure, or vice versa.
We conclude that in the setting of claims data on an elderly population, various ways to apply EPSs and DRSs to control for confounding were not generally superior to "conventional" multivariable outcome modeling. Differences in effect estimates between analytic strategies became more pronounced with smaller study size.
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|