1 Department of Nutrition, School of Public Health and School of Medicine, University of North Carolina, Chapel Hill, NC.
2 Carolina Population Center, University of North Carolina, Chapel Hill, NC.
3 Department of Statistics, North Carolina State University, Raleigh, NC.
4 Department of Maternal and Child Health, School of Public Health, University of North Carolina, Chapel Hill, NC.
Received for publication May 14, 2003; accepted for publication November 18, 2003.
![]() |
ABSTRACT |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
causality; confounding factors (epidemiology); epidemiologic methods; longitudinal studies; models, structural
Abbreviations: Abbreviation: MSM(s), marginal structural model(s).
![]() |
INTRODUCTION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Marginal structural models (MSMs), developed by Robins et al. (14), allow proper adjustment for time-dependent confounding. Although MSMs are relatively simple to implement, they have been used almost exclusively by methodologists (510), not practicing epidemiologists. In this paper, we describe the application of MSMs to estimation of the causal effect of iron supplementation during pregnancy, a time-varying exposure, on odds of anemia at delivery. Compliance with use of iron supplements does not cause anemia, so this example is a good test case for causal methods. In this example, subjects and clinicians made decisions to change iron treatment over time based on subjects attributes, including hemoglobin and serum ferritin concentrations (blood markers of iron status) and treatment side effects. Thus, these time-dependent covariates not only are independent predictors of anemia risk but also predict subsequent iron treatment (i.e., they are confounders). These covariates are also affected by prior iron treatment (i.e., they are intermediates). Without proper adjustment, iron supplementation will probably appear harmful rather than protective. Therefore, comparing adjustment methods yields useful insights into the advantages of MSMs. In this paper, we compare results obtained using MSMs with those obtained using ordinary logistic regression, discuss the strengths and weaknesses of MSMs, and highlight key issues epidemiologists should recognize before and while undertaking such an analysis.
![]() |
MATERIALS AND METHODS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
A midgestation assessment of hemoglobin and serum ferritin concentrations was made during a visit in the interval of 2429 weeks gestation, after which the standard clinic protocol was implemented. Typically, women with good iron status (the definition was left up to the clinicians discretion) received a supplement containing 30 mg of iron in a standard prenatal vitamin through delivery, while women with poorer iron status were prescribed higher-dose supplements (60325 mg/day). Supplement use from 2429 weeks gestation to delivery was assessed with abstracted pharmacy data.
After delivery, medical records were abstracted for ascertainment of hemoglobin concentration no more than 7 days before delivery, sociodemographic data, and pregnancy events. Prenatal iron status was rarely measured at visits other than enrollment, 2429 weeks, and delivery. Maternal anemia at delivery was defined by gestational-age-specific hemoglobin concentration cutpoints after adjustment of hemoglobin level for smoking (12).
Treatment
From enrollment to 2429 weeks gestation, 41 percent of subjects did not pick up their assigned supplements, and 85 percent of women who returned pill bottles had less-than-perfect adherence. Thus, cumulative dose of iron was estimated using a combination of pill-count and pharmacy data. We estimated the percentage of compliance on the basis of pill-count data, since a validation study found these data to be more accurate than questionnaires in assessing compliance. For each woman, percentage of compliance was estimated as (32 pills per bottle number of pills remaining)/(number of days between dispensing and refill) x 100. However, only 31 percent of the women returned their pill bottles, so we estimated pill counts for the remaining women using data on pharmacy prescription pickups, since these data were available for all of the women and were moderately correlated with pill-count compliance (r = 0.5). Percentage of compliance using pharmacy data was estimated as (number of pills dispensed/number of days between dispensing and refill) x 100. We used simple regression imputation with percentage of pharmacy compliance as the main independent variable to predict pill-count compliance for women who did not return pill bottles (13).
For each supplement, we calculated the therapeutic dose of iron as (mg/day in the supplement x days between dispensing and refill x percent compliance) and summed these doses to estimate cumulative dose between enrollment and 2429 weeks and between 2429 weeks and delivery. Finally, we categorized cumulative dose in each time period into four groups reflecting iron intake relative to the 30-mg/day dose recommended for nonanemic pregnant women (12), as follows. For the period from the initial visit to 2429 weeks (hereafter called time period 0), 30 mg/day corresponded to a cumulative dose of 2.03.5 g, since most women had the opportunity for 1016 weeks of treatment. Therefore, we identified four groups, (0, 1, 2, 3), which corresponded to iron intakes of 0 g, 0.11.9 g (less than the prescribed amount), 2.03.5 g (approximately the prescribed amount), and >3.5 g (more than the prescribed amount), respectively. For the period from 2429 weeks to delivery (hereafter called time period 1), most women had 1014 weeks of supplementation, so the four groups (0, 1, 2, 3) corresponded to 0 g, 0.11.9 g (less than the prescribed amount), 2.03.0 g (approximately the prescribed amount), and >3.0 g (more than the prescribed amount), respectively.
Our objective was to estimate the causal effect of all 16 combinations, or regimes, of iron treatment throughout pregnancy on the odds of anemia at delivery, while adjusting for time-dependent and -independent confounders.
Potential confounders
Potentially confounding maternal characteristics were maternal ethnicity/race (non-Hispanic White, non-Hispanic Black, or other); education (12 years vs. >12 years); age (<20, 2029, or
30 years); parity (1, 2, or
3); marital status (married vs. unmarried); anemia at baseline (dichotomous variable); serum ferritin concentration at baseline (continuous variable); gestational age at entry (continuous variable); smoking status (smoker vs. nonsmoker); gestational age at delivery (continuous variable); and prepregnancy body mass index (weight (kg)/height (m)2; <19.8, 19.826.0, 26.129.0, or >29.0 (14)). Symptoms that may interfere with supplement use (stomach discomfort, nausea, vomiting, gas, constipation, diarrhea, appetite loss, or cramps in the past month; dichotomous variable) were self-reported on the questionnaire.
After 2429 weeks gestation, data were available on the presence of severe nausea and vomiting from the medical record, serum ferritin concentration at 2429 weeks (continuous variable), and hemoglobin concentration at 2429 weeks (<100, 100109, or 110 g/liter).
Because collection of data on covariates was not central to the studys goal, certain women were missing some of these data. Since our objective was to provide a pedagogic demonstration of MSMs, we used data from 426 women with all measured covariates required for fitting the models described in the equations presented below. Forty-three women were censored between baseline and 2429 weeks, leaving 383 with full exposure data at 2429 weeks. Of these, 81, 138, 135, and 29 women fell into treatment groups 0, 1, 2, and 3, respectively. Fifty-five women were censored between 2429 weeks and delivery, leaving 328 with full exposure data through delivery. Of these, 40, 139, 97, and 52 women fell into treatment groups 0, 1, 2, and 3, respectively. Lastly, there were 234 uncensored women with complete exposure, covariate, and outcome data, 119 of whom were anemic at delivery. Censoring was taken into account according to techniques described by Robins et al. (1) and discussed below.
Causal inference and MSMs
Ideally, for estimation of the causal effect of supplementation on anemia, a large sample of women should be randomized to different treatment regimes at enrollment, with perfect adherence ensured and no censoring; here, the assumption of no confounding would be reasonable. Although, in our study, women were randomized to treatment initially, noncompliance was common, and decisions were made in each period to modify or terminate treatment that may be associated with both subject attributes and previous treatment. Side effects from a high iron dose may have caused a health-care provider to lower the iron dose during the second period. Poor iron status from an initially low iron dose may have caused a provider to increase the iron dose during the second period. Thus, time-dependent confounding is to be expected in this situation, and estimation of causal effects based on the observed exposures is likely to yield biased estimates.
Borrowing the notation of Robins et al. (1), let A0 equal observed cumulative iron dose from baseline (time 0 = treatment initiation) to 2429 weeks, and let L0 denote a vector of measured confounders that may be associated with A0 (ethnicity/race, baseline hemoglobin level, etc.). Let A1 equal observed cumulative iron dose from 2429 weeks to delivery (time 1 = 2429 weeks gestation), and let L1 denote measured confounders possibly associated with A1 (e.g., 24- to 29-week hemoglobin level measured before treatment modification at time 1). A0 and A1 have four possible values, (0, 1, 2, 3). Because we use a cumulative treatment to account for noncompliance with assigned treatment, our notation differs slightly from that of Robins et al. (1). However, we maintain their notation here for consistency. Let C0 indicate loss to follow-up (censoring) before 2429 weeks (C0 = 1 if censored and 0 otherwise). Because cumulative dose from entry to 2429 weeks may be determined only if a woman is not censored before 2429 weeks, A0 is not always observed for these women. Similarly, let C1 indicate censoring before delivery for a woman whose A0 and L1 are observed but A1 is not necessarily determined (C1 = 1 if censored and 0 otherwise). Women who had therapeutic or spontaneous abortions were considered censored, whereas women with stillbirths were uncensored if they had undergone hemoglobin measurement before delivery. It is possible that there are vectors of unmeasured confounders at times 0 and 1; denote these by U0 and U1, respectively. Figure 1 shows a possible directed acyclic graph (causal diagram) (1517) that explicates the temporal ordering of all variables, assuming no unmeasured confounding.
|
Our goal is to estimate the probabilities P( = 1) for all 16 possible
s to compare treatment regimes and estimate causal effects. In an ideal randomized study (described above) with women assigned to the 16 combinations, this would be possible. However, with our sample, we cannot estimate these probabilities directly; we can only estimate the probability of having anemia at delivery (Y = 1) among those with observed treatment histories
=
, denoted P(Y = 1|
=
).
An MSM is a model for P( = 1). If
contains all confounders for treatment and outcome (i.e., no unmeasured confounders exist), it is possible to fit this model unbiasedly (1). This is accomplished by fitting a model to the observed data (i.e., P(Y = 1|
=
)) but weighting the contribution from each uncensored subject by the inverse of, approximately, the joint probability of having her treatment and censoring histories as a function of her covariate history (1). The weighting creates a "pseudopopulation" in which confounders are no longer associated with treatment (see the paper by Robins et al. (1) for a simple example of inverse weighting). Of course, we cannot determine from the data whether there are unmeasured confounders; thus, as in any causal analysis, the method is predicated on the assumption that there are none.
To demonstrate, we propose the MSM
Define logit P = log(P/1 P). Here, the logit of the probability of anemia at delivery depends on through indicator variables corresponding to the treatment combination involved in
; (dose10, dose20, dose30) are indicators for the treatment categories (1, 2, 3) during time period 0, and (dose11, dose21, dose31) are defined similarly for time period 1. For simplicity, we consider a model that includes only main effects for treatments in each time period, though more complex models are possible. Thus, in model 1, ß10, ß11, etc., have a causal interpretation:
is the causal anemia odds ratio that would result if all women received 0.11.9 g of iron during time period 0 and 0 g during time period 1 relative to the reference group of all women receiving no iron during either period;
is the odds ratio that would result if the population of women received 0.10.9 g of iron during each period relative to the reference group; etc. While we cannot fit model 1 directly, we can fit
Models 1 and 2 are different; model 1 is a model for the populations probability of anemia if everyone received , relevant for causal inference, while model 2 describes the probability of anemia for those observed to have history
. Thus,
, etc., in general. From the paper by Robins et al. (1), if L0 and L1 contain all confounders associated with subsequent treatment, censoring, and potential response, we can unbiasedly estimate
in model 1 by estimating
in model 2 using the inverse probability weighting. The model is fitted to all women whose response Y is observed (with
= (0, 0) and complete data on
(n = 234)), where the weights are estimated on the basis of all women (n = 426) as discussed below. Because such weighting can lead to unstable fitting, a "stabilized" modification of the weights is recommended (1), the construction of which we illustrate below. (Doubly robust estimators weighted by the inverse probability of treatment have also been developed to significantly decrease the influence of extreme weights (18) but will not be presented here.)
Calculating weights
The stabilized weight for the ith woman with an observed response is formed by the product of two factors, one accounting for treatment history and the other for censoring. If the ith woman is observed to have A0 = a0i, A1 = a1i, L0 = l0i, and L1 = l1i, then her treatment history weight is
swi = {P(A0 = a0i|C0 = 0) x P(A1 = a1i|A0 = a0i, C1 = 0, C0 = 0)}/ {P(A0 = a0i|L0 = l0i, C0 = 0) x P(A1 = a1i|A0 = a0i, L0 = l0i, L1 = l1i, C1 = 0, C0 = 0)}.
Each component of model 3 represents a probability for observed data. For instance, P(A1 = a1i|A0 = a0i, L0 = l0i, L1 = l1i, C1 = 0, C0 = 0) is the probability of receiving a1i for subjects observed to receive a0i having measured confounders l0i and l1i who are not censored. The denominator is approximately the probability of having the subjects observed treatment and confounder history, given that she was not lost to follow-up. Her censoring history weight is
swi = {P(C0 = 0) x P(C1 = 0|A0 = a0i, C0 = 0}/{P(C0 = 0)|L0 = l0i) x P(C1 = 0|A0 = a0i, L0 = l0i, L1 = l1i, C0 = 0)}.
For example, P(C1 = 0|A0 = a0i, L0 = l0i, L1 = l1i, C0 = 0) is the probability of not being censored in time period 1 for subjects observed to receive a0i having measured confounders l0i and l1i who are not censored in time period 0. The denominator is roughly the probability of being uncensored with the given treatment/confounder history. The full stability weight is the product of the predicted probabilities that come from models 3 and 4: msmwti = swi x swi. The probabilities in the denominators must be positive for all possible values of A0, A1, L0, and L1; that is, it must be possible for a subject to receive all of the available treatments at each time point.
For calculation of msmwti, each probability in models 3 and 4 is modeled and fitted on the basis of the observed data from all women. These fits are then used to assign each uncensored woman a set of predicted probabilities, which may be substituted in models 3 and 4 to obtain her estimated swi and swi.
We estimated P(A0 = a0i|C0 = 0) in model 3 by the simple frequency for each possible a0i value, (0, 1, 2, 3), from women among the 426 who were not censored in the first time period; the predicted probability for each uncensored woman was the frequency corresponding to her observed value a0i. We modeled the remaining probabilities by ordinal logistic regression models:
;
;
,
where (dose10, dose20, dose30) have values corresponding to the value of a0 and where l0 and l1 are vectors of measured confounders. Models including interaction terms could also be used. For models 5 and 7, the fit was based on uncensored subjects (i.e., C0 = 0, C1 = 0). For model 6, the fit was based on all women not censored in the first time period (i.e., C0 = 0). For each woman, we then substituted her treatment and confounder values a0i, l0i, and l1i in the right-hand sides of models 57 and obtained her predicted probabilities by taking j = a1i in models 5 and 7 and k = a0i in model 6. From model 4, we estimated P(C0 = 0) by the simple frequency of women uncensored in the first period among the 426 and similarly obtained predicted probabilities to substitute in model 4 from fits of the following logistic regression models to the observed data on censoring:
logit P(C0 = 0|L0 = l0) = 0 +
1l0;
In hopes of satisfying the assumption of no unmeasured confounders, for models 510 we included the full list of potential confounders described above for l0 and l1; no attempt was made to simplify these models. Robins has argued that such a conservative approach may be preferable to the risk of mistakenly eliminating covariates that are true confounders (J. M. Robins, Harvard University, personal communication, 2002).
From these fits, we calculated for each woman the full stability weight msmwti, estimated the coefficients in model 1 by fitting model 2 using msmwti to weight each observation using SAS PROC GENMOD (19) and Stata (20) (see Appendix), and obtained confidence intervals using "robust" methods (1). The software treats the weights as fixed instead of estimated, which leads to conservative intervals guaranteed to have at least a 95 percent coverage probability.
For comparison with the inferences obtained from fitting the MSM in model 1, we fitted two additional models directly to the observed data on the 234 uncensored women. First, we fitted a model for the probability of anemia at delivery as a function of treatment historyfor example, a crude ordinary logistic regression model (i.e., model 2 without inverse weighting), where we would expect iron to appear harmful, since we have not properly adjusted for confounding. This would be the correct model if all women had been randomized to one of the 16 treatment regimes at baseline and had not deviated from their assigned treatment. Second, we fitted a model for the probability of anemia at delivery as a function of treatment and covariate historyfor example, an ordinary logistic regression model adjusting for confounder history by inclusion of all covariates in the model as regressors:
Although including L0 in this model would adequately control for confounding by pretreatment variables, addition of the L1 variables causes biased estimates (1). We fitted model 11 to show the effect that "traditional" (but incorrect) adjustment has on inference.
![]() |
RESULTS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
An important consideration for MSMs is the effect of possibly "influential" individuals, since data from persons with very large msmwti may play a major role in dictating results. Because the weights involve probabilities, they are positive and thus are likely to have skewed distributions, so some fraction of influential subjects is expected. Mean values, median values, and standard deviations for estimated swi, swi, and msmwti were (4.75, 0.92, 21.08), (1.09, 0.97, 0.39), and (4.76, 0.93, 19.56), respectively, reflecting the skewness of swi and msmwti. However, it is important to examine the estimated msmwti to understand whether data from persons with potentially unusual observed covariate histories are driving results. For msmwti, the 95th percentile was 17.99 with a maximum of 225.58. We examined the influence of the 11 women with msmwti in this range in two ways: 1) by deleting these women and refitting models 510 and MSM model 1, which potentially involves selection bias, and 2) by maintaining these women but truncating msmwti to equal 20 when fitting the MSM. In both instances, qualitative results were similar to those shown in tables 1 and 2, indicating that, despite large weights, these women did not appear to drive conclusions. Covariate patterns for these women did not seem unusual relative to the rest of the sample.
We included all covariates in the fits of models 510, so inclusion of variables that are actually not confounders could have resulted in reduced estimation precision of these models and the MSM. To gain a sense of this, we refitted the MSM using only covariates with p < 0.30 in models 510; the results did not meaningfully change. Finally, we refitted the models using alternative specifications for confounders in models 510 (i.e., categorized variables that were previously continuous, etc.); the MSM results did not meaningfully change.
![]() |
DISCUSSION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
MSMs are limited to the study of nondynamic treatment regimes (regimes that are "fixed in advance")for example, treatment with 30 mg of iron per day throughout pregnancy regardless of intervening events before delivery. Thus, MSMs inform us about the difference in the probability of anemia between two fixed regimes a health-care provider could specify at the first prenatal visit. Dynamic regimes, in contrast, allow treatment to change over time based on history (21)for example, treatment with 30 mg/day from the start of care to 2429 weeks, then measurement of hemoglobin concentration, and then an increase in the dose to 60 mg/day for the remainder of pregnancy if the hemoglobin level is less than 105 g/liter but maintenance at 30 mg/day otherwise. Dynamic regimes are more consistent with clinical practice, since providers rarely specify fixed regimes at the beginning of treatment. Modeling and causal inference for dynamic regimes can be accomplished with structural nested models or G-estimation (21).
MSMs and other models for causal inference from complex longitudinal data (1, 3, 22) are important, since costs and ethical concerns can rule out the conduct of randomized trials. If observational data with little missingness and sufficiently rich information on confounders for exposure and censoring can be collected, MSMs are an accessible tool for causal inference.
![]() |
ACKNOWLEDGMENTS |
---|
The authors thank the numerous persons who contributed thoughtful comments on early drafts of this paper, most notably Drs. Cande Ananth, Mary Cogswell, Irva Hertz-Piccioto, Jay Kaufman, and Charles Poole.
![]() |
APPENDIX |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
proc genmod descending data = work;
class id;
model anemia = dose0_1 dose0_2 dose0_3
dose1_1 dose1_2 dose1_3/ link = logit dist = bin;
weight msmwt;
repeated subject = id/type = ind;
run;
The Stata code (version 7.0 (20)) for obtaining estimators weighted by the inverse probability of treatment using stabilized weights is as follows:
logit anemia dose0_1 dose0_2 dose0_3 dose1_1 dose1_2 dose1_3 [pweight = msmwt], robust cluster(id)
![]() |
NOTES |
---|
![]() |
REFERENCES |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|