1 Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD.
2 Department of Epidemiology, School of Public Health, Harvard University, Boston, MA.
3 Department of Biostatistics, School of Public Health, Harvard University, Boston, MA.
4 Lincoln Medical and Mental Health Center, New York, NY.
5 Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL.
6 Department of Epidemiology, School of Public Health, University of California, Los Angeles, Los Angeles, CA.
7 Kenneth Norris Jr. Cancer Hospital, Los Angeles, CA.
8 Department of Preventive Medicine and Community Health, Health Science Center at Brooklyn, State University of New York, Brooklyn, Brooklyn, NY.
9 Departments of Medicine and Epidemiology, University of California, San Francisco, San Francisco, CA.
10 Department of Infectious Diseases and Microbiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA.
11 Division of Infectious Diseases, Georgetown University Hospital, Washington, DC.
12 Cook County Hospital, Chicago, IL.
Received for publication February 6, 2002; accepted for publication March 10, 2003.
ABSTRACT |
---|
acquired immunodeficiency syndrome; antiretroviral therapy, highly active; causality; confounding factors (epidemiology)
Abbreviations: AIDS, acquired immunodeficiency syndrome; CI, confidence interval; HAART, highly active antiretroviral therapy; HIV, human immunodeficiency virus; HR, hazard ratio; NRTI, nucleoside reverse transcriptase inhibitor; PCP, Pneumocystis carinii pneumonia.
INTRODUCTION |
---|
In observational studies, persons who initiate HAART are usually those with poorer values for prognostic biomarkers (i.e., confounding by indication) (4), such as low CD4 cell count and high plasma levels of HIV type 1 (HIV-1) RNA (5). Therefore, one needs to adjust for these time-varying confounders to estimate the effect of HAART on AIDS or death. However, including these time-varying confounders as covariates in standard survival models (e.g., Cox models) may yield an association measure (e.g., hazard ratio) that cannot be interpreted as the overall or net effect of HAART, because current CD4 cell count and HIV RNA level are themselves strongly influenced by past HAART exposure (6–8). In the absence of confounding by other unmeasured factors, the HAART effect in such a model may represent the direct effect of HAART not mediated through CD4 count and HIV RNA. Since it is likely that much of the effect of HAART on AIDS-free survival is mediated by its effect on CD4 count and HIV RNA level, one could expect that such an association measure for HAART would be an underestimate of the net effect of HAART. Below, we estimate the net effect of HAART on AIDS-free survival in prospective observational data using a marginal structural model, which appropriately adjusts for confounding by time-varying factors affected by treatment.
MATERIALS AND METHODS |
---|
Every 6 months, participants in both studies completed an extensive interviewer-administered questionnaire giving information on antiretroviral treatment and HIV-related symptoms and provided a blood sample for the determination of CD4 cell count and plasma HIV-1 RNA level. The definition of HAART followed the Department of Health and Human Services/Kaiser Panel guidelines (11). HAART was defined as 1) use of two or more nucleoside (or nucleotide) reverse transcriptase inhibitors (NRTIs) in combination with at least one protease inhibitor or one non-NRTI; 2) use of one NRTI in combination with at least one protease inhibitor and at least one non-NRTI; 3) a regimen containing ritonavir and saquinavir in combination with one NRTI and no non-NRTIs; or 4) an abacavir-containing regimen of three or more NRTIs in the absence of both protease inhibitors and non-NRTIs. Combinations of zidovudine and stavudine with either a protease inhibitor or a non-NRTI were not considered HAART. Therapy regimens not classified as HAART were categorized as either monotherapy or combination antiretroviral therapy. Once a participant reported initiation of HAART, he or she was assumed to have remained on HAART for the duration of follow-up. This simplifying assumption correctly classified 94 percent of the observed person-time. An indicator variable for Pneumocystis carinii pneumonia (PCP) prophylaxis was constructed using reports of trimethoprim, bactrim, aerosolized pentamidine, and dapsone use. T-cell subsets were determined by immunofluorescence using flow cytometry in laboratories participating in the National Institute of Allergy and Infectious Diseases quality assurance program. Baseline CD4 cell count was modeled in three categories: <200, 200–350, and >350 cells/mm3. Time-varying CD4 cell count was modeled using a restricted cubic spline with four knots located at the 5th, 35th, 65th, and 95th percentiles.
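The four-part HAART definition above can be read as a decision rule. The following is a minimal sketch of that rule, not the study's actual classification code: the `DRUG_CLASS` lookup is a hypothetical, deliberately incomplete table, the function names are illustrative, and the sketch assumes that the zidovudine-plus-stavudine exclusion overrides the other criteria. The second function encodes the stated assumption that a participant remains on HAART once initiation is reported.

```python
# Hypothetical, incomplete drug-class lookup used only for illustration.
DRUG_CLASS = {
    "zidovudine": "NRTI", "stavudine": "NRTI", "lamivudine": "NRTI",
    "didanosine": "NRTI", "abacavir": "NRTI",
    "indinavir": "PI", "ritonavir": "PI", "saquinavir": "PI", "nelfinavir": "PI",
    "nevirapine": "NNRTI", "efavirenz": "NNRTI",
}


def is_haart(regimen):
    """Return True if a list of drug names meets the HAART definition.

    Drugs missing from DRUG_CLASS raise KeyError rather than being guessed.
    """
    drugs = set(regimen)
    n_nrti = sum(1 for d in drugs if DRUG_CLASS[d] == "NRTI")
    n_pi = sum(1 for d in drugs if DRUG_CLASS[d] == "PI")
    n_nnrti = sum(1 for d in drugs if DRUG_CLASS[d] == "NNRTI")

    # Exclusion: zidovudine + stavudine with a PI or a non-NRTI is not HAART
    # (assumed here to take precedence over criteria 1 to 4).
    if {"zidovudine", "stavudine"} <= drugs and (n_pi or n_nnrti):
        return False
    # 1) two or more NRTIs plus at least one PI or one non-NRTI
    if n_nrti >= 2 and (n_pi >= 1 or n_nnrti >= 1):
        return True
    # 2) one NRTI plus at least one PI and at least one non-NRTI
    if n_nrti == 1 and n_pi >= 1 and n_nnrti >= 1:
        return True
    # 3) ritonavir + saquinavir with one NRTI and no non-NRTIs
    if {"ritonavir", "saquinavir"} <= drugs and n_nrti == 1 and n_nnrti == 0:
        return True
    # 4) abacavir-containing regimen of three or more NRTIs, no PI, no non-NRTI
    if "abacavir" in drugs and n_nrti >= 3 and n_pi == 0 and n_nnrti == 0:
        return True
    return False


def ever_on_haart(reported):
    """Carry HAART initiation forward: once reported, exposed at all later visits."""
    out, started = [], False
    for r in reported:
        started = started or bool(r)
        out.append(int(started))
    return out
```

For example, `is_haart(["zidovudine", "lamivudine", "indinavir"])` satisfies criterion 1, whereas a two-NRTI regimen with no third agent would fall into the combination-therapy category.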
HIV-1 RNA viral load was quantified using a reverse transcription polymerase chain reaction amplification technique (Roche Molecular Systems, Branchburg, New Jersey). Baseline RNA level was modeled in three categories: <401, 401–10,000, and >10,000 copies/ml. Time-varying RNA level was modeled as an indicator of detection (the detection limit was 400 copies/ml) in concert with a restricted cubic spline with four knots located at the 5th, 35th, 65th, and 95th percentiles for the log10-transformed detected measurements (set to zero for undetected measurements). An indicator variable for the presence of any HIV-related symptoms was constructed using reports of persistent fever, diarrhea, night sweats, and weight loss. Longitudinal data were carried forward from the most recent observed value for the 10 percent of anticipated visits that were missed. Alternate analyses restricted to participants with complete data at baseline or multiply imputed missing baseline data yielded similar results (data not shown).
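Two of the covariate-preparation steps described above lend themselves to a short illustration: carrying the last observed value forward across missed visits, and building a restricted cubic spline basis with knots at fixed percentiles. The sketch below is an assumption-laden illustration rather than the analysis code. The percentile definition (linear interpolation between order statistics) is one of several in common use, and the spline uses the standard truncated-power form without any rescaling; all function names are hypothetical.

```python
def carry_forward(values):
    """Replace None (a missed visit) with the most recent observed value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between order statistics (assumed definition)."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def rcs_basis(x, knots):
    """Restricted cubic spline basis: x itself plus len(knots) - 2 nonlinear terms.

    The terms are constructed so that the fitted curve is linear below the
    first knot and above the last knot.
    """
    def cube(u):
        return max(u, 0.0) ** 3

    t_last, t_prev = knots[-1], knots[-2]
    span = t_last - t_prev
    terms = [x]
    for t in knots[:-2]:
        terms.append(
            cube(x - t)
            - cube(x - t_prev) * (t_last - t) / span
            + cube(x - t_last) * (t_prev - t) / span
        )
    return terms


# Illustrative values only: a made-up grid of CD4 counts, not study data.
cd4 = list(range(0, 1001, 10))
knots = [percentile(cd4, p) for p in (5, 35, 65, 95)]
```

With four knots, each CD4 value contributes three regressors (one linear, two nonlinear), which is what a four-knot restricted cubic spline adds to a regression model.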
The outcomes of interest were first diagnosis of clinical AIDS or death from any cause. The 1993 Centers for Disease Control and Prevention clinical conditions criteria were used to define clinical AIDS (12). Therefore, participants with CD4 cell counts less than 200 cells/mm3 but no clinical conditions were not considered to have clinical AIDS. A description of outcome ascertainment has been published elsewhere (9, 13). Briefly, physician or hospital records were used to confirm reported cases of clinical AIDS in the cohort of men, while in the cohort of women, clinical AIDS was self-reported. Deaths were ascertained using death certificate abstractions upon notification and national death registry searches.
Each participant contributed a maximum of 13 person-visits of follow-up from the baseline visit (first visit after October 1995) to the last visit at which he or she was seen free of clinical AIDS and alive or the visit before April 2002, whichever came first. Follow-up of participants missing any time-varying characteristic at baseline started at the first subsequent visit at which values were observed.
Marginal structural model
We used a weighted pooled logistic regression model to approximate the parameters of a marginal structural Cox model, as described by Hernán et al. (14, 15). Pooled logistic regression approximates the Cox model well when the risk of events is less than 10 percent per person-time interval (16); herein, the maximum visit-specific risk of AIDS or death was 6 percent.
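The reason the approximation depends on a small per-interval risk can be seen numerically. The sketch below assumes proportional hazards within a single interval, so that interval survival in the exposed equals interval survival in the unexposed raised to the power of the hazard ratio; the function name and the input values are illustrative, not taken from the analysis.

```python
def odds_ratio_from_hazard_ratio(p0, hr):
    """Odds ratio for an interval event implied by a hazard ratio `hr`.

    p0 is the interval risk in the reference group. Under proportional
    hazards, survival in the comparison group is (1 - p0) ** hr.
    """
    p1 = 1.0 - (1.0 - p0) ** hr
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# At a 6 percent interval risk the odds ratio stays close to a hazard ratio of 0.5;
# at a 50 percent interval risk the two measures diverge noticeably.
low_risk_or = odds_ratio_from_hazard_ratio(0.06, 0.5)
high_risk_or = odds_ratio_from_hazard_ratio(0.50, 0.5)
```

At the 6 percent maximum risk reported here, the implied odds ratio is roughly 0.49 for a true hazard ratio of 0.5, which is why the pooled logistic coefficient can be read as an approximate log hazard ratio.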
Time was measured in semiannual visits from the beginning of follow-up and took values (k) from zero (October 1995–April 1996) to 12 (October 2001–April 2002). The subscript i, denoting the subject, is often suppressed, because we assumed that the random vector of data for each subject was drawn independently from an identical distribution. Let D(k + 1) be an indicator of first diagnosis of clinical AIDS or death between visits k and k + 1. Let X(k) be a time-varying indicator of HAART initiation at or before visit k, with X(−1) ≡ 0, since the study population was selected to not have HAART exposure prior to the first eligible visit. Let L(k) be a vector of time-varying covariates measured at visit k − 1, so that L(k) is temporally prior to X(k), with L(0) being the vector of covariates measured at the visit preceding the study period (i.e., the "baseline" visit). For the present analyses, L(0) consisted of age, gender, race, calendar year at study entry, baseline use of (mono- and combination) antiretroviral therapy, and baseline CD4 and RNA categories; L(k) further consisted of CD4 count, RNA level, HIV symptoms, indicators of (mono- and combination) antiretroviral therapy and PCP prophylaxis, and number of days since the prior visit.
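The notation above implies a person-visit data layout, with one record per participant per visit k. A minimal sketch of that layout follows; the function name, the argument names, and the dictionary keys are hypothetical, and covariates are omitted for brevity.

```python
def person_visits(last_visit, haart_visit=None, event=False):
    """Build person-visit records for one participant.

    last_visit  : last visit k at which the participant was seen AIDS-free and alive
    haart_visit : visit at which HAART initiation was first reported (None if never)
    event       : True if clinical AIDS or death occurred between last_visit and
                  last_visit + 1
    """
    rows = []
    for k in range(last_visit + 1):
        rows.append({
            "k": k,
            # X(k): HAART initiated at or before visit k
            "X": int(haart_visit is not None and haart_visit <= k),
            # D(k + 1): event between visits k and k + 1
            "D_next": int(event and k == last_visit),
        })
    return rows


# A participant who starts HAART at visit 2 and has an event after visit 3.
example = person_visits(3, haart_visit=2, event=True)
```

Here `example` has four records with X = 0, 0, 1, 1 and an event flag only on the final record, which is the structure a pooled logistic model is fitted to.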
For persons who remained AIDS-free, alive, and under follow-up at visit k + 1, we fit the pooled logistic regression model
$$\operatorname{logit} \Pr[D(k+1) = 1 \mid X(k), L(0)] = \beta_0(k) + \beta_1 X(k) + \beta_2' L(0),$$

where $\beta_0(k)$ is a visit-specific intercept (which we modeled as a restricted cubic spline with four knots at the 5th, 35th, 65th, and 95th percentiles for the number of days since the baseline visit). The contribution of participant i to the calculation at visit k is weighted by $W_i(k)$, which is the product of the estimated inverse probability-of-treatment weight and the inverse probability-of-censoring weight, namely $W_i(k) = W_i^X(k) \times W_i^C(k)$. In the absence of unmeasured confounding, unmeasured informative censoring, and model misspecification, $\exp(\beta_1)$ is a consistent and asymptotically normal estimator of the hazard ratio, which compares the hazard of AIDS or death had everyone initiated HAART at baseline with the hazard had no one initiated HAART during follow-up (8). Therefore, we compared continuous HAART exposure against the collective of no therapy, monotherapy, or combination therapy.
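To make the weighted fit concrete, the following is a deliberately reduced sketch: it fits a weighted logistic model with a single intercept and the HAART indicator only, by Newton-Raphson, omitting the visit-specific intercept and the baseline covariates of the model above. The function name and the toy records are hypothetical. In this reduced setting the exponentiated coefficient equals the ratio of weighted odds, which gives a simple check on the fit.

```python
import math


def fit_weighted_logistic(x, d, w, n_iter=25):
    """Newton-Raphson fit of logit Pr[D = 1 | X] = b0 + b1 * X with weights w."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di, wi in zip(x, d, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (di - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2 x 2 system: step = (information matrix)^-1 * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy person-visit records (illustrative only): exposure, event, weight.
x = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 0, 0, 1, 1, 0, 0, 0]
w = [1.0, 2.0, 1.0, 1.0, 1.0, 1.5, 2.5, 1.0]
b0, b1 = fit_weighted_logistic(x, d, w)
approx_hazard_ratio = math.exp(b1)
```

In the toy records the weighted odds are 2/3 among the unexposed and 1/5 among the exposed, so the fitted `approx_hazard_ratio` converges to their ratio. Standard errors from such a fit would not be valid as they stand, because the weights induce within-subject correlation; that is why the analysis relies on robust variance estimates.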
Informally, each participant's inverse probability-of-treatment weight is the inverse of the probability of receiving the treatment history he or she did in fact receive by visit k. Specifically,

$$W_i^X(k) = \prod_{j=0}^{k} \frac{1}{f[X(j) \mid \bar{X}(j-1), \bar{L}(j)]},$$

where f[·] is by definition the conditional density function evaluated at the observed covariate values for a given subject, $\bar{X}(j-1)$ is the treatment history through visit j − 1, and $\bar{L}(j)$ is the history of time-varying covariates up to time j, including baseline covariates L(0). The approach using the inverse probability-of-treatment weight adjusts for confounding by the variables that are used to create the weights and can be viewed as a generalization of the Horvitz-Thompson estimator (17). Since the inverse probability-of-treatment weight and the inverse probability-of-censoring weight are unknown, we estimate them using the predicted values from pooled logistic models for the probabilities of initiating HAART and of censoring, respectively.
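The weight for a participant can be accumulated visit by visit from the predicted probabilities of initiating HAART. The sketch below is illustrative: `iptw` and its arguments are hypothetical names, the predicted probabilities are taken as given rather than fitted, and visits after initiation are assumed to contribute a factor of 1, in keeping with the assumption that participants remain on HAART once they start.

```python
def iptw(x_hist, p_init):
    """Unstabilized inverse probability-of-treatment weights, one per visit.

    x_hist : observed HAART indicators X(0), X(1), ..., X(k)
    p_init : predicted probability of initiating HAART at each visit, given
             covariate history, among those not yet on HAART
    """
    weights = []
    w = 1.0
    on_haart = False
    for xj, pj in zip(x_hist, p_init):
        if not on_haart:
            # Probability of the treatment actually observed at this visit
            f = pj if xj == 1 else 1.0 - pj
            w *= 1.0 / f
            on_haart = (xj == 1)
        # After initiation the factor is 1, so the weight stops changing.
        weights.append(w)
    return weights


# Illustrative probabilities only.
example_weights = iptw([0, 0, 1, 1], [0.1, 0.2, 0.5, 0.9])
```

A censoring weight can be built the same way from predicted probabilities of remaining uncensored, and the two are multiplied to give the weight used in the outcome model.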
In the absence of unmeasured confounding, unmeasured informative censoring, and model misspecification, weighting creates a pseudo-population in which 1) the probabilities of treatment (i.e., HAART) and censoring are not a function of the time-varying covariates but 2) the effect of HAART on time to clinical AIDS or death is the same as in the original population. Thus, the inverse probability-of-treatment weight effectively removes any association between prior confounding variables and HAART but preserves the relation between HAART and clinical AIDS or death.
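A small numeric illustration may help with the pseudo-population idea. The sketch below simplifies to a single time point with one binary confounder L and one binary treatment X; the counts are invented for illustration, and the variable names are hypothetical.

```python
# Invented counts, keyed by (L, X): treatment is more common when L = 1.
counts = {(0, 0): 80, (0, 1): 20, (1, 0): 40, (1, 1): 60}


def treated_fraction(cells):
    """Proportion treated within each level of L."""
    out = {}
    for l in (0, 1):
        total = cells[(l, 0)] + cells[(l, 1)]
        out[l] = cells[(l, 1)] / total
    return out


observed = treated_fraction(counts)  # depends on L in the original population

# Weight each cell by the inverse probability of the treatment it received.
pseudo = {}
for (l, x), n in counts.items():
    p_x = observed[l] if x == 1 else 1.0 - observed[l]
    pseudo[(l, x)] = n / p_x

weighted = treated_fraction(pseudo)  # no longer depends on L
```

In the original counts 20 percent of those with L = 0 and 60 percent of those with L = 1 are treated; in the weighted pseudo-population the treated fraction is one half in both strata, so L no longer predicts treatment.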
A fuller account of the covariate histories (i.e., including covariates measured at k − 2 and k − 3), a less restrictive functional form for age, and a broader set of covariates (e.g., white blood, red blood, platelet, CD3, and CD8 cell counts; body mass index (weight (kg)/height (m)²); an indicator of the last visit having been missed) did not appreciably alter our results. The Hosmer-Lemeshow goodness-of-fit $\chi^2$ value for the final model for the denominator of the weights $W_i^X(k)$ was 23 with 8 degrees of freedom.
To increase the efficiency of our estimator, we stabilized the weights (14, 15). Note that the marginal structural model includes as regressors the baseline variables (age, gender, race, baseline CD4 count, and RNA level) used to stabilize the weights. For computational details and an example of the SAS code, see Hernán et al. (14). Confidence intervals for the inverse probability-of-treatment weight estimators of the marginal structural model are based on robust variance estimates (18) and are conservative (wider than need be) (19, 20). To ensure that we were not being overly conservative in using the robust variance estimate, we compared the conservative confidence intervals with a simple percentile-based nonparametric bootstrap confidence interval calculated from 500 full samples (with replacement) from the observed data.
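The two computations in this paragraph, stabilizing the weights and forming a percentile bootstrap interval, can be sketched as follows. This is an illustration under stated assumptions rather than the SAS implementation referenced above: the numerator probabilities are taken to come from a model conditioning on baseline covariates only, the resampling unit is assumed to be the participant, the percentile rule is a simple order-statistic cut, and all function names are hypothetical.

```python
import random


def stabilized_weights(x_hist, p_num, p_den):
    """Stabilized weights: running product of numerator/denominator probabilities.

    p_num : predicted probability of initiating HAART given baseline covariates
    p_den : predicted probability given baseline and time-varying covariates
    """
    sw, out, on_haart = 1.0, [], False
    for xj, pn, pd in zip(x_hist, p_num, p_den):
        if not on_haart:
            num = pn if xj == 1 else 1.0 - pn
            den = pd if xj == 1 else 1.0 - pd
            sw *= num / den
            on_haart = (xj == 1)
        out.append(sw)
    return out


def percentile_bootstrap_ci(units, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval from full resamples (with replacement)."""
    rng = random.Random(seed)
    n = len(units)
    estimates = sorted(
        statistic([units[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Illustrative use with a simple statistic; in practice `statistic` would refit
# the weight models and the outcome model on each resample.
lower, upper = percentile_bootstrap_ci(list(range(1, 11)), mean)
```

When the numerator and denominator models give identical predictions, the stabilized weight is 1 at every visit, which is the sense in which stabilization keeps the weights close to 1 when time-varying covariates add little beyond baseline.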
Note that since baseline covariates L(0) are included in the model, one can also include in the model interaction terms between time-dependent HAART and baseline covariates in order to estimate the hazard ratio at specific levels of the baseline covariates. Specifically, we report on how the effect of HAART is modified by gender and by baseline CD4 cell count categories. Since we are comparing the static regimens "treat always" and "treat never," baseline CD4 count is the CD4 count that subjects would have had at HAART initiation. To our knowledge, this is the first application of marginal structural models with planned exploration of effect modification by baseline covariates. The proportional hazards assumption was not rejected when we estimated the effect of HAART in subperiods (halves) of follow-up time (robust p = 0.58) (21).
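As a sketch of how such an interaction enters the model, let V denote a single baseline covariate taken from L(0), for example an indicator for a baseline CD4 category. Both V and the coefficient label $\beta_3$ are introduced here for illustration and are not notation from the analysis.

$$\operatorname{logit} \Pr[D(k+1) = 1 \mid X(k), L(0)] = \beta_0(k) + \beta_1 X(k) + \beta_2' L(0) + \beta_3 X(k) V$$

$$\mathrm{HR}(V = v) \approx \exp(\beta_1 + \beta_3 v)$$

Under this parameterization, $\exp(\beta_1)$ is the approximate hazard ratio for HAART in the reference level (V = 0), and $\exp(\beta_3)$ is the ratio of hazard ratios between levels of V.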
We also estimated the joint effects of HAART and PCP prophylaxis on time to AIDS or death using a marginal structural model (15). Briefly, we restricted the analysis to the 1,016 (of 1,498) men and women who were naïve to HAART and had not been on PCP prophylaxis during the year prior to study initiation and then estimated inverse probability weights for HAART, PCP prophylaxis, and censoring. The final pooled logistic model was weighted by the product of all three weights. This model included baseline covariates, time-varying HAART and PCP prophylaxis, and their interaction. Using this model, we estimated a pair of hazard ratios for AIDS or death. The first hazard ratio was for the comparison of HAART with no HAART under continuous PCP prophylaxis, while the second was for the comparison of HAART with no HAART under no PCP prophylaxis. All analyses were conducted using SAS, version 8 (SAS Institute, Inc., Cary, North Carolina).
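One way to write the joint model described above is sketched below. The symbol P(k) for a time-varying indicator of PCP prophylaxis, the weight label $W_i^P(k)$, and the coefficient numbering are assumptions introduced for illustration; the numbering differs from that of the primary model.

$$\operatorname{logit} \Pr[D(k+1) = 1 \mid X(k), P(k), L(0)] = \beta_0(k) + \beta_1 X(k) + \beta_2 P(k) + \beta_3 X(k) P(k) + \beta_4' L(0)$$

$$W_i(k) = W_i^X(k) \times W_i^P(k) \times W_i^C(k)$$

$$\mathrm{HR}_{\text{HAART} \mid \text{no prophylaxis}} \approx \exp(\beta_1), \qquad \mathrm{HR}_{\text{HAART} \mid \text{continuous prophylaxis}} \approx \exp(\beta_1 + \beta_3)$$

The pair of hazard ratios reported for this analysis corresponds to these two contrasts.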
RESULTS |
---|
DISCUSSION |
---|
Exposure to HAART extended time to clinical AIDS or death, with modification by baseline CD4 count: The effect of HAART on the hazard ratio scale was strongest among persons with baseline CD4 counts less than 200 cells/mm3. Our primary analysis may be thought of as an attempt to use observational data to simulate the results one would obtain in an intention-to-treat analysis of an unmasked randomized clinical trial. In the stratum where data overlap (i.e., baseline CD4 count less than 200 cells/mm3), our result is consistent with, albeit stronger than, that of the AIDS Clinical Trials Group 320 randomized trial (1). Our somewhat stronger result may be due to 1) our comparison groups being more heterogeneous (i.e., therapy-naïve, monotherapy, and combination therapy) than that in the trial (i.e., combination therapy alone), 2) our duration of follow-up being considerably longer than the trial's, 3) noncompliance with initial randomized assignment in the trial, 4) model misspecification or other uncontrolled sources of bias in our observational analysis, and/or 5) sampling variability. The secondary analysis of the joint effects of HAART and PCP prophylaxis initiation may be thought of as an observational analog to an intention-to-treat analysis of an unmasked 2 × 2 factorial randomized clinical trial.
Our result is consistent with the findings of Detels et al. (24). They used calendar period as an instrumental variable (25) for HAART exposure in a subset of Multicenter AIDS Cohort Study men for whom seroconversion dates were known (n = 536) and reported a hazard ratio for incident AIDS or death of 0.35 (95 percent CI: 0.20, 0.61) in a comparison of the time period following HAART introduction with the time period of monotherapy. Men initiating HAART in the analysis of Detels et al. were likely to have low CD4 counts, because the sickest individuals were treated with HAART during its inception.
We found no strong beneficial effect of HAART for persons with baseline CD4 counts greater than 350 cells/mm3. This result differs from the results of Jacobson et al. (26) among Multicenter AIDS Cohort Study participants who initiated HAART at CD4 counts greater than 350 cells/mm3. Jacobson et al. compared the static regimens "treat always" and "treat never" using historical controls. The present analysis compared the static regimens "treat always" and "treat never" using contemporaneous controls. Jacobson et al. demonstrated notable HIV disease progression in the stratum where baseline CD4 count was greater than 350 cells/mm3 using the historical comparison group. In contrast, in our analogous stratum, a large proportion of the contemporaneous controls did not demonstrate notable HIV disease progression. Thus, our results in the stratum where baseline CD4 count was greater than 350 cells/mm3 are probably approximately equal to those from a comparison of the static regimen "treat always" with the dynamic regimen "treat when CD4 count is less than 350." Evidence is accumulating on the finding that initiating HAART while the CD4 count is greater than 350 cells/mm3 may not confer additional protection relative to initiating HAART when the CD4 count reaches 350 cells/mm3 (27–29).
Our hazard ratio estimate can be interpreted as the net effect of HAART only under the assumptions of no unmeasured confounding, no unmeasured informative censoring, and no model misspecification. The foremost assumption may hold approximately, because the most important clinical and laboratory data used by physicians as indications to initiate HAART were collected and used in models for the inverse probability-of-treatment weight (5). Neither the present analyses nor past analyses (14, 15) suggested that there was notable informative censoring in these data due to measured covariates. Regarding model misspecification, exploration of a broad class of functional forms and summary measures of covariate histories (as described in Materials and Methods) did not appreciably alter our results. However, our results may be sensitive to the relative infrequency of data collection (i.e., 6-month intervals). Misclassification due to this coarse measurement (with respect to time) could have reintroduced some confounding, which could bias the estimated hazard ratio in either direction (30). An explicit examination of the sensitivity of our findings to such coarse measurement is warranted. Exploration of the change in a biomarker (e.g., CD4 count) may provide a more sensitive test of the effect of HAART on HIV disease progression than the clinical endpoints used in this analysis (i.e., AIDS or death). This is the topic of ongoing research.
Inverse probability weighting estimation of marginal structural models is an alternative to g-estimation of nested structural models or the g-computation formula (6). The nonparametric g-formula requires low-dimension data and is therefore practical only in select applications. As with any statistical method, marginal structural models have limitations. First, methods based on inverse probability-of-treatment weights make an internal comparison and therefore are valid to the extent that the unexposed group reflects the potential outcomes of the exposed group had they not been exposed (31). While context-specific arguments can be made that external comparison groups may better reflect the potential outcomes of the exposed group, such external comparisons are subject to a similar comparability assumption. Second, marginal structural models, unlike nested structural models, cannot be applied to scenarios where there is a structural probability of 0 or 1 for treatment at a certain level of the covariates. Third, in assessment of the effect of a dynamic treatment regimen (i.e., when interest lies in describing how a time-varying treatment interacts with a time-varying covariate), marginal structural models are less useful than nested structural models (6). Our analysis concentrated on the regimens "treat always" and "treat never." Therefore, our analysis does not directly answer the question of when, with respect to the evolution of CD4 cell count, to initiate HAART. To answer such a question with randomized data, one would conduct a "deferment" trial, wherein, for example, patients with CD4 counts between 200 cells/mm3 and 350 cells/mm3 are randomized to immediate treatment with HAART or HAART treatment deferred until the CD4 count crosses below 200 cells/mm3. In future work using nested structural models and these observational data, we will attempt to answer such questions. 
We expect that more marked differences between structural and standard methods will be found as an increasing number of epidemiologists become familiar with these novel and appealing quantitative methods.
ACKNOWLEDGMENTS |
---|
Dr. Miguel Hernán was supported by National Institutes of Health grant K08-AI-49392, and Dr. James Robins was supported by National Institutes of Health grant R01-AI-32475.
Data were collected by the Multicenter AIDS Cohort Study Investigators and the Women's Interagency HIV Study Collaborative Study Group. Study centers/groups (and Principal Investigators) are as follows: Multicenter AIDS Cohort Study: Johns Hopkins Bloomberg School of Public Health (Drs. Joseph B. Margolick and Alvaro Muñoz), Baltimore, Maryland; Howard Brown Health Center and Northwestern University Medical School (Dr. John Phair), Chicago, Illinois; University of California, Los Angeles (Drs. Roger Detels and Beth Jamieson), Los Angeles, California; and University of Pittsburgh (Dr. Charles Rinaldo), Pittsburgh, Pennsylvania; Women's Interagency HIV Study: New York City/Bronx Consortium (Dr. Kathryn Anastos); Brooklyn, New York (Dr. Howard Minkoff); Washington, DC, Metropolitan Consortium (Dr. Mary Young); Connie Wofsy Study Consortium of Northern California (Drs. Ruth Greenblatt and Phyllis Tien); Los Angeles County/Southern California Consortium (Dr. Alexandra Levine); Chicago Consortium (Dr. Mardge Cohen); and Data Coordinating Center (Dr. Alvaro Muñoz).
NOTES |
---|
REFERENCES |
---|