1 Division of Public Health Sciences, King's College London, Capital House, London SE1 3QD, England.
2 Department of Social Medicine, University of Bristol, Canynge Hall, Bristol BS8 2PR, England.
3 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD.
ABSTRACT |
---|
cardiovascular diseases; epidemiologic methods; heart diseases; mortality
Abbreviations: ARIC, Atherosclerosis Risk in Communities; BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; DBP, diastolic blood pressure; HDL, high density lipoprotein; LDL, low density lipoprotein; SBP, systolic blood pressure
INTRODUCTION |
---|
For any exposure, other risk factors may be both confounders and on the causal pathway. Standard statistical models for the analysis of cohort studies may be biased in this case (3). For example, a study examining quitting smoking and survival could not control adequately for systolic blood pressure (SBP) because the association between quitting and survival includes both any "direct" association and any association because quitting modifies SBP, which modifies survival. An unadjusted analysis would underestimate the benefits of quitting if people with high SBP were more likely to quit (e.g., because of health concerns), and controlling for SBP would estimate only the direct association.
G-Estimation has been proposed to estimate exposure effects allowing for time-varying confounders. A covariate is a time-varying confounder (as defined by Mark and Robins (3)) for the effect of exposure on outcome if 1) past covariate values predict current exposure, 2) past exposure predicts current covariate value, and 3) current covariate value predicts outcome. G-Estimation has been used to evaluate the association between quitting smoking and time to death or first coronary heart disease (CHD) (3), isolated systolic hypertension and cardiovascular mortality (4), and therapy and survival for human immunodeficiency virus-positive men (5, 6). To our knowledge, there have been no published comparisons of G-estimated effects of risk factors with those estimated using standard methods.
We use G-estimation to examine relations between potentially modifiable risk factors and both all-cause mortality and time to first CHD.
MATERIALS AND METHODS |
---|
At each clinical examination, data collected included blood pressure, height and weight, high density lipoprotein (HDL) and low density lipoprotein (LDL) cholesterol, presence of diabetes (fasting blood glucose ≥126 mg/dl, nonfasting glucose ≥200 mg/dl, physician diagnosis of diabetes, or being on diabetes medication), and use of antihypertensive medications. Data on incident stroke and CHD were obtained by annual telephone contacts, systematic review of medical charts, investigation of out-of-hospital deaths, and the follow-up examinations. Stroke events were defined as definite or probable hospitalized ischemic stroke events, and incident CHD was defined as definite or probable hospitalized heart attack or coronary procedures according to ARIC Study criteria published previously (7). Date and cause of death were identified for participants who died before December 31, 1996.
The time-varying exposures considered here were smoking, diabetes, HDL and LDL cholesterol (mmol/liter), SBP and diastolic blood pressure (DBP) (mmHg), antihypertensive medication use, and body mass index (BMI) (kg/m²). Continuous exposures were categorized based on commonly used cutpoints. These were: HDL cholesterol, 0.91 and 1.55 mmol/liter (35 and 60 mg/dl); LDL cholesterol, 3.36 and 4.14 mmol/liter (130 and 160 mg/dl); SBP, 140 mmHg; DBP, 90 mmHg; and BMI, 20 and 30 kg/m².
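As an illustration of the categorization step, the following minimal sketch assigns a category index from the cutpoints quoted above. The dictionary keys, the function name, and the handling of values exactly equal to a cutpoint (placed in the higher category) are assumptions made for this example; the text does not state how boundary values were treated.

```python
from bisect import bisect_right

# Cutpoints quoted in the text; the key names are hypothetical.
CUTPOINTS = {
    "hdl_mmol": (0.91, 1.55),
    "ldl_mmol": (3.36, 4.14),
    "sbp_mmhg": (140,),
    "dbp_mmhg": (90,),
    "bmi": (20, 30),
}


def categorize(name, value):
    """Return 0 for the lowest category, 1 for the next, and so on.

    Assumption: a value equal to a cutpoint falls in the higher category.
    """
    return bisect_right(CUTPOINTS[name], value)
```

With two cutpoints the result is a trichotomous exposure (0, 1, 2), with 1 as the middle category; with one cutpoint it is binary.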
Relations between time-varying exposures were examined by using a logistic regression of each exposure on concurrent values of the other exposures, values of all exposures at the previous visit and at baseline (visit 1), and non-time-varying covariates. All data from visits 2–4 were used, so each person could contribute up to three observations to the model for a given exposure j*.
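The data layout implied by this model (one record per person per visit from visit 2 onward, carrying current, previous-visit, and baseline values) can be sketched as follows. The function and field names are hypothetical, and skipping a visit whose predecessor is missing is an assumption for the example rather than a rule stated in the text.

```python
def person_visit_records(visits):
    """Build up to three records (visits 2, 3, 4) for one person.

    visits: dict mapping visit number (1 to 4) to a dict of exposure values.
    Assumption: a visit is used only if it, the previous visit, and the
    baseline visit were all observed.
    """
    records = []
    baseline = visits.get(1)
    for k in (2, 3, 4):
        if baseline is None or k not in visits or (k - 1) not in visits:
            continue
        records.append({
            "visit": k,
            "current": visits[k],      # concurrent exposure values
            "lagged": visits[k - 1],   # values at the previous visit
            "baseline": baseline,      # values at visit 1
        })
    return records
```

A person seen at all four visits contributes three records; a person seen only at visits 1 to 3 contributes two.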
Associations between time-varying exposures and outcome (death and incident CHD) were examined using Weibull survival analysis and G-estimation. Both Weibull and G-estimation analyses controlled for age, non-time-varying variables (sex, education, ethnicity, and prevalent CHD/stroke), and baseline (visit 1) values of all exposures. Lagged (previous) exposure predicted current exposure and therefore had to be included in the models, so all analyses are of survival from visit 2 onward. Data from the fourth visit were included only for subjects with follow-up after that visit. In models for all-cause mortality, baseline stroke and CHD were included as non-time-varying covariates, and occurrences of stroke or CHD between each visit were included as time-varying covariates. In models with CHD as an outcome, subjects with CHD before the second visit were excluded, and baseline stroke and occurrence of stroke between visits were included as non-time-varying and time-varying covariates, respectively.
Weibull survival analysis
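The Weibull hazard function at time $t$ is $h(t) = \lambda\gamma t^{\gamma-1}$, where $\lambda$ is referred to as the scale parameter, and $\gamma$ is referred to as the shape parameter. If the vector of covariates $x_i$ does not affect $\gamma$, the Weibull regression model can be written either in the usual epidemiologic proportional hazards form, $h(t \mid x_i) = \lambda\gamma t^{\gamma-1}\exp(\beta^{\top}x_i)$, or as an accelerated failure time model, in which covariates multiply survival time by $\exp(-\beta^{\top}x_i/\gamma)$, so that a survival time ratio can be re-expressed as a hazard ratio by using the shape parameter.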
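The equivalence of the two Weibull parameterizations can be checked numerically. The sketch below assumes the standard form $h(t \mid x) = \lambda\gamma t^{\gamma-1}\exp(\beta x)$ with a single covariate; the symbol names (`lam` for the scale, `gamma` for the shape, `beta` for the log hazard ratio) and the function names are choices made for this example.

```python
import math


def weibull_survival_ph(t, lam, gamma, beta, x):
    """S(t | x) when the hazard is lam * gamma * t**(gamma - 1) * exp(beta * x)."""
    return math.exp(-lam * t ** gamma * math.exp(beta * x))


def weibull_survival_aft(t, lam, gamma, beta, x):
    """The same model as accelerated failure time: the covariate multiplies
    survival time by exp(-beta * x / gamma), so time t for a subject with
    covariate x corresponds to baseline time t * exp(beta * x / gamma)."""
    baseline_time = t * math.exp(beta * x / gamma)
    return math.exp(-lam * baseline_time ** gamma)


def hazard_ratio_from_time_ratio(time_ratio, gamma):
    """Convert a survival time ratio to a hazard ratio using the shape parameter."""
    return time_ratio ** (-gamma)
```

Under these assumptions, a survival time ratio below 1 (shorter survival) maps to a hazard ratio above 1, and the size of the hazard ratio depends on the shape parameter.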
One Weibull model was fitted to examine the relations between all exposures and all-cause mortality using all data from visits 2–4 (with each person contributing up to three observations). This model was fitted in the proportional hazards parameterization, with the covariates described above.
G-Estimation of relations between outcome and time-varying exposures
The method of G-estimation has been described in detail elsewhere (3, 4), so only brief details will be given here. For each subject, Ui is defined as the time to failure if the subject was unexposed throughout follow-up. This time is called the "counterfactual" failure time (3) because it is unobservable for subjects who were exposed at any time.
The crucial assumption made in G-estimation is that of no unmeasured confounders, that is, that all variables influencing both exposure and survival have been included in the model. This implies that, conditional on measured history (past and present confounders and past exposure), present exposure is independent of Ui. An example of this assumption is that, conditional on past weight, smoking status, blood pressure, and cholesterol, a person's decision to quit smoking is independent of what his or her survival time would have been if he or she had never smoked. Exposure does not have to be independent of subjects' current life expectancy (smokers may choose to quit precisely because they recognize that smoking has already reduced their life expectancy). The assumption of no unmeasured confounders is made implicitly with any standard survival analysis of observational data, but is made explicit when fitting a nested structural model by G-estimation.
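G-Estimation proceeds by assuming that exposure $j^*$ accelerates failure time by $\exp(-\psi)$. If $\psi$ were known, the counterfactual survival time $U_{ij^*,\psi}$ could be derived from the observed data for subjects who experience an event by:

$$U_{ij^*,\psi} = \int_0^{T_i} \exp\{\psi\, e_{ij^*}(t)\}\,dt,$$

where $T_i$ is the observed failure time and $e_{ij^*}(t)$ is 1 if subject $i$ was exposed to $j^*$ at time $t$ and 0 otherwise.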
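G-Estimation uses the assumption of no unmeasured confounders to estimate the effect of exposure on survival by examining a range of values for $\psi$ and choosing the value $\psi_0$ for which current exposure is independent of $U_i$. This was done by fitting a series of logistic regression models for current exposure, one for each candidate value of $\psi$, each including $U_{ij^*,\psi}$ together with the measured covariate history; $\psi_0$ is the value for which the coefficient of $U_{ij^*,\psi}$ is zero.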
A separate G-estimation model was fitted for each exposure. G-Estimation controls for non-time-varying covariates, but can only be used to estimate the effect on survival of time-varying covariates. As described above, the Weibull shape parameter was used to express the G-estimated survival ratio as a hazard ratio for the exposure (4).
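The search over candidate values can be illustrated with a deliberately simplified sketch: a single binary exposure fixed over follow-up, one measured confounder, no censoring, and simulated data. None of this is the authors' stgest program. All function names, the simulation settings, the use of log counterfactual time as the regressor, and the grid are assumptions made for illustration, with the acceleration parameter called `psi`.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, n_iter=8):
    """Newton-Raphson logistic regression; returns the coefficient list."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(d):
                grad[j] += (yi - p) * xi[j]
                for k in range(d):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def simulate(n, psi_true, seed=0):
    """Toy data: confounder l affects both exposure e and unexposed time u."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        l = rng.gauss(0.0, 1.0)
        e = 1 if rng.random() < 1.0 / (1.0 + math.exp(-0.8 * l)) else 0
        u = rng.expovariate(1.0) * math.exp(-0.5 * l)  # counterfactual time
        t = u * math.exp(-psi_true * e)                # observed failure time
        data.append((l, e, t))
    return data


def g_estimate(data, grid):
    """Return the grid value of psi at which exposure is least associated with
    the implied counterfactual time, given the confounder."""
    y = [e for _, e, _ in data]
    best_psi, best_abs = None, float("inf")
    for psi in grid:
        # Fixed exposure: U_psi = t * exp(psi * e); its log enters the model.
        X = [[1.0, l, math.log(t) + psi * e] for l, e, t in data]
        alpha = fit_logistic(X, y)[2]
        if abs(alpha) < best_abs:
            best_psi, best_abs = psi, abs(alpha)
    return best_psi
```

Calling `g_estimate(simulate(2000, 0.4), [i * 0.05 for i in range(-4, 21)])` searches the range -0.2 to 1.0 in steps of 0.05. In a real analysis the exposure varies between visits, the counterfactual time is built from the exposure history, and confidence limits are taken from the values of `psi` at which the test of the coefficient reaches p = 0.05.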
In previous applications of G-estimation, only binary exposures have been considered. We used G-estimation to analyze the effects of trichotomous exposures as follows. The middle category was chosen as the reference. One of the other two categories was selected, and the effect of the dichotomous exposure defined by that category and the middle category was estimated using G-estimation. This estimate was then included as a fixed value in G-estimation of the effect of the dichotomous exposure defined by the third category and the middle category. This procedure was iterated to convergence (defined here as a difference between successive estimates of less than 0.001).
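The alternating procedure described above can be sketched generically. In this sketch, `estimate_low` and `estimate_high` are hypothetical callables standing in for a G-estimation run for one category with the other category's effect held fixed. The starting value of zero for the fixed effect and the cap on iterations are assumptions; the tolerance of 0.001 is the one stated in the text.

```python
def iterate_trichotomous(estimate_low, estimate_high, tol=0.001, max_iter=100):
    """Alternate between the two category-specific estimates until both change
    by less than tol between successive iterations.

    estimate_low(psi_high): estimate for the low category vs. the middle one,
        with the high-category effect fixed at psi_high (hypothetical callable).
    estimate_high(psi_low): the reverse (hypothetical callable).
    """
    psi_high = 0.0  # assumption: start with no effect for the other category
    psi_low = estimate_low(psi_high)
    for _ in range(max_iter):
        new_high = estimate_high(psi_low)
        new_low = estimate_low(new_high)
        converged = abs(new_high - psi_high) < tol and abs(new_low - psi_low) < tol
        psi_low, psi_high = new_low, new_high
        if converged:
            return psi_low, psi_high
    raise RuntimeError("estimates did not converge")
```

Because each estimate treats the other as known, standard errors from the final step do not reflect uncertainty in the fixed value.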
Censoring
Two types of censoring occurred in this study. First, persons were censored by the planned end of study, so some persons experienced no events by end of follow-up. As described by Witteman et al. (4), the G-estimation procedure was modified to allow for this censoring by replacing $U_{ij^*,\psi}$ in the logistic regression model by an indicator variable for whether the event would have been observed both if the persons had been exposed throughout follow-up and if they had been unexposed throughout follow-up (4).
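A minimal sketch of such an indicator is given below. It assumes that exposure multiplies failure time by exp(-psi), with `psi` a name chosen for this example, that `c` is the time from start of follow-up to the planned end of study, and that persons with no observed event are assigned 0. The function name and arguments are hypothetical.

```python
import math


def observable_under_both(u_psi, c, psi, event_observed=True):
    """Return 1 if the event would be seen before time c both when the person
    is never exposed (failure at u_psi) and when always exposed (failure at
    u_psi * exp(-psi)); otherwise return 0.

    Assumption: persons without an observed event are coded 0.
    """
    if not event_observed:
        return 0
    t_never_exposed = u_psi
    t_always_exposed = u_psi * math.exp(-psi)
    return int(max(t_never_exposed, t_always_exposed) < c)
```

Unlike the counterfactual time itself, this indicator can be computed for every person, which is why it can stand in for the counterfactual time in the logistic model; the cost is the loss of information from dichotomizing the outcome.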
Second, censoring by competing risks occurred when subjects left the study early or, in models for CHD, died from other causes. In the models for SBP and DBP, persons were censored when they first reported use of antihypertensive medication. Following the approach outlined by Witteman et al. (4), we used logistic regression (with all data from visits 2–4) to model the probability of being censored at each time point and, hence, estimate the probability of being uncensored to the end of the study for each person. The inverse of this probability was used to weight the contributions of persons to both G-estimation and Weibull models. For example, suppose a smoker had a chance of 0.25 of being uncensored at the end of the study. The contribution of such a person to the models would be multiplied by four, representing the "total" of four smokers, three of whom were censored before the end of the study. This approach means that observations within the same person are no longer independent, so we used robust standard errors allowing for clustering within persons.
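The weighting arithmetic in this example can be written out as a short sketch. It assumes that the fitted censoring model yields, for each person, a probability of being censored in each interval, and that the probability of remaining uncensored to the end of the study is the product of the interval-specific probabilities of not being censored. The function name and the input format are hypothetical.

```python
def censoring_weight(p_censored_by_interval):
    """Inverse of the estimated probability of remaining uncensored throughout.

    p_censored_by_interval: predicted probabilities of being censored in each
    successive interval (assumed to come from a fitted logistic model).
    """
    p_uncensored = 1.0
    for p in p_censored_by_interval:
        p_uncensored *= (1.0 - p)
    return 1.0 / p_uncensored
```

A person with a censoring probability of 0.5 in each of two intervals has a probability of 0.25 of remaining uncensored and therefore a weight of 4, matching the example in the text.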
All analyses were conducted by using Stata (8). A Stata program, stgest, which performs G-estimation, is available from the authors.
RESULTS |
---|
The distributions of exposures across the four visits for all 13,898 subjects are shown in table 1. The distributions changed with each visit: For example, the proportion of subjects with high DBP decreased from visit 1 to visit 4, whereas the proportion with high SBP increased. The patterns of blood pressure over time were similar, although the actual proportions were lower, when only those not taking antihypertensive medications (n = 9,754) were included.
For identification of time-varying confounders, the relations between each exposure and past and current values of all covariates were examined by using a logistic regression model for each exposure to which each person could contribute up to three observations. Table 2 lists the time-varying factors with strong evidence of a relation (p < 0.05) with each exposure. Each exposure was related to exposure at the previous visit. Previous and current exposures often had differing relations: For example, high SBP is related to low BMI at the previous visit, but to high BMI at the current visit.
High DBP and low BMI were associated with an approximately 40 percent reduction in survival time, diabetes and low HDL cholesterol with an approximately 30 percent reduction, and current smoking and high LDL cholesterol with an approximately 15 percent reduction. High BMI and use of antihypertensives were associated with increases in survival time of approximately 30 and 20 percent, respectively.
Table 4 shows the G-estimates of the hazard ratio for each exposure, for comparison with estimates from the Weibull model. The G-estimated hazard ratio for high SBP (hazard ratio = 1.79) was similar to the Weibull estimate (hazard ratio = 1.72). The G-estimated and Weibull estimated hazard ratios for DBP were similar, and both had wide confidence intervals. This may be because of the low proportion of subjects with high DBP and its high variability over time.
The G-estimated association between time-varying diabetes and mortality was stronger than the Weibull estimate and was closer to the estimated effect of baseline diabetes (table 3). G-Estimation showed slightly stronger effects of smoking and antihypertensive medication use than did the Weibull model. The G-estimated and Weibull estimated hazard ratios were similar for BMI and HDL cholesterol. For LDL cholesterol, G-estimation shows a less U-shaped relation with mortality than did the Weibull model.
Of the 13,898 ARIC participants, 13,100 had no history of CHD before visit 2. A total of 525 new CHD events (4 percent) occurred before the censoring date. Of these, 298 occurred between the second and third visits, 214 between the third and fourth, and 13 after the fourth visit. Among those with untreated blood pressure and no baseline history of CHD (9,381 subjects), there were 276 CHD events.
Table 5 shows the baseline (visit 1) values of all time-varying exposures related to risk of CHD using a multivariable Weibull model controlling for non-time-varying covariates. Baseline high SBP and DBP, diabetes, smoking, low BMI, low HDL cholesterol, and high LDL cholesterol were all associated with increased risk of CHD. Increasing age, male sex, and lower educational level were also associated with increased risk of CHD (data not shown).
DISCUSSION |
---|
Alternatively, we may estimate time-varying effects of exposure, usually by assuming that exposure remains constant between measurement occasions. Here, the time-varying association between smoking and mortality represents the relation between smoking at a given visit and mortality after that visit. If follow-up is fairly short (as it is here), this represents the instantaneous association between smoking and mortality.
Standard survival analysis of time-varying exposures can be biased because of interrelations between exposures over time. G-Estimation takes these interrelations into consideration, and this study has confirmed that G-estimated relations between cardiovascular risk factors and all-cause mortality and incident CHD may differ from standard survival analyses. Replication of these results in other datasets, with longer follow-up, is needed to assess the likely bias in using standard survival analyses to estimate time-varying effects. We used Weibull models to compare G-estimated results (in the accelerated failure time parameterization) with the more usual hazard ratios. Hazard ratios estimated by using corresponding Cox proportional hazards models were very similar to those from the Weibull model.
Both the Weibull model and G-estimation found a strong association between outcome and time-varying high SBP. Other studies have found a threshold effect (9) or a U- or J-shaped relation between blood pressure and adverse outcomes in the general population (10) and in treated hypertensives (11). Our analyses separate the effects of treated and untreated hypertension by censoring when antihypertensive medication was prescribed. This informative censoring (probability of censoring is related to blood pressure) is taken into account in both the G-estimation and the Weibull models. We also examined the effect of antihypertensive medication use: The G-estimated effect was stronger than that estimated by the Weibull model and showed that antihypertensive medication increased time to both death and first CHD by approximately 20 percent.
Self-reported diabetes at baseline has been reported to approximately double overall mortality (2); in our study, the corresponding hazard ratio was 2.04. The G-estimated hazard ratio for time-varying diabetes was 1.62. Compared with this, the Weibull estimate for time-varying diabetes (hazard ratio = 1.26) is a substantial underestimate.
The associations between outcome and baseline smoking were much stronger than the time-varying associations, which were underestimated by the Weibull model. The G-estimated relations between outcome and time-varying smoking were small (hazard ratio = 1.24 for all-cause mortality and 1.36 for CHD); each additional year of smoking reduced life expectancy by just under 2 months. Quitting smoking has been shown to be associated with an approximately 50 percent increase in life expectancy (3). However, another study has reported similar hazard ratios for quitters and nonquitters (12). Risk may decline gradually after quitting: We had insufficient visits to examine this, so we may be underestimating the benefits of quitting. Although G-estimation allows for the influence of several disease indicators (e.g., blood pressure, CHD, and occurrence of stroke) on the decision to quit smoking, other factors (e.g., diagnosis of cancer) that we have not included could influence this relation.
G-Estimation and Weibull analysis showed a higher risk of death for those with low BMI and no evidence of increased mortality among subjects with high BMI. A study of nearly 8,000 European men also found an increased risk of death with low BMI and some evidence of an increased risk for those with high BMI (mainly in never-smokers) (13). In contrast, two previous studies found a U-shaped effect of baseline BMI in men (14, 15). The validity of G-estimation depends on there being no unmeasured confounders. Confounders not included here, such as comorbid conditions, may influence the relation between BMI and mortality. Alternatively, BMI may have a cumulative effect, and so short-term changes in weight (assessed by these time-varying models) have a different relation to mortality than long-term weight. Change in weight (whether loss or gain) has been associated with increased mortality (16).
G-Estimation showed approximately J-shaped and inverse J-shaped relations between all-cause mortality and LDL and HDL cholesterol, respectively. A U-shaped relation between serum total cholesterol and all-cause mortality has been shown in diabetics (17) and in middle-aged men (18) (although mainly in men with at least one cardiovascular risk factor). The Framingham Study found an excess risk with low serum total cholesterol, and only slight increases of risk with high cholesterol (19). Time-varying HDL and LDL cholesterol had linear relations with CHD, with the G-estimated relations stronger than the Weibull estimated relations.
The 95 percent confidence intervals for G-estimated effects were generally wider than were those for corresponding Weibull estimates. When allowing for censoring, G-estimation discards information by dichotomizing the outcome variable. The standard errors for the effects of trichotomous variables (HDL cholesterol, LDL cholesterol, and BMI) may be underestimated because we assume that the effect of the other category on survival is known (rather than estimated). Ideally, both parameters should be estimated simultaneously, and a 95 percent confidence region for their joint distribution should be calculated. The iterative G-estimation procedure we used converged quickly for all three trichotomous exposures, and estimates were similar to those obtained if each category was examined in a single G-estimation procedure.
Marginal structural models are alternatives to G-estimation for analyzing longitudinal data (20, 21). In these models, each observation is weighted by the probability of exposure based on past history, and a model is then fitted and coefficients interpreted as in standard analysis (20). G-Estimation has the advantage that only the relation between exposure and covariate history has to be modeled (20).
These results highlight the importance of choosing statistical models based on the known epidemiology of the outcome. For example, the relation between smoking and lung cancer should be examined using baseline smoking because the effect is likely to be cumulative over a substantial period. Alternatively, if the outcome were CHD, effects of both baseline smoking (e.g., cumulative atheroma) and time-varying smoking (e.g., instantaneous hemostatic factors) might be examined. In our G-estimation models, we assumed that exposure had an immediate effect on outcome. Alternatives include examining a lagged effect of exposure or allowing the effect of exposure to decrease over time (6).
These results have implications for the analysis of observational cohort studies. Standard survival analyses may differ substantially from G-estimation, which accounts for time-varying confounding. This could explain previously observed discrepancies between results from observational studies and those from randomized trials, in which the effect of intervention will usually be an instantaneous rather than a cumulative (baseline) exposure effect.
ACKNOWLEDGMENTS |
---|
The authors thank the staff and participants in the Atherosclerosis Risk in Communities Study for their important contributions. They would also like to thank Professor Jamie Robins of Harvard University, Professor George Davey Smith of the University of Bristol, and Professor Lloyd Chambless of the University of North Carolina for their helpful advice.
REFERENCES |
---|