Estimating the Effect of Cardiovascular Risk Factors on All-Cause Mortality and Incidence of Coronary Heart Disease Using G-Estimation

The Atherosclerosis Risk in Communities Study

Kate Tilling1, Jonathan A. C. Sterne2 and Moyses Szklo3

1 Division of Public Health Sciences, King's College London, Capital House, London SE1 3QD, England.
2 Department of Social Medicine, University of Bristol, Canynge Hall, Bristol BS8 2PR, England.
3 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD.


    ABSTRACT
 
Standard methods for analysis of cohort studies may give biased estimates of exposure effects in the presence of time-varying confounding. Such effects may instead be estimated by using G-estimation. This study aimed to examine the relations between important cardiovascular risk factors and all-cause mortality and risk of coronary heart disease (CHD), accounting for confounding between exposures over time using G-estimation. Results were compared with those from standard survival analyses (e.g., Weibull regression) with time-updated covariates. The dataset consisted of all participants in the Atherosclerosis Risk in Communities cohort study who had complete data on the first two of four visits, giving a sample of 13,898 people at baseline. Death and occurrence of CHD or stroke were recorded. G-estimated associations between several risk factors and mortality/CHD incidence differed from those estimated using standard survival analysis. The G-estimated associations between mortality/CHD incidence and smoking, presence of diabetes, and use of antihypertensives were stronger than the standard survival estimates, while the G-estimated effects of low density lipoprotein and high density lipoprotein cholesterol on CHD incidence were more linear than the standard estimates. Complex relations between exposures over time may lead to biased exposure effect estimates in standard survival analyses. G-Estimation can be used to overcome such biases, and thus may have important implications for the analysis of observational studies.

cardiovascular diseases; epidemiologic methods; heart diseases; mortality

Abbreviations: ARIC, Atherosclerosis Risk in Communities; BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; DBP, diastolic blood pressure; HDL, high density lipoprotein; LDL, low density lipoprotein; SBP, systolic blood pressure


    INTRODUCTION
 
Cardiovascular diseases are leading causes of morbidity and mortality in the United States (1). Major, potentially modifiable, risk factors for cardiovascular disease are well established and include smoking and high blood pressure (2). Many of these risk factors are also associated with increased all-cause mortality (2).

For any exposure, other risk factors may be both confounders and on the causal pathway. Standard statistical models for the analysis of cohort studies may be biased in this case (3). For example, a study examining quitting smoking and survival could not control adequately for systolic blood pressure (SBP) because the association between quitting and survival includes both any "direct" association and any association because quitting modifies SBP, which modifies survival. An unadjusted analysis would underestimate the benefits of quitting if people with high SBP were more likely to quit (e.g., because of health concerns), and controlling for SBP would estimate only the direct association.

G-Estimation has been proposed to estimate exposure effects allowing for time-varying confounders. A covariate is a time-varying confounder (as defined by Mark and Robins (3)) for the effect of exposure on outcome if 1) past covariate values predict current exposure, 2) past exposure predicts current covariate value, and 3) current covariate value predicts outcome. G-Estimation has been used to evaluate the association between quitting smoking and time to death or first coronary heart disease (CHD) (3), isolated systolic hypertension and cardiovascular mortality (4), and therapy and survival for human immunodeficiency virus-positive men (5, 6). To our knowledge, there have been no published comparisons of G-estimated effects of risk factors with those estimated using standard methods.

We use G-estimation to examine relations between potentially modifiable risk factors and both all-cause mortality and time to first CHD.


    MATERIALS AND METHODS
 
The analysis included a subset of the Atherosclerosis Risk in Communities (ARIC) Study, which is described elsewhere (7). Briefly, 15,792 members of four US communities had a baseline examination in 1987–1989, followed by three examinations at 3-year intervals. Data collected at the baseline visit included past and current health problems, smoking history, education level, age, sex, and ethnic group. A self-reported history of physician-diagnosed stroke and heart attack defined baseline stroke and baseline CHD, respectively.

At each clinical examination, data collected included blood pressure, height and weight, high density lipoprotein (HDL) and low density lipoprotein (LDL) cholesterol, presence of diabetes (fasting blood glucose ≥126 mg/dl, nonfasting glucose ≥200 mg/dl, physician diagnosis of diabetes, or being on diabetes medication), and use of antihypertensive medications. Data on incident stroke and CHD were obtained by annual telephone contacts, systematic review of medical charts, investigation of out-of-hospital deaths, and the follow-up examinations. Stroke events were defined as definite or probable hospitalized ischemic stroke events, and incident CHD was defined as definite or probable hospitalized heart attack or coronary procedures according to ARIC Study criteria published previously (7). Date and cause of death were identified for participants who died before December 31, 1996.

The time-varying exposures considered here were smoking, diabetes, HDL and LDL cholesterol (mmol/liter), SBP and diastolic blood pressure (DBP) (mmHg), antihypertensive medication use, and body mass index (BMI) (kg/m²). Continuous exposures were categorized based on commonly used cutpoints. These were: HDL cholesterol, 0.91 and 1.55 mmol/liter (35 and 60 mg/dl); LDL cholesterol, 3.36 and 4.14 mmol/liter (130 and 160 mg/dl); SBP, 140 mmHg; DBP, 90 mmHg; and BMI, 20 and 30 kg/m².
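The categorization can be illustrated with a short sketch. It uses the cutpoints quoted above; the dictionary keys, the function names, and the rule that a value equal to a cutpoint falls in the higher category are assumptions made for illustration, since the text does not state how boundary values were handled. The conversion factor of 38.67 mg/dl per mmol/liter is the standard one for cholesterol.

```python
from bisect import bisect_right

# Cutpoints quoted in the text (mmol/liter, mmHg, kg/m^2).
CUTPOINTS = {
    "hdl": [0.91, 1.55],
    "ldl": [3.36, 4.14],
    "sbp": [140],
    "dbp": [90],
    "bmi": [20, 30],
}

def categorize(exposure, value):
    """Return the 0-based category index (0 = lowest) for a measurement.

    Assumption: a value equal to a cutpoint is placed in the higher category.
    """
    return bisect_right(CUTPOINTS[exposure], value)

def cholesterol_mgdl_to_mmol(mgdl):
    """Convert cholesterol from mg/dl to mmol/liter (38.67 mg/dl per mmol/liter)."""
    return mgdl / 38.67

print(categorize("hdl", 1.2), categorize("ldl", 4.5), categorize("sbp", 150))
print(round(cholesterol_mgdl_to_mmol(35), 2), round(cholesterol_mgdl_to_mmol(160), 2))
```

For the trichotomous exposures, index 1 is the middle category, which the analysis later uses as the reference.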

Relations between time-varying exposures were examined by using a logistic regression of each exposure on concurrent values of the other exposures, values of all exposures at the previous visit and at baseline (visit 1), and non-time-varying covariates. All data from visits 2–4 were used, so each person could contribute up to three observations to the model for an exposure j*:


where $e_{ij,v}$ is the indicator variable for exposure $j$ for individual $i$ at visit $v$, $e_{ij,v-1}$ equals exposure $j$ for individual $i$ at the previous visit, $e_{ij,1}$ equals exposure $j$ for individual $i$ at visit 1, and $x_{ik}$ is the non-time-varying covariate $k$ for individual $i$.
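The display for this model is not reproduced in the text. One form consistent with the verbal description and the definitions just given is sketched below; the coefficient symbols ($\alpha$, $\lambda$, $\kappa$, $\nu$) are assumptions introduced for illustration and are not necessarily the authors' notation.

```latex
\operatorname{logit}\Pr\left(e_{ij^*,v}=1\right)
  = \alpha_0
  + \sum_{j \neq j^*} \alpha_j \, e_{ij,v}
  + \sum_{j} \lambda_j \, e_{ij,v-1}
  + \sum_{j} \kappa_j \, e_{ij,1}
  + \sum_{k} \nu_k \, x_{ik},
  \qquad v = 2, 3, 4
```

The second sum includes $j = j^*$, reflecting the statement that values of all exposures at the previous visit were entered.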

Associations between time-varying exposures and outcome (death and incident CHD) were examined using Weibull survival analysis and G-estimation. Both Weibull and G-estimation analyses controlled for age, non-time-varying variables (sex, education, ethnicity, and prevalent CHD/stroke), and baseline (visit 1) values of all exposures. Lagged (previous) exposure predicted current exposure and therefore had to be included in the models, so all analyses are of survival from visit 2 onward. Data from the fourth visit were included only for subjects with follow-up after that visit. In models for all-cause mortality, baseline stroke and CHD were included as non-time-varying covariates, and occurrences of stroke or CHD between each visit were included as time-varying covariates. In models with CHD as an outcome, subjects with CHD before the second visit were excluded, and baseline stroke and occurrence of stroke between visits were included as non-time-varying and time-varying covariates, respectively.

Weibull survival analysis
The Weibull hazard function at time $t$ is $h(t) = \phi\gamma t^{\gamma-1}$, where $\phi$ is referred to as the scale parameter and $\gamma$ as the shape parameter. If the vector of covariates $x_i$ does not affect $\gamma$, the Weibull regression model can be written as either the usual epidemiologic proportional hazards model:

$$h(t \mid x_i) = \phi\gamma t^{\gamma-1}\exp(\theta' x_i)$$

or an accelerated failure time model, using the expected failure time:

$$\log(t_i) = \beta_0 + \beta' x_i + \varepsilon_i$$

where $\varepsilon_i$ has an extreme value distribution with scale parameter $1/\gamma$. The Weibull shape parameter $\gamma$ can be used to express results from the accelerated failure time parameterization as proportional hazards: $\theta = -\beta\gamma$.
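The link between the two parameterizations can be checked with a short derivation. The sketch assumes the hazard defined above, with covariates multiplying the scale by $\exp(\theta' x_i)$, and writes $E$ for a unit exponential random variable (a symbol introduced here for illustration).

```latex
\begin{aligned}
S(t \mid x_i) &= \exp\!\left(-\phi\, e^{\theta' x_i}\, t^{\gamma}\right)
  &&\Rightarrow\quad \phi\, e^{\theta' x_i}\, T_i^{\gamma} = E, \quad E \sim \text{Exp}(1) \\
\log T_i &= -\frac{\log\phi}{\gamma} \;-\; \frac{\theta'}{\gamma}\, x_i \;+\; \frac{1}{\gamma}\log E \\
\beta &= -\frac{\theta}{\gamma}
  &&\Leftrightarrow\quad \theta = -\gamma\,\beta
\end{aligned}
```

Here $\log E$ follows a standard (minimum) extreme value distribution, so the error term $(1/\gamma)\log E$ has scale $1/\gamma$, and an accelerated failure time coefficient is converted to a log hazard ratio by multiplying by the shape parameter and changing sign.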

One Weibull model was fitted to examine the relations between all exposures and all-cause mortality using all data from visits 2–4 (with each person contributing up to three observations). Using proportional hazards, this model can be written:


where $e_{ij,v}$ and $x_{ik}$ are as defined previously, and $t_v$ is the time from visit $v$ to either an event or the next visit. A similar model was fitted for the effect of exposures on time to first CHD. Separate models were fitted for the effects of SBP and DBP on survival and time to first CHD (see Censoring).
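The display for this model is likewise not reproduced. A form consistent with the surrounding description is sketched below. The coefficient symbols are assumptions, and the inclusion of the lagged term is inferred from the earlier statement that lagged exposure had to be included in the models rather than taken from the original display.

```latex
h_i(t_v) = \phi\,\gamma\, t_v^{\gamma-1}
  \exp\!\Big(
      \sum_{j} \theta_j \, e_{ij,v}
    + \sum_{j} \lambda_j \, e_{ij,v-1}
    + \sum_{j} \kappa_j \, e_{ij,1}
    + \sum_{k} \nu_k \, x_{ik}
  \Big),
  \qquad v = 2, 3, 4
```

Under this reading, $\exp(\theta_j)$ is the hazard ratio for current exposure $j$ reported in the Weibull columns of the results tables.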

G-Estimation of relations between outcome and time-varying exposures
The method of G-estimation has been described in detail elsewhere (3, 4), so only brief details will be given here. For each subject, $U_i$ is defined as the time to failure if the subject was unexposed throughout follow-up. This time is called the "counterfactual" failure time (3) because it is unobservable for subjects who were exposed at any time.

The crucial assumption made in G-estimation is that of no unmeasured confounders, that is, that all variables influencing both exposure and survival have been included in the model. This implies that, conditional on measured history (past and present confounders and past exposure), present exposure is independent of $U_i$. An example of this assumption is that, conditional on past weight, smoking status, blood pressure, and cholesterol, a person's decision to quit smoking is independent of what his or her survival time would have been if he or she had never smoked. Exposure does not have to be independent of subjects' current life expectancy (smokers may choose to quit precisely because they recognize that smoking has already reduced their life expectancy). The assumption of no unmeasured confounders is made implicitly with any standard survival analysis of observational data, but is made explicit when fitting a nested structural model by G-estimation.

G-Estimation proceeds by assuming that exposure $j^*$ accelerates failure time by $\exp(-\Psi)$. If $\Psi$ were known, the counterfactual survival time $U_{ij^*,\Psi}$ could be derived from the observed data for subjects who experience an event by:

$$U_{ij^*,\Psi} = \sum_{v \geq 2} t_v \exp\left(\Psi\, e_{ij^*,v}\right)$$

where $t_v$ is the time from visit $v$ to either the event or the next visit.
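As a minimal sketch of this step, the function below rescales each interval of follow-up by exp(Ψ) when the subject was exposed and leaves it unchanged otherwise, then sums the intervals. The function name and the list-of-pairs input format are assumptions made for illustration.

```python
import math

def counterfactual_time(intervals, psi):
    """Candidate unexposed failure time for one subject.

    intervals: list of (t_v, e_v) pairs, where t_v is the time from visit v
    to the event or the next visit and e_v is 1 if exposed at visit v, else 0.
    """
    return sum(t * math.exp(psi * e) for t, e in intervals)

psi = -math.log(0.62)  # corresponds to a survival ratio of 0.62
history = [(3.0, 1), (3.0, 0), (1.5, 1)]  # exposed, unexposed, exposed
print(counterfactual_time(history, psi))
```

For a subject exposed throughout, the observed time divided by the counterfactual time equals exp(-Ψ), which matches the interpretation of the survival ratio given in the text.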

G-Estimation uses the assumption of no unmeasured confounders to estimate the effect of exposure on survival by examining a range of values for $\Psi$ and choosing the value $\Psi_0$ for which current exposure is independent of $U_i$. This was done by fitting a series of logistic regression models:


for different values of $\Psi$, where $e_{ij,v}$ and $x_{ik}$ are as defined previously. All data from visits 2–4 were used, so each person contributed up to three observations. The G-estimate $\Psi_0$ is the value of $\Psi$ for which the Wald statistic of $\mu$ (the coefficient of $U_{ij^*,\Psi}$) in this logistic regression was zero ($p = 1$, i.e., no association between current exposure and $U_{ij^*,\Psi_0}$). The ratio of the survival time of a continuously exposed person to that if they had not been exposed is estimated by $\exp(-\Psi_0)$. We have referred to this multiplicative effect as the "G-estimated survival ratio." The upper and lower limits of the 95 percent confidence interval for $\Psi_0$ are values of $\Psi$ for which the two-sided p value for the Wald statistic of $\mu$ in this logistic regression was 0.05.
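The search over Ψ can be sketched on simulated data. This is an illustration under strong simplifying assumptions, not the authors' stgest program: a single binary exposure fixed over follow-up, one measured binary confounder, no censoring, and a bare-bones Newton-Raphson logistic regression written out so that the example needs only the standard library. All names and simulation settings are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wald_z_last(rows, y, iters=8):
    """Fit a logistic regression by Newton-Raphson and return the Wald z
    statistic of the last coefficient (here, mu on the candidate U)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for a in range(k):
                grad[a] += (yi - p) * x[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * x[a] * x[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Variance of the last coefficient from the (converged) information matrix.
    var_last = solve(info, [0.0] * (k - 1) + [1.0])[-1]
    return beta[-1] / math.sqrt(var_last)

random.seed(1)
TRUE_PSI = 0.5  # exposure multiplies survival time by exp(-0.5)
data = []
for _ in range(2000):
    conf = random.random() < 0.5                   # measured confounder
    u = random.expovariate(2.0 if conf else 1.0)   # unexposed failure time
    e = 1 if random.random() < (0.6 if conf else 0.3) else 0
    data.append((float(conf), e, u * math.exp(-TRUE_PSI * e)))

def z_for(psi):
    """Wald z for the candidate counterfactual time at this value of psi."""
    rows = [[1.0, c, t * math.exp(psi * e)] for c, e, t in data]
    return wald_z_last(rows, [e for _, e, _ in data])

grid = [i / 20 for i in range(21)]                 # psi from 0 to 1
z = [z_for(psi) for psi in grid]
psi_hat = min(zip(grid, z), key=lambda g: abs(g[1]))[0]
ci = [psi for psi, zi in zip(grid, z) if abs(zi) < 1.96]
print(psi_hat, math.exp(-psi_hat), (min(ci), max(ci)))
```

The grid value with the Wald statistic closest to zero plays the role of $\Psi_0$, and the grid values not rejected at the 5 percent level approximate the confidence limits. A finer grid or a root-finding step would sharpen both.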

A separate G-estimation model was fitted for each exposure. G-Estimation controls for non-time-varying covariates, but can only be used to estimate the effect on survival of time-varying covariates. As described above, the Weibull shape parameter was used to express the G-estimated survival ratio as a hazard ratio for the exposure (4).
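The conversion can be sketched as follows, assuming the hazard ratio is obtained as exp(Ψ × shape) with Ψ = -log(survival ratio); the function name is hypothetical. The inputs are the rounded values reported later for high SBP (survival ratio 0.62, shape parameter 1.20).

```python
import math

def hazard_ratio_from_survival_ratio(survival_ratio, shape):
    """Convert a G-estimated survival ratio to a hazard ratio using the
    Weibull shape parameter: HR = exp(psi * shape), psi = -log(ratio)."""
    psi = -math.log(survival_ratio)
    return math.exp(psi * shape)

print(round(hazard_ratio_from_survival_ratio(0.62, 1.20), 2))  # 1.77
```

The result is close to the reported G-estimated hazard ratio of 1.79; the small gap is presumably due to rounding of the published survival ratio and shape parameter.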

In previous applications of G-estimation, only binary exposures have been considered. We used G-estimation to analyze the effects of trichotomous exposures as follows. The middle category was chosen as the reference. One of the other two categories was selected, and the effect of the dichotomous exposure defined by that category and the middle category was estimated using G-estimation. This estimate was then included as a fixed value in G-estimation of the effect of the dichotomous exposure defined by the third category and the middle category. This procedure was iterated to convergence (defined here as a difference between successive estimates of less than 0.001).
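The alternating scheme can be sketched generically. The two callables stand in for a G-estimation routine that estimates one contrast with the other held fixed; they, the function name, and the toy linear relationships in the usage example are hypothetical and serve only to show the convergence rule.

```python
def iterate_trichotomous(estimate_low, estimate_high, tol=0.001, max_iter=100):
    """Alternate two conditional estimators until successive estimates of
    both contrasts differ by less than tol.

    estimate_low(psi_high): low-vs-middle estimate with psi_high held fixed.
    estimate_high(psi_low): high-vs-middle estimate with psi_low held fixed.
    """
    psi_high = 0.0
    psi_low = estimate_low(psi_high)   # first pass: other contrast at zero
    for _ in range(max_iter):
        new_high = estimate_high(psi_low)
        new_low = estimate_low(new_high)
        done = abs(new_low - psi_low) < tol and abs(new_high - psi_high) < tol
        psi_low, psi_high = new_low, new_high
        if done:
            return psi_low, psi_high
    raise RuntimeError("no convergence within max_iter iterations")

# Toy stand-ins: each estimate depends weakly on the other contrast.
low, high = iterate_trichotomous(lambda h: 0.4 - 0.1 * h,
                                 lambda l: -0.3 + 0.2 * l)
print(round(low, 3), round(high, 3))
```

When each estimate depends only weakly on the value fixed for the other contrast, as in the toy example, a few passes are enough.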

Censoring
Two types of censoring occurred in this study. First, persons were censored by the planned end of study, so some persons experienced no events by end of follow-up. As described by Witteman et al. (4), the G-estimation procedure was modified to allow for this censoring by replacing $U_{ij^*,\Psi}$ in the logistic regression model by an indicator variable for whether the event would have been observed both if the persons had been exposed throughout follow-up and if they had been unexposed throughout follow-up (4).

Second, censoring by competing risks occurred when subjects left the study early or, in models for CHD, died from other causes. In the models for SBP and DBP, persons were censored when they first reported use of antihypertensive medication. Following the approach outlined by Witteman et al. (4), we used logistic regression (with all data from visits 2–4) to model the probability of being censored at each time point and, hence, estimate the probability of being uncensored to the end of the study for each person. The inverse of this probability was used to weight the contributions of persons to both G-estimation and Weibull models. For example, suppose a smoker had a chance of 0.25 of being uncensored at the end of the study. The contribution of such a person to the models would be multiplied by four, representing the "total" of four smokers, three of whom were censored before the end of the study. This approach means that observations within the same person are no longer independent, so we used robust standard errors allowing for clustering within persons.
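The weighting can be sketched in a few lines. The sketch assumes that the probability of remaining uncensored to the end of the study is the product, over time points, of one minus the predicted censoring probability at each point; the text does not spell out this step, and the function name is hypothetical.

```python
def censoring_weight(censor_probs):
    """Inverse of the probability of remaining uncensored at every time point.

    censor_probs: predicted probabilities of being censored at each time
    point (e.g., from a logistic regression), in chronological order.
    """
    p_uncensored = 1.0
    for p in censor_probs:
        p_uncensored *= 1.0 - p
    return 1.0 / p_uncensored

print(censoring_weight([0.5, 0.5]))  # 4.0
```

With censoring probabilities of 0.5 at two time points, the chance of remaining uncensored is 0.25 and the weight is 4, as in the smoker example above.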

All analyses were conducted by using Stata (8). A Stata program, stgest, which performs G-estimation, is available from the authors.


    RESULTS
 
All ARIC participants who had data from the baseline and at least one further examination were included. This gave a sample of 13,898 persons, of whom 7,699 (55.4 percent) were female. The average age at baseline was 54 years (standard deviation, 5.7), and 10,501 (75.6 percent) participants were classified as White. CHD was present at baseline in 625 (4.5 percent), and stroke was present in 207 (1.5 percent) people. Of the 13,898 subjects, 9,754 (70.2 percent) did not receive antihypertensive medication at baseline or visit 2. These form the sample for the examination of blood pressure as an exposure.

The distributions of exposures across the four visits for all 13,898 subjects are shown in table 1. The distributions changed with each visit: For example, the proportion of subjects with high DBP decreased from visit 1 to visit 4, whereas the proportion with high SBP increased. The patterns of blood pressure over time were similar, although the actual proportions were lower, when only those not taking antihypertensive medications (n = 9,754) were included.


TABLE 1. Distribution of selected cardiovascular risk factors across baseline and follow-up visits for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993)

 
Table 1 also shows the proportion of those exposed at baseline who had continuous exposure throughout their time in the study. The most constant exposures were diabetes and current smoking, both of which remained constant for more than 90 percent of the subjects. The 12 percent with nonconstant diabetes exposure probably reflects random variability in the measurement of blood glucose. The least constant exposures were LDL and HDL cholesterol. This could be partly because of measurement error, intraperson variation (e.g., because of diet, sleep, or time of day), or use of cholesterol-lowering medication.

For identification of time-varying confounders, the relations between each exposure and past and current values of all covariates were examined by using a logistic regression model for each exposure to which each person could contribute up to three observations. Table 2 lists the time-varying factors with strong evidence of a relation (p < 0.05) with each exposure. Each exposure was related to exposure at the previous visit. Previous and current exposures often had differing relations: For example, high SBP is related to low BMI at the previous visit, but to high BMI at the current visit.


TABLE 2. Relations between selected cardiovascular risk factors over time, for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993)

 
There were 717 deaths (5 percent) before the censoring date. Of these, 439 occurred between the second and third visits, 273 between the third and fourth, and five after the fourth visit. Among those with untreated blood pressure (9,754 subjects), there were 371 deaths. Table 3 shows the hazard ratios for baseline (visit 1) values of all time-varying exposures from a multivariable Weibull model controlling for non-time-varying covariates (sex, education level, race, prevalent stroke, and CHD). Among the time-varying exposures, all baseline measurements were strongly associated with survival except for baseline LDL cholesterol and use of antihypertensives. All of the non-time-varying covariates and age were associated with survival.


TABLE 3. Relation between variables measured at baseline and survival for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993), from a multivariable Weibull model including baseline (visit 1) values of all time-varying variables and all non-time-varying variables

 
Table 4 shows the association between time-varying exposures and survival, controlling for baseline (visit 1) values of all time-varying exposures and non-time-varying covariates using a multivariable Weibull model, as well as results from G-estimation. Using the Weibull model, high SBP and DBP, presence of diabetes, current smoking, low BMI, low HDL cholesterol, and high LDL cholesterol were associated with decreased survival. High BMI and use of antihypertensives were associated with increased survival. The shape parameter for this Weibull model was 1.26 (95 percent confidence interval (CI): 1.17, 1.36). The shape parameter for the separate model used for SBP and DBP was 1.20 (95 percent CI: 1.09, 1.33).


TABLE 4. The Weibull survival analysis and G-estimated relations between time-varying cardiovascular risk factors and survival for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993)

 
The G-estimated survival ratio for the relation between each exposure and survival time is shown in table 4. For high SBP, this was 0.62 (95 percent CI: 0.51, 0.77). Thus, survival time for someone who had high SBP continuously from visit 2 onward would be 0.62 of their survival time had their SBP never been high. Equivalently, having high blood pressure for 1 year lowers life expectancy by approximately 4.5 months.
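The figure of approximately 4.5 months follows directly from the survival ratio, as this short worked calculation shows (it uses only the rounded value 0.62 quoted above):

```latex
12 \text{ months} \times (1 - 0.62) = 12 \times 0.38 = 4.56 \text{ months} \approx 4.5 \text{ months}
```

In other words, each year that would have been lived with normal SBP corresponds to about 0.62 years lived with high SBP.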

High DBP and low BMI were associated with an approximately 40 percent reduction in survival time, diabetes and low HDL cholesterol with an approximately 30 percent reduction, and current smoking and high LDL cholesterol with an approximately 15 percent reduction. High BMI and use of antihypertensives were associated with increases in survival time of approximately 30 and 20 percent, respectively.

Table 4 shows the G-estimates of the hazard ratio for each exposure, for comparison with estimates from the Weibull model. The G-estimated hazard ratio for high SBP (hazard ratio = 1.79) was similar to the Weibull estimate (hazard ratio = 1.72). The G-estimated and Weibull estimated hazard ratios for DBP were similar, and both had wide confidence intervals. This may be because of the low proportion of subjects with high DBP and its high variability over time.

The G-estimated association between time-varying diabetes and mortality was stronger than the Weibull estimate and was closer to the estimated effect of baseline diabetes (table 3). G-Estimation showed slightly stronger effects of smoking and antihypertensive medication use than did the Weibull model. The G-estimated and Weibull estimated hazard ratios were similar for BMI and HDL cholesterol. For LDL cholesterol, G-estimation showed a less U-shaped relation with mortality than did the Weibull model.

Of the 13,898 ARIC participants, 13,100 had no history of CHD before visit 2. A total of 525 new CHD events (4 percent) occurred before the censoring date. Of these, 298 occurred between the second and third visits, 214 between the third and fourth, and 13 after the fourth visit. Among those with untreated blood pressure and no baseline history of CHD (9,381 subjects), there were 276 CHD events.

Table 5 shows the baseline (visit 1) values of all time-varying exposures related to risk of CHD using a multivariable Weibull model controlling for non-time-varying covariates. Baseline high SBP and DBP, diabetes, smoking, low BMI, low HDL cholesterol, and high LDL cholesterol were all associated with increased risk of CHD. Increasing age, male sex, and lower educational level were also associated with increased risk of CHD (data not shown).


TABLE 5. Relation between variables measured at baseline and time to first coronary heart disease for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993) from a multivariable Weibull model including baseline (visit 1) values of all time-varying variables and all non-time-varying variables

 
Table 6 shows the Weibull and G-estimated relations between time-varying exposures and incidence of CHD. The strong association between high SBP and incidence of CHD was slightly underestimated by the Weibull model (hazard ratio = 2.21) compared with the G-estimate (hazard ratio = 2.33). The relations between incidence of CHD and diabetes, current smoking, and use of antihypertensives also appeared to be underestimated by the Weibull model. For HDL cholesterol (inverse association with CHD) and LDL cholesterol (positive association with CHD), both models showed a linear relation that appeared stronger using G-estimation. There was little evidence of an effect of DBP or BMI. The shape parameter for this Weibull model was 1.03 (95 percent CI: 0.95, 1.13). The shape parameter for the separate model used for SBP and DBP was 1.07 (95 percent CI: 0.96, 1.20).


TABLE 6. Weibull survival analysis and G-estimated relations between time-varying cardiovascular risk factors and time to first coronary heart disease for Atherosclerosis Risk in Communities participants with data from at least the first two visits (1987–1989 and 1990–1993)

 

    DISCUSSION
 
In longitudinal studies, the effects of exposures on outcome may be estimated in different ways, with different interpretations. One approach is to relate baseline exposure to outcome. For reasonably constant exposures, this estimates the cumulative effects of exposure. Here, the association between baseline diabetes and mortality represents the association of average lifetime exposure to diabetes (for this cohort) with mortality.

Alternatively, we may estimate time-varying effects of exposure, usually by assuming that exposure remains constant between measurement occasions. Here, the time-varying association between smoking and mortality represents the relation between smoking at a given visit and mortality after that visit. If follow-up is fairly short (as it is here), this represents the instantaneous association between smoking and mortality.

Standard survival analysis of time-varying exposures can be biased because of interrelations between exposures over time. G-Estimation takes these interrelations into consideration, and this study has confirmed that G-estimated relations between cardiovascular risk factors and all-cause mortality and incident CHD may differ from standard survival analyses. Replication of these results in other datasets, with longer follow-up, is needed to assess the likely bias in using standard survival analyses to estimate time-varying effects. We used Weibull models to compare G-estimated results (in the accelerated failure time parameterization) with the more usual hazard ratios. Hazard ratios estimated by using corresponding Cox proportional hazards models were very similar to those from the Weibull model.

Both the Weibull model and G-estimation found a strong association between outcome and time-varying high SBP. Other studies have found a threshold effect (9) or a U- or J-shaped relation between blood pressure and adverse outcomes in the general population (10) and in treated hypertensives (11). Our analyses separate the effects of treated and untreated hypertension by censoring when antihypertensive medication was prescribed. This informative censoring (probability of censoring is related to blood pressure) is taken into account in both the G-estimation and the Weibull models. We also examined the effect of antihypertensive medication use: The G-estimated effect was stronger than that estimated by the Weibull model and showed that antihypertensive medication increased time to both death and first CHD by approximately 20 percent.

Self-reported diabetes at baseline approximately doubles overall mortality (2); the corresponding hazard ratio in our study was 2.04. The G-estimated hazard ratio for time-varying diabetes was 1.62. Compared with this, the Weibull estimate for time-varying diabetes (hazard ratio = 1.26) is a substantial underestimate.

The associations between outcome and baseline smoking were much stronger than the time-varying associations, which were underestimated by the Weibull model. The G-estimated relations between outcome and time-varying smoking were small (hazard ratio = 1.24 for all-cause mortality and 1.36 for CHD); each additional year of smoking reduced life expectancy by just under 2 months. Quitting smoking has been shown to be associated with an approximately 50 percent increase in life expectancy (3). However, another study has reported similar hazard ratios for quitters and nonquitters (12). Risk may decline gradually after quitting: We had insufficient visits to examine this, so we may be underestimating the benefits of quitting. Although G-estimation allows for the influence of several disease indicators (e.g., blood pressure, CHD, and occurrence of stroke) on the decision to quit smoking, other factors (e.g., diagnosis of cancer) that we have not included could influence this relation.
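The figure of just under 2 months can be recovered from the reported hazard ratio. The sketch below assumes the survival ratio is obtained as HR^(-1/shape), using the rounded hazard ratio for mortality (1.24) and the shape parameter reported for the main mortality model (1.26); the function name is hypothetical.

```python
import math

def months_lost_per_year(hazard_ratio, shape):
    """Months of life expectancy lost per year of exposure, from a hazard
    ratio and the Weibull shape parameter."""
    survival_ratio = math.exp(-math.log(hazard_ratio) / shape)
    return 12.0 * (1.0 - survival_ratio)

print(round(months_lost_per_year(1.24, 1.26), 1))
```

The implied survival ratio is about 0.84, which agrees with the approximately 15 percent reduction in survival time reported for current smoking.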

G-Estimation and Weibull analysis showed a higher risk of death for those with low BMI and no evidence of increased mortality among subjects with high BMI. A study of nearly 8,000 European men also found an increased risk of death with low BMI and some evidence of an increased risk for those with high BMI (mainly in never-smokers) (13). In contrast, two previous studies found a U-shaped effect of baseline BMI in men (14, 15). The validity of G-estimation depends on there being no unmeasured confounders. Confounders not included here, such as comorbid conditions, may influence the relation between BMI and mortality. Alternatively, BMI may have a cumulative effect, and so short-term changes in weight (assessed by these time-varying models) have a different relation to mortality than long-term weight. Change in weight (whether loss or gain) has been associated with increased mortality (16).

G-Estimation showed approximately J-shaped and inverse J-shaped relations between all-cause mortality and LDL and HDL cholesterol, respectively. A U-shaped relation between serum total cholesterol and all-cause mortality has been shown in diabetics (17) and in middle-aged men (18) (although mainly in men with at least one cardiovascular risk factor). The Framingham Study found an excess risk with low serum total cholesterol, and only slight increases of risk with high cholesterol (19). Time-varying HDL and LDL cholesterol had linear relations with CHD, with the G-estimated relations stronger than the Weibull estimated relations.

The 95 percent confidence intervals for G-estimated effects were generally wider than were those for corresponding Weibull estimates. When allowing for censoring, G-estimation discards information by dichotomizing the outcome variable. The standard errors for the effects of trichotomous variables (HDL cholesterol, LDL cholesterol, and BMI) may be underestimated because we assume that the effect of the other category on survival is known (rather than estimated). Ideally, both parameters should be estimated simultaneously, and a 95 percent confidence region for their joint distribution should be calculated. The iterative G-estimation procedure we used converged quickly for all three trichotomous exposures, and estimates were similar to those obtained if each category was examined in a single G-estimation procedure.

Marginal structural models are alternatives to G-estimation for analyzing longitudinal data (20, 21). In these models, each observation is weighted by the inverse of the probability of the exposure actually received, given past history, and a model is then fitted and coefficients interpreted as in standard analysis (20). G-Estimation has the advantage that only the relation between exposure and covariate history has to be modeled (20).

These results highlight the importance of choosing statistical models based on the known epidemiology of the outcome. For example, the relation between smoking and lung cancer should be examined using baseline smoking because the effect is likely to be cumulative over a substantial period. Alternatively, if the outcome were CHD, effects of both baseline smoking (e.g., cumulative atheroma) and time-varying smoking (e.g., instantaneous hemostatic factors) might be examined. In our G-estimation models, we assumed that exposure had an immediate effect on outcome. Alternatives include examining a lagged effect of exposure or allowing the effect of exposure to decrease over time (6).

These results have implications for the analysis of observational cohort studies. Standard survival analyses may differ substantially from G-estimation, which accounts for time-varying confounding. This could explain previously observed discrepancies between results from observational studies and those from randomized trials, in which the effect of intervention will usually be an instantaneous rather than a cumulative (baseline) exposure effect.


    ACKNOWLEDGMENTS
 
Dr. Kate Tilling was funded by the Medical Research Council (United Kingdom).

The authors thank the staff and participants in the Atherosclerosis Risk in Communities Study for their important contributions. They would also like to thank Professor Jamie Robins of Harvard University, Professor George Davey Smith of the University of Bristol, and Professor Lloyd Chambless of the University of North Carolina for their helpful advice.


    NOTES
 
Correspondence to Dr. Kate Tilling, Division of Public Health Sciences, Capital House, 42 Weston Street, London SE1 3QD, England (e-mail: kate.tilling@kcl.ac.uk).


    REFERENCES

  1. American Heart Association. 2000 heart and stroke statistical update. Dallas, TX: American Heart Association, 1999.
  2. Yusuf HR, Giles WH, Croft JB, et al. Impact of multiple risk factor profiles on determining cardiovascular disease risk. Prev Med 1998;27:1–9.
  3. Mark SD, Robins JM. Estimating the causal effect of smoking cessation in the presence of confounding factors using a rank preserving structural failure time model. Stat Med 1993;12:1605–28.
  4. Witteman JC, D'Agostino RB, Stijnen T, et al. G-Estimation of causal effects: isolated systolic hypertension and cardiovascular death in the Framingham Heart Study. Am J Epidemiol 1998;148:390–401.
  5. Robins JM, Blevins D, Ritter G, et al. G-Estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology 1992;3:319–36.
  6. Joffe MK, Hoover DR, Jacobson LP, et al. Estimating the effect of zidovudine on Kaposi's sarcoma from observational data using a rank preserving structural failure-time model. Stat Med 1998;17:1073–1102.
  7. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol 1989;129:687–702.
  8. StataCorp. Stata statistical software: release 5.0. College Station, TX: Stata Corporation, 1997.
  9. Port S, Demer L, Jennrich R, et al. Systolic blood pressure and mortality. Lancet 2000;355:175–80.
  10. Okumiya K, Matsubayashi K, Wada T, et al. A U-shaped association between home systolic blood pressure and four-year mortality in community-dwelling older men. J Am Geriatr Soc 1999;47:1415–21.
  11. Voko Z, Bots ML, Hofman A, et al. J-shaped relation between blood pressure and stroke in treated hypertensives. Hypertension 1999;34:1181–5.
  12. Johansson S, Sundquist J. Change in lifestyle factors and their influence on health status and all-cause mortality. Int J Epidemiol 1999;28:1073–80.
  13. Visscher TL, Seidell JC, Menotti A, et al. Underweight and overweight in relation to mortality among men aged 40–59 and 50–69 years: the Seven Countries Study. Am J Epidemiol 2000;151:660–6.
  14. Seidell JC, Verschuren WM, van Leer EM, et al. Overweight, underweight, and mortality: a prospective study of 48,287 men and women. Arch Intern Med 1996;156:958–63.
  15. Shaper AG, Wannamethee SG, Walker M. Body weight: implications for the prevention of coronary heart disease, stroke, and diabetes mellitus in a cohort study of middle aged men. BMJ 1997;314:1311–17.
  16. Dyer AR, Stamler J, Greenland P. Associations of weight change and weight variability with cardiovascular and all-cause mortality in the Chicago Western Electric Company Study. Am J Epidemiol 2000;152:324–33.
  17. Larson MG. Assessment of cardiovascular risk factors in the elderly: the Framingham Heart Study. Stat Med 1995;14:1745–56.
  18. Iribarren C, Reed DM, Burchfiel CM, et al. Serum total cholesterol and mortality. Confounding factors and risk modification in Japanese-American men. JAMA 1995;273:1926–32.
  19. D'Agostino RB, Belanger AJ, Kannel WB, et al. Role of smoking in the U-shaped relation of cholesterol to mortality in men: the Framingham Study. Am J Epidemiol 1995;141:822–7.
  20. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561–70.
  21. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.
Received for publication April 2, 2001. Accepted for publication November 9, 2001.




