Simulation Study of the Effect of the Early Mortality Exclusion on Confounding of the Exposure-Mortality Relation by Preexisting Disease
Pramil N. Singh1 and
Xiaoying Wang1,2
1 Department of Epidemiology and Biostatistics, School of Public Health, Loma Linda University, Loma Linda, CA.
2 Current affiliation: Ischemia Research and Education Foundation, San Francisco, CA.
ABSTRACT
The authors conducted a simulation study to evaluate whether exclusion of the early mortality (deaths occurring during a prespecified period immediately after baseline) reduces confounding of the exposure-mortality relation by preexisting disease. The simulation specified an exposure that decreased mortality risk in the absence of confounding and then introduced confounding by preexisting disease that biased the "true" protective effect of exposure towards greater risk. In 2,000 cohorts, exclusion of the early mortality (deaths occurring during the first 25 months of a 60-month follow-up period) did not alter the mean hazard ratio for exposure under conditions of confounding by preexisting disease that produced a constant, threefold increase in mortality risk during follow-up (the mean hazard ratio was 1.72 for all subjects and 1.72 after exclusion of the early mortality). However, when the authors specified confounding by preexisting disease that produced a threefold increase in mortality risk that attenuated over time, exclusion of the early mortality consistently identified the "true" protective effect of exposure (the mean hazard ratio was 1.07 for all subjects and 0.31 after exclusion of the early mortality). Thus, under conditions of confounding by preexisting disease that produces an increase in mortality risk that attenuates over time (an effect that does have empirical support), the early mortality exclusion can be very effective in revealing the "true" exposure-mortality relation.
aging; body mass index; confounding factors (epidemiology); epidemiologic methods
Abbreviations:
EEM, exclusion of the early mortality; HR, hazard ratio
INTRODUCTION
Prospective observational studies that relate a baseline exposure to a health outcome (i.e., mortality or morbidity) often face the methodological challenge that the exposure-outcome relation is confounded by the effects of disease that is present at baseline (preexisting disease). Confounding due to preexisting disease can be very difficult to control, since, in addition to including the detectable effect of diagnosed disease, it may also include the effect of undiagnosed, subclinical disease that is far more difficult to measure directly. The potential for confounding by preexisting disease has been given much attention in prospective investigations that have reported an increased mortality risk for very lean individuals (1). It has long been proposed that at least some of the increase in mortality risk among the lean may be attributable to deaths among those who were lean because of disease-related weight loss (2-6).
In a review of studies of the body weight-mortality relation, Manson et al. (3) proposed that exclusion of the early mortality (i.e., deaths occurring during a prespecified period immediately after baseline) is an effective method of decreasing the bias due to preexisting disease. To date, exclusion of the early mortality has been used in a number of studies of body weight and mortality, with some but not all of the studies indicating an attenuation of the hazard due to low body weight after exclusion of the early mortality (7, 8). Allison et al. (9, 10) recently reported findings from a simulation indicating that exclusion of the early mortality does not decrease the confounding due to preexisting disease. This simulation has been used to question whether the early mortality exclusion is an effective analytical method with which to control for confounding by preexisting disease (8-10).
When evaluating the efficacy of the early mortality exclusion in simulations, the impact of a number of biologically plausible, commonly occurring sample characteristics must be considered. For example, in an early review of studies on the body weight-mortality relation, Simopoulos and Van Itallie cited actuarial data indicating that the "extra mortality associated with various morbid conditions or unhealthy habits is generally highest in the short duration but decreases with time" (2, p. 293). Such a time dependency in the effect of preexisting disease would support the efficacy of the early mortality exclusion, since exclusion of deaths occurring early in the follow-up would remove from the analysis the period when the confounding effect of preexisting disease would be strongest. In this context, it is noteworthy that in simulations of the early mortality exclusion that did not show efficacy for this method (9, 10), the variable for preexisting disease was not specified as time-dependent but rather was specified to produce a constant increase in risk during follow-up.
In this study, we conducted a simulation to evaluate the efficacy of the early mortality exclusion in reducing confounding of the exposure-mortality relation by preexisting disease. Our simulation included two different types of preexisting disease variables. The first type was specified to have the time-dependent effect described above, whereby the increase in mortality risk due to preexisting disease attenuated during later follow-up. The second type was specified to be time-invariant, and, as in previous simulations (9, 10), it produced a constant increase in mortality risk during follow-up. We then examined the effect of the early mortality exclusion on the confounding created by these two preexisting disease variables and compared the results. The impact of several other pertinent sample characteristics on the effect of the early mortality exclusion was also considered.
MATERIALS AND METHODS
Variable definitions
We simulated data sets for a prospective cohort study with a number of prespecified characteristics, described by variables that we define as follows. For each simulated data set, we assigned n subjects to be followed up during a study period of length t. Baseline data on these subjects included a dichotomous exposure variable (0 = unexposed, 1 = exposed) and a dichotomous preexisting disease variable (0 = disease, 1 = no disease). A random variable T representing time on study for each subject was generated from the exponential distribution. Among those subjects with T < t, a prespecified proportion, f, were randomly designated as lost to follow-up, while the remainder (1 - f) were designated as deaths. Under the assumption of exponentially distributed survival times with hazard $\lambda$, the zth percentile of time on study, $t_z$, is given by

$$t_z = -\frac{\ln(1 - z/100)}{\lambda}.$$

If we let $t_z = t$ and $p = z/100$, and divide the n subjects in a simulated data set into subgroups, we can compute the hazard function for the ith subgroup by

$$\lambda_i = -\frac{\ln(1 - p_i)}{t},$$

where $p_i$ represents Pr(death) for the ith subgroup and $\lambda_i$, the resulting hazard function, can be used to randomly generate exponentially distributed survival times for the ith subgroup.
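As an illustration of this step, the following minimal Python sketch converts a prespecified Pr(death) into a constant hazard, using hazard = -ln(1 - Pr(death))/t, and then draws exponentially distributed times on study. The function names, the seed, and the use of Python's standard library are assumptions made for illustration; they are not taken from the software used in this study. The values Pr(death) = 0.05, t = 60, and n = 50,000 are those of the first data set described under Results, with no loss to follow-up (f = 0).

```python
import math
import random


def hazard_from_death_probability(p_death, follow_up):
    """Constant hazard that yields cumulative death probability p_death by follow_up."""
    if not 0.0 < p_death < 1.0:
        raise ValueError("p_death must lie strictly between 0 and 1")
    return -math.log(1.0 - p_death) / follow_up


def draw_survival_times(hazard, n, rng):
    """Draw n exponentially distributed times on study with the given hazard."""
    return [rng.expovariate(hazard) for _ in range(n)]


rng = random.Random(2001)  # arbitrary fixed seed (illustrative)
lam = hazard_from_death_probability(p_death=0.05, follow_up=60.0)
times = draw_survival_times(lam, n=50_000, rng=rng)

# Fraction of subjects whose simulated time falls inside the 60-month study
# period; with f = 0 these are all deaths, so it should be close to 0.05.
observed_fraction = sum(t < 60.0 for t in times) / len(times)
print(round(lam, 6), round(observed_fraction, 3))
```

With f > 0, a random fraction f of the subjects with T < t would instead be flagged as lost to follow-up rather than as deaths.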
For each simulated data set of n subjects, we designated four different subgroups based on levels of exposure and preexisting disease, and to each subgroup we assigned a different $p_i$. Using the relation above, the $p_i$ ($i = 1,2,3,4$) were then used to specify the following four hazard functions:

$$\lambda_i = -\frac{\ln(1 - p_i)}{t}, \quad i = 1,2,3,4,$$

where $\lambda_i$ was used to randomly generate survival times for the ith subgroup of subjects in a given data set.
Exposure variable specification
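For each simulated data set, we specified an exposure variable for which "exposed" subjects experienced a decreased risk of death in the absence of confounding by preexisting disease. This was accomplished by specifying the $\lambda_i$ ($i = 1,2,3,4$) so that, at a given level of preexisting disease, the hazard for exposed subjects was re times the hazard for unexposed subjects, where re < 1.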
Preexisting disease variable specification
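For each simulated data set, we specified one of two different types of preexisting disease variables (d1, d2). We let d1 be a preexisting disease variable that did not have time-dependent effects and that produced a constant increase in risk over t. This was accomplished by further specifying the $\lambda_i$ ($i = 1,2,3,4$) so that, at a given level of exposure, the hazard for subjects with preexisting disease was rd1 times the hazard for subjects without it throughout follow-up, where rd1 > 1.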
Alternatively, we let d2 be a preexisting disease variable with time-dependent effects, whereby the increase in risk due to disease attenuated during later follow-up. This was accomplished by further specifying the $\lambda_i$ ($i = 1,2,3,4$) as follows. We designated a follow-up time t' such that t' < t and then considered the interval from 0 to t' the "early follow-up interval." For subjects with 0 < T ≤ t', we specified the $\lambda_i$ so that the hazard for subjects with preexisting disease was rd2E times the hazard for subjects without it, where rd2E = rd1. Subjects with T > t' were then considered to be in the "late follow-up interval." We transformed the $\lambda_i$ from the "early follow-up interval" by replacing rd2E with rd2L, where rd2L = k × rd2E, and k was chosen to fall between 0 and 1 (0.1 was used in the simulations presented in this paper). The choice of k in this range ensures that the increased risk due to preexisting disease during the early follow-up (rd2E) will be attenuated by a factor of k during the later follow-up (rd2L).
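A hazard that takes one value up to t' and another afterwards can be simulated in several ways. The sketch below uses one standard approach, inversion of the piecewise-linear cumulative hazard, which may differ from the procedure actually used in this simulation. The function names and the seed are hypothetical, and the baseline hazard (derived from a 5 percent death probability over 60 months for subjects without disease) is an assumption made only for illustration; rd2E = 3, k = 0.1, and t' = 25 are the values reported in this paper.

```python
import math
import random


def draw_piecewise_exponential(h_early, h_late, t_change, rng):
    """Draw one survival time with hazard h_early on (0, t_change] and h_late after."""
    target = rng.expovariate(1.0)        # unit-exponential cumulative hazard
    early_total = h_early * t_change     # cumulative hazard accrued by t_change
    if target <= early_total:
        return target / h_early
    return t_change + (target - early_total) / h_late


def death_probability(h_early, h_late, t_change, follow_up):
    """Closed-form Pr(T <= follow_up) for the same two-interval hazard."""
    cumulative = h_early * min(follow_up, t_change)
    cumulative += h_late * max(follow_up - t_change, 0.0)
    return 1.0 - math.exp(-cumulative)


# Illustrative values: assumed baseline hazard for subjects without disease,
# multiplied by rd2E = 3 early and rd2L = k * rd2E = 0.3 late.
base_hazard = -math.log(1.0 - 0.05) / 60.0
rd2e, k, t_prime = 3.0, 0.1, 25.0
rd2l = k * rd2e

rng = random.Random(25)  # arbitrary fixed seed (illustrative)
diseased_times = [
    draw_piecewise_exponential(rd2e * base_hazard, rd2l * base_hazard, t_prime, rng)
    for _ in range(10_000)
]
expected = death_probability(rd2e * base_hazard, rd2l * base_hazard, t_prime, 60.0)
simulated = sum(t <= 60.0 for t in diseased_times) / len(diseased_times)
print(round(expected, 4), round(simulated, 4))
```

Comparing the simulated death fraction with the closed-form value is a simple check that the generated times follow the intended hazard.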
Confounding of the exposure effect by the preexisting disease effect
For each simulated data set, confounding of the exposure-mortality relation was created by 1) specifying the preexisting disease-mortality relation as described above in terms of rd1, rd2E, and rd2L and 2) specifying a preexisting disease-exposure relation by

$$\Pr(\text{disease} \mid \text{exposed}) = C \times \Pr(\text{disease} \mid \text{unexposed}),$$

where C > 1. This latter relation ensures that preexisting disease is C times as prevalent among exposed subjects as among unexposed subjects.
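To make the disease-exposure relation concrete, the sketch below solves for the prevalence of preexisting disease within each exposure group, given the overall proportions exposed and diseased and the ratio C. It assumes that Pr(exposed) and Pr(disease) are marginal (whole-cohort) proportions, which is an interpretation rather than something stated explicitly here; the function name is hypothetical.

```python
def disease_prevalence_by_exposure(p_exposed, p_disease, c):
    """Return (Pr(disease | unexposed), Pr(disease | exposed)).

    Solves p_disease = p_exposed * c * q + (1 - p_exposed) * q for q, the
    prevalence among the unexposed, under the assumed marginal interpretation.
    """
    q_unexposed = p_disease / (p_exposed * c + (1.0 - p_exposed))
    q_exposed = c * q_unexposed
    if q_exposed > 1.0:
        raise ValueError("these inputs imply a prevalence above 1 among the exposed")
    return q_unexposed, q_exposed


# Values used for the data sets plotted in figures 1 and 2:
# Pr(exposed) = 0.25, Pr(disease) = 0.25, C = 6.
q_unexposed, q_exposed = disease_prevalence_by_exposure(0.25, 0.25, 6.0)
print(round(q_unexposed, 3), round(q_exposed, 3))  # 0.111 0.667
```

Under this reading, about 11 percent of unexposed subjects and about 67 percent of exposed subjects would carry preexisting disease at baseline.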
Computation of the mean hazard ratios
For a given set of the prespecified characteristics described above (summarized in table 1), we randomly generated survival times for 2,000 data sets. We computed two hazard ratios for exposure (exposed vs. unexposed) from each data set using a Cox proportional hazards model with time on study as the time variable, all-cause mortality as the event variable, and an independent variable for exposure. The first hazard ratio (HR) for exposure, HRall, was computed for all subjects. The second hazard ratio for exposure, HREEM, was computed after exclusion of the early mortality (EEM), that is, deaths that occurred during a prespecified interval from baseline (T = 0) to te, where te < t. Mean hazard ratios ($\overline{HR}_{all}$, $\overline{HR}_{EEM}$) from 2,000 data sets were then computed for a given set of prespecified characteristics.
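The two hazard ratios can be sketched in code as follows. This is a hand-written Newton-Raphson fit of the Cox partial likelihood for a single binary covariate, written with Python's standard library only; it is not the software used in this study, and the function name, its arguments, and the demonstration values are assumptions. The early mortality exclusion is implemented here as dropping subjects who died before te, which is one reading of the definition above. The demonstration data contain an exposure effect (hazard ratio 0.7) and no preexisting disease, so both estimates should be close to 0.7.

```python
import math
import random


def cox_hazard_ratio(times, events, exposed, exclude_deaths_before=0.0, iterations=25):
    """Hazard ratio (exposed vs. unexposed) from the Cox partial likelihood.

    Subjects who died before `exclude_deaths_before` are dropped first.
    Assumes no tied death times, as with continuous simulated times.
    """
    rows = [(t, d, x) for t, d, x in zip(times, events, exposed)
            if not (d and t < exclude_deaths_before)]
    rows.sort(key=lambda row: row[0], reverse=True)  # longest time first
    beta = 0.0
    for _ in range(iterations):
        at_risk = [0, 0]  # unexposed, exposed subjects still at risk
        score = 0.0
        info = 0.0
        for _, died, x in rows:
            at_risk[x] += 1
            if died:
                weight = at_risk[1] * math.exp(beta)
                p = weight / (at_risk[0] + weight)  # Pr(this death is an exposed subject)
                score += x - p
                info += p * (1.0 - p)
        if info == 0.0:
            raise ValueError("no informative deaths to fit the model")
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return math.exp(beta)


rng = random.Random(154)  # arbitrary fixed seed (illustrative)
follow_up, base_hazard, true_hr = 60.0, 0.02, 0.7
times, events, exposed = [], [], []
for i in range(6000):
    x = i % 2  # alternate unexposed and exposed subjects
    t = rng.expovariate(base_hazard * (true_hr if x else 1.0))
    times.append(min(t, follow_up))   # administrative censoring at 60 months
    events.append(t < follow_up)
    exposed.append(x)

hr_all = cox_hazard_ratio(times, events, exposed)
hr_eem = cox_hazard_ratio(times, events, exposed, exclude_deaths_before=25.0)
print(round(hr_all, 2), round(hr_eem, 2))
```

Repeating this over many generated data sets and averaging the two estimates would give quantities analogous to the mean hazard ratios described above.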
TABLE 1. Sample characteristics, variable names, and prespecified values in a simulation study of the early mortality exclusion
Sample characteristics
Although our primary purpose was to examine the effect of type of preexisting disease variable (d1, d2) on the effect of the early mortality exclusion, we also varied the effects of several other sample characteristics described in table 1. Note that we did not need to vary the absolute value of t, since the hazard ratio computed under Cox regression will not vary with scalar transformations of the time variable (i.e., a × t, where a = constant). By this same reasoning, the choice of months as the unit for the time variable was purely arbitrary, and it had no effect on the hazard ratios computed. However, we did vary the value of te relative to t and t' in order to examine the effect of changes in the length of the early mortality exclusion period.
RESULTS
Hazard plots from two simulated data sets
To explore in detail the effect of the early mortality exclusion on confounding of the exposure-mortality relation by preexisting disease, we plotted the natural log of the hazard function versus time for levels of exposure and preexisting disease in two simulated data sets (figures 1 and 2).

FIGURE 1. Plots of the natural log of the hazard function versus time for each level of a dichotomous exposure variable (specified to decrease risk of death in the absence of confounding by preexisting disease) and a dichotomous preexisting disease variable (specified to produce a time-invariant increase in risk of death) in a simulated cohort of 50,000 subjects enrolled in a 60-month follow-up.

FIGURE 2. Plots of the natural log of the hazard function versus time for each level of a dichotomous exposure variable (specified to decrease risk of death in the absence of confounding by preexisting disease) and a dichotomous preexisting disease variable (specified to produce an increase in risk of death that attenuates after 25 months of follow-up) in a simulated cohort of 50,000 subjects enrolled in a 60-month follow-up.
In the first data set (figure 1), we set a sample size (n) of 50,000 subjects to be enrolled in a 60-month follow-up (t) after baseline collection of data on exposure and a preexisting disease that does not have a time-dependent effect (d1). Therefore, the preexisting disease variable was specified to produce a constant, time-invariant threefold increase in risk of death during the follow-up period (rd1 = 3; figure 1, part a). The exposure variable was specified to decrease risk of death (re = 0.7) in the absence of confounding by preexisting disease (figure 1, part b). Preexisting disease was further specified to be six times more prevalent among the exposed than among the unexposed (C = 6). We also prespecified the following additional characteristics for the first data set: f = 0, Pr(exposed) = 0.25, Pr(disease) = 0.25, Pr(death) = 0.05, and te = 25. The plots depicted in figure 1 indicate that when a protective exposure (figure 1, part b) is confounded by a time-invariant increase in risk due to preexisting disease, the risk due to exposure observed among all subjects is increased (figure 1, part c). In the scenario depicted in figure 1, exclusion of early mortality (te = 25) from the analysis (figure 1, part d) did not substantially alter the confounded increase in risk due to exposure that was observed among all subjects (figure 1, part c). We conducted a proportional hazards regression analysis using the data from figure 1 and found that the hazard ratio for exposure (exposed vs. unexposed) among all subjects (HRall = 1.41) did not differ substantially from the hazard ratio for exposure after exclusion of deaths occurring during the first 25 months of follow-up (HREEM = 1.46).
For the second data set (figure 2), we maintained all of the characteristics of the first data set, with the exception that the preexisting disease variable was specified to be time-dependent (d2; figure 2, part a), such that after 25 months of follow-up (t' = 25), the threefold increase in risk due to preexisting disease (rd2E = rd1 = 3) would be attenuated by a factor of 1/10 (k = 0.1). Figure 2 shows that when a protective exposure (part b) is biased towards greater risk (part c) by this type of preexisting disease (d2; part a), exclusion of early mortality (te = 25) from the analysis reveals the "true" protective effect of the exposure (part d). We conducted a proportional hazards regression analysis using the data from figure 2 and found that exclusion of deaths occurring during the first 25 months of follow-up altered the hazard ratio for exposure (exposed vs. unexposed) in the direction of the true protective effect of exposure (HRall = 1.00, HREEM = 0.46).
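A rough check on the direction of these results can be obtained by computing the exposed-to-unexposed hazard ratio expected in each follow-up interval. The sketch below is an approximation under stated assumptions, not a reproduction of the simulation: it assumes that exposure and preexisting disease act multiplicatively on the hazard, it uses disease prevalences of 1/9 (unexposed) and 2/3 (exposed) derived from Pr(exposed) = Pr(disease) = 0.25 and C = 6 read as marginal proportions, and it ignores the gradual depletion of diseased subjects during follow-up. The function name is hypothetical.

```python
def expected_exposure_hazard_ratio(r_e, r_d, q_unexposed, q_exposed):
    """Approximate exposed/unexposed hazard ratio when disease multiplies hazard by r_d.

    Each group's hazard is a prevalence-weighted mix of diseased and
    non-diseased subjects, relative to an unexposed, disease-free baseline.
    """
    exposed = r_e * (q_exposed * r_d + (1.0 - q_exposed))
    unexposed = q_unexposed * r_d + (1.0 - q_unexposed)
    return exposed / unexposed


q_unexposed, q_exposed = 1.0 / 9.0, 2.0 / 3.0  # assumed baseline prevalences
r_e = 0.7                                      # protective exposure effect

early = expected_exposure_hazard_ratio(r_e, 3.0, q_unexposed, q_exposed)  # rd2E = 3
late = expected_exposure_hazard_ratio(r_e, 0.3, q_unexposed, q_exposed)   # rd2L = 0.1 * 3
print(round(early, 2), round(late, 2))  # 1.34 0.4
```

Under these assumptions the expected ratio is about 1.34 while the threefold disease effect applies (the whole follow-up under d1, the first 25 months under d2) and about 0.40 once the disease effect has attenuated, which is the same direction and rough magnitude as the hazard ratios reported for figures 1 and 2. The late-interval value falls below re = 0.7 because rd2L = k × rd2E = 0.3 is less than 1.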
Mean hazard ratios from 2,000 simulated data sets
In tables 2 and 3, we further explored whether the findings shown in figures 1 and 2, indicating a dependence of the effect of the early mortality exclusion on the type of preexisting disease variable used (time-invariant, d1; time-dependent, d2), would remain in analyses where 2,000 data sets were generated for a given set of characteristics and mean hazard ratios ($\overline{HR}_{all}$, $\overline{HR}_{EEM}$) were computed.
The mean hazard ratios presented in table 2 show that the early mortality exclusion reveals the "true" protective effect of exposure only in the presence of confounding by a time-dependent preexisting disease that produces an increase in mortality risk that attenuates over time. The data in table 2 further indicate that this effect remains evident across a wide range of sample sizes (5,000-150,000) and mortality rates (1-20 percent).
We next examined the effect of variation in six other sample characteristics: the magnitude of the disease-exposure relation, the magnitude of the mortality risk due to preexisting disease, the prevalence of preexisting disease, the rate of loss to follow-up, the percentage of persons exposed at baseline, and the length of the early follow-up exclusion period (table 3). The findings from samples exhibiting wide variation in these characteristics continued to indicate that the early mortality exclusion changed the hazard ratio for exposure towards its "true" protective effect only in the presence of confounding by a preexisting disease variable with time-dependent effects (d2). This change (under confounding by d2) was less evident under the following conditions: a weak disease-exposure relation (C = 1.5), a low mortality risk due to preexisting disease (50 percent increase in mortality risk), a lower prevalence of preexisting disease (5 percent), and a shorter duration of early follow-up exclusion (te = 5, 10).
DISCUSSION
We conducted a simulation study to evaluate whether exclusion of the early mortality (deaths occurring during a prespecified period immediately following baseline) is an effective method of reducing the confounding of an exposure-mortality relation by preexisting disease (prevalent disease that increases mortality risk). Our simulation specified an exposure that decreased risk of death in the absence of confounding by preexisting disease (figure 1, part b, and figure 2, part b). We introduced confounding by a preexisting disease that biased the "true" protective effect of this exposure towards greater risk (figure 1, part c, and figure 2, part c). We then tested whether excluding the early mortality reduced this confounding by altering the exposure risk towards its "true" protective effect.
Our findings from simulation of a preexisting disease that produced a constant, time-invariant increase in mortality risk (figure 1, part a) indicated that exclusion of the early mortality (figure 1, part d) did not substantially alter the risk due to exposure that was observed among all subjects (figure 1, part c). These findings are in concordance with simulations conducted by Allison et al. (9, 10), who also used a preexisting disease variable that produced a constant, time-invariant increase in mortality risk. In contrast, our findings from simulation of a preexisting disease that produced a mortality risk that attenuated over time (figure 2, part a) indicated that exclusion of the early mortality consistently decreased the mortality risk due to exposure (figure 2, part d) to the point of indicating its "true" protective effect (figure 2, part b). Thus, our findings indicate that if an exposure-mortality relation is confounded by a preexisting disease that produces an increase in mortality risk that attenuates over time (figure 2, part a), exclusion of the early mortality is very effective in revealing the "true" exposure-mortality relation (tables 2 and 3).
Is there empirical support for confounding by a preexisting disease that produces an increase in mortality risk that attenuates over time (figure 2, part a)? In the 1951 Impairment Study (11), the Society of Actuaries reported mortality ratios for "impairments characterized by extra mortality decreasing slowly with duration" and "impairments characterized by extra mortality decreasing rapidly with duration." These data, reproduced in figure 3, indicate that during 15 years of follow-up of approximately 725,000 insurance policy-holders, the increase in mortality risk for commonly occurring preexisting diseases (cardiovascular disease, cancer, gastrointestinal diseases) declined in magnitude over time. In the Manitoba Longitudinal Study on Aging, Mossey and Shapiro (12) reported a 155 percent increase in mortality risk for "poor" baseline health status (physician-assessed) during the first 4 years of follow-up that attenuated to a 56 percent increase in risk for poor health after 4 years of follow-up. Kaplan and Camacho (13) showed a similar attenuation in the mortality risk for poor baseline health status during the latter part of a 9-year follow-up of the Human Population Laboratory cohort. More recently, Dyer et al. (14) reported findings from a 25-year prospective study of Chicago Western Electric Company workers indicating that weight loss, a putative indicator of preexisting illness, produced an increase in risk during the first 15 years of follow-up that attenuated to equivocal risk during years 16-25 of follow-up. Lastly, 26-year follow-up data from the Adventist Mortality Study (15) indicated that, for a number of disease-specific mortality endpoints (cardiovascular disease, cancer, other diseases), exclusion of subjects with putative indicators of preexisting illness (i.e., severe physical complaints) strengthened the protective effect of low body mass index during the early years of follow-up (years 1-14) but had no effect on hazard ratios for the later years of follow-up (years 15-26). Moreover, the "severe physical complaints" variable in this study produced an increase in mortality risk that attenuated over time (unpublished data; see figure 4). Taken together, these empirical data indicate that the mortality risk from preexisting diseases attenuates during long-term follow-up and therefore support the scenario depicted in figure 2, which argues for the efficacy of early mortality exclusion.

FIGURE 3. Increase in mortality risk according to preexisting disease at baseline in a 15-year follow-up of 725,000 insurance policy-holders in the 1951 Impairment Study (11), Chicago, Illinois.

FIGURE 4. Plot of the natural log of the hazard function versus time for a variable indicating "severe physical complaints" at baseline (chest pain, shortness of breath, blood in stool, blood in urine, lump or thickening in breast, unusual discharge from breast, unusual bleeding from vagina, fatigue, loss of appetite) among women in the Adventist Mortality Study (15), California, 1960-1985.
Our findings from simulation of a preexisting disease with time-dependent effects indicate that the early mortality exclusion produces only slight changes in the exposure risk under conditions indicative of weak confounding: a weak disease-exposure relation (table 3, C = 1.5), a weak disease-mortality relation (table 3, 50 percent increase in risk for disease), or a low prevalence of preexisting disease (table 3, 5 percent prevalence). Therefore, our findings predict that in a cohort of healthy adults (i.e., a cohort characterized by younger age, the healthy volunteer effect, or the healthy worker effect) who exhibit a low prevalence of virulent preexisting disease that affects exposure, we should expect the exclusion of the early mortality to have little or no impact. Conversely, in cohorts of high-risk adults (i.e., the elderly, patient populations) with a high prevalence of virulent preexisting disease that affects exposure, we should expect to see substantial changes in mortality risk after the exclusion of the early mortality. It follows that substantial variation in the observed effect of the early mortality exclusion in real cohorts may be attributable to between-cohort differences in the characteristics of preexisting disease (i.e., prevalence, virulence, effect on exposure) and is not prima facie evidence of a lack of efficacy of the early mortality exclusion in reducing confounding by preexisting disease (a conclusion reached in a recent meta-analysis of the early mortality exclusion (8)).
In our simulation of a preexisting disease with time-dependent effects, the effect of early mortality exclusion became less pronounced (table 3, where te = 5 or 10 and t' = 25) as we decreased the duration of early follow-up time that was excluded (te). This was also an expected effect, since higher values of te would either reduce or eliminate the contribution to the exposure-mortality relation of the early follow-up period (0 < T ≤ t'), the period when the preexisting disease variable produces maximum risk and thus has the strongest confounding effect (figure 2, part a).
What is the optimal value for the length of early follow-up time to be excluded (te)? Our findings in this simulation indicate that the early mortality exclusion is most effective when te is at least as long as t' (table 3, where te = 25 or 40 and t' = 25).
How then do we gain insight into the value of t' in real cohorts? It is reasonable to assume that t' will be shorter in cohorts with a high prevalence of rapidly fatal preexisting diseases (i.e., certain cancers) and longer in cohorts with a high prevalence of nonaggressive but ultimately fatal preexisting disease (i.e., cardiovascular disease or respiratory disease). This assumption is supported by the data in figure 3, which indicate a rapid decline in the mortality risk for preexisting cancers and a slower decline for preexisting cardiovascular disease. We have presented data from healthy adults (insurance policy-holders (reference 11; figure 3), electric company workers (14), and Seventh-day Adventists (reference 15; figure 4)) indicating that all-cause mortality risk due to preexisting disease is most evident during the first 15 years of follow-up. This suggests that 15 years may be a good estimate of t' (and thus a good choice for te) in prospective studies of healthy adults.
When specifically considering the body weight-mortality relation, Lee and Manson have proposed, based on findings from the Harvard College Alumni Study (16), that the first 15-19 years of follow-up be excluded (17). In this context, it is noteworthy that most of the studies indicating no appreciable effect of the early mortality exclusion on the body weight-mortality relation used exclusion intervals (te) substantially shorter than 15 years (8).
In summary, our simulation study indicated that if the exposure-mortality relation is confounded by a preexisting disease which produces an increase in mortality risk that attenuates over time, exclusion of the early mortality represents an effective analytical method for reducing this confounding. We found empirical support for this type of confounding in data from a number of large prospective studies that related baseline indicators of disease to subsequent risk of death.
NOTES
Reprint requests to Dr. Pramil N. Singh, School of Public Health, Nichol Hall, Room 2010, Loma Linda University, Loma Linda, CA 92350 (e-mail: psingh{at}sph.llu.edu).
REFERENCES
1. Kushner RF. Body weight and mortality. Nutr Rev 1993;51:127-36.
2. Simopoulos AP, Van Itallie TB. Body weight, health, and longevity. Ann Intern Med 1984;100:285-95.
3. Manson JE, Stampfer MJ, Hennekens CH, et al. Body weight and longevity: a reassessment. JAMA 1987;257:353-8.
4. Losonczy KG, Harris TB, Cornoni-Huntley J, et al. Does weight loss from middle age to old age explain the inverse weight mortality relation in old age? Am J Epidemiol 1995;141:312-21.
5. Rumpel C, Harris TB, Madans J. Modification of the relationship between the Quetelet index and mortality by weight-loss history among older women. Ann Epidemiol 1993;3:343-50.
6. Wannamethee G, Shaper AG. Weight change, perceived health status and mortality in middle-aged British men. Postgrad Med J 1990;66:910-13.
7. Gaesser GA. Thinness and weight loss: beneficial or detrimental to longevity? Med Sci Sports Exerc 1999;31:1118-28.
8. Allison DB, Faith MS, Heo M, et al. Meta-analysis of the effect of excluding early deaths on the estimated relationship between body mass index and mortality. Obes Res 1999;7:342-54.
9. Allison DB, Heo M, Flanders DW, et al. Examination of "early mortality exclusion" as an approach to control for confounding by occult disease in epidemiologic studies of mortality risk factors. Am J Epidemiol 1997;146:672-80.
10. Allison DB, Heo M, Flanders DW, et al. Simulation study of the effects of excluding early deaths on risk factor-mortality analyses in the presence of confounding due to occult disease: the example of body mass index. Ann Epidemiol 1999;9:132-42.
11. Society of Actuaries. 1951 Impairment Study. (Transactions of the Society of Actuaries, vol 6, no. 15). Chicago, IL: Society of Actuaries, 1954:29-34.
12. Mossey JM, Shapiro E. Self-rated health: a predictor of mortality among the elderly. Am J Public Health 1982;72:800-8.
13. Kaplan GA, Camacho T. Perceived health and mortality: a nine-year follow-up of the Human Population Laboratory cohort. Am J Epidemiol 1983;117:292-304.
14. Dyer AR, Stamler J, Greenland P. Associations of weight change and weight variability with cardiovascular and all-cause mortality in the Chicago Western Electric Company Study. Am J Epidemiol 2000;152:324-33.
15. Singh PN, Lindsted KD. Body mass and 26-year risk of mortality from specific diseases among women who never smoked. Epidemiology 1998;9:246-54.
16. Lee IM, Manson JE, Hennekens CH, et al. Body weight and mortality: a 27-year follow-up of middle-aged men. JAMA 1993;270:2823-8.
17. Lee IM, Manson JE. Body weight and mortality: what is the shape of the curve? (Editorial). Epidemiology 1998;9:227-8.
Received for publication December 28, 2000.
Accepted for publication May 25, 2001.