1 Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
2 Department of Public Health Sciences, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
3 Statistics Canada, Ottawa, Ontario, Canada
4 Institute for Work & Health, Toronto, Ontario, Canada
Correspondence to Dr. Douglas G. Manuel, Institute for Clinical Evaluative Sciences, Room G106, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada (e-mail: doug.manuel{at}ices.on.ca).
Received for publication February 16, 2004. Accepted for publication October 26, 2004.
ABSTRACT |
---|
bias (epidemiology); effect modifiers (epidemiology); epidemiologic methods; mortality; prevalence; risk; smoking
INTRODUCTION |
---|
The general principle of the population attributable fraction (AFp) was first discussed by Levin in 1953 (2) and can be described as AFp = (It − Iu)/It, where It is the incidence rate of the outcome in the target population (defined as the population for which AFp is being derived) and Iu is the incidence rate of the outcome in the unexposed population. Since incidence rates for unexposed populations are usually unavailable, AFp is commonly derived in terms of the prevalence of risk exposure (Pe) in the population and the relative risk (RR) of the outcome for those exposed:

$$\mathrm{AF}_p = \frac{P_e(\mathrm{RR} - 1)}{1 + P_e(\mathrm{RR} - 1)}$$
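As a minimal sketch of the prevalence-based form of AFp (Levin's formula, AFp = Pe(RR − 1)/(1 + Pe(RR − 1))), the Python below computes the fraction and checks that it agrees with the rate-based definition (It − Iu)/It. The function name and all numeric inputs are hypothetical and chosen only for illustration.

```python
def levin_af(prevalence, relative_risk):
    """Population attributable fraction from exposure prevalence and relative risk."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 25 percent exposed, relative risk of 10,
# and an unexposed incidence rate of 50 per 100,000.
prevalence = 0.25
relative_risk = 10.0
unexposed_rate = 50.0

# Rate in the whole target population: a prevalence-weighted mix of
# the unexposed rate and the exposed rate (RR times the unexposed rate).
total_rate = (1.0 - prevalence) * unexposed_rate + prevalence * relative_risk * unexposed_rate

af_from_rates = (total_rate - unexposed_rate) / total_rate
af_from_levin = levin_af(prevalence, relative_risk)

print(f"AFp from rates: {af_from_rates:.4f}")
print(f"AFp from Levin: {af_from_levin:.4f}")
```

Both routes give the same value here because the total rate was built from the same prevalence and relative risk; in practice the prevalence and the relative risk come from different populations, which is the source of the errors examined in this paper.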
A competing method for estimating SAM uses the approach of Peto et al. (6). This "indirect method" infers the prevalence of smoking by observing the excess rate of lung cancer mortality (primarily influenced by smoking) in the target population, as compared with an unexposed reference population. In AFp calculations, the indirect method substitutes observed current exposure prevalence estimates with prevalence that is considered necessary for causing the current lung cancer mortality burden. This method provides prevalence and SAM estimates for populations without reliable exposure data (data on lung cancer mortality are often available when data on smoking prevalence are not). In the estimation of AFp and SAM, use of the indirect method's prevalence estimates also avoids the potential error resulting from the lag time between population changes in smoking prevalence and the resulting change in disease outcome. For most smoking-related outcomes, the current burden of disease is largely influenced by the past smoking exposure in the population. Lag-time error occurs when AFp is calculated with the current prevalence of smoking to estimate the current burden of illness in a population with changing smoking exposure (7–9).
While attributable fraction methods in general (9) and SAM in particular are helpful tools for public health planning (10), evaluation of the impact of different assumptions in their application has been infrequent. Evaluation of SAM estimates has typically been limited to criticism regarding the estimates of relative risks used. For example, following the 1989 Surgeon General's report (4), one set of criticisms pointed out that the relative risk estimates used, obtained from the American Cancer Society's Cancer Prevention Study II (CPS II), were adjusted for age but not for possible confounding factors such as alcohol consumption or dietary factors (11–13). When smoking relative risk estimates were adjusted for potential confounding, the effects on total SAM were found to range from a 26 percent decrease (11) to a 1 percent decrease (12) to a 2.5 percent increase (13). Another set of criticisms remarked that the CPS II relative risk estimates were not generalizable to the entire population (11, 13). Using an alternate set of relative risk estimates from a more nationally representative US survey yielded SAM estimates 39.5 percent and 16.2 percent lower than those in the Surgeon General's 1989 report (11, 13). Other issues relating to AFp estimation that have been examined include interpretation of multiple competing risks (14–17), the extrapolation of AFp findings to new populations (18, 19), the theoretical effect of nondifferential exposure and outcome misclassification on the attributable fraction (20, 21), and the use of broad definitions of exposure (22–24).
Beyond these concerns, however, there are a number of issues related to the estimation of AFp that have not been empirically examined. Included are three sources of potential error that result from measuring and analytically combining prevalence and outcome estimates from target populations with relative risks from etiologic studies. The first arises from differences in risk factor exposure measurement in the target population (the source of prevalence estimates) and the etiologic study used (the source of relative risk estimates). The second results from differences in the categorization of risk factor exposure, relative risks associated with the exposure, and the outcome associated with the risk. These include differences in the grouping of diseases attributable to smoking and differences in the age-specificity of the estimates. The third source of potential error relates to temporal issues that arise from uncertainty concerning the lag time between population changes in risk factor exposure and the subsequent changes in disease burden.
In this study, we empirically examined the impact of these potential errors on SAM estimates using data from 87 geographic regions in Canada. The large number of regions examined demonstrates the potential magnitude/importance of these errors in different population settings. We explicitly contrasted conventional AFp estimation methods with those presented by Peto et al. (6), partly to examine potential errors arising from lag-time assumptions. SAM is commonly derived and occasionally validated at the national/local level (3, 11–13, 25, 26); to our knowledge, SAM has never been validated for such a large number of regions. We expect that these findings will aid in improving the accuracy of AFp and attributable mortality estimation techniques for a wide range of exposures and outcomes.
MATERIALS AND METHODS |
---|
Annual average geographic region-, sex-, age-group-, and disease-specific death counts were obtained from the Canadian mortality database for 1995–1997, the three most recent years for which data were available from Statistics Canada (27). Disease-specific relative risk estimates for current smokers and former smokers, as compared with never smokers, were obtained from CPS II (4, 28), an ongoing prospective study of 1,185,106 adults (at baseline) over the age of 30 years living in the United States (Web appendix tables A, B, and C) (29). A second set of relative risks from a meta-analysis of 10 different studies was also used for sensitivity analyses (Web appendix table D) (Eric Single, University of Toronto, personal communication, 2003) (30).
Because of concerns about statistical power, prevalence and mortality data for health regions with populations of less than 50,000 were aggregated within each province. We also combined the three northern territories (Nunavut, Yukon, and the Northwest Territories) with Zone 2 in Nova Scotia and Nord-du-Québec in Quebec, the only two health regions within their respective provinces with populations of less than 50,000. This process reduced the number of distinct geographic regions in the analyses from 136 to 87.
Calculation of SAM
Smoking AFp represents the proportion of deaths due to a given disease that has been caused by smoking in a population. Baseline AFp was estimated using the general AFp equation, modified to take into account multiple levels of exposure to smoking (Appendix) (5). While exposure estimates for AFp are often obtained from population health surveys, the indirect method uses the excess lung cancer mortality rate in the target population, as compared with the lung cancer mortality rate of nonsmokers in a reference population, to infer a hypothesized percentage of current smokers in the population. SAM is the sum of the product of AFp and the number of deaths in the population for all diseases considered. Details on both methods can be found in the Appendix.
Sensitivity analyses
Table 1 presents background information on the potential errors we examined and describes the methods we used to estimate their effects. Briefly, we examined the influence of potential error related to risk factor exposure measurement by using alternate definitions of smoking exposure and by treating nonrespondents in different ways (table 1). We examined the effect of errors related to exposure, outcome, and risk categorization through several different specifications, including varying age aggregations of smoking prevalence and relative risk and varying disease groupings for relative risk and mortality (table 1). We quantified the effects of lag time by examining the effect of 1) substituting prevalence estimates obtained by means of the indirect method for prevalence estimates obtained from the Canadian Community Health Survey (CCHS) and 2) adjusting current CCHS smoking prevalence estimates to reflect past prevalence (using historic surveys as a guide). We also examined the effect of using alternate relative risks from a different etiologic study and the effect of the indirect method's crude adjustment for confounding (halving of excess risks) (table 1). In addition to differences in prevalence estimation and in the halving of excess risks (Appendix), we examined the effect of the indirect method's use of an alternate set of CPS II (1984–1988) relative risks, which considered all medical causes of death to be partially attributable to smoking (Web appendix table B).
RESULTS |
---|
Sources of potential error and variation in SAM
Table 3 shows changes in local and national SAM resulting from applying the different methods listed in table 1, which explore possible sources of variation and error.
Effects of modifying smoking exposure measurement.
Slightly altering the definition of exposure in the CCHS for former smokers from a minimum of 100 cigarettes smoked in a lifetime to one cigarette smoked in a lifetime increased total SAM by 5.3 percent (table 3, modified condition 1 (T3-1)). Treating all nonrespondents as never or current smokers (T3-2) led to an overall negligible change in SAM.
Effects of modifying exposure categorization.
Differences in the age categorization of exposure prevalence led to largely consistent changes in SAM. The use of age-combined (in place of 5-year age-group-specific) prevalence estimates for AFp estimation increased SAM estimates in all regions and by 6.9 percent nationally (T3-3). The use of only two age categories (as replicated from SAMMEC) led to an increase in SAM for most geographic regions (81 of 87) and an increase in total estimated SAM of 2.1 percent (T3-3).
Effects of modifying risk and outcome categorization.
In place of a select number of disease-specific relative risk and outcome estimates, the use of a single risk and outcome grouping for all diseases (i.e., all-cause relative risk and mortality estimates) led to a large national SAM increase of 28.8 percent (T3-4). Similar changes were observed at the regional level (interquartile range, 23.7–33.0 percent). These increases were even more striking when SAM was calculated using age-combined prevalence data. Total estimated SAM then increased 49.7 percent, from 40,770 to 61,050, with geographic regions experiencing a percentage increase ranging from 24.1 percent to 82.2 percent (T3-4). Substituting age-specific relative risk estimates (6-year follow-up) for lung cancer, chronic obstructive pulmonary disease, cerebrovascular disease, and coronary heart disease (28, 29) (T3-5) also increased overall estimated SAM by 15.4 percent (from 37,219 to 42,953).
Effects of adjusting for lag time/latency period.
Addressing the issue of lag time, the use of smoking prevalence estimates obtained by means of the indirect method (T3-6) led to an increase of 14.5 percent in total estimated SAM. Adjustment of current smoking prevalence estimates using past survey data (T3-6) led to national SAM increases of 1.4 percent and 6.2 percent for the two sets of assumed lag times. Four geographic regions experienced declines in SAM after we substituted regional prevalence estimates obtained through the indirect method.
Effect of using alternate sets of relative risk estimates.
The use of an alternative set of relative risk estimates, obtained from Single et al.'s meta-analysis (Eric Single, University of Toronto, personal communication, 2003) (30), led to a decrease in estimated SAM of 13.3 percent nationally and similar declines regionally (T3-8). Replacing relative risks derived from 4 years of follow-up with relative risks from the same CPS II cohort, with 6 years of follow-up (T3-9), led to an overall decrease in estimated SAM of 8.7 percent.
Effects of changes made in the indirect method.
Table 4 shows the effects of using different combinations of the three modifications of the indirect method. Addressing lag time with the use of prevalence data derived by means of the indirect method increased estimated SAM by 14.5 percent. The subsequent use of disease groupings that considered all deaths to be partially attributable to smoking increased estimated SAM by 40.2 percent, to a total count of 57,177. The halving of excess risks to crudely address confounding decreased SAM by approximately one quarter nationally and by 21.4–29.6 percent for the various geographic regions. The three modifications of the indirect method combined increased SAM for half of the regions (43 of 87), leading to a final SAM national estimate of 41,224, which is close to (1.1 percent higher than) the baseline estimate obtained with the AFp technique. Web appendix table F presents a detailed, disease-specific comparison of SAM estimates obtained by means of the two methods.
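The percentage changes quoted in this section are relative changes against a baseline SAM count. The short Python check below recomputes those for which both counts appear in the text. It assumes that the baseline of 40,770 deaths given for T3-4 is also the baseline for the two indirect-method figures; the text implies this but does not state it outright. The helper name `percent_change` is illustrative.

```python
def percent_change(baseline, modified):
    """Relative change from baseline to modified, in percent."""
    return 100.0 * (modified - baseline) / baseline


# (label, baseline SAM, modified SAM, percentage reported in the text)
reported = [
    ("All-cause grouping with age-combined prevalence (T3-4)", 40770, 61050, 49.7),
    ("Age-specific relative risks (T3-5)", 37219, 42953, 15.4),
    ("Indirect method: prevalence plus all-death groupings", 40770, 57177, 40.2),
    ("Indirect method: all three modifications", 40770, 41224, 1.1),
]

for label, baseline, modified, stated in reported:
    computed = round(percent_change(baseline, modified), 1)
    print(f"{label}: computed {computed} percent, reported {stated} percent")
```

The T3-5 comparison uses a different baseline (37,219) because it is restricted to the four diseases for which age-specific relative risks were substituted.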
DISCUSSION |
---|
Exposure measurement
When relative risk estimates are mathematically combined with prevalences of exposure in AFp estimation methods, the same definitions of exposure in the target and etiologic populations should be used. The use of slightly differing definitions of smoking led to changes of up to 11.6 percent in local area SAM estimates. The CPS II relative risks used a definition of at least one cigarette per day for at least 1 year for former smokers (33). For former smoking prevalence in the target population, even the use of the alternate CCHS definition of having smoked at least 100 cigarettes in a lifetime (34) would probably have led to overestimation of the AFp in the Canadian population, since CPS II required greater exposure, thereby leading to higher relative risk estimates. The manner in which smoking status, especially former smoking, is defined varies considerably between etiologic studies and health surveys (35), and this issue should be carefully examined in the estimation of AFp.
Exposure, risk, and outcome categorization
Care should also be taken when deciding on how to categorize exposure groups. SAM estimates differed considerably when age-specific prevalence estimates or age-specific relative risk estimates were used (T3-3 and T3-5). The use of stratum-specific estimates is important, since exposure prevalence and disease effect (relative risk and mortality) often vary across age groups, leading to the possibility of effect modification and bias when these components are combined in attributable mortality calculations. For Canada, the prevalence of smoking and the smoking AFp decrease with increasing age. The use of age-pooled prevalence estimates then underestimates smoking AFp for younger age groups and overestimates it for older age groups. Since the number of deaths is consistently greater for older age groups, the degree of overestimation in SAM (the product of AFp and death count) for older persons will be greater in absolute terms than the degree of underestimation for younger persons. This explains the 6.9 percent inflation in national SAM (T3-3). Because these trends exist within the age groups 35–64 years and ≥65 years, the use of these two collapsed age groupings for prevalence estimates (as replicated from SAMMEC) still led to overestimation of SAM.
This bias can similarly be observed when relative risk is modified by age and non-age-specific relative risks are used. Further supporting the use of age-specific estimates, non-age-specific relative risks such as those calculated in CPS II are often age-standardized to a particular standard population, which may not be representative of the target population. When stable age-group-specific prevalence/relative risk estimates are available and age-group-specific mortality data (more readily available) are being used, age-group-specific AFp and SAM values should be calculated.
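The direction of the age-pooling bias can be illustrated with a deliberately exaggerated toy example. In the Python sketch below, every number (populations, prevalences, death counts, and the relative risk) is hypothetical and is not taken from the study data; the single-exposure form of AFp is used for simplicity.

```python
def levin_af(prevalence, relative_risk):
    """Attributable fraction for a single exposure level."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical age strata: prevalence falls with age, deaths rise with age.
strata = {
    "younger": {"population": 700_000, "prevalence": 0.40, "deaths": 100},
    "older": {"population": 300_000, "prevalence": 0.10, "deaths": 1_000},
}
RELATIVE_RISK = 3.0  # hypothetical, held constant across age groups

# Age-specific approach: AFp and attributable deaths per stratum, then summed.
sam_stratified = sum(
    levin_af(s["prevalence"], RELATIVE_RISK) * s["deaths"] for s in strata.values()
)

# Age-pooled approach: one population-weighted prevalence applied to all deaths.
total_population = sum(s["population"] for s in strata.values())
pooled_prevalence = (
    sum(s["population"] * s["prevalence"] for s in strata.values()) / total_population
)
total_deaths = sum(s["deaths"] for s in strata.values())
sam_pooled = levin_af(pooled_prevalence, RELATIVE_RISK) * total_deaths

print(f"Pooled prevalence: {pooled_prevalence:.2f}")
print(f"Age-specific SAM:  {sam_stratified:.1f}")
print(f"Age-pooled SAM:    {sam_pooled:.1f}")
```

Because most deaths fall in the older stratum, where the pooled prevalence overstates exposure, the pooled estimate exceeds the age-specific one. The size of the gap here is an artifact of the chosen inputs and says nothing about the magnitude observed in the Canadian data.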
Furthermore, the use of relative risks and deaths for all diseases substantially increased SAM estimates (T3-4 and use of the indirect method's disease groupings). Investigators in previous studies have argued that calculating SAM using a select number of disease-specific relative risks and outcomes may result in underestimation of the true burden on mortality (36, 37). However, other investigators have argued that since unadjusted relative risk estimates are prone to the observation of spurious associations in the presence of confounding variables, inclusion of diseases with weak evidence of a causal relation with smoking would lead to overestimation of SAM (36).
Lag-time issues
The indirect method's prevalence estimation technique may be useful for populations for which prevalence estimates are unavailable (6). Furthermore, the indirect method's use of current lung cancer mortality rates in place of current smoking prevalence estimates avoids error from the lag time between population changes in the prevalence of exposure and the resulting change in mortality. Adjustment for this error is rarely carried out (4, 11–13, 30, 38–41). Lag-time error is largest when there is a long latency period between exposure and disease (true of smoking and many of its associated diseases) coinciding with rapidly changing prevalence of exposure in the population. The prevalence of current smoking has decreased by more than 10 percent in Canada since 1985 (42), but the full resulting effect on mortality has not yet been observed. Using the indirect method to obtain prevalence estimates that were based on current mortality rates would thus be expected to lead to a greater SAM estimate than would be obtained from using CCHS smoking prevalence estimates. As expected, the indirect method estimated SAM that was 14.5 percent greater than estimates based on current smoking prevalence estimates.
Indirect prevalence estimates can be used independently of other changes made with the indirect method. One must exercise interpretative caution, however, since applying the indirect prevalence estimates for other diseases (done in the indirect method) would apply to them the same lag time as lung cancer in the calculation of SAM. As an alternative, current SAM estimates can be adjusted for changes in exposure through the use of past smoking prevalence estimates, when available. This method has the advantage of being able to apply different latency periods for different diseases. Doing so led to an increase in estimated SAM that was smaller than that observed from using prevalence data from the indirect method. This difference may result from the indirect method's adjustment for underreporting of smoking in health surveys (4, 43–45) and from the application of the relatively long lag time of lung cancer to other diseases considered. Since current regional estimates of lung cancer mortality were available while past regional prevalence estimates were not, only the indirect method allowed for bidirectional adjustment for lag time in different regions (the direction of trend in the prevalence of smoking may differ among regions).
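One way to picture the survey-based lag adjustment is to look up, for each disease, the prevalence recorded at a survey roughly one latency period before the mortality year. The Python sketch below is only an illustration of that idea: the survey years, prevalences, lag lengths, and relative risks are hypothetical, and the lookup rule (most recent survey at or before the lagged year) is an assumption rather than the procedure used in this study.

```python
from typing import Mapping


def lagged_prevalence(history: Mapping[int, float], mortality_year: int, lag_years: int) -> float:
    """Prevalence from the most recent survey at or before (mortality_year - lag_years)."""
    target_year = mortality_year - lag_years
    eligible_years = [year for year in history if year <= target_year]
    if not eligible_years:
        raise ValueError("no survey available at or before the lagged year")
    return history[max(eligible_years)]


def levin_af(prevalence: float, relative_risk: float) -> float:
    """Attributable fraction for a single exposure level."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical survey series showing declining smoking prevalence.
survey_prevalence = {1980: 0.35, 1985: 0.33, 1990: 0.30, 1995: 0.27, 2000: 0.24}

# Hypothetical disease-specific latency periods (years) and relative risks.
diseases = {
    "lung cancer": {"lag_years": 20, "relative_risk": 15.0},
    "coronary heart disease": {"lag_years": 5, "relative_risk": 2.0},
}

MORTALITY_YEAR = 2000
for name, d in diseases.items():
    current_af = levin_af(survey_prevalence[MORTALITY_YEAR], d["relative_risk"])
    lagged = lagged_prevalence(survey_prevalence, MORTALITY_YEAR, d["lag_years"])
    lagged_af = levin_af(lagged, d["relative_risk"])
    print(f"{name}: AFp {current_af:.3f} (current prevalence) vs {lagged_af:.3f} (lagged)")
```

With falling prevalence, the lagged AFp is higher than the AFp based on current prevalence, and the difference grows with the assumed latency period.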
Addressing confounding
When SAM is calculated using unadjusted relative risks, the indirect method's halving of excess risks is a crude yet conservative method of adjusting for confounding. While it has been argued that the halving of excess risk does not lead to a guarantee of conservatism (25), our results show that the magnitude of the reduction in SAM estimates for all geographic regions (from 21.4 percent to 29.6 percent) was similar to or greater than the magnitudes of reductions observed in previous attempts to control for confounding (11–13).
Conclusions
Concerns regarding relative risk generalizability should be extended in AFp calculations for consideration of how relative risks from an etiologic study and prevalences of exposure and outcome (i.e., death) from a target population are defined, categorized, and analytically combined. For the reasons discussed above, in the calculation of SAM estimates we recommend (when possible) the use of closely matching definitions of exposure in the study and target populations; the use of age- and sex-specific prevalence, mortality, and relative risk estimates; and (to be conservative) the use of a select number of causally linked, disease-specific relative risk estimates. Consideration of lag-time effects will also be important if there is a long latency period between exposure and disease occurrence combined with changing prevalence in the population over time. Attention to the often-neglected issues examined in this paper will lead to AFp estimates with greater validity or will, at least, inform authors and readers about the magnitude of potential biases when adjustment is not performed.
APPENDIX |
---|
[Equation 1: image not available]
The calculation of smoking attributable mortality (SAM) involves taking the product of smoking AFp and mortality count (D). For each geographic region, the SAM estimate for each disease group, k, for which relative risk estimates (RR1's and RR2's) were obtained (Web appendix tables A–D) is
[Equation 2: image not available]
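Since the equation images are not available here, the Python sketch below shows one plausible reading of the calculation rather than the authors' exact formulas. It assumes the common multilevel generalization of Levin's formula, in which the excess risks of current and former smokers are weighted by their prevalences, and it assumes that RR1 refers to current smokers and RR2 to former smokers. All function names and inputs are hypothetical.

```python
def multilevel_af(p_current, p_former, rr_current, rr_former):
    """Attributable fraction with two exposure levels relative to never smokers.

    Assumed form: sum of p_i * (RR_i - 1) divided by 1 + sum of p_i * (RR_i - 1).
    """
    excess = p_current * (rr_current - 1.0) + p_former * (rr_former - 1.0)
    return excess / (1.0 + excess)


def smoking_attributable_deaths(strata):
    """Sum of AFp times death count (D) over the supplied strata."""
    return sum(
        multilevel_af(s["p_current"], s["p_former"], s["rr_current"], s["rr_former"])
        * s["deaths"]
        for s in strata
    )


# Hypothetical age strata for a single disease group in a single region.
strata = [
    {"p_current": 0.30, "p_former": 0.20, "rr_current": 10.0, "rr_former": 4.0, "deaths": 50},
    {"p_current": 0.10, "p_former": 0.40, "rr_current": 10.0, "rr_former": 4.0, "deaths": 200},
]

for s in strata:
    af = multilevel_af(s["p_current"], s["p_former"], s["rr_current"], s["rr_former"])
    print(f"AFp = {af:.3f}, attributable deaths = {af * s['deaths']:.1f}")
print(f"SAM for this disease group = {smoking_attributable_deaths(strata):.1f}")
```

Total SAM for a region would then be the sum of these disease-group values over all diseases considered, as described under "Calculation of SAM."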
Indirect method
The indirect method of Peto et al. (6) can be broken down into three steps:
[Steps 1–3: equation images not available]
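Based on the description in Materials and Methods, the three steps can be sketched roughly as: infer a hypothesized smoking prevalence from excess lung cancer mortality, halve the excess relative risks, and apply the AFp formula. The Python below is a sketch under stated assumptions, not a reproduction of Peto et al.'s equations. In particular, the prevalence ratio in `indirect_prevalence` (excess lung cancer rate in the target population divided by the excess rate among smokers in the reference cohort) is an assumed form, and all rates and relative risks are hypothetical.

```python
def indirect_prevalence(target_rate, never_smoker_rate, smoker_rate):
    """Hypothesized smoking prevalence inferred from lung cancer mortality rates.

    Assumed form: (target - never smokers) / (smokers - never smokers),
    clamped to the range 0 to 1.
    """
    excess_target = target_rate - never_smoker_rate
    excess_smokers = smoker_rate - never_smoker_rate
    if excess_smokers <= 0.0:
        raise ValueError("reference smokers must have a higher rate than never smokers")
    return min(max(excess_target / excess_smokers, 0.0), 1.0)


def halve_excess_risk(relative_risk):
    """Crude confounding adjustment: keep half of the excess risk above 1."""
    return 1.0 + (relative_risk - 1.0) / 2.0


def indirect_af(prevalence, relative_risk):
    """AFp using the inferred prevalence and the halved excess risk."""
    excess = prevalence * (halve_excess_risk(relative_risk) - 1.0)
    return excess / (1.0 + excess)


# Hypothetical lung cancer mortality rates per 100,000.
prevalence = indirect_prevalence(target_rate=60.0, never_smoker_rate=10.0, smoker_rate=210.0)

# Hypothetical unadjusted relative risk for some other disease.
af = indirect_af(prevalence, relative_risk=3.0)

print(f"Inferred prevalence: {prevalence:.2f}")
print(f"Halved-excess RR:    {halve_excess_risk(3.0):.1f}")
print(f"AFp:                 {af:.2f}")
```

Keeping the three steps as separate functions mirrors the way the paper applies them in different combinations (table 4): the inferred prevalence can be used without halving, and the halving can be applied to survey-based prevalence.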
ACKNOWLEDGMENTS |
---|
Dr. Douglas G. Manuel is a Career Scientist with the Ontario Ministry of Health and Long-Term Care.
References |
---|