Methodological Issues in Estimating Smoking-attributable Mortality in the United States

Ann M. Malarcher1, Jane Schulman2, Leonardo A. Epstein2, Michael J. Thun3, Paul Mowery2, Ben Pierce2, Luis Escobedo1 and Gary A. Giovino1

1 Office on Smoking and Health, Centers for Disease Control and Prevention, Atlanta, GA.
2 Battelle Memorial Institute, Centers for Public Health Research and Evaluation, Baltimore, MD.
3 American Cancer Society, Atlanta, GA.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 APPENDIX A
 APPENDIX B
 APPENDIX C
 REFERENCES
 
The authors explored two methodological issues in the estimation of smoking-attributable mortality for the United States. First, age-specific and age-adjusted relative risk, attributable fraction, and smoking-attributable mortality estimates obtained using data from the American Cancer Society's second Cancer Prevention Study (CPS II), a cohort study of 1.2 million participants (1982–1988), were compared with those obtained using a combination of data from the National Mortality Follow-back Survey (NMFS), a representative sample of US decedents in which information was collected from informants (1986), and the National Health Interview Survey (NHIS), a nationally representative household survey (1987). Second, the potential for residual confounding of the disease-specific age-adjusted smoking-attributable mortality estimates was addressed with a model-based approach. The estimated smoking-attributable mortality based on the CPS II for the four most common smoking-related diseases—lung cancer, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease—was 19% larger than the estimated smoking-attributable mortality based on the NMFS/NHIS, yet the two data sources yielded essentially the same smoking-attributable mortality estimate for lung cancer alone. Further adjustment of smoking-attributable mortality for disease-appropriate confounding factors (education, alcohol intake, hypertension status, and diabetes status) indicated little residual confounding once age was taken into account. Am J Epidemiol 2000;152:573–584.

cardiovascular diseases; cerebrovascular disorders; confounding factors (epidemiology); lung diseases, obstructive; lung neoplasms; mortality; smoking

Abbreviations: CPS II, Cancer Prevention Study II; ICD-9, International Classification of Diseases, Ninth Revision; NHIS, National Health Interview Survey; NMFS, National Mortality Followback Survey.


    INTRODUCTION
 
In 1989, the Surgeon General published estimates of the number of deaths attributable to cigarette smoking in the United States (1). The attributable fractions were based on relative risks estimated from the American Cancer Society's second Cancer Prevention Study (CPS II) and smoking prevalences estimated from the National Health Interview Survey (NHIS), a cross-sectional, nationally representative survey of adults aged ≥18 years. National estimates of smoking-attributable mortality have also been produced for more recent years using similar methods (2, 3). There are two basic criticisms of these national estimates (4–6). The first is that because the CPS II was not a nationally representative sample, it may be inappropriate to apply the relative risks obtained from the CPS II to the US population. The second criticism is that although the national estimates were age-adjusted, they were not adjusted for other potentially confounding factors, such as alcohol use, educational level, hypertension, or the presence of diabetes mellitus. As a result, the potential exists for residual confounding.

Both of these criticisms were discussed by Sterling et al. (4), who compared age-adjusted relative risks from the 1989 Surgeon General's report with age-adjusted relative risks estimated using a combination of data from the NHIS and the 1986 National Mortality Follow-back Survey (NMFS). The latter survey used a representative sample of all decedents aged 25 years or older in the United States. In general, the relative risks estimated by Sterling et al. were smaller than the corresponding estimates from the Surgeon General's report, and as a result, the attributable fractions and smoking-attributable mortality estimates reported by Sterling et al. were smaller than the Surgeon General's estimates. Sterling et al. also presented disease-specific estimates of attributable fractions and smoking-attributable mortality adjusted for age; for age and income; for age and alcohol; and for age, alcohol, and income simultaneously (4). Their estimates of smoking-attributable mortality adjusted for multiple confounders were considerably smaller than their estimates adjusted only for age.

We further explored the two methodological issues raised by Sterling et al. First, we quantified the effect of using different data sources for estimating age-adjusted relative risks, attributable fractions, and smoking-attributable mortality. In particular, we compared age-specific and age-adjusted relative risks estimated using data from the CPS II with those estimated using a combination of data from the NMFS and the NHIS. In contrast to Sterling et al., we directly standardized all age-specific relative risks to the same reference population. Second, we assessed the potential residual confounding of the disease-specific age-adjusted attributable fractions and smoking-attributable mortality estimates. Many researchers have been interested in estimating adjusted attributable fractions by using either case-control data or cohort data (7–16). In our application, we adapted the method of Bruzzi et al. (17) for use with case-control data to estimate the adjusted attributable fraction when combining information from one cross-sectional national survey and one cohort study.


    MATERIALS AND METHODS
 
Data sources
We used four data sources to estimate smoking prevalences, relative risks, attributable fractions, and smoking-attributable mortality: US mortality data from the National Center for Health Statistics and data from the CPS II, the NMFS, and the NHIS. From the National Center for Health Statistics' US mortality data, as reported by the states from death certificates (18), we obtained the total number of deaths for each smoking-related disease.

The American Cancer Society's CPS II is a cohort study of approximately 1.2 million participants that began in 1982 (19). American Cancer Society volunteers identified potential participants: persons with whom they would be able to maintain contact over the course of the study, such as neighbors and friends. CPS II participants reside in all 50 states, the District of Columbia, Puerto Rico, and Guam. Smoking status and age were determined at baseline from a self-administered questionnaire on medical history, current health status, and lifestyle factors. The volunteers reported vital status for participants every year. For participants who died during follow-up, the underlying cause of death (as indicated by the International Classification of Diseases, Ninth Revision (ICD-9) (20)) was obtained from the death certificate. After 6 years of follow-up, death certificates had been obtained for 94.1 percent of subjects known to have died.

The 1986 NMFS used a stratified sample of death certificates in the Current Mortality Sample (21), which was a 10 percent sample of death certificates of persons aged 25 years or older. To ensure sufficient sample sizes for subpopulations, certain age and racial/ethnic groups and causes of death were oversampled or selected with certainty. In all, 18,733 death certificates were sampled, of which 2,274 were sampled with certainty. Proxy information on decedents' health behaviors was collected from informants (usually next of kin) and from facility abstract questionnaires (i.e., nursing homes). Information was obtained using a combination of self-administered questionnaires and telephone or field interviews. The response rate for the 1986 NMFS was 88.6 percent.

The NHIS is a national, multistage cluster survey that has been conducted continuously since 1957 (22). The target population is the civilian, noninstitutionalized population of the United States aged 18 years or older. Questions about tobacco smoking frequently appear on supplements to the NHIS. The 1987 NHIS supplement included tobacco questions as part of a special section on cancer control. Although proxy responses to the core NHIS are permitted, supplemental interviews must be completed by the sampled respondents. A total of 47,240 interviews were completed in 1987.

Comparison of CPS II and NMFS/NHIS estimates
We restricted our comparison of age-adjusted estimates obtained from CPS II data and the NMFS/NHIS data to the four most common smoking-related diseases: lung cancer (ICD-9 code 162), chronic obstructive pulmonary disease (ICD-9 codes 490–492 and 496), coronary heart disease (ICD-9 codes 410–414), and cerebrovascular disease (ICD-9 codes 430–438). We used four age groups (35–59, 60–69, 70–79, and ≥80 years) when we analyzed lung cancer, coronary heart disease, and cerebrovascular disease; we used only two age groups (50–69 and ≥70 years) when we analyzed chronic obstructive pulmonary disease, because there are so few deaths below age 50 years for this cause of death. All estimates were calculated separately by gender. Because 93 percent of the CPS II respondents are White, we limited our study to Whites. Subjects were classified as current smokers (persons who reported that they smoked now), former smokers (persons who reported that they had ever smoked but did not smoke now), or never smokers (persons who reported that they had never smoked).

We used standard methods to estimate age-, gender-, and cause-specific relative risks and attributable fractions, age-adjusted relative risks, and age-adjusted attributable fractions and smoking-attributable mortality (see Appendix A). Briefly, age-specific relative risks for current or former smokers compared with never smokers for each disease/gender group were directly age-adjusted by taking a weighted sum of the age-specific relative risks (using the proportion of the 1986 US population in each age group as the weights). Age-specific attributable fractions for each disease/gender group were estimated by combining age-specific relative risks and smoking prevalence estimates for never, former, and current smokers. Age-adjusted attributable fractions were calculated as a weighted sum of the age-specific attributable fractions (using the proportion of deaths in the United States in 1986 in each age group as the weights). For each disease/gender group, we estimated age-adjusted smoking-attributable mortality by multiplying the age-specific attributable fractions by the number of deaths within the age group and then summing across age groups.
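The calculation chain described above can be illustrated with a minimal sketch. The source defers the exact formulas to Appendix A, so the multi-category (Levin-type) expression for the age-specific attributable fraction used here is an assumption, and all function names and input numbers are hypothetical illustrations rather than study data.

```python
def attributable_fraction(rr_current, rr_former, p_current, p_former):
    """Age-specific attributable fraction for a three-level exposure.

    Assumed Levin-type formula with never smokers as the reference group.
    """
    excess = p_current * (rr_current - 1.0) + p_former * (rr_former - 1.0)
    return excess / (1.0 + excess)


def age_adjusted_rr(age_specific_rrs, population_weights):
    """Direct adjustment: weighted sum of age-specific relative risks."""
    total = sum(population_weights)
    return sum(rr * w / total for rr, w in zip(age_specific_rrs, population_weights))


def smoking_attributable_mortality(age_specific_afs, deaths_by_age):
    """Sum over age groups of (attributable fraction x number of deaths)."""
    return sum(af * d for af, d in zip(age_specific_afs, deaths_by_age))


def age_adjusted_af(age_specific_afs, deaths_by_age):
    """Weighted sum of age-specific AFs, weights = share of deaths per age group."""
    return smoking_attributable_mortality(age_specific_afs, deaths_by_age) / sum(deaths_by_age)


if __name__ == "__main__":
    # Hypothetical inputs for two age groups (not taken from the paper).
    rr_current = [20.0, 10.0]
    rr_former = [8.0, 5.0]
    p_current = [0.35, 0.20]
    p_former = [0.30, 0.45]
    deaths = [1000, 3000]
    pop_weights = [0.7, 0.3]

    afs = [attributable_fraction(c, f, pc, pf)
           for c, f, pc, pf in zip(rr_current, rr_former, p_current, p_former)]
    print("age-adjusted RR (current):", age_adjusted_rr(rr_current, pop_weights))
    print("age-adjusted AF:", age_adjusted_af(afs, deaths))
    print("SAM:", smoking_attributable_mortality(afs, deaths))
```

Note that the relative risks are weighted by population shares, whereas the attributable fractions are weighted by shares of deaths, which is why the age-adjusted attributable fraction equals the smoking-attributable mortality divided by total deaths.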

Comparison of age-adjusted and fully adjusted estimates
To examine the potential for residual confounding of the age-adjusted estimates, we compared age-adjusted estimates and estimates adjusted for multiple confounding factors (age, education, alcohol use, hypertension status, and diabetes status). As before, we restricted the analysis to the four most common smoking-related diseases: lung cancer, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease. The results are presented separately by gender, and the analysis was limited to Whites. Subjects were classified as current, former, or never smokers.

We varied the set of potential confounders by disease. Age (<59, 60–64, 65–69, 70–75, 76–79, 80–84, and ≥85 years) and education (<12 years, 12 years, and >12 years) were included for all diseases. Alcohol use (nondrinking, and <1, 1, and ≥2 drinks per day) was also included for chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease. History of hypertension (yes/no) and diabetes mellitus (yes/no) was also included for coronary heart disease and cerebrovascular disease.

The method used for adjusting the attributable fraction was an adaptation of that used by Bruzzi et al. (17). Details on the method are presented in Appendix B. Briefly, the adjusted attributable fraction (AF) is given by the formula

$$\mathrm{AF} = 1 - \sum_{j} \frac{\rho_j}{RR_{j|C}},$$

where $\rho_j$ is the proportion of deaths in the jth cell defined by exposure and confounder status (e.g., smoking × age) and $RR_{j|C}$ is the relative risk for smokers compared with nonsmokers, adjusted for confounder(s) C (e.g., age). We used data from the NMFS to estimate the values of $\rho_j$, accounting for the design of the survey. The $RR_{j|C}$ values were estimated using Poisson regression with the CPS II data. The purpose of the model in this instance was to pool information from different cells to estimate adjusted relative risks while preserving, to the extent possible, the structure of the data. Our goal was to define disease- and gender-specific models with predicted cell counts similar to the observed cell counts. The final model for each disease/gender group included all main effects and all significant two- and three-factor interactions. The final models also included all relevant two-factor interaction terms needed to maintain a hierarchical model structure (23).
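As an illustration of the adjusted attributable fraction in the form given by Bruzzi et al., AF = 1 − Σ_j ρ_j / RR_j, the following minimal sketch computes it from a list of cells. The function name, the cell layout, and every number are hypothetical; in the study itself the proportions came from the NMFS and the relative risks from Poisson models fitted to the CPS II data.

```python
def adjusted_attributable_fraction(cells):
    """Bruzzi-type adjusted AF from (rho_j, rr_j) pairs.

    rho_j: proportion of all deaths falling in cell j (must sum to 1).
    rr_j:  adjusted relative risk for cell j versus the reference
           exposure level within the same confounder stratum
           (1.0 for never smokers).
    """
    rho_total = sum(rho for rho, _ in cells)
    if abs(rho_total - 1.0) > 1e-9:
        raise ValueError("cell proportions must sum to 1")
    return 1.0 - sum(rho / rr for rho, rr in cells)


if __name__ == "__main__":
    # Hypothetical smoking x age cells: (share of deaths, adjusted RR).
    cells = [
        (0.10, 1.0),   # never smokers, younger
        (0.15, 6.0),   # former smokers, younger
        (0.25, 15.0),  # current smokers, younger
        (0.15, 1.0),   # never smokers, older
        (0.20, 4.0),   # former smokers, older
        (0.15, 9.0),   # current smokers, older
    ]
    print("adjusted AF:", adjusted_attributable_fraction(cells))
```

Cells for never smokers carry a relative risk of 1.0, so they contribute their full share of deaths to the sum and nothing to the attributable fraction.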

We calculated the confidence intervals for the adjusted attributable fraction and smoking-attributable mortality estimates using bootstrap methods (24), because traditional methods for estimating standard errors are based on asymptotic results, the validity of which may not hold with sparse data and with models containing many parameters. To generate bootstrap estimates for the adjusted attributable fraction, we combined bootstrap estimates of the relative risks computed from the CPS II data with bootstrap estimates of the proportions computed from the NMFS data. Details on the bootstrap calculation are presented in Appendix C.
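The idea of combining bootstrap replicates from two independent data sources can be sketched as follows. This is a deliberately simplified illustration and not the procedure of Appendix C: it assumes a single binary exposure, a crude risk ratio in place of the Poisson-model relative risks, simple random resampling that ignores the NMFS survey design, and a percentile interval. All names and data are hypothetical.

```python
import random


def relative_risk(cohort):
    """cohort: list of (exposed, died) booleans -> crude risk ratio."""
    deaths = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for exposed, died in cohort:
        totals[exposed] += 1
        deaths[exposed] += died
    return (deaths[True] / totals[True]) / (deaths[False] / totals[False])


def exposed_share(decedents):
    """decedents: list of booleans (True = exposed) -> share of deaths exposed."""
    return sum(decedents) / len(decedents)


def adjusted_af(rho_exposed, rr):
    """Bruzzi-type AF with two cells: unexposed (RR = 1) and exposed."""
    return 1.0 - ((1.0 - rho_exposed) + rho_exposed / rr)


def bootstrap_af_interval(cohort, decedents, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval from independently resampled cohort and decedent data."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        cohort_star = rng.choices(cohort, k=len(cohort))
        decedents_star = rng.choices(decedents, k=len(decedents))
        replicates.append(
            adjusted_af(exposed_share(decedents_star), relative_risk(cohort_star))
        )
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical cohort (source of the relative risk) and decedent sample
# (source of the exposure distribution among deaths).
cohort = ([(True, True)] * 80 + [(True, False)] * 720
          + [(False, True)] * 40 + [(False, False)] * 1160)
decedents = [True] * 200 + [False] * 100

if __name__ == "__main__":
    point = adjusted_af(exposed_share(decedents), relative_risk(cohort))
    print("point estimate:", point)
    print("95% interval:", bootstrap_af_interval(cohort, decedents))
```

Resampling the two data sets separately reflects the fact that they are independent samples; each replicate pairs one resampled relative risk with one resampled proportion.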


    RESULTS
 
Comparison of CPS II and NMFS/NHIS estimates
Patterns in the age-specific relative risks estimated using the CPS II data were similar to those estimated using a combination of data from the NMFS and the NHIS in that 1) relative risks for current smokers versus never smokers were larger than relative risks for former smokers versus never smokers; 2) relative risks tended to decrease with increasing age; and 3) relative risks for lung cancer and chronic obstructive pulmonary disease were generally larger than the corresponding relative risks for coronary heart disease and cerebrovascular disease (table 1 and table 2).


TABLE 1. Age-specific and age-adjusted relative risks of mortality for current and former smokers versus never smokers (US White men)*

 

TABLE 2. Age-specific and age-adjusted relative risks of mortality for current and former smokers versus never smokers (US White women)*

 
Among men, the age-specific relative risks from the CPS II were generally (up to 3.2 times) larger than the corresponding NMFS/NHIS relative risks, except for lung cancer, for which NMFS/NHIS estimates were 1.4–3.0 times larger than CPS II estimates among persons aged 35–59 years and persons aged 80 years or older (table 1). For coronary heart disease and cerebrovascular disease among men aged 60 years or older, the relative risks from the CPS II indicated excess risk among former smokers, whereas the NMFS/NHIS relative risks for former smokers indicated a protective effect. The relative risk estimates from the NMFS/NHIS data were considerably more variable than the CPS II estimates. Among women, the differences between the age-specific relative risks estimated from the CPS II and those estimated from the NMFS/NHIS were smaller (table 2). The pattern for lung cancer was similar to that of men, and the NMFS/NHIS relative risk estimates were up to 3.7 times larger than CPS II estimates among women aged 35–59 years and women aged 70 years or older.

For both genders, age-adjusted relative risks for lung cancer were larger when estimated using the NMFS/NHIS data than when estimated using the CPS II data, and the age-adjusted relative risks for cerebrovascular disease were slightly higher in the CPS II than in the NMFS/NHIS (table 1 and table 2). The age-adjusted relative risks for chronic obstructive pulmonary disease and coronary heart disease based on the CPS II were larger than those based on the NMFS/NHIS for men but not for women.

The total age-adjusted smoking-attributable mortality for the four diseases combined was 36 percent higher for men and 8 percent lower for women when it was estimated using CPS II data rather than NMFS/NHIS data (table 3). For men, age-adjusted attributable fraction and smoking-attributable mortality estimates based on CPS II data were consistently larger for all four diseases than they were for estimates based on NMFS/NHIS data. For women, however, estimates based on the two data sources were quite comparable for chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease, whereas the age-adjusted attributable fraction and smoking-attributable mortality for lung cancer based on CPS II data were approximately 11 percent lower than the age-adjusted attributable fraction and smoking-attributable mortality based on NMFS/NHIS data.


TABLE 3. Age-adjusted attributable fractions (AFs) and smoking-attributable mortality (SAM) among US White men and women aged ≥35 years*

 
For both genders combined, the CPS II smoking-attributable mortality estimates were larger than the NMFS/NHIS estimates for three of the four diseases: They were 13 percent higher for chronic obstructive pulmonary disease, 54 percent higher for coronary heart disease, and 31 percent higher for cerebrovascular disease. The smoking-attributable mortality estimates for lung cancer differed by only 2.6 percent. Overall (for both genders and all four diseases combined), the estimated smoking-attributable mortality based on CPS II data was 19 percent larger than that based on NMFS/NHIS data.

Comparison of age-adjusted and fully adjusted estimates from the CPS II data
The age-adjusted attributable fractions were within 10 percent of the fully adjusted attributable fractions for six of the eight disease- and gender-specific comparisons (table 3 and table 4); four of the eight age-adjusted attributable fractions were within 5 percent of the fully adjusted attributable fractions. The most meaningful difference was for males who died from cerebrovascular disease, for whom the age-adjusted attributable fraction was 60 percent larger than the fully adjusted attributable fraction (0.16 vs. 0.10).


TABLE 4. Fully adjusted* attributable fractions (AFs) and smoking-attributable mortality (SAM), by gender and disease, in the American Cancer Society's second Cancer Prevention Study (1982–1988)

 
For all four diseases and both genders combined, the age-adjusted smoking-attributable mortality (236,942) was 2.5 percent smaller than the fully adjusted smoking-attributable mortality (243,047). For men, for all four diseases combined, the age-adjusted smoking-attributable mortality almost equaled the fully adjusted smoking-attributable mortality (167,468 vs. 167,504). For women, for all four diseases combined, the age-adjusted smoking-attributable mortality was 8.0 percent smaller than the fully adjusted smoking-attributable mortality (69,474 vs. 75,543).

For both genders combined, the fully adjusted smoking-attributable mortality was almost equal to the age-adjusted smoking-attributable mortality for lung cancer (94,670 vs. 93,387) and chronic obstructive pulmonary disease (53,587 vs. 52,821). The fully adjusted smoking-attributable mortality was 10.0 percent larger than the age-adjusted smoking-attributable mortality for coronary heart disease (82,666 vs. 75,092) and 22.5 percent smaller than that for cerebrovascular disease (12,124 vs. 15,642).
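The totals and percentage differences quoted in the two preceding paragraphs can be checked directly from the disease-specific figures given in the text. The short sketch below does only that arithmetic; the helper name percent_difference and the dictionary keys are illustrative choices.

```python
# Smoking-attributable mortality, both genders combined, as quoted in the text.
age_adjusted = {
    "lung cancer": 93387,
    "chronic obstructive pulmonary disease": 52821,
    "coronary heart disease": 75092,
    "cerebrovascular disease": 15642,
}
fully_adjusted = {
    "lung cancer": 94670,
    "chronic obstructive pulmonary disease": 53587,
    "coronary heart disease": 82666,
    "cerebrovascular disease": 12124,
}


def percent_difference(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


total_age = sum(age_adjusted.values())      # 236,942
total_full = sum(fully_adjusted.values())   # 243,047

if __name__ == "__main__":
    print("age-adjusted total:", total_age)
    print("fully adjusted total:", total_full)
    print("age- vs fully adjusted, all diseases (%):",
          round(percent_difference(total_age, total_full), 1))
    for disease in age_adjusted:
        print(disease, "fully vs age-adjusted (%):",
              round(percent_difference(fully_adjusted[disease], age_adjusted[disease]), 1))
```

The disease-specific figures sum to the reported totals, and the overall and cerebrovascular differences reproduce the quoted 2.5 and 22.5 percent; the coronary heart disease difference computes to about 10.1 percent, in line with the rounded figure in the text.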


    DISCUSSION
 
The number of deaths attributable to cigarette smoking in the United States has been estimated by the Surgeon General (1), by Sterling and colleagues (4, 5), and by Peto et al. (25). In contrast to the Surgeon General and Sterling and colleagues, Peto et al. estimated smoking-attributable mortality indirectly using vital statistics data. Sterling et al. (4) found that the disease-specific, age-adjusted relative risks calculated from NMFS/NHIS data were considerably smaller than those presented in the Surgeon General's report (1). We found that the age-adjusted relative risks estimated from the two data sources for coronary heart disease and cerebrovascular disease did not differ substantially. The age-adjusted relative risks for lung cancer estimated using NMFS/NHIS data were 1.3–2.0 times larger than the relative risks estimated using the CPS II data (for both current and former smokers). This pattern applied to African Americans as well (data not shown). We also observed that the age-adjusted relative risks for chronic obstructive pulmonary disease based on the CPS II data were approximately 1.7 times larger than those based on the NMFS/NHIS data for men but were about the same for women.

Differences in methodology may explain why our findings concerning the age-adjusted relative risks differed from Sterling et al.'s (4). We directly adjusted the NMFS/NHIS data for age, whereas Sterling et al. (4) used indirect adjustment. The method of age adjustment did not appear to affect the magnitude of the age-adjusted relative risks from the NMFS/NHIS data for current and former smoking with regard to coronary heart disease, cerebrovascular disease, and chronic obstructive pulmonary disease (or with regard to lung cancer for former smoking). However, our age-adjusted relative risks from the NMFS/NHIS data for current smoking and lung cancer were 2–3 times larger than Sterling et al.'s (4) (our relative risks were also larger for African Americans (data not shown)). We also used the same reference population for direct standardization of both the NMFS/NHIS and the CPS II relative risks, whereas different populations were used for standardization in Sterling et al.'s (4) comparison of relative risks from the NMFS/NHIS and the Surgeon General's report.

We noted that the NMFS/NHIS data produced several age-specific relative risk estimates in former smokers that were less than 1.0 for coronary heart disease and cerebrovascular disease, particularly among older men. Sterling et al. (4) also reported this pattern. This pattern is probably related to 1) limited sample sizes in specific cells (e.g., the confidence intervals for cerebrovascular disease for men included 1.0) and 2) the fact that information about decedents' smoking habits was obtained from proxy respondents for the NMFS (several studies have shown that proxy respondents tend to underreport smoking history (26–28)). There is no biologically plausible explanation for smoking's being protective against coronary heart disease or cerebrovascular disease, or indeed against lung cancer or chronic obstructive pulmonary disease (29).

As in other cohort studies (19, 30), we observed lower age-adjusted relative risks for lung cancer among female smokers than among male smokers in the CPS II and NMFS/NHIS data. These differences largely reflect gender differences in patterns of smoking across birth cohorts. Women began smoking in the 1930s and 1940s, approximately 20–30 years after men, which resulted in a lower cumulative smoking exposure (31). In addition to starting smoking at a later age, women in the CPS II reported smoking fewer cigarettes per day and were less likely to inhale deeply than men (19). Lung cancer risk increases with amount smoked, duration of smoking, and depth of inhalation in both men and women (30). However, the CPS II relative risks for lung cancer were still higher among men than among women within strata of 5-year duration of smoking (30–50 years) and cigarettes per day (20 vs. 40) (19). These differences may have been due to remaining gender differences in duration and amount within these broad strata. In contrast to cohort studies, some recent case-control studies have found a comparable risk or even a higher risk for female smokers than for male smokers (32–36). Possible explanations include a lower risk of lung cancer among nonsmoking women (perhaps due to occupational exposures that increase the nonsmoker risk in men), differences in the reporting of exposure information, differences in the dynamics of cigarette smoking, increased female susceptibility to carcinogens in tobacco smoke, and increased exposure levels among women (32, 36, 37).

Overall (for both genders and for all four diseases combined), we found that the age-adjusted smoking-attributable mortality estimate using CPS II data was 19 percent larger than the estimate using NMFS/NHIS data. Sterling et al. (4), however, found the CPS II-based estimate from the Surgeon General's report to be approximately 40 percent larger than the NMFS/NHIS-based estimate. The lower percentage we calculated may reflect the fact that the NMFS/NHIS estimates in our analysis were not uniformly smaller than the CPS II estimates. For example, the CPS II-based estimated smoking-attributable mortality for the four diseases combined was 36 percent larger for men but 8 percent smaller for women. In addition, the two data sources yielded essentially the same smoking-attributable mortality estimates for lung cancer, which accounts for approximately 40 percent of the total smoking-attributable mortality for all four diseases combined.

Two reasons why our results concerning smoking-attributable mortality differ from Sterling et al.'s (4) are that 1) we used similar methods to calculate the relative risks, attributable fractions, and smoking-attributable mortalities from the NMFS/NHIS and the CPS II and 2) we substituted the point estimate of 1.0 for the age-, sex-, and cause-specific relative risks that were less than 1.0 (thereby assuming that smoking did not cause or prevent deaths for these groups). We made this substitution because the confidence intervals for all of the relative risks that were less than 1.0 included 1.0 (partly due to small sample sizes) and because the literature does not support a protective relation between cigarette smoking and these four diseases (29). Sterling et al. (4) calculated smoking-attributable mortality from the NMFS/NHIS data using age-specific relative risks, and they compared these with smoking-attributable mortality estimates from the Surgeon General's report that were calculated from age-adjusted relative risks. Sterling et al. (4) used relative risks that were less than 1.0 in their smoking-attributable mortality calculations from the NMFS/NHIS data, thereby estimating that cigarette smoking prevented 7,200 deaths from coronary heart disease and cerebrovascular disease among persons aged 65 years or older.
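The effect of substituting 1.0 for relative risks below 1.0 can be seen in a small sketch. The attributable fraction formula used here is an assumed Levin-type expression for a three-level exposure (the source gives its formulas in Appendix A), and the relative risks and prevalences are hypothetical values chosen only to show the sign change.

```python
def floor_at_null(rr):
    """Replace a relative risk below 1.0 with 1.0 (treated as no effect)."""
    return max(rr, 1.0)


def attributable_fraction(rr_current, rr_former, p_current, p_former, floor=False):
    """Assumed Levin-type AF; with floor=True, RRs below 1.0 are set to 1.0."""
    if floor:
        rr_current = floor_at_null(rr_current)
        rr_former = floor_at_null(rr_former)
    excess = p_current * (rr_current - 1.0) + p_former * (rr_former - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical older-age stratum: former smokers show an apparent
    # "protective" relative risk of 0.8.
    raw = attributable_fraction(1.3, 0.8, 0.2, 0.5)
    floored = attributable_fraction(1.3, 0.8, 0.2, 0.5, floor=True)
    deaths = 10000  # hypothetical number of deaths in the stratum
    print("AF without substitution:", raw, "-> deaths:", raw * deaths)
    print("AF with substitution:  ", floored, "-> deaths:", floored * deaths)
```

Without the substitution, the stratum yields a negative attributable fraction, which translates into deaths "prevented"; with it, former smokers in that stratum simply contribute no attributable deaths.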

Further adjustment of the attributable fractions and smoking-attributable mortality estimates from the CPS II for disease-appropriate confounders (education, alcohol intake, hypertension status, and diabetes status) indicated little residual confounding once age was taken into account. For example, for both genders combined, the age-adjusted estimate of smoking-attributable mortality was only 2.5 percent smaller than the fully adjusted estimate. This result contrasts with that of Sterling et al. (4), who found that the age-adjusted smoking-attributable mortality estimate (using the NMFS/NHIS data) for these same four diseases combined was 26 percent larger than the estimate adjusted for age, alcohol intake, and income.

We noted that adjusting the attributable fraction for covariates other than age was more important for women than for men and more important for coronary heart disease and cerebrovascular disease than for lung cancer and chronic obstructive pulmonary disease. Sterling et al. (4) also found that there is more residual confounding for coronary heart disease and cerebrovascular disease than for lung cancer and chronic obstructive pulmonary disease. Several methodological differences between our work and Sterling et al.'s (4) are likely to account for the disparate findings regarding adjustment for multiple confounders. First, we used a model-based approach to control for confounding, while Sterling et al. (4) used a stratified analysis. We believe the model-based approach produced more robust estimates in comparison with the stratified analysis, where the sample sizes for estimating relative risks may have been small for certain cells. Second, as discussed above, we substituted 1.0 in calculations when the estimated relative risks were less than 1.0. Much of the change that Sterling et al. (4) noted between the age-adjusted and the fully adjusted smoking-attributable mortalities was due to the apparent protective effect of smoking in older men for coronary heart disease and cerebrovascular disease (they calculated 7,200 deaths prevented in the age-adjusted model and 47,600 deaths prevented in the alcohol-, income-, and age-adjusted model). Third, we examined the issue of confounding within the CPS II, whereas Sterling et al. (4) used the NMFS/NHIS. Therefore, it is possible that the nature of the confounding of the relation between smoking and mortality varied by data set.

It is not surprising that the results from the NMFS/NHIS data and the CPS II data differ because of the differences in study populations and in data collection. The NMFS was based on a stratified sample of death certificates, and information was obtained from proxy respondents through mailed self-administered questionnaires. The CPS II was a longitudinal study of a cohort of friends and relatives of American Cancer Society volunteers which obtained self-reported data. Proxy respondents are more likely to misreport smoking history (26–28, 38). It has been debated whether it is inappropriate to use relative risks from the CPS II to estimate smoking-attributable mortality in the United States, because the CPS II was not a representative sample of the target population. However, Rothman has argued that the utility of the relative risks should be determined not by whether the study population is a representative subgroup of the target population but by understanding the biologic relation between smoking and disease (39).

Epidemiologists and public health leaders have used the attributable fraction and its corresponding number of attributable deaths to educate public health policy-makers on the potential benefits of intervention with regard to cigarette smoking and several other underlying causes of disease (40, 41). Policy-makers, in turn, use these estimates to help determine current priorities and strategies for disease control (41). Therefore, it is important to calculate the most valid and accurate smoking-attributable mortality estimates possible for the United States and the world. In our analysis, we assumed that the stratum-specific relative risks were generalizable to the US population, because we stratified the data according to disease-specific biologically relevant covariates. With such an assumption, we estimate that approximately 240,000 annual deaths from lung cancer, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease among US Whites can be attributed to smoking. In 1990–1994, the total annual average smoking-attributable mortality for the entire US population (covering all smoking-related cancers, cardiovascular diseases, and respiratory diseases, lung cancer deaths due to environmental tobacco smoke, and deaths due to smoking-related fires) was approximately 430,000 (3).

If current smoking patterns continue, an estimated 5 million US residents who were under age 18 years in 1995 will die prematurely from smoking-related illnesses (42). Unfortunately, the recent increase in adolescent smoking will probably have an undesirable effect on smoking-attributable mortality for this birth cohort (43). Annual medical costs attributable to cigarette smoking have been estimated at $53.3 billion (44), and annual indirect costs associated with morbidity and premature mortality from cigarette smoking have been estimated at $6.9 billion and $40.3 billion, respectively (45). The human and economic costs of smoking will continue to accumulate until effective public health efforts to prevent initiation, promote cessation, and protect nonsmokers from the adverse effects of environmental tobacco smoke are established at every level of society.


    APPENDIX A
Point and Interval Estimation for Relative Risks, Attributable Fractions, and Smoking-Attributable Mortality
This appendix outlines the methods we used to compare age-specific and age-adjusted relative risks, smoking-attributable fractions, and smoking-attributable mortality as calculated from the National Mortality Follow-back Survey (NMFS)/National Health Interview Survey (NHIS) data with those calculated from the Cancer Prevention Study II (CPS II) data. In this application, the data are stratified by gender and are limited to Whites. First, we examine age- and cause-specific relative risks, smoking-attributable fractions, and death rates; then the age- and cause-specific relative risks from the two data sources are directly age-adjusted to the 1986 US population. Age-adjusted attributable fractions for each cause are obtained as a weighted sum of the age-specific estimates (using the proportion of deaths in each age group as the weights).

Point and confidence interval estimation for relative risks calculated from the CPS II data
We estimate age-specific relative risks directly from the CPS II data, which are in the form of deaths and person-years. As usual, confidence intervals for the relative risks are obtained by finding confidence intervals for the (natural) log relative risks and exponentiating.
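The log-scale interval described above can be sketched as follows. This is a minimal illustration that assumes the deaths in each smoking category are Poisson counts, so that Var(ln RR) is approximately 1/d1 + 1/d0; the source does not state its variance formula, and the function name and the counts are hypothetical.

```python
import math

def rate_ratio_ci(d1, py1, d0, py0, z=1.96):
    """Rate ratio (exposed vs. unexposed) with a log-scale Wald interval.

    Assumes Poisson death counts, so Var(ln RR) ~= 1/d1 + 1/d0.
    """
    rr = (d1 / py1) / (d0 / py0)
    se = math.sqrt(1.0 / d1 + 1.0 / d0)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical deaths and person-years for one age group (not CPS II data).
rr, lower, upper = rate_ratio_ci(d1=120, py1=40_000, d0=30, py0=60_000)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate but symmetric in ratio terms (lower x upper = RR squared).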

Point and confidence interval estimation for relative risks calculated from the NMFS/NHIS data
Smoking prevalences within each age group, stratified by gender, are obtained using the 1987 NHIS data as $p_{ij}$ ($i = 1, 2, \ldots$, total number of age groups; $j = 0, 1, 2$ smoking categories). The corresponding smoking prevalences within each age group, stratified by gender and disease, among US decedents are $z_{ij}$ and are obtained from the 1986 NMFS. $p_{ij}$ and $z_{ij}$ represent age-specific probabilities, i.e., $\sum_j p_{ij} = 1$ and $\sum_j z_{ij} = 1$. We estimated these prevalences and their associated variances by using SUDAAN (version 6.34; Research Triangle Institute, Research Triangle Park, North Carolina), which accounted for the survey designs.

Let $D_i$ be the total number of US deaths for each disease/gender group in age group $i$ and $N_i$ the number in the US population of each gender in age group $i$. Then, the age-specific death rate for smoking category $j$ is $D_i z_{ij}/(N_i p_{ij})$, and the relative risk is a function of the smoking prevalence estimates $p_{ij}$ and $z_{ij}$; that is, $R_{ij} = (z_{ij}/p_{ij})/(z_{i0}/p_{i0}) = z_{ij}\,p_{i0}/(z_{i0}\,p_{ij})$.
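A minimal sketch of this calculation follows, assuming that deaths in smoking category j number D_i z_ij and that the corresponding population is N_i p_ij, with j = 0 (never smokers) as the reference category. The function names and all numbers are hypothetical.

```python
def death_rate(D_i, N_i, z_ij, p_ij):
    """Age-specific death rate in smoking category j: deaths over population."""
    return (D_i * z_ij) / (N_i * p_ij)

def survey_relative_risk(z_ij, z_i0, p_ij, p_i0):
    """Relative risk for smoking category j vs. the reference category j = 0.

    z_*: category shares among decedents (NMFS-style estimates),
    p_*: category shares in the living population (NHIS-style estimates).
    D_i and N_i cancel in the ratio of the two death rates.
    """
    return (z_ij / p_ij) / (z_i0 / p_i0)

# Hypothetical values for one age group.
D_i, N_i = 10_000, 5_000_000
z = {0: 0.20, 1: 0.50}   # never and current smokers among decedents
p = {0: 0.50, 1: 0.25}   # never and current smokers in the population
rr = survey_relative_risk(z[1], z[0], p[1], p[0])
```

The relative risk depends only on the four prevalence estimates, which is why two cross-sectional surveys suffice in place of a cohort.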

A confidence interval for $R_{ij}$ is obtained by finding a confidence interval for the log relative risk and exponentiating. The natural log of the relative risk is $\ln(R_{ij}) = \ln(z_{ij}) - \ln(z_{i0}) + \ln(p_{i0}) - \ln(p_{ij})$. Using a Taylor series approximation, the variance of the log relative risk is

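$$\mathrm{Var}(\ln(R_{ij})) \approx \frac{1}{d_i}\left(\frac{1}{z_{ij}} + \frac{1}{z_{i0}}\right) + \frac{1}{n_i}\left(\frac{1}{p_{ij}} + \frac{1}{p_{i0}}\right) \qquad \text{(A1)}$$
assuming that the age-specific prevalences ($z_{ij}$ and $p_{ij}$) follow a multinomial distribution within each age group. The $d_i$ and $n_i$ are the numbers of deceased and living persons, respectively, in age group $i$ from the NMFS and NHIS surveys, so $\sum_i d_i$ and $\sum_i n_i$ are the total numbers of respondents in each survey.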
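Under the multinomial assumption stated above, and treating the two surveys as independent, a first-order (delta-method) approximation gives Var(ln R_ij) of about (1/d_i)(1/z_ij + 1/z_i0) + (1/n_i)(1/p_ij + 1/p_i0). The sketch below assumes that form and ignores survey design effects, which the SUDAAN-based variances would account for; names and numbers are hypothetical.

```python
import math

def log_rr_variance(z_ij, z_i0, d_i, p_ij, p_i0, n_i):
    """Delta-method variance of ln(R_ij) under independent multinomial sampling."""
    return (1.0 / z_ij + 1.0 / z_i0) / d_i + (1.0 / p_ij + 1.0 / p_i0) / n_i

def rr_confidence_interval(rr, var_log_rr, z=1.96):
    """Exponentiate a Wald interval built on the log scale."""
    half = z * math.sqrt(var_log_rr)
    return math.exp(math.log(rr) - half), math.exp(math.log(rr) + half)

# Hypothetical inputs: 400 decedents and 2,000 living respondents in age group i.
var = log_rr_variance(z_ij=0.50, z_i0=0.20, d_i=400, p_ij=0.25, p_i0=0.50, n_i=2000)
lower, upper = rr_confidence_interval(5.0, var)
```

With these inputs, the decedent survey contributes most of the variance because it has the smaller sample in the age group.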

Directly age-adjusted relative risks
From both the NMFS data and the CPS II data, we obtain the age-adjusted log relative risks ($\ln(R_A)$) by taking weighted sums of the age-specific log relative risks. The weights are based on the proportion of the US population in each age group. The variance of $\ln(R_A)$ is $\mathrm{Var}(\ln(R_A)) = \sum_i w_i^2\,\mathrm{Var}(\ln(R_{ij}))$, where $w_i$ represents the proportion of the population in age group $i$ and $\mathrm{Var}(\ln(R_{ij}))$ is estimated as appropriate for either the CPS II or the NMFS (see above paragraphs). To obtain a confidence interval for $R_A$, we assume that $\ln(R_A)$ follows a normal distribution.
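The age adjustment can be sketched as a weighted sum on the log scale, with the variance taken as the sum of squared weights times the age-specific variances (which assumes the age-specific estimates are independent). The function name and the inputs are hypothetical.

```python
import math

def age_adjusted_rr(rrs, variances, weights, z=1.96):
    """Directly age-adjusted RR from age-specific RRs and Var(ln RR) values."""
    total = sum(weights)
    w = [x / total for x in weights]            # normalize weights to sum to 1
    log_ra = sum(wi * math.log(r) for wi, r in zip(w, rrs))
    var = sum(wi ** 2 * v for wi, v in zip(w, variances))
    half = z * math.sqrt(var)                   # normal approximation for ln(R_A)
    return math.exp(log_ra), math.exp(log_ra - half), math.exp(log_ra + half)

# Two hypothetical age groups with equal population weights.
ra, lower, upper = age_adjusted_rr(
    rrs=[4.0, 9.0], variances=[0.04, 0.09], weights=[0.5, 0.5])
```

Averaging on the log scale yields a weighted geometric mean of the age-specific relative risks, here the square root of 4 x 9.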

Age-specific attributable fractions
The age-specific attributable fraction (AFi) is

(A2)
where $p_{ij}$ represents the joint smoking prevalence in age group $i$ and smoking category $j$, so that $p_i$ is the proportion of the US population in age group $i$. The logit of $AF_i$ is $\mathrm{logit}(AF_i) = \ln[AF_i/(1 - AF_i)]$. The variance of $\mathrm{logit}(AF_i)$ is



where b = ni0 for CPS II and b = zi0 Di for the NMFS.
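For orientation, a commonly used multi-category (Levin-type) attributable fraction is AF = S / (1 + S), with S equal to the sum over categories of p_j (R_j - 1), where p_j are prevalences within one age group. The sketch below assumes this form, which may differ in notation or detail from the authors' equation A2; the function names and inputs are hypothetical.

```python
import math

def attributable_fraction(prevalences, relative_risks):
    """Multi-category attributable fraction (Levin-type formula).

    prevalences[j] and relative_risks[j] refer to smoking category j within
    one age group; category 0 is the reference, so relative_risks[0] == 1.
    """
    excess = sum(p * (r - 1.0) for p, r in zip(prevalences, relative_risks))
    return excess / (1.0 + excess)

def logit(x):
    """Log odds, the scale on which the variance of AF_i is expressed."""
    return math.log(x / (1.0 - x))

# Hypothetical never, former, and current smoker prevalences and relative risks.
af = attributable_fraction([0.50, 0.25, 0.25], [1.0, 2.0, 5.0])
```

Working on the logit scale keeps interval endpoints for the attributable fraction inside (0, 1) after back-transformation.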

Age-adjusted attributable fractions and smoking-attributable mortality
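We obtained cause-specific age-adjusted attributable fractions ($AF_A$) as a weighted sum of the age-specific estimates by using the proportion of deaths in each age group as the weights, such that $AF_A = \sum_i (D_i/D)\,AF_i$, with $D = \sum_i D_i$, and $\mathrm{Var}(AF_A) = \sum_i [(D_i/D)\,AF_i(1 - AF_i)]^2\,\mathrm{Var}(\mathrm{logit}(AF_i))$, using $\mathrm{Var}(\mathrm{logit}(AF_i))$ from the previous section. An approximate confidence interval for $AF_A$ is then $AF_A \pm 1.96\sqrt{\mathrm{Var}(AF_A)}$. Similarly, smoking-attributable mortality is $\mathrm{SAM} = \sum_i D_i\,AF_i$, with $\mathrm{Var}(\mathrm{SAM}) = \sum_i [D_i\,AF_i(1 - AF_i)]^2\,\mathrm{Var}(\mathrm{logit}(AF_i))$, and an approximate confidence interval for smoking-attributable mortality is $\mathrm{SAM} \pm 1.96\sqrt{\mathrm{Var}(\mathrm{SAM})}$.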
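A minimal sketch of these two summaries follows. It uses the variance expression for smoking-attributable mortality given in the text and assumes that the age-adjusted fraction equals SAM divided by total deaths, so its standard error is the SAM standard error divided by the same total. Names and inputs are hypothetical.

```python
import math

def sam_with_ci(deaths, afs, var_logit_afs, z=1.96):
    """Smoking-attributable mortality and age-adjusted AF with Wald intervals.

    Var(SAM) = sum_i [D_i AF_i (1 - AF_i)]^2 Var(logit(AF_i));
    AF_A = SAM / D, so SE(AF_A) = SE(SAM) / D.
    """
    total = sum(deaths)
    sam = sum(d * a for d, a in zip(deaths, afs))
    var_sam = sum((d * a * (1 - a)) ** 2 * v
                  for d, a, v in zip(deaths, afs, var_logit_afs))
    se_sam = math.sqrt(var_sam)
    af_a, se_af = sam / total, se_sam / total
    return {
        "SAM": (sam, sam - z * se_sam, sam + z * se_sam),
        "AF_A": (af_a, af_a - z * se_af, af_a + z * se_af),
    }

# Two hypothetical age groups.
out = sam_with_ci(deaths=[1000, 3000], afs=[0.5, 0.8], var_logit_afs=[0.01, 0.04])
```

The factor AF_i (1 - AF_i) is the derivative that converts a variance on the logit scale back to the probability scale.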


    APPENDIX B
Estimation of the Adjusted Smoking-Attributable Fraction
Bruzzi et al. (17) have described a method for estimating the adjusted attributable fraction when using case-control data. This appendix describes the adaptation of their method for application to the CPS II cohort data. We first introduce the notation to present the adapted method in general terms. We then describe a Poisson regression model for the number of deaths in the CPS II and close with a derivation of a formula for the adjusted attributable fraction in terms of parameters in the Poisson regression model.

Notation
The derivation of a formula for the adjusted attributable fraction from cohort data requires the following notation. Let E denote the exposure factor with levels e = 0, 1, ..., LE, and let C denote the (cross-classification of) confounding factor(s) with levels c = 0, 1, ..., LC. That is, c indexes the combination of levels of the confounding factors. Let j = (e,c) denote the index for the stratum in the cross-classification of exposure by confounding factors. If C is a cross-classification of several confounding factors, then the index c is multidimensional. More concretely, let there be F factors, Cf, f = 1, ..., F, with factor 1 the exposure (that is, C1 = E) and factors Cf, f = 2, ..., F, the potential confounders, and let Lf be the number of levels of factor Cf. Then index j is an F-tuple (j1, ..., jF), and the set of indices J is the Cartesian product L1 × ... × LF.

The data from the CPS II cohort study consist of counts of deaths $d_j$ and person-years $n_j$ in stratum $j$, $j \in J$. Usually, these data are modeled conditionally on the $n_j$, such that the $d_j$ are independently and Poisson distributed, $d_j \sim \mathrm{Poisson}(\lambda_j n_j)$, $j \in J$. With these assumptions, the expectation of $d_j$ is $E(d_j \mid n_j) = \lambda_j n_j$, which clarifies the interpretation of $\lambda_j$ as a rate.

Poisson regression model
The search for a parsimonious description of the CPS II data may be attempted with the fit of a log-linear model for the Poisson rates $\lambda_j$; that is, $\log E(d_j \mid \lambda_j, n_j) = \log n_j + \log \lambda_j$, and

(B1)
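For orientation, a generic main-effects log-linear form consistent with the notation above is sketched here. It is an illustrative assumption, not necessarily the authors' exact specification of equation B1, which may include interaction terms; the symbols $\mu$ and $\beta^{(f)}$ are introduced only for this illustration.

```latex
\log \lambda_j = \mu + \sum_{f=1}^{F} \beta^{(f)}_{j_f},
\qquad \beta^{(f)}_{0} = 0 \quad \text{for each } f = 1, \ldots, F .
```

Under this main-effects form, the rate ratio comparing exposure level $e$ with the baseline level at any fixed level of the confounders is $\exp(\beta^{(1)}_{e})$; with exposure-by-confounder interactions, the ratio would also depend on the confounder level.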

The adjusted attributable fraction
The adjusted attributable fraction may be expressed in terms of rate ratios, which in turn depend on parameters in a Poisson regression model. To derive an expression for the adjusted rate ratios, one compares the death rate of individuals in an elevated risk stratum (e.g., e) and level c of the confounding factors with the death rate of individuals at the baseline level of the risk factor (e = 0), keeping level c fixed. To conduct this comparison, with each j = (e,c) associate j* = (0,c), which has the same level of the confounder but shifts the exposure to the baseline level.

To follow steps similar to the original derivation of Bruzzi et al. (17), let Nj be the population number of person-years in stratum j and let Dj be the population number of deaths in that stratum. Now define the following:

The attributable fraction $AF_C = (\lambda_T N - \lambda_C N)/(\lambda_T N) = 1 - \lambda_C/\lambda_T$ may be expressed in terms of the adjusted relative risks $R_j$; that is, $AF_C = 1 - \sum_j \rho_j/R_j$, where $\rho_j = D_j/D$. The last formula is used for the estimation of $AF_C$. The rate ratios $R_j$ may be obtained directly from the Poisson regression via $\hat{R}_j = \hat{\lambda}_j/\hat{\lambda}_{j^*}$, or alternatively they can be computed from the parameters in the log-linear parameterization shown in equation B1.
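The Bruzzi-type estimator can be sketched in a few lines, assuming the form AF_C = 1 - sum over strata of rho_j / R_j with rho_j = D_j / D. The stratum keys, death counts, and rate ratios below are hypothetical.

```python
def adjusted_attributable_fraction(deaths, rate_ratios):
    """AF_C = 1 - sum_j rho_j / R_j, with rho_j = D_j / D.

    deaths[j]: deaths in stratum j = (exposure level, confounder level);
    rate_ratios[j]: adjusted rate ratio of stratum j vs. its baseline (0, c).
    """
    total = sum(deaths.values())
    return 1.0 - sum(d / total / rate_ratios[j] for j, d in deaths.items())

# Hypothetical strata keyed by (exposure level, confounder level).
deaths = {(0, 0): 100, (1, 0): 300, (0, 1): 200, (1, 1): 400}
rate_ratios = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 1.0, (1, 1): 2.0}
af_c = adjusted_attributable_fraction(deaths, rate_ratios)
```

Each term rho_j / R_j is the share of all deaths that stratum j would still contribute if its rate were lowered to the baseline rate, so unexposed strata (R_j = 1) contribute their full share.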


    APPENDIX C
Confidence Intervals for the Adjusted Attributable Fractions
The stratification of the CPS II data by exposure and by levels of potential confounders yields tables in which some strata have low counts. Furthermore, the decision to include all main effects and certain interactions in the models limits the reduction of the number of parameters in the models. The combination of sparse data and models with many parameters warrants concern that the Wald standard errors reported in the SAS statistical package (versions 6.08, 6.10, and 6.11; SAS Institute, Inc., Cary, North Carolina) may be inadequate. To avoid the use of Wald standard errors to derive the standard error for the attributable fraction, we used the bootstrap to approximate the distribution of $\widehat{AF}_C$, the adjusted estimate of $AF_C$.

A broad description of this application of the bootstrap follows. One obtains a large number of independent bootstrap estimates of $AF_C$, that is, $\widehat{AF}_C(b)$, $b = 1, \ldots, B$, by independently generating $B$ samples with replacement from the original sample.

There are two sources of data used in this analysis, the CPS II and the NMFS. Therefore, to generate a bootstrap estimate $\widehat{AF}_C(b)$, we combined bootstrap estimates of the rate ratios $\hat{R}_j(b)$, $j \in J$, computed using the CPS II data, with bootstrap estimates of the prevalences $\hat{\rho}_j(b)$, $j \in J$, computed using the NMFS data, by means of $\widehat{AF}_C(b) = 1 - \sum_j (\hat{\rho}_j(b)/\hat{R}_j(b))$, $b = 1, \ldots, B$.
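The combination step can be sketched as follows, assuming each replicate supplies one set of prevalences and one set of rate ratios keyed by stratum. The percentile interval shown is an assumed way of summarizing the replicates, since the text does not say which bootstrap interval was used; all names and values are hypothetical.

```python
def bootstrap_af(rho_draws, rr_draws):
    """One AF_C(b) per replicate: 1 - sum_j rho_j(b) / R_j(b)."""
    return [1.0 - sum(rho[j] / rr[j] for j in rho)
            for rho, rr in zip(rho_draws, rr_draws)]

def percentile_interval(values, alpha=0.05):
    """Simple percentile interval (an assumed choice of bootstrap interval)."""
    s = sorted(values)
    lo_idx = int((alpha / 2) * (len(s) - 1))
    hi_idx = int(round((1 - alpha / 2) * (len(s) - 1)))
    return s[lo_idx], s[hi_idx]

# Two hypothetical replicates; a real analysis would use many more.
rho_draws = [{"never": 0.4, "current": 0.6}, {"never": 0.5, "current": 0.5}]
rr_draws = [{"never": 1.0, "current": 3.0}, {"never": 1.0, "current": 2.0}]
afs = bootstrap_af(rho_draws, rr_draws)
interval = percentile_interval(afs)
```

Pairing the b-th prevalence draw with the b-th rate-ratio draw treats the two data sources as independent, which matches their separate resampling.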

Below, we first describe the generation of $\hat{R}_j(b)$ and then the generation of $\hat{\rho}_j(b)$. The generation of $\hat{R}_j(b)$ requires resampling from the CPS II, whereas the generation of $\hat{\rho}_j(b)$ requires resampling from the NMFS.

Bootstrap estimates $\hat{R}_j(b)$
The method of generating bootstrap estimates $\hat{R}_j(b)$ for each disease/gender combination has four steps: 1) restructuring the data; 2) drawing the sample; 3) stratifying the sample; and 4) calculating the rate ratio.

In step 1, the CPS II data are restructured because the data are stratified counts and individual-specific covariate values are unavailable. We compiled the CPS II data into separate tables for each disease/gender combination and further stratified each one of these tables by factors of interest (smoking status, age, education, etc.). Let $j \in J$ index the stratum in a given disease × gender table and let $J$ denote the index set for that table. Then the number of deaths ($d_j$) and the person-years ($n_j$) are calculated for every $j \in J$, so that the tables contain counts of deaths and person-years stratified by factor levels. From these tables, we built a list of $d = \sum_j d_j$ records (one record for each death) with individual-specific factor levels. These expanded data sets were amenable to bootstrap resampling.

For a given disease/gender combination, in the bth bootstrap iteration d records were randomly and independently selected with replacement from an expanded table (step 2). We used the selected records to build a bootstrap stratification table having the same factors as the original stratification, namely with index set J, where the data associated with stratum j are the number of selected records (deaths) with that combination of factors. We then combined the deaths in stratum j of the bootstrap table with the person-years in that stratum in the original CPS II stratification table (step 3).

The final step was to fit a Poisson regression model to the bootstrap table defined by disease and gender. We used the estimated coefficients from the model to develop bootstrap estimates of $\hat{R}_j(b)$, $j \in J$ (step 4).
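The four steps above can be sketched as follows. Step 4 in the source fits a Poisson regression; to keep the sketch free of external dependencies, it substitutes a crude comparison of stratum rates (equivalent to a saturated model), which is a stand-in and not the authors' model. The stratum keys, counts, and function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_death_table(deaths, rng):
    """Resample d = sum(d_j) death records with replacement and re-tabulate."""
    records = [j for j, d in deaths.items() for _ in range(d)]  # step 1: expand
    sample = rng.choices(records, k=len(records))               # step 2: draw
    counts = Counter(sample)                                    # step 3: stratify
    return {j: counts.get(j, 0) for j in deaths}

def stratum_rate_ratios(boot_deaths, person_years):
    """Step 4 stand-in: crude rate ratio of j = (e, c) vs. j* = (0, c).

    Bootstrap deaths are combined with the ORIGINAL person-years.
    """
    ratios = {}
    for (e, c), d in boot_deaths.items():
        base = boot_deaths[(0, c)] / person_years[(0, c)]
        rate = d / person_years[(e, c)]
        ratios[(e, c)] = rate / base if base > 0 else float("nan")
    return ratios

# Hypothetical table keyed by (exposure level, confounder level).
deaths = {(0, 0): 40, (1, 0): 90, (0, 1): 60, (1, 1): 150}
person_years = {(0, 0): 50_000, (1, 0): 30_000, (0, 1): 40_000, (1, 1): 25_000}
boot = bootstrap_death_table(deaths, random.Random(0))
ratios = stratum_rate_ratios(boot, person_years)
```

Only the deaths are resampled; the person-years are held at their original values, consistent with modeling the deaths conditionally on person-years.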

Bootstrap estimates $\hat{\rho}_j(b)$
For a given disease/gender combination, the method of generating separate bootstrap estimates of $\hat{\rho}_j(b)$, $b = 1, \ldots, B$, has five steps: 1) generating the sample; 2) developing weights to adjust for nonresponse; 3) developing poststratification weights; 4) applying the weights and stratifying the sample; and 5) calculating $\hat{\rho}_j(b)$.

Unlike the CPS II, the NMFS data set contains individual-specific data records, and therefore bootstrap sampling was immediate. Thus, for each bootstrap iteration, the first step was to take, with replacement, a separate sample of records from each of the original 18 NMFS sampling strata (step 1). Each of these samples was of the same size as the corresponding original NMFS sampling stratum. For example, the first stratum of the original NMFS, which was defined to include deaths among American Indians, Eskimos, and Aleuts regardless of the cause, contained 540 records. Therefore, the bootstrap samples also had 540 records in the first stratum. It is important to note that deaths in three of the 18 original sampling strata were selected with certainty for inclusion in the NMFS. These records were also selected with certainty for each of the bootstrap samples.
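Step 1 can be sketched as a within-stratum resample that preserves each stratum's size and carries certainty strata over unchanged. The stratum labels, record values, and function name are hypothetical; the real design has 18 strata.

```python
import random

def stratified_bootstrap(records_by_stratum, certainty_strata, rng):
    """Resample with replacement within each sampling stratum.

    Strata listed in certainty_strata are copied unchanged, mirroring
    records that were selected with certainty in the original design.
    """
    sample = {}
    for stratum, records in records_by_stratum.items():
        if stratum in certainty_strata:
            sample[stratum] = list(records)
        else:
            sample[stratum] = rng.choices(records, k=len(records))
    return sample

# Hypothetical miniature design: three strata, stratum "C" taken with certainty.
records = {"A": [1, 2, 3, 4], "B": [5, 6, 7], "C": [8, 9]}
boot = stratified_bootstrap(records, certainty_strata={"C"}, rng=random.Random(1))
```

Resampling separately within strata keeps the bootstrap replicates faithful to the stratified design rather than treating the file as one simple random sample.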

We weighted each record in the NMFS so that a nationally representative estimate of $\hat{\rho}_j(b)$ could be obtained. Therefore, the next step was to develop weights for the bootstrap sample drawn in the first step. Three weighting factors were required: sampling weights, nonresponse weights, and poststratification weights. The sampling weights, the inverse of the probabilities of selection, are fixed by design and therefore were held constant in the bootstrap. In contrast, the values for the weighting factors used to adjust for nonresponse and poststratification depended on the distribution of deaths among the factor levels. Therefore, these weighting factors differed from bootstrap sample to bootstrap sample.

The adjustment for nonresponse in the original NMFS was calculated for subsets, defined by age intervals, within each of the 18 sampling strata. We conducted this adjustment for each bootstrap sample (step 2). That is, we stratified each bootstrap sample into the same subsets as the original NMFS sample and calculated the response rate within each of these subsets. The reciprocals of these response rates were used for the nonresponse adjustment. With this approach, all deaths in a particular subset have the same nonresponse adjustment.

The next step was to develop poststratification weights (step 3). We used the original NMFS poststrata, which were defined by race, sex, and age. We calculated poststratification weights for each stratum in a bootstrap sample by dividing an estimate of the number of US deaths for that poststratum (estimated from the original NMFS sample) by the weighted sum of the deaths in the poststratum for that sample.

After calculation and application of the final weight for each death (the product of three weights: the inverse of the probability of selection, the reciprocal of the response rate, and the poststratification weight), we created eight stratification tables defined by disease and gender from each bootstrap sample (step 4). We further stratified each one of these by other factors of interest (smoking status, age, education, etc.). The final step was to calculate $\hat{\rho}_j(b)$ separately for each one of the eight tables (step 5).
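The weighting steps can be sketched as follows, assuming unweighted response rates within nonresponse subsets and a single pass of poststratification. The record fields (`base_w`, `nr_cell`, `post`, `responded`, `smoke`), the function names, and the numbers are all hypothetical, and the weighted shares shown are a simplified stand-in for the stratum prevalences.

```python
from collections import defaultdict

def final_weights(sample, poststratum_totals):
    """Attach sampling x nonresponse x poststratification weights to respondents."""
    n_all, n_resp = defaultdict(int), defaultdict(int)
    for r in sample:
        n_all[r["nr_cell"]] += 1
        n_resp[r["nr_cell"]] += r["responded"]
    respondents = [dict(r) for r in sample if r["responded"]]
    for r in respondents:                  # step 2: reciprocal of response rate
        r["w"] = r["base_w"] * n_all[r["nr_cell"]] / n_resp[r["nr_cell"]]
    weighted = defaultdict(float)
    for r in respondents:
        weighted[r["post"]] += r["w"]
    for r in respondents:                  # step 3: poststratification factor
        r["w"] *= poststratum_totals[r["post"]] / weighted[r["post"]]
    return respondents

def weighted_shares(respondents, key):
    """Steps 4-5 stand-in: weighted share of deaths in each level of `key`."""
    totals = defaultdict(float)
    for r in respondents:
        totals[r[key]] += r["w"]
    grand = sum(totals.values())
    return {k: v / grand for k, v in totals.items()}

sample = [
    {"base_w": 10, "nr_cell": "a", "post": "p1", "responded": True, "smoke": 1},
    {"base_w": 10, "nr_cell": "a", "post": "p1", "responded": False, "smoke": 0},
    {"base_w": 20, "nr_cell": "b", "post": "p1", "responded": True, "smoke": 0},
    {"base_w": 20, "nr_cell": "b", "post": "p2", "responded": True, "smoke": 1},
]
resp = final_weights(sample, poststratum_totals={"p1": 100.0, "p2": 50.0})
shares = weighted_shares(resp, "smoke")
```

Because the nonresponse and poststratification factors are recomputed from each bootstrap sample, they vary across replicates, while the base sampling weight stays fixed.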


    ACKNOWLEDGMENTS
 
The authors gratefully acknowledge Dr. Sander Greenland for serving as a consultant to this project and Cathy Day-Lally for her help and expertise with the CPS II data.


    NOTES
 
Reprint requests to Dr. Ann M. Malarcher, Office on Smoking and Health, Centers for Disease Control and Prevention, 4770 Buford Highway NE, Mailstop K-50, Atlanta, GA 30341 (e-mail: aym8@cdc.gov).


    REFERENCES

  1. Centers for Disease Control. Reducing the health consequences of smoking: 25 years of progress. A report of the Surgeon General. Rockville, MD: US Department of Health and Human Services, 1989. (DHHS publication no. (CDC) 89–8411).
  2. Cigarette smoking-attributable mortality and years of potential life lost - United States, 1990. MMWR Morb Mortal Wkly Rep 1993;42:645–9.
  3. Smoking-attributable mortality and years of potential life lost - United States, 1984. (With editorial note - 1997). MMWR Morb Mortal Wkly Rep 1997;46:444–51.
  4. Sterling TD, Rosenbaum WL, Weinkam JJ. Risk attribution and tobacco-related deaths. Am J Epidemiol 1993;138:128–39.
  5. Weinkam JJ, Rosenbaum WL, Sterling TD. Computation of relative risk based on simultaneous surveys: an alternative to cohort and case-control studies. Am J Epidemiol 1992;136:722–9.
  6. Lee PN. Mortality from tobacco in developed countries: are indirect estimates reliable? Regul Toxicol Pharmacol 1996;24:60–8.
  7. Benichou J. Methods for adjustment for estimating the attributable risk in case-control studies: a review. Stat Med 1991;10:1753–73.
  8. Benichou J, Gail MH. Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics 1986;46:991–1003.
  9. Gefeller O, Eide GE. Methods of adjustment for estimating the attributable risk in case-control studies: a review. Stat Med 1993;12:91–6.
  10. Gefeller O. Comparison of adjusted attributable risk estimators. Stat Med 1992;11:2083–91.
  11. Greenland S, Drescher K. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 1993;49:865–72.
  12. Greenland S. Variance estimators for attributable fraction estimates consistent in both large strata and sparse data. Stat Med 1987;6:701–8.
  13. Kuritz SJ, Landis JR. Summary attributable risk estimation from unmatched case-control data. Stat Med 1988;7:507–17.
  14. Leung HM, Kupper LL. Comparison of confidence intervals for attributable risk. Biometrics 1981;37:293–302.
  15. Whittemore AS. Statistical methods for estimating attributable risk from retrospective data. Stat Med 1982;1:229–43.
  16. Whittemore AS. Estimating attributable risk from case-control studies. Am J Epidemiol 1982;117:76–85.
  17. Bruzzi P, Green SB, Byar DP, et al. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol 1985;122:904–14.
  18. National Center for Health Statistics. Compressed mortality file, 1979–1992. (Public use data file). Hyattsville, MD: National Center for Health Statistics, 1999.
  19. Thun MJ, Day-Lally C, Myers DG, et al. Trends in tobacco smoking and mortality from cigarette use in Cancer Prevention Studies I (1959–1965) and II (1982–1988). In: Shopland DR, ed. Changes in cigarette-related disease risks and their implication for prevention and control. (Smoking and tobacco control monograph no. 8). Bethesda, MD: National Cancer Institute, 1997.
  20. World Health Organization. International classification of diseases. The international statistical classification of diseases, injuries, and causes of death. Ninth Revision. Geneva, Switzerland: World Health Organization, 1977.
  21. Seeman I, Poe GS, Powell-Griner E. Development, methods and response characteristics of the 1986 National Mortality Follow-back Survey. Series 1. Programs and collection procedures, no. 29. Washington, DC: US Department of Health and Human Services, 1993. (DHHS publication no. (PHS) 93–1305).
  22. Massey J, Moore TF, Parsons VL, et al. Design and estimation for the National Health Interview Study, 1985–94. Hyattsville, MD: National Center for Health Statistics, 1989. (Vital and health statistics, series 2, no. 110).
  23. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. Belmont, CA: Lifetime Learning Publications, 1982.
  24. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York, NY: Chapman and Hall, 1993.
  25. Peto R, Lopez AD, Boreham J, et al. Mortality from tobacco in developed countries: indirect estimation from national vital statistics. Lancet 1992;339:1268–78.
  26. Boyle CA, Brann EA. Proxy respondents and the validity of occupational and other exposure data. The Selected Cancers Cooperative Study Group. Am J Epidemiol 1992;136:712–21.
  27. Herrmann N. Retrospective information from questionnaires. I. Comparability of primary respondents and their next-of-kin. Am J Epidemiol 1985;121:937–47.
  28. Pickle LW, Brown LM, Blot WJ. Information available from surrogate respondents in case-control interview studies. Am J Epidemiol 1983;118:99–108.
  29. Centers for Disease Control. The health benefits of smoking cessation. A report of the Surgeon General. Rockville, MD: US Department of Health and Human Services, 1990. (DHHS publication no. (CDC) 90-8416).
  30. Centers for Disease Control. The health consequences of smoking: cancer. A report of the Surgeon General. Rockville, MD: US Department of Health and Human Services, 1982. (DHHS publication no. (PHS) 82-50179).
  31. Waldron I. Patterns and causes of gender differences in smoking. Soc Sci Med 1991;32:989–1005.
  32. Risch HA, Howe GR, Jain M, et al. Are female smokers at higher risk for lung cancer than male smokers? A case-control analysis by histologic type. Am J Epidemiol 1993;138:281–93.
  33. Zang EA, Wynder EL. Differences in lung cancer risk between men and women: examination of the evidence. J Natl Cancer Inst 1996;88:183–92.
  34. Brownson RC, Chang JC, Davis JR. Gender and histologic type variations in smoking-related risk of lung cancer. Epidemiology 1992;3:61–4.
  35. Osann KE, Anton-Culver H, Kurosaki T, et al. Sex differences in lung-cancer risk associated with cigarette smoking. Int J Cancer 1993;54:44–8.
  36. Wilcox AJ. Re: "Are female smokers at higher risk for lung cancer than male smokers? A case-control analysis by histologic type." (Letter). Am J Epidemiol 1994;140:186.
  37. Hoover DR. Re: "Are female smokers at higher risk for lung cancer than male smokers? A case-control analysis by histologic type." (Letter). Am J Epidemiol 1994;140:186–7.
  38. Patrick DL, Cheadle A, Thompson DC, et al. The validity of self-reported smoking: a review and meta-analysis. Am J Public Health 1994;84:1086–93.
  39. Rothman KJ. Modern epidemiology. Boston, MA: Little, Brown and Company, 1986.
  40. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207–12.
  41. McKenna MT, Taylor WR, Marks JS, et al. Current issues and challenges in chronic disease control. In: Brownson RC, Remington PL, Davis JR, eds. Chronic disease epidemiology and control. 2nd ed. Washington, DC: American Public Health Association, 1998.
  42. Projected smoking-related deaths among youth - United States. MMWR Morb Mortal Wkly Rep 1996;45:971–4.
  43. Johnston LD, O'Malley PM, Bachman JG. National survey results on drug use from the Monitoring the Future Study, 1975–1998. Vol 1. Secondary school students. Rockville, MD: National Institute on Drug Abuse, 1999. (NIH publication no. 99-4660).
  44. Miller VS, Ernst C, Collin F. Smoking-attributable medical care costs in the USA. Soc Sci Med 1999;48:375–91.
  45. Herdman R, Hewitt M, Lashover M. Smoking-related deaths and financial costs: Office of Technology Assessment estimates for 1990. (OTA testimony before the Senate Special Committee on Aging, May 6, 1993). Washington, DC: Office on Smoking and Health, Centers for Disease Control and Prevention, 1993. (http://www.cdc.gov/tobacco).
Received for publication March 29, 1999. Accepted for publication October 11, 1999.