1 Office on Smoking and Health, Centers for Disease Control and Prevention, Atlanta, GA.
2 Battelle Memorial Institute, Centers for Public Health Research and Evaluation, Baltimore, MD.
3 American Cancer Society, Atlanta, GA.
ABSTRACT |
---|
cardiovascular diseases; cerebrovascular disorders; confounding factors (epidemiology); lung diseases, obstructive; lung neoplasms; mortality; smoking
Abbreviations: CPS II, Cancer Prevention Study II; ICD-9, International Classification of Diseases, Ninth Revision; NHIS, National Health Interview Survey; NMFS, National Mortality Followback Survey.
INTRODUCTION |
---|
Both of these criticisms were discussed by Sterling et al. (4), who compared age-adjusted relative risks from the 1989 Surgeon General's report with age-adjusted relative risks estimated using a combination of data from the NHIS and the 1986 National Mortality Followback Survey (NMFS). The latter survey used a representative sample of all decedents aged 25 years or older in the United States. In general, the relative risks estimated by Sterling et al. were smaller than the corresponding estimates from the Surgeon General's report, and as a result, the attributable fractions and smoking-attributable mortality estimates reported by Sterling et al. were smaller than the Surgeon General's estimates. Sterling et al. also presented disease-specific estimates of attributable fractions and smoking-attributable mortality adjusted for age; for age and income; for age and alcohol; and for age, alcohol, and income simultaneously (4). Their estimates of smoking-attributable mortality adjusted for multiple confounders were considerably smaller than their estimates adjusted only for age.
We further explored the two methodological issues raised by Sterling et al. First, we quantified the effect of using different data sources for estimating age-adjusted relative risks, attributable fractions, and smoking-attributable mortality. In particular, we compared age-specific and age-adjusted relative risks estimated using data from the CPS II with those estimated using a combination of data from the NMFS and the NHIS. In contrast to Sterling et al., we directly standardized all age-specific relative risks to the same reference population. Second, we assessed the potential residual confounding of the disease-specific age-adjusted attributable fractions and smoking-attributable mortality estimates. Many researchers have been interested in estimating adjusted attributable fractions by using either case-control data or cohort data (7–16). In our application, we adapted the method of Bruzzi et al. (17), which was developed for case-control data, to estimate the adjusted attributable fraction when combining information from one cross-sectional national survey and one cohort study.
MATERIALS AND METHODS |
---|
The American Cancer Society's CPS II is a cohort study of approximately 1.2 million participants that began in 1982 (19). American Cancer Society volunteers identified potential participants (persons with whom they would be able to maintain contact over the course of the study, such as neighbors and friends). CPS II participants reside in all 50 states, the District of Columbia, Puerto Rico, and Guam. Smoking status and age were determined at baseline from a self-administered questionnaire on medical history, current health status, and lifestyle factors. The volunteers reported vital status for participants every year. For participants who died during follow-up, the underlying cause of death (as indicated by the International Classification of Diseases, Ninth Revision (ICD-9) (20)) was obtained from the death certificate. After 6 years of follow-up, death certificates had been obtained for 94.1 percent of subjects known to have died.
The 1986 NMFS used a stratified sample of death certificates in the Current Mortality Sample (21), which was a 10 percent sample of death certificates of persons aged 25 years or older. To ensure sufficient sample sizes for subpopulations, certain age and racial/ethnic groups and causes of death were oversampled or selected with certainty. In all, 18,733 death certificates were sampled, of which 2,274 were sampled with certainty. Proxy information on decedents' health behaviors was collected from informants (usually next of kin) and from facility abstract questionnaires (i.e., nursing homes). Information was obtained using a combination of self-administered questionnaires and telephone or field interviews. The response rate for the 1986 NMFS was 88.6 percent.
The NHIS is a national, multistage cluster survey that has been conducted continuously since 1957 (22). The target population is the civilian, noninstitutionalized population of the United States aged 18 years or older. Questions about tobacco smoking frequently appear on supplements to the NHIS. The 1987 NHIS supplement included tobacco questions as part of a special section on cancer control. Although proxy responses to the core NHIS are permitted, supplemental interviews must be completed by the sampled respondents. A total of 47,240 interviews were completed in 1987.
Comparison of CPS II and NMFS/NHIS estimates
We restricted our comparison of age-adjusted estimates obtained from CPS II data and the NMFS/NHIS data to the four most common smoking-related diseases: lung cancer (ICD-9 code 162), chronic obstructive pulmonary disease (ICD-9 codes 490–492 and 496), coronary heart disease (ICD-9 codes 410–414), and cerebrovascular disease (ICD-9 codes 430–438). We used four age groups (35–59, 60–69, 70–79, and ≥80 years) when we analyzed lung cancer, coronary heart disease, and cerebrovascular disease; we used only two age groups (50–69 and ≥70 years) when we analyzed chronic obstructive pulmonary disease, because there are so few deaths below age 50 years for this cause of death. All estimates were calculated separately by gender. Because 93 percent of the CPS II respondents are White, we limited our study to Whites. Subjects were classified as current smokers (persons who reported that they smoked now), former smokers (persons who reported that they had ever smoked but did not smoke now), or never smokers (persons who reported that they had never smoked).
We used standard methods to estimate age-, gender-, and cause-specific relative risks and attributable fractions, age-adjusted relative risks, and age-adjusted attributable fractions and smoking-attributable mortality (see Appendix A). Briefly, age-specific relative risks for current or former smokers compared with never smokers for each disease/gender group were directly age-adjusted by taking a weighted sum of the age-specific relative risks (using the proportion of the 1986 US population in each age group as the weights). Age-specific attributable fractions for each disease/gender group were estimated by combining age-specific relative risks and smoking prevalence estimates for never, former, and current smokers. Age-adjusted attributable fractions were calculated as a weighted sum of the age-specific attributable fractions (using the proportion of deaths in the United States in 1986 in each age group as the weights). For each disease/gender group, we estimated age-adjusted smoking-attributable mortality by multiplying the age-specific attributable fractions by the number of deaths within the age group and then summing across age groups.
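The point estimates described above can be illustrated with a minimal Python sketch. It assumes the standard multi-category (Levin-type) attributable fraction, which may differ in form from equation A2 in Appendix A; the function names and all input numbers are hypothetical. Relative risks below 1.0 are set to 1.0, mirroring the substitution described in the Discussion.

```python
def attributable_fraction(p_former, p_current, rr_former, rr_current):
    """Levin-type attributable fraction for a three-level exposure (assumed form)."""
    # Relative risks below 1.0 are replaced by 1.0 (no deaths caused or prevented).
    rr_former = max(rr_former, 1.0)
    rr_current = max(rr_current, 1.0)
    excess = p_former * (rr_former - 1.0) + p_current * (rr_current - 1.0)
    return excess / (1.0 + excess)


def smoking_attributable_mortality(deaths, fractions):
    """Sum over age groups of deaths times the age-specific attributable fraction."""
    return sum(d * af for d, af in zip(deaths, fractions))


def age_adjusted_fraction(deaths, fractions):
    """Weighted sum of age-specific fractions; weights are proportions of deaths."""
    total = sum(deaths)
    return sum((d / total) * af for d, af in zip(deaths, fractions))


# Hypothetical inputs for two age groups (not taken from the study).
age_groups = [
    {"deaths": 1000, "p_former": 0.30, "p_current": 0.30,
     "rr_former": 2.0, "rr_current": 5.0},
    {"deaths": 3000, "p_former": 0.40, "p_current": 0.15,
     "rr_former": 0.9, "rr_current": 3.0},
]
deaths = [g["deaths"] for g in age_groups]
fractions = [
    attributable_fraction(g["p_former"], g["p_current"],
                          g["rr_former"], g["rr_current"])
    for g in age_groups
]
sam = smoking_attributable_mortality(deaths, fractions)
af_adjusted = age_adjusted_fraction(deaths, fractions)
```

Because the weights are the proportions of deaths in each age group, the age-adjusted attributable fraction multiplied by the total number of deaths equals the smoking-attributable mortality in this sketch.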
Comparison of age-adjusted and fully adjusted estimates
To examine the potential for residual confounding of the age-adjusted estimates, we compared age-adjusted estimates and estimates adjusted for multiple confounding factors (age, education, alcohol use, hypertension status, and diabetes status). As before, we restricted the analysis to the four most common smoking-related diseases: lung cancer, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease. The results are presented separately by gender, and the analysis was limited to Whites. Subjects were classified as current, former, or never smokers.
We varied the set of potential confounders by disease. Age (<59, 60–64, 65–69, 70–75, 76–79, 80–84, and ≥85 years) and education (<12 years, 12 years, and >12 years) were included for all diseases. Alcohol use (nondrinking, and <1, 1, and ≥2 drinks per day) was also included for chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease. History of hypertension (yes/no) and diabetes mellitus (yes/no) was also included for coronary heart disease and cerebrovascular disease.
The method used for adjusting the attributable fraction was an adaptation of that used by Bruzzi et al. (17). Details on the method are presented in Appendix B. Briefly, the adjusted attributable fraction (AF) is given by the formula
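$$AF = 1 - \sum_{j} \frac{\rho_j}{R_j},$$

where $\rho_j$ is the proportion of all deaths that occur in stratum $j$ of the cross-classification of smoking status by confounders and $R_j$ is the adjusted rate ratio for stratum $j$ relative to never smokers at the same confounder levels (see Appendix B).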
We calculated the confidence intervals for the adjusted attributable fraction and smoking-attributable mortality estimates using bootstrap methods (24), because traditional methods for estimating standard errors are based on asymptotic results, the validity of which may not hold with sparse data and with models containing many parameters. To generate bootstrap estimates for the adjusted attributable fraction, we combined bootstrap estimates of the relative risks computed from the CPS II data with bootstrap estimates of the proportions computed from the NMFS data. Details on the bootstrap calculation are presented in Appendix C.
RESULTS |
---|
For both genders, age-adjusted relative risks for lung cancer were larger when estimated using the NMFS/NHIS data than when estimated using the CPS II data, and the age-adjusted relative risks for cerebrovascular disease were slightly higher in the CPS II than in the NMFS/NHIS (table 1 and table 2). The age-adjusted relative risks for chronic obstructive pulmonary disease and coronary heart disease based on the CPS II were larger than those based on the NMFS/NHIS for men but not for women.
The total age-adjusted smoking-attributable mortality for the four diseases combined was 36 percent higher for men and 8 percent lower for women when it was estimated using CPS II data rather than NMFS/NHIS data (table 3). For men, age-adjusted attributable fraction and smoking-attributable mortality estimates based on CPS II data were consistently larger for all four diseases than they were for estimates based on NMFS/NHIS data. For women, however, estimates based on the two data sources were quite comparable for chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease, whereas the age-adjusted attributable fraction and smoking-attributable mortality for lung cancer based on CPS II data were approximately 11 percent lower than the age-adjusted attributable fraction and smoking-attributable mortality based on NMFS/NHIS data.
Comparison of age-adjusted and fully adjusted estimates from the CPS II data
The age-adjusted attributable fractions were within 10 percent of the fully adjusted attributable fractions for six of the eight disease- and gender-specific comparisons (table 3 and table 4); four of the eight age-adjusted attributable fractions were within 5 percent of the fully adjusted attributable fractions. The most meaningful difference was for males who died from cerebrovascular disease, for whom the age-adjusted attributable fraction was 60 percent larger than the fully adjusted attributable fraction (0.16 vs. 0.10).
For both genders combined, the fully adjusted smoking-attributable mortality was almost equal to the age-adjusted smoking-attributable mortality for lung cancer (94,670 vs. 93,387) and chronic obstructive pulmonary disease (53,587 vs. 52,821). The fully adjusted smoking-attributable mortality was 10.0 percent larger than the age-adjusted smoking-attributable mortality for coronary heart disease (82,666 vs. 75,092) and 22.5 percent smaller than that for cerebrovascular disease (12,124 vs. 15,642).
DISCUSSION |
---|
Differences in methodology may explain why our findings concerning the age-adjusted relative risks differed from Sterling et al.'s (4). We directly adjusted the NMFS/NHIS data for age, whereas Sterling et al. (4) used indirect adjustment. The method of age adjustment did not appear to affect the magnitude of the age-adjusted relative risks from the NMFS/NHIS data for current and former smoking with regard to coronary heart disease, cerebrovascular disease, and chronic obstructive pulmonary disease (or with regard to lung cancer for former smoking). However, our age-adjusted relative risks from the NMFS/NHIS data for current smoking and lung cancer were 2–3 times larger than Sterling et al.'s (4) (our relative risks were also larger for African Americans (data not shown)). We also used the same reference population for direct standardization of both the NMFS/NHIS and the CPS II relative risks, whereas different populations were used for standardization in Sterling et al.'s (4) comparison of relative risks from the NMFS/NHIS and the Surgeon General's report.
We noted that the NMFS/NHIS data produced several age-specific relative risk estimates in former smokers that were less than 1.0 for coronary heart disease and cerebrovascular disease, particularly among older men. Sterling et al. (4) also reported this pattern. This pattern is probably related to 1) limited sample sizes in specific cells (e.g., the confidence intervals for cerebrovascular disease for men included 1.0) and 2) the fact that information about decedents' smoking habits was obtained from proxy respondents for the NMFS (several studies have shown that proxy respondents tend to underreport smoking history (26–28)). There is no biologically plausible explanation for smoking's being protective against coronary heart disease or cerebrovascular disease, or indeed against lung cancer or chronic obstructive pulmonary disease (29).
Similar to other cohort studies (19, 30), we observed lower age-adjusted relative risks for lung cancer among female smokers than among male smokers in the CPS II and NMFS/NHIS data. These differences largely reflect gender differences in patterns of smoking across birth cohorts. Women began smoking in the 1930s and 1940s, approximately 20–30 years after men, which resulted in a lower cumulative smoking exposure (31). In addition to starting smoking at a later age, women in the CPS II reported smoking fewer cigarettes per day and were less likely to inhale deeply than men (19). Lung cancer risk increases with amount smoked, duration of smoking, and depth of inhalation in both men and women (30). However, the CPS II relative risks for lung cancer were still higher among men than among women within strata of 5-year duration of smoking (30–50 years) and cigarettes per day (20 vs. 40) (19). These differences may have been due to remaining gender differences in duration and amount within these broad strata. In contrast to cohort studies, some recent case-control studies have found a comparable risk or even a higher risk for female smokers than for male smokers (32–36). Possible explanations include a lower risk of lung cancer among nonsmoking women (perhaps due to occupational exposures that increase the nonsmoker risk in men), differences in the reporting of exposure information, differences in the dynamics of cigarette smoking, increased female susceptibility to carcinogens in tobacco smoke, and increased exposure levels among women (32, 36, 37).
Overall (for both genders and for all four diseases combined), we found that the age-adjusted smoking-attributable mortality estimate using CPS II data was 19 percent larger than the estimate using NMFS/NHIS data. Sterling et al. (4), however, found the CPS II-based estimate from the Surgeon General's report to be approximately 40 percent larger than the NMFS/NHIS-based estimate. The lower percentage we calculated may reflect the fact that the NMFS/NHIS estimates in our analysis were not uniformly smaller than the CPS II estimates. For example, the CPS II-based estimated smoking-attributable mortality for the four diseases combined was 36 percent larger for men but 8 percent smaller for women. In addition, the two data sources yielded essentially the same smoking-attributable mortality estimates for lung cancer, which accounts for approximately 40 percent of the total smoking-attributable mortality for all four diseases combined.
Two reasons why our results concerning smoking-attributable mortality differ from Sterling et al.'s (4) are that 1) we used similar methods to calculate the relative risks, attributable fractions, and smoking-attributable mortalities from the NMFS/NHIS and the CPS II and 2) we substituted the point estimate of 1.0 for the age-, sex-, and cause-specific relative risks that were less than 1.0 (thereby assuming that smoking did not cause or prevent deaths for these groups). We made this substitution because the confidence intervals for all of the relative risks that were less than 1.0 included 1.0 (partly due to small sample sizes) and because the literature does not support a protective relation between cigarette smoking and these four diseases (29). Sterling et al. (4) calculated smoking-attributable mortality from the NMFS/NHIS data using age-specific relative risks, and they compared these with smoking-attributable mortality estimates from the Surgeon General's report that were calculated from age-adjusted relative risks. Sterling et al. (4) used relative risks that were less than 1.0 in their smoking-attributable mortality calculations from the NMFS/NHIS data, thereby estimating that cigarette smoking prevented 7,200 deaths from coronary heart disease and cerebrovascular disease among persons aged 65 years or older.
Further adjustment of the attributable fractions and smoking-attributable mortality estimates from the CPS II for disease-appropriate confounders (education, alcohol intake, hypertension status, and diabetes status) indicated little residual confounding once age was taken into account. For example, for both genders combined, the age-adjusted estimate of smoking-attributable mortality was only 2.5 percent smaller than the fully adjusted estimate. This result contrasts with that of Sterling et al. (4), who found that the age-adjusted smoking-attributable mortality estimate (using the NMFS/NHIS data) for these same four diseases combined was 26 percent larger than the estimate adjusted for age, alcohol intake, and income.
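The 2.5 percent figure can be checked against the disease-specific totals for both genders combined that are quoted in the Results. The short Python sketch below only sums those published numbers; the variable names are illustrative.

```python
# Smoking-attributable deaths for both genders combined, as quoted in the Results.
fully_adjusted = {
    "lung cancer": 94670,
    "chronic obstructive pulmonary disease": 53587,
    "coronary heart disease": 82666,
    "cerebrovascular disease": 12124,
}
age_adjusted = {
    "lung cancer": 93387,
    "chronic obstructive pulmonary disease": 52821,
    "coronary heart disease": 75092,
    "cerebrovascular disease": 15642,
}

total_full = sum(fully_adjusted.values())  # 243,047
total_age = sum(age_adjusted.values())     # 236,942

# Percentage by which the age-adjusted total falls short of the fully adjusted total.
shortfall_pct = 100.0 * (total_full - total_age) / total_full
print(f"{total_full:,} vs. {total_age:,}: {shortfall_pct:.1f} percent smaller")
```

The two totals, 243,047 and 236,942, are both close to the figure of approximately 240,000 annual deaths cited later in the Discussion.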
We noted that adjusting the attributable fraction for covariates other than age was more important for women than for men and more important for coronary heart disease and cerebrovascular disease than for lung cancer and chronic obstructive pulmonary disease. Sterling et al. (4) also found that there is more residual confounding for coronary heart disease and cerebrovascular disease than for lung cancer and chronic obstructive pulmonary disease. Several methodological differences between our work and Sterling et al.'s (4) are likely to account for the disparate findings regarding adjustment for multiple confounders. First, we used a model-based approach to control for confounding, while Sterling et al. (4) used a stratified analysis. We believe the model-based approach produced more robust estimates in comparison with the stratified analysis, where the sample sizes for estimating relative risks may have been small for certain cells. Second, as discussed above, we substituted 1.0 in calculations when the estimated relative risks were less than 1.0. Much of the change that Sterling et al. (4) noted between the age-adjusted and the fully adjusted smoking-attributable mortalities was due to the apparent protective effect of smoking in older men for coronary heart disease and cerebrovascular disease (they calculated 7,200 deaths prevented in the age-adjusted model and 47,600 deaths prevented in the alcohol-, income-, and age-adjusted model). Third, we examined the issue of confounding within the CPS II, whereas Sterling et al. (4) used the NMFS/NHIS. Therefore, it is possible that the nature of the confounding of the relation between smoking and mortality varied by data set.
It is not surprising that the results from the NMFS/NHIS data and the CPS II data differ, given the differences in study populations and in data collection. The NMFS was based on a stratified sample of death certificates, and information was obtained from proxy respondents through mailed self-administered questionnaires. The CPS II was a longitudinal study of a cohort of friends and relatives of American Cancer Society volunteers which obtained self-reported data. Proxy respondents are more likely to misreport smoking history (26–28, 38). It has been debated whether it is appropriate to use relative risks from the CPS II to estimate smoking-attributable mortality in the United States, because the CPS II was not a representative sample of the target population. However, Rothman has argued that the utility of the relative risks should be determined not by whether the study population is a representative subgroup of the target population but by understanding the biologic relation between smoking and disease (39).
Epidemiologists and public health leaders have used the attributable fraction and its corresponding number of attributable deaths to educate public health policy-makers on the potential benefits of intervention with regard to cigarette smoking and several other underlying causes of disease (40, 41). Policy-makers, in turn, use these estimates to help determine current priorities and strategies for disease control (41). Therefore, it is important to calculate the most valid and accurate smoking-attributable mortality estimates possible for the United States and the world. In our analysis, we assumed that the stratum-specific relative risks were generalizable to the US population, because we stratified the data according to disease-specific biologically relevant covariates. With such an assumption, we estimate that approximately 240,000 annual deaths from lung cancer, chronic obstructive pulmonary disease, coronary heart disease, and cerebrovascular disease among US Whites can be attributed to smoking. In 1990–1994, the total annual average smoking-attributable mortality for the entire US population (for all smoking-related cancers, cardiovascular diseases, and respiratory diseases, lung cancer deaths due to environmental tobacco smoke, and deaths due to smoking-related fires) was approximately 430,000 (3).
If current smoking patterns continue, an estimated 5 million US residents who were under age 18 years in 1995 will die prematurely from smoking-related illnesses (42). Unfortunately, the recent increase in adolescent smoking will probably have an undesirable effect on smoking-attributable mortality for this birth cohort (43). Annual medical costs attributable to cigarette smoking have been estimated at $53.3 billion (44), and annual indirect costs associated with morbidity and premature mortality from cigarette smoking have been estimated at $6.9 billion and $40.3 billion, respectively (45). The human and economic costs of smoking will continue to accumulate until effective public health efforts to prevent initiation, promote cessation, and protect nonsmokers from the adverse effects of environmental tobacco smoke are established at every level of society.
APPENDIX A |
---|
Point and confidence interval estimation for relative risks calculated from the CPS II data
We estimate age-specific relative risks directly from the CPS II data, which are in the form of deaths and person-years. As usual, confidence intervals for the relative risks are obtained by finding confidence intervals for the (natural) log relative risks and exponentiating.
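A minimal Python sketch of this calculation for one age group follows. It assumes the usual large-sample variance for the log of a rate ratio based on Poisson death counts (one over each death count, summed), which the text does not spell out; the function name and the counts are hypothetical.

```python
import math


def rate_ratio_ci(deaths_exposed, py_exposed, deaths_ref, py_ref, z=1.96):
    """Rate ratio with a confidence interval built on the log scale."""
    rr = (deaths_exposed / py_exposed) / (deaths_ref / py_ref)
    # Assumed large-sample variance of ln(RR) for Poisson-distributed death counts.
    se_log = math.sqrt(1.0 / deaths_exposed + 1.0 / deaths_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: current smokers vs. never smokers in one age group.
rr, lower, upper = rate_ratio_ci(200, 50_000, 40, 100_000)
```

Exponentiating the limits keeps the interval positive and makes it symmetric around the point estimate on the log scale rather than on the original scale.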
Point and confidence interval estimation for relative risks calculated from the NMFS/NHIS data
Smoking prevalences within each age group, stratified by gender, are obtained using the 1987 NHIS data as $p_{ij}$ ($i = 1, 2, \ldots$, total number of age groups; $j = 0, 1, 2$ smoking categories). The corresponding smoking prevalences within each age group, stratified by gender and disease, among US decedents are $z_{ij}$ and are obtained from the 1986 NMFS. $p_{ij}$ and $z_{ij}$ represent age-specific probabilities, i.e., $\sum_j p_{ij} = 1$ and $\sum_j z_{ij} = 1$. We estimated these prevalences and their associated variances by using SUDAAN (version 6.34; Research Triangle Institute, Research Triangle Park, North Carolina), which accounted for the survey designs.
Let $D_i$ be the total number of US deaths for each disease/gender group in age group $i$ and $N_i$ the number in the US population of each gender in age group $i$. Then, the age-specific death rate for smoking category $j$ is $D_i z_{ij} / (N_i p_{ij})$, and the relative risk is a function of the smoking prevalence estimates $p_{ij}$ and $z_{ij}$; that is,

$$R_{ij} = \frac{D_i z_{ij} / (N_i p_{ij})}{D_i z_{i0} / (N_i p_{i0})} = \frac{z_{ij}\, p_{i0}}{z_{i0}\, p_{ij}}.$$

A confidence interval for $R_{ij}$ is obtained by finding a confidence interval for the log relative risk and exponentiating. The natural log of the relative risk is $\ln(R_{ij}) = \ln z_{ij} - \ln z_{i0} - \ln p_{ij} + \ln p_{i0}$. Using a Taylor series approximation, the variance of the log relative risk is
![]() | (A1) |
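The following Python sketch illustrates one way such a calculation could look. It is not a transcription of equation A1: it assumes that the NMFS and NHIS estimates are independent of each other, applies a first-order Taylor (delta-method) approximation to each log prevalence, and lets the caller supply within-survey covariances between smoking category $j$ and the never-smoker category (defaulting to zero). Function names and inputs are hypothetical.

```python
import math


def survey_relative_risk(z_j, z_0, p_j, p_0):
    """Relative risk for smoking category j versus never smokers (category 0)."""
    return (z_j * p_0) / (z_0 * p_j)


def var_log_relative_risk(z_j, z_0, p_j, p_0,
                          var_z_j, var_z_0, var_p_j, var_p_0,
                          cov_z=0.0, cov_p=0.0):
    """Delta-method variance of ln(R), assuming the two surveys are independent.

    cov_z and cov_p are the within-survey covariances between the estimates
    for category j and category 0 (an assumption; zero by default).
    """
    from_decedents = (var_z_j / z_j ** 2 + var_z_0 / z_0 ** 2
                      - 2.0 * cov_z / (z_j * z_0))
    from_population = (var_p_j / p_j ** 2 + var_p_0 / p_0 ** 2
                       - 2.0 * cov_p / (p_j * p_0))
    return from_decedents + from_population


# Hypothetical prevalences and variances for one age group.
rr = survey_relative_risk(0.6, 0.1, 0.3, 0.4)
var_log = var_log_relative_risk(0.6, 0.1, 0.3, 0.4, 1e-4, 1e-4, 1e-4, 1e-4)
half_width = 1.96 * math.sqrt(var_log)
ci = (math.exp(math.log(rr) - half_width), math.exp(math.log(rr) + half_width))
```

In practice the variances and covariances would come from design-based software such as SUDAAN rather than being typed in by hand.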
Directly age-adjusted relative risks
From both the NMFS data and the CPS II data, we obtain the age-adjusted log relative risks ($\ln(R_A)$) by taking weighted sums of the age-specific log relative risks. The weights are based on the proportion of the US population in each age group. The variance of $\ln(R_A)$ is $\mathrm{Var}(\ln(R_A)) = \sum_i w_i^2\, \mathrm{Var}(\ln(R_{ij}))$, where $w_i$ represents the proportion of the population in age group $i$ and $\mathrm{Var}(\ln(R_{ij}))$ is estimated as appropriate for either the CPS II or the NMFS (see above paragraphs). To obtain a confidence interval for $R_A$, we assume that $\ln(R_A)$ follows a normal distribution.
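As an illustration, direct age adjustment on the log scale might be sketched in Python as follows. The sketch assumes that the age-specific estimates are independent, so that the variance of the weighted sum is the sum of squared weights times the age-specific variances; the function name and all inputs are hypothetical.

```python
import math


def age_adjusted_relative_risk(weights, log_rrs, var_log_rrs, z=1.96):
    """Weighted sum of age-specific log relative risks with a normal-theory CI."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise so the weights sum to 1
    log_ra = sum(wi * lr for wi, lr in zip(w, log_rrs))
    # Assumes independent age-specific estimates.
    var_log_ra = sum(wi ** 2 * v for wi, v in zip(w, var_log_rrs))
    half_width = z * math.sqrt(var_log_ra)
    return (math.exp(log_ra),
            math.exp(log_ra - half_width),
            math.exp(log_ra + half_width))


# Hypothetical example: the same relative risk of 2.0 in every age group.
ra, ra_lower, ra_upper = age_adjusted_relative_risk(
    weights=[0.5, 0.3, 0.2],
    log_rrs=[math.log(2.0)] * 3,
    var_log_rrs=[0.01, 0.02, 0.04],
)
```

When every age group has the same relative risk, the adjusted value reproduces it, which is a convenient sanity check on the weights.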
Age-specific attributable fractions
The age-specific attributable fraction (AFi) is
![]() | (A2) |
![]() |
![]() |
![]() |
Age-adjusted attributable fractions and smoking-attributable mortality
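We obtained cause-specific age-adjusted attributable fractions ($AF_A$) as a weighted sum of the age-specific estimates by using the proportion of deaths in each age group as the weights, such that $AF_A = \sum_i (D_i/D)\, AF_i$, with $D = \sum_i D_i$, and $\mathrm{Var}(AF_A) = \sum_i [(D_i/D)\, AF_i (1 - AF_i)]^2\, \mathrm{Var}(\mathrm{logit}(AF_i))$, using $\mathrm{Var}(\mathrm{logit}(AF_i))$ from the previous section. An approximate confidence interval for $AF_A$ is then $AF_A \pm 1.96 \sqrt{\mathrm{Var}(AF_A)}$. Similarly, smoking-attributable mortality $(SAM) = \sum_i D_i\, AF_i$, $\mathrm{Var}(SAM) = \sum_i [D_i\, AF_i (1 - AF_i)]^2\, \mathrm{Var}(\mathrm{logit}(AF_i))$, and an approximate confidence interval for smoking-attributable mortality is $SAM \pm 1.96 \sqrt{\mathrm{Var}(SAM)}$.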
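A short Python sketch of the smoking-attributable mortality interval follows. It takes the age-specific attributable fractions and the variances of their logits as given inputs, since how those variances are obtained is described in the previous section; the function name and the numbers are hypothetical.

```python
import math


def sam_with_ci(deaths, fractions, var_logit_fractions, z=1.96):
    """Smoking-attributable mortality with an approximate confidence interval."""
    sam = sum(d * af for d, af in zip(deaths, fractions))
    # Var(SAM) = sum_i [D_i AF_i (1 - AF_i)]^2 Var(logit(AF_i))
    var_sam = sum((d * af * (1.0 - af)) ** 2 * v
                  for d, af, v in zip(deaths, fractions, var_logit_fractions))
    half_width = z * math.sqrt(var_sam)
    return sam, sam - half_width, sam + half_width


# Hypothetical inputs for two age groups.
sam, sam_lower, sam_upper = sam_with_ci(
    deaths=[1000, 3000],
    fractions=[0.6, 0.2],
    var_logit_fractions=[0.01, 0.04],
)
```

The factor $AF_i (1 - AF_i)$ is the derivative that converts a variance on the logit scale back to the scale of the attributable fraction.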
APPENDIX B |
---|
Notation
The derivation of a formula for the adjusted attributable fraction from cohort data requires the following notation. Let $E$ denote the exposure factor with levels $e = 0, 1, \ldots, L_E$, and let $C$ denote the (cross-classification of) confounding factor(s) with levels $c = 0, 1, \ldots, L_C$. That is, $c$ indexes the combination of levels of the confounding factors. Let $j = (e, c)$ denote the index for the stratum in the cross-classification of exposure by confounding factors. If $C$ is a cross-classification of several confounding factors, then the index $c$ is multidimensional. More concretely, let there be $F$ factors, $C_f$, $f = 1, \ldots, F$, with factor 1 the exposure (that is, $C_1 = E$) and factors $C_f$, $f = 2, \ldots, F$, the potential confounders, and let $L_f$ be the number of levels of factor $C_f$. Then index $j$ is an $F$-tuple $(j_1, \ldots, j_F)$, and the set of indices $J$ is the Cartesian product $L_1 \times \cdots \times L_F$.
The data from the CPS II cohort study consist of counts of deaths $d_j$ and person-years $n_j$ in stratum $j$, $j \in J$. Usually, these data are modeled conditionally on the $n_j$, such that the $d_j$ are independently and Poisson distributed, $d_j \sim \mathrm{Poisson}(\lambda_j n_j)$, $j \in J$. With these assumptions, the expectation of $d_j$ is $E(d_j \mid n_j) = \lambda_j n_j$, which clarifies the interpretation of $\lambda_j$ as a rate.
Poisson regression model
The search for a parsimonious description of the CPS II data may be attempted with the fit of a log-linear model for the Poisson rates $\lambda_j$; that is, $\log E(d_j \mid \lambda_j, n_j) = \log n_j + \log \lambda_j$, and
![]() | (B1) |
The adjusted attributable fraction
The adjusted attributable fraction may be expressed in terms of rate ratios, which in turn depend on parameters in a Poisson regression model. To derive an expression for the adjusted rate ratios, one compares the death rate of individuals in an elevated risk stratum (e.g., e) and level c of the confounding factors with the death rate of individuals at the baseline level of the risk factor (e = 0), keeping level c fixed. To conduct this comparison, with each j = (e,c) associate j* = (0,c), which has the same level of the confounder but shifts the exposure to the baseline level.
To follow steps similar to the original derivation of Bruzzi et al. (17), let Nj be the population number of person-years in stratum j and let Dj be the population number of deaths in that stratum. Now define the following:
The attributable fraction $AF_C = (\lambda_T N - \lambda_C N)/(\lambda_T N) = 1 - \lambda_C/\lambda_T$ may be expressed in terms of the adjusted relative risks $R_j$; that is,

$$AF_C = 1 - \sum_{j \in J} \frac{\rho_j}{R_j},$$

where $\rho_j = D_j/D$. The last formula is used for the estimation of $AF_C$. The rate ratios $R_j$ may be obtained directly from the Poisson regression via $\hat{R}_j = \hat{\lambda}_j / \hat{\lambda}_{j^*}$, or alternatively they can be computed from the parameters in the log-linear parameterization shown in equation B1.
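The estimating formula, one minus the sum over strata of the proportion of deaths divided by the adjusted rate ratio, can be sketched in Python as follows. The stratum labels, death counts, and rate ratios are hypothetical; in the analysis described here the rate ratios would come from the Poisson regression and the proportions from the NMFS.

```python
def adjusted_attributable_fraction(deaths_by_stratum, rate_ratio_by_stratum):
    """AF = 1 - sum_j rho_j / R_j, with rho_j the share of deaths in stratum j."""
    total_deaths = sum(deaths_by_stratum.values())
    retained = sum((d / total_deaths) / rate_ratio_by_stratum[j]
                   for j, d in deaths_by_stratum.items())
    return 1.0 - retained


# Strata are (smoking status, confounder level); never smokers have rate ratio 1.
deaths = {
    ("never", "low"): 100, ("current", "low"): 300,
    ("never", "high"): 200, ("current", "high"): 400,
}
rate_ratios = {
    ("never", "low"): 1.0, ("current", "low"): 3.0,
    ("never", "high"): 1.0, ("current", "high"): 2.0,
}
af = adjusted_attributable_fraction(deaths, rate_ratios)
```

Each term of the sum is the share of deaths that would remain in a stratum if its death rate were lowered to the baseline rate at the same confounder level; with these inputs the shares are 0.1, 0.1, 0.2, and 0.2, giving an attributable fraction of 0.4.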
APPENDIX C |
---|
A broad description of this application of the bootstrap follows. One obtains a large number of independent bootstrap estimates of the $AF_C$, that is, $\widehat{AF}_C(b)$, $b = 1, \ldots, B$, by independently generating $B$ samples with replacement from the original sample.
There are two sources of data used in this analysis, the CPS II and the NMFS. Therefore, to generate a bootstrap estimate $\widehat{AF}_C(b)$, we combined bootstrap estimates of the rate ratios $\hat{R}_j(b)$, $j \in J$, computed using the CPS II data, with bootstrap estimates of the prevalences $\hat{\rho}_j(b)$, $j \in J$, computed using the NMFS data, by means of

$$\widehat{AF}_C(b) = 1 - \sum_{j \in J} \frac{\hat{\rho}_j(b)}{\hat{R}_j(b)}, \quad b = 1, \ldots, B.$$

Below, we first describe the generation of $\hat{R}_j(b)$ and then the generation of $\hat{\rho}_j(b)$. The generation of $\hat{R}_j(b)$ requires resampling from the CPS II, whereas the generation of $\hat{\rho}_j(b)$ requires resampling from the NMFS.
Bootstrap estimates of the rate ratios $\hat{R}_j(b)$
The method of generating bootstrap estimates $\hat{R}_j(b)$ for each disease/gender combination has four steps: 1) restructuring the data; 2) drawing the sample; 3) stratifying the sample; and 4) calculating the rate ratio.
In step 1, the CPS II data are restructured because the data are stratified counts and individual-specific covariate values are unavailable. Therefore, we compiled the CPS II data into separate tables for each disease/gender combination. We further stratified each one of these tables by factors of interest (smoking status, age, education, etc.). Let $j \in J$ index the stratum in a given disease × gender table and let $J$ denote the index set for that table. Then the number of deaths ($d_j$) and the person-years ($n_j$) are calculated for every $j \in J$. Therefore, the tables contain counts of deaths and person-years stratified by factor levels. In this fashion, we built a list of $d = \sum_{j \in J} d_j$ records (one record for each death) with individual-specific factor levels. These expanded data sets were amenable to bootstrap resampling.
For a given disease/gender combination, in the bth bootstrap iteration d records were randomly and independently selected with replacement from an expanded table (step 2). We used the selected records to build a bootstrap stratification table having the same factors as the original stratification, namely with index set J, where the data associated with stratum j are the number of selected records (deaths) with that combination of factors. We then combined the deaths in stratum j of the bootstrap table with the person-years in that stratum in the original CPS II stratification table (step 3).
The final step was to fit a Poisson regression model to the bootstrap table defined by disease and gender. We used the estimated coefficients from the model to develop bootstrap estimates of $\hat{R}_j(b)$, $j \in J$ (step 4).
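The four steps can be sketched in Python as follows. This is a simplified illustration rather than the procedure itself: in step 4 it computes a crude stratum-specific rate ratio against never smokers at the same confounder level instead of fitting a Poisson regression model, and the stratum labels, counts, and function name are hypothetical.

```python
import random
from collections import Counter


def bootstrap_rate_ratios(deaths, person_years, n_boot, seed=0):
    """deaths and person_years are dicts keyed by (exposure, confounder) stratum."""
    rng = random.Random(seed)
    # Step 1: expand the table into one record per death, labelled by stratum.
    records = [j for j, d in deaths.items() for _ in range(d)]
    results = []
    for _ in range(n_boot):
        # Step 2: draw d records with replacement.
        sample = rng.choices(records, k=len(records))
        # Step 3: re-tabulate deaths; person-years stay as in the original table.
        boot_deaths = Counter(sample)
        # Step 4 (simplified): crude rate ratio in place of a fitted Poisson model.
        ratios = {}
        for (e, c) in deaths:
            ref_rate = boot_deaths[("never", c)] / person_years[("never", c)]
            rate = boot_deaths[(e, c)] / person_years[(e, c)]
            ratios[(e, c)] = rate / ref_rate if ref_rate > 0 else float("nan")
        results.append(ratios)
    return results


# Hypothetical stratified table for one disease/gender combination.
deaths = {
    ("never", "low"): 100, ("current", "low"): 300,
    ("never", "high"): 200, ("current", "high"): 400,
}
person_years = {
    ("never", "low"): 50_000, ("current", "low"): 30_000,
    ("never", "high"): 40_000, ("current", "high"): 20_000,
}
boot = bootstrap_rate_ratios(deaths, person_years, n_boot=200)
```

Only the deaths are resampled; the person-years are treated as fixed, which matches the conditional Poisson model described in Appendix B.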
Bootstrap estimates of the prevalences $\hat{\rho}_j(b)$
For a given disease/gender combination, the method of generating separate bootstrap estimates of $\hat{\rho}_j(b)$, $b = 1, \ldots, B$, has five steps: 1) generating the sample; 2) developing weights to adjust for nonresponse; 3) developing poststratification weights; 4) applying the weights and stratifying the sample; and 5) calculating $\hat{\rho}_j(b)$.
Unlike the CPS II, the NMFS data set contains individual-specific data records, and therefore bootstrap sampling was immediate. Thus, for each bootstrap iteration, the first step was to take, with replacement, a separate sample of records from each of the original 18 NMFS sampling strata (step 1). Each of these samples was of the same size as the corresponding original NMFS sampling stratum. For example, there were 540 records in the first stratum of the original NMFS; i.e., the first stratum is defined to include deaths among American Indians, Eskimos, and Aleuts, regardless of the cause. Therefore, the bootstrap samples also had 540 records in the first stratum. It is important to note that deaths in three of the 18 original sampling strata were selected with certainty for inclusion in the NMFS. These records were also selected with certainty for each of the bootstrap samples.
We weighted each record in the NMFS so that a nationally representative estimate of $\hat{\rho}_j(b)$ could be obtained. Therefore, the next step was to develop weights for the bootstrap sample drawn in the first step. Three weighting factors were required: sampling weights, nonresponse weights, and poststratification weights. The sampling weights, the inverse of the probabilities of selection, are fixed by design and therefore were held constant in the bootstrap. In contrast, the values for the weighting factors used to adjust for nonresponse and poststratification depended on the distribution of deaths among the factor levels. Therefore, these weighting factors differed from bootstrap sample to bootstrap sample.
The adjustment for nonresponse in the original NMFS was calculated for subsets, defined by age intervals, within each of the 18 sampling strata. We conducted this adjustment for each bootstrap sample (step 2). That is, we stratified each bootstrap sample into the same subsets as the original NMFS sample and calculated the response rate within each of these subsets. The reciprocals of these response rates were used for the nonresponse adjustment. With this approach, all deaths in a particular subset have the same nonresponse adjustment.
The next step was to develop poststratification weights (step 3). We used the original NMFS poststrata, which were defined by race, sex, and age. We calculated poststratification weights for each stratum in a bootstrap sample by dividing an estimate of the number of US deaths for that poststratum (estimated from the original NMFS sample) by the weighted sum of the deaths in the poststratum for that sample.
After calculation and application of the final weight for each death (the final weight for each death being the product of three weights: the inverse of the probability of selection, the reciprocal of the response rate, and the poststratification weight), we created eight stratification tables defined by disease and gender from each bootstrap sample (step 4). We further stratified each one of these by other factors of interest (smoking status, age, education, etc.). The final step was to calculate $\hat{\rho}_j(b)$ separately for each one of the eight tables (step 5).
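The weighting in steps 2 through 5 might be sketched in Python as follows for a single bootstrap sample. The record fields, the use of unweighted counts for the response rate, and the function names are assumptions made for illustration; they are not taken from the NMFS documentation.

```python
from collections import defaultdict


def final_weights(records, poststratum_totals):
    """Return responding records with sampling x nonresponse x poststratification weights."""
    # Step 2: response rate within each nonresponse-adjustment subset
    # (unweighted counts are an assumption of this sketch).
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for r in records:
        sampled[r["subset"]] += 1
        responded[r["subset"]] += r["responded"]
    adjusted = []
    for r in records:
        if r["responded"]:
            rate = responded[r["subset"]] / sampled[r["subset"]]
            adjusted.append({**r, "weight": r["sampling_weight"] / rate})
    # Step 3: scale so weighted deaths match the target for each poststratum.
    weighted = defaultdict(float)
    for r in adjusted:
        weighted[r["poststratum"]] += r["weight"]
    for r in adjusted:
        r["weight"] *= poststratum_totals[r["poststratum"]] / weighted[r["poststratum"]]
    return adjusted


def weighted_proportions(weighted_records, key="stratum"):
    """Steps 4 and 5: weighted share of deaths falling in each stratum."""
    total = sum(r["weight"] for r in weighted_records)
    shares = defaultdict(float)
    for r in weighted_records:
        shares[r[key]] += r["weight"] / total
    return dict(shares)


# Hypothetical bootstrap sample of four death records.
records = [
    {"sampling_weight": 10.0, "subset": "a", "poststratum": "m", "responded": True, "stratum": "current"},
    {"sampling_weight": 10.0, "subset": "a", "poststratum": "m", "responded": False, "stratum": "never"},
    {"sampling_weight": 20.0, "subset": "b", "poststratum": "m", "responded": True, "stratum": "never"},
    {"sampling_weight": 5.0, "subset": "b", "poststratum": "f", "responded": True, "stratum": "current"},
]
weighted_records = final_weights(records, {"m": 100.0, "f": 10.0})
shares = weighted_proportions(weighted_records)
```

After poststratification the weights within each poststratum sum to the target number of deaths, so the proportions depend on the sampling and nonresponse weights only through the relative weights inside each poststratum.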
ACKNOWLEDGMENTS |
---|

NOTES |
---|

REFERENCES |
---|