Affiliations of authors: B. I. Graubard (Biostatistics Branch), E. L. Korn (Biometric Research Branch), National Cancer Institute, Bethesda, MD.
Correspondence to: Barry I. Graubard, Ph.D., National Institutes of Health, Executive Plaza South, Rm. 8024, Bethesda, MD 20892.
ABSTRACT
INTRODUCTION
CONDUCTING A LARGE-SCALE HEALTH SURVEY: NHANES III
The 81 selected counties (or areas) were not chosen as a simple random sample from a list of all such areas in the United States. Instead, to decrease the variability of parameter estimators based on data from the completed survey, large-population counties were sampled with a higher probability than small-population counties. Also, since one of the design considerations of NHANES III was to provide reliable estimates for the African-American and Mexican-American minority groups, counties with larger proportions of these minorities were included in the sample with higher probabilities. The sampling of certain subpopulations with higher probabilities than others is a hallmark of large surveys. NHANES III used "stratified sampling," in which the units to be sampled were divided into a small number of groups ("strata") and then sampled at different rates in the different strata.
For a sampled county, the second stage of sampling in NHANES III involved sampling area segments consisting of city or suburban blocks or other contiguous geographic areas contained within the county. Segments with larger minority populations were sampled with higher probability. The third stage of sampling involved listing all of the households within the sampled segments and then sampling them at a rate that depended on the segment characteristics, e.g., racial or ethnic composition. The fourth stage of sampling was to sample individuals within sampled households to be interviewed. The probabilities of individuals being chosen in this final stage of sampling were based on their sex, age, and race/ethnicity. Because of design considerations, only about one in five households sampled contributed sampled persons who were interviewed.
The NHANES III household interview consisted of an individual questionnaire for each sampled person, blood pressure measurements for persons aged 17 years and over, and a family questionnaire. The questionnaires contained questions about dietary intake and nutritional status, reproductive history and sexual behavior, use of vitamin and mineral supplements and medications, tobacco and alcohol use, physical activity, health care utilization and health insurance, and sociodemographic characteristics. Sampled persons who completed a household interview were invited to have a medical examination at a mobile examination center; transportation and a small cash payment were provided. The examinations included additional dietary and health interviews, body measurements, physical and dental examinations, venipuncture, urine collection, audiometry, x-rays, electrocardiograms, spirometry, oral glucose tolerance tests, ophthalmologic examinations, ultrasonography, bone density measurements, cognitive assessments, and allergy tests. Examination data were recorded, for the most part, directly into an automated data collection system.
ANALYSES ACCOMMODATING SAMPLE WEIGHTS AND SAMPLE CLUSTERING
The data from each sampled individual are associated with a sample weight, which estimates the number of people in the population that he or she represents. These weights account for the differential probabilities with which individuals were sampled, so that individuals sampled from a subgroup at a rate of one per 10 000 have larger sample weights than individuals sampled from a subgroup at a rate of one per 1000. Sample weights also adjust for the facts that not all sampled individuals participate in a survey ("nonresponse") and that some individuals may inadvertently have had no chance to be sampled ("frame undercoverage"). These adjustments, which are based on statistical models, are thought to typically lessen bias due to nonresponse and frame undercoverage, but there is no guarantee that they do so.
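The relationship between sampling rates and weights can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the stage probabilities are hypothetical, the function name `base_weight` is ours, and the nonresponse and undercoverage adjustments described above are deliberately left out.

```python
def base_weight(stage_probabilities):
    """Return the inverse of the overall selection probability.

    The overall probability is the product of the (hypothetical)
    selection probabilities at each stage of sampling.
    """
    overall = 1.0
    for p in stage_probabilities:
        overall *= p
    return 1.0 / overall


# Hypothetical person sampled at an overall rate of 1 per 10 000
# (county, segment, household, and person stages).
weight_rare = base_weight([0.1, 0.1, 0.1, 0.1])

# Hypothetical person sampled at an overall rate of 1 per 1000.
weight_common = base_weight([0.1, 0.1, 0.5, 0.2])

print(round(weight_rare), round(weight_common))
```

The person sampled at the lower rate receives the larger weight, which is the pattern described in the paragraph above.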
The classic approach to analyzing data with sample weights is to use (sample-)weighted estimators. Weighted estimation is equivalent to performing standard (unweighted) estimation on a new expanded dataset created from the original dataset of the sampled observations by duplicating each observation the number of times given by its sample weight. For example, an individual with a sample weight of 12 239 would have his or her data appearing 12 239 times in the expanded dataset. Of course, there are simple formulas for performing weighted estimation that do not require actually duplicating observations. Note, however, that standard errors of parameter estimators cannot be calculated by applying standard nonsurvey formulas to the expanded dataset; other methods are required, as described below.
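The equivalence between weighted estimation and unweighted estimation on an expanded dataset can be checked with a small sketch. The data and function names here are hypothetical, and the weights are kept small and integer valued so that the duplication can actually be carried out.

```python
def weighted_mean(values, weights):
    """Weighted mean computed directly from the sampled observations."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(values, weights)) / total_weight


def expanded_mean(values, weights):
    """Unweighted mean of a dataset in which each observation is
    duplicated as many times as its (integer) sample weight."""
    expanded = []
    for y, w in zip(values, weights):
        expanded.extend([y] * w)
    return sum(expanded) / len(expanded)


# Hypothetical indicator of snuff use (1 = yes) with hypothetical weights.
values = [1, 0, 1, 0]
weights = [3, 5, 2, 10]

print(weighted_mean(values, weights), expanded_mean(values, weights))
```

Both calls give the same point estimate. The sketch says nothing about standard errors, which require the survey methods discussed next.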
The advantage of using weighted estimation over using unweighted estimation is that weighted estimators estimate population parameters and not parameters that depend on the particular sample design used in the survey. In addition, unweighted estimation can sometimes give misleading results when estimating cause/effect relationships (41). A disadvantage of using weighted estimation is that parameter estimators can be very variable when the weights are very variable, especially when a few individuals have very large sample weights. Because of this potential, there has been some debate in the statistical literature concerning the role of sample weights in the analysis of survey data [e.g., (42,43)]. Our own approach is to use weighted estimation either when the analysis is primarily descriptive (as opposed to investigating cause/effect relationships) or when weighted estimation does not greatly increase the variability of estimators. Otherwise, we use unweighted estimation and additionally include in the analysis model the variables that are used to construct the sample weights (44).
When estimating standard errors for parameter estimators by use of survey data, one needs to take into account the fact that the data are typically clustered. For example, in NHANES III, there are a few hundred observations in each sampled county. Since data from the same cluster tend to be more correlated than data from different clusters, clustering tends to make standard errors larger than would be obtained from a simple random sample with the same sample size (45). (This problem is not unique to survey data; analysts of animal litter data need to account for the correlation of observations within litters.) Fortunately, simple techniques have been developed to estimate standard errors that require only a designation for each individual of the cluster at the first stage of sampling to which he or she belongs (46). [This is true provided that either the fraction of first-stage units sampled is relatively small or the inference required is for the assumed model underlying the generation of the data; see (47).] These first-stage sampling clusters are called "primary sampling units." Public-use data files for most health surveys contain the primary sampling unit designations of each sampled individual. Exceptions are some institutional surveys, e.g., the National Hospital Discharge Survey (48), where there are confidentiality concerns in releasing information about the hospitals (which are the primary sampling units). For these surveys, approximate methods for standard error estimation involving "variance curves" ("generalized variance functions") have been developed (46).
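One way to see how a standard error can be computed from primary sampling unit designations alone is the following sketch. It assumes a single stratum, treats the primary sampling units as if sampled with replacement, and uses a linearization of the weighted mean; these are our simplifying assumptions, not a description of any particular software package, and the data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_mean_and_se(values, weights, psu_ids):
    """Weighted mean and a between-PSU (linearization) standard error.

    Assumes one stratum and with-replacement sampling of PSUs.
    """
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight

    # Sum the linearized contributions w * (y - mean) / total_weight by PSU.
    psu_totals = defaultdict(float)
    for y, w, psu in zip(values, weights, psu_ids):
        psu_totals[psu] += w * (y - mean) / total_weight

    n_psu = len(psu_totals)
    if n_psu < 2:
        raise ValueError("at least two primary sampling units are required")

    totals = list(psu_totals.values())
    center = sum(totals) / n_psu  # zero apart from rounding error
    variance = n_psu / (n_psu - 1) * sum((t - center) ** 2 for t in totals)
    return mean, math.sqrt(variance)


# Hypothetical, strongly clustered data: three PSUs with four people each.
values = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
weights = [1.0] * 12
psu_ids = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

mean, se = weighted_mean_and_se(values, weights, psu_ids)

# Standard error that would be reported if clustering were ignored.
naive_se = math.sqrt(mean * (1 - mean) / len(values))
print(mean, se, naive_se)
```

In this artificial example the cluster-based standard error is about twice the naive one, consistent with the statement that clustering tends to inflate standard errors.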
An alternative approach to estimating standard errors, which might be attractive to a statistician unfamiliar with survey methods, would be to model all of the stages of sampling with fixed and random effects [e.g., (49)]. However, we have shown on theoretical grounds that such modeling, besides being overly complex, does not automatically improve standard error estimation (50). The one instance when simple survey methods do require some modification is when the sample design is such that only a small number of primary sampling units are available for standard error estimation, e.g., 16 in the Hispanic Health and Nutrition Examination Survey (51). We have given recommendations for this special situation elsewhere (44).
Fuller discussions of the statistical issues involved in analyzing health surveys are given elsewhere (44,50,52-69).
APPLICATIONS
We utilized the computer software SUDAAN (70) and some of our own computer programs to perform the analyses. Other commercial software designed for survey analyses could equally well be used (71).
Factors Associated With Snuff Use, Derived From the 1987 National Health Interview Survey
Smokeless tobacco, i.e., snuff and chewing tobacco, has a number of adverse health effects (72). By use of data from the 1987 National Health Interview Survey (1987 NHIS), we develop a model to identify individuals at higher risk of using snuff based on their characteristics. Such an identification could be useful for targeting prevention initiatives. Table 2, A, contains the estimated proportions of men aged 18 years or older who use snuff for each of the categories of a group of descriptive variables. We see, for example, that snuff use is low in the Northeast, in central cities, and among blacks and Hispanics. The percentages in Table 2, A, are sample-weighted estimates, so that they are representative of the U.S. civilian noninstitutionalized population in 1987.
Instead of cross-classifications, a logistic regression analysis can be used to model the probability of snuff use as a function of the levels of all the independent variables (Table 2, B). For example, by use of the estimated logistic regression coefficients in Table 2, B, one can estimate the probability of snuff use as 5.2% for a 40-year-old white individual who lives in the South in a non-Metropolitan Statistical Area, has 12 years of education and a (family) income of $15 000, is a white-collar worker, and has never been married. The interpretation of the individual regression coefficients in Table 2, B, is the same as that for a logistic regression in the nonsurvey setting. In particular, the coefficient for a categorical independent variable not involved in an interaction is the logarithm of the estimated odds ratio (relative risk) of that category compared with the baseline category. For example, the estimated odds ratio associated with being unmarried is 0.88 = exp(-0.13). Note that this odds ratio is less than 1.0, which is the opposite of what is suggested by Table 2, A. However, the logistic regression result controls for possible imbalances in the other variables (e.g., age) between married and unmarried individuals.
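The arithmetic linking logistic regression coefficients to odds ratios and probabilities can be sketched as follows. The coefficient of -0.13 is an assumption: it is the value consistent with the odds ratio of 0.88 quoted above, since Table 2, B, is not reproduced here. The linear predictor is likewise backed out from the quoted probability of 5.2% rather than built from actual coefficients.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a category relative to baseline."""
    return math.exp(coefficient)


def predicted_probability(linear_predictor):
    """Inverse logit: probability implied by a linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Assumed coefficient for being unmarried (see the note above).
or_unmarried = odds_ratio(-0.13)

# Linear predictor (sum of the relevant coefficients) that would
# correspond to an estimated probability of snuff use of 5.2%.
linear_predictor = math.log(0.052 / (1 - 0.052))
probability = predicted_probability(linear_predictor)

print(round(or_unmarried, 2), round(probability, 3))
```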
The American Cancer Society recommends annual digital rectal examinations for individuals aged 40 years or over for cancer screening (73). Of interest is the association of the probability that an individual has had an annual digital rectal examination with his or her type of health insurance; a full analysis including other types of cancer screening is given elsewhere (13). The third column of Table 3 shows the estimated proportions of digital rectal examinations cross-classified by type of health insurance, by use of the Cancer Control Supplement to the 1992 National Health Interview Survey (1992 NHIS) (74). In terms of a causal association between health insurance and the probability of examination, the observed proportions can be misleading because they do not control for patient characteristics. Thus, for example, the estimated proportion for individuals with public health insurance may appear low because these individuals have a lower income, and income is positively associated with the probability of examinations.
The predictive margin in the fifth column of Table 3 estimates the probability of an examination for a population with the same distribution of individual characteristics as the subgroup with no health insurance, but where all of the individuals had each one of the health insurance types in turn. This predictive margin is relevant if one were contemplating what would happen to the probabilities if the individuals with no insurance obtained different kinds of health insurance.
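The calculation behind a predictive margin of this kind can be sketched as follows. Everything numerical here is hypothetical (the coefficients, the single income covariate, and the three uninsured individuals with their sample weights); the sketch is meant only to show the mechanics of setting every individual's insurance type to one value in turn and averaging the weighted predicted probabilities.

```python
import math

# Hypothetical logistic regression coefficients (not from Table 3).
INTERCEPT = -1.0
INSURANCE_EFFECT = {"none": 0.0, "public": 0.3, "private": 0.6}
INCOME_EFFECT_PER_1000 = 0.02


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def predictive_margin(people, insurance_type):
    """Weighted mean predicted probability if every person in `people`
    had `insurance_type`, keeping their other characteristics fixed."""
    numerator = 0.0
    denominator = 0.0
    for person in people:
        linear_predictor = (
            INTERCEPT
            + INSURANCE_EFFECT[insurance_type]
            + INCOME_EFFECT_PER_1000 * person["income_k"]
        )
        numerator += person["weight"] * inv_logit(linear_predictor)
        denominator += person["weight"]
    return numerator / denominator


# Hypothetical subgroup of sampled individuals with no health insurance.
uninsured = [
    {"income_k": 10, "weight": 1200.0},
    {"income_k": 18, "weight": 800.0},
    {"income_k": 25, "weight": 1500.0},
]

margins = {kind: predictive_margin(uninsured, kind) for kind in INSURANCE_EFFECT}
for kind, value in margins.items():
    print(kind, round(value, 3))
```

Because the covariate distribution is held at that of the uninsured subgroup, differences among the three margins reflect only the modeled insurance effect.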
Body Iron Stores and Risk of Developing Cancer, Derived From the NHANES I Epidemiologic Followup Study
The first National Health and Nutrition Examination Survey (NHANES I) was conducted during the period from 1971 through 1975. Individuals sampled who were aged 25-74 years in this survey (or their proxies) have been periodically contacted concerning their health status as part of the NHANES I Epidemiologic Followup Study. This longitudinal study allows for estimating the association of risk factors measured at the baseline survey (and possibly updated with information from the follow-ups) with the development of various diseases. This type of longitudinal analysis could also be approached with a single cross-sectional survey by asking sampled individuals for a history of their risk factors. However, longitudinal surveys have the advantage that individuals sampled in the baseline survey, but who later die, provide important information, whereas obtaining information about dead individuals is problematic in a cross-sectional survey. In addition, individuals in a cross-sectional survey may not be able to recall accurately their risk factor exposures from the past, and certain risk factors that require immediate evaluation (such as blood chemistries) would not be available.
In this section, we estimate the association of transferrin saturation, a measure of body iron stores, with the development of cancer. This association was previously studied as part of a more general analysis (30). We restrict the analysis to the 1971-1974 cohort of NHANES I because the blood chemistry variables were measured only for that cohort. Individuals with cancer at the baseline survey are excluded from the analysis. Stevens et al. (30) expressed the theoretical concern that preclinical cancer might affect serum chemistry values and therefore restricted analysis to individuals who were alive and cancer free for at least 4 years after the baseline survey. For the same reason, in our analyses, we remove the first 4 years of follow-up for all individuals.
Table 4 shows the results of proportional hazards regressions for the association of developing cancer with transferrin saturation, smoking, race, income (family), and type of census enumeration district in which the individual lived. As in a proportional hazards regression used to analyze a randomized clinical trial, the regression coefficient for a variable is interpreted as the logarithm of the relative hazard associated with a change of one unit in that variable. So, for example, the interquartile range (75th percentile minus 25th percentile) of transferrin saturation for men in this population is 13.7, and the relative hazard associated with this difference is 1.22 = exp(13.7 × 14.29 × 10^-3). Therefore, transferrin saturation is positively associated with the risk of cancer for men; see also Fig. 1. For women, the association is not statistically significantly different from zero.
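The relative hazard quoted above for men can be reproduced directly from the two numbers given in the text, the regression coefficient (14.29 × 10^-3 per unit of transferrin saturation) and the interquartile range (13.7). The variable names are ours.

```python
import math

coefficient_per_unit = 14.29e-3  # log relative hazard per unit, from the text
interquartile_range = 13.7       # 75th minus 25th percentile, from the text

# Relative hazard for a difference of one interquartile range.
relative_hazard = math.exp(interquartile_range * coefficient_per_unit)

print(round(relative_hazard, 2))
```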
Besides accounting for the clustering in the data, there are some other important methodological differences between the analyses shown in Table 4 and the typical analysis of a randomized clinical trial. (Since these differences are somewhat technical, some readers may wish to skip the rest of this paragraph.) (a) In a typical analysis of a randomized clinical trial, one would start the time axis at the time of randomization, which is usually close to the diagnosis of a certain stage of disease or the start of treatment. The time scale here is taken as age rather than time from the baseline survey because age is a more important determinant of the risk of cancer than time from the baseline survey for a healthy population (65). (b) Note that transferrin saturation is being measured only once for each individual, at the time of his or her baseline survey. The possibility that transferrin saturation means different things for individuals of different ages is controlled for in the analysis by stratifying on 5-year birth cohorts. (c) Individuals dying of other causes are considered to have their data censored at the time of death. Although there are other possibilities, this "cause-specific" analysis is most appropriate for understanding the biology of the association (76).
The analyses shown here are based on the 1987 follow-up of the NHANES I cohort; there was also a follow-up in 1992. Besides follow-ups involving contacts with the sampled individuals, the National Center for Health Statistics is also providing follow-up by use of the National Death Index (77) for this survey and for some of its other surveys, e.g., the second National Health and Nutrition Examination Survey. This type of follow-up provides less information than a personal contact but can be quite useful for studying associations with causes of death.
Changing Rates of Mammography Screening by Use of the National Health Interview Survey
Many of the annual National Health Interview Surveys include supplemental questionnaires that contain questions on cancer-related topics. The questionnaires in the 1987, 1990, and 1992 surveys contained similar questions about mammography screening (74,78,79), which we exploit to examine trends in mammography screening. We focus here on women more than 40 years of age and on whether or not they reported a screening mammographic examination in the last year. In particular, we examine how the percentages of these women reporting a screening mammographic examination are changing over time and whether the time trends are different depending on the educational levels of the women. Similar analyses with the 1987 and 1990 surveys have been previously performed (3).
Table 5 shows the estimated percentages of women reporting screening mammograms, cross-classified by level of education. The percentage of women reporting screening mammograms is increasing over the years from 1987 through 1992. It also appears that the increases are similar for the different educational levels. This could be formally checked by performing further analyses, which could also control for additional variables, such as race and whether or not the woman lived in an urban or rural area.
Smoking and Alcohol Consumption as Risk Factors for Digestive Cancer Using the 1986 National Mortality Followback Survey and the 1987 National Health Interview Survey
We used data from the 1987 National Health Interview Survey (1987 NHIS) and the 1986 National Mortality Followback Survey (1986 NMFS) to estimate digestive cancer death rates and their association with alcohol consumption and smoking. The 1986 NMFS is an example of a "followback" survey, in which individuals associated with sampled records are interviewed for further information. In this case, the records are death certificates, but other followback surveys use other types of records, e.g., birth certificates (80). In this section, we follow the general approach of Sterling et al. (35), who used the 1987 NHIS and the 1986 NMFS to examine the association of smokeless tobacco with oral and digestive cancer mortality. These surveys can be used together to estimate the death rates, where the numerators of the rates are estimated from the 1986 NMFS and the denominators are estimated from the 1987 NHIS. This is an example of a population-based, case-control study. Two advantages of a population-based, case-control study over a study that is not population based (e.g., hospital case patients with hospital control patients) are that many of the biases involved in choosing an appropriate control population are eliminated, and one can estimate death rates in addition to relative risks.
Because the information on deaths and on living individuals comes from two different surveys, biases can arise because the modes of data collection of the surveys are different (mail response from an informant for the decedent for the 1986 NMFS and face-to-face interview for the 1987 NHIS) and the questions asked are different. In addition, the populations sampled differ in ways other than vital status. For example, the 1987 NHIS sampled only civilians and noninstitutionalized individuals, and the 1986 NMFS sampled deaths only for individuals aged 25 years or older. To make the populations sampled in the two surveys comparable, the analysis is restricted to civilians aged 25 years or older who, if they died, were institutionalized less than half of their last year of life.
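The way the two surveys combine into a death rate can be sketched as follows: the numerator is a weighted total of deaths from the followback survey and the denominator is a weighted total of living individuals from the interview survey, both restricted to the same subgroup. The records, weights, and function names are hypothetical, and treating the weighted living population as the population at risk is a simplifying assumption of this sketch.

```python
def weighted_total(records, in_subgroup):
    """Sum of sample weights over the records in a subgroup."""
    return sum(r["weight"] for r in records if in_subgroup(r))


def annual_death_rate_per_100k(death_records, living_records, in_subgroup):
    """Estimated deaths per 100 000 persons in the subgroup."""
    deaths = weighted_total(death_records, in_subgroup)
    population = weighted_total(living_records, in_subgroup)
    return 100_000 * deaths / population


# Hypothetical sampled deaths (followback survey) and living persons
# (interview survey), each carrying its own sample weight.
death_records = [
    {"weight": 100.0, "smoker": True},
    {"weight": 150.0, "smoker": True},
    {"weight": 120.0, "smoker": False},
]
living_records = [
    {"weight": 200_000.0, "smoker": True},
    {"weight": 300_000.0, "smoker": True},
    {"weight": 600_000.0, "smoker": False},
]

rate_smokers = annual_death_rate_per_100k(
    death_records, living_records, lambda r: r["smoker"]
)
rate_nonsmokers = annual_death_rate_per_100k(
    death_records, living_records, lambda r: not r["smoker"]
)
print(rate_smokers, rate_nonsmokers)
```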
Table 6, A, displays a univariate descriptive analysis of the annual digestive cancer death rates cross-classified by sex, race, age, drinking, and smoking levels. The outcome variable is death due to digestive cancer in 1986, defined by an underlying cause of death with International Classification of Diseases, 9th Revision (ICD-9), codes 150-159 (81). (The 1986 NMFS public-use data files include a variable coded for this cause of death.) There were 785 such deaths sampled in the 1986 NMFS. Drinking or smoking information was partially or completely missing for 134 of the 785 digestive cancer deaths and for 1560 of the 19 240 living individuals. Rather than simply eliminating these observations from the analyses, we used a "hot-deck" imputation to fill in their missing data with the data from randomly chosen individuals with nonmissing data who were similar to the individuals with the missing data [(82), pp. 62-67]; each donor was chosen to be in the same age/race/sex category and, if available, the same smoking or drinking category as the recipient. Imputation is a common technique in the analysis of surveys to minimize potential bias in measuring associations due to missing data. In the present application, in which the proportion of missing drinking/smoking data is larger for the 1986 NMFS than for the 1987 NHIS, the imputation also serves to keep the rates in Table 6, A, from being biased low.
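A hot-deck imputation of the general kind described above can be sketched as follows. This is a simplified illustration, not the procedure actually used for Table 6: the records, field names, and imputation cells are hypothetical, donors are matched only on the listed cell variables, and a recipient with no available donor is left missing.

```python
import random


def hot_deck_impute(records, field, cell_keys, rng):
    """Return copies of `records` in which missing values of `field`
    (coded as None) are replaced by the value from a randomly chosen
    donor in the same imputation cell."""
    # Collect donor values by cell from records with nonmissing data.
    donors = {}
    for r in records:
        if r[field] is not None:
            cell = tuple(r[k] for k in cell_keys)
            donors.setdefault(cell, []).append(r[field])

    completed = []
    for r in records:
        filled = dict(r)  # leave the original record unchanged
        if filled[field] is None:
            cell = tuple(r[k] for k in cell_keys)
            if cell in donors:
                filled[field] = rng.choice(donors[cell])
        completed.append(filled)
    return completed


# Hypothetical records; None marks missing smoking information.
records = [
    {"age_group": "45-64", "sex": "M", "smoking": "never"},
    {"age_group": "45-64", "sex": "M", "smoking": "never"},
    {"age_group": "45-64", "sex": "M", "smoking": None},
    {"age_group": "65+", "sex": "F", "smoking": "current"},
    {"age_group": "65+", "sex": "F", "smoking": None},
]

completed = hot_deck_impute(
    records, "smoking", ("age_group", "sex"), random.Random(0)
)
print([r["smoking"] for r in completed])
```

Passing in a seeded random number generator keeps the imputed dataset reproducible from run to run.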
Since the rates in Table 6, A, are not age adjusted (except for the cross-classification by age), they can be misleading. For example, individuals in the three smoking categories have mean ages of 47, 43, and 57 years, respectively, so that the differences in the observed cancer death rates could easily be due to the age differences. A logistic regression analysis of digestive cancer deaths was performed with the independent variables being sex, race, age, smoking, and drinking (results not shown). By use of this regression, we calculated the predictive margins for drinking, smoking, and drinking by smoking (Table 6, B). When controlling for age, we see no association of smoking and digestive cancer death rates (last row of Table 6, B). There is, however, an association of drinking and digestive cancer mortality (last column of Table 6, B). The exact pattern of the association of drinking and the rates seems to depend on level of smoking; we have no plausible explanation for this pattern.
COMMIT, a Community Intervention Trial of Smoking Cessation
The Community Intervention Trial for Smoking Cessation (COMMIT) was an experiment in which one community from each of 11 matched community pairs was randomly assigned to a 4-year community-level intervention to help smokers quit smoking (85,86). The other community from each matched pair served as a control for comparison purposes. The two communities within each pair were matched for geographic location, population size, general sociodemographic factors, and estimated smoking prevalence rates. Although there were multiple objectives and analyses of COMMIT, we focus here on the effects of the intervention on the prevalence of adult cigarette smoking, one of the secondary analyses. These effects were assessed by performing telephone surveys in the 22 communities before and after the intervention, in 1988 and 1993, respectively (87). The telephone surveys used list-assisted random-digit dialing, in which blocks of 100 consecutive telephone numbers were first classified into two strata, depending on whether one or more numbers in the block were listed in residential telephone directories. Telephone numbers in blocks with residential numbers were sampled at a higher rate. The sample sizes and population sizes of the community surveys varied but averaged about 4900 and 77 000, respectively. The baseline surveys were performed before the randomization.
Table 7 shows the estimated smoking prevalence for each of the communities before and after the intervention. Averaged over the 11 intervention communities, the smoking prevalence went down 2.9 percentage points, from 24.6% to 21.6%. However, the smoking prevalence also went down in the 11 comparison communities an average of 2.7 percentage points, from 25.1% to 22.5%. The intervention offered a 0.27 percentage point advantage in the lowering of the smoking prevalence over the comparison, with a 90% confidence interval for this advantage given by (-0.74 to 1.28). One can conclude that the intervention did not have an impact on smoking prevalence beyond the general decreasing time trends. This study reinforces the importance of having a control group; without such a group, one might have incorrectly assumed that the intervention was successful in lowering smoking prevalence.
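A matched-pair comparison of this kind can be sketched as follows. The per-pair declines are hypothetical (Table 7 is not reproduced here), and the t-based interval is our assumption for illustration; the method used to obtain the interval quoted above is not stated in this section and may differ.

```python
import math

# Two-sided 90% critical value of the t distribution with 10 degrees
# of freedom (11 pairs minus 1), taken from standard tables.
T_CRITICAL_90_DF10 = 1.812


def paired_advantage_ci(intervention_declines, comparison_declines, t_critical):
    """Mean within-pair difference in prevalence decline and a
    t-based confidence interval for it."""
    diffs = [a - b for a, b in zip(intervention_declines, comparison_declines)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    half_width = t_critical * math.sqrt(variance / n)
    return mean, (mean - half_width, mean + half_width)


# Hypothetical declines in smoking prevalence (percentage points)
# for the 11 matched pairs.
intervention = [3.1, 2.5, 4.0, 1.8, 3.3, 2.9, 2.2, 3.6, 2.7, 3.0, 2.8]
comparison = [2.9, 2.6, 3.1, 2.4, 2.8, 3.0, 2.5, 2.9, 2.3, 2.7, 2.5]

advantage, interval = paired_advantage_ci(
    intervention, comparison, T_CRITICAL_90_DF10
)
print(round(advantage, 2), tuple(round(x, 2) for x in interval))
```

Working with the 11 within-pair differences, rather than the 22 communities separately, respects the matched design in which randomization was carried out within pairs.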
AVAILABILITY OF PUBLIC-USE DATA FILES
The "Appendix" section contains a list of such surveys that collect health information on a national scale. Some of these surveys are focused on health information, whereas others collect health-related information peripherally. We have not tried to restrict this list to surveys collecting cancer-related information because any health-related variable could conceivably be of interest to a cancer-related investigation. We do not claim that this list of surveys is exhaustive and suggest contacting the individual agencies for further information about their surveys; see also the FEDSTATS website (www.fedstats.gov) and the website of the Interuniversity Consortium for Political and Social Research, Institute for Social Research, University of Michigan (www.icpsr.umich.edu).
Besides the information collected in the surveys listed in the "Appendix" section, investigators also have the option of collaborating with one of the government sponsors to have certain required information collected in the future sample of one of the continuing surveys. This approach is useful only for long-term projects, since the time span for a national survey from idea for data collection to data availability can be years.
Although the focus of this review has been surveys, it is useful to note that health-related data are also available for some variables on essentially the whole population. For example, the National Center for Health Statistics makes available data files that have information on every birth and death occurring within the United States ("vital statistics"), and the Health Care Financing Administration has information available on all Medicare claims in the United States. Interested investigators should contact the relevant agencies for information about what resources are available.
LIMITATIONS OF USING SURVEY DATA FOR ANALYSES
It is sometimes possible to avoid the problem of small sample sizes sampled by health surveys in local geographic areas and subpopulations by the use of other data sources. For example, the Current Population Survey, which is designed to provide characteristics of the labor force, occasionally uses supplements that involve health issues. An example is given by the Tobacco Use Supplement that was added to 3 months of the Current Population Survey during the period from 1992 through 1993 (89). (The Current Population Survey samples approximately 60 000 households in the United States each month, which is much larger than nearly all health surveys.) As mentioned previously, an additional data source is information on vital statistics or other records that are kept on all individuals. If the research question can be answered with these resources, then the problem of small numbers can be avoided.
The availability of public-use data files from health surveys and commercial software for performing survey analyses makes it easy for investigators to address their research questions by use of survey data. However, the ability to analyze survey data should not lead one to ignore the principles of good scientific method. In particular, given the large numbers of variables recorded in a typical survey, it is easy to examine a multitude of possible associations and discover some spurious ones. In addition, except in the situation of randomized assignment of an intervention (e.g., COMMIT described above), associations found between risk factors and outcomes could theoretically always be due to confounding variables and not be of a causal nature. Finally, as is well known, statistical significance is not the same as scientific importance. With the large sample sizes of some surveys, it is possible that an association between two variables can be small, yet still be statistically significant (e.g., P<.05). Presentation of confidence intervals with estimates can help avoid misinterpretations. With these caveats in mind, we believe that there is tremendous potential in using health surveys as a resource for addressing cancer-related objectives.
APPENDIX
Agency for Health Care Policy and Research (http://www.ahcpr.gov):
Medical Expenditure Panel Survey (1996)
National Medical Care Expenditure Survey (1977)
National Medical Expenditure Survey (1987)
Bureau of Labor Statistics (http://www.bls.gov):
Consumer Expenditure Survey (1980-)
National Longitudinal Surveys (four cohorts in 1966, 1967, 1968, and 1979, with follow-up)
Bureau of the Census (http://www.census.gov):
Current Population Survey (1980-)
Centers for Disease Control and Prevention (http://www.cdc.gov):
Behavioral Risk Factor Surveillance System (1984-)
Youth Risk Behavior Surveillance Survey (1990-)
Department of Agriculture (http://www.usda.gov):
Continuing Survey of Food Intakes by Individuals (1985-1986, 1989-1991, and 1994-1996)
Diet and Health Knowledge Survey (1989-1991 and 1994-1996)
Nationwide Food Consumption Survey (periodic, starting in 1936)
Health Care Financing Administration (http://www.hcfa.gov):
Medicare Current Beneficiary Survey (1991-, with follow-up)
National Center for Health Statistics (http://www.cdc.gov/nchswww):
Hispanic Health and Nutrition Examination Survey (1982-1984)
National Ambulatory Medical Care Survey (1974-1981, 1985, and 1989-)
National Employer Health Survey (1994)
National Health Examination Survey (1959-1970)
National Health Interview Survey (1957-, with Longitudinal Study of Aging)
National Health and Nutrition Examination Surveys (periodic, starting in 1971, with follow-up of NHANES I)
National Home and Hospice Care Survey (1992-)
National Hospital Ambulatory Medical Care Survey (1992-)
National Hospital Discharge Survey (1965-)
National Maternal and Infant Health Survey (1988, with follow-up)
National Medical Care Utilization and Expenditure Survey (1980)
National Mortality Followback Survey (periodic, starting in 1961)
National Nursing Home Survey (periodic, starting in 1973)
National Natality Surveys (1963, 1964-1966, 1968-1969, 1972, and 1980)
National Survey of Ambulatory Surgery (1994-)
National Survey of Family Growth (periodic, starting in 1973)
National Survey of Personal Health Practices and Consequences (1979-1980)
National Institute on Aging/Center for Demographic Studies, Duke University (http://cds.duke.edu):
National Long Term Care Survey (1982, 1984, 1989, and 1994, with follow-up)
National Institute on Alcohol Abuse and Alcoholism (http://www.niaaa.nih.gov):
National Longitudinal Alcohol Epidemiologic Survey (1992)
National Institute of Dental and Craniofacial Research (http://www.nidr.nih.gov):
National Surveys of Oral Health (1985-1987)
National Institute on Drug Abuse (http://www.nida.nih.gov):
Monitoring the Future Study (1975-, with follow-up)
National Pregnancy and Health Survey (1992-1993)
National Institute of Mental Health/Department of Health Care Policy, Harvard Medical School (http://www.hcp.med.harvard.edu):
National Comorbidity Survey (1991)
Substance Abuse and Mental Health Services Administration (http://www.samhsa.gov):
National Household Survey on Drug Abuse (1971-)
REFERENCES |
---|
1 Brett KM, Madans JH. Use of postmenopausal hormone replacement therapy: estimates from a nationally representative cohort study. Am J Epidemiol 1997;145:536-45.[Abstract]
2 Anderson LM, May DS. Has the use of cervical, breast, and colorectal cancer screening increased in the United States? Am J Public Health 1995;85:840-2.[Abstract]
3 Breen N, Kessler L. Changes in the use of screening mammography: evidence from the 1987 and 1990 National Health Interview Surveys. Am J Public Health 1994;84:62-7.[Abstract]
4 Gilpin EA, Lee L, Evans N, Pierce JP. Smoking initiation rates in adults and minors: United States, 1944-1988. Am J Epidemiol 1994;140:535-43.[Abstract]
5 Haynes SG, Harvey C, Montes H, Nickens H, Cohen BH. Patterns of cigarette smoking among Hispanics in the United States: results from HHANES 1982-84. Am J Public Health 1990;80 Suppl:47-53.[Abstract]
6 Klevens RM, Giovino GA, Peddicord JP, Nelson DE, Mowery P, Grummer-Strawn L. The association between veteran status and cigarette-smoking behaviors. Am J Prev Med 1995;11:245-50.[Medline]
7 Pirkle JL, Flegal KM, Bernert JT, Brody DJ, Etzel RA, Maurer KR. Exposure of the US population to environmental tobacco smoke: the Third National Health and Nutrition Examination Survey, 1988 to 1991. JAMA 1996;275:1233-40.[Abstract]
8 Kutz FW, Cook BT, Carter-Pokras OD, Brody D, Murphy RS. Selected pesticide residues and metabolites in urine from a survey of the U.S. general population. J Toxicol Environ Health 1992;37:277-91.[Medline]
9 Patterson BH, Block G, Rosenberger WF, Pee D, Kahle LL. Fruit and vegetables in the American diet: data from the NHANES II survey. Am J Public Health 1990;80:1443-9.[Abstract]
10 Cotugna N, Subar AF, Heimendinger J, Kahle L. Nutrition and cancer prevention knowledge, beliefs, attitudes, and practices: the 1987 National Health Interview Survey. J Am Diet Assoc 1992;92:963-8.[Medline]
11 Eheman CR, Ford E, Garbe P, Staehling N. Knowledge about indoor radon in the United States: 1990 National Health Interview Survey. Arch Environ Health 1996;51:245-7.[Medline]
12 Calle EE, Flanders WD, Thun MJ, Martin LM. Demographic predictors of mammography and Pap smear screening in US women. Am J Public Health 1993;83:53-60.[Abstract]
13 Potosky AL, Breen N, Graubard BI, Parsons PE. The association between health care coverage and the use of cancer screening tests. Results from the 1992 National Health Interview Survey [published erratum appears in Med Care 1998;36:1470]. Med Care 1998;36:257-70.[Medline]
14 Horowitz AM, Nourjah PA. Factors associated with having oral cancer examinations among US adults 40 years of age or older. J Public Health Dent 1996;56:331-5.[Medline]
15 Engel A, Johnson ML, Haynes SG. Health effects of sunlight exposure in the United States. Results from the first National Health and Nutrition Examination Survey, 1971-1974. Arch Dermatol 1988;124:72-9.[Abstract]
16 Byrne J, Kessler LG, Devesa SS. The prevalence of cancer among adults in the United States: 1987. Cancer 1992;69:2154-9.[Medline]
17 Schatzkin A, Jones DY, Hoover RN, Taylor PR, Brinton LA, Ziegler RG, et al. Alcohol consumption and breast cancer in the epidemiologic follow-up study of the first National Health and Nutrition Examination Survey. N Engl J Med 1987;316:1169-73.[Abstract]
18 Swanson CA, Jones DY, Schatzkin A, Brinton LA, Ziegler RG. Breast cancer risk assessed by anthropometry in the NHANES I epidemiological follow-up study. Cancer Res 1988;48:5363-7.[Abstract]
19 Micozzi MS, Carter CL, Albanes D, Taylor PR, Licitra LM. Bowel function and breast cancer in US women. Am J Public Health 1989;79:73-5.[Abstract]
20 Jones DY, Schatzkin A, Green SB, Block G, Brinton LA, Ziegler RG, et al. Dietary fat and breast cancer in the National Health and Nutrition Examination Survey I Epidemiologic Follow-up Study. J Natl Cancer Inst 1987;79:465-71.[Medline]
21 Madigan MP, Ziegler RG, Benichou J, Byrne C, Hoover RN. Proportion of breast cancer cases in the United States explained by well-established risk factors. J Natl Cancer Inst 1995;87:1681-5.[Abstract]
22 Freni SC, Eberhardt MS, Turturro A, Hine RJ. Anthropometric measures and metabolic rate in association with risk of breast cancer (United States). Cancer Causes Control 1996;7:358-65.[Medline]
23 Wurzelmann JI, Silver A, Schreinemachers DM, Sandler RS, Everson RB. Iron intake and the risk of colorectal cancer. Cancer Epidemiol Biomarkers Prev 1996;5:503-7.[Abstract]
24 Funkhouser EM, Sharp GB. Aspirin and reduced risk of esophageal carcinoma. Cancer 1995;76:1116-9.[Medline]
25 Yong LC, Brown CC, Schatzkin A, Dresser CM, Slesinski MJ, Cox CS, et al. Intake of vitamins E, C, and A and risk of lung cancer. The NHANES I epidemiologic follow-up study. First National Health and Nutrition Examination Survey. Am J Epidemiol 1997;146:231-43.[Abstract]
26 Leigh JP. Occupations, cigarette smoking and lung cancer in the epidemiological follow-up to the NHANES I and the California Occupational Mortality Study. Bull N Y Acad Med 1996;73:370-97.[Medline]
27 Bourguet CC, Logue EE. Antigenic stimulation and multiple myeloma. A prospective study. Cancer 1993;72:2148-54.[Medline]
28 Reichman ME, Hayes RB, Ziegler RG, Schatzkin A, Taylor PR, Kahle LL, et al. Serum vitamin A and subsequent development of prostate cancer in the first National Health and Nutrition Examination Survey Epidemiologic Follow-up Study. Cancer Res 1990;50:2311-5.[Abstract]
29 Albanes D, Jones DY, Schatzkin A, Micozzi MS, Taylor PR. Adult stature and risk of cancer. Cancer Res 1988;48:1658-62.[Abstract]
30 Stevens RG, Jones DY, Micozzi MS, Taylor PR. Body iron stores and the risk of cancer. N Engl J Med 1988;319:1047-52.[Abstract]
31 Zonderman AB, Costa PT Jr, McCrae RR. Depression as a risk for cancer morbidity and mortality in a nationally representative sample. JAMA 1989;262:1191-5.[Abstract]
32 Albanes D, Blair A, Taylor PR. Physical activity and risk of cancer in the NHANES I population. Am J Public Health 1989;79:744-50.[Abstract]
33 Schatzkin A, Hoover RN, Taylor PR, Ziegler RG, Carter CL, et al. Serum cholesterol and cancer in the NHANES I epidemiologic follow-up study. National Health and Nutrition Examination Survey. Lancet 1987;2:298-301.[Medline]
34 Hsing AW, Nam JM, Co Chien HT, McLaughlin JK, Fraumeni JF Jr. Risk factors for adrenal cancer: an exploratory study. Int J Cancer 1996;65:432-6.[Medline]
35 Sterling TD, Rosenbaum WL, Weinkam JJ. Analysis of the relationship between smokeless tobacco and cancer based on data from the National Mortality Followback Survey. J Clin Epidemiol 1992;45:223-31.[Medline]
36 Hsing AW, Hoover RN, McLaughlin JK, Co-Chien HT, Wacholder S, Blot WJ, et al. Oral contraceptives and primary liver cancer among young women. Cancer Causes Control 1992;3:43-8.[Medline]
37 Nam JM, McLaughlin JK, Blot WJ. Cigarette smoking, alcohol, and nasopharyngeal carcinoma: a case-control study among U.S. whites. J Natl Cancer Inst 1992;84:619-22.[Abstract]
38 Chow WH, Linet MS, McLaughlin JK, Hsing AW, Chien HT, Blot WJ. Risk factors for small intestine cancer. Cancer Causes Control 1993;4:163-9.[Medline]
39 Breslow NE, Day NE. Statistical methods in cancer research. Vol. II: the design and analysis of cohort studies. International Agency for Research on Cancer. IARC Sci Publ 1987:45-6.
40 National Center for Health Statistics. Plan and operation of the Third National Health and Nutrition Examination Survey, 1988-94. Vital Health Stat Series 1 (32) 1994.
41 Korn EL, Graubard BI. Examples of differing weighted and unweighted estimates from a sample survey. Am Stat 1995;49:291-5.
42 Hoem JM. The issue of weights in panel surveys of individual behavior. In: Kasprzyk D, Duncan G, Kalton G, Singh MP, editors. Panel surveys. New York (NY): John Wiley & Sons; 1989. p. 539-65.
43 Kalton G. Modeling considerations: discussion from a survey sampling perspective. In: Kasprzyk D, Duncan G, Kalton G, Singh MP, editors. Panel surveys. New York (NY): John Wiley & Sons; 1989. p. 575-85.
44 Korn EL, Graubard BI. Analysis of large health surveys: accounting for the sampling design. J Royal Stat Soc A 1995;158:263-95.
45 Cochran WG. Sampling techniques. 3rd ed. New York (NY): John Wiley & Sons; 1977. p. 240-3.
46 Wolter KM. Introduction to variance estimation. New York (NY): Springer-Verlag; 1985.
47 Korn EL, Graubard BI. Variance estimation for superpopulation parameters. Stat Sinica 1998;8:1131-51.
48 Graves EJ. National Hospital Discharge Survey: annual summary, 1990. Vital Health Stat Series 13 (112) 1992.
49 Pfeffermann D, LaVange L. Regression models for stratified multi-stage cluster samples. In: Skinner CJ, Holt D, Smith TM, editors. Analysis of complex surveys. New York (NY): John Wiley & Sons; 1989. p. 237-60.
50 Graubard BI, Korn EL. Modelling the sampling design in the analysis of health surveys. Stat Methods Med Res 1996;5:263-81.[Medline]
51 Kovar MG, Johnson C. Design effects from the Mexican American portion of the Hispanic Health and Nutrition Examination Survey: a strategy for analysts. American Statistical Association 1986 Proceedings of the Section on Survey Research Methods; 1986. p. 396-9.
52 Aday LA. Designing and conducting health surveys. 2nd ed. San Francisco (CA): Jossey-Bass; 1996.
53 Botman SL, Jack SS. Combining National Health Interview Survey Datasets: issues and approaches. Stat Med 1995;14:669-77.[Medline]
54 Brock DB, Beckett LA, Bienias JL. Sample surveys. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Vol. 5. Chichester (U.K.): John Wiley & Sons; 1998.
55 Corder LS, Manton KG. National surveys and the health and functioning of the elderly: the effects of design and content. J Am Stat Assoc 1991;86:513-25.
56 Cox BG, Cohen SB. Methodological issues for health surveys. New York (NY): Marcel Dekker; 1985.
57 Delgado JL, Johnson CL, Roy I, Trevino FM. Hispanic Health and Nutrition Examination Survey: methodological considerations. Am J Public Health 1990;80 Suppl:6-10.[Medline]
58 Graubard BI, Fears TR, Gail MH. Effects of cluster sampling on epidemiologic analysis in population-based case-control studies. Biometrics 1989;45:1053-71.[Medline]
59 Graubard BI, Korn EL. Survey inference for subpopulations. Am J Epidemiol 1996;144:102-6.[Abstract]
60 Graubard BI, Korn EL. Predictive margins for survey data. Biometrics. In press 1999.
61 Ingram DD, Makuc DM. Statistical issues in analyzing the NHANES I Epidemiologic Followup Study. Vital Health Stat Series 2 (121) 1994.
62 Korn EL, Graubard BI. Epidemiologic studies utilizing surveys: accounting for the sampling design. Am J Public Health 1991;81:1166-73.[Abstract]
63 Korn EL, Graubard BI. Scatterplots with survey data. Am Stat 1998;52:58-69.
64 Korn EL, Graubard BI. Analysis of health surveys. New York (NY): John Wiley & Sons. In press 1999.
65 Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up for a survey: choice of the time-scale. Am J Epidemiol 1997;145:72-80.[Abstract]
66 Landis JR, Lepkowski JM, Eklund SA, Stehouwer SA. A statistical methodology for analyzing data from a complex survey: the first National Health and Nutrition Examination Survey. Vital Health Stat Series 2 (92) 1982.
67 McCarthy PJ. Replication: an approach to the analysis of data from complex surveys. Vital Health Stat Series 2 (14) 1966.
68 Rust KF, Rao JN. Variance estimation for complex surveys using replication techniques. Stat Methods Med Res 1996;5:283-310.[Medline]
69 Weinkam JJ, Rosenbaum WL, Sterling TD. Computation of relative risk based on simultaneous surveys: an alternative to cohort and case-control studies. Am J Epidemiol 1992;136:722-9.[Abstract]
70 Shah BV, Barnwell BG, Bieler GS. SUDAAN user's manual, release 7.5. Research Triangle Park (NC): Research Triangle Institute; 1997.
71 Cohen SB. An evaluation of alternative PC-based software packages developed for the analysis of complex survey data. Am Stat 1997;51:285-92.
72 Consensus Conference. Health implications of smokeless tobacco use. JAMA 1986;255:1045-8.[Medline]
73 American Cancer Society. Guidelines for the cancer-related checkup (80-1MM-Rev.2/93-No.2070-LE). Atlanta (GA): American Cancer Society; 1993.
74 Benson V, Marano MA. Current estimates from the National Health Interview Survey, 1992. Vital Health Stat Series 10 (189) 1994.
75 Lane PW, Nelder JA. Analysis of covariance and standardization as instances of prediction. Biometrics 1982;38:613-21.[Medline]
76 Korn EL, Dorey FJ. Applications of crude incidence curves. Stat Med 1992;11:813-29.[Medline]
77 Patterson BH, Bilgrad R. Use of the National Death Index in cancer studies. J Natl Cancer Inst 1986;77:877-81.[Medline]
78 Schoenborn CA, Marano M. Current estimates from the National Health Interview Survey, United States, 1987. Vital Health Stat Series 10 (166) 1988.
79 Adams PF, Benson V. Current estimates from the National Health Interview Survey, 1990. Vital Health Stat Series 10 (181) 1991.
80 Sanderson M, Scott C, Gonzalez JF. 1988 National Maternal and Infant Health Survey: methods and response characteristics. Vital Health Stat Series 2 (125) 1998.
81 World Health Organization (WHO). Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death: based on the recommendations of the ninth Revision Conference, 1975, and adopted by the 29th World Health Assembly. Geneva (Switzerland): WHO; 1977.
82 Little RJ, Rubin DB. Statistical analysis with missing data. New York (NY): John Wiley & Sons; 1987.
83 Haenszel W, Loveland DB, Sirken MG. Lung-cancer mortality as related to residence and smoking histories. I. White males. J Natl Cancer Inst 1962;28:947-1001.
84 Brinton LA, Daling JR, Liff JM, Schoenberg JB, Malone KE, Stanford JL, et al. Oral contraceptives and breast cancer risk among younger women. J Natl Cancer Inst 1995;87:827-35.[Abstract]
85 Community Intervention Trial for Smoking Cessation (COMMIT): summary of design and intervention. COMMIT Research Group. J Natl Cancer Inst 1991;83:1620-8.[Abstract]
86 Gail MH, Byar DP, Pechacek TF, Corle DK. Aspects of statistical design for the Community Intervention Trial for Smoking Cessation (COMMIT) [published erratum in Controlled Clin Trials 1992;14:253-4]. Controlled Clin Trials 1992;13:6-21.[Medline]
87 Community Intervention Trial for Smoking Cessation (COMMIT): II. Changes in adult cigarette smoking prevalence. Am J Public Health 1995;85:193-200.[Abstract]
88 Ghosh M, Rao JN. Small area estimation: an appraisal. Statistical Science 1994;9:55-93.
89 Shopland DR, Hartman AM, Gibson JT, Mueller MD, Kessler LG, Lynn WR. Cigarette smoking among U.S. adults by state and region: estimates from the current population survey. J Natl Cancer Inst 1996;88:1748-58.
Manuscript received September 28, 1998; revised February 10, 1999; accepted April 6, 1999.