1 Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
2 General Medicine Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
3 Department of Epidemiology, Harvard School of Public Health, Boston, MA.
4 Department of Environmental Health, Harvard School of Public Health, Boston, MA.
5 Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
ABSTRACT |
---|
airway obstruction; cohort studies; epidemiologic methods; lung diseases, obstructive; pulmonary disease, chronic obstructive; questionnaires
Abbreviations: CI, confidence interval; COPD, chronic obstructive pulmonary disease; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; NHANES III, Third National Health and Nutrition Examination Survey
INTRODUCTION |
---|
Although smoking is the primary cause of COPD, smoking does not explain all of the variance in the development of COPD. Only about 15-20 percent of smokers develop symptomatic COPD (6), and approximately 10 percent of COPD-related mortality occurs in persons with no history of cigarette smoking (7). Given that pulmonary function declines with age (8), smoking cessation alone may not be adequate to protect aging persons from developing COPD.
Ongoing prospective cohort studies such as the Nurses' Health Study (9) and the Health Professionals' Follow-up Study (10) potentially provide opportunities to examine nutritional and other lifestyle risk factors for COPD. These two studies have collected large quantities of data on modifiable risk factors biennially over the last 25 years. However, they include large numbers of subjects dispersed throughout the United States and are conducted by mail. Since it was infeasible to conduct personal interviews, physical examinations, and unified, directly measured spirometry on all 121,700 participants in the Nurses' Health Study, we established a questionnaire-based case definition for COPD. All participants in these cohorts have been asked about a physician diagnosis of COPD and asthma every 2 years since 1988, and separate supplemental COPD and asthma questionnaires were mailed in 1998. However, it is not known whether these health professionals reliably report a physician diagnosis of COPD.
Therefore, we conducted a validation study of a questionnaire-based case definition of COPD by reviewing the medical records of a stratified, randomly selected sample of participants reporting COPD. We hypothesized that a self-report of a physician diagnosis of COPD from a participant in the Nurses' Health Study on an original and supplemental form is a valid marker of COPD. We also tested the effect of misclassification of COPD in a hypothetical cohort proportional in size to the Nurses' Health Study.
MATERIALS AND METHODS |
---|
Follow-up and definition of cases
In 1998, a one-page electronically scannable COPD questionnaire was sent to all living nurses who reported a physician diagnosis of emphysema or chronic bronchitis from 1988 to 1996. This supplemental questionnaire requested information confirming a physician diagnosis of emphysema, chronic bronchitis, or COPD; dates of symptom onset and diagnosis; tests performed to confirm the diagnosis; symptoms consistent with a diagnosis of chronic bronchitis; and a concurrent physician diagnosis of asthma. A similar supplemental questionnaire was mailed in the same year to all living nurses who reported a physician diagnosis of asthma from 1988 to 1996. This asthma questionnaire asked about dates of asthma onset, diagnosis, remission, and recurrence; physician diagnosis of COPD; history of atopy; asthma-related hospital admission; and past systemic steroid use. Both questionnaires also included items on recent medication use, respiratory symptoms, health care utilization (hospital admissions, emergency department visits, urgent office visits), and results of spirometry in the preceding year. The supplemental questionnaires were mailed to nonresponders two additional times.
We wanted to use the contemporary clinical definition of COPD: a diagnosis of COPD, emphysema, or chronic bronchitis with evidence of airflow obstruction that is not fully reversible (11-13). Since spirometry data were not available for all participants, we defined self-reported COPD with varying levels of certainty. Using data from the supplemental questionnaire, we categorized participants who self-reported COPD into overlapping groups of "definite," "possible," and "probable" cases based on the criteria defined in table 1. Definitions were established independent of smoking status. Cases were excluded if the supplemental questionnaire showed a report of normal spirometry or comorbid pulmonary disease other than asthma. Since COPD is rarely, if ever, diagnosed before age 35 years (11) (in contrast to asthma), cases were also excluded if their reported age at COPD diagnosis was 35 years or less.
We sent these participants as many as five letters requesting authorization to release and review their medical records in 1999. For participants who provided authorization, we requested medical records consisting of two pulmonary function tests, two chest films, one chest computed tomography scan, two office or emergency department visits for COPD or asthma, and one hospital discharge summary regarding COPD or asthma. As many as three requests were sent to participants' physicians and hospitals. A trained physician (R. G. B.) reviewed all medical records, blinded to self-reported outcomes. Data were extracted using a standardized extraction form, with particular attention to spirometry records and admission notes.
Self-reported cases of disease were confirmed as COPD if spirometry was consistent with a diagnosis of COPD, if the radiologist's interpretation of a chest film or computed tomography scan included COPD or emphysema, or if an attending physician documented a diagnosis of COPD or emphysema in the medical record. Spirometry was classified according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines, which define moderate-to-severe COPD as forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC) < 0.70, with FEV1 < 80 percent predicted (13). Documentation of FEV1 ≥ 80 percent predicted or FEV1/FVC ≥ 0.70 generally excluded validation of self-reported COPD. Exceptions were made on an individual basis for participants for whom there was repeated physician documentation of COPD years after a spirometry report showing an FEV1 value of 80-99 percent predicted by nomogram or FEV1/FVC ≥ 0.70. GOLD guidelines classify mild COPD as FEV1/FVC < 0.70 and FEV1 ≥ 80 percent predicted (13). FEV1 ≥ 100 percent predicted excluded validation of self-reported COPD in all instances. A medical record notation of chronic bronchitis was considered inadequate to confirm a self-reported case in the absence of spirometry records meeting diagnostic criteria. Medical records were also reviewed systematically for alternative pulmonary diagnoses and for a physician diagnosis of asthma.
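The spirometry rules in this paragraph can be sketched as a small decision function. This is only an illustration of the thresholds as quoted here (GOLD ratio cutoff of 0.70, FEV1 cutoffs of 80 and 100 percent predicted); the function names and return labels are hypothetical, and the individual-basis exceptions described above are represented only as a label, not implemented.

```python
def classify_spirometry(fev1_pct_pred: float, fev1_fvc: float) -> str:
    """Classify one spirometry result with the GOLD thresholds quoted in the text."""
    if fev1_fvc >= 0.70:
        return "no obstruction"          # ratio at or above 0.70: not obstructed
    if fev1_pct_pred < 80.0:
        return "moderate-to-severe"      # FEV1/FVC < 0.70 with FEV1 < 80% predicted
    return "mild"                        # FEV1/FVC < 0.70 with FEV1 >= 80% predicted


def spirometry_verdict(fev1_pct_pred: float, fev1_fvc: float) -> str:
    """Hypothetical summary of how a spirometry report bears on validation."""
    if fev1_pct_pred >= 100.0:
        return "excluded"                # excluded validation in all instances
    if classify_spirometry(fev1_pct_pred, fev1_fvc) == "moderate-to-severe":
        return "consistent with COPD"
    # FEV1 >= 80% predicted or FEV1/FVC >= 0.70: generally excluded,
    # with exceptions decided case by case in the study.
    return "generally excluded (individual review)"


if __name__ == "__main__":
    for fev1, ratio in [(60.0, 0.60), (90.0, 0.65), (105.0, 0.65), (95.0, 0.75)]:
        print(fev1, ratio, classify_spirometry(fev1, ratio), "|", spirometry_verdict(fev1, ratio))
```

Spirometry was only one of three routes to confirmation in the study; a radiologist's interpretation or an attending physician's documented diagnosis could also confirm a case, and those routes are outside this sketch.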
Statistical analysis
The proportion of cases validated was calculated as confirmed cases divided by reported cases. This proportion corresponds to the positive predictive value of the questionnaire-based definition.
Because reporting of COPD on the questionnaire may be misclassified, we estimated the impact of any potential bias on relative risk estimates in a hypothetical cohort. We performed a data simulation with a hypothetical cohort of 40,000 subjects, half of whom were "exposed" (e.g., high fish oil intake) and half "unexposed" (e.g., low fish oil intake). We specified 1,000 questionnaire-based cases of COPD in this hypothetical cohort, approximating the number of prevalent cases of "possible" COPD reported in the highest and lowest quintiles of a given exposure in the Nurses' Health Study. We distributed these potentially misclassified cases according to possible observed relative risk values varying from 0.4 to 2.5 and assumed that misclassification was independent of exposure.
We then calculated the number of correctly classified cases (i.e., if misclassification were absent) using the following formula (14): A = (A* - Fp × N)/(Se + Sp - 1), where A is the number of correctly classified cases of COPD, A* is the number of questionnaire-based cases of COPD, Fp is the false-positive probability (1 - specificity), N is the number of participants within a stratum, and Se is the sensitivity and Sp the specificity of a questionnaire-based report of COPD. Unbiased relative risks and 95 percent confidence intervals were calculated from these correctly classified cases.
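As a minimal sketch of the correction formula, the function below inverts the usual misclassification relation A* = Se × A + Fp × (N - A), taking Fp as 1 - Sp. The function names and the numeric inputs are illustrative assumptions, not values from the study; the round trip simply checks that the formula recovers the true count it started from.

```python
def observed_cases(a: float, n: float, se: float, sp: float) -> float:
    """Questionnaire-based case count A* produced by misclassifying A true cases among N."""
    return se * a + (1.0 - sp) * (n - a)


def corrected_cases(a_star: float, n: float, se: float, sp: float) -> float:
    """Correctly classified case count A = (A* - Fp * N) / (Se + Sp - 1)."""
    denom = se + sp - 1.0
    if denom <= 0.0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")
    fp = 1.0 - sp                        # false-positive probability
    return (a_star - fp * n) / denom


if __name__ == "__main__":
    # Illustrative stratum: 20,000 participants, 300 true cases (hypothetical numbers).
    n, a_true, se, sp = 20_000, 300.0, 0.70, 0.995
    a_star = observed_cases(a_true, n, se, sp)
    print("observed:", a_star)
    print("corrected:", corrected_cases(a_star, n, se, sp))
```

The guard on Se + Sp - 1 reflects that the correction is undefined when the questionnaire performs no better than chance.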
The sensitivity and specificity of a report of COPD were calculated from the number of true positives and false positives observed in the validation set and by assuming a false-negative probability of 0.01 among a representative 10 percent random sample of the overall cohort. The assumption implies that 1 percent of the cohort had COPD and did not report it. Sensitivity analyses were performed to check this assumption.
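One way to read this calculation is as a 2 × 2 table built for a 10 percent sample of the cohort: true and false positives come from the validation set, false negatives are set by the assumed probability, and true negatives are the remainder. The sketch below follows that reading, which is an interpretation rather than the authors' documented procedure. The true-positive and false-positive counts are placeholders; only the cohort size of 121,700 and the 0.01 assumption come from the text.

```python
def sensitivity_specificity(tp: float, fp: float, n_sample: float, fn_prob: float = 0.01):
    """Return (Se, Sp) for a sample of size n_sample, assuming fn_prob of it are false negatives."""
    fn = fn_prob * n_sample              # assumed unreported COPD in the sample
    tn = n_sample - tp - fp - fn         # everyone else is a true negative
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return se, sp


if __name__ == "__main__":
    n_sample = 0.10 * 121_700            # 10 percent sample of the cohort
    tp, fp = 200.0, 60.0                 # placeholder validation counts (hypothetical)
    # Vary the false-negative assumption, as in a sensitivity analysis.
    for fn_prob in (0.0, 0.01, 0.10):
        se, sp = sensitivity_specificity(tp, fp, n_sample, fn_prob)
        print(f"fn_prob={fn_prob:.2f}  Se={se:.3f}  Sp={sp:.4f}")
```

Under these placeholder counts, sensitivity moves a great deal as the false-negative assumption changes, while specificity barely moves, because true negatives dominate the sample.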
Associations between proportions were tested with the chi-square test. Student's t test and one-way analysis of variance were used to compare means of normally distributed continuous variables. Two-tailed p < 0.05 was defined as statistically significant. SAS 7.0 software (SAS Institute, Inc., Cary, North Carolina) was used for analyses.
RESULTS |
---|
Response to medical record requests
We randomly selected 422 participants with self-reported COPD to achieve a 10 percent sample, and we sampled an additional 171 persons who reported chronic bronchitis alone without diagnostic symptoms. Thirteen participants who reported COPD and five in the latter group died in the interim; among the remaining 575 persons, 510 responded (89 percent). Of the respondents, 50 reconfirmed the diagnosis but refused access to their medical records, 14 neither confirmed nor denied the diagnosis and refused further participation, and 32 denied both the diagnosis and access to their medical records. Sufficient records were obtained to complete the medical record review for 376 of 414 (91 percent) participants who allowed access to their records. Therefore, medical records were available for 273 (65 percent) of self-reported cases of COPD and 103 (60 percent) persons with chronic bronchitis alone without diagnostic symptoms.
Included participants were similar to women for whom data were incomplete with respect to age and year of diagnosis of COPD. However, the former were more likely to have "definite" COPD according to the questionnaire criteria (p = 0.04) and to report a concurrent physician diagnosis of asthma (p < 0.001).
Validation
Overall, 78 percent of self-reported cases of COPD fulfilling "possible" criteria were confirmed by medical record review (table 2). The proportion confirmed increased across the questionnaire-based categories, from "possible" through "probable" to 86 percent for "definite" COPD (p for trend = 0.02). Participants who reported "definite" COPD were more likely to be confirmed to have COPD by spirometry (p < 0.001) than were other participants; mean FEV1 percent predicted was marginally lower among these participants than among all confirmed cases for whom pulmonary function test data were available (p = 0.05). Restricting the analysis to participants who reported a new diagnosis of COPD after 1988 resulted in confirmed proportions of 0.83, 0.84, and 0.90 for "possible," "probable," and "definite" COPD, respectively. Physician diagnosis alone confirmed 16 percent of "possible" cases, 13 percent of "probable" cases, and 8 percent of "definite" cases.
Unconfirmed cases
Sixty-one self-reports of COPD were not confirmed by medical record review. Fifty-seven percent of these 61 women's medical records included a diagnosis of asthma, and 20 percent had an alternative pulmonary diagnosis such as sarcoidosis or pulmonary embolism.
Thirty-two women who refused access to their medical records also denied that they had previously reported a diagnosis of COPD. However, three women in this group had previously reported a hospitalization for COPD, and 50 percent had reported an emergency room or urgent office visit for COPD in 1998. Sixty percent reported taking medication for COPD during the week before the 1998 supplemental questionnaire was administered, and the median number of pack-years of smoking at the beginning of follow-up (1988) was 40.
Effect of misclassification
In the data simulation, nondifferential misclassification of COPD of the magnitude found in the validation sample biased relative risks toward the null value, as shown in table 5. For example, if the relative risk estimated from questionnaire-based cases of COPD was 0.50 (95 percent confidence interval (CI): 0.44, 0.57), the unbiased relative risk estimated from correctly classified cases of COPD was 0.39 (95 percent CI: 0.34, 0.44) after correction for misclassification (table 5). As anticipated, the same pattern was found for harmful exposures: a questionnaire-based relative risk of 2.00 (95 percent CI: 1.76, 2.28) suggested an unbiased relative risk of 2.58 (95 percent CI: 2.27, 2.93).
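The direction of this bias can be illustrated with a forward calculation: start from true case counts, apply the same sensitivity and false-positive probability to both exposure groups, and compare relative risks. The sketch below is not the authors' simulation code. The sensitivity, false-positive probability, and true case counts are illustrative assumptions, and the confidence interval uses the standard log-scale Wald formula for a risk ratio, which may or may not be the method used in the study.

```python
import math


def rr_with_ci(a1: float, n1: float, a0: float, n0: float, z: float = 1.96):
    """Risk ratio (exposed vs. unexposed) with a log-scale Wald confidence interval."""
    rr = (a1 / n1) / (a0 / n0)
    se_log = math.sqrt(1.0 / a1 - 1.0 / n1 + 1.0 / a0 - 1.0 / n0)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def misclassify(a: float, n: float, se: float, fp: float) -> float:
    """Questionnaire-based cases after nondifferential misclassification."""
    return se * a + fp * (n - a)


if __name__ == "__main__":
    n = 20_000                           # participants per exposure group, as in the text
    # 1,000 questionnaire-based cases split to give an observed relative risk of 0.50.
    a0 = 1000.0 / 1.5
    a1 = 1000.0 - a0
    print("observed RR 0.50:", rr_with_ci(a1, n, a0, n))

    # Forward illustration with hypothetical true counts and error rates.
    true1, true0, se, fp = 200.0, 500.0, 0.65, 0.005
    obs1, obs0 = misclassify(true1, n, se, fp), misclassify(true0, n, se, fp)
    print("true RR:", rr_with_ci(true1, n, true0, n)[0])
    print("observed RR:", rr_with_ci(obs1, n, obs0, n)[0])
```

With 1,000 cases split between two groups of 20,000 at a relative risk of 0.50, this interval rounds to 0.44, 0.57, which agrees with the interval quoted above. In the forward illustration, the observed relative risk lies closer to 1 than the true one.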
DISCUSSION |
---|
Data from the Third National Health and Nutrition Examination Survey (NHANES III), a representative sample of the general US population, suggest that our validation results are not generalizable to the overall population. In NHANES III, the positive predictive value of a physician diagnosis of COPD for low lung function (according to American Thoracic Society criteria) can be calculated as 0.33 (15). In contrast, the positive predictive value in our study was 0.78 for medical record documentation of COPD overall and 0.71 for low lung function (excluding participants for whom lung function data were missing, as was done in the NHANES III analysis). These proportions improved with more refined, questionnaire-based definitions of COPD.
The difference in these results is likely due to our repeated questionnaire administration (i.e., affirmative responses were required on two questionnaires separated by 2-10 years), exclusion of COPD diagnoses prior to age 35 years, the older average age of participants, the requirement for symptoms diagnostic of chronic bronchitis if chronic bronchitis alone was reported, and a higher prevalence of COPD. In addition, since a woman is less likely than is a man with an identical medical history to receive a physician diagnosis of COPD (16, 17), women have a higher probability of having COPD if labeled with the diagnosis. Finally, health professionals are more likely than nonhealth professionals to reliably report their medical history.
Although the overall proportion of women confirmed to have COPD in our study was high, it varied by "type" of COPD. The lower proportion for chronic bronchitis (with symptoms) is probably due in part to our validation process; because we decided to adhere to the contemporary definition of COPD (11-13), we concentrated on medical record evidence of obstruction on spirometry. Therefore, women who reported chronic bronchitis and had a documented diagnosis of chronic bronchitis on their chart were not considered to have COPD unless there was also evidence of airway obstruction. Many women whose disease was not validated probably had simple, nonobstructive chronic bronchitis.
There was a large overlap of asthma and COPD in our data, similar to population-based studies of physician diagnoses of obstructive airways disease (15, 16). Although the epidemiologic definition of asthma is unclear (18), a single questionnaire item on prior diagnosis of asthma has been shown to effectively separate patients with asthmatic bronchitis from patients with purely smoking-related COPD (8). The higher proportion of cases confirmed for COPD only compared with asthma and COPD suggests a similar pattern in our data. However, stratification by order of diagnosis in the group that reported both diseases was generally unilluminating.
Lacking FEV1 data on all 121,700 members of this cohort, we adopted a diagnosis-based case definition of COPD among these nurses. Use of a physician diagnosis of COPD as an outcome has been criticized in the past as subject to bias and misclassification (19). However, common biases that might apply selectively to cases and not to other members of the Nurses' Health Study are reduced by the prospective cohort design, standardized questionnaires, consistent questionnaire administration, and very high follow-up rates.
Misclassification of the outcome is unavoidable, given the questionnaire-based design, but the bias that it produces may be predictable. Nondifferential misclassification will generally bias relative risks toward the null value (20). If the false-positive probability is known (from a validation study, for example) and if one assumes that misclassification is nondifferential and that exposure is classified correctly, then misclassification can be corrected (21). Given the false-positive probability calculated from this validation study and the assumptions above, it is possible to estimate corrected, unbiased relative risks, as we showed in the data simulation.
COPD is underdiagnosed in the general US population (15, 22); therefore, even with perfect reporting, it is likely that a sizable number of undiagnosed cases would be missed. Spirometry data for a random sample of participants in the overall cohort would be one way to quantify the number of undetected cases of COPD (false negatives), but this approach was impractical given the geographic dispersion of participants throughout the United States and contact by mail only.
Such measurement of false negatives is not necessary for correct estimation of relative risks as long as misclassification is nondifferential (21). As we showed in the sensitivity analysis, bias of relative risks from nondifferential disease misclassification is driven by false positives, not false negatives. Assuming extreme and unlikely values of false negatives, such as 0 or 10 percent of the cohort having unreported COPD, produced trivial changes in relative risks. When person-years is used as the unit of measurement, false negatives that are balanced across exposure groups have no effect on the relative risk (14). On the other hand, false negatives do bias the risk difference and are required for the calculation of sensitivity and specificity (23). Since we were primarily interested in relative risks in the person-years data, results from this study are sufficient, and it is unnecessary to sample noncases to find all false negatives.
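A short derivation may help show why false positives, rather than false negatives, drive the bias. It applies the correction formula from the statistical analysis section to each exposure group, written here in cumulative-incidence form. The subscripts 1 (exposed) and 0 (unexposed) are notation introduced for this sketch, and Se, Sp, and Fp = 1 - Sp are assumed equal in both groups (nondifferential misclassification).

```latex
\[
RR \;=\; \frac{A_1/N_1}{A_0/N_0}
   \;=\; \frac{\bigl(A_1^{*} - F_p N_1\bigr)\big/\bigl[N_1\,(Se + Sp - 1)\bigr]}
              {\bigl(A_0^{*} - F_p N_0\bigr)\big/\bigl[N_0\,(Se + Sp - 1)\bigr]}
   \;=\; \frac{A_1^{*}/N_1 - F_p}{A_0^{*}/N_0 - F_p}
\]
```

The factor Se + Sp - 1 cancels, so for a given false-positive probability the corrected relative risk does not depend on sensitivity. The corrected risk difference, by contrast, keeps that factor in its denominator and therefore does depend on sensitivity.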
In summary, the proportion of validated self-reported cases of COPD among participants in the Nurses' Health Study was substantial and sufficient to allow accurate estimation of relative risks. The use of self-reported COPD from health professionals provides an opportunity to explore hypotheses related to nutritional and lifestyle risk factors in larger population groups than would be feasible if directly measured spirometry were required to rule in or rule out COPD. With these methods, it is hoped that additional risk factors for COPD will be found, modification of which may complement smoking cessation.
ACKNOWLEDGMENTS |
---|
The authors thank Karen Corsano, Gary Chase, and Barbara Egan for their invaluable assistance in implementing the study and Drs. Meir J. Stampfer and Scott T. Weiss for their helpful comments on the manuscript.
REFERENCES |
---|