Comparison of National Death Index and World Wide Web Death Searches
Howard D. Sesso (1,2), Ralph S. Paffenbarger (1,3), and I-Min Lee (1,2)
1 Department of Epidemiology, Harvard School of Public Health, Boston, MA.
2 Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA.
3 Division of Epidemiology, Stanford University School of Medicine, Stanford, CA.
ABSTRACT
The authors used the National Death Index and a World Wide Web Internet site that searches the Social Security Administration master files of deaths to determine the mortality status of 1,000 US subjects from the College Alumni Health Study. Subjects were classified as definitely dead, possibly dead, or presumed alive. Of 246 definite deaths among men identified by the National Death Index, the World Wide Web search identified 94.7%. Of 438 men presumed alive according to the National Death Index, the World Wide Web search classified 97.5% the same way. However, the World Wide Web was not useful for identifying deaths of women. This study demonstrated that the World Wide Web may provide an alternative, inexpensive method of determining the mortality status of subjects in relatively small epidemiologic studies.
computer systems; epidemiologic methods; Internet; mortality
Abbreviations:
NDI, National Death Index; WWW, World Wide Web
INTRODUCTION
The National Center for Health Statistics in Hyattsville, Maryland, has maintained the National Death Index (NDI), the most complete source of US vital records, since 1979, with consistently high sensitivity and specificity for death ascertainment (1-12). Other US agencies also maintain death records, including the Social Security Administration (4, 8, 13, 14), Internal Revenue Service (2), Veterans Administration (2, 15), and Equifax Nationwide Death Search (9). The Internet has become an integral part of epidemiologic teaching (16-18) and research; remote literature searches, downloaded references, on-line databases and journals, and even Internet-based studies (19) are now available. Deaths can also be researched on the Internet; Social Security Administration death information is available for public access from several sites on the World Wide Web (WWW).
Previous reports suggest that the sensitivity and specificity of the NDI may be slightly higher than those of the Social Security Administration files (2, 4, 8, 14, 15). However, given the differences in cost and time required for NDI and Social Security Administration searches through the WWW, investigators may prefer one search over the other. Therefore, we compared the ability of the NDI and WWW to determine the mortality status of 1,000 subjects from the College Alumni Health Study.
MATERIALS AND METHODS
College Alumni Health Study
The College Alumni Health Study is an ongoing cohort study of men who matriculated as undergraduates at Harvard University (Cambridge, Massachusetts) between 1916 and 1950 and of men and women who matriculated as undergraduates or graduates at the University of Pennsylvania (Philadelphia, Pennsylvania) between 1928 and 1940 (20). The cohort was established when alumni returned an initial health questionnaire in either 1962 or 1966. Subsequent questionnaires have been sent periodically to update information on subjects' health habits and medical history.
For the present study, we identified all subjects who had not returned a questionnaire in 1993 and, according to our records, were not known to be deceased. We randomly selected 500 men from Harvard University who last returned a questionnaire in 1988 and 250 men and 250 women from the University of Pennsylvania who last returned a questionnaire in 1980. Sample sizes were determined arbitrarily. We then compared the ability of the NDI and WWW to identify the mortality status of these 1,000 subjects.
National Death Index
The NDI is a computerized list of death records in the United States compiled by the National Center for Health Statistics through contractual agreements with state vital statistics offices. The database includes deaths since 1979 and is updated with approximately 2 million deaths annually from all 50 US states, the District of Columbia, Puerto Rico, and the Virgin Islands. All data for a given calendar year are added to the NDI approximately 1 year later. At the time of this study, the NDI death records were complete through the end of 1996. NDI users submit information on as many as 12 potential matching variables, from which returned records are ranked on the basis of a probabilistic scoring mechanism to determine the likelihood of a true match. The user then must decide which NDI records may be associated with the subjects in question (21).
World Wide Web
On the WWW, the Internet site http://www.ancestry.com provides free access to the death master file created from Social Security Administration payment records (22). When our study was conducted, the current master file contained over 59.7 million records through the end of June 1998. A person is included in the Social Security Administration death master file if his or her lump sum benefit was paid as a result of a request from, for example, a family member, an attorney, or a mortuary. To be eligible for such benefits, a person must have worked for at least 10 years and paid into the social security system. Additionally, some persons (e.g., federal employees hired before 1984, certain state and local employees, and railroad employees serving for more than 10 years) who contribute voluntarily to social security are also included in the Social Security Administration death file. Users of the WWW site submit information on as many as six potential matching variables, from which records are returned if a possible match is made on one or more of the variables. The user then must decide which, if any, WWW records represent the subjects in question.
Comparison study
By using the NDI and WWW, we searched for possible deaths after January 1, 1988, of Harvard alumni and after January 1, 1980, of University of Pennsylvania alumni. No social security numbers were available. The following identifiers were available for matching: first name, last name, full date of birth, sex (none missing), middle initial (23.8 percent missing or blank), and last known state of residence (36.8 percent missing). Identifiers sent to the NDI included all of these variables. Identifiers on the WWW site included first and last name, last known state of residence, and date of birth.
We tracked all information about each search, including number of hours worked, costs, and total calendar time dedicated. For both the NDI and WWW, matches on all provided identifiers were considered definite. Although we had no data on fathers' surnames for the 250 women, a woman whose last name matched the NDI father's surname was considered a definite match as long as all other identifiers matched. Possible matches agreed on all fields except for a minor mismatch on a single field (e.g., date of birth differed by 1 month, day, or year). We presumed that remaining subjects without definite or possible matches were alive.
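The classification rule described above can be sketched in code. This is only an illustrative reading of the protocol, not the procedure the authors ran: the `Person` record, the function names, and the interpretation of a "minor mismatch" as exactly one date-of-birth component differing by 1 are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding the identifiers used on the WWW site."""
    first_name: str
    last_name: str
    birth_date: date
    state: Optional[str] = None  # last known state of residence; may be missing


def off_by_one(a: date, b: date) -> bool:
    """Assumed 'minor mismatch': exactly one of year, month, day differs, by 1."""
    deltas = [abs(a.year - b.year), abs(a.month - b.month), abs(a.day - b.day)]
    return sorted(deltas) == [0, 0, 1]


def compare(subject: Person, record: Person) -> str:
    """Compare one subject with one returned death record."""
    names_match = (
        subject.first_name.lower() == record.first_name.lower()
        and subject.last_name.lower() == record.last_name.lower()
    )
    # A missing state on the subject is not counted as a mismatch (assumption).
    state_match = subject.state is None or subject.state == record.state
    if not (names_match and state_match):
        return "none"
    if subject.birth_date == record.birth_date:
        return "definite"
    if off_by_one(subject.birth_date, record.birth_date):
        return "possible"
    return "none"


def classify(subject: Person, records: Iterable[Person]) -> str:
    """Return the study's three-level mortality status for one subject."""
    results = {compare(subject, r) for r in records}
    if "definite" in results:
        return "definitely dead"
    if "possible" in results:
        return "possibly dead"
    return "presumed alive"
```

Under these assumptions, a subject with no returned records, or with records that disagree on more than the single tolerated field, falls through to "presumed alive", which mirrors the default stated in the text.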
This paper presents data separately for the 750 men and 250 women. In our analyses, we compared results from the Social Security Administration search with the NDI search, our presumed "gold standard."
RESULTS
The NDI search took 3 months to complete (from data set creation to tabulation of results) compared with 3 weeks for the WWW search (table 1). The total costs, excluding personnel time incurred by the College Alumni Health Study, were $3,050 for the NDI and $0 for the WWW (the hardware and software required for searching the WWW were already in place). The NDI identified 291 definite deaths, 104 possible deaths, and 605 subjects presumed alive through 1996; the WWW identified 275 definite deaths, 71 possible deaths, and 654 subjects presumed alive through 1998. The NDI and WWW agreed on 229 definite deaths and on 561 subjects presumed alive. The WWW identified 27 definite deaths after January 1, 1997, which the NDI was unable to identify when the records were searched.
TABLE 1. Comparison of characteristics related to National Death Index and World Wide Web searches conducted to determine the mortality status of 1,000 subjects from the College Alumni Health Study
The results for men are shown in table 2. Of 246 definite deaths classified by the NDI, the WWW data agreed on 216 (87.8 percent) of them and classified another 18 subjects (7.3 percent) as possibly dead. Of these 234 subjects, 233 matched on the date of death. The WWW therefore identified up to 233 (94.7 percent) of 246 definite NDI deaths. Among the 438 men presumed alive according to the NDI, the WWW agreed on 400 (91.3 percent) of them. However, the WWW classified 27 additional men as definitely (n = 25) or possibly (n = 2) dead after 1996, beyond the time period searched by the NDI. Therefore, the WWW correctly identified 427 (97.5 percent) of 438 men presumed alive according to NDI data. Cross-classifications according to school did not appreciably alter the results for men.
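The percentages reported for men follow directly from the counts in the paragraph above. The short sketch below recomputes them; the variable and function names are illustrative only, and the counts are those stated in the text.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


# Counts for men, taken from the text.
NDI_DEFINITE = 246         # definite deaths according to the NDI
WWW_DEFINITE = 216         # of those, classified as definitely dead by the WWW
WWW_POSSIBLE = 18          # of those, classified as possibly dead by the WWW
WWW_DATE_MATCHED = 233     # definite or possible WWW matches agreeing on date of death
NDI_ALIVE = 438            # men presumed alive according to the NDI
WWW_ALIVE = 400            # of those, presumed alive by the WWW
WWW_DEAD_AFTER_1996 = 27   # WWW deaths after the period covered by the NDI

summary = {
    "definite_agreement": percent(WWW_DEFINITE, NDI_DEFINITE),
    "possible": percent(WWW_POSSIBLE, NDI_DEFINITE),
    "deaths_identified": percent(WWW_DATE_MATCHED, NDI_DEFINITE),
    "alive_agreement": percent(WWW_ALIVE, NDI_ALIVE),
    # Deaths after 1996 could not have been found by the NDI, so they are
    # counted as consistent with the NDI's "presumed alive" status.
    "alive_identified": percent(WWW_ALIVE + WWW_DEAD_AFTER_1996, NDI_ALIVE),
}

for name, value in summary.items():
    print(f"{name}: {value} percent")
```

The last entry reflects the adjustment described in the text: men whom the WWW found to have died after 1996 are treated as correctly classified relative to the NDI.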
TABLE 2. Comparison of National Death Index and World Wide Web searches to determine the mortality status of men from the College Alumni Health Study*
The results of the NDI and WWW searches for women are compared in table 3. The NDI classified 45 definite deaths. However, the WWW agreed on only 13 (28.9 percent) of them and classified 2 (4.4 percent) other subjects as possibly dead. Of these 15 subjects, 14 matched on the date of death. Overall, the WWW identified 14 (31.1 percent) of 45 definite NDI deaths. Of the 167 women presumed alive according to NDI data, the WWW agreed on 161 (96.4 percent). In addition, the WWW identified 3 women as definitely or possibly dead after the time period searched by the NDI. Therefore, the WWW correctly identified 164 (98.2 percent) of 167 women presumed alive according to the NDI.
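For reference, the proportions for women can be written out as a worked calculation from the counts given above; nothing here goes beyond the numbers in the text.

```latex
\begin{align*}
\text{definite agreement} &= \frac{13}{45} \approx 28.9\% \\
\text{possibly dead} &= \frac{2}{45} \approx 4.4\% \\
\text{deaths identified (date of death matched)} &= \frac{14}{45} \approx 31.1\% \\
\text{presumed alive, agreement} &= \frac{161}{167} \approx 96.4\% \\
\text{presumed alive, including deaths after 1996} &= \frac{161 + 3}{167} = \frac{164}{167} \approx 98.2\%
\end{align*}
```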
TABLE 3. Comparison of National Death Index and World Wide Web searches to determine the mortality status of women from the College Alumni Health Study*
DISCUSSION
This study demonstrated that the WWW may provide an alternative method to determine the mortality status of subjects in epidemiologic studies. We found that the NDI and WWW search results were comparable for men. The WWW identified 94.7 percent of the definite deaths and 97.5 percent of the men classified as presumed alive according to NDI data. However, the NDI appears to be a better strategy for determining the mortality status of women; the WWW identified only 31.1 percent of the women whom the NDI considered definitely dead.
Death searches on the WWW site using the Social Security Administration death master file may be conducted at virtually no cost and in less time than required for an NDI search (table 4). This is particularly true when the number of records to be searched is small, as it was in this study; for larger numbers of people, the WWW may be less efficient in terms of time and personnel costs. Although the amount of work needed to prepare the data set and evaluate the results is approximately equal for both NDI and WWW searches, NDI searches take longer since the NDI application must be approved and data then forwarded to NDI for processing. The WWW database also receives continuous updates from the Social Security Administration death master file and includes more recent deaths than the NDI does. Although not relevant to our study, the Social Security Administration also provides information on deaths that occurred before 1979 (2); the NDI does not.
However, the NDI has several advantages over the WWW site (table 4), including greater flexibility when conducting searches. The NDI allows matching on several identifiers that the WWW does not. In our NDI results, the information on father's surname improved our ability to identify deaths of women, whereas the WWW site had no such information. Consequently, the WWW search missed many definite deaths of women, a finding unlikely to have improved even if our sample size had been larger. When NDI searches are conducted, matching on the middle initial may also improve results. The NDI further permits matching on race, marital status, and state of birth. Additionally, NDI Plus is an optional service that provides the coded causes of death for the highly probable matches (21).
Some limitations of the present study should be considered. First, we created an a priori protocol by which to evaluate the NDI and WWW results. We expected a small degree of misclassification, particularly regarding possible deaths. However, when the NDI and WWW searches were conducted, only 10.4 and 7.1 percent of subjects, respectively, were classified as possibly dead. Second, we found weaker results for women when we used the WWW site. This finding might have been due to the lack of sex as a matching variable for WWW searches or because a large proportion of women may not be included in Social Security Administration records because of eligibility restrictions. Third, College Alumni Health Study data did not contain social security numbers, which improve the sensitivity and specificity of the NDI and Social Security Administration in identifying deaths (3-5, 11). This information could have increased the number of deaths identified by the NDI and WWW. Again, this problem would affect a smaller number of men than women and to similar degrees in NDI and WWW searches. Fourth, our promising results regarding the WWW may reflect only higher rates of death ascertainment from Social Security Administration records for men but not for women in the present study of older persons. Finally, we assumed that the NDI was our gold standard; we did not select subjects on the basis of their confirmed mortality status.
The Internet and WWW have become integral components of epidemiology and public health for research and dissemination of information (22). For older men in the present study, we found that the WWW site, which uses Social Security Administration death information, provides results comparable to the NDI when ascertaining mortality status. However, the WWW does not appear useful in determining this information about women. Depending on available resources, size and type of population, and available data, the WWW could be a valid source of mortality information.
ACKNOWLEDGMENTS
This study was supported by grants CA-44854 and HL-34174 from the National Cancer Institute and the National Heart, Lung, and Blood Institute, Bethesda, Maryland. Dr. Sesso is supported by institutional training grant HL-07575 from the National Heart, Lung, and Blood Institute.
This is report no. LXXV in a series on chronic disease in former college students.
The authors thank Sarah Freeman, Doris Rosoff, and Jawado Sinue for their invaluable assistance in completing this project.
REFERENCES
1. Acquavella JF, Donaleski D, Hanis NM. An analysis of mortality follow-up through the National Death Index for a cohort of refinery and petrochemical workers. Am J Ind Med 1986;9:181-7.
2. Boyle CA, Decouflé P. National sources of vital status information: extent of coverage and possible selectivity in reporting. Am J Epidemiol 1990;131:160-8.
3. Calle EE, Terrell DD. Utility of the National Death Index for ascertainment of mortality among Cancer Prevention Study II participants. Am J Epidemiol 1993;137:235-41.
4. Curb JD, Ford CE, Pressel S, et al. Ascertainment of vital status through the National Death Index and the Social Security Administration. Am J Epidemiol 1985;121:754-66.
5. Davis KB, Fisher L, Gillespie MJ, et al. A test of the National Death Index using the Coronary Artery Surgery Study (CASS). Control Clin Trials 1985;6:179-91.
6. Edlavitch SA, Feinleib M, Anello C. A potential use of the National Death Index for postmarketing drug surveillance. JAMA 1985;253:1292-5.
7. Edlavitch SA, Baxter J. Comparability of mortality follow-up before and after the National Death Index. Am J Epidemiol 1988;127:1164-78.
8. Kraut A, Chan E, Landrigan PJ. The costs of searching for deaths: National Death Index vs Social Security Administration. Am J Public Health 1992;82:760-1.
9. Rich-Edwards JW, Corsano KA, Stampfer MJ. Test of the National Death Index and Equifax Nationwide Death Search. Am J Epidemiol 1994;140:1016-19.
10. Stampfer MJ, Willett WC, Speizer FE, et al. Test of the National Death Index. Am J Epidemiol 1984;119:837-9.
11. Williams BC, Demitrack LB, Fries BE. The accuracy of the National Death Index when personal identifiers other than Social Security number are used. Am J Public Health 1992;82:1145-7.
12. LaVeist TA, Diala C, Torres M, et al. Vital status in the National Panel Survey of Black Americans: a test of the National Death Index among African Americans. J Natl Med Assoc 1996;88:501-5.
13. Schnorr TM, Steenland K. Identifying deaths before 1979 using the Social Security Administration Death Master File. Epidemiology 1997;8:321-3.
14. Wentworth DN, Neaton JD, Rasmussen WL. An evaluation of the Social Security Administration master beneficiary record file and the National Death Index in the ascertainment of vital status. Am J Public Health 1983;73:1270-4.
15. Fisher SG, Weber L, Goldberg J, et al. Mortality ascertainment in the veteran population: alternatives to the National Death Index. Am J Epidemiol 1995;141:242-50.
16. Dean AG, Shah SP, Churchill JE. DoEpi. Computer-assisted instruction in epidemiology and computing and a framework for creating new exercises. Am J Prev Med 1998;14:367-71.
17. Joffres MR, LaPorte RE. Bringing epidemiology manuals and books onto the Internet through the Epilink. Am J Epidemiol 1998;147:325-9.
18. Macfarlane SB, Cuevas LE, Moody JB, et al. Epidemiology training for primary health care: the use of computer-assisted distance learning. J R Soc Health 1996;116:317-21.
19. Soetikno RM, Mrad R, Pao V, et al. Quality-of-life research on the Internet: feasibility and potential biases in patients with ulcerative colitis. J Am Med Inform Assoc 1997;4:426-35.
20. Paffenbarger RS Jr, Lee IM, Wing AL. The influence of physical activity on the incidence of site-specific cancers in college alumni. Adv Exp Med Biol 1992;322:7-15.
21. National Center for Health Statistics. National Death Index user's manual. Hyattsville, MD: US Department of Health and Human Services and Centers for Disease Control, 1997. (Publication no. 7-0810).
22. Laporte RE, Barinas E, Chang YF, et al. Global epidemiology and public health in the 21st century. Applications of new technology. Ann Epidemiol 1996;6:162-7.
Received for publication December 28, 1998.
Accepted for publication September 14, 1999.