1 Department of Preventive Medicine and Community Health, Division of Sociomedical Sciences, University of Texas Medical Branch, Galveston, TX.
2 Sealy Center on Aging, University of Texas Medical Branch, Galveston, TX.
3 Department of Preventive Medicine and Community Health, Division of Epidemiology and Biostatistics, University of Texas Medical Branch, Galveston, TX.
4 Department of Internal Medicine, Division of Geriatrics, University of Texas Medical Branch, Galveston, TX.
5 Center for Immigration Research, University of Houston, Houston, TX.
Received for publication June 18, 2003; accepted for publication October 2, 2003.
ABSTRACT |
---|
bias (epidemiology); databases; Hispanic Americans; longitudinal studies; mortality; vital statistics
Abbreviations: EPESE, Established Populations for Epidemiologic Studies of the Elderly; NDI, National Death Index; NHIS, National Health Interview Survey.
INTRODUCTION |
---|
Several substantive explanations for the Hispanic mortality advantage in the United States have been proposed, including health-selective immigration, return migration, and advantages in health-related behaviors and social support (10–14). However, before accepting such substantive explanations for the Hispanic mortality advantage, it is important to explore whether systematic data errors create the appearance of a mortality advantage. Two kinds of evidence exist for a Hispanic mortality advantage compared with other groups: 1) mortality rates calculated by linking vital registration death data to census population counts and 2) mortality rates calculated from cohort studies in which mortality is ascertained over a follow-up period. Both kinds of evidence are subject to data errors that might lead to misstatement of a Hispanic mortality advantage.
Mortality rates calculated by linking death counts from vital registration to population counts from census enumerations are subject to error because the numerator and denominator come from different sources. Both data systems are subject to coverage errors (i.e., incomplete enumeration and death registration), discrepancies in ethnic classification, and age misstatement. Each of these errors can produce systematic biases in the calculation of death rates that could create a false appearance of ethnic disparities (15).
Hispanics, like other minority populations, are disproportionately exposed to several risk factors for both underregistration of deaths and underenumeration in a census. These factors include low levels of formal schooling and English language literacy; residency in poor, rural, and near-border environments; overrepresentation in agricultural and domestic service occupations; and undocumented immigration status. It is not clear a priori whether the undercount error will be larger in the numerator (deaths) or the denominator (census). However, one particular concern is that health-selective return migration of international immigrants to a country of origin may lead to underestimation of mortality rates in the United States by systematically removing report of deaths from the numerator but not the denominator of mortality calculations (10, 13, 14, 16). This possibility has been called the "salmon bias" hypothesis (12). Another especially important concern is that Hispanic ethnicity may be systematically underreported in death registration compared with census data. This underreporting may occur in particular because the ethnicity field on the death certificate is sometimes filled out by a physician or funeral director who may not know the decedent well (15, 17). In addition, overstatement of age on the death certificate lowers age-adjusted mortality estimates (18), which may occur disproportionately among Hispanics because of their relatively low levels of formal schooling and potentially limited access to a birth certificate. All of these concerns cumulate to raise questions about comparison of Hispanic and non-Hispanic mortality using vital registration death rates alone.
The second kind of data documenting a Hispanic mortality advantage comes from studies that ascertain vital status for large cohorts over a period of follow-up. Data from cohort studies seemingly provide powerful corroborating evidence about the Hispanic mortality advantage because they eliminate concerns about inconsistent ethnic classification in census and vital registration and reduce concerns about age misstatement. Hispanic identification is fixed by self-report at the beginning of the study, so inconsistency of reporting ethnicity at subsequent points does not affect calculated mortality rates. Two important studies have used this design. The National Longitudinal Mortality Study reported mortality follow-up for select cohorts from the Current Population Survey. A second study has also ascertained mortality for nine successive cohorts (1986–1994) from the National Health Interview Survey (NHIS). Both data sets showed mortality advantages for Hispanics compared with non-Hispanic Whites (1, 2, 4).
However, cohort studies are vulnerable to a potential bias resulting from underascertainment of mortality during the follow-up period. Mortality follow-up for both the National Longitudinal Mortality Study and the NHIS was accomplished by linking data on survey respondents to mortality records in the National Death Index (NDI). The NDI matching process uses the following information to identify cases: first and last names, middle initial, father's surname, Social Security number, date of birth, state of birth, state of residence, sex, race, marital status, and age at death (19). Researchers have developed probabilistic approaches that classify cohort members as matched or unmatched to death records based on the level of agreement and disagreement between survey and death record information (20). This matching process has been carefully validated by using calibration samples from studies with an active follow-up of vital status (21). Both the NDI database and the matching method used for linkages between survey respondents and NDI records are considered the "gold standard" for ascertainment of mortality for a community cohort (22).
Important questions remain, however, about the accuracy of death information when the NDI is used for Hispanic cohorts. Emigration from the United States presents potential problems for NDI-based studies because deaths outside of the country are not included in the NDI database. Even when a death occurs in the United States, automated matching algorithms may not work as well for Hispanics as they do for other groups, especially non-Hispanic Whites. There are several reasons for this concern. The NDI matching method relies heavily on Social Security number matches. Use and accuracy of Social Security numbers may be lower both in survey responses and on death certificates for segments of the Hispanic population that are undocumented or whose work experience is primarily in the informal agriculture or domestic service sectors. Name matches may also be less reliable for Hispanics than for non-Hispanic Whites because Hispanic naming practices differ from non-Hispanic White conventions in several ways that can affect how names are reported both to survey researchers and on the death certificate. For example, many Hispanics use both father's and mother's surnames as part of their name. By custom, at marriage, the bride adds her husband's surname while retaining her parents' surnames. For many Hispanics, the first surname listed is considered the primary surname, in contrast to the conventional US emphasis placed on "last name." Identifying a single "middle name" may also be difficult for some Hispanics because this term may refer to one of several given names or one of several family surnames. Anglicized name variants, for example, "Mary" rather than "Maria," may also be used in giving information to non-Hispanics (23).
A lower NDI match rate for Hispanics for any of these reasons would increase the appearance of a Hispanic mortality advantage. Unfortunately, performance of the NDI matching algorithm has not been investigated for Hispanics because the calibration samples used thus far have contained few Hispanic members (24). These issues have contributed to continuing debate about the existence and magnitude of the Hispanic mortality advantage (11, 16).
The purpose of this study was to determine empirically whether bias exists in Hispanic mortality estimates based on an NDI search for a large cohort of older Mexican Americans. Data from the Hispanic Established Populations for Epidemiologic Studies of the Elderly (EPESE) were used for this investigation. The Hispanic EPESE collected identifying information for all 12 items used in the NDI matching process, making it possible to reproduce closely the matching methods used in other population surveys linked to the NDI. Additionally, the Hispanic EPESE includes 7 years of active follow-up of vital status through interviews with subjects or reports from proxy informants. This combination of vital status information from proxy reports and the NDI makes the Hispanic EPESE a unique data set for evaluating mortality information for Hispanics.
We performed three tasks in this study. First, we compared vital status information from proxy informants and NDI matches. Second, we examined discrepancies in mortality ascertainment by age, sex, and nativity status. Finally, we derived adjustment factors for underascertainment from the Hispanic EPESE and applied them to 1986–1994 NHIS–NDI linked data to illustrate the impact of these adjustment factors on estimates of the Hispanic mortality advantage.
MATERIALS AND METHODS |
---|
The first vital status information for the Hispanic EPESE cohort was collected during each of the three follow-up interview waves. If a subject was missing at follow-up, then proxy respondents were asked whether the subject had died, had moved, or had been admitted to a hospital, nursing home, or hospice facility. Proxy informants who reported that the subject was deceased were interviewed by using a death questionnaire that asked the date, location, and cause of death as well as questions about hospital or nursing home admissions prior to the subject's death. The overwhelming majority (86 percent) of proxy informants were family members of the missing subject. In 164 instances, interviews with proxy informants were not completed for subjects lost over the follow-up period (i.e., neither the subject nor a proxy informant was reinterviewed after baseline). Therefore, data were analyzed for 2,886 subjects, with vital status ascertained through interviews with subjects or proxy informants. Excluded subjects for whom vital status information was missing were significantly younger (mean age, 69.2 years; standard deviation, 5.2 years) than those subjects included in the data analysis (mean age, 73.3 years; standard deviation, 6.8 years), although no differences were observed by sex and nativity.
Vital status information obtained from interviews with subjects or proxy informants (hereafter referred to as "proxy reports") was compared with information from a match of the Hispanic EPESE cohort to death records in the NDI database through a search performed in 2002. Deaths through December 31, 2000, were ascertained. The system used by the National Center for Health Statistics (Hyattsville, Maryland) to match the NHIS to the NDI was used to identify deaths of Hispanic EPESE subjects (26). This system involves a two-step process. In the first step, NDI records are identified as potential matches to subjects when one of nine criteria is met (formerly 12 criteria were used by the NDI, but, in 1999, four of the Social Security number–related criteria were collapsed into a single Social Security number match criterion). The nine criteria are various combinations of agreement between supplied survey information and NDI records on seven matching items. For a given subject, one or more NDI records may be identified as potential matches; for other subjects, no potential matches are identified. In the second step, potentially matched records are evaluated to determine whether one particular record can be identified as a "true" match to the survey subject. In this step, statistical weights derived from calibration samples provide investigators with probability scores to evaluate potential matches. All potential matches are categorized into mutually exclusive classes based on which items matched and the combination of matched items. Class-specific cutoffs of probability scores are then used to identify "true" matches (21). Using the NHIS-NDI cutoff scores, we identified 753 deaths among the 2,886 subjects in the Hispanic EPESE study sample.
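The two-step logic described above (screening for potential matches, then scoring them against class-specific cutoffs) can be illustrated with a minimal sketch. The screening criteria, agreement weights, match classes, and cutoff values below are hypothetical placeholders chosen for illustration; they are not the NDI or National Center for Health Statistics values, which come from calibration samples.

```python
# Hypothetical agreement/disagreement weights and cutoffs (NOT the NDI values).
AGREE = {"ssn": 10.0, "last_name": 4.0, "first_name": 3.0, "birth_year": 2.0, "sex": 0.5}
DISAGREE = {"ssn": -4.0, "last_name": -3.0, "first_name": -2.0, "birth_year": -2.0, "sex": -3.0}
CUTOFF = {"ssn_match": 8.0, "no_ssn_match": 9.0}


def ssn_agrees(subject, record):
    return subject["ssn"] is not None and subject["ssn"] == record["ssn"]


def is_candidate(subject, record):
    """Step 1: keep a death record if any (hypothetical) screening criterion is met."""
    name_and_birth = (subject["last_name"] == record["last_name"]
                      and subject["birth_year"] == record["birth_year"])
    return ssn_agrees(subject, record) or name_and_birth


def score(subject, record):
    """Sum agreement and disagreement weights over items present on both sides."""
    total = 0.0
    for item in AGREE:
        if subject[item] is None or record[item] is None:
            continue  # a missing item contributes nothing in this sketch
        total += AGREE[item] if subject[item] == record[item] else DISAGREE[item]
    return total


def best_match(subject, records):
    """Step 2: return the highest-scoring candidate that meets its class cutoff."""
    best = None
    for record in records:
        if not is_candidate(subject, record):
            continue
        match_class = "ssn_match" if ssn_agrees(subject, record) else "no_ssn_match"
        value = score(subject, record)
        if value >= CUTOFF[match_class] and (best is None or value > best[0]):
            best = (value, record)
    return None if best is None else best[1]


subject = {"ssn": None, "last_name": "GARCIA", "first_name": "MARIA",
           "birth_year": 1915, "sex": "F"}
anglicized = {"ssn": "000-00-0000", "last_name": "GARCIA", "first_name": "MARY",
              "birth_year": 1915, "sex": "F"}
exact = {"ssn": "000-00-0000", "last_name": "GARCIA", "first_name": "MARIA",
         "birth_year": 1915, "sex": "F"}
```

In this toy setup, a subject without a Social Security number whose death record carries an anglicized first name falls below the cutoff, whereas the record with an exactly matching name is accepted.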
To evaluate the implications of the findings from the Hispanic EPESE for mortality differentials between Hispanics and non-Hispanic Whites, adjustment factors calculated from the Hispanic EPESE data were applied to mortality information estimated from the 1986–1994 NHIS cohort data that were linked to the NDI. The NHIS is an annual cross-sectional survey of approximately 75,000 persons, representative of the noninstitutionalized US adult population. Vital status searches for subjects from these surveys were performed through December 31, 1997, using the NDI. To provide survival time comparable to that in the Hispanic EPESE (7.5 years), data on subjects from the 1986–1991 NHIS cohorts were eligible for analysis. Data for all 66,667 non-Hispanic White and Mexican-American (self-identified as Mexican/Mexicano, Mexican American, or Chicano) subjects aged 65 years or older were used in the analysis.
Data analysis
To compare the vital status of Hispanic EPESE subjects by proxy reports and the NDI, both simple agreement and the kappa statistic were calculated. Stratified analyses of mortality by age, sex, and nativity were also performed to investigate systematic differences in death ascertainment as a function of each of these characteristics. For each vital status source, logistic regression models were used to examine univariate associations of vital status with age, sex, and nativity. Multivariate proportional hazards models were also estimated to determine the effects of demographic factors on survival. Odds ratio and hazard ratio effect sizes were compared across each vital status source.
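As an illustration of the agreement measures named above, the following minimal sketch computes simple agreement and Cohen's kappa from a 2 × 2 cross-classification of vital status by the two sources. The function name and the cell counts in the usage line are hypothetical and are not taken from the Hispanic EPESE data.

```python
def agreement_and_kappa(both_dead, proxy_only, ndi_only, both_alive):
    """Simple agreement and Cohen's kappa for two binary vital status sources.

    both_dead  : dead according to proxy report and NDI
    proxy_only : dead according to proxy report only
    ndi_only   : dead according to NDI only
    both_alive : alive according to both sources
    """
    n = both_dead + proxy_only + ndi_only + both_alive
    observed = (both_dead + both_alive) / n

    # Marginal proportions classified as dead by each source.
    proxy_dead = (both_dead + proxy_only) / n
    ndi_dead = (both_dead + ndi_only) / n

    # Agreement expected by chance if the two sources were independent.
    expected = proxy_dead * ndi_dead + (1 - proxy_dead) * (1 - ndi_dead)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for illustration only.
observed, kappa = agreement_and_kappa(40, 10, 5, 45)
```

Kappa discounts the agreement that two sources would reach by chance alone, which matters here because most subjects are alive according to both sources.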
To assess the magnitude of death underascertainment bias, ratios of proxy-reported to NDI mortality rates as well as 95 percent confidence intervals were calculated using the Hispanic EPESE data. To determine how this level of underascertainment would affect estimates of mortality for Mexican Americans in comparison to non-Hispanic Whites, these mortality ratios were then applied as adjustment factors to 1986–1994 NHIS Mexican-American (subjects aged ≥65 years) mortality rates. Multiplying the stratum-specific mortality ratio by the respective NHIS observed mortality rate provided an adjusted mortality rate. This methodology has been used previously to adjust mortality data for inconsistent race/ethnicity classifications of Native Americans in the United States and Māori in New Zealand (27–29).
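The adjustment arithmetic can be sketched as follows, assuming that each mortality rate is expressed as deaths per person-year within a stratum. The function names and all numbers in the usage lines are hypothetical; the confidence interval calculation is not reproduced here because the method used for it is not described in this section.

```python
def mortality_rate(deaths, person_years):
    """Crude mortality rate: deaths per person-year of follow-up."""
    return deaths / person_years


def adjustment_factor(proxy_deaths, proxy_person_years, ndi_deaths, ndi_person_years):
    """Ratio of the proxy-reported mortality rate to the NDI mortality rate."""
    proxy_rate = mortality_rate(proxy_deaths, proxy_person_years)
    ndi_rate = mortality_rate(ndi_deaths, ndi_person_years)
    return proxy_rate / ndi_rate


def adjusted_rate(observed_rate, factor):
    """Stratum-specific observed rate multiplied by its adjustment factor."""
    return observed_rate * factor


# Hypothetical stratum: 120 proxy-reported vs. 100 NDI deaths over 2,000 person-years.
factor = adjustment_factor(120, 2000, 100, 2000)
nhis_adjusted = adjusted_rate(0.04, factor)
```

A factor above 1 indicates that the NDI search identified fewer deaths than proxy reports in that stratum, so the corresponding NDI-based rate is scaled upward.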
Age standardizations for observed and adjusted mortality rates were made in the NHIS data by using the direct method. Standardized rates were prepared for each sex within each ethnic group (Mexican Americans and non-Hispanic Whites). All subjects aged 65 years or older in the entire 1986–1997 NHIS data set were considered the standard population. Age-standardized mortality rate ratios of NHIS Mexican Americans to non-Hispanic Whites were calculated with 95 percent confidence intervals by using the Mexican-American (exposed group) distribution as the reference standard (30).
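Direct standardization weights each age-specific rate by the share of the standard population in that age group. The sketch below shows the calculation under assumed inputs; the age bands, rates, and standard population counts are hypothetical and do not come from the NHIS.

```python
def directly_standardized_rate(stratum_rates, standard_counts):
    """Weighted average of age-specific rates, weighted by the standard population."""
    total = sum(standard_counts.values())
    weighted = sum(stratum_rates[age] * standard_counts[age] for age in standard_counts)
    return weighted / total


# Hypothetical age-specific rates (deaths per person-year) and standard population.
rates = {"65-74": 0.02, "75-84": 0.05, "85+": 0.12}
standard = {"65-74": 600, "75-84": 300, "85+": 100}
standardized = directly_standardized_rate(rates, standard)
```

Because both ethnic groups are weighted by the same standard population, differences in the standardized rates are not driven by differences in age structure.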
Prior to applying adjustment factors from the Hispanic EPESE to the rates calculated from the NHIS–NDI, it was important to ensure that the mortality structure was comparable between the two data sets. The age distributions of NDI deaths in the Hispanic EPESE and the NHIS were found to be similar.
For Hispanic EPESE subjects, survival time was estimated separately for each vital status source. The survival clock began with the baseline interview date for both sources. For mortality rates based on the NDI, subjects were right censored either on their date of death according to NDI or on December 31, 2000, for unmatched cases. Censoring subjects for whom proxy report information was available depended on reinterview status. If a subject refused to participate in the study at follow-up, then he or she was right censored at the midpoint of that data collection period (dates of refused interviews were not recorded). Proxy informants who reported that a subject had died were asked for the month and year of the death, which was used as the censoring date (the 15th was used as the day of death for all proxy-reported deaths). Subjects were right censored on December 31, 2000, if a censoring event occurred after this date. Finally, those subjects interviewed in the fourth wave of data collection were right censored either on December 31, 2000, or their date of interview that preceded January 1, 2001.
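The censoring rules for proxy-reported events can be sketched as below. The sketch covers only two of the rules described above (the 15th of the month as the day of a proxy-reported death, and administrative censoring on December 31, 2000); the function names are hypothetical, and the treatment of refusals and fourth-wave interviews is omitted.

```python
from datetime import date

STUDY_END = date(2000, 12, 31)  # administrative censoring date stated in the text


def proxy_death_date(year, month):
    """Proxy informants gave month and year only; the 15th is used as the day."""
    return date(year, month, 15)


def follow_up(baseline, end_date, died):
    """Return (days of follow-up, death indicator) for one subject.

    An event dated after STUDY_END is not counted: the subject is treated as
    alive and censored on STUDY_END.
    """
    if end_date > STUDY_END:
        return (STUDY_END - baseline).days, False
    return (end_date - baseline).days, died


# Hypothetical subject: baseline on January 1, 1994; proxy-reported death in February 1994.
days, died = follow_up(date(1994, 1, 1), proxy_death_date(1994, 2), True)
```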
In calculating the preadjustment survival time from the NHIS-NDI data, survival time also began on the date of interview. To make 1986–1997 NHIS survival data comparable to the 1993/1994–2000 Hispanic EPESE data, subjects whose survival time exceeded 7.5 years were right censored at this maximal point (2,740 days). All other NHIS subjects were right censored with death dates from the NDI or December 31, 1997, whichever came first. Data were analyzed by using the SAS and SUDAAN statistical packages (31, 32).
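A minimal sketch of the NHIS truncation rule follows, assuming that a death occurring after the 2,740-day limit is treated as a censored observation at that limit. The function name and the example dates are hypothetical.

```python
from datetime import date

MAX_DAYS = 2740                 # 7.5 years, as stated in the text
NHIS_END = date(1997, 12, 31)   # end of the NDI search for the NHIS cohorts


def nhis_follow_up(interview, ndi_death=None):
    """Return (days of follow-up, death indicator) for one NHIS subject."""
    end = NHIS_END if ndi_death is None else min(ndi_death, NHIS_END)
    days = (end - interview).days
    died = ndi_death is not None and ndi_death <= NHIS_END
    if days > MAX_DAYS:
        # Assumption: follow-up beyond the cap is discarded and the subject
        # is counted as alive at the cap.
        return MAX_DAYS, False
    return days, died


# Hypothetical subject interviewed January 1, 1987, with an NDI death on January 1, 1996.
capped = nhis_follow_up(date(1987, 1, 1), date(1996, 1, 1))
```

Applying the same maximum follow-up to both cohorts keeps the NHIS rates from reflecting longer exposure than was possible in the Hispanic EPESE.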
RESULTS |
---|
DISCUSSION |
---|
Much of the discordance between proxy-reported and NDI-identified deaths likely stems from problems with matching of Social Security numbers. Although researchers evaluating the NDI report that death ascertainment is good when identifiers other than Social Security number (34–36) are used, the most important determinant of an NDI match is the Social Security number (37, 38). Indeed, the sensitivity of NDI death identification has exceeded 95 percent when Social Security numbers were used (22). Given the importance of the Social Security number for NDI death ascertainment, it is not surprising that misclassification was most concentrated among the oldest old and foreign-born Mexican-American women. It is likely that some of these women born before 1919 were never employed in the formal US economy or for a time had undocumented immigrant status and thus never acquired a valid Social Security number. For example, our tabulations of the California mortality master file for the years 1993–1998 confirmed that the Social Security number field was blank for 7.3 percent of Mexican-American women and 5.6 percent of Mexican-American men born before 1919 compared with just 0.4 percent of non-Hispanic White men and women in the same age group. Another possibility is that older Mexican Americans may use relatives' or friends' Social Security numbers or that women may use their husbands'. Thus, reliance on the Social Security number for NDI death matching may be problematic for older Mexican Americans, especially women. The findings from this study are concordant with these expectations.
Other major identifiers, such as date of birth, sex, or race, did not yield many NDI matches for Hispanic EPESE subjects. Only 119 Hispanic EPESE subjects were identified as dead by the NDI when the Social Security number did not match, even though all other identifiers were supplied to the NDI. That is, the Hispanic EPESE had fewer NDI matches with probability scores meeting cutoffs for classification categories that do not require a Social Security number. This problem could result from misrecording of data by survey interviewers or misrecording of information on death certificates, such as name misspelling or use of a name variant. For example, our tabulation of 1993–1998 California State vital statistics data found that persons identified as Mexican American were more than twice as likely as non-Hispanic Whites (34.0 percent vs. 14.0 percent) to have a blank middle name field in the data and were more than seven times as likely (5.9 percent vs. 0.8 percent) to report mother's maiden name as their own surname. This heightened potential for misrecording of items both on death certificates and in survey responses underscores the challenges of matching Hispanics to the NDI.
The relatively high proportion of underascertained deaths of foreign-born persons is consistent with the "salmon bias" hypothesis, which predicts a higher rate of underascertainment for the foreign born because it is likely that emigration from the United States would be more common for the foreign born than for those born in the United States (10). That is, some of the foreign born may maintain active social ties in the home community that they left when they came to the United States (39). They may choose to return home for social support when they experience illness (40). However, only 21 of the 99 foreign-born Hispanic EPESE decedents (21.2 percent) not matched by the NDI were reported by proxy informants to have died in Mexico. Thus, "salmon bias" accounts for only a small proportion of deaths not ascertained by the NDI.
Use of automated matching algorithms is a practical necessity for studies that seek to link large population-based cohorts to the NDI. That these algorithms can sometimes fail is well understood. NDI documentation emphasizes that the classification and scoring apparatus provided to users of this service are intended as tools rather than definitive determinants of vital status. It is the responsibility of the end user to use the results of the algorithm, along with any additional information available, to determine whether a record is matched (19). The experience of linking the Hispanic EPESE to the NDI suggests that standard algorithms calibrated for the general US population may not work as well for this population subgroup.
Alternative practices for submitting survey records for an NDI search may improve the match rate for Hispanics. For example, researchers who are collecting survey data on Hispanic populations and anticipate linking these data to the NDI database may want to collect information about both mother's and father's surnames and to submit duplicate records with each parental surname as "last name." All possible middle initials derived from parental surnames and from middle names may be tested. Records may be submitted without Social Security numbers to prevent false nonmatches from misreported Social Security numbers. Researchers should also carefully review potential matches at a lower threshold of probability than is the case for other groups. Until such alternative practices can be developed into calibrated scoring algorithms sensitive to different use of identifiers for Hispanic populations, NDI-based comparisons of mortality rates for Hispanics and other subpopulations should be made with caution.
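One way to generate the duplicate submission records suggested above is sketched below. The field names and record layout are hypothetical and do not correspond to the NDI submission format; the sketch only enumerates each parental surname as "last name," the candidate middle initials (including a blank), and versions with and without the Social Security number.

```python
from itertools import product


def submission_variants(first_name, middle_names, paternal_surname,
                        maternal_surname, ssn=None):
    """Enumerate alternative records for one subject (hypothetical layout)."""
    ssn_options = [ssn, None] if ssn else [None]
    pairs = ((paternal_surname, maternal_surname),
             (maternal_surname, paternal_surname))
    variants = []
    for last_name, other_surname in pairs:
        if not last_name:
            continue  # no record can be built without a surname
        # Candidate middle initials: each middle name, the other parental
        # surname, and a blank initial.
        initials = sorted({name[0] for name in middle_names + [other_surname] if name} | {""})
        for initial, ssn_value in product(initials, ssn_options):
            variants.append({
                "first_name": first_name,
                "middle_initial": initial,
                "last_name": last_name,
                "ssn": ssn_value,
            })
    return variants


# Hypothetical subject with a placeholder Social Security number.
records = submission_variants("MARIA", ["ELENA"], "GARCIA", "LOPEZ", ssn="000-00-0000")
```

Each additional variant raises the number of potential matches returned, so the manual review of candidates recommended above becomes more important as the list grows.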
The current study is limited because death certificates were unavailable to permit a more nuanced evaluation of the NDI matching method for Hispanics. Such evaluation could identify matching elements that may need to be weighted differently for Hispanics to optimize the matching algorithm in this ethnic group. In addition, differences in data collection procedures between the Hispanic EPESE and the NHIS did not allow a direct assessment of the NHIS–NDI matching process. Application of bias estimates from the Hispanic EPESE to the NHIS–NDI study is reported only to illustrate the magnitude of underascertainment effects on the Hispanic mortality paradox. Finally, mortality estimates for non-Hispanic Whites were not adjusted for underascertainment because a second source of vital status information was not available for the NHIS–NDI cohort. In studies that report sensitivity of the NDI, estimates range from 92 percent to 98 percent for Whites (37, 41, 42). Considering that Hispanics are included in these estimates, the sensitivity of the NDI for non-Hispanic Whites is likely closer to the upper bound. Assuming that the NDI has a sensitivity of 95 percent for non-Hispanic White men and women, the mortality rate ratio adjusted for underascertainment would likely reduce to 0.80 for men and 1.12 for women.
The strength and importance of this study is that it is the first known to evaluate NDI matching in a large sample of Hispanics with a second independent source of vital status information. The current study documents a pattern of NDI death underascertainment that has implications for comparing death rates across racial and ethnic groups. An important implication is that small classification errors or underperformance of the matching algorithm for subgroups can have an important effect on calculated ethnic differentials, especially for the older Hispanic population. The social and economic characteristics of this population likely contribute to data quality problems that lead to overstatement of the Hispanic mortality advantage in cohort studies linked to the NDI. Results from this study and others underscore the importance of interpreting mortality rates for ethnic minority groups with caution. Considering that death rates by race/ethnicity are used to set and evaluate "Healthy People 2010" goals (43), improving the quality of vital registration and data reporting systems for ethnic minority populations needs to become a greater public health priority.
ACKNOWLEDGMENTS |
---|
The authors thank Drs. Christine Cox and Bryan Sayer for providing useful comments on the manuscript.
NOTES |
---|
REFERENCES |
---|