Estimating deaths from industrial injury by capture-recapture: a cautionary tale

Richard M Cormacka, Yue-Fang Changb and Gordon S Smithc

a Mathematical Institute, School of Mathematical and Computational Sciences, University of St Andrews, Scotland.
b Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, USA.
c Center for Injury Research and Policy, School of Public Health, Johns Hopkins University, USA.


    Abstract
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
Background Capture-recapture methods are increasingly being used to improve surveillance for a number of diseases. However, concerns persist regarding the validity of estimates obtained.

Method Capture-recapture methods were applied to estimate the ability of four separate data sources on occupational fatalities to predict the 237 deaths (‘gold standard’) we determined from a special in-depth study of medical examiner records.

Results and Conclusion Capture-recapture results based upon the four sources vary according to different models. However, both separately and in aggregate of industry type and cause of death, most models seriously underestimate the gold standard, and give a misleading impression of precision of the estimate of hidden individuals. It is commonly believed that reliable estimates from such methods require lists with high coverage and parsimonious models. Here, to obtain an estimate consistent with the gold standard, the list with almost complete coverage must be discarded and a complex model fitted. It is argued that this conclusion is of widespread application.

Keywords Occupational injury, capture-recapture, interactions, log-linear model

Accepted 7 June 2000


    Introduction
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
During the past few years capture-recapture methods have been increasingly used to improve health surveillance methods. They have been applied in a wide variety of the health areas to estimate the size of a hidden population, and the estimate then used to generate the incidence and/or prevalence rate of a specific disease or event. Such information is crucial for public health officials and policy makers. With multiple sources for case identification, capture-recapture methods seem to provide a way to improve the accuracy of the estimates of rates. However, it is necessary to recognize that these methods, originally developed for planned surveys of wildlife populations, need to be re-examined and adapted for application to epidemiology and public health monitoring. Although theoretical development during the past 30 years led to methods more suitable for counting human populations, there are still many unanswered questions. Several articles have discussed the limitations of applying these methods to health issues, for example the influence of source dependency, imperfect data linkage and heterogeneity of the population.1 The focus of the current study is to examine further the ability of capture-recapture analyses to estimate the number of hidden individuals and to pinpoint difficulties by comparing estimates with a ‘gold standard’, using Maryland fatal occupational injury data. Five separate lists were employed for case identification. One of these lists was especially created after detailed record review of injury fatality investigations from medical examiner records and contains all deaths identified from any list. This complete and most ‘expensive’ list serves as a ‘gold standard’ for the size of the population. The purpose of this paper is to determine how well the remaining four relatively inexpensive lists that are available for all states can attain the ‘gold standard’. Results from various models are compared.


    Methods
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
Data sources
The sources of data employed in the study were death certificate (DC) from the Health Statistics Office, Maryland Occupational Safety and Health (MOSH), the National Traumatic Occupational Fatality (NTOF) database, Maryland Workers' Compensation Commission (WC), and data from a special study to review all deaths at the Office of the Chief Medical Examiner (OCME).

The Health Statistics Office within the Maryland Department of Health and Mental Hygiene maintains data for all deaths that occur in the State of Maryland and also for all deaths that occur to Maryland residents. The injury at work item, demographic variables, and the cause of death data are coded from the death certificate by trained nosologists. All deaths from the E-codes of interest were identified and became the primary case finding tool to examine which of these were work-related.

The National Traumatic Occupational Fatality (NTOF) database was established in 1984 by the National Institute for Occupational Safety and Health (NIOSH) and has collected information nationwide on occupational injury deaths of those aged 16 years or older from 1980 to the present. NIOSH buys death certificates from the state that indicate injuries which occurred at work. NIOSH supplied copies of death certificates for all cases included in NTOF files for Maryland during the years 1980–1986 which had the specified E-codes. In addition, any certificates with a blank or pending code were included, since E-codes were available on only 97% of all cases in the NTOF files.

Maryland Occupational Safety and Health (MOSH) Administration Law and Regulations apply to ‘all working conditions in all work places within the State of Maryland’ except employees of the Federal government, or workers who are protected under the Atomic Energy Act of 1954, the Federal Coal Mine Health and Safety Act of 1969, the Federal Metal and Nonmetallic Mine Safety Act, and the Longshoremen’s and Harbor Workers' Compensation Act. The regulations require employers to report any event that involves a fatality to one or more workers, or involves a catastrophe (the hospitalization of five or more workers). The report must be made within 48 hours of the event and can be made either orally or in writing to the Commissioner of Labor and Industry. Written logs and forms are maintained to document every fatality reported to the office.

The State of Maryland Workers' Compensation Law requires that employers file an ‘Employer’s First Report of Injury or Illness' within 48 hours of an event for all recordable injuries and illnesses. A copy of this report is maintained by the Division of Labor and Industry. It should be noted that this report is not a Workers' Compensation claim and does not indicate if a claim was made or compensation was awarded.

The Maryland OCME has jurisdiction over all sudden or unexpected deaths that occur in the State of Maryland, those due to injury, as well as those of an unusual or suspicious nature. The medical examiner is responsible for investigating these deaths and either completes the death certificate or approves certificates completed by other physicians. The OCME maintains a file on each case processed through the medical examiner system which normally contains extensive details on every injury death in the state (not just those work-related). For all deaths from vital statistics with the E-codes of interest the medical examiner records or death certificates completed by them were reviewed for evidence of work-relatedness (see below). All records from any sources were linked to the medical examiner record and death certificates.

Case definition
For the purposes of this study, a case was defined as any death occurring in the State of Maryland during the 7-year period 1 January 1980 to 31 December 1986 and meeting the following criteria: (a) Age 16 years or older, (Age 16 was chosen for comparability to NTOF.), (b) A fall from a height (ICD-9, E881– E884), a machinery death (E919), or an electrocution (E925) coded as the underlying cause of death by the state nosologist.

A determination that a case was work-related was made using one or more of the following criteria applied to case records from any source of data available on the case: (a) A positive response to the ‘Injury at Work?’ item on the death certificate; (b) Death occurred while performing work directly related to the occupation of the subject as determined from the context of the whole file, the specific event, and the circumstances surrounding the event (as described in medical examiner, workers' compensation, or MOSH files); or (c) Location of injury listed as ‘farm’, unless specifically noted that the death occurred in the course of distinctly non-work-related activity.

Once all potential cases had been identified, the case definition criteria were applied to determine if the potential case met our study criteria, including work-related criteria. For potential cases identified by MOSH and Workers' Compensation in which all other criteria were satisfied and the cause of death appeared to be within the scope of this study, the E-code assigned by Vital Statistics was obtained from photo copies of death certificates maintained by Vital Records. Copies of death certificates were also obtained for MOSH and Workers' Compensation cases in which the description of the event was too vague to allow evaluation. During the years in question, the state nosologist was writing the underlying cause of death code directly onto the certificate. Thus, the state-assigned E-codes on the death certificate served as the source of verification as to whether a potential case met the study criteria. Any of the potential cases that had an E-code outside the range of our case definition was excluded from further analysis. Thus, all evaluations of work-relatedness were confined to the cases with E-codes on the vital statistics file in our study range (E881–E884, E919, E925). These codes represent almost of a third of all work-related deaths nationally out of a possible 177 three-digit ICD E-codes.2

Capture-recapture analysis
Capture-recapture methods with log-linear models3–5 were applied to estimate the number of fatal occupational injuries which had occurred but were not identified by the sources. The analyses were stratified according to the cause of injury (fall from elevation, machinery and electrocution) and separately by the industry type (agriculture, manufacture, construction, transportation, public administration, finance, and unspecified). The goodness-of-fit of a model is measured by the deviance G2, and a confidence interval of the estimate is computed using the method suggested by Cormack.6

As with any multiple-regression-type model there are different criteria and strategies for finding the best model among the many available. With either three or four lists, the strategy adopted here is

Step 1: Fit the independence model.

Step 2: Fit the model with all pairwise interactions.

Step 3: Backward elimination—reduce the previous model sequentially by removing the least significant pairwise interaction, while some criterion is satisfied.

Step 4: If the final model in Step 3 includes any sets of all three pairwise interactions between three lists, add to that model the three-list interaction if it satisfies the same criterion.

The criterion used is statistical significance at the nominal 10% level (i.e. a change of 2.71 in the residual G2), but could be an information criterion, such as Akaike's (AIC), a change of 2 in G2, or any Bayesian version (Bayesian Information Criterion, BIC) (see for example Reference 7). The selection was confirmed later by fitting all 113 possible models.

Because all the criteria are based on asymptotic {chi}2 distributions, their relevance is open to question with small numbers. A referee has suggested that the profile likelihood interval based on the multinomial distribution may be too narrow. A Poisson-based interval8 is slightly wider. We do not usually use it because we know of no theoretical justification for it. The robustness of estimates, confidence intervals and G2 were all examined by simulation—technically a parametric bootstrap. From the estimated population size with the estimated parameters from the selected log-linear model, 100 multinomial samples were generated and the observable cells in each sample fitted by the selected model and also by the best-fitting more parsimonious model, precisely as the real data were.


    Results
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
A total of 527 injury deaths for ages 16 or older were identified from vital statistics in the range E881–E884, E919, E925, of which 237 (45%) met the work-related case criteria. Machinery injuries comprised the largest proportion of all work-related cases (48%). For all cases, the most complete source for determining those that were work-related was the medical examiner records. For five cases the deputy medical examiners had completed the death certificate and recorded ‘at work'. However, the original medical examiner records cannot be located in archives. Two of the cases (one each of E919 and E925) had been certified by deputy medical examiners in separate counties but there was no record of the case file being sent to the OCME office. Three cases were recorded in the medical examiner logbook but the original medical examiner files could not be retrieved from archives. Because these were certified by the medical examiner we considered these as identified by the medical examiner system. Thus the medical examiner system identified all work-related cases with none of the other sources adding cases. Workers' Compensation did not identify any new work-related cases from those reported by other sources. However MOSH identified two work-related cases not reported by death certificates; for one the death certificate was marked no for injury at work, and the other had been left blank.

To evaluate the ability of capture-recapture analyses to estimate the number of hidden individuals, the most complete and expensive source, medical examiner, was considered the gold standard and not included in the analyses. Thus the gold standard had 237 deaths, the minimum possible number in the population. For the remaining four sources, 215 fatal occupational injuries were identified and lists of deaths were available in a form in which individuals could be cross-identified between lists. With four lists there are 15 possible observable patterns of recording for any individual (Table 1Go). It should be noted that only three individuals were recorded on other lists but not by the death certificate. Table 2Go gives the number of deaths from each list according to the industrial type and cause of death.


View this table:
[in this window]
[in a new window]
 
Table 1 Injury cases by source of ascertainment, industry type and cause of accident
 

View this table:
[in this window]
[in a new window]
 
Table 2 Injury cases by combinations of industry type and cause of injury
 
All subjects
Log-linear models for capture-recapture were applied first to the four lists of all occupational injury deaths. Some of the 113 possible models are unidentifiable because of zero marginal totals. For example, since there are no individuals in Worker's Compensation (WC) but not in Death Certificate (DC), no three-list interaction involving WC, DC and another list is identifiable. Results for the best model with each of a given number of two-list interactions are shown in Table 3aGo, with maximum likelihood estimates and 95% profile likelihood in-tervals for the population size N. All model selection procedures make the same choice, having three two-list interactions (MOSH x WC, DC x WC and DC x NTOF) which gives the estimate of 218 (95% CI : 215–229) which is still less than the gold standard of 237. Simulations of data generated by this and the model without DC x NTOF confirm the choice of this model, and the estimate and confidence interval from it, despite the fact that residual deviance and deviance differences do not have the {chi}2 distributions indicated by asymptotic theory. However, except for the unstable analyses of models with so many terms that there are effectively zero divisors, all of the analyses seriously underestimate the gold standard, despite their apparent conventional precision.


View this table:
[in this window]
[in a new window]
 
Table 3a Log-linear model analysis with four listsa (Gold standard = 237)
 
Of individuals recorded on any of the four lists, only three were not identified as at work by death certificates, indicating that there is very little information on individuals missing from death certificate and therefore about those missing from all lists. Analyses of three lists, ignoring death certificate, may give more realistic, though less precise, estimates. Those 14 individuals recorded only in death certificate are now treated as missing, so that, by the gold standard, the hidden number in total is increased from 22 to 36. With three lists, the model with all two-factor interactions always fits the data perfectly, but casts doubt on the necessary assumption of a zero three-factor interaction. Table 3bGo presents the analyses for the best models with no, one, two and all interactions. The model with one two-factor interaction (MOSH x WC) gives an acceptable residual with estimated total number of occupational injury deaths as 221 (95% CI : 210–237), with the 95% CI just including the gold standard.


View this table:
[in this window]
[in a new window]
 
Table 3b Log-linear model analysis with three listsa
 
Stratification on industry type
Table 4Go gives the analyses of three lists, ignoring death certificate. In general, the models with one two-factor interaction, MOSH x WC, give reasonable fit across the strata of industry type, though not always that selected by AIC. All point estimates from this model underestimate the gold standard. In the industries like construction, manufacturing, transportation and not-specified, all known cases have death certificates, and the zero marginal total ensures that any analysis including death certificate estimates zero hidden cases.


View this table:
[in this window]
[in a new window]
 
Table 4 Log-linear model analysis with three listsa stratified by industry type
 
Stratification on cause of death
The data were also analysed according to the cause of death (fall from elevation, machinery and electrocution). The model with two two-factor interaction MOSH x WC and MOSH x NTOF fits Fall and Machinery strata, and the model with MOSH x WC fits the Electrocution stratum (Table 5Go). However, in the Electrocution stratum the interaction of MOSH x WC introduces indeterminacy with MOSH since all industries recorded in WC are also recorded in MOSH.


View this table:
[in this window]
[in a new window]
 
Table 5 Three-lista log-linear model analysis stratified on cause of death
 

    Discussion
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
This data set is typical of many in the literature from which estimates, said to be precise and important, have been derived by capture-recapture methodology. One of the unique aspects of the study is that, unlike others, a special study was able to generate a gold standard through review of every single death to determine work-relatedness. Such lists are not normally available and are expensive to construct. If the ‘expensive’ gold standard list had not been available such an estimate might have been derived from the four lists; very precise and very precisely wrong. Most states do not have a medical examiner system to allow such complete investigation of every death. However, the other sources are routinely available for every state.

These analyses show that capture-recapture procedures provide misleading inferences about the size of the hidden population, a minimum value for which is known in this example from the medical examiner records. Some of the problems arise from sparse data; no observations in certain cross-classifications. However, consideration of three lists shows that this is not the only problem. To obtain an interval estimate which covers the gold standard value it is necessary to use a more complex model. The interaction MOSH x WC is known to be essential. Adding either MOSH x NTOF or NTOF x WC results in a three-list estimate whose 95% interval does cover the gold standard. Of these MOSH x NTOF gives numerically a lower residual G2, but with a wider interval estimate (Table 3bGo). It must be emphasized that, without the gold standard, there is no statistical support for either of these models or the saturated model with all two-list interactions, in preference to the model with the single interaction MOSH x WC.

Could the problem have been diagnosed and overcome by alternative statistical analyses of the data? We consider the possibilities with the aggregated data: similar conclusions are reached from the stratified data, the smaller counts reducing further the power of a statistical procedure to reject a simple model.

  1. Separate analysis of the overlaps between each of the six pairs of list give estimates 150, 212, 213, 215, 216, 220. The estimate of 150 from MOSH and WC, much less than the observed total of 215, indicates the positive dependence between these lists. These two data sources are much more likely to capture the traditional industrial injuries occurring in organized large workplaces. However, they do not cover most agriculture. All other evidence points to a population of at most 220.
  2. Perhaps the model selection procedure for the four-list data could be improved. Table 3aGo shows the excellence of the fit of the selected model and the magnitude of the improvement of its fit over any simpler model show this to be highly unlikely. Parsimony has often been recommended as a desirable feature of model selection: not only do the data reject any such model, but also the resulting estimate is even further from the gold standard. Simulations confirm these conclusions even though the asymptotic distribution theory underlying model selection or model averaging procedures does not hold with so many small or zero counts.
  3. Model selection can be avoided by using a weighted mean of the point estimates from all 113 models.9 Since zero counts give rise, with some models, to infinite estimates, a small-sample correction must first be introduced,7 adding some constant to each observation. In this example the choice does not matter: no conclusions are at all changed by any addition which has been proposed. With the addition of 0.12510 the combined estimate is 217 (see point 5 below) with a 95% CI based formally on asymptotic normality, again a dubious assumption, since the lower limit is below the observed number: the upper limit remains well below the gold standard figure.
  4. Sparseness, and particularly the zero counts in Table 1Go, does cause problems. This is often caused11,12 by few overlaps and is then easily recognized. Here it is generated by the fact that nearly all recorded cases have a death certificate. The aim is to estimate the number of individuals missing from all lists, but there is very little information on individuals missing from DC. Paradoxically we must discard the list with the most complete coverage. Pairwise analysis among the three lists is the same as before. No model selection procedure (Table 3) would give other than the model with the single interaction term between MOSH and WC. This extremely good fit still underestimates the known minimum population size, which falls just outside the 95% interval estimate.
  5. Small-sample corrections7 always reduce the estimate, making underestimation of the gold-standard value worse.

Because of the form of the model, the lists MOSH and Worker's Compensation can be amalgamated, and the estimate follows from a two-list estimate between the joint list and NTOF, relying on list NTOF having been shown in this instance to be independent of the other two lists. This shows that list NTOF is needed for a valid estimate of the missing number to be obtained, contradicting the hope that the study would indicate a reliable estimate which did not use the National NTOF database.

Similar conclusions arise from analysis of the data by strata (Tables 4 and 5GoGo), although in some strata the smaller numbers generate more zero observations, with consequent indeterminacy in some models. The stratification does cast additional doubts on any inference from the aggregate data. Although all strata are acceptably represented by the model with the single interaction MOSH x WC, its estimate is different in different industries (G2 = 14.4 with 5 d.f.).

Our conclusion is in line with Hook and Regal's7 finding, from simulations, that ‘use of the saturated model is in general optimal’, unless ‘paradoxically’ ‘an information criterion suggests that the saturated model is optimal’ when no estimate can be recommended.5 As they say, ‘any theoretical preferences for parsimony in model selection appear to be outweighed by considerations of validity’. Individuals in different industries are not listed with the same probabilities by different sources. Capture-recapture methodology has to make an untestable act of faith that in some respect missing individuals resemble listed ones. Clearly in this example they do not. This property is likely to be widespread in many other multiple list databases. Thus both precision and accuracy of estimates of population size will be grossly overstated. It may be that the value of multiple lists lies less in the provision of a numerical estimate of population size, more in identifying when substantial undercount exists, and prompting investigation of possible causes.


    Acknowledgments
 
This research was supported by a CDC grant R49/CCR310285– 01A1, directed by Jeffrey H Coben, MD from September 1995 to May 1997.


    References
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
1 Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17:243–64.[ISI][Medline]

2 National Institute for Occupational Safety and Health. Fatal Injuries to Workers in the United States, 1980–1989: A Decade of Surveillance. Centers for Disease Control and Prevention, NIOSH, Cincinnati, OH, 1993 (DHHS-NIOSH, No 93–108).

3 Fienberg SE. The multiple recapture census for closed populations and incomplete 2k contingency tables. Biometrika 1972;59:591–603.[ISI]

4 Cormack RM. Log-linear models for capture-recapture. Biometrics 1989;45:395–413.[ISI]

5 International working group for disease monitoring and forecasting. Capture-recapture and multiple-record systems estimation I: History and theoretical development. Am J Epidemiol 1995;142:1047–58.[Abstract]

6 Cormack RM. Interval estimation for mark-recapture studies of closed populations. Biometrics 1992;48:567–76.[ISI][Medline]

7 Regal RR, Hook EB. Validity of methods for model selection, weighting for model uncertainty and small sample adjustment in capture-recapture estimation. Am J Epidemiol 1997;145:1138–44.[Abstract]

8 Regal RR, Hook EB. Goodness-of-fit based confidence intervals for estimates of the size of a closed population. Stat Med 1984;3:287–91.[ISI][Medline]

9 Draper D. Assessment and propagation of model uncertainty (with discussion). J R Statist Soc (Ser B) 1995;57:45–70.

10 Evans MA, Bonett DG. Bias reduction for multiple-recapture estimators of closed population size. Biometrics 1994;50:388–95.[ISI][Medline]

11 McGilchrist CA, McDonnell LF, Jorm LR, Patel MS. Loglinear models using capture-recapture methods to estimate the size of a measles epidemic. J Clin Epidemiol 1996;49:293–96.[ISI][Medline]

12 Cormack RM. Problems with using capture-recapture in epidemiology: an example of a measles epidemic (with discussion). J Clin Epidemiol 1999;52:909–14.[ISI][Medline]