a Mathematical Institute, School of Mathematical and Computational Sciences, University of St Andrews, Scotland.
b Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, USA.
c Center for Injury Research and Policy, School of Public Health, Johns Hopkins University, USA.
Abstract
Method Capture-recapture methods were applied to estimate the ability of four separate data sources on occupational fatalities to predict the 237 deaths (gold standard) we determined from a special in-depth study of medical examiner records.
Results and Conclusion Capture-recapture results based upon the four sources vary according to different models. However, both separately and in aggregate of industry type and cause of death, most models seriously underestimate the gold standard, and give a misleading impression of precision of the estimate of hidden individuals. It is commonly believed that reliable estimates from such methods require lists with high coverage and parsimonious models. Here, to obtain an estimate consistent with the gold standard, the list with almost complete coverage must be discarded and a complex model fitted. It is argued that this conclusion is of widespread application.
Keywords Occupational injury, capture-recapture, interactions, log-linear model
Accepted 7 June 2000
Introduction
Methods
The Health Statistics Office within the Maryland Department of Health and Mental Hygiene maintains data for all deaths that occur in the State of Maryland and also for all deaths that occur to Maryland residents. The injury at work item, demographic variables, and the cause of death data are coded from the death certificate by trained nosologists. All deaths from the E-codes of interest were identified and became the primary case finding tool to examine which of these were work-related.
The National Traumatic Occupational Fatality (NTOF) database was established in 1984 by the National Institute for Occupational Safety and Health (NIOSH) and has collected information nationwide on occupational injury deaths of those aged 16 years or older from 1980 to the present. NIOSH buys from the state those death certificates that indicate an injury which occurred at work. NIOSH supplied copies of death certificates for all cases included in NTOF files for Maryland during the years 1980–1986 which had the specified E-codes. In addition, any certificates with a blank or pending code were included, since E-codes were available on only 97% of all cases in the NTOF files.
Maryland Occupational Safety and Health (MOSH) Administration Law and Regulations apply to all working conditions in all work places within the State of Maryland, except for employees of the Federal government and workers who are protected under the Atomic Energy Act of 1954, the Federal Coal Mine Health and Safety Act of 1969, the Federal Metal and Nonmetallic Mine Safety Act, or the Longshoremen's and Harbor Workers' Compensation Act. The regulations require employers to report any event that involves a fatality to one or more workers, or involves a catastrophe (the hospitalization of five or more workers). The report must be made within 48 hours of the event and can be made either orally or in writing to the Commissioner of Labor and Industry. Written logs and forms are maintained to document every fatality reported to the office.
The State of Maryland Workers' Compensation Law requires that employers file an 'Employer's First Report of Injury or Illness' within 48 hours of an event for all recordable injuries and illnesses. A copy of this report is maintained by the Division of Labor and Industry. It should be noted that this report is not a Workers' Compensation claim and does not indicate whether a claim was made or compensation was awarded.
The Maryland Office of the Chief Medical Examiner (OCME) has jurisdiction over all sudden or unexpected deaths that occur in the State of Maryland, those due to injury, as well as those of an unusual or suspicious nature. The medical examiner is responsible for investigating these deaths and either completes the death certificate or approves certificates completed by other physicians. The OCME maintains a file on each case processed through the medical examiner system, which normally contains extensive details on every injury death in the state (not just those that are work-related). For all deaths from vital statistics with the E-codes of interest, the medical examiner records, or the death certificates completed by the medical examiner, were reviewed for evidence of work-relatedness (see below). All records from any source were linked to the medical examiner record and death certificates.
Case definition
For the purposes of this study, a case was defined as any death occurring in the State of Maryland during the 7-year period 1 January 1980 to 31 December 1986 and meeting the following criteria: (a) age 16 years or older (age 16 was chosen for comparability with NTOF); and (b) a fall from a height (ICD-9 E881–E884), a machinery death (E919), or an electrocution (E925) coded as the underlying cause of death by the state nosologist.
A determination that a case was work-related was made using one or more of the following criteria applied to case records from any source of data available on the case: (a) a positive response to the 'Injury at Work?' item on the death certificate; (b) death occurred while performing work directly related to the occupation of the subject as determined from the context of the whole file, the specific event, and the circumstances surrounding the event (as described in medical examiner, Workers' Compensation, or MOSH files); or (c) location of injury listed as farm, unless specifically noted that the death occurred in the course of distinctly non-work-related activity.
Once all potential cases had been identified, the case definition criteria were applied to determine if the potential case met our study criteria, including work-related criteria. For potential cases identified by MOSH and Workers' Compensation in which all other criteria were satisfied and the cause of death appeared to be within the scope of this study, the E-code assigned by Vital Statistics was obtained from photocopies of death certificates maintained by Vital Records. Copies of death certificates were also obtained for MOSH and Workers' Compensation cases in which the description of the event was too vague to allow evaluation. During the years in question, the state nosologist was writing the underlying cause of death code directly onto the certificate. Thus, the state-assigned E-codes on the death certificate served as the source of verification as to whether a potential case met the study criteria. Any potential case that had an E-code outside the range of our case definition was excluded from further analysis. Thus, all evaluations of work-relatedness were confined to the cases with E-codes on the vital statistics file in our study range (E881–E884, E919, E925). These codes, out of a possible 177 three-digit ICD E-codes, account for almost a third of all work-related deaths nationally.2
Capture-recapture analysis
Capture-recapture methods with log-linear models3–5 were applied to estimate the number of fatal occupational injuries which had occurred but were not identified by the sources. The analyses were stratified according to the cause of injury (fall from elevation, machinery and electrocution) and separately by the industry type (agriculture, manufacture, construction, transportation, public administration, finance, and unspecified). The goodness-of-fit of a model is measured by the deviance G², and a confidence interval of the estimate is computed using the method suggested by Cormack.6
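As a minimal sketch of what such a fit involves, the following fits a Poisson log-linear model to the seven observable cells of a three-list table and reads the estimate of the unobserved cell from the intercept. The counts are invented for illustration (they are not the study data), the function names are hypothetical, and the plain iteratively reweighted least squares routine stands in for whatever software the authors used; the confidence interval of Reference 6 is not reproduced.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_loglinear(counts, interactions=()):
    """Fit log(mu) = u + main effects + chosen pairwise interactions.

    counts maps each observable capture pattern, e.g. (1, 0, 1), to its count.
    Returns (estimated count in the unobserved all-zero cell, deviance G2).
    """
    cells = sorted(counts)
    x = [[1.0] + [float(v) for v in cell]
         + [float(cell[i] * cell[j]) for i, j in interactions]
         for cell in cells]
    y = [float(counts[cell]) for cell in cells]
    n, p = len(y), len(x[0])
    beta = [0.0] * p
    mu = [v + 0.5 for v in y]  # conventional starting values
    for _ in range(100):
        # Working response and weighted normal equations (weights = mu).
        z = [math.log(mu[i]) + (y[i] - mu[i]) / mu[i] for i in range(n)]
        xtwx = [[sum(mu[i] * x[i][a] * x[i][b] for i in range(n))
                 for b in range(p)] for a in range(p)]
        xtwz = [sum(mu[i] * x[i][a] * z[i] for i in range(n)) for a in range(p)]
        new_beta = solve(xtwx, xtwz)
        mu = [math.exp(sum(x[i][a] * new_beta[a] for a in range(p)))
              for i in range(n)]
        converged = max(abs(u - v) for u, v in zip(new_beta, beta)) < 1e-10
        beta = new_beta
        if converged:
            break
    deviance = 2.0 * sum(
        (y[i] * math.log(y[i] / mu[i]) if y[i] > 0 else 0.0) - (y[i] - mu[i])
        for i in range(n))
    # The all-zero pattern has every indicator equal to 0, so its fitted
    # value is exp(intercept).
    return math.exp(beta[0]), deviance

# Hypothetical counts for three lists (A, B, C); NOT the study data.
example_counts = {
    (1, 0, 0): 30, (0, 1, 0): 20, (1, 1, 0): 25, (0, 0, 1): 12,
    (1, 0, 1): 18, (0, 1, 1): 10, (1, 1, 1): 40,
}
missing_indep, g2_indep = fit_loglinear(example_counts)
missing_ab, g2_ab = fit_loglinear(example_counts, interactions=((0, 1),))
missing_pairs, g2_pairs = fit_loglinear(
    example_counts, interactions=((0, 1), (0, 2), (1, 2)))
```

The three fits correspond to the independence model, a model with a single A × B interaction, and the model with all pairwise interactions; comparing their deviances is the basis of the selection strategy.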
As with any multiple-regression-type model there are different criteria and strategies for finding the best model among the many available. With either three or four lists, the strategy adopted here is
Step 1: Fit the independence model.
Step 2: Fit the model with all pairwise interactions.
Step 3: Backward elimination: reduce the previous model sequentially by removing the least significant pairwise interaction, while some criterion is satisfied.
Step 4: If the final model in Step 3 includes any sets of all three pairwise interactions between three lists, add to that model the three-list interaction if it satisfies the same criterion.
The criterion used is statistical significance at the nominal 10% level (i.e. a change of 2.71 in the residual G² for one degree of freedom), but it could equally be an information criterion, such as Akaike's (AIC), which corresponds to a change of 2 in G², or a Bayesian version (Bayesian Information Criterion, BIC) (see for example Reference 7). The selection was confirmed later by fitting all 113 possible models.
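The two numbers quoted above can be checked with a short sketch: 2.71 is approximately the upper 10% point of a chi-squared distribution with one degree of freedom, and 113 is the number of hierarchical log-linear models for four lists that contain all main effects and exclude the four-list interaction, which cannot be estimated when one cell is unobserved. The function names are hypothetical, and taking n = 215 (the number of individuals identified by the four sources) in the BIC threshold is an assumption made here for illustration, not a choice stated by the authors.

```python
import math
from itertools import combinations

def chi2_1df_tail(x):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

def count_hierarchical_models(n_lists):
    """Count hierarchical models with all main effects and no top-order term."""
    lists = range(n_lists)
    # Candidate interaction terms of order 2 up to n_lists - 1.
    terms = [frozenset(c) for r in range(2, n_lists)
             for c in combinations(lists, r)]
    count = 0
    for mask in range(2 ** len(terms)):
        chosen = {t for i, t in enumerate(terms) if (mask >> i) & 1}
        # Hierarchical: every lower-order interaction contained in a chosen
        # term must itself be chosen (main effects are always present).
        ok = all(frozenset(s) in chosen
                 for t in chosen
                 for r in range(2, len(t))
                 for s in combinations(t, r))
        count += ok
    return count

p_value_at_2_71 = chi2_1df_tail(2.71)        # close to 0.10
aic_threshold = 2.0                          # drop in G2 needed per added parameter
bic_threshold = math.log(215)                # assumes n = 215 observed individuals
n_models_four_lists = count_hierarchical_models(4)
```

With this assumed n, the BIC threshold is the most demanding of the three and the 10% significance level the least, so the three criteria can select models of different complexity from the same data.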
Because all the criteria are based on asymptotic χ² distributions, their relevance is open to question with small numbers. A referee has suggested that the profile likelihood interval based on the multinomial distribution may be too narrow. A Poisson-based interval8 is slightly wider. We do not usually use it because we know of no theoretical justification for it. The robustness of estimates, confidence intervals and G² were all examined by simulation (technically a parametric bootstrap). From the estimated population size with the estimated parameters from the selected log-linear model, 100 multinomial samples were generated, and the observable cells in each sample were fitted by the selected model and also by the best-fitting more parsimonious model, precisely as the real data were.
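The bootstrap logic can be illustrated with a deliberately simplified sketch. It swaps the selected four-list log-linear model for the two-list independence model, whose estimate has a closed form, so that no model-fitting code is needed; the counts, function names and seed are all hypothetical. Only the overall scheme (estimate, simulate complete populations from the fitted model, discard the unobserved cell, re-estimate) follows the description above.

```python
import math
import random

def estimate_total(n10, n01, n11):
    """Two-list independence estimate of the total population size."""
    if n11 == 0:
        return math.inf  # no overlap: the estimate is undefined
    return n10 + n01 + n11 + n10 * n01 / n11

def simulate_counts(n_pop, p1, p2, rng):
    """Draw one complete population and return its observable cells only."""
    n10 = n01 = n11 = 0
    for _ in range(n_pop):
        on_first = rng.random() < p1
        on_second = rng.random() < p2
        if on_first and on_second:
            n11 += 1
        elif on_first:
            n10 += 1
        elif on_second:
            n01 += 1
        # Individuals on neither list are generated but never observed.
    return n10, n01, n11

def parametric_bootstrap(n10, n01, n11, n_rep=100, seed=1):
    """Re-estimate the population size on n_rep simulated data sets."""
    rng = random.Random(seed)
    n_hat = estimate_total(n10, n01, n11)
    p1 = (n10 + n11) / n_hat  # fitted listing probability, first list
    p2 = (n01 + n11) / n_hat  # fitted listing probability, second list
    reps = [estimate_total(*simulate_counts(round(n_hat), p1, p2, rng))
            for _ in range(n_rep)]
    return n_hat, sorted(reps)

# Hypothetical two-list counts; NOT the study data.
n_hat, boot = parametric_bootstrap(n10=60, n01=40, n11=90)
median_estimate = boot[len(boot) // 2]
```

The spread of the sorted replicates gives a rough check on whether a nominal interval from the fitted model is too narrow. It does so only under the assumption that the fitted model is correct, which is exactly what the gold standard calls into question.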
Results
To evaluate the ability of capture-recapture analyses to estimate the number of hidden individuals, the most complete and expensive source, medical examiner, was considered the gold standard and not included in the analyses. Thus the gold standard had 237 deaths, the minimum possible number in the population. For the remaining four sources, 215 fatal occupational injuries were identified and lists of deaths were available in a form in which individuals could be cross-identified between lists. With four lists there are 15 possible observable patterns of recording for any individual (Table 1). It should be noted that only three individuals were recorded on other lists but not by the death certificate. Table 2 gives the number of deaths from each list according to the industrial type and cause of death.
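The count of 15 patterns follows from each individual being either present or absent on each of four lists, with the all-absent pattern unobservable (2⁴ − 1 = 15). A small sketch enumerating them, with short list labels that are our own shorthand rather than the paper's:

```python
from itertools import product

# Shorthand labels for the four sources (hypothetical abbreviations).
lists = ("DC", "NTOF", "MOSH", "WC")

# Every presence/absence pattern except "on no list", which cannot be observed.
patterns = [p for p in product((0, 1), repeat=len(lists)) if any(p)]

for p in patterns:
    present = [name for name, flag in zip(lists, p) if flag]
    print(p, "+".join(present))
```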
Discussion
These analyses show that capture-recapture procedures provide misleading inferences about the size of the hidden population, a minimum value for which is known in this example from the medical examiner records. Some of the problems arise from sparse data: there are no observations in certain cross-classifications. However, consideration of three lists shows that this is not the only problem. To obtain an interval estimate which covers the gold standard value it is necessary to use a more complex model. The interaction MOSH × WC is known to be essential. Adding either MOSH × NTOF or NTOF × WC results in a three-list estimate whose 95% interval does cover the gold standard. Of these, MOSH × NTOF gives numerically a lower residual G², but with a wider interval estimate (Table 3b). It must be emphasized that, without the gold standard, there is no statistical support for either of these models, or for the saturated model with all two-list interactions, in preference to the model with the single interaction MOSH × WC.
Could the problem have been diagnosed and overcome by alternative statistical analyses of the data? We consider the possibilities with the aggregated data: similar conclusions are reached from the stratified data, the smaller counts reducing further the power of a statistical procedure to reject a simple model.
Because of the form of the model, the lists MOSH and Workers' Compensation can be amalgamated, and the estimate follows from a two-list estimate between the joint list and NTOF, relying on list NTOF having been shown in this instance to be independent of the other two lists. This shows that list NTOF is needed for a valid estimate of the missing number to be obtained, contradicting the hope that the study would indicate a reliable estimate which did not use the National NTOF database.
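The amalgamation argument can be sketched in a few lines. Under the model with the single MOSH × WC interaction, the estimate of the number missed by all three lists reduces to the usual two-list estimate: (number on the joint list only) × (number on NTOF only) / (number on both). The counts below are invented for illustration and the function name is hypothetical.

```python
def amalgamated_missing_estimate(counts):
    """Two-list estimate of the unobserved count after pooling MOSH and WC.

    counts maps patterns (mosh, wc, ntof) of 0/1 flags to observed counts.
    """
    joint_only = sum(n for (m, w, t), n in counts.items() if (m or w) and not t)
    ntof_only = sum(n for (m, w, t), n in counts.items() if not (m or w) and t)
    both = sum(n for (m, w, t), n in counts.items() if (m or w) and t)
    return joint_only * ntof_only / both

# Hypothetical three-list counts keyed by (MOSH, WC, NTOF); NOT the study data.
example_counts = {
    (1, 0, 0): 30, (0, 1, 0): 20, (1, 1, 0): 25, (0, 0, 1): 12,
    (1, 0, 1): 18, (0, 1, 1): 10, (1, 1, 1): 40,
}
missing = amalgamated_missing_estimate(example_counts)
```

If no individual on the joint list also appeared on NTOF, the denominator would be zero and no estimate could be formed, which is one way of seeing why NTOF cannot be dropped.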
Similar conclusions arise from analysis of the data by strata (Tables 4 and 5), although in some strata the smaller numbers generate more zero observations, with consequent indeterminacy in some models. The stratification does cast additional doubts on any inference from the aggregate data. Although all strata are acceptably represented by the model with the single interaction MOSH × WC, its estimate is different in different industries (G² = 14.4 with 5 d.f.).
Our conclusion is in line with Hook and Regal's7 finding, from simulations, that use of the saturated model is in general optimal, unless, paradoxically, an information criterion suggests that the saturated model is optimal, when no estimate can be recommended.5 As they say, any theoretical preferences for parsimony in model selection appear to be outweighed by considerations of validity. Individuals in different industries are not listed with the same probabilities by different sources. Capture-recapture methodology has to make an untestable act of faith that in some respect missing individuals resemble listed ones. Clearly in this example they do not. This property is likely to be widespread in many other multiple list databases. Thus both precision and accuracy of estimates of population size will be grossly overstated. It may be that the value of multiple lists lies less in the provision of a numerical estimate of population size, and more in identifying when substantial undercount exists and prompting investigation of possible causes.
Acknowledgments
References
2 National Institute for Occupational Safety and Health. Fatal Injuries to Workers in the United States, 1980–1989: A Decade of Surveillance. Centers for Disease Control and Prevention, NIOSH, Cincinnati, OH, 1993 (DHHS-NIOSH, No 93-108).
3 Fienberg SE. The multiple recapture census for closed populations and incomplete 2^k contingency tables. Biometrika 1972;59:591–603.
4 Cormack RM. Log-linear models for capture-recapture. Biometrics 1989;45:395–413.
5 International working group for disease monitoring and forecasting. Capture-recapture and multiple-record systems estimation I: History and theoretical development. Am J Epidemiol 1995;142:1047–58.
6 Cormack RM. Interval estimation for mark-recapture studies of closed populations. Biometrics 1992;48:567–76.
7 Regal RR, Hook EB. Validity of methods for model selection, weighting for model uncertainty and small sample adjustment in capture-recapture estimation. Am J Epidemiol 1997;145:1138–44.
8 Regal RR, Hook EB. Goodness-of-fit based confidence intervals for estimates of the size of a closed population. Stat Med 1984;3:287–91.
9 Draper D. Assessment and propagation of model uncertainty (with discussion). J R Statist Soc (Ser B) 1995;57:45–70.
10 Evans MA, Bonett DG. Bias reduction for multiple-recapture estimators of closed population size. Biometrics 1994;50:388–95.
11 McGilchrist CA, McDonnell LF, Jorm LR, Patel MS. Loglinear models using capture-recapture methods to estimate the size of a measles epidemic. J Clin Epidemiol 1996;49:293–96.
12 Cormack RM. Problems with using capture-recapture in epidemiology: an example of a measles epidemic (with discussion). J Clin Epidemiol 1999;52:909–14.