RE: "HOW MANY FOODBORNE OUTBREAKS OF SALMONELLA INFECTION OCCURRED IN FRANCE IN 1995? APPLICATION OF THE CAPTURE-RECAPTURE METHOD TO THREE SURVEILLANCE SYSTEMS"

Dankmar A. Böhning, Dr., Prof.

Institute for Social Medicine, Epidemiology and Health Economics, Charité Medical School, Fabeckstrasse 60–62, Berlin 14195, Germany

In a recent paper, Gallay et al. (1) used three-source capture-recapture modeling to estimate the number of foodborne outbreaks of Salmonella infection that occurred in France during 1995. The data provided in the article were used in a course on capture-recapture methods given in March 2004 for the Faculty of Public Health at Mahidol University in Bangkok, Thailand. The purpose of this letter is twofold: 1) to discuss some inconsistencies in the way the capture-recapture data were presented by Gallay et al. (1), which can lead to very different analyses and conclusions, and 2) to argue for presenting capture-recapture data as completely as possible (for k sources, a 2^k table with one missing cell) to avoid misunderstandings such as the one outlined below.

The analysis in the article by Gallay et al. (1) was based on three French surveillance systems: the National Public Health Network (NPHN), the Ministry of Agriculture (MA), and the National Salmonella and Shigella Reference Center (NRC). The complete cross-classification of outbreaks by these three sources is given in table 1.


TABLE 1. Identification of Salmonella outbreaks in France in 1995, according to the three sources of data used by Gallay et al. (1)

 
Table 1 shows the multiple identifications of Salmonella outbreaks in their most complete form. n1 denotes the outbreaks identified by all three sources, n2 those identified by the NPHN and MA only, n3 those identified by the NPHN and NRC only, n4 those identified by the NPHN only, n5 those identified by the MA and NRC only, n6 those identified by the MA only, and n7 those identified by the NRC only. n8 denotes the outbreaks identified by none of the three sources; it is the missing cell of the table.

For the Mahidol University course, the table had to be constructed from the information given in the text of Gallay et al. (1). According to their table 2 (1, p. 173), n1 + n2 = 30, n1 + n3 = 59, and n1 + n5 = 39. These are the numbers of outbreaks matched by each pair of sources (NPHN and MA, NPHN and NRC, and MA and NRC, respectively), and each includes the n1 outbreaks matched by all three sources. Adding the three equations gives 3n1 + n2 + n3 + n5 = 128. On page 173 of Gallay et al.'s paper (1), as well as in the abstract, 108 is reported as the number of matches of any kind. Since n1, n2, n3, and n5 together cover every outbreak identified by at least two sources, this gives n1 + n2 + n3 + n5 = 108. Subtracting this equation from the previous one gives 3n1 + n2 + n3 + n5 - (n1 + n2 + n3 + n5) = 128 - 108 = 20, that is, 2n1 = 20, or n1 = 10. It follows that n2 = 20, n3 = 49, and n5 = 29. Finally, n4 = 35 follows from the NPHN total in Gallay et al.'s table 2 (1), n1 + n2 + n3 + n4 = 114; similarly, n6 = 14 follows from the MA total, n1 + n2 + n5 + n6 = 73, and n7 = 441 from the NRC total, n1 + n3 + n5 + n7 = 529. The resulting frequencies are given as the first entries in the brackets in column 4 of table 1. One group of students in the Mahidol University course followed this route and derived the results given in table 2.
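The arithmetic of this first reconstruction can be written as a small function, which makes it easy to recheck. This is a minimal sketch for verification only, not the procedure used in the course; the function and parameter names are hypothetical, and the cell order n1, ..., n7 follows the definitions of table 1 given above.

```python
def reconstruct_from_total_matches(pair_nphn_ma, pair_nphn_nrc, pair_ma_nrc,
                                   total_matches, total_nphn, total_ma, total_nrc):
    """Return (n1, ..., n7) of table 1 from published summary counts.

    Each pair count includes the outbreaks found by all three sources, so
    the sum of the three pair counts equals 3*n1 + n2 + n3 + n5, while the
    number of matches of any kind equals n1 + n2 + n3 + n5.
    """
    pair_sum = pair_nphn_ma + pair_nphn_nrc + pair_ma_nrc
    twice_n1 = pair_sum - total_matches
    if twice_n1 < 0 or twice_n1 % 2:
        raise ValueError("summary counts are not mutually consistent")
    n1 = twice_n1 // 2
    n2 = pair_nphn_ma - n1           # NPHN and MA only
    n3 = pair_nphn_nrc - n1          # NPHN and NRC only
    n5 = pair_ma_nrc - n1            # MA and NRC only
    n4 = total_nphn - n1 - n2 - n3   # NPHN only
    n6 = total_ma - n1 - n2 - n5     # MA only
    n7 = total_nrc - n1 - n3 - n5    # NRC only
    cells = (n1, n2, n3, n4, n5, n6, n7)
    if min(cells) < 0:
        raise ValueError("negative cell frequency; check the inputs")
    return cells


# Figures quoted from Gallay et al. (first route: 108 matches of any kind).
first = reconstruct_from_total_matches(30, 59, 39, 108, 114, 73, 529)
print(first)  # (10, 20, 49, 35, 29, 14, 441)
```

The consistency checks raise an error rather than silently returning impossible (negative or fractional) cell counts, which is the kind of cross-check a complete published table allows.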


TABLE 2. The three "best" log-linear models fitted to three sources of data on foodborne Salmonella outbreaks and their estimates of the total number of outbreaks occurring in France during 1995, using the first entries in the brackets of frequency data presented in column 4 of table 1

 
The models in table 2 were selected as follows: the first is the best choice according to the Akaike Information Criterion, the second is the best choice according to the Bayesian Information Criterion, and the third is the second best with respect to both criteria. The associated estimates of the missing cell differ substantially from those reported by Gallay et al. (1) for the same models.
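For readers who want to reproduce analyses of this kind, the following is a minimal sketch of the standard log-linear approach to three-source capture-recapture: a Poisson model is fitted to the seven observed cells, and n8 is estimated as the fitted count of the cell with no source membership, exp(intercept). It is not the software used in the course or by Gallay et al.; the hand-written IRLS fitter, the model labels, and the use of the observed total as the sample size in BIC are assumptions of this sketch, and the three models listed are illustrations, not necessarily those in table 2.

```python
import math

# Membership (NPHN, MA, NRC) of cells n1..n7 in table 1; n8 = (0, 0, 0)
# is the unobserved cell that the models extrapolate to.
PATTERNS = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
            (0, 1, 1), (0, 1, 0), (0, 0, 1)]


def design_row(pattern, interactions):
    """Intercept, three main effects, then the requested two-way terms."""
    row = [1.0] + [float(v) for v in pattern]
    row += [float(pattern[i] * pattern[j]) for i, j in interactions]
    return row


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def fit_loglinear(counts, interactions, max_iter=100, tol=1e-10):
    """Fit a Poisson log-linear model to the seven observed cells by IRLS.

    Returns (n8_hat, aic, bic), where n8_hat = exp(intercept).
    """
    x = [design_row(p, interactions) for p in PATTERNS]
    k = len(x[0])
    mu = [y + 0.5 for y in counts]  # starting values taken from the data
    beta = [0.0] * k
    for _ in range(max_iter):
        # Working response z and weights mu for the Poisson/log-link step.
        z = [math.log(m_) + (y - m_) / m_ for y, m_ in zip(counts, mu)]
        xtwx = [[sum(mu[i] * x[i][a] * x[i][b] for i in range(len(x)))
                 for b in range(k)] for a in range(k)]
        xtwz = [sum(mu[i] * x[i][a] * z[i] for i in range(len(x)))
                for a in range(k)]
        new_beta = solve(xtwx, xtwz)
        mu = [math.exp(sum(bj * xj for bj, xj in zip(new_beta, row)))
              for row in x]
        converged = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    loglik = sum(y * math.log(m_) - m_ - math.lgamma(y + 1)
                 for y, m_ in zip(counts, mu))
    aic = -2.0 * loglik + 2 * k
    bic = -2.0 * loglik + k * math.log(sum(counts))  # assumption: N = observed total
    return math.exp(beta[0]), aic, bic


first = (10, 20, 49, 35, 29, 14, 441)  # first entries, column 4 of table 1
models = {  # hypothetical labels: main effects plus the listed interactions
    "independence": [],
    "NPHN*NRC": [(0, 2)],
    "all two-way": [(0, 1), (0, 2), (1, 2)],
}
for label, terms in models.items():
    n8, aic, bic = fit_loglinear(first, terms)
    print(f"{label:>13}: n8 = {n8:8.1f}  AIC = {aic:7.2f}  BIC = {bic:7.2f}")
```

A dedicated GLM routine would normally be used instead of the hand-written solver; it is spelled out here only so that the sketch runs without third-party packages.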

However, there is another way to construct the frequencies in column 4 of table 1. Gallay et al. (1, p. 173) also report that 20 outbreaks were matched by all three sources; that is, n1 = 20. Since n1 + n2 = 30, n2 = 10; similarly, n3 = 39 and n5 = 19. Using the source totals as before gives n4 = 45, n6 = 24, and n7 = 451, the second entries in the brackets in column 4 of table 1. With this frequency column, results identical to those of Gallay et al. (1) could be obtained. This indicates that the second reconstruction is likely to correspond to the data actually analyzed, and that the total number of matches of any kind given by Gallay et al. (1), 108, is incorrect and should be replaced by n1 + n2 + n3 + n5 = 20 + 10 + 39 + 19 = 88.
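Given the three pairwise match counts, the two reported figures (108 matches of any kind and 20 matches by all three sources) cannot both be correct, because the number of matches of any kind equals the sum of the pair counts minus 2n1. A few lines make the check explicit; the function names are hypothetical.

```python
# Pairwise match counts quoted from Gallay et al.'s table 2.
PAIRS = {"NPHN-MA": 30, "NPHN-NRC": 59, "MA-NRC": 39}


def matches_of_any_kind(n1, pairs=PAIRS):
    """n1 + n2 + n3 + n5 implied by a given number of triple matches n1.

    Each pair count is n1 plus one 'pair only' cell, so
    n1 + n2 + n3 + n5 = sum(pairs) - 2 * n1.
    """
    return sum(pairs.values()) - 2 * n1


def triple_matches(total_matches, pairs=PAIRS):
    """Inverse relation: n1 implied by a reported number of matches of any kind."""
    return (sum(pairs.values()) - total_matches) / 2


print(matches_of_any_kind(20))  # 88: route 2, n1 = 20 reported directly
print(triple_matches(108))      # 10.0: route 1, 108 matches reported
```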

This example shows that simple errors in reported data can result in very different analyses and substantially different conclusions (here, an underestimation of underreporting). This is particularly true for capture-recapture frequency data, since the estimates from log-linear models are very sensitive to the observed frequencies. It also shows that it is preferable to publish the complete capture-recapture table (such as table 1), so that readers can cross-check the data and detect simple errors like the one reported above.
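The sensitivity can be illustrated with one model that has a closed form. Under the log-linear model with all two-way interactions and no three-way term, the seven observed cells are fitted exactly, and the estimate of the missing cell is n8 = n1 n4 n6 n7 / (n2 n3 n5). The sketch below, with a hypothetical function name, evaluates this single model on the two bracketed versions of column 4 of table 1 (computed from the figures quoted in the text); it is an illustration, not one of the analyses reported in table 2.

```python
def n8_all_two_way(cells):
    """Estimate of n8 under the all-two-way log-linear model.

    With seven observed cells and seven parameters the model reproduces
    the data exactly, and the zero three-way term gives
    n8 = n1*n4*n6*n7 / (n2*n3*n5).
    """
    n1, n2, n3, n4, n5, n6, n7 = cells
    return n1 * n4 * n6 * n7 / (n2 * n3 * n5)


# Cells n1..n7 for the two candidate versions of column 4 of table 1.
route_1 = (10, 20, 49, 35, 29, 14, 441)  # from 108 matches of any kind
route_2 = (20, 10, 39, 45, 19, 24, 451)  # from 20 matches by all three sources

for label, cells in (("route 1", route_1), ("route 2", route_2)):
    n8 = n8_all_two_way(cells)
    print(f"{label}: n8 = {n8:.1f}, estimated total = {sum(cells) + n8:.1f}")
```

Moving 10 outbreaks into the triple-match cell (with compensating changes elsewhere) changes the extrapolated n8 by more than an order of magnitude for this model.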


    ACKNOWLEDGMENTS
 
Conflict of interest: none declared.


    References

  1. Gallay A, Vaillant V, Bouvet P, et al. How many foodborne outbreaks of Salmonella infection occurred in France in 1995? Application of the capture-recapture method to three surveillance systems. Am J Epidemiol 2000;152:171–7.