1 Epidemiology Discipline, University of Texas-Houston School of Public Health, El Paso, TX 79902
2 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205-2179
Trevejo et al. (1) recently described the epidemiology of salmonellosis in California. They calculated confidence intervals for population incidence and hospitalization rates. They state, "Confidence intervals (95 percent) for proportions were calculated for each group-specific rate; differences were considered significant if the confidence intervals did not overlap" (1, p. 49). The authors are not alone in their use of confidence interval overlap to determine statistical significance (2). Indeed, Schenker and Gentleman (3) recently reported finding more than 60 examples of this practice in 22 health science journals.
The use of a confidence interval to determine statistical significance defeats a main purpose of the interval, which is to convey precision (4). Furthermore, judging statistical significance by the overlap of two confidence intervals provides a valid but underpowered test of the hypothesis of no difference (3, 5). Specifically, "[r]ejection of the null hypothesis by the method of examining overlap implies rejection by the standard method, whereas failure to reject by the method of examining overlap does not imply failure to reject by the standard method" (3, p. 182). Therefore, there are situations in which confidence intervals overlap but the difference or ratio of the two results is indeed statistically significant. For example, in a limited Monte Carlo simulation, Cole and Blair (5) showed that a direct test of the difference in two proportions (i.e., 0.2 vs. 0.4) with a sample of 200 subjects allocated equally to the two groups had 89 percent power, while the test of overlap had only 66 percent power. Moreover, Schenker and Gentleman (3) provide a proof that the overlap test will always have more variability and less power than a direct test.
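This power gap is easy to reproduce. The sketch below is our illustration, not Cole and Blair's original code: it simulates the scenario just described, using a pooled two-sample z-test as the direct test and Wald intervals for the overlap test (both our assumptions), and recovers approximately 89 percent and 66 percent power.

```python
import numpy as np

rng = np.random.default_rng(1)   # seed is arbitrary
p1, p2, n = 0.2, 0.4, 100        # true proportions; 100 subjects per group
z = 1.959964                     # two-sided critical value for alpha = 0.05
sims = 100_000

x1 = rng.binomial(n, p1, sims)   # simulated event counts in each group
x2 = rng.binomial(n, p2, sims)
ph1, ph2 = x1 / n, x2 / n
diff = np.abs(ph1 - ph2)

# Direct test: two-sample z-test with the variance pooled under the null.
pbar = (x1 + x2) / (2 * n)
se0 = np.sqrt(2 * pbar * (1 - pbar) / n)
power_direct = np.mean(diff > z * se0)

# Overlap test: call the difference "significant" only when the two
# 95 percent Wald intervals, ph +/- z*se, fail to overlap, which happens
# exactly when |ph1 - ph2| exceeds z*(se1 + se2).
se1 = np.sqrt(ph1 * (1 - ph1) / n)
se2 = np.sqrt(ph2 * (1 - ph2) / n)
power_overlap = np.mean(diff > z * (se1 + se2))

print(f"power, direct test:  {power_direct:.2f}")   # roughly 0.89
print(f"power, overlap test: {power_overlap:.2f}")  # roughly 0.66
```

The intuition is visible in the rejection rules: the overlap criterion compares the observed difference with z times the sum of the two standard errors, which always exceeds z times the standard error of the difference, so the overlap rule is the more conservative (and less powerful) of the two.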
In table 2 of their paper, Trevejo et al. (1) report that the hospitalization rate for persons 40–64 years of age was 2.7 per 100,000 person-years (95 percent confidence interval (CI): 2.4, 3.1) and that the same rate in persons 5–17 years of age was 2.1 per 100,000 person-years (95 percent CI: 1.7, 2.5). These confidence intervals overlap. However, the rate ratio is 1.3 (95 percent CI: 1.2, 1.4), suggesting that there is a statistically significant relative difference between these two rates. There are other such examples to be found in table 2. However, the authors conclude on page 51 that there are no statistically significant differences among hospitalization rates in persons 5–17, 18–39, and 40–64 years of age.
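To make the arithmetic concrete, the following sketch (again ours) computes log-scale Wald limits for each rate and for their ratio. The event counts and person-time here are hypothetical, chosen only so that the rates and intervals roughly match table 2; with the paper's actual denominators, the ratio's interval is the narrower (1.2, 1.4) quoted above. The qualitative point is the same: the two rate intervals overlap, yet the interval for the rate ratio excludes 1.

```python
import math

Z = 1.959964  # two-sided 95 percent critical value

def rate_with_ci(cases, pyears):
    """Rate per 100,000 person-years with a log-scale Wald 95% CI."""
    rate = cases / pyears * 1e5
    half = Z / math.sqrt(cases)   # SE of log(rate) is roughly 1/sqrt(cases)
    return rate, rate * math.exp(-half), rate * math.exp(half)

def rate_ratio_with_ci(c1, t1, c2, t2):
    """Rate ratio with a log-scale Wald 95% CI; SE = sqrt(1/c1 + 1/c2)."""
    rr = (c1 / t1) / (c2 / t2)
    half = Z * math.sqrt(1 / c1 + 1 / c2)
    return rr, rr * math.exp(-half), rr * math.exp(half)

# Hypothetical counts and person-time, chosen only so the rates and their
# intervals approximate those in table 2; they are not the paper's data.
c1, t1 = 200, 7.4e6   # about 2.7 per 100,000 person-years (ages 40-64)
c2, t2 = 130, 6.2e6   # about 2.1 per 100,000 person-years (ages 5-17)

print(rate_with_ci(c1, t1))                # (2.70, 2.35, 3.10): overlaps the next
print(rate_with_ci(c2, t2))                # (2.10, 1.77, 2.49)
print(rate_ratio_with_ci(c1, t1, c2, t2))  # (1.29, 1.03, 1.61): excludes 1
```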
The rate ratio described above may or may not represent a practically significant difference. However, the authors are incorrect in their assertion that there is no statistically significant difference. Investigators who engage in statistical significance testing and who choose the overlap of confidence intervals as their method must realize that overlap does not always convey nonsignificance. In closing, we reassert our prior statement that the use of confidence intervals to determine statistical significance defeats a main purpose of the interval, which is to convey a sense of the precision of the effect estimate (4).
REFERENCES