RE: "DETECTING PATTERNS OF OCCUPATIONAL ILLNESS CLUSTERING WITH ALTERNATING LOGISTIC REGRESSIONS APPLIED TO LONGITUDINAL DATA"

Basile Chaix1, Georgiy Bobashev2, Juan Merlo3 and Pierre Chauvin1

1 Research Unit in Epidemiology and Information Sciences, National Institute of Health and Medical Research (INSERM U444), 75012 Paris, France
2 Statistics Research Division, RTI International, Research Triangle Park, NC 27709-2194
3 Department of Community Medicine, Malmö University Hospital, Faculty of Medicine, Lund University, S-205 02 Malmö, Sweden

We read with interest the recently published article by Preisser et al. (1) on detecting patterns of occupational illness clustering with alternating logistic regressions. When investigations of variations in health or health-related behavior between areas are conducted, it is highly relevant to measure the extent to which phenomena occur in clusters. Furthermore, doing so is useful to determine whether area-level variations can be explained by a given set of individual- and area-level factors (2). Preisser et al. (1) discussed the relative strengths of alternating logistic regression and multilevel logistic models for measuring clustering of events. Indeed, it is important to compare the statistical consistency and interpretability of the different model-based indexes of clustering to determine which should be preferred in contextual analyses.

It is widely known that defining the intraclass correlation coefficient for multilevel logistic models raises problems of statistical consistency, because the variance and the mean are linked and the area-level variance is measured on the logistic scale. Several methods, including simulation methods, have been suggested to compute the intraclass correlation coefficient, and they lead to results that are in reasonable agreement (3, 4). However, the main problem is that, for a given level of variability between areas in predicted probabilities, the intraclass correlation coefficient depends on the prevalence of the phenomenon in the sample. By contrast, the alternating logistic regression pairwise odds ratio is not related to prevalence, which is a clear advantage of this index over the intraclass correlation coefficient.
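As a minimal sketch of this point, the following Python code computes the intraclass correlation coefficient in two of the ways discussed in references 3 and 4: the latent-variable definition, which fixes the individual-level variance at pi^2/3, and a simulation-based definition on the probability scale. The function names, the normal distribution of the area effects, and the choice to hold the logit-scale variance fixed while changing the intercept are assumptions made for illustration only; they are not taken from Preisser et al.

```python
import math
import random


def icc_latent(sigma2):
    """Latent-variable ICC: area variance over area variance plus pi^2/3."""
    return sigma2 / (sigma2 + math.pi ** 2 / 3)


def icc_simulation(intercept, sigma2, n_draws=20000, seed=0):
    """Simulation-based ICC on the probability scale (illustrative sketch).

    Draws normally distributed area effects on the logit scale, converts
    them to predicted probabilities, and compares the between-area variance
    of those probabilities with the average Bernoulli variance p * (1 - p).
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    probs = [
        1.0 / (1.0 + math.exp(-(intercept + rng.gauss(0.0, sd))))
        for _ in range(n_draws)
    ]
    mean_p = sum(probs) / n_draws
    between = sum((p - mean_p) ** 2 for p in probs) / n_draws
    within = sum(p * (1.0 - p) for p in probs) / n_draws
    return between / (between + within)


if __name__ == "__main__":
    # Same logit-scale area variance, two different prevalence levels
    # (hypothetical values chosen only for the comparison).
    for intercept in (0.0, -4.0):
        print(intercept, icc_latent(1.0), icc_simulation(intercept, 1.0))
```

Under these assumptions, the latent-variable value does not change with the intercept, whereas the simulation-based value is smaller when the intercept, and therefore the prevalence, is lower.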

Besides indicator consistency, the main argument put forward by Preisser et al. in favor of the alternating logistic regression model and its pairwise odds ratio is the convenience of interpreting clustering on the well-known odds ratio scale. First, it should not be overlooked that Larsen et al. (5) proposed an index of clustering in the form of an odds ratio for multilevel logistic models, the median odds ratio, defined as the median value of the odds ratio between the area at highest risk and the area at lowest risk when two areas are picked at random. Second, we hold that, in some analyses, the intraclass correlation coefficient would be more informative than an index in the form of an odds ratio. For example, the main objective in contextual analyses (6) is often to determine whether public health policies should target only individuals at risk or also areas at risk. In this case, apportioning the variance between these levels with the intraclass correlation coefficient, to obtain information on the relative weight of the variations at each level, is more relevant than using the pairwise odds ratio or the median odds ratio. Indeed, even if the pairwise odds ratio and the median odds ratio increase with the magnitude of the area-level variations, they do not allow a direct comparison of the magnitude of the variations at the area level with their magnitude at the individual level.
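The definition of the median odds ratio given above can be illustrated with a short Python sketch. It assumes normally distributed area-level random effects with variance sigma2 on the logit scale, in which case the median odds ratio has the closed form exp(sqrt(2 * sigma2) * z), where z is the 75th percentile of the standard normal distribution. The function names and the simulation check are hypothetical and serve only to show that the closed form matches the verbal definition.

```python
import math
import random
from statistics import NormalDist, median


def median_odds_ratio(sigma2):
    """Closed-form median odds ratio for a normal area-level variance."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * sigma2) * z75)


def simulated_median_odds_ratio(sigma2, n_pairs=20000, seed=0):
    """Median odds ratio obtained directly from the verbal definition.

    Picks two areas at random, takes the odds ratio between the area at
    higher risk and the area at lower risk, and returns the median over
    all simulated pairs.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    ratios = [
        math.exp(abs(rng.gauss(0.0, sd) - rng.gauss(0.0, sd)))
        for _ in range(n_pairs)
    ]
    return median(ratios)


if __name__ == "__main__":
    # Hypothetical area-level variance chosen only for the comparison.
    print(median_odds_ratio(1.0), simulated_median_odds_ratio(1.0))
```

Because the index depends only on the area-level variance, it equals 1 when there is no variation between areas and increases with that variance, but it carries no information on the individual-level variation.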

Therefore, despite problems with the intraclass correlation coefficient in logistic models, it may be useful in some analyses to compute this index to obtain information on the relative weight of the variations at each level. However, the intraclass correlation coefficient should be interpreted cautiously. Simulation studies would be relevant for assessing the extent to which each definition of the intraclass correlation coefficient is related to prevalence. Conversely, when authors need only general information on the magnitude of clustering, using the pairwise odds ratio is recommended. This way, the comparison of the magnitude of clustering between different phenomena is not biased by differences in prevalence.

Therefore, aside from the technical differences between the multilevel and alternating logistic regression models underlined by Preisser et al., authors could also consider that the intraclass correlation coefficient, the median odds ratio, and the pairwise odds ratio provide information on clustering in different forms that may be more or less informative, depending on the research question and hypotheses being investigated.

REFERENCES

  1. Preisser JS, Arcury TA, Quandt SA. Detecting patterns of occupational illness clustering with alternating logistic regressions applied to longitudinal data. Am J Epidemiol 2003;158:495–501.
  2. Merlo J. Multilevel analytical approaches in social epidemiology: measures of health variation compared with traditional measures of association. J Epidemiol Community Health 2003;57:550–2.
  3. Snijders T, Bosker R. Multilevel analysis. An introduction to basic and advanced multilevel modelling. London, United Kingdom: Sage Publications, 1999.
  4. Goldstein H, Browne W, Rasbash J. Partitioning variation in multilevel models. London, United Kingdom: Institute of Education, 2002. (http://www.mlwin.com/hgpersonal/Variance-partitioning.pdf).
  5. Larsen K, Petersen JH, Budtz-Jorgensen E, et al. Interpreting parameters in the logistic regression model with random effects. Biometrics 2000;56:909–14.
  6. Merlo J, Lynch JW, Yang M, et al. Effect of neighborhood social participation on individual use of hormone replacement therapy and antihypertensive medication: a multilevel analysis. Am J Epidemiol 2003;157:774–83.