Commentary: Robust estimation of population parameters with sparse data

B Rachet

Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK

Correspondence: Bernard Rachet, Non-Communicable Disease Epidemiology Unit, Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK. E-mail: bernard.rachet{at}lshtm.ac.uk

Assunção and Castro1 present a Bayesian approach based on Markov chain Monte Carlo (MCMC) methodology designed to estimate cancer incidence rates simultaneously for a number of cancers. The methodology is appropriate for providing reliable estimates when the available data are geographically and temporally sparse.

Health authorities have demanded incidence, mortality, and survival rates and other economic, socio-demographic, and health-related parameters for many years. Knowledge of geographical and temporal incidence and survival data at local level is naturally of interest to health authorities, especially if covariates such as socioeconomic status can be included in the analyses. Such statistics may allow authorities to evaluate health care system performance, anticipate shortages or crisis situations, and as a consequence, adjust their policies. Such data may also help construct or confirm research hypotheses, such as the presence of new latent risk factors for lung cancer.2 In both situations—the first related to public health, the second to aetiological epidemiology—the aim is to obtain parameter estimates that are as accurate as possible. However, even though the accuracy of data collection at population level has dramatically improved during the last century, there are many situations where either the accuracy of the population data or the reliability of parameters derived from accurate data cannot be readily assured.

Many countries, mostly the less-developed countries, have only recently begun collecting population data. In some cases, data are collected during occasional surveys. Even when registration procedures have taken place, completeness still varies widely.3 In summary, we are often still faced with spatially and temporally sparse data.

Health authorities and epidemiologists increasingly often want to analyse temporo-spatial variations in parameters at small geographical level or within specific sub-groups of the population defined by socioeconomic status or ethnicity. Even in countries with experience in collecting population data and where completeness is high, the number of events in a cell of a multi-dimensional matrix may be small and subject to random instability, leading to unreliable estimates.

For the last 15 years, much research has been devoted to the accurate estimation of such parameters. The most promising, and most recent, approaches have been based on Bayesian methods, particularly those using Markov field methodology. MCMC sampling has greatly improved the possibilities of analysing data on a wide range of complex problems in order to make inferences and predictions about the causes or outcomes of diseases. The parameters estimated within these complex frameworks are more accurate than those based on classical approaches. Most of the development has been made in the study of spatial and spatio-temporal variations of parameters for a single disease. Nevertheless, many diseases share common risk factors, and it is likely that parameters such as incidence rates may present a similar pattern. One of the most obvious examples is the association between tobacco smoking and numerous diseases. A few studies have shown the advantage of bivariate approaches (information from two diseases) over univariate ones. Assunção and Castro show here the feasibility of extension to more than two diseases, and the superiority of such an approach over univariate ones.

The proposed model has several appealing features. The main merit of the authors' approach, which in the multivariate case is new, at least in the biostatistics domain, is that it appears easy to apply and offers numerous potential insights.

It is natural to speculate that using information from several diseases is likely to improve the estimation of parameters. Although based on a different approach, Longford4 showed that additional components hardly increased the shrinkage effect in comparison with bivariate models. This deserves further investigation. Indeed, assessing the efficiency of a model should not be limited to the shrinkage effect and must be explored more deeply.5
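The shrinkage effect discussed above can be illustrated with a deliberately simple univariate example. The sketch below is not the multivariate model of Assunção and Castro; it assumes a basic Poisson-gamma set-up in which each area's rate is pulled towards the pooled rate, and the function name `shrink_rates`, the `prior_strength` value, and the example counts are all hypothetical choices made for illustration.

```python
def shrink_rates(cases, person_years, prior_strength):
    """Shrink each area's raw rate towards the pooled rate.

    Assumes cases[i] ~ Poisson(person_years[i] * rate_i) and a gamma
    prior on rate_i with mean equal to the pooled rate. prior_strength
    (hypothetical tuning value) is the prior's rate parameter, i.e. the
    number of person-years of pooled data the prior is worth.
    """
    pooled = sum(cases) / sum(person_years)
    shrunk = []
    for y, n in zip(cases, person_years):
        # Posterior mean of a Gamma(prior_strength * pooled, prior_strength)
        # prior updated with y events observed over n person-years.
        shrunk.append((y + prior_strength * pooled) / (n + prior_strength))
    return pooled, shrunk


# Made-up data: two small areas and one large area.
cases = [0, 2, 30]
person_years = [1000, 1000, 20000]
raw = [y / n for y, n in zip(cases, person_years)]
pooled, shrunk = shrink_rates(cases, person_years, prior_strength=5000)

for r, s in zip(raw, shrunk):
    print(f"raw rate {r:.5f} -> shrunk rate {s:.5f} (pooled {pooled:.5f})")
```

Under these assumptions, the estimates for the two small areas move much further towards the pooled rate than the estimate for the large area, which is the behaviour that makes shrinkage attractive for sparse cells. A multivariate model would additionally borrow information across diseases, which this sketch does not attempt.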

A very promising and useful extension of the method, mentioned by the authors, is the ability to incorporate covariate information. One domain where this could be applied is the study of ecological correlations between a range of pathologies and poorly defined factors such as health care systems, lifestyle factors, deprivation, and environmental pollution. Another suitable extension would be to take into account a temporal component, although this was not necessary in the data analysed by the authors. Indeed, even in the field of cancer, a time trend may reflect either the latency between an exposure and the event, or the effect of screening, or a new treatment. In particular, the authors' method is a useful tool for looking at temporal differences in incidence at the small geographical level. In some cases, the inclusion of time could improve the correlations, which would enhance the effectiveness of the shrinkage.

The MCMC methodology is still under development and, although increasing in popularity, remains an expert domain. In practice, the choice of method is based on its applicability and the availability of software rather than on its own merits. Other approaches, such as those based on generalized linear mixed models,6 allow a shrinkage effect on estimates by borrowing outside information. These models can be easily fitted using mainstream statistical software. Thanks to its simplicity and the development of WinBUGS software,7 the model proposed here constitutes an important and valid alternative to generalized linear mixed models. The respective advantages of various approaches using multivariate outside information should be studied in depth. In particular, the impact of different spatial and time structures should be considered.

Assunção and Castro have shown that their multivariate Bayesian approach significantly improves the shrinkage effect and the fit of the data, compared with a univariate approach. Including additional covariates may provide even more insights. Furthermore, enriching the model with a time component could improve the shrinkage of estimated parameters. Both public health and epidemiology face very complex challenges and may sometimes seem to have reached their limit, both in aetiological research, and in the prevention of many pathologies. Within this context, we are convinced that the method proposed here provides a significant contribution, particularly because it is easy to perform.


References

1 Assunção RM, Castro MSM. Multiple cancer sites incidence rates estimation using a multivariate Bayesian model. Int J Epidemiol 2004;33:508–16.

2 Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. Statist Med 2000;19:2555–67.

3 Parkin DM, Wabinga H, Nambooze S. Completeness in an African cancer registry. Cancer Causes Control 2001;12:147–52.

4 Longford NT. Multivariate shrinkage estimation of small area means and proportions. J R Statist Soc A 1999;162:227–45.

5 Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J R Statist Soc B 2002;64:583–639.

6 Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Statist Assoc 1993;88:9–25.

7 Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS: Bayesian inference using Gibbs sampling. Version 1.4. Cambridge: MRC Biostatistics Unit, 2003. Available at http://www.mrc-bsu.cam.ac.uk/bugs/




