Commentary: Toward systematic reviews in epidemiology

Michael B Bracken

Professor Michael Bracken, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, PO Box 208034, New Haven, CT 06520-8034, USA. E-mail: michael.bracken@yale.edu

One may disagree with the premise of Swaen et al.,1 who examine the issue of false positive outcomes in this issue of the International Journal of Epidemiology, acknowledge some methodological weaknesses in their study, or even think their conclusions simply reaffirm some very basic scientific principles, and still see considerable merit in the approach taken by these investigators and believe their paper commands attention.

Taubes' paper,2 which highlighted the discrepancies between different study results that arise in epidemiology and their effect on public opinion, was famously read by some as predicting the imminent demise of epidemiology, but it also prompted a broader examination of the state of the discipline.3 To be sure, epidemiology produces conflicting results, but so does any research enterprise. It is only because the public takes such a keen interest in the results of epidemiological studies that they find themselves at the sharp end of this particular stick. Climatologists, nuclear physicists and students of the fall of the Roman Empire all produce their share of discrepant observations when the spotlight of public scrutiny falls on them. In fact, epidemiologists know a great deal about how to conduct a research study correctly but less about how to review and synthesize data from multiple studies, and this, I suggest, is a principal source of the public's confusion when faced with a new result from an epidemiological study.

The paper by Swaen et al.1 faces several methodological difficulties of its own. How does one define a true positive result (i.e. what is the gold standard)? If investigators set up a hypothesis and test it on data already collected for another purpose, is that necessarily a ‘fishing expedition'? Indeed, is all secondary data analysis a fishing expedition? Should Swaen et al. have looked at the issue of false negative studies to derive a completely balanced picture? One might even mischievously ask whether their own paper is itself an example of a false positive result. Despite these limitations, this is an innovative attempt to quantify some of the biases that may lead to misinterpretation in an epidemiology review. It follows in the footsteps of a growing body of work done under the purview of the discipline of evidence-based medicine and healthcare which, by conducting studies of studies, searches for sources of bias in accumulating the entire body of literature on a topic. Among many aspects of the science of reviewing, these studies have led to a better understanding of the role of publication4 and citation bias,5,6 the importance of different database searching strategies,7 the validity of abstracts in accumulating research evidence,8–10 and the validity of different methods used for quantifying the quality of studies.11

Reassuringly, Swaen et al. find that the largest factor in a false positive study is the absence of an a priori hypothesis, arguably the most fundamental of all scientific principles. Similarly, a dose-response relationship and adjustment for a major confounder (smoking) lead, as expected, to fewer false positive results. It is interesting that study design is not itself a factor, but there is little reason it should be; there is nothing intrinsically wrong with case-control studies once problems with confounding have been adjusted for, as they have been here. Many of these issues are a matter of faith in epidemiology, and it is reassuring to have some empirical evidence for them.

It is a great paradox in epidemiology that while the profession is very conversant with the requirements for conducting valid studies, it has generally neglected the need for rigorous, objective and hypothesis-driven summaries of the totality of epidemiological evidence when reviewing a particular topic. Early critics of the lack of scientific rigor in literature reviews focused on medicine,12,13 but in one recent analysis over 60% of epidemiology reviews were considered not to meet the standards of a systematic review,14 and specific biases in epidemiology reviews have been reported for chronic fatigue syndrome15 and passive smoking.16 While calls for more quantitative reviews in epidemiology are starting to be made,17 the overall poor quality of current epidemiology reviews is in marked contrast to the field of evidence-based medicine and healthcare, which over the last 12 years has made remarkable strides in developing a methodology and strict standards for systematically reviewing and analysing a body of literature.18 While some epidemiologists have played a major role in these developments, by and large it appears that epidemiologists still review evidence using traditional and potentially biased methods.

Tröhler has recently provided a comprehensive account of the origins of evidence-based medicine, focusing on its early history in Britain.19 Of particular note are the ‘arithmetic observationists' who sought to quantify the mass of new observations being made in medicine in the late 18th century, exemplified by William Black, who wrote in 1789 in his text ‘An Arithmetical and Medical Analysis of the Diseases and Mortality of the Human Species':

‘... however it may be slighted as an heretical innovation, I would strenuously recommend Medical Arithmetick, as a guide and compass through the labyrinth of the therapeutick.'20 (cited in ref. 19, p.117)

The preparation of systematic reviews in epidemiology goes back at least 100 years. Chalmers et al.21 remind us of an early review and meta-analysis of 11 studies by Karl Pearson, who in 1904 reviewed the evidence on typhoid vaccines using many of the strategies expected in modern systematic reviews.22 Winkelstein23 has also brought to our attention the early work of Joseph Goldberger, who in 1907 reviewed 26 studies concerning the frequency of urinary infection in cases of typhoid fever.24 Goldberger, too, followed many of the maxims of modern research synthesis. It remains an interesting question why epidemiologists today have only rarely continued this early tradition of empirical research review.

To test (in an admittedly simple manner) the hypothesis that epidemiology reviews are not meeting modern standards of research synthesis, I analysed 39 reviews in five recent issues of Epidemiologic Reviews, the pre-eminent source for reviews of the epidemiology literature. I asked three questions of each review, reflecting some, but by no means all, of the principles frequently used to characterize a high quality systematic review within evidence-based medicine, as promulgated by the Cochrane Collaboration.25 First, did the review address a focused research question based on well-defined a priori exposures being related to a defined pattern of disease? Second, was the method of locating evidence described in detail in the review? Third, were explicit criteria prespecified to indicate the rationale for including or excluding a study? These criteria are also the first three in a larger set used by Mulrow and colleagues to examine the quality of medical reviews.12,26 Importantly, the use of meta-analysis was not a criterion for a systematic review. Systematic reviews do not require a meta-analysis, which may be deemed inappropriate because of sparse or heterogeneous results, and not all reviews that include meta-analyses follow the requirements of a systematic review: the choice of studies meta-analysed may be serendipitous rather than based on a well-defined protocol.
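
In outline, the appraisal is no more than a tally: code each review yes or no on the three criteria and report the proportion meeting each. The sketch below shows such a tally in Python; it is purely illustrative, and the field names and sample data are assumptions rather than a reproduction of the actual analysis.

```python
# A minimal sketch of the three-criterion tally described above.
# Everything here is illustrative: the field names, the scoring and
# the sample data are assumptions, not the appraisal from this paper.

from dataclasses import dataclass


@dataclass
class ReviewAppraisal:
    focused_question: bool   # addresses a focused, a priori exposure-disease question?
    search_described: bool   # describes in detail how the evidence was located?
    explicit_criteria: bool  # prespecifies criteria for including/excluding studies?


def criterion_rates(appraisals):
    """Return the proportion of reviews meeting each criterion."""
    n = len(appraisals)
    return {
        "focused question": sum(a.focused_question for a in appraisals) / n,
        "search described": sum(a.search_described for a in appraisals) / n,
        "explicit criteria": sum(a.explicit_criteria for a in appraisals) / n,
    }


# Three fictitious reviews, each hand-coded yes/no on the three criteria.
sample = [
    ReviewAppraisal(True, False, False),
    ReviewAppraisal(True, True, False),
    ReviewAppraisal(False, False, False),
]

for criterion, rate in criterion_rates(sample).items():
    print(f"{criterion}: {rate:.0%}")
```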

Table 1 shows the results of the analysis of the epidemiology reviews and compares them with the results of Mulrow and colleagues, who recently updated their earlier examination of medical review articles.26 The single criterion that epidemiology reviews most commonly meet is addressing a focused, well-defined question, although even this was met in only about half of the reviews (49%). Describing the methods used to locate the evidence under review, in the form of prespecified criteria for database searching, was rarely done (15%), as was using explicit criteria to select the studies included in the review (10%). Reviews in epidemiology show a similar lack of rigor to those in medicine generally and are methodologically inferior to meta-analyses, systematic reviews and overviews.


Table 1 Methodology of review articles in medicine and epidemiology

Epidemiologists are not alone in having neglected the need to construct methodologically rigorous and unbiased reviews of research evidence. Chalmers et al.21 document calls for systematic reviews in physics,27 education,28 psychology,29 and the social sciences.30 They suggest:

‘Many, if not most people working within academia, have not yet recognized (let alone come to grips with) the rationale for and methodological challenges presented by research synthesis. Research synthesis is only now beginning to be seen as ‘proper research' in the corridors of academic power. As much as anything else, this change seems to reflect the fact that the consumers of research are pointing out more forcibly that the ‘atomised', unsynthesized products of the research enterprise are of little help to people who wish to use research to inform their decisions.'21

If epidemiologists fail to use modern methods of scientific review to derive unbiased syntheses of study results, is it any surprise that journalists do not do so either? It has always been a premise of scientific reporting that, after describing a study's new findings, the investigator has a duty to synthesize the new results into the extant body of evidence on the topic. This aspect of epidemiology reporting may be occurring less frequently, perhaps because of editorial pressure to reduce the length of articles or perhaps because students are not being trained in this aspect of report writing; this is itself an area for research. Clinical trial reports have been found to synthesize their results inadequately within the current body of comparable evidence.31 A systematic review should validly reflect the current state of knowledge on a given topic and should form the basis for scientific reporting. If there were more concurrent systematic reviews in epidemiology, and new research findings were routinely discussed within the context of a systematic review, it would be a relatively easy task to refer the inquiring journalist or policy maker to the discussion section of a paper for an explanation of how the new report had changed the totality of evidence, if at all.

The study by Swaen et al. uses an innovative, albeit imperfect, research design to investigate sources of bias in the epidemiology literature. In doing so, it joins a growing body of literature on the science of systematically reviewing and analysing research evidence. The Cochrane Library25 includes a methodological database of some 1350 titles. It is worth noting that scholars of evidence-based medicine have largely focused their attention on randomized trials, the methodology widely considered to be the gold standard of study design, yet even here concerns remain about reviewing evidence based on trials.32 How much more difficult will be the review of areas of research based on observational study designs, and how much greater the chance of bias, confusion and error?33

The limits of epidemiology are most likely to be reached when studying associations of rare diseases with rare exposures,34 and some of the characteristics of such studies are found in the occupational studies forming the basis of Swaen et al.'s analysis. Would a comparable review of a more common exposure and a common outcome lead to similar conclusions? Only more study will tell. However, it is the rare exposure-rare outcome association that increasingly tests epidemiology. As individual studies become more challenging, systematically reviewing the evidence from them will pose its own increasing difficulties. It may be that unsystematic and poorly conducted reviews of the smoking-lung cancer association would still correctly conclude that an association existed, simply because of the strength of the relationship under study. This is less likely to happen as epidemiologists focus on rare disease-rare exposure associations. In these instances, the science of conducting high quality evidence-based reviews becomes increasingly critical if epidemiology is to credibly inform the public about the health risks to which it believes it may be exposed.

Acknowledgments

I am grateful to Iain Chalmers for comments on an early draft of this paper.

References

1 Swaen GGMH, Teggeler O, van Amelsvoort LGPM. False positive outcomes and design characteristics in occupational cancer epidemiology studies. Int J Epidemiol 2001;30:948–54.

2 Taubes G. Epidemiology faces its limits. Science 1995;269:164–69.

3 Bracken MB. Alarums false, alarums real: challenges and threats to the future of epidemiology. Ann Epidemiol 1998;8:79–82.

4 Laupacis A. Methodologic studies of systematic reviews: is there publication bias? Arch Intern Med 1997;157:357–58.

5 Gøtzsche P. Reference bias in reporting drug trials. Br Med J 1987;295:654–56.

6 Ojasoo T, Doré JC. Citation bias in medical journals. Scientometrics 1999;45:81–84.

7 Watson RJ, Richardson PH. Accessing the literature on outcome studies in group psychotherapy: the sensitivity and precision of Medline and PsycINFO bibliographic database searching. Br J Med Psychol 1999;72:127–34.

8 Scherer RW, Dickersin K, Langenberg P. Full publication of results initially presented in abstracts. A meta-analysis. JAMA 1994;272:158–62.

9 Callaham M, Wears RL, Weber EJ, Barton C, Young G. Positive outcome bias and other limitations in the outcome of research abstracts submitted to a scientific meeting. JAMA 1998;280:254–57.

10 Pitkin RM, Branagan MA, Burmeister LF. Accuracy of data in abstracts of published research articles. JAMA 1999;281:1110–11.

11 Berlin JA, Rennie D. Measuring the quality of trials: the quality of quality scales. JAMA 1999;282:1083–85.

12 Mulrow CD. The medical review article: state of the science. Ann Intern Med 1987;106:485–88.

13 Peto R. Why do we need systematic overviews of randomized trials? Stat Med 1987;6:233–40.

14 Breslow RA, Ross SA, Weed DL. Quality of reviews in epidemiology. Am J Public Health 1998;88:475–77.

15 Joyce J, Rabe-Hesketh S, Wessely S. Reviewing the reviews: the example of chronic fatigue syndrome. JAMA 1998;280:264–66.

16 Misakian AL, Bero LA. Publication bias and research on passive smoking: comparison of published and unpublished studies. JAMA 1998;280:250–53.

17 Blettner M, Sauerbrei W, Schlehofer B, Scheuchenpflug T, Friedenreich C. Traditional reviews, meta-analyses and pooled analyses in epidemiology. Int J Epidemiol 1999;28:1–9.

18 Clarke M, Oxman AD (eds). Cochrane Reviewers' Handbook 4.0 [updated July 1999]. In: The Cochrane Library [database on CD-ROM]. The Cochrane Collaboration. Oxford: Update Software, 2000, Issue 1.

19 Tröhler U. ‘To Improve the Evidence of Medicine': The 18th Century British Origins of a Critical Approach. Edinburgh: The Royal College of Physicians of Edinburgh, 2000.

20 Black W. An Arithmetical and Medical Analysis of the Diseases and Mortality of the Human Species. London: C Dilly, 1789.

21 Chalmers I, Hedges LV, Cooper H. A brief history of research synthesis. In: Clarke M (ed.). Evaluation and the Health Professions. In press.

22 Pearson K. Report on certain enteric fever inoculation statistics. Br Med J 1904;3:1243–46.

23 Winkelstein W. The first use of meta-analysis. Am J Epidemiol 1998;147:717.

24 Goldberger J. Typhoid ‘bacillus carriers'. In: Rosenau MJ, Lumsden LL, Kastle JH (eds). Report on the Origin and Prevalence of Typhoid Fever in the District of Columbia. Hygienic Laboratory Bulletin 1907;No. 35.

25 The Cochrane Library 2000, Issue 3 [database on CD-ROM]. The Cochrane Collaboration. Oxford: Update Software; www.cochranelibrary.com

26 McAlister FA, Clark HD, van Walraven C et al. The medical review article revisited: has the science improved? Ann Intern Med 1999;131:947–51.

27 Herring C. Distil or drown: the need for reviews. Physics Today 1968;21:27–33.

28 Pillemer DB. Conceptual issues in research synthesis. J Spec Educ 1984;18:27–40.

29 Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. J Am Statist Assoc 1959;54:30–34.

30 Light RJ, Pillemer DB. Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press, 1984.

31 Clarke M, Chalmers I. Discussion sections in reports of controlled trials published in general medical journals: islands in search of continents? JAMA 1998;280:280–82.

32 Jadad AR, Cook DJ, Jones A et al. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA 1998;280:278–80.

33 Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. Br Med J 1998;316:140–44.

34 Bracken MB. Musings on the edge of epidemiology. Epidemiology 1997;8:337–39.