1 Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA.
2 Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA.
3 Department of Epidemiology, Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC, USA.
Charles Poole, Department of Epidemiology (CB 7435), University of North Carolina School of Public Health, Chapel Hill, NC 27599-7435, USA. E-mail: cpoole@unc.edu
What are this study's implications? No question is asked more often of epidemiologists, or by them. In this issue of the International Journal of Epidemiology, the paper by Sesso et al.1 on tea consumption in relation to coronary heart disease (CHD) and stroke provides an excellent illustration of several ways of framing this question.
What does it all mean?
What would it all mean if this were the only study?
Although Sesso et al.1 do not use the word "failed", this is how they determine the implications of their results. Their take-home message is that their estimated trends and almost all of their categorical hazard ratio (HR) estimates are non-significant because their 95% CIs straddle the null value. The authors rely so heavily on null hypothesis tests that they interpret a change in an estimate from HR = 0.64 (95% CI: 0.43, 0.96) to HR = 0.67 (95% CI: 0.44, 1.02), after adjusting for a set of covariates, as though this change from significant to non-significant were meaningful. It is not. The two estimates are essentially identical: there is no appreciable difference in their magnitude or precision.
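A little arithmetic makes the point about these two estimates concrete. The sketch below assumes, as is conventional, that each 95% CI was computed on the log scale as log(HR) ± 1.96 × SE. Under that assumption the standard error can be recovered from the width of the interval, and an approximate Wald P-value can then be computed. The function and variable names are ours, chosen for illustration.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of log(HR), assuming a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def two_sided_p(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald P-value for the null hypothesis HR = 1."""
    z = math.log(hr) / log_se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# (HR, lower 95% limit, upper 95% limit) before and after the extra adjustment.
less_adjusted = (0.64, 0.43, 0.96)
more_adjusted = (0.67, 0.44, 1.02)

for label, (hr, lo, hi) in [("before", less_adjusted), ("after", more_adjusted)]:
    print(f"{label}: HR={hr:.2f}  SE(log HR)={log_se_from_ci(lo, hi):.3f}  "
          f"P={two_sided_p(hr, lo, hi):.3f}")

# Ratio of the two point estimates: a shift of less than 5%.
print(f"ratio of HRs = {more_adjusted[0] / less_adjusted[0]:.3f}")
```

Run as written, the script gives log-scale standard errors of about 0.205 and 0.214 and P-values of about 0.03 and 0.06. The two estimates barely differ, yet they fall on opposite sides of 0.05.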
This reliance on null hypothesis testing almost obliges the authors to treat their own results as though they were the only information of scientific and policy relevance with regard to tea. They mention other research, of course, but their main conclusions are based exclusively on their own results:
These data provide no compelling reasons for individuals to initiate black tea consumption, which should be viewed as neither beneficial nor harmful.1
We see no tenable rationale for basing any decision about tea consumption or any judgement about whether tea is beneficial or harmful on the results of a single epidemiological study.
An estimate that fails to achieve significance could be the most valid and precise estimate in an entire literature.3 We would call the study producing such a result a success, not a failure. As we show below, the non-significant results of Sesso et al.1 are resoundingly successful, at least with regard to precision.
What does this study contribute to the literature?
Unfortunately, Sesso et al.1 report for each outcome only the P for trend, and not what the systematic reviewer needs: the estimated coefficient for trend and its estimated standard error (or a CI from which the standard error can be deduced). Consequently, the systematic reviewer must use meta-analytical methods4 to find out what this study is contributing to the literature. For CHD and a 3-cup/day increment of tea intake, we estimate HR = 0.95 (95% CI: 0.86, 1.04). This estimate is identical to the most precise one in our previous review,2 from a report by Klatsky et al.,5 but the Sesso et al. estimate is considerably more precise; its inverse-variance weight is more than three times that of the Klatsky et al. estimate. For stroke, we estimate from the Sesso et al. results HR = 0.94 (95% CI: 0.78, 1.13) for an increment of 3 cups/day. This estimate is vastly more precise than any estimate for stroke in our previous review.2 It is also much closer to the null than any of those estimates. Therefore, somewhat ironically, the publication of this new study has almost certainly increased the evidence of publication bias in the literature as a whole, as very precise estimates very close to the null value almost always do. The influence of the Sesso et al. results on the evidence of overall heterogeneity and on associations between characteristics of the studies and their trend estimates is more difficult to discern, especially considering the fact that they are not the only results to appear in the interim.
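The sketch below illustrates two of the reviewer's steps for readers unfamiliar with this arithmetic.

1. It converts the published 3-cup/day CIs into inverse-variance weights, again assuming log-scale Wald intervals.
2. It estimates a log-linear trend from category-specific HRs by weighted least squares through the origin.

The second step is a deliberate simplification. The method of Greenland and Longnecker4 also accounts for the correlation among category estimates that share a reference group, and this sketch ignores that correlation. The category doses, HRs and standard errors in the usage lines are hypothetical. They are not data from Sesso et al.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of a log-scale estimate, assuming a symmetric Wald 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def inverse_variance_weight(lower: float, upper: float) -> float:
    """Inverse-variance weight 1/SE^2 recovered from a 95% CI."""
    return 1.0 / log_se_from_ci(lower, upper) ** 2


def trend_wls_through_origin(doses, hrs, ses):
    """Slope of log(HR) per unit dose from non-reference categories.

    Each category is weighted by 1/SE^2 and the line is forced through the
    origin (reference category: dose 0, HR 1). Simplification: category
    estimates are treated as independent, unlike the Greenland-Longnecker
    method. Returns (slope, SE of slope).
    """
    weights = [1.0 / se ** 2 for se in ses]
    sxx = sum(w * x * x for w, x in zip(weights, doses))
    sxy = sum(w * x * math.log(h) for w, x, h in zip(weights, doses, hrs))
    return sxy / sxx, math.sqrt(1.0 / sxx)


def rescale(slope_per_unit: float, increment: float) -> float:
    """HR for a given dose increment under a log-linear model."""
    return math.exp(slope_per_unit * increment)


# Published 3-cup/day estimates quoted in the text.
w_chd = inverse_variance_weight(0.86, 1.04)
w_stroke = inverse_variance_weight(0.78, 1.13)
print(f"CHD weight {w_chd:.0f}, stroke weight {w_stroke:.0f}")

# Hypothetical category data (cups/day, HR, SE of log HR), for illustration only.
doses = [1.0, 2.5, 5.0]
hrs = [0.98, 0.96, 0.92]
ses = [0.08, 0.10, 0.15]
slope, slope_se = trend_wls_through_origin(doses, hrs, ses)
print(f"HR per 3 cups/day = {rescale(slope, 3):.3f} (SE of slope per cup {slope_se:.3f})")
```

With the published intervals, the CHD weight comes out near 425 and the stroke weight near 112. Reviewers use weights like these when they say that one estimate carries "more than three times" the weight of another. The Klatsky et al. interval is not reproduced here, so that particular comparison is not computed.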
From the validity standpoint, the Sesso et al.1 results share the strengths and limitations of most of the other cohort studies. The obvious strength, of course, is that control-selection bias is not an issue, as it is in the case-control studies. The most obvious drawback is that the outcomes are highly aggregated, presumably because they were self-reported by the participants. We would prefer to have results for specific outcomes, such as myocardial infarction, from all studies; but only the case-control studies provide that information. We have agreed6 that it would be highly desirable to distinguish haemorrhagic stroke from ischaemic stroke.7 This splitting is presumably not possible in most cohort studies.
We do not share the strength of conviction of Sesso et al. that clinical confirmation of 47 of 49 self-reported cases of CHD and of 12 of 15 self-reported cases of stroke means that the self-reporting has been shown to be valid.1 Forty-nine and 15 are small sample sizes for estimating proportions. Moreover, separating the false positives from a group of reported positives tells us nothing whatsoever about false negatives. Nor are we prepared to assume that the sensitivity of self-reported outcome ascertainment is perfect. Even non-differentially imperfect sensitivity biases relative measures of effect if the specificity is imperfect as well.8
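Two points in this paragraph can be made concrete.

First, a 95% interval for the proportion of self-reported cases that were confirmed (the positive predictive value) shows how wide the plausible range remains with 49 and 15 cases. The sketch uses the Wilson score interval, which is one common choice among several.

Second, consider a simple model of non-differential outcome misclassification in a cohort, simplified to risks and a risk ratio rather than hazards:

observed risk = Se × true risk + (1 - Sp) × (1 - true risk)

With perfect specificity, sensitivity cancels from the risk ratio. Once specificity falls below 1, it no longer cancels. The true risks, sensitivity and specificity in the usage lines are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Risk of a reported outcome given the true risk, Se and Sp."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def observed_risk_ratio(r_exposed, r_unexposed, sensitivity, specificity):
    """Observed RR when Se and Sp are the same in both groups (non-differential)."""
    return (observed_risk(r_exposed, sensitivity, specificity)
            / observed_risk(r_unexposed, sensitivity, specificity))


# Confirmation counts quoted in the text.
for label, k, n in [("CHD", 47, 49), ("stroke", 12, 15)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: confirmed {k}/{n}, 95% CI {lo:.2f} to {hi:.2f}")

# Hypothetical true risks: true RR = 0.03 / 0.04 = 0.75.
r1, r0 = 0.03, 0.04
print(observed_risk_ratio(r1, r0, sensitivity=0.8, specificity=1.0))    # Se cancels
print(observed_risk_ratio(r1, r0, sensitivity=0.8, specificity=0.995))  # biased toward the null here
```

With these counts, the interval for the stroke confirmation proportion runs from roughly 0.55 to 0.93. The interval for CHD runs from roughly 0.86 to 0.99.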
Each study provides something novel. In the case of the Sesso et al.1 results, it is the incorporation of a second questionnaire assessment of tea intake for a subset of the cohort. HR estimates farther from the null are obtained, which the authors interpret solely from the standpoint of reduced measurement error, even while contending that a validation study showed a single questionnaire measurement to be reasonably reliable and valid because it explained 59% of the variance among tea intakes measured by dietary records. Another way of looking at the two-measurement results is that they might reflect some aspect of exposure-response or induction time that is not captured very well by a single, baseline measurement of tea intake.
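The measurement-error reading of the two-measurement results can be quantified, but only under strong assumptions that the validation study does not establish:

- the error is classical and non-differential;
- the dietary records are treated as true intake;
- the errors in the two questionnaires are independent of each other.

Under the classical model, the squared correlation between the questionnaire and true intake equals the attenuation (reliability) factor λ. An R² of 0.59 would therefore imply λ ≈ 0.59 for a single questionnaire. The Spearman-Brown formula then gives the reliability of the mean of k questionnaires. For a rare outcome, the observed log HR is approximately λ times the true log HR. The true HR below is hypothetical.

```python
import math


def spearman_brown(reliability_one: float, k: int) -> float:
    """Reliability of the mean of k parallel measurements with independent errors."""
    return k * reliability_one / (1 + (k - 1) * reliability_one)


def attenuated_hr(true_hr: float, reliability: float) -> float:
    """Approximate observed HR under classical error (regression-dilution approximation)."""
    return math.exp(reliability * math.log(true_hr))


lam1 = 0.59                     # assumption: validation R^2 taken as lambda for one questionnaire
lam2 = spearman_brown(lam1, 2)  # about 0.74 for the mean of two questionnaires

true_hr = 0.85                  # hypothetical true HR per 3 cups/day
print(f"one questionnaire: lambda={lam1:.2f}, observed HR ~ {attenuated_hr(true_hr, lam1):.3f}")
print(f"mean of two:       lambda={lam2:.2f}, observed HR ~ {attenuated_hr(true_hr, lam2):.3f}")
```

This arithmetic shows only what measurement error alone could do. It cannot rule out the alternative reading offered in the text, which attributes the difference to exposure-response or induction time.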
Conclusion
References
2 Peters U, Poole C, Arab L. Does tea affect cardiovascular disease? A meta-analysis. Am J Epidemiol 2001;154:495-503.
3 Poole C. Low P-values or narrow confidence intervals: Which are more durable? Epidemiology 2001;12:291-94.
4 Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 1992;135:1301-09.
5 Klatsky AL, Armstrong MA, Friedman GD. Coffee, tea, and mortality. Ann Epidemiol 1993;3:375-81.
6 Peters U, Poole C, Arab L. The authors reply (letter). Am J Epidemiol 2002;156:490-91.
7 Thrift AG, Donnan GA. Re: Does tea affect cardiovascular disease? A meta-analysis. (Letter). Am J Epidemiol 2002;156:490.
8 Poole C. Exceptions to the rule about nondifferential misclassification (abstract). Am J Epidemiol 1985;122:508.