Thorpes, The Grip, Linton, Cambridge CB1 6NR, UK
Correspondence: To whom correspondence should be addressed. E-mail: dew1{at}cus.cam.ac.uk
Abstract |
---|
Key words: bias/data-pooling/erroneous inferences/meta-analyses
Introduction |
---|
Perhaps I should state at the outset that, in my view, Daya and Gunby have been fastidious and meticulous in the conduct of their investigation (Daya and Gunby, 1999), and that their findings appear to be fully justified. However, since the numerical manipulations employed in meta-analyses may be a mystery to many readers, it may be worthwhile to outline an alternative, simple, yet rigorous, analytical method, which has produced almost identical results in the present example. I would also like to pinpoint one potential flaw in the method generally used to carry out meta-analyses, a flaw which could lead to serious biases. The circumstances which could produce this bias were not, however, present in the Daya and Gunby (1999) paper. Finally, I would like to promote logistic regression as a general method for conducting meta-analyses. This method has the attraction that it combines, in a single analysis, the investigation of trial heterogeneity and the estimation of treatment effects, together with the appropriate estimates of experimental error. Furthermore, the logistic regression method is virtually unique in that it can accommodate, and make adjustments for, potential confounding variables.
The favoured method of carrying out meta-analyses on data that consist of proportions is first to carry out a statistical test for trial heterogeneity and, in the absence of heterogeneity, to perform a test such as the Mantel–Haenszel test on the pooled data. The Mantel–Haenszel test assumes a constant but unknown treatment effect, and combines the effect estimates from several trials to provide a test of the departure of the mean effect from zero. I do not favour this procedure because the test for heterogeneity will often fail to provide statistical evidence of heterogeneity, particularly for relatively small collections of studies, simply because of the insensitivity of the test. Further, and most importantly, since heterogeneity will almost always be present, to a lesser or greater degree, the bias will always be in the same direction. Underdispersion, or what might be referred to as negative heterogeneity, will hardly ever occur for combinations of published studies.
The result of these two phenomena is that the so-called errors used in the Mantel–Haenszel test will generally be underestimates of the true error, and the test will consequently tend to exaggerate the importance of the effect being investigated. Indeed, this could well be the reason why meta-analyses so often appear to report contradictory findings.
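The way the pooled error can be understated may be seen in a minimal sketch. It uses generic inverse-variance pooling of per-trial log odds ratios as a stand-in, not the Mantel–Haenszel calculation itself. The function name and the input numbers are hypothetical and are not taken from any of the trials discussed here. The dispersion ratio is Cochran's Q divided by its degrees of freedom. Values above 1 suggest heterogeneity, and values below 1 correspond to the underdispersion mentioned above.

```python
import math


def fixed_effect_summary(estimates, variances):
    """Inverse-variance pooling with a heterogeneity diagnostic.

    estimates: per-trial effect estimates (e.g. log odds ratios)
    variances: per-trial within-study variances of those estimates
    Returns (pooled estimate, within-study SE, Cochran's Q, Q / (k - 1)).
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two trials with matching variances")

    weights = [1.0 / v for v in variances]
    total_w = sum(weights)

    # Pooled estimate: weighted mean with weights 1 / variance.
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w

    # SE built from within-study variation only.
    se_within = math.sqrt(1.0 / total_w)

    # Cochran's Q: weighted sum of squared deviations from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))

    # Observed dispersion relative to that expected by chance alone.
    dispersion = q / (len(estimates) - 1)
    return pooled, se_within, q, dispersion


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances, for illustration only.
    pooled, se_within, q, dispersion = fixed_effect_summary(
        [0.1, 0.5, 0.9], [0.04, 0.04, 0.04]
    )
    print(pooled, se_within, q, dispersion)
```

When the dispersion ratio exceeds 1, the within-study SE is smaller than the error implied by the observed scatter of the trials, by a factor of roughly the square root of that ratio.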
Now one of the fundamental aims of a great deal of statistical work is to make inferences about a population, based on a study of a random sample drawn from that population. In the present context we generally wish to make inferences about the entire population of women embarking on a course of assisted reproductive technology, based on several, not necessarily homogeneous, studies. The overriding principle in the testing process should therefore be to compare the magnitude of the effect of interest with the variation observed in that effect over the sample.
For the reasons outlined above, the customary procedures for conducting meta-analyses do not provide unbiased methods of combining the evidence of several studies to give a composite finding. Having made this general point, however, I confirm that the studies used by Daya and Gunby (1999) showed no evidence of heterogeneity; indeed, the degree of variation between studies was slightly less than would be expected by pure chance, this information emerging from a logistic regression carried out on the data. Even so, it has been worthwhile to emphasise this very important point, since neglect of heterogeneity is likely to lead to biased conclusions in many meta-analyses.
A truly rigorous statistical alternative to the procedure outlined above is to use logistic regression to analyse the data. Depending on the algorithm being used, the analyst may be offered the option of using the within-study variation (akin to the Mantel–Haenszel test) or the between-study variation. Since this analysis is highly computer-oriented it would not be appropriate to discuss the mechanics here, but the findings for the Daya and Gunby database are presented below.
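For readers who want a concrete picture, one common way of writing such a model is sketched below. This is an assumed formulation, since the exact model fitted is not stated here, and the symbols $\alpha_i$, $\beta$, $x_{ij}$, $D$ and $\hat\phi$ are introduced only for illustration. Trial $i$ ($i = 1, \dots, k$) has its own baseline $\alpha_i$, and $x_{ij}$ is 1 for one treatment arm and 0 for the other, so that $e^{\beta}$ is the common odds ratio.

```latex
% Logistic model with a separate baseline for each trial and a common treatment effect
\operatorname{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1-\pi_{ij}} = \alpha_i + \beta\, x_{ij},
\qquad i = 1,\dots,k, \quad j = 1, 2

% Within-study error: SE taken directly from the binomial likelihood
\mathrm{SE}_{\text{within}}(\hat\beta)

% Between-study error: the same SE scaled by the observed trial-by-treatment dispersion,
% where D is the residual deviance of the model above on k - 1 degrees of freedom
\mathrm{SE}_{\text{between}}(\hat\beta) = \sqrt{\hat\phi}\;\mathrm{SE}_{\text{within}}(\hat\beta),
\qquad \hat\phi = \frac{D}{k-1}
```

Under this formulation the choice between the two error estimates amounts to whether $\hat\phi$ is fixed at 1 or estimated from the data.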
I shall now describe a very simple method of combining the studies, a method which mimics in many ways the more sophisticated logistic regression method. However, this numerical procedure is eminently simple conceptually, and involves simply calculating the weighted mean value and SE of several observations. The SE here is based on the between-trial variation, so that the statistics computed consist of the effect, and the observed variation of the effect.
The first three columns of Table I reproduce the data used by Daya and Gunby in a general comparison of the two types of follicle stimulating hormone (rFSH and uFSH). The fourth column is simply the natural logarithm of their odds ratio (OR), and the fifth column displays the variance of the estimate of the log-OR for that study. The final column is the relative weight to be attached to the data point, being the reciprocal of the variance.
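The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the exact computation behind Table I. The per-trial variance uses Woolf's approximation (the sum of the reciprocals of the four cell counts), and the SE of the weighted mean uses one common formula based on the observed scatter of the trials. Both are assumptions, as are the function names and the trial counts, which are hypothetical.

```python
import math


def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Return (log OR, variance) for one two-arm trial.

    The variance is Woolf's approximation (an assumption here):
    1/a + 1/b + 1/c + 1/d over the four cells of the 2 x 2 table.
    All four cells must be non-zero.
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def weighted_mean_between_trial(trials):
    """Weighted mean of log ORs with an SE based on between-trial variation.

    trials: list of (log OR, variance) pairs, one per trial.
    Weights are the reciprocals of the variances.
    """
    k = len(trials)
    if k < 2:
        raise ValueError("between-trial variation needs at least two trials")

    weights = [1.0 / var for _, var in trials]
    total_w = sum(weights)
    mean = sum(w * y for w, (y, _) in zip(weights, trials)) / total_w

    # Weighted sum of squared deviations of the trials about the mean.
    ss = sum(w * (y - mean) ** 2 for w, (y, _) in zip(weights, trials))

    # SE of the weighted mean from the observed scatter (k - 1 df).
    se = math.sqrt(ss / ((k - 1) * total_w))
    return mean, se


if __name__ == "__main__":
    # Hypothetical counts: (events, total) in arm A, then (events, total) in arm B.
    counts = [(30, 100, 20, 100), (45, 150, 40, 150), (12, 60, 10, 60)]
    trials = [log_odds_ratio(*row) for row in counts]
    mean, se = weighted_mean_between_trial(trials)
    print("pooled OR:", math.exp(mean), "SE of log OR:", se)
```

Confidence limits for the OR would follow by exponentiating the mean log-OR plus or minus a suitable multiple of the SE. With few trials a t multiplier on k - 1 degrees of freedom would be a natural choice.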
The logistic regression analysis of these data returned a mean OR of 1.26 with 95% confidence limits of 1.05 and 1.52 (P < 0.02). Daya and Gunby (1999) cite the figure of 1.20 as the OR, with 95% confidence limits of 1.02 and 1.42 (P = 0.03). The consistency of these three analyses is obvious.
The overriding message of this note is that, in order to make sensible inferences, between-study variation should be used as the estimate of error in meta-analyses. The most popular procedure in current use does not guard against serious bias, in the sense of exaggerating or over-emphasising the importance of the effect being investigated. Simple (as above) or sophisticated (generalized linear model) methods are available to provide rigorous analyses. Although the differences between various analytical methods were inconsequential for the Daya and Gunby data set, there are numerous examples in the literature where lack of attention to heterogeneity leads to conflicting findings (James, 1999; Walters, 2000).
References |
---|
Daya, S. and Gunby, J. (2000) Meta-analysis on recombinant versus urinary follicle stimulating hormone (Letter). Hum. Reprod., 15, 1651–1652.
Daya, S. and Gunby, J. (2001) Meta-analysis of rFSH versus uFSH (Letter). Hum. Reprod., 16, 594–595.
Girard, M. (2000) Meta-analysis on recombinant versus urinary follicle stimulating hormone. Hum. Reprod., 15, 1650–1651.
James, W.H. (1999) The status of the hypothesis that the human sex ratio at birth is associated with the cycle day of conception. Hum. Reprod., 14, 2177–2178.
Out, H.J. (2001) Meta-analysis on rFSH versus uFSH. Hum. Reprod., 16, 592–593.
Walters, D.E. (2000) On the need for statistical rigour when pooling data from a variety of sources. Hum. Reprod., 15, 1205–1208.