Potential dangers in the customary methods of conducting meta-analyses

Recombinant versus urinary follicle stimulating hormone

Eurof Walters

Thorpes, The Grip, Linton, Cambridge CB1 6NR, UK

To whom correspondence should be addressed. E-mail: dew1@cus.cam.ac.uk


    Abstract
The customary method of combining success rates in meta-analyses may often result in serious biases, leading to erroneous inferences. This arises because of an inadmissible pooling of frequencies from heterogeneous sources. The fundamental statistical principle, that the magnitude of an ‘effect’ should always be tested against the variation in that effect over the sample, may not therefore be satisfied. A simple, but rigorous, alternative method is described.

Key words: bias/data-pooling/erroneous inferences/meta-analyses


    Introduction
The exchange of letters (Daya and Gunby, 2000, 2001; Girard, 2000; Out, 2001) which followed the publication of the meta-analysis by Daya and Gunby (1999) has prompted me to carry out a careful re-examination of the analyses used in that study. In fact, this episode provides a very good opportunity to discuss the ways in which meta-analyses are generally conducted, and their findings published, in this and other journals. Since I am not competent to comment on biological and clinical matters, I shall confine my remarks to the treatment of the numerical data.

Perhaps I should state at the outset that, in my view, Daya and Gunby have been fastidious and meticulous in the conduct of their investigation (Daya and Gunby, 1999), and that their findings appear to be fully justified. However, since the numerical manipulations employed in meta-analyses may be a mystery to many readers, it may be worthwhile to outline an alternative, simple, yet rigorous, analytical method, which has produced almost identical results in the present example. I would also like to pinpoint one potential flaw in the method generally used to carry out meta-analyses: a flaw which could lead to serious biases. The circumstances which could produce this bias were not, however, present in the Daya and Gunby (1999) paper. Finally, I would like to promote logistic regression as a general method for conducting meta-analyses. This method has the attraction that it combines, in a single analysis, the investigation of trial heterogeneity and the estimation of treatment effects, together with the appropriate estimates of experimental error. Furthermore, the logistic regression method is virtually unique in that it can accommodate, and make adjustments for, potential confounding variables.

The favoured method of carrying out meta-analyses on data which consist of proportions is first to carry out a statistical test for trial heterogeneity and then, in the absence of heterogeneity, to perform a test such as the Mantel–Haenszel test on the pooled data. The Mantel–Haenszel test assumes a constant but unknown treatment ‘effect’, and combines the ‘effect’ estimates from the several trials to provide a test of the departure of the mean effect from zero. I do not favour this procedure because the test for heterogeneity will often fail to provide statistical evidence of heterogeneity, particularly for relatively small collections of studies, simply because the test is insensitive. Further, and most importantly, since heterogeneity will almost always be present to a greater or lesser degree, the bias will always be in the same direction: the true variation between studies will exceed that implied by the within-study errors alone. Underdispersion, or what might be referred to as ‘negative heterogeneity’, will hardly ever occur in combinations of published studies.

The result of these two phenomena is that the so-called ‘errors’ used in the Mantel–Haenszel test will generally be under-estimates of the true error, and the test will consequently tend to exaggerate the importance of the effect being investigated. Indeed, this could well be the reason why meta-analyses so often appear to report contradictory findings.
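
For concreteness, the following is a minimal sketch (my own illustration in Python, not code taken from any of the analyses cited) of the customary two-step procedure: Cochran's Q test for heterogeneity, followed by an inverse-variance pooled test closely related to the Mantel–Haenszel test. Note that the standard error of the pooled effect is constructed solely from the within-study variances, which is precisely the quantity liable to under-estimate the true error.

```python
# Sketch of the customary fixed-effect procedure; assumes each study is
# summarised as a 2x2 table (a, b, c, d) of pregnancies/non-pregnancies
# on the two treatments, with no zero cells.
import numpy as np
from scipy import stats

def customary_fixed_effect(tables):
    log_or = np.array([np.log((a * d) / (b * c)) for a, b, c, d in tables])
    var = np.array([1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables])
    w = 1.0 / var                                  # inverse-variance weights
    pooled = np.sum(w * log_or) / np.sum(w)        # pooled ln(odds ratio)
    se_within = np.sqrt(1.0 / np.sum(w))           # within-study error only
    q = np.sum(w * (log_or - pooled) ** 2)         # Cochran's Q statistic
    p_heterogeneity = stats.chi2.sf(q, df=len(tables) - 1)
    p_effect = 2 * stats.norm.sf(abs(pooled / se_within))
    return np.exp(pooled), p_heterogeneity, p_effect
```

If the heterogeneity test is non-significant, the pooled test is normally reported as it stands; the point made above is that the within-study standard error remains an under-estimate whenever heterogeneity is in fact present, whether or not the Q test detects it.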

Now one of the fundamental aims of a great deal of statistical work is to make inferences about a population, based on a study of a random sample drawn from that population. In the present context we generally wish to make inferences about the entire population of women embarking on a course of assisted reproductive technology, based on several, not necessarily homogeneous, studies. The overriding principle in the testing process should therefore be to compare the magnitude of the effect of interest with the variation observed in that effect over the sample.

For the reasons outlined above, the customary procedures used in meta-analyses do not provide unbiased methods of combining the evidence of several studies to give a composite finding. Having made this general point, however, I confirm that the studies used by Daya and Gunby (1999) showed no evidence of heterogeneity; indeed, the degree of variation between studies was slightly less than would be expected by chance alone, a finding which emerged from a logistic regression carried out on the data. Even so, it has been worthwhile to emphasise the point, since the flaw described above is likely to lead to biased conclusions in many meta-analyses.

A truly rigorous statistical alternative to the procedure outlined above is to use logistic regression to analyse the data. Depending on the algorithm being used, the analyst may be offered the option of using the within-study variation (akin to the Mantel–Haenszel test) or the between-study variation. Since this analysis is highly computer-oriented it would not be appropriate to discuss the mechanics here, but the findings for the Daya and Gunby database are presented below.
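
Although the mechanics are best left to the software, the sketch below indicates one way in which such an analysis can be set up. It assumes, purely for illustration, the Python statsmodels package and a data frame with one row per study arm; the variable names are my own, not those used in the original analysis. Fitting with the default (binomial) scale uses within-study errors, whereas scaling by the Pearson dispersion is one way of letting the standard errors reflect between-study variation.

```python
# Hypothetical layout: columns 'study' (identifier), 'rfsh' (1 = rFSH, 0 = uFSH),
# 'preg' (clinical pregnancies) and 'n' (cycles started) for each study arm.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def logistic_meta(df):
    endog = np.column_stack([df['preg'], df['n'] - df['preg']])   # successes, failures
    exog = pd.get_dummies(df['study'], prefix='study', drop_first=True, dtype=float)
    exog['rfsh'] = df['rfsh'].astype(float)                       # treatment indicator
    exog = sm.add_constant(exog)
    model = sm.GLM(endog, exog, family=sm.families.Binomial())
    fit_within = model.fit()               # binomial (within-study) errors
    fit_between = model.fit(scale='X2')    # errors inflated by the Pearson dispersion
    return fit_within, fit_between
```

The exponential of the 'rfsh' coefficient is the pooled OR, and exponentiating its confidence limits gives an interval of the kind quoted below.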

I shall now describe a very simple method of combining the studies, a method which mimics in many ways the more sophisticated logistic regression approach. The procedure is conceptually straightforward, involving no more than the calculation of the weighted mean value and SE of several observations. The SE here is based on the between-trial variation, so that the statistics computed consist of the ‘effect’ and the observed variation of that ‘effect’.

The first three columns of Table I reproduce the data used by Daya and Gunby in a general comparison of the two types of follicle stimulating hormone (rFSH and uFSH). The fourth column is simply the natural logarithm of their odds ratio (OR), and the fifth column displays the variance of the estimate of the log-OR for that study. The final column is the relative weight to be attached to the data point, being the reciprocal of the variance.
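
As an illustration of how those three derived columns can be computed (my own sketch; the counts a, b, c and d denote pregnancies and non-pregnancies with rFSH and uFSH respectively, and the variance is the standard large-sample formula for a log odds ratio):

```python
import math

def study_columns(a, b, c, d):
    """a, b: pregnancies/non-pregnancies with rFSH; c, d: the same with uFSH."""
    log_or = math.log((a * d) / (b * c))   # column 4: ln(odds ratio); positive favours rFSH
    var = 1 / a + 1 / b + 1 / c + 1 / d    # column 5: large-sample variance of ln(OR)
    weight = 1.0 / var                     # column 6: relative weight (reciprocal variance)
    return log_or, var, weight
```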


Table I. Clinical pregnancy rates for 12 studies, summarising the calculations for the pooling of the information contained in the studies.
 
The 12 estimates entered into the meta-analysis are therefore those in the fourth column, with a positive value indicating the superiority of rFSH. We note that 10 out of the 12 studies favoured rFSH. The statistical test is simply the weighted mean value tested against zero, the result of which is displayed at the foot of the table.
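
The pooling step itself can be sketched as follows (again my own illustration; the exact formula used for the between-study standard error, and the reference distribution, are not given in the text, so a standard weighted form and a t distribution on k − 1 degrees of freedom are assumed):

```python
import numpy as np
from scipy import stats

def pooled_test(log_or, weight):
    """log_or, weight: the 12 ln(OR) values and relative weights (Table I, columns 4 and 6)."""
    log_or, weight = np.asarray(log_or, float), np.asarray(weight, float)
    k = len(log_or)
    mean = np.sum(weight * log_or) / np.sum(weight)      # weighted mean ln(OR)
    # SE of the weighted mean, estimated from the observed scatter of the
    # study estimates about the mean (between-study variation), not from
    # the within-study variances
    se = np.sqrt(np.sum(weight * (log_or - mean) ** 2) / ((k - 1) * np.sum(weight)))
    t = mean / se
    p = 2 * stats.t.sf(abs(t), df=k - 1)
    half = stats.t.ppf(0.975, df=k - 1) * se
    return np.exp(mean), np.exp([mean - half, mean + half]), p   # OR, 95% CI, P value
```

If the study estimates scatter more widely than their internal variances imply, this standard error grows accordingly, which is exactly the protection against the bias discussed earlier.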

The logistic regression analysis of these data returned a mean OR of 1.26 with 95% confidence limits of 1.05 and 1.52 (P < 0.02). Daya and Gunby (1999) cite the figure of 1.20 as the OR, with 95% confidence limits of 1.02 and 1.42 (P = 0.03). The consistency of these three analyses is obvious.

The overriding message of this note is that, in order to make sensible inferences, the ‘between-study’ variation should be used as the estimate of error in meta-analyses. The most popular procedure in current use does not guard against serious bias, in the sense of exaggerating, or over-emphasising, the importance of the effect being investigated. Simple methods (such as that described above) or sophisticated ones (generalized linear model methods) are available to provide rigorous analyses. Although the differences between the various analytical methods were inconsequential for the Daya and Gunby data set, there are numerous examples in the literature where lack of attention to heterogeneity has led to conflicting findings (James, 1999; Walters, 2000).


    References
Daya, S. and Gunby, J. (1999) Recombinant versus urinary follicle stimulating hormone for ovarian stimulation in assisted reproduction. Hum. Reprod., 14, 2207–2215.

Daya, S. and Gunby, J. (2000) Meta-analysis on recombinant versus urinary follicle stimulating hormone (Letter). Hum. Reprod., 15, 1651–1652.

Daya, S. and Gunby, J. (2001) Meta-analysis of rFSH versus uFSH (Letter). Hum. Reprod., 16, 594–595.

Girard, M. (2000) Meta-analysis on recombinant versus urinary follicle stimulating hormone. Hum. Reprod., 15, 1650–1651.

James, W.H. (1999) The status of the hypothesis that the human sex ratio at birth is associated with the cycle day of conception. Hum. Reprod., 14, 2177–2178.

Out, H.J. (2001) Meta-analysis on rFSH versus uFSH. Hum. Reprod., 16, 592–593.

Walters, D.E. (2000) On the need for statistical rigour when pooling data from a variety of sources. Hum. Reprod., 15, 1205–1208.




