Potential dangers in the customary methods of conducting meta-analyses

Lack of bias in the meta-analysis of recombinant versus urinary follicle stimulating hormone

Salim Daya1 and Joanne Gunby

McMaster University, Hamilton, Ontario, Canada


    Abstract
 
In a meta-analysis of randomized trials comparing recombinant and urinary FSH for ovarian stimulation in infertility treatment cycles, the clinical pregnancy rate per cycle started was significantly higher with recombinant FSH. Before the results of the different trials can be pooled, it is important to determine whether they are homogeneous, so that the overall treatment estimate is free from bias. Several methods are available to assess heterogeneity, including logistic regression analysis, sensitivity analysis comparing fixed-effects and random-effects models, testing based on the χ2 distribution, and graphical methods comparing event rates in the treatment and control groups. All of these methods confirmed that the treatment effect was homogeneous across trials, thereby providing assurance that the overall conclusion of the superior efficacy of recombinant FSH compared with urinary FSH is based on an unbiased analysis.

Key words: assisted reproduction/gonadotrophins/homogeneity/meta-analysis/validity


    Introduction
 
We would like to thank Dr Walters for confirming that our work is not only fastidious and meticulous but also free from bias. Consequently, our conclusion that the use of recombinant FSH is more efficacious than urinary FSH, resulting in a significantly higher clinical pregnancy rate when used for ovarian stimulation in assisted reproduction, is valid and supported by the data we have analysed.

We agree that logistic regression analysis is one of several methods available for conducting a meta-analysis. In fact, we performed such an analysis, and the results, demonstrating that the magnitude of the overall treatment effect and its precision were similar to those obtained using the Peto modification of the Mantel–Haenszel method, were presented in our paper. The odds ratios (OR) and their 95% confidence intervals (CI) were 1.20 (1.1–1.5) and 1.20 (1.02–1.42) respectively (Daya and Gunby, 1999). We should point out that the slight discrepancy noted by Dr Walters in the overall OR [i.e. 1.261 (using the weighted mean value approach) and 1.26 (using logistic regression analysis) compared with 1.20 (with our analysis)] arises because the data used in his analysis were limited to IVF cycles only (he did not include the data from ICSI cycles). In our analysis, the data were from both IVF and ICSI cycles. When we restricted our analysis to IVF cycles, we too observed a higher OR (1.26, 95% CI 1.05–1.52), as noted in our paper; this estimate is identical to that obtained by Dr Walters using the other two methods.
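For readers unfamiliar with the Peto modification, it pools trials on the log odds scale from each trial's observed-minus-expected event count and its hypergeometric variance. The sketch below illustrates the calculation in Python; the 2×2 counts are invented for illustration and are not the trial data from the review.

```python
import math

# Illustrative 2x2 tables: (events_rFSH, n_rFSH, events_uFSH, n_uFSH).
# These counts are invented for the sketch, not taken from the reviewed trials.
trials = [
    (25, 100, 20, 100),
    (30, 120, 24, 118),
    (18, 80, 15, 82),
]

def peto_pooled_or(tables):
    """Peto (one-step) pooled odds ratio with a 95% confidence interval."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in tables:
        m1 = a + c                    # total events in the trial
        n = n1 + n2                   # total subjects in the trial
        m2 = n - m1                   # total non-events
        e = n1 * m1 / n               # expected events in the treatment arm
        v = n1 * n2 * m1 * m2 / (n ** 2 * (n - 1))  # hypergeometric variance
        sum_o_minus_e += a - e
        sum_v += v
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))

or_, ci_lo, ci_hi = peto_pooled_or(trials)
print(f"Peto OR = {or_:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f})")
```

Because every arm in the invented tables favours the treatment, the pooled OR comes out above 1, with the CI straddling it in the usual way.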

We agree with Dr Walters that investigation of trial heterogeneity is extremely important in any systematic review and meta-analysis, and several methods are available to assess it. An underlying assumption, in combining the data from individual trials to arrive at a summary estimate of the effect of treatment, is that differences among trials are due to chance alone. Statistical heterogeneity can be attributed to one of two causes. First, the estimates observed in trials may differ because of random sampling error: even if the true (but unknown) effect is the same in each trial, because all samples are drawn from the same population, the results observed in the different trials would be expected to vary randomly around the true fixed effect. This variability is called the within-study variance. Alternatively, each trial sample may have been drawn from a different population, resulting in treatment effect estimates that would be expected to differ. These differences, called random effects, describe the between-study variance around the overall mean of the estimates of all trials.

Algebraically, the fixed-effects and random-effects models are identical except for the inclusion of a ‘random-effect’ term in the latter model. This term represents unmeasured sources of heterogeneity among trials. To the extent that confidence intervals are supposed to reflect subjective uncertainty about the estimate of the treatment effect, random-effects models can be superior to fixed-effects models (Dickersin and Berlin, 1992) because the ‘random-effect’ term provides some allowance for sources of heterogeneity beyond sampling error. In the absence of significant heterogeneity, the ‘random-effect’ term would be zero, so both analyses (fixed-effects and random-effects) should produce similar results.
After performing such a sensitivity analysis on our data, we obtained the following (identical) results: common OR 1.21, 95% CI 1.03–1.43 (fixed-effects model) and 1.21, 95% CI 1.03–1.43 (random-effects model). Thus, there was no heterogeneity in the estimates of treatment effect among trials selected for the systematic review.
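The equivalence of the two models in the absence of heterogeneity can be illustrated numerically. The sketch below pools invented per-trial log odds ratios (not the actual trial data) by inverse-variance fixed effects and by the DerSimonian–Laird random-effects method; because the between-study variance estimate τ² is zero for these homogeneous inputs, the two pooled estimates coincide, mirroring the sensitivity analysis described above.

```python
import math

# Invented per-trial log odds ratios and their variances, chosen to be
# homogeneous; these are not the summary statistics from the review.
log_ors = [0.22, 0.18, 0.15, 0.25]
variances = [0.04, 0.05, 0.06, 0.03]

def pooled_estimates(log_ors, variances):
    """Inverse-variance fixed-effects and DerSimonian-Laird random-effects pooling."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations around the fixed-effects estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    k = len(log_ors)
    # DerSimonian-Laird moment estimator of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    return math.exp(fixed), math.exp(random), tau2

fe_or, re_or, tau2 = pooled_estimates(log_ors, variances)
print(f"fixed-effects OR = {fe_or:.3f}, random-effects OR = {re_or:.3f}, tau2 = {tau2:.4f}")
```

With heterogeneous inputs, Q would exceed k − 1, τ² would become positive, and the random-effects CI would widen relative to the fixed-effects CI.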

Another method of investigating variation in study outcomes is to assess the statistical significance of between-study heterogeneity based on the χ2 distribution (Fleiss, 1981). This approach provides a measure of the sum of the squared differences between the results observed and the results expected in each trial, under the assumption that each trial estimates the same common treatment effect. If the total deviation observed is large, then a single common treatment effect is unlikely. Using a test such as the Breslow–Day test (Breslow and Day, 1980), we observed no significant heterogeneity of treatment effect across all trials (Breslow–Day statistic = 7.5, P = 0.94). In practice, because this test has low sensitivity for detecting heterogeneity, it has been suggested that a liberal significance threshold, such as 0.1, be used to determine whether the result is statistically significant (Fleiss, 1981). The probability value we obtained was clearly well above this threshold.
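A related and widely used statistic, Cochran's Q, makes this idea concrete: it is the weighted sum of squared deviations of each trial's estimate from the pooled estimate, referred to a χ2 distribution with k − 1 degrees of freedom. The sketch below uses invented summary statistics (not the trial data) and compares Q against a standard χ2 table value at the liberal 0.1 level suggested above.

```python
# Invented per-trial log odds ratios and variances for illustration only.
log_ors = [0.22, 0.18, 0.15, 0.25]
variances = [0.04, 0.05, 0.06, 0.03]

w = [1.0 / v for v in variances]
pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
# Cochran's Q: weighted squared deviations of each trial from the pooled estimate.
q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_ors))

df = len(log_ors) - 1
# Upper 10% points of the chi-square distribution (standard table values).
chi2_crit_10 = {1: 2.706, 2: 4.605, 3: 6.251, 4: 7.779}
heterogeneous = q > chi2_crit_10[df]
print(f"Q = {q:.3f}, df = {df}, heterogeneous at the 0.1 level: {heterogeneous}")
```

For these homogeneous inputs Q falls well below the critical value, so the homogeneity assumption is not rejected, in line with the Breslow–Day result reported above.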

In addition to the statistical approaches mentioned, investigators may want to display the trial outcomes graphically to make a subjective judgement on the homogeneity of treatment effect, especially when the formal statistical tests fail to reject the homogeneity assumption. One method displays variations in the observed estimates of treatment effect by plotting the event rates in the treatment groups on the vertical axis against the event rates in the control groups on the horizontal axis (L'Abbé et al., 1987). The data from our study are displayed in Figure 1 using this graphical approach. The scatter plot shows the data clustered together, indicating consistency of treatment effect (i.e. homogeneity). If there were a lack of consistent treatment effect (i.e. heterogeneity), the data points would be more widely dispersed.
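The coordinates for such a plot come straight from the 2×2 tables: one point per trial, with the control-arm event rate on the horizontal axis and the treatment-arm rate on the vertical axis. A minimal sketch, using invented counts rather than the trial data:

```python
# Invented 2x2 counts: (events_rFSH, n_rFSH, events_uFSH, n_uFSH) per trial.
trials = [
    (25, 100, 20, 100),
    (30, 120, 24, 118),
    (18, 80, 15, 82),
]

# Each trial contributes one point: (control event rate, treatment event rate).
points = [(c / n2, a / n1) for a, n1, c, n2 in trials]

# Points above the 45-degree diagonal favour the treatment arm; a tight
# cluster of points suggests a consistent (homogeneous) treatment effect.
above_diagonal = sum(1 for x, y in points if y > x)
print(points, above_diagonal)
```

Feeding these (x, y) pairs to any plotting library, together with the y = x diagonal, reproduces the L'Abbé-style display described in the text.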



Figure 1. Scatterplot of treatment effects in trials included in the meta-analysis. The diagonal line represents equal pregnancy rates with rFSH and uFSH.

 
In summary, we agree with Dr Walters that every effort should be made to confirm that the estimates of treatment effect across trials are not heterogeneous before pooling the data. In our paper, we used various statistical approaches, including logistic regression analysis, and obtained the same result, i.e. homogeneity of treatment effect and an overall estimate of the clinical pregnancy rate in favour of recombinant FSH. We have now also demonstrated the homogeneity of treatment effect using a graphical display. It should be emphasized again that any systematic review and meta-analysis must be comprehensive, thorough, fastidious, and meticulous so that inferences can be made with confidence. We believe we have satisfied this principle in our work.


    Notes
 
1 To whom correspondence should be addressed. E-mail: dayas@mcmaster.ca


    References
 
Breslow, N.E. and Day, N.E. (1980) Statistical Methods in Cancer Research, Vol. 1. Analysis of Data from Retrospective Studies of Disease. IARC Scientific Publications, Lyon.

Daya, S. and Gunby, J. (1999) Recombinant versus urinary follicle stimulating hormone for ovarian stimulation in assisted reproduction. Hum. Reprod., 14, 2207–2215.

Dickersin, K. and Berlin, J.A. (1992) Meta-analysis: state-of-the-science. Epidemiol. Rev., 14, 154–176.

Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions. 2nd edn. J. Wiley, New York, pp. 161–165.

L'Abbé, K.A., Detsky, A.S. and O'Rourke, K. (1987) Meta-analysis in clinical research. Ann. Intern. Med., 107, 224–233.




