1 Biostatistics Group, University of Manchester and 2 Salford Royal Hospitals NHS Trust, R&D Support Unit, Hope Hospital, Stott Lane, Salford M6 8HD, UK
3 To whom correspondence should be addressed. e-mail: andy.vail{at}man.ac.uk
Abstract |
---|
Key words: statistics/subfertility/systematic review
Introduction |
---|
Although CONSORT provides a useful framework, it is not intended to be exhaustive. For example, it covers parallel but not cross-over trials. It is also limited to aspects of general concern such as allocation process and loss to follow-up. In practice, each field of application may have specific areas of difficulty in undertaking and reporting randomized trials. Such areas exist in the field of subfertility trials, defined broadly to incorporate interventions designed to enhance the probability of childbirth for the subject.
One such area concerns what statisticians refer to as unit of analysis errors. Simple group comparison tests, such as the t-test or Mann–Whitney for continuous data and χ² or Fisher's for categorical data, require that observations are statistically independent. In the context of clinical trials allocating subjects to different interventions, this will usually mean that only one observation per patient is included in any such analysis. The hierarchical nature of subfertility data with, for example, multiple oocytes, multiple embryos and multiple implants per treatment cycle, and multiple treatment cycles per woman, provides extended scope for unit of analysis errors. Use of multiple observations per woman leads to unpredictable bias in the estimate of treatment difference, but exaggerates the apparent sample size. This exaggeration leads to spuriously narrow confidence intervals and low P-values.
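As a rough illustration of the exaggerated precision, the sketch below uses hypothetical numbers (one trial arm of 50 women, three cycles each, 15 pregnancies; none of these figures come from the reviewed trials). It compares the width of a simple Wald 95% confidence interval when the denominator is women with the width when every cycle is wrongly treated as an independent observation. The function name `wald_ci_width` is an assumption made for this example.

```python
import math


def wald_ci_width(successes: int, n: int, z: float = 1.96) -> float:
    """Width of the Wald confidence interval for a single proportion."""
    p = successes / n
    return 2 * z * math.sqrt(p * (1 - p) / n)


# Hypothetical arm: 50 women, 3 cycles each, 15 women become pregnant.
women = 50
cycles_per_woman = 3
pregnancies = 15

# Valid denominator: one observation per randomized woman.
per_woman_width = wald_ci_width(pregnancies, women)

# Unit of analysis error: cycles counted as if they were independent.
per_cycle_width = wald_ci_width(pregnancies, women * cycles_per_woman)

print(f"CI width per woman: {per_woman_width:.3f}")
print(f"CI width per cycle: {per_cycle_width:.3f}")
```

The per-cycle interval is narrower only because the denominator has been inflated, not because the trial contains more information. The sketch does not attempt to model the correlation between cycles from the same woman.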
A second area has received attention throughout the last decade (e.g. Daya, 1993; Cohlen et al., 1998). Treatment cycles naturally define time periods with washout phases, leading the statistically unwary to believe that efficiency benefits of cross-over trials in chronic disease fields can be applied to subfertility trials. However, treatment stops once success has been achieved. An immediate consequence of this extreme form of carry-over is that cross-over trials are inappropriate (Senn, 1993). As with unit of analysis errors, bias in estimation of treatment effects is unpredictable.
A third area concerns the nature of adverse events. Ovarian hyperstimulation syndrome is a typical adverse event, in that its occurrence is clearly a treatment failure. Other important adverse events, such as ectopic and aborted pregnancies, only occur following the partial technical success of achieving pregnancy. This has two methodological consequences. Firstly, it is usual to report pregnancy-related adverse events as a proportion of pregnancies, rather than as a proportion of randomized women. Such an approach loses the benefits of a randomized comparison. It may also be misleading, as it is possible to have a higher miscarriage rate per pregnancy in the group with the lower miscarriage rate per woman. Nevertheless, for systematic review the error is not serious since reporting these rates allows calculation of the more appropriate rates. Secondly, it is important to avoid conclusions based on interim measures of outcome, such as clinical pregnancy, that may include differential proportions of treatment failures as yet unobserved. Therefore the primary outcome of subfertility trials should be live birth or, at the least, pregnancy ongoing beyond the stage where sceptical readers could reasonably postulate differential spontaneous abortion rates.
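The reversal between per-pregnancy and per-woman miscarriage rates can be seen with a small worked example. The counts below are hypothetical and chosen only to show that the reversal is arithmetically possible; the helper `rates` is likewise an assumption made for this sketch.

```python
def rates(randomized: int, pregnancies: int, miscarriages: int) -> dict:
    """Miscarriage rate per pregnancy and per randomized woman."""
    return {
        "per_pregnancy": miscarriages / pregnancies,
        "per_woman": miscarriages / randomized,
    }


# Hypothetical two-arm trial with 100 women randomized to each arm.
group_a = rates(randomized=100, pregnancies=10, miscarriages=3)
group_b = rates(randomized=100, pregnancies=40, miscarriages=8)

for name, r in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: {r['per_pregnancy']:.0%} per pregnancy, "
          f"{r['per_woman']:.0%} per woman")
```

In this example group A has the higher rate per pregnancy (30% versus 20%) but the lower rate per woman (3% versus 8%). Because a report giving events per pregnancy supplies the numerator, a reviewer who also knows the number randomized can recover the per-woman rate.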
Our aim in undertaking this study was to assess empirically the current statistical standards in design, analysis and reporting of subfertility trials. In particular, we aimed to ascertain how often important information concerning trial quality, and valid data for analyses of primary outcome and adverse events, could be extracted from the published reports for the purposes of systematic review.
Materials and methods |
---|
The first author (A.V.) then screened trials for eligibility. We considered only trials in which authors compared pregnancy rates by allocating patients to two different groups. Comparison of pregnancy rates allowed a definition of subfertility trial and constituted evidence that the authors considered the possibility that clinical outcome may differ. Allocation at the level of patients was essential to allow the possibility of valid comparison of live birth rates. The restriction to two-group comparisons was for ease of data extraction. Thus we included, for example, 2 × 2 factorial designs, but not three-group parallel designs.
The two authors then independently assessed pre-specified criteria of the remaining studies as follows:
(i) cross-over design: design inappropriate to setting;
(ii) randomization: method should be specified;
(iii) allocation concealment: ideally there should be a third party, as envelopes may be of variable quality (Peto, 1999);
(iv) power calculation: should be prospective and based on primary analysis;
(v) primary outcomes: should be pre-specified and few to avoid issues of multiplicity;
(vi) patient flow: number randomized and analysed for each outcome should be clear;
(vii) intention-to-treat adherence: should specify and, if non-adherent, justify;
(viii) outcomes: valid analysis or sufficient data for reader to perform valid analysis;
(ix) unit of analysis error: whether reported for any outcome.
These criteria were based on key statistical aspects of the CONSORT statement and issues specific to subfertility research that influence the ability of systematic reviewers to assess and synthesize trial results.
Results |
---|
Reporting of study design
Only one identified trial was of cross-over design. Six (15%) trials, including the cross-over study, were clearly not randomized, and there was insufficient detail to assess the allocation method of a further eight (21%) (Table I). Concealment of allocation was secured by a third party in three (8%) trials. Use of sealed envelopes or containers was mentioned in a further 10 (26%) trials.
A single primary outcome was specified in eight (21%) trials. Seven others explicitly listed from two to four outcomes as primary. Of these 15, only eight trials included primary outcomes that were clinical pregnancy or longer term, and only one gave live birth as a primary outcome. In most (62%) trials it was not explicit which outcomes, if any, were considered primary.
Reporting of results
Patient flow, the number of patients included at each stage of the trial, was clear in 28 (72%) trials. In the other 11 (28%) trials it was not clear how many patients had been allocated to each treatment. In six of these 11 it was not clear either how many patients were in each group for analysis.
Lack of clarity in patient flow precluded assessment of adherence to the intention-to-treat (ITT) principle (Table I). The basis of this principle is that primary analysis should include all randomized cases in the groups to which they were assigned. Only six trials referred to the principle, and it was clearly misunderstood by four of these where the numbers assigned and analysed differed. Two trials redefined ITT, including one that defined it as the antithesis of its usual meaning: "taking into account all randomized patients who followed the protocol". The other defined ITT as an approach sometimes referred to as modified ITT: excluding cases who receive no therapy. This policy resulted in the exclusion of, among others, two women who achieved pregnancy between randomization and intervention. In 21 (54%) trials the numbers allocated and analysed appeared to be the same.
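The difference between an intention-to-treat analysis and an analysis restricted to women who received the intervention can be sketched as follows. The eight records are hypothetical, and the tuple layout and the helper `pregnancies_by_arm` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical records: (assigned arm, received intervention, pregnant).
records = [
    ("A", True, True), ("A", True, False), ("A", False, True), ("A", True, False),
    ("B", True, False), ("B", True, True), ("B", False, False), ("B", True, False),
]


def pregnancies_by_arm(rows):
    """Return {arm: (pregnancies, women)} grouped by ASSIGNED arm."""
    totals, events = Counter(), Counter()
    for arm, _received, pregnant in rows:
        totals[arm] += 1
        events[arm] += int(pregnant)
    return {arm: (events[arm], totals[arm]) for arm in sorted(totals)}


# Intention to treat: every randomized woman, in the arm she was assigned to.
itt = pregnancies_by_arm(records)

# "Modified ITT" as described above: women who received no therapy are dropped.
treated_only = pregnancies_by_arm([r for r in records if r[1]])

print("ITT:         ", itt)
print("Treated only:", treated_only)
```

In this made-up data set the woman in arm A who conceived between randomization and intervention counts towards arm A under ITT but disappears from the restricted analysis, which changes the comparison from 2/4 versus 1/4 to 1/3 versus 1/3.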
Only five (13%) studies reported on live births in sufficient detail to allow formal comparison between the groups (Figure 1). A further five (13%) reported live births but lacked sufficient detail to be of use, either because numbers were not reported for each group separately, or because it was not clear how many women were in each group. Of studies not giving sufficient information on live birth, 11 studies reported usable data for ongoing pregnancy, a further 14 for clinical pregnancy, and a further two for pregnancy (unspecified). All studies providing sufficient data for comparison of biochemical pregnancy rate also reported sufficient data for a later clinical outcome. The remaining seven (18%) studies failed to report sufficient data to construct any valid comparison of clinical outcome between the groups. These seven studies comprised the six that failed to report the number of women analysed in each group plus one that reported ongoing, clinical and biochemical pregnancy rates per attempt as a percentage without giving either numerators or denominators.
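The extraction rule implied by this paragraph, taking the latest-stage clinical outcome for which a study gives usable data, can be sketched as below. The ordering list, the function name `best_usable_outcome` and the dictionary layout are assumptions made for this example; the counts are those reported in the paragraph above.

```python
from typing import Optional, Set

# Outcomes ordered from most to least preferred for systematic review.
HIERARCHY = [
    "live birth",
    "ongoing pregnancy",
    "clinical pregnancy",
    "pregnancy (unspecified)",
]


def best_usable_outcome(usable: Set[str]) -> Optional[str]:
    """Return the most preferred outcome with usable data, or None."""
    for outcome in HIERARCHY:
        if outcome in usable:
            return outcome
    return None


# Number of studies whose best usable outcome fell at each level (from the text).
studies_by_best_outcome = {
    "live birth": 5,
    "ongoing pregnancy": 11,
    "clinical pregnancy": 14,
    "pregnancy (unspecified)": 2,
    "none": 7,
}

total_studies = sum(studies_by_best_outcome.values())
print(total_studies)
print(best_usable_outcome({"clinical pregnancy", "ongoing pregnancy"}))
```

The tally accounts for all 39 included studies, which is a quick consistency check on the figures quoted.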
Reporting of adverse events
A total of 34 (87%) studies reported on multiple pregnancy, miscarriage, ectopic pregnancy, or other adverse events, including 31 that reported on at least one of the three main pregnancy-related adverse events (Table II). Appropriate comparison of at least one pregnancy-related adverse event, the number of women experiencing the event divided by the total number of women, was reported in only one (3%) study. More usually, eight (21%) further studies reported at least one such outcome as a proportion of pregnancies, thereby allowing extraction of data for valid comparison. Seven (18%) studies reported at least one such outcome in a way that did not allow valid comparison or that left judgement of validity unclear.
Discussion |
---|
Concealment of allocation has received increased attention since an empirical study by Schulz et al. (1995) found it to be one of the quality criteria for randomized trials most strongly associated with bias. Their study of 250 trials in the field of pregnancy and childbirth found 32% to be adequately concealed and 18% stating a specific method of randomization. Our figures of 34 and 64% respectively suggest an improvement in reporting of randomization over recent years, but demonstrate that there is still a widespread lack of appreciation of the importance of reporting allocation concealment.
From experience of systematic reviews in subfertility (Farquhar et al., 1999) we had expected to identify more cross-over trials. It is a measure of methodological improvement that only one such study was identified. This trial was a multi-period, two-treatment cross-over design, sometimes referred to as an alternating sequence design, which has been the subject of debate in subfertility research (Norman and Daya, 2000). The number of periods does not affect the methodological flaw inherent in applying the cross-over design to subfertility research. It is of concern that the identified trial, which was also seriously flawed in other respects, passed peer review.
There appeared to be little awareness of the central role of the ITT principle in randomized trials. Fewer than one in six subfertility trials mentioned the principle, and only two of the studies that appeared to have followed the principle stated their adherence. Hollis and Campbell (1999) identified similar misunderstanding of the principle in trial reports from general medical journals, but found that nearly half of published reports referred to it.
Errors of analysis and reporting are problematical, but unlike design errors may be corrected either from published reports or from returning to the original data. Unit of analysis errors are endemic in subfertility analyses. The regulatory authority in the UK (Human Fertilisation and Embryology Authority, 2000) publish inappropriate live birth rates per started cycle, per oocyte collection and per embryo transfer, but not appropriate per person rates. In this study 82% of trials contained at least one such error. However, only in seven (18%) trials was the error so central to the report that it would be necessary to request data from authors in order to obtain a valid comparison of clinical outcome for meta-analysis. Ecochard and Clayton (2000) described analyses of hierarchical subfertility data from an epidemiological perspective, but these require sophisticated statistical models. Trialists can avoid statistical difficulty by randomizing each patient once and ensuring that all cited rates are per person.
Recommendations for simple guidelines specific to subfertility trials are outlined in Table III. These are additional to general guidelines for reporting described by CONSORT. From 39 studies (>7000 patients) in leading journals we identified only two trials (840 patients) that claimed to use concealed randomization, may have adhered to the ITT principle, and reported extractable values for either live birth or ongoing pregnancy rates. Adherence to our recommendations would eliminate the common flaws described in this review, giving clinicians, systematic reviewers and future patients more reason to trust the findings of clinical trials.
Appendix 1. Included studies |
---|
Ben-Chetrit, A., Eldar-Geva, T., Gal, M., Huerta, M., Mimon, T., Algur, N., Diamant, Y.Z. and Margalioth, E.J. (2001) The questionable use of albumin for the prevention of ovarian hyperstimulation syndrome in an IVF programme: a randomized placebo-controlled trial. Hum. Reprod., 16, 1880–1884.
Boostanfar, R., Jain, J.K., Mishell, D.R. Jr and Paulson, R.J. (2001) A prospective randomized trial comparing clomiphene citrate with tamoxifen citrate for ovulation induction. Fertil. Steril., 75, 1024–1026.
Brown, S.E., Toner, J.P., Schnorr, J.A., Williams, S.C., Gibbons, W.E., de Ziegler, D. and Oehninger, S. (2001) Vaginal misoprostol enhances intrauterine insemination. Hum. Reprod., 16, 96–101.
Carroll, N. and Palmer, J.R. (2001) A comparison of intrauterine versus intracervical insemination in fertile single women. Fertil. Steril., 75, 656–660.
Costabile, L., Gerli, S., Manna, C., Rossetti, D., Di Renzo, G.C. and Unfer, V. (2001) A prospective randomized study comparing intramuscular progesterone and 17alpha-hydroxyprogesterone caproate in patients undergoing in vitro fertilization-embryo transfer cycles. Fertil. Steril., 76, 394–396.
Dal Prato, L., Borini, A., Trevisi, M.R., Bonu, M.A., Sereni, E. and Flamigni, C. (2001) Effect of reduced dose of triptorelin at the start of ovarian stimulation on the outcome of IVF. Hum. Reprod., 16, 1409–1414.
De Placido, G., Mollo, A., Alviggi, C., Strina, I., Varricchio, M.T., Ranieri, A., Colacurci, N., Tolino, A. and Wilding, W. (2001) Rescue of IVF cycles by HMG in pituitary down-regulated normogonadotrophic young women characterized by a poor initial response to recombinant FSH. Hum. Reprod., 16, 1875–1879.
El Nour, A.M., Al Mayman, H.A., Jaroudi, K.A. and Coskun, S. (2001) Effects of the hypo-osmotic swelling test on the outcome of intracytoplasmic sperm injection for patients with only nonmotile spermatozoa available for injection. Fertil. Steril., 75, 480–484.
Elkind Hirsch, K.E., Bello, S., Esparcia, L., Phillips, K., Sheiko, A. and McNichol, M. (2001) Serum human chorionic gonadotropin levels are correlated with body mass index rather than route of administration in women undergoing in vitro fertilization-embryo transfer using human menopausal gonadotropin and intracytoplasmic sperm injection. Fertil. Steril., 75, 700–704.
European and Middle East Orgalutran Study Group (2001) Comparable clinical outcome using the GnRH antagonist ganirelix or a long protocol of the GnRH agonist triptorelin for the prevention of premature LH surges in women undergoing ovarian stimulation. Hum. Reprod., 16, 644–651.
Fluker, M., Grifo, J., Leader, A., Levy, M., Meldrum, D., Muasher, S.J., Rinehart, J., Rosenwaks, Z., Scott, R.T. Jr, Schoolcraft, W. et al. (2001) Efficacy and safety of ganirelix acetate versus leuprolide acetate in women undergoing controlled ovarian hyperstimulation. Fertil. Steril., 75, 38–45.
Fujii, S., Sato, S., Fukui, A., Kimura, H., Kasai, G. and Saito, Y. (2001) Continuous administration of gonadotrophin-releasing hormone agonist during the luteal phase in IVF. Hum. Reprod., 16, 1671–1675.
Gabrielsen, A., Lindenberg, S. and Petersen, K. (2001) The impact of the zona pellucida thickness variation of human embryos on pregnancy outcome in relation to suboptimal embryo development. A prospective randomized controlled study. Hum. Reprod., 16, 2166–2170.
Harrison, R.F., Jacob, S., Spillane, H., Mallon, E. and Hennelly, B. (2001) A prospective randomized clinical trial of differing starter doses of recombinant follicle-stimulating hormone (follitropin-beta) for first time in vitro fertilization and intracytoplasmic sperm injection treatment cycles. Fertil. Steril., 75, 23–31.
Ingerslev, H.J., Hojgaard, A., Hindkjaer, J. and Kesmodel, U. (2001) A randomized study comparing IVF in the unstimulated cycle with IVF following clomiphene citrate. Hum. Reprod., 16, 696–702.
International Recombinant Human Chorionic Gonadotropin Study Group (2001) Induction of ovulation in World Health Organization group II anovulatory women undergoing follicular stimulation with recombinant human follicle-stimulating hormone. Fertil. Steril., 75, 1111–1118.
Kuczynski, W., Dhont, M., Grygoruk, C., Grochowski, D., Wolczynski, S. and Szamatowicz, M. (2001) The outcome of intracytoplasmic injection of fresh and cryopreserved ejaculated spermatozoa – a prospective randomized study. Hum. Reprod., 16, 2109–2113.
Laverge, H., De Sutter, P., Van der Elst, J. and Dhont, M. (2001) A prospective, randomized study comparing day 2 and day 3 embryo transfer in human IVF. Hum. Reprod., 16, 476–480.
Martikainen, H., Tiitinen, A., Tomás, C., Tapanainen, J., Orava, M., Tuomivaara, L., Vilska, S., Hydén-Granskog, C. and Hovatta, O. (2001) One versus two embryo transfer after IVF and ICSI: a randomized study. Hum. Reprod., 16, 1900–1903.
Martinez, F., Coroleu, B., Parriego, M., Carreras, O., Belil, I., Parera, N., Hereter, L., Buxaderas, R. and Barri, P.N. (2001) Ultrasound-guided embryo transfer. Hum. Reprod., 16, 871–874.
Ng, E.H., Lau, E.Y., Yeung, W.S. and Ho, P.C. (2001) HMG is as good as recombinant human FSH in terms of oocyte and embryo quality. Hum. Reprod., 16, 319–325.
Out, H.J., David, I., Ron El, R., Friedler, S., Shalev, E., Geslevich, J., Dor, J., Shulman, A., Ben Rafael, Z., Fisch, B. et al. (2001) A randomized, double-blind clinical trial using fixed daily doses of 100 or 200 IU of recombinant FSH in ICSI cycles. Hum. Reprod., 16, 1104–1109.
Pellicano, M., Zullo, F., Fiorentino, A., Tommaselli, G.A., Palomba, S. and Nappi, C. (2001) Conscious sedation versus general anaesthesia for minilaparoscopic gamete intra-Fallopian transfer: a prospective randomized study. Hum. Reprod., 16, 2295–2297.
Propst, A.M., Hill, J.A., Ginsburg, E.S., Hurwitz, S., Politch, J. and Yanushpolsky, E.H. (2001) A randomized study comparing Crinone 8% and intramuscular progesterone supplementation in in vitro fertilization-embryo transfer cycles. Fertil. Steril., 76, 1144–1149.
Ragni, G., Vegetti, W., Baroni, E., Colombo, M., Arnoldi, M., Lombroso, G. and Crosignani, P.G. (2001) Comparison of luteal phase profile in gonadotrophin stimulated cycles with or without a gonadotrophin-releasing hormone antagonist. Hum. Reprod., 16, 2258–2262.
Ricci, G., Nucera, G., Pozzobon, C., Boscolo, R., Giolo, E. and Guaschino, S. (2001) A simple method for fallopian tube sperm perfusion using a blocking device in the treatment of unexplained infertility. Fertil. Steril., 76, 1242–1248.
Strehler, E., Abt, M., El Danasouri, I., De Santo, M. and Sterzik, K. (2001) Impact of recombinant follicle-stimulating hormone and human menopausal gonadotropins on in vitro fertilization outcome. Fertil. Steril., 75, 332–336.
Takeuchi, S., Minoura, H., Shibahara, T., Tsuiki, Y., Noritaka, F. and Toyoda, N. (2001) A prospective randomized comparison of routine buserelin acetate and a decreasing dosage of nafarelin acetate with a low-dose gonadotropin-releasing hormone agonist protocol for in vitro fertilization and intracytoplasmic sperm injection. Fertil. Steril., 76, 532–537.
Tang, O.S., Ng, E.H.Y., So, W.W.K. and Ho, P.C. (2001) Ultrasound-guided embryo transfer: a prospective randomized controlled trial. Hum. Reprod., 16, 2310–2315.
The Latin American Puregon IVF Study Group (2001) A double-blind clinical trial comparing a fixed daily dose of 150 and 250 IU of recombinant follicle-stimulating hormone in women undergoing in vitro fertilization. Fertil. Steril., 76, 950–956.
Van Langendonckt, A., Demylle, D., Wyns, C., Nisolle, M. and Donnez, J. (2001) Comparison of G1.2/G2.2 and Sydney IVF cleavage/blastocyst sequential media for the culture of human embryos: a prospective, randomized, comparative study. Fertil. Steril., 76, 1023–1031.
Vandermolen, D.T., Ratts, V.S., Evans, W.S., Stovall, D.W., Kauma, S.W. and Nestler, J.E. (2001) Metformin increases the ovulatory rate and pregnancy rate from clomiphene citrate in patients with polycystic ovary syndrome who are resistant to clomiphene citrate alone. Fertil. Steril., 75, 310–315.
Westergaard, L.G., Erb, K., Laursen, S.B., Rex, S. and Rasmussen, P.E. (2001) Human menopausal gonadotropin versus recombinant follicle-stimulating hormone in normogonadotropic women down-regulated with a gonadotropin-releasing hormone agonist who were undergoing in vitro fertilization and intracytoplasmic sperm injection. Fertil. Steril., 76, 543–549.
Wikland, M., Bergh, C., Borg, K., Hillensjo, T., Howles, C.M., Knutsson, A., Nilsson, L. and Wood, M. (2001) A prospective, randomized comparison of two starting doses of recombinant FSH in combination with cetrorelix in women undergoing ovarian stimulation for IVF/ICSI. Hum. Reprod., 16, 1676–1681.
Williams, S.C., Oehninger, S., Gibbons, W.E., Van Cleave, W.C. and Muasher, S.J. (2001) Delaying the initiation of progesterone supplementation results in decreased pregnancy rates after in vitro fertilization: a randomized, prospective study. Fertil. Steril., 76, 1140–1143.
Wolf, D.P., Patton, P.E., Burry, K.A. and Kaplan, P.F. (2001) Intrauterine insemination-ready versus conventional semen cryopreservation for donor insemination. Fertil. Steril., 76, 181–185.
Yim, S.F., Lok, I.H., Cheung, L.P., Briton Jones, C.M., Chiu, T.T. and Haines, C.J. (2001) Dose-finding study for the use of long-acting gonadotrophin-releasing hormone analogues prior to ovarian stimulation for IVF. Hum. Reprod., 16, 492–494.
Zollner, U., Zollner, K.P., Dietl, J. and Steck, T. (2001) Semen sample collection in medium enhances the implantation rate following ICSI in patients with severe oligoasthenoteratozoospermia. Hum. Reprod., 16, 1110–1114.
References |
---|
Daya, S. (1993) Is there a place for the crossover design in infertility trials? Fertil. Steril., 59, 6–7.
Ecochard, R. and Clayton, D.G. (2000) Multivariate parametric random effect regression models for fecundability studies. Biometrics, 56, 1023–1029.
Farquhar, C.M., Prentice, A., Barlow, D., Evers, H., Vandekerckhove, P. and Vail, A. (1999) Effective treatment of subfertility: introducing the Cochrane menstrual disorders and subfertility group. Hum. Reprod., 14, 1678–1683.
Hollis, S. and Campbell, F. (1999) What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ, 319, 670–674.
Human Fertilisation and Embryology Authority. (2000) The patients' guide to IVF clinics. HFEA, London, UK.
Moher, D., Schulz, K.F. and Altman, D. (2001a) The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Lancet, 357, 1191–1194.
Moher, D., Jones, A. and Lepage, L. (2001b) Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA, 285, 1992–1995.
Norman, G. and Daya, S. (2000) The alternating-sequence design (or multiple-period crossover) trial for evaluating treatment efficacy in infertility. Fertil. Steril., 74, 319–324.
Peto, R. (1999) Failure of randomisation by "sealed" envelope. Lancet, 354, 73.
Schulz, K.F., Chalmers, I., Hayes, R.J. and Altman, D.G. (1995) Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA, 273, 408–412.
Senn, S. (1993) Cross-over trials in clinical research. Wiley, Chichester, UK.
Submitted on October 10, 2002; accepted on November 27, 2002.