1 Department of Public Health and 2 Department of Obstetrics and Gynecology, Division of Reproductive Medicine, Erasmus MC University Medical Center Rotterdam, PO Box 1738, 3000 DR Rotterdam and 3 Department of Reproductive Medicine, University Medical Center Utrecht, PO Box 85500, 3508 GA Utrecht, The Netherlands
4 To whom correspondence should be addressed. Email: c.hunault{at}erasmusmc.nl
Abstract
Key words: PCT/prediction model/subfertility/treatment-independent pregnancy/validation
Introduction
We have previously developed two models to improve the prediction of treatment-independent pregnancy (Hunault et al., 2004). Because these models were based on three previous studies, they are called synthesis models. The two models were developed in a population of couples consulting for various forms of subfertility (unexplained subfertility, or subfertility due to cervical hostility or a mild male factor) who had been referred by a general practitioner or by a gynaecologist. The first model includes the following predictors: the woman's age, duration of subfertility, type of subfertility (primary or secondary), percentage of motile sperm cells and referral status of the couple. The second model includes the same predictors plus the result of the best post-coital test (PCT). In clinical practice, such a model could be used to categorize a couple as having a poor, intermediate or good chance of conceiving without treatment. If the chance is poor, the couple should be advised to undergo treatment. If the chance is good, the couple should be encouraged to postpone treatment. If the chance is intermediate, the advice could be guided by the couple's preferences concerning the effectiveness, costs and risks of treatment.
The internal validity of the models has been found to be satisfactory. However, a model that performs well on internal validation can still produce poor predictions in future patients or in patients from other centres (Justice et al., 1999). The aim of the present study was to externally validate the two treatment-independent pregnancy prediction models, i.e. to assess whether these models predict well in a sample of subfertile patients different from the one used to develop them.
Subjects and methods
The standardized initial screening included a clinical examination of both partners, analysis of a first semen sample according to WHO criteria (World Health Organization, 1999), recording of a basal body temperature chart, a mid-luteal progesterone determination, a PCT, a transvaginal ultrasound and serum Chlamydia antibody testing. Hysterosalpingography or laparoscopy with tubal patency testing was performed if Chlamydia antibodies were present or if there were risk factors for tubal pathology (a history of ectopic pregnancy or abdominal surgery).
Three hundred and two couples from the Rotterdam and Utrecht University hospitals were enrolled prospectively in the study between January 1998 and August 2002. Inclusion criteria were: (i) woman's age <40 years; (ii) duration of subfertility ≥1 year; (iii) cycle duration >21 and <35 days; (iv) normal physical examination [no body shape and stature suggesting Turner's syndrome, body mass index (BMI) <30 kg/m², normal secondary sexual characteristics, no abnormal findings on pelvic and gynaecological examination] and ultrasonography (no uterine abnormalities); (v) serum FSH concentration within normal limits (1–10 IU/l); (vi) normal mid-luteal serum progesterone (≥28 nmol/l); and (vii) subfertility due to a mild male factor, cervical hostility or unexplained causes. Mild male factor was defined as a total motile count of at least 7 × 10⁶. Semen analysis was considered normal if sperm concentration was >14 × 10⁶/ml, grade A progressive motility was >18% and the percentage of normal morphology was >8% (strict Kruger criteria; Ombelet et al., 1997). The PCT was considered positive if on average one progressively moving spermatozoon was found in at least six high-power fields (World Health Organization, 1999). In the case of a negative result, the PCT was timed using transvaginal ultrasound. Subfertility was attributed to cervical hostility if a correctly timed PCT revealed no progressively motile spermatozoa in optimal cervical mucus in combination with normal semen samples, or if the PCT was repeatedly negative regardless of the condition of the cervical mucus (World Health Organization, 1999). The diagnosis of unexplained subfertility was made when all investigations were normal. Couples were excluded if the woman had unilateral and/or bilateral tubal disease, an ovulatory disorder (abnormal serum progesterone in the mid-luteal phase) or an endocrine disorder (abnormal prolactin or thyroid malfunction), or if the man had azoospermia. In summary, the inclusion and exclusion criteria of the validation population were the same as those of the development population, with one exception: the semen criteria were stricter in the validation sample. In the development sample only men with azoospermia were excluded, whereas in the validation sample men with a severe male factor were also excluded.
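The numeric inclusion thresholds and semen definitions above can be expressed as a small screening function. The following is a minimal sketch, not the authors' procedure. The function names are hypothetical. The total motile count is assumed to be volume × concentration × percentage motile, because the text does not define it. The order of the semen checks (normal first, then mild male factor, otherwise excluded as severe male factor) is our reading of the criteria. Qualitative criteria such as the physical examination, ultrasound findings and diagnosis are not modelled.

```python
# Hypothetical helpers encoding the numeric criteria described above.
# Assumptions: total motile count (TMC) = volume x concentration x % motile,
# and a sample is checked for "normal" before "mild male factor".

def meets_numeric_inclusion(age_years: float, subfertility_years: float,
                            cycle_days: float, bmi: float,
                            fsh_iu_per_l: float,
                            progesterone_nmol_per_l: float) -> bool:
    """Check criteria (i), (ii), (iii), the BMI part of (iv), (v) and (vi)."""
    return (age_years < 40
            and subfertility_years >= 1
            and 21 < cycle_days < 35
            and bmi < 30
            and 1 <= fsh_iu_per_l <= 10
            and progesterone_nmol_per_l >= 28)


def total_motile_count(volume_ml: float, concentration_per_ml: float,
                       motile_pct: float) -> float:
    """Assumed TMC definition: volume x concentration x fraction motile."""
    return volume_ml * concentration_per_ml * motile_pct / 100.0


def classify_semen(volume_ml: float, concentration_per_ml: float,
                   motile_pct: float, grade_a_pct: float,
                   normal_morphology_pct: float) -> str:
    """Return 'normal', 'mild male factor' or 'excluded' (severe male factor)."""
    if (concentration_per_ml > 14e6 and grade_a_pct > 18
            and normal_morphology_pct > 8):
        return "normal"
    if total_motile_count(volume_ml, concentration_per_ml, motile_pct) >= 7e6:
        return "mild male factor"
    return "excluded"


if __name__ == "__main__":
    print(meets_numeric_inclusion(32, 2, 28, 24, 6, 35))  # True
    print(classify_semen(3.0, 10e6, 40, 10, 5))           # mild male factor
```

In this sketch an azoospermic sample (concentration 0) yields a TMC of 0 and is therefore classed as excluded.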
All patient characteristics were collected prospectively: the woman's age, duration of subfertility, type of subfertility (primary or secondary), percentage of motile sperm in the first semen analysis, result of the best PCT during the initial screening and referral status (whether the couple was referred by a general practitioner or by another gynaecologist). The following definitions were used. Duration of subfertility: the interval in years from discontinuation of contraception until registration at the fertility centre. Primary subfertility: the woman had never conceived. Secondary subfertility: subfertility in a woman who had conceived previously. Live birth: a living child at the time of hospital discharge after delivery. Follow-up time in months was counted until conception leading to live birth, the start of treatment or the end of the study, whichever came first. Couples who started treatment, or who had not conceived by the end of the study, were censored at that point.
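The follow-up rule above determines the survival-analysis input for each couple: a follow-up time and an event indicator. The following is a minimal sketch that assumes follow-up starts at intake and converts days to months using an average month length of 365.25/12 days. The function name and arguments are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # average month length (assumption)


def follow_up(intake: date, study_end: date,
              live_birth_conception: Optional[date] = None,
              treatment_start: Optional[date] = None) -> Tuple[float, int]:
    """Return (months observed, event indicator).

    event = 1 if follow-up ends with a conception leading to live birth;
    event = 0 if the couple is censored at the start of treatment or at
    the end of the study.
    """
    candidates = [(study_end, 0)]
    if treatment_start is not None:
        candidates.append((treatment_start, 0))
    if live_birth_conception is not None:
        candidates.append((live_birth_conception, 1))
    # The earliest date ends follow-up; on a tie, the event takes precedence.
    end, event = min(candidates, key=lambda c: (c[0], -c[1]))
    return (end - intake).days / DAYS_PER_MONTH, event
```

A conception after the start of treatment is not treatment-independent. In that case the sketch censors the couple at the treatment start date.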
Analysis
Differences in couple characteristics between the validation sample and the original sample used to develop the models were tested with the Kruskal–Wallis test for continuous variables and the χ² test for categorical variables. The prognostic effects of the patient characteristics included in the models were studied in the validation sample using a multivariable model and expressed as hazard ratios for live birth.
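For the simplest case relevant here, a comparison of two samples, both tests have 1 degree of freedom. Their P-values can then be computed with the standard library alone, using P(χ²₁ > x) = erfc(√(x/2)). The following is a minimal sketch restricted to two groups and 2×2 tables. The function names are hypothetical, and the software the authors actually used is named below. In SciPy, `scipy.stats.kruskal` and `scipy.stats.chi2_contingency` cover the general cases. Note that the latter applies Yates's continuity correction to 2×2 tables by default, whereas this sketch does not.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis_2groups(a: Sequence[float],
                           b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H for two groups (1 df) and its P-value."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, chi2_sf_1df(h)


def chi_square_2x2(table) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_1df(stat)
```

For example, two fully separated groups of three observations give H = 27/7 ≈ 3.86 with P ≈ 0.05.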
The synthesis models we aimed to validate are Cox models predicting the chance of treatment-independent pregnancy leading to live birth within 1 year after inclusion (Hunault et al., 2004). The model without PCT was developed using data on 2459 couples obtained by pooling the data of three studies (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997). The model with PCT is based on the data of two studies (those of Eimers et al. and Snick et al.), because the PCT was not investigated in the third study (that of Collins et al.). The formulae of the models are given in the Appendix. The probability of live birth was calculated for each couple in the validation sample according to both models.
The calibration and discrimination of the models were assessed to test their validity in the validation sample. Calibration refers to the agreement between predicted and observed probabilities of treatment-independent pregnancy. Discrimination is a model's ability to distinguish between women who became pregnant and those who did not.
Calibration was assessed graphically by plotting the observed 1-year live birth rate against the predicted 1-year live birth probability in a calibration plot (Miller et al., 1993). We tested statistically whether the mean predicted and observed probabilities of pregnancy leading to live birth differed. We also tested whether the predictions were too extreme (too low for low probabilities and too high for high probabilities), and whether the observed and predicted ongoing pregnancy rates differed systematically (Harrell et al., 1996).
The discriminative ability of the models was quantified by the c statistic, which is equivalent to the area under the receiver operating characteristic (ROC) curve. The c statistic ranges from 0.5 (no discriminative power) to 1 (perfect discrimination). It is the probability that, in a randomly chosen pair of women, the one with the higher predicted probability of treatment-independent pregnancy leading to live birth will be the first to achieve such a pregnancy.
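With censored follow-up, the "observed 1-year rate" in a calibration plot is commonly estimated with the Kaplan–Meier method within groups of similar predicted probability. The c statistic is commonly computed with Harrell's concordance index. The following standard-library sketch illustrates both under stated assumptions: times are in months, the horizon is 12 months, and groups are formed from equal-sized slices of the sorted predictions. Pairs tied on follow-up time are skipped in the concordance count. The function names are hypothetical. The formal calibration tests cited above (Harrell et al., 1996) are not implemented here, and this is not the S-plus code used in the study.

```python
import math
from typing import List, Sequence, Tuple


def km_cumulative_incidence(times: Sequence[float], events: Sequence[int],
                            horizon: float = 12.0) -> float:
    """1 - Kaplan-Meier survival at `horizon` (observed event probability)."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        events_at_t = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - events_at_t / at_risk
    return 1.0 - survival


def calibration_groups(pred: Sequence[float], times: Sequence[float],
                       events: Sequence[int], n_groups: int = 3,
                       horizon: float = 12.0) -> List[Tuple[float, float]]:
    """(mean predicted, Kaplan-Meier observed) for groups of sorted predictions."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size = math.ceil(len(order) / n_groups)
    rows = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        observed = km_cumulative_incidence([times[i] for i in idx],
                                           [events[i] for i in idx], horizon)
        rows.append((mean_pred, observed))
    return rows


def harrell_c(pred: Sequence[float], times: Sequence[float],
              events: Sequence[int]) -> float:
    """Share of usable pairs in which the earlier event belongs to the
    woman with the higher predicted probability (prediction ties count 0.5)."""
    concordant, usable = 0.0, 0
    for i in range(len(pred)):
        if not events[i]:
            continue  # a censored woman cannot be the earlier event of a pair
        for j in range(len(pred)):
            if times[i] < times[j]:  # pairs tied on time are skipped
                usable += 1
                if pred[i] > pred[j]:
                    concordant += 1.0
                elif pred[i] == pred[j]:
                    concordant += 0.5
    return concordant / usable
```

Each row returned by `calibration_groups` gives one point of a calibration plot. A well-calibrated model places these points close to the diagonal.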
To assess and compare the clinical usefulness of the two models, the patients of the validation sample were grouped into three categories of predicted chance of treatment-independent pregnancy leading to live birth within 1 year: <20, 20–40 and >40%. The clinical usefulness of a model was expressed as the percentage of patients it assigned to the two extreme categories.
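The categorization and the usefulness measure can be sketched in a few lines. The following assumes that predictions are expressed as probabilities and that exactly 20% and exactly 40% both fall in the intermediate category. The text does not state how boundary values were handled, so this is an assumption. The function names are hypothetical.

```python
from typing import Sequence


def prognosis_category(p: float) -> str:
    """Map a predicted 1-year probability to poor (<20%), intermediate or good (>40%)."""
    if p < 0.20:
        return "poor"
    if p <= 0.40:  # boundary values treated as intermediate (assumption)
        return "intermediate"
    return "good"


def clinical_usefulness(preds: Sequence[float]) -> float:
    """Percentage of couples assigned to the poor or good category."""
    extreme = sum(1 for p in preds if prognosis_category(p) != "intermediate")
    return 100.0 * extreme / len(preds)


if __name__ == "__main__":
    example = [0.10, 0.25, 0.35, 0.50]  # hypothetical predictions
    print(clinical_usefulness(example))  # 50.0
```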
Calculations were performed using commercially available software packages (SPSS, SPSS Inc., Chicago, IL, 1999; S-plus 2000, MathSoft Inc., Seattle, WA). A P-value <0.05 was considered to indicate statistical significance.
Results
The c statistic was 0.59 (95% CI: 0.46–0.73) and 0.63 (95% CI: 0.51–0.75) for the synthesis models without and with PCT, respectively, when applied to the validation sample. The two c statistics differed significantly (P=0.04). Figure 1 shows that both models were well calibrated. On average, the observed probabilities were closest to the ideal diagonal line for the model with PCT. The mean predicted and observed probabilities of live birth did not differ significantly for the models without and with PCT (P=0.3 and P=0.6, respectively). The predictions were not significantly too extreme (neither too low for low-probability patients nor too high for high-probability patients), and no systematic difference was observed between observed and predicted pregnancy rates (P=0.13 for the model without PCT and P=0.6 for the model with PCT).
Discussion
The discriminative ability was slightly lower in the validation sample than in the data of the three studies used to develop the models. In the latter, the c statistic after internal validation varied between 0.59 and 0.64 for the model without PCT and between 0.64 and 0.67 for the model with PCT (Hunault et al., 2004). The lower c statistics in the validation sample may reflect its greater homogeneity, with fewer patients having extreme chances of pregnancy without treatment. Predicted chances of treatment-independent pregnancy ranged from 5 to 68% (SD 13) in the validation sample, compared with 1 to 75% (SD 14) in the development sample.
The PCT is an important predictor of treatment-independent pregnancy in this sample of patients. This result is notable because the way in which the PCT is performed in one of the two study centres has changed in recent years. The effect of the PCT result in our model was estimated using data from the studies of Eimers et al. (1994) and Snick et al. (1997). In the study of Eimers et al., the PCT was performed in the fertility laboratory, whereas it is currently performed by clinicians (senior or junior residents). In the study of Snick et al., the PCT was performed by one of four experienced gynaecologists at a peripheral hospital. The prognostic power of the PCT has previously been established for couples with a duration of subfertility of <3 years (Glazener et al., 2000), a group that makes up 80% of our validation sample. The repeated finding that the PCT is an important predictor suggests that the experience of the person performing the test does not substantially affect its prognostic value.
Various effective treatment modalities are currently available. In our validation sample, treatment was often started early, even for patients who still had a good chance of treatment-independent pregnancy. This was the case even in the centre with a long-standing history of using clinical prediction models (the Utrecht clinic). Among the 27 patients with a predicted probability of ≥50% according to the model with PCT, 52% started treatment within 6 months after intake (79% in the Utrecht clinic and 21% in the Rotterdam clinic). These 77 couples had a median duration of subfertility of 1.6 years, a median woman's age of 29 years and a median sperm motility of 60%. Eighty-five percent of them had been referred by a general practitioner and had secondary subfertility. The PCT was normal in all cases. Because a high percentage of couples started treatment within the first year, few treatment-independent pregnancies leading to live birth occurred within 1 year. The statistical power of a Cox analysis depends on the number of events (45 treatment-independent pregnancies leading to live birth in this study). The absence of a significant lack of fit (calibration) therefore does not mean that calibration was perfect. The calibration of the models should be confirmed in a study with a larger number of couples.
Could the use of the models improve the counselling of couples compared with the current IUI and IVF guidelines of the Dutch Society of Obstetrics and Gynaecology (Dutch acronym: NVOG; www.nvog.nl)? According to these guidelines, IUI, and eventually IVF, is offered to patients with unexplained subfertility on the basis of the woman's age and the duration of infertility. We took the patients from the validation sample who had no missing values for the model predictors and divided them into two groups according to the criteria of the Dutch IUI and IVF guidelines: patients who should be treated immediately and patients who should be managed expectantly (see Table III). In the group who should be treated immediately, 10% of the patients had a predicted probability of treatment-independent live birth >40% according to the model including PCT. In the group who should be managed expectantly, 11% of the patients had a predicted probability of treatment-independent live birth <20%. Moreover, about half of the patients fell into the intermediate class, in which patient preferences and counselling are particularly important. These findings suggest that the models may be valuable in clinical practice in addition to a guideline such as the Dutch one. The patients with a predicted probability of <20% had a median duration of subfertility of 3 years, a median woman's age of 33 years and a median sperm motility of 35%. Forty-eight percent of them had been referred by a general practitioner and 19% had secondary subfertility. The PCT was normal in 25% of the cases. The patients with a predicted probability of >40% had a median duration of subfertility of 1.7 years, a median woman's age of 30 years and a median sperm motility of 54%. Eighty-one percent of them had been referred by a general practitioner and 62% had secondary subfertility. The PCT was normal in all cases.
If the models are used as a counselling tool, the model with PCT is more useful than the model without PCT, because it assigns more patients to the poor (<20%) and good (>40%) prognosis categories (52 versus 36%). The study has several implications for clinical practice. Only six readily available patient characteristics are needed to use the model with PCT [woman's age, duration of subfertility, type of subfertility (primary or secondary), referral status of the couple, progressive motility in the first semen analysis and result of the first correctly timed PCT]. The models apply to couples with subfertility due to unexplained causes, cervical hostility or a mild male factor. They are based on a broad range of underlying patient populations and provide reliable predictions. The models would be useful for identifying couples whose treatment-independent chance of live birth is >40%. These couples should be strongly encouraged to refrain from any assisted reproduction treatment (ART) programme in the near future. The models might also facilitate a more balanced choice of ART in couples with lower chances of treatment-independent live birth.
Appendix
The predicted probability (P) of treatment-independent pregnancy within 1 year after intake leading to live birth according to the synthesis model excluding the PCT result is:
[Formula of the synthesis model without PCT: image not available]
The formula of the synthesis model with PCT is:
[Formula of the synthesis model with PCT: image not available]
AGE1 is the woman's age if she is ≤31 years old, and 31 years if she is >31 years old. AGE2 is the difference (woman's age − 31 years) if she is >31 years old, and zero otherwise. A tertiary-care couple is a couple referred by a gynaecologist. Duration of subfertility is measured in years. For primary subfertility, tertiary-care couple and abnormal PCT, the value is 1 if true and 0 if not true.
The result of the PCT in the initial cycle was coded as abnormal when no forward-moving sperm cell was found in the whole mucus sample.
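The predictor coding defined above can be sketched as follows. This includes the piecewise-linear age terms AGE1 and AGE2, which let the effect of age change at 31 years, and the 0/1 indicators. Because the model formulae themselves are not reproduced here, the sketch assumes that a prediction takes the usual Cox form P = 1 − S₀(1 year)^exp(Σβx). It supplies no coefficients and no baseline survival; these must be taken from the published formulae (Hunault et al., 2004). Whether those formulae centre the linear predictor is not known from this text. All function and key names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def age_splines(age: float) -> Tuple[float, float]:
    """AGE1 = age capped at 31 years; AGE2 = years above 31 (0 otherwise)."""
    return min(age, 31.0), max(age - 31.0, 0.0)


def code_predictors(age: float, duration_years: float, primary: bool,
                    motility_pct: float, referred_by_gynaecologist: bool,
                    pct_abnormal: Optional[bool] = None) -> Dict[str, float]:
    """Code the predictors as described above.

    Pass pct_abnormal only when using the model that includes the PCT.
    """
    age1, age2 = age_splines(age)
    x = {
        "age1": age1,
        "age2": age2,
        "duration_years": duration_years,
        "primary": 1.0 if primary else 0.0,
        "motility_pct": motility_pct,
        "tertiary": 1.0 if referred_by_gynaecologist else 0.0,
    }
    if pct_abnormal is not None:
        x["pct_abnormal"] = 1.0 if pct_abnormal else 0.0
    return x


def one_year_probability(x: Dict[str, float], coefficients: Dict[str, float],
                         baseline_survival: float) -> float:
    """Assumed Cox form: P = 1 - S0(1 year) ** exp(sum(beta * x)).

    `coefficients` and `baseline_survival` must come from the published
    formulae; none are supplied here.
    """
    lp = sum(coefficients[name] * value for name, value in x.items())
    return 1.0 - baseline_survival ** math.exp(lp)
```

With all coefficients set to zero, the linear predictor is 0 and the sketch returns 1 − S₀. This is a quick consistency check, not a clinical prediction.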
Acknowledgements
References
Eimers JM, te Velde ER, Gerritse R, Vogelzang ET, Looman CW and Habbema JD (1994) The prediction of the chance to conceive in subfertile couples. Fertil Steril 61, 44–52.
ESHRE Capri Workshop Group (2000) Multiple gestation pregnancy. Hum Reprod 15, 1856–1864.
Glazener CM, Ford WC and Hull MG (2000) The prognostic power of the post-coital test for natural conception depends on duration of infertility. Hum Reprod 15, 1953–1957.
Hansen M, Kurinczuk JJ, Bower C and Webb S (2002) The risk of major birth defects after intracytoplasmic sperm injection and in vitro fertilization. N Engl J Med 346, 725–730.
Harrell FE Jr, Lee KL and Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15, 361–387.
Hunault CC, Eijkemans MJC, te Velde ER, Collins JA, Evers JLH and Habbema JDF (2004) Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three models. Hum Reprod 19, 2019–2026.
Jones HW (2003) Multiple births: how are we doing? Fertil Steril 79, 17–21.
Justice AC, Covinsky KE and Berlin JA (1999) Assessing the generalizability of prognostic information. Ann Intern Med 130, 515–524.
Miller ME, Langefeld CD, Tierney WM, Hui SL and McDonald CJ (1993) Validation of probabilistic predictions. Med Decis Making 13, 49–58.
Moll AC, Imhof SM, Cruysberg JRM, Schouten-van Meeteren AYN, Boers M and van Leeuwen FE (2003) Incidence of retinoblastoma in children born after in-vitro fertilisation. Lancet 361, 309–310.
Ombelet W, Bosmans E, Janssen M, Cox A, Vlasselaer J, Gyselaers W, Vandeput H, Gielen J, Pollet H, Maes M et al. (1997) Semen parameters in a fertile versus subfertile population: a need for change in the interpretation of semen testing. Hum Reprod 12, 987–993.
Snick HK, Snick TS, Evers JL and Collins JA (1997) The spontaneous pregnancy prognosis in untreated subfertile couples: the Walcheren primary care study. Hum Reprod 12, 1582–1588.
Stromberg B, Dahlquist G, Ericson A, Finnstrom O, Koster M and Stjernqvist K (2002) Neurological sequelae in children born after in-vitro fertilisation: a population-based study. Lancet 359, 461–465.
World Health Organization (1999) WHO Laboratory Manual for the Examination of Human Semen and Sperm–Cervical Mucus Interaction, 4th edn. Cambridge University Press, Cambridge.
Submitted on November 25, 2004; resubmitted on January 27, 2005; accepted on January 27, 2005.