1 Department of Public Health, Erasmus MC, PO box 1738, 3000 DR, Rotterdam, The Netherlands, 2 Faculty of Health Sciences, McMaster University, Hamilton Ontario, Canada, 3 Department of Obstetrics & Gynaecology, University Hospital, Maastricht, The Netherlands and 4 Department of Reproductive Medicine, Division of Perinatology and Gynecology, University Medical Center Utrecht, the Netherlands
5 To whom correspondence should be addressed at: Department of Public Health, Erasmus MC Rotterdam, PO box 1738, 3000 DR Rotterdam, the Netherlands. Fax: 31 10 4089455; Email: c.hunault{at}erasmusmc.nl
Abstract |
---|
Key words: post-coital test/prediction model/spontaneous pregnancy/subfertility/synthesis
Introduction |
---|
Several models for predicting spontaneous pregnancy have been developed (Comhaire, 1987; Eimers et al., 1994; Wichmann et al., 1994; Collins et al., 1995; Snick et al., 1997). We selected the three studies in which data from both partners were collected prospectively and in which the dependency of the predictors was corrected for by multivariable analysis (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997). In the present study the data of the three selected studies were pooled to form a new data set.
Because the spontaneous pregnancy rate was found to be higher for subfertile couples referred by a general practitioner to a secondary centre than for couples referred by a gynaecologist to a tertiary centre (Wouts et al., 1987; Snick et al., 1997), we assessed the importance of the care setting as a potential independent predictor for spontaneous pregnancy.
The aim of this study was to develop one or more prediction models that predict the individual chance of pregnancy in subfertile couples more reliably, and have a broader empirical base, than the three individual models.
Materials and methods |
---|
After these exclusions, the data set contained 996 couples from the Eimers study, 1061 couples from the Collins study and 402 couples from the Snick study, resulting in a total of 2459 couples. The pooled data set was used for the analysis of the present study. The study was approved by each institution's research ethics review board.
Definitions and modifications of the predictive and outcome variables of the original models, and the construction of two synthesis models
The definitions of the predictors as used in the three models have been described in the original publications (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997) and are summarized in Table I. In all three studies, semen samples were collected and analysed according to the available standards of the World Health Organization (World Health Organization, 1980, 1987). The duration of subfertility was defined by the time interval from discontinuation of contraceptive activities until registration at the fertility centre. Primary and secondary subfertility were defined as subfertility without and with a previous pregnancy respectively.
The pooled individual data from the three samples were used to construct two synthesis models (a three-sample and a two-sample model) after modifying some of the predictors in order to make the data sets of the three original studies compatible. In the three-sample synthesis model, only variables that were available in all three samples were included: the duration and type of subfertility (primary or secondary), the woman's age and the percentage of motile sperm. In the Snick sample the percentage of motile sperm was not present in the database, so the percentage of progressive motile sperm had to be converted into the percentage of motile sperm. A linear model linking these two semen parameters, derived from the Eimers data, was used for this conversion. The effect of the woman's age was modelled as a continuous declining fertility function, with a more rapid decline after 31 years (Van Noord-Zaadstra et al., 1991). Four patients had to be excluded because two or more of the predictors were missing. For 104 patients (4%) with only one missing predictor, the missing values were imputed (filled in), based on the correlation with other predictors (Little, 1992). Imputation is a better method for handling missing data than simply excluding them, provided that certain conditions are met (Harrell, 2001). We used the so-called Expectation Maximization method to estimate missing values by an iterative process (SPSS Inc., Chicago, IL). In all three data sets, the period of time (in months) during which couples were observed was available; observation ended at a conception leading to live birth, at the start of treatment or at the end of the follow-up period. Live birth was defined as a child still living 1 week after birth. A pregnancy leading to live birth within 1 year after intake was taken as the outcome variable for both synthesis models.
In addition, we studied whether the referral status of the couple had a significant relation with the outcome after correcting for the other included predictors. Referral status indicates whether the couple was referred by a general practitioner (secondary-care couple) or by a gynaecologist (tertiary-care couple). All patients from the Snick study were secondary-care couples and all couples from the Collins study were considered as tertiary-care couples. Only the Eimers study included both secondary-care and tertiary-care couples and was therefore used to estimate the effect of the referral status.
The result of the post-coital test (PCT) was not available in the Collins sample and could therefore not be included in the three-sample synthesis model. For this reason we also developed a two-sample synthesis model including the result of the PCT, based on the Snick and Eimers data sets. The PCT was scored in three categories in the Eimers model and in two categories in the Snick model (Table I). To reconcile the two scorings, we transformed the three categories of the Eimers patients into two by combining the categories positive non-progressive and negative (abnormal) and contrasting them with positive progressive (normal), according to the Snick qualifications (see Table I). All other predictors in the two-sample synthesis model were the same as in the three-sample synthesis model.
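The recoding of the three Eimers PCT categories into the two Snick categories can be sketched as a simple lookup. This is only an illustration: the category strings and the function name `recode_pct` are assumptions chosen for readability, not the coding used in the original databases.

```python
# Hypothetical labels for the three Eimers PCT categories (assumed spelling),
# mapped onto the two-category Snick scoring described in the text.
PCT_TWO_CATEGORY = {
    "positive progressive": "normal",
    "positive non-progressive": "abnormal",
    "negative": "abnormal",
}


def recode_pct(result):
    """Return 'normal' or 'abnormal' for a three-category PCT result."""
    key = result.strip().lower()
    if key not in PCT_TWO_CATEGORY:
        raise ValueError(f"unknown PCT category: {result!r}")
    return PCT_TWO_CATEGORY[key]


eimers_results = ["Positive progressive", "negative", "positive non-progressive"]
recoded = [recode_pct(r) for r in eimers_results]
```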
For the sake of validation we also constructed three one-sample models (both with and without PCT) and three two-sample models (Snick–Eimers, Snick–Collins and Eimers–Collins, both with and without referral status; the Snick–Eimers model also with PCT) to be able to perform the jack-knife analysis (see later) using the modified predictors from the three different data sets. We also analysed whether the modifications necessary to make the data sets compatible changed the discriminative ability of the three original models. Two score charts were constructed for easy application of the two models.
Any model based on a sample containing its own patients will tend to give too sharp predictions when applied to other patients (Steyerberg et al., 2001a). We corrected for overoptimism in the newly developed synthesis models by applying a shrinkage factor to the coefficients (Harrell et al., 1996).
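As a rough illustration of what applying a shrinkage factor means, the sketch below multiplies every regression coefficient by a single factor below 1, which pulls extreme predictions towards the average. The coefficient values and names are hypothetical, and the heuristic estimate of the factor, (model chi-square minus degrees of freedom) divided by model chi-square, is one commonly used option and an assumption here; the text does not state how the factor was obtained.

```python
def heuristic_shrinkage(model_chi2, degrees_of_freedom):
    """Heuristic uniform shrinkage factor: (chi2 - df) / chi2 (assumed method)."""
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return (model_chi2 - degrees_of_freedom) / model_chi2


def shrink_coefficients(coefficients, factor):
    """Multiply every coefficient by the same shrinkage factor."""
    return {name: beta * factor for name, beta in coefficients.items()}


# Hypothetical log fecundity ratios, for illustration only.
raw = {"duration_years": -0.20, "secondary_subfertility": 0.60}
factor = heuristic_shrinkage(model_chi2=100.0, degrees_of_freedom=5)
shrunk = shrink_coefficients(raw, factor)
```

Because a Cox-type model has no intercept, shrinking the coefficients is the whole adjustment in this sketch; a logistic model would also need its intercept re-estimated.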
Descriptive analyses
Differences in couple characteristics between the three samples were tested by Kruskal–Wallis test or chi-square test (Altman, 1997). The effects of the predictors were compared between the three samples and expressed as fecundity ratios, which are equivalent to the hazard ratios in survival analysis. Differences in fecundity ratios and in spontaneous pregnancy chances between the samples were tested using multivariable Cox analyses (Altman, 1997).
Performance measurement
How good are the probabilistic predictions of 1 year pregnancy prospects of the two synthesis models? To obtain an unbiased estimate of their performance they should be validated in samples which were not used for their construction. We therefore applied the jack-knife principle to estimate the performance of the three-sample synthesis model, as follows. We developed a two-sample model from two of the three samples and assessed its performance in the third truly independent sample. We repeated this procedure three times for each of the two-sample combinations. For the two-sample synthesis model with PCT, this procedure was not possible. Instead, we cross-validated the Eimers PCT model on the Snick sample and vice versa, and compared the performance of these models with the performance of the models without PCT.
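The jack-knife scheme described above amounts to enumerating every way of holding out one sample and developing the model on the other two. A minimal sketch follows; the function name is an assumption, and the model fitting and performance assessment themselves are deliberately left out.

```python
from itertools import combinations

SAMPLES = ("Snick", "Eimers", "Collins")


def leave_one_sample_out(samples):
    """List (development samples, validation sample) pairs for the jack-knife."""
    splits = []
    for development in combinations(samples, len(samples) - 1):
        validation = [s for s in samples if s not in development][0]
        splits.append((development, validation))
    return splits


splits = leave_one_sample_out(SAMPLES)
for development, validation in splits:
    print(f"develop on {' + '.join(development)}, validate on {validation}")
```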
Performance was measured by assessing the ability of the model to distinguish between women who became pregnant and those who did not (discrimination) and by assessing the agreement between the observed and the predicted probabilities of pregnancy (reliability). We applied three performance measures. The c-statistic or area under the receiver operating characteristic curve (AUC) was used for assessing discrimination (Harrell et al., 1996). The c-statistic is the probability that from a random pair of women, the woman who first becomes pregnant had a higher predicted probability of spontaneous pregnancy.
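The pairwise definition of the c-statistic can be made concrete with a small sketch for censored follow-up data. It counts only pairs in which the woman with the shorter observed time actually conceived, and scores ties in prediction as one half; this follows the usual concordance index and is an assumption about the details, which the text does not spell out. All data values are invented.

```python
def c_statistic(times, events, predictions):
    """Concordance between predicted probabilities and time to pregnancy.

    times: months of observation; events: 1 if pregnant, 0 if censored;
    predictions: predicted probability of spontaneous pregnancy.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when woman i conceived before woman j's observed time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if predictions[i] > predictions[j]:
                    concordant += 1.0
                elif predictions[i] == predictions[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


times = [2, 5, 12, 12]
events = [1, 1, 0, 0]
perfect = c_statistic(times, events, [0.6, 0.4, 0.3, 0.1])
poor = c_statistic(times, events, [0.1, 0.4, 0.3, 0.6])
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking.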
The reliability ratio (or calibration slope) assesses reliability (Steyerberg et al., 2001b). A ratio of 1 indicates a perfect calibration of the joint effect of the predictors included in the model. With a ratio smaller than 1, high probability predictions are too high and low probability predictions are too low, and for a ratio greater than 1 the bias is the other way round.
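To illustrate how a calibration slope can be estimated, the sketch below regresses a binary outcome on the model's linear predictor with a small Newton-Raphson logistic fit and returns the slope. This is a simplification and an assumption: it treats pregnancy within 1 year as a yes/no outcome and ignores censoring, whereas the models in this study are survival models. The data are invented.

```python
import math


def calibration_slope(linear_predictors, outcomes, iterations=25):
    """Fit logit(p) = a + b * lp by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for lp, y in zip(linear_predictors, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * lp)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * lp     # score for the slope
            h00 += w
            h01 += w * lp
            h11 += w * lp * lp
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system (information matrix times step = score).
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Invented example: 1 of 4 pregnancies at lp = -1 and 3 of 4 at lp = +1.
lps = [-1.0] * 4 + [1.0] * 4
ys = [1, 0, 0, 0, 1, 1, 1, 0]
intercept, slope = calibration_slope(lps, ys)
```

In this invented example the slope is slightly above 1, meaning the predictions implied by the linear predictor are a little too moderate.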
The third measure assesses overall reliability of the predictions. It measures the difference in overall predicted spontaneous pregnancy rate (SPR) between the tested model and a reference model. Ideally, there is no difference (0%). In our study, the reference is always the one-sample model on its own sample.
Calculations were performed with SPSS (SPSS Inc., Chicago, IL) and S-plus (MathSoft Inc., Seattle, WA) programs.
Results |
---|
The cumulative rate of spontaneous pregnancy leading to live birth within 1 year also differed significantly between the three samples (37%, 24% and 18%, respectively for Snick, Eimers and Collins, P<0.001). The referral status of the couple appeared to be an independent predictor for spontaneous pregnancy after adjusting for the other characteristics (P=0.001), and was therefore included in the synthesis models.
In the pooled data of the three-sample synthesis model, the age of the woman and the duration of subfertility had an adverse effect on fecundity (adjusted fecundity ratios/year=0.95, 95% CI 0.93–0.98 and 0.83, 95% CI 0.78–0.88, respectively). Secondary female subfertility (as compared to primary) increased the chance of spontaneous pregnancy (adjusted fecundity ratio=1.79, 95% CI 1.46–2.19). Sperm motility increased pregnancy chances by 8% for every 10% motility increase (adjusted fecundity ratio=1.08, 95% CI 1.04–1.13). The estimates for the two-sample Snick–Eimers model of the above-mentioned variables, which are used for the two-sample synthesis model with PCT, are comparable. According to the two-sample synthesis model with PCT, couples with a normal PCT had a two to three times higher chance of spontaneous pregnancy leading to live birth than couples with an abnormal PCT (adjusted fecundity ratio=2.6, 95% CI 2.0–3.4).
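Because fecundity ratios act multiplicatively, a ratio quoted per unit can be rescaled to other differences by raising it to a power. The short sketch below does this for the motility and duration estimates quoted above; it assumes the effects are log-linear over the range considered, and the function name is illustrative only.

```python
def scaled_fecundity_ratio(ratio_per_unit, unit, change):
    """Rescale a fecundity ratio quoted per `unit` to a difference of `change`."""
    return ratio_per_unit ** (change / unit)


# 1.08 per 10 percentage points of motility, applied to a 30-point difference.
motility_ratio = scaled_fecundity_ratio(1.08, unit=10, change=30)

# 0.83 per year of subfertility, applied to 2 additional years.
duration_ratio = scaled_fecundity_ratio(0.83, unit=1, change=2)

print(f"30-point motility difference: {motility_ratio:.2f}")
print(f"2 extra years of subfertility: {duration_ratio:.2f}")
```

Under that assumption a 30-point higher motility corresponds to a ratio of about 1.26, and two additional years of subfertility to about 0.69.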
It appeared that the modifications in the predictors of the one-sample models as required for pooling did not change the discriminative ability of the three original models very much: the c-statistics of the one-sample models were almost identical for the Eimers and Collins models (c=0.69 and 0.66, respectively) and slightly improved for the Snick model (c=0.64 instead of 0.62).
Table II summarizes the predictive performance of the various one- and two-sample models without PCT and of the final three-sample synthesis model. For the jack-knife evaluation of the synthesis model, one-sample models and all possible two-sample combinations were applied to the third, truly independent sample and compared to the performance of the one-sample model when applied to its own sample, which can be considered the reference (the highest possible performance to be expected). As expected, the performance of the one-sample models in the truly independent samples decreased considerably compared to the reference measures. However, when comparing the performance of the two-sample models in the independent sample to that of the one-sample models, the two reliability measures improved, while discrimination remained about the same. When referral status was included in the two-sample models, the difference in SPR improved further. The performance of the three-sample synthesis model was, as expected, better than that of the two-sample models, because the individual samples were used both to construct the prediction rule and to assess its performance. Both the discriminative ability and the reliability improved in the three-sample model and approached the reference measures.
Discussion |
---|
Before we were able to construct both synthesis models, we had to make the datasets compatible. First, we excluded the patients with an ovulation disorder, tubal pathology and azoospermia in the Snick and Collins samples, because such patients were not included in the Eimers sample. The prognostic variables selected in the original models slightly differed (Table I). Because the senior authors of the three originally published models took part in the present study, we could make use of the original data sets and made them compatible with one another by slight modifications. In this way the pooled data could be used now as one new data set.
The care setting differed considerably between the three samples, and referral status appeared to be an independent predictor: after correcting for all other variables, a couple referred by a family doctor to a gynaecologist in a secondary centre had a better chance of conceiving spontaneously than a couple referred by a gynaecologist to a tertiary centre. Apparently, apart from all known variables, there must be some concealed selection, which is reflected by the referral status. Presumably, some patients referred by the general practitioner to a gynaecologist of a secondary centre became pregnant (either spontaneously or after treatment) before they could be referred to a tertiary centre. Therefore, we included referral status in the newly developed synthesis models.
How can we decide whether our synthesis models perform better or worse than the original ones? Ideally, such validation should be performed prospectively in a different population. For reasons already mentioned, performing such a study at the present time in a large population with a sufficiently long follow-up is almost impossible. We tried several times, so far without success. An alternative way to assess how the synthesis models will perform in the future is to apply the jack-knife principle (see Table II). We reasoned that if the two-sample models performed systematically better than the one-sample models, it is reasonable to assume that the three-sample synthesis model would perform better still than the two-sample models. Indeed, there was a trend towards better performance when comparing the two-sample models with the one-sample models; the two reliability measures in particular improved. Moreover, adding the referral status further improved the predictive performance of the two-sample models, especially for the second reliability measure. Apart from the results of the jack-knife analysis, there is another argument in favour of the three-sample synthesis model. It is based on data derived from three different settings in two different parts of the world and collected under different circumstances. Therefore, it has a broader empirical base than the three original models.
We were not able to validate the synthesis model with PCT by the jack-knife method, because the result of the PCT was only available in two of the three databases. This is unfortunate, because the data of Table III clearly demonstrate that performance, especially the discriminative power, greatly improves when adding the PCT. However, the external validation of Eimers in Snick and vice versa is quite good, even when the PCT is added. Since, apart from the PCT, the same variables as in the three-sample synthesis model were used in the two-sample model, it is reasonable to assume that the same arguments used for the three-sample synthesis model also apply to the two-sample synthesis model. However, the argument of the wider empirical base applies to a lesser degree to the two-sample synthesis model, since it is based on two Dutch populations only.
How can the results of the predictions obtained help the clinician to counsel the individual couple? Most couples have tried for more than 1 year, often much longer, and demand immediate treatment. In their judgement, further waiting is senseless because they consider themselves infertile. Moreover, the psychological pressure caused by feelings of uncertainty and frustration increases their desire for immediate action. In addition, most couples overestimate the success of ART and grossly underestimate the related risks (Elster, 2000; Olivennes, 2000; Ericson and Kallen, 2001; Grobman et al., 2001; Schieve et al., 2002; Stromberg et al., 2002; Land and Evers, 2003; Moll et al., 2003). The estimated chance of spontaneous pregnancy leading to live birth can be a tool in advising the couple in the following manner (see Table IV). If the chances are low, e.g. below 20%, there is no point in further waiting, and advising the couple to undergo treatment quickly is realistic. In contrast, if the chances are favourable, e.g. above 40%, the couple should be strongly encouraged to wait for another year, because there is an approximately 50% chance of success. The couple should be advised that there is no ART with an equal chance of success without any risk. Further waiting is certainly worthwhile in such cases. In the middle group (above 20% and below 40%) predictions approximate the overall probability of 30–25%, and the advice given depends on the balance between the probability of success, the degree of frustration and the risks of ART. These examples demonstrate that the sharp predictions (the low and high ones) are clinically useful. Predictions in the middle group hardly provide additional information for the individual couple.
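The three counselling groups can be expressed as a small classification rule. The sketch below uses the 20% and 40% cut-offs given as examples in the text; how a prediction of exactly 20% or 40% is handled is an assumption (it is placed in the middle group), as are the function and label names.

```python
LOW_CUTOFF = 0.20         # example threshold from the text
FAVOURABLE_CUTOFF = 0.40  # example threshold from the text


def counselling_group(probability):
    """Classify a predicted 1-year chance of spontaneous pregnancy leading to live birth."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < LOW_CUTOFF:
        return "low"            # further waiting has little point
    if probability > FAVOURABLE_CUTOFF:
        return "favourable"     # encourage waiting for another year
    return "intermediate"       # advice depends on individual circumstances


groups = [counselling_group(p) for p in (0.15, 0.30, 0.45)]
```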
The data of Table IV show that the two-sample synthesis model (PCT included) performs better in this respect than the three-sample synthesis model. In the latter, sharp predictions are only possible in less than half of the couples, whereas in the former this proportion is almost 70%. The superiority of the favourable predictions is noteworthy: about one quarter of all couples could be advised to wait for another year because their chances of spontaneous pregnancy leading to live birth are almost 50%. When using the three-sample synthesis model this advice can only be given to 10% of the couples.
We conclude that both synthesis models perform better than the originally published ones and have a broader empirical basis. They can be used both by family doctors and by gynaecologists when considering whether to refer couples for (further) treatment. Although far from perfect, they contain the best prognostic information for predicting spontaneous pregnancy available so far.
APPENDIX |
---|
The predicted probability (P) of spontaneous pregnancy within 1 year after intake leading to live birth according to the three-sample synthesis model including the referral status of the couple is:
![]() |
AGE1 is the woman's age if the age is lower than or equal to 31 years, and 31 years if the age is >31 years; AGE2 is the difference (woman's age − 31 years) if the woman's age is >31 years, and zero otherwise; a tertiary couple is a couple referred by a gynaecologist.
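The two age terms defined above form a piecewise-linear coding with a knot at 31 years, which allows the model to have a different slope before and after that age. A minimal sketch of the coding (the function name is an assumption) is:

```python
def age_terms(age, knot=31.0):
    """Return (AGE1, AGE2) for a woman's age in years, with a knot at 31 years."""
    age1 = min(age, knot)        # age itself up to 31, then fixed at 31
    age2 = max(age - knot, 0.0)  # years beyond 31, zero otherwise
    return age1, age2


print(age_terms(28))
print(age_terms(35))
```

For any age, AGE1 + AGE2 equals the age itself, so no information is lost by the coding.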
The synthesis model with PCT is based on the Snick and Eimers samples. The formula of the two-sample synthesis model with PCT becomes:
![]() |
The result of the PCT in the initial cycle was coded as abnormal when no forward-moving sperm cells were found in the whole mucus sample.
Acknowledgements |
---|
References |
---|
Collins JA, Burrows EA and Willan AR (1995) The prognosis for live birth among untreated infertile couples. Fertil Steril 64, 22–28.
Comhaire FH (1987) Simple model and empirical method for the estimation of spontaneous pregnancies in couples consulting for infertility. Int J Androl 10, 671–680.
Eimers JM, te Velde ER, Gerritse R, Vogelzang ET, Looman CW and Habbema JD (1994) The prediction of the chance to conceive in subfertile couples. Fertil Steril 61, 44–52.
Elster N (2000) Less is more: the risks of multiple births. The Institute for Science, Law, and Technology Working Group on Reproductive Technology. Fertil Steril 74, 617–623.
Ericson A and Kallen B (2001) Congenital malformations in infants born after IVF: a population-based study. Hum Reprod 16, 504–509.
Grobman WA, Milad MP, Stout J and Klock SC (2001) Patient perceptions of multiple gestations: an assessment of knowledge and risk aversion. Am J Obstet Gynecol 185, 920–924.
Harrell FE Jr, Lee KL and Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15, 361–387.
Harrell FE Jr (2001) Regression modelling strategies: with applications to linear models, logistic regression, and survival analysis. Springer-Verlag, New York, NY.
Land JA and Evers JL (2003) Risks and complications in assisted reproduction techniques: Report of an ESHRE consensus meeting. Hum Reprod 18, 455–457.
Little RJA (1992) Regression with missing X's: a review. J Am Stat Assoc 80, 1198–1202.
Moll AC, Imhof SM, Cruysberg JRM, Schouten-van Meeteren AYN, Boers M and van Leeuwen FE (2003) Incidence of retinoblastoma in children born after in-vitro fertilisation. Lancet 361, 309–310.
Olivennes F (2000) Avoiding multiple pregnancies in ART. Double trouble: yes a twin pregnancy is an adverse outcome. Hum Reprod 15, 1663–1665.
Schieve LA, Meikle SF, Ferre C, Peterson HB, Jeng G and Wilcox LS (2002) Low and very low birth weight in infants conceived with use of assisted reproductive technology. N Eng J Med 346, 731–737.
Snick HK, Snick TS, Evers JL and Collins JA (1997) The spontaneous pregnancy prognosis in untreated subfertile couples: the Walcheren primary care study. Hum Reprod 12, 1582–1588.
Steyerberg EW, Eijkemans MJ, Harrell FE Jr and Habbema JD (2001a) Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med Decis Making 21, 45–56.
Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y and Habbema JD (2001b) Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 54, 774–781.
Stromberg B, Dahlquist G, Ericson A, Finnstrom O, Koster M and Stjernqvist K (2002) Neurological sequelae in children born after in-vitro fertilisation: a population-based study. Lancet 359, 461–465.
te Velde ER, Eijkemans R and Habbema HD (2000) Variation in couple fecundity and time to pregnancy, an essential concept in human reproduction. Lancet 355, 1928–1929.
Van Noord-Zaadstra BM, Looman CW, Alsbach H, Habbema JD, te Velde ER and Karbaat J (1991) Delaying childbearing: effect of age on fecundity and outcome of pregnancy. BMJ 302, 1361–1365.
Wichmann L, Isola J and Tuohimaa P (1994) Prognostic variables in predicting pregnancy. A prospective follow up study of 907 couples with an infertility problem. Hum Reprod 9, 1102–1108.
World Health Organization (1980) Laboratory Manual For The Examination of Human Semen and Semen-Cervical Mucus Interaction. Press Concern, Singapore.
World Health Organization (1987) Laboratory Manual For The Examination of Human Semen and Semen-Cervical Mucus Interaction. Cambridge University Press, Cambridge, UK.
Wouts MH, Duisterhout JS, Kuik DJ and Schoemaker J (1987) The chance of spontaneous conception for the infertile couple referred to an academic clinic for reproductive endocrinology and fertility in The Netherlands. Eur J Obstet Gynecol Reprod Biol 26, 243–250.
Submitted on June 19, 2002; accepted on May 20, 2004.