1 Epidemiology Branch, Division of Epidemiology, Statistics & Prevention Research, National Institute of Child Health & Human Development, The National Institutes of Health, Department of Health & Human Services, 6100 Executive Boulevard, Room 7B03, Rockville, MD 20852, 2 Department of Health Studies, Room W-260, University of Chicago, Chicago, IL and 3 Division of Reproductive Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
4 To whom correspondence should be addressed. Email: gb156i{at}nih.gov
Abstract |
---|
Key words: assisted reproductive technologies/child health/correlated outcomes/design/hierarchical models
Introduction |
---|
Epidemiologists and biostatisticians have an opportunity to lend expertise in designing study protocols responsive to the many questions about ART effects. Technically speaking, ART treatment includes procedures that involve the handling of human sperm and eggs outside the body for the express purpose of creating a pregnancy (i.e. transcervical embryo transfer, ICSI, gamete and zygote intrafallopian transfer, frozen embryo transfer, and donor embryo transfer). Timely answers are needed as the number of treated couples steadily increases along with the number of live births. For example, ART treatment procedures have increased approximately 20-fold in the United States since 1986, accompanied by a 100-fold increase in the number of ART-conceived infants (Medical Research International and The American Fertility Society Special Interest Group, 1988; CDC et al., 2003). These figures are likely to increase as treatment efficacy continues to improve and, in the United States, as health-care coverage becomes increasingly mandated (RESOLVE and The National Infertility Association, 2003).
We have identified key methodological nuances that challenge investigation of ART and child health, and we offer strategies for addressing them. This is done in a question-answer format, proceeding from study design through data analysis and interpretation. While many of the methodological issues are relevant for both aetiology and clinical efficacy, we focus our attention on assessing aetiology given that previous authors have discussed clinical efficacy and the interpretation of clinical trial data (Daya, 2003; Vail and Gardener, 2003). For the purposes of this paper and our conceptual model illustrated in Figure 1, we define fecundity as the biological capacity for reproduction and fertility as the ability to deliver (or father) a live born infant (Wood, 1994). Thus, infecundity refers to the inability to become pregnant (or what is clinically known as infertility).
Can we study the effect(s) of ART treatment on child health? |
---|
With respect to the first consideration, investigators will need to decide a priori whether the research question is aimed exclusively at female, male or couple-based infecundity and accompanying treatment modalities. This is an important decision that impacts study design and the interpretation of results. Investigators need to determine a priori whether the unit of analysis is a cycle, an individual (woman or man) or a couple, in a manner consistent with the study goals and research objectives. This can be a challenging decision, as seen with ICSI and infant outcomes. Although ICSI was originally developed to overcome fertilization failure in couples with male factor infertility, ICSI is increasingly used to treat couples irrespective of male factor. Among ART procedures reported to the United States ART registry that used freshly fertilized embryos from the patient's eggs, ICSI was used in 11% in 1995, increasing to 49% in 2001 (CDC et al., 1997). Thus, even for a very select ART procedure such as ICSI that originated as a treatment for male factor, there is now much more diversity with respect to fecundity among couples using ICSI. This example underscores the heterogeneity among infertile couples utilizing ART treatments in contemporary clinical practice. If there is any reason to suspect parental fecundity or treatment effects, a couple-based approach is needed. Given the dearth of empirical evidence on the determinants of male and female fecundity (Buck et al., 1997), every effort should be expended to first consider a couple-based cohort design to ensure effects are not inadvertently designed away.
With respect to the time consideration, investigators can define time (and hence time-varying covariates) as calendar time (e.g. age, duration of infertility) or in terms of treatment cycles. The time scale needs to be specified when designing a study, in accordance with the study's hypotheses. For example, timing of maternal and/or paternal exposures is essential for determining imprinting outcomes such as genetic and chromosomal constitution of the sperm sample used for ICSI. The need to identify conception as a part of ART treatment enables researchers to consider and measure day-specific exposures. Considerable care should be taken not to interchange the time scale or the unit of analysis (e.g. flip from calendar to cycle time or flip from couples to women) at the data analysis stage without an appropriate a priori analytic plan. For example, changing the time scale from cycle to calendar time may result in loss of power because some couples may have fewer cycles in a given time period than others. Changing from calendar to cycle time may result in the emergence of time- or age-related confounders that could bias the analysis if not properly accounted for.
Can we differentiate ART treatment effects from underlying fecundity impairments? |
---|
As shown in Figure 1, p1 denotes the proportion of study outcomes among infecund couples who achieved pregnancy with ART. Similarly, p2 refers to the proportion of study outcomes among infecund couples who achieved pregnancy without ART treatment, while p3 and p4 represent the proportions of study outcomes among fecund couples who achieve pregnancy with and without ART, respectively. We can conceptualize a framework for empirically estimating effects, illustrated as relative risks (RR), with respect to three questions: (i) the treatment effect on human development among infecund couples (RR1 = p1/p2); (ii) the infecundity effect on human development for couples undergoing treatment (RR2 = p1/p3); and (iii) the joint effect of infecundity and related treatment on human development (RR3 = p1/p4). For observational or non-randomized studies, we will need a relatively large cohort of couples trying to become pregnant to attempt answering these questions.
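The three relative risks defined above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: the function name `relative_risks` and the four proportions are hypothetical values chosen only to show the arithmetic.

```python
def relative_risks(p1, p2, p3, p4):
    """Return (RR1, RR2, RR3) following the definitions in the text.

    p1: outcome proportion, infecund couples, pregnancy with ART
    p2: outcome proportion, infecund couples, pregnancy without ART
    p3: outcome proportion, fecund couples, pregnancy with ART
    p4: outcome proportion, fecund couples, pregnancy without ART
    """
    for name, p in (("p2", p2), ("p3", p3), ("p4", p4)):
        if p <= 0:
            raise ValueError(f"{name} must be positive to form a ratio")
    rr1 = p1 / p2  # treatment effect among infecund couples
    rr2 = p1 / p3  # infecundity effect among treated couples
    rr3 = p1 / p4  # joint effect of infecundity and treatment
    return rr1, rr2, rr3


# Hypothetical proportions, for illustration only.
rr1, rr2, rr3 = relative_risks(p1=0.06, p2=0.05, p3=0.04, p4=0.03)
print(f"RR1={rr1:.2f}  RR2={rr2:.2f}  RR3={rr3:.2f}")
```

In practice, only the quantities entering RR3 are likely to be estimable in an observational setting, for the reasons given in the paragraphs that follow.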
Absence of population estimates for p2 makes it difficult to estimate RR1. The probability of becoming pregnant without medical intervention among infecund couples (those trying for >12 months without success) is virtually unknown as we are unaware of any population-based incidence figures on resolved infertility. Moreover, couples that do eventually conceive without assistance are probably different with respect to fecundity and exposure profiles from those who do not. However, clinical findings of couples that present to medical centres for diagnostic testing after failure to conceive suggest that approximately 14% of untreated infertile couples will eventually give birth (Collins et al., 1995).
Another approach with observational designs is to estimate the effect of infecundity alone among couples undergoing treatment (RR2). Although limited anecdotal evidence suggests that ART may be used by women or couples with no indication of infecundity (e.g. single women who elect to use ART after no or only a short time trying to conceive naturally or with insemination), there is currently no way to identify such cases in many registries, including the United States ART registry. While there may be very limited circumstances under which this research question might be addressed, it is virtually impossible to do so at the population level. Moreover, it is likely that such cases make a small contribution to the total population of couples currently receiving ART treatment. Thus, we are realistically left with estimating RR3 in an observational setting.
As stated before, RR3 is the compound effect of infecundity and treatment. From a public health perspective, this is the entity of utmost concern, because the compound effect has direct relevance for couples trying to weigh the total risks associated with conception after ART, whether these risks stem from the treatment or the underlying fecundity impairment. We do, however, recognize that the compound effect is limited in informing aetiology.
How can we evaluate the effects of ART treatment on child health? |
---|
Another limitation associated with the use of couples failing to become pregnant is the relatively large initial sample size required to enroll a sufficient number of ART births, especially when the study objective is to evaluate rare infant outcomes such as birth defects. An insufficient sample size with respect to statistical power may lead to an erroneous conclusion about the absence of an ART treatment effect (type II error).
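To give a sense of scale, the sketch below applies the standard normal-approximation formula for comparing two independent proportions with equal group sizes. It is an illustration under stated assumptions, not a recommended design calculation: the function name `n_per_group` and the outcome proportions (3% versus 6%) are hypothetical, and the formula assumes independent observations, which the later discussion of clustering shows is often violated in ART data.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_exposed, p_unexposed, alpha=0.05, power=0.80):
    """Approximate births needed per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for 1 - type II error
    p_bar = (p_exposed + p_unexposed) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_exposed * (1 - p_exposed)
                        + p_unexposed * (1 - p_unexposed))
    ) ** 2
    return ceil(numerator / (p_exposed - p_unexposed) ** 2)


# Hypothetical outcome proportions: 6% among ART births, 3% among comparison births.
n_large_effect = n_per_group(0.06, 0.03)
# A smaller hypothetical difference (4% versus 3%) requires far more births.
n_small_effect = n_per_group(0.04, 0.03)
print(n_large_effect, n_small_effect)
```

Because these counts refer to births, the number of couples that must be enrolled before conception would be larger still.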
Another prospective cohort approach, albeit less desirable, is to recruit couples upon first seeking services for perceived infecundity. Practical challenges include the receipt of a range of infertility services from various health providers. Some women may be placed on ovulation induction agents prior to a full fecundity evaluation. These issues underscore the potential for information or misclassification bias along with other study limitations such as marked heterogeneity with respect to treatment and the need for retrospective collection of covariate data up to the time medical care was sought. This is particularly troublesome if exposure patterns change either temporally (e.g. seasonal use of pesticides or changes in cigarette smoking when perceiving conception difficulties), or purposefully if couples alter behaviours after experiencing failure (e.g. taking dietary supplements or vitamins).
What are possible sampling strategies for evaluating the effects of ART treatment on child health? |
---|
Choice of sampling strategy is impacted by the observation that a large percentage of infecund couples do not seek medical care. The prevalence of medical care-seeking behaviour for primary infecundity ranges from 32% to 92%, and from 22% to 79% for secondary infertility (Schmidt et al., 1995). Approximately 44% of US women reporting fecundity impairments stated that they had not sought medical services (Stephen and Chandra, 1998). Care-seeking behaviour has been reported to be associated with the woman's age, education, income, duration of marriage and nulliparity (Rachootin and Olsen, 1981; Hirsh and Mosher, 1987), raising concern about potential selection bias. Even less is known about care-seeking behaviours of men, despite concerns about declining male fecundity or the so-called testicular dysgenesis hypothesis (Skakkebaek et al., 2001). Perhaps differences in care-seeking behaviour will be minimized as health insurance providers routinely include ART services.
Implicit in sample selection is the need for sufficient sample size to minimize erroneous interpretation of study results (type II error). The unit of analysis (i.e. treatment cycle versus woman versus couple) is critical not only for determining the sampling framework, but also for ensuring that the analysis is appropriately powered. Again, studies of less prevalent child health outcomes such as birth defects or cancer must be designed with a sufficient sample size.
Which health end points are appropriate for study? Can more than one end point be evaluated in a single study? |
---|
Relevant developmental end points include: plurality of birth, fetal growth and development, birth size, birth defects, developmental disabilities, and cancer. Other aspects of human development might include sexual maturation and puberty, necessitating follow-up of children through adolescence. A prospective design also enables investigators to capture incident cases or affected infants in lieu of prevalent cases that might be ascertained in a retrospective study design.
Added challenges in assessing developmental end points pertain to the issue of multiple births. In the United States, an estimated 37% of ART pregnancies that progress far enough to detect fetal hearts are classified as multiple gestations and 35% of live born deliveries are multiples. Considering total infants born from ART, more than half are twins or higher order, 18 times the proportion reported for the general United States population (Wright et al., 2003). Given the inherent perinatal sequelae associated with multiples (ESHRE Capri Workshop Group, 2000), the need to clearly identify multiple pregnancies and not just multiple births is evident. This can be a difficult task, as many early multiple gestations are reduced to a singleton clinical pregnancy (Salihu et al., 2003). Recent data from the United States registry of ART procedures indicate that approximately 9% of singleton live births conceived from ART were from pregnancies with two or more fetal hearts observed on early ultrasound (Schieve et al., 2004a). The fetal death of a co-twin is a known risk factor for neonatal complications and morbidity in the remaining twin (Skrablin et al., 1994), especially in the presence of a twin-to-twin transfusion (Saito et al., 1999).
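The relationship between the share of deliveries that are multiples (35%) and the share of infants who are twins or higher order (more than half) follows from simple arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes that every multiple delivery yields exactly two live born infants and every singleton delivery exactly one, which gives a lower bound because higher-order deliveries contribute more infants.

```python
def min_share_infants_from_multiples(share_multiple_deliveries):
    """Lower bound on the share of infants born as multiples.

    Assumes each multiple delivery contributes exactly two infants (twins)
    and each singleton delivery contributes one.
    """
    m = share_multiple_deliveries
    infants_from_multiples = 2 * m
    infants_from_singletons = 1 - m
    return infants_from_multiples / (infants_from_multiples + infants_from_singletons)


# 35% of live born ART deliveries are multiples (figure quoted in the text).
share = min_share_infants_from_multiples(0.35)
print(f"At least {share:.1%} of ART infants are twins or higher order")
```

Under these assumptions the bound is 0.70 / 1.35, or roughly 52%, which is consistent with the statement that more than half of ART infants are twins or higher order.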
How and when should ART treatment data be captured for analysis? |
---|
How should data be analysed when evaluating ART treatment and child health? |
---|
ART observations will generally not be independent, given the need to consider multiple oocytes, implantations or embryos per treatment cycle, as well as multiple treatment cycles per woman or couple. Failure to account for such dependency may lead to unpredictable bias (Dukic and Hogan, 2002; Daya, 2003). Investigators are left with either restricting treatment cycles to one per woman or using analytical models capable of addressing the dependency [Bayesian methods, mixed models or generalized estimating equations (GEE)].
Without consideration of the issues of dependency and correlation, study findings may be inefficient and biased, ultimately yielding imprecise or even incorrect conclusions. It is known that clustering (a form of dependency) commonly occurs in ART data, where outcomes are clustered on several levels: within menstrual or treatment cycles, within each woman and within each couple. The complex correlation structure generally observed in ART outcomes can be due to unmeasured (or omitted) level-specific covariates, such as unmeasured endometrial or embryo characteristics or omitted time-varying exposures.
Unfortunately, many investigators still report and rely on simple statistics such as t-tests or Pearson's χ² statistic, in spite of multiple confounders and the lack of independence of pregnancy outcomes. Such published work may mislead clinicians and, ultimately, couples undergoing care. A variety of statistical techniques exist today for analysing data in situations where clustering occurs and induces dependency among the outcomes of interest. Tools for analysing dependent outcomes are well developed and implemented in a variety of standard statistical packages. For example, longitudinal data analysis techniques, GEE, (empirical) Bayesian hierarchical models and mixed models are readily available in widely used statistical software packages such as Stata or SAS. Other appropriate specialized statistical packages include MIXOR, EGRET, HLM, MLwiN, or WinBUGS and BUGS (Hogan and Blazar, 2000).
Hierarchical models (also known as multi-level models, random-effects or mixed models) can account for this within-cluster correlation by assuming that each cluster has its own (unobservable) probability of a positive outcome, which is different from all other clusters, and that all cluster-specific probabilities follow some distribution. The variance of this distribution is what is known as the heterogeneity parameter, and its magnitude reflects the amount of heterogeneity among women or couples (Hogan and Blazar, 2000; Dukic and Hogan, 2002). Hierarchical models can also provide cluster-specific estimates of treatment and other relevant covariate effects. This feature is desirable in many clinical settings when one is interested in prediction of outcomes for a given woman or a given couple.
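The consequence of cluster-specific probabilities can be seen in a small simulation. The sketch below does not fit a hierarchical model; it only simulates data from one, under assumptions that are entirely hypothetical: woman-specific probabilities drawn from a Beta(2, 6) distribution, 100 women and four cycles per woman. It compares the variance of the overall outcome proportion across simulated studies with the variance a naive analysis would assume under independence.

```python
import random
from statistics import pvariance


def simulate_overall_proportion(n_women, cycles, a, b, rng):
    """Simulate one study and return the overall proportion of positive outcomes."""
    successes = 0
    for _ in range(n_women):
        p_i = rng.betavariate(a, b)  # woman-specific (unobservable) probability
        successes += sum(rng.random() < p_i for _ in range(cycles))
    return successes / (n_women * cycles)


rng = random.Random(2004)       # fixed seed so the sketch is reproducible
a, b = 2.0, 6.0                 # hypothetical Beta parameters (mean 0.25)
n_women, cycles = 100, 4        # hypothetical study size

props = [simulate_overall_proportion(n_women, cycles, a, b, rng)
         for _ in range(2000)]

p = a / (a + b)
naive_var = p * (1 - p) / (n_women * cycles)   # variance assuming independence
empirical_var = pvariance(props)               # variance actually observed
icc = 1 / (a + b + 1)                          # intra-woman correlation, beta-binomial
design_effect = 1 + (cycles - 1) * icc         # expected variance inflation

print(f"naive={naive_var:.6f}  empirical={empirical_var:.6f}  "
      f"expected inflation={design_effect:.2f}")
```

The spread of the Beta distribution plays the role of the heterogeneity parameter here: the more the woman-specific probabilities differ, the larger the intra-woman correlation and the more a naive analysis understates uncertainty.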
All Bayesian models are in fact hierarchical models, where the top hierarchical level is reserved for so-called priors, or summaries of knowledge elicited from previous studies or from expert opinion. This ability to formally account for prior knowledge can be extremely useful, particularly in ART settings. Considerable information on prior history is likely to exist for many couples involved in ART studies, and investigators should be able to formally include that information in the analysis.
Methods like GEE can also account for within-cluster correlation. However, they cannot produce subject-specific estimates, but instead yield population-level inference. This can be well suited for policy making, but less useful for individual clinical prediction. Other methods for handling dependent outcomes are available, such as variance-adjustment techniques (sandwich estimators) or bootstrap.
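As one illustration of the bootstrap approach mentioned above, the sketch below resamples whole women (clusters) rather than individual cycles, so that the within-woman dependency is preserved in each resample. The function name `cluster_bootstrap_se` and the toy data are hypothetical; the data are deliberately extreme (each woman's cycles all share the same outcome) to make the contrast with the naive standard error visible.

```python
import random
from math import sqrt
from statistics import stdev


def cluster_bootstrap_se(clusters, n_boot, rng):
    """Bootstrap standard error of the overall proportion, resampling clusters.

    clusters: list of lists of 0/1 outcomes, one inner list per woman.
    """
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(clusters, k=len(clusters))  # resample women with replacement
        total = sum(len(c) for c in sample)
        estimates.append(sum(sum(c) for c in sample) / total)
    return stdev(estimates)


# Hypothetical toy data: 40 women, 4 cycles each, outcomes identical within a woman.
clusters = [[1, 1, 1, 1] for _ in range(10)] + [[0, 0, 0, 0] for _ in range(30)]

n_obs = sum(len(c) for c in clusters)
p_hat = sum(sum(c) for c in clusters) / n_obs
naive_se = sqrt(p_hat * (1 - p_hat) / n_obs)  # treats all 160 cycles as independent

boot_se = cluster_bootstrap_se(clusters, n_boot=2000, rng=random.Random(1))
print(f"naive SE={naive_se:.4f}  cluster bootstrap SE={boot_se:.4f}")
```

With outcomes this strongly clustered, the effective sample size is close to the number of women rather than the number of cycles, and the cluster bootstrap standard error is correspondingly larger than the naive one.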
Another very important analytical issue is missing data and informative drop-out in a longitudinal study setting. Missing data mechanisms and reasons for dropout can in general be extremely complicated in an ART setting, resulting in severe selection bias for ART research. To address non-negligible dropout issues, explicit models for the drop-out mechanism or joint models for the outcome of interest and time to drop-out are needed (Hogan and Laird, 1997). Appropriate statistical input will generally be required for design and implementation of such research.
What conclusions can be reached from available and future data? |
---|
As researchers, we are left answering the question about RR3 as described above: what is the compound effect of infertility and ART treatment on child health? This challenge requires the use of a standardized study protocol involving prospective data collection with various laboratory, clinical and behavioural factors, and measures of fecundity impairment. With accurate information on type and diagnostic subtype of infecundity, it will be possible to adjust statistical analyses to further assess effects.
Summary |
---|
Acknowledgements |
---|
References |
---|
Australian In vitro Fertilization Collaborative Group (1985) High incidence of preterm birth and early losses in pregnancy after in vitro fertilization. Br Med J 291, 1160–1163.
Baird DD, Weinberg CR and Rowland AS (1991) Reporting errors in time-to-pregnancy data collected with a short questionnaire. Am J Epidemiol 133, 1282–1290.
Bergh T, Ericson A, Hillensjo T, Nygren KG and Wennerholm UB (1999) Deliveries and children born after in-vitro fertilization in Sweden 1982–95: a retrospective cohort study. Lancet 354, 1579–1585.
Bonde JP, Hjollund NH, Jensen TK, Ernst E, Kolstad H, Henriksen TB, Giwercman A, Skakkebaek NE, Andersson A-M and Olsen J (1998) A follow-up study of environmental and biologic determinants of fertility among 430 Danish first-pregnancy planners: design and methods. Reprod Toxicol 12, 19–27.
Buck GM, Sever LE, Batt RE and Mendola P (1997) Review of lifestyle factors and female infertility. Epidemiology 8, 435–441.
Buitendijk SE (1999) Children after in vitro fertilization: an overview of the literature. Int J Technol Assess Health Care 15, 52–65.
CDC, American Society for Reproductive Medicine, Society for Assisted Reproductive Technology and RESOLVE (1997) 1995 Assisted Reproductive Technology Success Rates: National Summary and Fertility Clinic Reports. CDC, Atlanta, GA, USA.
CDC, American Society for Reproductive Medicine and Society for Assisted Reproductive Technology (2003) Assisted Reproductive Technology Success Rates: National Summary and Fertility Clinic Reports. CDC, Atlanta, GA, USA.
Collins JA, Burrows EA and Willan AR (1995) The prognosis for live birth among untreated infertile couples. Fertil Steril 64, 22–28.
Daya S (2003) Pitfalls in the design and analysis of efficacy trials in subfertility. Hum Reprod 18, 1005–1009.
Dhont M, DeNeubourg F, Van Der Elst J and De Sutter P (1997) Perinatal outcome of pregnancies after assisted reproduction: A case control study. J Assist Reprod Genet 14, 575–580.
Dukic V and Hogan JW (2002) A hierarchical Bayesian approach to modeling embryo implantation following in vitro fertilization. Biostatistics 3, 361–377.
Ericson A and Kallen B (2001) Congenital malformations in infants born after IVF: A population-based study. Hum Reprod 16, 504–509.
ESHRE Capri Workshop Group (2000) Multiple gestation pregnancy. Hum Reprod 15, 1856–1864.
Ferreira-Poblete A (1997) The probability of conception on different days of the cycle with respect to ovulation: an overview. Adv Contracept 13, 83–95.
Hansen M, Kurinczuk JJ, Bower C and Webb S (2002) The risk of major birth defects after intracytoplasmic sperm injection and in vitro fertilization. N Engl J Med 346, 725–730.
Helmerhorst FM, Perquin DAM, Donker D and Keirse MJNC (2004) Perinatal outcome of singletons and twins after assisted conception: a systematic review of controlled studies. BMJ 328, 261.
Hirsh MB and Mosher WD (1987) Characteristics of infertile women in the United States and their use of infertility services. Fertil Steril 47, 618–625.
Hogan J and Laird N (1997) Mixture models for the joint distribution of repeated measures and event times. Stat Med 16, 239–258.
Hogan J and Blazar A (2000) Hierarchical logistic regression models for clustered binary outcomes in studies of IVF-ET. Fertil Steril 73, 575–581.
Jackson RA, Gibson KA, Wu YW and Croughan MS (2004) Perinatal outcomes in singletons following in vitro fertilization: a meta analysis. Obstet Gynecol 103, 551–563.
Joffe M, Villard L, Li Z, Plowman R and Vessey M (1993) Long term recall of time-to-pregnancy. Fertil Steril 60, 99–104.
Joffe M, Villard L, Li Z, Plowman R and Vessey M (1995) A time to pregnancy questionnaire designed for long term recall: validity in Oxford England. J Epidemiol Community Health 49, 314–319.
Koudstaal J, Braat DD, Bruinse HW, Naaktgeboren N, Vermeiden JP and Visser GH (2000) Obstetric outcome of singleton pregnancies after IVF: a matched control study in four Dutch university hospitals. Hum Reprod 15, 1819–1825.
Medical Research International and The American Fertility Society Special Interest Group (1988) In vitro fertilization/embryo transfer in the United States: 1985 and 1986 results from the National IVF-ET Registry. Fertil Steril 49, 212–215.
Morford LL, Henck JW, Breslin WJ and DeSesso JM (2004) Hazard identification and predictability of children's health risk from animal data. Environ Health Perspect 112, 266–271.
Olsen J, Juul S and Basso O (1998) Measuring time to pregnancy-methodological issues to consider. Hum Reprod 13, 1751–1753.
Rachootin P and Olsen J (1981) Social selection in seeking medical care for reduced fecundity among women in Denmark. J Epidemiol Community Health 35, 262–264.
RESOLVE and The National Infertility Association (2003) Online. http://www.resolve.org (date accessed: October 2004).
Royston P and Ferreira A (1999) A new approach to modeling daily probabilities of conception. Biometrics 55, 1005–1013.
Rufat P, Olivennes F, DeMouzon F, Dehan M and Frydman R (1994) Task force report on the outcome of pregnancies and children conceived by in-vitro fertilization (France: 1987–1989). Fertil Steril 61, 324–330.
Saito K, Ohtsu Y, Amano K and Nishijima M (1999) Perinatal outcome and management of single fetal death in twin pregnancy: a case series and review. J Perinat Med 27, 473–477.
Salihu HM, Aliyu MH, Rouse DJ and Kirby RS (2003) Potentially preventable excess mortality among higher-order multiples. Obstet Gynecol 102, 679–684.
Schieve LA, Meikle SF, Ferre C, Peterson HB, Jeng G and Wilcox LS (2002) Low and very low birth weight in infants conceived with use of assisted reproductive technology. N Engl J Med 346, 731–737.
Schieve LA, Rasmussen SA, Buck GM, Schendel DE, Reynolds M and Wright V (2004b) Are children conceived with assisted reproductive technology at increased risk for adverse health outcomes? Obstet Gynecol 103, 1154–1163.
Schieve LA, Ferre C, Peterson HB, Macaluso M, Reynolds MA and Wright VC (2004a) Perinatal outcome among singleton infants conceived through assisted reproductive technology in the United States. Obstet Gynecol 103, 1144–1153.
Schmidt L, Munster K and Helm P (1995) Infertility and the seeking of infertility treatment in a representative population. BMJ 102, 978–984.
Selevan SG, Kimmel CA and Mendola P (2000) Identifying critical windows of exposure for children's health. Environ Health Perspect 108 (Suppl 3), 451–455.
Skakkebaek NE, Rajpert-De Meyts E and Main KM (2001) Testicular dysgenesis syndrome: an increasingly common developmental disorder with environmental aspects. Hum Reprod 15, 972–978.
Skrablin S, Kuvacic I, Fuduric I and Hodzic D (1994) Antenatal fetal demise in multiple gestation-the outcome of surviving fetus at one to 4 years of age. Eur J Obstet Gynecol Reprod Biol 56, 15–19.
Stephen EH and Chandra A (1998) Updated projections of infertility in the United States: 1995–2025. Fertil Steril 70, 30–34.
Stromberg B, Dahlquist G, Ericson A, Finnstrom O, Koster M and Stjernqvist K (2002) Neurological sequelae in children born after in-vitro fertilisation: a population-based study. Lancet 359, 461–465.
Sutcliffe AG, Souza SW, Cadman J, Richards B, McKinlay IA and Lieberman B (1995) Minor congenital anomalies, major congenital malformations and development in children conceived from cryopreserved embryos. Hum Reprod 10, 3332–3337.
Vail A and Gardener E (2003) Common statistical errors in the design and analysis of subfertility trials. Hum Reprod 18, 1000–1004.
Van Steirteghem A, Bonduelle M, Devroey P and Liebaers I (2002) Follow-up of children born after ICSI. Hum Reprod 8, 111–115.
Wang X, Chen C, Wang L, Chen D, Guang W and French J (2003) Conception, early pregnancy loss, and time to clinical pregnancy: a population-based prospective study. Fertil Steril 79, 577–584.
Weed DL (1995) Causal and preventive inference. In Greenwald P, Kramer BS and Weed DL (eds) Cancer Prevention and Control. Marcel Dekker, New York, NY, USA, pp 285–302.
Weinberg C, Baird D and Wilcox A (1994) Sources of bias in studies of time to pregnancy. Stat Med 13, 671–681.
Wennerholm UB, Bergh C, Hamberger L, Lundin K, Nilsson L, Wikland M and Kallen B (2000) Incidence of congenital malformations in children born after ICSI. Hum Reprod 15, 944–948.
Wilcox AJ, Weinberg CR, O'Connor JF, Baird DD, Schlatterer JP, Canfield RE, Armstrong EG and Nisula BC (1988) Incidence of early pregnancy loss. N Engl J Med 319, 189–194.
Wilson JG (1965) Embryological considerations in teratology. In Wilson JG and Warkany J (eds) Teratology: Principles and Techniques. The University of Chicago Press, Chicago, IL, USA, pp 256.
Wood JW (1994) Dynamics of Human Reproduction. Aldine De Gruyter, New York, NY, USA, pp 3.
Wright VC, Schieve LA, Reynolds MA and Jeng G (2003) Assisted reproductive technology surveillance - United States, 2000 (published erratum appears in MMWR 52, 942). MMWR Surveill Summ 52, 1–16.
Zielhuis GA, Hulscher ME and Florack EI (1992) Validity and reliability of a questionnaire on fecundability. Int J Epidemiol 21, 1151–1156.
Submitted on May 12, 2004; accepted on September 10, 2004.