Research hurdles complicating the analysis of infertility treatment and child health

G.M. Buck Louis1,4, E.F. Schisterman1, V.M. Dukic2 and L.A. Schieve3

1 Epidemiology Branch, Division of Epidemiology, Statistics & Prevention Research, National Institute of Child Health & Human Development, The National Institutes of Health, Department of Health & Human Services, 6100 Executive Boulevard, Room 7B03, Rockville, MD 20852, 2 Department of Health Studies, Room W-260, University of Chicago, Chicago, IL and 3 Division of Reproductive Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA

4 To whom correspondence should be addressed. Email: gb156i@nih.gov


    Abstract
 
Research aimed at the empirical evaluation of infertility treatment including assisted reproductive technologies (ART) on child health and development is hampered by investigators' inability to methodologically separate possible treatment effects from underlying fecundity impairments. While the literature continues to identify ART as a risk factor for many child health outcomes, less attention has been paid to the methodologic rigor needed to answer this question. We identify aspects of fecundity and the nuances of medical practice that need to be considered and captured when designing epidemiologic investigations aimed at assessing ART and child health. These include: (i) the use of prospective study designs in which the unit of analysis (cycle versus individual versus couple) is defined; (ii) data collection on relevant time-varying covariates at, before and during treatment; and (iii) the use of statistical techniques appropriate for hierarchical data and correlated exposures. While none of these issues is in and of itself unique to ART research, attention to them has been lacking in much of the published research, limiting our ability to evaluate health consequences for children. Longitudinal studies of children conceived with ART will benefit from attention to these issues and, hopefully, produce answers to lingering questions about safety.

Key words: assisted reproductive technologies/child health/correlated outcomes/design/hierarchical models


    Introduction
 
There is an accumulating body of research suggesting that infertility treatment may adversely affect the health and development of children so conceived. Singleton infants born to couples undergoing assisted reproductive technologies (ART) have been reported to be at increased risk for decrements in gestation and birth size (Australian In vitro Fertilization Collaborative Group, 1985; Rufat et al., 1994; Koudstaal et al., 2000; Schieve et al., 2002; Schieve et al., 2004a), select birth defects (Bergh et al., 1999; Wennerholm et al., 2000; Ericson and Kallen, 2001; Hansen et al., 2002) and developmental disabilities (Stromberg et al., 2002), although equivocal findings exist (Anthony et al., 2002; Van Steirteghem et al., 2002). Reviews have summarized the health outcomes of concern for children (Helmerhorst et al., 2004; Schieve et al., 2004b). A recent random effects meta-analysis comparing 12 283 singleton infants conceived with IVF with 1.9 million spontaneously conceived infants reported that, after controlling for maternal age and parity, treatment conferred a significant, approximately two-fold increased risk of perinatal mortality, preterm delivery, low birth weight and small-for-gestational age (Jackson et al., 2004).

Epidemiologists and biostatisticians have an opportunity to lend expertise in designing study protocols responsive to the many questions about ART effects. Technically speaking, ART treatment includes procedures that involve the handling of human sperm and eggs outside the body for the express purpose of creating a pregnancy (i.e. transcervical embryo transfer, ICSI, gamete and zygote intrafallopian transfer, frozen embryo transfer, and donor embryo transfer). Timely answers are needed as the number of treated couples steadily increases along with the number of live births. For example, ART treatment procedures have increased approximately 20-fold in the United States since 1986, accompanied by a 100-fold increase in the number of ART conceived infants (Medical Research International and The American Fertility Society Special Interest Group, 1988; CDC et al., 2003). These figures are likely to rise further as treatment efficacy continues to improve and, in the United States, as health-care coverage becomes increasingly mandated (RESOLVE and The National Infertility Association, 2003).

We have identified key methodological nuances that challenge investigation regarding ART and child health, and we offer strategies for addressing them. This is done in a question–answer format beginning with study design through data analysis and interpretation. While many of the methodological issues are relevant for both aetiology and clinical efficacy, we focus our attention on assessing aetiology given that previous authors have discussed clinical efficacy and the interpretation of clinical trial data (Daya, 2003; Vail and Gardener, 2003). For the purposes of this paper and our conceptual model illustrated in Figure 1, we define fecundity as the biological capacity for reproduction and fertility as the ability to deliver (or father) a live born infant (Wood, 1994). Thus, infecundity refers to the inability to become pregnant (or what is clinically known as infertility).



Figure 1. Conceptual framework for designing studies focusing on the effects of ART treatment on child health and development.

 

    Can we study the effect(s) of ART treatment on child health?
 
Yes. The relatively short interval between initiation of ART treatment and time to event (i.e. implantation, pregnancy, live birth) underscores the utility of prospective design while ensuring a temporal relation between exposure and outcome. Some retrospective observational designs might be more efficient for rare outcomes such as Beckwith–Wiedemann syndrome or in offering an interim assessment of long-term outcomes such as the future fertility of children conceived with ART treatment. These designs are less desirable, overall, because of the potential for bias, imprecision with regard to temporal ordering and correlated exposures. Both descriptive (e.g. case series and cross-sectional) and retrospective (i.e. case control) study designs have been successfully used in previous research initiatives aimed at assessing the impact of ART on infant health. The complexity of this question and underlying methodological nuances underscore the role for prospective study as outlined in this paper. In particular, two key issues are: (i) deciding on the right unit of analysis; and (ii) defining what time scale should be used.

With respect to the first consideration, investigators will need to decide a priori whether the research question is aimed exclusively at female, male or couple-based infecundity and accompanying treatment modalities. This is an important decision that impacts study design and the interpretation of results. Investigators also need to determine a priori whether the unit of analysis is a cycle, an individual (woman or man) or a couple, in a manner consistent with the study goals and research objectives. This can be a challenging decision, as seen with ICSI and infant outcomes. Although ICSI was originally developed to overcome fertilization failure in couples with male factor infertility, ICSI is increasingly used to treat couples irrespective of male factor. In 1995, ICSI was used in 11% of the ART procedures reported to the United States ART registry that used freshly fertilized embryos from the patient's eggs, increasing to 49% in 2001 (CDC et al., 1997). Thus, even for a very select ART procedure such as ICSI that originated as a treatment for male factor, there is now much more diversity with respect to fecundity among couples using ICSI. This example underscores the heterogeneity among infertile couples utilizing ART treatments in contemporary clinical practice. If there is any reason to suspect parental fecundity or treatment effects, a couple-based approach is needed. Given the dearth of empirical evidence on the determinants of male and female fecundity (Buck et al., 1997), every effort should be expended to first consider a couple-based cohort design to ensure effects are not inadvertently designed away.

With respect to the time consideration, investigators can define time (and hence time-varying covariates) as calendar time (e.g. age, duration of infertility) or in terms of treatment cycles. Time needs to be specified upon designing a study according to the study's hypotheses. For example, timing of maternal and/or paternal exposures is essential for determining imprinting outcomes such as genetic and chromosomal constitution of the sperm sample used for ICSI. The need to identify conception as a part of ART treatment enables researchers to consider and measure day-specific exposures. Considerable care should be taken not to interchange time or the unit of analysis (e.g. flip from calendar to cycle time or flip from couples to women) in the data analysis stage without an appropriate a priori analytic plan. For example, changing the time scale from cycle to calendar time may result in loss of power because some couples may have fewer cycles in a given time period than others. Changing from calendar to cycle time may result in emergence of time/age-related confounders that could bias the analysis, if not properly accounted for.
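The distinction between cycle time and calendar time, and between cycle-level and couple-level units, can be made concrete with a small data sketch. The record layout, field names and dates below are hypothetical and serve only to illustrate the point made above: couples observed over the same calendar window may contribute different numbers of cycles.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CycleRecord:
    couple_id: str     # hypothetical identifier for the couple
    cycle_number: int  # cycle time: 1 = first treatment cycle
    start_date: date   # calendar time: date the cycle started
    live_birth: bool   # outcome recorded for this cycle


# Hypothetical cycle-level data for two couples.
records = [
    CycleRecord("A", 1, date(2003, 1, 10), False),
    CycleRecord("A", 2, date(2003, 3, 14), False),
    CycleRecord("A", 3, date(2003, 5, 20), True),
    CycleRecord("B", 1, date(2003, 2, 1), False),
    CycleRecord("B", 2, date(2003, 9, 15), True),
]


def cycles_within(records, days):
    """Count cycles each couple started within `days` of their first cycle."""
    first = {}
    for r in records:
        first[r.couple_id] = min(first.get(r.couple_id, r.start_date), r.start_date)
    counts = defaultdict(int)
    for r in records:
        if (r.start_date - first[r.couple_id]).days <= days:
            counts[r.couple_id] += 1
    return dict(counts)


def couple_level(records):
    """Collapse cycle-level rows to one row per couple (any live birth)."""
    out = defaultdict(bool)
    for r in records:
        out[r.couple_id] = out[r.couple_id] or r.live_birth
    return dict(out)


print(cycles_within(records, 180))  # cycles per couple in a 180-day window
print(couple_level(records))        # one outcome per couple
```

In this toy example the two couples contribute unequal numbers of cycles to the same 180-day window, which is why switching time scales or units after the fact changes the amount of information available for analysis.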


    Can we differentiate ART treatment effects from underlying fecundity impairments?
 
No. The ideal study design for answering this question is a randomized clinical trial, which is not possible given that it would require administering ART to fecund couples. However, it is possible to evaluate ART effects among infecund couples receiving treatment. To better understand our inability to separate infecundity from an ART treatment effect, we offer the framework of a randomized trial as illustrated in Figure 1.

As shown in Figure 1, p1 denotes the proportion of study outcomes among infecund couples who achieved pregnancy with ART. Similarly, p2 refers to the proportion of study outcomes among infecund couples who achieved pregnancy without ART treatment, while p3 and p4 represent the proportions of study outcomes among fecund couples who achieve pregnancy with and without ART, respectively. We can conceptualize a framework for empirically estimating effects, illustrated as relative risks (RR), with respect to three questions: (i) the treatment effect on human development among infecund couples (RR1 = p1/p2); (ii) the infecundity effect on human development for couples undergoing treatment (RR2 = p1/p3); and (iii) the joint effect of infecundity and related treatment on human development (RR3 = p1/p4). For observational or non-randomized studies, we will need a relatively large cohort of couples trying to become pregnant to attempt answering these questions.
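The three contrasts defined above can be written out as a short calculation. The proportions used here are invented purely for illustration and are not estimates from any study; the function name is likewise an assumption.

```python
def relative_risks(p1, p2, p3, p4):
    """Return RR1, RR2 and RR3 as defined in the text.

    p1: infecund couples, pregnancy with ART
    p2: infecund couples, pregnancy without ART
    p3: fecund couples, pregnancy with ART
    p4: fecund couples, pregnancy without ART
    """
    for name, p in (("p1", p1), ("p2", p2), ("p3", p3), ("p4", p4)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must be a proportion in (0, 1]")
    return {
        "RR1": p1 / p2,  # treatment effect among infecund couples
        "RR2": p1 / p3,  # infecundity effect among treated couples
        "RR3": p1 / p4,  # joint effect of infecundity and treatment
    }


# Hypothetical proportions, chosen only to show the arithmetic.
rr = relative_risks(p1=0.08, p2=0.05, p3=0.06, p4=0.04)
for label, value in rr.items():
    print(f"{label} = {value:.2f}")
```

Note that, algebraically, RR3 = RR1 × (p2/p4), so the joint effect factors into the treatment effect among infecund couples and the effect of infecundity among untreated couples. Neither factor is identifiable without an estimate of p2.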

Absence of population estimates for p2 makes it difficult to estimate RR1. The probability of becoming pregnant without medical intervention among infecund couples (those trying for >12 months without success) is virtually unknown as we are unaware of any population-based incidence figures on resolved infertility. Moreover, couples who do eventually conceive without assistance are probably different with respect to fecundity and exposure profiles from those who do not. However, clinical findings of couples who present to medical centres for diagnostic testing after failure to conceive suggest that approximately 14% of untreated infertile couples will eventually give birth (Collins et al., 1995).

Another approach with observational designs is to estimate the effect of infecundity alone among couples undergoing treatment (RR2). Although limited anecdotal evidence suggests that ART may be used by women or couples with no indication of infecundity (e.g. single women who elect to use ART after no or only a short time trying to conceive naturally or with insemination), there is currently no way to identify such cases in many registries, including the United States ART registry. While there may be very limited circumstances under which this research question might be addressed, it is virtually impossible to do so at the population level. Moreover, it is likely that such cases make a small contribution to the total population of couples currently receiving ART treatment. Thus, we are realistically left with estimating RR3 in an observational setting.

As stated before, RR3 is the compound effect of infecundity and treatment. From a public health perspective, this is the entity of utmost concern, because the compound effect has direct relevance for couples trying to weigh the total risks associated with conception after ART, whether these risks stem from the treatment or the underlying fecundity impairment. We do, however, recognize that the compound effect is limited in informing aetiology.


    How can we evaluate the effects of ART treatment on child health?
 
Two strategies for prospective recruitment of couples that will eventually undergo ART can be used to answer this question. The first is the recruitment of couples from the population or a target subgroup prior to attempting pregnancy or after failing to become pregnant during a specified period of time, such as 6 months. This approach is supported by empirical evidence that suggests approximately 65–85% of couples attempting pregnancy will conceive within 6 months (Bonde et al., 1998; Wang et al., 2003). Moreover, investigators can collect baseline information and biospecimens prior to entering treatment along with longitudinal information on treatment and relevant covariates once treatment is initiated. This approach would allow investigators to capture information on the full range of treatments received prior to ART, albeit with a number of important methodologic challenges and limitations. While past authors have demonstrated the validity and reliability of time-to-pregnancy (TTP) information recalled by men and women for 20+ years (Baird et al., 1991; Zielhuis et al., 1992; Joffe et al., 1993, 1995; Olsen et al., 1998), other authors have reported biases associated with retrospectively collected TTP such as a woman's behaviours, intentions about becoming pregnant or ability to recognize pregnancy (Weinberg et al., 1994).

Another limitation associated with recruiting couples who have failed to become pregnant is the relatively large initial sample size required to yield a sufficient number of ART births, especially when the study objective is to evaluate rare infant outcomes such as birth defects. An insufficient sample size with respect to statistical power may lead to an erroneous conclusion about the absence of an ART treatment effect (type II error).
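To give a sense of scale for rare outcomes, the following sketch applies the standard normal-approximation formula for comparing two independent proportions. The baseline risk, relative risk, significance level and power are assumptions chosen for illustration, and the formula presumes equal group sizes and independent observations, which clustered ART data will generally violate.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_unexposed, p_exposed, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two proportions."""
    if p_unexposed == p_exposed:
        raise ValueError("proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_unexposed * (1 - p_unexposed) + p_exposed * (1 - p_exposed)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_exposed - p_unexposed) ** 2)


# Hypothetical scenario: 3% baseline risk of a birth defect.
baseline = 0.03
for rr in (2.0, 1.5, 1.25):
    n = n_per_group(baseline, baseline * rr)
    print(f"RR = {rr}: about {n} births per group")
```

The required number of births grows quickly as the relative risk to be detected shrinks, and the number of couples to enroll is larger still because only a fraction of enrolled couples go on to an ART birth.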

Another prospective cohort approach, albeit less desirable, is to recruit couples upon first seeking services for perceived infecundity. Practical challenges include the receipt of a range of ‘infertility’ services from various health providers. Some women may be placed on ovulation induction agents prior to a full fecundity evaluation. These issues underscore the potential for information or misclassification bias along with other study limitations such as marked heterogeneity with respect to treatment and the need for retrospective collection of covariate data up to the time medical care was sought. This is particularly troublesome if exposure patterns change either temporally (e.g. seasonal use of pesticides or changes in cigarette smoking when perceiving conception difficulties), or purposefully if couples alter behaviours after experiencing failure (e.g. taking dietary supplements or vitamins).


    What are possible sampling strategies for evaluating the effects of ART treatment on child health?
 
In the United States there are few sampling frameworks for ascertaining populations at risk for fecundity or fertility impairments, which requires creativity on the part of investigators. This has led to the use of convenience or clinically based samples for assessing ART treatment effects. The absence of population-based samples limits external validity or generalizability of study results. As such, the study results cannot be used in empirically based clinical decision-making beyond the treated individuals comprising the sample.

Choice of sampling strategy is impacted by the observation that a large percentage of infecund couples do not seek medical care. The prevalence of medical care-seeking behaviour for primary infecundity ranges from 32% to 92%, and from 22% to 79% for secondary infertility (Schmidt et al., 1995). Approximately 44% of US women reporting fecundity impairments stated that they had not sought medical services (Stephen and Chandra, 1998). Care-seeking behaviour has been reported to be associated with a woman's age, education, income, duration of marriage and nulliparity (Rachootin and Olsen, 1981; Hirsh and Mosher, 1987), raising concern about potential selection bias. Even less is known about care-seeking behaviours of men, despite concerns about declining male fecundity or the so-called testicular dysgenesis hypothesis (Skakkebaek et al., 2001). Perhaps differences in care-seeking behaviour will be minimized as health insurance providers routinely include ART services.

Implicit in sample selection is the need for sufficient sample size to minimize erroneous interpretation of study results (type II error). The unit of analysis (i.e. treatment cycle versus woman versus couple) is critical not only for determining the sampling framework, but also to ensure the analysis is appropriately statistically powered. Again, studies of less prevalent child health outcomes such as birth defects or cancer must be designed with a sufficient sample size.


    Which health end points are appropriate for study? Can more than one end point be evaluated in a single study?
 
A spectrum of study end points can be studied in most prospective cohort designs. While many outcomes are well defined and suitable for measurement (e.g. number of oocytes retrieved or fertilized, number of live births), others require careful operational definitions and are subject to measurement error. For example, pregnancy can be defined as a biochemical or HCG detected pregnancy, or as a clinical pregnancy as evidenced by the presence of a gestational sac or heartbeat. Pregnancies detected by HCG need to exclude false-positive detections associated with a residual treatment effect. Clinical pregnancy as an end point automatically excludes approximately two-thirds of all post-implantation HCG pregnancies and may introduce the potential for biased findings (Wilcox et al., 1988; Wang et al., 2003). The interplay between early pregnancy loss and fecundability in the following cycle has received limited study. Previous authors have found that early (HCG detected) pregnancy loss is associated with a significant increase in the odds of total conception, clinical pregnancy and subsequent early loss (Wilcox et al., 1988; Wang et al., 2003). These findings further support the role of prospective studies inclusive of a spectrum of end points that may be suitable for the eventual delineation of the effect of prior pregnancy outcome on subsequent outcome regardless of mode of conception. Thus, varying definitions for outcomes will impact the study findings and their interpretation. The study goals should a priori dictate which end points to study and the methodology should delineate operational definitions for all outcomes.

Relevant developmental end points include: plurality of birth, fetal growth and development, birth size, birth defects, developmental disabilities, and cancer. Other aspects of human development might include sexual maturation and puberty, which would require following children through adolescence. Prospective design also enables investigators to capture incident cases or affected infants in lieu of prevalent cases that might be ascertained in a retrospective study design.

Added challenges in assessing developmental end points pertain to the issue of multiple births. In the United States, an estimated 37% of ART pregnancies that progress far enough to detect fetal hearts are classified as multiple gestations and 35% of live born deliveries are multiples. Considering total infants born from ART, more than half are twins or higher order, 18 times the proportion reported for the general United States population (Wright et al., 2003). Given the inherent perinatal sequelae associated with multiples (ESHRE Capri Workshop Group, 2000), the need to clearly identify multiple pregnancies and not just multiple births is evident. This can be a difficult task, as many early multiple gestations are reduced to a singleton clinical pregnancy (Salihu et al., 2003). Recent data from the United States registry of ART procedures indicate that approximately 9% of singleton live births conceived from ART were from pregnancies with two or more fetal hearts observed on early ultrasound (Schieve et al., 2004a). The fetal death of a co-twin is a known risk factor for neonatal complications and morbidity in the remaining twin (Skrablin et al., 1994), especially in the presence of a twin-to-twin transfusion (Saito et al., 1999).
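The step from "35% of deliveries are multiples" to "more than half of infants are twins or higher order" follows from counting infants rather than deliveries. The sketch below gives a lower bound by treating every multiple delivery as twins; the function name is an assumption, and the only input taken from the text is the 35% figure.

```python
def min_share_of_infants_from_multiples(multiple_delivery_share, infants_per_multiple=2):
    """Lower bound on the share of infants born in multiple deliveries.

    Assumes every multiple delivery yields exactly `infants_per_multiple`
    infants; triplets and higher-order births would raise the share.
    """
    if not 0.0 <= multiple_delivery_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    singletons = 1.0 - multiple_delivery_share
    multiples = multiple_delivery_share * infants_per_multiple
    return multiples / (singletons + multiples)


share = min_share_of_infants_from_multiples(0.35)
print(f"At least {share:.1%} of infants come from multiple deliveries")
```

Per 100 deliveries, 65 singletons and at least 70 infants from multiples are born, so multiples account for at least 70 of 135 infants, which is consistent with the "more than half" statement above.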


    How and when should ART treatment data be captured for analysis?
 
Timing of data collection is inherently a function of the unit of analysis (cycle versus individual versus couple) and measure of time (calendar time versus age versus treatment duration), not to mention the plethora of relevant clinical and behavioural factors known to impact on the probability of conception and birth. The longstanding recognition of critical windows important for human development further underscores the importance of specifying time-varying covariates (Wilson, 1965; Selevan et al., 2000), including peri-conception (Morford et al., 2004). For ART couples alone, investigators have a unique ability to identify conception and, hence, gestational age-specific exposures. Research efforts currently aimed at determining day-specific probabilities of conception among couples attempting pregnancy offer further promise for the eventual analysis of time-specific exposures (Ferreira-Poblete, 1997; Royson and Ferreira, 1999). One of the biggest challenges facing research in this area is to capture the treatment data including individual clinic nuances when administering treatment protocols to couples (e.g. adding aspirin use to treatment protocols). The added challenge will be to capture the specific timing of the treatment in the context of other exposures the couple may face. In essence, couples are exposed to a mixture of factors, of which only one broad category is the ART treatment itself. This situation is similar to other avenues of research such as the study of environmental chemicals in the context of other physical agents (e.g. pesticides and heat) and human development.
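Because the date of conception is known for ART couples, dated exposure records can be indexed relative to it. The sketch below is one possible way to do this; the dates, exposure labels and the 14-day peri-conception window are hypothetical choices for illustration, not recommendations.

```python
from datetime import date, timedelta


def exposures_in_window(conception, exposures, start_day, end_day):
    """Return (date, label) exposures falling between start_day and end_day,
    inclusive, counted in days relative to the conception date."""
    lower = conception + timedelta(days=start_day)
    upper = conception + timedelta(days=end_day)
    return [(d, label) for d, label in exposures if lower <= d <= upper]


# Hypothetical dated exposure log for one couple.
conception = date(2004, 3, 1)
exposure_log = [
    (date(2004, 2, 20), "aspirin"),
    (date(2004, 3, 10), "pesticide"),
    (date(2004, 5, 1), "heat"),
]

# Assumed peri-conception window: 14 days before to 14 days after conception.
peri = exposures_in_window(conception, exposure_log, -14, 14)
print([label for _, label in peri])
```

The same function can be called with other day ranges to define later gestational windows, provided exposures were recorded with dates rather than as ever/never indicators.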


    How should data be analysed when evaluating ART treatment and child health?
 
Data analysis is complex on many levels, from conceptualizing the research question to devising an analytic plan appropriate to that question. Investigators are faced with interesting challenges including: (i) the hierarchical nature of ART treatment; (ii) correlation patterns of exposures with respect to timing; and (iii) the potential for parental interactions.

ART observations will generally not be independent, given the need to consider multiple oocytes, implantations or embryos per treatment cycle, as well as multiple treatment cycles per woman or couple. Failure to account for such dependency may lead to unpredictable bias (Dukic and Hogan, 2002; Daya, 2003). Investigators are left with either restricting treatment cycles to one per woman or using analytical models capable of addressing the dependency [Bayesian methods, mixed models or generalized estimating equations (GEE)].

Without consideration of the issues of dependency and correlation, study findings may be inefficient and biased, ultimately yielding imprecise or even incorrect conclusions. It is known that clustering (a form of dependency) commonly occurs in ART data, where outcomes are clustered on several levels: within menstrual or treatment cycles, within each woman and within each couple. The complex correlation structure generally observed in ART outcomes can be due to unmeasured (or omitted) level-specific covariates, such as unmeasured endometrial or embryo characteristics or omitted time-varying exposure.
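One common way to gauge how much information is lost to clustering is the design effect, 1 + (m - 1)ρ, for clusters of equal size m and intraclass correlation ρ. This is a standard survey-sampling approximation rather than a method proposed in this paper, and the cycle counts and correlation below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters with intraclass correlation icc."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_observations, cluster_size, icc):
    """Number of independent observations carrying equivalent information."""
    return n_observations / design_effect(cluster_size, icc)


# Hypothetical study: 100 women with 3 cycles each, within-woman correlation 0.3.
n_cycles = 300
deff = design_effect(cluster_size=3, icc=0.3)
n_eff = effective_sample_size(n_cycles, cluster_size=3, icc=0.3)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.1f}")
```

Under these assumptions, 300 cycles carry roughly the information of fewer than 200 independent observations, so an analysis that treats the cycles as independent would report confidence intervals that are too narrow.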

Unfortunately, many investigators still report and rely on simple statistics such as t-tests or Pearson's χ² statistic, in spite of multiple confounders and the lack of independence of pregnancy outcomes. Such published work may mislead clinicians and, ultimately, couples undergoing care. A variety of statistical techniques exist today for analysing data in situations where clustering occurs and induces dependency among the outcomes of interest. Tools for analysing dependent outcomes are well developed and implemented in a variety of standard statistical packages. For example, longitudinal data analysis techniques, GEE, (empirical) Bayesian hierarchical models and mixed models are readily available in widely used statistical software packages such as Stata or SAS. Other appropriate specialized statistical packages include MIXOR, EGRET, HLM, MLWin, or WinBUGS and BUGS (Hogan and Blazar, 2000).

Hierarchical models (also known as multi-level models, random-effects or mixed models) can account for this within-cluster correlation by assuming that each cluster has its own (unobservable) probability of a positive outcome, which is different from all other clusters, and that all cluster-specific probabilities follow some distribution. The variance of this distribution is what is known as the heterogeneity parameter, and its magnitude reflects the amount of heterogeneity among women or couples (Hogan and Blazar, 2000; Dukic and Hogan, 2002). Hierarchical models also can provide ‘cluster-specific’ estimates of treatment and other relevant covariate effects. This feature is desirable in many clinical settings when one is interested in prediction of outcomes for a given woman or a given couple.

All Bayesian models are in fact hierarchical models, where the top hierarchical level is reserved for so-called ‘priors’, or summaries of knowledge elicited from previous studies or from expert opinion. This ability to formally account for prior knowledge can be extremely useful, particularly in ART settings. Considerable information on prior history is likely to exist for many couples involved in ART studies, and investigators should be able to formally include that information in the analysis.
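As a deliberately simplified illustration of how a prior and cluster-specific data combine, consider a conjugate beta-binomial model: each woman's per-cycle success probability is given a Beta(a, b) prior, and her posterior mean after y successes in n cycles is (a + y)/(a + b + n). The prior parameters and cycle counts below are hypothetical, and real ART analyses would typically involve covariates and more levels than this sketch.

```python
def posterior_mean(successes, cycles, prior_a, prior_b):
    """Posterior mean of a success probability under a Beta(prior_a, prior_b) prior."""
    if not 0 <= successes <= cycles:
        raise ValueError("need 0 <= successes <= cycles")
    if prior_a <= 0 or prior_b <= 0:
        raise ValueError("prior parameters must be positive")
    return (prior_a + successes) / (prior_a + prior_b + cycles)


# Hypothetical prior with mean 0.3, e.g. summarizing earlier studies.
PRIOR_A, PRIOR_B = 3.0, 7.0

# Two hypothetical women, each observed for two cycles.
for label, y, n in (("no successes", 0, 2), ("two successes", 2, 2)):
    raw = y / n
    shrunk = posterior_mean(y, n, PRIOR_A, PRIOR_B)
    print(f"{label}: raw = {raw:.2f}, posterior mean = {shrunk:.3f}")
```

With only two cycles per woman, the raw proportions of 0 and 1 are pulled strongly toward the prior mean, which is the behaviour that makes cluster-specific estimates from hierarchical models more stable than estimates computed separately for each woman.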

Methods like GEE can also account for within-cluster correlation. However, they cannot produce subject-specific estimates, but instead yield population-level inference. This can be well suited for policy making, but less useful for individual clinical prediction. Other methods for handling dependent outcomes are available, such as variance-adjustment techniques (sandwich estimators) or the bootstrap.
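A cluster bootstrap is one way to obtain a standard error that respects dependency: whole women, rather than individual cycles, are resampled with replacement. The sketch below uses an intentionally extreme hypothetical data set in which outcomes are identical within each woman, so that the contrast with the naive binomial standard error is easy to see. The function names, seed and number of replicates are illustrative assumptions.

```python
import random
from math import sqrt


def pooled_rate(clusters):
    """Overall success proportion across all cycles in all clusters."""
    total = sum(len(c) for c in clusters)
    return sum(sum(c) for c in clusters) / total


def naive_se(clusters):
    """Binomial standard error that wrongly treats every cycle as independent."""
    n = sum(len(c) for c in clusters)
    p = pooled_rate(clusters)
    return sqrt(p * (1 - p) / n)


def cluster_bootstrap_se(clusters, n_boot=2000, seed=0):
    """Standard error from resampling whole clusters with replacement."""
    rng = random.Random(seed)
    k = len(clusters)
    estimates = []
    for _ in range(n_boot):
        sample = [clusters[rng.randrange(k)] for _ in range(k)]
        estimates.append(pooled_rate(sample))
    mean = sum(estimates) / n_boot
    return sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))


# Hypothetical data: 40 women, 3 cycles each, outcomes identical within a woman.
clusters = [[1, 1, 1]] * 20 + [[0, 0, 0]] * 20

print(f"naive SE:             {naive_se(clusters):.4f}")
print(f"cluster bootstrap SE: {cluster_bootstrap_se(clusters):.4f}")
```

Because the cycles within a woman carry no independent information in this example, the cluster bootstrap standard error is noticeably larger than the naive one; with weaker within-woman correlation the gap would narrow.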

Another very important analytical issue is missing data and informative dropout in a longitudinal study setting. Missing data mechanisms and reasons for dropout can in general be extremely complicated in an ART setting, resulting in severe selection bias for ART research. To address non-ignorable dropout, explicit models for the dropout mechanism or joint models for the outcome of interest and time to dropout are needed (Hogan and Laird, 1997). Appropriate statistical input will generally be required for design and implementation of such research.


    What conclusions can be reached from available and future data?
 
As with all epidemiological research, study findings need to be evaluated within one of the established paradigms for assessing causality (Weed, 1995). Such an approach ensures that the methods are appropriate given the design and that the results are interpreted within the study's limitations and strengths. Two particularly worrisome observations that appear in the published literature to date with respect to the interpretation of ART treatment effects and child health are lack of statistical power for the relevant unit of analysis and limited attention to alternative explanations. With respect to power, several authors report the absence of an ART treatment effect on rare outcomes such as birth defects, despite insufficient statistical power (Sutcliffe et al., 1995; Dhont et al., 1997). The issue of limited statistical power for birth defects and developmental disabilities for many studies has recently been addressed (Buitendijk, 1999) and can be exacerbated further if a good response rate is not achieved. This concern is further amplified by the fact that ignoring dependency in ART outcomes will tend to overestimate the precision of the findings and consequently result in overestimated statistical power. These early studies, albeit underpowered for many outcomes, were initial attempts to assess health implications of ART on children's health. Current investigators need to build upon these efforts with the implementation of study designs appropriate for the research question and sensitive to the methodological nuances described in this paper if definitive answers are to be obtained.

As researchers, we are left answering the question about RR3 as described above: what is the compound effect of infertility and ART treatment on child health? Answering it requires a standardized study protocol with prospective data collection on laboratory, clinical and behavioural factors and on measures of fecundity impairment. With accurate information on the type and diagnostic subtype of infecundity, it will be possible to adjust statistical analyses to further assess effects.


    Summary
 
Despite the challenges in designing research aimed at assessing the relationship between ART treatment and child health as globally defined, concerted study can be done and answers to pressing questions are possible. This will most likely require conceding that we cannot completely separate the effects of underlying fecundity impairments from those of treatment. But we will be able to estimate child health risks among infecund couples receiving ART (RR3, defined in this paper as the compound effect of infecundity and ART). This alone will be a contribution to the field and offers promise for the early identification of potential reproductive and developmental toxicants at critical windows for humans.


    Acknowledgements
 
The authors thank Dr Courtney J. Lynch for her critical review and feedback provided on the paper.


    References
 
Anthony S, Buitendijk SE, Dorrepaal CA, Lindner K, Braat DDM and den Ouden AL (2002) Congenital malformations in 4224 children conceived after IVF. Hum Reprod 17, 2089–2095.

Australian In vitro Fertilization Collaborative Group (1985) High incidence of preterm birth and early losses in pregnancy after in vitro fertilization. Br Med J 291, 1160–1163.

Baird DD, Weinberg CR and Rowland AS (1991) Reporting errors in time-to-pregnancy data collected with a short questionnaire. Am J Epidemiol 133, 1282–1290.

Bergh T, Ericson A, Hillensjo T, Nygren KG and Wennerholm UB (1999) Deliveries and children born after in-vitro fertilization in Sweden 1982–95: a retrospective cohort study. Lancet 354, 1579–1585.

Bonde JP, Hjollund NH, Jensen TK, Ernst E, Kolstad H, Henriksen TB, Giwercman A, Skakkebaek NE, Andersson A-M and Olsen J (1998) A follow-up study of environmental and biologic determinants of fertility among 430 Danish first-pregnancy planners: design and methods. Reprod Toxicol 12, 19–27.

Buck GM, Sever LE, Batt RE and Mendola P (1997) Review of lifestyle factors and female infertility. Epidemiology 8, 435–441.

Buitendijk SE (1999) Children after in vitro fertilization: an overview of the literature. Int J Technol Assess Health Care 15, 52–65.

CDC, American Society for Reproductive Medicine, Society for Assisted Reproductive Technology and RESOLVE (1997) 1995 Assisted Reproductive Technology Success Rates: National Summary and Fertility Clinic Reports. CDC, Atlanta, GA, USA.

CDC, American Society for Reproductive Medicine and Society for Assisted Reproductive Technology (2003) Assisted Reproductive Technology Success Rates: National Summary and Fertility Clinic Reports. CDC, Atlanta, GA, USA.

Collins JA, Burrows EA and Willan AR (1995) The prognosis for live birth among untreated infertile couples. Fertil Steril 64, 22–28.

Daya S (2003) Pitfalls in the design and analysis of efficacy trials in subfertility. Hum Reprod 18, 1005–1009.

Dhont M, DeNeubourg F, Van Der Elst J and De Sutter P (1997) Perinatal outcome of pregnancies after assisted reproduction: A case control study. J Assist Reprod Genet 14, 575–580.

Dukic V and Hogan JW (2002) A hierarchical Bayesian approach to modeling embryo implantation following in vitro fertilization. Biostatistics 3, 361–377.

Ericson A and Kallen B (2001) Congenital malformations in infants born after IVF: A population-based study. Hum Reprod 16, 504–509.

ESHRE Capri Workshop Group (2000) Multiple gestation pregnancy. Hum Reprod 15, 1856–1864.

Ferreira-Poblete A (1997) The probability of conception on different days of the cycle with respect to ovulation: an overview. Adv Contracept 13, 83–95.

Hansen M, Kurinczuk JJ, Bower C and Webb S (2002) The risk of major birth defects after intracytoplasmic sperm injection and in vitro fertilization. N Engl J Med 346, 725–730.

Helmerhorst FM, Perquin DAM, Donker D and Keirse MJNC (2004) Perinatal outcome of singletons and twins after assisted conception: a systematic review of controlled studies. BMJ 328, 261.

Hirsh MB and Mosher WD (1987) Characteristics of infertile women in the United States and their use of infertility services. Fertil Steril 47, 618–625.

Hogan J and Laird N (1997) Mixture models for the joint distribution of repeated measures and event times. Stat Med 16, 239–258.

Hogan J and Blazar A (2000) Hierarchical logistic regression models for clustered binary outcomes in studies of IVF-ET. Fertil Steril 73, 575–581.

Jackson RA, Gibson KA, Wu YW and Croughan MS (2004) Perinatal outcomes in singletons following in vitro fertilization: a meta analysis. Obstet Gynecol 103, 551–563.

Joffe M, Villard L, Li Z, Plowman R and Vessey M (1993) Long term recall of time-to-pregnancy. Fertil Steril 60, 99–104.

Joffe M, Villard L, Li Z, Plowman R and Vessey M (1995) A time to pregnancy questionnaire designed for long term recall: validity in Oxford England. J Epidemiol Community Health 49, 314–319.

Koudstaal J, Braat DD, Bruinse HW, Naaktgeboren N, Vermeiden JP and Visser GH (2000) Obstetric outcome of singleton pregnancies after IVF: a matched control study in four Dutch university hospitals. Hum Reprod 15, 1819–1825.

Medical Research International and The American Fertility Society Special Interest Group (1988) In vitro fertilization/embryo transfer in the United States: 1985 and 1986 results from the National IVF-ET Registry. Fertil Steril 49, 212–215.

Morford LL, Henck JW, Breslin WJ and DeSesso JM (2004) Hazard identification and predictability of children's health risk from animal data. Environ Health Perspect 112, 266–271.

Olsen J, Juul S and Basso O (1998) Measuring time to pregnancy-methodological issues to consider. Hum Reprod 13, 1751–1753.

Rachootin P and Olsen J (1981) Social selection in seeking medical care for reduced fecundity among women in Denmark. J Epidemiol Community Health 35, 262–264.

RESOLVE and The National Infertility Association (2003) Online. http://www.resolve.org (date accessed: October 2004).

Royson P and Ferreira A (1999) A new approach to modeling daily probabilities of conception. Biometrics 55, 1005–1013.

Rufat P, Olivennes F, DeMouzon F, Dehan M and Frydman R (1994) Task force report on the outcome of pregnancies and children conceived by in-vitro fertilization (France: 1987–1989). Fertil Steril 61, 324–330.

Saito K, Ohtsu Y, Amano K and Nishijima M (1999) Perinatal outcome and management of single fetal death in twin pregnancy: a case series and review. J Perinat Med 27, 473–477.

Salihu HM, Aliyu MH, Rouse DJ and Kirby RS (2003) Potentially preventable excess mortality among higher-order multiples. Obstet Gynecol 102, 679–684.

Schieve LA, Meikle SF, Ferre C, Peterson HB, Jeng G and Wilcox LS (2002) Low and very low birth weight in infants conceived with use of assisted reproductive technology. N Engl J Med 346, 731–737.

Schieve LA, Rasmussen SA, Buck GM, Schendel DE, Reynolds M and Wright V (2004b) Are children conceived with assisted reproductive technology at increased risk for adverse health outcomes? Obstet Gynecol 103, 1154–1163.

Schieve LA, Ferre C, Peterson HB, Macaluso M, Reynolds MA and Wright VC (2004a) Perinatal outcome among singleton infants conceived through assisted reproductive technology in the United States. Obstet Gynecol 103, 1144–1153.

Schmidt L, Munster K and Helm P (1995) Infertility and the seeking of infertility treatment in a representative population. BMJ 102, 978–984.

Selevan SG, Kimmel CA and Mendola P (2000) Identifying critical windows of exposure for children's health. Environ Health Perspect 108 (Suppl 3), 451–455.

Skakkebaek NE, Rajpert-De Meyts E and Main KM (2001) Testicular dysgenesis syndrome: an increasingly common developmental disorder with environmental aspects. Hum Reprod 15, 972–978.

Skrablin S, Kuvacic I, Fuduric I and Hodzic D (1994) Antenatal fetal demise in multiple gestation-the outcome of surviving fetus at one to 4 years of age. Eur J Obstet Gynecol Reprod Biol 56, 15–19.

Stephen EH and Chandra A (1998) Updated projections of infertility in the United States: 1995–2025. Fertil Steril 70, 30–34.

Stromberg B, Dahlquist G, Ericson A, Finnstrom O, Koster M and Stjernqvist K (2002) Neurological sequelae in children born after in-vitro fertilisation: a population-based study. Lancet 359, 461–465.

Sutcliffe AG, Souza SW, Cadman J, Richards B, McKinlay IA and Lieberman B (1995) Minor congenital anomalies, major congenital malformations and development in children conceived from cryopreserved embryos. Hum Reprod 10, 3332–3337.

Vail A and Gardener E (2003) Common statistical errors in the design and analysis of subfertility trials. Hum Reprod 18, 1000–1004.

Van Steirteghem A, Bonduelle M, Devroey P and Liebaers I (2002) Follow-up of children born after ICSI. Hum Reprod 8, 111–115.

Wang X, Chen C, Wang L, Chen D, Guang W and French J (2003) Conception, early pregnancy loss, and time to clinical pregnancy: a population-based prospective study. Fertil Steril 79, 577–584.

Weed DL (1995) Causal and preventive inference. In Greenwald P, Kramer BS and Weed DL (eds) Cancer Prevention and Control. Marcel Dekker, New York, NY, USA, pp 285–302.

Weinberg C, Baird D and Wilcox A (1994) Sources of bias in studies of time to pregnancy. Stat Med 13, 671–681.

Wennerholm UB, Bergh C, Hamberger L, Lundin K, Nilsson L, Wikland M and Kallen B (2000) Incidence of congenital malformations in children born after ICSI. Hum Reprod 15, 944–948.

Wilcox AJ, Weinberg CR, O'Connor JF, Baird DD, Schlatterer JP, Canfield RE, Armstrong EG and Nisula BC (1988) Incidence of early pregnancy loss. N Engl J Med 319, 189–194.

Wilson JG (1965) Embryological considerations in teratology. In Wilson JG and Warkany J (eds) Teratology: Principles and Techniques. The University of Chicago Press, Chicago, IL, USA, pp 256.

Wood JW (1994) Dynamics of Human Reproduction. Aldine De Gruyter, New York, NY, USA, pp 3.

Wright VC, Schieve LA, Reynolds MA and Jeng G (2003) Assisted reproductive technology surveillance - United States, 2000 (published erratum appears in MMWR 52, 942). MMWR Surveill Summ 52, 1–16.

Zielhuis GA, Hulscher ME and Florack EI (1992) Validity and reliability of a questionnaire on fecundability. Int J Epidemiol 21, 1151–1156.

Submitted on May 12, 2004; accepted on September 10, 2004.




