Reliability of serial urine HCG as a biomarker to detect early pregnancy loss

S.-I. Cho1, M.B. Goldman2,9, L.M. Ryan3, C. Chen1, A.I. Damokosh1, D.C. Christiani1,4, B.L. Lasley5, J.F. O'Connor6, A.J. Wilcox7 and X. Xu1,8,10

1 Department of Environmental Health, 2 Department of Epidemiology, Harvard School of Public Health, 3 Dana-Farber Cancer Institute, Harvard Medical School, and Department of Biostatistics, Harvard School of Public Health, 4 Massachusetts General Hospital, Harvard Medical School, Boston, 5 Institute of Toxicology and Environmental Health, University of California, Davis, CA, 6 Columbia University College of Physicians and Surgeons, New York, NY, 7 Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC and 8 Channing Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA


    Abstract
BACKGROUND: To examine the reliability of HCG as a biomarker for early pregnancy loss, five experienced researchers independently assessed data from 153 menstrual cycles, determining whether each cycle represented `no conception', a `continuing conception' or a `conception lost'. METHODS: Urine samples were analysed by immunoradiometric assay using a combination of capture antibodies for the intact heterodimer (B109) and for an epitope common to the beta subunit and the beta core fragment (B204). For each cycle, HCG data were presented as graphs of daily assay results. Summary statistics for HCG assays from 46 women who had undergone bilateral tubal ligation provided baseline values. RESULTS: Pairwise agreement among the assessors across the three outcomes ranged from 78–89%. At least three experts agreed on 147 cycles (96%), including 28 conception losses and 19 continuing conceptions. The multi-rater kappa was 0.62 for the conception lost category and 0.68 for continuing conceptions, indicating substantial agreement. CONCLUSION: The main sources of disagreement were deciding whether there was sufficient information for assessment, interpreting cycle parameters such as cycle length or bleeding events, and interpreting a distinct HCG rise pattern that did not exceed the baseline value obtained from the sterilized women.

Key words: early pregnancy loss/human chorionic gonadotrophin/immunoradiometric assay/reliability


    Introduction
HCG measured daily during the luteal phase of the menstrual cycle is an excellent biomarker of early pregnancy status (Armstrong et al., 1984; Wilcox et al., 1985). Techniques for detecting early pregnancy loss (EPL) have been evolving since the early 1980s in an effort to develop standardized methods that could be applied to large-scale epidemiologic studies (O'Connor et al., 1994). Because HCG is produced not only by the conceptus but also by the pituitary (in both pregnant and non-pregnant individuals), the sensitivity and specificity of the method have been key issues. EPL is a common pregnancy outcome that usually goes unnoticed; detecting it requires a sensitive biochemical pregnancy test applied in a prospectively conducted study. Without such a test, EPLs will be mistaken for non-conception cycles.

The landmark study by Wilcox et al. of 221 healthy women who were trying to conceive found that the incidence of pregnancy loss was 32% of all conceptions, and that two-thirds of these losses would have gone unrecognized without the use of HCG as a biomarker (Wilcox et al., 1988). That study used data from 28 women who had been sterilized by bilateral tubal ligation to obtain `baseline' HCG values. A pregnancy was defined as 3 consecutive days of HCG values >0.025 ng/ml as detected by the B101-R525 assay, which identified intact HCG as well as some of the free beta subunit.

After that study, the assay method underwent continuing development, partly because the supply of the rabbit polyclonal antibody used by Wilcox et al. had been exhausted, and partly because the assay required too much urine for application to full-scale epidemiologic studies (O'Connor et al., 1994). Subsequently, an immunoradiometric assay that uses a combination of capture antibodies for a beta subunit epitope (B204) and the intact heterodimer (B109) (the so-called `combo assay') has been widely applied in epidemiologic studies (Lasley and Shideler, 1994; Hakim et al., 1995; Ellish et al., 1996; Zinaman et al., 1996).

Separating EPL cycles from non-conception cycles requires not only a sensitive test but also criteria that distinguish noise from true signal; the new assay method therefore requires that criteria be developed for defining an EPL. Since there is no `gold standard' for an EPL, expert judgement becomes an important component in developing any algorithm. As a further challenge, false positives are probably more of a threat than false negatives, simply because non-conception cycles far outnumber EPLs: even a small false-positive rate applied to the many non-conception cycles can generate a substantial number of spurious EPLs relative to the true ones.
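
The asymmetry between false positives and false negatives can be illustrated with a short calculation. The sketch below is hypothetical throughout: the 500 non-conception cycles, 40 EPLs, 90% sensitivity and 95% specificity are assumed values chosen for illustration, not estimates from this study or the cited literature.

```python
# Hypothetical illustration only: none of these counts or rates come from
# this study. It compares false EPL calls arising from many non-conception
# cycles with the EPLs missed among far fewer true losses.

def expected_epl_calls(n_nonconception, n_epl, sensitivity, specificity):
    """Return expected (true EPL calls, false EPL calls, missed EPLs)."""
    true_calls = n_epl * sensitivity
    false_calls = n_nonconception * (1 - specificity)
    missed = n_epl * (1 - sensitivity)
    return true_calls, false_calls, missed


tp, fp, fn = expected_epl_calls(n_nonconception=500, n_epl=40,
                                sensitivity=0.90, specificity=0.95)
print(f"true EPL calls: {tp:.0f}, false EPL calls: {fp:.0f}, missed EPLs: {fn:.0f}")
print(f"share of EPL calls that are false: {fp / (tp + fp):.0%}")
```

Under these assumed values the test yields about 25 false EPL calls but misses only 4 true EPLs, so roughly 41% of all EPL calls would be false.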

To date, each study has used a different algorithm and, therefore, different definitions of pregnancy and subsequent pregnancy loss. For example, Lasley et al. considered a 2 day HCG rise >0.15 ng/ml within 3 consecutive days as an indicator of conception (Lasley et al., 1995). Hakim et al. used 2 consecutive days >0.25 ng/ml (Hakim et al., 1995) and Zinaman et al. used 3 consecutive days >=0.15 ng/ml as their cut-off (Zinaman et al., 1996). Because there is no true gold standard, it is difficult to assess the performance of these criteria. Ellish et al. showed that the frequency of early pregnancy loss ranged from 11.0–26.9% depending on the definition used (Ellish et al., 1996). Furthermore, beyond differences in defining a meaningful HCG rise and therefore an early pregnancy, determining a pregnancy loss often requires the investigator to consider the timing of the HCG rise within a cycle, the variability of baseline HCG measurements across cycles, and missing data.
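
The published definitions differ mainly in threshold, run length and whether the cut-off is inclusive. A minimal Python sketch, assuming each definition can be written as `at least r elevated days within some w consecutive days', shows how one series can satisfy one definition and fail the others. The function `meets_run_criterion`, the encoding of each rule and the example series are hypothetical; in particular, reading the Lasley et al. rule as 2 of any 3 consecutive days is our interpretation, and missing days are simply treated as not elevated.

```python
from typing import Optional, Sequence


def meets_run_criterion(values: Sequence[Optional[float]], threshold: float,
                        window: int, required: int,
                        inclusive: bool = False) -> bool:
    """True if some run of `window` consecutive days contains at least
    `required` elevated days. A day is elevated if its value exceeds
    `threshold` (or equals it when `inclusive`). Missing days (None) are
    never counted as elevated."""
    def elevated(v: Optional[float]) -> bool:
        if v is None:
            return False
        return v >= threshold if inclusive else v > threshold

    for start in range(len(values) - window + 1):
        if sum(elevated(v) for v in values[start:start + window]) >= required:
            return True
    return False


# Hypothetical encodings of the published cut-offs (ng/ml). Reading Lasley
# et al.'s rule as "2 of any 3 consecutive days" is an interpretation.
CRITERIA = {
    "Lasley 1995": dict(threshold=0.15, window=3, required=2),
    "Hakim 1995": dict(threshold=0.25, window=2, required=2),
    "Zinaman 1996": dict(threshold=0.15, window=3, required=3, inclusive=True),
}

# Made-up daily HCG series, not study data.
series = [0.05, 0.18, 0.09, 0.16, 0.22, 0.12]
results = {name: meets_run_criterion(series, **rule) for name, rule in CRITERIA.items()}
print(results)
```

For this made-up series only the Lasley-style rule indicates a conception, which illustrates how the choice of definition alone can shift estimated EPL frequencies, as Ellish et al. observed.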

As a first step in developing an EPL algorithm, five experts were invited to interpret assay results subjectively, and these interpretations were compared with a crude preliminary algorithm. In this study, we examine the reliability of HCG data interpretation by comparing the assessments of the five experienced researchers, who independently reviewed identical graphs containing daily plots of HCG values (Figure 1) to determine whether each menstrual cycle represented `no conception', a `continuing conception' or a `conception lost'.



Figure 1. An example of the graph sent to the experts. Measurement 1: singleton assay; measurements 2 and 3: duplicate assays; mean: arithmetic average of 2 and 3. The HCG values represented by dotted lines are 0.6, 1.2, 2.4 and 4.8 ng/ml. The first day of menses is indicated by the horizontal bar with the length of the bleeding episode marked on the bottom graph.

 

    Materials and methods
Field study procedures
The data for the assessment presented here are from our ongoing National Institutes of Health-funded prospective reproductive health study among women textile workers in Anqing, China (Ronnenberg et al., 2000). At each mill, a list of all full-time women workers was obtained from the personnel office. Study staff communicated weekly with each mill's family planning officers, who issue birth permissions, and women's health officers, who are in charge of women's reproductive health care, to identify all married women workers aged 20–34 years who were currently employed by the mill and had obtained permission to have a child.

Women were excluded if they: (i) were already pregnant; (ii) had tried unsuccessfully to conceive for >=1 year; (iii) were current or former smokers (defined as ever smoking at least one cigarette per day for 6 months or more); or (iv) planned to quit, change jobs, or move out of the city during the 1 year follow-up period. From 1996–1998, a total of 190 women were enrolled in the study. The participation rate among eligible women was ~90%.

The participants kept menstrual diaries and collected their first morning urine sample every day from enrolment to the end of follow-up, which ended when a pregnancy was clinically recognized or after 1 year, whichever occurred first. In total, daily urine specimens were collected and analysed for 763 menstrual cycles from the 190 women.

Women were given a health evaluation, educated about prenatal care and given a remuneration for the inconvenience of providing daily urine samples. The Human Subjects Committees at the Harvard School of Public Health and the China Medical Institutes approved all study procedures and informed consent was obtained from each participant.

Laboratory analysis
Urine samples were analysed for HCG by the immunoradiometric assay (IRMA) developed by O'Connor et al. using a combination of anti-fragment B204 and anti-intact B109 clones (O'Connor et al., 1988). The detailed characteristics and behaviour of this assay have been previously described (Ellish et al., 1996; O'Connor et al., 1998). A singleton measurement was performed on all samples. Urine creatinine levels were measured according to the method of Jaffe (Husdan and Rapoport, 1968). All HCG values were normalized to creatinine values to adjust for urine concentration.

Selection of cycles and urine samples
For quality control purposes, we selected a subset of the 763 cycles for duplicate HCG measurements. To increase efficiency, the selection was stratified so that cycles with consistently low or high HCG values in the singleton analysis were sampled less frequently. The selection proceeded as follows: (i) 47 cycles with consistently low HCG values (defined as no 2 consecutive days with HCG levels >0.6 ng/ml) were randomly sampled from 501 such cycles (~10%); (ii) 18 cycles with high values were randomly selected from 129 in this category (~15%). High values were defined as HCG levels >1.2 ng/ml for the last 4 days of the menstrual cycle, if there was a subsequent bleeding episode. For cycles in which high HCG values continued and no bleeding episode was observed, HCG was measured for the window –7 to +8 days around the start of the HCG rise, counting from the first of the 3 days with HCG levels >0.6 ng/ml; (iii) all of the remaining 133 cycles were included. Among the 198 cycles in the subset, duplicate assay results were not available for 45 cycles, either because urine samples were missing for >10 days of the cycle or because the quantity of urine was insufficient. These cycles were excluded, leaving 153 cycles from 78 subjects. For these selected cycles, HCG was measured in duplicate for the window of days around day 1 of bleeding (–10 to +5); 49 cycles had no missing values within the window, 71 had 1–3, and 33 had 4–9 values missing. The duplicate assays were performed 4 months after the original singleton assay. The final analysis included 1950 urine samples from the 153 cycles.
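
The selection counts can be checked for internal consistency. The short Python tally below uses only the numbers given above; it confirms that the three strata sum to the 763 cycles and that the selected and analysed totals match the reported 198 and 153 cycles.

```python
# Arithmetic check of the cycle-selection flow, using only counts stated
# in the text.
strata = {
    "consistently low": (501, 47),  # (cycles in stratum, cycles selected)
    "high": (129, 18),
    "remaining": (133, 133),        # all remaining cycles were included
}

total_cycles = sum(n for n, _ in strata.values())
selected = sum(k for _, k in strata.values())
analysed = selected - 45            # 45 cycles lacked duplicate assays

for name, (n, k) in strata.items():
    print(f"{name}: {k}/{n} sampled ({k / n:.1%})")
print(f"{total_cycles} cycles -> {selected} selected -> {analysed} analysed")

# Cycles by number of missing days in the -10 to +5 window.
missing = {"0": 49, "1-3": 71, "4-9": 33}
assert sum(missing.values()) == analysed
```

The printed sampling fractions are the exact values behind the approximate percentages quoted for the low and high strata.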

As a comparison, non-conceptive levels of HCG were determined from urine samples contributed by 46 women aged 20–34 years who had had a recent bilateral tubal ligation and who had no known fertility problems or chronic illnesses, and at least one successful pregnancy. These women were generally similar in their characteristics to the women trying to conceive. From these women, 2496 daily urine samples were collected over two menstrual cycles. Of those, 696 (27.9%) had detectable HCG levels; the lowest value observed was 0.0005 ng/ml.

Classification of cycles by algorithm
To compare with the expert assessment, we classified the cycles according to an algorithm adapted from Wilcox et al. (Wilcox et al., 1988). They found that, of 28 sterilized women, one had 2 consecutive days with HCG >0.025 ng/ml, while no woman had 3 consecutive days above that level. Applying similar logic to the sterilized women in our study, we took the minimum of each pair of consecutive days and obtained 0.6 ng/ml as the maximum of all these 2 day minima; the corresponding 3 and 4 day minima are shown for comparison (Table I). We defined an `HCG rise' as 3 consecutive days with HCG levels above this baseline, where each day's level was calculated as the geometric mean of the three measurements (singleton and duplicates). Among the 55 cycles with an HCG rise, 15 were classified as a `continuing conception' because the elevated HCG values were sustained until a clinical pregnancy was confirmed. For the rest of the cycles, those with an HCG rise within the –10 to +5 day window around a bleeding episode were classified as a `conception lost'. Cycles with no HCG rise within the window around the bleeding episode were classified as `no conception'. Under this algorithm, cycles with missing values may be classified as `no conception' because there is insufficient information to define an HCG rise.
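
The algorithm can be summarized as a few small functions. The Python sketch below is one possible reading, not the code used in the study: it assumes each day's value is the geometric mean of the singleton and duplicate results (all positive), that a missing day breaks a run of elevated days, and that whether elevated HCG persisted to a clinically confirmed pregnancy is supplied as a hypothetical flag from follow-up. All function names and the toy data are illustrative.

```python
import math
from statistics import geometric_mean
from typing import List, Optional, Sequence, Tuple

Series = Sequence[Optional[float]]  # daily HCG (ng/ml); None = missing day


def max_of_rolling_minima(series: Series, k: int = 2) -> float:
    """Largest min() over every run of k consecutive non-missing days
    (-inf if the series has no such run)."""
    best = -math.inf
    for t in range(len(series) - k + 1):
        run = series[t:t + k]
        if all(v is not None for v in run):
            best = max(best, min(run))
    return best


def baseline(sterilized: List[Series], k: int = 2) -> float:
    """Baseline = maximum k-day minimum across all sterilized women."""
    return max(max_of_rolling_minima(s, k) for s in sterilized)


def daily_level(measurements: Optional[Tuple[float, ...]]) -> Optional[float]:
    """Geometric mean of a day's singleton and duplicate results (all > 0)."""
    return None if measurements is None else geometric_mean(measurements)


def has_hcg_rise(daily: Series, base: float, days: int = 3) -> bool:
    """True if `days` consecutive non-missing values all exceed `base`.
    A missing day resets the run, so gaps can hide a rise."""
    run = 0
    for v in daily:
        run = run + 1 if (v is not None and v > base) else 0
        if run >= days:
            return True
    return False


def classify_cycle(window: Series, base: float,
                   sustained_to_clinical_pregnancy: bool) -> str:
    """`window` holds the daily levels for the assessment window,
    e.g. days -10 to +5 around day 1 of bleeding."""
    rise = has_hcg_rise(window, base)
    if rise and sustained_to_clinical_pregnancy:
        return "continuing conception"
    if rise:
        return "conception lost"
    return "no conception"


# Toy data only, not study measurements.
sterilized = [[0.1, 0.6, 0.9, 0.2], [0.3, 0.7, 0.2, None, 0.4]]
base = baseline(sterilized)

cycle = [(0.2, 0.3, 0.25), (0.9, 1.1, 1.0), (1.5, 1.2, 1.4),
         (2.0, 1.8, 2.2), None, (0.3, 0.2, 0.3)]
daily = [daily_level(m) for m in cycle]
print(base, classify_cycle(daily, base, sustained_to_clinical_pregnancy=False))
```

With the toy sterilized-women series the baseline comes out at 0.6 ng/ml, and the toy cycle, whose rise is not sustained, is classified as `conception lost'. Because a missing day resets the run, the sketch reproduces the property noted above that cycles with missing values may be classified as `no conception'.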


Table I. Distribution of HCG values among sterilized women (2496 days, 46 subjects)
 
Expert assessment
We formed an expert panel of five experienced researchers in reproductive epidemiology and endocrinology. Each expert had a doctoral and/or medical degree and a minimum of 10 years of experience in research related to the development of relevant laboratory methods, quantitative assessment of laboratory results, or the use of laboratory methods in population-based research studies. For each cycle, a graph containing the singleton assay results, the two replicates, and the mean of the two replicates was provided. Cycle day and menstrual bleeding days were also marked (Figure 1). Mean, median, range, SD and coefficient of variation were provided for the singleton and the replicate assays for reference. For the samples from the sterilized women, 2, 3 and 4 day baseline values were given in addition to summary statistics (Table I). Each expert was asked to assess whether a given cycle represented: (i) no conception, (ii) conception continuing, or (iii) conception lost. The experts were also asked to indicate their confidence in each choice on a scale from 0 (no confidence) to 10 (perfect confidence).

Statistical analysis
In evaluating the judgements of the panel, we began by comparing the frequency of the three possible outcomes reported by the experts. Pairwise agreement was then calculated for every pair of experts: for each pair, we counted the cycles that both experts placed in the same one of the three categories and expressed this as a percentage of the cycles rated by both. Cycles rated as `undetermined' by either expert in a pair were excluded. The assessments of the five experts were then combined into a single consensus classification of each cycle, defined in three ways: by the agreement of at least three, at least four, and all five experts. The outcomes were compared among the three definitions, and the definition based on at least three experts was compared with the classification produced by the algorithm derived from the sterilized women's samples.
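
The pairwise and consensus calculations can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions: an `undetermined' rating is encoded as None, outcomes are short labels, and the ratings are invented; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, List[Optional[str]]]  # rater -> outcome per cycle; None = undetermined


def pairwise_agreement(ratings: Ratings) -> Dict[Tuple[str, str], float]:
    """Percentage agreement for each pair of raters, excluding cycles
    that either rater left undetermined."""
    result = {}
    for a, b in combinations(ratings, 2):
        both = [(x, y) for x, y in zip(ratings[a], ratings[b])
                if x is not None and y is not None]
        if both:
            result[(a, b)] = 100 * sum(x == y for x, y in both) / len(both)
    return result


def consensus(cycle_ratings: List[Optional[str]], k: int) -> str:
    """Outcome given by at least k raters, else 'not determined'.
    With five raters and k >= 3, at most one outcome can qualify."""
    counts = Counter(r for r in cycle_ratings if r is not None)
    if counts:
        outcome, n = counts.most_common(1)[0]
        if n >= k:
            return outcome
    return "not determined"


# Made-up ratings for four cycles, not study data.
ratings = {
    "E1": ["none", "lost", "none", "continuing"],
    "E2": ["none", "lost", None, "continuing"],
    "E3": ["none", "none", "none", None],
    "E4": ["none", "lost", "lost", "continuing"],
    "E5": ["lost", "lost", None, "continuing"],
}
per_cycle = list(zip(*ratings.values()))
for k in (3, 4, 5):
    print(k, [consensus(list(c), k) for c in per_cycle])
print(pairwise_agreement(ratings)[("E1", "E3")])
```

With five raters, requiring k >= 3 guarantees that at most one outcome can reach the threshold, so the consensus is unambiguous; in the invented data no cycle reaches unanimity, and the third cycle, left undetermined by two raters, cannot be classified even at k = 3.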

To identify the most frequent patterns of disagreement, the cycles were grouped according to the combination of outcomes assigned by the experts, and the number and percentage of cycles were compared across these patterns.
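
One way to form the disagreement patterns is to reduce each cycle to the set of distinct outcomes it received, ignoring undetermined ratings. The sketch below assumes that representation (it is not taken from the study); a one-element set corresponds to a diagonal cell with no disagreement, and a two-element set such as {no conception, conception lost} to an off-diagonal pattern. The ratings are invented.

```python
from collections import Counter
from typing import FrozenSet, List, Optional


def disagreement_pattern(cycle_ratings: List[Optional[str]]) -> FrozenSet[str]:
    """Distinct outcomes given by the experts who rated the cycle;
    undetermined ratings (None) are ignored. One element = no disagreement."""
    return frozenset(r for r in cycle_ratings if r is not None)


# Made-up ratings (five experts per cycle), not study data.
cycles = [
    ["none", "none", "none", "none", "lost"],
    ["lost", "lost", "none", "lost", "lost"],
    ["none", None, "none", "none", None],
    ["continuing"] * 5,
]
patterns = Counter(disagreement_pattern(c) for c in cycles)
for pattern, n in patterns.most_common():
    print(sorted(pattern), n)
```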

The overall summary measure of agreement among the experts' assessments was obtained by calculating a multi-rater kappa (Fleiss, 1981).
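
For reference, the multi-rater kappa of Fleiss (1981) can be computed from a table counting how many raters assigned each cycle to each category. The Python sketch below implements the overall and category-specific statistics under the assumption that every cycle is rated by the same number of raters; how undetermined ratings were handled in this study is not stated here, so the sketch leaves that choice to the user. The count table is invented.

```python
from typing import List, Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> Tuple[float, List[float]]:
    """Overall and category-specific Fleiss kappa.

    counts[i][j] = number of raters placing cycle i in category j; every
    cycle must have the same total m. Category kappa:
        kappa_j = 1 - sum_i x_ij (m - x_ij) / (n m (m - 1) p_j q_j)
    A category used by no rater (p_j = 0) or by all (p_j = 1) is undefined.
    """
    n = len(counts)
    m = sum(counts[0])
    if any(sum(row) != m for row in counts):
        raise ValueError("every cycle needs the same number of raters")
    n_cat = len(counts[0])
    p = [sum(row[j] for row in counts) / (n * m) for j in range(n_cat)]

    per_category = []
    for j in range(n_cat):
        disagreement = sum(row[j] * (m - row[j]) for row in counts)
        per_category.append(1 - disagreement / (n * m * (m - 1) * p[j] * (1 - p[j])))

    total_disagreement = sum(row[j] * (m - row[j]) for row in counts for j in range(n_cat))
    expected = n * m * (m - 1) * sum(pj * (1 - pj) for pj in p)
    overall = 1 - total_disagreement / expected
    return overall, per_category


# Invented counts for 4 cycles x 5 raters; columns: no conception,
# continuing conception, conception lost.
counts = [
    [5, 0, 0],
    [1, 0, 4],
    [4, 0, 1],
    [0, 5, 0],
]
overall, per_category = fleiss_kappa(counts)
print(round(overall, 3), [round(k, 3) for k in per_category])
```

For the invented table the category-specific kappas are 0.60, 1.00 and about 0.47, and the overall kappa is 0.68, which equals the p_j q_j-weighted average of the category values.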


    Results
The 78 women were relatively young workers in textile mills (Table II). None reported a history of smoking or alcohol use. Most of the women (87%) had never used contraception, and only one woman reported prior use of oral contraceptives. Sixteen women had a history of pregnancy, most of which had ended in induced abortion.


Table II. Characteristics of study participants (n = 78): non-smoking, non-drinking women planning to conceive in Anqing, China
 
The number of cycles rated as `conception lost' ranged from 21 (14%) to 39 (25%) of the 153 cycles selected for assessment, a nearly two-fold difference between the minimum and the maximum (Table III). Similar differences between the raters were observed when the comparison was restricted to cycles assessed with a confidence level >=8. The percentage of cycles assessed with a confidence level >=8 varied among the assessors, from 48–75% of the total cycles, suggesting that some raters were more conservative than others in making a decision. Some raters chose to rate cycles as `undetermined' when they felt that those cycles did not provide sufficient information on which to base a decision. Reasons given for an undetermined rating included missing values, high variability and an ambiguous data pattern. The experts differed in whether they classified such a cycle with lower confidence or rated it as undetermined. The classifications produced by the algorithm were generally comparable with the expert assessments, although the algorithm tended to place more cycles in the `conception lost' category than the experts did.


Table III. Number (percentage) of outcomes among 153 menstrual cycles, by expert rater and by a decision algorithm based on 0.6 ng/ml for 3 days
 
Table IV shows the percentage of pairwise agreement between each expert and each of the four other raters. Cycles that either member of a pair rated as undetermined were excluded from the calculation. Agreement ranged from 78–92%. The averages of the four percentages for each expert were 87, 89, 84, 89 and 85%, respectively, showing that none of the raters was particularly far apart from the rest of the panel. The algorithm's agreement was similar across the individual experts; in general, however, the experts were much more likely to agree with each other than with the `objective' algorithm.


Table IV. Percentage of pairwise agreements (number of cycles assessed by both) among experts and a decision algorithm based on 0.6 ng/ml for 3 days
 
We examined the number of assessment outcomes under three different definitions: agreement among at least three, at least four, and all five experts (Table V). For example, at least three experts rated 100 cycles as `no conception'. There were 12 cycles for which all five experts agreed that the classification was `conception lost'. Requiring the agreement of three or more experts allowed 96% of the 153 cycles to be classified. All five experts concurred on 36% of the 153 cycles.


Table V. Number of cycles (percentage among 153 cycles) for which most experts agreed. Outcomes of algorithm based on 0.6 ng/ml for 3 days are also shown
 
The classification of cycles by the algorithm was compared with the assessment of at least three experts (Table VI). The algorithm was consistent with the expert decision for 124 cycles (81%). The major discrepancy involved 11 cycles classified as `conception lost' by the algorithm but as `no conception' by the experts. These cycles tended to show large day-to-day variability, generally high levels throughout the entire cycle, or inconsistency between the initial and the repeated measurements. Six cycles classified as `conception lost' by the experts were not detected by the algorithm. Although they showed a distinct rise and fall pattern in HCG measurements, the levels were either <0.6 ng/ml or failed to meet the 3 day criterion because of missing values. Only one cycle that was clinically verified as a pregnancy was not detected by the algorithm, again because of missing values. This cycle was left undetermined by three of the five experts and was therefore categorized as `not determined'.


Table VI. Number of cycles classified by expert (agreement among at least three) and by algorithm (0.6 ng/ml for >=3 days)
 
Most disagreements between the experts involved only two of the three possible assessments; for example, some experts classified a cycle as `no conception', whereas others classified it as `conception lost'. Only one cycle received all three outcomes. In Table VII, the diagonal cells represent the cycles for which there was no disagreement among the experts who gave an assessment. The upper portion (A) of Table VII includes all cycles except the one cycle that received three different assessments. The lower portion (B) of Table VII excludes the cycles that any expert left undetermined or assessed with confidence <5. The subscript `b' indicates the cells for which there were disagreements in the classification. For example, in A, there were 29 cycles for which the assessments were split between `no conception' and `conception lost'; in B, 4 cycles remain with this pattern of disagreement. The results show that disagreements occurred most frequently in deciding between `no conception' and `conception lost'.


Table VII. Number (percentage) of cycles by patterns of disagreement among experts (for example, assessments were split between `no conception' and `conception lost' for 28 cycles, as shown in the upper portion of table)
 
The multi-rater kappa was 0.62 for the `conception lost' category, and 0.68 for the `continuing conception' category, indicating substantial agreement among the five experts. The kappa was 0.52 for the `no conception' category, indicating moderate agreement among the experts.
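
The verbal labels attached to these kappa values match the widely used descriptive bands of Landis and Koch (1977), although the text does not state which scale was applied; the mapping below is an assumption made for illustration, applied to the three kappa values reported above.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive bands
    (assumed scale; not stated in the text)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")):
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


reported = (("conception lost", 0.62), ("continuing conception", 0.68),
            ("no conception", 0.52))
for category, kappa in reported:
    print(category, kappa, landis_koch_label(kappa))
```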


    Discussion
This study describes the reliability of the interpretation of serial urine HCG assay data by comparing the assessments of five independent experts. Despite the variability among different raters, the data indicate substantial agreement among the raters in detecting early pregnancy and EPL. Using criteria based on agreement by at least three of the five experts, 96% of the cycles could be classified into one of three categories: `no conception', `continuing conception' or `conception lost'. There were clear cases of EPL on which all five experts agreed. The risk of EPL among the conceptive cycles (the conception continuing and conception lost categories in Table III) ranged from 50–85%. If we restrict the results to outcomes determined with a high degree of confidence, the risk of EPL ranged even more widely, from 38–100% (Table III). The crude algorithm lands in about the middle of this range, at 73%. (Note that these figures are not the true rates in the entire study population, since we selected a subset of the study cycles for this exercise.)
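
The EPL risk quoted here is the proportion of conceptive cycles that were lost. As a worked check, the sketch applies it to the consensus counts of at least three experts reported in the abstract (28 losses, 19 continuing conceptions) and to the algorithm's counts, assuming that all 40 cycles with an HCG rise that were not continuing conceptions were classified as lost, which is consistent with the 73% figure.

```python
def epl_risk(lost: int, continuing: int) -> float:
    """Share of conceptive cycles (lost + continuing) that ended in loss."""
    return lost / (lost + continuing)


# Consensus of >= 3 experts (counts from the abstract).
consensus_risk = epl_risk(lost=28, continuing=19)
# Algorithm: 55 cycles with an HCG rise, 15 continuing; assumes the other
# 40 were all classified as `conception lost'.
algorithm_risk = epl_risk(lost=40, continuing=15)
print(f"experts: {consensus_risk:.0%}, algorithm: {algorithm_risk:.0%}")
```

The consensus classification gives about 60%, within the 50–85% range across individual raters, and the algorithm gives about 73%.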

Some of the variation is due to differences in the number of cycles the experts excluded as having `insufficient information'. The problem is not simply to define EPL, but to decide what constitutes sufficient evidence for a decision. There was considerable variability among the raters in deciding whether a specific cycle contained sufficient information to determine the outcome, suggesting that an explicit criterion defining insufficient information may be needed to standardize the procedure. Several factors appeared to influence the assessment. These include the availability of replicate assays, the quality of the laboratory assay as observed within the cycle, characteristics of the cycle such as length and bleeding duration, and the amount and location of missing data within a cycle. Most of these factors are difficult to quantify, leaving individual raters to rely on subjective judgement to reach a conclusion.

We did not have information on the date of ovulation for each menstrual cycle with which to identify precisely the luteal phase, during which the conceptus starts to produce HCG. Instead, we used the –10 to +5 day window around the bleeding episode to detect a `conception lost'. Since the luteal phase is less variable than the follicular phase (Harlow and Ephross, 1995), this window can be expected to capture the relevant period with reasonable accuracy. Nevertheless, whenever it is practical to obtain the day of ovulation, precise determination of the luteal phase would reduce false positives in cycles with an unusually short luteal phase. Identifying cycles in which no ovulation or no intercourse occurred would also reduce false positives by excluding cycles with zero probability of conception. However, the validity of such measurements would need to be assessed before additional information on ovulation and intercourse could be used.

The threshold in this study was an HCG level of 0.6 ng/ml, considerably higher than those in previous studies. The early study by Wilcox et al. used an assay method that detected intact HCG and a portion of the free beta subunit (Wilcox et al., 1988). The `combo' assay used in our study also measures the beta core fragment, so it would be expected to yield higher values than the assay used in the Wilcox study. The higher threshold in our study, compared with later studies using similar assays, may reflect the fact that our subjects were newly married and younger than the women in previous studies, or may result from differences between laboratories. Whilst we adhered to a strict quality assurance protocol and the baseline levels detected were consistent for each woman, differences between assays and laboratories may warrant investigation through future collaborative work.

Within-cycle variability of the HCG values, whether from technical or biological sources, may also be an issue. In some cycles, the baseline values were very low, yet a distinct pattern of HCG rise was readily identifiable, although the peak fell below the threshold for an HCG rise derived from the sterilized women's samples. For such cycles, raters may differ in their assessment depending on the relative weight they give to the pattern within the cycle and to the cut-off derived from the sterilized women. Further investigation is needed to elucidate how between-cycle or between-woman variability of the HCG baseline affects the overall results of epidemiologic studies.

Classification of cycles by an algorithm derived from samples from the sterilized women performed reasonably well if combined with the ultimate knowledge of clinical pregnancy status from continued follow-up. However, the algorithm tended to produce false positives for EPLs, particularly for the cycles in which HCG levels were more variable or generally high. Such variability and higher levels may be biological or technical. Some cycles in our data set showed obvious differences in assay variability between the initial measurements and the additional measurements performed 4 months later. Appropriate standardization of laboratory quality control procedures to minimize technical variability will help reduce false positives. Further studies are needed to better characterize the biological variability in baseline HCG levels.

The factors that cause variation in urinary HCG excretion are imperfectly understood. This is especially a problem when interpreting the subtle patterns of HCG rise and fall produced by a faltering blastocyst and measured by less-than-perfect assays. We found that experts were much more likely to agree with each other than with an `objective' algorithm. When no gold standard is available, expert human judgement may offer a surrogate standard for refining an objective algorithm. In addition, expert human judgement may provide a tool for extracting information from biological patterns beyond that which any explicit algorithm could accomplish. Biological information at the margins of interpretability offers a special challenge. While departures from strict objectivity must be treated with caution, a judicious combination of explicit rules and expert opinion may come closer to the truth than either alone. Similar issues have been discussed in the field of image analysis, where a semi-automatic interactive method was more accurate than a fully automatic method (Flygare et al., 1997).

In summary, we arrive at the following conclusions. The algorithm for defining EPL is as important as the assay, although much less work has been done on algorithms. Lacking a gold standard, the input of expert opinion is a necessary step toward developing an objective algorithm. In choosing an algorithm, specificity should have priority over sensitivity in order to minimize false positives. Finally, similar assays and criteria for EPL are necessary to allow comparisons among studies.


    Acknowledgements
This study was supported by grants 1R01 HD32505–01 from the National Institute of Child Health and Human Development; 1R01 ES08337–01 and 5P30 ES000002–38 from the National Institute of Environmental Health Sciences; and 1R01 OH03027 from the National Institute of Occupational Safety and Health.


    Notes
 
9 Present address: Department of Obstetrics, Gynecology and Reproductive Biology, Harvard Medical School, Boston, MA and New England Research Institutes, Watertown, MA, USA

10 To whom correspondence should be addressed at: Occupational Health Program, Department of Environmental Health, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA. E-mail: xu@hsph.harvard.edu


    References
Armstrong, E.G., Ehrlich, P.H., Birken, S., Schlatterer, J.P., Siris, E., Hembree, W.C. and Canfield, W.C. (1984) Use of a highly sensitive and specific immunoradiometric assay for detection of human chorionic gonadotropin in urine of normal, nonpregnant, and pregnant individuals. J. Clin. Endocrinol. Metab., 59, 867–874.

Ellish, N.J., Saboda, K., O'Connor, J.O., Nasca, P.C., Stanek, E.J. and Boyle, C. (1996) A prospective study of early pregnancy loss. Hum. Reprod., 11, 406–412.

Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions. John Wiley and Sons Inc., New York.

Flygare, L., Hosoki, H., Rohlin, M. and Petersson, A. (1997) Bone histomorphometry using interactive image analysis. A methodological study with application on the human temporomandibular joint. Eur. J. Oral Sci., 105, 67–73.

Hakim, R.B., Gray, R.H. and Zacur, H. (1995) Infertility and early pregnancy loss. Am. J. Obstet. Gynecol., 172, 1510–1517.

Harlow, S.D. and Ephross, S.A. (1995) Epidemiology of menstruation and its relevance to women's health. Epidemiol. Rev., 17, 265–286.

Husdan, H. and Rapoport, A. (1968) Estimation of creatinine by the Jaffe reaction. A comparison of three methods. Clin. Chem., 14, 222–238.

Lasley, B.L. and Shideler, S.E. (1994) Methods for evaluating reproductive health of women. Occup. Med.: State Art Rev., 9, 423–433.

Lasley, B.L., Lohstroh, P., Kuo, A., Gold, E.B., Eskenazi, B., Samuels, S.J. and Overstreet, J.W. (1995) Laboratory methods for evaluating early pregnancy loss in an industry-based population. Am. J. Ind. Med., 28, 771–781.

O'Connor, J.F., Schlatterer, J.P., Birken, S., Krichevsky, A., Armstrong, E.G., McMahon, D. and Canfield, R.E. (1988) Development of highly sensitive immunoassays to measure human chorionic gonadotropin, its beta subunit and beta core fragment in the urine: application to malignancies. Cancer Res., 48, 1361–1366.

O'Connor, J.F., Birken, S., Lustbader, J.W., Krichevsky, A., Chen, Y. and Canfield, R.E. (1994) Recent advances in the chemistry and immunochemistry of human chorionic gonadotropin: impact on clinical measurements. Endocr. Rev., 15, 650–682.

O'Connor, J.O., Ellish, N., Kakama, T., Schlatterer, J. and Kovalevskaya, G. (1998) Differential urinary gonadotrophin profiles in early pregnancy and early pregnancy loss. Prenatal Diag., 18, 1232–1240.

Ronnenberg, A.G., Goldman, M.B., Aitken, I.W. and Xu, X. (2000) Anemia and deficiencies of folate and vitamin B-6 are common and vary with season in Chinese women of childbearing age. J. Nutr., 130, 2703–2710.

Wilcox, A.J., Weinberg, C.R., Wehmann, R.E., Armstrong, E.G., Canfield, R.E. and Nisula, B.C. (1985) Measuring early pregnancy loss: laboratory and field methods. Fertil. Steril., 44, 366–374.

Wilcox, A.J., Weinberg, C.R., O'Connor, J.F., Baird, D.D., Schlatterer, J.P., Canfield, R.E., Armstrong, E.G. and Nisula, B.C. (1988) Incidence of early loss of pregnancy. N. Engl. J. Med., 319, 189–194.

Zinaman, M.J., O'Connor, J., Clegg, E.D., Selevan, S.G. and Brown, C.C. (1996) Estimates of human fertility and pregnancy loss. Fertil. Steril., 65, 503–509.

Submitted on May 21, 2001; accepted on November 12, 2001.