Sampling Strategies for Prospective Studies of Menstrual Function

Lynda Lisabeth1, Siobán D. Harlow1, Xihong Lin2, Brenda Gillespie2 and MaryFran Sowers1

1 Department of Epidemiology, University of Michigan, Ann Arbor, MI.
2 Department of Biostatistics, University of Michigan, Ann Arbor, MI.

Received for publication April 10, 2003; accepted for publication November 6, 2003.


    ABSTRACT
Little information is available about optimal sampling strategies for prospective studies of menstrual function. Sample size and study duration for menstrual studies have often been driven as much by feasibility and cost as by statistical principles, with follow-up lasting 6 months to 2 years and sample size ranging from 100 to 500 women. Whether these studies are sufficiently powered to address common study objectives has not been adequately evaluated, and sample size estimates rarely account for the repeated nature of menstrual cycle data. Using data from the Tremin Trust (a study of menstrual function across the reproductive life span initiated in Minneapolis, Minnesota, in 1935 with data collected through 1977), the authors determined sampling strategies for assessing differences in mean cycle length between two exposure groups and for assessing change in mean cycle length across the reproductive life span. Following a larger number of women for 1–2 years is optimal for studies of host and environmental exposures that alter menstrual function. In contrast, following fewer women for an extended period of time, for example, 4–5 years, is optimal when studying how menstrual patterns vary across the reproductive life span in different populations.

epidemiologic methods; menstrual cycle; models, statistical; prospective studies; sample size; sampling studies


    INTRODUCTION
The importance of studying menstrual function in epidemiology is increasingly being recognized (1–3). Menstrual morbidity has a significant impact on women’s health, and menstrual cycle patterns are a useful marker of ovarian function and reproductive health (1, 2, 4). Menstrual cycle patterns across the reproductive life span and the timing of menopause are associated with risk of chronic disease and mortality (3, 5). However, epidemiologic studies evaluating potentially modifiable host and environmental factors that may affect menstrual function remain limited, and basic research is still needed on patterns of change in menstrual function with age across the reproductive life span, especially among women of non-European descent.

Prospective menstrual calendars are considered the most reliable method for collecting data on menstrual function; however, little information is available about optimal sampling strategies for prospective studies of menstrual function, that is, how many women are needed and for how long women should be followed. Studies of menstrual function typically focus either on comparing differences in mean or variance of cycle length with respect to host and environmental factors, such as ethnicity, smoking, human immunodeficiency virus serostatus, or body size (6–10), or on characterizing how mean and variance of cycle length change across age or during periods of transition (11–14). With the exception of a few truly longitudinal cohorts (12, 13), studies of menstrual function have generally been short-sequence prospective studies whose duration ranges from 6 months to 2 years and whose sample sizes range from 100 to 500 women. Whether these studies are sufficiently powered has not been adequately evaluated, with sample size and study duration often driven by feasibility and cost considerations.

Although longitudinal methods for estimating sample size exist (15, 16), most investigators do not have access to the longitudinal menstrual data needed to apply these methods. Using data from the Tremin Trust, we determined sampling strategies for prospective studies of menstrual function during three spans of reproductive life. In this paper, we first consider sampling strategies for studies that aim to assess differences in mean cycle length with respect to an exposure. Next, we discuss sample size requirements for assessing changes in mean cycle length across the reproductive life span.


    MATERIALS AND METHODS
Data for this analysis came from the Tremin Trust, a historic data set that includes menstrual calendar data recorded for individual women throughout their reproductive lives (12). The study population consisted of White college students enrolled at The University of Minnesota. Data collection began in 1935 and enrollment continued for 4 years, yielding a sample of 1,997 women. Records for the 997 women in the original sample who were aged 25 years or less at enrollment, for whom information on age at menarche was included, and who participated in the study for at least 5 years were eligible for this analysis. Of these women, 29 had no observations prior to their first sustained hormone use and were excluded, yielding an analytical sample size of 968 women. A total of 263 women were censored when they withdrew from the study, 430 women were censored when they began sustained hormone use for the first time, and 81 women were censored when they underwent surgical menopause (hysterectomy/oophorectomy). Hormone use includes hormone replacement therapy, hormonal contraception, or other sustained use of steroid hormones. One hundred ninety-four women were uncensored and were followed through natural menopause.

Menstrual calendars were used to record the days on which women experienced bleeding. Each calendar covered a 1-year period. Women completed a short questionnaire at the end of each year to collect information regarding marital status, medical treatment for menstrual difficulties, and pregnancies as well as to identify their willingness to continue participating. Questions about approaching menopause were added in 1952, and questions pertaining to oral contraceptive use and surgery were added in 1963.

Definitions recommended by the World Health Organization were used to summarize the menstrual diary data (17). Per these definitions, a bleeding episode is a period of consecutive bleeding days, and a bleeding-free interval is a period of consecutive bleeding-free days. A menstrual segment is defined as a bleeding episode and the subsequent bleeding-free interval. The term menstrual segment is analogous to the term menstrual cycle but acknowledges that diary data cannot distinguish between menstrual and nonmenstrual bleeds. Additionally, a criterion that a menstrual segment had to include at least 2 consecutive days in the bleeding-free interval was applied to avoid unusually short segments. A single bleeding-free day occurring between 2 bleed days was considered a bleed day. Pregnancy intervals and the first two cycles after a birth and the first cycle after a spontaneous abortion were coded as nonmenstrual intervals.
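The segment rules above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `segment_lengths` and the input format (one boolean per consecutive calendar day, `True` for a bleeding day) are assumptions made for illustration, and pregnancy or postpartum intervals are not handled. The sketch also assumes that the diary begins on the first day of a bleeding episode.

```python
def segment_lengths(diary):
    """Return the lengths in days of complete menstrual segments.

    diary: sequence of booleans, one per consecutive calendar day
    (True = bleeding day). Hypothetical input format.
    """
    n = len(diary)
    # A single bleeding-free day between 2 bleed days counts as a bleed day.
    days = [
        bool(diary[i])
        or (0 < i < n - 1 and bool(diary[i - 1]) and bool(diary[i + 1]))
        for i in range(n)
    ]
    # First day of each bleeding episode (a run of consecutive bleed days).
    starts = [i for i in range(n) if days[i] and (i == 0 or not days[i - 1])]
    # A segment is an episode plus the bleeding-free interval that follows it,
    # so its length is the distance between consecutive episode starts.
    # After the fill step, every such interval spans at least 2 days.
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical diary: a 5-day bleed, 23 bleeding-free days, a bleed with a
# 1-day gap, 22 bleeding-free days, and the first day of a third episode.
example = (
    [True] * 5 + [False] * 23
    + [True, True, False, True, True] + [False] * 22
    + [True]
)
print(segment_lengths(example))
```

Only segments closed by the start of a later bleeding episode are counted, so the final, still-open segment in a diary is ignored.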

Sampling strategies for three spans of reproductive life were investigated. Three age strata were used: 18–25 years of age (young adult), 26–39 years of age (adult), and 40–53 years of age (menopausal transition). Stratification was necessary because certain sample size calculations assume a linear relation with age, and the association between age and cycle length is nonlinear. The age strata were defined on the basis of prior investigations of menstrual function across the life span suggesting that the relation of cycle length with age within each of these strata is approximately linear (11, 14). The sample sizes for the three age strata were 963, 830, and 435 women for 18–25 years of age, 26–39 years of age, and 40–53 years of age, respectively; 194 women in the last stratum were followed through menopause.

Estimating the effect of host and environmental factors on mean cycle length
To evaluate sampling strategies for assessing differences in mean menstrual cycle length between two exposure groups, we estimated the magnitude of the detectable difference by using the following equation (16), solved for d when N and P are fixed:

$$N = \frac{2\,(z_{\alpha} + z_{Q})^{2}\,\sigma^{2}\,[1 + (P - 1)\rho]}{P\,d^{2}} \qquad (1)$$

where N is the number of subjects in each exposure group, σ² is the total variation (between plus within) in mean cycle length, ρ is the correlation among repeated observations for individual women, d is the smallest difference to be detected, P is the number of cycles, α is the probability of a type I error assumed to be 5 percent, Q is the power of the statistical test assumed to be 80 percent, and z_α and z_Q are the corresponding standard normal deviates. The parameters σ² and ρ were estimated for each age range by fitting random intercept mixed models using the SAS Proc Mixed procedure (18). Age was modeled as dummy variables for each year. Estimates of σ² and ρ used for sample size estimation were based on median values for the given age ranges. Because these models assume normality, models were run for cycles up to 180 days in length to create roughly normal distributions within each age range. The distribution of cycle lengths remained skewed in the age category 40–53 years; thus, a natural log transformation was applied to cycles for this age range. We also estimated the number of subjects needed to detect specified differences in mean cycle length between exposure groups by using equation 1. Note that equation 1 pertains to subject-level covariates or exposures, which do not change over time, and does not directly apply to the situation in which an exposure is a time-varying covariate.
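As a rough illustration of how these quantities interact, the following Python sketch evaluates the standard two-group sample size formula for a time-averaged difference with repeated measures, N = 2(z_α + z_Q)² σ² [1 + (P − 1)ρ] / (P d²), which is assumed here to be the form used. The function names, the two-sided treatment of α, and the numeric inputs (σ² = 16 days², ρ = 0.25, 12 or 24 cycles) are hypothetical and are not estimates from the Tremin Trust.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(alpha, power):
    """z for a two-sided type I error (an assumption) plus z for power."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def subjects_per_group(d, sigma2, rho, cycles, alpha=0.05, power=0.80):
    """Subjects needed per exposure group to detect a difference d."""
    k = _z_sum(alpha, power)
    return ceil(2 * k**2 * sigma2 * (1 + (cycles - 1) * rho) / (cycles * d**2))


def detectable_difference(n, sigma2, rho, cycles, alpha=0.05, power=0.80):
    """Smallest detectable difference with n subjects per exposure group."""
    k = _z_sum(alpha, power)
    return sqrt(2 * k**2 * sigma2 * (1 + (cycles - 1) * rho) / (cycles * n))


# Hypothetical inputs, chosen only to show the behaviour of the formula.
n_needed = subjects_per_group(d=1.0, sigma2=16.0, rho=0.25, cycles=12)
d_one_year = detectable_difference(100, sigma2=16.0, rho=0.25, cycles=12)
d_two_years = detectable_difference(100, sigma2=16.0, rho=0.25, cycles=24)
d_more_women = detectable_difference(200, sigma2=16.0, rho=0.25, cycles=12)
print(n_needed, round(d_one_year, 2), round(d_two_years, 2), round(d_more_women, 2))
```

Under these assumed inputs, doubling the number of cycles shrinks the detectable difference far less than doubling the number of women, because the term [1 + (P − 1)ρ] / P approaches ρ rather than zero as P grows.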

Describing menstrual function across the reproductive life span
To evaluate the precision of different sampling strategies for measuring annual rate of change in mean cycle length as women age, we estimated maximum error values as the half width of the 95 percent confidence interval for the annual rate of change in mean cycle length, β, using the following equation (15):

$$\text{maximum error} = z_{1-\alpha/2}\,\mathrm{SE}(\hat{\beta}) \qquad (2)$$

The standard error for the annual rate of change was calculated as follows:

$$\mathrm{SE}(\hat{\beta}) = \sqrt{\frac{1}{N}\left[\sigma_{\beta}^{2} + \frac{12\,(P - 1)\,\sigma^{2}}{D^{2}\,P\,(P + 1)}\right]} \qquad (3)$$

where σ_β² is the variation in slope between women, σ² is the variation in slope within women, P is the number of cycles, D is the duration of the study in years, N is the number of women in the sample, α is the probability of a type I error assumed at 5 percent, and z_{1−α/2} is the corresponding standard normal deviate. The parameters σ_β² and σ² were estimated for each age range by fitting random intercept–random slope mixed models using the SAS Proc Mixed procedure (18). The models fit average cycle length as a function of age, with age modeled as a continuous variable, using an unstructured covariance matrix. We again included cycles up to 180 days and applied a natural log transformation to cycles in the age range 40–53 years.
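A minimal sketch of the maximum error calculation follows. It assumes the usual result for P equally spaced observations over D years, in which the sum of squared deviations of the observation times is D² P (P + 1) / [12 (P − 1)], so that the variance of the mean slope is [between-woman slope variance + within-woman variance / that sum] / N. The function names and all numeric inputs are hypothetical and are not Tremin Trust estimates.

```python
from math import sqrt
from statistics import NormalDist


def slope_standard_error(n, cycles, duration, var_between, var_within):
    """Standard error of the mean annual rate of change across n women."""
    # Sum of squared deviations of `cycles` equally spaced observation
    # times spread over `duration` years (requires cycles >= 2).
    sxx = duration**2 * cycles * (cycles + 1) / (12 * (cycles - 1))
    return sqrt((var_between + var_within / sxx) / n)


def maximum_error(n, cycles, duration, var_between, var_within, alpha=0.05):
    """Half width of the (1 - alpha) confidence interval for the slope."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * slope_standard_error(n, cycles, duration, var_between, var_within)


# Hypothetical inputs: 13 cycles per year, slope variance 0.04 (days/year)^2
# between women, residual variance 9 days^2 within women.
for years in (2, 3, 4, 5):
    err = maximum_error(100, 13 * years, years, 0.04, 9.0)
    print(years, round(err, 3))
```

Because the within-woman term is divided by D², lengthening follow-up reduces it quickly at first, while the between-woman term can only be reduced by enrolling more women.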


    RESULTS
Median age at enrollment was 19 years. Length of follow-up ranged from 5 to 42 years, with a median of 27 years. Women contributed between one and 495 eligible menstrual segments ranging in length from 3 to 595 days, with a median of 27 days. For most women (70 percent), some data were missing; however, only 1.3 percent of observed segments were missing, and the median number of missing segments per woman was one. Median age at menarche in this study population was 12 years. Median age at menopause was 51.7 years.

Differences in mean cycle length between two exposure groups
Table 1 displays estimated detectable differences in mean menstrual cycle length between two exposure groups for fixed sample sizes and study durations. Estimates for the age range 40–53 years were calculated on a log scale. If the anticipated difference in mean cycle length between two exposure groups is 1 day, then any of the footnoted sampling strategies in table 1 would be appropriate. For the same anticipated difference in mean cycle length, the largest sample size is needed for women aged 40–53 years, followed by women aged 18–25 years and then women aged 26–39 years. To provide additional detail for investigators, table 2 presents estimates of the numbers of subjects needed to detect specified differences in mean menstrual cycle length between two exposure groups.


TABLE 1. Estimated detectable difference in mean menstrual cycle length (days) between two exposure groups for fixed sample sizes and study durations (α = 0.05, power = 80%), the Tremin Trust*
 

TABLE 2. Estimated numbers of subjects needed in each of two exposure groups to detect specified differences in mean menstrual cycle length (α = 0.05, power = 80%), the Tremin Trust*
 
When two exposure groups are compared, more precision is gained by increasing sample size than by increasing study duration, as illustrated in figure 1 for women aged 26–39 years. Although the lines representing different sample sizes are clearly separated, the slope of each line is relatively flat, with little improvement in detectable difference after 2 years of follow-up.



FIGURE 1. For women aged 26–39 years from the Tremin Trust (a study of menstrual function across the reproductive life span initiated in Minneapolis, Minnesota, in 1935 with data collected through 1977), estimated detectable difference in mean menstrual cycle length (days) between two exposure groups for fixed sample sizes and study durations. N, sample size per exposure group.

 
Estimating maximum error values for annual rate of change in mean menstrual cycle length
Table 3 displays maximum error values, that is, the half widths of the 95 percent confidence intervals for estimating annual rate of change in mean cycle length for fixed sample sizes and study durations. Estimates for the age range 40–53 years were calculated on a log scale. For a given study duration and sample size, maximum error values are smallest for studies of women aged 26–39 years, when cycle variability is lowest.


TABLE 3. Maximum error values for estimating annual rate of change in menstrual cycle length (days/year) for fixed sample sizes and study durations (α = 0.05), the Tremin Trust*
 
Annual rate of change in mean cycle length has previously been estimated to be 0.20 days for women aged 18–25 years, –0.15 days for women aged 26–39 years, and 2 days for women aged 40–53 years (11, 14). To estimate an annual rate of change for women aged 18–25 years with a maximum error of ±0.05 days per year, investigators could choose from the footnoted sampling strategies. Other footnoted entries reflect sampling strategies for women aged 26–39 years (maximum error of ±0.03 days/year), when the annual rate of change is negative and slightly lower. To interpret the estimates for the age range 40–53 years, an increase of 2 days in mean cycle length from the population average of 28 days can be thought of as equivalent to roughly a 7 percent (ln(30/28)) change in the mean. A reasonable maximum error value around this estimate may be ±1 percent or 0.01, depicted by the footnoted entries.
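The log-scale interpretation can be checked with a few lines of Python. This is only a worked illustration of the arithmetic in the preceding paragraph; the helper names are hypothetical, and converting a log-scale error back to days at a 28-day mean is an added convenience rather than a calculation reported in the paper.

```python
from math import exp, log


def log_scale_change(mean_before, mean_after):
    """Change in mean cycle length expressed on the natural log scale."""
    return log(mean_after / mean_before)


def log_error_in_days(log_error, mean_days):
    """Approximate size in days of a log-scale shift at a given mean."""
    return mean_days * (exp(log_error) - 1.0)


change = log_scale_change(28.0, 30.0)        # ln(30/28), roughly 0.069
error_days = log_error_in_days(0.01, 28.0)   # roughly 0.28 days
print(f"{change:.3f} {error_days:.2f}")
```

On this reading, a maximum error of ±0.01 on the log scale corresponds to a little under one third of a day around a 28-day mean.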

Increasing study duration from 2 to 3 years or from 3 to 4 years has a large impact on precision when estimating annual rate of change in mean cycle length, as illustrated in figure 2 for women aged 26–39 years. Gains in precision from increasing the study duration to more than 4 years are not as large.



FIGURE 2. For women aged 26–39 years from the Tremin Trust (a study of menstrual function across the reproductive life span initiated in Minneapolis, Minnesota, in 1935 with data collected through 1977), maximum error values for estimating the annual rate of change in menstrual cycle length (days/year) for fixed sample sizes and study durations. MCL, mean cycle length; N, sample size.

 

    DISCUSSION
This paper provides information regarding optimal sampling strategies for prospective studies of menstrual function. Using the Tremin Trust data, we supplied estimates of the relative precision and power of different sampling strategies for three different age strata, information that can guide investigators planning epidemiologic studies of menstrual function. Results from this analysis can also inform us as to whether prior studies have been sufficiently powered to address their study objectives.

For studies that focus on detecting differences in mean cycle length with respect to host or environmental factors, increasing study duration beyond 1–2 years will not have a large impact on minimizing the detectable difference in mean between exposure groups. In contrast, precision in estimates of annual rate of change in mean cycle length improves considerably when study duration is increased up to about 4 years.

Examples of exposures investigated in previous studies include race-ethnicity, socioeconomic status, weight, physical activity, stress, occupational exposures, and exposure to diethylstilbestrol (4, 6, 10, 19). On the basis of our findings, differences of 2–4 days in mean cycle length, characteristic of some occupational exposures (4), could be detected with as few as 25–100 women per exposure group and 6 months of prospective follow-up. To detect smaller effects, such as those that may occur with in utero exposure to diethylstilbestrol, 200 women per group followed for 6 months would be needed to detect a 0.75-day difference in mean cycle length (19).

Most studies characterizing menstrual function across the reproductive life span have been conducted in highly educated, White populations (12, 13). An efficient approach to obtaining information on menstrual patterns across age in other populations would be to use an accelerated or mixed longitudinal study design, an approach that economizes the time that participants are required to maintain menstrual diaries (20). With this approach, investigators would select multiple distinct, but overlapping age cohorts and track each for a relatively short period of time. An accelerated strategy for characterizing menstrual patterns from age 15 to 50 years could reduce follow-up to 5 years (table 3) in contrast to the 35 years required if women were to be followed for their entire reproductive life, as occurred in the Tremin Trust.
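One way to picture an accelerated design is to lay out the overlapping age cohorts explicitly. The Python sketch below is hypothetical: the function name, the 1-year overlap between adjacent cohorts, and the resulting number of cohorts are illustrative assumptions, since the text specifies only the age range (15 to 50 years) and the 5-year follow-up.

```python
def accelerated_cohorts(start_age, end_age, follow_up, overlap):
    """Return (entry_age, exit_age) pairs for overlapping age cohorts.

    Each cohort is followed for `follow_up` years, and adjacent cohorts
    share `overlap` years of age so that their trajectories can be linked.
    """
    if not 0 <= overlap < follow_up:
        raise ValueError("overlap must be non-negative and shorter than follow_up")
    cohorts = []
    entry = start_age
    while True:
        cohorts.append((entry, entry + follow_up))
        if entry + follow_up >= end_age:
            break  # the age range is now fully covered
        entry += follow_up - overlap
    return cohorts


# Ages 15 to 50 covered with 5 years of follow-up per cohort and an
# assumed 1-year overlap between adjacent cohorts.
design = accelerated_cohorts(start_age=15, end_age=50, follow_up=5, overlap=1)
for entry_age, exit_age in design:
    print(f"enrol at {entry_age}, follow to {exit_age}")
```

A larger overlap gives more shared ages for linking cohorts but requires more cohorts to span the same age range.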

This analysis does have some limitations. Transformation of the data, while appropriate for the statistical models, complicates interpretation of our findings. More work is needed to optimally account for the nonnormality of menstrual data. We considered sample size calculations for two specific study objectives that focus on mean cycle length. Other objectives of interest might include assessing change in cycle variability with age, evaluating differences in cycle variability across populations, and comparing rates of change in mean cycle length or cycle variability between two exposure groups. These study objectives would require different sampling methods.

Finally, the Tremin Trust includes data for only White, college-educated females. Studies suggest that menstrual cycle characteristics differ by race-ethnicity, socioeconomic status, body composition, and other host and environmental characteristics (3, 6, 9, 10). For example, postmenarcheal European-American girls have a longer mean cycle length than do African-American girls in the United States (6), whereas the association between body mass index and cycle length is curvilinear (9, 10). Although specific sample size estimates would likely vary slightly in populations that differ with respect to the above factors, the Tremin Trust is one of only two data sources in which this type of sample size estimation can be conducted (12, 13). Furthermore, because age is a primary determinant of variability in menstrual function within a population, sample size estimates provided here can serve as a basic guideline. The finding that increasing study duration for studies comparing exposure groups results in little gain in precision, whereas increasing study duration to 4–5 years for studies evaluating changes in the mean across age adds precision, is an important insight that is broadly generalizable.

In summary, using data from a prospective study of menstrual function, we have provided sampling strategies to guide investigators who want to design prospective menstrual calendar studies. This analysis suggests efficient sampling strategies that will result in adequate power to detect differences in mean cycle length with respect to exposures or acceptable values of maximum error when assessing annual rate of change in mean cycle length. Following women for a shorter period of time (e.g., 1–2 years) is optimal for studies investigating host and environmental exposures that alter menstrual function. In contrast, following women for an extended period of time (e.g., 4–5 years) is optimal for studying how menstrual patterns across the reproductive life span vary in different populations. Regardless of the study objective, sampling strategies should be tailored to the age range to be investigated.


    NOTES
 
Correspondence to Dr. Siobán D. Harlow, 109 Observatory Street, Ann Arbor, MI 48109-2029 (e-mail: harlow@psc.isr.umich.edu).


    REFERENCES

  1. Walraven G, Ekpo G, Coleman R, et al. Menstrual disorders in rural Gambia. Stud Fam Plann 2002;33:261–8.
  2. Harlow SD, Campbell OM. Menstrual dysfunction: a missed opportunity for improving reproductive health in developing countries. Reprod Health Matters 2000;8:142–7.
  3. Harlow SD, Ephross SA. Epidemiology of menstruation and its relevance to women’s health. Epidemiol Rev 1995;17:265–86.
  4. Gold EB, Eskenazi B, Hammond SK, et al. Prospectively assessed menstrual cycle characteristics in female wafer-fabrication and nonfabrication semiconductor employees. Am J Ind Med 1995;28:799–815.
  5. Jacobsen BK, Heuch I, Kvåle G. Age at natural menopause and all-cause mortality: a 37-year follow-up of 19,731 Norwegian women. Am J Epidemiol 2003;157:923–9.
  6. Harlow SD, Campbell B, Lin X, et al. Ethnic differences in the length of the menstrual cycle during the postmenarcheal period. Am J Epidemiol 1997;146:572–80.
  7. Harlow SD, Schuman P, Cohen M, et al. Effect of HIV infection on menstrual cycle length. J Acquir Immune Defic Syndr 2000;24:68–75.
  8. Hornsby PP, Wilcox AJ, Weinberg CR. Cigarette smoking and disturbance of menstrual function. Epidemiology 1998;9:193–8.
  9. Symons JP, Sowers MF, Harlow SD. Relationship of body composition measures and menstrual cycle length. Ann Hum Biol 1997;24:107–16.
  10. Harlow SD, Matanoski GM. The association between weight, physical activity, and stress and variation in the length of the menstrual cycle. Am J Epidemiol 1991;133:38–49.
  11. Harlow SD, Lin X, Ho MJ. Analysis of menstrual diary data across the reproductive life span: applicability of the bipartite model approach and the importance of within-woman variance. J Clin Epidemiol 2000;53:722–33.
  12. Treloar AE, Boynton RE, Behn BG, et al. Variation of the human menstrual cycle through reproductive life. Int J Fertil 1967;12:77–126.
  13. Vollman RF. The menstrual cycle. Major Probl Obstet Gynecol 1977;7:1–193.
  14. Lisabeth LD, Harlow SD, Qaqish B. A new statistical approach demonstrated menstrual patterns during the menopausal transition did not vary by age at menopause. J Clin Epidemiol (in press).
  15. Schlesselman JJ. Planning a longitudinal study. I. Sample size determination. J Chronic Dis 1973;26:553–60.
  16. Diggle P, Liang K, Zeger S. Analysis of longitudinal data. Oxford, United Kingdom: Oxford University Press, 1994.
  17. Belsey EM, Farley TM. The analysis of menstrual bleeding patterns: a review. Contraception 1988;38:129–56.
  18. Littell R, Milliken G, Stroup W, et al. SAS system for Proc Mixed models. Cary, NC: SAS Institute Inc, 1996.
  19. Hornsby PP, Wilcox AJ, Weinberg CR, et al. Effects on the menstrual cycle of in utero exposure to diethylstilbestrol. Am J Obstet Gynecol 1994;170:709–15.
  20. Willett JB, Singer JD, Martin NC. The design and analysis of longitudinal studies of development and psychopathology in context: statistical models and methodological recommendations. Dev Psychopathol 1998;10:395–426.



