Issues to debate on the Women's Health Initiative (WHI) study. Epidemiology or randomized clinical trials—time out for hormone replacement therapy studies?

Anette Tønnes Pedersen and Bent Ottesen1

Juliane Marie Centret, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark

1 To whom correspondence should be addressed. e-mail: bent.ottesen{at}rh.dk


    Abstract
 
Over the last 40 years, there has been increasing epidemiological evidence that post-menopausal treatment with sex steroids in physiological doses may reduce the relative risk of cardiovascular disease (CVD). These findings have been supported by biological studies showing favourable changes in cardiovascular risk factors with estrogen supplementation. The impact of the so-called ‘healthy user’ bias has been eagerly debated, and the results of the first and only randomized long-term clinical trial of HRT for primary prevention have therefore been long awaited. The dramatic decision to halt the Women’s Health Initiative (WHI) study before completion came unexpectedly, as the consequence of not only an increased risk of breast cancer but also an increased occurrence of cardiovascular events with HRT. Because of the superior design of the study, the WHI results have had an enormous impact on clinical recommendations for HRT in post-menopausal women, and evidence from observational studies has been downgraded accordingly. It is not very likely that other long-term randomized clinical trials (RCTs) will be completed, and epidemiology has certainly been discredited; so is this ‘time out’ for HRT studies?

Key words: cardiovascular disease/HRT/Women’s Health Initiative Study


    The WHI study–a classical RCT
 
Randomized clinical trials are considered the ‘gold standard’ in medical research. Randomized trials are in general required to determine whether confounding, selection bias or compliance bias has distorted the findings from observational studies. This was the intention of the WHI study, a classically designed RCT (Women’s Health Initiative Study Group, 1998). The WHI is a large and complex clinical investigation of strategies for the prevention and control of the most common causes of morbidity and mortality among post-menopausal women. The HRT study is a part of this initiative. The intention was to test the hypothesis that women randomly assigned to HRT have lower rates of CVD and osteoporotic fractures than women assigned to placebo. Coronary heart disease was selected as a primary outcome, recognizing that heart disease is the major cause of morbidity and mortality among post-menopausal women, especially over the age of 65 years.

Even if it is considered the ideal study design, the RCT has its errors and pitfalls, which must be taken into account when results are interpreted and implemented in clinical practice. The ideal study design can be aimed at but realistically never fully achieved, for both ethical and practical reasons.

The HRT part of the WHI study used a double-blind, placebo-controlled design. However, the fact that only 18 845 women provided consent for randomization, out of the 373 092 women who initiated screening, certainly points to bias due to selection of the study population. Of these, 8506 women were assigned to receive estrogen plus progestin in a continuous combined regimen, and 8102 women to receive placebo. In ‘real life’, women who start to take HRT will sooner or later choose to cease the treatment for various reasons. This was no different in the WHI study, where an increasing proportion of the participants chose to discontinue as time went by. ‘Drop-outs’ refers to the women who discontinued study medication and ‘drop-ins’ to the women who discontinued study medication and instead received HRT through their own clinician. The cumulative drop-out rates were nearly 40% in the hormone group as well as in the placebo group after 7 years of follow-up. This exceeded design projections, but compares favourably with community-based adherence to HRT.
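To put the selection figures above in proportion, a short calculation may help. The sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the estrogen plus progestin part of the WHI study.
screened = 373_092      # women who initiated screening
consented = 18_845      # women who provided consent for randomization
hrt_arm = 8_506         # assigned to estrogen plus progestin
placebo_arm = 8_102     # assigned to placebo

randomized = hrt_arm + placebo_arm

# Share of the screened women who reached each stage.
consent_fraction = consented / screened
randomized_fraction = randomized / screened

print(f"randomized: {randomized}")
print(f"consented:  {consent_fraction:.1%} of screened")
print(f"randomized: {randomized_fraction:.1%} of screened")
```

Roughly one in twenty of the screened women consented, which is the scale of selection the paragraph refers to.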

This flaw was actually foreseen prior to study start (Freedman et al., 1996). Several sets of hypothetical interim results (‘scenarios’) were created and analysed in order to place more emphasis on formally defined global measures of health, and not simply on a single targeted disease. This seems especially important when an intervention may reduce the incidence of some diseases but increase the incidence of others.

The statistical analyses performed in the WHI study were therefore carefully planned and based on the intention-to-treat principle in order to utilize the advantages of the randomized design. The lack of adherence to the study medication would tend to decrease the observed treatment effects. Thus, the results reported by the writing group of the WHI study may tend to underestimate the magnitude of both adverse effects on cardiovascular disease and breast cancer and the beneficial effects on fractures and colorectal cancer among women who adhere to treatment. Taking this into account, the statistically significant estimates reported must be interpreted as strong evidence for the observed treatment effects in this particular study population.
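The dilution argument can be illustrated with a deliberately crude model. The sketch below assumes that women who stop study medication revert immediately to the placebo risk, that adherence is constant over follow-up, and that drop-ins in the placebo group are ignored; none of these assumptions comes from the WHI analysis, and the function names are hypothetical. Under these assumptions the intention-to-treat relative risk is a weighted average of the effect among adherent women and no effect.

```python
def diluted_relative_risk(rr_adherent: float, adherence: float) -> float:
    """ITT relative risk when only a fraction `adherence` of the treated arm
    is actually exposed (simplifying assumption: non-adherers have RR = 1)."""
    if not 0.0 < adherence <= 1.0:
        raise ValueError("adherence must be in (0, 1]")
    return adherence * rr_adherent + (1.0 - adherence) * 1.0


def implied_adherent_relative_risk(rr_itt: float, adherence: float) -> float:
    """Invert the dilution model to recover the relative risk among adherers."""
    if not 0.0 < adherence <= 1.0:
        raise ValueError("adherence must be in (0, 1]")
    return 1.0 + (rr_itt - 1.0) / adherence


# Illustration only: a 29% excess risk observed by intention to treat,
# combined with roughly 60% adherence (about 40% cumulative drop-out).
rr_itt = 1.29
adherence = 0.60
rr_adherent = implied_adherent_relative_risk(rr_itt, adherence)
print(f"ITT RR {rr_itt:.2f} -> implied RR among adherers {rr_adherent:.2f}")
```

The same formula pulls a protective effect towards 1 as well, which is why both the harms and the benefits may be underestimated.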

This brings us to probably the most prominent weakness of the RCT, i.e. the external validity of the estimates, which is determined by the characteristics of the study population. The fact that a trial is carried out in one group of patients does not necessarily mean that the results can be extrapolated to other groups. Patients entering (and indeed completing) a randomized trial are a highly selected population. This was also the case in the WHI study, where women recruited for randomization had to be willing to start taking HRT or placebo, blinded, for several years at ‘the flip of a coin’. Their decision was not governed by menopausal symptoms. The study population of the HRT part of the WHI study was characterized by a high average age and a high frequency of obesity and hypertensive disorders. About two-thirds of the participants were enrolled in the study at age 60 years or older, and 21% were above the age of 70 years at randomization. This does not reflect clinical practice, in which women are usually prescribed HRT at the time they reach menopause, often because of climacteric complaints. Also, the frequency of obesity in the study population was above average. Only 30% of the participants were of normal weight [body mass index (BMI) <25 kg/m²], and 30% of the participants had a BMI >30 kg/m², which is classified as obesity. One can question the label ‘primary prevention study’, as 36% of the participants were being treated for hypertension or had a blood pressure exceeding 140/90 mmHg at enrolment. Furthermore, women who had used HRT previously were also included, even though the analysis considered HRT use only from the time of randomization.

So what can we conclude from the available results of the WHI study? The estimates reported for this particular study population are quite convincing: HRT initiated several years after menopause increases the risk of CVD in a population of older women with a high frequency of obesity and hypertension. However, it is questionable whether the WHI study tells us if a continuous combined estrogen–progestin regimen initiated in younger, normotensive women of normal weight at the onset of menopause increases or decreases the risk of CVD. It is biologically plausible that effect modification by age at HRT initiation exists. Prior to menopause, women may be protected from atherosclerosis by the endothelium-protective effects of the ovarian hormones. Given the direct beneficial effects of estrogen on the vessel wall, women who start HRT at menopause would therefore be less susceptible to the pro-thrombotic effects of estrogen supplementation than older women, who are likely to have built up atherosclerotic plaques in the cerebral and coronary arteries.

Effect modification by other cardiovascular risk factors should also be considered. The writing group of the WHI study has reported that ‘no noteworthy interactions with age, race/ethnicity, body mass index, prior hormone use, smoking status, blood pressure, diabetes, aspirin use or statin use were found....’. They do not, however, mention whether the data set had sufficient statistical power for this kind of analysis.

A very relevant clinical question is whether subgroups of women are especially sensitive to HRT. Subanalyses of effect modification require a very large number of exposed women at risk to achieve sufficient statistical power. Practical, economic and ethical considerations do not permit another RCT to answer this question. So what we are left with are the epidemiological data from large cohort studies, and epidemiological thinking might even contribute to the interpretation of the WHI study.
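The demand for statistical power can be made concrete with Schoenfeld's approximation for the number of events needed to detect a given hazard ratio in a two-group comparison. The sketch below is a back-of-the-envelope illustration, not the power calculation used in the WHI study; the function name, the 5% two-sided significance level, the 80% power and the hazard ratios are assumptions chosen for illustration.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio: float, alpha: float = 0.05,
                    power: float = 0.80, exposed_fraction: float = 0.5) -> int:
    """Schoenfeld's approximation: total events needed to detect
    `hazard_ratio` with a two-sided test at level `alpha`."""
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p = exposed_fraction
    return ceil((z_alpha + z_beta) ** 2 / (p * (1 - p) * log(hazard_ratio) ** 2))


# Smaller effects require many more events.
for hr in (1.3, 1.5, 2.0):
    print(f"HR {hr}: about {events_required(hr)} events")
```

Because the required number refers to events occurring within the stratum being analysed, a subgroup that contributes only a small share of all events needs a correspondingly larger cohort, and a formal test of interaction between subgroups is more demanding still.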


    Epidemiological studies on the defensive
 
Observational studies are often conducted by necessity—not always by preference. However, epidemiological studies will, in many situations, provide the best available information, and are essential for generating new hypotheses. Analytic epidemiology focuses on the determinants of a disease by testing the hypotheses formulated from associations found in descriptive studies. The ultimate goal is to judge whether a particular exposure causes or prevents disease.

The two basic types of observational analytical investigations are the case–control design and the cohort design, each offering certain unique advantages and disadvantages. The decision to use a particular design strategy is based on features of the exposure and disease, the current state of knowledge and logistic considerations such as available time and resources.

In the case–control study, cases and controls are selected on the basis of presence or absence of the disease of interest, e.g. CVD. The two groups are then compared with respect to the proportion with the exposure of interest, e.g. HRT. In contrast, subjects for a cohort study are classified on the basis of presence or absence of exposure to the factor of interest and compared with respect to the subsequent development of disease. The cohort study is therefore considered to be a ‘prospective’ design, while the case–control study is often designated a ‘retrospective’ design.
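The two designs lead to different effect measures, which a 2x2 table makes explicit. The sketch below uses entirely hypothetical counts and function names: a cohort allows the risk in exposed and unexposed women to be compared directly as a relative risk, whereas a case–control study compares the odds of exposure among cases and controls as an odds ratio.

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Cohort design: ratio of disease risk in exposed versus unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(cases_exposed: int, cases_unexposed: int,
               controls_exposed: int, controls_unexposed: int) -> float:
    """Case-control design: ratio of exposure odds in cases versus controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical cohort: 10 000 HRT users and 10 000 never users followed for CVD.
rr = relative_risk(exposed_cases=120, exposed_total=10_000,
                   unexposed_cases=200, unexposed_total=10_000)

# Hypothetical case-control study: 300 CVD cases and 600 controls.
or_ = odds_ratio(cases_exposed=90, cases_unexposed=210,
                 controls_exposed=250, controls_unexposed=350)

print(f"cohort relative risk:    {rr:.2f}")
print(f"case-control odds ratio: {or_:.2f}")
```

When the disease is rare in the source population, the odds ratio approximates the relative risk, which is why results from the two designs are commonly compared.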

Case–control studies are especially efficient in terms of both time and costs relative to other analytical approaches, as data collection is based on cases that have already been diagnosed with the disease of interest. Case–control studies are therefore often superior when it comes to number of cases and statistical power. They are well suited to evaluate rare diseases and diseases with long latent periods. The major potential problem in case–control studies relates to the fact that both the exposure and disease have already occurred at the time the participants enter into the study. This design is therefore particularly susceptible to bias from the differential selection of cases and controls and bias due to recall of previous exposures and confounding risk factors.

The prospective cohort design elucidates the temporal relationship between exposure and disease and minimizes bias in the ascertainment of exposure. Cohort studies can, however, be extremely time consuming, and the validity of the results can be seriously affected by losses to follow-up.

A prominent concern in observational studies assessing the influence of HRT on the risk of CVD is that hormone users, as a self-selected group, are different from former users and from never users. Women taking estrogen seem to have more favourable lifestyles regarding risk factors for CVD, the so-called ‘healthy user’ bias. This has been especially prominent in American studies, but not in European studies (Matthews et al., 1996; Hundrup et al., 2000). Until recently, women with less favourable cardiovascular risk profiles due to hypertension, diabetes or present ischaemic heart disease (IHD) were less likely to be prescribed HRT because of the contraindications listed on the estrogen preparation package insert. Using modern statistical techniques, such as multivariate regression models, it is possible to adjust for known, measurable confounding factors. However, residual confounding caused by unknown or non-measurable factors, which cannot be corrected for, remains a problem in observational studies. Lack of awareness about mechanisms of selection may lead to misinterpretation of the results. Potential unknown confounders not included in the statistical analyses would nevertheless have to be very closely associated with the use of HRT and also be strong predictors of CVD in order to influence the given results.
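How a ‘healthy user’ effect can create an apparent protective association, and how stratification on a measured confounder removes it, can be shown with a small numerical sketch. The counts below are invented so that HRT has no effect within either lifestyle stratum; the stratum labels and function names are hypothetical, and the Mantel–Haenszel risk ratio stands in for the multivariate regression models mentioned above.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata: list[Stratum]) -> float:
    """Risk ratio after pooling all strata, i.e. ignoring the confounder."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel summary risk ratio across strata of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / total
        denominator += s.unexposed_cases * s.exposed_total / total
    return numerator / denominator


# Hypothetical data: users cluster in the low-risk stratum, yet within each
# stratum the risk of CVD is identical for users and never users.
strata = [
    Stratum(40, 800, 10, 200),    # favourable lifestyle, 5% risk in both groups
    Stratum(30, 200, 120, 800),   # unfavourable lifestyle, 15% risk in both groups
]

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"crude risk ratio:           {crude:.2f}")
print(f"Mantel-Haenszel risk ratio: {adjusted:.2f}")
```

The crude estimate suggests a protective effect that disappears once the confounder is taken into account. The same correction is not available for confounders that were never measured, which is the residual confounding problem described above.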

If epidemiological studies and RCTs produce discrepant results, doubts are generally cast over observational studies, challenging the epidemiologists for a defence and an explanation of the differences. A critical review of the data analyses and the resulting estimates is mandatory but, when interpreting and comparing results from observational studies and RCTs, one could ask if the discrepancy is that pronounced after all.

The finding of a 29% increased risk of coronary heart disease with combined estrogen–progestin treatment in the WHI study came as a surprise, as the vast majority of observational studies have reported a risk reduction of 40–60% with HRT. The latest analysis from the American Nurses’ Health Study reported that the use of combined estrogen–progestin therapy was associated with a 36% risk reduction of coronary heart disease (Grodstein et al., 2000). Both studies have been carefully performed and analysed. So what could possibly cause this pronounced difference?

One of the fundamental issues is whether the data are in fact comparable. Attention should be given to the differentiation between primary and secondary prevention studies, between unopposed estrogen therapy and combined estrogen–progestin therapy, between different hormone components, treatment regimens and administration routes, and between definitions and determination of end points.

Also, participants in RCTs are not comparable with participants in observational studies. Differences in the characteristics of the study participants should be considered, with special emphasis on possible effect modification by age or by other risk factors of CVD.

We recently published the results of the Danish Nurse Cohort Study, focusing on how risk factors for CVD modify the effect of different HRT regimens on the risk of IHD and myocardial infarction (MI) (Lokkegaard et al., 2003a). Overall, there was no increased risk of CVD with HRT. However, in the subgroup of Danish nurses who reported diabetes, there was a significant 4-fold increased risk of IHD and a 9-fold increased risk of MI among the current hormone users compared with diabetic women who had never taken HRT (Table I). This significant effect modification by diabetes was based on only six cases of IHD and four cases of MI among the diabetic hormone users, and the 95% confidence intervals are wide. The results therefore need to be confirmed. Another subanalysis of the Danish Nurse Cohort Study revealed effect modification by hypertension, with a significant 3-fold increased risk of stroke among hypertensive women currently using combined estrogen–progestin replacement therapy compared with hypertensive women who had never used HRT (Lokkegaard et al., 2003b). Normotensive women had no increased risk of stroke with HRT.
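Why a handful of cases yields such uncertain estimates can be seen from the usual large-sample approximation for a ratio of two rates, where the standard error of the log ratio is roughly sqrt(1/a + 1/b) for a and b observed cases. In the sketch below, the six exposed cases and the 4-fold ratio come from the text, while the number of cases among the never users is a hypothetical value chosen for illustration, as is the function name; the result is not the interval reported in the cohort study.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def ratio_confidence_interval(ratio: float, cases_exposed: int,
                              cases_unexposed: int, level: float = 0.95):
    """Approximate (Wald) confidence interval for a rate ratio,
    based on the number of cases in each group."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = sqrt(1 / cases_exposed + 1 / cases_unexposed)
    return exp(log(ratio) - z * se_log), exp(log(ratio) + z * se_log)


# 6 exposed cases (from the text); 10 unexposed cases is a hypothetical figure.
lower, upper = ratio_confidence_interval(4.0, cases_exposed=6, cases_unexposed=10)
print(f"ratio 4.0, 95% CI {lower:.1f} to {upper:.1f}")

# With ten times as many cases the interval narrows considerably.
lower_big, upper_big = ratio_confidence_interval(4.0, 60, 100)
print(f"ratio 4.0, 95% CI {lower_big:.1f} to {upper_big:.1f}")
```

Since the width depends on the number of cases rather than on the size of the cohort, even a large cohort gives imprecise estimates in a small subgroup.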


Table I. Hazard ratios of all deaths, ischaemic heart disease (IHD) and myocardial infarction (MI) with exposure to HRT (current, past or never use) stratified on information on diabetes at baseline
 

    Time out or come back?
 
Judging whether an association is causal extends beyond the validity of the results of any single study. When the possible causal relationship of the effect of HRT on the risk of CVD is evaluated, consistency with other investigations, strength of the association, time sequence between exposure and event, possible dose–response relationships and biological credibility should be taken into consideration. A further integration of biochemical, physiological, clinical and epidemiological findings with special emphasis on differences in the pathogenesis and the pathophysiology of subtypes of cardiovascular diseases as well as differences in risk profiles is needed.

The body of evidence on HRT is derived from studies in elderly, naturally menopausal women. There is an urgent need for studies on the consequences, i.e. risks and benefits, of HRT in younger women with premature ovarian failure.

One should bear in mind that the primary indication for HRT is alleviation of climacteric symptoms. Women are therefore well advised to be guided by their symptoms when deciding whether or not to initiate short-term HRT. HRT has never been approved for prevention of CVD or any other age-related chronic condition, except for osteoporosis. The decision on long-term use of HRT should be weighed against the risk of breast cancer on an individual basis.

Future detection of women who are especially sensitive to HRT will be a complex medical problem, which demands individual counselling based on scientific results. A challenge in future studies therefore seems to be a detailed evaluation of effect modification by predisposing factors and lifestyle habits.

Ethical, economic and practical considerations prohibit long-term RCTs for the research questions in which we might be interested. So this is not ‘time out’, but a ‘come back’ to epidemiological studies on HRT.


    References
 
Freedman, L., Anderson, G., Kipnis, V., Prentice, R., Wang, C.Y., Rossouw, J., Wittes, J. and DeMets, D. (1996) Approaches to monitoring the results of long-term disease prevention trials: examples from the Women’s Health Initiative. Controlled Clin. Trials, 17, 509–525.

Grodstein, F., Manson, J.E., Colditz, G.A., Willett, W.C., Speizer, F.E. and Stampfer, M.J. (2000) A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Ann. Intern. Med., 133, 933–941.

Hundrup, Y.A., Obel, E.B., Rasmussen, N.K. and Philip, J. (2000) Use of hormone replacement therapy among Danish nurses in 1993. Acta Obstet. Gynecol. Scand., 79, 194–201.

Lokkegaard, E., Pedersen, A.T., Heitmann, B.L., Jovanovich, Z., Keiding, N., Hundrup, Y.A., Obel, E.B. and Ottesen, B. (2003a) Relation between hormone replacement therapy and ischaemic heart disease in women: prospective observational study. Br. Med. J., 326, 426–430.

Lokkegaard, E., Jovanovic, Z., Heitmann, B.L., Keiding, N., Ottesen, B., Hundrup, Y.A., Obel, E.B. and Pedersen, A.T. (2003b) Increased risk of stroke in hypertensive women using hormone replacement therapy. Analyses based on the Danish Nurse Cohort Study. Arch. Neurol. In press.

Matthews, K.A., Kuller, L.H., Wing, R.R., Meilahn, E.N. and Plantinga, P. (1996) Prior to use of estrogen replacement therapy, are users healthier than nonusers? Am. J. Epidemiol., 143, 971–978.

Women’s Health Initiative Study Group (1998) Design of the Women’s Health Initiative Clinical Trial and Observational Study. Controlled Clin. Trials, 19, 61–109.

Submitted on May 15, 2003; accepted on July 9, 2003.