University of Colorado Health Sciences Center, Colorado Center for Bone Research, Lakewood, Colorado 80227
Address all correspondence and requests for reprints to: Paul D. Miller, M.D., Clinical Professor of Medicine, University of Colorado Health Sciences Center, Medical Director, Colorado Center for Bone Research, Lakewood, Colorado 80227.
In this issue of the JCEM, Watts et al. (1) report the results of a post hoc subset analysis of a higher-risk cohort from the original United States and multinational risedronate vertebral fracture trials. In this analysis, the authors examined the first-year risk of incident vertebral fracture in three subsets of study subjects: patients more than 70 yr old vs. less than 70 yr old; patients with two or more prevalent vertebral compression fractures at randomization; and patients with T-scores equal to or less than -2.5 SD at the femoral neck, calculated from the National Health and Nutrition Examination Survey (NHANES) III reference population database, or at the lumbar spine, calculated from manufacturer-specific reference databases (1). The authors then compared fracture risk reduction in these risedronate-treated higher-risk groups to overall risk reduction in the original cohort of the risedronate vertebral fracture trials (2, 3). The data from the placebo arms and the treatment arms, both for the higher-risk subset analysis and for the original cohort, are shown in Fig. 1 of the article. Whereas it is clear that these higher-risk individuals did, indeed, have a greater risk for incident vertebral fractures in the placebo groups, risedronate did not appear to reduce incident vertebral fracture rates to any greater extent in this higher-risk subset than in the lower-risk subset, or than the 1-yr risk reduction seen in the overall original risedronate cohort. The lack of any greater benefit from risedronate therapy in the higher-risk group does not diminish the importance of the fundamental observation: risedronate reduces incident vertebral fractures in postmenopausal women with low bone mass and prevalent vertebral fractures by 60% within 1 yr. Thus, whereas it may take years for postmenopausal women to reach the level of low bone mass that was measured at baseline, this bisphosphonate rapidly reduces future fracture risk.
Rapid risk reduction, within 1 yr of therapy, has also been seen in both the alendronate and raloxifene clinical trials, when a post hoc analysis of clinical vertebral fracture events was carried out (4, 5). Yet, because the 1-yr risk reduction was not greater in the higher-risk risedronate-treated groups as compared with the overall risedronate study cohort, a fundamental question remains: Do these drugs confer a greater benefit in patients at greater risk? A second question follows from the first: Are we justified in targeting osteoporosis therapy only on those who are at the greatest risk vs. those who might be at lesser risk? It is important to stress that all of the risedronate vertebral fracture patients in both the United States and multinational studies were at increased fracture risk, because they all had to have at least one vertebral fracture at baseline. Even so, a post hoc analysis that contrasts a subset categorized as "high risk" with the study population as a whole might create the perception that treatment is best reserved for higher-risk patients. Yet, because the risedronate-mediated fracture risk reduction was equivalent between these two groups despite different fracture risk in the untreated groups, it forces the question: What level of baseline risk really predicts a pharmacological response? How are risk factors, especially the ones that Watts et al. (1) used to define their higher-risk subset, related to fracture risk in the prospective epidemiological fracture trials? And exactly what level of T-score, age, and/or prevalent vertebral fractures best defines a fracture-reducing response to osteoporosis-specific pharmacological agents? Is there even a lower level of risk that nevertheless defines a pharmacological benefit in fracture risk reduction?
Many prospective epidemiological surveys have confirmed that in postmenopausal women fracture is powerfully related to the T-score: the lower the bone mineral density (BMD) (or T-score) the greater the risk (6, 7, 8). Additionally, with advancing age fracture risk increases even at the same BMD level (9). Individuals who have already sustained vertebral fractures are at substantially greater risk for another vertebral or a hip fracture (10). This latter greater risk is independent of the prevailing T-score. Both the number of vertebral fractures and their severity are powerful predictors of future fracture events (11, 12). The risedronate vertebral study design had the capacity to show, prospectively, that even within 1 yr in untreated patients, the greater the number of baseline, prevalent vertebral fractures (even asymptomatic vertebral fractures), the greater the risk for incident fracture (11). Because the prevalence of asymptomatic vertebral fractures approaches 20% in Caucasian men and women at age 70 yr, it could be argued that direct imaging of the spine with Instant Vertebral Assessment, Lateral Vertebral Assessment, or routine x-ray is needed to determine the prevalence of vertebral fractures, regardless of the patient's T-score (13, 14, 15).
If prevalent vertebral fractures clearly predict increased risk for subsequent fractures, at what age and at what T-score does risk increase? BMD-associated risk is a gradient function well defined by the curve that describes exponentially increasing risk with lowering T-score. Age-associated risk is also a gradient function with a curve that describes exponentially increasing risk with advancing age. Although these gradients place the older or the more severely osteoporotic at markedly greater risk for fracture, younger, postmenopausal women with smaller degrees of reduction in T-scores are not free of risk.
The National Osteoporosis Risk Assessment data documented that postmenopausal women with low T-scores have an increased risk of fracture within 1 yr, even if they are relatively young, between 50 and 59 yr of age (16, 17). Although peripheral technology device-derived T-scores are not the same as central dual-energy x-ray absorptiometry (DXA) technology device-derived T-scores, and cannot be used for classification of osteoporosis or osteopenia by the World Health Organization (WHO) criteria (18), it is clear that if a peripheral T-score is -1.0 or less in the early postmenopausal population, the risk for both global and hip fracture is increased. Furthermore, several prospective studies have recently shown that, even when the NHANES III database is used for T-score calculation at the hip, 50% of the hip fractures in untreated postmenopausal women occur at T-scores better than -2.5 SD (i.e. osteopenia; Ref. 19). This is due, in part, to the fact that there are many more individuals with osteopenia than there are with osteoporosis. So, although the risk (relative or absolute risk as a function of the level of T-score) is less among those with osteopenia, there will be more fractures simply as a function of a larger population base. We need to give more thought to the younger, osteopenic postmenopausal population, in terms of proactive therapeutic approaches.
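The population-base argument is simple arithmetic, and a minimal sketch may make it concrete. The group sizes and 1-yr fracture risks below are hypothetical values chosen only for illustration; they are not data from the National Osteoporosis Risk Assessment, NHANES III, or any of the cited trials.

```python
# Illustrative only: group sizes and risks are hypothetical assumptions,
# not figures from any cited study.

def expected_fractures(n_women: int, one_year_risk: float) -> float:
    """Expected number of fractures in a group over 1 yr."""
    return n_women * one_year_risk

# Hypothetical cohort: the osteopenic group is larger but has a lower
# per-person risk than the osteoporotic group.
osteopenic = expected_fractures(n_women=600, one_year_risk=0.02)
osteoporotic = expected_fractures(n_women=200, one_year_risk=0.05)

print(f"osteopenic group:   {osteopenic:.1f} expected fractures")
print(f"osteoporotic group: {osteoporotic:.1f} expected fractures")
```

Under these assumed numbers the lower-risk but larger osteopenic group contributes more fractures in absolute terms, which is the pattern described above.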
Now, for the major question: Do osteoporosis-specific pharmacological agents reduce risk in the osteopenic population? How strong is the evidence that pharmacological intervention for postmenopausal osteoporosis is best reserved for patients who have prevalent vertebral fracture, or have osteoporosis by WHO criteria at the spine or at the hip?
The data that have led to our perceptions that only patients with osteoporosis by WHO criteria and/or prevalent fractures benefit from osteoporosis-specific pharmacological interventions are, actually, quite limited. Many patients in the clinical trials who were randomized did not have osteoporosis by WHO criteria (i.e. T-score -2.5 or lower). Many trials only required T-scores of -2.0 SD, such as the risedronate vertebral fracture trials and the alendronate Fracture Intervention Trials (FIT). In addition, some of the data are based on post hoc analysis of the original randomization outcomes. Post hoc analysis per se may not allow one necessarily to come to valid conclusions. Post hoc analysis results are often skewed by selection biases. In addition, when subgroups are constructed and comparisons are made, it is likely that there is not enough "power" to see differences, even if they existed. Furthermore, the initial clinical trial randomization T-score criteria for inclusion in many of the clinical trials have been derived from inconsistent reference population databases between different central DXA manufacturers, which may lead to different T-scores between different DXA machines, even within the same clinical trial. These inconsistent databases may yield spurious conclusions about which T-score level really provided evidence of a pharmacological benefit (i.e. WHO osteoporosis by one DXA manufacturer could be osteopenia by a different DXA manufacturer). Some clinical trials used only Hologic DXA machines (alendronate FIT), some used both Hologic and Lunar-GE (risedronate trials), and some used Hologic, Lunar-GE, and Norland DXA machines [Prevention of Osteoporotic Fractures (PROOF) and Multiple Outcomes Raloxifene Evaluation (MORE)]. All calculated T-scores from manufacturer-specific reference population databases, which are inconsistent between manufacturers and may yield different T-scores at the spine and at the hip in individual patients. 
For example, using manufacturer-specific reference population databases at the hip, there is a 1.0 SD difference between Hologic and Lunar-GE DXA machines, with Hologic being lower than Lunar-GE (20). The only consistent reference population database, NHANES III, was developed after all of the current antiresorptive clinical trials were planned. When it was recognized that NHANES III eliminated the T-score discrepancies for hip measurements between the central DXA devices, most clinical trials then reanalyzed their data as a function of the NHANES III database (21). Because using the NHANES III reference population database yields a T-score nearly 1 SD higher than the T-score calculated from the Hologic database, many patients who had previously been randomized in clinical trials with the assumption that they had WHO osteoporosis using the manufacturer-specific reference database now had osteopenia using the NHANES III reference database. Thus, the post hoc reanalyses of pharmacological effect that have been done using NHANES III have now examined efficacy in a lower-risk and now often osteopenic population, simply as a function of changing the database and not the patients. Manipulating the database can manipulate the outcome. In addition, T-scores still currently calculated from manufacturer-specific spine databases may differ by 0.5 SD between central DXA manufacturers (22, 23). These WHO discrepancies become even more divergent when incorporating the multitude of peripheral devices into clinical assessment, all of which calculate T-scores from their manufacturer-specific reference population database (24). Only the standardization of reference population databases between all manufacturers will solve this dilemma and potentially allow, as was done by the NHANES III database for the hip, the use of a consistent WHO classification between all central and peripheral BMD devices (25).
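Why a change of reference database can reclassify the same patient follows from how a T-score is computed: the measured BMD minus the young-adult reference mean, divided by the reference SD. The sketch below assumes that standard definition and the WHO cut points; the two reference databases and the patient BMD are hypothetical values chosen for illustration, not those of Hologic, Lunar-GE, Norland, or NHANES III.

```python
# Illustrative only: reference means/SDs and the patient BMD are
# hypothetical assumptions, not values from any real DXA database.

def t_score(bmd: float, ref_mean: float, ref_sd: float) -> float:
    """SDs by which a BMD differs from the young-adult reference mean."""
    return (bmd - ref_mean) / ref_sd

def who_category(t: float) -> str:
    """WHO classification by T-score."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"

patient_bmd = 0.62  # g/cm^2, hypothetical hip measurement

# Two hypothetical reference databases (mean, SD) for the same site.
database_a = (0.90, 0.10)
database_b = (0.85, 0.12)

t_a = t_score(patient_bmd, *database_a)
t_b = t_score(patient_bmd, *database_b)

print(f"database A: T = {t_a:.2f} ({who_category(t_a)})")
print(f"database B: T = {t_b:.2f} ({who_category(t_b)})")
```

With these assumed values the same measured BMD falls in the osteoporosis range against one database and in the osteopenia range against the other, without any change in the patient.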
In view, then, of these database differences, what about the evidence for a pharmacological benefit in patients without prevalent vertebral fractures as a function of the T-score?
The reduction in either any clinical fracture or hip fractures in the clinical fracture arm of the alendronate FIT-2 was seen only at T-scores of -2.5 or below, in a post hoc analysis, recalculating the T-scores from the initial Hologic reference database to the NHANES III database. The initial pre-specified randomization in FIT-2, done on all Hologic DXA machines, required the entry level T-score to be -2.0 or lower as derived from the Hologic reference population database. Patients in FIT-2 did not have a prevalent vertebral fracture (26). When any clinical (primary end point) or hip fracture results in FIT-2 were analyzed from the original pre-specified Hologic database, there was no significant reduction in these fractures. In the subsequent reanalysis, using the NHANES III reference database, any clinical as well as hip fracture reduction in FIT-2 now became significant, in the subset of the FIT-2 patients with T-scores less than -2.5 (4). Recognize that a T-score of -2.0 calculated from the Hologic database (the pre-specified entry criteria for FIT-2) would be approximately -1.0 by the NHANES III database. The raloxifene MORE data showed a benefit of raloxifene in postmenopausal women without prevalent vertebral fractures who were randomized at T-scores of -2.5 or lower, but again the T-scores at the spine and hip were all derived from each of the three manufacturer-specific databases (Hologic, Lunar-GE, and Norland; Ref. 27). No reanalysis for outcome has been done for the entire initial cohort in MORE using the hip NHANES III database for T-score calculations. So, within MORE, the T-score could have differed by 1.0 SD at the hip and by 0.5 SD at the spine between individual patients according to the DXA manufacturer used. 
It is also important to emphasize that in the MORE data the benefit of raloxifene to reduce incident vertebral fracture was greater in those patients without prevalent fractures at baseline than in those patients with prevalent fractures; MORE is the only study, to date, showing that the benefit was greater in those with a lower risk. In the risedronate hip trial, the pre-specified end point in group I, those with osteoporosis by WHO criteria, clearly showed a significant reduction in hip fractures in the entire pre-specified cohort (28). This end point was achieved regardless of the randomization database or manufacturer DXA machine used for randomization. The risedronate hip trial used both Hologic and Lunar-GE machines, and the initial randomization criteria for group I required a manufacturer database-derived T-score of less than -4.0 SD in patients without any additional risk factors for hip fracture. When the post hoc reanalysis of the outcome in the risedronate group I hip data was completed using the NHANES III reference database, all of the patients still had osteoporosis by WHO criteria (mean T-score, -2.56 SD). Hence, the risedronate hip trial showed this significant benefit in those with osteoporosis by WHO criteria, regardless of the database used for T-score calculation. Yet, in another post hoc analysis of the risedronate group I hip fracture data, when the patients were subdivided into those with and those without a prevalent vertebral compression fracture and outcomes were assessed, risedronate did not reduce hip fractures in those patients without a prevalent fracture, even though they had WHO-defined osteoporosis. The Women's Health Initiative (WHI) study is important to this dialogue (29). In the WHI study, there was a significant reduction in both clinical vertebral and hip fracture incidence in women who did not have osteoporosis.
The WHI study, in fact, is the first prospective study to demonstrate unequivocally that a therapeutic agent for osteoporosis can reduce the incidence of clinical vertebral as well as hip fractures among a population that was not osteoporotic. In addition, recent post hoc data analyzed from a subset of the raloxifene as well as the combined FIT-1 and FIT-2 alendronate clinical trials (again by the NHANES III database) suggest that patients with osteopenia also have a reduction in vertebral and/or all clinical vertebral fracture events, even among patients without a prevalent vertebral fracture (30, 31). These newer observations suggest that osteopenic patients may benefit from antiresorptive treatment. Finally, a recent analysis of cost-effectiveness of interventions in patients without osteoporosis suggests that the inclusion of all (global) osteoporotic fractures in assessment of efficacy as well as costs is important, because by including all (global) fractures in an analysis, it appears that all available treatments can be cost-effectively targeted to individuals at only moderately increased risk (32). Because 50% of the osteoporotic fractures occur at T-scores better than -2.5 and with the evidence suggesting benefit from treatment even in the osteopenic group, we should be cautious with regard to advising or restricting pharmacological intervention only to patients with osteoporosis, defined either by prevalent fracture or WHO criteria.
Because the number needed to treat (NNT) is highly dependent on the risk in the placebo group, and the risk in the higher-risk placebo groups described by Watts et al. (1) was higher than the risk in their study population as a whole, the NNT would obviously be lower in the authors' higher-risk population. Yet, the benefit with regard to risk reduction was no different between the groups in the study by Watts et al. (1). Hence, whereas health economists will undoubtedly address NNT values between lower-risk vs. higher-risk groups, the fundamental observation for clinicians is that risk reduction might be comparable whether the baseline risk is higher or lower, even though the NNT will differ. Even in the WHI analysis as well as the post hoc analysis of the FIT and MORE trials where the patients were osteopenic, incident fracture reductions ranged from 34 to 60%, not too dissimilar from the analysis of vertebral fracture reduction in the study by Watts et al. (1) in patients with prevalent vertebral fractures, even using a T-score less than -2.5 at the spine by manufacturer-specific databases and less than -2.5 by NHANES III at the hip (48% and 60%). This realization of what might be similar outcomes between osteoporotic and osteopenic populations, as well as the realization that there was a greater benefit in MORE in patients who did not have prevalent vertebral fractures and might not even have had WHO-defined osteoporosis, depending on the DXA manufacturer and/or reference database used for T-score calculation, must make us cautious about dogma in establishing guidelines about which patients we should treat for postmenopausal osteoporosis.
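The dependence of the NNT on placebo-group risk can be sketched numerically, assuming the usual definitions: absolute risk reduction equals placebo risk multiplied by relative risk reduction, and NNT is its reciprocal. The 60% relative risk reduction echoes the 1-yr figure quoted in this editorial; the two placebo risks are hypothetical values chosen for illustration and are not taken from Watts et al. (1).

```python
# Illustrative only: the placebo risks are hypothetical assumptions.
# The 60% relative risk reduction mirrors the figure quoted in the text.

def nnt(placebo_risk: float, relative_risk_reduction: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_risk_reduction = placebo_risk * relative_risk_reduction
    return 1.0 / absolute_risk_reduction

RRR = 0.60  # same relative risk reduction assumed in both groups

nnt_higher_risk = nnt(placebo_risk=0.10, relative_risk_reduction=RRR)
nnt_lower_risk = nnt(placebo_risk=0.05, relative_risk_reduction=RRR)

print(f"higher-risk group: NNT = {nnt_higher_risk:.1f}")
print(f"lower-risk group:  NNT = {nnt_lower_risk:.1f}")
```

With an identical relative risk reduction, halving the assumed placebo risk doubles the NNT, so the two groups can share the same proportional benefit while differing in the number of patients treated per fracture prevented.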
Hence, while it is true that the greater the risk, the greater the risk; it may not necessarily be true that the greater the risk, the greater the benefit.
Footnotes
Abbreviations: BMD, bone mineral density; DXA, dual-energy x-ray absorptiometry; FIT, Fracture Intervention Trial; MORE, Multiple Outcomes Raloxifene Evaluation; NHANES, National Health and Nutrition Examination Survey; NNT, number needed to treat; WHI, Women's Health Initiative; WHO, World Health Organization.
Received November 29, 2002.
Accepted December 12, 2002.