Affiliations of authors: S. Yasmeen, P. S. Romano, J. A. Robbins, University of California, Davis; M. Pettinger, Women's Health Initiative Clinical Coordinating Center, Fred Hutchinson Cancer Research Center, Seattle, WA; R. T. Chlebowski, Harbor–University of California at Los Angeles Research and Education Institute, Torrance, CA; D. S. Lane, State University of New York, Stony Brook; S. L. Hendrix, Wayne State University, Detroit, MI.
Correspondence to: Shagufta Yasmeen, M.D., University of California, Davis, Department of Obstetrics/Gynecology and Internal Medicine, 4860 Y St., Suite 2500, Sacramento, CA 95817 (e-mail: syasmeen{at}ucdavis.edu).
INTRODUCTION
The likelihood of breast cancer in women who have mammographic recommendations for short-interval follow-up may depend on risk factors such as age, family history, and hormone use, as well as the accuracy of the mammogram and its interpretation. Considerable variation exists among radiologists in the interpretation and subsequent management of these abnormalities (3–5).
Lesions that are judged to have a low probability of malignancy can be managed by prompt biopsy, early follow-up mammography, or routine follow-up mammography. On the basis of limited and indirect evidence, it has been suggested that careful mammographic surveillance (i.e., every 3–6 months) of such lesions may detect more asymptomatic cancers at an early stage, when the prognosis is still favorable, while reducing the morbidity and cost associated with biopsies (4,6).
However, early recall for follow-up testing after screening mammography causes anxiety, psychological morbidity, and increased health care utilization in the year after a false-positive mammogram is reported (7,8). Furthermore, it is questionable whether repeating mammograms at 3–6 months increases the overall sensitivity of the test or has any positive impact on breast cancer outcomes. Current information regarding the benefits and the positive predictive value of a recommendation for short-interval follow-up is limited (9,10). In one report (9), only two breast cancers were found at a 6-month follow-up among 3184 mammograms with such recommendations. Indeed, the BI-RADS manual states that "most approaches (to this problem) are intuitive . . . (recommendations) will likely undergo future modification as more data accrue as to the validity of an approach, the interval required, and the type of findings that should be followed" (2).
The current study was designed to evaluate the prevalence and positive predictive value of mammographic recommendations for short-interval follow-up in a longitudinal prospective cohort of postmenopausal women participating in the Women's Health Initiative (WHI) at 40 clinical centers throughout the United States.
SUBJECTS AND METHODS
The WHI is a prospective study of 161 860 postmenopausal women aged 50–79 years who were enrolled from 1993 through 1998 at 40 clinical centers throughout the United States. The study population included women who enrolled in any of the WHI Clinical Trial (CT) arms (Dietary Modification [DM-CT], Hormone Replacement Therapy [HRT-CT], or Calcium/Vitamin D; 68 135 women) and who also had a baseline screening mammogram (n = 68 126 women).
Study methods for these WHI trials have been described in detail elsewhere (11). Briefly, women were eligible for enrollment if they were postmenopausal, unlikely to move or die within 3 years, and not currently participating in any other clinical trial. Women with a prior breast cancer diagnosis were excluded. At baseline, women completed screening and enrollment questionnaires by interview and self-report, underwent a physical examination, and provided a blood specimen. Special efforts were made to recruit a diverse sample that represented the population of community-dwelling, postmenopausal women in the United States.
We obtained reports of all screening mammograms for WHI clinical trial participants from the Clinical Coordinating Center in Seattle, Washington. Reports submitted before February 1995 (N = 9669) could not be analyzed because of inadequacies in the original design of the mammography reporting form. After having a negative screening examination (defined as BI-RADS category 1 or 2), HRT-CT participants had yearly screening mammograms and DM-CT participants had biennial screening mammograms. Although all participants received annual clinical breast examinations, the results of those examinations were not collected by the WHI Coordinating Center. Only women who completed at least 2 years of follow-up after having a baseline mammogram that was classified as "negative," "benign," or "probably benign finding, short-interval follow-up suggested" were included in the current study, to avoid bias due to differences in 1-year follow-up procedures across the study arms. Baseline mammography preceded a woman's assignment to a specific intervention. Women with baseline category 4 ("suspicious abnormality, biopsy should be considered") or category 5 ("highly suggestive of malignancy, appropriate action should be taken") mammograms were not eligible for enrollment in the WHI unless subsequent evaluation, performed independently of the WHI, ruled out malignant disease. Because we do not know the number of women who were deemed ineligible for this reason (and who, therefore, were never entered into our dataset), we excluded all women with baseline category 4 or 5 mammograms from our analyses. The study was reviewed and approved by the Human Subjects Review Committee at each participating institution, and all participants provided written informed consent.
Because our principal aim was to evaluate radiologists' recommendations for short-interval follow-up (i.e., "less than 1 year" instead of "1 year" or "2 years"), we classified mammograms on the basis of whether such a recommendation appeared in the radiologist's summary. Unfortunately, as previously documented by other authors (12), radiologists' assessments and recommendations are sometimes inconsistent. In addition, the BI-RADS assessment scheme was not used universally until April 1999, which prevented us from analyzing our data strictly according to BI-RADS assessment categories. The WHI follows BI-RADS reporting practices, however, in that a definitive interpretation is provided only after "assessment is complete," based on any additional studies (e.g., magnification, ultrasound) that the interpreting radiologist may deem necessary (13). BI-RADS category 0 mammograms ("need additional imaging evaluation") must be resolved before they can be reported to the WHI Coordinating Center. Regardless of whether mammography was performed at a clinical center participating in the WHI or in the community, all results were entered into the WHI database. Because breast cancer is a primary endpoint in the WHI, all abnormal mammograms are followed up, in accordance with radiologists' recommendations, until a diagnosis is obtained. Active follow-up using hospital, cancer registry, and mortality data ensures nearly complete ascertainment of relevant outcomes.
Potential breast cancer risk factors, including age, body mass index, waist-to-hip ratio, family history of breast cancer, hormone use, smoking history, alcohol use, level of physical activity, age at menarche, nulliparity, age at first full-term birth, age at menopause, education level, family income, and having medical insurance, were collected from various WHI forms (11). Among all study participants, baseline characteristics were compared between women with a recommendation for short-interval follow-up and women without such a mammographic recommendation. Student's t test was used to compare the means of continuous variables, and the chi-square test was used to compare the distributions of categorical variables.
For the current analyses, we combined each woman's mammography results from the right and left breasts and used the most severe reading to categorize each participant. The primary study outcome was in situ or invasive breast cancer diagnosis based on local and central WHI adjudication following a review of medical records of each study participant within 2 years of the date of the baseline mammogram. The outcomes of women who had mammograms with a recommendation for short-interval follow-up were compared with the outcomes of women who had "negative" and "benign" mammograms. In accordance with previous studies, "negative" and "benign" mammograms were considered false negatives if the participant was diagnosed with breast cancer within the subsequent 2 years (14). A mammogram with a recommendation for short-interval follow-up was considered a false positive if the participant was found not to have breast cancer within 2 years of follow-up. Predictive value was defined as the percentage of women with a specific mammographic finding who received a diagnosis of breast cancer within the subsequent 2 years, regardless of when the cancer was reported to the WHI; for "negative" and "benign" findings, the negative predictive value is the complement of this percentage. Our primary analysis included all baseline mammograms except those with category 4 or 5 recommendations. We performed a secondary analysis that was limited to women who had a "negative" or "benign" baseline mammogram and at least 2 years of follow-up after having a subsequent (annual or biennial) screening mammogram. In this latter analysis of new (incident) mammographic abnormalities, we classified each woman according to the most serious interpretation among all mammograms obtained after her baseline mammogram. A comparison mammogram was randomly selected for each woman with a series of "negative" mammograms. The likelihood ratio (LR) for each summary category was empirically derived as the ratio of the post-test odds to the pretest odds of breast cancer in the WHI sample.
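The likelihood ratio described above is Bayes' theorem in odds form: post-test odds = LR × pretest odds. The short sketch below shows only that arithmetic. The 2.1% post-test probability is the 2-year incidence after a short-interval follow-up recommendation reported in the Results; the pretest probability of 0.97% is an assumed value chosen for illustration, not a figure stated in this article.

```python
def odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def likelihood_ratio(post_test_prob: float, pre_test_prob: float) -> float:
    """LR = post-test odds / pretest odds (Bayes' theorem in odds form)."""
    return odds(post_test_prob) / odds(pre_test_prob)


# 2.1% = 2-year cancer incidence after a short-interval recommendation (Results).
# 0.97% = ASSUMED overall 2-year incidence in the sample, for illustration only.
lr = likelihood_ratio(0.021, 0.0097)
print(f"LR = {lr:.2f}")
```

With these inputs the ratio is about 2.19, close to the LR of 2.20 reported for category 3 recommendations; the sketch is not a reproduction of the authors' computation.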
We estimated 95% confidence intervals (CIs) by using a recently described objective Bayesian technique, which outperforms traditional methods of estimating 95% CIs when observed probabilities are close to zero (15,16).
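The Mossman–Berger procedure for post-test probabilities (15) propagates uncertainty from several estimated quantities and is not reproduced here. A much simpler example of the same objective Bayesian idea is the equal-tailed Jeffreys interval for a single proportion (a Beta(1/2, 1/2) prior). The sketch below approximates its posterior quantiles by Monte Carlo sampling with Python's `random.betavariate`. The function name, number of draws, and seed are illustrative choices, not part of the study.

```python
import random


def jeffreys_interval(successes: int, n: int, level: float = 0.95,
                      draws: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Equal-tailed Jeffreys interval for a binomial proportion.

    Under the Jeffreys Beta(1/2, 1/2) prior the posterior is
    Beta(successes + 1/2, n - successes + 1/2); its quantiles are
    approximated here by sorting Monte Carlo draws from that posterior.
    """
    rng = random.Random(seed)
    a = successes + 0.5
    b = n - successes + 0.5
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Example: 2 cancers among 3184 mammograms, the counts cited from study (9).
lo, hi = jeffreys_interval(2, 3184)
print(f"{2 / 3184:.3%} (95% CI {lo:.3%} to {hi:.3%})")
```

Unlike the normal approximation, this interval never extends below zero. That property is why Bayesian intervals are attractive when observed proportions are close to zero.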
Statistical Analyses
Our primary aim was to estimate the predictive value of a recommendation for short-interval follow-up among a national sample of women. We therefore focused our power analysis on the width of the 95% CIs surrounding these estimates. Given that 24 (3%) category 3 mammograms were identified among the 750 mammograms at the WHI clinical center at the University of California, Davis, and that the prevalence of category 3 mammograms reported in previous studies was as high as 11% (5,9), we expected to find approximately 2000 baseline mammograms with a recommendation for short-interval follow-up in the national WHI database. We estimated that if the positive predictive value of these mammograms was 2%, then the 95% CI would have a width of 1.2% (i.e., 95% CI = 1.4% to 2.6%). This CI was sufficiently narrow to justify this study (17).
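The stated interval can be checked with the normal-approximation (Wald) formula, p ± 1.96·sqrt(p(1 − p)/n). The article does not say which formula was used for the power calculation, so this is an assumption. With p = 0.02 and n = 2000 it reproduces the 1.4% to 2.6% interval quoted above.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) CI for a proportion p estimated from n mammograms."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


lower, upper = wald_ci(0.02, 2000)
print(f"95% CI = {lower:.1%} to {upper:.1%}; width = {upper - lower:.1%}")
# prints: 95% CI = 1.4% to 2.6%; width = 1.2%
```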
A secondary objective of this study was to estimate regional variation in the reported prevalence of short-interval follow-up recommendations. The prevalence and predictive value of these recommendations were computed for each of the 40 clinical centers. Prevalence was defined as the percentage of baseline mammograms with a recommendation for short-interval follow-up. The 40 clinical centers were categorized into four quartiles on the basis of prevalence. We then compared the predictive value of the recommendation for short-interval follow-up across these four prevalence quartiles to determine whether this recommendation was associated with a higher likelihood of breast cancer at centers that applied it more cautiously compared with centers that were less selective in applying this interpretation.
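The per-center steps (prevalence, quartile assignment, predictive value by quartile) can be sketched as follows. The record layout (center, short-interval flag, cancer within 2 years), the helper names, the synthetic data, and the use of `statistics.quantiles` to form cut points are all assumptions. The article does not state how quartile boundaries or ties were handled.

```python
from collections import defaultdict
from statistics import quantiles


def center_prevalence(records):
    """Share of baseline mammograms at each center carrying the short-interval flag."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for center, short_interval, _cancer in records:
        totals[center] += 1
        flagged[center] += short_interval
    return {c: flagged[c] / totals[c] for c in totals}


def center_quartiles(prevalence):
    """Assign each center to quartile 1 (lowest prevalence) through 4 (highest)."""
    cuts = quantiles(prevalence.values(), n=4)  # three cut points
    return {c: 1 + sum(p > cut for cut in cuts) for c, p in prevalence.items()}


def ppv_by_quartile(records, quartile_of):
    """Proportion of flagged women diagnosed with cancer within 2 years, per quartile."""
    flagged, cancers = defaultdict(int), defaultdict(int)
    for center, short_interval, cancer in records:
        if short_interval:
            q = quartile_of[center]
            flagged[q] += 1
            cancers[q] += cancer
    return {q: cancers[q] / flagged[q] for q in sorted(flagged)}


# Synthetic data: 8 hypothetical centers with 100 women each; center c flags
# c women, and exactly one flagged woman per center develops cancer.
records = [(c, int(i < c), int(i == 0)) for c in range(1, 9) for i in range(100)]
quartile_of = center_quartiles(center_prevalence(records))
ppv = ppv_by_quartile(records, quartile_of)
print(quartile_of)
print(ppv)
```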
We used multivariable logistic regression at the patient level to test whether the prevalence of a recommendation for short-interval follow-up varied across clinical centers, both before and after adjusting for differences in participant characteristics. Our initial regression model had the recommendation for short-interval follow-up as a dependent variable and 39 dummy variables representing 40 clinical centers as independent variables. To determine whether the variation among clinical centers was due to differences in participant characteristics, we ran additional regression models that included demographic, socioeconomic, lifestyle, reproductive, and family history risk factors for breast cancer. To test whether this variation could be due to preexisting cancer, we further augmented the model by using breast cancer that was diagnosed during the 2 years subsequent to baseline (which was presumably present at baseline) as a predictor. The likelihood ratio chi-square test, with 39 degrees of freedom, was used to evaluate the statistical significance of clinical center as an independent predictor. All P values are two-sided. Statistical analyses were conducted using Statistical Analysis Software (SAS), version 8 (SAS Institute, Cary, NC).
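Suppose the only predictors are an intercept and the center dummy variables. In that case the fitted logistic model reproduces each center's observed proportion, so the likelihood ratio statistic for the center effect can be computed without an iterative fit. The sketch below does this for that unadjusted model only. The adjusted models described above require a full logistic regression; the authors used SAS. The chi-square tail probability uses the closed-form expressions for integer degrees of freedom. All function names and the example counts are hypothetical.

```python
import math


def binomial_loglik(events: int, n: int) -> float:
    """Maximized Bernoulli log-likelihood for `events` flagged women out of `n`."""
    ll = 0.0
    if events > 0:
        ll += events * math.log(events / n)
    if events < n:
        ll += (n - events) * math.log((n - events) / n)
    return ll


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Q = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df = 2m + 1:
    # Q = erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) sum_{j=0}^{m-1} x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def center_effect_lr_test(counts):
    """counts: {center: (n_flagged, n_total)}. Likelihood ratio test of
    intercept + center dummies (one proportion per center) versus intercept only."""
    total_events = sum(e for e, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    ll_null = binomial_loglik(total_events, total_n)
    ll_full = sum(binomial_loglik(e, n) for e, n in counts.values())
    stat = 2 * (ll_full - ll_null)
    df = len(counts) - 1  # 40 centers -> 39 degrees of freedom
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for 40 centers, 1000 baseline mammograms each.
counts = {f"center{i:02d}": (30 + i, 1000) for i in range(40)}
stat, df, p = center_effect_lr_test(counts)
print(f"LR chi-square = {stat:.1f} on {df} df, P = {p:.3g}")
```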
RESULTS
The adjusted odds ratio for having a short-interval follow-up recommendation (versus a recommendation for routine 1- or 2-year follow-up) at baseline varied from 0.4 to 3.2 across WHI centers when we used the largest center as the reference group (data not shown). When we categorized the clinical centers into quartiles on the basis of the prevalence of short-interval follow-up recommendations at baseline, the predictive value of a short-interval follow-up recommendation did not differ statistically or clinically significantly across the quartiles (data not shown). If centers with a higher prevalence had been less selective in offering this recommendation, a lower predictive value would have been expected at those centers.
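The odds ratios above come from multivariable models. The sketch below is a simpler, unadjusted illustration of using the largest center as the reference group: it divides each center's odds of a short-interval recommendation by the reference center's odds. The counts and names are hypothetical, and no covariate adjustment is performed.

```python
def odds_ratios_vs_largest(counts):
    """counts: {center: (n_flagged, n_total)}.

    Returns the reference center (most mammograms) and the unadjusted odds
    ratio of a short-interval recommendation at every other center.
    """
    ref = max(counts, key=lambda c: counts[c][1])
    e_ref, n_ref = counts[ref]
    ref_odds = e_ref / (n_ref - e_ref)
    ratios = {c: (e / (n - e)) / ref_odds
              for c, (e, n) in counts.items() if c != ref}
    return ref, ratios


# Hypothetical counts: (mammograms with a short-interval flag, all mammograms).
ref, ors = odds_ratios_vs_largest({"A": (50, 1000), "B": (30, 300), "C": (10, 500)})
print(ref, {c: round(v, 2) for c, v in ors.items()})
```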
We extended our analysis by calculating the incidence of breast cancer among WHI participants who had a baseline mammogram described as "negative" or "benign" and then subsequently had an annual or biennial follow-up mammogram with a recommendation for short-interval follow-up (Table 5). The cumulative incidence of short-interval follow-up recommendations among the participants with "negative" or "benign" baseline mammograms was 7.2% (data not shown), which was slightly greater than the point prevalence of such recommendations at baseline among all participants (5%). Among women who had completed at least 2 years of further follow-up, the incidence of newly diagnosed breast cancer was 83.8% (likelihood ratio [LR] = 531, 95% CI = 234 to 1320), 24.5% (LR = 33.4, 95% CI = 26.8 to 40.9), and 2.1% (LR = 2.20, 95% CI = 1.65 to 2.86) within 2 years after receiving mammographic recommendations that were consistent with BI-RADS categories 5, 4, and 3, respectively. Among the 44 (2.1%) women who had a breast cancer diagnosis after a short-interval follow-up recommendation, 36 women had cancer involving the same breast, 34 women had invasive breast cancer, and five women had associated nodal involvement at the time of diagnosis (Table 5). By comparison, the incidence of newly diagnosed breast cancer was 0.5% (LR = 0.54, 95% CI = 0.43 to 0.66) and 0.4% (LR = 0.37, 95% CI = 0.28 to 0.47) within 2 years after "benign" and "negative" mammograms, respectively. The overall sensitivity of mammography, based on 2-year follow-up, was 58% if mammograms with short-interval follow-up recommendations were classified as positive and 43% if these mammograms were classified as negative.
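The two sensitivity figures differ only in whether cancers that followed a short-interval follow-up recommendation are counted as detected. The sketch below makes that reclassification explicit. The category labels and counts are hypothetical and are not taken from Table 5.

```python
def sensitivity(cancers_by_category, positive_categories):
    """Share of all 2-year cancers that followed a mammogram counted as positive."""
    total = sum(cancers_by_category.values())
    detected = sum(n for cat, n in cancers_by_category.items()
                   if cat in positive_categories)
    return detected / total


# HYPOTHETICAL counts of cancers diagnosed within 2 years, by the most serious
# mammographic category; not taken from the study tables.
cancers = {"negative": 30, "benign": 25, "short_interval": 15, "suspicious": 30}

print(sensitivity(cancers, {"suspicious"}))                    # 0.3
print(sensitivity(cancers, {"suspicious", "short_interval"}))  # 0.45
```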
DISCUSSION
Our primary analyses focused on baseline mammograms because use of hormone replacement therapy, a potentially important confounder of the association between mammographic abnormalities and breast cancer, was unknown for HRT-CT participants at follow-up. However, our secondary analyses of subsequent mammograms revealed a surprisingly low sensitivity of mammography: 58%, even when mammograms with a recommendation for short-interval follow-up were classified as positive. This high false-negative rate should be interpreted cautiously, given the unusually high vigilance of WHI participants and physicians with respect to breast health. Many of these women were taking, or believed that they were taking, a medication known to increase the risk of breast cancer. In addition, these women were relatively well educated and sufficiently motivated and concerned about their health to enroll in a major randomized clinical trial. Hence, it is likely that they were receiving regular clinical breast examinations and doing frequent breast self-examinations. These factors might have led to the earlier clinical detection of cancers that, under ordinary circumstances, would have been detected only by mammography.
Our findings regarding the prevalence and predictive value of short-interval follow-up recommendations are consistent with the predictive values of 2.0% or less that were reported by previous studies (9,10,19) of mammographic follow-up of "probably benign finding, short-interval follow-up suggested" (i.e., BI-RADS category 3) lesions. The only studies that reported higher predictive values than ours included subsets of category 3 mammograms that were referred for biopsy (9,20). We found that the predictive value of recommendations for short-interval follow-up was slightly higher among women 70–79 years of age than among younger women, principally because the prevalence of disease is greater among older women. By comparison with a prior study (21), we found a slightly lower positive predictive value associated with BI-RADS category 3 mammograms and a higher false-negative rate associated with BI-RADS category 1 and 2 mammograms. The superior performance of screening mammography in that study may be due to several factors, such as more aggressive follow-up or the possibility that higher quality mammography was performed at academic centers such as the University of California, San Francisco (9,19).
The temporal distribution of breast cancer diagnoses after baseline mammography suggests that a few cancers were, in fact, diagnosed earlier than they would have been had those women not received a recommendation for an early recall for follow-up mammography. However, the value of short-interval follow-up mammography was limited because only 11 cases of breast cancer, representing approximately 0.4% of the 2927 mammograms with that recommendation, were diagnosed during the 1-year period before the next regularly scheduled mammogram. Although 19 women were diagnosed with breast cancer in the second year after having a mammogram with such a recommendation, they appear not to have benefited from short-interval (6-month) follow-up because most of these cancers presumably would have been diagnosed at about the same time if follow-up had been scheduled at 1 year. This result is supported by other studies (9,20) that have examined relatively large numbers of category 3 mammograms and their subsequent clinical outcomes. For example, in one study (9) of 3184 category 3 mammograms, only two breast cancers (0.06%) were found at the 6-month follow-up.
The size and geographic distribution of the study sample and the completeness of data collection at each study center are important strengths of our study. The WHI has 40 centers in a wide range of both academic and community settings, which were selected to maximize generalizability of trial results. The reporting of mammography results is likely to be accurate, because all abnormal mammograms receive special attention. Each category of mammogram is required by the WHI protocol to have appropriate follow-up, and the results must be documented at semiannual and annual visits before the study pills are dispensed. Our study also had the advantage that false-negative baseline mammograms could be identified because the WHI protocol requires aggressive follow-up of every participant. Participants who stop visiting the clinic are still contacted to obtain study endpoint data, which include the diagnosis of breast cancer from hospital and cancer registry data.
The quality of mammography varies greatly across the United States, such that the predictive value of an abnormal screening mammogram may be three times higher in academic centers than in community-based practices (22). WHI clinical centers use both university-based and community-based mammographic facilities, and the proportions of each vary from center to center. Reliability of mammographic interpretations may be suboptimal because each mammogram was probably interpreted by only one radiologist. Although these design features may explain why the positive predictive value of mammograms with a short-interval follow-up recommendation and the negative predictive value of "negative" and "benign" mammograms were lower among women participating in the WHI than among women in some prior studies (19,20), they also enhance the generalizability of our results to community practice.
Our study has several limitations. First, the results of our study are not applicable to premenopausal women or women aged 40–49 years, who were ineligible to participate in the WHI. The predictive value of short-interval follow-up recommendations among premenopausal women may be lower or higher than the estimates reported here because breast cancers in this age group are less prevalent but more aggressive than they are in postmenopausal women (23,24). Second, we focused on radiologists' recommendations for short-interval follow-up rather than on the BI-RADS assessment scheme per se. We did so for two reasons. BI-RADS usage was not mandated under the Mammography Quality Standards Act (25) until April 1999; therefore, WHI data collectors had difficulty retrospectively assigning some prior mammography reports to BI-RADS categories. In addition, radiologists' assessments and recommendations are sometimes inconsistent. In the primary care setting, physicians are likely to focus on and follow the radiologist's final recommendation and only cursorily scan the accompanying explanation and assessment. Third, the results of interim steps in evaluating abnormal findings, such as 3- to 6-month follow-up mammography after a baseline recommendation for short-interval follow-up, are not documented in the WHI database. Therefore, we have no information on the diagnostic pathway by which women were proven to have, or not to have, breast cancer after receiving a mammographic recommendation for short-interval follow-up. We also have no data on whether a radiologist requested additional imaging before offering a definitive interpretation.
Given that recommendations for short-interval follow-up may account for 40%–50% or more of abnormal screening mammograms (12,21), our results should stimulate re-examination of the criteria used to make this recommendation. In addition, the recommended timing of short-interval follow-up at 6 months should be critically examined. BI-RADS category 3 mammograms have economic implications for health care delivery, and emotional consequences for the women who receive them, because they inevitably lead to more mammographic examinations and, probably, more biopsies for changes that turn out to be benign. Our results, and those of other recent studies (5,9,21), suggest that a 1-year follow-up of BI-RADS category 3 mammograms may be as appropriate as, or more appropriate than, a 6-month follow-up. The higher predictive value of new BI-RADS category 3 abnormalities, after a woman has had a normal baseline mammogram, suggests that the availability of prior mammograms may improve the usefulness of this classification.
NOTES
Supported by N01-WH-32100 through -2102, -2105, -2106, -2108 through -2113, -2115, -2118 through -2120, and -2122; M01-RR-00-425; N01-WH-4-2107 through -2126, and -2129 through -2132 as part of the WHI program (funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services).
REFERENCES
1 Humphrey LL, Helfand M, Chan BK, Woolf SH. Breast cancer screening: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2002;137(5 Part 1):347–60.
2 American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS). 3rd ed. Reston (VA): American College of Radiology; 1998. [Last accessed: 02/06/2003.] Available at: http://www.imaginis.com/breasthealth/acrbi.asp and then click on the first link under "Additional Resources and References" at the bottom of the page (http://www.acr.org/cgi-bin/fr?mast:masthead-products,text:/departments/standaccred/birads-a.html).
3 Velanovich V. Immediate biopsy versus observation for abnormal findings on mammograms: an analysis of potential outcomes and costs. Am J Surg 1995;170:327–32.
4 Hall FM, Storella JM, Silverstone DZ, Wyshak G. Non-palpable breast lesions: recommendations for biopsy based on suspicion of carcinoma at mammography. Radiology 1988;167:353–8.
5 Caplan LS, Blackman D, Nadel M, Monticciolo DL. Coding mammograms using the classification "probably benign finding--short interval follow-up suggested". AJR Am J Roentgenol 1999;172:339–42.
6 Cyrlak D. Induced costs of low cost screening mammography. Radiology 1988;168:661–3.
7 Barton MB, Moore S, Polk S, Shtatland E, Elmore JG, Fletcher SW. Increased patient concern after false-positive mammograms: clinician documentation and subsequent ambulatory visits. J Gen Intern Med 2001;16:150–6.
8 Ong G, Austoker J, Brett J. Breast screening: adverse psychological consequences one month after placing women on early recall because of diagnostic uncertainty. A multicenter study. J Med Screen 1997;4:158–68.
9 Sickles EA. Periodic mammographic follow-up of probably benign lesions: results in 3,184 consecutive cases. Radiology 1991;179:463–8.
10 Orel SG, Kay N, Reynolds C, Sullivan DC. BI-RADS categorization as a predictor of malignancy. Radiology 1999;211:845–50.
11 The Women's Health Initiative Study Group. Design of the Women's Health Initiative Clinical Trial and Observational Study. Controlled Clin Trials 1998;19:61–109.
12 Taplin SH, Ichikawa LE, Kerlikowske K, Ernster VL, Rosenberg RD, Yankaskas BC, et al. Concordance of breast imaging reporting and data system assessments and management recommendations in screening mammography. Radiology 2002;222:529–35.
13 Imaginis. Breast cancer diagnosis. Mammogram interpretation: categories and the ACR/BI-RADS. [Last accessed: 02/06/2003.] Available at: http://www.imaginis.com/breasthealth/acrbi.asp and then click on the first link under "Additional Resources and References" at the bottom of the page (http://www.acr.org/cgi-bin/fr?mast:masthead-products,text:/departments/standaccred/birads-a.html).
14 Roberts MM, Alexander FE, Anderson TJ, Chetty U, Donnan PT, et al. Edinburgh trial of screening for breast cancer: mortality at seven years. Lancet 1990;335:241–6.
15 Mossman D, Berger JO. Intervals for posttest probabilities: a comparison of 5 methods. Med Decis Making 2001;21:498–507.
16 Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic studies. J Clin Epidemiol 1991;44:763–70.
17 Weissert WG. Estimating the long-term care population: prevalence rates and selected characteristics. Health Care Financ Rev 1985;6:83–91.
18 Specificity of screening in United Kingdom trial of early detection of breast cancer. BMJ 1992;304:346–9.
19 Kerlikowske K, Grady D, Barclay J, Sickles EA, Ernster VL. Likelihood ratios for modern screening mammography. Risk of breast cancer based on age and mammographic interpretation. JAMA 1996;276:39–43.
20 Varas X, Leborgne F, Leborgne JH. Nonpalpable, probably benign lesions: role of follow-up mammography. Radiology 1992;184:409–14.
21 Lacquement MA, Mitchell D, Hollingsworth AB. Positive predictive value of the Breast Imaging Reporting and Data System. J Am Coll Surg 1999;189:34–40.
22 Brown ML, Houn F, Sickles EA, Kessler LG. Screening mammography in community practice: positive predictive value of abnormal findings and yield of follow-up diagnostic procedures. AJR Am J Roentgenol 1995;165:1373–7.
23 Retsky M, Demicheli R, Hrushesky W. Breast cancer screening for women aged 40–49 years: screening may not be the benign process usually thought. J Natl Cancer Inst 2001;93:1572.
24 Retsky M, Demicheli R, Hrushesky W. Premenopausal status accelerates relapse in node positive breast cancer: hypothesis links angiogenesis, screening controversy. Breast Cancer Res Treat 2001;65:217–24.
25 Code of Federal Regulations: Quality Mammography Standards. Final Rule. 21 C.F.R. Parts 16 and 900 (1997). [Last accessed: 02/06/2003.] Available at: http://www.fda.gov/cdrh/mammography/mqsa_accomplishments.html.
Manuscript received February 27, 2002; revised January 7, 2003; accepted January 21, 2003.