ARTICLE

Performance of Diagnostic Mammography for Women With Signs or Symptoms of Breast Cancer

William E. Barlow, Constance D. Lehman, Yingye Zheng, Rachel Ballard-Barbash, Bonnie C. Yankaskas, Gary R. Cutter, Patricia A. Carney, Berta M. Geller, Robert Rosenberg, Karla Kerlikowske, Donald L. Weaver, Stephen H. Taplin

Affiliations of authors: W. E. Barlow, Center for Health Studies, Group Health Cooperative, Department of Biostatistics, University of Washington, Seattle; C. D. Lehman, Department of Radiology, Seattle Cancer Care Alliance, University of Washington; Y. Zheng, Department of Biostatistics, University of Washington; R. Ballard-Barbash, Applied Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD; B. C. Yankaskas, Department of Radiology, University of North Carolina, Chapel Hill; G. R. Cutter, Center for Research Design and Statistical Methods, University of Nevada at Reno; P. A. Carney, Norris Cotton Cancer Center/ Dartmouth-Hitchcock Medical Center/Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH; B. M. Geller, Health Promotion Research, College of Medicine, University of Vermont, Burlington; R. Rosenberg, Department of Radiology, University of New Mexico, Albuquerque; K. Kerlikowske, Department of Epidemiology and Biostatistics, and General Internal Medicine Section, Department of Veterans Affairs, University of California, San Francisco; D. L. Weaver, Department of Pathology, University of Vermont, College of Medicine, Burlington; S. H. Taplin, Center for Health Studies, Group Health Cooperative, and Department of Family Medicine, University of Washington.

Correspondence to: William E. Barlow, Ph.D., Center for Health Studies, Group Health Cooperative, 1730 Minor Ave., Suite 1600, Seattle, WA 98101-1448 (e-mail: barlow.w@ghc.org).


ABSTRACT
Background: The performance of diagnostic mammography for women with signs or symptoms of breast cancer has not been well studied. We evaluated whether age, breast density, self-reported breast lump, and previous mammography influence the performance of diagnostic mammography. Methods: From January 1996 through March 1998, prospective diagnostic mammography data from women aged 25–89 years with no previous breast cancer were linked to cancer outcomes data in six mammography registries participating in the Breast Cancer Surveillance Consortium. We used the final mammographic assessment at the end of the imaging work-up to determine abnormal mammographic examination rate, positive predictive value (PPV), sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. We used age, breast density, prior mammogram, and self-reported breast lump jointly as predictors of performance. All statistical tests were two-sided. Results: Of 41 427 diagnostic mammograms, 6279 (15.2%) were judged abnormal. The overall PPV was 21.8%, sensitivity was 85.8%, and specificity was 87.7%. Multivariate analysis showed that sensitivity and specificity generally declined as breast density increased (P = .007 and P<.001, respectively), that previous mammography decreased sensitivity (odds ratio [OR] = 0.52, 95% confidence interval [CI] = 0.36 to 0.74; P<.001) but increased specificity (OR = 1.43, 95% CI = 1.31 to 1.57; P<.001), and that a self-reported breast lump increased sensitivity (OR = 1.64, 95% CI = 1.13 to 2.38; P = .013) but decreased specificity (OR = 0.54, 95% CI = 0.49 to 0.59; P<.001). ROC analysis showed that higher breast density and previous mammography were negatively related to accuracy (P<.001 for both). Conclusions: Diagnostic mammography in women with signs or symptoms of breast cancer shows higher sensitivity and lower specificity than screening mammography does. Higher breast density and previous mammographic examination appear to impair performance.



INTRODUCTION
Diagnostic mammography is commonly used to identify possible breast cancers in women who present with signs or symptoms of the disease. These signs or symptoms may include a palpable breast lump, nipple discharge or retraction, and breast dimpling or other skin changes. A diagnostic mammographic examination usually consists of standard screening views and additional views using spot compression and/or magnification of a specific area. Although mammography may be sufficient to evaluate the clinical finding, additional imaging with ultrasound, ductography, or other imaging techniques may also be done.

Sensitivity and specificity have been well studied for screening mammography (1–3) but not for diagnostic evaluations. Diagnostic mammography may outperform screening mammography because noticeable symptoms or clinical findings may indicate a more advanced tumor that is easier to locate and identify. Dee and Sickles (4) found that tumors detected by diagnostic mammography were larger than those detected by screening mammography. However, the performance of diagnostic mammography could also be altered by characteristics of the woman, including age, breast density, history of previous mammography, and presence of breast cancer symptoms.

We performed a large prospective study of women with signs or symptoms of breast cancer who were undergoing diagnostic mammography. The records from the mammography registries of the Breast Cancer Surveillance Consortium (BCSC), a population-based consortium of community radiology practices (5), were linked with cancer outcomes. We evaluated the performance of diagnostic mammography and how performance may be influenced by the characteristics of the women undergoing mammography.


METHODS
Study Sample and Population

For this study, we used diagnostic mammograms from six mammography registries of the BCSC performed from January 1996 through March 1998. Two other mammography registries were excluded because all cancer outcomes among women undergoing mammography had not been identified at the time of this study. Five of six mammography registries ask women to complete a questionnaire in conjunction with the mammogram. Only these five registries were used in analyses that examined individual patient factors that affected performance. Furthermore, only these five registries supply a linkage variable that allows mammograms interpreted by the same radiologist to be identified. The identities of the women, radiologists, and mammography facilities were encrypted before analysis so that no identifying information for patients, radiologists, or facilities was included. Individual mammography registries are not identified in this report. Institutional Review Board approval for use of the information for research was obtained at each registry, and each mammography registry has a federal certificate of confidentiality. Complete details of the confidentiality procedures have been described (6).

The BCSC categorizes the indication for mammography into one of four mutually exclusive categories: 1) screening (asymptomatic), 2) evaluation of breast problem (symptomatic), 3) additional evaluation of recent mammogram, or 4) short-interval follow-up. We included only examinations that were recorded as evaluation of breast problem (symptomatic) by the radiology facility. For simplicity, we call these "diagnostic mammograms." We used only the first diagnostic mammographic examination in the rare situation that a woman had had more than one. We excluded mammograms recorded as additional evaluation of a recent mammogram or as short-interval follow-up. We also excluded mammograms from patients who had had previous breast cancer (11.4%) or breast implants (1.7%) or who were younger than 25 years or older than 89 years (1.0%). Eight percent of the mammograms performed for the five registries were excluded because breast density data were either missing or not recorded on the four-point BI-RADS™ (Breast Imaging Reporting and Data System) scale (7). We included only those mammograms that had a BI-RADS™ assessment (described below) (7). Most of the examination records stated whether ultrasound was performed on the same day as the diagnostic mammographic examination and whether the result was used to arrive at the BI-RADS™ assessment. Initially, a BI-RADS™ assessment may have been made, and further imaging performed within 3 months may have modified the assessment. For other cases, an initial assessment was never recorded, and only the final assessment was recorded after all imaging had been performed. We used the final recorded assessment to evaluate the performance of a sequence of imaging that began with the diagnostic mammogram. For some cases, there was evidence that further imaging was done, but a final BI-RADS™ code was never assigned. These cases were included in our analyses as diagnostic mammograms with a BI-RADS™ assessment of 0.

We did not require that diagnostic views be taken because, in some cases, the "standard screening" views (craniocaudal and mediolateral oblique full breast views) would have been sufficient to either identify the abnormality or determine that further imaging was unnecessary. We did require that the radiologists characterize the indication for the examination as diagnostic.

Classification of Examinations as Positive or Negative for Analysis

The BCSC has adopted a standard definition of a "positive" mammographic examination to evaluate mammography. Positive mammograms include all those that have a BI-RADS™ assessment of 0 ("Need additional imaging evaluation"), 4 ("Suspicious abnormality; biopsy should be considered"), or 5 ("Highly suggestive of malignancy; appropriate action should be taken"). Additionally, positive mammograms include those that had an assessment of 3 ("Probably benign finding; short-interval follow-up suggested"), with an associated recommendation for immediate work-up (additional imaging, clinical evaluation, or biopsy). Negative mammograms include those with a BI-RADS™ assessment of 1 ("Normal"), 2 ("Benign finding"), or 3 with a recommendation of short-term or normal follow-up rather than immediate work-up. The BI-RADS™ assessment was also used as an ordinal measure of suspicion for the receiver operating characteristic (ROC) analysis, which then allows alternative classifications of positive and negative. In summary, BI-RADS™ assessments of 1 and 2 were called negative, and assessments of 0, 4, and 5 were called positive, regardless of recommendations. Mammograms given a BI-RADS™ assessment of 3 were called positive if there was a recommendation for immediate work-up; otherwise they were called negative. Rosenberg et al. (8) describe how these definitions affect the performance of screening mammography.
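To make this operational definition concrete, it can be written as a small decision rule. The sketch below is illustrative only; the function and argument names are ours, not BCSC data elements:

```python
def is_positive(birads: int, immediate_workup: bool = False) -> bool:
    """BCSC operational definition of a positive examination, as described above.

    birads: final BI-RADS assessment (0-5).
    immediate_workup: for BI-RADS 3 only, whether immediate work-up
        (additional imaging, clinical evaluation, or biopsy) was recommended
        rather than short-interval or normal follow-up.
    """
    if birads in (0, 4, 5):   # needs more imaging / suspicious / highly suggestive
        return True
    if birads == 3:           # probably benign: positive only with immediate work-up
        return immediate_workup
    return False              # BI-RADS 1 (normal) or 2 (benign finding)
```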

Breast Cancer Identification and Modifying Variable Information

Mammography and cancer/pathology registries were linked by using demographic information (e.g., name, address, birth date) to identify women who had mammography and a subsequent breast cancer diagnosis. The linking was done by each mammography registry to ensure privacy of the patient records. Only histologically confirmed invasive and ductal carcinoma in situ (DCIS) breast cancers were included in the analysis. The follow-up period was 1 year after the diagnostic mammogram, although results were also computed for a 3-month follow-up period for comparison. Subsequent breast cancer may have been detected clinically or by subsequent mammography during the 1-year follow-up period.

The individual patient modifying factors that were considered were age, breast density, self-reported breast lump, and previous mammography. We assume that this information may be readily available to a radiologist when interpreting the mammographic examination. Age at the time of mammography was categorized into five groups: 25–39, 40–49, 50–64, 65–79, and 80–89 years. The division at age 65 was chosen because Medicare reimbursement may affect the classification of mammograms as screening or diagnostic. The age category 40–49 years was created to accommodate the policies of health care organizations that recommend that screening mammography start at either age 40 or age 50. Breast density was classified by the radiologist on the four-point BI-RADS™ scale: 1) almost entirely fat, 2) scattered fibroglandular densities, 3) heterogeneously dense, and 4) extremely dense (7). At the time of their mammogram, women reported whether they had had breast changes in the previous 3 months; most were asked specifically about the presence of a breast lump, but other symptoms were collected differently across mammography registries. If there was no positive report of a breast lump, the woman was classified as not having a breast lump. Only the presence of a breast lump by self-report was available for this study; information was not available on whether a palpable mass was detected by a clinician. In the absence of reported symptoms, the radiologist may have had other reasons for classifying the mammographic examination as diagnostic, such as a finding of atypical hyperplasia or lobular carcinoma in situ during a previous biopsy (9). Previous mammography was assessed by using both self-reported and mammography registry information.

Performance Measures and Statistical Analysis

The primary performance outcomes are positive predictive value (PPV), sensitivity, specificity, and ROC area. All of these outcomes require evaluation of the true disease status at the time of the mammogram. However, in the absence of a positive mammogram, the true status of the woman may not be observed because she may not be sent for additional testing, including a biopsy. We assumed that breast cancers not detected by the diagnostic mammographic examination would be detected clinically or mammographically within 1 year of the diagnostic mammogram. This 1-year follow-up period is commonly used to assess the accuracy of screening mammography (10). We also assessed how patient factors affect the likelihood of having a positive mammogram, i.e., the abnormal mammographic examination rate.

We analyzed the performance measures, adjusting for woman-specific covariates. Abnormal mammographic examination rate was the number of positive mammograms divided by the number of diagnostic mammograms performed. PPV was the proportion of women who had a diagnosis of breast cancer within the follow-up period among women who had a positive mammogram. Sensitivity was the proportion of positive mammograms among women who were diagnosed with breast cancer within the follow-up period. Specificity was the proportion of negative mammograms among women who were not diagnosed with breast cancer during the follow-up period.
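As a concrete illustration, the four measures reduce to simple ratios over a two-by-two table of final assessment against cancer status within the follow-up period. The counts below are approximate, back-calculated from the overall results reported in this article (41 427 mammograms, 6279 positive, 1598 cancers, sensitivity 85.8%), and are shown only to make the definitions explicit:

```python
tp = 1371   # positive mammogram, breast cancer within 1 year (~0.858 * 1598)
fn = 227    # negative mammogram, breast cancer within 1 year
fp = 4908   # positive mammogram, no breast cancer
tn = 34921  # negative mammogram, no breast cancer

abnormal_rate = (tp + fp) / (tp + fp + fn + tn)  # ~0.152
ppv = tp / (tp + fp)                             # ~0.218
sensitivity = tp / (tp + fn)                     # ~0.858
specificity = tn / (tn + fp)                     # ~0.877
npv = tn / (tn + fn)                             # ~0.994 (negative predictive value)
```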

Abnormal mammographic examination rate, PPV, sensitivity, and specificity were analyzed by using logistic regression for each covariate separately and for all covariates together in a multivariate model. Covariates included age group, breast density, self-reported breast lump, and previous mammogram. In the multivariate models, we adjusted for mammography registry, but the results are not presented for reasons of confidentiality. We also recognized that mammograms evaluated by the same radiologist might have had assessments that were more similar than mammograms evaluated by different radiologists. Consequently, for analysis, we used generalized estimating equations (GEE) with an independent working correlation matrix to adjust the standard errors for the correlation of mammograms from the same radiologist (11). For the logistic regression models, we used the SAS (12) procedure GENMOD with an empirical covariance matrix. All P values were two-sided, with α = 0.05. The same covariates were included in the multivariate model for each outcome, and each covariate was evaluated for statistical significance, adjusting for the other factors in the model. Odds ratios and associated 95% two-sided confidence intervals were constructed from the modeled parameter estimates and robust standard errors.
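The analysis used SAS PROC GENMOD; for readers working in Python, an approximately equivalent GEE model could be specified as in the sketch below (the file and column names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical input: one row per diagnostic mammogram, with a 0/1 outcome
# (here, a positive assessment) and the woman-specific covariates.
df = pd.read_csv("diagnostic_mammograms.csv")

model = smf.gee(
    "positive ~ C(age_group) + C(density) + lump + prior_mammo + C(registry)",
    groups="radiologist_id",                  # cluster on the interpreting radiologist
    data=df,
    family=sm.families.Binomial(),            # logistic regression
    cov_struct=sm.cov_struct.Independence(),  # independent working correlation
)
result = model.fit()    # GEE reports robust (sandwich) standard errors by default
print(result.summary())

odds_ratios = np.exp(result.params)       # odds ratios
or_ci = np.exp(result.conf_int())         # 95% CIs on the odds-ratio scale
```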

We used ROC analysis to assess simultaneous effects on sensitivity and specificity. For the ROC analysis, the modified BI-RADS™ assessment categories were ordered (1, 2, 3 with no immediate work-up, 3 with immediate work-up, 0, 4, and 5) in accordance with an increasing likelihood of a breast cancer diagnosis. A binormal model was fit under the assumption that cases and noncases have normal distributions on a single underlying latent scale (13). Tosteson and Begg (14) have shown that this ROC analysis corresponds to performing ordinal regression on the ordered categories. Covariates can affect either the cut points for the BI-RADS™ categories or the difference between the diseased and nondiseased distributions. The cut points determine whether a mammographic examination is judged positive or negative on a particular ROC curve. The difference between the diseased and nondiseased distributions determines the location of the ROC curve. The area under the ROC curve (AUC) measures overall accuracy, with a value of 0.50 indicating purely random performance and 1.00 indicating perfect discrimination. The ordinal regression model was fit using the SAS procedure NLMIXED (12). We allowed the scale parameter to differ between the cases and noncases, but otherwise the scale did not depend on covariates. The NLMIXED procedure does not give empirical (robust) standard errors, so a program was written in the SAS procedure IML to compute these variance estimates. Thus, we could compute confidence intervals (CIs) adjusted for the correlation within reader when evaluating the ROC curves.
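The covariate-adjusted binormal fit requires specialized code, but the unadjusted empirical AUC on the ordered scale is straightforward: it equals the Mann-Whitney probability that a randomly chosen case receives a more suspicious assessment than a randomly chosen noncase, with ties counted one half. A minimal sketch:

```python
import numpy as np

def empirical_auc(case_scores, noncase_scores):
    """Empirical AUC from ordinal assessments via the Mann-Whitney statistic.

    Scores are the modified BI-RADS categories recoded in increasing order
    of suspicion: 1, 2, 3 without immediate work-up, 3 with immediate
    work-up, 0, 4, 5 -> ranks 0..6.
    """
    cases = np.asarray(case_scores, dtype=float)[:, None]
    noncases = np.asarray(noncase_scores, dtype=float)[None, :]
    # Proportion of (case, noncase) pairs in which the case scores higher,
    # counting tied pairs as one half.
    return ((cases > noncases) + 0.5 * (cases == noncases)).mean()

# Under the binormal model ROC(t) = Phi(a + b * inverse-Phi(t)), the modeled
# area has the closed form AUC = Phi(a / sqrt(1 + b**2)).
```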


RESULTS
We analyzed 41 427 diagnostic mammograms from women at six mammography registries that were linked to 1598 invasive and DCIS breast cancers diagnosed within 1 year of the mammogram. Most (91.5%) of the cancers were detected within 3 months of the mammographic examination and most (91.4%) were invasive, with a median tumor size of 20.0 mm.

Of the 41 427 diagnostic mammograms, 6279 (15.2%) were called positive. For cancers identified within 1 year, diagnostic mammography had a sensitivity of 85.8% (95% CI = 84.1% to 87.5%), a specificity of 87.7% (95% CI = 87.4% to 88.0%), and a PPV of 21.8% (95% CI = 20.8% to 22.9%). The negative predictive value (number of true negative mammograms/number of mammograms called negative) was 99.4%, too high to permit further modeling of covariate effects. When we included only those cancers detected within 3 months, diagnostic mammography had a sensitivity of 89.8% (95% CI = 88.3% to 91.4%), a specificity of 87.6% (95% CI = 87.3% to 87.9%), and a PPV of 20.9% (95% CI = 19.9% to 21.9%). Sensitivity declined over the longer follow-up period, whereas specificity and PPV increased slightly. Because the pattern of results was similar for both follow-up periods, we present data using only the 1-year follow-up.

For the multivariate model, we included data only for the five registries that had women complete questionnaires at the time of the mammographic examination and that reported an encrypted radiologist identifier permitting identification of all examinations from the same radiologist. This yielded 32 748 diagnostic mammograms with an overall sensitivity of 85.9% (95% CI = 84.0% to 87.8%) and specificity of 88.0% (95% CI = 87.7% to 88.4%)—very similar to the results from all six registries. The mammography registry linked the mammograms to 1340 breast cancers diagnosed within 1 year of the mammogram, which corresponds to a rate of 40.9 cancers per 1000 diagnostic mammograms. The mammograms were interpreted by 654 radiologists, with the number of mammograms read per reader ranging from 1 to 2066 (median = 13).

Information on whether ultrasound was used to make the assessment recorded on the day of the initial diagnostic mammographic examination was available for 89.4% of all examinations. Ultrasound was used to make the assessment in 21.4% of these 29 283 examinations. Use of ultrasound increased dramatically with breast density (8.6%, 16.8%, 24.6%, and 29.5% for breast density categories 1–4, respectively). Ultrasound use was also associated with the mammographic examination assessment: ultrasound was used in 38.4% of mammograms called positive but in only 18.4% of mammograms called negative.

Table 1 shows the distribution of characteristics for women with and without breast cancer at the end of the 1-year follow-up period. The percentage of women reporting a breast lump was higher (72.2%) in women with a diagnosed breast cancer than in those without a diagnosed breast cancer (47.4%). Similarly, the percentage reporting any breast symptoms (including breast lump) was higher (83.8%) among women with breast cancer than among women without breast cancer (75.6%). Table 1 also allows one to compute alternative measures of sensitivity and specificity using different criteria for calling the mammographic examination positive. For example, if only BI-RADS™ assessments of 4 and 5 are called positive, then sensitivity and specificity would be 74.9% and 96.3%, respectively. Including a BI-RADS™ assessment of 0 as positive increases sensitivity to 81.4%, but specificity decreases to 92.7%. Adding all BI-RADS™ assessments of 3, without regard to recommendations, would yield a sensitivity of 88.7% with a specificity of 80.7%.
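These alternative operating points come from sweeping the positivity threshold down the ordered scale. A sketch of the computation, given per-category counts such as those in Table 1 (the lists below contain placeholder counts, not the actual table values):

```python
# Categories ordered from least to most suspicious on the modified scale:
# 1, 2, 3 without immediate work-up, 3 with immediate work-up, 0, 4, 5.
cancer = [10, 20, 30, 40, 50, 400, 500]          # placeholder counts per category
no_cancer = [9000, 8000, 700, 600, 500, 400, 300]

def operating_points(cancer, no_cancer):
    """(sensitivity, specificity) when each category and above is called positive."""
    n_cases, n_noncases = sum(cancer), sum(no_cancer)
    points = []
    for cut in range(len(cancer)):
        sens = sum(cancer[cut:]) / n_cases        # cases called positive at this cut
        spec = sum(no_cancer[:cut]) / n_noncases  # noncases called negative
        points.append((sens, spec))
    return points
```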


Table 1. Risk factors and BI-RADS™ assessment by breast cancer status for women in the Breast Cancer Surveillance Consortium*
 
Effect of Individual Covariates

We next assessed the effect of each covariate on the performance measures (Table 2). Overall, 15.0% of the mammograms were called abnormal (positive). The abnormal mammographic examination rate was influenced by age, breast density, presence of a breast lump, and previous mammography (P<.001 for each covariate). The probability of having a positive mammographic examination was highest for those aged 65–79 years, those with increased breast density, those with a breast lump, and those without a previous mammogram. Among women with a positive diagnostic mammogram, 23.5% were diagnosed with breast cancer within 1 year. The PPV generally increased with age except for the oldest age group, decreased with breast density, and increased with a self-reported breast lump (P<.001 for each covariate). The PPV was not associated with whether the woman had a previous mammogram.


Table 2. Univariate analysis of the performance of diagnostic mammography by age, breast density, self-reported breast lump, and previous mammography for women in the Breast Cancer Surveillance Consortium*
 
Sensitivity was higher in those self-reporting a breast lump (P = .039), those without previous mammography (P<.001), and those with lower breast density (P = .016) but was unrelated to age (P = .099). Specificity generally decreased with age, with increasing breast density, among women who reported a breast lump, and among women having their first mammographic examination (P<.001 for all covariates).

Effect of Covariates by Multivariate Analysis

We also assessed the effect of each variable in a multivariate model, adjusting for the other covariates (Table 3). For sensitivity, specificity, and PPV, odds ratios (ORs) greater than 1 indicate improved effects. For the abnormal mammographic examination rate, ORs greater than 1 indicate a greater likelihood of the mammogram's being called abnormal (positive mammogram). Age, breast density, presence of a self-reported breast lump, and previous mammographic examination jointly predicted an abnormal mammographic examination (Table 3; P<.001 for all). The ORs for calling the mammographic examination positive increased with age, breast density, and self-reported breast lump and decreased if the woman had had a previous mammogram. In the multivariate model, PPV increased strongly with age and self-reported breast lump and decreased with previous mammography (P<.001 for all).


Table 3. Multivariate analysis of the performance of diagnostic mammography using mammography registry,* age, breast density, self-reported breast lump, and previous mammography for women in the Breast Cancer Surveillance Consortium†
 
After controlling for all variables, sensitivity varied statistically significantly by breast density (P = .007), by self-reported breast lump (P = .013), and by whether a previous mammographic examination had been performed (P<.001). Compared with the referent breast density category 4 (extremely dense), sensitivity was highest for category 2 (scattered fibroglandular densities) (OR = 2.79, 95% CI = 1.63 to 4.78). Higher sensitivity was observed in women who reported a breast lump (OR = 1.64, 95% CI = 1.13 to 2.38). Having a previous mammographic examination was associated with statistically significantly lower sensitivity (OR = 0.52, 95% CI = 0.36 to 0.74).

Strong predictors of specificity included breast density, self-reported breast lump, and previous mammographic examination (P<.001 for all). Specificity decreased as breast density increased. Specificity was negatively related to the presence of a breast lump (OR = 0.54, 95% CI = 0.49 to 0.59) and positively related to ever having had a previous mammographic examination (OR = 1.43, 95% CI = 1.31 to 1.57). Age was also a statistically significant, but inconsistent, predictor of specificity (P = .034).

ROC Analysis

To determine an overall estimate of the accuracy of diagnostic mammography, we performed ROC analysis. Fig. 1 shows the empirical (not modeled) ROC curve, with a point at each value on the modified BI-RADS™ ordinal scale without adjustment for covariates. The overall AUC was 0.913 (95% CI = 0.903 to 0.923), which suggests high overall accuracy of diagnostic mammography.



Fig. 1. Diagnostic mammography empirical receiver operating characteristic (ROC) curve. The modified ordinal BI-RADS™ (Breast Imaging Reporting and Data System) scale for diagnostic mammography was used for women in the Breast Cancer Surveillance Consortium. Empirical or observed true positive rate (sensitivity) is plotted against the empirical or observed false positive rate (1 minus specificity) for each criterion point. The diagonal line represents chance performance. Starting from the lower left-hand corner of the graph, the cut points (solid triangles) for the mammographic examination BI-RADS™ assessments being called positive are 5, 4, 0, 3 with immediate work-up, 3 with no immediate work-up, and 2.

 
To determine the factors that influence accuracy, we modeled the ROC curve in a joint model using age, breast density, self-reported breast lump, and previous mammogram. Of particular interest was the interaction of disease with each level of the covariate, because this indicates a difference in AUC. Accuracy decreased as breast density increased, even after adjusting for other factors (P<.001; AUC breast density category 1 (almost entirely fat) = 0.960, 95% CI = 0.944 to 0.975; AUC breast density category 2 = 0.947, 95% CI = 0.928 to 0.966; AUC breast density category 3 = 0.905, 95% CI = 0.874 to 0.935; AUC breast density category 4 = 0.866, 95% CI = 0.827 to 0.906) (Fig. 2). Having had a previous mammographic examination actually decreased accuracy (P<.001; AUC with no previous mammographic examination = 0.948, 95% CI = 0.920 to 0.975; AUC with previous mammographic examination = 0.909, 95% CI = 0.867 to 0.952) (Fig. 3). After adjusting for breast density and other factors, accuracy was not affected by age (P = .20) or self-reported breast lump (P = .54).



Fig. 2. Diagnostic mammography modeled receiver operating characteristic (ROC) curves for adjusted breast density. Density was adjusted for mammography registry, age group, self-reported breast lump, and previous mammography determined by the modified ordinal BI-RADS™ (Breast Imaging Reporting and Data System) scale for women in the Breast Cancer Surveillance Consortium. Modeled ROC curves for each level of breast density are shown after adjustment for the joint distribution of other covariates in the model. True positive rate (sensitivity) is plotted against the false positive rate (1 minus specificity). The effect of breast density is statistically significant (P<.001; two-sided test), after adjustment for the other covariates and correlation of assessments within each reader, with accuracy decreasing as breast density increases. The diagonal line represents chance performance.

 


Fig. 3. Diagnostic mammography modeled receiver operating characteristic (ROC) curves for adjusted previous mammogram. Mammogram data were adjusted for mammography registry, breast density, age, and self-reported breast lump determined by the modified ordinal BI-RADS™ (Breast Imaging Reporting and Data System) scale for women in the Breast Cancer Surveillance Consortium. Modeled ROC curves for women with no previous mammographic examination (solid line) and with a previous mammographic examination (broken line) are shown after adjustment for the joint distribution of other covariates in the model. True positive rate (sensitivity) is plotted against the false positive rate (1 minus specificity). The effect of having a previous mammographic examination is statistically significant (P<.001; two-sided test), after adjustment for the other covariates and the correlation of assessments within each reader. The diagonal line represents chance performance.

 

DISCUSSION
This study, an extension of our previous work (9), assessed the overall performance of diagnostic mammography and how that performance depended on age, breast density, self-reported breast lump, and previous mammography. Older age was associated with higher abnormal mammographic examination rates and higher PPV but was not strongly associated with sensitivity, specificity, or ROC area. As breast density increased, more mammograms were called positive and specificity declined. The ROC analysis shows that accuracy declined markedly as breast density increased. Although women with dense breasts were more likely to receive ultrasound before a mammographic assessment was made, diagnostic mammography performance was still lowest in this group.

The presence of a self-reported breast lump was positively related to sensitivity but negatively related to specificity. Unfortunately, the result of a clinical examination of the suspected breast lump was not available to us. Consequently, we must assume that the woman's self-report is highly associated with the actual presence of a mass. This seems a reasonable assumption because these are diagnostic mammograms in symptomatic women. The reduced specificity may arise because a woman with a palpable lump is more likely than a woman without a lump to have a mammographic examination with a finding that warrants biopsy. Although some palpable lumps prove malignant on mammographic work-up, many are benign masses, such as fibroadenomas or complex cysts. The decreased specificity of mammograms in women with reported breast lumps could also result from radiologists being more likely to recommend biopsy of a palpable lump, regardless of the mammographic findings (15). The lack of effect of breast lump in the ROC analysis suggests that a breast lump may affect a radiologist's criterion for calling a mammographic examination positive but has no overall effect on accuracy. However, we compared symptomatic women who self-reported breast lumps with symptomatic women who did not report breast lumps, rather than comparing them with asymptomatic women.

The strong effect of previous mammography on increased specificity, decreased sensitivity, and decreased ROC area suggests a true underlying difference in the performance of diagnostic mammography for women who have had a previous mammogram. We found a statistically significant difference (P = .006) in the size of tumors in women without previous mammography (median 23 mm) compared with women with previous mammography (median 20 mm). Women who are routinely screened may have interval or incident cancers detected at diagnostic mammographic examinations, whereas women having their first mammographic examination may have large prevalent tumors. Because the mammography registries promote periodic breast cancer screening, the majority (76.4%) of women receiving diagnostic mammography have had a previous mammogram. Indeed, a positive diagnostic mammographic examination that detects a breast cancer is called a true positive in this study but could indicate a screening failure for a previous screening mammographic examination that occurred less than 12 months earlier. Incorporation of past screening results and the role of comparison films are outside the scope of this article.

Several diagnostic mammography studies (16–18) have been performed in Europe. Duijm et al. (16) found that diagnostic mammography had a sensitivity of 92.0% and a specificity of 97.7%. Eltahir et al. (17) obtained similar results (93.2% sensitivity and 96.7% specificity) for symptomatic women. Flobbe et al. (18) found that diagnostic mammography had a sensitivity of 89% and a specificity of 98% in an unscreened population. However, because the patient population, methods of follow-up, and types of analyses for these studies differed from those we report here, we do not believe the results are comparable. Differences include the average age of the women, the length of follow-up, how false negative breast cancers are detected, and the amount of screening that most women have undergone prior to the diagnostic mammogram.

Moskowitz (19) suggested that sensitivity for all mammography could be expected to be 80%–85% and that specificity could be as low as 87% in diagnostic situations and 95% in screening mammography. Poplack et al. (2) used data from the New Hampshire mammography registry and showed that diagnostic mammography had a sensitivity, specificity, and PPV of 78.1%, 89.3%, and 17.1%, respectively, which are different from our overall values of 85.8%, 87.7%, and 21.8%, respectively. However, our definition of a positive mammographic examination differed slightly from the definition in their study. Relative to screening mammography, sensitivity and PPV tend to be higher and specificity tends to be lower for diagnostic mammography. For example, Poplack et al. (2) found sensitivity, specificity, and PPV of 72.4%, 97.3%, and 10.6%, respectively, for screening mammography. Rosenberg et al. (1) showed sensitivity and specificity of screening mammography of 79.9% and 90.5%, respectively. A meta-analysis of screening studies showed that sensitivity ranged from 83% to 95% and specificity ranged from 93.5% to 99.1% (3). Screening sensitivity may be lower because the cancers detected are smaller than those detected with diagnostic mammography. However, the population undergoing screening is older, and average breast density may be less. Both sensitivity and specificity of screening mammography increase with age and decrease with increasing breast density (20–23). We found similar results with diagnostic mammography, although the effect of age was negligible after controlling for breast density.

It is difficult to compare sensitivity and specificity across studies that use different definitions for a positive mammogram. Moreover, we showed how our values were affected by different cutoffs for calling the mammographic examination positive. The ROC curve in Fig. 1 shows that the separation of BI-RADS™ assessment category 3 into those with immediate follow-up and those without was justified by the difference in specificity in the two groups. The lack of consistency in recommendations associated with a BI-RADS™ assessment of 3 has been noted in diagnostic mammograms of the BCSC (9). We also included BI-RADS™ assessment category 0 as a positive mammogram. Conceptually, all diagnostic mammograms with an assessment of 0 should be resolved and assigned to categories 1–5 (7). However, in practice, many are not resolved before the patient is recommended for biopsy or other follow-up. Therefore, the "final" assessment remains a 0. Including these mammograms in the evaluation of diagnostic mammography is necessary because the ROC curve demonstrates that assessments of 0 have a high probability of cancer. Consequently, the ordering of the modified BI-RADS™ assessments used in this study is supported by the data.

The overall ROC analysis balances sensitivity and specificity and quantifies how well radiologists can accurately identify women with or without breast cancer. Some factors that increase sensitivity but decrease specificity, or vice versa, may not affect accuracy overall because they may affect only the radiologist's criterion for calling the mammographic examination positive. Consequently, some factors, such as breast lump or age, may affect the location on a single ROC curve (trading specificity for sensitivity) but do not affect accuracy. In other cases, different ROC curves are generated that reflect a true difference in accuracy, such as that observed with breast density and previous mammography.

The ROC curve and the estimation of sensitivity 1) depend on the number of observed breast cancers—1340 in this study—so analyses may lack power to detect an effect and 2) assume complete ascertainment of cancer status. There is potential verification bias because women with a negative mammographic examination are much less likely to get a biopsy than are those with a positive one. False negatives must be detected clinically in the follow-up period or by subsequent mammography, although not all breast cancers may be detected. We are currently examining the possible effect of incomplete ascertainment on accuracy. Sensitivity and ROC AUC are certainly overestimated. This is apparent when a 3-month follow-up window is used, because most positive mammograms have been worked up but some cancers among women with negative mammograms have not yet been detected. Although sensitivity increased and specificity and PPV decreased with the shorter follow-up period, the influence of woman-specific covariates could still be measured accurately. Women who have negative mammograms are less likely to receive follow-up for possible breast cancer; thus, false negatives are more likely to be missed in the follow-up period. This problem will increase sensitivity and AUC, making mammography appear better than it actually is if all cancers are not ascertained.

Other factors that might be important in the assessment of performance include family history of breast cancer, genetic risk factors, or previous benign breast disease. Although the patient's clinical history can affect interpretation, it does not necessarily enhance accuracy (24). Accuracy may depend on timing with respect to the menstrual cycle for premenopausal women (25) and may be affected by use of hormone replacement therapy for postmenopausal women (26–28). Some of these factors are currently being included in the questionnaires administered to women participating in the BCSC and will be evaluated in future studies.

Performance may also be influenced by other factors unrelated to the woman who is receiving the mammogram. Radiologists may differ by as much as 40% in screening sensitivity (29). Factors that might affect performance include mammography experience and training (30). In this report, we adjust statistically for systematic differences across mammography registries and for the correlation among assessments of the same radiologist, but other sources of variation in performance remain. Differences in data collection methods, linkage between the mammography and cancer registries, and catchment area coverage of the mammography and cancer registries could all contribute to observed differences in performance across registries. The mammograms included in this study are from 1996 through 1998, when the BCSC was beginning to standardize its data collection methods. It is expected that these efforts may reduce variability in the future.

We noted that women with dense breasts were much more likely to have ultrasound as part of their evaluation. The benefit of additional imaging, particularly in women with very dense breasts, needs further study. Comparison of the BI-RADS™ assessments before and after additional imaging would be useful, but our data on actual clinical practice often do not include the initial BI-RADS™ assessment that might be made on the diagnostic examination before subsequent imaging. This initial assessment may be missing because radiologists in practice consider diagnostic evaluation to be a process and are therefore more likely to provide a single final assessment after the entire imaging evaluation has been completed. Skaane (31) found only marginal benefit of ultrasound following mammography overall, although it was helpful in detecting some cancers. One study (32) found that the sensitivity of mammography alone was 83% but increased to 91% after ultrasound. Among symptomatic women, sensitivity increased from 78.9% to 94.2% with the addition of ultrasound to mammography (33). Additional imaging is not limited to ultrasound and can include magnetic resonance imaging (MRI) or positron emission tomography (PET). Sestamibi scintimammography is a nuclear medicine technique that is not affected by breast density. At this time, it is not cost-effective as a screening tool, but it could be used as an addition to diagnostic mammography for women with dense breasts (34).

The strength of this study is its comprehensive assessment of diagnostic mammography as practiced in the community. It appears that breast density and previous mammography are the greatest determinants of performance. The study included only diagnostic mammography in women who appeared to have signs or symptoms of breast cancer; the mammographic examination was not prompted by an abnormal result from a screening mammogram. Sensitivity and PPV of diagnostic mammography exceed those of standard screening mammography, but at the expense of specificity. With the increased prevalence of screening mammography, tumors detected by diagnostic mammography will be smaller and of an earlier stage. The sensitivity and specificity of diagnostic mammography may be reduced unless enhanced with additional imaging, particularly for women with dense breasts.


NOTES
Supported by Public Health Service cooperative agreements UO1CA63731, UO1CA63736, UO1CA63740, UO1CA69976, UO1CA70013, UO1CA70040, UO1CA86076, and UO1CA86082 from the National Cancer Institute (NCI), National Institutes of Health, Department of Health and Human Services, as part of the NCI's Breast Cancer Surveillance Consortium.


REFERENCES

1 Rosenberg RD, Lando JF, Hunt WC, Darling RR, Williamson MR, Linver MN, et al. The New Mexico Mammography Project. Screening mammography performance in Albuquerque, New Mexico, 1991 to 1993. Cancer 1996;78:1731–9.

2 Poplack SP, Tosteson AN, Grove MR, Wells WA, Carney PA. Mammography in 53,803 women from the New Hampshire mammography network. Radiology 2000;217:832–40.

3 Mushlin AI, Kouides RW, Shapiro DE. Estimating the accuracy of screening mammography: a meta-analysis. Am J Prev Med 1998;14:143–53.

4 Dee KE, Sickles EA. Medical audit of diagnostic mammography examinations: comparison with screening outcomes obtained concurrently. AJR Am J Roentgenol 2001;176:729–33.

5 Ballard-Barbash R, Taplin SH, Yankaskas BC, Ernster VL, Rosenberg RD, Carney PA, et al. Breast Cancer Surveillance Consortium: a national mammography screening and outcomes database. AJR Am J Roentgenol 1997;169:1001–8.

6 Carney PA, Geller BM, Moffett H, Ganger M, Sewell M, Barlow WE, et al. Current medico-legal and confidentiality issues in large multi-center research programs. Am J Epidemiol 2000;152:371–8.

7 American College of Radiology. Illustrated Breast Imaging Reporting and Data System (BI-RADS™). 3rd ed. Reston (VA): American College of Radiology; 1998.

8 Rosenberg RD, Yankaskas BC, Hunt WC, Ballard-Barbash R, Urban N, Ernster VL, et al. Effect of variations in operational definitions on performance estimates for screening mammography. Acad Radiol 2000;7:1058–68.

9 Geller BM, Barlow WE, Ballard-Barbash R, Ernster V, Yankaskas BC, Sickles EA, et al. Use of the American College of Radiology BI-RADS™ to report the mammographic evaluation of women with signs and symptoms of breast disease. Radiology 2002;222:536–42.

10 Linver MN, Osuch JR, Brenner RJ, Smith RA. The mammography audit: a primer for the mammography quality standards act (MQSA). AJR Am J Roentgenol 1995;165:19–25.

11 Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986;42:121–30.

12 SAS Institute Inc. SAS/STAT™ user's guide, version 8. Cary (NC): SAS Institute Inc.; 1999.

13 Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986;21:720–33.

14 Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making 1988;8:204–15.

15 Litherland JC, Dobson HM. Symptomatic women with normal screening mammograms: is assessment by the breast screening programme justifiable? The Breast 2001;10:58–60.

16 Duijm LE, Guit GL, Zaat JO, Koomen AR, Willebrand D. Sensitivity, specificity and predictive values of breast imaging in the detection of cancer. Br J Cancer 1997;76:377–81.

17 Eltahir A, Jibril JA, Squair J, Heys SD, Ah-See AK, Needham G, et al. The accuracy of "one-stop" diagnosis for 1110 patients presenting to a symptomatic breast clinic. J R Coll Surg Edinb 1999;44:226–30.

18 Flobbe K, van der Linden ES, Kessels AG, van Engelshoven JM. Diagnostic value of radiological breast imaging in a non-screening population. Int J Cancer 2001;92:616–8.

19 Moskowitz M. Breast imaging. In: Donegan WL, Spratt JS, editors. Cancer of the breast. 4th ed. Philadelphia (PA): W. B. Saunders Company; 1995. p. 206–39.

20 Kerlikowske K, Grady D, Barclay J, Sickles EA, Ernster V. Effect of age, breast density, and family history on the sensitivity of first screening mammography. JAMA 1996;276:33–8.

21 Laya MB, Larson EB, Taplin SH, White E. Effect of estrogen replacement therapy on the specificity and sensitivity of screening mammography. J Natl Cancer Inst 1996;88:643–9.

22 Lehman CD, White E, Peacock S, Drucker MJ, Urban N. Effect of age and breast density on screening mammograms with false-positive findings. AJR Am J Roentgenol 1999;173:1651–5.

23 Porter PL, El-Bastawissi AY, Mandelson MT, Lin MG, Khalid N, Watney EA, et al. Breast tumor characteristics as predictors of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 1999;91:2020–8.

24 Elmore JG, Wells CK, Howard DH, Feinstein AR. The impact of clinical history on mammographic interpretations. JAMA 1997;277:49–52.

25 White E, Velentgas P, Mandelson MT, Lehman CD, Elmore JG, Porter P, et al. Variation in mammographic breast density by time in menstrual cycle among women aged 40–49 years. J Natl Cancer Inst 1998;90:906–10.

26 Thurfjell EL, Holmberg LH, Persson IR. Screening mammography: sensitivity and specificity in relation to hormone replacement therapy. Radiology 1997;203:339–41.

27 Rosenberg RD, Hunt WC, Williamson MR, Gilliland FD, Wiest PW, Kelsey CA, et al. Effects of age, breast density, ethnicity, and estrogen replacement therapy on screening mammographic sensitivity and cancer stage at diagnosis: review of 183,134 screening mammograms in Albuquerque, New Mexico. Radiology 1998;209:511–8.

28 Mandelson MT, Oestreicher N, Porter PL, White D, Finder CA, Taplin SH, et al. Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 2000;92:1081–7.

29 Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 1996;156:209–13.

30 Nodine CF, Kundel HL, Mello-Thoms C, Weinstein SP, Orel SG, Sullivan DC, et al. How experience and training influence mammography expertise. Acad Radiol 1999;6:575–85.

31 Skaane P. The additional value of US to mammography in the diagnosis of breast cancer. A prospective study. Acta Radiol 1999;40:486–90.

32 Zonderland HM, Coerkamp EG, Hermans J, van de Vijver MJ, van Voorthuisen AE. Diagnosis of breast cancer: contribution of US as an adjunct to mammography. Radiology 1999;213:413–22.

33 Moss HA, Britton PD, Flower CD, Freeman AH, Lomas DJ, Warren RM. How reliable is modern breast imaging in differentiating benign from malignant breast lesions in the symptomatic population? Clin Radiol 1999;54:676–82.

34 Allen MW, Hendi P, Bassett L, Phelps ME, Gambhir SS. A study on the cost effectiveness of sestamibi scintimammography for screening women with dense breasts for breast cancer. Breast Cancer Res Treat 1999;55:243–58.

Manuscript received December 27, 2001; revised May 16, 2002; accepted May 30, 2002.

