1 Department of Statistics, University of California, Berkeley, CA 94720-3860, USA
2 Kaiser Permanente, Southern California, Pasadena, CA 91188, USA
3 Harvard School of Public Health, Boston, MA 02115, USA
Abstract
Methods Qualitative review.
Results The basis for the Gøtzsche–Olsen critique turns out to be simple. Studies that found a benefit from mammography were discounted as being of poor quality; the remaining negative studies were combined by meta-analysis. The critique therefore rests on judgements of study quality, but these judgements are based on misreadings of the data and the literature.
Conclusion The prior consensus on mammography was correct.
Accepted 24 June 2003
The first large-scale clinical trial to demonstrate the efficacy of mammography was the Health Insurance Plan (HIP) trial in New York,1–7 followed by the Two-County study in Sweden.8–22 There were about half a dozen other trials as well, some negative but most positive. In theory, if breast cancer begins as a local disease, then early diagnosis, before the disease spreads, should allow treatment that is less invasive and more effective;23 there may also be a biological rationale from the perspective of systemic disease.24 After an initial period of controversy, mammography gained general acceptance.23–29 Some doubts remained, especially for younger women.30–32 There were also questions about optimal schedules for screening and cost effectiveness,33,34 but our focus is efficacy.
The consensus opinion was challenged by two researchers at the Nordic branch of the Cochrane collaboration, Gøtzsche and Olsen, who concluded that mammography does not save lives: instead, it exposes women to unnecessary diagnostic and surgical procedures.35–38 This opinion was based on a meta-analysis of the existing trials, where positive studies were eliminated as being of poor quality; the remaining two studies found negative effects. Thus, the critique hinges on the decision to exclude positive studies like HIP and Two-County. That decision was justified in turn by a literature review.
In this paper, we discuss HIP, Two-County, and the best-known of the negative studies, the Canadian National Breast Screening Study (CNBSS).39–45 We summarize the trials and the critique. We briefly discuss other work faulting the positive studies.46,47 Our paper addresses the major points raised by the critics, and a few of the minor ones. We find that the quality judgements behind the critique, and hence the meta-analytic results, are based on misunderstandings of the data. We see no reason to believe that CNBSS was superior in quality to HIP or Two-County. In our opinion, therefore, the critique has little merit. We conclude that the prior consensus on mammography was correct: screening does save lives. Others have reached similar conclusions on the central points,26–29,48–53 although the critique has attracted some support.46,47,54–58 The Swedish trials (including Göteborg, Malmö, and Stockholm) have been reviewed by Nyström et al.,53,59–61 with commentary35–38,62–66 and responses.53,67 The Edinburgh trial and the Finnish National Screening Program have been reviewed elsewhere, with comparison to the Swedish trials.27–29
The Health Insurance Plan trial
The analysis was by intention to treat rather than treatment received. This is conservative, and measures the effect of the invitation to screening rather than the effect of screening itself. (Biases in treatment-received analyses are discussed by Shapiro et al.6) The effect of screening is diluted because there were only four rounds of screening, and some women in the treatment group declined to be screened: 67% were screened at least once, 40% were screened four times (Table 3.1 in Shapiro et al.6). Because of this crossover, Shapiro et al.6 refer to the treatment group as the study group.
Results from the first 5 years of follow-up are shown in Table 1 below.4,6,68 The effect of the invitation is small in absolute terms: 63 − 39 = 24 lives saved. Since the absolute risk from breast cancer is small, no intervention can have a large effect in absolute terms. On the other hand, in relative terms, the 5-year death rates from breast cancer are in the ratio 39/63 = 62%. The absolute differential persists throughout the 18-year follow-up period, and is perhaps more marked if we take cases incident during the first 7 years of follow-up, rather than 5.
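The arithmetic behind these figures is easy to check. The sketch below uses only the death counts quoted from Table 1; the two arms are essentially equal in size, so the ratio of counts approximates the ratio of death rates.

```python
# 5-year deaths from breast cancer, as quoted from Table 1 of the HIP trial
control_deaths = 63
study_deaths = 39

lives_saved = control_deaths - study_deaths   # absolute effect of the invitation
rate_ratio = study_deaths / control_deaths    # relative 5-year death rate

print(lives_saved)            # 24
print(round(rate_ratio, 2))   # 0.62
```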
Numerator and denominator are unbiased estimates, but there is considerable statistical uncertainty due to the limited sample size. There is also a tacit assumption that the invitation to screening has no effect unless the woman takes it up. Similar estimates have been reported, based on data from the Two-County trial and population screening in Sweden,19,21 with discussion.20,23,70–74 Stronger assumptions are needed to analyse these data, which are partly experimental and partly observational; somewhat lower estimates have recently been suggested.22
We turn now to the critique. Gøtzsche and Olsen35–38 have three main arguments against HIP, which we discuss in turn.
Differential exclusion of breast cancer cases
Women were assigned to study or control in alternation, so the two groups should be equal in size. However, the study group is a bit smaller, and the differential changes a little from one report to another.35 On p. 18 of Shapiro et al.,6 the difference is 434 women.
This (differential) would be expected to create bias. If only 10% of these excluded breast cancer cases are added as breast cancer deaths after 18 years of follow-up, the breast cancer mortality becomes higher in the screened group than in the control group, since the difference in breast cancer mortality at that time was 44 deaths. (ref. 37, p. 6)
In essence, women with a prior diagnosis of breast cancer are at higher risk of death from that disease: differentially excluding them from the screening group therefore creates a bias favouring mammography. To assess this criticism, we consider the exclusion criteria in HIP, which can be summarized as follows: (i) change in medical coverage between randomization and first screen, or (ii) pregnancy at screening, or (iii) diagnosis of breast cancer prior to entry.
The first criterion excluded few women; bias from this source must be small. Furthermore, reasons for changing medical coverage (moving, changing jobs) seem at best weakly related to breast cancer risk, so the sign of the bias is uncertain. With respect to the second, pregnant women are also few in number (we estimate 100 in each arm, as does Raymond Fink, personal communication). Such women must generally have been in their early 40s, and therefore at lower-than-average risk of breast cancer. Perhaps two-thirds of them were excluded from the study group at screening, which creates a (small) bias against mammography. These two criteria operate only on the screening group, accounting for some of the difference between the study and control groups.
In both arms of the trial, women with a diagnosis of breast cancer prior to randomization were excluded from counts of breast cancer cases or deaths. In the screening group, exclusions were mainly done at first screen; date of diagnosis was determined from medical records. For women in the study group who refused screening, and for women in the control group, exclusions were made when there was a recurrence of breast cancer or death; again, date of diagnosis was determined from medical records. (With paper records, exclusion prior to randomization would have been expensive, because it would have been necessary to go through all 62 000 medical files.) This is not the most elegant of designs, but it does not introduce bias in the counts, provided that follow-up is good and exclusions are done correctly. There is a small upward bias in determining person-years at risk in the control arm and refused-screening group, which on balance works against mammography.
The design implies unequal group sizes after exclusions unless follow-up continues until the last breast cancer case has died. By way of illustration, suppose there were 1000 women with prior diagnosis of breast cancer in each group at baseline. If 80% of those in the study group accept screening and there is no detection in the control group, the initial imbalance would be 800. Over the next 18 years, perhaps half of the 1000 − 800 = 200 cases in the refused-screening group and a similar percentage of the 1000 cases in the control group would have a recurrence of cancer or die, and then be excluded. On this basis, the difference between the study and control groups would drop to something like 400. Although this result depends on parameters subject to considerable uncertainty, the calculation shows that the difference of 434 (cited above) between the groups at year 18 of follow-up is not evidence of bias.75
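The illustrative calculation above can be written out explicitly. The sketch below simply reproduces the assumed parameters from the text (1000 prior cases per arm, 80% acceptance, 50% recurrence or death over 18 years); these are hypothetical round numbers, not HIP data.

```python
# Hypothetical parameters from the illustration in the text
prior_cases = 1000     # women with prior breast cancer diagnosis, per arm
accept = 0.80          # fraction of the study group accepting screening
recur_or_die = 0.50    # fraction with recurrence or death over 18 years

# Study arm: exclusions at first screen, plus later exclusions among refusers
excluded_study = accept * prior_cases + recur_or_die * (1 - accept) * prior_cases
# Control arm: exclusions only at recurrence or death
excluded_control = recur_or_die * prior_cases

# Expected imbalance in group sizes after 18 years of follow-up
imbalance = excluded_study - excluded_control
print(imbalance)  # 400.0 -- the same order as the observed difference of 434
```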
Screening does not prevent breast cancer but only speeds up detection. The study group should start with a higher incidence of breast cancer (lead time bias), but the control group will catch up a year or so after screening stops. That is what happened (Table 2). Screening was finished (or nearly so) in 4 years, and lead time is on the order of 1 year: in other words, screening picks up a cancer roughly a year before it would become clinically manifest (pp. 43–44, 105–106 in Shapiro et al.6). Thus, the incidence of breast cancers in the two arms should equalize between years 5 and 7, and it does.76 The differences in the Table, 304 vs 295, 426 vs 439, and 767 vs 740, are well within the range of chance variation. If high-risk women were differentially excluded from the study group, the count of incident cases in the study group would be lower than the control count, which is not the case.
The number of differential exclusions changes from one HIP paper to another, because there are women in the control group and the refused-screening group who had a diagnosis of breast cancer prior to randomization. As follow-up goes on, some of these women suffer a recurrence or die, and are then excluded from the counts (see above). Thus, differentials depend on length of follow-up period and, perhaps, on data editing. GO seem to be aware of these facts.35 [We abbreviate Gøtzsche and Olsen to GO; the quote is from p. 131; our comments are in square brackets.]
Deaths from breast cancer diagnosed before entry to the trial were generally excluded from analysis. [Done in HIP and Two-County.] Such exclusions can lead to bias when the first round of screening identifies cancer in women who have already noted a tumour in their breast if these women are subsequently excluded. [But exclusions were not made that way.] The New York trial excluded more cancers in the screening group than in the control group. [This is true at first screen but false when tables were compiled, because exclusions depend only on events prior to first screen, as discussed above; therefore, it is irrelevant.]
Design issues have been explored75 and Gøtzsche has responded: We furthermore doubt that retrospective exclusion of women after 18 years of follow-up, as in the New York study, is reliable. (ref. 77, p. 2168)
Here, Gøtzsche implicitly withdraws previous arguments35–38 about differential exclusions, claiming instead that procedures were retrospective and therefore unreliable. However, the exclusions were done as the study progressed. They were not based on participants' memories but on diagnosis of breast cancer prior to baseline, as documented in the medical records. HIP surveillance of vital records, hospitals, and health insurance reports was designed to pick up all incident breast cancer cases, recurrences, and deaths, in the study group and in the control group. The evidence in Shapiro et al. (ref. 6, pp. 3–4, 17–18, 22–24) shows that exclusions were made in a balanced, comprehensive manner. Ascertainment was nearly perfect through year 10 (ref. 6, p. 24), and not much worse in years 11–18. Although bias due to differential exclusion of breast cancer cases is possible, significant bias seems unlikely; Table 2 supports the data in Shapiro et al.6 On the other side, no tangible evidence has been produced to support claims of differential exclusions.
Of course, another interpretation may be offered for Table 2. Some large number of invasive breast cancers were excluded from the study group, balanced by the inclusion of a very similar number of DCIS (ductal carcinoma in situ) cases detected by screening, with the corresponding cases on the control side remaining occult. This scenario seems far-fetched, for two reasons (apart from the nicety of the requisite balancing): (i) in the 1960s, DCIS might have accounted for 5% of breast cancers in the study group (compare Table 4 in Tabár et al.15 with Table 1 in Rosner et al.78), and (ii) roughly half of untreated DCIS cases become clinically manifest (Health Council of The Netherlands, ref. 48, pp. 45–47). The rate of screen-detected DCIS increased rapidly in the 1980s,79,80 but this is some 20 years after HIP.
... in the table of seven selected characteristics ... we calculated imbalances for previous lump in the breast (p < 0.0001), menopause (p < 0.0001), and education (p = 0.05); there were no differences for age, religion, marital status, or pregnancies. These findings are incompatible with an adequate randomisation. (ref. 35, p. 130)
Table 3 Characteristics of women entering Health Insurance Plan project during 1964 (per cent)
We found no details of the calculation in any published paper, and believe GO misunderstood the sample sizes in the table header and footnotes. The table reports not on the whole cohort, but only on a sample of those recruited during 1964. (Samples were taken to reduce costs.) With allowance for non-response, the sample sizes for the examined, refused, and control groups are about 700, 600, and 1800 respectively. (Raymond Fink has contemporaneous documentation for planned sample sizes, personal communication.) On this basis, differences in education and menopause are insignificant. For lumps, the worst of the comparisons, we compute z = 2, P = 0.05 (two-sided), using the correct sample sizes. Table 2 in Shapiro et al.2 has more complete data for the examined group; the percentage with lumps may be computed from those data as 11.7%, which is much closer to the control figure of 11.8% in Table 3, confirming that the discrepancy noted by GO is just due to chance; also see Table 8 in Fink et al.3 We conclude there was no imbalance at baseline in HIP.
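A z-statistic of this kind comes from a standard two-sample test for proportions. In the sketch below, the control figure (11.8%) and the approximate sample sizes (700 examined, 1800 control) are from the text; the 14.8% lump figure for the examined sample is a hypothetical value chosen for illustration, since Table 3 itself is not reproduced here.

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-sample z-test for a difference in proportions, pooled standard error."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)              # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # = 2 * (1 - Phi(|z|))
    return z, p_two_sided

# 14.8% is hypothetical; 11.8% and the sample sizes are from the text
z, p = two_prop_z(0.148, 700, 0.118, 1800)
# z comes out near 2 and P near 0.05, in line with the calculation in the text
```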
With respect to (1), HIP used blind review on all breast cancer cases assigned to another cause of death on the death certificate (ref. 6, pp. 29–33). Moreover, there was extensive blind review on the Two-County data.59,60 The second sentence in (1) is simply wrong. With respect to (2) and (3), GO's statistics ignore the possibility that ambiguous deaths in the screening group are indeed less likely to be breast cancers, because screening and early therapy help to prevent death from that disease.50,75 By our calculations, the HIP review process moved 13 out of 71 deaths from other causes to breast cancer in the study group, and 35 out of 73 in the control group (last column of Table 4). Details of the GO calculation in (3) remain a little hazy, but they seem to be looking at an odds ratio like
(13/58)/(35/38).
Another way to handle the possibility of errors in determining the cause of death is to use death itself as the endpoint. The total mortality rate among incident breast cancer cases has less statistical power than mortality from breast cancer, but is unaffected by cause-of-death classifications. That endpoint too favours mammography (Table 5), although the difference is only borderline significant: χ² = 3.2 on 1 degree of freedom, P = 0.07. The test is two-sided, and does not consider time of death. All screening was done during the first 5 years after entry, so there can be no benefit for women with cancers detected in years 6 and 7. Moreover, as time goes on, the number of deaths from causes other than breast cancer will increase, further reducing power. If we shorten the follow-up time to 10 years (Table 6), power will be better: χ² = 5.8, P = 0.02. As the examples show, significance levels will be different for different time periods; Tables 5 and 6 seem representative of the HIP data. At year 7, the numbers of incident cases in the two arms have equalized, so the comparison of mortality rates is fair.76 This idea has been developed on data from the Swedish trials, where sample sizes are much larger and results are highly significant.81
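For a chi-square statistic on 1 degree of freedom, the P-value can be computed without tables, since P(χ² > x) = erfc(√(x/2)). A minimal check of the two significance levels quoted above:

```python
import math

def chi2_pvalue_1df(x):
    """Upper-tail P-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

p_18_years = chi2_pvalue_1df(3.2)   # about 0.07, as for Table 5
p_10_years = chi2_pvalue_1df(5.8)   # about 0.02, as for Table 6
```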
All-cause mortality
The main outcome measure in the screening trials was breast-cancer mortality. This choice seems rational, since larger trials would be needed to show an effect on overall mortality. However, we showed that the assumption that a demonstrated effect on breast-cancer mortality can be translated into a reduction in overall mortality rests on suppositions that are not correct.... The only reliable mortality estimates are therefore those for overall mortality.... Thus, although the trials were underpowered for all-cause mortality, the reliable evidence does not indicate any survival benefit of mass screening for breast cancer. (ref. 36, p. 1341)

To clarify the power issue, we sketch a hypothetical clinical trial. Randomize 200 000 women, half to mammography and half to control. Follow the women for 10 years. Assume 100% compliance in both arms, no loss to follow-up, 50% reduction in risk of death from breast cancer in the screening arm, and no other effects. Assume baseline mortality rates as in HIP. A trial of this size would be extraordinarily difficult to implement, and the power to detect a significant reduction in total mortality (at the 0.05 significance level) is barely 0.80. With more realistic compliance parameters and contemporary death rates, power would be even lower. Power is limited because death from breast cancer is a rare event. The conclusion is equally obvious: all-cause mortality is impractical as the defining endpoint for any single trial of mammography.51 (Pooling the trials is discussed later.)
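The power calculation can be sketched with a normal approximation for the difference of two proportions. The 10-year death rates below (4% all-cause, 0.5% from breast cancer) are illustrative round numbers of roughly the right order, not the actual HIP rates.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

n = 100_000          # women per arm
p_all = 0.040        # assumed 10-year all-cause death rate (illustrative)
p_bc = 0.005         # assumed 10-year breast cancer death rate (illustrative)

p_control = p_all
p_screen = p_all - 0.5 * p_bc    # 50% reduction in breast cancer deaths only

se = math.sqrt(p_control * (1 - p_control) / n + p_screen * (1 - p_screen) / n)
power = phi((p_control - p_screen) / se - 1.96)   # two-sided 0.05 test
# power comes out near 0.8 even for this huge, idealized trial
```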
The Two-County trial: Sweden
Randomization began block by block in 1977 in Kopparberg and in 1978 in Östergötland. After randomization, the ASP in a block was invited to screening (mediolateral oblique-view mammography only). There were two to four (occasionally five) rounds of screening, with more for the younger women and fewer for the older. In 1984–1986, the PSP was invited to screening, and then the trial was closed. Subsequently, all women in the two counties were invited to screening on a service basis (as part of their routine health care). Compliance among women age 75+ was poor, so this group was dropped in the analysis phase. Compliance for women age 70–74 was also not so good: those in the ASP were therefore invited to two rounds of screening only.
Incident cases are counted for the period 1977/78–1986, in both arms of the trial, based on Swedish cancer registry data. More specifically, the incidence period is from the randomization of a block until closure of the trial, that is, completion of the first PSP screen in the block. Women with a diagnosis of breast cancer prior to randomization are excluded, using registry data. There are 2468 incident cases. Follow-up of these cases to determine mortality continues indefinitely, and the bulk of the reports on the Two-County trial focus on the experience of these cases. (This design is called the evaluation model by Nyström et al.,53,59–61 although no model is involved.) No one source fully describes the Two-County study. Details are in various publications by the investigators.8–13,17 Further clarification was provided by Duffy and Tabár (personal communication).
Results
Tables 7 and 8 show virtual equality of breast cancer incidence rates in the ASP and PSP over the period of the trial. The tables also show that death rates from breast cancer are significantly lower in the ASP. The death rates due to screening (more precisely, due to assignment to ASP), relative to the PSP, are in the ratio 3.9/6.5 = 60% in Kopparberg, and 4.3/5.7 = 75% in Östergötland (see Discussion). This intention-to-treat analysis suffers from dilution effects. For one thing, there was a 10–20% crossover rate in the treatment arm; there was a 10–15% crossover in control, although some of this may reflect diagnostics rather than screening.9,11 And, if we were to extend the incidence period beyond the period of the trial, as in the follow-up model of Nyström et al.,59 service screening would play a major role.
The critique
(1) Cluster randomization is biased. This is false, with a minor exception: ratio estimator bias,82 which affects all rates whose denominators are random (a typical denominator being person-years at risk). Numerators and denominators in the Two-County study are unbiased, because sample averages are unbiased. With a study of this size, ratio estimator bias is likely to be negligible. Of course, variances may be larger with cluster randomization, and this needs to be taken into account when analysing the data.
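Ratio estimator bias can be made concrete with a toy enumeration. Below, four made-up clusters are sampled two at a time, and the exact expectation of the ratio-of-sums estimate is compared with the population rate: the estimator is biased, but by only a percent or two even in this tiny example, and the discrepancy shrinks as the number of clusters grows.

```python
from fractions import Fraction
from itertools import combinations

# Four hypothetical clusters: (deaths, person-years) -- made-up numbers
clusters = [(3, 1000), (7, 1500), (2, 800), (5, 1200)]

population_rate = Fraction(sum(d for d, _ in clusters),
                           sum(py for _, py in clusters))

# Exact expectation of the ratio estimate over all samples of 2 clusters
estimates = [Fraction(sum(d for d, _ in s), sum(py for _, py in s))
             for s in combinations(clusters, 2)]
mean_estimate = sum(estimates) / len(estimates)

relative_bias = float((mean_estimate - population_rate) / population_rate)
# relative_bias is nonzero (the estimator is biased) but under 2% here
```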
(2) The ASP is older than the PSP, p < 0.0001, demonstrating the failure of the randomization. The difference was discussed by Tabár et al.12,83 It amounts to a few months, and (if anything) dilutes the effect of screening.84 GO exaggerated the statistical significance of this difference by ignoring the cluster randomization when computing P.85–90 Furthermore, there is good evidence to show that randomization was successful, producing comparable groups of women in the ASP and PSP along several important dimensions. For instance, there is near-equality of breast cancer incidence rates before the study began (Figure 1 in Nyström et al.53). Likewise, death rates from other causes are nearly equal.11,61
(3) There is inconsistent reporting of population size: for instance, 134 867 in 1985 (ref. 9) and 133 065 in 1989 (ref. 12). GO cite the 1989 paper12 but miss some crucial details. The Two-County investigators linked their database to the Swedish cancer registry and cleaned the data by eliminating women with diagnosis of breast cancer prior to randomization. Before linkage, such women were excluded at recurrence of disease or death. The 1985 paper9 was written before linkage; the 1989 paper,12 after linkage, explaining the difference in reported population size.49,83
(4) There is inconsistent reporting of deaths from breast cancer. For example, take women age 40–49 in Kopparberg. Are the ASP:PSP counts 22:16 or 26:18? (Olsen and Gøtzsche, ref. 37, p. 16). Table 2 in Tabár et al.15 has 22:16 at 12.5 years of average follow-up, whereas Table II in Tabár et al.16 has 26:18 at 15.5 years (average follow-up for all subjects, our calculation). The difference in counts is due to longer follow-up, which GO ignore.90 We resolved other discrepancies in a similar fashion.
(5) There is bias in assigning cause of death. This is the most disturbing of the arguments, and we take it up in some detail.
(5.1) The decrease in breast-cancer mortality with screening in the Two-County study when the endpoint committee did not know status was similar to that when cause of death was assessed openly (and where we found bias in the classification process). Therefore, our findings that masked endpoint committees make biased assessments are supported. (Gøtzsche, ref. 77, p. 2167)

(5.2) We found data from the Two-County trial (Tabár et al.11) that could illustrate this possible misclassification directly ... Among women with a diagnosis of breast cancer, mortality for other cancers was significantly higher (RR = 2.42, [CI] 1.00–5.85) (p = 0.05); mortality from all other causes was also higher, although not significantly (RR = 1.37, [CI] 0.93–2.04) (p = 0.11). (Olsen and Gøtzsche, ref. 37, p. 16)
With respect to (5.1), the agreement between endpoint committees shows, if anything, that bias is unlikely. GO would have an argument only if they had evidence to demonstrate bias in open reading; but they do not, as will be seen by examining (5.2). Most of the data in Tabár et al.11 show near-equality of death rates from other causes among the ASP and PSP, which makes bias in death classification seem unlikely. Tables 5 and 6 in Tabár et al.11 report deaths by cause among the breast cancer cases, with 25/1295 deaths from other cancers in the ASP and 6/768 in the PSP. That is the probable source for the claimed P = 0.05, although we cannot quite replicate the calculation. What should we make of this finding? Adjustment for time on risk would increase P, since the ASP cases live longer than the PSP cases. Adjustment for multiple comparisons (and GO have clearly made many comparisons) would also have a substantial effect on P. This is not good evidence.
Longer follow-up confirms the view that GO have capitalized on an artifact. For instance, with 8 years of follow-up, Table 11 in Tabár et al.12 shows that risk of death from other causes among breast cancer cases is similar in the ASP and PSP (P = 0.7). Using data with 11 years of follow-up in Table 9 of Tabár et al.13 we consider deaths from other causes among the breast cancer cases, comparing the observed number of events in the ASP to an expected number computed from the PSP: observed − expected = 7 ± 22, P = 0.8, taking into account age and county. (Deaths from other causes were allocated proportional to time on risk, which is conservative because hazard rates increase with age.) Tabár et al.51 and Duffy et al.91 analyse more recent data with similar conclusions; in particular, there is significant reduction in all-cause mortality among breast cancer cases, which cannot be explained by errors in death classification.81
In summary, the Swedish data demonstrate a protective effect from screening if breast cancer deaths are determined from death certificate data or by either of two independent endpoint committees; screening has an effect whether breast cancer is the underlying cause of death, or present at death.53,59–61 Further support for the Two-County data comes from the statistical analyses described above, using deaths among breast cancer cases as the endpoint,12,51,91 or deaths in the whole study population.53 GO have not made a case for differential bias in death classification. Their other points are even less convincing. Reviewers have concurred with this assessment: see sections 4.2 and 5 in Health Council of The Netherlands.48
Other work
(1) The presence of a number of well-known biases (including lead-time bias and length bias) make it difficult to ascertain the benefits of screening.46 Lead-time bias means that screening speeds up detection, so incidence is higher in the study group at the beginning of follow-up. Length bias means that screening is likely to pick up slower rather than faster growing tumours. However, clinical trials like HIP or Two-County are skewed by neither bias, because (i) they use death from breast cancer as the endpoint, not detection rates or lifespan after detection, and (ii) they use intention-to-treat analyses. Indeed, that is why attention is restricted to evidence from clinical trials.46
(2) A major issue in randomized studies, including screening trials, is lack of compliance with the study protocols.... participants skipped their assigned mammograms about 20% of the time. In addition some participants assigned to be control subjects opted to have screening mammograms. The extent of the bias caused by lack of compliance is not known.46 Generally, however, crossover dilutes the effect of screening: intention-to-treat measures the effect of assignment, not the effect of treatment.
(3) Women with pre-existing breast cancer were preferentially excluded from the screening group. The problem was most severe in the New York trial....47 As shown above, the evidence for preferential exclusions in HIP or Two-County is speculative at best. This point gets a different twist in an interview (New York Science Times, 9 April 2002, p. D4).
"Only the screening group had mammograms," Dr Berry said. "On second look at a woman's first mammogram, one might find that breast cancer was present at the time but it had been missed," he said. "So more women might have been excluded from the mammography group after they developed breast cancer." [But women were excluded if their breast cancer was diagnosed before randomization; what was found or missed on mammography is irrelevant to exclusion.]The Two-County trial followed its timetable as well as could be expected, and obtained near-equality of incidence rates in the ASP and PSP due to its design. In theory, however, time elapsed between the last screen of ASP and first screen of PSP could create a bias in favour of screening (time-lapse bias). The argument is not entirely straightforward. The evaluation model59 counts only cases incident during the period of the trial; however, all deaths among these cases are counted, including those occurring after the trial closes. Screening the PSP when closing the trial advances the time of diagnosis for some breast cancers; subsequent deaths are then counted against the PSP. That is what creates a possible bias in favour of mammography. However, Tabár et al.8 showed a significant effect from screening the ASP, at a time when only 5% of the PSP had been screened.91 These results cannot be affected by time-lapse bias.(4) ... the scheduled control mammogram slipped in all three [Swedish] trials, allowing for more time to detect cancers in the control group [after last ASP screen until completion of first PSP screen].47
The follow-up model59 looks at all cancers incident after the trial starts, including cancers incident after the PSP is screened, and is also immune to time-lapse bias. The protective effect of mammography is significant according to the follow-up model: see Table II in Nyström et al.,59 after pooling the two counties. Here, dilution is the problem: the effect of mammography is understated, because the PSP was screened at the end of the trial, and continues to receive service screening after the trial is over. The theoretical bias created by the time lapse has no practical relevance.
Canadian National Breast Screening Study
To describe the designs, the following abbreviations will be useful:
MA = mammography; PE = physical examination of the breasts; CNBSS1 = the CNBSS trial for women age 40–49; CNBSS2 = the CNBSS trial for women age 50–59.
CNBSS differed from HIP (Table 9). The HIP trial measured the impact of screening by MA and PE compared with usual care, whereas CNBSS2 measured the impact of screening by MA and PE relative to screening by PE only. (More precisely, the trials are measuring the impact of invitations to be screened.) Furthermore, Two-County differs from HIP and CNBSS. The different trials are measuring different things.
Power is an issue. CNBSS had low power, because there were few deaths from breast cancer: 66 in CNBSS1 (Table 9, Miller et al.40) and 77 in CNBSS2 (Table 9, Miller et al.41). Any effect, or lack of effect, can only be demonstrated with poor precision. Moreover, CNBSS has been dogged from the beginning by accusations of (i) poor radiology, and (ii) steering high-risk women to MA; the trialists and others have responded to the accusations.92–103 We consider the points in turn.
According to Baines, McFarlane, and Miller,93 centre radiologists agreed with the reference radiologist only 30–50% of the time. Observer error and technical problems led to delayed detection in 22–35% of cancers. Suggestions (for instance, don't mark up the film with a grease pencil) were sometimes resisted by centre radiologists. Baines and Miller were the two lead investigators on CNBSS, and McFarlane was the reference radiologist. Their report is not reassuring about the quality of the radiology.
The evidence on steering is generally anecdotal but should not be dismissed (or accepted) for that reason. There is one statistical analysis to report.94,100 In CNBSS1 (the 40–49 age group), 22 advanced breast cancers (4+ nodes involved) were detected by PE at first screen: 17 in the treatment arm and 5 in the control arm, P = 0.017. In the treatment group, there were two additional cancers detected by MA only, which are irrelevant for present purposes.
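The reported significance level can be reproduced from the counts in the text. Under randomization, each of the 22 advanced cancers detected by PE at first screen is equally likely to fall in either arm, so the treatment-arm count is Binomial(22, 1/2); a two-sided exact test gives P = 0.017 (a sketch of the arithmetic only, using no inputs beyond the published counts):

```python
from math import comb

n, k = 22, 17  # 22 advanced cancers detected by PE; 17 in the treatment arm

# Under randomization each cancer falls in either arm with probability 1/2,
# so the treatment-arm count is Binomial(22, 0.5).
p_upper = sum(comb(n, j) for j in range(k, n + 1)) / 2**n

# Two-sided exact P-value; the Binomial(n, 0.5) distribution is symmetric
# about n/2, so doubling the upper tail is exact here.
p_two_sided = 2 * p_upper
print(round(p_two_sided, 3))  # 0.017
```

The calculation confirms that a 17:5 split is quite unlikely under proper randomization, which is why the imbalance carries weight in the steering debate.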
Responses
Bailar and MacMahon102 (with commentary, ref. 103) assessed the randomization and found it acceptable. Bailar and MacMahon did not consider the radiology or follow-up procedures. They did not look at CNBSS2 (women age 50–59). They did not follow their own plan for the review: among other things, they did not interview field staff. They acknowledge the 17:5 imbalance.94,100 They acknowledge that the protocol for the trial was not followed, with the result that (i) steering would have been easy to do, and (ii) there could have been some motivation for steering. Indeed, assignment to treatment or control was generally done locally, after results of physical examination were known. Nurses may have wanted high-risk subjects to get what seemed to be the better treatment. The CNBSS log books were altered, but document experts found no evidence of a deliberate attempt to conceal the alterations. That seems weak: among other things, randomization could have been subverted simply by changing the order in which names were entered into the log books.
Baines (ref. 99, p. 329) notes that a comparison of advanced cancers detected by MA + PE in treatment to those detected by PE in control (19 to 5) is biased. Such a comparison might indeed be biased, but it is not the comparison that was made.94,100 Baines also addresses questions about follow-up, radiology, and steering, as do other papers.42–45,101 CNBSS remains controversial in some quarters20,23,103–105 but approved in others.28,29
Gøtzsche and Olsen on the Canadian National Breast Screening Study
GO have not addressed the radiology, except to say that CNBSS was the only trial to have assessed the mammograms (ref. 37, p. 5) and CNBSS found small tumours.35 But that does not address questions about which tumours were missed, which were found, and when they were found. Baines, McFarlane, and Miller93 are not reassuring about study quality. GO cite this paper without discussion.
With respect to the statistical evidence on steering, say GO, the 17:5 imbalance in advanced cancers detected by PE at first screen of the younger women
is a post-hoc subgroup finding which is probably a result of the intervention, and exclusion of the deaths caused by these cancers does not change the result.... (ref. 37, p. 10)

(i) GO are not in the strongest of positions to complain about post hoc subgroup findings, since most of their analysis is post hoc. (ii) This particular finding is hardly post hoc, being mentioned in the original report of the trial (Miller et al., ref. 40, pp. 1470, 1473). (iii) The 22 cancers detected by PE cannot be a result of the intervention, since PE was done at first screen in both arms of the trial. (iv) These 22 cases may only be the tip of the iceberg. We cannot know how many other high-risk women were steered to treatment. The answer may be 0. But this is the number in question, and until this number can be estimated, adjustment for steering is impossible. GO also say:
A persistent criticism has been that an effect would be difficult to find because the breasts of all women in the age-group 50–59 years were physically examined regularly. This criticism is unwarranted because mammography will identify many tumours that are too small to be detected on physical examination alone. Furthermore, any effect of physical examination is likely to be small. A study of 122 471 women found no effect of regular self-examination of the breast on breast-cancer mortality after 9 years of follow-up, even though twice as many of the intervention group consulted an oncologist. (ref. 35, p. 132)
This response to criticism of CNBSS is irrelevant, because breast self examination is not the same as breast physical examination. Breast self examination is done by the woman herself: breast physical examination is done to the woman by a trained professional. Breast self examination may be of little value,106 whereas breast physical examination is effective at cancer detection. That seems to be the case in CNBSS, especially among the younger women (Table 10). PE even detects many cancers missed by MA.
Conclusion
With respect to HIP, point (i) reflects a misunderstanding of the design. Point (ii) reflects a misunderstanding of the table that was analysed. This is bothersome, because the design features relevant to (i) are discussed in the reports GO cite, within a few pages of the numbers they use. Similarly, the table cited for (ii) contains most of the relevant information in headnotes and footnotes. For Two-County, points (i) and (ii) show some misunderstanding of basic statistical concepts like bias, variance, and clustering. Point (iii) depends on lack of care in reading tables, or lack of attention to explanatory material presented within a few pages of the tables used. Bias in determining cause of death remains a possibility. However, evidence cited to demonstrate this bias evaporates when examined, and there is compelling evidence on the other side, including comparability of death rates in treatment and control from causes other than breast cancer, reduction of total mortality among breast cancer cases, and a reduction in total mortality in the whole intervention group when the Swedish trials are pooled.
CNBSS has been criticized for (i) failures in randomization, and (ii) poor mammography. It has also been observed that (iii) CNBSS compared mammography to physical breast examination by trained personnel, rather than comparing mammography to usual care. GO's defence of the randomization mischaracterizes the evidence.94,100 The comparison was not post hoc; nor could the finding possibly have resulted from the intervention, because the comparison was of tumours discovered by physical examination at baseline in each arm of the trial. With respect to (ii), Baines, McFarlane, and Miller93 (the two principal investigators and the reference radiologist) are not reassuring about study quality, and GO have not discussed this paper. GO's response to point (iii) involves a confusion between breast examination by a practitioner and self examination; it also ignores CNBSS data on the efficacy of breast examination by a practitioner (Table 10).
GO's critique of the positive studies (HIP and Two-County), like their defence of CNBSS, is careless at best. Rather than clarifying the issues, their papers have instead generated much confusion. Clinical trials of mammography have led to substantial advances in understanding breast cancer, and a substantial reduction in mortality from this disease. It is time to move on,107–109 although some questions may remain.110,111
References
2 Shapiro S, Strax P, Venet L, Venet W. Changes in 5-year breast cancer mortality in a breast cancer screening program. In: Proc Seventh Natl Cancer Conference. Philadelphia: Lippincott, 1972, pp. 663–78.
3 Fink R, Shapiro S, Roeser R. Impact of efforts to increase participation in repetitive screenings. Am J Public Health 1972;62:328–36.
4 Shapiro S. Evidence on screening for breast cancer from a randomized trial. Cancer 1977;39(Suppl.):2772–82.
5 Shapiro S, Venet W, Strax P, Venet L, Roeser R. Selection, follow-up, and analysis in the Health Insurance Plan study: A randomized trial with breast cancer screening. Natl Cancer Inst Monogr 1985;67:65–79. With discussion.
6 Shapiro S, Venet W, Strax P, Venet L. Periodic Screening for Breast Cancer: The Health Insurance Plan Project and its Sequelae, 1963–1986. Baltimore: Johns Hopkins, 1988.
7 Shapiro S, Venet W, Strax P, Venet L. Current results of the breast cancer screening randomized trial: The Health Insurance Plan (HIP) of greater New York study. In: Day N, Miller A (eds). Screening for Breast Cancer. Toronto: Hans Huber, 1988, pp. 3–15.
8 Tabár L, Gad A. Screening for breast cancer: The Swedish trial. Radiology 1981;138:219–22.
9 Tabár L, Fagerberg CJ, Gad A et al. Reduction in mortality from breast cancer after mass screening with mammography: Randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet 1985;i:829–32.
10 Fagerberg CJG, Tabár L. The results of periodic one-view mammography screening in a randomized controlled trial in Sweden. Part 1: Background, organization, screening program, tumor findings. In: Day N, Miller A (eds). Screening for Breast Cancer. Toronto: Hans Huber, 1988, pp. 33–38.
11 Tabár L, Fagerberg CJG, Day NE. The results of periodic one-view mammography screening in a randomized controlled trial in Sweden. Part 2: Evaluation of the results. In: Day N, Miller A (eds). Screening for Breast Cancer. Toronto: Hans Huber, 1988, pp. 39–44.
12 Tabár L, Fagerberg G, Duffy SW, Day NE. The Swedish two county trial of mammographic screening for breast cancer: Recent results and calculation of benefit. J Epidemiol Community Health 1989;43:107–14.
13 Tabár L, Fagerberg G, Duffy SW, Day NE, Gad A, Gröntoft O. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin North Am 1992;30:187–210.
14 Tabár L, Fagerberg G, Day NE, Duffy SW, Kitchin RM. Breast cancer treatment and natural history: New insights from results of screening. Lancet 1992;339:412–14.
15 Tabár L, Fagerberg G, Chen HH et al. Efficacy of breast cancer screening by age: New results from the Swedish two-county trial. Cancer 1995;75:2507–17.
16 Tabár L, Vitak B, Chen HH, Prevost TC, Duffy SW. Update of the Swedish two-county trial of breast cancer screening: Histologic grade-specific and age-specific results. Swiss Surg 1999;5:199–204.
17 Tabár L, Vitak B, Chen HH et al. The Swedish two-county trial twenty years later: Updated mortality results and new insights from long-term follow-up. Radiol Clin North Am 2000;38:625–51.
18 Nixon R, Prevost TC, Duffy SW, Tabár L, Vitak B, Chen HH. Some random-effects models for the analysis of matched-cluster randomised trials: Application to the Swedish two-county trial of breast-cancer screening. J Epidemiol Biostat 2000;5:349–58.
19 Tabár L, Vitak B, Chen HH, Yen MF, Duffy SW, Smith RA. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Cancer 2001;91:1724–31.
20 Feig SA. Effect of service screening mammography on population mortality from breast carcinoma. Cancer 2002;95:451–57.
21 Duffy SW, Tabár L, Chen HH et al. The impact of organized mammography service screening on breast carcinoma mortality in seven Swedish counties: A collaborative evaluation. Cancer 2002;95:458–69.
22 Tabár L, Yen M-F, Vitak B, Chen H-HT, Smith RA, Duffy SW. Mammography service screening and mortality in breast cancer patients: 20-year follow-up before and after introduction of screening. Lancet 2003;361:1405–10.
23 Cady B, Michaelson JS. The life-sparing potential of mammographic screening. Cancer 2001;91:1699–703.
24 Margolese RG, Fisher B, Hortobagyi GN, Bloomer WD. Neoplasms of the Breast. In: Bast RC Jr, Kufe DW, Pollock RE et al. (eds). Cancer Medicine. Hamilton, Ontario, Canada: BC Decker, 2000, Ch. 118, pp. 1735–822. Available on-line at www.cancer.org.
25 US Preventive Services Task Force. Guide to Clinical Preventive Services. 1st Edn. Washington, DC: US Department of Health and Human Services, Office of Public Health and Science, Office of Disease Prevention and Health Promotion, 1989.
26 Nass SJ, Henderson IC, Lashof J (eds). Mammography and Beyond: Developing Technologies for the Early Detection of Breast Cancer. Washington, DC: Institute of Medicine, National Research Council, 2001.
27 International Agency for Research on Cancer. Breast Cancer Screening. Volume 7 of the IARC Handbooks of Cancer Prevention. Lyon: IARC, 2002.
28 US Preventive Services Task Force. Screening for breast cancer: Recommendations and rationale. Ann Intern Med 2002;137:344–46.
29 US Preventive Services Task Force. Breast cancer screening: A summary of the evidence. Ann Intern Med 2002;137:347–67.
30 Kerlikowske K, Grady D, Rubin SM, Sandrock C, Ernster VL. Efficacy of screening mammography. A meta-analysis. JAMA 1995;273:149–54. Discussion, 1995;274:380–83.
31 Kerlikowske K. Efficacy of screening mammography among women aged 40 to 49 years and 50 to 69 years: Comparison of relative and absolute benefit. J Natl Cancer Inst Monogr 1997;22:79–86.
32 Gohagan JK (ed.). National Institutes of Health Consensus Conference on Breast Cancer Screening for Women Ages 40–49. J Natl Cancer Inst Monogr 1997;22.
33 Kattlove H, Liberati A, Keeler E, Brook RH. Benefits and costs of screening and treatment for early breast cancer. JAMA 1995;273:142–48. Discussion, 1995;274:380–83.
34 Wright CJ, Mueller CB. Screening mammography and public health policy: The need for perspective. Lancet 1995;346:29–32. Discussion, 1995;346:852.
35 Gøtzsche PC, Olsen O. Is screening for breast cancer with mammography justifiable? Lancet 2000;355:129–34.
36 Olsen O, Gøtzsche PC. Cochrane review on screening for breast cancer with mammography. Lancet 2001;358:1340–42.
37 Olsen O, Gøtzsche PC. Screening for Breast Cancer with Mammography (Cochrane Review). Oxford: Update Software. The Cochrane Library, Issue 4, 2001.
38 Olsen O, Gøtzsche PC. Systematic Screening for Breast Cancer with Mammography. 2001. http://image.thelancet.com/lancet/extra/fullreport.pdf
39 Miller AB, Howe GR, Wall C. The National Study of Breast Cancer Screening. Clin Invest Med 1981;4:227–58.
40 Miller AB, Baines CJ, To T, Wall C. Canadian National Breast Screening Study 1. Breast cancer detection and death rates among women aged 40 to 49 years. Can Med Assoc J 1992;147:1459–76.
41 Miller AB, Baines CJ, To T, Wall C. Canadian National Breast Screening Study 2. Breast cancer detection and death rates among women aged 50 to 59 years. Can Med Assoc J 1992;147:1477–88.
42 Miller AB, To T, Baines CJ, Wall C. The Canadian National Breast Screening Study: Update on breast cancer mortality. J Natl Cancer Inst Monogr 1997;22:37–41.
43 Baines CJ, Miller AB. Mammography versus clinical examination of the breasts. J Natl Cancer Inst Monogr 1997;22:125–29.
44 Miller AB, To T, Baines CJ, Wall C. Canadian National Breast Screening Study-2: 13-year results of a randomized trial in women aged 50–59 years. J Natl Cancer Inst 2000;92:1490–99.
45 Miller AB, To T, Baines CJ, Wall C. The Canadian National Breast Screening Study-1: Breast cancer mortality after 11 to 16 years of follow-up. A randomized screening trial of mammography in women age 40 to 49 years. Ann Intern Med 2002;137:305–12.
46 Berry DA. Benefits and risks of screening mammography for women in their forties: A statistical appraisal. J Natl Cancer Inst 1998;90:1431–39.
47 Berry DA. Testimony, Senate hearing, 28 February 2002.
48 Health Council of The Netherlands. The Benefit of Population Screening for Breast Cancer with Mammography. The Hague, 2002.
49 Duffy SW. Interpretation of the breast screening trials: A commentary on the recent paper by Gøtzsche and Olsen. Breast 2001;10:209–12.
50 Duffy SW, Tabár L, Smith RA. The mammographic screening trials: Commentary on the recent work by Olsen and Gøtzsche. CA Cancer J Clin 2002;52:68–71.
51 Tabár L, Duffy SW, Yen M-F et al. All-cause mortality among breast cancer patients in a screening trial: Support for breast cancer mortality as an end point. J Med Screen 2002;9:159–62.
52 Tabár L, Smith RA, Vitak B et al. Mammographic screening: A key factor in the control of breast cancer. Randomisation and endpoint evaluation. Cancer J 2003;9:15–27.
53 Nyström L, Andersson I, Bjurstam N, Frisell J, Nordenskjöld B, Rutqvist LE. Long-term effects of mammography screening: Updated overview of the Swedish randomised trials. Lancet 2002;359:909–19.
54 Baum M. Screening mammography re-evaluated. Letter. Lancet 2000;355:751.
55 Rozenberg S, Liebens F, Ham H. Screening mammography re-evaluated. Letter. Lancet 2000;355:751–52.
56 Thornton H. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2165.
57 Vaidya JS. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2166.
58 Dixon-Woods M, Baum M, Kurinczuk JJ. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2166–67.
59 Nyström L, Rutqvist LW, Wall S et al. Breast cancer screening with mammography: Overview of Swedish randomised trials. Lancet 1993;341:973–78. Discussion, 1993;341:1531–32.
60 Nyström L, Larsson LG, Rutqvist LE et al. Determination of cause of death among breast cancer cases in the Swedish randomized mammography screening trials: A comparison between official statistics and validation by an endpoint committee. Acta Oncol 1995;34:145–52.
61 Nyström L, Larsson LG, Wall S et al. An overview of the Swedish randomized mammography trials: Total mortality pattern and the representivity of the study cohorts. J Med Screening 1996;3:85–87.
62 Tabár L, Smith RA, Duffy SW. Update on effects of screening mammography. Letter. Lancet 2002;360:337.
63 Bonneux L. Update on effects of screening mammography. Letter. Lancet 2002;360:337–38.
64 Gøtzsche PC. Update on effects of screening mammography. Letter. Lancet 2002;360:338–39.
65 Cheng KK. Update on effects of screening mammography. Letter. Lancet 2002;360:339.
66 Gulbrandsen P. Update on effects of screening mammography. Letter. Lancet 2002;360:339.
67 Nyström L, Andersson I, Bjurstam N, Frisell J, Rutqvist LE. Update on effects of screening mammography. Authors' reply. Lancet 2002;360:339–40.
68 Freedman DA, Pisani R, Purves R, Adhikari A. Statistics. 2nd Edn. New York: WW Norton & Company, Inc, 1991.
69 Angrist J, Imbens G. Identification and estimation of local average treatment effects. Econometrica 1994;62:467–75.
70 Gøtzsche PC. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Letter. Cancer 2002;94:578.
71 Olsen O. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Letter. Cancer 2002;94:578–79.
72 Ponzone R, Baum M. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Letter. Cancer 2002;94:580.
73 Kopans DB. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Letter. Cancer 2002;94:580–81.
74 Tabár L, Duffy SW, Smith RA. Beyond randomized controlled trials: Organized mammographic screening substantially reduces breast carcinoma mortality. Authors' reply. Cancer 2002;94:581–83.
75 Miller AB. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2164.
76 Chu KC, Smart CR, Tarone RE. Analysis of breast cancer mortality and stage distribution by age for the Health Insurance Plan clinical trial. J Natl Cancer Inst 1988;80:1125–32.
77 Gøtzsche PC. Screening for breast cancer with mammography. Author's reply. Lancet 2001;358:2167–68.
78 Rosner D, Bedwani RN, Vana J, Baker HW, Murphy GP. Noninvasive breast carcinoma: Results of a national survey by the American College of Surgeons. Ann Surg 1980;192:139–47.
79 Ernster VL, Barclay J. Increases in ductal carcinoma in situ (DCIS) of the breast in relation to mammography: A dilemma. J Natl Cancer Inst Monogr 1997;22:151–56.
80 Cady B, Stone MD, Schuler JG, Thakur R, Wanner MA, Lavin PT. The new era in breast cancer: Invasion, size, and nodal involvement dramatically decreasing as a result of mammographic screening. Arch Surg 1996;131:301–08.
81 Larsson LG, Nyström L, Wall S et al. The Swedish randomised mammography screening trials: Analysis of their effect on the breast cancer related excess mortality. J Med Screening 1996;3:129–32.
82 Cochran WG. Sampling Techniques. 3rd Edn. New York: John Wiley & Sons, 1977.
83 Duffy SW, Tabár L. Screening mammography re-evaluated. Letter. Lancet 2000;355:747–48.
84 de Koning HJ. Assessment of nationwide cancer-screening programmes. Lancet 2000;355:80–81.
85 Moss S, Blanks R, Quinn MJ. Screening mammography re-evaluated. Letter. Lancet 2000;355:748.
86 Nyström L. Screening mammography re-evaluated. Letter. Lancet 2000;355:748–49.
87 Law M, Hackshaw A, Wald N. Screening mammography re-evaluated. Letter. Lancet 2000;355:749–50.
88 Cates C, Senn S. Screening mammography re-evaluated. Letter. Lancet 2000;355:750.
89 Senn S. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2165.
90 Duffy SW, Tabár L, Smith RA. Screening for breast cancer with mammography. Letter. Lancet 2001;358:2166.
91 Duffy SW, Tabár L, Vitak B et al. The Swedish Two-County Trial of mammographic screening: Cluster randomisation and endpoint evaluation. Ann Oncol 2003, in press.
92 Baines CJ, Miller AB, Kopans DB et al. Canadian National Breast Screening Study: Assessment of technical quality by external review. Am J Roentgenol 1990;155:743–47. Discussion, 1990;155:748–49, 1133–34.
93 Baines CJ, McFarlane DV, Miller AB. The role of the reference radiologist. Estimates of interobserver agreement and potential delay in cancer detection in the national breast screening study. Invest Radiol 1990;25:971–76.
94 Mettlin CJ, Smart CR. The Canadian National Breast Screening Study: An appraisal and implications for early detection policy. Cancer 1993;72(Suppl.):1461–65.
95 Kopans DB, Feig SA. The Canadian National Breast Screening Study: A critical review. Am J Roentgenol 1993;161:755–60.
96 Burhenne LJ, Burhenne HJ. The Canadian National Breast Screening Study: A Canadian critique. Am J Roentgenol 1993;161:761–63.
97 Boyd NF, Jong RA, Yaffe MJ, Tritchler D, Lockwood G, Zylak CJ. A critical appraisal of the Canadian National Breast Cancer Screening Study. Radiology 1993;189:661–63.
98 Kopans DB, Halpern E, Hulka CA. Statistical power in breast cancer screening trials. Cancer 1994;74:1196–203. Discussion, 1994;74:1204–16.
99 Baines CJ. The Canadian National Breast Cancer Screening Study: A perspective on criticism. Ann Intern Med 1994;120:326–34.
100 Tarone RE. The excess of patients with advanced breast cancers in young women screened with mammography in the Canadian National Breast Screening Study. Cancer 1995;75:997–1003.
101 Baines CJ. The Canadian National Breast Cancer Screening Study: Why? What next? And so what? Cancer 1995;76(Suppl.):2109–12.
102 Bailar JC 3rd, MacMahon B. Randomization in the Canadian National Breast Screening Study: A review for evidence of subversion. Can Med Assoc J 1997;156:193–99.
103 Boyd NF. The review of randomization in the Canadian National Breast Screening Study. Can Med Assoc J 1997;156:207–09. Discussion, 1997;157:247–50.
104 Cady B. The screening mammography: The continuous dilemma. Breast J 2002;8:185–86.
105 Kopans DB. The most recent breast cancer screening controversy about whether mammographic screening benefits women at any age: Nonsense and nonscience. Am J Roentgenol 2003;180:21–26.
106 Thomas DB, Gao DL, Self SG et al. Randomized trial of breast self-examination in Shanghai: Methodology and preliminary results. J Natl Cancer Inst 1997;89:355–65.
107 Gelmon KA, Olivotto I. The mammography screening debate: Time to move on. Lancet 2002;359:904–05.
108 Begg CB. The mammography controversy. Oncologist 2002;7:174–76. Editorial commentary, 2002;7:170–73.
109 Fletcher SW, Elmore JG. Mammographic screening. New Engl J Med 2003;348:1672–80.
110 Sox H. Screening mammography for younger women: Back to basics. Ann Intern Med 2002;137:361–62.
111 Goodman SN. The mammography dilemma: A crisis for evidence-based medicine? Ann Intern Med 2002;137:363–65.