a Department of Health Science, College of Health and Human Performance, Brigham Young University, 213 Richards Building, Provo, UT 84602; Division of Epidemiology, Department of Family and Preventive Medicine, University of Utah College of Medicine, UT 84132, USA.
b Laboratory of Epidemiology and Biostatistic, Istituto Superiore di Sanita, Viale Regina Elena, 299, 00161, Roma, Italy.
c Applied Research Branch, Cancer Control Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, EPN 313, 9000 Rockville Pike, Bethesda, MD 20892, USA.
Abstract |
---|
Methods Incidence and relative survival were modelled and used to obtain estimated completeness indices of SEER prevalence proportions for all cancer sites combined, stomach, cervix uteri, skin melanomas, non-Hodgkin's lymphomas, lung and bronchus, colon/rectum, female breast, and prostate. For validation purposes, modelled completeness indices were computed for Connecticut and compared with empirical completeness indices (the ratio of Connecticut-based prevalence proportion estimates using 1973–1993 data to those using 1940–1993 data). The SEER-based modelled completeness indices were used to adjust SEER prevalence proportion estimates for white and black patients.
Results Model validation showed that the adjusted SEER cancer prevalence proportions provided reasonably unbiased prevalence proportion estimates in general, although more complex modelling of the completeness indices is necessary for female cancers of the colon, melanoma, breast, cervix, and all cancers combined. The SEER-based cancer prevalence proportions are incomplete for most cancer sites, more so for women, whites, and at older ages. For all cancers combined, prevalence proportions tended to be higher for whites than blacks. For the site-specific cancers this was true for stomach, prostate, cervix uteri, and lung and bronchus (men only). For colon/rectal cancers the prevalence proportions were higher for blacks through ages 59 (men) and 64 (women), and then for the remaining ages they were higher for whites. Prevalence proportions were lowest for stomach cancer and highest for prostate and female breast cancers. Men experienced higher prevalence proportions than women for skin melanomas, non-Hodgkin's lymphomas, lung and bronchus, and colon/rectal cancers.
Conclusion The modelling approach applied to SEER data generally provided reasonable estimates of cancer prevalence. These estimates are useful because they are more representative of cancer prevalence than previously obtained and reported in the US.
Keywords Cancer, prevalence, burden, cohort, cross-sectional, life table
Accepted 8 September 1999
Introduction |
---|
Population-based cancer prevalence estimates are only complete if obtained from tumour registry data which have been collected over a sufficiently long period of time to capture all prevalent cases of the disease.23,6 Although incomplete prevalence estimates may be appropriate for determining required treatment in the population for diseases which primarily involve short term care,7 complete prevalence estimates provide a better assessment of the disease burden for those conditions in which recurrence is common and long term physical and psychological care needed. The Connecticut Tumor Registry (CTR) has information on cancer cases from as early as 1935,8,9 and is the only registry in the US with sufficient follow-up data to directly estimate cancer prevalence. In the mid 1980s, Feldman et al. derived prevalence estimates based on 47 years of incidence data from the CTR.2 These estimates have more recently been updated using 59 years of incidence data.5 Nevertheless, the use of CTR data to estimate complete prevalence is limited in that Connecticut incidence and survival data may not be representative of the US population, particularly for racial/ethnic subgroups. Neither of these two CTR-based studies reported prevalence estimates for racial/ethnic subgroups.
Since 1973, the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute has actively collected and reported cancer incidence and survival data.10 The nine standard cancer registries in the SEER Program cover about 10% of the US population. These registries are thought to be reflective of the US cancer experience, and are the primary source of national estimates of cancer incidence and survival. Prevalence proportions based on SEER data are of interest because they better reflect US prevalence among racial subgroups in the population. However, such estimates will be biased for many cancer sites because cases diagnosed before the start of active data collection by the SEER registries are unobserved and therefore not included in the prevalence measure.
In the current study, we determine the degree of completeness of SEER prevalence estimates for both white and black cancer patients using the modelling approach of Capocaccia and De Angelis (1997).11 This approach is validated using CTR data. The SEER prevalence estimates are then computed and reported, adjusted by the modelled index of completeness. The primary aim of this study is to provide more representative prevalence estimates for the US as well as to provide, for the first time using SEER data, prevalence estimates for blacks. The analysis focuses on eight cancer sites (stomach, cervix uteri, skin melanomas, non-Hodgkin's lymphomas, lung and bronchus, colon/rectum, female breast, and prostate) which represent a wide range of incidence and survival rates, from low incidence and low survival resulting in low prevalence to high incidence and high survival resulting in high prevalence.
Materials and Methods |
---|
A diagnosed case contributes to the pool of prevalent cases until death. Only the first primary for a given cancer type is considered. Cases diagnosed by autopsy or death certificate are not treated as prevalent cases and are thus excluded from the analysis. Incidence rates were computed by single year of age. The population denominators required to compute rates were available from the US Bureau of the Census in 5-year age groups, with single-year ages derived by Beers' Ordinary Formula.12
We selected the cancer sites shown in Table 1 because, in addition to providing a good representation of various levels of prevalence, they also represent cancers for which extensive screening and prevention efforts have been made in this country.
The general methodology considers a single birth cohort, with the completeness index expressed as a combination of incidence and survival functions. Now consider that the birth cohort is observed for a time period of L years. The proportion of the population of individuals with cancer at age x may be separated into a part which derives from the incident cases observed in a registry within the age interval [x-L, x], and a part made up of cases diagnosed at earlier ages, unobserved by the registry, and still living at age x; that is, the prevalence at age x is the sum of the unobserved and the observed living cancer cases:
![]() |
![]() |
The completeness index, R, varies between zero and one, where one means that all the diagnosed cases were included in the prevalence estimate during the reference period.
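The decomposition above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes that prevalence at age x can be approximated by integrating incidence at age t times the probability of surviving from t to x, and it omits any separate treatment of general population mortality. The function names and the incidence and survival curves are hypothetical placeholders.

```python
import numpy as np

def completeness_index(x, L, incidence, survival, n=2001):
    """Approximate R = observed prevalence / total prevalence at age x.

    incidence(t)   : incidence rate at age t (any consistent unit)
    survival(t, x) : probability that a case diagnosed at age t is alive at age x
    L              : number of years the registry has been observing the cohort
    """
    def prevalent(lo, hi):
        # Trapezoidal integration of incidence * survival over ages [lo, hi].
        t = np.linspace(lo, hi, n)
        y = incidence(t) * survival(t, x)
        return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

    observed = prevalent(max(x - L, 0.0), x)  # cases diagnosed in [x-L, x]
    total = prevalent(0.0, x)                 # cases diagnosed at any age up to x
    return observed / total

# Hypothetical curves, chosen only to make the example run.
def example_incidence(t):
    return 1e-9 * t**4

def example_survival(t, x):
    return 0.5 + 0.5 * np.exp(-0.3 * (x - t))

r_10 = completeness_index(70.0, 10.0, example_incidence, example_survival)
r_21 = completeness_index(70.0, 21.0, example_incidence, example_survival)
r_full = completeness_index(70.0, 70.0, example_incidence, example_survival)
```

Under these assumptions R rises towards one as the observation period L lengthens, and equals one when the registry has observed the cohort from birth.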
Incidence functions
The incidence function used for describing the relationship between cancer incidence and age, adjusting for birth cohort, is expressed as
![]() |
![]() |
![]() |
The models were fit to strata defined by various combinations of area (SEER, Connecticut only), gender, and race (white, black) to obtain estimates of the incidence slope parameter b used to derive the completeness index, R. Because prevalence for a specific age group is estimated from a single cohort, the numerator and denominator of the prevalence calculation for the completeness index are both scaled by the same cohort parameter, and thus the cohort parameter cancels out of the calculation. The incidence slope estimates were similar between white and black males and white and black females for all cancers combined, stomach, non-Hodgkin's lymphomas, lung and bronchus, and colon/rectum, so the estimated values used to compute the completeness indexes for these sites were based on white and black cases combined. For melanomas, which are very rare among blacks, we only conducted the analysis for whites.
Incidence slope parameter estimates were derived from SEER and Connecticut data by cancer site and gender for the first incidence function, where b is the slope parameter measuring the log-log linear relationship between incidence and age (estimates not shown). Incidence slope parameter estimates were also derived for the cancer sites assessed using the second incidence function, where b_i is the slope parameter of a polynomial logistic relationship between incidence and age (estimates not shown). The cohort parameters were estimated along with the slope parameters, but as noted they cancel out of the calculation of the completeness index.
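As an illustration of the first incidence function, the sketch below estimates a log-log slope b by ordinary least squares on the logarithms of age and incidence. This is a simplified assumption for illustration only: the estimation procedure actually used is not described here, the cohort parameter is omitted (it cancels out of the completeness index), and the data and names (`fit_loglog_slope`, `true_a`, `true_b`) are hypothetical.

```python
import numpy as np

def fit_loglog_slope(ages, rates):
    """Fit log(rate) = log(a) + b * log(age) by least squares; return (a, b)."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    b, log_a = np.polyfit(np.log(ages), np.log(rates), 1)
    return float(np.exp(log_a)), float(b)

# Hypothetical, noise-free incidence rates following rate = a * age**b.
ages = np.arange(30.0, 85.0, 5.0)
true_a, true_b = 2e-11, 5.0
rates = true_a * ages**true_b

a_hat, b_hat = fit_loglog_slope(ages, rates)
```

With noise-free input the fit recovers the slope used to generate the data; with registry data the estimate would carry sampling error that this sketch does not quantify.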
Modelling survival
A survival model with cure was fit to the SEER data. Similar models have been successfully applied previously.16,17 This model assumes that only a portion of the patients have an excess mortality rate, while the remainder have the same mortality rate as the general population and can therefore be considered cured with regard to the risk of death.
In the survival model, a Weibull function was assumed for fatal cases and the influence of time of diagnosis and race was modelled with an exponential factor of the entire relative survival function. The cumulative relative survival up to age x of a patient diagnosed at age t and year y was assumed as:
![]() |
Relative survival estimates using the life table method were computed using the SEER Portable Survival System.19 Relative survival was stratified according to 5-year age intervals (25–29, 30–34, ..., 75–79, 80–84; except for all cancers, which included earlier age intervals), period of diagnosis (1973–1977, 1978–1982, 1983–1987, 1988–1993), gender, and race (white and black). A 20-year follow-up period was considered and we constrained the relative survival not to increase over follow-up time. Cases with second or later primaries, cases diagnosed by death certificate or at autopsy, and cases not actively followed were excluded. Files containing the empirical relative survival rates and standard errors, together with the corresponding values of time since diagnosis, age, time of diagnosis, gender, and race were exported from the portable survival package. The model parameters, including A, β1, and β2, were then estimated using the SAS NLIN procedure from the exported relative survival results (estimates not shown). On the basis of these parameter estimates, cancer site, gender, and race specific completeness indices were derived.
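The structure of such a cure model can be sketched as follows. Because the survival equation itself is not reproduced in this copy, the functional form below is an assumption: a mixture of a cured proportion with no excess mortality and a Weibull survival function for fatal cases, with period and race acting through an exponential factor applied to the whole function. The parameter names (`cure`, `lam`, `gamma`, `beta1`, `beta2`) are hypothetical and need not match the symbols used in the original model.

```python
import numpy as np

def relative_survival(d, cure, lam, gamma, beta1=0.0, beta2=0.0,
                      period=0.0, race=0.0):
    """Assumed mixture cure model for cumulative relative survival.

    d      : time since diagnosis in years (scalar or array)
    cure   : proportion of patients with no excess mortality
    lam    : Weibull scale parameter for fatal cases
    gamma  : Weibull shape parameter for fatal cases
    beta1, beta2 : coefficients for period of diagnosis and race
    """
    d = np.asarray(d, dtype=float)
    # Cured patients contribute survival 1; fatal cases follow a Weibull curve.
    base = cure + (1.0 - cure) * np.exp(-(lam * d) ** gamma)
    # Covariates enter as an exponent on the entire relative survival function.
    return base ** np.exp(beta1 * period + beta2 * race)

years = np.array([0.0, 1.0, 5.0, 10.0, 20.0, 200.0])
s_ref = relative_survival(years, cure=0.6, lam=0.2, gamma=0.9)
s_later = relative_survival(years, cure=0.6, lam=0.2, gamma=0.9,
                            beta1=-0.02, period=10.0)
```

In this sketch the curve starts at one, never increases with follow-up time, and levels off at the cured proportion, which matches the constraint that relative survival should not increase over follow-up. A negative `beta1` raises survival for later periods of diagnosis.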
Figure 1 provides an illustration of 3-, 5-, 7- and 10-year modelled and observed relative survival for white women with breast cancer, presented by year of diagnosis in Connecticut and SEER for the reference age 62. We only use SEER data from 1973 through 1993 to model survival. However, modelled estimates from these years and those back-projected to earlier years (not shown) are used in the denominator of the completeness index R. For Connecticut, where we have observed survival to compare with projected survival from 1940 onward, we compared back-projected and observed survival for breast cancer. The projected survival underestimates the observed survival in early years. This notwithstanding, modelled total prevalence of breast cancer is very close to the observed, as shown in Table 3. This indicates that estimation of completeness indices is scarcely sensitive to the values of the survival function at 20 years before the index date.
Adjusted prevalence proportions
Cancer site and age-group specific prevalence proportions on 1 January 1994 were estimated per 100 000 using the Feldman et al. method.2 This method was applied to five data sets: (1) Connecticut white cases diagnosed 1940–1993, (2) Connecticut white cases diagnosed 1973–1993, (3) SEER white cases diagnosed 1973–1993, (4) Connecticut black cases diagnosed 1940–1993, and (5) SEER black cases diagnosed 1973–1993. The method applied to (1) and (4) provides the conventional prevalence proportion estimates. We compare these estimates with those obtained by applying the Feldman et al. method to (2), (3), and (5), which were adjusted by dividing the cancer site-, age group-, area-, and race-specific computed prevalence proportions by the corresponding modelled completeness indices obtained from the Capocaccia and De Angelis method.11
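The adjustment step and the empirical completeness index used for validation both reduce to simple ratios, sketched below. The function names and all numerical values are hypothetical and are included only to show the arithmetic; they are not results from this study.

```python
def adjust_prevalence(observed_per_100k, completeness):
    """Divide an incomplete prevalence proportion by its completeness index."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness index must lie in (0, 1]")
    return observed_per_100k / completeness

def empirical_completeness(prev_short_period, prev_long_period):
    """Ratio of prevalence from the short registration period to the long one."""
    return prev_short_period / prev_long_period

# Hypothetical values per 100 000 for one site, age group, and race.
prev_1973_1993 = 1200.0   # based on 21 years of registration
prev_1940_1993 = 1500.0   # based on 54 years of registration

r_empirical = empirical_completeness(prev_1973_1993, prev_1940_1993)
adjusted = adjust_prevalence(prev_1973_1993, r_empirical)
```

Here adjusting the short-period estimate by the empirical index returns the long-period estimate by construction. In the validation described above, the point of interest is how close a modelled index comes to the empirical one.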
Results |
---|
For all cancers combined, prevalence estimates tended to be higher for whites than blacks. For the site-specific cancers this was true for stomach, prostate, cervix uteri, and lung and bronchus (men only). For colon/rectal cancers the prevalence estimates were higher for blacks through ages 59 (men) and 64 (women), and for the remaining ages they were higher for whites.
As expected, based on the incidence and survival rate combinations reported in Table 1, stomach cancer had the lowest prevalence estimates for both men and women, whereas prostate and breast cancers had the highest. Men had higher prevalence estimates than women for skin melanomas, non-Hodgkin's lymphomas, lung and bronchus, and colon/rectal cancers.
Discussion |
---|
Cancer prevalence, which reflects in a single measure the effects of incidence and survival, is an important indicator of the burden of this disease in the population and on the health care system. Currently the Connecticut Tumor Registry is the only source of data in the US which allows us to directly compute prevalence. However, prevalence in Connecticut does not necessarily mirror that in the SEER areas (as suggested by differences in incidence and survival rates)11 or the total US. For example, the estimated number of prevalent cases of any cancer for white men aged 70–74 in the US on 1 January 1994 (obtained by multiplying our modelled prevalence proportion estimates by the average of the populations in 1993 and 1994 from the Bureau of the Census) is 495 225 based on Connecticut data and 526 433 based on SEER data. Based on the modelled prevalence proportions on 1 January 1994 and projections of the white male population from the Bureau of the Census middle series,20 in the year 2020 the estimated number of prevalent cases is 829 840 based on Connecticut data and 882 134 based on SEER data. Hence, the burden of cancer for white men in the US appears to be potentially very different when based on Connecticut data versus SEER data. The aim of this study was to provide prevalence proportions which better reflect the US white and black populations.
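The conversion from a prevalence proportion to a number of prevalent cases described above can be sketched as follows. The function name and the input values are hypothetical and serve only to show the calculation; they are not the figures underlying the counts reported here.

```python
def prevalent_cases(prev_per_100k, pop_1993, pop_1994):
    """Prevalent cases on 1 January 1994 from a proportion per 100 000.

    The population at risk is approximated by the average of the
    1993 and 1994 population estimates.
    """
    mean_population = (pop_1993 + pop_1994) / 2.0
    return prev_per_100k / 100_000 * mean_population

# Hypothetical inputs for a single age, sex, and race group.
cases = prevalent_cases(prev_per_100k=15_000.0,
                        pop_1993=3_000_000,
                        pop_1994=3_100_000)
```

Because the count scales linearly with the prevalence proportion, a modest difference between the Connecticut-based and SEER-based proportions translates into a difference of tens of thousands of cases at the national level.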
Factors influencing the number of years of follow-up required before the registration period is sufficient to capture the majority of prevalent cases include the age at which the disease is common and the lethality of the disease. For example, the registration period was essentially sufficient for prostate cancer because it primarily occurs in old age, where the life expectancy is relatively short. The registration period was also almost sufficient for lung and bronchus cancers because of the short survival associated with these diseases. On the other hand, for cancer of the cervix uteri the relatively young age at diagnosis and good survival require many more years of follow-up to capture prevalence. In general, women needed more years of follow-up than men, and whites more years of follow-up than blacks. This is because of better survival in women than men, and in whites than blacks.
The empirical completeness indices for whites from Connecticut could have been used to correct the SEER-based prevalence estimates rather than the modelled completeness indices for whites from SEER. However, random variation in the empirical estimates, and uncertainty about the representativeness of the Connecticut-based completeness indices to SEER indicated the need for modelled completeness indices. Further, sparse data limited us from obtaining empirical completeness indices for blacks in Connecticut. Yet although the modelled completeness indices based on SEER data are more stable, and can be obtained for blacks, they require certain assumptions about the cure fraction and distribution function of survival. Hence, limitations exist for both approaches.
Modelled prevalence estimates for whites in Connecticut compared to SEER were higher for stomach, non-Hodgkin's lymphomas, and colon/rectal cancers, and lower for prostate and cervix uteri cancers. This may be explained by higher incidence rates in Connecticut than in SEER for the former set of cancers but lower incidence rates in Connecticut compared to SEER for the latter set of cancers.10 Prevalence estimates for blacks could not be directly compared between Connecticut and SEER, but we would expect that they would be higher in those areas displaying higher incidence rates. The incidence rates between Connecticut and SEER vary greatly for blacks for certain cancers (all cancers combined, stomach, lung and bronchus, prostate, and cervix uteri).10 Hence, the modelled prevalence estimates among blacks in SEER for these cancer sites would be different than in Connecticut and better reflect US prevalence.
While the modelled SEER-based prevalence estimates provide a better representation of US prevalence, methods to obtain US and state level prevalence estimates are of primary interest. This has led to prevalence estimates obtained from national surveys,21 Medicare data,22–24 and a recent effort based on general methods using mortality and survival data.25 The National Cancer Institute is currently sponsoring a project developing and applying methods to obtain estimates of US incidence and prevalence. The estimates presented in the current work are important for validating estimates obtained in further modelling efforts.
References |
---|
2 Feldman AR, Kessler L, Myers MH, Naughton MD. The prevalence of cancer. N Engl J Med 1986;315:1394–97.
3 Adami OH, Gunnarsson T, Sparen P, Eklund G. The prevalence of cancer in Sweden 1984. Acta Oncologica 1989;28:463–70.
4 Capocaccia R, Verdecchia A, Micheli A, Sant M, Gatta G, Berrino F. Breast cancer incidence and prevalence estimated from survival and mortality. Cancer Causes Control 1990;1:23–29.
5 Polednak AP. Estimating prevalence of cancer in the United States. Cancer 1997;80:136–41.
6 Teppo L, Hakama M, Hakulinen T, Lehtonen M, Saxen E. Cancer in Finland 1953–70: Incidence, mortality, prevalence. Acta Pathologica Microbiologica Scandinavica (A) 1975;(Suppl.252).
7 Micheli A, Gatta G, Sant M et al. Breast cancer prevalence measured by the Lombardy Cancer Registry. Tumori 1997;83:875–79.
8 Connelly RR, Campbell PC, Eisenberg H. Central registry of cancer cases in Connecticut. Public Health Rep 1968;83:386–90.
9 Gershman ST, Flannery JT, Barrett H, Nadel RK, Meigs JW. Development of the Connecticut Tumor Registry. Conn Med 1976;40:697–701.
10 Ries LA, Kosary CL, Hankey BF, Miller BA, Harras A, Edwards BK (eds). SEER Cancer Statistics Review, 1973–1994. National Cancer Institute. NIH Pub. No. 97-2789. Bethesda, MD, 1997.
11 Capocaccia R, De Angelis R. Estimating the completeness of prevalence based on cancer registry data. Stat Med 1997;16:425–40.
12 Shryock HS, Siegel JS et al. US Bureau of the Census. The Methods and Materials of Demography. Third Print (rev.). Washington DC: US Government Printing Office, 1975.
13 Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monog 1961;6:101–21.
14 Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 1954;8:1–12.
15 Cook PJ, Doll R, Fellingham SA. A mathematical model for the age distribution of cancer in man. Int J Cancer 1969;4:93–112.
16 Goldman AI. Survivorship analysis when cure is a possibility: a Monte Carlo study. Stat Med 1984;3:153–63.
17 Gamel JW, McLean IW, Rosenberg SH. Proportion cured and mean long survival time as functions of tumor size. Stat Med 1990;9:999–1006.
18 Miller RG Jr. Survival Analysis. New York: John Wiley & Sons, 1981.
19 SEER 1973–93 Public-Use CD-Rom. US Department of Health and Human Services, PHS/NIH/NCI/CSB, August 1996.
20 Day JC. Population Projections of the United States by Age, Sex, Race, and Hispanic Origin: 1995–2050. US Bureau of the Census, Current Population Reports, P25-1130, US Government Printing Office, Washington DC, 1996.
21 Byrne J, Kessler LG, Devesa SS. The prevalence of cancer among adults in the United States: 1987. Cancer 1992;68:2154–59.
22 McBean AM, Babish JD, Warren JL. Determination of lung cancer incidence in the elderly using Medicare claims data. Am J Epidemiol 1993;137:226–34.
23 McBean AM, Warren JL, Babish JD. Measuring the incidence of cancer in elderly Americans using Medicare claims data. Cancer 1994;73:2417–25.
24 Warren JL, Riley GF, McBean AM, Hakim R. Use of Medicare data to identify incident breast cancer cases. Health Care Financing Rev 1996;18:237–46.
25 Verdecchia A, Capocaccia R, Egidi V, Golini A. A method for the estimation of chronic disease morbidity and trend from mortality data. Stat Med 1989;8:201–16.