Using Hospital Discharge Files to Enhance Cancer Surveillance

Lynne Penberthy1, Donna McClish2, Amy Pugh3, Wally Smith1, Claudine Manning1, and Sheldon Retchin1

1 Department of Internal Medicine, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA.
2 Department of Biostatistics, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA.
3 Virginia Cancer Registry, Division of Surveillance and Epidemiology, Virginia Department of Health, Richmond, VA.

Received for publication April 10, 2002; accepted for publication August 28, 2002.


    ABSTRACT
 
Use of the traditional mechanism for cancer surveillance, the hospital-based registry, may limit ascertainment of incident cases. In this study, the authors evaluated the ability of a statewide hospital discharge file (HDF) to enhance central cancer registry reporting. Incident cancers from a Virginia cancer registry were linked with an HDF for 1995. Medical record abstractions for over 2,000 cancers verified HDF and registry data. There were 19,740 unique cases ascertained from the two combined data sources. The registry captured approximately 83% of cases, while the HDF captured 62%. Although the HDF missed a substantial number of registry cases, the HDF positive predictive value for identifying the correct cancer site was 94%. Logistic regression was used to identify significant characteristics of cases likely to be captured only by the HDF; these characteristics included hospital cancer program certification, the position of the cancer diagnosis on the claim, and cancer surgery. This study represents the evaluation of a novel approach to enhancing registry completeness and accuracy using statewide HDFs. The results strongly suggest that neither a central cancer registry nor an HDF is a sufficient source for complete capture of cases. Using HDFs to supplement a central cancer registry may be a valuable and efficient method for improving registry completeness of reporting.

hospital records; neoplasms; population surveillance; registries

Abbreviations: ACOS, American College of Surgeons; HDF, hospital discharge file; ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; VCR, Virginia Cancer Registry.


    INTRODUCTION
 
Cancer surveillance is an essential tool for assessing patterns in the occurrence of cancer and for detecting important trends within subpopulations. The data gathered through surveillance provide information with which to rationally allocate limited resources for cancer prevention and to assess the impact of cancer prevention programs (1). The National Cancer Institute supports cancer surveillance through the collection of data from a set of cancer registries called the Surveillance, Epidemiology, and End Results Program. This is the database on which the national cancer statistics reported by the National Cancer Institute are based (2, 3). Individual states also must maintain their own cancer surveillance systems to evaluate incident trends and to monitor patterns of care for subpopulations within the state (4, 5). Although central registries have improved identification of incident cancers, underreporting remains a problem (6–8).

The use of secondary databases, including Medicare, Medicaid, or Blue Cross/Blue Shield data or statewide hospital discharge files (HDFs), has been suggested to augment the capture of incident cases and treatment (9–17, 23, 24). For the populations these databases represent, capture rates are high in comparison with central cancer registries (7, 8, 17, 20–22, 25, 26). Several studies have specifically evaluated the validity of claims data using medical records in a health maintenance organization as a gold standard in identifying incident breast cancers (27, 28). The positive predictive values reported in those studies were high, ranging from 83 percent to 96 percent. Other studies have assessed the accuracy in incidence reporting using cancer registries as the gold standard (7, 25, 26, 29). The study reported here was specifically designed both to evaluate accuracy and to determine whether a secondary data source, hospital discharge data, can enhance the capture of incident cancers for a central cancer registry. Validation of diagnoses and diagnosis dates was based on the quality control model from the Surveillance, Epidemiology, and End Results Program and used the inpatient medical record as the gold standard (3, 5, 30).


    MATERIALS AND METHODS
 
Overview
Cases from the Virginia Cancer Registry (VCR) were matched with cases identified from the Virginia statewide HDF. Data from inpatient medical records for the year of diagnosis (1995) for a random sample of cases were abstracted by trained nurse abstractors and hospital cancer registrars for validation of information on cancer site and incidence.

Data sources
Virginia Cancer Registry
The VCR has been population-based for incident cancers since 1990. For the 50 percent of hospitals certified by the American College of Surgeons (ACOS), data are likely to be more completely reported. For non-ACOS facilities, the same data are less consistently available (31). State regulations have required reporting from all hospital, laboratory, and other medical facilities since 1990, although active surveillance did not begin in free-standing radiation treatment and surgery centers until 1997 (31, 32). Despite these regulations, estimates of completeness based on the predictive models of the North American Association of Central Cancer Registries during the study period ranged from 85 percent to 92 percent (32). Therefore, any potential method for supplementing case reporting is important. Cancers evaluated in this study included breast, prostate, colon and rectal, cervical, and lung cancers.

Virginia HDF
The Virginia HDF is a statewide HDF that has electronically collected the universal billing (UB92) forms from all acute-care hospitals in Virginia since 1994. By 1995, over 91 percent of patients had a Social Security number included as a unique identifier on their universal billing form. Cancer cases were identified on the basis of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes in any of the 10 possible diagnosis positions on the claim form for breast (code 174), prostate (code 185), colon and rectal (codes 153 and 154), cervical (code 180), and lung (codes 162 and 163) cancer. Because we identified a substantial number of cancers reported as carcinoma in situ in the registry data, the in situ codes were also included in the study. They comprised codes for carcinoma in situ of the breast (code 233.0), colon/rectum (codes 230.3–230.6), cervix (code 233.1), lung (code 231.2), and prostate (code 233.4). The date of diagnosis for HDF cases was defined as the date of the first admission in 1995 with a diagnosis code in any position for one of the five cancer sites.
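
The any-position case-finding rule described above can be sketched as a small filter over a claim's diagnosis codes. This is an illustrative reconstruction, not the study's actual extraction program: the function name and record layout are hypothetical, and the site/code table simply restates the codes listed above (with decimals stripped so "233.0" and "2330" match alike).

```python
# ICD-9-CM code prefixes for the five study sites (invasive plus in situ),
# written without decimal points. Hypothetical sketch of the HDF rule.
SITE_CODES = {
    "breast": ("174", "2330"),
    "prostate": ("185", "2334"),
    "colorectal": ("153", "154", "2303", "2304", "2305", "2306"),
    "cervix": ("180", "2331"),
    "lung": ("162", "163", "2312"),
}

def cancer_sites(claim_diagnoses):
    """Return study sites coded in any of the claim's (up to 10) diagnosis positions."""
    sites = set()
    for code in claim_diagnoses:
        compact = code.replace(".", "")         # normalize "233.0" -> "2330"
        for site, prefixes in SITE_CODES.items():
            if compact.startswith(prefixes):    # match any listed prefix
                sites.add(site)
    return sites
```

Under this rule, a claim listing, say, 401.9 (hypertension) in the principal position and 185 in a secondary position would still be flagged as a prostate cancer case, mirroring the any-position criterion.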

Matching of VCR and HDF cases
The case-finding abilities of the VCR and HDF were compared by matching cases from the two files on the basis of Social Security numbers. These matches were confirmed with the patient’s date of birth and gender. If Social Security information was missing (9.8 percent of patients with cancer admission in 1995) or there was no match based on the Social Security number, a match was performed between the VCR and the HDF based on last name, first initial, and date of birth. For those patients who were matched on the latter criteria, a review was carried out by analysis personnel as a final check to determine whether the two patients were the same person, using additional confirming factors such as first name, middle initial, medical record number, and dates of admission or diagnosis. Potential errors in matching may have occurred because of erroneous Social Security numbers, incorrectly spelled names, or names missing from the HDF file.
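
The two-stage match described above can be sketched as follows. The record fields (`ssn`, `dob`, `sex`, `last`, `first`) are hypothetical stand-ins for the actual VCR and HDF layouts, and the final manual review by analysis personnel is reduced to a comment.

```python
def match_records(vcr, hdf):
    """Two-stage linkage: SSN confirmed by DOB/sex, then name + DOB fallback."""
    matches, unmatched = [], []
    hdf_by_ssn = {r["ssn"]: r for r in hdf if r.get("ssn")}
    for v in vcr:
        h = hdf_by_ssn.get(v.get("ssn"))
        if h and h["dob"] == v["dob"] and h["sex"] == v["sex"]:
            matches.append((v, h))      # SSN match, confirmed by DOB and sex
        else:
            unmatched.append(v)
    # Fallback: last name, first initial, and date of birth; in the study,
    # pairs matched this way went to analysis personnel for a manual check
    # against first name, middle initial, medical record number, and dates.
    hdf_by_name = {(r["last"], r["first"][:1], r["dob"]): r for r in hdf}
    for v in unmatched:
        h = hdf_by_name.get((v.get("last"), (v.get("first") or "")[:1], v.get("dob")))
        if h:
            matches.append((v, h))
    return matches
```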

Population
The population for this study included all cases of breast, cervical, colorectal, lung, and prostate cancer identified by either the registry, the HDF, or both in 1995. The index year 1995 was selected for this study because it was the most recent year permitting evaluation of prevalent cases in the HDF during the year prior to the initial diagnosis. Prevalent cases uniquely identified by the HDF were excluded if they had a prior diagnosis either in the registry in 1990–1994 or in the HDF in 1994.

Sample for validation
A random sample of 2,625 cases was selected for validation. This number provided sufficient power to assess validity for reporting of treatment and incidence by cancer site. Validation consisted of review and abstraction of selected data elements from all inpatient admissions of each case-patient for 1995. Medical records were available and data on the cases were abstracted for 82 percent of these patients. Prior to analysis, 13 patients were deleted because the abstraction was incomplete. Three cases were deleted because we were not able to verify patient identification due to coding errors such as an incorrect Social Security number or name. Three males with breast cancer were deleted. A validated case was defined as complete and included in the gold standard for the validation sample if all hospital admissions in 1995 were abstracted and data fields were complete. Cases not meeting this requirement were deleted (n = 108); this left 2,025 complete cases to serve as the gold standard in the validation sample.

Abstraction was performed by a trained nurse abstractor or cancer registrar. Items to be validated included: patient identifying and demographic information, cancer site, dates of initial diagnosis and treatment, and the names of physicians providing care for the patient.

Analysis
Analysis was done using PC SAS, version 8.0 (SAS Institute, Inc., Cary, North Carolina). The numbers of unique cases from each of the two data sources and the number of cases common to both data sources were counted to determine the total number of unique cases reported. From this information, the percentage of total cases captured by each data source was calculated. The number of cases uniquely reported by the HDF estimates the potential added benefit of the HDF data to the VCR.
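
As a concrete check of this arithmetic, the capture percentages follow directly from the counts reported in the Results (16,361 registry cases, 12,174 HDF cases, and 19,740 unique cases overall):

```python
registry_n, hdf_n, union_n = 16_361, 12_174, 19_740

overlap = registry_n + hdf_n - union_n        # cases reported by both sources
pct_registry = 100 * registry_n / union_n     # share of all unique cases in the VCR
pct_hdf = 100 * hdf_n / union_n               # share captured by the HDF
hdf_only = union_n - registry_n               # potential added benefit of the HDF

print(overlap, round(pct_registry, 1), round(pct_hdf, 1), hdf_only)
# → 8795 82.9 61.7 3379  (the ~83% and ~62% quoted in the abstract)
```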

Estimates of the accuracy of the HDF in capturing incident cases were calculated as the positive predictive value of the HDF. The positive predictive value estimates the ability of the claims data (HDF) to distinguish true-positive cases from false-positive cases. In this instance, true-positive cases are those cases identified as cancer by both the medical record (the "gold standard") and the HDF. False-positive cases are those cases identified as cancer by the HDF but not verified by the gold standard. The HDF subset of the 2,025 cases in the validation study was initially used to estimate accuracy related to ascertainment of cancer site. This represents the positive predictive value of using ICD-9-CM codes in the HDF file to correctly identify the site. Validation of the year of diagnosis for these cases provided a measure of the accuracy of the HDF in detecting incident cases. False-positive cases captured as incident cases only by the HDF and not reported to the registry (as either incident or prevalent in 1990–1994) were reviewed.
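
In these terms, the positive predictive value is simply the confirmed fraction of HDF-flagged cases; a one-line sketch (the counts in the example are illustrative, not the study's):

```python
def ppv(true_pos, false_pos):
    """Fraction of claims-identified cases confirmed by the gold standard."""
    return true_pos / (true_pos + false_pos)

# e.g., 94 medical-record-confirmed cases out of 100 flagged by the HDF:
print(ppv(94, 6))   # → 0.94
```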

Logistic regression analysis was performed to identify factors that would enhance the accuracy and completeness of the cancer registry data in identifying additional cases from the HDF. The model is written as follows:

logit(HDF-only) = β0 + β1(age) + β2(gender) + β3(ACOS) + β4(principal position) + β5(cancer surgery) + γ.

The dependent variable for this model is logit(HDF-only) = log[(probability that a case is captured by the HDF only)/(probability that a case is captured by both the HDF and the registry)]. Independent variables included age (in years), gender, hospital ACOS cancer-center certification status, position of the ICD-9-CM code on the claim, and cancer-specific surgery during the index hospitalization.
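
For intuition, each coefficient in this model corresponds to a log odds ratio. With a hypothetical 2 × 2 table for ACOS certification status (counts invented for illustration, chosen so the result falls inside the 2.9–5.1 range reported in the Results), the odds ratio and its coefficient work out as:

```python
import math

# Hypothetical counts:              HDF-only   both sources
hdf_only_non_acos, both_non_acos =    60,         40       # non-ACOS hospital
hdf_only_acos, both_acos         =    30,         70       # ACOS-certified hospital

odds_non_acos = hdf_only_non_acos / both_non_acos   # 1.50
odds_acos = hdf_only_acos / both_acos               # ~0.43
odds_ratio = odds_non_acos / odds_acos              # 3.5
beta_acos = math.log(odds_ratio)                    # coefficient β3 on the logit scale
```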


    RESULTS
 
In total, there were 16,361 cases from the five study sites reported to the cancer registry in 1995. There were 12,174 cases identified in the 1995 claims from the HDF. Matching the two sources resulted in a total of 19,740 unique cases identified through the combined data sources. Table 1 shows numbers of cases by age group according to the individual and combined data sources. The last column of the table provides the added percentage of cases that would be contributed by the HDF relative to the number routinely captured by the registry. This percentage assumes that all HDF cases are valid incident cancers, and it represents a maximum potential contribution of the HDF. The range in potential additional cases varies by cancer site and age group.


 
TABLE 1. Breast, cervical, colon/rectal, lung, and prostate cancer cases captured by the Virginia Cancer Registry and the Virginia hospital discharge file, by cancer site and age group, 1995
 
Analysis of the subset of cases that were validated through medical record abstraction was performed to determine the accuracy of the HDF in correctly identifying a cancer and its correct site of origin. Table 2 shows the results of this validation process. The overall positive predictive value was 94 percent, with the site-specific positive predictive value ranging from a low of 86 percent for the cervix to a high of 98 percent for the breast.


 
TABLE 2. Results of inpatient validation of cancer site as reported by the Virginia hospital discharge file, 1995
 
Additional evaluation of the validation sample was conducted, focusing on cases identified uniquely by the HDF, since these are of most interest to the cancer registry as potentially accessionable cases. For the cases uniquely identified by the HDF (HDF-only), there was substantial variation in the positive predictive value by cancer site. The positive predictive value was lower for cases that were identified by the HDF only (table 3) as compared with all cases identified by the HDF (table 2). The percentage of HDF-only cases for which the cancer site was correctly identified ranged from 77 percent for cervical cancer to 97 percent for breast cancer. For lung and breast cancer, the positive predictive value was nearly as high for HDF-only cases as for all HDF cases: 97 percent versus 98 percent for breast cancer and 94 percent versus 96 percent for lung cancer. Because the registry does not require reporting of in-situ cancers, other than cervical cancer, we examined the specific ICD-9-CM codes for the HDF-only cases to ascertain whether these were primarily in-situ cancers that would not be of interest to the registry. Of the 559 cases, 64 cervical cancers, eight breast cancers, and one colorectal cancer were in-situ cancers.


 
TABLE 3. Assessment of the positive predictive value of cancer site identification through inpatient claims-based validation of cancer cases identified only through the Virginia hospital discharge file, 1995
 
In evaluating the false-positive cancers, many were true cancers of a site other than that identified in the HDF file. The false-positive rate for inaccurate identification of the cancer site ranged from 1 percent to 10 percent. The contribution of these cases to the total false-positive rate ranged from 38 percent to 68 percent, according to cancer site. The category of false positives for which no cancer was confirmed in the medical record ranged from a low of just over 2 percent for breast cancer to 13 percent for cervical cancer.

Further detailed analyses of these false-positive cases were performed. Of those cancers reported as cervical (n = 11), colorectal (n = 3), and prostate (n = 4) cancers by the discharge data, the cancer was validated as cancer of an adjacent anatomic site. In addition, there were two cases in which the claims identified the cancer as lung cancer but the cancer was validated as having metastasized from another site to the lung. In other instances, when no cancer was confirmed by the medical record, the patient had benign disease such as cervical intraepithelial neoplasia I or II (n = 2) or colon polyps (n = 2).

We assessed cases identified only by the HDF to verify that they were incident cancers. This analysis was based on the year of diagnosis reported in the gold standard. The results from the validation for incidence are reported in table 4. For this analysis, HDF-only cases for which the site was correctly identified were used (n = 493). Between 32 percent and 60 percent of these cases were verified as incident cases in 1995 by cancer site. The remaining patients either had prevalent cases based on the gold standard in 1994 or earlier or were stated to have a "history of cancer" in the gold standard. Additional cases (n = 26) either did not have a date of diagnosis in the medical record or were stated to have a clinical diagnosis only. These were categorized as "date of diagnosis ambiguous."


 
TABLE 4. Results of inpatient validation assessing incidence among cancer cases captured only by the Virginia hospital discharge file, 1995
 
Table 5 contains the results of the logistic regression analysis identifying factors for potentially targeting hospitals to improve registry completeness. Patients whose cases were captured only by the HDF were more likely to be seen at a non-ACOS-certified hospital (odds ratios ranged from 2.9 to 5.1 by site) and were more likely to be admitted with a principal diagnosis other than cancer (odds ratios ranged from 2.1 to 5.2 by site). Patients admitted for cancer-specific surgery were more likely to have their cases captured by both the HDF and the registry. Increasing age was also associated with having a case captured only by the HDF, except in patients with cervical cancer. However, the number of elderly women with cervical cancer was low overall, which may have statistically limited our ability to identify differences. For colorectal and lung cancers, women were more likely to have their cases reported only by the HDF. These associations were true for the larger sample, as shown in table 5, and were also true when the regression analysis was conducted in the sample with only validated incident cases.


 
TABLE 5. Odds ratios for factors predicting ascertainment of cancer cases through the Virginia hospital discharge file only versus both sources (hospital discharge file plus the Virginia Cancer Registry), by cancer site, 1995
 

    DISCUSSION
 
In this paper, we have described the potential utility of using HDFs for increasing the ascertainment of cancers reportable to a central registry. Claims data have been evaluated for their utility as a source of epidemiologic or health services information on cancer patients (7, 17, 20, 26–29). The utility of these sources to independently enhance completeness of case reporting for a central registry has not been evaluated. To our knowledge, this is the first study that assessed the utility of secondary data to supplement a cancer registry across multiple sites and that used a source not limited to a particular insured population or age group. This study verified cases through comparison with an independent gold standard, medical record review.

The results of this study suggest that neither a central cancer registry nor an HDF may be sufficient independently as a source for complete capture of incident breast, cervical, colorectal, lung, and prostate cancer cases. Both the HDF and the registry data were highly valid in correctly identifying the cancer site. The positive predictive value for accurately identifying a cancer case was well over 90 percent for each source, yet both sources were missing substantial proportions of cases.

The proportions of cases captured by the registry and by the HDF varied differently by cancer site. When the registry capture rates were relatively low (e.g., lung cancer), the HDF capture rates were relatively high. In some cases, these differences were reversed. For prostate and cervical cancers, where outpatient diagnosis and treatment may be more standard, the hospital discharge claims may be less complete in capturing cases, since they represent only inpatient stays. These variations are consistent with those previously identified using claims, death certificate, and registry data (8). Cancer registries are designed to capture cancer cases regardless of site, and the ability of registry personnel to locate pathology reports, surgical logs, and hospital discharge diagnosis reports is probably the deciding factor. Conversely, the HDF is composed of inpatient discharge abstracts from administrative claims, largely designed to document health care service delivery for billing purposes (15). Cautious use of the two sources in combination may be an effective means of enhancing traditional methods of cancer surveillance. The effectiveness of using claims data in combination with and to complement a registry is consistent with prior recommendations (33).

Although HDFs are limited to inpatient data, there are benefits that make them uniquely valuable. These files are now available in over 40 states. They are more accessible to central cancer registries than other secondary data sources such as Medicare or Medicaid data (34). Unlike other secondary claims sources such as Medicare, they typically represent the entire population of persons with inpatient stays for that state, regardless of age or insurance status. Thus, discharge files can be a valuable source of clinical information on the uninsured and the nonelderly population (11).

Several studies have assessed the use of either hospital claims data or insurance claims data as compared with registries for ascertainment of selected diseases, including cancer, with varying positive predictive values depending on the definitions used and the disease under evaluation (17, 26–29, 35–37). It has been suggested that using these data with due care can provide useful epidemiologic information either for identifying cases or for adjusting incidence estimates. Some investigators have suggested using these data in conjunction with a registry or another primary data source to enhance or complement the data captured by the registry (8, 11, 33). The results of the validation study reported here support the accuracy of the claims for use in identification of potential cancer cases. In our study, the positive predictive value for the HDF-identified cases in accurately capturing a cancer’s correct site was high: 88 percent.

The HDF may be less valuable in its ability to correctly differentiate incident cases from prevalent cases. Of unique interest is the component of the validation that provides information on whether cases captured only by the HDF are incident cancers. The estimated percentage increase for incident cancer cases captured only by the HDF is 11 percent. This proportionate gain is greater than anticipated given the hospital-based surveillance used by registries and the mandatory reporting of incident cancers by hospitals in Virginia in 1995. This increment may be lower in states with higher rates of case ascertainment. It may be reduced more recently, even in Virginia, where ascertainment rates have increased substantially from 81.2 percent in 1992–1997 to 88.6 percent in 1995–1999 through the initiation of active case-finding methods.

The gain for nonincident but potentially accessionable cases was higher: 21 percent. Although these additional cases were validated as prevalent in 1994 or earlier, none had been reported to the registry at any time since 1990. Therefore, even though they were not incident cases, they may still have been accessionable in 1995. Ongoing, prospective linkage with the HDF might also have identified these cases at the time of initial diagnosis if it involved an inpatient cancer admission, thus moving these prevalent cases into the incident category.

The analysis employing logistic regression to identify factors that could be used to improve the accuracy and completeness of the registry is important, because it identifies characteristics of underreporting hospitals. Characteristics of cases that were not captured by the registry included admission to a non-ACOS-certified hospital; this was a very important factor in predicting whether a case would be reported. This was consistent across all five cancers. Focusing on improving reporting from those hospitals is likely to result in enhanced cancer surveillance (30).

It is unlikely that most cancer registries would simply accept a case reported from claims data without some further validation. At a minimum, this might entail confirmation from the "reporting" hospital. This confirmation would reduce both the risk of reporting prevalent cases and the risk of capturing as incident cancers false-positive cases that had a cancer ICD-9-CM code on the discharge diagnosis list. A simple potential intervention aimed at these hospitals could include using the supplemental data from hospitals in the HDF to enhance ascertainment of cases. Although not without some cost, verification of an HDF potential case could be relatively simple. It would probably be much less expensive than other options, such as training medical records personnel or hiring "circuit rider" abstractors to go to those hospitals for independent case finding. Verification of HDF cases may be no less accurate than more traditional methods as well, since use of "circuit riders" has been shown to be associated with poorer data quality (5).

This study suggests that using HDFs to supplement a central cancer registry may be a valuable and relatively efficient method of enhancing cancer surveillance, particularly for those hospitals with lower rates of reporting completeness. For registries with higher rates of completeness, the benefit of supplementation with an HDF would probably be less. There is variation by cancer site in both the potential incremental gain and the accuracy in detecting incident cases. However, given the low cost of obtaining and using these files and their ready accessibility and ease of use, supplementation with hospital discharge data is likely to provide a moderately cost-effective and accurate method of supplementing cancer reporting.


    ACKNOWLEDGMENTS
 
This project was performed with the support of grant CA 71533 from the National Cancer Institute.

The authors acknowledge Virginia Health Information Systems for assistance with record linkage and for providing the data.


    NOTES
 
Correspondence to Dr. Lynne Penberthy, Division of Quality Health Care, Medical College of Virginia, Virginia Commonwealth University, 1200 East Broad Street, Richmond, VA 23298-0306 (e-mail: lpenbert@mail2.vcu.edu).


    REFERENCES
 

  1. Division of Cancer Prevention and Control, Centers for Disease Control and Prevention. Cancer registries: the foundation for comprehensive cancer control. Atlanta, GA: National Center for Chronic Disease Prevention and Health Promotion, 2001. (World Wide Web URL: http://www.cdc.gov/cancer/). (Accessed August 30, 2001).
  2. Hankey BF, Ries LA, Edwards BK. The Surveillance, Epidemiology, and End Results Program: a national resource. Cancer Epidemiol Biomarkers Prev 1999;8:1117–21.
  3. Cancer Statistics Branch, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute. SEER registries: about SEER. Bethesda, MD: National Cancer Institute, 2003. (World Wide Web URL: http://seer.cancer.gov/about).
  4. Nattinger AB, McAuliffe TL, Schapira MM. Generalizability of the Surveillance, Epidemiology, and End Results registry population: factors relevant to epidemiologic and health care research. J Clin Epidemiol 1997;50:939–45.
  5. Zippin C, Lum D. Study of completeness of the Surveillance, Epidemiology and End Results (SEER) Program case ascertainment by hospital size and casefinding source. Health Rep 1993;5:87–90.
  6. Fanning J, Gangestad A, Andrews SJ. National Cancer Data Base/Surveillance Epidemiology and End Results: potential insensitive-measure bias. Gynecol Oncol 2000;77:450–3.
  7. McClish DK, Penberthy L, Whittemore M, et al. Ability of Medicare claims data and cancer registries to identify cancer cases and treatment. Am J Epidemiol 1997;145:227–33.
  8. Stang A, Glynn RJ, Gann PH, et al. Cancer occurrence in the elderly: agreement between three major data sources. Ann Epidemiol 1999;9:60–7.
  9. Potosky AL, Riley GF, Lubitz JD, et al. Potential for cancer related health services research using a linked Medicare-tumor registry database. Med Care 1993;31:732–48.
  10. Doebbeling BN, Wyant DK, McCoy KD, et al. Linked insurance-tumor registry database for health services research. Med Care 1999;37:1105–15.
  11. Brooks JM, Chrischilles E, Scott S, et al. Information gained from linking SEER cancer registry data to state-level hospital discharge abstracts. Surveillance, Epidemiology, and End Results. Med Care 2000;38:1131–40.
  12. Du X, Freeman JL, Goodwin JS. Information on radiation treatment in patients with breast cancer: the advantages of the linked Medicare and SEER data. J Clin Epidemiol 1999;52:463–70.
  13. Du X, Goodwin JS. Patterns of use of chemotherapy for breast cancer in older women: findings from Medicare claims data. J Clin Oncol 2001;19:1455–61.
  14. Iezzoni L. Assessing quality using administrative data. Ann Intern Med 1997;127:666–74.
  15. Retchin SM, Ballard DJ. Establishing standards for the utility of administrative claims data. Health Serv Res 1998;32:119–24.
  16. Roos LL, Walld R, Wajda A, et al. Record linkage strategies, outpatient procedures, and administrative data. Med Care 1996;34:570–82.
  17. Cooper GS, Yuan Z, Stange KC, et al. The sensitivity of Medicare claims data for case ascertainment of six common cancers. Med Care 1999;37:436–44.
  18. Du X, Freeman JL, Warren JL, et al. Accuracy and completeness of Medicare claims data for surgical treatment of breast cancer. Med Care 2000;38:719–27.
  19. Hillner BE, Penberthy L, Desch CE, et al. Variation in staging and treatment of local and regional breast cancer in the elderly. Breast Cancer Res Treat 1996;40:75–86.
  20. Warren JL, Feuer E, Potosky AL, et al. Use of Medicare hospital and physician data to assess breast cancer incidence. Med Care 1999;37:445–56.
  21. Whittle J, Steinberg E, Anderson G, et al. Accuracy of Medicare claims data for estimation of cancer incidence rates and resection among elderly Americans. Med Care 1991;29:126–36.
  22. McBean A, Babish J, Warren J. Determination of lung cancer incidence in the elderly using Medicare claims data. Am J Epidemiol 1993;137:226–34.
  23. McGeechan K, Kricker A, Armstrong B, et al. Evaluation of linked cancer registry and hospital records of breast cancer. Aust N Z J Public Health 1998;22:765–70.
  24. Pinfold SP, Goel V, Sawka C. Quality of hospital discharge and physician data for type of breast cancer surgery. Med Care 2000;38:99–107.
  25. Cooper GS, Yuan Z, Stange KC, et al. Agreement of Medicare claims and tumor registry data for assessment of cancer-related treatment. Med Care 2000;38:411–21.
  26. Freeman JL, Zhang D, Freeman DH, et al. An approach to identifying incident breast cancer cases using Medicare claims data. J Clin Epidemiol 2000;53:605–14.
  27. Leung K, Hasan A, Rees K, et al. Patients with newly diagnosed carcinoma of the breast: validation of a claim-based identification algorithm. J Clin Epidemiol 1999;52:57–64.
  28. Solin L, Legorreta A, Schultz D, et al. Analysis of a claims database for the identification of patients with carcinoma of the breast. J Med Syst 1994;18:23–32.
  29. Warren JL, Riley GF, McBean AM, et al. Use of Medicare data to identify incident breast cancer cases. Health Care Financ Rev 1996;18:237–46.
  30. Zippin C, Lum D, Hankey B. Completeness of hospital cancer case reporting from the SEER Program of the National Cancer Institute. Cancer 1995;76:2343–50.
  31. Virginia Department of Health. Regulations for disease reporting and control: sections relevant to cancer surveillance. 12 VAC Authority 5-90: pp 150–180, 2001. Richmond, VA: Virginia Department of Health, 2001.
  32. Virginia Department of Health. Virginia Cancer Registry. Richmond, VA: Virginia Department of Health, 2000. (World Wide Web URL: http://www.vdh.state.va.us/epi/cancer/about.asp).
  33. Newschaffer C, Bush T, Penberthy L. Comorbidity measurement in elderly female breast cancer patients with administrative and medical records data. J Clin Epidemiol 1997;50:725–33.
  34. Office of Health Care Information, Agency for Healthcare Research and Quality, US Department of Health and Human Services. Healthcare cost and utilization project (HCUP). Rockville, MD: Agency for Healthcare Research and Quality, 2003. (World Wide Web URL: http://www.ahrq.gov/data/hcup/).
  35. Ellekjaer H, Holmen J, Kruger O, et al. Identification of incident stroke in Norway: hospital discharge data compared with a population-based stroke register. Stroke 1999;30:56–60.
  36. Mahonen M, Saloma V, Brommels M, et al. The validity of hospital discharge register data on coronary heart disease in Finland. Eur J Epidemiol 1997;13:403–15.
  37. Solin L, Hanchak N, Schultz D, et al. Evaluation of an algorithm to identify women with carcinoma of the breast. J Med Syst 1997;21:189–99.