1 Department of Internal Medicine, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA.
2 Department of Biostatistics, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA.
3 Virginia Cancer Registry, Division of Surveillance and Epidemiology, Virginia Department of Health, Richmond, VA.
Received for publication April 10, 2002; accepted for publication August 28, 2002.
ABSTRACT
hospital records; neoplasms; population surveillance; registries
Abbreviations: ACOS, American College of Surgeons; HDF, hospital discharge file; ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; VCR, Virginia Cancer Registry.
INTRODUCTION
The use of secondary databases, including Medicare, Medicaid, or Blue Cross/Blue Shield data or statewide hospital discharge files (HDFs), has been suggested to augment the capture of incident cases and treatment (9–17, 23, 24). For the populations these databases represent, capture rates are high in comparison with central cancer registries (7, 8, 17, 20–22, 25, 26). Several studies have specifically evaluated the validity of claims data using medical records in a health maintenance organization as a gold standard in identifying incident breast cancers (27, 28). The positive predictive values reported in those studies were high, ranging from 83 percent to 96 percent. Other studies have assessed the accuracy in incidence reporting using cancer registries as the gold standard (7, 25, 26, 29). The study reported here was specifically designed both to evaluate accuracy and to determine whether a secondary data source, hospital discharge data, can enhance the capture of incident cancers for a central cancer registry. Validation of diagnoses and diagnosis dates was based on the quality control model from the Surveillance, Epidemiology, and End Results Program and used the inpatient medical record as the gold standard (3, 5, 30).
MATERIALS AND METHODS
Data sources
Virginia Cancer Registry
The VCR has been population-based for incident cancers since 1990. For the 50 percent of hospitals certified by the American College of Surgeons (ACOS), data are likely to be more completely reported. For non-ACOS facilities, the same data are less consistently available (31). State regulations have required reporting from all hospital, laboratory, and other medical facilities since 1990, although active surveillance did not begin in free-standing radiation treatment and surgery centers until 1997 (31, 32). Despite these regulations, estimates of completeness based on the predictive models of the North American Association of Central Cancer Registries during the study period ranged from 85 percent to 92 percent (32). Therefore, any potential method for supplementing case reporting is important. Cancers evaluated in this study included breast, prostate, colon and rectal, cervical, and lung cancers.
Virginia HDF
The Virginia HDF is a statewide HDF that has electronically collected the universal billing (UB92) forms from all acute-care hospitals in Virginia since 1994. By 1995, over 91 percent of patients had a Social Security number included as a unique identifier on their universal billing form. Cancer cases were identified on the basis of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes in any of the up to 10 possible positions on the claim form for breast (code 174), prostate (code 165), colon and rectal (codes 153 and 154), cervical (code 180), and lung (codes 162 and 163) cancer. Because we identified a substantial number of cancers reported as carcinoma in situ in the registry data, carcinoma in situ codes were also included in the study. They included codes for cancer of the breast (code 2330), colon/rectum (codes 23303–23306), cervix (code 2331), lung (code 23312), and prostate (code 2334). The date of diagnosis for HDF cases was defined as the date of first admission in 1995 with a diagnosis code in any position for one of the five cancer sites.
Matching of VCR and HDF cases
The case-finding abilities of the VCR and HDF were compared by matching cases from the two files on the basis of Social Security numbers. These matches were confirmed with the patient's date of birth and gender. If Social Security information was missing (9.8 percent of patients with a cancer admission in 1995) or there was no match based on the Social Security number, a match was performed between the VCR and the HDF based on last name, first initial, and date of birth. For those patients who were matched on the latter criteria, a review was carried out by analysis personnel as a final check to determine whether the two patients were the same person, using additional confirming factors such as first name, middle initial, medical record number, and dates of admission or diagnosis. Potential errors in matching may have occurred because of erroneous Social Security numbers, incorrectly spelled names, or names missing from the HDF.
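The two-stage matching rule described above can be illustrated with a minimal Python sketch. It is not the authors' linkage program; the `Record` fields, the `match_type` function, and the return labels are hypothetical names chosen for illustration, and the sketch assumes that a Social Security number match that fails confirmation falls through to the name-based rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal patient record shared by both files."""
    ssn: Optional[str]   # None when the Social Security number is missing
    last_name: str
    first_name: str
    dob: str             # date of birth as an ISO string, e.g. "1930-05-17"
    gender: str


def _norm(text: str) -> str:
    """Normalize a name for comparison (trim whitespace, ignore case)."""
    return text.strip().lower()


def match_type(vcr: Record, hdf: Record) -> Optional[str]:
    """Classify a candidate VCR/HDF pair as 'ssn', 'name_review', or None."""
    # Stage 1: Social Security number, confirmed by date of birth and gender.
    if vcr.ssn and hdf.ssn and vcr.ssn == hdf.ssn:
        if vcr.dob == hdf.dob and vcr.gender == hdf.gender:
            return "ssn"
    # Stage 2: last name, first initial, and date of birth.
    # These pairs still need manual review before being accepted.
    last_a, last_b = _norm(vcr.last_name), _norm(hdf.last_name)
    init_a, init_b = _norm(vcr.first_name)[:1], _norm(hdf.first_name)[:1]
    if last_a and init_a and last_a == last_b and init_a == init_b and vcr.dob == hdf.dob:
        return "name_review"
    return None
```

Pairs labeled `name_review` correspond to the matches that analysis personnel checked by hand using first name, middle initial, medical record number, and admission or diagnosis dates.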
Population
The population for this study included all cases of breast, cervical, colorectal, lung, and prostate cancer identified by either the registry, the HDF, or both in 1995. The index year 1995 was selected for this study because it was the most recent year permitting evaluation of prevalent cases in the HDF during the year prior to the initial diagnosis. Prevalent cases uniquely identified by the HDF were excluded if they had a prior diagnosis either in the registry in 1990–1994 or in the HDF in 1994.
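The exclusion of prevalent cases amounts to a set difference on patient identifiers. The sketch below is illustrative only; the function name and the use of plain identifier collections are assumptions, not the authors' data structures.

```python
def candidate_incident_hdf_only(hdf_only_1995, registry_1990_1994, hdf_1994):
    """Drop HDF-only 1995 cases that already have a prior diagnosis on file.

    All three arguments are collections of patient identifiers (hypothetical).
    The input order of the 1995 HDF-only cases is preserved.
    """
    prior = set(registry_1990_1994) | set(hdf_1994)
    return [pid for pid in hdf_only_1995 if pid not in prior]
```

For example, with HDF-only cases `["a", "b", "c", "d"]`, a prior registry diagnosis for `"b"`, and a 1994 HDF diagnosis for `"d"`, only `"a"` and `"c"` remain as candidate incident cases.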
Sample for validation
A random sample of 2,625 cases was selected for validation. This number provided sufficient power to assess validity for reporting of treatment and incidence by cancer site. Validation consisted of review and abstraction of selected data elements from all inpatient admissions of each case-patient for 1995. Medical records were available and data on the cases were abstracted for 82 percent of these patients. Prior to analysis, 13 patients were deleted because the abstraction was incomplete. Three cases were deleted because we were not able to verify patient identification due to coding errors such as an incorrect Social Security number or name. Three males with breast cancer were deleted. A validated case was defined as complete and included in the gold standard for the validation sample if all hospital admissions in 1995 were abstracted and data fields were complete. Cases not meeting this requirement were deleted (n = 108); this left 2,025 complete cases to serve as the gold standard in the validation sample.
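The sample counts above can be reconciled with a short calculation. This sketch assumes the four exclusion groups are mutually exclusive and were all removed from the set of abstracted records; the number of abstracted records is derived here, not reported in the text.

```python
sampled = 2625          # random sample selected for validation
final_complete = 2025   # complete cases retained as the gold standard

# Exclusions reported in the text (assumed mutually exclusive).
exclusions = {
    "incomplete abstraction": 13,
    "patient identification not verifiable": 3,
    "male breast cancer": 3,
    "not all 1995 admissions or data fields complete": 108,
}

# Work backward from the final count to the number of abstracted records.
abstracted = final_complete + sum(exclusions.values())
share_abstracted = abstracted / sampled

print(abstracted)                     # derived count of abstracted records
print(round(100 * share_abstracted))  # compare with the reported 82 percent
```

Under these assumptions the derived share agrees with the reported 82 percent of sampled patients whose records were available and abstracted.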
Abstraction was performed by a trained nurse abstractor or cancer registrar. Items to be validated included: patient identifying and demographic information, cancer site, dates of initial diagnosis and treatment, and the names of physicians providing care for the patient.
Analysis
Analysis was done using PC SAS, version 8.0 (SAS Institute, Inc., Cary, North Carolina). The numbers of unique cases from each of the two data sources and the number of cases common to both data sources were counted to determine the total number of unique cases reported. From this information, the percentage of total cases captured by each data source was calculated. Cases uniquely reported from the HDF estimate the potential added benefit of the HDF data to the VCR.
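As an illustration of the capture calculation, the following Python sketch derives the percentages from the three counts described above. The function and key names are hypothetical, and the counts in the usage note are made up for the example, not taken from the study.

```python
def capture_summary(registry_only: int, hdf_only: int, both: int) -> dict:
    """Summarize case capture by source from the three disjoint counts."""
    total = registry_only + hdf_only + both
    if total == 0:
        raise ValueError("no cases reported by either source")
    return {
        "total_unique": total,
        # Share of all unique cases that each source captured.
        "registry_pct": 100.0 * (registry_only + both) / total,
        "hdf_pct": 100.0 * (hdf_only + both) / total,
        # Cases found only in the HDF: the potential gain for the registry.
        "hdf_only_pct": 100.0 * hdf_only / total,
    }
```

With hypothetical counts of 30 registry-only, 20 HDF-only, and 50 shared cases, `capture_summary(30, 20, 50)` gives 100 unique cases, of which the registry captured 80 percent, the HDF 70 percent, and the HDF alone 20 percent.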
Estimates of the accuracy of the HDF in capturing incident cases were calculated as the positive predictive value of the HDF. The positive predictive value estimates the ability of the claims data (HDF) to distinguish true-positive cases from false-positive cases. In this instance, true-positive cases are those cases identified as cancer by both the medical record (the "gold standard") and the HDF. False-positive cases are those cases identified as cancer by the HDF but not verified by the gold standard. The HDF subset of the 2,025 cases in the validation study was initially used to estimate accuracy related to ascertainment of cancer site. This represents the positive predictive value of using ICD-9-CM codes in the HDF file to correctly identify the site. Validation of the year of diagnosis for these cases provided a measure of the accuracy of the HDF in detecting incident cases. False-positive cases captured as incident cases only by the HDF and not reported to the registry (as either incident or prevalent in 19901994) were reviewed.
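The positive predictive value defined above is the share of HDF-flagged cases that the gold standard confirms. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP), using the medical record as the gold standard."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("the HDF flagged no cases")
    return true_positives / flagged
```

For instance, if 100 HDF-flagged cases were reviewed and 88 were confirmed by the medical record, `positive_predictive_value(88, 12)` would return 0.88. These counts are illustrative only.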
Logistic regression analysis was performed to identify factors that would enhance the accuracy and completeness of the cancer registry data in identifying additional cases from the HDF. The model is written as follows:
$$\operatorname{logit}(\text{HDF-only}) = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{gender} + \beta_3\,\text{ACOS} + \beta_4\,\text{principal position} + \beta_5\,\text{cancer surgery}$$
The dependent variable for this model is logit(HDF-only) = log[(probability that a case is captured by HDF only)/(probability that a case is captured by HDF and registry)]. Independent variables included age (in years), gender, hospital ACOS cancer-center certification status, position of the ICD-9-CM code on the claim, and cancer-specific surgery during the index hospitalization.
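For readers less familiar with the logit scale, the relations below show how such a model is usually read. They are standard properties of logistic regression, not results from this study. The symbols $p$, $\eta$, and $\mathrm{OR}$ are introduced here for illustration, and the sketch assumes that every case in the model is captured either by the HDF only or by both sources.

```latex
\begin{align*}
\eta &= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{gender} + \beta_3\,\text{ACOS}
        + \beta_4\,\text{principal position} + \beta_5\,\text{cancer surgery} \\
p &= \Pr(\text{HDF only} \mid \text{covariates}) = \frac{1}{1 + e^{-\eta}} \\
\frac{p}{1 - p} &= e^{\eta}
   \qquad \text{(odds of HDF-only capture versus capture by both sources)} \\
\mathrm{OR}_{\text{ACOS}} &= e^{\beta_3}
   \qquad \text{(odds ratio for ACOS certification, other covariates held fixed)}
\end{align*}
```

Under this reading, a negative coefficient for ACOS certification would mean that cases from certified hospitals have lower odds of being captured by the HDF alone.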
RESULTS
Further detailed analyses of these false-positive cases were performed. For cases reported by the discharge data as cervical (n = 11), colorectal (n = 3), or prostate (n = 4) cancer, the medical record validated a cancer of an adjacent anatomic site. In addition, there were two cases in which the claims identified the cancer as lung cancer but the cancer was validated as having metastasized from another site to the lung. In other instances, when no cancer was confirmed by the medical record, the patient had benign disease such as cervical intraepithelial neoplasia I or II (n = 2) or colon polyps (n = 2).
We assessed cases identified only by the HDF to verify that they were incident cancers. This analysis was based on the year of diagnosis reported in the gold standard. The results from the validation for incidence are reported in table 4. For this analysis, HDF-only cases for which the site was correctly identified were used (n = 493). Depending on the cancer site, between 32 percent and 60 percent of these cases were verified as incident cases in 1995. The remaining patients either had prevalent cases based on the gold standard in 1994 or earlier or were stated to have a "history of cancer" in the gold standard. Additional cases (n = 26) either did not have a date of diagnosis in the medical record or were stated to have a clinical diagnosis only. These were categorized as "date of diagnosis ambiguous."
DISCUSSION
The results of this study suggest that neither a central cancer registry nor an HDF may be sufficient independently as a source for complete capture of incident breast, cervical, colorectal, lung, and prostate cancer cases. Both the HDF and the registry data were highly valid in correctly identifying the cancer site. The positive predictive value for accurately identifying a cancer case was well over 90 percent for each source, yet both sources were missing substantial proportions of cases.
The proportion of cases captured varied by cancer site, and the pattern of variation differed between the registry and the HDF. When the registry capture rates were relatively low (e.g., lung cancer), the HDF capture rates were relatively high. In some cases, these differences were reversed. For prostate and cervical cancers, where outpatient diagnosis and treatment may be more standard, the hospital discharge claims may be less complete in capturing cases, since they represent only inpatient stays. These variations are consistent with those previously identified using claims, death certificate, and registry data (8). Cancer registries are designed to capture cancer cases regardless of site, and the ability of registry personnel to locate pathology reports, surgical logs, and hospital discharge diagnosis reports is probably the deciding factor. Conversely, the HDF is composed of inpatient discharge abstracts from administrative claims, largely designed to document health care service delivery for billing purposes (15). Cautious use of the two sources in combination may be an effective means of enhancing traditional methods of cancer surveillance. The effectiveness of using claims data in combination with and to complement a registry is consistent with prior recommendations (33).
Although HDFs are limited to inpatient data, there are benefits that make them uniquely valuable. These files are now available in over 40 states. They are more accessible to central cancer registries than other secondary data sources such as Medicare or Medicaid data (34). Unlike other secondary claims sources such as Medicare, they typically represent the entire population of persons with inpatient stays for that state, regardless of age or insurance status. Thus, discharge files can be a valuable source of clinical information on the uninsured and the nonelderly population (11).
Several studies have assessed the use of either hospital claims data or insurance claims data as compared with registries for ascertainment of selected diseases, including cancer, with varying positive predictive values depending on the definitions used and the disease under evaluation (17, 26–29, 35–37). It has been suggested that using these data with due care can provide useful epidemiologic information either for identifying cases or for adjusting incidence estimates. Some investigators have suggested using these data in conjunction with a registry or another primary data source to enhance or complement the data captured by the registry (8, 11, 33). The results of the validation study reported here support the accuracy of the claims for use in identification of potential cancer cases. In our study, the positive predictive value for the HDF-identified cases in accurately capturing a cancer's correct site was high: 88 percent.
The HDF may be less valuable in its ability to correctly differentiate incident cases from prevalent cases. Of unique interest is the component of the validation that provides information on whether cases captured only by the HDF are incident cancers. The estimated percentage increase for incident cancer cases captured only by the HDF is 11 percent. This proportionate gain is greater than anticipated given the hospital-based surveillance used by registries and the mandatory reporting of incident cancers by hospitals in Virginia in 1995. This increment may be lower in states with higher rates of case ascertainment. It may also be lower in more recent years, even in Virginia, where ascertainment rates have increased substantially from 81.2 percent in 1992–1997 to 88.6 percent in 1995–1999 through the initiation of active case-finding methods.
The gain for nonincident but potentially accessionable cases was higher: 21 percent. Although these additional cases were validated as prevalent in 1994 or prior, none were reported to the registry subsequent to 1990. Therefore, even though they were not incident cases, those cases may have been accessionable in 1995. Ongoing, prospective linkage with the HDF might also have identified these cases at the time of initial diagnosis if it involved an inpatient cancer admission, thus moving these prevalent cases into the incident category.
The analysis employing logistic regression to identify factors that could be used to improve the accuracy and completeness of the registry is important, because it identifies characteristics of underreporting hospitals. Characteristics of cases that were not captured by the registry included admission to a non-ACOS-certified hospital; this was a very important factor in predicting whether a case would be reported. This was consistent across all five cancers. Focusing on improving reporting from those hospitals is likely to result in enhanced cancer surveillance (30).
It is unlikely that most cancer registries would simply accept a case reported from claims data without some further validation. At a minimum, this might entail confirmation from the "reporting" hospital. This confirmation would reduce both the risk of reporting prevalent cases and the risk of capturing as incident cancers false-positive cases that had a cancer ICD-9-CM code on the discharge diagnosis list. A simple potential intervention aimed at these hospitals could include using the supplemental data from hospitals in the HDF to enhance ascertainment of cases. Although not without some cost, verification of an HDF potential case could be relatively simple. It would probably be much less expensive than other options, such as training medical records personnel or hiring "circuit rider" abstractors to go to those hospitals for independent case finding. Verification of HDF cases may be no less accurate than more traditional methods as well, since use of "circuit riders" has been shown to be associated with poorer data quality (5).
This study suggests that using HDFs to supplement a central cancer registry may be a valuable and relatively efficient method of enhancing cancer surveillance, particularly for those hospitals with lower rates of reporting completeness. For registries with higher rates of completeness, the benefit of supplementation with an HDF would probably be less. There is variation by cancer site in both the potential incremental gain and the accuracy in detecting incident cases. However, given the low cost of obtaining and using these files and their ready accessibility and ease of use, supplementation with hospital discharge data is likely to provide a moderately cost-effective and accurate method of supplementing cancer reporting.
ACKNOWLEDGMENTS
The authors acknowledge Virginia Health Information Systems for assistance with record linkage and for providing the data.
REFERENCES