RE: "INTERRATER RELIABILITY: COMPLETING THE METHODS DESCRIPTION IN MEDICAL RECORDS REVIEW STUDIES"

Kazim Sheikh

US Department of Health and Human Services, Centers for Medicare & Medicaid Services, Kansas City, MO 64106

In a Journal article, Yawn and Wollan (1) reported interrater agreement to determine the reliability of data abstracted from the ambulatory care and inpatient medical records of 1,200 patients with heart disease. Six to nine nurses had abstracted three categories of data from the same medical records at five intervals over 2 years. The data abstraction procedure involved the nurses making a provisional diagnosis of heart disease based on the abstracted data. The authors found "very good to excellent" agreement.

They declared that there were no standard methods for assessing interrater reliability (1). However, Fleiss (2), among other authors, has described several methods of measuring interrater agreement. A common feature of these methods is the contrast of observed with expected agreement, with correction or adjustment for chance-expected agreement (2). The kappa statistic (3) or a similar intraclass correlation coefficient is generally accepted as a quantitative measure of interrater agreement. Medical records are commonly used in medical care or outcomes research, but, unlike in Yawn and Wollan's study (1), the data abstractors are not required to interpret symptoms and other data and make diagnoses. The abstractors usually copy the diagnoses recorded by the attending physicians, or the researchers use standard algorithms or indices to convert abstracted data elements into diagnoses (e.g., Ellerbeck et al. (4), Oei et al. (5)), thereby avoiding misclassification.
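To make the chance correction concrete, the following short Python sketch computes Cohen's kappa for two raters by contrasting observed agreement with the agreement expected from the raters' marginal category frequencies. The ratings shown are hypothetical and are not data from Yawn and Wollan's study.

    # Illustrative sketch: Cohen's kappa for two raters on nominal categories.
    # The ratings below are hypothetical, not data from the study under discussion.
    from collections import Counter

    def cohen_kappa(ratings_a, ratings_b):
        n = len(ratings_a)
        categories = set(ratings_a) | set(ratings_b)
        # Observed proportion of agreement
        p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        # Chance-expected agreement from each rater's marginal category frequencies
        freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        return (p_o - p_e) / (1 - p_e)

    rater1 = ["CHD", "CHD", "none", "CHD", "none", "none"]
    rater2 = ["CHD", "none", "none", "CHD", "none", "CHD"]
    print(cohen_kappa(rater1, rater2))  # observed 0.67, chance-expected 0.50, kappa 0.33

In this toy example the two raters agree on four of six records (observed agreement 0.67), but half of that agreement would be expected by chance alone, so kappa is only 0.33.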

Surprisingly, the authors chose not to use the kappa statistic to assess interrater agreement, deeming it irrelevant to their study (1). All raters in their study were well-qualified nurses familiar with the medical records, the terminology used therein, and the data elements abstracted. For these reasons, they were expected to agree with each other (6). Furthermore, agreement between two or more raters may occur simply by chance, and there must be a way of separating observed agreement from chance-expected agreement (7). The kappa statistic corrects or adjusts for this chance-expected agreement. Because of the multiple ratings per subject, a weighted average of kappa or some other formula, such as those described by Fleiss (8), was necessary in Yawn and Wollan's study (1). Consequently, they could not tell how much of the observed "excellent" agreement was due to chance-expected agreement.
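When every subject is rated by several raters, one such formula, Fleiss' kappa for many raters (8), applies the same chance correction. The following rough Python sketch shows that computation with a hypothetical table of rating counts, not the study's data.

    # Illustrative sketch of Fleiss' kappa for many raters (reference 8).
    # counts[i][j] = number of raters assigning subject i to category j;
    # each subject must be rated by the same number of raters.
    def fleiss_kappa(counts):
        N = len(counts)            # number of subjects
        n = sum(counts[0])         # raters per subject
        k = len(counts[0])         # number of categories
        # Mean per-subject agreement across all pairs of raters
        P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
        # Chance-expected agreement from overall category proportions
        p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
        P_e = sum(x * x for x in p)
        return (P_bar - P_e) / (1 - P_e)

    # Hypothetical example: 4 records, 6 raters, 3 diagnostic categories
    counts = [[6, 0, 0], [4, 2, 0], [1, 4, 1], [0, 1, 5]]
    print(fleiss_kappa(counts))  # about 0.43

Without such a correction, a reported percentage of agreement cannot distinguish genuine consistency among raters from the agreement that would arise by chance alone.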

ACKNOWLEDGMENTS

The views expressed in this letter do not represent the views or policies of the Centers for Medicare & Medicaid Services or the United States government.

Conflict of interest: none declared.

References

  1. Yawn BP, Wollan P. Interrater reliability: completing the methods description in medical records review studies. Am J Epidemiol 2005;161:974–7.
  2. Fleiss JL. Statistical methods for rates and proportions: the measurement of interrater agreement. New York, NY: John Wiley & Sons, 1981:212–36.
  3. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  4. Ellerbeck EF, Jencks SF, Radford MJ, et al. Quality of care for Medicare patients with acute myocardial infarction. JAMA 1995;273:1509–14.
  5. Oei HH, Vliegenthart R, Deckers JW, et al. The association of Rose questionnaire angina pectoris and coronary calcification in a general population: the Rotterdam Coronary Calcification Study. Ann Epidemiol 2004;14:431–6.
  6. Sheikh K. Disability scales: assessment of reliability. Arch Phys Med Rehabil 1986;67:245–9.
  7. Rogot E, Goldberg ID. A proposed index for measuring agreement in test-retest studies. J Chronic Dis 1966;19:991–1006.
  8. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378–82.



