Larsen scoring of digitized X-ray images

C. Solymossy, J. Dixey, M. Utley1, S. Gallivan1, A. Young2, N. Cox3, P. Davies4, P. Emery5, A. Gough6, D. James7, P. Prouse8, P. Williams9 and J. Winfield10

Robert Jones and Agnes Hunt Orthopaedic Hospital, Oswestry,
1 Clinical Operational Research Unit, University College London,
2 St Albans City Hospital, St Albans,
3 Royal Hampshire County Hospital, Winchester,
4 Broomfield Hospital, Chelmsford,
5 Rheumatology Research Unit, University of Leeds, Leeds,
6 Harrogate Hospital, Harrogate,
7 Grimsby Hospital, Grimsby,
8 Basingstoke District Hospital, Basingstoke,
9 Medway Hospital, Gillingham and
10 Royal Hallamshire Hospital, Sheffield, UK

Correspondence to: J. J. Dixey, Robert Jones and Agnes Hunt Orthopaedic and District Hospital NHS Trust, Oswestry, Shropshire SY10 7AG, UK.


    Abstract
Objective. To determine how Larsen scores obtained from digitized X-ray images compare with those obtained from the film originals.

Methods. One hundred sets of hand and foot radiographs from patients recruited with early rheumatoid arthritis (RA) were scored using the Larsen system. Digitized copies of these sets were then viewed on a computer screen and Larsen-scored in a different, random order, and the quality of each digitized image set was recorded. For each set of X-rays, the signed difference between the score from film and the score from the digitized images was calculated.

Results. Of the digitized X-ray sets, 95% were scored successfully; the remaining 5% were unreadable and were not scored. The mean difference between the two sets of scores was -1.2 (95% CI [-2.06, -0.37]). There was no statistically significant trend in the difference with respect to the mean of the two scores (P>0.05).

Conclusion. Larsen scores from digitized X-ray images agreed closely with those from the film originals, validating the use of digitized images for Larsen scoring.

KEY WORDS: Early RA, Erosive disease, Digitized images, Assessment of X-rays


    Introduction
The analysis of X-rays as a measure of outcome in rheumatoid arthritis (RA) is accepted as standard practice. The Early Rheumatoid Arthritis Study (ERAS) is a multicentre study with a cohort of over 1000 patients recruited with early RA. One important aspect of this study is the collection and subsequent analysis of hands and feet X-rays taken annually.

To facilitate X-ray storage, retrieval and subsequent analysis, the film originals have been digitized and stored electronically using inexpensive scanning technology. The purpose of this study was to determine whether Larsen scores obtained from an electronic image are equivalent to those obtained from the film original.


    Patients and methods
Consecutive patients considered by their consultant rheumatologist to have RA were recruited from the routine out-patient clinics of nine rheumatology departments if their symptoms of RA had lasted <2 yr and they had not yet received disease-modifying anti-rheumatic drugs (DMARDs).

The X-ray sets were digitized at the Robert Jones and Agnes Hunt Orthopaedic Hospital in Oswestry, UK. A Hewlett Packard Scanjet 4c flatbed scanner fitted with a Hewlett Packard Transparency Adaptor C2521B was used to create 8-bit (256 grey levels) digitized images at a resolution of 75 pixels per inch. The images were stored in Tagged Image File Format (TIFF), and checks ensured that each image had the correct orientation.
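To put these scanner settings in concrete terms, the following Python sketch computes the width of film covered by one pixel at 75 pixels per inch and the raw storage needed for one 8-bit image. The 24 × 30 cm film size is an assumption chosen for illustration (the study does not state film sizes), and the byte count ignores TIFF headers and any compression.

```python
MM_PER_INCH = 25.4


def pixel_pitch_mm(ppi: float) -> float:
    """Width of film covered by one pixel, in millimetres."""
    return MM_PER_INCH / ppi


def scan_dimensions(width_cm: float, height_cm: float, ppi: float) -> tuple[int, int]:
    """Pixel dimensions of a scanned film of the given physical size."""
    px_per_cm = ppi / 2.54
    return round(width_cm * px_per_cm), round(height_cm * px_per_cm)


def raw_bytes(width_px: int, height_px: int, bits_per_pixel: int = 8) -> int:
    """Uncompressed pixel-data size; excludes TIFF headers and compression."""
    return width_px * height_px * bits_per_pixel // 8


# Assumed 24 x 30 cm film, scanned at the study's 75 ppi and 8 bits per pixel.
w, h = scan_dimensions(24, 30, ppi=75)
print(f"pixel pitch: {pixel_pitch_mm(75):.3f} mm")
print(f"image: {w} x {h} pixels, about {raw_bytes(w, h) / 1024:.0f} KiB raw")
```

At 75 ppi each pixel covers roughly a third of a millimetre of film, which gives a concrete sense of the modest resolution referred to in the Discussion.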

For the current study, 100 ERAS patients were selected at random from those whose X-rays had been scanned and not yet returned to the relevant centre; these X-rays came from four different centres. For each selected patient, a single set of hand and foot X-rays was chosen at random from the films available. Random selection was intended to ensure that a suitable range of erosive severity was represented in the sample.
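The two-stage random selection described above can be sketched in Python as follows. The patient and film-set identifiers are hypothetical, and the use of Python's `random` module is purely illustrative; the study does not say how its randomization was performed.

```python
import random


def select_sample(eligible, film_sets, n=100, seed=None):
    """Two-stage random selection.

    eligible:  IDs of patients whose films were scanned and not yet returned.
    film_sets: dict mapping each patient ID to a list of available film-set IDs.
    Returns a dict {patient ID: chosen film-set ID}.
    """
    rng = random.Random(seed)  # seedable, so the selection can be reproduced
    patients = rng.sample(list(eligible), n)  # stage 1: patients, without replacement
    return {p: rng.choice(film_sets[p]) for p in patients}  # stage 2: one set each


# Hypothetical example: 300 eligible patients, each with 3 annual film sets.
eligible = [f"P{i:03d}" for i in range(300)]
film_sets = {p: [f"{p}-year{y}" for y in (1, 2, 3)] for p in eligible}
sample = select_sample(eligible, film_sets, n=100, seed=42)
print(len(sample), "patients selected")
```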

The sets of hand and foot X-rays were scored by a single medically qualified researcher (CS) using the Larsen scoring system [1]. A grade (0–5) was assigned to each of the proximal interphalangeal (PIP) and metacarpophalangeal (MCP) joints, the first interphalangeal (IP) joints, the second to fifth metatarsophalangeal (MTP) joints and each wrist. The wrist grades were multiplied by five when constructing the total Larsen score, which therefore lies in the range 0–200.
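The total-score arithmetic can be written as a short Python function. Since each grade is 0–5 and the two wrist grades are weighted by five, the stated range of 0–200 implies 30 non-wrist joints (30 × 5 + 2 × 5 × 5 = 200); that joint count is inferred from the range rather than stated in the text, and the function name is hypothetical.

```python
N_JOINTS = 30  # non-wrist joints; inferred from the 0-200 range, not stated in the text
N_WRISTS = 2
WRIST_WEIGHT = 5


def total_larsen(joint_grades, wrist_grades):
    """Total Larsen score from per-joint grades (each 0-5).

    joint_grades: grades for the PIP, MCP, first IP and MTP 2-5 joints.
    wrist_grades: grades for the two wrists, each counted five times.
    """
    joints, wrists = list(joint_grades), list(wrist_grades)
    if len(joints) != N_JOINTS or len(wrists) != N_WRISTS:
        raise ValueError(f"expected {N_JOINTS} joint and {N_WRISTS} wrist grades")
    if any(not 0 <= g <= 5 for g in joints + wrists):
        raise ValueError("every Larsen grade must lie between 0 and 5")
    return sum(joints) + WRIST_WEIGHT * sum(wrists)


print(total_larsen([5] * 30, [5, 5]))  # maximum possible score
```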

During the scoring process, the 100 sets of X-ray films were examined and the scores for each joint recorded on a standard proforma. Scoring was restricted to sessions lasting 2 h with a break of at least 15 min between sessions. The sets of films were scored at a rate of 10–15 sets per session.

After an interval of one day, the 100 corresponding sets of digitized images were scored in a different, randomized order according to a pre-specified protocol. The digitized images were viewed at the same size as the film originals on a 21-inch computer screen with the display resolution set to 1024 × 768 pixels. The scorer judged the quality of each set of images as 'excellent', 'fair', 'poor' or 'unreadable'; the study protocol prescribed that any set deemed 'unreadable' was not scored. Software had been prepared to display an on-screen score pad superimposed on the X-ray images without obscuring the joints, allowing the scorer to enter the erosion status of each individual joint using the computer mouse and keypad. As with the films, scoring was restricted to 2-h sessions separated by breaks of at least 15 min, and the digitized sets were scored at approximately the same rate as the film sets, i.e. 10–15 sets per session.
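The protocol's handling of scoring order, image quality and session length can be summarized in a small Python sketch. The class and function names are hypothetical, and the sketch assumes the quality ratings are available as a mapping, whereas in the study each rating was made by the scorer while viewing the set.

```python
import random
from enum import Enum


class Quality(Enum):
    EXCELLENT = "excellent"
    FAIR = "fair"
    POOR = "poor"
    UNREADABLE = "unreadable"


def scoring_plan(set_ids, quality, seed=None):
    """Shuffle the image sets and apply the 'unreadable => not scored' rule.

    Returns (sets_to_score, sets_not_scored), both in the randomized order.
    """
    order = list(set_ids)
    random.Random(seed).shuffle(order)
    to_score = [s for s in order if quality[s] is not Quality.UNREADABLE]
    not_scored = [s for s in order if quality[s] is Quality.UNREADABLE]
    return to_score, not_scored


def sessions(order, per_session=15):
    """Split a scoring order into sessions of at most per_session sets."""
    return [order[i:i + per_session] for i in range(0, len(order), per_session)]


# Hypothetical ratings matching the reported counts (76/17/2/5).
ratings = ([Quality.EXCELLENT] * 76 + [Quality.FAIR] * 17
           + [Quality.POOR] * 2 + [Quality.UNREADABLE] * 5)
quality = dict(enumerate(ratings))
to_score, not_scored = scoring_plan(range(100), quality, seed=1)
print(len(to_score), "to score,", len(not_scored), "not scored,",
      len(sessions(to_score)), "sessions")
```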

The protocol restricted computerized manipulation of the digitized images to adjustment of brightness and contrast and to rotation into the standard orientation (fingers/toes pointing upwards). For some images, any brightness or contrast adjustment had already been made when the X-rays were originally scanned; for the rest, the adjustment was made at the time of scoring and a copy of the adjusted image file was saved. No image was adjusted at both the scanning and the scoring stages.
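The two permitted manipulations can be illustrated with a minimal pure-Python sketch on an 8-bit image held as a list of rows. The linear mapping used here (contrast scaled about mid-grey, brightness as an offset, result clipped to 0–255) is one common convention and is an assumption; the paper does not specify the software or formula used.

```python
def adjust(image, brightness=0.0, contrast=1.0):
    """Linear brightness/contrast change for an 8-bit greyscale image.

    Each grey level p becomes (p - 128) * contrast + 128 + brightness,
    rounded and clipped to the valid 0-255 range.
    """
    def remap(p):
        return min(255, max(0, round((p - 128) * contrast + 128 + brightness)))

    return [[remap(p) for p in row] for row in image]


def rotate_clockwise(image, quarter_turns=1):
    """Rotate an image (list of rows) clockwise by quarter_turns x 90 degrees."""
    for _ in range(quarter_turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


# Tiny hypothetical image: raise the contrast, then rotate a quarter turn clockwise.
img = [[100, 128], [200, 50]]
print(rotate_clockwise(adjust(img, contrast=1.5), quarter_turns=1))
```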

Statistical analysis
A scatter plot of the Larsen scores obtained from the standard films against those obtained from the digitized images was produced and the correlation coefficient, R, calculated. Although it gives some impression of the correspondence between the two methods, correlation analysis is recognized as being of limited value for comparing different methods of measuring the same quantity: R measures the strength of association rather than agreement, so it is unaffected by a systematic difference between the methods and is inflated when the measured values span a wide range. This is particularly true where the possible values have both lower and upper bounds. High R values are almost inevitable in such circumstances, and there is a real danger of interpreting them too optimistically.

More useful information is obtained with the graphical technique proposed by Bland and Altman [2]. For the cases where both the film score and the digitized-image score were available, the difference between the scores was plotted against the mean of the two scores (Fig. 2). This is referred to as a Bland–Altman plot [2], and it allows several properties of the two measurement methods to be examined. For two unbiased methods, the plot typically shows a uniform scatter of points distributed symmetrically about the horizontal axis (zero difference). The overall vertical spread of the points reflects how closely the two measures agree. It is common to draw horizontal lines at the mean difference ± 2 S.D. of the differences since, for approximately normally distributed differences, these lines bracket ~95% of the points.
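The quantities drawn on a Bland–Altman plot can be computed as in the Python sketch below. It assumes differences are taken as film minus digitized and uses the sample standard deviation; the paper does not state either convention, and the function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(film, digitized):
    """Points and reference lines for a Bland-Altman plot.

    Returns (points, bias, (lower, upper)) where each point is
    (mean of the pair, film - digitized), bias is the mean difference,
    and the limits are bias -/+ 2 sample standard deviations.
    """
    if len(film) != len(digitized) or len(film) < 2:
        raise ValueError("need at least two paired scores")
    points = [((f + d) / 2, f - d) for f, d in zip(film, digitized)]
    diffs = [diff for _, diff in points]
    bias = mean(diffs)
    sd = stdev(diffs)  # n - 1 denominator
    return points, bias, (bias - 2 * sd, bias + 2 * sd)


# Hypothetical paired scores for four sets.
points, bias, limits = bland_altman([10, 20, 30, 40], [12, 19, 33, 41])
print(points, bias, limits)
```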



FIG. 2.  Bland–Altman plot of difference vs mean for paired Larsen scores.

 
Overall systematic differences between the two measures appear as a displacement of the whole scatter of points above or below the horizontal axis. A 95% confidence interval for this mean difference was calculated using standard methods based on the paired t-test [3].
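A standard-library-only Python sketch of the paired-t confidence interval follows. The Student-t quantile is approximated with a standard Cornish–Fisher (asymptotic) expansion, which for two-sided 95% intervals is accurate to within about 2 × 10⁻⁴ at 10 or more degrees of freedom; where SciPy is available, `scipy.stats.t.ppf` gives the exact value. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def t_quantile(p, df):
    """Approximate Student-t quantile via a Cornish-Fisher expansion.

    Intended for moderate-to-large df (about 10 or more); not for very small df.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def mean_difference_ci(film, digitized, level=0.95):
    """Mean paired difference (film - digitized) and its paired-t CI."""
    diffs = [f - d for f, d in zip(film, digitized)]
    n = len(diffs)
    d_bar = mean(diffs)
    half_width = t_quantile(0.5 + level / 2, n - 1) * stdev(diffs) / sqrt(n)
    return d_bar, (d_bar - half_width, d_bar + half_width)


# Hypothetical paired scores for 12 sets.
film = [0, 3, 8, 12, 15, 20, 26, 31, 38, 45, 52, 60]
digi = [1, 3, 9, 14, 15, 22, 27, 33, 38, 47, 53, 63]
d_bar, (lo, hi) = mean_difference_ci(film, digi)
print(f"mean difference {d_bar:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```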

Another form of systematic bias arises when the points tend to lie above the horizontal axis for low values and below it for high values (or vice versa). To examine any such trend, a linear regression of the differences on the means was carried out, and confidence limits were calculated for the slope of the regression line [4].


    Results
Of the 100 patients selected, 69 (69%) were female and 31 (31%) were male. Median age at onset of disease was 51 yr (interquartile range 40–60) and the median duration of symptoms before entry into the study was 6 months (interquartile range 4–9). The median duration of disease at the time the X-rays were taken (duration of symptoms prior to entry plus follow-up) was 28.5 months (interquartile range 17–47).

The mean of the 100 scores obtained from Larsen scoring of the original X-ray film was 25 (median 16.5, interquartile range 6–39).

Five (5%) of the digitized sets of images were judged to be unreadable and hence were not scored. A further two (2%) sets were deemed to be of 'poor' quality but were scored nonetheless. Seventy-six (76%) sets were judged to be 'excellent' and the remaining 17 (17%) were classed as 'fair'.

Of the 100 sets of hand and foot X-rays studied, it was possible to compare 95 pairs of Larsen scores obtained from hard-copy film and from the digitized images. For these 95 pairs, Fig. 1 shows a scatter plot of the Larsen score assigned to the original X-ray film against that assigned to the digitized image. There is a high degree of correlation, with R = 0.97.



FIG. 1.  Scatter plot of Larsen(film) vs Larsen(digitized).

 
Examining the data in greater detail, Fig. 2 shows a Bland–Altman plot of the difference between scores against their mean. The two dotted horizontal lines (mean difference ± 2 S.D.) bracket ~95% of the points. The overall mean signed difference is -1.20 (95% CI [-2.06, -0.37]).
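As a rough cross-check, the standard deviation of the differences and the ±2 S.D. limits can be back-calculated from the reported confidence interval. This Python sketch assumes the interval was t-based with n − 1 = 94 degrees of freedom; its outputs are approximations derived from the published figures, not values reported by the authors. It gives an S.D. of about 4.1 points and limits of roughly -9.5 to 7.1, so the mean difference of -1.2 is small relative to the scatter of individual differences.

```python
from math import sqrt

# Published summary values (Results section).
n = 95
mean_diff = -1.20
ci_low, ci_high = -2.06, -0.37

# Assumption: the CI is mean +/- t * SD / sqrt(n), with t the 97.5th
# percentile of Student's t on n - 1 = 94 degrees of freedom.
t_crit = 1.9855

se = (ci_high - ci_low) / 2 / t_crit  # standard error of the mean difference
sd = se * sqrt(n)  # SD of the individual differences
limits = (mean_diff - 2 * sd, mean_diff + 2 * sd)

print(f"SD of differences ~ {sd:.1f}")
print(f"mean +/- 2 SD ~ [{limits[0]:.1f}, {limits[1]:.1f}]")
```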

Also shown in Fig. 2 is the fitted linear regression line. Its slope is -0.033 (95% CI [-0.069, 0.004]); because this interval includes zero, there is no statistically significant trend, indicating that the distribution of differences between the paired scores is comparable across the range of Larsen scores examined.


    Discussion
The majority (95%) of digitized image sets were deemed scorable. The five sets of X-rays that were unreadable when digitized had previously been assigned Larsen scores of 0, 4, 6, 37 and 58 from film. Because these scores range from 0 to 58, an association between the clarity of the digitized image and the degree of erosion present appears unlikely. The overall mean difference between the scores from X-ray films and from digitized images was -1.2 (95% CI [-2.06, -0.37]), which is statistically significantly different from zero. However, this effect is small compared with the standard deviation of the differences and might not be considered clinically significant. Importantly, there was no significant trend in the difference between scores with increasing degree of erosion; such a trend, had it been present, could have introduced a considerable source of bias. Note that our Bland–Altman analysis is concerned purely with absolute differences in scores over the range of erosion levels. If the differences were instead expressed as percentage errors, a different picture would emerge: the distribution of percentage errors would be much wider for low Larsen scores than for high ones, because a given absolute difference is a larger fraction of a small score.

This study has therefore shown that Larsen scores obtained from electronic images are equivalent, for practical purposes, to those obtained from the film originals. A recent US study [5] concluded that assessing digitized images gave results equivalent to those obtained from film originals, although that conclusion was based on correlation analysis alone. The present results are of particular interest because relatively inexpensive and accessible technology was used (giving a lower image resolution than in the US study [5]) and because the two sets of scores were compared more informatively using Bland–Altman analysis.

In studies that use X-ray data as a measure of outcome in RA, electronic storage and subsequent scoring of the X-ray images is a valid method and is likely to overcome the logistical problems of filing and retrieving multiple X-ray packets. Furthermore, X-ray images are likely to be presented increasingly in electronic form in routine clinical practice, and the present results indicate that formal analysis of X-ray images will remain possible in that setting. These advantages must be balanced against the fact that some digitized sets of X-rays (5% in this study) may be unreadable.


    References

  1. Larsen A, Dale K, Eek M. Radiographic evaluation of rheumatoid arthritis and related conditions by standard reference films. Acta Radiol (Diagn) 1977;18:481–91.
  2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–10.
  3. Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991:190.
  4. Gardner MJ, Altman DG. Statistics with confidence. London: BMJ Press, 1989:34.
  5. Genant HK, Jiang Y, Peterfy C, Lu Y, Redei J, Countryman PJ. Assessment of rheumatoid arthritis using a modified scoring method on digitized and original radiographs. Arthritis Rheum 1998;41:1583–90.

Submitted 29 December 1998; revised version accepted 25 May 1999.