School for Health and Related Research and
1 Institute for Bone and Joint Medicine, Medical School, University of Sheffield, Sheffield, UK
Correspondence to:
J. Brazier, School for Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK.
Abstract |
---|
Methods. Patients were recruited from two settings: 118 from knee surgery waiting lists and 112 from rheumatology clinics. Four self-completion questionnaires [Western Ontario and McMaster University Osteoarthritis Index (WOMAC), Health Assessment Questionnaire (HAQ), Short Form-36 (SF-36) and Euroqol] were sent to subjects on two occasions 6 months apart. Construct validity, convergent validity, internal consistency and responsiveness were examined using primarily non-parametric methods.
Results. All instruments proved satisfactory in terms of ease of use, acceptability to patients, internal consistency and reliability. In the surgical group, the OA-specific WOMAC performed better than the HAQ and the generic measures in terms of validity and responsiveness to change, whereas in the rheumatology group the SF-36 was more responsive.
Conclusion. WOMAC is the instrument of choice for evaluating the outcome of knee replacement surgery in OA. The SF-36 provides a more general insight into patients' health and may be more responsive to change than the WOMAC in a heterogeneous rheumatology clinic population. Researchers wishing to undertake an economic evaluation might consider the EQ-5D for a surgical, but not a rheumatology clinic group.
KEY WORDS: Osteoarthritis, Knee, Health-related quality of life, Outcomes.
Introduction |
---|
It is increasingly recognized that a key outcome measure for any health care intervention for OA, as for many other conditions, is change in health-related quality of life (HRQoL) [2, 3]. In the field of OA of the knee, clinicians and researchers are faced with a choice of measures. One established measure for arthritis is the UK version of the Health Assessment Questionnaire (HAQ), but most experience with this instrument has been with rheumatoid arthritis [4]. A more recent development has been a measure specific to OA: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC) [5]. This has shown considerable promise, but there has been little published comparative evidence on its validity by an independent group of researchers.
There has also been increasing use of generic (i.e. not disease-specific) measures of health-related quality of life. These have the potential advantage of being more able to measure side-effects or complications of treatment, which may be unrelated to the condition itself. Many people with OA will also have co-morbidities and so to obtain a more holistic view of HRQoL in this patient group there is a case for using a generic measure. A generic measure which has been widely used for other conditions is the Short Form-36 (SF-36), which generates a profile of eight dimensions and for which there is some evidence for validity in OA patients [3, 6]. However, it is claimed that generic measures are less responsive to health changes than condition-specific measures. This claim needs to be tested.
In a resource-constrained environment, it is important to be able to examine the relative cost effectiveness of interventions, and for this a single-index measure of HRQoL is required. The Euroqol instrument (EQ) is a recently developed brief and easy to use single-index measure [7], which has been used successfully on patients with rheumatoid arthritis, but its validity has yet to be examined for OA patients. Concern has been raised about its crudeness and whether it is sufficiently sensitive to many changes in health [8].
The objective of the study reported here was to assess these four instruments for measuring HRQoL in patients with OA of the knee, in terms of their ability to discriminate between patient groups (i.e. their discriminative properties) and their sensitivity to change in both patients undergoing surgery and in patients being treated medically (i.e. their evaluative properties). The aim is to help clinicians and researchers in choosing instruments when measuring the outcomes of surgery or pharmacological interventions.
Methods |
---|
Patients were invited to participate in the study by letter explaining the study and signed by their consulting physician or surgeon. Their names and addresses were obtained from the physicians' secretaries or from the pre-operative assessment clinic. A questionnaire booklet and pre-paid reply envelope were enclosed together with the letter, patient information sheet and consent form. For rheumatology clinic patients, a global assessment of the severity of their condition was supplied by their consulting physician, based upon their clinical impression.
Instruments
The questionnaire booklet contained the four self-completed HRQoL questionnaires (WOMAC, HAQ, SF-36 and EQ), a short section on socio-demographic information and recent use of health services, and a question asking about any other major health problems apart from their knee trouble (i.e. co-morbidities). Medical data were obtained from medical records of rheumatology clinic patients when these were not available from physicians' secretaries.
The WOMAC is a 24-item questionnaire, taking around 5 min to complete, and originally designed for use in clinical trials in patients with OA of the knee or hip. The Likert-scaled version was used in this study. Scores are generated for the three dimensions of Pain, Stiffness and Physical Function by summing the coded responses and then dividing by the number of items to provide a score range of 0–4 [9]. The HAQ was developed for arthritic conditions in general and the final version, modified for British patients [4, 10], contains 20 items covering eight categories of disability which combine to derive a single disability index ranging from 0 to 3 (Table 1). For both these instruments, a low score indicates good health. The SF-36, revised for use in a British population [11], contains 36 items and generates a profile of eight dimension scores ranging from 0 to 100, where high scores indicate good health [12]. The EQ is a brief two-page questionnaire, the first page containing five items describing health status across five dimensions (mobility, self-care, usual activity, pain/distress and depression/anxiety) (the EQ-5D), and the second displaying a visual analogue rating scale on which the respondent marks an assessment of their overall health [7]. The responses to the five items of the EQ-5D can be scored using a utility-weighted algorithm [13], which has been recommended for use in economic evaluation. The EQ therefore provides two single-index measures of health, the Rating scale and the EQ-5D index, ranging from 0 to 100. The recommended methods of substitution for missing responses for the WOMAC and SF-36 were carried out, but not for the HAQ and EQ, for which there are no methods of substitution.
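The WOMAC scoring rule described above (sum the coded Likert responses, then divide by the number of items) can be sketched as follows. This is a minimal illustration, not the instrument's official scoring software: the per-dimension item counts (5, 2 and 17) are not stated in the text and are supplied here as background, the function name is hypothetical, and the recommended substitution for missing responses is not reproduced, so complete responses are assumed.

```python
from typing import Sequence

# Item counts per WOMAC dimension (5 + 2 + 17 = 24 items). These counts are
# not given in the text above; they are supplied here as background.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "physical_function": 17}


def womac_dimension_score(dimension: str, responses: Sequence[int]) -> float:
    """Sum the coded Likert responses (0-4) and divide by the number of items.

    Assumes every item was answered; missing-data substitution is not handled.
    """
    expected = WOMAC_ITEMS[dimension]
    if len(responses) != expected:
        raise ValueError(f"{dimension} needs {expected} responses, got {len(responses)}")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2, 3 or 4")
    return sum(responses) / expected


# Hypothetical respondent, for illustration only.
pain_score = womac_dimension_score("pain", [2, 3, 1, 2, 2])  # (2+3+1+2+2) / 5
```

Because each item is coded 0 to 4, the resulting dimension score also lies between 0 and 4, with lower scores indicating better health.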
Analysis
The primary purpose of the analysis was to assess the discriminative and evaluative properties of the four measures of HRQoL used, i.e. their ability to discriminate between patient groups and their sensitivity to change, respectively.
The discriminative properties were examined in terms of their construct validity, where the distribution of scores is compared between groups with expected health differences. For the rheumatology clinic group, this was undertaken by estimating score differences between those classified by their physician as having mild or moderate disease, on the one hand, and those with severe disease, on the other. Further, for both rheumatology and surgical groups, score differences were examined between those who reported and those who did not report a non-musculoskeletal co-morbidity. The significance of any difference was tested with the Mann–Whitney U-test (a non-parametric equivalent of the t-test), and the importance of each difference assessed by calculating an 'effect size', which is the mean difference between the groups divided by the pooled standard deviation. This can be regarded as an indication of the ability of a measure to distinguish the 'signal' from the overall 'noise' or variance, and it provides the basis for comparing measures with differing scales. Effect sizes were judged against criteria recommended by Cohen [14]: ≥0.2 to <0.5, ≥0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.
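The effect size calculation and Cohen's bands can be illustrated with a short sketch. The pooled standard deviation below uses the common sample-size-weighted formula, which is an assumption because the text does not say which pooling formula was applied. The function names and the label for values under 0.2 are likewise illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def effect_size(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mean difference between two groups divided by the pooled standard deviation.

    Assumes the usual pooled SD weighted by degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohen_label(d: float) -> str:
    """Classify the magnitude of an effect size using Cohen's thresholds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # label chosen here; the text names no band under 0.2


# Hypothetical scores for two severity groups, for illustration only.
d = effect_size([3, 4, 5], [1, 2, 3])
label = cohen_label(d)
```

Dividing by the pooled standard deviation puts every instrument on a common, unit-free scale, which is what allows a 0–4 WOMAC dimension to be compared with a 0–100 SF-36 dimension.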
Validity was examined in terms of the convergence between like dimensions of the WOMAC, HAQ and SF-36 questionnaires. The opportunity was also taken to examine the internal consistency of the measures by calculating Cronbach's α coefficients for the two condition-specific questionnaires and the SF-36. According to Streiner and Norman [15], a value of 0.8 is usually regarded as acceptable. This statistic is not relevant for the EQ, which has only one item per dimension.
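For readers unfamiliar with the internal consistency statistic, a minimal sketch of Cronbach's coefficient follows. It uses the standard textbook formula (the number of items k, the sum of item variances and the variance of the total score); the function name and the example data are hypothetical, and the code is not taken from the study's own analysis.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one dimension.

    `rows` holds one sequence per respondent, with one score per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item dimension answered by three respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

The formula needs at least two items per dimension, which is why the statistic cannot be computed for an instrument with a single item per dimension.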
The evaluative properties were examined in terms of sensitivity to change or 'responsiveness'. In part, the ability to respond to change can be assessed in terms of the proportion of patients at the floor (i.e. the worst score) or the ceiling (the best score) of each scale [16]. If many patients score at either extreme of a scale, the instrument will have limited ability to register deterioration or improvement, respectively. A more complete method is to examine the change in scores in patients who have experienced a change in health status.
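The floor and ceiling proportions can be computed directly from a list of scores once the worst and best possible values of the scale are known. The sketch below is illustrative only; the function name and the example scores are hypothetical, and the worst and best values must be supplied per instrument (for example 0 and 100 for an SF-36 dimension, where high scores indicate good health).

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], worst: float, best: float) -> Tuple[float, float]:
    """Return the percentage of respondents at the floor (worst score)
    and at the ceiling (best score) of a scale."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_pct = 100 * sum(s == worst for s in scores) / n
    ceiling_pct = 100 * sum(s == best for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical SF-36 dimension scores (0 = worst, 100 = best).
floor_pct, ceiling_pct = floor_ceiling(
    [0, 0, 50, 100, 75, 25, 0, 100, 60, 40], worst=0, best=100
)
```

For instruments where a low score indicates good health, such as the WOMAC and HAQ, the `worst` and `best` arguments are simply swapped.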
For the knee replacement group, responsiveness was assessed in terms of the changes in scores before and after their arthroplasty, since this procedure has been found to bring about health improvement in most patients [2]. For the rheumatology clinic group, there was no external indicator of change, and hence responsiveness was assessed by comparing mean changes in scores across three distinct groups of patients: those who rated their health as having improved, worsened or stayed the same between the first and second surveys (i.e. their response to the self-perceived transition question of the SF-36 questionnaire). (Item 2 is not used in the scoring of the SF-36.) This global change item has been used to assess responsiveness for a number of conditions, including rheumatoid arthritis [17, 18].
The statistical significance of the changes in scores between different groups was assessed using the Kruskal–Wallis test in the rheumatology clinic group and by the Mann–Whitney U-test in the knee replacement group. For both groups, the four measures were also compared in terms of the standardized response mean (SRM), which is the mean change between assessments divided by the standard deviation of the change, and can be thought of as an indicator of the ability to distinguish 'signal' from 'noise' [19]. Cohen's criteria for effect size were also applied to this statistic [14].
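The standardized response mean defined above can be sketched in a few lines. The function name and the example scores are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed because the text does not specify which form was used.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change between assessments divided by the standard deviation of the change.

    Scores must be paired: position i in both sequences is the same patient.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for three patients, for illustration only.
srm = standardized_response_mean([10, 20, 30], [12, 24, 33])
```

The sign of the SRM depends on the direction of the scale: on the SF-36 an improvement is a positive change, whereas on the WOMAC and HAQ, where low scores indicate good health, an improvement is a negative change. Comparing magnitudes across instruments therefore calls for the absolute value.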
Results |
---|
The mean age of respondents was 71 yr (range 47–87 yr). More than half the sample were female. Non-respondents were of similar age to respondents, and were more likely to be female.
Rheumatology clinic sample.
Questionnaire booklets were sent to 125 patients attending rheumatology out-patient clinics with a primary diagnosis of OA. Contact was not made with one patient who had changed address. After one reminder, 112 (effective response rate 90%) patients consented to participate by returning questionnaires, with no adverse comments. Of these, 102 returned questionnaires at the follow-up assessment.
The mean age of rheumatology clinic respondents was 64 yr, considerably younger than the sample of patients undergoing total knee replacement (TKR), with more than twice as many women as men. Fifty-four per cent of patients were classified by their physicians as having mild disease and 41% as having severe disease. Only six (5%) patients were classified as moderate and, on the basis of their scores, these were combined in subsequent analyses with those classified as mild. Non-respondents were on average younger (57 yr), more likely to be female and to have mild disease.
Patients recruited from both sources were broadly typical of patients with OA of the knee seen in secondary care settings in the UK.
Completion
For the WOMAC, HAQ and EQ questionnaires, item completion rates exceeded 90%. SF-36 dimensions also achieved completion rates of >90%, with the exception of the Physical functioning dimension in the knee replacement sample (86% completion). It was found that 11 knee replacement and eight rheumatology clinic respondents failed to separate the pages of the booklet, thus omitting items unintentionally.
Cross-sectional analysis: discriminative properties of instruments
The dimension scores at the initial assessment for the knee replacement sample are shown in Table 1 and for the rheumatology clinic sample in Table 2. [Two-week retest reliability was assessed (for the WOMAC only) by examining score differences for patients who said that their health had not changed (n=30). For all three dimensions, there were no statistically significant differences between test and retest scores [20]. The reliability properties of the other instruments are already well established.]
Convergent validity.
An inspection of the correlations between like dimensions found the expected convergence between dimension scores across instruments. Spearman's rank correlation coefficient between the WOMAC's Physical functioning dimension and the HAQ Disability index was 0.68. For the generic SF-36 and WOMAC, correlation between the physical functioning dimensions was 0.70 and also 0.70 between pain dimensions. These correlations exceeded those between WOMAC Physical function and WOMAC dimensions of Pain and Stiffness (0.65 and 0.63, respectively). As expected, correlations of Mental health and Vitality (SF-36) with WOMAC dimensions were low.
Internal consistency.
Cronbach's α coefficients were acceptable for all three dimensions of the WOMAC, according to standards recommended by Streiner and Norman [15]. For the HAQ, four of the eight categories did not meet these. The α coefficients of the SF-36 were also below these standards, but in only one instance was the α coefficient <0.7 (role limitations due to physical problems).
Longitudinal analysis: evaluative properties of the instruments
Score distributions.
'Floor' effects of >10% of responses were observed for the Physical functioning and Role limitations dimensions of the SF-36 in both samples (Tables 1 and 2). For the HAQ, over 10% of responses for Rising, Hygiene, Reach and Activities in the knee replacement sample were on the 'floor', and for Hygiene, Reach and Activities in the rheumatology clinic sample. Two dimensions of the SF-36, Social functioning and Role limitations due to emotional problems, showed 'ceiling' effects in both patient groups. The HAQ showed 'ceiling' effects in four dimensions in the knee replacement sample and seven in the rheumatology clinic sample. Neither the WOMAC nor the EQ indices demonstrated substantial 'floor' or 'ceiling' effects.
Perceived health change.
(i) Rheumatology clinic sample.
For the rheumatology clinic sample, all dimension scores were found to be associated to some extent with the perceived direction of change (Table 5). The pattern was found to be significant for six dimensions of the SF-36, both EQ indices and all dimensions of the WOMAC, but not the HAQ, at the 5% level using the Kruskal–Wallis test. The condition-specific measures did not perform noticeably better than either of the generic measures in terms of the standardized response mean (Table 7). Only the SRM for the Pain dimension of SF-36 was moderate in size, while all other SRMs for all four instruments were either small or not significant.
Discussion |
---|
Differences in performance between these measures were found in the comparisons of validity and responsiveness. This is the first time that these outcome measures have been evaluated together in terms of their discriminatory and evaluative properties for these two groups of patients with OA of the knee, and there is no reason to suppose that the same instrument should perform well in both groups. It is commonly assumed that the condition-specific measure should be the more responsive and this hypothesis is supported by our results in the knee replacement group, who received a major intervention. The OA-specific WOMAC physical functioning scale was more responsive than the more general HAQ Disability index, and this and Pain were more responsive than the equivalent dimensions of the SF-36. These results confirm previous studies comparing WOMAC to the HAQ [9] and the SF-36 [3, 21]. In the present study, WOMAC emerges as the instrument of choice for assessing the consequences of surgery for OA of the knee. However, the results also support the use of a generic instrument on this group of patients, since the SF-36 was better at distinguishing those reporting a non-musculoskeletal co-morbidity from those who did not.
The advantages of the OA-specific WOMAC were less clear for the rheumatology clinic patients, in whom the HAQ and equivalent dimensions of the SF-36 (Physical functioning and Pain) were just as able to distinguish between severity groups. Furthermore, many of their dimension scores were able to discriminate between patients with and without non-musculoskeletal co-morbidity, whereas the WOMAC was not. Most importantly, some dimensions of the SF-36 (Pain, Vitality and General health) were more responsive than the WOMAC for these patients. A possible reason for this result could be that the rheumatology clinic patients were a less well-defined and homogeneous group of patients, with more frequent health problems unrelated to OA of the knee. The changes being experienced by this group may have been more general in nature, so that the generic SF-36 was better at detecting them. An important feature of the rheumatology patients was that more of them reported a deterioration in their health than an improvement, and this was better reflected in the SF-36 than in the condition-specific measures. However, there are reasons to interpret this result with caution. The analysis of this group is limited by a small sample size, although it compares well with other studies. In addition, the result may apply only to the broad mix of patients typically attending NHS rheumatology clinics, rather than to medically managed OA patients in general.
For both patient groups, there is the question of which generic instrument is most appropriate for use [21]. This is the first time that the EQ has been evaluated in patients with OA. Results from a study using the EQ with patients with rheumatoid arthritis found that it performed as well as the more specific HAQ [22]. In the present study, the EQ-5D was able to discriminate on the basis of severity for patients with OA of the knee attending a rheumatology clinic, and was comparable in terms of responsiveness to the best-performing dimensions of the SF-36 in the knee replacement group. However, the EQ-5D was noticeably less responsive to change in the rheumatology clinic group than many of the dimensions of the SF-36. This may reflect the fact that it is based on a more crude description of status in any given dimension, which makes it efficient for large changes, but less so for the more subtle and diverse changes experienced by the rheumatology clinic group. The advantage of the EQ-5D is its brevity (occupying a single page), but this is at the expense of lower sensitivity, and it does not give the broad picture available from a profile measure such as the SF-36. In summary, these results suggest that the EQ-5D may be suitable for economic evaluations of surgical interventions in this group, but for other purposes, the SF-36 would be preferred.
The EQ Rating scale is even simpler, but its performance was inconsistent. It proved unable to distinguish between severity groups, and remarkably unresponsive to the changes following TKR. It performed better in detecting non-musculoskeletal co-morbidity and change in the rheumatology clinic group, but dimensions of the SF-36 performed as well or better in all respects.
Conclusions |
---|
Acknowledgments |
---|
References |
---|