Generic and condition-specific outcome measures for people with osteoarthritis of the knee

J. E. Brazier, R. Harper, J. Munro, S. J. Walters and M. L. Snaith¹

School for Health and Related Research and
¹ Institute for Bone and Joint Medicine, Medical School, University of Sheffield, Sheffield, UK

Correspondence to: J. Brazier, School for Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK.


    Abstract
 
Objectives. The aims of this study were to evaluate two condition-specific and two generic health status questionnaires for measuring health-related quality of life in patients with osteoarthritis (OA) of the knee, and to offer guidance to clinicians and researchers in choosing between them.

Methods. Patients were recruited from two settings: 118 from knee surgery waiting lists and 112 from rheumatology clinics. Four self-completion questionnaires [Western Ontario and McMaster University Osteoarthritis Index (WOMAC), Health Assessment Questionnaire (HAQ), Short Form-36 (SF-36) and Euroqol] were sent to subjects on two occasions 6 months apart. Construct validity, convergent validity, internal consistency and responsiveness were examined using primarily non-parametric methods.

Results. All instruments proved satisfactory in terms of ease of use, acceptability to patients, internal consistency and reliability. In the surgical group, the OA-specific WOMAC performed better than the HAQ and the generic measures in terms of validity and responsiveness to change, whereas in the rheumatology group the SF-36 was more responsive.

Conclusion. WOMAC is the instrument of choice for evaluating the outcome of knee replacement surgery in OA. The SF-36 provides a more general insight into patients' health and may be more responsive to change than the WOMAC in a heterogeneous rheumatology clinic population. Researchers wishing to undertake an economic evaluation might consider the EQ-5D for a surgical, but not a rheumatology clinic group.

KEY WORDS: Osteoarthritis, Knee, Health-related quality of life, Outcomes.


    Introduction
 
Osteoarthritis (OA) is the single most important cause of disability and limitation of activity of elderly people in the UK [1]. As a method of treatment, joint replacement is now commonplace for hips and increasingly so for knees, while the pharmacological management of people with OA continues to be important. In this context, it is necessary to identify valid and acceptable outcome measures so that progress in treating OA can be evaluated. Such measures should benefit not only clinicians managing OA and purchasers of health care for this condition, but also, ultimately, patients through improved forms of treatment.

It is increasingly recognized that a key outcome measure for any health care intervention for OA, as for many other conditions, is change in health-related quality of life (HRQoL) [2, 3]. In the field of OA of the knee, clinicians and researchers are faced with a choice of measures. One established measure for arthritis is the UK version of the Health Assessment Questionnaire (HAQ), but most experience with this instrument has been with rheumatoid arthritis [4]. A more recent development has been a measure specific to OA: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC) [5]. This has shown considerable promise, but there has been little published comparative evidence on its validity by an independent group of researchers.

There has also been increasing use of generic (i.e. not disease-specific) measures of health-related quality of life. These have the potential advantage of being more able to measure side-effects or complications of treatment, which may be unrelated to the condition itself. Many people with OA will also have co-morbidities and so to obtain a more holistic view of HRQoL in this patient group there is a case for using a generic measure. A generic measure which has been widely used for other conditions is the Short Form-36 (SF-36), which generates a profile of eight dimensions and for which there is some evidence for validity in OA patients [3, 6]. However, it is claimed that generic measures are less responsive to health changes than condition-specific measures. This claim needs to be tested.

In a resource-constrained environment, it is important to be able to examine the relative cost effectiveness of interventions, and for this a single-index measure of HRQoL is required. The Euroqol instrument (EQ) is a recently developed brief and easy to use single-index measure [7], which has been used successfully on patients with rheumatoid arthritis, but its validity has yet to be examined for OA patients. Concern has been raised about its crudeness and whether it is sufficiently sensitive to many changes in health [8].

The objective of the study reported here was to assess these four instruments for measuring HRQoL in patients with OA of the knee, in terms of their ability to discriminate between patient groups (i.e. their discriminative properties) and their sensitivity to change in both patients undergoing surgery and in patients being treated medically (i.e. their evaluative properties). The aim is to help clinicians and researchers in choosing instruments when measuring the outcomes of surgery or pharmacological interventions.


    Methods
 
Recruitment of patients
Patients were recruited from two distinct clinical settings in five UK hospitals. Over an 8 month period, all new patients attending rheumatology clinics, and those assessed pre-operatively for total knee replacement (TKR), were eligible for the study. From these groups, recruitment was restricted to patients with a diagnosis of OA of the knee made by a hospital rheumatology or orthopaedic specialist. No further inclusion or exclusion criteria were applied, so that the recruited patients were likely to be representative of those seen in everyday hospital clinical practice in the UK.

Patients were invited to participate in the study by letter explaining the study and signed by their consulting physician or surgeon. Their names and addresses were obtained from the physicians' secretaries or from the pre-operative assessment clinic. A questionnaire booklet and pre-paid reply envelope were enclosed together with the letter, patient information sheet and consent form. For rheumatology clinic patients, a global assessment of the severity of their condition was supplied by their consulting physician, based upon their clinical impression.

Instruments
The questionnaire booklet contained the four self-completed HRQoL questionnaires (WOMAC, HAQ, SF-36 and EQ), a short section on socio-demographic information and recent use of health services, and a question asking about any other major health problems apart from their knee trouble (i.e. co-morbidities). Medical data were obtained from medical records of rheumatology clinic patients when these were not available from physicians' secretaries.

The WOMAC is a 24-item questionnaire that takes around 5 min to complete and was originally designed for use in clinical trials in patients with OA of the knee or hip. The Likert-scaled version was used in this study. Scores are generated for the three dimensions of Pain, Stiffness and Physical Function by summing the coded responses and then dividing by the number of items to provide a score range of 0–4 [9]. The HAQ was developed for arthritic conditions in general and the final version, modified for British patients [4, 10], contains 20 items covering eight categories of disability which combine to derive a single disability index ranging from 0 to 3 (Table 1). For both these instruments, a low score indicates good health. The SF-36, revised for use in a British population [11], contains 36 items and generates a profile of eight dimension scores ranging from 0 to 100, where high scores indicate good health [12]. The EQ is a brief two-page questionnaire, the first page (the EQ-5D) containing five items describing health status across five dimensions (mobility, self-care, usual activity, pain/distress and depression/anxiety), and the second displaying a visual analogue rating scale on which the respondent marks an assessment of their overall health [7]. The responses to the five items of the EQ-5D can be scored using a utility-weighted algorithm [13], which has been recommended for use in economic evaluation. The EQ therefore provides two single-index measures of health, the Rating scale and the EQ-5D index, ranging from 0 to 100. Missing responses were substituted using the recommended methods for the WOMAC and SF-36, but not for the HAQ and EQ, for which no substitution methods exist.
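The WOMAC scoring rule described above (sum the coded responses within a dimension, then divide by the number of items) can be illustrated with a short sketch. The split of the 24 items into 5 Pain, 2 Stiffness and 17 Physical Function items is an assumption based on the standard WOMAC layout and is not stated in this paper; the function and variable names are hypothetical, and the recommended substitution of missing responses is not handled.

```python
# Assumed item counts per WOMAC dimension (5 + 2 + 17 = 24 items).
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "physical_function": 17}


def womac_dimension_scores(responses):
    """Return a 0-4 score per dimension from Likert codes (0-4 per item).

    `responses` maps each dimension name to its list of coded item responses.
    """
    scores = {}
    for dimension, n_items in WOMAC_ITEMS.items():
        codes = responses[dimension]
        if len(codes) != n_items:
            raise ValueError(f"{dimension}: expected {n_items} items, got {len(codes)}")
        if any(code not in (0, 1, 2, 3, 4) for code in codes):
            raise ValueError(f"{dimension}: Likert codes must be integers 0-4")
        # Sum the coded responses, then divide by the number of items.
        scores[dimension] = sum(codes) / n_items
    return scores
```

Because each item is coded 0–4, the averaged dimension score stays on the same 0–4 range, with low scores indicating good health.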


 
TABLE 1.  Dimension scores for knee replacement patients at initial assessment
 
An initial assessment using these four instruments was undertaken at recruitment, with a follow-up assessment approximately 6 months later. At follow-up, item 2 of the SF-36 questionnaire was adapted as follows to measure patient-perceived health change: ‘Compared to the last time you completed the questionnaire, how would you rate your health in general now?’. The responses available were: much better, somewhat better, about the same, somewhat worse, much worse.

Analysis
The primary purpose of the analysis was to assess the discriminative and evaluative properties of the four measures of HRQoL used, i.e. their ability to discriminate between patient groups and their sensitivity to change, respectively.

The discriminative properties were examined in terms of their construct validity, where the distribution of scores is compared between groups with expected health differences. For the rheumatology clinic group, this was undertaken by estimating score differences between those classified by their physician as having mild or moderate disease, on the one hand, and those with severe disease, on the other. Further, for both rheumatology and surgical groups, score differences were examined between those who reported and those who did not report a non-musculoskeletal co-morbidity. The significance of any difference was tested with the Mann–Whitney U-test (a non-parametric equivalent of the t-test), and the importance of each difference assessed by calculating an ‘effect size’, which is the mean difference between the groups divided by the pooled standard deviation. This can be regarded as an indication of the ability of a measure to distinguish the ‘signal’ from the overall ‘noise’ or variance, and it provides the basis for comparing measures with differing scales. Effect sizes were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.
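As an illustration of the effect size calculation described above, the following sketch divides the difference in group means by a pooled standard deviation and labels the result using the Cohen thresholds quoted in the text. The paper does not give its pooling formula; the degrees-of-freedom-weighted version used here is an assumption, as are the function names and the "negligible" label for values below 0.2.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation, weighting each sample variance by its df (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return sqrt(pooled_var)


def effect_size(group_a, group_b):
    """Mean difference between the groups divided by the pooled standard deviation."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def cohen_label(es):
    """Classify the magnitude of an effect size using the thresholds quoted in the text."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label for <0.2 is an assumption, not from the paper
```

Because the result is expressed in standard deviation units, it can be compared across instruments scored on different ranges, such as the 0–4 WOMAC dimensions and the 0–100 SF-36 dimensions.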

Validity was examined in terms of the convergence between like dimensions of the WOMAC, HAQ and SF-36 questionnaires. The opportunity was also taken to examine the internal consistency of the measures by calculating Cronbach's α coefficients for the two condition-specific questionnaires and the SF-36. According to Streiner and Norman [15], a value of 0.8 is usually regarded as acceptable. This statistic is not relevant for the EQ, which has only one item per dimension.
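For readers unfamiliar with the internal consistency statistic, a minimal sketch of Cronbach's coefficient for one multi-item dimension is given below. It uses the usual textbook formula, k/(k−1) × (1 − sum of item variances / variance of the total score); the paper does not describe its computation, so the function name and data layout are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one dimension.

    `item_scores` is a list of respondents, each a list of k item responses
    (complete cases only; missing data are not handled in this sketch).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items per dimension")
    # Variance of each item across respondents.
    item_columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(column) for column in item_columns)
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)
```

The requirement of at least two items per dimension reflects why the statistic cannot be computed for the EQ, which has a single item per dimension.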

The evaluative properties were examined in terms of sensitivity to change or ‘responsiveness’. In part, the ability to respond to change can be assessed in terms of the proportion of patients at the floor (i.e. the worst score) or the ceiling (the best score) of each scale [16]. If many patients score at either extreme of a scale, the instrument will have limited ability to register deterioration or improvement, respectively. A more complete method is to examine the change in scores in patients who have experienced a change in health status.
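The floor and ceiling check can be sketched as a simple count of respondents at the worst and best possible scores of a scale. The function name and arguments below are hypothetical; the worst and best values must be supplied per instrument because the direction of scoring differs (for example, 0 is the worst score on an SF-36 dimension but the best on a WOMAC dimension).

```python
def floor_ceiling_proportions(scores, worst, best):
    """Return (floor, ceiling): the proportions of respondents at the worst and best scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor = sum(1 for score in scores if score == worst) / n
    ceiling = sum(1 for score in scores if score == best) / n
    return floor, ceiling


# Illustrative (made-up) SF-36 dimension scores, where 0 is worst and 100 is best.
example_scores = [0, 0, 50, 100, 75, 25, 60, 80, 90, 10]
example_floor, example_ceiling = floor_ceiling_proportions(example_scores, worst=0, best=100)
```

In this made-up example, two of ten respondents sit on the floor and one on the ceiling, so the scale would have limited room to register deterioration in a fifth of the sample.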

For the knee replacement group, responsiveness was assessed in terms of the changes in scores before and after their arthroplasty, since this procedure has been found to bring about health improvement in most patients [2]. For the rheumatology clinic group, there was no external indicator of change, and hence responsiveness was assessed by comparing mean changes in scores across three distinct groups of patients: those who rated their health as having improved, worsened or stayed the same between the first and second surveys (i.e. their response to the self-perceived transition question of the SF-36 questionnaire). (Item 2 is not used in the scoring of the SF-36.) This global change item has been used to assess responsiveness for a number of conditions, including rheumatoid arthritis [17, 18].

The statistical significance of the changes in scores between different groups was assessed using the Kruskal–Wallis test in the rheumatology clinic group and by the Mann–Whitney U-test in the knee replacement group. For both groups, the four measures were also compared in terms of the standardized response mean (SRM), which is the mean change between assessments divided by the standard deviation of the change, and can be thought of as an indicator of the ability to distinguish ‘signal’ from ‘noise’ [19]. Cohen's criteria for effect size were also applied to this statistic [14].
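The SRM definition above translates directly into code. The sketch below assumes paired baseline and follow-up scores for the same patients in the same order and uses the sample standard deviation of the change scores; the function name is hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change between assessments divided by the standard deviation of the change."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)
```

The sign of the SRM depends on the scoring direction: improvement appears as a negative change on the WOMAC and HAQ (low scores indicate good health) and as a positive change on the SF-36, so magnitudes rather than signs are compared against Cohen's thresholds.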


    Results
 
Response
Knee replacement sample.
Questionnaire booklets were mailed to 151 patients whose names were on surgical waiting lists to undergo TKR surgery in the near future. Contact was not made with two patients, one of whom was admitted for surgery earlier than expected, and another who could not be traced. One hundred and eighteen patients (effective response rate 79%) consented to participate by returning questionnaires at the initial assessment, with no adverse comments received. Reminder letters to non-respondents were not sent because of the short interval between theatre lists being drawn up and the date for surgery. Of the responders to the initial assessment, 109 questionnaires were returned at the follow-up assessment.

The mean age of respondents was 71 yr (range 47–87 yr). More than half the sample were female. Non-respondents were of similar age to respondents, and were more likely to be female.

Rheumatology clinic sample.
Questionnaire booklets were sent to 125 patients attending rheumatology out-patient clinics with a primary diagnosis of OA. Contact was not made with one patient who had changed address. After one reminder, 112 (effective response rate 90%) patients consented to participate by returning questionnaires, with no adverse comments. Of these, 102 returned questionnaires at the follow-up assessment.
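The ‘effective response rate’ is not defined explicitly in the text. One reading that reproduces both reported figures is to divide the number of responders by the number of patients mailed minus those who could not be contacted. The sketch below works through that arithmetic; the formula and function name are an interpretation, not the authors' stated method.

```python
def effective_response_rate(mailed, not_contacted, responded):
    """Responders as a percentage of patients who could actually be contacted (assumed definition)."""
    return 100 * responded / (mailed - not_contacted)


# Knee replacement sample: 151 mailed, 2 not contacted, 118 responded.
knee_rate = effective_response_rate(151, 2, 118)
# Rheumatology clinic sample: 125 mailed, 1 not contacted, 112 responded.
clinic_rate = effective_response_rate(125, 1, 112)
```

Under this assumption the rates are 118/149 and 112/124, which round to the reported 79% and 90%.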

The mean age of rheumatology clinic respondents was 64 yr, considerably younger than the sample of patients undergoing TKR, with more than twice as many women as men. Fifty-four per cent of patients were classified by their physicians as having mild disease and 41% as having severe disease. Only six (5%) patients were classified as moderate and, on the basis of their scores, these were combined in subsequent analyses with those classified as mild. Non-respondents were on average younger (57 yr), more likely to be female and to have mild disease.

Patients recruited from both sources were broadly typical of patients with OA of the knee seen in secondary care settings in the UK.

Completion
For the WOMAC, HAQ and EQ questionnaires, item completion rates exceeded 90%. All SF-36 dimensions but one also achieved completion rates of >90%; the exception was the Physical functioning dimension in the knee replacement sample (86% completion). It was found that 11 knee replacement and eight rheumatology clinic respondents failed to separate the pages of the booklet, thus omitting items unintentionally.

Cross-sectional analysis: discriminative properties of instruments
The dimension scores at the initial assessment for the knee replacement sample are shown in Table 1 and for the rheumatology clinic sample in Table 2. Two-week retest reliability was assessed (for the WOMAC only) by examining score differences for patients who said that their health had not changed (n=30). For all three dimensions, there were no statistically significant differences between test and retest scores [20]. The reliability properties of the other instruments are already well established.


 
TABLE 2.  Dimension scores for rheumatology clinic patients at initial assessment
 
Construct validity.
Mean score differences between patients with a clinical assessment of mild/moderate vs severe disease in the rheumatology clinic group were significant at the 1% level for all three dimensions of the WOMAC questionnaire and for the HAQ (Table 3). Seven dimensions of the SF-36 also discriminated significantly between the patient groups in terms of severity. Results for the EQ were mixed: differences for the EQ-5D, but not the Rating scale, reached significance. These results were reflected in the effect sizes, which were large for the Pain dimension of the WOMAC and moderate for the other significant dimensions. Overall, the larger effect sizes were associated with WOMAC.


 
TABLE 3.  Score differences between rheumatology clinic patients with mild/moderate and severe knee osteoarthritis
 
Six dimensions of the SF-36, the EQ Rating scale and the condition-specific HAQ produced significant differences between rheumatology clinic patients with co-morbidity and those without, at the 5% level (Table 4). However, neither the three WOMAC dimensions nor the EQ-5D produced significant differences. The largest effect size was observed for the General health perception dimension of the SF-36 (0.78) with moderate effect sizes being observed for three other SF-36 dimensions and the HAQ (Table 4).


 
TABLE 4.  Effect sizes for patients in relation to co-morbidity
 
For those patients soon to undergo TKR surgery, all instruments discriminated to some extent between patients with and without co-morbidity. However, the relative performance of the instruments was different. The clearest picture emerged for the EQ-5D and EQ Rating scale. For the SF-36, differences in Mental health, Pain and General health perception reached statistical significance. The HAQ Disability index and two dimensions of WOMAC were significantly different. Moderate effect sizes were found for two WOMAC dimensions, three SF-36 dimensions and both EQ indices, but not for the HAQ (Table 4).

Convergent validity.
An inspection of the correlations between like dimensions found the expected convergence between dimension scores across instruments. Spearman's rank correlation coefficient between the WOMAC's Physical functioning dimension and the HAQ Disability index was 0.68. For the generic SF-36 and WOMAC, correlation between the physical functioning dimensions was 0.70 and also 0.70 between pain dimensions. These correlations exceeded those between WOMAC Physical function and WOMAC dimensions of Pain and Stiffness (0.65 and 0.63, respectively). As expected, correlations of Mental health and Vitality (SF-36) with WOMAC dimensions were low.

Internal consistency.
Cronbach's α coefficients were acceptable for all three dimensions of the WOMAC, according to standards recommended by Streiner and Norman [15]. For the HAQ, four of the eight categories did not meet these standards. The α coefficients of the SF-36 were also below these standards, but in only one instance was the α coefficient <0.7 (role limitations due to physical problems).

Longitudinal analysis: evaluative properties of the instruments
Score distributions.
‘Floor’ effects of >10% of responses were observed for the Physical functioning and Role limitations dimensions of the SF-36 in both samples (Tables 1 and 2). For the HAQ, over 10% of responses for Rising, Hygiene, Reach and Activities in the knee replacement sample were on the ‘floor’, and for Hygiene, Reach and Activities in the rheumatology clinic sample. Two dimensions of the SF-36, Social functioning and Role limitations due to emotional problems, showed ‘ceiling’ effects in both patient groups. The HAQ showed ‘ceiling’ effects in four dimensions in the knee replacement sample and seven in the rheumatology clinic sample. Neither the WOMAC nor the EQ indices demonstrated substantial ‘floor’ or ‘ceiling’ effects.

Perceived health change.
(i) Rheumatology clinic sample.
For the rheumatology clinic sample, all dimension scores were found to be associated to some extent with the perceived direction of change (Table 5). The pattern was found to be significant for six dimensions of the SF-36, both EQ indices and all dimensions of the WOMAC, but not the HAQ, at the 5% level using the Kruskal–Wallis test. The condition-specific measures did not perform noticeably better than either of the generic measures in terms of the standardized response mean (Table 7). Only the SRM for the Pain dimension of SF-36 was moderate in size, while all other SRMs for all four instruments were either small or not significant.


 
TABLE 5.  Mean score differences between initial assessment and follow-up in relation to patient-perceived health change for rheumatology clinic patients
 

 
TABLE 7.  Responsiveness of instruments indicated by SRM
 
(ii) Knee replacement sample.
The mean changes found at post-operative follow-up were statistically significant for all dimensions of the WOMAC and the HAQ disability index. For the SF-36, three dimensions (Physical functioning, Pain and Vitality) were significantly different, as was the EQ-5D, but not the EQ Rating scale (Table 6).


 
TABLE 6.  Mean score differences between initial assessment and follow-up: knee replacement patients
 
These results were reflected in the SRMs, where a high value was observed for Physical function and Pain for the WOMAC compared to a small SRM for the HAQ. SRMs were moderate for the Physical functioning and Pain dimensions of the SF-36 and for the EQ-5D (Table 7).


    Discussion
 
The high response rates achieved, and the absence of any adverse comments from respondents, suggest that all instruments may be acceptable to this clinical population. In addition, in both samples, completion rates were very satisfactory, which is encouraging in an elderly group of patients. These results confirm previous studies using these instruments [3, 5, 9, 21]. WOMAC, SF-36 and HAQ scores were similar, though not identical, to those found in other OA cohorts [9, 21].

Differences in performance between these measures were found in the comparisons of validity and responsiveness. This is the first time that these outcome measures have been evaluated together in terms of their discriminatory and evaluative properties for these two groups of patients with OA of the knee, and there is no reason to suppose that the same instrument should perform well in both groups. It is commonly assumed that the condition-specific measure should be the more responsive and this hypothesis is supported by our results in the knee replacement group, who received a major intervention. The OA-specific WOMAC physical functioning scale was more responsive than the more general HAQ Disability index, and this and Pain were more responsive than the equivalent dimensions of the SF-36. These results confirm previous studies comparing WOMAC to the HAQ [9] and the SF-36 [3, 21]. In the present study, WOMAC emerges as the instrument of choice for assessing the consequences of surgery for OA of the knee. However, the results also support the use of a generic instrument on this group of patients, since the SF-36 was better at distinguishing those reporting a non-musculoskeletal co-morbidity from those who did not.

The advantages of the OA-specific WOMAC were less clear for the rheumatology clinic patients, in whom the HAQ and equivalent dimensions of the SF-36 (Physical functioning and Pain) were just as able to distinguish between severity groups. Furthermore, many of their dimension scores were able to discriminate between patients with and without non-musculoskeletal co-morbidity, whereas the WOMAC was not. Most importantly, some dimensions of the SF-36 (Pain, Vitality and General health) were more responsive than the WOMAC for these patients. A possible reason for this result could be that the rheumatology clinic patients were a less well-defined and homogeneous group of patients, with more frequent health problems unrelated to OA of the knee. The changes being experienced by this group may have been more general in nature, so that the generic SF-36 was better at detecting them. An important feature of the rheumatology patients was that more of them reported a deterioration in their health than an improvement, and this was better reflected in the SF-36 than in the condition-specific measures. However, there are reasons to interpret this result with caution. The analysis of this group is limited by a small sample size, although it compares well with other studies. In addition, the result may apply only to the broad mix of patients typically attending NHS rheumatology clinics, rather than to medically managed OA patients in general.

For both patient groups, there is the question of which generic instrument is most appropriate for use [21]. This is the first time that the EQ has been evaluated in patients with OA. Results from a study using the EQ with patients with rheumatoid arthritis found that it performed as well as the more specific HAQ [22]. In the present study, the EQ-5D was able to discriminate on the basis of severity for patients with OA of the knee attending a rheumatology clinic, and was comparable in terms of responsiveness to the best-performing dimensions of the SF-36 in the knee replacement group. However, the EQ-5D was noticeably less responsive to change in the rheumatology clinic group than many of the dimensions of the SF-36. This may reflect the fact that it is based on a more crude description of status in any given dimension, which makes it efficient for large changes, but less so for the more subtle and diverse changes experienced by the rheumatology clinic group. The advantage of the EQ-5D is its brevity (occupying a single page), but this is at the expense of lower sensitivity, and it does not give the broad picture available from a profile measure such as the SF-36. In summary, these results suggest that the EQ-5D may be suitable for economic evaluations of surgical interventions in this group, but for other purposes, the SF-36 would be preferred.

The EQ Rating scale is even simpler, but its performance was inconsistent. It proved unable to distinguish between severity groups, and remarkably unresponsive to the changes following TKR. It performed better in detecting non-musculoskeletal co-morbidity and change in the rheumatology clinic group, but dimensions of the SF-36 performed as well or better in all respects.


    Conclusions
 
This investigation has confirmed that WOMAC is the instrument of choice for evaluating the outcome of TKR in patients with OA of the knee. For a more general insight into patients' health and as a means of making comparisons across conditions, the SF-36 should also be used. For researchers wishing to undertake an economic evaluation, the EQ-5D might be considered for a surgical but not a heterogeneous medically managed clinic group. Our results suggest that the SF-36 is probably a better choice than WOMAC for detecting change in the less condition-specific morbidity found in this diverse patient population, though care should be taken in generalizing this last result to all medically managed OA populations.


    Acknowledgments
 
We wish to express our gratitude to the consultant orthopaedic surgeons and rheumatologists and their staff, and to the patients who gave up their time to complete the questionnaires. The study was funded by the UK NHS Executive (Trent).


    References
 

  1.  McAlindon TE, Cooper C, Kirwan JR, Dieppe PA. Knee pain and disability in the community. Br J Rheumatol 1992;31:189–92.
  2.  Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990;28:632–42.
  3.  Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 1995;33(suppl.): AS131–44.
  4.  Kirwan JR, Reeback JS. Stanford Health Assessment Questionnaire modified to assess disability in British patients with rheumatoid arthritis. Br J Rheumatol 1986;25:206–9.
  5.  Bellamy N, Watson Buchanan W, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes in antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833–40.
  6.  Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995;8:174–81.
  7.  The Euroqol Group. Euroqol—a facility for the measurement of health-related quality of life. Health Policy 1990;16:199–208.
  8.  McDowell I, Newell C. A guide to rating scales and questionnaires. Oxford: Oxford University Press, 1987.
  9.  Griffiths G, Bellamy N, Bailey WH, Bailey SI, McLaren AC, Campbell J. A comparative study of the relative efficiency of the WOMAC, AIMS and HAQ instruments in evaluating the outcome of total knee arthroplasty. Inflammopharmacology 1995;3:1–6.
  10. Wilkin D, Hallam L, Doggett MA. Measures of need and outcome for primary health care. Oxford: Oxford University Press, 1992.
  11. Brazier JE, Harper R, Jones NMB et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. Br Med J 1992;305:160–4.
  12. Ware JE, Snow KK, Kolinski M, Gandeck B. SF-36 health survey manual and interpretation guide. Boston: The Health Institute, New England Medical Centre, 1993.
  13. Williams A. The measurement and valuation of health: a chronicle. Discussion Paper 136. Centre for Health Economics, York Health Economics Consortium, NHS Centre for Reviews and Dissemination. York: University of York, 1995.
  14. Cohen J. Statistical power analysis for the behavioural sciences. New York: Academic Press, 1978.
  15. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press, 1989.
  16. Fortin PR, Stucki G, Katz JN. Measuring relevant change: an emerging challenge in rheumatological clinical trials. Arthritis Rheum 1995;38:1027–30.
  17. Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. A generic health status instrument in the assessment of rheumatoid arthritis. Br J Rheumatol 1992;31:87–90.
  18. Garratt AM, Ruta DA, Abdalla MI. The SF-36 health survey questionnaire: an outcome measure suitable for routine use within the NHS. Br Med J 1992;306:1440–4.
  19. Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments. Med Care 1992;30:917–25.
  20. Brazier J, Snaith M, Munro J. Measuring health outcome in people with osteoarthritis of the knee. Report to NHS Executive (Trent), UK, 1996.
  21. Hawker G, Melfi C, Paul J, Green C, Bombardier C. Comparison of a generic (SF-36) and a disease-specific (WOMAC) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol 1995;22:1193–6.
  22. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of Euroqol (EQ-5D). Br J Rheumatol 1997;36:551–9.
Submitted 12 March 1998; revised version accepted 1 April 1999.