Reliability and validity of the EuroQol in patients with osteoarthritis of the knee

M. Fransen and J. Edmonds

St George Hospital/University of NSW, Sydney, Australia

Correspondence to: M. Fransen, Department of Rheumatology, St George Hospital, Gray Street, Kogarah 2217, Australia.


    Abstract
Objective. To assess the reliability and validity of the EuroQol (EQ-5D) for osteoarthritis of the knee (OA knee).

Methods. Eighty-two patients with OA knee were asked to complete the EQ-5D, the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index and the 36-item short form of the Medical Outcomes Study (SF-36) on two occasions, 1 week apart.

Results. In this patient population, <10% of the 243 EQ-5D health states were active. The EQ-5D demonstrated a non-Gaussian distribution. Reliability [intraclass correlation coefficient (ICC)=0.70] is acceptable for aggregate level data. There were significant rank correlations with both the WOMAC and SF-36.

Conclusions. This study provides some evidence of EQ-5D construct validity and reliability. However, the restricted and non-normal distribution of scores, the marked difference between patients' self evaluation and derived societal utility tariffs, as well as the lack of discriminative ability for patients with `moderate' morbidity within each of the five EQ-5D dimensions, are of concern.

KEY WORDS: EuroQol, OA knee, QALY, Utility, WOMAC, SF-36, Health status measures, Health index.


    Introduction
The diminishing marginal gains in survival due to technological improvements in health care in `developed' countries have led to the emergence of health-related quality of life as an important issue. It is hoped that health care will not only provide optimal length of life, but also maximize quality of life. Cost per quality-adjusted life year (QALY), analysed from aggregate population-based data, provides a well-developed framework to guide rational and explicit health policy decision making aiming to maximize allocative efficiency of diminishing health care resources [1]. Currently, an abundance of validated disease-specific and generic health status scales and questionnaires can produce a profile of health status with scores on various areas of functioning [2, 3]. For comparative economic analysis, however, it is necessary to obtain a single index of the perceived health-related quality of life which accurately reflects preferences for different health states.

Within a QALY framework, the quality adjustment factor is an estimate of the utility associated with preferences for different health states [4–6]. The strength of an individual's preference or utility is usually elicited using either the standard gamble or the time trade-off method. The standard gamble technique estimates utility under conditions of risk or uncertainty, whereas the time trade-off incorporates choice under certainty. The numbers or weightings representing the strength of an individual's preferences for an experienced or described health state are scored between 0 (death or worst imaginable health state) and 1 (full health or best imaginable health state). The quality adjustment is then multiplied by the expected life years in the assessed health state to arrive at the number of QALYs achieved. The expected utility associated with a health care intervention is then the sum, over all the health states that may follow it, of the probability of entering each health state multiplied by the utility (QALYs) associated with that state.
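The QALY arithmetic described above can be sketched in Python. This is a minimal illustration and not the authors' method. The `HealthState` type, the function names and all numbers in the usage example are hypothetical. The sketch assumes that the listed health states are mutually exclusive and that their probabilities sum to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthState:
    """One possible outcome of an intervention (hypothetical structure)."""
    name: str
    probability: float  # probability of entering this health state
    utility: float      # quality adjustment: 0 = death, 1 = full health
    life_years: float   # expected life years spent in this health state


def qalys(state: HealthState) -> float:
    """QALYs for one state: quality adjustment multiplied by expected life years."""
    return state.utility * state.life_years


def expected_qalys(states: list[HealthState]) -> float:
    """Sum over states of P(entering the state) x QALYs associated with it."""
    total_probability = sum(s.probability for s in states)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(s.probability * qalys(s) for s in states)


# Hypothetical intervention with two possible outcomes.
outcomes = [
    HealthState("improved", probability=0.6, utility=0.8, life_years=10.0),
    HealthState("unchanged", probability=0.4, utility=0.6, life_years=10.0),
]
print(round(expected_qalys(outcomes), 2))  # 0.6 * 8 + 0.4 * 6 = 7.2
```

Keeping the per-state QALY calculation separate from the probability weighting mirrors the two steps in the text: quality adjustment multiplied by life years, then summed over outcomes.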

The aim of this study was to assess the measurement reliability and validity of the EuroQol (EQ-5D), an instrument designed to derive from five dimensions of health (mobility, self-care, usual activities, pain, mood) a single cardinal index for the quality weighting of QALYs. The EQ-5D uses valuations derived with the time trade-off method from a large general population survey to score the five-dimension health profile self-reported, in this study, by subjects with osteoarthritis of the knee (OA knee). The numerous ethical implications of using QALYs as a basis for resource allocation or the valuation validity of the time trade-off method were beyond the scope of this study [7].


    Methods
Patients
All patients appearing over a 20 month period on the physiotherapy waiting list of a large public hospital with a diagnosis of OA knee (or knee pain) aged 50 yr and older were invited to participate in a randomized controlled clinical trial assessing the effectiveness of physiotherapy. More than 95% of those with a confirmed diagnosis of OA knee [8] agreed to participate in the study as participation resulted in a 67% chance of avoiding the usual 6–8 week waiting time for treatment. The first 82 patients (25 males, 57 females), mean age 68.0 yr (range 46–88), mean symptom duration 9.9 yr (range 1–50), were asked to complete three questionnaires: the EuroQol, the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index and the 36-item short form of the Medical Outcomes Study (SF-36) health survey questionnaire. Each questionnaire was re-administered to each study participant 1 week later. The questionnaires were presented in the same order on both pre-randomization assessment occasions and self-completed by the subjects. Study participants were not informed at the baseline assessment that they would be completing the same questionnaires at the follow-up assessment, nor were they given access to their baseline responses.

Health status instruments
The EuroQol (index measure).
The EuroQol instrument `has been purposefully developed to generate a cardinal index of health' primarily for evaluative studies and policy research and `is intended to complement other forms of quality of life measures' [9, 10]. The self-report questionnaire has two sections. The first part (EQ-5D) consists of five questions covering the dimensions of mobility, self-care, usual activities, pain/discomfort and anxiety/depression, each with three levels of response (Appendix 1). The time frame is given as `... your own health state today'. The resultant five-part health profile (e.g. 21122, 11223, etc.) is weighted according to EuroQol Group guidelines. The tariffs, or weights, for any of the theoretically possible 245 health profiles have been derived from the latest UK population survey (n=3337) using the time trade-off valuation method [11, 12], to create a single index of quality of life ranging from 0 to 1. The second part (EQ-VAS) of the EuroQol consists of a 20 cm vertical visual analogue scale (VAS) ranging from 100 (best imaginable health state) to 0 (worst imaginable health state). The EQ-VAS gives a self-assessed measure of overall health state. It does not express any longevity trade-offs or uncertainty. It therefore lacks utility valuation validity and is not intended to provide quality adjustments for QALYs.
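The weighting step can be illustrated with a simplified sketch. The decrement values below are hypothetical placeholders, not the published UK time trade-off tariff [11, 12]. The purely additive model is also an assumption made for illustration. The published tariff includes further terms, for example a constant applied to any departure from full health and an extra term when any dimension is at level 3. The sketch also confirms that the descriptive system yields 3^5 = 243 possible five-digit profiles.

```python
from itertools import product

# Order of digits in a five-part EQ-5D profile such as "21122".
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

# HYPOTHETICAL decrements from full health (1.0), NOT the published tariff.
# Each entry is (decrement at level 2, decrement at level 3); level 1 costs 0.
PLACEHOLDER_DECREMENTS = {
    "mobility": (0.05, 0.25),
    "self_care": (0.05, 0.20),
    "usual_activities": (0.02, 0.10),
    "pain_discomfort": (0.10, 0.30),
    "anxiety_depression": (0.05, 0.15),
}


def parse_profile(profile: str) -> tuple[int, ...]:
    """Convert a profile such as '21122' into levels (2, 1, 1, 2, 2)."""
    if len(profile) != 5 or any(ch not in "123" for ch in profile):
        raise ValueError(f"not a valid EQ-5D profile: {profile!r}")
    return tuple(int(ch) for ch in profile)


def index_score(profile: str, decrements=PLACEHOLDER_DECREMENTS) -> float:
    """Additive index: 1.0 minus the decrement of each dimension above level 1."""
    score = 1.0
    for dimension, level in zip(DIMENSIONS, parse_profile(profile)):
        if level > 1:
            score -= decrements[dimension][level - 2]
    return score


ALL_PROFILES = ["".join(p) for p in product("123", repeat=5)]
print(len(ALL_PROFILES))               # 243
print(round(index_score("21122"), 3))  # 0.8 with these placeholder weights
```

The tariff is passed as a parameter so that a published set of weights could replace the placeholders without changing the scoring logic. A non-additive tariff would, however, need its own scoring function.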

WOMAC (disease-specific measure).
The WOMAC Osteoarthritis Index is a disease-specific self-report questionnaire assessing pain (five questions), stiffness (two questions) and physical disability (17 questions) in patients with OA of either the hip or the knee [13, 14]. The time frame is given as `currently experiencing'. Responses to the questions are rated by the subject on a 100 mm VAS ranging from 0 (indicating no pain, stiffness or difficulty) to 100 (indicating extreme pain, stiffness or difficulty). The two questions concerning stiffness are currently being replaced by the originator and will not be analysed further in this paper.

SF-36 (generic profile measure).
The SF-36 [15, 16] is a widely used self-report generic health status questionnaire measuring eight dimensions of health status: physical functioning (10 questions), role limitations due to physical health problems (four questions), bodily pain (two questions), social functioning (two questions), general mental health (five questions), role limitations due to emotional problems (three questions), vitality (four questions) and general health perceptions (five questions). The time frame is given as `during the past 4 weeks'. Each of the eight subscales generates a score from 100 to 0, with 100 indicating best health and 0 indicating worst health. An additional question, not scored, relates to perceived change in health status over the past year. Most of the SF-36 subscales have been validated for diverse patient groups [17–19] and the SF-36 can be completed within 10 min by most people. Recently, the originators of the SF-36 have developed algorithms to calculate two psychometrically based summary measures: the Physical Component Summary Scale Score (PCS) and the Mental Component Summary Scale Score (MCS) [20, 21]. The PCS and MCS are norm-based scores. A linear T-score transformation method is used so that both the PCS and MCS have a mean of 50 and an S.D. of 10 in the general US population. The PCS and the MCS provide greater precision, reduce the number of statistical comparisons needed, and eliminate the floor and ceiling effects noted in several of the subscales [17, 19, 22–24].
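The linear T-score transformation can be sketched as follows. The function name and the small norm sample are hypothetical. The published PCS and MCS algorithms [20, 21] first combine standardized subscale scores using US norm-based weights. Only the final linear rescaling step is shown here.

```python
from statistics import fmean, stdev


def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale so that the norm population has mean 50 and S.D. 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # standard score relative to the norms
    return 50.0 + 10.0 * z


# Hypothetical aggregate scores from a norm sample.
norm_sample = [-1.2, -0.4, 0.0, 0.3, 1.3]
mean, sd = fmean(norm_sample), stdev(norm_sample)
rescaled = [t_score(x, mean, sd) for x in norm_sample]
print(round(fmean(rescaled), 6), round(stdev(rescaled), 6))  # 50.0 10.0
```

Because the transformation is linear, it changes only the location and scale of the scores. It leaves their ordering and their correlations with other measures unchanged.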

Analysis
As the EQ-5D is intended for use in evaluative studies and policy research rather than as a clinical tool, only the reliability of aggregate level data is presented and compared with that of the WOMAC and SF-36. Systematic error (agreement) was analysed using the difference scores with their associated 95% confidence intervals (CIs). Reliability was analysed using intraclass correlation coefficients (ICCs).
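The ICC(1,1) reported in this study is the one-way random-effects, single-measure form in Shrout and Fleiss notation. It can be computed from one-way analysis of variance mean squares. The sketch below is a minimal illustration with hypothetical data, not the SAS procedure used in the study. It also shows that a constant shift between occasions lowers the ICC, although the Pearson correlation would remain 1.

```python
from statistics import fmean


def icc_1_1(scores: list[tuple[float, ...]]) -> float:
    """ICC(1,1) from one-way ANOVA; rows are subjects, columns are occasions."""
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same >= 2 ratings")
    grand_mean = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test-retest pairs (baseline, follow-up).
identical = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
shifted = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]  # follow-up = baseline + 1
print(icc_1_1(identical), round(icc_1_1(shifted), 3))  # 1.0 0.6
```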

Establishing the criterion or content validity of an instrument claiming to measure health-related quality of life is difficult as there are no established gold standards for comparison. Evidence for construct validity can only be accumulated by `a priori' hypothesized patterns of associations with other validated instruments. In this study, it is hypothesized that associations with the disease-specific WOMAC scores will be weaker than associations with the generic SF-36 scores or the EQ-VAS in this older sample with probable co-morbidities.

To investigate discriminative validity, an analysis was made to assess whether the three response levels available within each of the five health dimensions of the EQ-5D would accurately discriminate symptom severity or functional disability as rated by comparable generic SF-36 scores.

Data analysis was carried out using SAS Proprietary Software Release 6.12 (SAS Institute Inc., Cary, NC 27511, USA).


    Results
Distribution
Less than 10% of the possible 243 EQ-5D health states were active in this sample. Whilst the WOMAC and the SF-36 scores were normally distributed, the EQ-5D score had a marked bimodal distribution (Fig. 1). As the EQ-5D baseline (and follow-up) scores demonstrated a non-normal distribution, non-parametric procedures were often needed. Difference scores between baseline and follow-up scores were, however, all normally distributed, allowing parametric procedures.



FIG. 1.  Distribution of scores.

 
Reliability
Paired t-tests were used to detect any systematic error from baseline to follow-up (Table 1). There were no significant systematic changes over the 1 week period at the 5% significance level. The 95% CIs for systematic error were narrow. For the five individual EQ-5D dimensions, perfect test–retest agreement on the three response levels ranged from 77% (Usual activities) to 88% (Self-care).
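The systematic-error check and the per-dimension agreement figure can be sketched as follows. The function names and data are hypothetical. The confidence interval uses a caller-supplied critical value. By default this is the normal approximation of 1.96 rather than an exact t quantile; for 81 degrees of freedom, the two-sided 5% t critical value is about 1.99. The sketch returns the t statistic but not a P value.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_difference_summary(baseline, follow_up, critical=1.96):
    """Mean (follow-up minus baseline) difference, paired t statistic and CI.

    critical=1.96 is a normal approximation; pass the t quantile for
    n - 1 degrees of freedom for an exact paired t interval.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with >= 2 pairs")
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    mean_diff = fmean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    t_stat = mean_diff / se if se > 0 else float("nan")
    ci = (mean_diff - critical * se, mean_diff + critical * se)
    return mean_diff, t_stat, ci


def percent_perfect_agreement(baseline_levels, follow_up_levels):
    """Percentage of subjects giving the same response level on both occasions."""
    if len(baseline_levels) != len(follow_up_levels) or not baseline_levels:
        raise ValueError("need two equal-length, non-empty samples")
    matches = sum(b == f for b, f in zip(baseline_levels, follow_up_levels))
    return 100.0 * matches / len(baseline_levels)


# Hypothetical scores and EQ-5D response levels for four subjects.
mean_diff, t_stat, ci = paired_difference_summary([10, 12, 14, 16], [11, 12, 15, 16])
print(mean_diff, round(t_stat, 3), ci)
print(percent_perfect_agreement([1, 2, 2, 1], [1, 2, 1, 1]))  # 75.0
```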


TABLE 1.  Intra-subject agreement. Paired t-tests (two-tailed)
 
The lower bounds of the 95% CIs for the ICCs (1,1) ranged from 0.51 to 0.78 (Table 2). When test–retest agreement for each of the five EQ-5D dimensions was analysed as ordinal level data with Spearman's rho, rank correlations ranged from 0.29 (P=0.008) for Mobility to 0.60 (P=0.001) for Anxiety/depression.
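Each EQ-5D dimension has only three ordered levels, so test–retest data contain many ties. In that case Spearman's rho is usually computed as the Pearson correlation of average ranks. The sketch below is a minimal, hypothetical implementation of that approach using only the standard library. It is not the SAS routine used in the study, and the example data are invented.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for idx in order[start:end + 1]:
            ranks[idx] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with >= 2 values")
    return pearson(average_ranks(x), average_ranks(y))


baseline = [1, 1, 2, 2, 3, 2]   # hypothetical Mobility levels, week 0
follow_up = [1, 2, 2, 2, 3, 1]  # hypothetical Mobility levels, week 1
print(average_ranks([1, 1, 2]))  # [1.5, 1.5, 3.0]
print(round(spearman_rho(baseline, follow_up), 2))
```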


TABLE 2.  Intraclass correlation coefficients [ICC (1,1)]
 
Construct validity
Convergent.
The EQ-5D scores demonstrated no significant associations (Spearman's rho) with age or sex. There were consistent significant associations with symptom duration, EQ-VAS, WOMAC and SF-36 MCS at both assessments (Table 3), but the correlation coefficients demonstrate only what are generally considered low to moderate associations. The strength of the associations should, however, be viewed relative to the maximum theoretical correlation, not positive or negative unity. The maximum correlation between two measures is in fact the square root of the product of their reliabilities [3].
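The ceiling described above can be made concrete with a short calculation. In this hypothetical sketch, the EQ-5D reliability is taken as the ICC of 0.70 reported in the Abstract. The reliability of 0.90 for the comparison instrument and the observed correlation of 0.40 are invented for illustration.

```python
from math import sqrt


def max_attainable_correlation(reliability_a: float, reliability_b: float) -> float:
    """Upper bound on an observed correlation: sqrt(reliability_a * reliability_b)."""
    for r in (reliability_a, reliability_b):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sqrt(reliability_a * reliability_b)


def fraction_of_ceiling(observed_r, reliability_a, reliability_b):
    """Observed correlation expressed relative to its attainable maximum."""
    ceiling = max_attainable_correlation(reliability_a, reliability_b)
    if ceiling == 0:
        raise ValueError("ceiling is zero; correlation cannot be interpreted")
    return observed_r / ceiling


print(round(max_attainable_correlation(0.70, 0.90), 3))  # 0.794
print(round(fraction_of_ceiling(0.40, 0.70, 0.90), 3))
```

On these assumed figures, a nominally "moderate" correlation of 0.40 is about half of what the two instruments could attain.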


TABLE 3.  Correlation coefficients—Spearman's rho
 
Support for the `a priori' hypotheses of stronger associations between the two generic health status questionnaires (EQ-5D and SF-36) compared with associations between the EQ-5D and the disease-specific WOMAC could not be demonstrated due to the relatively small sample size resulting in wide 95% CIs (Fisher's Z transformation) around the correlation coefficients (Table 3).
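Fisher's Z transformation for a 95% CI around a correlation coefficient can be sketched as below. The correlation of 0.45 is hypothetical; the sample size of 82 matches this study. Applying the normal-theory standard error 1/sqrt(n - 3) to Spearman coefficients is itself an approximation.

```python
from math import atanh, sqrt, tanh


def fisher_z_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate CI for a correlation via Fisher's Z = atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = atanh(r)            # transform to an approximately normal scale
    se = 1.0 / sqrt(n - 3)  # standard error of Z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)  # back-transform


low, high = fisher_z_ci(0.45, 82)
print(round(low, 2), round(high, 2))
```

With these inputs the interval runs from roughly 0.26 to 0.61. An interval this wide, given 82 patients, makes it difficult to show that one correlation exceeds another.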

Discriminant.
Discriminant validity was analysed with what were considered comparable scores on the generic SF-36 questionnaire (Table 4). The EQ-5D does appear to be able to discriminate accurately within each separate health state descriptor.


TABLE 4.  Discriminant validity
 

    Discussion
Distribution
In this patient sample, 10 health states (out of a theoretically possible 243) accounted for 82% of the responses, with the remaining responses divided over only 13 other health states. This limited distribution of scores is not restricted to patient samples. In a large general population survey, Brazier et al. [23] found that only 10 health states (out of a theoretically possible 216 in a previous version of the EuroQol) accounted for 95% of the responses. In our non-institutionalized sample, in fact, the choice was essentially limited to two response levels at each of the five dimensions of health (Table 4). This resulted in <10% of the EuroQol health states being active in this out-patient sample and restricted the potential responsiveness of the instrument in this population.
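Counting active health states is simple frequency arithmetic. In this sample, 10 + 13 = 23 distinct profiles were observed, and 23/243 ≈ 9.5%, which is consistent with the <10% figure. The sketch below computes the same quantities for hypothetical profiles. The function name is illustrative only.

```python
from collections import Counter

N_POSSIBLE_STATES = 3 ** 5  # 243 profiles in the EQ-5D descriptive system


def state_coverage(profiles: list[str], top: int = 10) -> tuple[int, float, float]:
    """Return (number of active states, share of responses in the `top`
    most frequent states, fraction of possible states that are active)."""
    if not profiles:
        raise ValueError("no profiles supplied")
    counts = Counter(profiles)
    top_share = sum(c for _, c in counts.most_common(top)) / len(profiles)
    return len(counts), top_share, len(counts) / N_POSSIBLE_STATES


# Hypothetical responses from ten patients.
sample = ["21221"] * 5 + ["11221"] * 3 + ["21222", "22221"]
active, share, fraction = state_coverage(sample, top=2)
print(active, share, round(fraction, 4))  # 4 0.8 0.0165
print(round(23 / N_POSSIBLE_STATES, 3))   # 0.095, the proportion in this study
```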

Whilst only one of the EQ-5D dimensions demonstrated a marked ceiling effect in this sample with `moderate' morbidity (Table 4), the crudeness of the levels means that someone responding with moderate levels of morbidity `is not permitted by the descriptive system to describe a further deterioration in health that is anything less than a major collapse' [25]. Similarly, the only improvement that the EQ-5D descriptive system allows patients with `moderate' morbidity to describe is a full recovery. A large study involving patients with rheumatoid arthritis (RA) did claim to demonstrate that the EQ-5D was `as responsive to self-reported clinical change ... as many of the condition-specific instruments' [26]. The RA patients were selected to `obtain a broad cross-section of disease severity'. However, approximately half of the sample had severe morbidity, with a baseline median EQ-5D score of ≤0.12.

The bimodal distribution of the EQ-5D scores reflects the relatively large decrement in societal valuations from level 2 to level 3 compared with decrements from level 1 to level 2. For example, a fall in Pain/discomfort only from level 1 to level 2 decreases the derived societal index by 0.15, whilst a fall from level 2 to level 3 decreases the valuation by 0.45, or nearly half the total score range. Decrements from level 2 to level 3 in the other four dimensions range from 0.25 to 0.43 (the latter for Mobility). The population tariffs therefore reveal that there is a huge difference in health state utility between `moderate pain or discomfort' and `extreme pain or discomfort'. Similarly for mobility, there is a large unassessed area between `some problems walking about' and `confined to bed'. The EuroQol instrument makes no claims that each of the five dimensions has interval properties [9, 27]. Nevertheless, the disadvantages of the EuroQol descriptive system for other patients with moderate levels of morbidity were well demonstrated in a recent randomized controlled trial involving patients with benign prostatic hypertrophy [25]. An additional level between level 2 and level 3 would potentially overcome the limited and abnormal distribution of scores in patient populations with `moderate' morbidity. It is suggested that this could be achieved without the EQ-5D becoming burdensome or losing its aim of being suitable as an `add-on to other data collection' [27].

Reliability
The ICCs of the EQ-5D (Table 2) could be considered modest given the short test–retest interval of 1 week, and may not be adequate for studies using a similar, relatively small sample. For aggregate analysis, however, it has been claimed that a reliability of >0.50 may be acceptable with a large sample size [2]. Moreover, ICC analysis yields lower values than the Pearson equivalent. Unlike the Pearson correlation, the ICC is also reduced by any additive or multiplicative difference between the two sets of scores [3]. Furthermore, the EQ-5D (and the WOMAC pain score) are each derived from only five scores, whilst the WOMAC physical function score is a mean of 17 VAS ratings, and the SF-36 PCS and SF-36 MCS are each derived from 35 questions. Including a greater number of component scores necessarily produces a more stable score, less susceptible to measurement error. This may partly explain the superior reliability demonstrated by the WOMAC physical function score and the SF-36 scores.
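The point that scores built from more items are more stable can be illustrated with the Spearman–Brown prophecy formula. This is a standard psychometric result that the authors do not use. It assumes parallel items of equal reliability, which the EQ-5D dimensions, WOMAC items and SF-36 items are not. The sketch and its single-item reliability of 0.30 are therefore purely hypothetical.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a score formed from n_items parallel items."""
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * r / (1 + (n_items - 1) * r)


for k in (1, 5, 17, 35):  # item counts mentioned in the text
    print(k, round(spearman_brown(0.30, k), 3))
```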

Modest reliability may, however, just reflect fluctuating symptomatology characteristic of OA knee as both the EuroQol and the WOMAC ask the responders to consider their `health state today' or `recently experienced'. In contrast, the SF-36 asks responders to consider `the past 4 weeks', potentially allowing greater 1 week test–retest reliability compared with the EuroQol and the WOMAC.

Validity
Construct validity cannot be proved definitively, but it is a continuing process of accumulating evidence [28]. This study provides some evidence of concurrent validity with significant and mostly stable associations between the EQ-5D and the other health status questionnaires on both assessment occasions (Table 3Go).

The EQ-5D was easily able to discriminate symptom severity as scored by the chosen SF-36 scores, but discriminative ability was crude as it was essentially limited to two levels within each of the five health dimensions for this sample (Table 4).

Although mental health is an important health determinant, for a disease such as OA knee it is noteworthy that the EQ-5D appears to demonstrate a stronger association with a patient's emotional state (SF-36 MCS) than with a patient's physical state (SF-36 PCS) (Table 3). In fact, it has been claimed that `the attenuation of validity coefficients due to unreliability implies that a correlation of 0.60 between two measures represents an extremely strong association' ([2], p. 34). More than 40% of this sample, referred for physiotherapy, reported having at least moderate levels of anxiety/depression on the EQ-5D (Table 4). Evident mental distress may have prompted physicians to refer these particular patients with OA knee for active physical intervention. The exclusion co-morbidities were limited to neurological disorders affecting gait, joint arthroplasty in the lower limbs, or unstable cardiac conditions that preclude aerobic exercise at a very moderate level (50–60% maximum heart rate). Even so, this sample may not be truly representative of community-dwelling people with OA knee. Hurst et al. [29] found in a pilot study of 55 patients with RA that `change in mood' was the strongest predictor of change in the EQ-5D. It may be that a patient's emotional state is a good index of disease severity in OA knee. If, however, a patient's emotional state markedly influences their perception of physical health status, the resultant random measurement error would restrict the validity of the EQ-5D or other self-report questionnaires to only relatively large studies.

General points
Our subjects experienced two main problems with the wording of the EuroQol. Many subjects found the grouping together of the words pain and discomfort confusing. Discomfort was perceived as a mild pain, so many equated extreme discomfort with moderate pain. There were also some difficulties in interpreting the term `walking about': how far? Did it include stairs? This may explain the low test–retest association for Mobility (Spearman's rho=0.29) compared with the other four dimensions (Spearman's rho range 0.50–0.60).

In agreement with other studies [26, 30], it would appear that patients in this study placed a higher valuation on their own health state than that given by the general population tariffs. The two EuroQol scores can be made mathematically comparable by multiplying the EQ-5D score by 100. This highlights the poor agreement between the patients' rating of their own general health state (median score 74.0) and the valuation given by the general population survey (median score 66.0). It could easily be argued that agreement between the scores is not an issue, as there is a fundamental difference between a simple `risk-free' valuation of a health state (EQ-VAS) and a utility measure of preferences involving a longevity trade-off (EQ-5D). Furthermore, it has been well documented that comparable time frame, context and prognosis information are crucial when evaluating a health state [7]. Responders in the population survey were told that each health state, derived from the EuroQol descriptive system (Appendix 1), was to be regarded as lasting for 10 yr without change, followed by death. They were given no information regarding specific disease context [11, 12]. In contrast, the study participants have context knowledge, but uncertain and differing opinions on their probable disease progression rate and life expectancy. It is probable, however, that the study participants' median assessment of 74.0 would be even higher if longevity trade-offs were involved in the valuation process. The validity and wider ethical implications of using non-contextual health state descriptors to elicit quality of life valuations have frequently been questioned [7, 31, 32]. This is because members of the general population have little direct experience of various levels of morbidity and little opportunity for reflection.

OA knee has been identified as one of the five diseases responsible for the greatest proportion of physical disability in non-institutionalized elderly men and women [33]. `Moderate pain/discomfort', `some problems walking about' and `some problems with usual activities' cover an enormous spectrum of pain and disability. In fact, `some problems walking about' in the EQ-5D covers the entire range of gait disability as it is the only level between `no problems' and `confined to bed'! Similarly, `some problems with performing my usual activities' spans the entire spectrum of activities of daily life dysfunction from people able to live entirely independently, and even provide support for others, to those people needing maximum support of carers, equipment and/or finances. The probable limited responsiveness of the EQ-5D to clinically significant changes will be a major problem for patients with diseases of mostly `moderate' chronic morbidity, such as OA knee. QALYs already discriminate against the elderly as, with their decreased life expectancy, they have decreased potential to accrue QALYs. An unresponsive measure of quality of life for chronic `moderate' morbidity will further disadvantage patients with OA knee in resource allocation decisions.


    Conclusions
The EQ-5D is a short, easily completed questionnaire feasible for use as an add-on data collection in large clinical studies. The study demonstrated that the EQ-5D has reliability sufficient for aggregate level data and comparable with that of the WOMAC and the SF-36. The study also provides some evidence of moderate convergent validity. However, the coarseness of the levels within each of the five EQ-5D health dimensions is of concern. It results in <10% of the health states being active in this population despite their chronic `moderate' morbidity, thereby permitting only crude discriminative ability and restricting responsiveness. The restricted and non-normal distribution of the EQ-5D scores should be further investigated before the EuroQol is considered a valid instrument for evaluative studies and policy research in this population. So should the incomparability between patients' valuations of their own health state and the derived societal utility tariffs. If health care resource allocation becomes purely EQ-5D QALY based, it is highly conceivable that these older patients with `moderate' levels of chronic morbidity will be severely disadvantaged.


    Appendix 1
EQ-5D
By placing a tick (thus ✓) in one box in each group below, please indicate which statements best describe your own health today.

Mobility
I have no problems in walking about

I have some problems in walking about

I am confined to bed

Self-Care
I have no problems with self-care

I have some problems washing or dressing myself

I am unable to wash or dress myself

Usual Activities (e.g. work, study, housework, family or leisure activities)
I have no problems with performing my usual activities

I have some problems with performing my usual activities

I am unable to perform my usual activities

Pain/Discomfort
I have no pain or discomfort

I have moderate pain or discomfort

I have extreme pain or discomfort

Anxiety/Depression
I am not anxious or depressed

I am moderately anxious or depressed

I am extremely anxious or depressed

Compared with my general level of health over the past 12 months, my health state today is:

Better

Much the same

Worse


    References

  1.  Mooney G. Economics, medicine and health care, 2nd edn. UK: Harvester Wheatsheaf, 1992, p. 39.
  2.  Brooks R. Health status measurement. A perspective on change. London: Macmillan, 1994: Chapter 2.
  3.  McDowell I, Newell C. Measuring health. A guide to rating scales and questionnaires, 2nd edn. New York: Oxford University Press, 1996.
  4.  Richardson J. Cost utility analysis: what should be measured? Soc Sci Med 1994;39:7–21.
  5.  Schoemaker P. The Expected Utility Model: Its variants, purposes, evidence and limitations. J Econ Lit 1982;20:529–63.
  6.  Torrance G, Feeny D. Utilities and quality-adjusted life years. Int J Technol Assess Health Care 1989;5:559–75.
  7.  Loomes G, McKenzie C. The use of QALYs in health care decision making. Soc Sci Med 1989;28:299–308.
  8.  Altman R, Asch E, Bloch D et al. Development of criteria for the classification and reporting of osteoarthritis. Classification of osteoarthritis of the knee. Arthritis Rheum 1986;29:1039–49.
  9.  Brooks R, EuroQol Group. EuroQol: the current state of play. Health Policy 1996;37:53–72.
  10. Brazier J, Jones N, Kind P. Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Qual Life Res 1993;2:169–80.
  11. Dolan P. Modeling valuations for EuroQol health states. Med Care 1997;35:1095–108.
  12. Dolan P, Gudex C, Kind P, Williams A. The time trade-off method: results from a general population survey. Health Econ 1996;5:141–54.
  13. Bellamy N. Musculoskeletal clinical metrology. The Netherlands: Kluwer Academic, 1993, pp. 92–4.
  14. Bellamy N, Buchanan W, Goldsmith C et al. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833–40.
  15. Ware J, Snow K, Kosinski M et al. SF-36 Health Survey: manual and interpretation guide. Boston, MA: The Health Institute, New England Medical Center, 1993.
  16. Ware J, Sherbourne D. The MOS 36-item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473–83.
  17. McHorney C, Ware J, Lu J et al. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994;32:40–66.
  18. McHorney C, Ware J, Raczek A. The MOS 36-item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247–63.
  19. Jenkinson C, Layte R, Lawrence K. Development and testing of the Medical Outcomes Study 36-item short form health survey summary scale scores in the United Kingdom. Results from a large-scale survey and a clinical trial. Med Care 1997;35:410–6.
  20. Ware J, Kosinski M, Bayliss M et al. Comparison of methods for the scoring and statistical analysis of SF-36 Health Profile and summary measures: summary of results from the Medical Outcomes Study. Med Care 1995;4:AS264–79.
  21. Ware J, Kosinski M, Keller S. SF-36 physical and mental summary scales: A user's manual. Boston: Health Institute, New England Medical Center, 1994.
  22. Ruta A, Hurst N, Kind P et al. Measuring health status in British patients with rheumatoid arthritis: reliability, validity and responsiveness of the short form 36-item health survey (SF-36). Br J Rheumatol 1998;37:425–36.
  23. Brazier J, Jones N, Kind P. Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Qual Life Res 1993;2:169–80.
  24. Brazier J, Walters J, Nicholl J, Kohler B. Using the SF-36 and the Euroqol on an elderly population. Qual Life Res 1996;5:195–204.
  25. Jenkinson C, Gray A, Doll H et al. Evaluation of index and profile measures of health status in a randomized controlled trial. Comparison of the Medical Outcomes Study 36-Item Short Form Health Survey, EuroQol, and disease specific measures. Med Care 1997;35:1109–18.
  26. Hurst N, Kind P, Ruta D et al. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551–5.
  27. The EuroQol Group. EuroQol—a reply and reminder. Health Policy 1992;20:329–32.
  28. Froberg D, Kane R. Methodology for measuring health-state preferences II: scaling methods. J Clin Epidemiol 1989;42:459–71.
  29. Hurst N, Jobanputra P, Hunter M et al. Validity of EuroQol—a generic health status instrument in patients with rheumatoid arthritis. Br J Rheumatol 1994;33:655–62.
  30. Boyd N, Sutherland H, Heasman K, Tritchler D, Cummings B. Whose utilities for decision analysis? Med Decision Making 1990;10:58–67.
  31. Shiell A, Hawe P, Seymour J. Values and preferences are not necessarily the same. Health Econ 1997;6:515–8.
  32. Froberg D, Kane R. Methodology for measuring health-state preferences—III: population and context effects. J Clin Epidemiol 1989;42:585–92.
  33. Guccione A, Felson D, Anderson J, Anthony J et al. The effects of specific medical conditions on the functional limitations of elders in the Framingham study. Am J Public Health 1994;84:351–8.
Submitted 4 September 1998; revised version accepted 15 March 1999.