St George Hospital/University of NSW, Sydney, Australia
Correspondence to:
M. Fransen, Department of Rheumatology, St George Hospital, Gray Street, Kogarah 2217, Australia.
Abstract
Methods. Eighty-two patients with OA knee were asked to complete on two occasions, separated by 1 week, the EQ-5D, the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index and the 36-item short form of the Medical Outcomes Study (SF-36).
Results. In this patient population, <10% of the 243 EQ-5D health states were active. The EQ-5D demonstrated a non-Gaussian distribution. Reliability [intraclass correlation coefficient (ICC)=0.70] is acceptable for aggregate level data. There were significant rank correlations with both the WOMAC and SF-36.
Conclusions. This study provides some evidence of EQ-5D construct validity and reliability. However, the restricted and non-normal distribution of scores, the marked difference between patients' self evaluation and derived societal utility tariffs, as well as the lack of discriminative ability for patients with `moderate' morbidity within each of the five EQ-5D dimensions, are of concern.
KEY WORDS: EuroQol, OA knee, QALY, Utility, WOMAC, SF-36, Health status measures, Health index.
Introduction
Within a QALY framework, the quality adjustment factor is an estimate of the utility associated with preferences for different health states [4–6]. The strength of an individual's preference or utility is usually elicited using either the standard gamble or the time trade-off method. The standard gamble technique estimates utility under conditions of risk or uncertainty, whereas the time trade-off incorporates choice under certainty. The numbers or weightings representing the strength of an individual's preferences for an experienced or described health state are scored between 0 (death or worst imaginable health state) and 1 (full health or best imaginable health state). The quality adjustment is then multiplied by the expected life years in the assessed health state to arrive at the number of QALYs achieved. The expected utility associated with a health care intervention is then the sum, over the possible health states, of the probability of entering each state multiplied by the utility (QALY) associated with that state.
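The arithmetic described above can be sketched in a few lines. This is only an illustration of the QALY and expected-utility calculation: the function names and the example probabilities, utilities and life years are hypothetical and are not taken from this study.

```python
def qalys(utility, life_years):
    """QALYs for one health state: utility weight (0 to 1) times years spent in it."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    if life_years < 0:
        raise ValueError("life_years must be non-negative")
    return utility * life_years


def expected_qalys(outcomes):
    """Probability-weighted sum of QALYs over mutually exclusive health states.

    outcomes: iterable of (probability, utility, life_years) tuples.
    """
    outcomes = list(outcomes)
    total_probability = sum(p for p, _, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * qalys(u, years) for p, u, years in outcomes)


# Hypothetical intervention: 70% chance of a state valued at 0.8,
# 30% chance of a state valued at 0.5, each lasting 10 years.
example_outcomes = [(0.7, 0.8, 10.0), (0.3, 0.5, 10.0)]
print(round(expected_qalys(example_outcomes), 2))
```

Here each state's QALYs are weighted by the probability of entering that state, so a less likely but poorly valued outcome still lowers the expected total.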
The aim of this study was to assess the measurement reliability and validity of the EuroQol (EQ-5D), an instrument designed to derive from five dimensions of health (mobility, self-care, usual activities, pain, mood) a single cardinal index for the quality weighting of QALYs. The EQ-5D uses valuations derived with the time trade-off method from a large general population survey to score the five-dimension health profile self-reported, in this study, by subjects with osteoarthritis of the knee (OA knee). The numerous ethical implications of using QALYs as a basis for resource allocation or the valuation validity of the time trade-off method were beyond the scope of this study [7].
Methods
Health status instruments
The EuroQol (index measure).
The EuroQol instrument `has been purposefully developed to generate a cardinal index of health' primarily for evaluative studies and policy research and `is intended to complement other forms of quality of life measures' [9, 10]. The self-report questionnaire has two sections. The first part (EQ-5D) consists of five questions covering the dimensions of mobility, self-care, usual activities, pain/discomfort and anxiety/depression, each with three levels of response (Appendix 1). The time frame is given as `... your own health state today'. The resultant five-part health profile (e.g. 21122, 11223, etc.) is weighted according to EuroQol Group guidelines. The tariffs, or weights, for any of the theoretically possible 243 health profiles have been derived from the latest UK population survey (n=3337) using the time trade-off valuation method [11, 12], to create a single index of quality of life ranging from 0 to 1. The second part (EQ-VAS) of the EuroQol consists of a 20 cm vertical visual analogue scale (VAS) ranging from 100 (best imaginable health state) to 0 (worst imaginable health state). The EQ-VAS gives a self-assessed measure of overall health state. The EQ-VAS does not express any longevity trade-offs or uncertainty and therefore does not have utility valuation validity, and is not intended as quality adjustments for QALYs.
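As an illustration of how a five-part profile maps to a single index, the sketch below enumerates every combination of three levels across five dimensions and scores a profile by subtracting a decrement for each dimension from 1. The decrement table is a hypothetical placeholder, not the UK population tariff, and the simple additive form is an assumption; the published tariff may include further terms.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

# HYPOTHETICAL decrements per response level, for illustration only.
# These are NOT the UK time trade-off tariff values.
HYPOTHETICAL_DECREMENTS = {
    dimension: {1: 0.0, 2: 0.05, 3: 0.20} for dimension in DIMENSIONS
}


def parse_profile(profile):
    """Turn a profile string such as '21122' into {dimension: level}."""
    if len(profile) != len(DIMENSIONS) or any(ch not in "123" for ch in profile):
        raise ValueError("profile must be five digits, each 1, 2 or 3")
    return {dimension: int(ch) for dimension, ch in zip(DIMENSIONS, profile)}


def index_score(profile, decrements=HYPOTHETICAL_DECREMENTS):
    """Score a profile as 1 minus the sum of its dimension decrements."""
    levels = parse_profile(profile)
    return 1.0 - sum(decrements[dim][level] for dim, level in levels.items())


# Three levels in each of five dimensions give 3**5 distinct profiles.
all_profiles = ["".join(levels) for levels in product("123", repeat=5)]
print(len(all_profiles))  # 243
print(round(index_score("21122"), 2))
```

Counting how many of these profiles are actually reported by a sample is one way to see how much of the descriptive system a patient population uses.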
WOMAC (disease-specific measure).
The WOMAC Osteoarthritis Index is a disease-specific self-report questionnaire assessing pain (five questions), stiffness (two questions) and physical disability (17 questions) in patients with OA of either the hip or the knee [13, 14]. The time frame is given as `currently experiencing'. Responses to the questions are rated by the subject on 100 mm VAS ranging from 0 (indicating no pain, stiffness or difficulty) to 100 (indicating extreme pain, stiffness or difficulty). The two questions concerning stiffness are currently being replaced by the originator and will not be analysed further in this paper.
SF-36 (generic profile measure).
The SF-36 [15, 16] is a widely used self-report generic health status questionnaire measuring eight dimensions of health status: physical functioning (10 questions), role limitations due to physical health problems (four questions), bodily pain (two questions), social functioning (two questions), general mental health (five questions), role limitations due to emotional problems (three questions), vitality (four questions) and general health perceptions (five questions). The time frame is given as `during the past 4 weeks'. Each of the eight subscales generates a score from 100 to 0 with 100 indicating best health and 0 indicating worst health. An additional question, not scored, relates to perceived change in health status over the past year. Most of the SF-36 subscales have been validated for diverse patient groups [17–19] and the SF-36 can be completed within 10 min by most people. Recently, the originators of the SF-36 have developed algorithms to calculate two psychometrically based summary measures: the Physical Component Summary Scale Score (PCS) and the Mental Component Summary Scale Score (MCS) [20, 21]. The PCS and MCS are norm-based scores. A linear T-score transformation method is used so that both the PCS and MCS have a mean of 50 and a S.D. of 10 in the general US population. The PCS and the MCS provide greater precision, reduce the number of statistical comparisons needed, and eliminate the floor and ceiling effects noted in several of the subscales [17, 19, 22–24].
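The norm-based scoring mentioned above rests on a linear T-score transformation, which can be sketched as follows. Only the final rescaling step is shown; the published PCS and MCS algorithms also weight and aggregate the eight subscales, which is not reproduced here. The function name and the reference mean and S.D. in the example are hypothetical, not published SF-36 norms.

```python
def t_score(raw, population_mean, population_sd):
    """Rescale a raw score so the reference population has mean 50 and S.D. 10."""
    if population_sd <= 0:
        raise ValueError("population_sd must be positive")
    z = (raw - population_mean) / population_sd  # standardised (z) score
    return 50.0 + 10.0 * z


# Hypothetical reference population: mean 70, S.D. 20 on the raw scale.
print(t_score(70.0, 70.0, 20.0))  # 50.0, i.e. exactly at the population mean
print(t_score(50.0, 70.0, 20.0))  # 40.0, i.e. one S.D. below the mean
```

A score of 40 on such a scale can therefore be read directly as one population S.D. below the norm.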
Analysis
As the EQ-5D is intended for use in evaluative studies and policy research rather than as a clinical tool, only the reliability of aggregate level data is presented and compared with the WOMAC and SF-36. Systematic error, or agreement, was analysed using the difference scores with their associated 95% confidence intervals (CIs). Reliability was analysed using intraclass correlation coefficients (ICCs).
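The two analyses named above can be sketched as follows. The text does not state which form of ICC was used, so the two-way random-effects, absolute-agreement, single-measure form is assumed here purely for illustration, and the confidence interval uses a large-sample normal approximation rather than a t distribution. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(first, second, level=0.95):
    """Mean of (second - first) with a normal-approximation confidence interval."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = mean(diffs)
    standard_error = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return d_bar, d_bar - z * standard_error, d_bar + z * standard_error


def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement ICC for single measures.

    scores: list of rows, one per subject, each holding k occasion scores.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 occasions per subject")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical data: the second occasion is exactly 1 point higher for everyone.
shifted = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(shifted), 3))
print(mean_difference_ci([1, 2, 3], [2, 3, 4]))
```

In the shifted example a Pearson correlation would be exactly 1, whereas this form of ICC is lower because the systematic difference between occasions counts against agreement.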
Establishing the criterion or content validity of an instrument claiming to measure health-related quality of life is difficult as there are no established gold standards for comparison. Evidence for construct validity can only be accumulated by `a priori' hypothesized patterns of associations with other validated instruments. In this study, it is hypothesized that associations with the disease-specific WOMAC scores will be weaker than associations with the generic SF-36 scores or the EQ-VAS in this older sample with probable co-morbidities.
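Associations of this kind are reported in this paper as rank correlations (Spearman's rho). A minimal sketch of that statistic is given below, computed as a Pearson correlation on average ranks so that tied values, which are common with three-level responses, are handled. The function names and the example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / sqrt(s_xx * s_yy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical index scores and a hypothetical comparison score.
index_scores = [0.52, 0.69, 0.69, 0.80, 1.00]
comparison = [28.0, 35.0, 31.0, 44.0, 50.0]
print(round(spearman_rho(index_scores, comparison), 3))
```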
To investigate discriminative validity, an analysis was made to assess whether the three response levels available within each of the five health dimensions of the EQ-5D would accurately discriminate symptom severity or functional disability as rated by comparable generic SF-36 scores.
Data analysis was carried out using SAS Proprietary Software Release 6.12 (SAS Institute Inc., Cary, NC 27511, USA).
Results
Discriminant.
Discriminant validity was analysed with what were considered comparable scores on the generic SF-36 questionnaire (Table 4). The EQ-5D does appear to be able to discriminate accurately within each separate health state descriptor.
Discussion
Whilst only one of the EQ-5D dimensions demonstrated a marked ceiling effect in this sample with `moderate' morbidity (Table 4), the crudeness of the levels means that someone responding with moderate levels of morbidity `is not permitted by the descriptive system to describe a further deterioration in health that is anything less than a major collapse' [25]. Similarly, the EQ-5D descriptive system allows patients with `moderate' morbidity only to describe a full recovery. A large study involving patients with rheumatoid arthritis (RA) did claim to demonstrate that the EQ-5D was `as responsive to self-reported clinical change ... as many of the condition-specific instruments' [26]. The RA patients were selected to `obtain a broad cross-section of disease severity'; however, approximately half of the sample had severe morbidity with a baseline median EQ-5D score of 0.12.
The bimodal distribution of the EQ-5D scores reflects the relatively large decrement in societal valuations from level 2 to level 3 compared with decrements from level 1 to level 2. For example, a fall in Pain/discomfort only from level 1 to level 2 decreases the derived societal index by 0.15, whilst a fall from level 2 to level 3 decreases the valuation by 0.45, or nearly half the total score range. Decrements from level 2 to level 3 in the other four dimensions range from 0.25 up to 0.43 (for Mobility). The population tariffs therefore reveal that there is a huge difference in health state utility between `moderate pain or discomfort' and `extreme pain or discomfort'. Similarly for mobility, there is a large unassessed area between `some problems walking about' and `confined to bed'. Whilst the EuroQol instrument makes no claims that each of the five dimensions has interval properties [9, 27], the disadvantages of the EuroQol descriptive system for other patients with moderate levels of morbidity were well demonstrated in a recent randomized controlled trial involving patients with benign prostatic hypertrophy [25]. An additional level between level 2 and level 3 would potentially overcome the limited and abnormal distribution of scores in patient populations with `moderate' morbidity. It is suggested that this could be achieved without the EQ-5D becoming burdensome and losing its aim of being suitable as an `add-on to other data collection' [27].
Reliability
The ICCs of the EQ-5D (Table 2) could be regarded as modest given the short test-retest interval of 1 week and may not be adequate for studies using a similar, relatively small sample. For aggregate analysis, however, it is claimed that a reliability of >0.50 may be acceptable with a large sample size [2]. It should also be noted that ICC analysis results in lower values than the Pearson equivalent, as ICCs account for any additive or multiplicative element [3]. Furthermore, the EQ-5D (and the WOMAC pain) are each derived from only five scores, whilst the WOMAC physical function score is a mean of 17 VAS ratings, and the SF-36 PCS and SF-36 MCS are each derived from 35 questions. The inclusion of a greater number of items generally results in a more stable score, less susceptible to measurement error. This possibly partly explains the superior reliability demonstrated by the WOMAC physical function score and the SF-36 scores.
Modest reliability may, however, just reflect fluctuating symptomatology characteristic of OA knee as both the EuroQol and the WOMAC ask the responders to consider their `health state today' or `recently experienced'. In contrast, the SF-36 asks responders to consider `the past 4 weeks', potentially allowing greater 1 week test-retest reliability compared with the EuroQol and the WOMAC.
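The point that a score built from more questions tends to be more stable can be illustrated with the Spearman-Brown prediction formula from classical test theory. This formula is not used in the present study; it is offered only as a sketch, it assumes that all items are parallel (equally reliable and equally inter-related), and the single-item reliability of 0.30 is a hypothetical value.

```python
def spearman_brown(single_item_reliability, n_items):
    """Predicted reliability of a score averaged over n parallel items."""
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * r / (1.0 + (n_items - 1) * r)


# Hypothetical single-item reliability of 0.30, for three scale lengths.
for n_items in (5, 17, 35):
    print(n_items, round(spearman_brown(0.30, n_items), 3))
```

Under these assumptions the predicted reliability rises steadily with the number of items, which is consistent with the direction of the difference described above, although real questionnaire items are rarely strictly parallel.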
Validity
Construct validity cannot be proved definitively; rather, establishing it is a continuing process of accumulating evidence [28]. This study provides some evidence of concurrent validity with significant and mostly stable associations between the EQ-5D and the other health status questionnaires on both assessment occasions (Table 3).
The EQ-5D was easily able to discriminate symptom severity as scored by the chosen SF-36 scores, but discriminative ability was crude as it was essentially limited to two levels within each of the five health dimensions for this sample (Table 4).
Although mental health is an important health determinant, for a disease such as OA knee it is noteworthy that the EQ-5D appears to demonstrate a stronger association with a patient's emotional state (SF-36 MCS) than with a patient's physical state (SF-36 PCS) (Table 3). In fact, it has been claimed that `the attenuation of validity coefficients due to unreliability implies that a correlation of 0.60 between two measures represents an extremely strong association' ([2], p. 34). More than 40% of this sample, referred for physiotherapy, reported having at least moderate levels of anxiety/depression on the EQ-5D (Table 4). Demonstrated mental distress may have been the stimulus for physicians to refer particularly these patients with OA knee for active physical intervention. Therefore, although exclusion co-morbidities [neurological disorders affecting gait, joint arthroplasty in the lower limbs or unstable cardiac conditions that preclude aerobic exercise at a very moderate level (50–60% maximum heart rate)] were limited, this sample may not be truly representative of community-dwelling people with OA knee. Hurst et al. [29] found in a pilot study of 55 patients with RA that `change in mood' was the strongest predictor of change in the EQ-5D. It may be that a patient's emotional state is a good index of disease severity in OA knee. If, however, a patient's emotional state markedly influences their physical health status perception, the resultant random measurement error would restrict the validity of the EQ-5D or other self-report questionnaires to only relatively large studies.
General points
Our subjects experienced two main problems with the wording of the EuroQol. Many subjects found the grouping together of the words pain and discomfort confusing. Discomfort was perceived as a mild pain; therefore, extreme discomfort was equated by many with moderate pain. There were also some interpretation difficulties concerning the term `walking about' (how far? include stairs?). This may explain the low test-retest association for Mobility (Spearman's rho=0.29) compared with the other four dimensions (Spearman's rho range 0.50–0.60).
In agreement with other studies [26, 30], it would appear that patients in this study placed a higher valuation on their own health state than that given by the general population tariffs. If the two EuroQol scores are made mathematically comparable by multiplying the EQ-5D score by 100, the poor agreement between the patients' rating (median score 74.0) of their own general health state and the valuation given by the general population survey (median score 66.0) is highlighted. It could easily be argued that agreement between the scores is not an issue as there is a fundamental difference between a simple `risk-free' valuation of a health state (EQ-VAS) and a utility measure of preferences involving a longevity trade-off (EQ-5D). Furthermore, it has been well documented that comparable time frame, context and prognosis information are crucial when evaluating a health state [7]. Responders in the population survey were told that each health state, derived from the EuroQol descriptive system (Appendix 1), was to be regarded as lasting for 10 yr without change, followed by death, and were given no information regarding specific disease context [11, 12]. In contrast, the study participants have context knowledge, but uncertain and differing opinions on their probable disease progression rate and life expectancy. It is probable, however, that the study participants' median assessment of 74.0 would be even higher if longevity trade-offs were involved in the valuation process. The validity and wider ethical implications of using non-contextual health state descriptors to elicit quality of life valuations in the general population, with little direct experience of various levels of morbidity and little opportunity for reflection, have frequently been questioned [7, 31, 32].
OA knee has been identified as one of the five diseases responsible for the greatest proportion of physical disability in non-institutionalized elderly men and women [33]. `Moderate pain/discomfort', `some problems walking about' and `some problems with usual activities' cover an enormous spectrum of pain and disability. In fact, `some problems walking about' in the EQ-5D covers the entire range of gait disability as it is the only level between `no problems' and `confined to bed'! Similarly, `some problems with performing my usual activities' spans the entire spectrum of activities of daily life dysfunction from people able to live entirely independently, and even provide support for others, to those people needing maximum support of carers, equipment and/or finances. The probable limited responsiveness of the EQ-5D to clinically significant changes will be a major problem for patients with diseases of mostly `moderate' chronic morbidity, such as OA knee. QALYs already discriminate against the elderly as, with their decreased life expectancy, they have decreased potential to accrue QALYs. An unresponsive measure of quality of life for chronic `moderate' morbidity will further disadvantage patients with OA knee in resource allocation decisions.
Conclusions
Appendix 1
Mobility
I have no problems in walking about
I have some problems in walking about
I am confined to bed
Self-Care
I have no problems with self-care
I have some problems washing or dressing myself
I am unable to wash or dress myself
Usual Activities (e.g. work, study, housework, family or leisure activities)
I have no problems with performing my usual activities
I have some problems with performing my usual activities
I am unable to perform my usual activities
Pain/Discomfort
I have no pain or discomfort
I have moderate pain or discomfort
I have extreme pain or discomfort
Anxiety/Depression
I am not anxious or depressed
I am moderately anxious or depressed
I am extremely anxious or depressed
Compared with my general level of health over the past 12 months, my health state today is:
Better
Much the same
Worse
References