1 Department of Health Sciences and Clinical Evaluation, University of York, York YO1 5DD,
2 Interdisciplinary Research Centre in Health, Physiotherapy and Dietetics Subject Group, School of Health and Social Sciences, Coventry University, Coventry CV1 5FB,
3 Unit of Health-Care Epidemiology, Institute of Health Sciences, University of Oxford, Oxford OX3 7LF,
4 Department of Physiotherapy Studies and Primary Care Sciences Research Centre, Keele University, Staffordshire ST5 5BG and
5 Staffordshire Rheumatology Centre, High Lane, Burslem, Stoke-on-Trent, Staffordshire ST6 7AG, UK
Abstract
Methods. Instruments were administered by means of a self-completed questionnaire to AS patients recruited from across the United Kingdom. Instruments were assessed for data quality and scaling assumptions. Test–retest reliability was assessed in those patients reporting no change in general health at 2 weeks. The convergent validity of both instruments was assessed and scores were correlated with responses to health transition questions. Responsiveness was assessed for patients reporting change in health at 6 months.
Results. The instruments had high completion rates. Although slightly skewed towards better levels of health, scores covered the available range for both sections of the EuroQol [EQ-5D and EQ-visual analogue scale (EQ-VAS)]. Score distributions approximated normality for the SF-12. Test–retest reliability estimates support the use of both instruments in group evaluation and of the SF-12 Physical Component Summary score (PCS) in individual evaluation (ICC >0.90). Correlations between instruments were in the hypothesized direction and of a moderate level. The EQ-VAS had the strongest linear relationship with responses to both specific and general health transition questions (P<0.01). The EQ-VAS and SF-12 PCS were the most responsive instruments; the EQ-5D was the least responsive.
Conclusion. The instruments have undergone a comprehensive comparative evaluation to assess the measurement properties required for patient-assessed measures of health outcome in AS. Adequate levels of acceptability, reliability and validity were found for both instruments. Although evidence supporting instrument responsiveness was strong for the EQ-VAS and SF-12 PCS, it was very weak for the EQ-5D and SF-12 Mental Component Summary Scale (MCS). The EQ-VAS and SF-12 PCS can both be recommended for use in group evaluation, and the SF-12 PCS is recommended in routine practice or research. However, the lower reliability of the SF-12 MCS and the limited ability of both the EQ-5D and SF-12 MCS to detect change in health may restrict these roles.
KEY WORDS: Ankylosing spondylitis, Generic instruments, Patient-assessed health outcome, Reliability, Responsiveness, Validity.
Introduction
Two broad approaches to measuring patient perceptions of health-related quality of life (HRQL) can be described: generic instruments that provide a broad summary of HRQL, and specific instruments that focus on issues of relevance to a specific disease or patient group. The application of disease-specific, patient-assessed measures of outcome in the evaluation of patients with ankylosing spondylitis (AS) has continued to grow, but little consideration has been given to the role of generic instruments [3].
Generic instruments are not age-, disease- or treatment-specific, and contain multiple HRQL concepts of relevance to both patients and the general population, supporting their application in both populations [5, 6]. Population-based normal values can be calculated, which supports the interpretation of data from disease-specific groups [7]. Two classes of generic instrument can be described: health profiles and utility measures. A health profile presents scores for each of the HRQL domains it covers separately to support data interpretation, thereby reflecting a clinical perspective [7]. A single or summary score may sometimes be generated, but proponents of profiles argue that measurement is most meaningful within separate domains. The Short Form 36-item Health Survey Questionnaire (SF-36) is a widely used example of a generic health profile [6]. Its items cover eight domains of HRQL, including physical and social functioning and mental health. Item responses within each domain are summed and transformed to a 0–100 scale, where 0 is the worst possible HRQL and 100 the best. Mental (MCS) and physical (PCS) component summary scales may also be generated. Population norms have been calculated in several countries [6, 7].
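SF-36 domain scores are placed on a 0 to 100 scale by linearly rescaling each domain's raw item sum between its lowest and highest possible values. The sketch below illustrates only that rescaling step, assuming the possible raw range of a domain is known; the function name is hypothetical, and the item recoding that precedes this step in full SF-36 scoring is not shown.

```python
def transform_domain(raw_sum: float, lowest: float, highest: float) -> float:
    """Linearly rescale a raw domain sum to 0-100.

    `lowest` and `highest` are the lowest and highest possible raw sums for the
    domain, so 0 = worst possible HRQL and 100 = best possible on that domain.
    """
    if highest <= lowest:
        raise ValueError("highest possible sum must exceed lowest possible sum")
    return (raw_sum - lowest) / (highest - lowest) * 100.0


# Illustrative domain of 10 items, each scored 1-3, so raw sums run from 10 to 30.
print(transform_domain(10, 10, 30))  # 0.0 (worst possible)
print(transform_domain(30, 10, 30))  # 100.0 (best possible)
print(transform_domain(25, 10, 30))  # 75.0
```

Because the transformation is linear, differences between patients keep their relative size; only the units change.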
The values and preferences for health outcomes elicited from the patient (direct weighting) or from the general population (indirect weighting) provide external weightings for utility measurement [8]. Although utility measures can cover several domains of HRQL, the weighting generates a single index that relates HRQL to death (0) and perfect health (1) [9]. The EuroQol (EQ-5D) is an example of a utility measure that incorporates indirect valuations of health states [10]. A benefit of utility measures is that they are recommended for use in cost–utility economic analysis; a disadvantage is that the single score limits data interpretation [8, 11].
Specific instruments may be specific to a particular disease (e.g. AS), a patient population (e.g. child health), a specific problem (e.g. pain, limited range of movement) or a described function (e.g. functional ability) [9]. For example, the Revised Leeds Disability Questionnaire (RLDQ) is an AS-specific measure of functional disability [12]. Disease-specific instruments may have greater clinical appeal because of the specificity of their content and an associated increase in responsiveness to specific changes in condition [9, 13]. Conversely, the broad content of generic instruments supports the identification of co-morbid features and treatment side-effects that specific instruments may not capture, although this breadth may reduce responsiveness to small but important changes. The combined use of generic and specific instruments has therefore been recommended in the evaluation of health outcome [7, 9]. Although generic instruments have been widely applied in rheumatoid arthritis (RA) [14, 15], there is limited published evidence for their measurement properties in patients with AS [4].
The study reported here describes the first application of two widely used patient-assessed generic measures of health outcome in patients with AS: the EuroQol [10] and the Short-Form 12-item Health Survey Questionnaire (SF-12) [16]. The instruments are compared for acceptability, data quality and measurement properties following self-completion by AS patients recruited from UK rheumatology centres, and recommendations for use in routine practice and clinical research are made.
Methods
The postal questionnaire, which included the two generic measures of health outcome, two health transition items and sociodemographic questions, was sent to 451 patients. Patients not wishing to participate were asked to return the uncompleted, pre-coded questionnaire in a reply-paid envelope. Non-responders were sent reminders at 2 and 4 weeks.
Patient-assessed generic instruments
The EuroQol [10] and the SF-12 [16] were identified as two short and comprehensive generic approaches to assessing overall health, a selection supported by a recent review of generic instruments [18]. Although neither had previously been applied in the evaluation of AS, both instruments have good evidence of measurement properties in patients with diseases similar in nature to AS [15, 19].
The EuroQol includes health state valuations and therefore has greater potential than the SF-12 for application in economic evaluation [8, 11]. It has two sections. The first (EQ-5D) has five items covering the domains of mobility, self-care, usual activities, pain/discomfort and anxiety/depression, each rated on a three-point scale from 1 (no problems) to 3 (inability/extreme problems). The resulting index scores range from -0.59 to 1.00, where 1.00 is perfect health and a score <0 is considered worse than death. The second section (EQ-VAS) is a vertical visual analogue scale (VAS) on which patients rate their overall health today from 0 (worst imaginable) to 100 (best imaginable).
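The EQ-5D index is obtained by subtracting externally derived decrements from 1.0 according to the level reported on each item. The article does not name its value set; the sketch below assumes the UK (MVH) time trade-off tariff, whose decrements reproduce exactly the -0.59 to 1.00 range quoted above. The constant names and `eq5d_index` are hypothetical, and the sketch returns no index when any item is missing.

```python
from typing import Optional, Sequence

# ASSUMPTION: decrements from the UK (MVH) time trade-off value set.
CONSTANT = 0.081  # subtracted once if any dimension is worse than level 1
N3 = 0.269        # subtracted once if any dimension is at level 3
DECREMENTS = {    # dimension index -> (level 2 decrement, level 3 decrement)
    0: (0.069, 0.314),  # mobility
    1: (0.104, 0.214),  # self-care
    2: (0.036, 0.094),  # usual activities
    3: (0.123, 0.386),  # pain/discomfort
    4: (0.071, 0.236),  # anxiety/depression
}


def eq5d_index(levels: Sequence[Optional[int]]) -> Optional[float]:
    """EQ-5D index for five item levels (1-3); None if any item is missing."""
    if len(levels) != 5:
        raise ValueError("expected five EQ-5D items")
    if any(level is None for level in levels):
        return None  # an index cannot be computed with an omitted item
    if any(level not in (1, 2, 3) for level in levels):
        raise ValueError("EQ-5D levels must be 1, 2 or 3")
    if all(level == 1 for level in levels):
        return 1.0  # state 11111: perfect health
    score = 1.0 - CONSTANT
    for dim, level in enumerate(levels):
        if level > 1:
            score -= DECREMENTS[dim][level - 2]
    if 3 in levels:
        score -= N3
    return round(score, 3)


print(eq5d_index([1, 1, 1, 1, 1]))     # 1.0
print(eq5d_index([1, 1, 1, 2, 1]))     # 0.796 (moderate pain/discomfort only)
print(eq5d_index([3, 3, 3, 3, 3]))     # -0.594 (worst state, worse than death)
print(eq5d_index([1, None, 1, 1, 1]))  # None
```

The one-off constant and N3 terms are what make the scoring non-additive, which is one reason computer-based scoring is convenient in practice.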
The SF-12 comprises 12 items derived from the SF-36 and uses descriptive response options. It produces mental and physical component summary scales that are norm-based, i.e. standardized to scores for the general population [range 0–100, mean 50, standard deviation (S.D.) 10], where a higher score indicates better HRQL [16]. Each instrument takes approximately 2 min to complete [10, 16].
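On a norm-based scale, 50 is the general population mean and each 10 points is one population S.D. The sketch below shows only this final rescaling and its inverse; it assumes a z-score relative to the general population is already available, because the published SF-12 item weights that produce that z-score are not reproduced here. Both function names are hypothetical.

```python
def to_norm_based(z: float) -> float:
    """Convert a population z-score to the norm-based metric (mean 50, S.D. 10)."""
    return 50.0 + 10.0 * z


def sd_units_from_norm(score: float, norm_mean: float = 50.0, norm_sd: float = 10.0) -> float:
    """Number of population S.D.s a norm-based score lies from the population mean."""
    return (score - norm_mean) / norm_sd


# Group means reported for this AS sample: MCS 45.7 and PCS 35.8.
print(round(sd_units_from_norm(45.7), 2))  # -0.43
print(round(sd_units_from_norm(35.8), 2))  # -1.42
```

Expressed this way, the PCS mean in this sample lies roughly 1.4 population S.D.s below the norm, compared with under half an S.D. for the MCS.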
Instrument evaluation
Data quality. Individual items within each instrument were assessed for missing data, the distribution and symmetry of item response scores and endorsement frequencies.
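The item-level checks above can be sketched as simple proportions. This is a minimal illustration assuming responses are stored as integers with `None` for an omitted item; the function names are hypothetical, and the 80% endorsement threshold for an end effect is the one applied in the Results.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

END_EFFECT_PCT = 80.0  # endorsement level above which an item is flagged


def item_quality(responses: Sequence[Optional[int]], options: Sequence[int]) -> Dict[str, object]:
    """Summarise one item: % missing, % endorsing each option, end-effect flag."""
    n = len(responses)
    answered = [r for r in responses if r is not None]
    counts = Counter(answered)
    # Endorsement frequencies use answered responses as the denominator.
    endorsement = {
        opt: (100.0 * counts[opt] / len(answered) if answered else 0.0) for opt in options
    }
    return {
        "pct_missing": 100.0 * (n - len(answered)) / n,
        "endorsement": endorsement,
        "end_effect": any(pct > END_EFFECT_PCT for pct in endorsement.values()),
    }


def floor_ceiling(scores: Sequence[Optional[float]], worst: float, best: float) -> Dict[str, float]:
    """% of computable scale scores at the worst (floor) and best (ceiling) values."""
    valid = [s for s in scores if s is not None]
    return {
        "pct_floor": 100.0 * sum(s == worst for s in valid) / len(valid),
        "pct_ceiling": 100.0 * sum(s == best for s in valid) / len(valid),
    }


# Hypothetical responses to a three-option item (None = omitted).
summary = item_quality([1, 1, 2, 1, None, 3, 1, 2, 1, 1], options=[1, 2, 3])
print(summary["pct_missing"], summary["end_effect"])  # 10.0 False
```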
Reliability. Test–retest reliability of both instruments was assessed in those patients indicating on a general health transition question that their general health had remained the same at 2 weeks [20, 21]. Participants completed a second questionnaire at 2 weeks. The intraclass correlation coefficient [ICC(2,1)] [22] was used to measure the agreement between test and retest [23]. For group comparisons, levels of reliability >0.70 are required [6, 20], and for the evaluation of individuals levels >0.90 have been recommended [20, 21].
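ICC(2,1) (two-way random effects, absolute agreement, single measurement) is computed from two-way ANOVA mean squares for rows (patients, $MS_R$), columns (occasions, $MS_C$) and error ($MS_E$): $\mathrm{ICC}(2,1) = (MS_R - MS_E) / [MS_R + (k-1)MS_E + k(MS_C - MS_E)/n]$, with $n$ patients and $k = 2$ occasions. A minimal sketch in plain Python; the function names are hypothetical, and the thresholds in `suitable_for` are the ones quoted above.

```python
from statistics import mean
from typing import Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1) for paired test and retest scores (k = 2 occasions)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired test and retest scores for at least two patients")
    rows = list(zip(test, retest))
    n, k = len(rows), 2
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]     # per-patient means
    col_means = [mean(test), mean(retest)]  # per-occasion means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def suitable_for(icc: float) -> str:
    """Apply the reliability thresholds: >0.90 individuals, >0.70 groups."""
    if icc > 0.90:
        return "individual and group evaluation"
    if icc > 0.70:
        return "group evaluation"
    return "neither"


print(icc_2_1([1, 2, 3], [1, 2, 3]))  # 1.0: identical test and retest
print(icc_2_1([1, 2, 3], [6, 7, 8]))  # about 0.074: same ranking, systematic shift
```

The second call shows why the absolute-agreement form was appropriate here: a systematic shift between test and retest lowers ICC(2,1) even when patients keep the same rank order.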
Validity. Construct validity was assessed by correlating the scores for the separate sections of both instruments to assess the convergent validity of related dimensions (Pearson's correlation coefficient). Hypothesized theoretical relationships between the two instruments were specified a priori. The EuroQol and the SF-12 have similar item content, and so a moderate correlation is hypothesized between the EuroQol and the SF-12 PCS (>0.5), and a small to moderate correlation with the SF-12 MCS (0.3–0.5). Evidence suggests that a small to moderate correlation is expected between the EQ-5D and EQ-VAS (0.3–0.5) [24], and a small correlation between the two components of the SF-12 (<0.3) [6].
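The a priori hypotheses above amount to expected bands for the absolute size of Pearson's r. A minimal sketch of checking an observed correlation against those bands; `pearson_r`, `strength` and the `HYPOTHESES` table are hypothetical names (on Python 3.10+, `statistics.correlation` could replace `pearson_r`).

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r: float) -> str:
    """Bands used a priori: <0.3 small, 0.3-0.5 small to moderate, >0.5 moderate."""
    size = abs(r)
    if size < 0.3:
        return "small"
    if size <= 0.5:
        return "small to moderate"
    return "moderate"


# Expected band for each pair of scores, as hypothesized in the text.
HYPOTHESES = {
    ("EuroQol", "SF-12 PCS"): "moderate",
    ("EuroQol", "SF-12 MCS"): "small to moderate",
    ("EQ-5D", "EQ-VAS"): "small to moderate",
    ("SF-12 PCS", "SF-12 MCS"): "small",
}


def supports_hypothesis(pair: Tuple[str, str], r: float) -> bool:
    return strength(r) == HYPOTHESES[pair]
```

Usage: `supports_hypothesis(("EQ-5D", "EQ-VAS"), pearson_r(eq5d_scores, vas_scores))`, where the two score lists are the paired patient data.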
Validity was assessed further in relation to occupational status. Patients reporting an inability to work due to ill health were expected to have scores reflecting poorer health than their counterparts. t-Tests were used to test for differences in scores.
For purposes of assessing longitudinal construct validity, instrument scores were compared with self-reported AS-specific and general health transition at 6 months (Compared with 6 months ago, how would you rate your AS/general health now: much better, somewhat better, about the same, somewhat worse, much worse?). Changes in instrument scores and patient responses to the transition questions were assessed for linear trend [13]. To the extent that a patient-assessed instrument is a valid measure of health capable of measuring change, a strong association with a patient-reported health transition item is expected [13, 25].
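The article cites [13] for the linear-trend analysis but does not spell out the test. One simple way to express the idea, sketched under that assumption, is to code the five transition answers 1 (much better) to 5 (much worse), tabulate mean change per category, and take the least-squares slope of change on the code. For a scale where higher means better health, a valid measure of change should show a negative slope. The coding and function names are hypothetical, and a formal trend test would add a significance level.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical ordinal coding of the 6-month transition item.
TRANSITION_CODES = {
    "much better": 1,
    "somewhat better": 2,
    "about the same": 3,
    "somewhat worse": 4,
    "much worse": 5,
}


def mean_change_by_category(changes: Sequence[float], answers: Sequence[str]) -> Dict[str, float]:
    """Mean instrument change score within each transition category, in ordinal order."""
    groups: Dict[str, List[float]] = {}
    for change, answer in zip(changes, answers):
        groups.setdefault(answer, []).append(change)
    ordered = sorted(groups, key=TRANSITION_CODES.__getitem__)
    return {answer: mean(groups[answer]) for answer in ordered}


def trend_slope(changes: Sequence[float], answers: Sequence[str]) -> float:
    """Least-squares slope of change score on the ordinal transition code."""
    codes = [TRANSITION_CODES[a] for a in answers]
    mx, my = mean(codes), mean(changes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(codes, changes))
    sxx = sum((x - mx) ** 2 for x in codes)
    if sxx == 0:
        raise ValueError("all patients gave the same transition answer")
    return sxy / sxx


answers = list(TRANSITION_CODES)  # one hypothetical patient per category
print(trend_slope([10, 5, 0, -5, -10], answers))  # -5.0
```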
Responsiveness. Both instruments were compared for responsiveness to change over the 6-month period by calculating the modified standardized response mean (MSRM), which is equal to the mean change in scores divided by the standard deviation of change scores in patients defined as stable at 6 months [23]. MSRMs were calculated for patients reporting an improvement or deterioration in health on health transition (general or AS-specific).
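The MSRM described above divides the mean change in a group reporting change by the S.D. of change scores in stable patients. A minimal sketch; the article does not say whether the sample (n - 1) or population S.D. was used, so the use of `statistics.stdev` (sample S.D.) is an assumption, and the data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def msrm(change_in_group: Sequence[float], change_in_stable: Sequence[float]) -> float:
    """Modified standardized response mean.

    Mean change in the group of interest divided by the S.D. of change scores
    in patients reporting no change (sample S.D.; an assumption).
    """
    if len(change_in_stable) < 2:
        raise ValueError("need at least two stable patients")
    sd_stable = stdev(change_in_stable)
    if sd_stable == 0:
        raise ValueError("change scores in stable patients have zero S.D.")
    return mean(change_in_group) / sd_stable


stable = [-2.0, 0.0, 2.0]  # hypothetical change scores, "about the same"
improved = [3.0, 5.0]      # hypothetical change scores, "better"
worse = [-1.0, -2.0]       # hypothetical change scores, "worse"
print(msrm(improved, stable))  # 2.0
print(msrm(worse, stable))     # -0.75
```

The sign follows the direction of change, so an absolute value above 0.5 corresponds to a change of at least half the S.D. of change in stable patients.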
Results
The majority of patients were male (n=259; 74.2%), with a mean age of 46.1 yr (S.D. 12.6, range 18–75 yr) (Table 1). The mean symptom duration of participants was 19.8 yr (S.D. 11.8, range 1–56 yr), suggesting a broad spectrum of disease presentation.
Instrument evaluation
Data quality. The item and scale properties of both instruments are shown in Table 2. The levels of missing data for the five items of the EuroQol EQ-5D ranged from 0 to 0.9%, and scale scores were computable for 98.0% of patients. The most frequently omitted items were 4 (pain/discomfort; 0.9%) and 5 (anxiety/depression; 0.9%) (Table 2). Most patients responded using the first two response options. However, no item produced an end effect (level of endorsement >80%). A broad range of index scores was found, although the distribution was slightly skewed towards better health states. Twenty-five patients (7.3%) scored at the ceiling of the scale (best possible health state) and two patients (0.6%) at the floor (worst possible health state).
The most frequently omitted SF-12 items were item 7 (didn't do work or other activities as carefully as usual: emotional; 2.6%) and item 5 (were limited in the kind of work or other activities: physical; 2.0%) (Table 2). All response options were used for every item, and no item produced an end effect >80%. A total of 331 (94.8%) patients completed all items in the SF-12, a necessary requirement for calculating a scale score. A wide range of MCS and PCS scores was found. The mean MCS score was 45.7 (S.D. 11.6), closer to the population mean (50, S.D. 10) than the mean PCS score of 35.8 (S.D. 11.0). MCS scores approximated normality, whereas PCS scores were slightly skewed towards poorer health.
Reliability. The test–retest reliability of both instruments was assessed by correlating the two sets of scores for those patients who indicated no change on the general health transition question. The intraclass correlations are shown in Table 2. Both instruments have levels of test–retest reliability that make them suitable for use in groups of patients, and the SF-12 PCS has levels that may support use in individual evaluation.
Validity. The correlations between the two instruments are shown in Table 3. Moderate correlations were found between both sections of the EuroQol and the SF-12 PCS. This was as hypothesized, as the majority of the EuroQol questions are concerned with physical health. Small to moderate correlations were found between both sections of the EuroQol and the SF-12 MCS. A very small correlation was found between the two components of the SF-12, and a moderate to large correlation between the EQ-5D and EQ-VAS.
Responsiveness. The results of responsiveness testing are shown in Table 5. Compared with the SF-12, the EQ-VAS produced the largest levels of responsiveness in groups of patients reporting improvement or deterioration in general or specific health. The SF-12 PCS also produced moderate levels of responsiveness in groups of patients reporting improvement or deterioration in either general or specific health. On the whole, both instruments were more responsive to changes in general health than to changes in specific health. In patients reporting an improvement or deterioration in health, responsiveness statistics >0.5 were found for the EQ-VAS, representing a change of at least one half of a standard deviation of the change scores of stable patients. Low levels of responsiveness were found for both the EQ-5D and SF-12 MCS in groups of patients reporting change in general or specific health.
Discussion
Both sections of the EuroQol had high completion rates and levels of reliability that support application of the instrument in group evaluation. Evidence supports the validity of the instrument as a measure of generic HRQL in AS. The EQ-VAS was responsive to both improvement and deterioration in AS-specific and general health over 6 months. However, the EQ-5D was not responsive to change in general or specific health. Because of the poor responsiveness of the EQ-5D, only the EQ-VAS can be recommended for the evaluation of groups in both routine practice and clinical research in AS.
Completion rates of the SF-12 were satisfactory, although lower than those of the EuroQol. Items relating to limitation in work or usual activities due to emotional or physical health problems were most frequently omitted. Non-completion of these items in the SF-12 and in its parent instrument, the SF-36, has been reported previously in patients with RA [15, 19]. Although the omission of individual items was not high (range 0.6–2.6%), the omission of a single item prevents the calculation of a final score in the SF-12 (5.2% of patients), whereas patients may omit up to half of the items within each domain of the SF-36 without jeopardizing a final score [6]. The reliability of the SF-12 PCS supports application in individual evaluation, but the MCS should only be used in the evaluation of groups. Evidence of validity supports the role of the SF-12 as a generic measure of HRQL in AS. The levels of responsiveness of the SF-12 PCS were similar to those of the EQ-VAS, and both had a stronger relationship with general health transition than with change in specific health. However, the SF-12 PCS was less able than the EQ-VAS to detect deterioration in general or AS-specific health. Low levels of responsiveness were found for the SF-12 MCS. Where a limited health profile with minimal respondent burden is required, the SF-12 is recommended in routine practice or research. However, the requirement for all items to be completed to produce the final score, along with the lower reliability and limited ability of the MCS to detect change in health, may limit this role.
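The contrast between the two missing-data rules can be sketched directly. The SF-36 half-scale rule, as summarized above, scores a domain when at least half of its items are answered, substituting the mean of the answered items for each missing one; the SF-12 summary scores need all 12 items. The function names are hypothetical, and the sketch stops at the raw domain sum (no item recoding or 0 to 100 transformation).

```python
from statistics import mean
from typing import List, Optional, Sequence


def sf36_domain_raw(items: Sequence[Optional[float]]) -> Optional[float]:
    """Raw domain sum under the half-scale rule.

    If at least half of the domain's items are answered, each missing item is
    replaced by the mean of the answered items; otherwise no score is produced.
    """
    answered: List[float] = [x for x in items if x is not None]
    if not answered or 2 * len(answered) < len(items):
        return None
    fill = mean(answered)
    return sum(x if x is not None else fill for x in items)


def sf12_scorable(items: Sequence[Optional[float]]) -> bool:
    """SF-12 summary scores need all 12 items; a single omission blocks scoring."""
    return len(items) == 12 and all(x is not None for x in items)


print(sf36_domain_raw([1, None, 3, None]))     # 8 (two of four items answered)
print(sf36_domain_raw([1, None, None, None]))  # None
print(sf12_scorable([1] * 11 + [None]))        # False
```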
Both instruments are relatively brief and simple to complete, but because of the external weighting of the EuroQol EQ-5D and the scoring algorithms of the SF-12, computer-based scoring is recommended, which may reduce the feasibility of including these instruments in routine clinical evaluation.
Few studies have compared the EuroQol and the SF-12 [26]. Although both measure generic HRQL, the moderate correlation between instruments in the current study suggests that they measure different aspects of HRQL. Selection of the SF-12 is supported by the results of the current study. The SF-12 has good levels of reliability that support the use of the instrument in group evaluation, and the use of the PCS in individual evaluation. There is also good evidence of responsiveness to change in general and specific health over 6 months. Although the EuroQol has greater discriminatory power when sociodemographic variables are assessed, only the EQ-VAS demonstrates a greater responsiveness to change in AS or general health over 6 months than the SF-12. The inability of the EQ-5D to detect change in health is an important consideration if the instrument is used to assess AS relative to other disorders within health care.
The study does not provide sufficient information to determine why the EQ-5D was not responsive to change in health over the 6-month period. In conditions with an impact on function similar to that of AS, such as RA [14] and angina [8], good levels of responsiveness have been demonstrated for both sections of the EuroQol over a similar time period. However, in conditions with a lesser impact on functioning, such as obstructive sleep apnoea, the EuroQol has not demonstrated such satisfactory responsiveness [27].
The SF-12 was identified for the current study in preference to the parent instrument, the SF-36, because of the lower respondent burden. However, the SF-36 describes both component summary scores and a profile across eight domains of health. Future studies should consider the advantages of the additional information provided by the SF-36 against instrument acceptability and feasibility. The measurement properties of the SF-36 have not been rigorously tested in AS and it should be compared with the EuroQol to provide a further assessment of the role and usefulness of generic profile and utility measures in AS.
Although the domains addressed by the generic instruments may not be expected to change over the relatively short period of the study in patients with stable AS, the 6-month period reflects normal practice in the routine evaluation of AS in many rheumatology centres [3, 28]. The level of responsiveness found in the EQ-VAS or SF-12 PCS may make either instrument suitable for routine monitoring of health outcome in the longitudinal evaluation of AS, where routine management may result in subtle changes in HRQL. In addition, levels of reliability suggest that the SF-12 PCS may be suitable for individual assessment. However, the SF-12 PCS represents only half of the items included in the SF-12 and respondent burden should be taken into account when considering the use of multi-item instruments in research or routine practice.
In conclusion, both generic instruments demonstrated high levels of acceptability and reliability. Good evidence for the validity of both instruments as generic measures of HRQL was also found. The SF-12 PCS and the EQ-VAS were the most responsive instruments over the 6-month period. It is recommended that short-form generic instruments such as the EuroQol, the SF-36 or the SF-12 be used alongside disease-specific instruments to provide a standardized generic measure of HRQL and for purposes of normative comparison and economic evaluation [8, 9]. However, further evaluation of the role of these instruments in clinical research and routine practice is required.
Acknowledgments

Notes

References