Generic measures of health-related quality of life in ankylosing spondylitis: reliability, validity and responsiveness

K. L. Haywood1,2, A. M. Garratt3, K. Dziedzic4 and P. T. Dawes5

1 Department of Health Sciences and Clinical Evaluation, University of York, York YO1 5DD,
2 Interdisciplinary Research Centre in Health, Physiotherapy and Dietetics Subject Group, School of Health and Social Sciences, Coventry University, Coventry CV1 5FB,
3 Unit of Health-Care Epidemiology, Institute of Health Sciences, University of Oxford, Oxford OX3 7LF,
4 Department of Physiotherapy Studies and Primary Care Sciences Research Centre, Keele University, Staffordshire ST5 5BG and
5 Staffordshire Rheumatology Centre, High Lane, Burslem, Stoke-on-Trent, Staffordshire ST6 7AG, UK


    Abstract
 
Objective. To assess the acceptability and measurement properties of two generic measures of health-related quality of life (HRQL): the EuroQol and the Short Form 12-item Health Survey Questionnaire (SF-12) in ankylosing spondylitis (AS).

Methods. Instruments were administered by means of a self-completed questionnaire to AS patients recruited from across the United Kingdom. Instruments were assessed for data quality and scaling assumptions. Test–retest reliability was assessed in those patients reporting no change in general health at 2 weeks. The convergent validity of both instruments was assessed and scores were correlated with responses to health transition questions. Responsiveness was assessed for patients reporting change in health at 6 months.

Results. The instruments had high completion rates. Although slightly skewed towards better levels of health, scores covered the available range for both sections of the EuroQol [EQ-5D and EQ-visual analogue scale (EQ-VAS)]. Score distributions approximated normality for the SF-12. Test–retest reliability estimates support the use of both instruments in group evaluation and the SF-12 Physical Component Summary score (PCS) in individual evaluation (>0.90). Correlations between instruments were in the hypothesized direction and were of a moderate level. The EQ-VAS had the strongest linear relationship with responses to both specific and general health transition questions (P<0.01). The EQ-VAS and SF-12 PCS were the most responsive instruments. The EQ-5D was the least responsive instrument.

Conclusion. The instruments have undergone a comprehensive comparative evaluation to assess the measurement properties required for patient-assessed measures of health outcome in AS. Adequate levels of acceptability, reliability and validity were found for both instruments. Although evidence supporting instrument responsiveness was strong for the EQ-VAS and SF-12 PCS, it was very weak for the EQ-5D and SF-12 Mental Component Summary Scale (MCS). The EQ-VAS and SF-12 PCS can both be recommended for use in group evaluation, and the SF-12 PCS is recommended in routine practice or research. However, the lower reliability of the SF-12 MCS and the limited ability of both the EQ-5D and SF-12 MCS to detect change in health may restrict these roles.

KEY WORDS: Ankylosing spondylitis, Generic instruments, Patient-assessed health outcome, Reliability, Responsiveness, Validity.


    Introduction
 
Ankylosing spondylitis (AS) is a systemic and inflammatory disease predominantly affecting the axial skeleton and thoracic cage [1]. The associated pain and stiffness are the symptoms most frequently mentioned by patients [2], closely followed by reports of fatigue and sleep disturbance [3]. Beyond these symptoms, AS often has a significant impact upon an individual's health-related quality of life (HRQL) [3]. For example, uncontrolled disease may threaten the ability of a young man to remain in employment, and hence to support his wife and family both financially and emotionally. Consequently, the ability to fulfil his role as a husband, a father and a member of society is challenged. Traditional methods of evaluation, with a focus on the locomotor system and measures of impairment, may fail to describe the extensive multi-dimensional issues associated with the disease.

Two broad approaches to measuring patient perceptions of HRQL can be described: generic instruments that provide a broad summary of HRQL, and specific instruments that focus on issues of relevance to a specific disease or patient group. The application of disease-specific, patient-assessed measures of outcome in the evaluation of patients with AS has continued to grow, but little consideration has been given to the role of generic instruments [3].

Generic instruments are not age-, disease- or treatment-specific, and contain multiple HRQL concepts of relevance to patients and the general population, supporting application in both populations [5, 6]. Population-based normal values can be calculated, which supports data interpretation from disease-specific groups [7]. Two classes of generic instrument can be described: health profiles and utility measures. Scores on different domains of HRQL covered by a single health profile are presented separately to support data interpretation, therefore reflecting a clinical perspective [7]. Sometimes a single or summary score may be generated, but proponents of profiles argue that measurement is most meaningful within separate domains. The Short Form 36-item Health Survey Questionnaire (SF-36) is a widely used example of a generic health profile [6]. The items cover eight domains of HRQL, including physical and social functioning and mental health. Responses to the items within each domain are summed and expressed on a scale of 0–100, where 0 is the worst possible HRQL and 100 the best. Mental (MCS) and physical (PCS) component summary scales may also be generated. Population norms have been calculated in several countries [6, 7].

The values and preferences for outcome generated by the patient (direct weighting) or the general population (indirect weighting) provide external weightings for utility measurement [8]. Although utility measures can cover several domains of HRQL, the weighting generates a single index that relates HRQL to death (0) or perfect health (1) [9]. The EuroQol (EQ-5D) is an example of a utility measure that incorporates indirect valuations of health states [10]. A benefit of utility measures is the recommendation for use in cost–utility economic analysis, but a disadvantage is that the single score limits data interpretation [8, 11].

Specific instruments may be specific to a particular disease (e.g. AS), to a patient population (e.g. child health), to a specific problem (e.g. pain, limited range of movement), or to a described function (e.g. functional ability) [9]. For example, the Revised Leeds Disability Questionnaire (RLDQ) is an AS-specific measure of functional disability [12]. Disease-specific instruments may have greater clinical appeal due to the specificity of content, and an associated increased responsiveness to specific change in condition [9, 13]. However, the broad content of generic instruments supports identification of co-morbid features and treatment side-effects that may not be captured by specific instruments, but this may reduce instrument responsiveness to small but important changes. Their combined use has therefore been recommended in the evaluation of health outcome [7, 9]. Although widely applied in rheumatoid arthritis (RA) [14, 15], there is limited published evidence for the measurement properties of generic instruments in patients with AS [4].

The study reported here describes the first application of two widely used patient-assessed generic measures of health outcome in patients with AS: the EuroQol [10] and the Short-Form 12-item Health Survey Questionnaire (SF-12) [16]. The instruments are compared for acceptability, data quality and measurement properties following self-completion by AS patients recruited from UK rheumatology centres, and recommendations for use in routine practice and clinical research are made.


    Methods
 
Data collection
Following published work evaluating patient-assessed measures of health outcome, a population of >400 patients was deemed acceptable for the postal survey [13, 14]. A simple random sample of patients with a confirmed diagnosis of AS (Modified New York Criteria, 1984) [17], registered with one of a group of specialist centres of rheumatology in England and Scotland and aged between 18 and 75 yr, was invited to participate in the study. Pregnancy, learning difficulties or an inability to comprehend written English were study exclusion criteria. The survey used a multi-centre study design and was approved by the Northern and Yorkshire multi-centre research ethics committee and relevant local ethics committees. Written consent was gained from all patients.

The postal questionnaire, which included the two generic measures of health outcome, two health transition items and sociodemographic questions, was sent to 451 patients. Patients not wishing to participate were asked to return uncompleted and pre-coded questionnaires using a reply-paid envelope. Non-responders were sent reminders at 2 and 4 weeks.

Patient-assessed generic instruments
The EuroQol [10] and the SF-12 [16] were identified as two short and comprehensive generic approaches to assessing overall health, a selection supported by a recent review of generic instruments [18]. Although not applied in the evaluation of AS, both instruments have good evidence of measurement properties when applied in the evaluation of patients with disease of a similar nature to AS [15, 19].

The EuroQol includes health state valuations and therefore has greater potential than the SF-12 for application in economic evaluation [8, 11]. It has two sections: the first (EQ-5D) has five items covering the domains of mobility, self-care, usual activity, pain/discomfort and anxiety/depression. Each item has a three-point scale from 1 (no problems) to 3 (inability/extreme problems). Scores range from -0.59 to 1.00, where 1.00 is perfect health and a score <0 is considered worse than death. The second section (EQ-VAS) includes a vertical visual analogue scale (VAS) on which the patient rates their overall health today from 0 (worst imaginable) to 100 (best imaginable).
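
The descriptive system above can be illustrated with a short sketch. With five items and three response levels each, the EQ-5D describes 3^5 = 243 distinct health states, each conventionally written as a five-digit profile such as 11111 (no problems on any item). The code below only enumerates the profiles; the names `DOMAINS`, `LEVELS` and `profiles` are illustrative, and the population-based valuation that maps each profile to an index score between -0.59 and 1.00 is deliberately not reproduced here.

```python
from itertools import product

# The five EQ-5D items, in the order given in the text.
DOMAINS = (
    "mobility",
    "self-care",
    "usual activity",
    "pain/discomfort",
    "anxiety/depression",
)

# Three-point response scale: 1 = no problems, 3 = inability/extreme problems.
LEVELS = (1, 2, 3)

# Every possible combination of responses, written as a five-digit string.
profiles = ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DOMAINS))]

print(len(profiles))   # 243
print(profiles[0])     # 11111 (no problems on any item)
print(profiles[-1])    # 33333 (extreme problems on every item)
```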

The SF-12 comprises 12 items derived from the SF-36, and uses descriptive responses. It produces mental and physical health summary scales based on scores for the general population [range 0–100, mean 50, standard deviation (S.D.) 10], where a higher score indicates a better HRQL [16]. Each instrument requires an approximate completion time of 2 min [10, 16].

Instrument evaluation
Data quality. Individual items within each instrument were assessed for missing data, the distribution and symmetry of item response scores and endorsement frequencies.

Reliability. Test–retest reliability of both instruments was assessed for those patients indicating that their general health had remained the same at 2 weeks on a general health transition question [20, 21]. Participants completed a second questionnaire at 2 weeks. The intraclass correlation coefficient [(ICC) 2,1] [22] was used to measure the agreement between test and retest [23]. For group comparisons, levels of reliability >0.70 are required [6, 20], and for the evaluation of individuals levels of >0.90 have been recommended [20, 21].
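
As an illustration of the agreement statistic named above, the following is a minimal sketch of ICC(2,1) computed from the two-way ANOVA mean squares described by Shrout and Fleiss [22]. It is not the authors' analysis code; the function name `icc_2_1` and the example scores are hypothetical, and the input is assumed to be complete (no missing values), with one row per patient and one column per administration (test, retest).

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one per patient; each row holds k measurements
    (for test-retest data, k = 2).
    """
    n = len(scores)          # number of patients
    k = len(scores[0])       # number of administrations
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way layout.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-patients mean square
    jms = ss_cols / (k - 1)                 # between-administrations mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical test and retest scores for five stable patients.
example = [[35.0, 37.0], [50.0, 48.0], [42.0, 43.0], [28.0, 30.0], [55.0, 54.0]]
print(round(icc_2_1(example), 2))
```

The result would then be compared with the thresholds in the text: >0.70 for group comparisons and >0.90 for the evaluation of individuals.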

Validity. Construct validity was assessed by correlating the scores for the separate sections of both instruments to assess the convergent validity of related dimensions (Pearson's correlation coefficient). Hypothesized theoretical relationships between both instruments were considered a priori. The EuroQol and the SF-12 have similar item content, and so a moderate correlation is hypothesized between the EuroQol and the SF-12 PCS (>0.5), and a small to moderate correlation with the SF-12 MCS (0.3–0.5). Evidence suggests that a small to moderate correlation is expected between the EQ-5D and EQ-VAS (0.3–0.5) [24], and a small correlation is expected between the two components of the SF-12 (<0.3) [6].
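
The a priori hypotheses above can be expressed as a simple check. The sketch below computes Pearson's correlation coefficient and labels its magnitude using the bands stated in the text (<0.3 small, 0.3–0.5 small to moderate, >0.5 moderate). The function names `pearson_r` and `correlation_band` and the example data are illustrative assumptions, not part of the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_band(r):
    """Label the size of a correlation using the bands given in the text."""
    size = abs(r)
    if size < 0.3:
        return "small"
    if size <= 0.5:
        return "small to moderate"
    return "moderate"


# Hypothetical EQ-VAS and SF-12 PCS scores for six patients.
eq_vas = [40, 55, 60, 70, 80, 90]
sf12_pcs = [25, 38, 30, 41, 45, 52]
r = pearson_r(eq_vas, sf12_pcs)
print(round(r, 2), correlation_band(r))
```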

Validity was assessed further in relation to occupational status. Patients reporting an inability to work due to ill health were expected to have scores reflecting poorer health than their counterparts. t-Tests were used to test for differences in scores.
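
A known-groups comparison of this kind can be sketched as follows. The text does not state which form of t-test was used, so the unequal-variance (Welch) form shown here is an assumption, and the function name `welch_t` and the example scores are hypothetical. The sketch returns the test statistic and its approximate degrees of freedom; obtaining a P value would require a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)

    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical SF-12 PCS scores: patients in work vs unable to work due to ill health.
working = [42.0, 45.5, 39.0, 48.0, 44.0]
not_working = [30.0, 27.5, 33.0, 29.0, 31.5]
t_stat, dof = welch_t(working, not_working)
print(round(t_stat, 2), round(dof, 1))
```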

For purposes of assessing longitudinal construct validity, instrument scores were compared with self-reported AS-specific and general health transition at 6 months (Compared with 6 months ago, how would you rate your AS/general health now: much better, somewhat better, about the same, somewhat worse, much worse?). Changes in instrument scores and patient responses to the transition questions were assessed for linear trend [13]. To the extent that a patient-assessed instrument is a valid measure of health capable of measuring change, a strong association with a patient-reported health transition item is expected [13, 25].

Responsiveness. Both instruments were compared for responsiveness to change over the 6-month period by calculating the modified standardized response mean (MSRM), which is equal to the mean change in scores divided by the standard deviation of change scores in patients defined as stable at 6 months [23]. MSRMs were calculated for patients reporting an improvement or deterioration in health on health transition (general or AS-specific).
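
The MSRM definition above translates directly into code. This is a minimal sketch under the stated definition (mean change in the group reporting change, divided by the standard deviation of change scores in the stable group); the function name `msrm` and the example change scores are hypothetical.

```python
from statistics import mean, stdev


def msrm(change_in_changed_group, change_in_stable_group):
    """Modified standardized response mean.

    Numerator: mean change score among patients reporting improvement
    (or deterioration) on the transition question.
    Denominator: sample standard deviation of change scores among
    patients reporting no change (the stable group).
    """
    return mean(change_in_changed_group) / stdev(change_in_stable_group)


# Hypothetical 6-month EQ-VAS change scores.
improved = [10.0, 12.0, 14.0]
stable = [-2.0, 0.0, 2.0]
print(msrm(improved, stable))  # 6.0
```

A value above 0.5 corresponds to a mean change of at least half a standard deviation of the change scores in stable patients.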


    Results
 
Data collection
Of the 451 patients who were mailed a postal questionnaire, 349 (77.4%) returned a completed questionnaire. One patient had changed address by the 2-week follow-up, and 303 of the remaining 348 patients (87.1%) returned the 2-week questionnaire. Two hundred and eighty-nine patients (82.8%) returned the 6-month questionnaire.

The majority of patients were male (n=259; 74.2%) with a mean age of 46.1 yr (S.D. 12.6, range 18–75 yr) (Table 1). The mean symptom duration of participants was 19.8 yr (S.D. 11.8, range 1–56 yr), suggesting a broad spectrum of disease presentation.


 
TABLE 1. Patient characteristics at baseline

 

Instrument evaluation
Data quality. The item and scale properties of both instruments are shown in Table 2. The levels of missing data for the five items of the EuroQol EQ-5D range from 0 to 0.9%. Scale scores were computable for 98.0% of patients. The most frequently omitted items were 4 (pain/discomfort; 0.9%) and 5 (anxiety/depression; 0.9%) (Table 2). The majority of patients responded using the first two response options. However, no item produced an end effect or level of endorsement of >80%. A broad range of index scores was found, although the distribution was slightly skewed towards better health states. Twenty-five patients (7.3%) scored at the ceiling of the scale range (best possible health state) and two patients scored at the floor (0.6%; worst possible health state).


 
TABLE 2. Item and scale properties of the EuroQol and SF-12 (n=321)

 
Completion of the EuroQol-VAS was good and a scale score was computable for 98.7% of patients. Although the results were slightly skewed towards a more positive health state (mean±S.D. 59.2±20.8), there was a wide range of scores (0–100) for this component of the EuroQol.

The most frequently omitted items from the SF-12 were items 7 (didn't do work or other activities as carefully as usual: emotional) (2.6%) and 5 (were limited in the kind of work or other activities: physical) (2.0%) (Table 2). The response options for all items were covered and no item produced an end effect >80%. A total of 331 (94.8%) patients completed all items in the SF-12, a necessary requirement to calculate a scale score. A wide range of MCS and PCS scores was found. The mean score for the MCS was 45.7 (S.D. 11.6), a value that was closer to the mean population score (50, S.D. 10) than the PCS mean value of 35.8 (S.D. 11.0). The scores for the MCS approximated normality. Scale scores for the PCS were slightly skewed towards poorer health.

Reliability. The test–retest reliability of both instruments was assessed by correlating the two sets of scores for those patients who indicated no change on the general health transition question. The intraclass correlations are shown in Table 2. Both instruments have levels of test–retest reliability that make them suitable for use in groups of patients, and the SF-12 PCS has levels that may support use in individual evaluation.

Validity. The results of the correlations between both instruments are shown in Table 3. Moderate correlations between both sections of the EuroQol and the SF-12 PCS were found. This was as hypothesized, as the majority of the EuroQol questions are concerned with physical health. Small to moderate correlations were found between both sections of the EuroQol and the SF-12 MCS. A very small correlation between the two sections of the SF-12 and a moderate to large correlation between the EQ-5D and EQ-VAS were found.


 
TABLE 3. Convergent validity (Pearson's correlation coefficient) (n=321)

 
The remaining tests of validity are shown in Tables 4 and 5. Compared with those unable to work due to ill health, patients in work have significantly better levels of health on both instruments (P<0.01) (Table 4). The greatest differences in scores were found for the SF-12 PCS and the EQ-5D.


 
TABLE 4. Mean (standard deviation) instrument scores according to whether working or not due to ill health

 

 
TABLE 5. Mean score change (standard deviation) and modified standardized response mean (MSRM) by 6-month AS-specific and general health transition

 
The change scores for the EQ-VAS and both sections of the SF-12 reflect the categories of both general and AS-specific health transition (Table 5). A significant linear relationship with both generic and specific health transition was found. The EQ-VAS and the SF-12 have greater levels of change and a stronger linear relationship with responses to the generic transition than with the AS transition question; for example, patients who say that their general health is much better over the 6 months have an average improvement in EQ-VAS score of 12.57 (on a scale of 0 to 100, where 100 is the best possible health state), and an improvement of 10.07 when AS health is better. Those patients whose general health is worse have an average deterioration of -10.01, and an average deterioration of -8.01 when AS is worse. The SF-12 PCS has similar levels of change and a linear relationship with responses to both transition questions. Although significant, the SF-12 MCS has a weak linear relationship with both AS-specific and general health transition. A non-significant relationship between the EuroQol (EQ-5D) and both general and AS-specific transition was found.

Responsiveness. The results of responsiveness testing are shown in Table 5. The EQ-VAS produced the largest levels of responsiveness for groups of patients when compared with the SF-12 in those patients reporting improvement or deterioration in general health and specific health. The SF-12 PCS also produced moderate levels of responsiveness for groups of patients reporting improvement or deterioration in either general or specific health. On the whole, both instruments were more responsive to changes in general health than changes in specific health. In patients reporting an improvement or deterioration in health, responsiveness statistics >0.5 were found for the EQ-VAS, representing a level of change that is at least one half of a standard deviation of the change scores for stable patients. Low levels of responsiveness were found for both the EQ-5D and SF-12 MCS for groups of patients reporting change in general or specific health.


    Discussion
 
Evidence of the acceptability and measurement properties of the EuroQol and the SF-12 has been demonstrated in a number of studies of patients with disease of a similar nature to AS [14, 15]. However, neither instrument has been applied in published studies of AS, and before recommendations can be made for the use of these instruments they must be shown to be acceptable and to have satisfactory measurement properties in patients with AS in the UK. This study represents the first application of these generic instruments in a large and representative population of AS outpatients and an extensive comparative evaluation of instrument measurement properties and acceptability.

Both sections of the EuroQol had high completion rates and levels of reliability that support application of the instrument in group evaluation. Evidence supports the validity of the instrument as a measure of generic HRQL in AS. The EQ-VAS is responsive to both improvement and deterioration in AS-specific and general health over 6 months. However, the EQ-5D was not responsive to change in general or specific health. Due to the poor levels of responsiveness found for the EQ-5D, only the EQ-VAS can be recommended for the evaluation of groups in both routine practice and clinical research in AS.

Completion rates of the SF-12 were satisfactory although lower than the EuroQol. Items relating to limitation in work or usual activities due to emotional or physical health problems were most frequently omitted. Non-completion of these items in the SF-12 and in the parent instrument, the SF-36, has been reported previously in patients with RA [15, 19]. Although the omission of individual items was not high (range 0.6–2.6%), the omission of a single item prevents the calculation of a final score in the SF-12 (5.2%), whereas patients may omit up to half of the items within each domain of the SF-36 without jeopardizing a final score [6]. Reliability of the SF-12 PCS supports application in individual evaluation, but the MCS should only be used in the evaluation of groups. Evidence of validity supports its role as a generic measure of HRQL in AS. The levels of responsiveness for the SF-12 PCS were similar to those found for the EQ-VAS, and both instruments had a stronger relationship with general health transition than with change in specific health. However, the SF-12 PCS was less able to detect deterioration in general or AS-specific health than the EQ-VAS. Low levels of responsiveness were found for the SF-12 MCS. Where a limited health profile is required with minimal respondent burden, the SF-12 is recommended in routine practice or research. However, the requirement for all items to be completed to produce the final score, along with the lower reliability and limited ability to detect change in health associated with the MCS, may limit this role.

Both instruments are relatively brief and simple to complete. However, because of the external weighting of the EuroQol EQ-5D and the scoring algorithms of the SF-12, computer-based scoring is recommended, which may reduce the feasibility of including these instruments in routine clinical evaluation.

Few studies have compared the EuroQol and the SF-12 [26]. Although both measure generic HRQL, the moderate correlation between instruments in the current study suggests that they measure different aspects of HRQL. Selection of the SF-12 is supported by the results of the current study. The SF-12 has good levels of reliability that support the use of the instrument in group evaluation, and the use of the PCS in individual evaluation. There is also good evidence of responsiveness to change in general and specific health over 6 months. Although the EuroQol has greater discriminatory power when sociodemographic variables are assessed, only the EQ-VAS demonstrates a greater responsiveness to change in AS or general health over 6 months than the SF-12. The inability of the EQ-5D to detect change in health is an important consideration if the instrument is used to assess AS relative to other disorders within health care.

The study does not provide sufficient information to determine why the EQ-5D was not responsive to change in health over the 6-month period. In conditions with an impact on function similar to that of AS, such as RA [14] and angina [8], good levels of responsiveness for both sections of the EuroQol have been demonstrated over a similar time period. However, in conditions with a lesser impact on functioning, such as obstructive sleep apnoea, the EuroQol has not demonstrated such satisfactory responsiveness [27].

The SF-12 was identified for the current study in preference to the parent instrument, the SF-36, because of the lower respondent burden. However, the SF-36 describes both component summary scores and a profile across eight domains of health. Future studies should consider the advantages of the additional information provided by the SF-36 against instrument acceptability and feasibility. The measurement properties of the SF-36 have not been rigorously tested in AS and it should be compared with the EuroQol to provide a further assessment of the role and usefulness of generic profile and utility measures in AS.

Although the domains addressed by the generic instruments may not be expected to change over the relatively short period of the study in patients with stable AS, the 6-month period reflects normal practice in the routine evaluation of AS in many rheumatology centres [3, 28]. The level of responsiveness found in the EQ-VAS or SF-12 PCS may make either instrument suitable for routine monitoring of health outcome in the longitudinal evaluation of AS, where routine management may result in subtle changes in HRQL. In addition, levels of reliability suggest that the SF-12 PCS may be suitable for individual assessment. However, the SF-12 PCS represents only half of the items included in the SF-12 and respondent burden should be taken into account when considering the use of multi-item instruments in research or routine practice.

In conclusion, both generic instruments demonstrated high levels of acceptability and reliability. Good evidence for the validity of both instruments as generic measures of HRQL was also found. The SF-12 and the EuroQol-VAS were the most responsive instruments over the 6-month period. It is recommended that short-form generic instruments such as the EuroQol, the SF-36 or SF-12 should be used alongside disease-specific instruments to provide a standardized generic measure of HRQL, and for purposes of normative comparison and economic evaluation [8, 9]. However, further evaluation of the role of these instruments in clinical research and routine practice is required.


    Acknowledgments
 
We are very grateful to all of the patients who so willingly gave of their time to complete the various questionnaires. We also thank the following consultant rheumatologists for allowing access to AS patient databases and local physiotherapists for their support: Professor Roger Sturrock and Ms Fiona Gough; Professor Ian Haslock, Dr Mike Plant and Mrs Kay West; Dr Tom Price, Ms Carol David and Ms Louise Preston; Professor Hill Gaston and Mrs Julie Isaacson; Dr Paul Creemer and Mrs Rachel Lewis; all consultant rheumatologists, nursing and clinic staff from the Staffordshire Rheumatology Centre, with particular thanks to Ms Jackie Waterfield for assistance with data collection. We also thank Dr Kelvin Jordan for his statistical advice. This study was supported by a grant from the Arthritis Research Council and funding from the Staffordshire Rheumatology Centre, and forms part of a larger study submitted for a DPhil in Health Sciences, University of York.


    Notes
 
Correspondence to: K. Haywood, Interdisciplinary Research Centre in Health, Physiotherapy and Dietetics Subject Group, School of Health and Social Sciences, Coventry University, Priory Street, Coventry CV1 5FB, UK.


    References
 

  1. Russell AS. Ankylosing spondylitis—history. In: Klippel JH, Dieppe PA, eds. Rheumatology. 2nd edn. London: Mosby, 1998, pp. 1–2.
  2. Dziedzic K. Ankylosing spondylitis. In: David C, Lloyd J, eds. Rheumatological physiotherapy. London: Mosby, 1998, pp. 97–114.
  3. Ward MM. Quality of life in patients with ankylosing spondylitis. Rheum Dis Clin North Am 1998;24:815–27.
  4. Haywood KL. Health outcomes in ankylosing spondylitis: an evaluation of patient-based and anthropometric measures. DPhil thesis, 2000, University of York, York.
  5. Guyatt GH, Van Zanten SJO, Feeny DH, Patrick DL. Measuring quality of life in clinical trials: a taxonomy and review. Can Med Assoc J 1989;140:1441–8.
  6. Ware JE. SF-36 health survey manual and interpretation guide. The Medical Outcomes Trust. Boston, MA: Nimrod Press, 1997.
  7. McDowell I, Newell C. Measuring health. A guide to rating scales and questionnaires. 2nd edn. Oxford: Oxford University Press, 1996.
  8. Garratt AM, Hutchinson A, Russell I. The UK version of the Seattle Angina Questionnaire (SAQ-UK): reliability, validity and responsiveness. J Clin Epidemiol 2001;54:907–15.
  9. Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622–9.
  10. EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199–208.
  11. Kind P, Dolan P, Gudex C, Williams A. Variations in population health status: results from a United Kingdom national questionnaire survey. Br Med J 1998;316:736–72.
  12. Abbott CA, Helliwell PS, Chamberlain MA. Functional assessment in ankylosing spondylitis: evaluation of a new self-administered questionnaire and correlation with anthropometric variables. Br J Rheumatol 1994;33:1060–6.
  13. Garratt AM. A comparison of four approaches to measuring health outcome. DPhil thesis, 1997, University of Aberdeen, UK.
  14. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997;36:551–9.
  15. Hurst NP, Ruta DA, Kind P. Comparison of the MOS Short Form-12 (SF-12) health status questionnaire with the SF-36 in patients with rheumatoid arthritis. Br J Rheumatol 1998;37:862–9.
  16. Ware JE, Kosinski M, Keller SD. SF-12: how to score the SF-12 Physical and Mental Health Summary Scales, 1995. Boston, MA: The Health Institute, New England Medical Centre.
  17. van der Linden SJ, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis: a proposal for modification of the New York Criteria. Arthritis Rheum 1984;27:361–8.
  18. Coons SJ, Rao S, Keininger DL, Hays RD. A comparative review of generic quality-of-life instruments. Pharmacoeconomics 2000;17:13–35.
  19. Ruta DA, Hurst NP, Kind P, Hunter M, Stubbings A. Measuring health status in British patients with rheumatoid arthritis: reliability, validity and responsiveness of the short form 36-item health survey (SF-36). Br J Rheumatol 1998;37:425–36.
  20. Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. Transition questions to assess outcomes in rheumatoid arthritis. Br J Rheumatol 1993;32:807–11.
  21. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 2nd edn. Oxford: Oxford Medical Publications, Inc., 1995.
  22. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
  23. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S–58S.
  24. Hurst NP, Jobanputra P, Hunter M, Lambert M, Lochhead A, Brown H. Validity of the EuroQol, a generic health status instrument, in patients with rheumatoid arthritis. Br J Rheumatol 1994;33:655–62.
  25. Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998;2:1–74.
  26. Johnson JA, Coons SJ. Comparison of the EQ-5D and SF-12 in an adult US sample. Qual Life Res 1999;7:155–66.
  27. Jenkinson C, Stradling J, Petersen S. How should we evaluate health status? A comparison of three methods in patients presenting with obstructive sleep apnoea. Qual Life Res 1998;7:95–100.
  28. Lubrano E, Butterworth M, Heddelden A, Well S, Helliwell P. An audit of anthropometric measurements by medical and physiotherapy staff in patients with ankylosing spondylitis. Clin Rehabil 1998;12:216–20.
Submitted 26 September 2001; Accepted 13 May 2002