1 Department of Health Sciences and Clinical Evaluation, University of York, York YO1 5DD,
2 Interdisciplinary Research Centre in Health, Physiotherapy and Dietetics Subject Group, School of Health and Social Sciences, Coventry University, Coventry CV1 5FB,
3 Unit of Health-Care Epidemiology, Institute of Health Sciences, University of Oxford, Oxford OX3 7LF,
4 Primary Care Sciences Research Centre,
5 Department of Physiotherapy Studies and Primary Care Sciences Research Centre, Keele University, Staffordshire ST5 5BG and
6 Staffordshire Rheumatology Centre, High Lane, Burslem, Stoke-on-Trent, Staffordshire ST6 7AG, UK
Abstract
Methods. Instruments were administered by means of a self-completed questionnaire to patients recruited from across the United Kingdom (UK). Instruments were assessed for data quality and scaling assumptions. Where appropriate, dimensionality was assessed using principal component analysis (PCA). Internal consistency reliability was tested using Cronbach's alpha. Test–retest reliability was assessed in those patients reporting no change in AS-specific health at 2 weeks. The convergent validity of the instruments was assessed and scores were correlated with responses to the health transition questions. Responsiveness was assessed for patients reporting change in health at 6 months.
Results. The BASDAI and Body Chart had low self-completion rates. Item responses for the RLDQ were skewed towards higher levels of functional ability. PCA supported instrument unidimensionality. Cronbach's alpha ranged from 0.87 (BASDAI) to 0.93 (RLDQ). Test–retest reliability estimates support the use of the ASQoL and RLDQ in individual evaluation (>0.90). Correlations between instruments were in the hypothesized direction; the largest was between the ASQoL and BASDAI (0.79). The BASDAI had the strongest linear relationship with responses to both specific and general health transition questions (P<0.01). With the exception of the Body Chart, instruments had a stronger relationship with general health transition. The BASDAI was the most responsive instrument. The Body Chart and RLDQ had low levels of responsiveness.
Conclusion. The instruments have undergone a comprehensive comparative evaluation to assess the measurement properties required for patient-assessed measures of health outcome. Adequate levels of reliability and validity were found for all instruments. The BASDAI and the ASQoL were the most responsive to self-perceived change in health, but the BASDAI had low levels of self-completion.
KEY WORDS: Ankylosing spondylitis, Patient-assessed health outcome, Reliability, Responsiveness, Validity.
Introduction
Following a systematic review of the literature [4, 5], the Bath AS Disease Activity Index (BASDAI) [6] and the Revised Leeds Disability Questionnaire (RLDQ) [7] were identified as patient-assessed instruments worthy of further evaluation as measures of disease activity and functional disability, respectively. The BASDAI contains six items representative of disease activity in AS. Each item has a 10-cm horizontal visual analogue scale (VAS) anchored by the adjectival descriptors "none" and "very severe". Item 6 (morning stiffness, duration) is anchored by a time scale (0–2 h). The mean of items 5 (morning stiffness, severity) and 6 is calculated. The summated item score is converted to a 0–10 scale, with a lower score indicating less disease activity. The RLDQ contains 16 items describing four areas of AS-specific functional disability: mobility, bending down, reaching up and neck movements, and posture. Each item has a four-point scale from 0 (no difficulty) to 3 (unable to do). Scores range from 0 to 48, where higher scores indicate greater functional disability. Both instruments require 2 min to self-complete [5, 6].
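The BASDAI scoring rule described above can be illustrated with a minimal sketch. It assumes that each item is recorded as a VAS reading between 0 and 10, and that the conversion to a 0–10 scale is done by dividing the summed score by 5, as in the commonly published scoring of the instrument; the text here does not state the divisor. The function name `basdai_score` is hypothetical, and the handling of omitted items is not covered.

```python
def basdai_score(items):
    """Return a 0-10 BASDAI score from six VAS item responses (each 0-10).

    Assumption: the summed score is divided by 5 to give a 0-10 scale.
    Missing items are not handled in this sketch.
    """
    if len(items) != 6:
        raise ValueError("BASDAI has six items")
    if any(not 0 <= x <= 10 for x in items):
        raise ValueError("each VAS response must lie between 0 and 10")

    q1, q2, q3, q4, q5, q6 = items
    # Items 5 (stiffness severity) and 6 (stiffness duration) are averaged
    # so that morning stiffness contributes a single component.
    stiffness = (q5 + q6) / 2
    # Sum the five components and rescale to 0-10.
    return (q1 + q2 + q3 + q4 + stiffness) / 5


example = basdai_score([2, 4, 6, 8, 3, 5])
```

Averaging items 5 and 6 before summation gives morning stiffness the same weight as each of the other four items.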
The Body Chart has been developed as an AS-specific measure of global bodily pain [8, 9]. Patients sketch areas of current pain onto a body manikin (anterior and posterior views) and then score each area on a four-point scale [1 (mild pain) to 4 (very severe pain)]. Area scores are totalled, with a lower score indicating less bodily pain. There is no maximum score. The instrument was developed and tested following interview administration in a clinic environment, and preliminary evidence suggests satisfactory measurement properties [9]. Completion time ranges from several seconds (no pain) to several minutes depending upon the extent of perceived pain and detail provided by the individual patient [5].
Despite increasing interest in the conceptualization and measurement of patient-assessed health-related quality of life (HRQL) in chronic disease [10, 11], a published AS-specific measure of HRQL is not currently available [5]. Indeed, the ASAS group did not recommend quality of life as a core domain in AS evaluation because of uncertainty over the most suitable approach [3]. Communication with measurement experts identified the AS Quality of Life Questionnaire (ASQoL) (P. Helliwell and L. Doward, Galen Research, Manchester, personal communication, 1998) [12], a new and unpublished measure comprising 18 items relating to AS-specific HRQL. It adopts dichotomous responses (yes/no) and requires 2 min to self-complete. Items are summated (0–18), with a lower score indicating a better level of AS-specific HRQL.
The aim of the study was to examine the acceptability, data quality and measurement properties of four patient-assessed measures of health outcome in AS patients recruited from UK rheumatology centres. The ASQoL (HRQL), BASDAI (disease activity), Body Chart (global pain) and the RLDQ (functional disability) describe several domains considered important in the evaluation of patients with AS, and the results of the study will support the recommendation of instruments to fulfil these domains.
Methods
Four hundred and fifty-one patients were asked to self-complete a mailed questionnaire. Patients not wishing to participate were asked to return uncompleted and pre-coded questionnaires using a reply-paid envelope. Non-responders were sent reminders at 2 and 4 weeks. The questionnaire included the four disease-specific measures of health outcome, two health transition items and sociodemographic questions.
Instrument evaluation
Data quality. Individual items within each instrument were assessed for missing data, the distribution and symmetry of item response scores and endorsement frequencies. Principal component analysis (PCA) was used to assess the dimensionality of instruments based on multi-item scales [16]. The ASQoL, BASDAI and RLDQ are proposed to be unidimensional, and PCA was used to confirm the existence of a single dimension for each instrument. The item–total correlation of individual items within these instruments was also assessed.
Reliability. The internal consistency reliability of the instruments was assessed by Cronbach's alpha [16, 17]. The Body Chart is not a multi-item scale and therefore could not be tested for internal reliability.
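As an illustration of the internal consistency statistic named above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical, and complete responses (no missing items) are assumed; this is not the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes every respondent answered every item (no missing data).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summated scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents, two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Because the statistic depends on variances across several items, it cannot be computed for a single-score measure such as the Body Chart.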
Patient-reported health transition questions that describe the magnitude and direction of change in general or specific health over a given time period are a valid approach to measuring change, and have been widely used as external criteria in the evaluation of instrument test–retest reliability and responsiveness [18, 19]. Instrument test–retest reliability was assessed for those patients indicating that their AS-specific health had remained the same at 2 weeks on a health transition question. This method reduces the influence of information recall associated with shorter periods of retest and produces a more robust estimate of instrument reliability [17]. Patients were asked to complete a second questionnaire at 2 weeks. The intra-class correlation coefficient [ICC(2,1)] [20] was used to measure the agreement between test and retest [21]. For group comparisons, levels of reliability over 0.70 are required [16, 17], and for the evaluation of individuals levels above 0.90 have been recommended [17, 22].
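The agreement statistic can be sketched as follows, assuming the usual two-way random effects, single-measure, absolute-agreement definition of ICC(2,1): (MSR − MSE) / (MSR + (k−1)·MSE + k·(MSC − MSE)/n), where MSR, MSC and MSE are the between-patient, between-occasion and residual mean squares. The function name `icc_2_1` and the toy data are hypothetical, and the sketch omits confidence intervals.

```python
def icc_2_1(scores):
    """ICC(2,1) for n patients (rows) measured on k occasions (columns)."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy test-retest data: every retest score is one point higher than the test score.
shifted = icc_2_1([[1, 2], [2, 3], [3, 4]])
```

In the toy data the two occasions are perfectly correlated, yet the coefficient falls below 1 because ICC(2,1) penalizes a systematic shift between test and retest, which is why it measures agreement rather than mere association.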
Validity
Construct validity was assessed by correlating the scores for the separate instruments to assess the convergent validity of related dimensions (Pearson's correlation coefficient). Hypothesized theoretical relationships between instruments were considered a priori. The ASQoL (HRQL), BASDAI (disease activity), Body Chart (bodily pain) and RLDQ (functional disability) measure related aspects of HRQL. The main issues measured by the BASDAI and Body Chart, and several items measured by the RLDQ, are within the item content of the ASQoL. Therefore, a high level of correlation (>0.70) was hypothesized between the ASQoL and BASDAI, with moderate to high levels of correlation (0.50–0.70) between the ASQoL and both the Body Chart and RLDQ. The BASDAI and Body Chart measure closely related aspects of health and have a similar item content, and a large correlation was hypothesized. There is minimal overlap of item content between the BASDAI, Body Chart and RLDQ. However, all instruments measure related aspects of health and disease that may impact on normal function. Therefore, moderate levels of correlation between these instruments were hypothesized.
Validity was further assessed in relation to occupational status. Patients reporting an inability to work due to ill health were expected to have scores reflecting poorer health than their counterparts. t-Tests were used to test for differences in scores.
For purposes of assessing longitudinal construct validity, instrument scores were compared with self-reported AS-specific and general health transition at 6 months ("Compared with 6 months ago, how would you rate your AS/general health now: much better, somewhat better, about the same, somewhat worse, much worse?"). Changes in instrument scores and patient response to the transition questions were assessed for a linear trend [23]. To the extent that the patient-assessed instruments are valid measures of health capable of measuring change, a strong association with a patient-reported health transition item is expected [20, 21].
Responsiveness
Instruments were compared for responsiveness to change over the 6-month period by calculating the modified standardized response mean (MSRM), which is equal to the mean change in scores divided by the standard deviation of change scores in patients defined as stable [21]. Guidance for data interpretation has been proposed: a score of >0.8 represents a high level of responsiveness, a score of 0.5 a moderate level, and a score of 0.2 a low level [24]. MSRMs were calculated for patients reporting an improvement or deterioration in health on health transition (general or AS-specific).
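The MSRM defined above can be sketched in a few lines. The function name `msrm` and the toy change scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, stdev


def msrm(changed_group_changes, stable_group_changes):
    """Modified standardized response mean.

    Mean change score in patients reporting a change in health, divided by
    the standard deviation of change scores in patients reporting no change.
    Assumption: the sample standard deviation is used.
    """
    return mean(changed_group_changes) / stdev(stable_group_changes)


# Toy change scores (follow-up minus baseline) for two hypothetical groups.
value = msrm([2, 3, 4], [-1, 0, 1])
```

Dividing by the variability seen in stable patients expresses the observed change relative to background fluctuation, so an MSRM of 0.5 corresponds to a mean change of half a standard deviation of the stable group's change scores.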
Results
The majority of patients were male (n=259, 74.2%), with a mean age of 46.1 yr [standard deviation (S.D.) 12.6 yr, range 18–75 yr] (Table 1). The mean symptom duration of participants was 19.8 yr (S.D. 11.8; range 1–56 yr), suggesting a broad spectrum of disease presentation.
Instrument evaluation
Data quality. The item and scale properties of the disease-specific instruments are shown in Table 2. For all instruments a lower item or scale score reflects a better health state and may be described as the floor of the scale [24].
The levels of missing data for the six items of the BASDAI ranged from 8.9 to 24.9% (Table 2). Item 6 was the most frequently omitted. Scoring allows for the omission of up to two items and scale scores were computable for 91.1% (n=318) of patients. The full range of responses was observed for all items and no item produced an end-effect >80%. Score distribution at scale level approximated normality. The single dimension of the BASDAI proposed by the instrument developers was supported by the results of the PCA, which produced a single-component solution with all item component loadings above 0.65. Item–total correlation ranged from 0.56 to 0.81, and Cronbach's alpha was 0.87 (Table 2).
The Body Chart does not consist of individual items and is therefore assessed at the scale level (Table 2). Final scores were computable for 89.9% (n=310) of responders. Features of incorrect completion included a failure to score the areas indicated on the body chart as painful. If more than two areas are not scored, a final score cannot be computed. A wide range of Body Chart scores was observed (range 0–122). Score distribution was skewed towards lower pain levels (2.3% reported no pain) (Table 2). Because the Body Chart has no maximum score, the percentage of patients reported as scoring at the ceiling represents those patients recording the maximum scores in this study population.
The levels of missing data for the 16 items of the RLDQ ranged from 0.9 to 5.2% and a scale score was computable for 98.0% (n=342) of patients (Table 2). The full range of response options was observed for all items. Three items had very low levels of endorsement for the option "Unable to do" and a large proportion of patients scored at the floor of several items, indicating that the activity could be performed without difficulty. Although no item produced an end-effect of >80.0%, the skewed distribution of item responses shows that the majority of respondents experience no or moderate limitations in functional activities assessed by the RLDQ. This is reflected in the low mean values for all items and in the final scale scores, which are positively skewed towards better levels of functional ability (Table 2). No patient scored >41 on the 0–48 scale. The single dimension of the RLDQ was supported by the results of the PCA, which produced a single-component solution with all component loadings >0.48. Item–total correlation ranged from 0.41 to 0.79, and Cronbach's alpha was 0.93 (Table 2).
Reliability. The test–retest reliability of all instruments was assessed by correlating the two sets of scores for those patients who indicated no change on the AS-specific health transition question. The ICCs are shown in Table 3. Reporting the associated confidence intervals for the ICC as an estimation of test–retest reliability is based on the assumption that the data are approximately normally distributed. Therefore, the highly skewed data for the Body Chart were logarithmically transformed to yield a log-normal distribution [25] (Table 3). The highest levels of reliability were observed for the ASQoL and RLDQ (>0.90), but for all instruments levels were >0.80.
Responsiveness. The results of responsiveness testing are shown in Table 6. The BASDAI produced the largest levels of responsiveness for groups of patients whose AS-specific or general health had improved or deteriorated according to transition question responses. In patients reporting an improvement in health, responsiveness statistics over 0.5 were found for the BASDAI, representing a level of change that is at least one half a standard deviation of the change scores for stable patients. The ASQoL also produced responsiveness statistics over 0.5 in patients reporting a deterioration in AS-specific health or an improvement in general health. Lower levels of responsiveness were found for the remaining instruments, with the lowest levels found for the Body Chart.
Discussion
This study represents an extensive comparative evaluation of the measurement properties and acceptability of an evidence-based selection of disease-specific, patient-assessed instruments in a large and representative population of AS outpatients. The study has provided important information against which instruments can be judged in terms of necessary measurement properties. The results of tests of data quality and scaling assumptions have not been reported previously. Furthermore, there is no published evidence of empirical tests for dimensionality.
The ASQoL and the RLDQ had low levels of missing data, which is evidence for their acceptability to patients. However, the high levels of missing data for the BASDAI suggest that a revision of the instrument scaling or item content may improve completion rates. The Body Chart was not originally developed as a self-administered postal questionnaire and there is scope for improving the written instructions provided to patients. The instruments based on summated rating scales (ASQoL, BASDAI and RLDQ) had good evidence to support the unidimensional structure recommended by the instrument developers. The data for the Body Chart were highly skewed, and assessment using conventional methods was therefore difficult.
The tests of data quality and scaling assumptions were largely met by the instruments and all items showed a moderate to high level of correlation with hypothesized scales. The four instruments have levels of reliability that support their use in groups. The ASQoL and RLDQ had levels of internal and test–retest reliability that support their use in individual patients.
Evidence for the construct validity of the four instruments was provided by the moderate to high levels of correlation between the instrument scores, which met a priori hypotheses. Correlations were of a sufficient magnitude to suggest that the instruments are measuring related aspects of disease-specific health. Further evidence for the validity of the instruments was provided by the significant association with work status and self-reported health transition. Scores for three of the instruments had a stronger association with general health transition than specific health transition, which was not as hypothesized. The exception, the Body Chart, is a measure of global body pain and it is possible that AS-related pain is a dominant component of self-reported changes in the disease. The other instruments measure broader considerations, and items within the ASQoL are not anchored to the disease.
The BASDAI showed a good level of responsiveness for self-perceived improvement and deterioration in both AS and general health. Good levels of responsiveness were also found for the ASQoL for change in general health. However, only moderate and low levels of responsiveness were found for the RLDQ and Body Chart, respectively.
Most patients with AS present with multiple coexisting problems, and a multi-dimensional approach to evaluation has been recommended [3]. The study instruments address four domains of disease impact: HRQL (ASQoL), disease activity (BASDAI), pain (Body Chart) and functional disability (RLDQ). Instrument selection must consider available evidence in light of the proposed application. For applications in research including clinical trials, all instruments demonstrate adequate levels of reliability and validity. In addition, there is good evidence for the responsiveness of the BASDAI and the ASQoL. However, the low levels of completion for the BASDAI and the self-administered version of the Body Chart are cause for concern. Furthermore, the Body Chart and RLDQ were not responsive to self-perceived change in AS. These are also important considerations for applications in both research and clinical practice. Moreover, reliability may be an even more important issue when selecting instruments for individual evaluation [31]. The BASDAI and Body Chart are close to the reliability criterion of 0.90 recommended by some authors [17]. The ASQoL and RLDQ exceed this criterion but were found not to be as responsive as the BASDAI.
Two additional AS-specific measures of functional disability have been recommended by the ASAS group [29]: the (modified) Dougados Functional Index (DFI) [32, 33] and the Bath AS Functional Index (BASFI) [34]. Direct comparison of available disease-specific measures of functional disability is required to support the selection of a single instrument. Following the extensive evaluation of instrument performance described in the current study, modification of the item content and response format of the RLDQ is suggested to enhance both data quality and measurement properties [5]. Such a detailed evaluation of the data quality, scaling assumptions and measurement properties of the DFI or BASFI has not been published [5]. For example, the response scales of the BASFI are identical to those of the BASDAI, and self-completion in a patient population unfamiliar with the instrument, as described here, will provide important information relating to instrument acceptability.
It is increasingly recognized that generic and disease-specific instruments are complementary [35]. The broad content of generic instruments may support the identification of co-morbid features and treatment side-effects that may not be captured by disease-specific instruments. Furthermore, generic instruments are useful for comparing different groups of patients and have wider scope for application in economic evaluation. Evidence of the performance of widely used generic instruments such as the Short-Form 36-item Health Survey Questionnaire (SF-36) [14, 22] and the EuroQol [36, 37] in the AS population has yet to be described.
In conclusion, this study provides valuable information on the data quality, measurement properties and acceptability of four disease-specific measures of health outcome following self-completion in a large UK population of patients with AS. The best performing instruments were the ASQoL and BASDAI. However, it is recommended that all instruments undergo further evaluation following modification to improve data quality and responsiveness before recommendation for their use in the evaluation of AS patients in routine practice or clinical research can be made.