Academic Unit of Musculoskeletal and Rehabilitation Medicine, University of Leeds, Leeds, UK
Correspondence to: B. B. Bhakta, Academic Unit of Musculoskeletal and Rehabilitation Medicine, University of Leeds, 36 Clarendon Road, Leeds LS2 9NZ, UK. E-mail: b.bhakta{at}leeds.ac.uk
Abstract
Objective. To identify a subset of clinical features of Behçet's disease (BD) that can be summated to form an overall index of disease activity appropriate for clinical and research use internationally.
Methods. Completed Behçet's Disease Current Activity Forms were collected from a total of 524 patients with BD from five countries. The data from 14 questions on the form were subjected to Rasch analysis to establish whether these items form a hierarchical and unidimensional scale of disease activity, both within and between countries.
Results. The data showed a good fit to the Rasch model within three countries using a dichotomous scoring function. However, when the data from these three countries were pooled, the fit to the model was poor. Cross-cultural differential item functioning (DIF) was found in seven items in the pooled data. When the items with DIF by country were separated and two items were removed, the resulting 26-item scale showed a good fit to the Rasch model.
Conclusions. Within Turkey, Korea and the UK, the 14 items can be summated to give an index of disease activity. Analysis of the pooled data confirmed that the index is not suitable for comparison between countries or for pooling of data in the raw form, but after fitting the data to the Rasch model such comparisons can be made. This gives a scaling tool that is quick and easy to use in the clinical situation.
KEY WORDS: Behçet's disease, Disease activity, Rasch analysis, Cross-cultural validity
Currently, there are few reliable laboratory markers that reflect the fluctuating clinical signs and symptoms in Behçet's disease (BD). Judgement of disease activity is based on clinical features. There is a need to develop a standardized assessment of current disease activity for use in monitoring disease progression and for evaluating the effects of therapeutic interventions. In addition, any assessment would need to be validated for cross-cultural comparability before it is used internationally.
The Behçet's Disease Current Activity Form (BDCAF) was developed with this in mind on behalf of the International Scientific Committee on Behçet's Disease. The content of the BDCAF was based on previous work [1] that compared two schemes available to assess disease activity: the Iranian Behçet's Disease Dynamic Activity Measure (IBDDAM) [2] and a European scheme initially developed in the UK. Although there was greater variability in inter- and intra-rater scoring when the Iranian form was used, the clinicians participating in the study considered that both forms had important aspects that could be incorporated into an internationally accepted activity form. As a result of this study, a prototype form was developed incorporating aspects of both forms. This was circulated to all members of the International Scientific Committee for comment. A workshop was held in Leeds, UK in 1994 to collate these comments and arrive at a consensus about which clinical signs and symptoms to include in the activity form (BDCAF). There was an emphasis on clear wording of the questions to allow potential use by clinicians worldwide. There was general agreement that standardized questions should be developed for each organ system, which could be readily translated for international use. Previous research on the BDCAF has found good inter-observer reliability for assessing general disease activity in British patients [3], and for assessing the orogenital ulcers and eye involvement of BD using a translated version in Turkish patients [4]. A further study found that agreement between clinicians was better using the BDCAF, which scores clinical features over the past 28 days, than using the IBDDAM, which uses a variable time period [5].
The current project was undertaken with the aim of confirming the construct and cross-cultural validity of a previously identified subset of clinical features [6] from the BDCAF. Preliminary work used Rasch analysis to identify the clinical features which best fitted the unidimensional construct of disease activity. The aims of this project were to reassess the psychometric properties of these clinical features in a larger international data set, to verify that they can be summated, and to produce an index to indicate how active the patient's disease has been in a defined period (1 month) preceding the consultation. The resulting index should measure only disease activity (i.e. it should be unidimensional), as distinct from permanent damage. When the intention is to sum the scores on any scale it is essential that all items of the scale are measuring the same construct, otherwise it is not possible to interpret the clinical significance of the total score. The validity of the index should also extend to its use in comparing disease activity in populations from different countries. This is particularly important in the context of multicentre international trials of therapeutic interventions.
Methods
Data collection
The clinical disease activity data used in this study were collected during the patients' routine medical care. Completed BDCAFs were returned from the five countries that participated in this study (China, Korea, Iraq, Turkey and the UK). The analysis used the responses to the 12 BDCAF items previously identified as a potential unidimensional scale [6], together with the two questions that assess disease activity from the patient's and the clinician's perspective. These 14 items were: a Likert scale, represented by smiley faces ranging from very bad to very good, on which the patient and the clinician each indicated how the disease had been over the past 4 weeks; and the presence or absence, over the 4 weeks before the clinic visit, of arthralgia, arthritis, diarrhoea, erythema nodosum, eye inflammation, genital ulcers, headaches, mouth ulcers, nausea/vomiting, new central nervous system involvement, new major vessel inflammation, and pustules.
Statistical analysis
A glossary of Rasch terminology is given in Appendix 1.
The Rasch model was used to determine the internal construct validity of the index, and to test for the cross-cultural equivalence of the items in the index. The model assumes that the probability of a particular patient affirming a given item or category is a logistic function of the severity of the item and the activity of the patient's disease [7]. For the index, the more active the disease the more likely the patient is to affirm any given item.
The Rasch model (dichotomous case) is given by the equation:
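$$P_{ni} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}$$

where $P_{ni}$ is the probability that patient $n$ affirms item $i$, $\beta_n$ is the level of disease activity of patient $n$ and $\delta_i$ is the severity of item $i$, both measured in logits.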
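To make the dichotomous model concrete, here is a minimal Python sketch. It is illustrative only: the function name `rasch_probability` and the parameter names `beta` (patient disease activity, in logits) and `delta` (item severity, in logits) are hypothetical and are not taken from RUMM or from the BDCAF.

```python
import math


def rasch_probability(beta: float, delta: float) -> float:
    """Probability that a patient with disease activity `beta` affirms a
    dichotomous item of severity `delta` (both in logits).

    Hypothetical helper for illustration; not part of RUMM.
    """
    # exp(beta - delta) / (1 + exp(beta - delta)), rewritten as 1 / (1 + exp(-(beta - delta)))
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


# When activity equals item severity, the item is affirmed with probability 0.5;
# one logit more activity raises the probability to about 0.73.
print(rasch_probability(0.0, 0.0))            # 0.5
print(round(rasch_probability(1.0, 0.0), 4))  # 0.7311
```

Because only the difference between activity and severity enters the model, shifting every patient and every item by the same constant leaves all probabilities unchanged, which is why the logit scale needs an arbitrary origin (conventionally, the mean item severity is set to zero).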
An implicit assumption in this approach is that the items in the index will form a hierarchical scale that measures the spectrum from no disease activity to severe disease activity.
Where data fit this model (i.e. the observations accord with the model expectations), then, given local independence (i.e. the responses to a given item are independent of the responses to other items in the scale [8]), the data are derived from a unidimensional scale.
Fit of the data to the model is assessed by a number of fit statistics [9]. The overall fit of the scale is given by the item-trait interaction χ² statistic. This statistic indicates any significant deviation of the data from the Rasch model, and hence how well the items fit together to form a hierarchical and unidimensional scale. It is calculated by summing the χ² values for the individual items (see below) and calculating the significance value using the summed degrees of freedom (d.f.). An estimate of the reliability of the scores on the index can also be made. This is based on the traditional method of reliability estimation, Cronbach's α. However, instead of the raw scores, the activity estimates for each person on the logit scale are used to calculate reliability. This is called the person separation index; its estimates will be very similar to the value of Cronbach's α, and it is interpreted in the same way, i.e. a value above 0.8 is very good. In addition, the fit of each individual item to the model can be considered. The residual is the standardized difference between each person's observed and expected response to an item; these residuals are summed over all persons and standardized (i.e. divided by their standard deviation) to give an item fit residual. If this value is outside the desired range of ±2.5, the item is regarded as showing misfit to the model. Individual item fit is also assessed by the χ² test. The χ² values are calculated by grouping the patients into class intervals (approximately 50 patients in each) on the basis of their overall level of disease activity, and the observed responses within each class interval are compared with the mean expected response for that group. A significant χ² value indicates misfit. As the data were collected from five countries, the analysis was done first within countries, i.e. the data from each country were analysed separately, and then across countries, i.e. the data from all countries were pooled and analysed together, to specifically assess cross-cultural differential item functioning (DIF).
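The reliability and item-fit calculations described above can be sketched in Python as follows. This is a hedged approximation under stated assumptions, not RUMM 2010's exact algorithm: the person separation index is taken as (variance of person estimates - mean squared standard error) / variance of person estimates, and the item χ² sums, over class intervals, the squared difference between observed and expected scores divided by the summed model variance. The function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def person_separation_index(estimates: Sequence[float], std_errors: Sequence[float]) -> float:
    """Reliability of the person logit estimates (assumed formulation):
    (observed variance - mean error variance) / observed variance."""
    observed_var = statistics.pvariance(estimates)
    error_var = statistics.fmean([se ** 2 for se in std_errors])
    return (observed_var - error_var) / observed_var


def item_chi_square(
    responses: Sequence[int],
    expected: Sequence[float],
    class_interval: Sequence[Hashable],
) -> Tuple[float, int]:
    """Class-interval chi-square for one dichotomous item.

    responses: observed 0/1 scores; expected: model probabilities P for the same
    patients; class_interval: the activity group each patient was assigned to.
    Returns (chi-square, degrees of freedom = number of class intervals - 1).
    """
    observed: Dict[Hashable, float] = defaultdict(float)
    predicted: Dict[Hashable, float] = defaultdict(float)
    variance: Dict[Hashable, float] = defaultdict(float)
    for x, p, group in zip(responses, expected, class_interval):
        observed[group] += x
        predicted[group] += p
        variance[group] += p * (1.0 - p)  # Bernoulli variance under the model
    chi2 = sum((observed[g] - predicted[g]) ** 2 / variance[g] for g in observed)
    return chi2, len(observed) - 1


print(round(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]), 3))  # 0.625
```

Under this sketch, the item-trait interaction statistic is the sum of the item χ² values, tested on the sum of their degrees of freedom.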
Rasch analysis also allows us to evaluate the consistency of the fit of data from different countries. That is, we can examine whether the scale is working the same way in different countries. This is done using DIF analysis. Within the framework of Rasch measurement, the scale should work in the same way irrespective of which group is being assessed. Thus, the probability of affirming an item should be the same between groups, given the same trait level [10]. Analysis of variance (ANOVA) based on the residuals (calculated for each person) is used to check for the presence of bias between countries. Again, the patients are split into class intervals and the comparison is made between the subgroups at the same level (or class interval) of disease activity. The two factors in the ANOVA are the person factor (e.g. age, gender or country) and the class intervals (as described above). DIF may manifest itself as a constant difference between countries across the trait (uniform DIF, which is the main effect), or as a variable difference, where the response functions of the two groups cross over (non-uniform DIF, which is the interaction effect). Both the country factor and the interaction with the class interval might be significant in some cases, as with the main and interaction effects in any ANOVA. Tukey's post hoc tests determine where the statistically significant differences are to be found when there are more than two groups.
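One way to see what the residual-based DIF check compares is to compute each patient's standardized residual on an item and average it within each class interval for each country. The Python sketch below performs only that descriptive step; it does not reproduce RUMM's two-way ANOVA or the Tukey post hoc tests, and its record layout (country, class interval, observed score, expected probability) is a hypothetical assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def standardized_residual(x: int, p: float) -> float:
    """(observed - expected) / model standard deviation for a 0/1 item."""
    return (x - p) / math.sqrt(p * (1.0 - p))


def mean_residuals_by_cell(
    records: Iterable[Tuple[str, Hashable, int, float]],
) -> Dict[Tuple[Hashable, str], float]:
    """records: (country, class interval, observed 0/1, expected P) for one item.
    Returns the mean standardized residual for each (class interval, country) cell,
    i.e. the cell means that a two-way ANOVA of country x class interval compares."""
    sums: Dict[Tuple[Hashable, str], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, str], int] = defaultdict(int)
    for country, interval, x, p in records:
        key = (interval, country)
        sums[key] += standardized_residual(x, p)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

If one country's cell means sit above (or below) those of the other countries in every class interval, that is the pattern of uniform DIF; if the ordering of the countries reverses across class intervals, non-uniform DIF is suggested.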
When some but not all items display DIF, it is possible to adjust for it by allowing the items with DIF to vary by country. To do this, each such item is replaced by a series of country-specific items (e.g. headaches becomes headachesIraq, headachesTurkey, etc.). For each country, only the scores observed in its own country-specific item are considered, and the other country-specific items are assigned missing values. Subsequent analysis is undertaken on this expanded data set (i.e. the original items without DIF plus the split items). This procedure has been used successfully, and documented, for a measure of disease activity in manic depression [11].
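The item-splitting procedure amounts to a simple data transformation. The Python sketch below assumes, hypothetically, that each patient is a dictionary of item scores with a `"country"` field; the function name is likewise illustrative. Missing values are represented as `None`.

```python
from typing import Any, Dict, List, Sequence


def split_item_by_country(
    rows: List[Dict[str, Any]], item: str, countries: Sequence[str]
) -> List[Dict[str, Any]]:
    """Replace `item` with one country-specific column per country
    (e.g. 'headaches' -> 'headachesIraq', 'headachesTurkey', ...).

    Each patient keeps their score only in the column for their own country;
    the other country-specific columns are set to None (missing).
    Items without DIF are copied unchanged and act as link items.
    """
    split_rows = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != item}
        for country in countries:
            new_row[item + country] = row[item] if row["country"] == country else None
        split_rows.append(new_row)
    return split_rows


patients = [
    {"country": "UK", "headaches": 1, "mouth ulcers": 1},
    {"country": "Turkey", "headaches": 0, "mouth ulcers": 1},
]
for row in split_item_by_country(patients, "headaches", ["Korea", "Turkey", "UK"]):
    print(row)
```

Because the unsplit items (here, mouth ulcers) are shared by all patients, they anchor the country-specific items to a common scale when the expanded data set is analysed.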
Due to the number of significance tests undertaken within each analysis, a significance level of P < 0.01 was used.
The software package RUMM 2010 [12] was used to complete the Rasch analysis of the data, and SPSS version 10.1 (SPSS, Chicago, IL, USA) was used for other descriptive analysis.
Results
Patient characteristics
Between 1995 and 2002, 542 completed BDCAFs were returned. The characteristics of the patients involved in this study are shown in Table 1.
In all countries, problems were found with the three-category response options for the questions relating to arthralgia, arthritis, erythema nodosum, genital ulcers, headaches, mouth ulcers, nausea/vomiting, diarrhoea and pustules. The response categories were 0 (no symptom in the past 4 weeks), 1 (symptom present for up to 2 weeks in the past 4 weeks) and 2 (symptom present for more than 2 weeks in the past 4 weeks), and the analysis showed that this response function was not working as intended for these items (i.e. they displayed disordered thresholds). Thus, at the level of disease activity at which the model would expect a score of 1, patients were more likely to score 0 or 2. Consequently, in each country all of the items listed above were re-scored to a dichotomous response: symptom not present in the past 4 weeks (0) or symptom present at some time in the past 4 weeks (1).
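The rescoring step can be sketched as a simple recode in Python. The item labels below are hypothetical strings chosen to match the list above, and items not listed (for example, eye inflammation) are assumed to keep their recorded scores.

```python
from typing import Dict

# Items whose three-category responses showed disordered thresholds (as listed in the text).
DICHOTOMIZED_ITEMS = {
    "arthralgia", "arthritis", "erythema nodosum", "genital ulcers", "headaches",
    "mouth ulcers", "nausea/vomiting", "diarrhoea", "pustules",
}


def dichotomize(score: int) -> int:
    """Collapse 0 / 1 / 2 into absent (0) / present at any time in the past 4 weeks (1)."""
    if score not in (0, 1, 2):
        raise ValueError(f"expected a score of 0, 1 or 2, got {score!r}")
    return 0 if score == 0 else 1


def rescore(responses: Dict[str, int]) -> Dict[str, int]:
    """Apply the dichotomous rescoring to the listed items; leave all other items as recorded."""
    return {
        item: dichotomize(score) if item in DICHOTOMIZED_ITEMS else score
        for item, score in responses.items()
    }


print(rescore({"mouth ulcers": 2, "arthritis": 0, "eye inflammation": 1}))
# {'mouth ulcers': 1, 'arthritis': 0, 'eye inflammation': 1}
```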
Following the re-scoring of these items, the scale demonstrated reasonable fit to the Rasch model within all countries (Table 2). However, given the low person separation index in China and Iraq (indicating a low level of reliability in the scores) and the small number of cases from each country (33 and 49 respectively), the data from these countries were not included in the pooled analysis.
One possible cause of item misfit is DIF; therefore DIF was examined in relation to the country from which the data were obtained. Of the 14 items, seven displayed DIF by country: any new central nervous system involvement, any new major vessel involvement, arthralgia, disease activity (patient), erythema nodosum, headaches and pustules. Post hoc analysis (Tukey's test) did not identify any one country as showing the most deviation. Therefore, the seven items with DIF were split for all countries and the scale was re-analysed, which effectively created a 28-item scale (seven items split across three countries plus the seven original items without DIF, which act as link items). However, two items, the Korea-specific disease activity (patient) item and the disease activity (clinician) item, showed significant misfit to the model and had to be removed from the analysis. Following this, the scale showed good fit to the Rasch model both at the individual item level and overall (χ² = 41.85, d.f. = 25, P = 0.0186), with a person separation index of 0.71, which confirmed that the misfit in the pooled data was driven by differences at the country level.
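As a check on the reported overall fit, the upper-tail probability of a χ² of 41.85 on 25 d.f. can be computed directly and compared with the P < 0.01 criterion adopted in the Methods. The sketch below uses only the Python standard library, evaluating the chi-square tail through the series expansion of the regularized incomplete gamma function; the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with `df` d.f.

    Computes 1 - P(a, z), where P is the regularized lower incomplete gamma
    function with a = df / 2 and z = x / 2, evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:  # stop once further terms are negligible
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return 1.0 - lower


p = chi2_sf(41.85, 25)
print(f"P = {p:.4f}")  # compare with the reported P = 0.0186
print("misfit at P < 0.01" if p < 0.01 else "no significant misfit at P < 0.01")
```

Since the resulting probability is above 0.01, the overall item-trait interaction is not significant at the adjusted level used in this study.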
Distribution of BDAI across countries
There is some variation in the distribution of symptom location on the underlying continuum of disease activity between countries (Fig. 1). For example, new major vessel involvement is the item representing the highest level of disease activity in Korea and the UK, while the highest level of disease activity in Turkey is represented by the item diarrhoea or rectal bleeding. In all countries, the item that represented the lowest level of disease activity is mouth ulcers.
Discussion
The data were collected using the original proforma (BDCAF) from five countries. Fourteen items previously identified to form an index of disease activity [6] were analysed using the Rasch method. The results of the analysis showed that the three-category scoring function for seven items was not working as intended, i.e. a higher score did not consistently indicate more disease activity within a particular item. Therefore, a two-category scoring function was necessary to satisfy the requirements of the Rasch model for the BDAI. Unfortunately, in two countries (China and Iraq) the person separation index was too low to be considered acceptable. Coupled with the small number of cases from these countries, it was felt that it was not possible to draw any meaningful conclusions from the analysis of these data. Among the remaining three countries, the UK displayed very good fit of the data to the model and Turkey and Korea showed a small amount of misfit, though for all three countries the person separation index was reasonable.
The pooled analysis demonstrated some cross-cultural DIF between the three countries, though a solution was found by splitting the items that displayed DIF. Two of the items in the split scale had to be deleted because they showed misfit. This splitting and deletion is necessary only for the purpose of cross-cultural comparison; the full index (BDAI) can still be used within each country. It does mean, however, that accurate outcome measurement across countries (e.g. in an international study of a drug intervention) requires the Rasch-transformed scores rather than the raw data. By separating the items for each country and fitting the data to the Rasch model, the score obtained from the index can be used to make comparisons between countries. This is feasible because fit to the model allows disease activity to be estimated on an interval scale [13], with all items calibrated on the same linear metric. Interval-level measurement is necessary to perform parametric statistics and to calculate change scores [14]. Data from countries not involved in this study can be analysed by the same process to allow accurate between-country comparisons. Thus, the BDAI is potentially a useful tool for international research.
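To illustrate what a Rasch-transformed score means in practice, the sketch below converts a patient's dichotomous responses into a logit estimate of disease activity by maximum likelihood, assuming the item severities have already been calibrated. Items set to `None` (such as other countries' split items) are skipped, which is one way split and link items can place patients from different countries on one metric. The severities used are hypothetical, and RUMM 2010's own person estimation may differ in detail (for example, in how it treats extreme scores).

```python
import math
from typing import Optional, Sequence


def estimate_activity(
    responses: Sequence[Optional[int]],
    severities: Sequence[float],
    tol: float = 1e-8,
    max_iter: int = 100,
) -> float:
    """Maximum likelihood estimate (logits) of a patient's disease activity.

    responses: 0/1 per item, or None for items not administered to this patient
    (e.g. another country's split item). severities: calibrated item locations.
    """
    pairs = [(x, d) for x, d in zip(responses, severities) if x is not None]
    raw_score = sum(x for x, _ in pairs)
    if raw_score == 0 or raw_score == len(pairs):
        # Zero and maximum scores have no finite maximum likelihood estimate.
        raise ValueError("extreme score: no finite estimate")
    beta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(beta - d))) for _, d in pairs]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step on the log-likelihood, capped at 1 logit for stability.
        step = max(-1.0, min(1.0, (raw_score - expected_score) / information))
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("estimate did not converge")


# Hypothetical severities for illustration only (not the calibrations from this study).
print(round(estimate_activity([1, 1, 0, None], [-1.0, 0.0, 1.0, 2.0]), 3))
```

At the estimate, the expected score summed over the administered items equals the patient's raw score, so patients with the same raw score on the same items receive the same logit value, while patients answering different (split) items are placed on the common metric through their item severities.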
There are a number of limitations to the study. The approach assumes that DIF by country is caused by true cross-cultural differences. However, it is acknowledged that country may be serving as a proxy for other sources of DIF. For example, differing presentations of BD across populations, different co-morbidities (age of appearance, symptom pattern), and cultural or medical bias towards items perceived as more important than others may all be sources of DIF.
Another important issue is that of translating the activity measures into the main language of the various countries. In our study, only the forms used to collect data from patients in Turkey have been officially translated using appropriate methods [4]. For other countries, the responsibility for the translation during an individual patient consultation was left to the clinician using the scale. While all clinicians administering the form in this study had an excellent command of the English language, the translation was not standardized. For accuracy in recording data, ideally the form should be translated formally, using the well-recognized procedure for self-report measures [15]. Cultural differences may mean that, in certain cultures, it may be more acceptable to express symptoms or difficulties in certain ways and not in others. When the condition involves genital symptoms there may be less freedom of discussion between males and females (patients may be less willing to volunteer information).
Despite these limitations, we feel that the BDCAF is a convenient and logical tool: it follows the natural course of a consultation with a BD patient, so it can easily be administered during routine care, and it can be used to generate a useful index of disease activity. An overall disease activity score (BDAI) can be derived from the form and used in clinical trials of BD interventions, with the caveat that the analysis presented in this paper applies only to the countries included. Using the Rasch method, some of the problems associated with cross-cultural DIF can be overcome when the activity index is used in international studies. For research purposes, treatment trials targeted at a specific organ system would require a more comprehensive measure of activity within that organ system. For instance, a study of treatment aimed at oral ulceration [16] would require assessment not only of the duration of symptoms but also of other features that may relate to activity (number of ulcers, size of ulcers, number of crops of ulcers, site of ulcers, etc.). However, we suggest that detailed assessment of a particular organ system be accompanied by an overall assessment of disease activity using the BDAI. This will alert the clinicians conducting a trial to beneficial or adverse effects of the trial drug on other organ systems.
Appendix 1. Glossary of terminology used in Rasch analysis (adapted from Bond [17])
Acknowledgments
We are very grateful to Professor Z. Al-Rawi, Dr D. Bang, Dr F. Gogus, Professor D. Haskard, Dr M. R. Helbert, Professor S. Lee, Dr R. Powell, Dr S. Salman, Professor A. J. Silman, Professor H. Yazici, Professor D. Yi, Dr Z. Zhuoli and many more who provided us with data.
The BDCAF form can be obtained from Dr B. Bhakta at the correspondence address.
References