Rheumatology and Rehabilitation Research Unit, University of Leeds,
1 ARC Epidemiology Unit, University of Manchester,
2 Department of Ophthalmology, St James University Hospital, Leeds and
3 Department of Ophthalmology, The General Infirmary at Leeds, Leeds, UK
Correspondence to:
B. B. Bhakta, Rheumatology and Rehabilitation Research Unit, University of Leeds, 36 Clarendon Road, Leeds LS2 9NZ, UK.
Abstract
Methods. Nineteen patients fulfilling the International Study Group criteria for BD were randomly allocated, questioned and examined independently on the same day by five physicians experienced in BD.
Results. There was good agreement between the physicians' rating of oral [intraclass correlation coefficient (ICC)=0.87] and genital (ICC=0.95) ulceration, skin involvement (ICC=0.62 for pustules and ICC=0.66 for erythema nodosum), arthritis (ICC=0.62), headache (ICC=0.80), large vessel (kappa=0.53), nervous system (kappa=0.61) and eye involvement (kappa=0.77). There was poor agreement for the question relating to the presence of bloody diarrhoea (ICC=0.28). There was significant bias in the rating of fatigue by one of the physicians (F=5.2, P=0.001).
Conclusion. Overall, this instrument has good interobserver reliability for assessing general disease activity. We therefore suggest that this proforma has a place in routine clinical monitoring of patients with BD, as well as in assessing outcome in therapeutic trials.
KEY WORDS: Behçet's disease, Activity, Measurement, Reliability.
Introduction
Previous work [3] compared two schemes that were available to assess disease activity [The Iranian Behçet's Disease Dynamic Measure (IBDDAM) [4] and a European scheme initially developed in the UK]. Interobserver and intra-observer agreement between two clinicians using both forms were assessed in 13 patients with BD as defined by the ISG criteria. Reliability depends not only on the patient's accurate recall of symptoms, but also on the clinician's interpretation of them. This study suggested that agreement between clinicians in scoring of clinical features was greater when the standard period was 28 days, as in the European form, compared with the Iranian form in which a variable time period is taken [3]. Although there was greater variability in scoring when the Iranian form was used, the opinion of the clinicians was that both forms had good aspects and that an internationally accepted activity form could be derived from them without great difficulty.
As a result of this study, a prototype form was developed incorporating aspects of both forms. This was circulated to all members of the International Scientific Committee for comments. A workshop was held in Leeds, UK, in 1994 to arrive at a consensus view about the contents of the activity form (face validity) with emphasis on the need for clarity and consistency for potential use by clinicians worldwide. The inclusion of laboratory parameters within the activity form was raised by various members of the International Scientific Committee on Behçet's Disease. Although it was agreed that inclusion of erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) measurements would not add significantly to overall measurement of disease activity [5], it was appreciated that where disease appeared to be clinically inactive, a raised ESR or CRP might prompt further investigation. There was general agreement that standardized questions should be developed for each organ system which could be readily translated for international use. The interobserver reliability of this new instrument developed from these discussions is presented here.
Materials and methods
Instrument
The BDCAF scores oral and genital ulceration, skin, joint and gastrointestinal involvement, presence of fatigue and headache according to the duration of symptoms. The presence and type of large-vessel and central nervous system (CNS) involvement are documented. Eye activity was deemed present if there was a history of blurring of vision or if the eye was painful or red. All patients were examined by an ophthalmologist who completed the Behçet's Oculopathy Index. This index was subject to a separate reproducibility evaluation in a subsequent study. In addition, patients were asked to rate on a seven-point scale how active they felt their BD had been over the preceding 4 weeks and on the day of assessment. The clinicians also completed a seven-point rating scale to record their opinion of overall activity. Only new symptoms over the preceding 4 weeks that the clinicians felt were due to BD were scored. Standardized questions were developed for all parts of the form. For use during routine clinical practice, changes to current medication could also be documented. The layout and instructions for scoring are shown in Fig. 1a and b, respectively.
Statistical analysis
The test–retest reliability of the scoring was assessed with respect to two properties: bias and agreement. The presence of bias relates to the systematic deviation between observers in their scoring patterns, while the level of agreement reflects the extent of random differences in scoring between the five observers. A high level of reliability for the form constitutes a high level of agreement in the score for each organ system in the absence of bias.
The level of agreement for each organ system was assessed by calculating the kappa statistic. This is the chance-corrected measure of agreement with values close to one indicating a high level of agreement (Table 1). For dichotomous variables, such as assessment of eye, nervous system and major vessel involvement, the calculation of the kappa statistic between two observers is a simple procedure. An overall kappa for all five observers was also calculated.
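As an illustration of the chance-corrected agreement described above, the following is a minimal sketch in Python. It assumes Cohen's kappa for a pair of observers and Fleiss' kappa for the overall value across several observers; the text does not state which multi-observer formulation was used, so the choice of Fleiss' kappa is an assumption. The function names and the example ratings are hypothetical and are not data from this study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers on 0/1 ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each observer's marginal proportions.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def fleiss_kappa(ratings):
    """Overall kappa for several observers (assumed here: Fleiss' kappa).

    ratings[i] lists the category each observer assigned to patient i;
    every patient must be rated by the same number of observers.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing observer pairs for this patient.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_subjects
    p_expected = sum(
        (total / (n_subjects * n_raters)) ** 2 for total in category_totals.values()
    )
    return (p_bar - p_expected) / (1 - p_expected)


# Hypothetical eye-involvement ratings (1 = involved) for ten patients.
observer_1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
observer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(round(cohen_kappa(observer_1, observer_2), 2))  # 0.58
```

In this example the two observers agree on 8 of 10 patients (observed agreement 0.80), while agreement expected by chance is 0.52, giving kappa = 0.28/0.48, or about 0.58.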
The possibility of bias was also considered separately for each organ system. The proportions of patients who were scored as having involvement within each of the dichotomous variables (eye, CNS and major vessel involvement) were compared using the χ² statistic (values >3.84 indicating bias). For the organ systems that were rated using a five-point scale, the distribution of scoring was compared using the ANOVA method to detect any clinician consistently scoring higher or lower for an organ system. The strength of any bias is given as a P value. Finally, comparison of the clinicians' and patients' perception of overall disease activity was calculated using the ICC.
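The bias checks and the ICC described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study. It assumes a Pearson χ² on a 2×2 table without continuity correction for a pair of observers, a two-way ANOVA layout (patients × observers) for the F ratio, and the two-way random-effects, absolute-agreement, single-rater form of the ICC; the text does not specify which ICC form was used. All function names and data values are hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]].

    Rows are two observers; columns are 'involved' / 'not involved'.
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def anova_bias_and_icc(scores):
    """Return (icc, f_observers) for scores[i][j] = patient i, observer j.

    f_observers is the between-observer F ratio (systematic bias);
    icc is the two-way random-effects, absolute-agreement, single-rater
    form, which is an assumed choice rather than the study's stated method.
    """
    n = len(scores)       # number of patients
    k = len(scores[0])    # number of observers
    grand_mean = sum(map(sum, scores)) / (n * k)
    patient_means = [sum(row) / k for row in scores]
    observer_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_patients = k * sum((m - grand_mean) ** 2 for m in patient_means)
    ss_observers = n * sum((m - grand_mean) ** 2 for m in observer_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_patients - ss_observers

    ms_patients = ss_patients / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    f_observers = ms_observers / ms_error
    icc = (ms_patients - ms_error) / (
        ms_patients + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )
    return icc, f_observers


# Hypothetical counts: observer A scores 12 of 19 patients as involved,
# observer B scores 5 of 19 as involved.
chi2 = chi_squared_2x2(12, 7, 5, 14)
print(chi2 > 3.84)  # True: exceeds the threshold quoted in the text

# Hypothetical scores for six patients (rows) from four observers (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc, f_observers = anova_bias_and_icc(example_scores)
```

In the hypothetical score table the second observer scores consistently lower than the others, so the between-observer F ratio is large and the absolute-agreement ICC is low, even though the observers rank the patients similarly. This mirrors the distinction drawn above between bias and agreement.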
Results
Discussion
Several instruments have been developed worldwide incorporating broadly similar organ system subscales. In some instruments, the measurement of disease activity relies solely on clinical features [2], while others include laboratory investigations, and changes in body weight and temperature. Although changes in body weight and temperature may indicate systemic activity, we have found them neither specific nor sensitive enough as markers for disease activity. Similarly, haemoglobin and ESR do not correlate well with activity [5].
In common with many other rheumatic diseases, the clinical features in BD vary considerably over time. In order to document this variation, new clinical features present over the preceding 28 days are scored using the BDCAF. This represents a compromise between assessing disease activity based on (a) clinical features on the day of assessment, which may be unrepresentative of overall disease activity and (b) clinical features present over a longer time period, as in the IBDDAM, which reduces reliability in terms of accurate recall of symptoms by the patient.
The disease activity rating for oral and genital ulceration, and skin lesions using the BDCAF relies solely on the duration of symptoms and does not take into account the size or number of lesions present, which might also reflect activity. Unfortunately, although documentation of the latter may be more representative of activity, its reliability is likely to be poor because of the difficulty for patients in recalling these symptoms accurately (the number and size of ulcers/skin lesions). There was good agreement between assessors in the scoring of oral and genital ulceration, and joint and skin manifestations using duration of symptoms alone. Although duration alone may not encapsulate all the features of activity in these organ systems, this scoring appears to be relatively simple to use, reliable and free of bias. Fatigue is a common problem in patients with BD, although it is not known how it correlates with the other clinical features. Assessor 3 consistently rated the fatigue symptoms lower than the other four assessors. The results from this study confirm that clinicians may differ in whether they attribute fatigue to BD or to other conditions that may co-exist, such as fibromyalgia. Although the presence of fibromyalgia was not specifically identified in this study, further studies are needed to quantify fibromyalgic symptoms in patients with BD.
Surrogate indicators are required for lesions that are less visible. Routine direct scoring of gastrointestinal (GI) activity is difficult. The scale to assess GI tract activity is based on two questions designed to identify upper and lower GI tract inflammation. The scoring represents a compromise between ease of monitoring activity and the accuracy with which the answers to the questions reflect inflammation. The lack of agreement in rating the presence and duration of bloody diarrhoea suggests that it is often not easy to determine from history alone whether this symptom relates to mucosal inflammation or is merely the result of a combination of other conditions such as drug-related diarrhoea (e.g. colchicine) and bleeding haemorrhoids. This item was included in the BD proforma as a result of its use in a proforma for measuring clinical activity in inflammatory bowel disease by the gastroenterology service. Patients in whom mucosal inflammation was suspected were referred to the gastroenterology service for further investigation and advice.
The need for surrogate indicators of activity also applies to large-vessel and nervous system involvement. Currently, MRI does not have a role in the monitoring of neurological disease activity as MRI lesions can be identified in patients with [10] and without [11, 12] clinical evidence of CNS involvement. Therefore, assessment of CNS involvement in BD is based on clinical features. As CNS involvement may be a solitary event (e.g. stroke) or relapsing (e.g. aseptic meningitis), it seems appropriate to score activity based on the site of lesion (e.g. meningeal, hemispheric, basal ganglia, brain stem and spinal cord), bearing in mind that the pathophysiology of CNS involvement remains unclear. This system would have merit as the location of presumed pathology could be determined by clinical features and radiological imaging. Currently, there is not sufficient knowledge of the natural history of lesions at various sites to enable the severity of the type of involvement to be graded and, therefore, the BDCAF aims to record and categorize neurological events during the course of the disease. This study shows that while there was good agreement in terms of identifying new CNS involvement, there remained disagreement between the assessors on the probable location of lesions based on clinical history and examination. From a practical perspective, further opinion is sought from a neurologist when neurological symptoms or signs are present (particularly if they are new), which may then require the CNS score to be amended.
Although the skin pathergy reaction is highly specific for BD, there is considerable variation in the rate of positivity in patients from different geographical areas, which limits its clinical usefulness. A positive pathergy reaction is common in patients from Iran, Turkey and Japan, but rare in those from the UK, the USA and France. Although this test has diagnostic importance, there is no evidence to suggest that its presence correlates with disease activity. The main difficulties with using this test as a measure of disease activity are the lack of consensus on the procedure (e.g. the optimal number of needle pricks needed [13]) and the practicalities, in a routine out-patient clinic, of grading the reaction 2 days after administration. For these reasons, it was not included as part of the disease activity assessment in the BDCAF.
The reliability of scores obtained for the patient and physician perception of overall activity was only moderate and associated with bias. While an overall perception of activity is important, moderate reproducibility and the presence of bias may limit its usefulness.
The reliability and ease of use of this activity form have to be balanced against the validity of the questions being asked. If more serious manifestations can be predicted by readily assessed symptoms such as oral and genital ulceration, skin lesions, arthritis and superficial thrombophlebitis, then these organ system subscores could be expanded to provide a more accurate representation of activity. This study has highlighted the difficulty in reliably scoring uncommon manifestations such as large-vessel involvement, GI tract inflammation and nervous system involvement.
This new instrument offers an easy-to-complete and reliable method of assessing and documenting clinical activity in patients with BD for use in routine clinical practice. The proforma takes 5–10 min to complete (longer if detailed examination of the vascular tree or nervous system is required). For the purpose of research, treatment trials targeted at a specific organ system would require a more comprehensive measure of activity within that organ system. For instance, a study of treatment aimed at oral ulceration (e.g. thalidomide [14]) would not only require assessment of the duration of symptoms, but also other features which may relate to activity (number of ulcers, size of ulcers, number of crops of ulcers, site of ulcers, etc.). However, it is recommended that such detailed assessment is accompanied by this validated general disease activity instrument to alert the clinician conducting any trials to any advantageous or deleterious effects of the trial drug on other organ systems.
Acknowledgments
References