Imperial College, Paterson Centre, London, UK
Parkside Health NHS Trust, Kingsbury Community Unit, Brent, London, UK
Department of Psychological Medicine, Imperial College, London, UK
Correspondence: Sherva Cooray, Consultant Psychiatrist, Parkside Health NHS Trust, Kingsbury Community Unit, Honeypot Lane, London NW9 9QY, UK
* Paper presented at the second conference of the British and Irish Group for
the Study of Personality Disorders (BIGSPD), University of Leicester, UK, 31
January to 3 February 2001.
ABSTRACT
Aims To evaluate the reliability of GAF in the assessment of learning disability.
Method GAF reliability was tested by simultaneous multiple rating of unselected case vignettes (n=19-25) from health professionals of different disciplines, under controlled conditions. Analysis of reliability was made with the intraclass correlation coefficient (RI) with separate assessments to determine rater bias and individual performance of raters.
Results The results of three data-sets showed generally poor overall levels of agreement, with RI levels of 0.35 and 0.28 and somewhat better levels for current GAF scores (RI=0.49). However, a subset of raters was identified that achieved much higher levels (RI=0.54 to 0.74).
Conclusions The GAF, in its current format, is not reliable enough to be used in the routine assessment of learning disability. A subgroup of raters, however, have ratings that are, by current biostatistical criteria, sufficiently reliable.
INTRODUCTION
METHOD
Each phase of the study included the following stages: the selection of vignettes; explanation of the scoring system and of the completion of ratings; and analysis of data.
In a first phase, preliminary testing of a modified form of the GAF scale with more tightly defined anchor points (Hall, 1995) was carried out on 48 vignettes of clients with mild to moderate learning disability by 19 raters. In a second preliminary phase, the original GAF scale was used and training given to all 25 raters. The second data-set included 38 case vignettes of clients with severe learning disability. Although the 38 case vignettes were prepared to specific World Health Organization (2002) guidelines, not all provided information on the clients' current clinical presentation, so only the worst symptomatology scores were recorded for this data-set.
Selection of vignettes
Case vignettes were selected from the case-load of 12 senior psychiatrists
to represent the heterogeneous psychopathology in people with learning
disability. This process ensured that there was a representative selection of
case material that was heterogeneous in nature but which correctly reflected
current practice and documentation in the catchment area. The psychiatrists
were asked to include a summary of the presenting problem, history findings
and course and treatment-response information, although the last of
these was optional.
Scoring procedure
The vignettes were assessed independently and simultaneously by 19
professionals in a first phase (Table 1) and 25 in a second phase
(Table 2). In the first phase,
all participants received written course material and 2 hours' common
introduction to scoring the Modified GAF scale. In the second phase, they
received written course material and 2 hours' common introduction to the
scoring of the original GAF. The training emphasised that both scales were
continuous and the anchor points were only guides; and that although all forms
of disability and symptomatology should be assessed, some allowances should
normally be made for the intellectual level of the subject concerned when
scoring her/his function. For each vignette, during the first phase the
assessor was asked to record the GAF score both currently and at the time of
greatest dysfunction or worst score (the choice about this time being left to
the assessor). During the second phase, the assessor was asked to record only
the worst score.
Analysis of data
All data were analysed for interrater reliability using the intraclass
correlation coefficient (Bartko, 1966). This is appropriate for the
assessment of continuous data and allowance is made for chance association in
calculating agreement. Using a computer program BigRi (Cicchetti & Showalter,
1988), both overall levels of agreement and rater bias were assessed for the
raters. We also applied a new reliability statistic that assesses examiner
agreement and bias in ratings on a case-by-case basis (Cicchetti et al, 1997,
1999; Cicchetti & Showalter, 1997; Baca-Garcia et al, 2001). The step-by-step
method for data analysis is described in Table 3.
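As an illustration of the kind of calculation involved, a minimal sketch of an intraclass correlation for a vignette-by-rater table of GAF scores is given below. It is not the BigRi program, whose internals are not described here. The function name `icc_oneway`, the choice of the one-way random-effects model and the toy scores are all assumptions made for illustration only.

```python
def icc_oneway(ratings):
    """One-way random-effects intraclass correlation (illustrative sketch).

    ratings: list of rows, one row per vignette, one column per rater.
    Every vignette is assumed to have been scored by the same number of raters.
    """
    n = len(ratings)          # number of vignettes
    k = len(ratings[0])       # number of raters per vignette
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-vignette and within-vignette sums of squares
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (score - m) ** 2 for row, m in zip(ratings, row_means) for score in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: three vignettes, two raters, second rater always 10 higher
toy_scores = [[10, 20], [50, 60], [90, 100]]
print(round(icc_oneway(toy_scores), 3))
```

In this one-way form, a constant difference between raters is counted as disagreement, so the toy example falls short of 1 even though the two raters rank the vignettes identically. Other intraclass models treat rater effects differently.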
RESULTS
DISCUSSION
There was considerable rater bias in the assessments of GAF scores, with a wide variation between mean scores for each rater. The variation was associated with poorer agreement. The fact that there was concordance between reliable and unreliable raters suggests that achieving good or poor reliability is not a chance event and is probably accounted for by different perceptions of the GAF scale in its current form.
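One simple way to look at rater bias of this kind is to compare each rater's mean score with the mean across all raters. The sketch below is an illustration only and is not the procedure used in the study; the function name `rater_bias` and the toy scores are hypothetical.

```python
def rater_bias(ratings):
    """Deviation of each rater's mean score from the mean of all rater means.

    ratings: list of rows, one row per vignette, one column per rater.
    A positive value means the rater tends to score higher than the group.
    """
    n = len(ratings)          # number of vignettes
    k = len(ratings[0])       # number of raters
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    overall_mean = sum(rater_means) / k
    return [m - overall_mean for m in rater_means]


# Hypothetical scores: three vignettes, two raters, second rater always 10 higher
toy_scores = [[10, 20], [50, 60], [90, 100]]
print(rater_bias(toy_scores))
```

A wide spread in these deviations corresponds to the wide variation between rater means described above.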
The findings are similar to those of Loevdahl & Friis (1996), who estimated the level of GAF agreement with 104 raters from 6 therapeutic centres in their assessment of 5 clinical case vignettes. Systematic differences between centres were up to 6 points, and the authors concluded that GAF reliability was unsatisfactory in routine clinical settings. However, Rey et al (1995), using well-trained raters, reported interrater reliability ranging from 0.83 to 0.87 for the GAF of general psychiatric patients in a clinical setting. The reliability and the validity of the GAF were also tested by Jones et al (1995) with psychiatric patients, and their trained raters had an interrater reliability score of 0.72 for the GAF in total.
Several methods could improve agreement in learning disability. These include:
We conclude that, although in its present form the GAF scale is not suitable for general learning disability use, it is none the less possible to identify from among a larger pool of independent examiners those whose ratings are, by current biostatistical criteria, sufficiently reliable for both clinical and research applications. Specifically, we have been able to find and cross-validate subsets of reliable raters (RI values between 0.53 and 0.74) from among a larger pool of clinical examiners.
Clinical Implications and Limitations
LIMITATIONS
APPENDIX
Problems include:
The above problems have been present over most of his life since adolescence. Longitudinal monitoring of his behaviour indicates that there is a definite waxing and waning of the intensity, and the pattern appears to be cyclical regardless of environmental and other variables. Functional analysis demonstrates that there is also a clear relationship to attention-seeking and staff changes.
History
C comes from a close-knit but disorganised, large family. Very little is
known about his natural father who left home when C was an infant. Early
history is sparse, except that his mother had a prolonged labour. He was
described as slow and difficult from childhood. Speech was limited to the odd
word and noises. At the long-stay institution he continued to be disruptive
and aggressive towards other people. From the age of 12 he was sexually active
and needed constant supervision in the mixed children's ward to prevent
attacks on both male and female children. He was admitted to a community
children's unit for people with severe learning disability (National Health
Service) and subsequently to an assessment-treatment facility where he
has remained in view of his complex needs. Intensive work within the unit has
resulted in considerable improvement of his activities of daily living and
communication.
Findings
On examination, C is a well-built man who is likely to be intimidating to
strangers or, alternatively, over-friendly. He has no dysmorphic features. He
has limited eye contact and is able to communicate his basic needs using
single words or very short sentences in conjunction with Makaton signs.
Attention span is limited. He likes repetitive movements and flicking as well
as ritualistic tapping and slapping. Likes playing with his bodily fluids.
Does not like changes in routine, repeats the same words and sounds. He enjoys
music, especially rhythms with a strong beat. Periodically he becomes
persistently over-excited, when meaningful communication is replaced by
increased episodes of hooting, screaming and constant slapping as well as
sexual over-arousal. At such times his sleep pattern becomes even more
disrupted, reducing from about 3-5 hours at night to sometimes less than 1
hour. Despite this he does not appear to be tired. Since his speech improved,
staff have commented that he goes through his whole repertoire of language
parrot-fashion repeatedly. Self-injurious behaviour is common and he appears
to have a very high pain threshold.
Course
Management has particular emphasis on social-skills training. The behaviour
problems have responded in a limited way as a result of the specialist input,
structure and discipline, within the unit. Nevertheless, he continues to need
intensive supervision at all times and has been detained under Section 3 of
the Mental Health Act since 1990, following a serious physical attack on a
fellow resident. The cyclicity of his hyperactivity inclusive of escalation of
behaviour problems and sleep disorder has been much reduced by the current
regimen of medication.
ACKNOWLEDGMENTS
REFERENCES
Baca-Garcia, E., Blanco, C., Saiz-Ruiz, J., et al (2001) Assessment of reliability in the clinical evaluation among investigators in a multi-center clinical trial. Psychiatry Research, 102, 163-173.
Bartko, J. J. (1966) The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19, 3-11.
Bech, P., Haaber, A., Joyce, C. B., et al (1986) Experiments on clinical observation and judgement in the assessment of depression: profiled videotapes and judgement analysis. Psychological Medicine, 16, 873-883.
Cicchetti, D. V. & Sparrow, S. S. (1981) Developing criteria for the rating of specific items in a given inventory. American Journal of Mental Deficiency, 86, 127-137.
Cicchetti, D. V. & Showalter, D. (1988) A computer program for determining the reliability of dimensionally scaled data when the numbers and specific sets of examiners may vary at each assessment. Educational and Psychological Measurement, 48, 717-720.
Cicchetti, D. V. & Showalter, D. (1997) A computer program for assessing inter-examiner agreement when multiple ratings are made on a single subject. Psychiatry Research, 72, 65-68.
Cicchetti, D. V., Showalter, D. & Rosenheck, R. (1997) A new method for assessing inter-examiner agreement when multiple ratings are made on a single subject: applications to the assessment of neuropsychiatric symptomatology. Psychiatry Research, 72, 51-63.
Cicchetti, D. V., Rosenheck, R., Showalter, D., et al (1999) Inter-rater reliability levels of multiple clinical examiners in the evaluation of a schizophrenic patient. Quality of life: level of functioning and neuropsychological symptomatology. Clinical Neuropsychologist, 13, 157-170.
Endicott, J., Spitzer, R. L., Fleiss, J. L., et al (1976) The Global Assessment Scale. Archives of General Psychiatry, 33, 766-771.
Hall, R. C. (1995) Global Assessment of Functioning: a modified scale. Psychosomatics, 36, 267-275.
Hjortso, S., Butler, B., Clemmesen, L., et al (1989) The use of case vignettes in studies of inter-rater reliability of psychiatric target syndromes and diagnoses: a comparison of ICD-8, ICD-10 and DSM-III. Acta Psychiatrica Scandinavica, 80, 632-638.
Jones, S. H., Thornicroft, G., Coffey, M., et al (1995) A brief mental health outcome scale: reliability and validity of the Global Assessment of Functioning (GAF). British Journal of Psychiatry, 166, 654-659.
Loevdahl, H. & Friis, S. (1996) Routine evaluation of mental health: reliable information or worthless guesstimates? Acta Psychiatrica Scandinavica, 93, 125-128.
Luborsky, L. (1962) Clinicians' judgements of mental health. A proposed scale. Archives of General Psychiatry, 7, 407-417.
Rey, J. M., Starling, J., Wever, C., et al (1995) Inter-rater reliability of global assessment of functioning in a clinical setting. Journal of Child Psychology & Psychiatry & Allied Disciplines, 36, 787-792.
Tyrer, P., Evans, K., Gandhi, N., et al (1998) Randomised controlled trial of two models of care for discharged psychotic patients. BMJ, 316, 106-109.
World Health Organization (2002) The International Classification of Mental and Behavioural Disorders (ICD-10 Chapter V) Educational Kit (App. 2). http://www.who.int/msa/ems/icd10/icd10ekit/intro.htm# contents.