Editorial I

Development and use of scoring systems for assessment of clinical competence

R. J. Glavin and N. J. Maran

Scottish Clinical Simulation Centre, Stirling Royal Infirmary, Livilands Gate, Stirling FK8 2AU, UK

The assessment of clinical competence is one of the greatest challenges facing medicine today. A useful construct for the assessment of competence is provided in the Miller pyramid (Fig. 1).1



Fig 1 The Miller pyramid.
The first two stages, ‘knows’ and ‘knows how’, can be assessed using the traditional assessment tools of written and oral tests. However, ‘knowing’ and ‘knowing how’ do not necessarily translate into the application of knowledge in the workplace. To demonstrate clinical competence, assessment at levels 3 and 4 becomes more important, but also more challenging. Level 3, ‘shows how’, is currently assessed by practical examinations, observed long or short cases, or OSCE-style examinations. However, the only way to assess level 4, ‘does’, is to observe the practitioner at work in the real world.

Assessment tools developed for this purpose should allow us to compare performance with some pre-existing standard. They should also allow us to identify deficiencies in the performance of the person being assessed, so that subsequent training can be targeted to the areas of greatest need. This requires that we break performance down into more manageable and identifiable components, and this is normally achieved using scoring systems.

The first challenge is therefore to identify the important constituent parts of good medical practice that should make up the elements of the scoring system. We can examine our current practice and seek out those components necessary for good performance. In this issue, Forrest and colleagues2 describe the use of a modified Delphi technique to develop a scoring system for assessment. First used in the 1950s,3 this technique seeks to gain a consensus from a group of experts in response to an open-ended question. It has become popular in a wide range of fields, including economics and social policy as well as health care. The Delphi technique involves sending an initial questionnaire to a group of identified experts. This generates a variety of responses and ideas, which are collated and form the basis of a second questionnaire sent to the same group of experts. Subsequent responses are handled statistically to produce relative frequencies, and continued review by the expert group gives its members the opportunity to change their minds or to add further items. This process is repeated until consensus is reached, which may require as many as 10 rounds, although this is often reduced when the modified technique is used. Keeney and colleagues have recently published a critical review of the use of the technique in nursing research.4

While the Delphi technique works well to identify the key technical skills involved in a process such as rapid sequence induction, one drawback of asking experts to identify all the attributes of clinical competence is that they may not be conscious of the full range of these components. Many aspects of expert performance have become intuitive5 and may not be readily accessible to individuals. Much of anaesthetic practice involves a cognitive component, which is also not readily accessible using techniques such as a Delphi process. This raises the question of how completely the Delphi technique can provide the material from which a scoring system for assessing clinical competence is compiled. Any scoring system that attempts to address the assessment of clinical competence clearly has to address both technical and non-technical skills.
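The round-by-round handling of responses can be sketched schematically. The short Python fragment below is an illustration only, not the procedure used by Forrest and colleagues: the item names, the panel size and the 70% agreement threshold are all invented for the example. It simply shows how endorsements returned by a panel might be reduced to relative frequencies, and how items reaching consensus could be separated from those carried into the next round.

from collections import Counter

# Hypothetical sketch of a single Delphi round: each expert returns the set of
# candidate items he or she regards as essential; items endorsed by at least
# `threshold` of the panel are treated as having reached consensus.
def delphi_round(responses, threshold=0.7):
    n_experts = len(responses)
    counts = Counter(item for expert in responses for item in expert)
    freqs = {item: counts[item] / n_experts for item in counts}
    agreed = {item for item, f in freqs.items() if f >= threshold}
    undecided = set(freqs) - agreed
    return freqs, agreed, undecided

# Invented example: five experts endorsing items for a rapid sequence induction checklist.
round_1 = [
    {"preoxygenation", "cricoid pressure", "suction available"},
    {"preoxygenation", "cricoid pressure"},
    {"preoxygenation", "suction available", "tilting trolley"},
    {"preoxygenation", "cricoid pressure", "suction available"},
    {"cricoid pressure", "suction available"},
]
freqs, agreed, undecided = delphi_round(round_1)
# `undecided` items, together with the relative frequencies, would be fed back to
# the same panel in the next questionnaire until consensus is reached or a fixed
# number of rounds has elapsed.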

The technique of cognitive task analysis is designed to access the key non-technical skills which underpin performance, such as decision making, problem solving, attention allocation, planning and workload management. It uses a combination of methods, such as observation and interview, to ‘unpack and make explicit the expert knowledge that is often implicit and difficult for analysts to observe or for experts to verbalise’.6 These methods are clearly more time-consuming and require specialist expertise, but they have been used extensively in other high-reliability industries and, more recently, in anaesthesia to identify key components.7

The identified components must be assembled into a scoring system that satisfies the characteristics of a good assessment tool, namely validity, reliability and feasibility.8 The scoring system must then undergo pilot testing to demonstrate that it has these characteristics. Forrest and colleagues have addressed all of these issues.2 The content validity (the extent to which the assessment representatively samples what it is supposed to be measuring) of their scoring system is high, given the methods used to develop their tool. Construct validity (the extent to which a new measure is related to specific variables in accordance with a hypothetical construct) is also high, given that they were able to demonstrate improved scores with increasing clinical experience. They were also able to demonstrate good inter-rater reliability, using multiple assessors to score the same performance. They have demonstrated the potential role of the simulator in the testing of an assessment tool, and they established the feasibility of using their scoring system to rate performance from video recordings of work in the simulator. However, such a detailed scoring system may be more challenging to use during real-time observation in clinical practice.
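Inter-rater reliability is typically summarized with an agreement statistic. The fragment below is a minimal sketch of one common choice, Cohen's kappa for two raters using a categorical scale; the pass/fail ratings and the two assessors are invented for illustration, and this is not necessarily the statistic reported by Forrest and colleagues.

from collections import Counter

# Minimal sketch of Cohen's kappa: chance-corrected agreement between two raters
# scoring the same set of performances on a categorical scale.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: two assessors awarding pass/fail to ten videotaped performances.
a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
b = ["pass", "pass", "fail", "pass", "pass", "pass", "pass", "fail", "pass", "fail"]
print(round(cohens_kappa(a, b), 2))  # approx 0.52; values near 1 indicate strong agreement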

The final challenge of developing a new assessment tool is to ensure that the subjects have confidence not only in the tool itself but also in those carrying out the assessment. This becomes increasingly important as we venture into assessment outside the traditional framework of knowledge and skills, and as the stakes involved increase. It requires that those carrying out the assessment have had sufficient training in its application to ensure not only that the assessment is fair but also that it is seen to be fair. The higher the stakes, the more important it is that all of these challenges have been met properly. A judgement of clinical incompetence, and its impact on the career of the subject, must be defensible against legal challenge. The assessment systems of the future will not only be used to ensure the competence of our trainees, but will also stand to become the tools of revalidation. Every one of us will then be subjected to them. It is time to take assessment seriously.

References

1 Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990; 65 (Suppl): S63–7

2 Forrest FC, Taylor MA, Postlethwaite K, Aspuell R. Testing validity of a high-fidelity simulator for assessment of performance: development of a technical performance scoring system and its application in the assessment of novice anaesthetists. Br J Anaesth 2002; 88: 338–44

3 Grbich C. Qualitative Research in Health: an Introduction. London, UK: Sage Publications, 1999

4 Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. Int J Nurs Stud 2001; 38: 195–200

5 Atkinson L. Trusting your own judgement (or allowing yourself to eat the pudding). In: Atkinson L, Claxton G, eds. The Intuitive Practitioner. Buckingham, UK: Open University Press, 2000; 53–65

6 Seamster TL, Redding RE, Kaempf GL. Applied Cognitive Task Analysis in Aviation. Aldershot, UK: Avebury Aviation, 1997

7 Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Final Report: Development of a Behavioural Marker System for Anaesthetists’ Non-Technical Skills (ANTS). [Grant Report for SCPMDE, project reference RDNES/991/C]. Aberdeen, UK: University of Aberdeen, 2001

8 Joint Centre for Education in Medicine. The Good Assessment Guide: A Practical Guide to Assessment and Appraisal for Higher Specialist Training. London, UK: Joint Centre for Education in Medicine, 1997; 34–39