1 Department of Psychology, University of Aberdeen, King's College, Aberdeen AB24 2UB, UK. 2 Scottish Clinical Simulation Centre, Stirling Royal Infirmary, Livilands Gate, Stirling FK8 2AU, UK. 3 Department of Anaesthesia, Aberdeen Royal Infirmary, Foresterhill, Aberdeen AB25 2ZN, UK
Declaration of interest: The ANTS system was developed under research funding from the Scottish Council for Postgraduate Medical and Dental Education, now part of NHS Education for Scotland, through grants to the University of Aberdeen from September 1999 to August 2003. The views presented in this paper are those of the authors and should not be taken to represent the position or policy of the funding body.
Accepted for publication: January 17, 2003
Abstract
Method. The Anaesthetists' Non-Technical Skills (ANTS) system prototype comprises four skill categories (task management, team working, situation awareness, and decision making) divided into 15 elements, each with example behaviours. To investigate its experimental validity, reliability and usability, 50 consultant anaesthetists were trained to use the ANTS system. They were asked to rate the behaviour of a target anaesthetist using the prototype system in eight videos of simulated anaesthetic scenarios. Data were collected from the rating forms and an evaluation questionnaire.
Results. The results showed that the system is complete, and that the skills are observable and can be rated with acceptable levels of agreement and accuracy. The internal consistency of the system appeared sound, and responses regarding usability were very positive.
Conclusions. The findings of the evaluation indicated that the ANTS system has a satisfactory level of validity, reliability and usability in an experimental setting, provided users receive adequate training. It is now ready to be tested in real training environments, so that full guidelines can be developed for its integration into the anaesthetic curriculum.
Br J Anaesth 2003; 90: 580–8
Keywords: anaesthetists; education, training
Introduction
It is first necessary to identify the skills required for a specific job (and operational environment) using appropriate task analysis techniques.11–13 It is also essential to be able to assess these skills explicitly to provide structured feedback about performance14 and to allow training effectiveness to be evaluated.15 To meet a similar need for objective and transparent methods of assessing CRM (non-technical) skills, the aviation industry developed behavioural marker systems.16 17 Behavioural markers are observable, non-technical behaviours that contribute to superior or substandard performance within a work environment.17 Derived from empirical data, they are usually developed into structured skill taxonomies and combined with a rating scale to allow the skills, which are demonstrated through behaviour, to be assessed by trained, calibrated raters. In addition to providing a tool for assessing aspects of performance traditionally judged on 'gut feeling', behavioural marker systems supply a common language for discussing non-technical skills and can function as frameworks to structure teaching and debriefing.
Behavioural markers are already being used in the medical domain.10 18 19 However, these tools have mainly been developed outside the UK and from existing aviation systems, e.g. the Line/Line Operational Simulation Checklist (LLC),20 for specific purposes, e.g. to investigate team performance,20 and to measure particular aspects of performance, e.g. crisis management.19 Cultural differences at the organizational, professional or national level have been found to have a considerable impact on crew resource management attitudes and behaviour,20 and so should be taken into account when developing a behavioural marker system. Until now, there have been very few attempts to design a marker system for anaesthetists' non-technical skills from first principles, based on a systematic analysis of their task requirements, and none of these has been conducted in the UK. Fewer studies still have sought to evaluate empirically the measurement properties of such behavioural marker systems, yet unless a behavioural marker system is valid and reliable it has little value as an assessment tool.17 21 The aim of this study was to investigate the experimental validity, reliability and usability of the Anaesthetists' Non-Technical Skills (ANTS) system.
Method
It is widely recognized that raters need to be trained in order to assess non-technical skills.17 36 37 This is particularly important for the ANTS system, where users do not have knowledge of the system and are not experienced in making explicit assessments of non-technical skills. While necessarily constrained because of time availability, the training package was developed to address components previously established as being effective with behavioural performance measures.36 Training was provided by a psychology researcher with assistance from a consultant anaesthetist. The course consisted of: (i) background on human factors and non-technical skills, including information about human error, threat management and crew resource management training; (ii) an introduction to the ANTS system and how to make behavioural assessments, which included detailed descriptions of the categories, elements and behavioural markers, supported by showing video snippets of examples; and (iii) instructions for rating non-technical skills, possible biases to avoid, and practice in scoring two scenarios with the full system and rating scale. Importantly, participants were told that, while the layout of the categories and elements in the table may suggest a temporal sequence, this is not necessarily meant to reflect an ordering priority when making their observations. Participants were sent a booklet describing the full ANTS system in advance and were able to use this for reference throughout the evaluation. No attempt was made to calibrate the raters to a standard scoring. Not only would this have taken a considerable amount of time, but it would also have resulted in an evaluation of the calibration process and the ANTS system, not just of the ANTS system. Hence a calibration phase was excluded to prevent compromising the data.
Materials for data collection were a set of rating forms and an evaluation questionnaire. The rating forms showed the ANTS system elements and categories with a four-point scale (Fig. 3). Each point of the scale had a descriptor to provide guidance on when it should be used. An additional rating option of 'not observed' was provided for when the skills could not be identified in a particular situation, because either they did not need to be used or they could not be detected from behaviour. Separate element and category rating forms were supplied for each scenario. The evaluation questionnaire was divided into five parts plus an 'other comments' section: (i) 10 general questions about the completeness and design of the system and the observability of the non-technical skills; (ii) four questions asking about the rating scale; (iii) five questions about the training; (iv) three questions about the video scenarios; and (v) five questions about the role of the ANTS system. A separate background information questionnaire was used to collect data on experience as a consultant anaesthetist, involvement in training and assessment, and knowledge of human factors.
Procedure
Each session consisted of approximately 4 h of training, as described above, followed by 3 h for rating the eight experimental video scenarios. As practised in the training phase, ratings were made of the non-technical skills of the main anaesthetist in each scenario using the ANTS system rating forms. Participants were instructed to watch the scenario first, making notes of key behaviours observed if necessary, and, once the scenario was over, to rate the observed elements on the rating form. Having scored the elements, participants then rated the higher-level ANTS categories. All ratings were made individually and participants were not permitted to discuss their scores with others. At the end of the session, participants completed the evaluation questionnaire.
Data from the rating forms were transferred into SPSS (Statistical Package for Social Sciences; SPSS, Chicago, IL, USA) and data from the questionnaires into Microsoft Access and Excel. A number of analyses were conducted on the ratings data according to the hypothesis being tested (Table 1). The nature of the ratings data was such that for most of the analyses each scenario was examined separately (scores were expected to vary across the different scenarios and so averaging them would render them meaningless). To provide an overall result to test the hypotheses, an average was taken of the results from each of the eight scenarios.
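The reasoning for analysing each scenario separately and then averaging can be illustrated with a minimal Python sketch. The ratings below are invented for illustration, and the choice of standard deviation as the per-scenario statistic is an assumption; the study's analyses were run in SPSS and are listed in Table 1.

```python
from statistics import mean, pstdev

# Hypothetical ratings (four-point scale) of one element by four raters.
# The target's performance differs by design between scenarios.
ratings_by_scenario = {
    "scenario_1": [4, 4, 3, 4],  # target performs well
    "scenario_2": [1, 2, 1, 1],  # target performs poorly
}

def per_scenario_then_average(data, statistic):
    """Apply `statistic` within each scenario, then average the results."""
    return mean(statistic(scores) for scores in data.values())

# Spread among raters, computed within each scenario and then averaged.
within_spread = per_scenario_then_average(ratings_by_scenario, pstdev)

# Spread computed on pooled raw scores mixes rater disagreement with
# genuine differences between scenarios, so it overstates disagreement.
pooled_spread = pstdev(
    [score for scores in ratings_by_scenario.values() for score in scores]
)
```

In this toy data the raters disagree very little within either scenario, yet the pooled spread is large because the two scenarios depict different levels of performance.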
Results
As a considerable amount of data was analysed,38 only the key findings for each evaluation criterion (validity, reliability and usability) are described. These are shown in Tables 2–4.
Discussion
The reliability of the prototype was investigated from a number of perspectives. In the ANTS system, it was expected that the elements within each category would be closely related to each other (internal consistency) and that the individual elements would be related to their own categories better than to other categories. The results from the study support this and show that the structure of the ANTS system appears sound.
Inter-rater reliability is of particular concern amongst practitioners. In spite of their limited familiarity with the ANTS system, participants were still able to use the system with a reasonable level of agreement. Indeed, the levels reached were higher than might have been expected, especially with such a large sample. Across the whole system, Situation Awareness showed a slightly lower level of agreement than other categories. This is not surprising: it is a cognitive skill, which makes observable behaviours more difficult to detect, and it is not a concept currently described in UK anaesthetic training. The levels of rater agreement for the ANTS system shown in this preliminary test exercise are not as high as those recommended for trained non-technical skill assessors.37 Nevertheless, they provide a good indication of the basic reliability afforded by the system with minimal training (half a day). When users become more familiar with both the system and the rating task, inter-rater reliability can be expected to improve. Research into behavioural marker systems has shown that, with comprehensive training and calibration (2–3 days for people already familiar with human factors concepts), inter-rater agreement can be increased to above 0.7.37 For the ANTS system, it will be important to establish the amount of training that is practical whilst allowing users to develop the skills to use the tool at the necessary level of inter-rater reliability.
The last psychometric property investigated in this evaluation is rater accuracy. This is the degree of raters' agreement with the baseline reference. The levels of accuracy achieved by the consultants using the ANTS system were acceptable; that is, averaged across scenarios, 88–97% of raters matched the reference rating to within 1 scale point (Table 3). The participants suggested that limitations in accuracy occurred as a consequence of not knowing where to set the boundaries for each scale point and should therefore be resolvable with training and calibration. A previous study identified difficulties for intra-rater reliability when raters had to average performance across longer periods.19 This could yet be encountered when the ANTS system is used in real training situations.
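As an illustration of how an agreement figure such as 0.7 can be obtained, the sketch below computes the single-item within-group agreement index r_wg of James, Demaree and Wolf (refs 32, 33). It is an assumption that this index is the one behind the figures discussed here; the function name and example ratings are hypothetical.

```python
from statistics import variance

def rwg(ratings, scale_points=4):
    """Single-item r_wg: 1 - (observed variance / variance expected by chance).

    The chance model is a uniform distribution over the scale points,
    whose variance is (A**2 - 1) / 12 for an A-point scale.
    """
    expected_var = (scale_points ** 2 - 1) / 12
    observed_var = variance(ratings)  # sample variance across raters
    # Values can fall below 0 when raters disagree more than chance;
    # some authors truncate such values to 0, which is not done here.
    return 1 - observed_var / expected_var

# Six hypothetical raters scoring one element in one scenario.
example_agreement = rwg([3, 3, 4, 3, 2, 3])
```

For these six ratings the index is 0.68; identical ratings from every rater would give 1.0.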
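The accuracy measure described above (the share of raters within one scale point of the reference rating) can be sketched as follows. The function name and data are hypothetical, and excluding 'not observed' ratings (represented here as None) is an assumption, since the handling of such ratings is not described in this section.

```python
def share_within_tolerance(rater_scores, reference_score, tolerance=1):
    """Proportion of observed ratings within `tolerance` points of the reference."""
    # Assumption: 'not observed' ratings are coded as None and left out.
    observed = [score for score in rater_scores if score is not None]
    if not observed:
        raise ValueError("no observed ratings to compare")
    hits = sum(1 for score in observed if abs(score - reference_score) <= tolerance)
    return hits / len(observed)

# Ten hypothetical raters, one of whom marked the element 'not observed'.
example_scores = [3, 4, 2, 1, 3, None, 4, 3, 2, 3]
example_share = share_within_tolerance(example_scores, reference_score=3)
```

Here eight of the nine observed ratings lie within one point of the reference, so the share is 8/9.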
From the results of this evaluation, the ANTS system appears not only to have a high level of acceptability but also to provide a reasonable level of reliability and accuracy when used by anaesthetists in an experimental setting to rate non-technical skills demonstrated in simulator scenarios. There were some limitations in the study. The first is the use of scripted videos viewed in a controlled setting rather than live anaesthetic situations. Hence the current results refer to experimental evaluation and not to real-world testing. The second is that the participants in the study could only be given limited training and were not permitted to calibrate their ratings. This was reflected in their feelings of unfamiliarity with the system, the presence of boundary errors, and the level of rater agreement. Nonetheless, taken together, these experimental results show that the ANTS system has a satisfactory basic level of validity, reliability and usability. Once field testing has been undertaken and proper guidelines for its implementation have been produced, the ANTS system can become an important tool in anaesthetic training, supporting non-technical skills training and simulator-based human factors courses. It could also be used as a measure to allow the effectiveness of such training to be evaluated.39 Long-term feedback on the use of the system will allow broader conclusions to be drawn about its operational validity, and from these it will be possible to make recommendations about more advanced use.
Appendix 1
References
2 Helmreich R. Managing human error in aviation. Sci Am 1997; 276: 40–5
3 Wiener E, Kanki B, Helmreich R, eds. Cockpit Resource Management. San Diego, California: Academic Press, 1993
4 Helmreich RL. The evolution of Crew Resource Management. Paper presented at the IATA Human Factors Seminar, October 31, 1996, Warsaw, Poland; 5
5 Flin R, O'Connor P, Mearns K. Crew Resource Management: enhancing team performance in high reliability industries. Team Perform Manag 2002; 8: 68–78
6 Pizzi L, Goldfarb N, Nash D. Crew Resource Management and its applications in medicine. In: Shojania K, Duncan B, McDonald K, Wachter R, eds. Making Health Care Safer: A Critical Analysis of Patient Safety Practices. Washington, DC: Agency for Healthcare Research and Quality, 2001; Chapter 44: 511–19
7 Howard SK, Gaba DM, Fish KJ, Yang GS, Sarnquist FH. Anesthesia crisis resource management training: teaching anesthesiologists to handle critical incidents. Aviat Space Environ Med 1992; 63: 763–70
8 Maran N, Glavin R, Fletcher GCL. Training in human factors for anaesthetists in Scotland: identifying key skills and developing a training programme. Proceedings of the 7th European Forum on Quality Improvement in Health Care, Mar 21–23, 2002, Edinburgh, Scotland. London: BMJ Publishing Group, 2002
9 Sexton B, Marsch S, Helmreich R, Betzendoerfer D, Kocher T, Scheidegger D. Participant evaluation of Team Oriented Medical Simulation. In: Henson LC, Lee AC, eds. Simulators in Anesthesiology Education. New York: Plenum, 1998: 107–8
10 Small SD, Wuerz RC, Simon R, Shapiro N, Conn A, Setnik G. Demonstration of high-fidelity simulation team training for emergency medicine. Acad Emerg Med 1999; 6: 312–23
11 Kirwan B, Ainsworth LK, eds. A Guide to Task Analysis. London: Taylor & Francis, 1992
12 Seamster TL, Kaempf GL. Identifying resource management skills for airline pilots. In: Salas E, Bowers C, Edens E, eds. Improving Teamwork in Organisations: Applications of Resource Management Training. Mahwah, New Jersey: Lawrence Erlbaum, 2001; 9–30
13 Seamster TL, Redding RE, Kaempf GL. Applied Cognitive Task Analysis in Aviation. Aldershot: Avebury Aviation, 1997
14 Greaves JD, Grant J. Watching anaesthetists work: using the professional judgement of consultants to assess the developing clinical competence of trainees. Br J Anaesth 2000; 84: 525–33
15 Murray E, Gruppen L, Catton P, Hays R, Woolliscroft JO. The accountability of clinical education: its definition and assessment. Med Educ 2000; 43: 871–9
16 Flin R, Martin L. Behavioural markers for Crew Resource Management: a survey of current practice. Int J Aviat Psychol 2001; 11: 95–118
17 Klampfer B, Flin R, Helmreich RL, et al. Enhancing performance in high risk environments: recommendations for the use of behavioural markers. Ladenburg: Daimler-Benz Stiftung, 2001. Downloadable version available from: www.psyc.abdn.ac.uk/serv02: 10
18 Helmreich RL, Schaefer H-G, Sexton JB. Operating room checklist. Aerospace Crew Resource Project Technical Report 954. Austin, Texas: University of Texas, 1995
19 Gaba DM, Howard SK, Flanagan B, Smith BE, Fish KJ, Botney R. Assessment of clinical performance during simulated crises using both technical and behavioral ratings. Anesthesiology 1998; 89: 12347
20 Helmreich RL, Merritt AC. Culture at Work in Aviation and Medicine. Aldershot: Ashgate, 1998
21 Holt RW, Boehm-Davis DA, Beaubien JM. Evaluating resource management training. In: Salas E, Bowers C, Edens E, eds. Improving Teamwork in Organisations: Applications of Resource Management Training. Mahwah, New Jersey: Lawrence Erlbaum, 2001; 165–88
22 Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Rating non-technical skills: developing a behavioural marker system for use in anaesthesia. Cognition Technology and Work. (Submitted)
23 Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Final report: the identification and measurement of anaesthetists' non-technical skills. University of Aberdeen Grant Report for SCPMDE, 2001. Aberdeen: University of Aberdeen, 2001
24 Altmaier EM, From RP, Pearson KS, Gorbatenko-Roth KG, Ugolini KA. A prospective study to select and evaluate anesthesiology residents: phase 1, the critical incident technique. J Clin Anesth 1997; 9: 629–36
25 Department of Anaesthesia, University of Basel-Kantonspital. Kommunikations-Status (KOMSTAT), Operationssal-Beobachtungen, Ver. 2.0, 07/97. Personal communication, 2000; available from Swiss anaesthesia server: http://www.medana.unibas.ch [in German]
26 Flanagan JC. The Critical Incident Technique. Psychol Bull 1954; 51: 327–58
27 Klein GA, Calderwood R, MacGregor D. Critical decision method for eliciting knowledge. IEEE Trans Syst Man Cybern 1989; 19: 462–72
28 Strauss A, Corbin J. Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, California: Sage Publications, 1990
29 Flin R, Goeters KM, Hormann J, Martin L. A generic structure of non-technical skills for training and assessment. Proceedings of 23rd Conference of the European Association for Aviation Psychology, Sept 14–18, 1998, Vienna, Austria. Vienna, 1998
30 Flin R, Fletcher G, McGeorge P, Sutherland A, Patey R. Anaesthetists' attitudes to teamwork and safety. Anaesthesia 2003; 58: 233–42
31 O'Connor P, Hörmann H-J, Flin R, Lodge M, Goeters K-M, JARTEL Group. Developing a method for evaluating Crew Resource Management skills: a European perspective. Int J Aviat Psychol 2002; 12: 263–85
32 James LR, Demaree RG, Wolf G. Estimating within-group interrater reliability with and without response bias. J Appl Psychol 1984; 69: 85–98
33 James LR, Demaree RG, Wolf G. rwg. An assessment of within-group interrater agreement. J Appl Psychol 1993; 78: 306–9
34 Johnson PJ, Goldsmith TE. The importance of quality data in evaluating aircrew performance. US Federal Aviation Authority Technical Report, 1998. Available from Federal Aviation Authority website: www.faa.gov/avr/afs/aqphome
35 Goldsmith TE, Johnson PJ. Assessing and improving evaluation of aircrew performance. Int J Aviat Psychol 2002; 12: 223–40
36 Baker DP, Mulqueen C, Dismukes RK. Training raters to assess resource management skills. In: Salas E, Bowers C, Edens E, eds. Improving Teamwork in Organisations: Applications of Resource Management Training. Mahwah, New Jersey: Lawrence Erlbaum, 2001; 131–45
37 Williams DM, Holt RW, Boehm-Davis DA. Training for inter-rater reliability: baselines and benchmarks. Proceedings of the 9th International Symposium on Aviation Psychology, Apr 27–May 1, 1997, Columbus, Ohio. Columbus, Ohio: Ohio State University, 1997: 514–20
38 Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Evaluation of the prototype Anaesthetists' Non-Technical Skills (ANTS) behavioural marker system: WP7 Experimental Report. University of Aberdeen Technical Report for NHS Education for Scotland, 2002. Aberdeen: University of Aberdeen, 2002
39 Byrne AJ, Greaves JD. Assessment instruments used during anaesthetic simulation: review of published studies. Br J Anaesth 2001; 86: 445–50