Quality of perioperative AEP—variability of expert ratings

G. Schneider*, W. Nahm, E. F. Kochs, P. Bischoff, C. J. Kalkman, H. Kuppe and C. Thornton

Department of Anaesthesiology, Technische Universität München, Klinikum rechts der Isar, Ismaningerstr. 22, D-81675 Munich, Germany

*Corresponding author. E-mail: gerhard.schneider@lrz.tum.de

Accepted for publication: August 7, 2003


    Abstract
 
Background. Previous studies suggest that auditory evoked potentials (AEP) may be used to monitor anaesthetic depth. However, during surgery and anaesthesia, the quality of AEP recordings may be reduced by artefacts. This can affect the interpretation of the data and complicate the use of the method. We assessed differences in expert ratings of the signal quality of perioperatively recorded AEPs.

Methods. Signal quality of 180 randomly selected AEPs, recorded perioperatively during a European multicentre study, was rated independently by five experts as ‘invalid’ (0), ‘poor’ (1), or ‘good’ (2). The average (n=5) quality rating was calculated for each signal. Differences between the quality ratings of the five experts were calculated for each AEP: inter-rater variability (IRV) was calculated as the difference between the worst and best classification of a signal.

Results. Average signal quality of 57% of the AEPs was rated as ‘invalid’, 39% as ‘poor’, and only 4% as ‘good’. IRV was 0 in only 6%, 1 in 62%, and 2 in 32% of the AEPs; that is, in 32% at least one expert rated the signal quality as good, whereas another expert rated the identical signal as invalid.

Conclusions. There is poor agreement between experts regarding the signal quality of perioperatively recorded AEPs and, as a consequence, results obtained by one expert may not easily be reproduced by a different expert. This limits the use of visual AEP analysis to indicate anaesthetic depth and may affect the comparability of AEP studies in which waveforms were analysed by different experts. An objective automated method for AEP analysis could solve this problem.

Br J Anaesth 2003; 91: 905–8

Keywords: brain, auditory evoked potential; brain, auditory evoked response; monitoring, depth of anaesthesia; monitoring, evoked potential


    Introduction
 
Auditory evoked potentials (AEP) have been proposed as a measure of anaesthetic depth. Peaks and troughs of the mid-latency component of the AEP (MLAEP) show characteristic changes during anaesthesia. Increasing concentrations of anaesthetics cause increased latencies and decreased amplitudes of the MLAEP. Although these changes have been known for years,1 the method is not often used clinically. There may be several reasons for this. Intraoperatively recorded AEPs may be affected by artefacts, which may reduce signal quality and complicate the identification of peaks. Currently, there is no gold standard that allows unequivocal quality rating of AEPs. We compared expert quality ratings of perioperative AEPs recorded during a multi-centre study of AEP as a measure of depth of anaesthesia.2


    Methods and results
 
Two hundred AEPs were chosen from the database of a European multi-centre study that contained perioperatively recorded EEG with concomitant AEP trigger information for offline averaging and analysis.2 The study protocol was approved by the ethics committees of the five participating centres in Amsterdam, Hamburg, London, Lübeck, and Munich. The database contained AEPs of 93 consenting, pre-medicated (midazolam 7.5 mg p.o.) patients of both genders (aged 18–60 yr; ASA I–II) who were given general anaesthesia (propofol induction, followed by succinylcholine (1 mg kg⁻¹), isoflurane, and nitrous oxide 50% in oxygen) for abdominal or orthopaedic surgery.

During the study, EEG was recorded with a sampling rate of 1 kHz from Fpz-A2 with a bandpass of 0.5–500 Hz. Binaural stimuli were applied via insert earphones (6.1 Hz, rarefaction clicks 70 dB above hearing threshold) and the exact position of each click was marked in the raw data file, allowing offline averaging of AEPs. EEG was filtered with an analogue 400 Hz low-pass and a digital 25 Hz high-pass filter. AEPs were averaged from 1024 single sweeps using normal ensemble averaging and three-point smoothing.

One hundred pairs of consecutive AEPs were chosen randomly from all periods of the study. A semi-automatic analysis program was designed that presented two consecutive AEPs on a computer screen and allowed the expert to filter the signals if desired. Filtered and unfiltered AEPs were presented simultaneously in an overlap mode. In addition, for easier comparison, the two consecutive AEPs of a pair could also be presented in an overlap mode. If an AEP was affected by a severe artefact, presentation of the consecutive (or preceding) AEP facilitated detection of this artefact. In a joint meeting, 20 of the 200 AEPs were used to introduce the program to the participating experts, who had several years of experience with intraoperative recording and assessment of evoked potentials.
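
The offline averaging step can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the epoch length of 100 samples (100 ms at 1 kHz), the equal weights of the three-point smoother, and the untouched end points are all assumptions made for illustration.

```python
from statistics import fmean

SWEEPS_PER_AEP = 1024   # from the text: each AEP is averaged from 1024 sweeps
EPOCH_SAMPLES = 100     # assumption: 100 ms window after each click at 1 kHz


def ensemble_average(eeg, click_positions,
                     epoch_len=EPOCH_SAMPLES, n_sweeps=SWEEPS_PER_AEP):
    """Average the EEG epochs that start at each marked click position."""
    epochs = [
        eeg[start:start + epoch_len]
        for start in click_positions[:n_sweeps]
        if start + epoch_len <= len(eeg)   # skip incomplete epochs at the end
    ]
    if not epochs:
        raise ValueError("no complete epochs available")
    # zip(*epochs) groups the samples that share the same post-stimulus latency
    return [fmean(samples) for samples in zip(*epochs)]


def smooth_three_point(signal):
    """Equal-weight three-point moving average (assumption: end points kept)."""
    if len(signal) < 3:
        return list(signal)
    inner = [
        (signal[i - 1] + signal[i] + signal[i + 1]) / 3
        for i in range(1, len(signal) - 1)
    ]
    return [signal[0], *inner, signal[-1]]
```

With a raw EEG trace `eeg` and the marked click indices `clicks`, one averaged AEP would be obtained as `smooth_three_point(ensemble_average(eeg, clicks))`. Activity that is not time-locked to the click tends to cancel in the average, which is why large artefacts in individual sweeps can still degrade the result.
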
The remaining 180 AEPs were used for analysis. Each observer independently rated the signal quality of these 180 AEPs using a numeric scoring system with three levels (0–2). First, the presence of peak V of the brainstem response was required to verify that the signal was an AEP. Signals without peak V were classified as ‘invalid’ (0). When peak V was present but mid-latency components could not be identified, the classification was also 0. This would include some data in which the mid-latency components were unidentifiable because they were absent on account of the patient being deeply anaesthetized. Strictly speaking, these are not ‘invalid’ quality data, but they cannot be differentiated from ‘invalid’ quality data in which mid-latency components are absent on account of artefacts. Noisy signals were classified as 0 when the signal-to-noise ratio was too low to allow labelling of peaks. Signals were classified as ‘poor’ (1) when labelling was possible but uncertain with respect to one or more peaks. Finally, signals with an unequivocally good waveform that allowed precise identification of peaks were classified as ‘good’ (2) (Fig. 1). Thus, for each signal, five independent quality ratings were obtained.

We analysed quality ratings as follows. First, we calculated the average quality rating for each AEP by averaging the individual quality classifications of the five experts and rounding the result to the next discrete value. Secondly, differences between the quality ratings of the five experts were calculated for each AEP. The inter-rater variability (IRV) was calculated as the difference between the worst and best classification of a signal. Therefore, IRV is a discrete value between 0 (all experts agree on signal quality) and 2 (at least one expert classified the signal as good and another expert classified the same signal as invalid).
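
The two summary measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the example ratings are invented, and the rounding rule (to the nearest level) is an assumption because the text does not specify it.

```python
from collections import Counter
from statistics import fmean

LABELS = {0: "invalid", 1: "poor", 2: "good"}


def average_rating(ratings):
    """Mean of the expert ratings mapped to a discrete level 0-2.

    Assumption: rounding to the nearest level. With five raters the mean
    is a multiple of 0.2, so an exact .5 tie cannot occur.
    """
    return int(fmean(ratings) + 0.5)


def inter_rater_variability(ratings):
    """IRV: difference between the best and the worst rating of one signal."""
    return max(ratings) - min(ratings)


def summarise(all_ratings):
    """Percentage of signals per average quality level and per IRV value."""
    n = len(all_ratings)
    avg_counts = Counter(average_rating(r) for r in all_ratings)
    irv_counts = Counter(inter_rater_variability(r) for r in all_ratings)
    quality = {LABELS[k]: 100 * avg_counts[k] / n for k in LABELS}
    irv = {k: 100 * irv_counts[k] / n for k in range(3)}
    return quality, irv


# Hypothetical ratings of four signals by five experts (not study data)
example = [
    [0, 0, 0, 0, 0],   # all agree: invalid, IRV 0
    [0, 1, 1, 0, 1],   # mean 0.6: poor, IRV 1
    [2, 1, 0, 1, 2],   # mean 1.2: poor, IRV 2
    [2, 2, 2, 1, 2],   # mean 1.8: good, IRV 1
]
quality, irv = summarise(example)
```

Because IRV uses only the extreme ratings, a single dissenting expert is enough to produce an IRV of 2, which makes it a measure of maximum rather than typical disagreement.
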

On average, 57% of AEPs were classified as invalid (0). Signal quality of 39% of the AEPs was rated poor (1), and only 4% had good quality (2). In addition, IRV was large. All experts agreed on the signal quality in only 6% of the AEPs. IRV was 1 for 62%, and 2 for 32% of the AEPs.


    Comment
 
We found a large IRV in the signal quality scores of perioperatively recorded AEPs. In only 6% of all AEPs did all experts agree about signal quality. In 32% of all AEP waveforms, IRV was 2; that is, in almost one-third at least one expert rated the signal as good whereas another expert thought it was invalid and should not be used for analysis. The numerous potential artefact sources present during surgery, for example electrocautery or signals from other devices required for surgery or anaesthesia, reduce signal quality and are likely to contribute to this. Many of these artefact sources cannot be eliminated because the offending devices are essential for patient treatment. Taking the average of the signal quality scores of the five experts, only 4% of signals were rated ‘good’. In all remaining signals, peaks either could not be identified or their exact position could not be determined. Visual analysis of AEP as an indicator of depth of anaesthesia is based on measurements of amplitudes and latencies of specific components of the MLAEP. Clear identification of these components is possible in only a very limited number of perioperative AEPs. This shows that visual AEP analysis has limitations as a measure of anaesthetic depth.

These findings are similar to previous results obtained for ratings of electrocochleography waves. The electrocochleography wave is measured within 5 ms after a stimulus and reflects stimulus-evoked electrical potentials that occur in the cochlea and auditory nerves. The wave consists of an initial low peak, the summating potential (SP), generated within the cochlea. The second peak, the action potential (AP), is generated within the auditory nerve. The SP/AP ratio is used diagnostically in Ménière’s disease or perilymphatic fistula. Misinterpretation may lead to an inaccurate diagnosis and eventually even to surgery without indication. In a study on inter-interpreter variability, ratings of 10 audiologists were compared with the ratings of the authors of the study. Each of the 10 audiologists received 10 out of 100 waves that were classified as ‘no response’ by the authors. Thirty-two of these 100 waves were interpreted as a response by the audiologists.3 In contrast to that approach, we did not use our own ratings of signal quality as the standard, but treated the quality ratings of all experts equally and analysed maximum differences between expert ratings. Using this approach, we also found that 32% of the signals were rated as ‘invalid signal’ by at least one expert and as ‘good signal’ by at least one other. This may in part explain why visual analysis of AEP, although useful in establishing the relationship between the AEP and depth of anaesthesia,4 5 has not gained popularity or been established as a standard monitor of depth of anaesthesia. Our results also suggest that results of visual AEP analysis obtained by one expert are not necessarily confirmed by another expert. An objective automated method for AEP analysis may be a solution for this problem.


    Acknowledgement
 
This study was supported by a grant from the European Union (BMH1-CT93-1506).



 
Fig 1 Different AEPs as used in the study. (A) An AEP in which both peak V and mid-latency components can be identified. (B) The position of mid-latency peaks is not as clear as in (A), but easier to identify than in (C). (D) Peak V can be identified, but high-frequency artefacts are superimposed on mid-latency components. (E) A signal in which peak V is absent, that is, the signal is not an AEP.

 

    References
 
1 Thornton C, Heneghan CP, James MF, Jones JG. Effects of halothane or enflurane with controlled ventilation on auditory evoked potentials. Br J Anaesth 1984; 56: 315–23

2 Kochs E, Kalkman CJ, Thornton C, et al. Middle latency auditory evoked responses and electroencephalographic derived variables do not predict movement to noxious stimulation during 1 minimum alveolar anesthetic concentration isoflurane/nitrous oxide anesthesia. Anesth Analg 1999; 88: 1412–7

3 Roland PS, Roth L. Interinterpreter variability in determining the SP/AP ratio in clinical electrocochleography. Laryngoscope 1997; 107: 1357–61

4 Schwender D, Haessler R, Klasing S, Madler C, Poppel E, Peter K. Mid-latency auditory evoked potentials and circulatory response to loud sounds. Br J Anaesth 1994; 72: 307–14

5 Thornton C, Sharpe RM. Evoked responses in anaesthesia. Br J Anaesth 1998; 81: 771–81