Assessment of a simple artificial neural network for predicting residual neuromuscular block

J. G. Laffey, É. Tobin, J. F. Boylan and A. J. McShane*

Department of Anaesthesia, Intensive Care and Pain Medicine, St Vincent’s University Hospital, Dublin, Ireland

*Corresponding author: Department of Anaesthesia, Intensive Care and Pain Medicine, St Vincent’s University Hospital, Elm Park, Dublin 4, Ireland. E-mail: l.mcnicholas@st-vincents.ie

†This work was presented in part at the National Scientific Meeting of the Royal College of Physicians in Ireland, Dublin, March 1998, and at the International Anesthesia Research Society, Honolulu, Hawaii, March 2000.

Accepted for publication: August 30, 2002


    Abstract
 
Background. Postoperative residual curarization (PORC) is common and its detection has a high error rate. Artificial neural networks are being used increasingly to examine complex data. We hypothesized that a neural network would enhance prediction of PORC.

Methods. In 40 previously reported patients, an observer uninvolved in patient care recorded neuromuscular function, neuromuscular block/antagonist usage and time intervals throughout anaesthesia until tracheal extubation. PORC was defined as significant ‘fade’ (train-of-four ratio of 0.7 or less) at extubation. Neuromuscular function was classified as PORC (value=1) or no PORC (value=0). A back-propagation neural network was trained to assign similar values (0, 1) for prediction of PORC, by examining the impact of (i) the degree of spontaneous recovery at reversal, and (ii) the time since pharmacological reversal, using the jackknife method. Successful prediction was defined as attainment of a predicted value within 0.2 of the target value.

Results. Twenty-six patients (65%) had PORC at tracheal extubation. Clinical detection of PORC had a sensitivity of 0 and specificity of 1, with an indeterminate positive predictive value and a negative predictive value of 0.35. Using the artificial neural network, one patient with residual block and one with adequate neuromuscular function were incorrectly classified during the test phase, with no indeterminate predictions, giving an artificial neural network sensitivity of 0.96 (χ²=44, P<0.001) and specificity of 0.93 (P=1), with a positive predictive value of 0.96 and a negative predictive value of 0.93 (χ²=12, P<0.001).

Conclusions. Neural network-based prediction, using readily available clinical measurements, is significantly better than human judgement in predicting recovery of neuromuscular function.

Br J Anaesth 2003; 90: 48–52

Keywords: measurement techniques, train-of-four; model, artificial neural network; neuromuscular block, atracurium


    Introduction
 
Residual neuromuscular block after surgery may be a significant problem, even after the use of medium- or short-acting agents.1 It is uncertain whether peripheral nerve stimulator (PNS) use can reduce the incidence of clinically significant postoperative residual curarization (PORC).2 3 Human error in assessing PNS data is very common,4 possibly stemming in part from perceptual limitations.5 Errors of judgement in anaesthetic practice may also result from pressure of work.6 A train-of-four value greater than 0.7 has been suggested as a minimum criterion for safe tracheal extubation, as lower values may be associated with impaired airway protection.7 Even higher train-of-four threshold values have been proposed, although they cannot be reliably assessed by the naked eye.3 Acceleromyographic techniques, while gaining in popularity, may not agree reliably with mechanomyographically derived data, limiting their validity.8 9 Transduced or electromyographic monitoring systems are unlikely to enter clinical practice in the immediate future, suggesting that the assessment of neuromuscular block will remain semiquantitative.

Several variables predict successful reversal of neuromuscular block.10 However, the failure rates observed suggest that these predictors are imperfect and insensitive in practice. Artificial neural networks are software models designed to handle multiple inputs and non-linear interactions, and are being increasingly used to examine complex data.11 We tested the hypothesis that a neural network-based analysis would enhance the prediction of PORC when compared with human decision-making.


    Patients and methods
 
We retrospectively examined data on neuromuscular function during and after general anaesthesia, obtained during a previous study.12 After hospital ethical approval and individual informed consent, 40 patients were recruited. Data were collected by an observer (ÉT) uninvolved in clinical care. The transduced twitch height and train-of-four values were collected from immediately after induction of anaesthesia and before atracurium administration up to the time of tracheal extubation. Atracurium dosages, antagonists and time intervals from induction through antagonism to tracheal extubation were recorded. Anaesthetic care was provided by residents in training under the direction of a consultant anaesthetist; care providers were blinded to the data being collected and clinical anaesthetic management was not influenced by research personnel. For the purposes of the study, a train-of-four value of 0.7 or less at the time of the decision to perform tracheal extubation was defined as PORC.

To construct the predictive model, we developed a feed-forward, back-propagation neural network using a commercial neural network software package (Neuralyst 1.40; Cheshire Engineering, Cheshire, CA, USA), integrated with a spreadsheet programme (Microsoft Excel 98). The network had three layers (an input layer, a hidden layer containing four nodes and an output layer; Fig. 1). The nodes in artificial neural networks are abstract entities which correspond to the weighting and summation of input data and its transformation into output values. A neural network is an algorithm consisting of a series of parallel mathematical equations in which variables from the external world are assigned internal weightings by each equation. Summation of these equations results in output values which are then used as variables for one or more series of new equations (one or more intermediate, ‘hidden’ layers), and the summed, weighted output from these equations is projected to an output layer as a non-linear function of input. The internal weightings for each equation in the network are set at random values initially and are altered during training as the network-derived, ‘predicted’ outputs are compared with actual outputs (‘back-propagation of error’). Training consists of multiple empirical modifications of the weightings to minimize the difference between predicted and actual output values, allowing a form of pattern recognition to occur.
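The weighting, summation and back-propagation steps described above can be illustrated with a minimal sketch. It is not the Neuralyst implementation used in the study: the sigmoid transfer function, the bias terms, the learning rate of 0.5, the class name `TinyNetwork` and the toy samples are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Non-linear transfer function (an assumption; the source does not name one).
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """Hypothetical 2-input, 4-hidden-node, 1-output feed-forward network."""

    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # Weightings start at random values; the last entry of each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # Each node weights and sums its inputs, then transforms the sum.
        inputs = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, rate=0.5):
        hidden, out = self.forward(x)
        # Back-propagation of error: output delta first, then hidden deltas.
        delta_out = (target - out) * out * (1.0 - out)
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] += rate * delta_out * v
        inputs = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(inputs):
                row[i] += rate * delta_hidden[j] * v


def train(net, samples, cycles=5000, rate=0.5):
    # One cycle is one pass through the data with a weight update per case.
    for _ in range(cycles):
        for x, target in samples:
            net.train_step(x, target, rate)
    return net


# Invented toy data, scaled to 0-1: (recovery at reversal, time since reversal).
toy_samples = [((0.0, 0.1), 1), ((0.25, 0.2), 1), ((0.75, 0.8), 0), ((1.0, 0.9), 0)]
net = train(TinyNetwork(), toy_samples)
```

After training, `net.predict((0.0, 0.1))` should lie close to 1 and `net.predict((1.0, 0.9))` close to 0 for this toy data set.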



Fig 1 Schematic representation of an artificial neural network with one hidden layer, containing four nodes.

 
In this feed-forward model, the units contained in each layer processed data and fed outputs into the next layer. Two input units and a single output unit were used. The predictive variables examined using the input layer were: (i) the number of twitch responses at the time of antagonism; and (ii) the time elapsed from neostigmine administration until tracheal extubation. The single-node output layer delivered a value of 1 or 0 corresponding to the presence or absence, respectively, of residual neuromuscular block in the training data set. The number of training cycles and the learning rate (viz., the degree of weighting adjustment between cycles) were preset by the investigators. Successful prediction during the training and test phases was defined as a difference between observed (0 or 1) and predicted output of 0.2 or less (a tolerance of 0.2). Thus, successful prediction of residual block corresponded to a predicted output value of 0.8–1.0 for an observed output value of 1, while successful prediction of adequate reversal corresponded to a predicted output value of 0.0–0.2 for an observed output value of 0. Misclassification was defined as a predicted output value of 0.8–1.0 for an observed output value of 0 (i.e. an erroneous prediction of inadequate neuromuscular function), or a predicted output value of 0.0–0.2 for an observed output value of 1 (i.e. an erroneous prediction of adequate neuromuscular function). Predicted values between these ranges (greater than 0.2 and less than 0.8) were considered indeterminate. During preliminary data analysis, a broad range of learning rates was tested; with every rate, the model reached optimal predictive performance within 5000 cycles, each cycle representing one pass through the data with a training update.

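The tolerance rule above can be written as a short function. This is a sketch only; the function name `classify_prediction` and the returned labels are hypothetical and chosen for illustration.

```python
def classify_prediction(predicted, observed, tolerance=0.2):
    """Label a network output as correct, misclassified or indeterminate."""
    if observed not in (0, 1):
        raise ValueError("observed must be 0 (no PORC) or 1 (PORC)")
    if abs(predicted - observed) <= tolerance:
        return "correct"          # within 0.2 of the observed value
    if abs(predicted - (1 - observed)) <= tolerance:
        return "misclassified"    # within 0.2 of the opposite value
    return "indeterminate"        # anywhere in between


labels = [
    classify_prediction(0.93, 1),  # residual block predicted, block present
    classify_prediction(0.05, 1),  # adequate recovery predicted, block present
    classify_prediction(0.50, 0),  # neither range reached
]
```

With these example values, `labels` is `["correct", "misclassified", "indeterminate"]`.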
We chose our two input variables on the basis of previous work10 indicating that the most important determinants of successful recovery of neuromuscular function are the degree of spontaneous neuromuscular recovery before antagonism, and the time elapsed since administration of pharmacological antagonism. To ensure validity of analysis (specifically, to prevent overlearning, in which data points are memorized rather than patterns being detected), the network was trained a total of 40 times using these variables, using the ‘leave-out-k’ or jackknife method, which trained the neural network on all data except for a group (in this case, one patient) that was subsequently used to test the network’s conclusions. A pass through the data set using every individual in turn as a single test case allowed comparison of the predictive powers of the training and test phases. At the start of each new training period, previous weightings were erased, so that test cases were evaluated using fresh training data.
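The jackknife procedure can be sketched as a loop that holds out each case in turn and refits from scratch. To keep the example short, a nearest-neighbour rule stands in for the neural network; that stand-in model, the function names and the toy data are assumptions and are not taken from the study.

```python
def leave_one_out(samples, fit, predict):
    """Return one test-phase prediction per case, each from a freshly fitted model."""
    predictions = []
    for i, (x, _) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # every case except the test case
        model = fit(training)                     # nothing carried over between runs
        predictions.append(predict(model, x))
    return predictions


def fit_nearest_neighbour(training):
    # Stand-in "model": simply remember the training cases.
    return list(training)


def predict_nearest_neighbour(model, x):
    # Return the label of the closest remembered case (squared Euclidean distance).
    nearest = min(model, key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], x)))
    return nearest[1]


# Invented toy data: (twitch responses at antagonism, minutes since antagonism), label.
toy_samples = [((0, 5), 1), ((1, 8), 1), ((3, 20), 0), ((4, 25), 0)]
loo_predictions = leave_one_out(toy_samples, fit_nearest_neighbour, predict_nearest_neighbour)
```

Because the model is rebuilt for every held-out case, each prediction reflects only what was learned from the remaining cases, which is the property the jackknife relies on to expose overlearning.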

For diagnostic and predictive purposes, ‘positive’ was defined as the presence of residual neuromuscular block, and ‘negative’ was defined as the absence of residual neuromuscular block. Neural network sensitivity and specificity data were derived by sequentially using each of the 40 patients as the test case. Human and neural network performances were compared using χ² analysis or Fisher’s exact probability test as appropriate.


    Results
 
The physical characteristics of the patients were presented previously.12 Data for clinical detection of PORC and neural network-based prediction of PORC are presented in Table 1. Of the 40 patients, 26 had a train-of-four ratio less than or equal to 0.7 at the time of tracheal extubation. Because all care providers thought that neuromuscular block was adequately antagonized at extubation, this indicates a sensitivity of zero (i.e. PORC diagnosed in 0/26) and specificity of 1 (i.e. adequate antagonism diagnosed in 14/14), with a positive predictive value that was indeterminate (0/0) and a negative predictive value of 0.35 (14/40). One patient with residual block was consistently classified by the artificial neural network as having adequate neuromuscular function during the training phase. During the test phase, the same patient with residual block and a further patient with adequate neuromuscular function were misclassified. Overall success in diagnosis/prediction was thus 14/40 (clinical) vs 38/40 (artificial neural network) (χ²=29, P<0.001).
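The overall comparison (14/40 vs 38/40) can be checked with a 2×2 χ² calculation. In the sketch below, the Yates continuity correction is an assumption: the text does not say whether a correction was applied, but the corrected statistic is consistent with the reported value of 29. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and P value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed here, not stated in the source).
        diff = max(0.0, diff - n / 2.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: clinical, neural network; columns: correct, incorrect.
chi2_overall, p_overall = chi_square_2x2(14, 26, 38, 2)
```

Here `chi2_overall` rounds to 29 and `p_overall` is well below 0.001.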


Table 1 Comparison of diagnostic performances of clinical diagnosis and neural network-based (ANN) prediction
 
The performances of the training and test phases of the artificial neural network were essentially identical, the network correctly classifying 38 of 39 during training and 38 of 40 during testing (χ²=0.3, P=0.57). The test phase of the neural network performed with a sensitivity of 0.96 (25/26; χ²=44, P<0.001, relative to clinical performance) and a specificity of 0.93 (13/14; P=0.54, Fisher’s test), with a positive predictive value of 0.96 (25/26) and negative predictive value of 0.93 (13/14; χ²=12, P<0.001, relative to clinical performance). As the positive predictive value of human detection ability was mathematically indeterminate (0/0), no statistical comparison was possible between the two methods. The negative predictive value of the artificial neural network (i.e. its ability to predict adequate neuromuscular function) was markedly superior to the detection ability of the individuals providing anaesthetic care.
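The diagnostic indices quoted above follow directly from the counts given in the text, as the sketch below shows. The function name and the use of `None` for an indeterminate 0/0 ratio are illustrative choices, not part of the original analysis.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and predictive values from 2x2 counts."""
    def ratio(num, den):
        return num / den if den else None  # None marks an indeterminate 0/0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# 'Positive' means residual block is present (26 patients); 'negative' means absent (14).
clinical = diagnostic_metrics(tp=0, fn=26, tn=14, fp=0)   # no PORC diagnosed clinically
network = diagnostic_metrics(tp=25, fn=1, tn=13, fp=1)    # test phase, one error each way
```

The clinical negative predictive value evaluates to 14/40 = 0.35, and the network values to 25/26 (sensitivity and positive predictive value) and 13/14 (specificity and negative predictive value).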


    Discussion
 
In this small test group, with a high incidence of residual neuromuscular block, a simple neural network was able to predict the likelihood of normal or abnormal neuromuscular function at the time of the decision to perform tracheal extubation with greater accuracy than that of clinical assessment. This suggests that, even with limited, imprecise data, artificial neural network-based prediction of simple drug pharmacodynamic relationships can equal or outperform human assessment in this setting.

Increasing processor speeds and computer power have moved neural networks from an esoteric topic to one where they offer practical uses in a wide range of fields. At their most basic they can be used as cheap, small programs (essentially macro add-ins to commercial spreadsheets). Successful medical uses of neural networks include their use in pharmacokinetic/pharmacodynamic prediction (antibiotic peak and trough levels)13 and enhancement of diagnostic skills. In the clinical arena, the emergency room diagnosis of myocardial infarction14 and radiologists’ diagnosis of pulmonary embolism15 have been studied with favourable results. A neural network for detecting oesophageal intubation has been described.16 More recently, neural network-based prognostication in critical care has been shown to be superior to a conventional statistical approach.17 Enhanced performance by neural networks relative to humans is related to (i) more appropriate weighting given by the network to common diagnostic factors, and (ii) empirical use by networks of diagnostic rules of thumb not previously utilized. In this study, the diagnostic variables used by the software were deliberately limited to two of the commonest ones used in clinical practice, instead of allowing all possible information to be used. Although one patient with residual block was consistently misclassified by the network, less common predictors of incomplete reversal are beyond the scope of this study. Our findings suggest that their assessment would require much larger patient numbers. For example, assuming a successful prediction rate of 90% by the network using only the two variables employed in this study, in order to have an 80% chance of detecting a further improvement in performance to, say, 95%, we would have needed about 240 patients.
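The sample-size estimate above can be approximated as follows. The method is an assumption, since the text does not state how the figure was derived: the sketch uses the normal-approximation formula for testing a single proportion against a fixed reference value, with a two-sided significance level of 0.05. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_size(p0, p1, alpha=0.05, power=0.80):
    """Cases needed to detect a change in success rate from p0 to p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


n_required = one_sample_size(p0=0.90, p1=0.95)
```

Under these assumptions the result is close to the figure of about 240 patients quoted above; a two-group comparison would require considerably more.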

Neural networks have several limitations. A major theoretical concern is the ‘black box’ nature of their output, i.e. conclusions are generated without explanations. This has led to claims that they lack rigour, and a reluctance to accept their results. This criticism implies that conclusions should follow from hypotheses supported by data and rejects the role of pattern recognition in decision making. However, the success of neural networks in medical decision-making, including a performance equal to that of radiologists in diagnosis of pulmonary embolism,15 and superior to that of emergency physicians in the diagnosis of myocardial infarction,14 suggests that they are competent at the very least. Black box concerns may be overcome by sensitivity analyses, where the effects of variables are assessed by their inclusion or exclusion. Our a priori choice of a small number of predictor variables in this study also addresses this criticism.

Neural networks may be difficult to validate in practice. If settings, especially learning rate, are not chosen carefully during the training phase, overlearning may occur. In overlearning, the network memorizes the data points and gives correct answers without recognizing patterns at all. As in this case, the problem is addressed by using low values for learning rate at the expense of decreased speed, and/or using separate training and test sets. The jackknife technique used during this study successfully prevented overlearning, as demonstrated by similar performances for the training and test phases, without reducing success rates. The sensitivity and specificity of any diagnostic or predictive technique will vary as a function of the cut-off (tolerance) points used to determine agreement. The success rate seen with tolerance values of 0.2 indicates that, in 95% of cases, the network correctly predicted adequate or inadequate return of neuromuscular function.

The high success rate with this model may be partly related to an unusually high incidence of undetected residual block in the training data set. The presumed failure to make a clinical diagnosis of PORC in any patient limited our ability to make a comparison of indices such as sensitivity and positive predictive value; however, the negative predictive values suggest that, compared with clinical care providers, the artificial neural network was much more effective in predicting recovery of adequate neuromuscular function. Even board-certified anaesthetists may fail to diagnose residual relaxant effects, as evidenced by a 25% requirement for unscheduled antagonism in adults receiving mivacurium without pharmacological antagonism in a clinical trial setting.18 It is possible that trainees have even less capacity to assess the speed and degree of pharmacological antagonism and that this could be improved significantly. The pilot nature of these data and the fact that they provide a single point estimate of the prevalence of PORC in a small number of patients make it premature to suggest that neural networks be used for real-time decision-making in the antagonism of neuromuscular block.

Neural networks are limited by the quality of their data and may need to be retrained periodically if performance changes over time, but this may be less applicable in the assessment of drug pharmacokinetics and pharmacodynamics. In predicting peak and trough antibiotic levels, network-based prediction has already been shown to be as good as or better than standard population-based pharmacokinetic modelling.13 For optimal assessment of network-based prediction, further validation would be required in the form of a prospective comparison with clinical practice and conventional statistical analysis. Also, higher train-of-four values (of the order of 0.9) are required to ensure full airway competence19 and minimize subjective discomfort.2 Because such data are not clinically detectable with routine monitoring, neural network techniques may play a role in enhancing prediction of greater degrees of neuromuscular recovery. This area merits further study.

We deliberately chose simple variables as predictors of residual block. Other data collected, such as absolute values for twitch height depression, were excluded from analysis, because they are not routinely measured in practice. In the present study, the predictive ability of the neural network was so strong using two simple indices that incorporating extra variables could not have enhanced performance with this sample size. This suggests that most of the variation seen in neuromuscular antagonism is attributable to these two factors. The difference in negative predictive values obtained using the two methods suggests that routinely collected variables are adequate in most patients for assessment of neuromuscular block when emphasis is placed on specificity (i.e. priority is given to exclusion rather than detection of PORC). Although our data are limited to the assessment of a single clinical decision (i.e. whether to extubate) in a small series of patients, our findings suggest that human factors are a major component in the erroneous diagnosis of adequate neuromuscular antagonism.

In conclusion, using simple, routinely measured variables, a neural network-based prediction is useful in estimating the likelihood of clinically significant residual neuromuscular block at the time of tracheal extubation, with a significant performance improvement over that of the anaesthetic trainees from whose practice the data were derived. Other predictive problems in anaesthesia where low sensitivity or non-linear interactions are a limiting factor may be amenable to neural network-based analysis.


    Acknowledgement
 
É. Tobin and A. J. McShane were supported in part by the Health Research Board (Ireland).


    References
 
1 Bevan DR, Smith CE, Donati F. Postoperative neuromuscular blockade: a comparison between atracurium, vecuronium and pancuronium. Anesthesiology 1988; 69: 272–6

2 Kopman AF, Yee PS, Neuman GG. Relationship of the train-of-four fade ratio to clinical signs and symptoms of residual paralysis in awake volunteers. Anesthesiology 1997; 86: 765–71

3 Shorten GD, Merk H, Sieber T. Perioperative train-of-four monitoring and residual curarization. Can J Anaesth 1995; 42: 711–15

4 Viby-Mogensen J, Jensen NH, Olsen NV, et al. Tactile and visual evaluation of the response to train-of-four nerve stimulation. Anesthesiology 1985; 63: 440–3

5 Brull SJ, Silverman DG. Real time versus slow-motion train-of-four monitoring: a theory to explain the inaccuracy of visual assessment. Anesth Analg 1995; 80: 548–51

6 Gaba DM, Howard SK, Jump B. Production pressure in the work environment: California anesthesiologists’ attitudes and experiences. Anesthesiology 1994; 81: 488–500

7 Brand JB, Cullen DJ, Wilson NE, Ali HH. Spontaneous recovery from nondepolarizing neuromuscular blockade: correlation between clinical and evoked responses. Anesth Analg 1977; 56: 55–9

8 Harper NN, Martlew R, Strang T, Wallace M. Monitoring neuromuscular block by acceleromyography: comparison of the Mini-Accelograph with the Myograph 2000. Br J Anaesth 1994; 72: 411–14

9 Loan PB, Paxton LD, Mirakhur RK, Connolly FM, McCoy EP. The TOF-Guard neuromuscular monitor: a comparison with the Myograph 2000. Anaesthesia 1995; 50: 699–702

10 Bevan DR, Donati F, Kopman AF. Reversal of neuromuscular blockade. Anesthesiology 1992; 77: 785–805

11 Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet 1995; 346: 1075–9

12 McCaul C, Tobin É, Boylan JF, McShane AJ. Atracurium is still associated with postoperative residual curarization. Br J Anaesth 2002; 89: 766–9

13 Brier ME, Zurada JM, Aronoff GR. Neural network predicted peak and trough gentamicin concentrations. Pharm Res 1995; 12: 406–12

14 Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med 1991; 115: 843–8

15 Patil S, Henry JW, Rubenfire M, Stein PD. Neural network in the clinical diagnosis of acute pulmonary embolism. Chest 1993; 104: 1685–9

16 León MA, Räsänen J, Mangar D. Neural network-based detection of esophageal intubation. Anesth Analg 1994; 78: 548–53

17 Dybowski R, Weller P, Chang R, Gant V. Prediction of outcome in critically ill patients using artificial neural network synthesised by genetic algorithm. Lancet 1996; 347: 1146–50

18 Bevan DR, Kahwaji R, Ansermino JM, et al. Residual block after mivacurium with or without edrophonium reversal in adults and children. Anesthesiology 1996; 84: 362–71

19 Eriksson LI, Sundman E, Olsson R, et al. Functional assessment of the pharynx at rest and during swallowing in partially paralyzed humans. Simultaneous videomanometry and mechanomyography of awake human volunteers. Anesthesiology 1997; 87: 1035–43




