Center for Molecular and Behavioral Neuroscience, Rutgers, The State University of New Jersey, Newark, New Jersey 07102
![]() |
ABSTRACT |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Harris, Kenneth D., Darrell A. Henze, Jozsef Csicsvari, Hajime Hirase, and György Buzsáki. Accuracy of Tetrode Spike Separation as Determined by Simultaneous Intracellular and Extracellular Measurements. J. Neurophysiol. 84: 401-414, 2000. Simultaneous recording from large numbers of neurons is a prerequisite for understanding their cooperative behavior. Various recording techniques and spike separation methods are being used toward this goal. However, the error rates involved in spike separation have not yet been quantified. We studied the separation reliability of "tetrode" (4-wire electrode)-recorded spikes by monitoring simultaneously from the same cell intracellularly with a glass pipette and extracellularly with a tetrode. With manual spike sorting, we found a trade-off between Type I and Type II errors, with errors typically ranging from 0 to 30% depending on the amplitude and firing pattern of the cell, the similarity of the waveshapes of neighboring neurons, and the experience of the operator. Performance using only a single wire was markedly lower, indicating the advantages of multiple-site monitoring techniques over single-wire recordings. For tetrode recordings, error rates were increased by burst activity and during periods of cellular synchrony. The lowest possible separation error rates were estimated by a search for the best ellipsoidal cluster shape. Human operator performance was significantly below the estimated optimum. Investigation of error distributions indicated that suboptimal performance was caused by inability of the operators to mark cluster boundaries accurately in a high-dimensional feature space. We therefore hypothesized that automatic spike-sorting algorithms have the potential to significantly lower error rates. Implementation of a semi-automatic classification system confirms this suggestion, reducing errors close to the estimated optimum, in the range 0-8%.
![]() |
INTRODUCTION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Most knowledge about the
physiological function of the brain is based on sequential analysis of
single-site recordings. Although it has long been recognized that the
computational power of complex neuronal networks cannot fully be
recognized by studying the properties of single cells or the activity
of a few selected sites, experimental access to the emergent properties
of cooperating neurons has largely been impossible until quite
recently. Direct investigation of the temporal dynamics of neuronal
populations can only be based on simultaneous observation of large
neuronal aggregates (Buzsaki et al. 1992; Wilson
and McNaughton 1993
). The two critical requirements for
achieving this goal are placing a large number of recording electrodes
in a small amount of tissue without significant tissue damage and
efficient isolation of action potentials emanating from individual
neurons. Action potentials generated by neurons ("unit" or
"spike" activity) can be monitored by extracellular glass pipettes,
single etched (sharp) electrodes, or multiple-site probes. The closer
the electrode to the soma of a neuron, the larger the size of the
extracellularly recorded spikes. Whereas glass pipettes and
high-impedance sharp metal electrodes can be used to monitor the
activity of a single cell (Evarts 1968
;
Georgopoulos et al. 1993
), these electrodes are not
always practical in freely moving animals because small movements of
the electrode can damage the neuron and because isolation of large
numbers of neurons with independently moving drives is difficult
(Kruger and Aiple 1988
; Llinas and Sasaki
1989
; McNaughton et al. 1996
). Larger size wires can record the activity of multiple neurons and provide better mechanical stability because the electrode tip is not placed directly against the cell membrane. The use of multiple recording channels from
the same neuron(s) provides improved methods for single-unit sorting,
based on the temporal coherence of spikes across channels (Drake
et al. 1988
; Gray et al. 1995
; McNaughton
et al. 1983
; Nadasdy et al. 1998
; Recce
and O'Keefe 1989
; Wilson and McNaughton 1993
).
The most widely used multiple-site probe, the wire "tetrode" (Recce and O'Keefe 1989
; Wilson and McNaughton
1993
), separates neurons based on their spatial location,
assuming that neurons are point sources of action potential-associated currents.
There are three stages between the recording of extracellular unit
activity and the identification of spikes representing the activity of
a single neuron. The first stage is spike detection, in which the
electrical activity measured on the electrodes is used to derive the
times corresponding to extracellular spikes. This is usually achieved
by high-pass filtering followed by thresholding and may be done by
hardware or software. The second stage is feature extraction. During
this stage, a feature vector (i.e., an array of quantitative
parameters) is calculated for every spike. In the simplest cases, the
feature vector represents the amplitude of the spikes recorded by the
four tetrode sites. More advanced methods quantify additional
information about the spikes, such as waveshape and discharge pattern.
Waveshapes may be quantified by measuring parameters such as spike
width, or by principal component analysis (Abeles and Goldstein
1977). The third stage is "clustering" of spikes. In this
stage spikes with similar feature vectors are grouped into clusters,
assumed to represent the spikes of a single neuron. In most
laboratories, the clustering stage is done manually, with a graphical
user interface that displays scatter plots in feature space and allows
the operator to separate the clusters by drawing polygons or straight
lines around them (Gray et al. 1995
; Rebrick et
al. 1997
; Skaggs and McNaughton 1996
;
Wilson and McNaughton 1993
; Wood et al.
1999
). This process is time consuming, and may be affected by
subjective factors. It has been proposed that automatic methods may
significantly speed up the clustering process, and reduce the effect of
subjective factors (Fee et al. 1996
; Lewicki
1998
; Sahani 1999
; Wright et al.
1998
).
A major problem with all currently used spike separation methods is
that their reliability cannot be verified with independent methods, and
thus errors inherent in subjective clustering by the human operator
cannot be measured quantitatively. Errors can occur because spikes
belonging to different neurons are grouped together (false positive,
Type I, or commission error) or because not all spikes emitted by a
single neuron are grouped together (false negative, Type II, or
omission error). In this paper, we quantified the error rates of spike
separation methods. This was achieved by simultaneous recording from
the same neuron intracellularly with a glass pipette and
extracellularly with a tetrode (Henze et al. 2000;
Wehr et al. 1998
). This approach allowed us to reveal the major causes of error in the spike separation process.
![]() |
METHODS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Simultaneous intracellular-extracellular recording in anesthetized rats
Description of the surgical and implantation methods are
described in the accompanying paper (Henze et al. 2000).
Spike detection
The continuously recorded wideband signals were digitally
high-pass filtered (Hamming window-based finite impulse response filter, cutoff 800 Hz, filter order 50). Spike detection was achieved by a previously described system (Csicsvari et al.
1998). Briefly, the root-mean-square power of the filtered
signal was computed using a sliding window. The mean and standard
deviation of the power were computed, and spikes were extracted when
the power exceeded a threshold derived from the standard deviation from the baseline mean. The spikes were then upsampled using the sampling theorem, and the waveforms were realigned by peak position.
Feature vector extraction
Feature vectors were extracted from the spike waveforms in two
ways. In the first method, the feature vector consisted of the
peak-to-peak (i.e., maximum-to-minimum) amplitudes of the resampled
filtered waveforms. In the second method, we employed a principal
component analysis (PCA) to create feature vectors (Abeles and
Goldstein 1977; Csicsvari et al. 1998
). For each
channel, the resampled waveforms of all spikes were pooled and the
first three principal components of the waveform set were found. A
12-dimensional feature vector was then created for each spike by
projecting the waveform on each channel onto each of the three
principal components.
Manual cluster cutting
The feature vectors were clustered using a previously developed custom graphical interface program (gclust). This program allows the operator to select two feature vector components to produce a two-dimensional scatter plot for the chosen components. The operator then draws polygons in the two-dimensional space to assign spikes to individual clusters. By iteratively viewing different projections, the operator can refine the boundaries of clusters. The program also displays superimposed waveforms for each cluster, autocorrelograms for each cluster, and cross-correlograms for all pairs of clusters.
Correspondence between intracellular and extracellular traces
Intracellular action potentials were detected by the method
described in the accompanying paper. As a function of the site of the
impalement of the neuron (somatic vs. dendritic), the delay between the
peak of the intracellular action potential and the negative peak of the
wideband recorded extracellular spike varied as much as 1 ms
(Henze et al. 2000). However, the delay was constant for
a given set of intracellular and extracellular recordings. Some jitter
of the extracellular peak was caused by the background activity.
Therefore extracellular spikes corresponding to intracellular action
potentials were identified if they occurred within 0.25 ms on either
side of the peak of the extracellular spike, as predicted by the
intracellular spike. The false positive error rate was determined as
the percentage of spikes clustered by an operator that were not
correlated by the simultaneous presence of an intracellular action
potential. The false negative error rate was determined as the
percentage of the intracellular spikes for which the operator's cluster did not contain a corresponding extracellular spike.
Best ellipsoid error rates
To estimate the optimal clustering performance for a given set
of feature vectors, we defined a measure called the "best ellipsoid error rate" (BEER). This measure was designed to estimate the optimal
clustering performance for a given set of feature vectors by searching
over all possible ellipsoidal cluster boundaries. In the present
context, "optimal" may be defined in several different ways since
it is possible to give different weights to Type I and Type II errors.
We therefore introduced a "conservatism" parameter and defined the
optimal ellipsoid to be the one that minimized a cost function equal to
the weighted averaged Type I and Type II errors, with weights specified
by the conservatism parameter. By systematically varying this
parameter, the BEER method produced a curve, illustrating the trade-off
between Type I and Type II errors. The minimization of the cost
function over the space of ellipsoidal cluster boundaries was performed
by a two-layer neural network. The first or "input" layer contained
one node for every element of the feature vector plus one node for each
pairwise product of feature vector elements (so for an
n-dimensional feature vector, this layer would have
n + n(n +1)/2 nodes). The second or
"output" layer had a single logistic node which was trained to be 1 if the spike belonged to the identified cell and 0 otherwise. The
network was trained by the "quickprop" algorithm (Fahlman 1988), modified to allow for differential weighting of Type I and Type II errors. After convergence, the weights of the network specified an optimal quadratic criterion for spike identification, i.e., an ellipsoidal cluster boundary. To reduce the training time of
the network and to ensure that it did not converge to an incorrect
local maximum, the initial values of the weights were specified by the
ellipsoid whose center and axes were derived from the mean vector and
covariance matrix of the feature vectors corresponding to
intracellularly identified spikes.
Error rates were evaluated with a cross-validation method. The data were split into two halves consisting of even and odd spike numbers. The network was trained twice, once on each half of the data, and tested on the other half. This method produced two sets of error percentages for each value of the conservatism parameter, i.e., two error curves. The network was deemed to have converged, and training was terminated when the percentage errors on the test set had not changed more than 0.1% within the last 15 training epochs. The final percentage error value was then taken to be the mean error value over these last 15 epochs. The computations were performed on an IBM SP2 station (IBM, Armonk, NY) with a custom written C++ program.
The BEER measure provides an estimate of the optimal performance of any
clustering system, automatic or manual. It determines the location of
the optimal cluster based on a knowledge of which spikes correspond to
the identified cell (i.e., it performs supervised learning). This is
only possible because of the simultaneous intracellular recording.
Manual or automatic spike sorting systems will not have this knowledge
(i.e., they must perform unsupervised learning). The BEER method
therefore estimates an upper bound on the performance achievable by any
unsupervised system. This upper bound will be approached by an
automatic system based on an accurate probabilistic model in the limit
of a large number of data points (Anderson 1984). It
should be emphasized that the ellipsoidal cluster boundary used by the
BEER measure is not necessarily optimal. Even for perfectly ellipsoidal
clusters, the optimal cluster boundary will be given not by a single
quadratic but by an intersection of quadratics. A search over all such
boundaries is possible in theory but would require excessive computer
time. Therefore the BEER measure may underestimate optimal possible performance.
Prediction of cluster covariance matrix from background activity
To compare the cluster shape to that predicted by a fixed spike shape superimposed with background noise, we extracted periods of background activity containing no extracellular spikes. These were determined by removing 32 samples around every extracellularly detected spike and retaining any remaining intervals of 100 samples or longer. The auto- and cross-covariance functions of the background activity were determined from these periods. These functions were used to create Toeplitz matrices for the time-domain covariance matrix between each pair of channels. The covariance between two principal components for the selected channels was then obtained by pre- and post-multiplying by the time-domain projection template used to generate the principal components. Analysis was performed in MATLAB (The MathWorks, Natick, MA).
Detection of sharp waves
Sharp waves were detected based on the occurrence of their associated high-frequency ripples. Broadband extracellular signals were filtered between 100 and 250 Hz using a digital finite impulse response filter, rectified, and smoothed using a median smoothing algorithm (LabView, National Instruments, Austin, TX). The occurrence of a ripple was determined based on a deflection of the processed extracellular signal >7 SDs from the baseline. The start and end of the ripple were then determined from the closest zero values preceding and following the threshold crossing.
Automatic cluster cutting
To gauge the practicability of automatic cluster cutting, we
tested the program AutoClass (Cheeseman and Stutz 1996,
http://ic-www.arc.nasa.goc/ic/projects/bayes-group/autoclass). The
program was used with the multivariate normal mode, with two parameters
changed from their default values (max_n_tries = 50, rel_delta_range = 0.01), to reduce running time. As AutoClass tended to overcluster the data, the output of the program was further
examined by a human operator, using the same program used for manual
cluster cutting. Nearby clusters with visible refractory period in the
cross-correlogram were judged to correspond to a single cell, and
merged. In one case, a cluster produced by the program showed a bimodal
shape, and was judged to correspond to two cells. This was confirmed by
the cross-correlogram of the two subclusters, which showed no
refractoriness or burst shoulders.
![]() |
RESULTS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Errors of human operators
We recorded from 33 neurons in 30 rats, both intracellularly and
extracellularly (Henze et al. 2000). Of these, we
selected six data sets from three neurons, which contained sufficient
action potentials for statistically significant error analysis, and
which represented various levels of difficulties in clustering related to amplitude and "burstiness."
Nine human operators were asked to cluster the feature vector sets for these six sessions. The error rates of the operators are shown in Table 1 and Fig. 1. There was a spread of error rates with some operators being more "conservative" (making more false negative errors but less false positive errors) and others being more "liberal" (making less false negative errors but more false positive errors). The same operators fell in similar portions of the scale on several data sets, indicating some individual bias. In general, neurons with lower amplitude spikes, burst patterns, or several simultaneously active neighbors were more difficult to cluster. The potential causes of the human errors are considered in detail in the following text.
|
|
CAUSES OF ERROR.
Similarity of spikes of different cells.
Figure 2, A and B,
insets, illustrates the mean extracellular waveforms of an
identified cell (cell 2 in Table 1) together with the mean
waveforms of spikes emitted by another cell recorded by the same
tetrode. The latter neuron was not intracellularly recorded, and spikes
of this cell were determined by our standard clustering method. Based
on our cell classification criteria (Csicsvari et al.
1999), this second cell was also a pyramidal cell. The amplitude profile of the confounding cell across the four recording sites was very similar to the identified cell. This similarity is also
visible on the cluster diagram of the first principal component (Fig.
2A). There is considerable overlap between the spikes of the
identified neuron (red) and confounding unit (green). However, the wave
shape of the confounding cell was different from the identified neuron.
The most prominent difference was a larger "initial positivity" and
a smaller late positive wave after the large negative deflection. These
differences in waveform were revealed by the second principal component
(Fig. 2B).
|
|
|
|
Postclustering tools: autocorrelograms and correlograms
It has been proposed that the neuron's spike refractory period,
as reflected by the absence of spikes at short intervals in the
autocorrelogram, can be used as a reliable indicator of whether the
recording was made from a single cell or from multiple neurons. Conversely, cross-correlation of two clusters with a common refractory period is taken as an indication that the clusters actually represent the same cell (Fee et al. 1996; Johnston and Wu
1995
).
Auto- and cross-correlograms provide valuable help during the
clustering process. However, they should be interpreted with caution,
due to the possibility of factors causing two cells not to co-fire
(such as spatially separated place fields) or the presence of effective
monosynaptic connections (Csicsvari et al. 1998). Comparison of autocorrelograms of clustered spikes and intracellular action potentials further helped to clarify the situations where correlograms may be safely used. Figure
6, A and B,
illustrates the autocorrelogram of a poorly isolated cluster (Type I
error: 43.4%). The presence of spikes in the refractory period (<2
ms) indicated the contamination of other units. However, the height of
the autocorrelogram in the central (refractory) portion was quite small
relative the large "shoulders" surrounding it. Does this mean that
Type I (false positive) error was negligible? To get an insight why
even in the presence of "confounding" spikes the autocorrelograms
look "normal," we must consider the mathematical properties of
correlograms in more detail.
|
To simplify the discussion, we scaled the y axis of the
cross-correlogram to represent the "cross-intensity function"
(Moore et al. 1970). This is achieved by dividing the
number of counts in each bin by the product of the bin length and the
total number of spikes of cell 1. The y axis can
then be read in hertz, and the value at time t interpreted
as the probability of cell 2 firing per unit time, given
that cell 1 fired a spike a time t ago
(Moore et al. 1970
).
In the case of two completely unrelated spike trains, the cross-correlogram is expected to be flat, with value in hertz equal to the firing rate of cell 2. Even if the cells show an interaction (e.g., if cell 1 drives cell 2), we expect the asymptotic value of the cross-correlogram at long times to be equal to the firing rate of cell 2. For an autocorrelogram of a single isolated cell, we expect that the asymptotic value will be equal to the firing rate of the cell. For an autocorrelogram of a poorly isolated cell, the central bins in the autocorrelogram (i.e., the refractory period) reflect the frequency of contaminating spikes.
The heights of the autocorrelogram "shoulders," if they exist, are set by a different time scale. The shoulder height reflects the probability per unit time that a spike will occur given that another spike occurred one intra-burst interspike interval ago. The shoulder height is therefore of the order of the probability that a given spike participates in a burst, divided by the variability of the intra-burst interspike intervals. Because this variability is <1 ms, the height of the autocorrelogram shoulders will be orders of magnitude larger than the central portion of the autocorrelogram, even for a poorly isolated cell.
The central portion of the autocorrelogram should therefore not be compared with the shoulders but with the asymptotic height at long times. For illustration purposes, we considered simulated spike trains. Figure 6C shows an autocorrelogram created from spikes of a mixture of two simulated bursting cells with equal firing rates (2.5 Hz) and 3-ms refractory periods. The refractory period (central part) of the correlogram looks reasonable when compared with the height of the shoulders. However, when the number of spikes in the refractory period is compared with the asymptotic portion of the autocorrelogram (Fig. 6D), it becomes clear that the spike train did not derive from a single neuron. In the case of slow discharging cells or short recordings, there may not be enough spikes in the central portion of the autocorrelogram to allow accurate comparison to the asymptotic region. In these cases, the autocorrelogram cannot be used to determine the successful elimination of Type I errors.
The cross-correlogram can also be useful for evaluating the success of
cell isolation (Fee et al. 1996). In particular,
shoulders of the cross-correlogram for two clusters may indicate that
both clusters contain spikes produced by the same bursting neuron (Fig. 6, E and F). Asymmetric peaks, in particular,
indicate that the decreasing amplitude spikes of a burst have been put
into separate clusters. Asymmetric cross-correlogram peaks may also be
caused by direct coupling of two cells, but those observed in rat
hippocampus may be distinguished from misclassified bursting cells on
the basis of a characteristic narrow single cross-correlogram peak (Csicsvari et al. 1998
). Conversely, the absence of
shoulders in a cross-correlogram may not be taken as a guarantee that
the clusters do not both contain spikes from a common nonbursting cell.
Theoretically optimal performance
Errors are caused by overlapping clouds in cluster space. Is it possible to improve error rates by determining more appropriate boundaries between clusters? We addressed this question by using an estimate of maximum possible performance, termed the BEER (see METHODS). This measure gives an estimate of the minimum error rates for an optimally positioned ellipsoidal cluster, given the positions of the correct and incorrect spikes. Furthermore the BEER measure allows for a differential weighing of Type I and Type II errors, parameterized by a "conservatism" parameter. Therefore the BEER measure specifies a curve showing the tradeoff between Type I and Type II errors as this parameter varies from 0 to 1.
Figure 1 shows the estimated minimum possible errors, superimposed on the performance of the human operators. Curves were generated using the peak-to-peak spike height only, the first principal component only, the first and second principal components, or the first three principal components. For each calculation, two BEER curves were generated by a cross-validation method, in which the ellipsoid position was determined from half of the data (even or odd numbered spikes) and evaluated on the other half. Examination of the two sets of curves provides an estimate of the minimum achievable error, whereas the difference between them is an indication of the accuracy of this estimate.
As was the case with the human operators, the BEER curves indicate a trade-off between false-positive and -negative errors. For all data sets, there was a strong distinction between the errors generated by the use of the peak-to-peak measures or the first principal components only versus the use of two- and three-principal components. This was particularly striking for the data set in which the intracellularly identified unit had a confounding unit with a similar amplitude distribution (Fig. 1, cell 2, and Fig. 2). There was also a slight further improvement by using three over two principal components but this difference was much smaller than the difference between the use of one versus two principal components. In general, the performance of human operators was similar to the BEER performance when only the peak-to-peak amplitudes or one-principal component was used. It must be noted here that the human operators had all the principal components to work with and could get additional information from examination of spike waveforms and auto/cross-correlograms. Importantly, all human operators did worse than the theoretical best achieved using all three principal components. The difference between human operators and BEER measures was least when the extracellular amplitude of the impaled cell was relatively large and did not burst.
Analysis of human operator errors
To assess how performance could be improved, we analyzed the distribution of operator errors on the most difficult neuron, (cell 1 of Fig. 1). Figure 7 displays the correctly identified spikes along with the false positive and negative errors made by an operator in a single two-dimensional projection. When the first principal component projections were displayed in the cluster space, errors occurred in a shell surrounding the cluster of correctly identified spikes. Furthermore more errors occurred at the border with small-amplitude spikes (Fig. 7A). Examining Fig. 7A, one could form the impression that the clustering performance cannot be substantially improved: decreasing Type I errors can be achieved only by increasing Type II errors. This impression was supported by calculating the Mahalanobis distance in the two-dimensional space for the correctly identified spikes, the false negative errors and false positive errors (Fig. 7B). The correctly identified spikes occurred closest to the center of the cluster, the false positive errors closer to the periphery, and the false negative errors were the furthest out of all (Fig. 7B). However, a different picture arose when the third-principal components were displayed. These projections are rarely used by human operators because the spikes of different cells do not form well-separated clouds. However, displaying the third-principal components revealed that the false positive errors occurred actually further out than the false negative errors (Fig. 7C). Similarly when the Mahalanobis distances were calculated in the full 12-dimensional space, the false positive errors occurred farther from the center of the cluster than the false negative errors (Fig. 7D). This conclusion was borne out in 46 of the 48 clustering sessions across all operators (Fig. 7E).
|
The preceding analysis indicated that the clusters created by the operators were not optimal. A plausible explanation is that human operators are able to use only two dimensions at a time. In any two-dimensional plane, the expected amount of cluster overlap is always more than in the full high-dimensional space. In contrast to the human operator, the BEER estimation can use all dimensions at once. This may explain the significantly lower error rates of the BEER method.
Automatic cluster cutting
The previous analysis has indicated that the performance of human operators was below the theoretical optimum, as estimated by the BEER method, and that this may be due to an inability of human operators to visualize the multidimensional cluster space. It is therefore of interest whether automatic clustering systems reduce the error rates made by the human operators.
Several approaches to automatic spike sorting have been proposed
(Fee et al. 1996; Lewicki 1998
;
Sahani 1999
; Wright et al. 1998
). In our
investigation, we examined the applicability of a general-purpose
clustering method designed to automatically classify various types of
data (Cheeseman and Stutz 1996
).
Figure 8A shows the raw output produced by the program on session 4, in which the intracellular cell was bursting. The program generated nine clusters. However, examination of autocorrelograms indicated that three of these clusters (shown in red, green, and blue) corresponded to spikes from the same cell (Fig. 8B). The clean refractory period in the cross-correlogram for all three clusters, and the asymmetric cross-correlograms between the red cluster and the green and blue clusters, suggested that the latter may correspond to correspond to subsequent spikes of the bursting cell. This is supported by the larger amplitude of the earlier discharging (red) spikes. Examination of other auto- and cross-correlograms (not shown) indicated that no other clusters corresponded to the bursting cell, and that the small-amplitude yellow, mauve, and pink clusters, and the diffuse gray cluster do not correspond to single units. The final clusters are shown in Fig. 8C. We refer to the process of automatic cluster determination, followed by manual reassignment as semi-automatic clustering.
|
The tendency of AutoClass to overcluster the data was born out in the other five recording sessions. The output of the program was examined by an operator in each case, and clusters deemed to have belonged to the same cells were merged. In session 6, the program also produced a bimodal cluster, which was deemed to contain the spikes of two cells. The cluster was split manually, and the distinctness of the two subclusters was confirmed by a flat cross-correlogram.
The error rates of the semi-automatic system (AutoClass followed by manual cluster merging) are shown in the bottom line of Table 1, and are marked by A in Fig. 1. In all six files, the performance of the semi-automatic system was better than that of any human operator. Furthermore in every case, the performance of the semi-automatic system was comparable with the theoretical optimum as estimated by the BEER measure. We therefore confirm that automatic cluster cutting, if followed by proper postclustering examination and adjustment, can indeed lead to lower error rates than manual spike sorting methods.
Comparison of tetrode to single-wire performance
While tetrodes are often acknowledged to provide more accurate
spike separation in hippocampus (Drake et al. 1988;
McNaughton et al. 1983
; Nadasdy et al.
1998
; Recce and O'Keefe 1989
; Wilson and
McNaughton 1993
), and necortex (Gray et al.
1995
; Wright et al. 1998
), single-wire
recordings are still often used in many studies (Deadwyler and
Hampson 1995
; Hampson et al. 1999
;
Nicolelis et al. 1997
). We characterized the improvement
gained from tetrode recordings by comparing optimal possible
performance using only one of the tetrode channels relative to optimal
performance using all four.
Figure 9 shows the optimal performance
estimate for the six recording sessions using each channel individually
compared with optimal performance using them all together. In each
plot, the four pairs of dotted lines show estimated optimal performance for each of the four sites, and the single pair of solid lines shows
estimated optimal performance for the full tetrode. All three principal
components were used (Abeles and Goldstein 1977).
|
The error estimate for single wires differed greatly from wire to wire with the worst wires showing Type I and Type II errors both exceeding 50% and therefore off the scale of the plots in Fig. 9. However, even the best wires show poorer performance than tetrodes with error rates typically double those for the full tetrode.
![]() |
DISCUSSION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
We used simultaneous intracellular and extracellular recordings to
quantify errors of manual spike separation methods. To date, no other
methods have been available for an independent and objective evaluation
of any on- or off-line unit classification methods. Because of the
subjective bias and differing degree of experience of the individual
human operators, not only the overall error rates but the types of
errors inherent in manual methods are expected to vary across different
laboratories. For several types of investigations, multiple-unit
recording or contaminated single-unit recording may be adequate.
However, the analysis of emerging, cooperative activity of single
neurons (Buzsaki et al. 1992; Deadwyler and
Hampson 1995
; McNaughton et al. 1996
;
Nadasdy et al. 1999
; Riehle et al. 1997
;
Skaggs and McNaughton 1996
; Wilson and McNaughton
1993
, 1994
) requires very accurate spike sorting. Our major
finding is that the causes of separation error can be explained by
overlapping clouds in cluster space due to the spread of clusters
representing single cells and to the similarity of clusters
corresponding to different cells. Specifically, we found that
1) the Type I and Type II errors of human operators
typically ranged from 0 to 30% but could exceed 50% during periods of
synchronous population cell discharges, 2) neurons with
similar amplitude profiles across channels may be discriminated when
information about the waveforms is also utilized, 3) auto-
and cross-correlograms of spikes may identify poor clusters but cannot
always estimate confounding spikes in the clusters, 4) the
most important source of error is human error, caused by the
observer's inability to visualize the full high-dimensional cluster
space. 5) Using information from single wires produced
substantially more errors than tetrode recordings. 6) Spike
sorting errors may be significantly reduced by semi-automatic clustering.
Synchronous discharge of nearby neurons increases Type I clustering error
The most important error of biological origin occurs due to the
near simultaneous discharge of neurons in the vicinity of the recording
electrodes. This was demonstrated best by the observation that both
Type I and Type II errors increased several-fold when spikes occurred
during sharp wave-associated population bursts of CA1 neurons. We
examined two potential sources of the increased error. Hippocampal
sharp waves are associated with fast field oscillation (ripples,
100-200 Hz) in the CA1 pyramidal layer and an increased synchrony of
pyramidal cells and interneurons (Buzsaki et al. 1992).
These high-frequency ripple waves are often significantly larger in
amplitude than the units to be discriminated, and therefore inadequate
filtering may not perfectly separate field events from the
extracellularly recorded spike. However, when filtered traces of spikes
collected during the presence and absence of ripples were compared,
they were identical in amplitude and shape, indicating that digital
filtering successfully eliminated the effects of nonspike related
background fluctuations. However, in contrast to the similar mean spike
waveforms, the variability of spikes was substantially larger during
ripples than in their absence. A likely reason for this increased
variability is the random superimposition of spikes tightly coupled to
the negative peaks of ripple waves (Csicsvari et al.
1999
). If a cell fires during the spike of another neuron, the
resulting waveshape will be the combination of the two spikes. If the
added spike(s) sufficiently alters the amplitude distribution and shape
of the spike of the neuron to be clustered, an omission (false
negative) error will occur. Conversely, superimposition of smaller
amplitude spikes may result in spike waveforms sufficiently similar to
be incorporated into the cluster of the larger amplitude neuron, thus
incrementing false positive errors. We hypothesize that a major
contributing factor to Type I error is the increased probability of
discharge of many nearby neurons in a narrow time window. We have
estimated that a single tetrode can monitor >100 clusterable neurons
in the CA1 pyramidal layer (Henze et al. 2000
). In
contrast to this large number, typically <10 of these recordable cells
are routinely clustered in the awake behaving animal. The most likely
reason for the relatively low number of recovered clusters is that the
great majority of pyramidal neurons are silent most of the time.
However, during the population synchrony associated with sharp waves
(Buzsaki et al. 1992
; Csicsvari et al.
1999
) even some of the previously silent cells may discharge.
Since these silent cells do not have sufficient spike counts to form recognizable cluster clouds, the spikes they emit may be mistakenly incorporated into the clusters of other neurons.
Although population bursts of neurons provided the most robust examples
for the demonstration of spike interference effects, the same rules
also apply to any state when nearby neurons discharge together within
the duration of the action potential. In the hippocampus, such
spike-spike interference effects are relatively small because in the
absence of sharp wave bursts neighboring pyramidal cells (place cells)
only exceptionally represent the same part of the environment and
therefore rarely fire together (Recce and O'Keefe 1989). In contrast, neurons in cortical columns are typically coactivated and discharge at a high rate in response to relevant inputs
(Hubel and Wiesel 1962
; Mountcastle
1957
). In addition, neocortical networks can also become
synchronized with millisecond precision either spontaneously
(Abeles and Gerstein 1988
; Buzsaki and Kandel
1998
) or in response to sensory inputs (Jones and Barth 1999
; Kaneko et al. 1999
). Our observations
therefore indicate that separation of tetrode-recorded neocortical
neurons with the currently available clustering methods are likely to
yield much larger Type I error rates than in the hippocampal CA1 region.
Spike waveform variability of single neurons contribute to Type II error
The original motivation of the tetrode technique was that multiple
voltage sensors can reliably locate a given neuron in space on the
basis of spike-amplitude ratios (Drake et al. 1988;
Gray et al. 1995
; McNaughton et al. 1983
;
Recce and O'Keefe 1989
). However, research on the
biophysical properties of neurons indicates that neurons are not point
sources of extracellular currents underlying the action potential. Both
the perisomatic region and the dendritic compartments actively support
both the generation and propagation of action potentials (Calvin
and Hartline 1977
; Kamondi et al. 1998
;
Llinas and Nicholson 1971
; Regehr et al.
1993
; Stuart et al. 1997
). Because spike
activity of pyramidal cells can be recorded several hundred micrometers
from their somata (Buzsaki and Kandel 1998
;
Buzsaki et al. 1996
; Henze et al. 2000
),
the extracellular spike recorded by the extracellular recording
electrode represents a summed activity of somatic and dendritic
currents (Nadasdy et al. 1998
). Since the amplitude and
waveform of dendritic action potentials vary substantially as a
function of the presynaptic network activity (Kamondi et al.
1998
), the extracellular waveforms also vary along the
somadendritic axis, modulated by ongoing network behavior.
Complex-spike bursts of pyramidal cells (Ranck 1973
) represent perhaps the most extreme degree of extracellular spike amplitude and waveform variation. Our findings revealed that these factors can often lead to elongated clusters, which also increases the
probability of cluster overlap with spikes emitted by other neurons.
Because units with spikes of various shapes and sizes may be
inadvertently grouped into different clusters, the number of spikes
emitted by a single neuron may be substantially underestimated (omission error).
Improved error rates achieved by semi-automatic clustering
Examination of the errors made by human operators indicated that their performance was below the theoretical optimum and that the most likely cause of this sub-optimal performance was an inability to visualize the high-dimensional cluster space. This was confirmed by experiments with a semi-automatic clustering process, which achieved error rates closer to the theoretical optimum.
The semi-automatic process consisted of an automatic classification program, followed by examination and reassignment by a human operator. While the program's parameters needed very little adjustment, it had a tendency to overcluster the data; this tendency was expressed by dividing the cloud corresponding to a single cell into several clusters, which was most noticeable for bursting cells. This problem was rectified by a human operator who manually merged clusters corresponding to the same cell based on cluster proximity, amplitude ratios, and cross-correlograms. In addition, the program would occasionally produce a single cluster combining spikes from two cells. This again may be corrected by manual intervention, including examination for bimodality of clusters and subsequent examination of the cross-correlogram of the divided cluster.
The usual arguments advanced so far in favor of automatic spike sorting have been that it is considerably faster than the manual method and free from the subjective bias and experience level of the operator. We now can add another argument: that it may lower error rates, beyond even those of the most experienced operator.
Superiority of tetrodes over single wires
Spike isolation methods based on multiple-site recordings have
been used extensively (McNaughton et al. 1983;
Recce and O'Keefe 1989
). Nevertheless several
investigators use single wires for unit isolation mostly because of the
convenience of the fabrication and implantation of single wires.
Estimation of optimal spike separation errors for single sites of our
tetrode data showed great variability depending on which site was being
used. However, even the best channels still showed relatively poor
performance with typical error rates approximately twice those of the
full tetrode. We therefore confirm that spike separation is
significantly more effective when multiple sites are used for unit
recording (Drake et al. 1988
; McNaughton et al.
1983
; Nadasdy et al. 1998
; Recce and
O'Keefe 1989
).
How may spike separation be further improved?
Use of the full tetrode channels and a semi-automatic clustering process improved the performance from the level achieved by human operators to the theoretical estimate of optimum performance provided by the BEER measure. Can spike separation performance be further improved?
Three directions for future spike separation methods are worth
emphasizing. First, improvement of the recording hardware may enhance
signal-to-noise ratios and separability of units. Silicon technology-based electrodes are especially promising because they can
be standardized and optimized for various neuron types without increasing tissue damage (Nadasdy et al. 1998).
Second, although semi-automatic clustering can achieve close to the theoretical optimum performance for a given set of feature vectors, it may be possible to lower error rates further by improving the feature vectors themselves. Two stages are involved in going from raw traces to feature vectors: spike detection and feature extraction. We have seen here that use of waveshape information (in this case by principal component analysis) leads to improved performance over simple peak-to-peak amplitude. However, further improvements in spike detection and feature extraction may lead to more focused clusters and thus lower error rates.
Third, the semi-automatic clustering process described here still
requires inspection by a human operator. This takes time, albeit less
time than manual clustering, and raises the possibility of subjective
bias. Ideally a fully automatic clustering system would produce an
output corresponding to single cells without the need for user
intervention. The AutoClass software used in this paper was designed
for automatic classification in a wide range of domains and made no use
of neuroscience knowledge, such as refractory periods and the nature of
complex-spike bursts. The incorporation of such knowledge (Fee
et al. 1996; Sahani 1999
; K. Zhang, personal
communication) holds out hope for more reliable, fully automatic spike sorting.
![]() |
ACKNOWLEDGMENTS |
---|
We thank R. Bruno for performing cluster analysis and drawing our attention to the AutoClass program, M. Recce and P. Mitra for suggestions with data analysis and comments on the manuscript, C. King, G. Dragoi, and X. Leinekugel for performing cluster analysis, and J. Hetke and K. Wise for supplying silicon probes. The data used in this paper are available on request by e-mail to G. Buzsáki.
This work was supported by National Institutes of Health Grants NS-34994, MH-54671, and MH-12403 (to D. A. Henze) and by the Epilepsy Foundation of America (to D. A. Henze).
![]() |
FOOTNOTES |
---|
Address for reprint requests: G Buzsáki, Center for Molecular and Behavioral Neuroscience, Rutgers University, 197 University Ave., Newark, NJ 07102 (E-mail: buzsaki{at}axon.rutgers.edu).
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 10 December 1999; accepted in final form 6 April 2000.
![]() |
REFERENCES |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|