Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892-4415
ABSTRACT
Wiener, Matthew C. and Barry J. Richmond. Using Response Models to Estimate Channel Capacity for Neuronal Classification of Stationary Visual Stimuli Using Temporal Coding. J. Neurophysiol. 82: 2861-2875, 1999. Both spike count and temporal modulation are known to carry information about which of a set of stimuli elicited a response; but how much information temporal modulation adds remains a subject of debate. This question usually is addressed by examining the results of a particular experiment that depend on the specific stimuli used. Developing a response model allows us to ask how much more information is carried by the best use of response strength and temporal modulation together (that is, the channel capacity using a code incorporating both) than by the best use of spike count alone (the channel capacity using the spike count code). This replaces dependence on a particular data set with dependence on the accuracy of the model. The model is constructed by finding statistical rules obeyed by all the observed responses and assuming that responses to stimuli not presented in our experiments obey the same rules. We assume that all responses within the observed dynamic range, even if not elicited by a stimulus in our experiment, could be elicited by some stimulus. The model used here is based on principal component analysis and includes both response strength and a coarse (±10 ms) representation of temporal modulation. Temporal modulation at finer time scales carries little information about the identity of stationary visual stimuli (although it may carry information about stimulus motion or change), and we present evidence that, given its variability, it should not be expected to do so. The model makes use of a linear relation between the logarithms of mean and variance of responses, similar to the widely seen relation between mean and variance of spike count. Responses are modeled using truncated Gaussian distributions. 
The amount of stimulus-related information carried by spike count in our data is 0.35 and 0.31 bits in primary visual and inferior temporal cortices, respectively, rising to 0.52 and 0.37 bits for the two-principal-component code. The response model estimates that the channel capacity is 1.1 and 1.4 bits, respectively, using the spike count only, rising to 2.0 and 2.2 bits using two principal components. Thus using this representation of temporal modulation is nearly equivalent to adding a second independent cell using the spike count code. This is much more than estimated using transmitted information but far less than would be expected if all degrees of freedom provided by the individual spike times carried independent information.
INTRODUCTION
It is not yet completely understood how
information is encoded in neuronal spike trains and how much
information is carried. In the visual system, it is clear that the
number of action potentials elicited by a visual stimulus is an
important part of the code for carrying stimulus-related information.
There is now strong evidence that modulation of the firing rate during
the course of the response to a stimulus presented for the period of a
typical intersaccadic interval (~300 ms) carries additional
information that is not available from average response strength alone
(Heller et al. 1995; Richmond and Optican 1987, 1990; Tovee et al. 1993; Victor and Purpura 1996). Here we investigate how much information temporal modulation adds.
As often understood, the answer to this question depends on the
experiment from which we take our data. Different sets of stimuli will
elicit different sets of responses, which may encode more or less
information using temporal modulation. Using response models can help
us answer a question less tied to a particular experiment: how much
more information can be carried by the best use of temporal modulation
than by the best use of spike count alone? The maximum amount of
information that can be carried by a neuron using a particular code
(such as spike count, or spike count along with some representation of
temporal modulation) is called its channel capacity (Cover and Thomas 1991; Shannon and Weaver 1949). The
channel capacity is defined uniquely for a particular neuron using a
particular code. Channel capacity's uniqueness comes at a price,
however: to calculate channel capacity, we must know all possible
responses that can be elicited from a neuron rather than only those
observed in our experiment.
Our goal is to construct a response model that not only describes the
responses observed in an experiment but also can be used to predict the
responses that would be elicited by other stimuli. This substitutes
dependence on the accuracy of the model for the limitations imposed by
the amount of data typically collected in experiments. Ideally we would
like this model to include all features of the response that do carry
unique information, without complicating matters by including features
that carry little or no unique information. Recently Gershon et
al. (1998) presented an approach to constructing such a
response model involving spike count only. The model used a widely
known relation between mean and variance of spike count (Dean 1981; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989), and the fact that distributions of spike count were well fit by a truncated Gaussian. This model allowed Gershon et al. (1998), given any mean spike count, to describe the set of responses giving
rise to that mean. The set of responses corresponding to a given mean
did not depend on whether the mean had actually been elicited by a
stimulus in the experiment. Thus the model predicted the responses
associated with any possible mean. Because any set of responses has a
mean, the model described all sets of responses that could be elicited
from the neuron. This allowed Gershon et al. to estimate the neuron's
channel capacity using the spike count code.
Here we extend the approach of Gershon et al. (1998) to
a code that includes the temporal patterns of rate modulation of the responses. We represent the neuronal responses using the first two
principal components. The coefficient with respect to the first
principal component is strongly correlated with spike count, and the
coefficient with respect to the second principal component is a coarse
(±10 ms) measure of the time course of a response, often indicating
whether spikes tend to be concentrated at the beginning of a response
(Richmond and Optican 1987, 1990). We identify relations
among the means and variances of the principal component coefficients
similar to the relation between mean and variance of spike count. We
also find that distributions of the principal component coefficients
are well fit by truncated Gaussian distributions. Now a set of responses must be described by two means, one for each component of the code. As in Gershon et al. (1998), this allows us to
predict both the response strength and the temporal patterns of rate
modulation for all possible sets of responses.
We estimated channel capacity on the basis of the response models for
spike count and for the first two principal components. We found that
the two-component code on average increases the channel capacity by
0.8-0.9 bits over that carried by spike count alone (1.1 and 1.4 bits,
respectively) in neurons in the primary visual cortex (V1) and area
TE of inferior temporal cortex. Therefore the contribution
temporal modulation can make to neural information processing for
identification of stationary stimuli, although far smaller than if all
degrees of freedom in a spike train actually were used to carry such
information, is substantially larger than would be estimated from only
the responses seen in our experiment. This contrasts with situations in
which rapidly changing stimuli drive precisely time-locked neuronal
responses, which can lead to extremely high information transmission
rates (Buracas et al. 1998). Finally, we present
evidence that very little additional channel capacity for identifying
stationary stimuli can be expected by using more principal components
(that is, by representing the rate variation with higher precision).
METHODS
Data sets
We performed new analyses using previously published data. The
data came from two studies of supragranular V1 complex cells, each
study using two rhesus monkeys performing a simple fixation task
(Kjaer et al. 1997; Richmond et al. 1990), and from one study of neurons in IT cortex in two other monkeys performing a simple sequential nonmatch-to-sample task (Eskandar et al. 1992). In these three studies, the
visual stimuli were two-dimensional black-and-white patterns based on
the Walsh functions (Fig. 1).
In both V1 studies, stimuli were presented centered on the neuronal
receptive fields, located in the lower contralateral visual field
1-3° from the fovea. Stimuli covered the excitatory receptive field.
At 3° eccentricity, the stimuli were ~2.5° on a side. In the
first V1 study (V1 set 1) (Richmond et al. 1990), 128 stimuli were used: sixty-four 8 × 8 pixel patterns and their
contrast-reversed counterparts. For V1 set 2 (Kjaer et al. 1997), 16 stimuli were used: a set of eight 16 × 16 pixel patterns and their contrast-reversed counterparts.
In the IT study (Eskandar et al. 1992), stimuli were
presented centered on the fovea. The patterns were 4° square.
Thirty-two stimuli were used: sixteen 4 × 4 pixel patterns and
their contrast-reversed counterparts.
The stimulus was displayed for 320 ms in V1 and 352 ms in IT. To
account for response latencies and to avoid contamination from
off-responses, spikes were counted during the interval from 30 to 300 ms after stimulus onset for the V1 neurons and 50 to 350 ms after
stimulus onset for the IT neurons. Gershon et al. (1998)
found that both transmitted information and channel capacity based on
the spike count code are stable with respect to small changes in these
counting windows. For each neuron, each stimulus was presented
approximately the same number of times (±2) in randomized order.
Different neurons received different numbers of presentations. The
number of presentations of each stimulus was between 10 and 50 in V1
and between 19 and 50 in IT. Seven V1 neurons with <10 trials per
stimulus were omitted from these analyses to ensure stability of
information estimates (Golomb et al. 1997; Wiener and Richmond 1998). The timing of events, including spikes, was recorded with 1-ms resolution.
Quantifying the responses
Each spike train was low-pass filtered by convolution with a
Gaussian distribution with standard deviation of 5 ms and resampled at
1-ms resolution to create a spike density function (Fig.
2, A and B).
Convolving and resampling in this way avoids a problem of binned
histograms, which do not distinguish between spikes at the center of a
bin and those near the edges (Richmond et al. 1987; Sanderson and Kobler 1976; Silverman 1986).
We used principal component analysis to reduce the dimension of the
data set. Principal component analysis defines an ordered set of axes,
each accounting for more of the variance in the data set than those
that follow. These axes can be computed by finding the eigenvectors of
the covariance matrix of the data (Ahmed and Rao 1975; Deco and Obradovic 1996). Each data point is defined uniquely by its projections onto these axes; these values are called
the coefficients with respect to the principal components, or principal
component coefficients. We use the code formed by the first
k principal component coefficients because it is the optimal
k-dimensional linear code for least-squares reconstruction and minimizes an upper bound of information loss through a one-layer network (Campa et al. 1995; Deco and Obradovic 1996; Plumbley 1991). Principal-component analysis has a long history in signal analysis (Ahmed and Rao 1975) and has been used to study information coded by temporal modulation in neuronal responses (Heller et al. 1995; Kjaer et al. 1994; McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican 1987, 1990; Tovee and Rolls 1995; Tovee et al. 1993). Figure 2 shows rasters of the responses elicited by two stimuli from a single V1 neuron, the corresponding spike density functions, the first four principal components of the responses from the neuron, and the principal component representation of the two responses.
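The decomposition itself can be sketched in a few lines (an illustration of the eigenvector computation cited above, not the authors' implementation; the random input standing in for spike density functions is a placeholder):

```python
import numpy as np

def principal_component_coefficients(sdfs, k=2):
    """sdfs: (n_trials, n_timepoints) spike density functions.
    Returns the first k principal components (as columns) and the
    coefficients of each trial with respect to those components."""
    centered = sdfs - sdfs.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # covariance over timepoints
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # largest-variance axes first
    components = eigvecs[:, order]            # (n_timepoints, k)
    coeffs = centered @ components            # projections onto the axes
    return components, coeffs

rng = np.random.default_rng(0)
sdfs = rng.random((50, 300))                  # placeholder "responses"
pcs, coeffs = principal_component_coefficients(sdfs)
```

By construction, the coefficients with respect to distinct components are linearly uncorrelated across the data set as a whole, the property noted in RESULTS.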
In this study, each neuronal response was represented by the corresponding coefficients with respect to the first and second principal components of the spike density functions (Richmond and Optican 1987, 1990). We call these coefficients φ1 and φ2. φ1 and φ2 can be translated by arbitrary constants as long as the appropriate constant multiples of the first and second principal components are subtracted from the results. This manipulation is similar to rewriting the expression a + bx as a + bx0 + b(x − x0). It is conventional to use the average waveform as the base waveform, translating the axes so that the average waveform corresponds to a vector of zeros. This causes some values of φ1 and φ2 to be negative. Here we translated the axes so that φ1 and φ2 are always positive and logarithms could be taken. The specific form of the new base waveform is not important for our analyses.
Model of response variability
Estimating channel capacity requires estimating all possible (φ1, φ2) response distributions. Our estimate of these distributions must be based on the responses elicited by stimuli actually presented in our experiments. It is important to note that if the responses elicited by the stimuli presented in our experiments are not representative of the responses elicited by other stimuli, our model will generalize poorly.
Ideally, we would estimate the two-dimensional conditional distribution of responses p(φ1, φ2|s) directly for each stimulus s. However, we did not have enough responses to each stimulus to do so. Instead, we described the distributions of φ1 and φ2 individually, and assumed that φ2 was independent of φ1 (except for correlations imposed by truncation at the bounds of the response space, see following text). To be precise
p(φ1, φ2|s) = p(φ1|s) p(φ2|s)   (1)
within the bounds of the response space, and
p(φ1, φ2|s) = 0   (2)
outside those bounds.
Although φ1 and φ2 are by construction uncorrelated in the responses from each neuron, correlations are sometimes present in the responses elicited by individual stimuli. In RESULTS, we observe that the set of responses is bounded; these bounds will be described quantitatively at that point. We observe correlations between φ1 and φ2 only in those distributions lying close to the response bounds. Therefore our model assumes that correlations between φ1 and φ2 arise only as a result of truncating distributions at the boundaries of the response space. We make this same assumption when characterizing the set of all possible response distributions to calculate channel capacity. We make this assumption because we do not have data to justify any other: under other circumstances (and with sufficient data) we might find, and include in a model, such correlations. The close match of our estimates of transmitted information to those obtained using other well-validated methods (see RESULTS) suggests that in this case the assumption is justified.
Means and variances of response distributions
Two-dimensional Gaussian distributions (given our correlation assumptions, preceding text) are characterized by two means and two variances. All four parameters can be measured from experimentally observed responses. In RESULTS, we show that the variances of φ1 and φ2 distributions are related to mean φ1 by a power law
log(σ1²) = a1 + b1 log(μ1)   (3)
log(σ2²) = a2 + b2 log(μ1)   (4)
Estimates of log(μ) and log(σ²) obtained by taking the logarithm of the sample mean and variance are biased, resulting in underestimation of the variance of response distributions and therefore overestimation of transmitted information. We corrected for the bias using a Taylor series expansion; only a few terms are needed for good results (Kendall and Stuart 1961, p. 4-6).
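The need for such a correction can be demonstrated by simulation (a sketch under our own assumptions: Kendall and Stuart give the full expansion, and this sketch applies only the standard second-order term E[log m] ≈ log μ − σ²/(2nμ²) to the sample mean, with invented parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 4.0, 15, 20000   # 15 "trials", 20000 simulated cells

naive, corrected = [], []
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    m, v = x.mean(), x.var(ddof=1)
    naive.append(np.log(m))
    # second-order Taylor correction for the bias of the log of a sample mean
    corrected.append(np.log(m) + v / (2.0 * n * m ** 2))

naive_bias = np.mean(naive) - np.log(mu)          # systematically negative
corrected_bias = np.mean(corrected) - np.log(mu)  # much closer to zero
```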
We also used linear regression to estimate the mean and variance of the values of φ2 when φ1 takes on a given value, no matter which stimulus elicited the response.
Transmitted information and channel capacity
The information carried in a neuron's response about which member of a set of stimuli is present (Cover and Thomas 1991) is defined as
I(R;S) = Σs p(s) Σr p(r|s) log2[p(r|s)/p(r)]   (5)
The transmitted information I(R;S) depends on p(s), the distribution of presentation probabilities of the stimuli (see Eq. 5). The channel capacity of a cell is the maximum value of transmitted information over all distributions p(s), where the set of stimuli S should now be understood to include all possible visual stimuli. Finding this maximum requires knowing the conditional response distributions p(r|s) for all stimuli s. We estimate the distributions using the model described earlier.
From Eqs. 1 and 2, we see that estimating p(φ|μ), the probability with which the distribution with mean μ = (μ1, μ2) elicits response φ = (φ1, φ2), requires μ1, μ2, σ1², and σ2². Given the power-law relations between the means and variances of distributions (Eq. 3), μ1 determines both σ1² and σ2² but not μ2. Thus for each μ1 there are many possible response distributions, identical except for translation in φ2. Each distribution, then, is characterized completely by the two means μ1 and μ2. When calculating transmitted information, only the observed distributions are considered (Fig. 3, top); when calculating channel capacity, all possible distributions must be considered (Fig. 3, bottom).
According to this model, two stimuli that elicit the same
µ1 and µ2 from a neuron
in fact elicit identical response distributions and therefore cannot be
distinguished from one another by that neuron. Thus a stimulus can be
identified simply by the mean response it elicits. Equation 5 now can be written
I = Σμ p(μ) Σφ p(φ|μ) log2[p(φ|μ)/p(φ)]   (6)
A derivation of Eq. 6 is given in Gershon et al. (1998). That derivation is expressed in terms of the spike count code but remains valid when spike count is replaced with any other response code. Information and channel capacity based on the spike count code were calculated using the method of Gershon et al. (1998).
The search for the maximizing set of probabilities is subject to three constraints: the probabilities must be nonnegative; the probabilities must sum to one; and the range of means must be finite. The first two constraints arise from intrinsic properties of probability distributions. If the third constraint is violated, the transmitted information can be infinite and the problem of maximizing transmitted information is ill-posed. The implementation of these constraints and other numerical issues are discussed in the APPENDIX.
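The implementation details of the constrained maximization are left to the APPENDIX and are not reproduced here. As an illustration of the kind of computation involved, the standard Blahut-Arimoto algorithm finds the capacity-achieving input distribution for any discrete channel (this is a generic sketch, not the authors' procedure; the 2 × 2 channel is invented):

```python
import numpy as np

def channel_capacity(p_r_given_s, n_iter=500):
    """Blahut-Arimoto estimate (in bits) of the capacity of a discrete
    channel with conditional response probabilities p(r|s)."""
    n_s = p_r_given_s.shape[0]
    p_s = np.full(n_s, 1.0 / n_s)              # start from uniform inputs
    for _ in range(n_iter):
        p_r = p_s @ p_r_given_s
        with np.errstate(divide='ignore', invalid='ignore'):
            ratio = np.where(p_r_given_s > 0, p_r_given_s / p_r, 1.0)
        d = np.sum(p_r_given_s * np.log2(ratio), axis=1)  # D(p(r|s) || p(r))
        p_s *= np.exp2(d)                      # reweight toward informative inputs
        p_s /= p_s.sum()
    p_r = p_s @ p_r_given_s
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = np.where(p_r_given_s > 0, p_r_given_s / p_r, 1.0)
    return float(np.sum(p_s * np.sum(p_r_given_s * np.log2(ratio), axis=1)))

# Binary channel that reports the stimulus correctly 90% of the time
cap = channel_capacity(np.array([[0.9, 0.1],
                                 [0.1, 0.9]]))
```

For this symmetric channel the capacity is 1 − H(0.1) ≈ 0.531 bits, achieved by the uniform input distribution.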
We did not penalize distributions for probability weight falling outside the response envelope bounds for φ2 (as we did for probability weight falling outside the observed range of φ1); we simply ignored that portion of the distribution. This allows the widest possible separation between distributions, causing our estimates of channel capacity to be larger than if the boundaries had been strictly enforced. Thus our procedure was designed to give the most generous estimates possible (consistent with the constraints estimated from the data) of channel capacity associated with temporal coding.
RESULTS
We considered 6 neurons from one experiment in V1, 29 neurons from a second experiment in V1, and 19 neurons from IT. To calculate transmitted information and channel capacity, we characterized the space of responses, parameterized by φ1 and φ2, the coefficients with respect to the first two principal components; determined the regression relations for the means and variances of φ1 and φ2 elicited by the stimuli; and constructed a model for the distributions of φ1 and φ2. φ1 and φ2 are somewhat abstract, but are related to biologically relevant aspects of the response. Responses with high (low) values of φ1 correspond to responses with high (low) numbers of spikes (McClurkin et al. 1991; Richmond and Optican 1987, 1990; Tovee et al. 1993); in our data, the median correlation between spike count and φ1 was 0.91 (interquartile range 0.86-0.97). φ2 is a coarse (±10 ms) measure of temporal modulation (Optican and Richmond 1987; Richmond and Optican 1987). It often characterizes whether or not spikes are concentrated in the early part of the response (Fig. 2).
Response space
Although the coefficients with respect to the first and second principal components (φ1 and φ2) are linearly uncorrelated by construction (Ahmed and Rao 1975; Deco and Obradovic 1996), they are not independent. Figure 4 shows scatterplots of φ2 versus φ1 for two cells from V1 and two cells from IT. In these four cells, and in all other cells in our sample, the range of φ2 increased with increasing φ1. The apex of the cone is the (φ1, φ2) pair representing trials when no spikes were elicited.
The limited range of φ2 for a given value of φ1 reflects the fact that only a certain amount of temporal modulation is possible with a given number of spikes. If each of k spikes in a train could be placed in any of n time bins, assuming only that no bin could hold more than one spike, the number of possible patterns would be the binomial coefficient C(n, k) = n!/[k!(n − k)!]. The range of values of φ2 seen under this combinatorial assumption (Fig. 5, thin line) is much wider than the range observed experimentally for any given number of spikes k (Fig. 5, dots).
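The size of this combinatorial envelope can be made concrete (an illustrative calculation only; the 270 one-ms bins correspond to the 30-300 ms V1 counting window, and the spike count of 10 is arbitrary):

```python
from math import comb, log2

def pattern_bits(n_bins, k_spikes):
    """log2 of the number of possible spike patterns C(n, k): the capacity
    that would be available if every pattern were distinguishable."""
    return log2(comb(n_bins, k_spikes))

# 270 one-ms bins, 10 spikes: tens of bits of hypothetical capacity
bits_for_10_spikes = pattern_bits(270, 10)
```

The gap between this combinatorial figure and the roughly 2 bits of estimated channel capacity reflects how little of the combinatorial envelope the observed responses actually occupy.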
Because we do not understand the spike generation process well enough to predict the response envelope, we developed a statistical model of the mean and variance of distributions of φ2 corresponding to different values of φ1. We divided the range of φ1 into a number of bins, each of which defines a φ1 slice in the two-dimensional response space. Figure 6A shows the top left scatterplot from Fig. 4 with slice boundaries superimposed. The mean and variance of φ2 were calculated in each φ1 slice. Figure 6B shows an example of the linear relation between log(variance) of φ2 and log(mean) of φ1 in each φ1 slice. This relation was significant in 47/54 neurons at the P < 0.01 level (median r² = 0.91, iqr = 0.79-0.94). Figure 6C shows an example of a fit of the mean of φ2 as a quadratic function of the mean of φ1 in each slice. The quadratic fit was rejected in only 3/54 neurons (χ² test, P < 0.01) and is used here.
We used these relations to estimate the bounds of the envelope containing the response space by μ̂2(φ1) ± k± σ̂2(φ1), where μ̂2 and σ̂2 are the mean and standard deviation estimated as in the preceding text. For each cell, k+ (k−) was chosen so that all but 0.1% of the points fell below (above) the boundary. Several points were allowed to fall outside the bounds to prevent excessive influence of outliers. We show below that small changes in the response envelope do not affect our results. For all 54 neurons, the response space envelope estimated using the regression (thick lines in Fig. 5) was much narrower than the envelope assuming spikes can fall in arbitrary 1-ms bins (the thin lines in Fig. 5).
Mean-variance relations of φ1 and φ2 by stimulus
The logarithms of the mean and variance of spike count in response to different stimuli are linearly related (Dean 1981; Gershon et al. 1998; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989). Because φ1 is correlated strongly with spike count (Richmond and Optican 1987), it is natural that the logarithms of the mean and variance of φ1 also are related linearly (in 52/54 neurons at P < 0.01; median r² = 0.79, iqr = 0.61-0.91; see Fig. 7). By simple extension, we might expect there to be a linear relationship between log(mean) and log(variance) of φ2 as well. However, such a relation existed in only 11/54 of the neurons (P < 0.01; median r² = 0.09, iqr = 0.02-0.21). Instead, we found that the variance of φ2 elicited by a single stimulus increased with the mean of φ1 (in 43/54 neurons at P < 0.01; median r² = 0.66, iqr = 0.38-0.84; see Fig. 8). This is consistent with the fact that the range (variance) of φ2 increases with increasing φ1 (Fig. 4). Adding mean(φ2) to the model added very little explanatory power (median increase in r² = 0.02, iqr = 0.0-0.09); for simplicity, we omitted mean(φ2) from the model. Using these regressions, we predicted the variance of φ1 and φ2 distributions elicited by different stimuli from the means of the φ1 distributions.
To check that our regression estimates were robust, we divided the data for each neuron into two sets. The regressions for the two halves were statistically indistinguishable (P > 0.01) for all 54 neurons. The fact that different subsets of the data result in indistinguishable regression lines shows that the model can generalize, and supports using the model to predict the structure of responses to stimuli other than those presented in our experiments.
Distributions of principal component coefficients
Estimating the transmitted information between the visual stimuli and the neuronal responses requires estimating the conditional response distributions p(φ1|s) and p(φ2|s). We assumed that the φ1 and φ2 distributions are separable (except for the relation between φ1 and the range of φ2) and examined normal, lognormal, and gamma distributions, each truncated at the bounds of the response space, as models for the φ1 and φ2 distributions. Using the observed mean and the variance estimated from the regression relation, the normal distribution (truncated at the boundary of the response space) was an acceptable fit for 79% of the distributions of φ1 elicited by individual stimuli, 83% of the distributions of φ2 elicited by individual stimuli, and 79% of the distributions of φ2 given φ1 regardless of stimulus (Kolmogorov-Smirnov test, P < 0.05). The gamma distribution was acceptable in almost exactly the same cases as the normal distribution, but the lognormal distribution was rejected much more frequently. We chose to use the normal distribution. Because the gamma distribution fit nearly as well as the normal, the information calculations presented in the following text were repeated using the gamma distribution for several cells; the results were indistinguishable from those obtained using the normal distribution.
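A goodness-of-fit check of this kind can be sketched with scipy (an illustration, not the authors' code: the "responses" here are synthetic draws from a Gaussian truncated below at zero, standing in for a φ1 distribution):

```python
import numpy as np
from scipy import stats

loc, scale = 2.0, 1.5                      # invented mean and SD
a, b = (0.0 - loc) / scale, np.inf         # truncate below at zero

# Synthetic sample standing in for the responses to one stimulus
sample = stats.truncnorm.rvs(a, b, loc=loc, scale=scale,
                             size=200, random_state=2)

# Kolmogorov-Smirnov test of the truncated-normal fit
stat, p = stats.kstest(sample, 'truncnorm', args=(a, b, loc, scale))
```

In the analyses above, the location and the regression-predicted variance would come from the data rather than being fixed in advance.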
Transmitted information
In this study, our goal was to estimate channel capacity, not transmitted information (which has been estimated for these data in the past) (see Eskandar et al. 1992; Heller et al. 1995; Kjaer et al. 1997; Richmond et al. 1990). Transmitted information was calculated as a test of our model: estimates based on the model can be compared with estimates obtained using a previously validated neural network method (Golomb et al. 1997; Kjaer et al. 1994). If the assumptions in our model are reasonable, the two methods should give similar results. This has been shown to be the case when spike count is used as the neural code (Gershon et al. 1998); we found that the two methods also give similar results for the (φ1, φ2) code used here. The least-squares line relating the two information estimates had intercept 0.04 and slope indistinguishable from 1. The r² value was 0.94. The difference between the two measurements was of the order of magnitude by which Golomb et al. (1997) found that the neural network underestimates transmitted information with limited numbers of samples. Finding that the information measurements using the two methods are similar led us to believe that the assumptions in our model are reasonable and that the model can be used to estimate channel capacity.
Figure 9 shows, for each neuron, the transmitted information using (φ1, φ2) as the neural code plotted against transmitted information using spike count as the neural code. There was no significant difference in the information transmitted using the spike count code by neurons in V1 and IT (V1: 0.35 bits median, interquartile range 0.27-0.55; IT: 0.31 bits median, interquartile range 0.18-0.39; P > 0.01 Kruskal-Wallis). Although the increase in transmitted information from the spike count code to the (φ1, φ2) code is significantly larger in V1 than in IT (P < 0.01), the difference in information transmitted using the (φ1, φ2) code is still not significant (V1: 0.52 bits median, interquartile range 0.40-0.78; IT: 0.37 bits median, interquartile range 0.22-0.55; P > 0.01). This represents an increase in transmitted information of 55% (median; interquartile range 22-74%) for neurons in V1, and 19% (median; interquartile range 9-41%) for neurons in IT. Transmitted information using φ1 (which is correlated strongly with spike count) as the code was 12% (median; interquartile range 3-29%) greater than transmitted information using spike count in V1 neurons and 5% (median; interquartile range 1-11%) greater in IT.
To check that we did not lose stimulus-related information by smearing spike arrival times with too broad a convolution kernel, we repeated the information calculations with responses smoothed using a Gaussian kernel with a standard deviation of 1 ms rather than 5 ms. No additional information was found.
Channel capacity
Figure 10 shows, for each neuron, the channel capacity using (φ1, φ2) as the neural code plotted against channel capacity using spike count as the neural code. There was a small but significant difference between the channel capacities using spike count code of neurons in V1 and IT (V1: 1.1 bits median; interquartile range 0.91-1.30; IT: 1.4 bits median; interquartile range 1.2-1.5; P < 0.01 Kruskal-Wallis). There was no significant difference in channel capacity using the (φ1, φ2) code (V1: 2.0 bits median; interquartile range 1.8-2.3; IT: 2.2 bits median; interquartile range 1.8-2.5). This represents an increase in channel capacity of 84% (median; interquartile range 62-124%) for neurons in V1 and an increase of 52% (median; interquartile range 32-95%) for neurons in IT. This increase in channel capacity is a result of temporal modulation and is larger than estimated using only the observed responses (transmitted information). Channel capacity using φ1 (which was correlated strongly with spike count) as the code differed from channel capacity using spike count as the code by 7% (median; interquartile range −7 to 22%).
We performed several analyses to verify that our estimates of channel capacity are robust with respect to small changes in the response space boundaries. As for the spike count code (Gershon et al. 1998), channel capacity depends on the range of responses the cell is capable of emitting in response to a stimulus. If we underestimate a neuron's dynamic range, we will underestimate its channel capacity. Here we estimated the neuron's dynamic range based on the responses observed. It is possible that we have not used stimuli that elicit the highest possible firing rates (and so φ1) from these neurons. Nonetheless the peak firing rates we saw in these V1 and IT neurons are similar to those reported by others using a wide variety of stimuli, including natural stimuli (Baddeley et al. 1997; Perrett et al. 1984; Rolls 1984; Rolls et al. 1982; Tolhurst et al. 1981, 1983; Vogels et al. 1989), so we believe that our estimate of the dynamic range is reasonable. For several neurons, we examined the effect of allowing part of the distribution of φ1 to fall outside the observed dynamic range. When we allowed as little as 0.5% or as much as 5% of the distribution of φ1 to fall outside the observed range, estimated channel capacity changed by <4%. Similarly, widening the bounds on φ2 for given φ1 by 5% increased the channel capacity by <3%. If new evidence were to show that the proper range for either φ1 or φ2 is larger than we have estimated here, channel capacity could be recalculated using these methods.
Distribution of responses achieving channel capacity
We estimated channel capacity by finding the two-dimensional distribution of mean (a1, a2) responses that allows the cell to transmit the maximum possible information using a code based on a1 and a2. Figure 11A shows an example of such a distribution. The horizontal and vertical axes show means of a1 and a2, respectively. Shades of gray indicate how frequently each mean is presented to achieve channel capacity. Because some of these distributions are quite broad, the distribution of observed responses arising from this distribution of mean responses is diffuse (not shown). The projection of the (a1, a2) distribution onto a1, that is, the distribution of mean a1 implied by the two-dimensional distribution, is shown as a histogram immediately below. Figure 11B shows the distribution of mean a1 values that achieves channel capacity using a1 alone as the neural code. In both histograms the horizontal axis shows mean a1 values, and the vertical axis shows the frequency with which distributions with the appropriate mean value are presented to achieve channel capacity. The projection of the optimal two-dimensional distribution onto the a1 axis is less concentrated than the optimal one-dimensional distribution.
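The projection just described is a marginalization of the two-dimensional distribution over the a2 dimension. A minimal sketch (the probabilities here are an arbitrary stand-in, not the capacity-achieving distributions of Fig. 11):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 2-D probability distribution over (mean a1, mean a2) bins:
# 20 a1 bins by 15 a2 bins, normalized to sum to one.
p2d = rng.random((20, 15))
p2d /= p2d.sum()

# Projection onto the a1 axis (the histogram below Fig. 11A):
# sum out the a2 dimension.
p_a1 = p2d.sum(axis=1)
```

The resulting `p_a1` is itself a valid probability distribution over mean a1 values.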
Role of further principal components
Throughout this study we limited our analysis to two principal components. The reason was practical: using a three-component code increases the computational burden beyond the resources currently available to us. We can, however, address the issue of whether the use of more principal components in the response representation can be expected to lead to substantially higher estimates of information or channel capacity. Successive principal components, by definition, account for successively smaller portions of the response variance; that is, their range decreases. This decrease is rapid in our data. Therefore their information content (and so their contribution to channel capacity) must decline unless the noise associated with each principal component also decreases with the range. Figure 12 shows that this decrease does not happen in our data. Each column shows the distribution (across 54 cells) of the signal-to-noise ratio (the variance of mean responses to stimuli divided by the median variance of responses to single stimuli) for spike count or one of the first 10 principal components. A signal-to-noise ratio less than one means that the variability of responses to a given stimulus is greater than the variability of mean responses to different stimuli, so the responses distinguish only poorly among the stimuli that elicited them. Thus the third principal component will contribute much less channel capacity than the first and second principal components, and the fourth and higher principal components are expected to contribute insignificantly if at all. We verified that, as expected, the fifth through tenth principal components contribute no information not redundant with information in the first principal component. Thus our code using only a1 and a2 should carry a very large proportion of the information available in the responses.
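The signal-to-noise ratio defined above can be sketched directly; the simulated numbers below are purely illustrative, not the recorded data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative responses: one coefficient value per stimulus per trial.
n_stim, n_trials = 32, 15
stim_means = rng.normal(0.0, 2.0, size=n_stim)                  # "signal" across stimuli
responses = stim_means[:, None] + rng.normal(0.0, 1.0, size=(n_stim, n_trials))

# Signal-to-noise ratio as defined in the text: variance of mean responses
# across stimuli divided by the median within-stimulus variance.
signal = np.var(responses.mean(axis=1))
noise = np.median(np.var(responses, axis=1, ddof=1))
snr = signal / noise
```

With across-stimulus variance well above the trial-to-trial variance, `snr` comes out above one, i.e., the responses can distinguish the stimuli.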
DISCUSSION
Here we have constructed a model of neuronal responses, based on principal component analysis, that includes both response intensity (firing rate) and a low-precision (±10 ms) representation of temporal modulation. This temporal precision and this code have been shown to carry a very large portion of the information useful for the identification of statically presented two-dimensional stimuli (Heller et al. 1995; McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican 1990; Tovee et al. 1993). Temporal modulation at finer time scales can be used to signal rapid changes in the stimulus (Buracas et al. 1998), but thus far it has not been proved useful for stimulus identification.
Transmitted information, which measures how well a set of responses distinguishes among the stimuli actually presented in an experiment, is sensitive to exactly which stimuli were presented and how often. Channel capacity quantifies how useful a neuron with a particular repertory of responses could be for stimulus identification under the best possible circumstances: increased channel capacity means better stimulus identification. When the repertory of responses is based on a particular neuron's responses, the calculated channel capacity is an estimate of the neuron's channel capacity. The accuracy of the estimate depends, of course, on how well the model generalizes to responses to stimuli not presented in the experiment. To the extent it was possible to test here, our model generalized well.
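Transmitted information is the mutual information between stimulus and response estimated from the observed stimulus-response pairs. A minimal plug-in estimator (toy count tables, no bias correction):

```python
import numpy as np

def transmitted_information(counts):
    """Mutual information (bits) between stimulus (rows) and response
    (columns), estimated from a table of joint counts. Plug-in estimate;
    no correction for limited-sampling bias."""
    p = counts / counts.sum()
    ps = p.sum(axis=1, keepdims=True)        # stimulus marginal
    pr = p.sum(axis=0, keepdims=True)        # response marginal
    nz = p > 0                               # skip empty cells (0 log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (ps @ pr)[nz])).sum())

# A perfectly informative 2-stimulus, 2-response channel carries 1 bit;
# a channel whose responses ignore the stimulus carries 0 bits.
perfect = np.array([[10, 0], [0, 10]])
uninformative = np.array([[5, 5], [5, 5]])
```

Unlike channel capacity, this quantity depends on which stimuli were presented and how often, which is exactly the dependence the modeling approach is meant to remove.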
Using transmitted information, it has been estimated for some of the data used in this study (Heller et al. 1995) and in other similar studies (Tovee et al. 1993; Victor and Purpura 1996) that temporal modulation adds 0.1-0.2 bits to the information carried by spike count alone. Channel capacity based on the two-principal-component code used here is greater than channel capacity based on the spike count code by 0.8-0.9 bits. This is nearly equivalent (at least in V1) to adding a second independent cell using the spike count code. This is much more than estimated using transmitted information, but far less than would be expected if all degrees of freedom provided by the individual spike times carried independent information. This is true even though, for the code including temporal modulation, we allowed our model to give the highest estimates of channel capacity consistent with the response envelope and noise structure estimated from the data (see METHODS).
Representing responses with temporal modulation
Spike count can be thought of as measuring spike arrival times extremely coarsely, recording only whether each spike arrived during the sample interval or not. The principal component code considered here incorporates temporal modulation on a somewhat less coarse time scale (±10 ms). Is even more precise representation of the time course of responses from single neurons useful for transmitting information about stimuli?
Spike times originally were measured with 1-ms precision; one way to increase the precision of the representation is to retain this precision when calculating the spike density function rather than smoothing the data. A number of researchers have reported finding information about rapidly changing or moving stimuli encoded using precisely timed spikes. Bair and Koch (1996) found that coherently moving random dot stimuli elicited spikes timed with precisions ranging from ±2 to ±15 ms in neurons in area MT of the monkey brain. Buracas et al. (1998) found approximately the same range of precision (±3 to >±10 ms) in area MT in response to rapidly moving Gabor gratings. Bair and Koch (1996) did not examine the information content of responses; Buracas et al. (1998) did, but did not report the least temporal precision necessary to encode the information they found (although they indicate that a bin size of 8 ms, corresponding to a precision of ±4 ms, is adequate).
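A spike density function of the kind discussed here is obtained by smoothing a 1-ms-binned spike train with a kernel; the sketch below uses a Gaussian kernel whose width (10 ms, roughly matching the ±10 ms precision discussed in the text) is an assumption for illustration, not the authors' exact filter:

```python
import numpy as np

def spike_density(spike_times_ms, sigma_ms=10.0, t_max_ms=330):
    """Bin spikes at 1-ms resolution, then smooth with a Gaussian kernel
    of standard deviation sigma_ms. Wider sigma discards finer timing."""
    train = np.zeros(t_max_ms)
    for s in spike_times_ms:
        train[int(s)] += 1.0
    half = int(3 * sigma_ms)                       # truncate kernel at 3 sigma
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_ms) ** 2)
    kernel /= kernel.sum()                         # preserve total spike count
    return np.convolve(train, kernel, mode="same")

sdf = spike_density([50, 55, 120, 200])
```

Setting `sigma_ms` near 1 instead retains nearly the original 1-ms precision, which is the alternative mentioned above.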
When information about the identity of a stationary stimulus has been examined in the responses of monkey cortical neurons, however, less temporal precision has been found. Heller et al. (1995) considered a variety of codes with a broad range of temporal precisions and found that the maximal information was carried by a principal component code cut off at a bandwidth of ~25 Hz in V1 and ~10 Hz in IT, corresponding to measurement precision of approximately ±10 ms in V1 and ±20 ms in IT. Hertz and Richmond (1997) found that the first spike in responses in V1 was placed with a precision of approximately ±15 ms. Victor and Purpura (1996), using a different approach, found precisions ranging from ±5 to ±15 ms in neurons in areas V1 and V2. More recently, Oram et al. (1999) confirmed directly that high-precision temporal patterns in neuronal responses from both V1 and LGN are stochastically related to the spike count and (low-bandwidth) poststimulus time histogram and therefore can carry no information beyond that available from the spike count and poststimulus time histogram. In our own data, we found that retaining greater temporal precision in the responses revealed no additional information. We conclude that the information-carrying capacity of temporal modulation at the precision used here (±10 ms) is likely to include almost all of the information-carrying capacity available at any time scale for identifying stationary visual stimuli.
Neuronal responses also might be represented more precisely using more principal components. Successive principal components have more zero crossings (Fig. 2; see also McClurkin et al. 1991; Richmond and Optican 1987, 1990), so they effectively code higher-frequency fluctuations in the spike density function and so allow greater localization of when spikes occurred. However, successive principal components, by definition, account for successively smaller portions of the response variance. Therefore their information content (and so their contribution to channel capacity) must decline unless the noise associated with each principal component also decreases with the range. We have shown that this does not happen in our data (Fig. 12). A variety of studies confirm that a few principal components carry almost all of the information available in the principal component representation. Richmond and Optican (1990) found that the third principal component increased information transmission by ~5% over that transmitted by the first two principal components for neurons in V1, whereas Tovee et al. (1993) found an 8% increase. McClurkin et al. (1991) found a 7% increase for neurons in the lateral geniculate nucleus. For the data in Heller et al. (1995), the third principal component adds 8% to the information carried by a1 and a2 in V1 and <1% in IT, and the fourth principal component adds <1% in each area. Therefore we believe that the first two principal components capture most of the temporal coding capability of these neurons.
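The decomposition itself is ordinary PCA of the trial-by-trial response waveforms: the principal components are eigenvectors of the response covariance, and a1 and a2 are projections onto the first two. A sketch on synthetic "responses" (the sinusoid-plus-noise data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative smooth responses: rows are trials, columns are time bins.
t = np.linspace(0.0, 1.0, 100)
trials = np.array([np.sin(2 * np.pi * (1 + rng.random()) * t)
                   + 0.1 * rng.normal(size=t.size)
                   for _ in range(200)])

# Principal components = eigenvectors of the covariance of the responses,
# ordered by decreasing eigenvalue (decreasing explained variance).
centered = trials - trials.mean(axis=0)
cov = centered.T @ centered / (len(trials) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Coefficients a1, a2: projections of each response onto the first two PCs.
a1 = centered @ eigvecs[:, 0]
a2 = centered @ eigvecs[:, 1]
```

The eigenvalue ordering makes concrete why successive components account for successively smaller portions of the response variance.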
Principal components are a powerful statistical tool for representing neuronal responses. They identify the features of responses that tend to change most, leading to representations that are efficient for both data compression and information transmission (Campa et al. 1995; Deco and Obradovic 1996; Linsker 1988; Plumbley 1991). However, the fact that information is available in the principal component coefficients does not mean that information is either used or transmitted in this form. Codes based on principal components are linear codes, and linear codes form a small subset of possible neural codes. The brain may use nonlinear codes that would not be efficiently conveyed using principal components. Latency, for example, which is not concentrated in any single principal component, has been shown to carry information about the luminance and contrast of visual stimuli (Gawne et al. 1996; Mechler et al. 1998; Wiener et al. 1998). Information transmission by spike patterns such as bursts also has been studied (Cattaneo et al. 1981; DeBusk et al. 1997; Reinagel et al. 1999). Nonlinear codes are not necessarily more effective representations of neuronal spike trains than linear codes (Fotheringhame and Baddeley 1997), but we cannot rule out that some nonlinear code might carry more information than we have found here.
Statistical model
The response model used here relies on the linear relationship between log(mean) and log(variance) of a1. This relation is similar to the frequently observed linear relation between log(mean) and log(variance) of spike count (Dean 1981; Gershon et al. 1998; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogel et al. 1989). We identified a similar relationship between the log(variance) of a2 and the log(mean) of a1, allowing simple modeling of a2 response variance. a1 and a2 were modeled as normal distributions (truncated at boundaries estimated from the data; see METHODS), with the observed mean and with variance calculated using the measured regression relation.
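The two ingredients of this model can be sketched as follows. The data, the power-law exponent, and the `truncated_normal` helper are all illustrative assumptions, and rejection sampling is used only for brevity (not the authors' procedure):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative per-stimulus (mean, variance) pairs obeying a power law,
# i.e., a linear relation between log(mean) and log(variance).
means = np.exp(rng.uniform(0.0, 3.0, size=50))
variances = 1.5 * means ** 1.2 * np.exp(0.1 * rng.normal(size=50))

# Step 1: fit log(variance) = slope * log(mean) + intercept.
slope, intercept = np.polyfit(np.log(means), np.log(variances), 1)

# Step 2: model a response distribution as a Gaussian whose variance is
# predicted from its mean, truncated to the permitted response range.
def truncated_normal(mean, var, lo, hi, size, rng):
    """Gaussian restricted to [lo, hi], sampled by rejection (simple, slow)."""
    out = []
    while len(out) < size:
        x = rng.normal(mean, np.sqrt(var), size=size)
        out.extend(x[(x >= lo) & (x <= hi)])
    return np.array(out[:size])

m = 5.0                                      # a hypothetical mean response
v = np.exp(intercept) * m ** slope           # variance from the fitted relation
samples = truncated_normal(m, v, lo=0.0, hi=20.0, size=1000, rng=rng)
```

The key property is that once the regression is fixed, the mean alone determines the whole modeled response distribution.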
The model abstracts away some of the details of the observed responses, and it is reasonable to ask whether the modeled responses are sufficiently like the actual responses to be useful. Estimates of transmitted information using our model were close to those from a previously validated neural network method (Golomb et al. 1997; Heller et al. 1995; Kjaer et al. 1994). This leads us to believe that the model is sufficiently accurate for the information calculations undertaken here. If a better model of the distributions is found, calculations similar to those presented here can be used to recalculate transmitted information and channel capacity.
The response model necessarily is based on the responses actually observed. It is not certain that the model accurately predicts the responses that might be elicited using other stimuli. Several potential problems with generalizing from observed responses can be identified. First, the responses elicited using the Walsh patterns used here might not indicate the full dynamic range of the neurons. We consider this unlikely, because the ranges of spike counts seen in V1 and IT neurons in these experiments were similar to the ranges reported by others using a wide variety of stimuli, including natural stimuli such as hands and faces (Baddeley et al. 1997; Perrett et al. 1984; Rolls 1984; Rolls et al. 1982; Tolhurst et al. 1981, 1983; Vogel et al. 1989). Another potential difficulty is that the logarithms of the mean and variance of responses elicited by other stimuli might not follow a linear relationship, or they might follow a linear relationship different from the one measured using these stimuli. Assuming a linear model seems reasonable because a linear relationship between the logarithms of the means and variances of spike count has been observed in many parts of the brain (Dean 1981; Gershon et al. 1998; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogel et al. 1989). Furthermore, we cross-validated the model by dividing the stimuli in half and comparing the resulting mean-variance relations. In all cells, the two relations were indistinguishable, supporting the model's ability to generalize, at least in this limited sense. All of our stimuli were Walsh patterns, and we are unaware of any studies comparing the variability of responses elicited by different kinds of stimuli. If responses elicited by other stimuli are substantially more or less variable than those elicited by the Walsh patterns, our model may not generalize well to responses elicited by other stimuli.
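The split-half cross-validation mentioned above amounts to fitting the log-log regression separately to two random halves of the stimuli and comparing the fits. A sketch with invented per-stimulus statistics:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative per-stimulus (mean, variance) pairs, one point per stimulus.
means = np.exp(rng.uniform(0.0, 3.0, size=64))
variances = 2.0 * means ** 1.1 * np.exp(0.1 * rng.normal(size=64))

def loglog_slope(idx):
    """Slope of the log(variance)-vs-log(mean) regression on a subset."""
    slope, _ = np.polyfit(np.log(means[idx]), np.log(variances[idx]), 1)
    return slope

# Split the stimuli in half at random and fit each half separately.
perm = rng.permutation(64)
slope_a = loglog_slope(perm[:32])
slope_b = loglog_slope(perm[32:])
```

If the relation generalizes across stimulus subsets, the two slopes agree to within their sampling error.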
We summarize some of the simplifications made in our model in Table 1.
Spike count, temporal modulation, and channel capacity
Spike count and temporal modulation are linked at the most basic level. When there are no spikes, there can be no temporal modulation. Trains with larger numbers of spikes can form larger numbers of distinct temporal patterns: in the crudest estimate, the number of possible temporal patterns involving k spikes is n!/[k!(n − k)!], where n is the number of distinct time bins. If each pattern could occur with equal probability, the number of temporal patterns observed, and therefore the range of a2, would grow rapidly with spike count. If the train becomes crowded, that is, when there are more than half as many spikes as bins, the range decreases again; but this is not an issue in our data.
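This binomial count is easy to tabulate; the bin count below (33 bins, as if a 330-ms response were binned at ~10-ms resolution) is an assumption for illustration:

```python
from math import comb

# Number of distinct temporal patterns of k spikes in n time bins: C(n, k).
n = 33
patterns = [comb(n, k) for k in range(n + 1)]

# The count grows rapidly with spike count, peaks when half the bins are
# occupied (the "crowded" regime), and then falls again.
```

For example, 2 spikes in 33 bins already allow 528 distinct patterns, and the count keeps climbing until k = n // 2.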
The range of a2 does increase with increasing spike count and a1 (Fig. 4), but more slowly than would be expected if all spike trains with k spikes were equally likely. a2 is the inner product of a spike density function with the second principal component of the data. Because the second principal component tends to have a narrow peak (Richmond and Optican 1987), an extreme positive or negative value of a2 would indicate that many spikes occur in adjacent bins in a response. Such tight bunching of spikes did not occur in these data. In fact, such bunching is unlikely to occur in neurons with a refractory period. This may give at least a partial understanding of why the observed range of a2 is smaller than would be expected if arbitrary patterns of spikes were possible (Fig. 5).
Within the constraints estimated from the data, we biased our model to increase estimates of channel capacity due to temporal coding. We did not penalize distributions for probability weight falling outside the response envelope bounds for a2 (as we did for probability weight falling outside the observed range of a1); we simply ignored that portion of the distribution. This allows the widest possible separation between distributions, increasing the estimate of channel capacity. This deliberate overestimation was undertaken to obtain an upper bound on channel capacity. Estimates of transmitted information were unaffected, because the distributions of observed means and variances fell within the response envelope.
Comparison with other studies
Some researchers have reported that rapidly moving or rapidly changing stimuli elicit large numbers of precisely timed spikes in the lateral geniculate nucleus and area MT, transmitting, in some cases, >100 bits/s of information about the changing stimulus (Buracas et al. 1998). However, only much lower precision timing (±10 ms) and much lower information transmission rates (up to ~4 bits/s) have been measured using static stimuli (Heller et al. 1995). Our estimates of channel capacity (~2.2 bits in a 330-ms window, or ~6.6 bits/s), which were designed to err on the side of generosity, fall short of even 10 bits/s (even allowing for a large, and unlikely, contribution from additional principal components). A major difference between these approaches is that in the former, but not the latter, the neuron is essentially entrained by the rapid changes in the stimulus. Theunissen and Miller (1995) have referred to this as temporal coding of a signal, "characterized by a one-to-one correspondence between the time of occurrence of a sensory event and the time of occurrence of the corresponding neural process," distinguishing it from temporal encoding of a signal, in which "information about static or dynamic signals is encoded in some aspect(s) of the temporal pattern of action potentials" without the action potentials being tied to changes in the signal.
The effects of entrainment can be seen even during experiments with static stimuli. Information transmission rates as high as 30 bits/s occur during the onset of responses immediately after presentation of static stimuli (Heller et al. 1995), though these rates drop after ~50 ms. Thus the high rate of information transmission is not sustained during typical fixations (of ~300 ms) between saccades in normal vision. Nonetheless, Richmond et al. (1999) found that in V1 neurons the total amount of information available about the identity of stimuli played one after another in a movie was greater when each frame was presented for 170 ms than when each frame was presented for 136 ms. Elsewhere this information has been found to plateau ~150-200 ms after stimulus presentation (Gershon et al. 1998). Thus maintaining the highest possible information transmission rate by rapidly changing the stimuli may not be the best strategy if the goal is to identify objects in the visual field.
Summary
We have constructed a model of responses including both response strength and temporal modulation of firing rate. This model clarifies the implications of assuming that the relations between the means and variances of response features observed during an experiment are representative of the neuron's responses in general. The central feature of the model is that the variances of response features are predictable from the means. This model demonstrates that the variances of both response strength and temporal modulation depend on the mean response strength, with important implications for neural information processing. As we have emphasized, use of the model imposes dependence on how well the model generalizes to responses not observed in our experiments. The results here raise two conflicting possibilities. The first is that the experiments were not designed so as to display the best capabilities of these neurons. The other is that our assumption that all possible responses within the two-dimensional response space actually can arise may not be true. This naturally leads to further experiments to verify under what conditions assumptions made about these features remain valid. A critical question for the future is whether response properties elicited by stimuli of one kind (here, Walsh patterns) are the same in responses elicited by stimuli of other kinds.
APPENDIX
The search for the maximizing set of probabilities is subject to
three constraints: the probabilities must be nonnegative; the
probabilities must sum to one; and the range of means must be finite.
The first two constraints arise from intrinsic properties of
probability distributions. If the third constraint is violated, the
transmitted information can be infinite and the problem of maximizing
transmitted information is ill-posed. The maximization of Eq. 6 is performed numerically. The numerical implementation requires
that we discretize the continuous space of mean responses. We denote these discretized probabilities by p_i. Specifically, if a1,min and a1,max are the minimum and maximum permitted mean a1 values for a distribution, then we demand that

    p_i ≥ 0 for all i    (A1)

    Σ_i p_i = 1    (A2)

    a1,min ≤ m_i ≤ a1,max for every mean m_i given nonzero probability    (A3)

together with penalty terms C+ and C− added to the objective for probability weight falling above a1,max or below a1,min, respectively (A4).
The choice of the functions C+ and C− involves a scaling issue not present when calculating channel capacity based on spike count (Gershon et al. 1998). Because the spike count takes on only integer values, it has a natural scale that can be used in the penalty functions C+ and C−. The principal components, however, have no such natural scale. Dividing all the principal component coefficients by some constant should not change the information or channel capacity, and therefore we must make sure that it does not change the values of C+ and C− either. We have chosen to scale C+ and C− by the bin size used for a1 in the numeric calculations.
To find the channel capacity, we maximize transmitted information (Eq. 6) under the constraints discussed in the preceding text. Any standard constrained optimization algorithm can be used. We used a sequential quadratic programming method (Lawrence et al. 1997). We first found the distribution that achieved channel capacity for a1 alone. We smeared this distribution uniformly over the a2 dimension for each value of a1 to create a two-dimensional distribution, which was used as the starting point of the optimization. Because the search space is closed and convex, and transmitted information is a concave function of the input probabilities, we are guaranteed a single global maximum (Cover and Thomas 1991). In two neurons, we nonetheless verified that the optimization converged to the same solution from several starting points. We also checked in two cells that increasing the resolution of the discretization of a1 and a2 did not significantly change our results.
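For a discretized channel of this kind, the Blahut-Arimoto algorithm is a standard alternative way to find the capacity-achieving input distribution; the sketch below uses it in place of the sequential quadratic programming method actually used here, and omits the C+/C− penalty terms:

```python
import numpy as np

def channel_capacity(Q, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto estimate of channel capacity in bits.
    Q[i, j] = P(response j | discretized mean response i)."""
    n_in = Q.shape[0]
    p = np.full(n_in, 1.0 / n_in)            # start from the uniform input
    for _ in range(max_iter):
        r = p @ Q                             # induced output distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.where(Q > 0, Q * np.log2(Q / r), 0.0).sum(axis=1)
        c = 2.0 ** d                          # per-input "information gain"
        p_new = p * c / (p @ c)               # reweight toward informative inputs
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    r = p @ Q
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Q > 0, p[:, None] * Q * np.log2(Q / r), 0.0)
    return float(terms.sum())

# Sanity check: a noiseless binary channel has capacity 1 bit.
cap = channel_capacity(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Because the problem is concave over a convex set, this iteration and the sequential quadratic programming approach converge to the same capacity value.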
ACKNOWLEDGMENTS
The authors thank Drs. Peter Latham (University of California, Los Angeles), Zheng Liu (National Institute of Mental Health), and Mike Oram (School of Psychology, University of St. Andrews) for helpful comments on the manuscript.
FOOTNOTES
Address for reprint requests: B. J. Richmond, Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Building 49, Room 1B-80, Bethesda, MD 20892-4415.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 20 April 1999; accepted in final form 28 July 1999.
REFERENCES