Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403
ABSTRACT |
---|
Keller, Clifford H. and
Terry T. Takahashi.
Representation of Temporal Features of Complex Sounds by the
Discharge Patterns of Neurons in the Owl's Inferior Colliculus.
J. Neurophysiol. 84: 2638-2650, 2000.
The
spiking pattern evoked in cells of the owl's inferior colliculus by
repeated presentation of the same broadband noise was found to be
highly reproducible and synchronized with the temporal features of the
noise stimulus. The pattern remained largely unchanged when the
stimulus was presented from spatial loci that evoke similar average
firing rates. To better understand this patterning, we computed the
pre-event stimulus ensemble (PESE), the average of the stimuli that
preceded each spike. Computing the PESE by averaging the pressure
waveforms produced a noisy, featureless trace, suggesting that the
patterning was not synchronized to a particular waveform in the fine
structure. By contrast, computing the PESE by averaging the stimulus
envelope revealed an average envelope waveform, the "PESE
envelope," typically having a peak preceded by a trough. Increasing
the overall stimulus level produced PESE envelopes with higher
amplitudes, suggesting a decrease in the jitter of the cell's
response. The effect of carrier frequency on the PESE envelope was
investigated by obtaining a cell's response to broadband noise and
either estimating the PESE envelope for each spectral band or by
computing a spectrogram of the stimulus prior to each spike. Either
method yielded the cell's PESE spectrogram, a plot of the average
amplitude of each carrier-frequency component at various pre-spike
times. PESE spectrograms revealed surfaces with peaks and troughs at
certain frequencies and pre-spike times. These features are
collectively called the spectrotemporal receptive field (STRF). The
shape of the STRF showed that in many cases, the carrier frequency can
affect the PESE envelope. The modulation transfer function (MTF), which
describes a cell's ability to respond to time-varying amplitudes, was
estimated with sinusoidally amplitude-modulated (SAM) noises.
Comparison of the PESE envelope with the MTF in the time and frequency
domains showed that the two were closely matched, suggesting that a
cell's response to SAM stimuli is largely predictable from its
response to a noise-modulated carrier. The STRF is considered to be a
model of the linear component of a system's response to dynamic
stimuli. Using the STRF, we estimated the degree to which we could
predict a cell's response to an arbitrary broadband noise by comparing
the convolution of the STRF and the envelope of the noise with the
cell's post-stimulus time histogram to the same noise. The STRF
explained 18-46% of the variance of a cell's response to broadband noise.
INTRODUCTION |
---|
In nature, most sounds have
spectra that vary over time, and auditory systems must encode and
process these dynamic stimuli. In redwing blackbirds, for example, the
temporal structure of song plays a large role in the discrimination and
recognition of different species and subspecies (Brenowitz 1983).
In echolocating bats, amplitude and frequency
modulations of the returned echo encode the wing-beat of insects and
serve as a basis for identification of prey (Schnitzler 1987).
Evidence suggests that speech comprehension is possible
with degraded spectral information as long as temporal information is
left intact (Drullman et al. 1994; Shannon et al. 1995; Wright et al. 1997).
Thus, coding of
temporal auditory information is as fundamental to auditory function as
is the coding of spatial location and spectral characteristics.
Textbook accounts of how the auditory system encodes the spectrum of sounds usually refer to tonotopic maps, wherein the energy in a particular frequency band is represented by the mean firing rates of frequency-tuned neurons within the tonotopic array. This, of course, is a static account. Tonotopic maps, found throughout the auditory system, must continuously update their output as the amplitudes in the various spectral bands change. The activity pattern of a single cell in the tonotopic map would therefore be expected to reflect the stimulus' temporal structure.
The present study examines this process in the lateral shell of the
central nucleus of the inferior colliculus (ICc-ls) of the barn owl
(Tyto alba). As in mammals, the inferior colliculus is a nearly obligatory synaptic station for ascending auditory information. In the
owl, ICc-ls cells are not only tuned to stimulus frequency but are also
sensitive to source location (Knudsen and Konishi 1978; Mazer 1995; Wagner et al. 1987).
As a result, the cells can convey the spectral
characteristics of an auditory event at a particular location in space.
The natural auditory environment, however, is cluttered with echoes and
sounds from multiple sources, thereby complicating the process of
segregating and analyzing the spectra of individual sources. Although
the neural processing of spatial information in simple and complex
acoustical environments has been described in the owl, little is known
of how dynamic spectral information is represented with or without
clutter (Keller and Takahashi 1996a,b; Takahashi and Keller 1994).
The present study takes the first step,
describing the relationship between the spiking pattern of ICc cells
and the temporal structure of a single stimulus in a cell's spatial
receptive field (RF).
METHODS |
---|
All experiments have been approved by the Institutional Animal Care and Use Committee of the University of Oregon.
General procedures
Results are based on recordings from 10 adult barn owls of both sexes. Each owl was anesthetized by intramuscular injections of ketamine (0.05-0.1 ml/h; Vetalar 100 mg/ml, Parke-Davis) and diazepam (0.025-0.05 ml/h; Diazepam C-IV 5 mg/ml, LyphoMed) and given a prophylactic dosage of ampicillin (0.2 ml intramuscular; Polyflex, 250 mg/ml, Aveco). The owl was placed into a stereotaxic device that held its head tilted downward at a 45° angle. The scalp was infused with a local anesthetic (2% lidocaine HCl, Xylocaine, Astra Pharmaceuticals) and cut. A hole of about 0.49 cm² was opened in the skull, through which a microelectrode was inserted. Silicone grease (Dow-Corning) was applied to the dural surface to prevent desiccation. Body temperature and heart rate were monitored, and the bird was warmed with a circulating-water heating pad. At the end of a recording session, the craniotomy was closed with dental cement (Vitrabond, 3M), and the scalp was sutured and covered with a topical antibacterial cream (Bacitracin-Neomycin-Polymyxin Ointment, E. Fougerra). Typically, a recording session lasted about 12-15 h. Before the bird was returned to a recovery cage, it was given 0.2 ml of dexamethasone and 0.2 ml of vitamin B complex intramuscularly and 10-20 ml of 5% dextrose in lactated Ringer solution intravenously. The recovery cage was maintained at 34°C. The bird's activity was monitored until it fully recovered. A bird was typically used for four sessions with a minimum of 7 days between sessions.
All recordings were carried out in a sound-isolating booth (Industrial
Acoustics, 1.8 × 1.8 × 1.8 m). For most tests, sounds were presented over earphones (Sony MDR with custom-built cones that
fit snugly into the outer ear canal) after compensating for the
filtering properties of the earphones in situ and filtering the signals
with the individual bird's head-related transfer functions (HRTFs;
Tucker Davis Technologies PD1). We showed earlier that this virtual
auditory-space method effectively simulates a free-field source
(Keller et al. 1998). For testing how a cell's spiking pattern depended on the sound's spatial location (see Fig. 2 in RESULTS), we presented sounds from a mobile speaker (Alpine
6020AX) located 90 cm from the center of the head.
In the following text, positive azimuths denote stimuli presented from the owl's right and positive elevations denote stimuli presented from above eye level.
Stimuli
Our stimuli consisted of tone bursts, narrowband noise bursts, broadband noise bursts, and sinusoidally amplitude-modulated (SAM) noise bursts. All stimuli were synthesized digitally at a sampling rate of 30,000 points/s. Individual narrowband and broadband noise bursts were stored in computer mass-storage media and could therefore be repeated. We refer to these as "reproducible noises." In addition to their specific envelopes (e.g., SAM), they had linear onset and offset ramps lasting 5 ms. Stimuli were scaled to 20-30 dB above neuronal threshold with programmable attenuators (Tucker Davis Technologies PA4), and amplified with a stereophonic amplifier (McIntosh 754 or Tucker Davis Technologies HB6). The stimuli had the following characteristics.
BROADBAND NOISE. Broadband noises had amplitude spectra that deviated by less than 1 dB from the mean value between 2 and 11 kHz and had random phase spectra. The cutoffs were such that between 1.5 and 2 kHz, the amplitude declined by 40 dB and between 11 and 11.5 kHz, the amplitude declined by 60 dB. Our sample of cells had best frequencies between 3.5 and 9 kHz.
NARROWBAND NOISE.
Narrowband noise was constructed by filtering a broadband noise with a
gammatone filter matched to the tuning curves of the owl's 8th nerve
fibers (Köppl 1997). For a given cell, we
presented 10 repetitions each of 10 different 10-s-long noise bursts
(100 total stimuli) that had been filtered by a gammatone-filter
centered near the neuron's best frequency (BF, see below). The
envelopes of each of the 10 different gammatone noises presented to a
cell were computed using the Hilbert transform (MATLAB software
package; The Mathworks, v 5.2). The magnitude spectra of the envelopes were computed and averaged. These average envelope spectra were broadly
low-passed, declining linearly by 6 dB over a 2- to 500-Hz range of
modulation frequencies. These spectra are superimposed on the
modulation transfer functions (MTF) of some typical cells in Figs. 8,
A-D, and 9 (gray lines). As exemplified in these figures, the
envelope spectrum was wider than the MTFs of all our cells.
SAM NOISE.
SAM noise was constructed by multiplying the broadband noise,
$n(t)$, by a sinusoidal envelope:

$$\mathrm{SAMnoise}(t) = n(t)\,[1 + A \sin(2\pi f_e t)]$$

The modulation frequency is $f_e$. The depth
of modulation is determined by the constant $A$, which varies
from 0 to 1.0. The SAM noises were presented at a level roughly 20 dB
above neuronal threshold and a modulation depth of 50%
($A = 0.5$). The depth chosen approximates the
average modulation depth (or more precisely the average AC
fluctuation¹) of
the noise stimuli used for the estimation of the modulation impulse
response (MIR). In five cells, additional depths and levels were tested
(see RESULTS).
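The SAM construction above can be sketched in a few lines. This is only an illustration, not the synthesis code used in the study: the function names are hypothetical, and Gaussian white noise is assumed as a stand-in for the band-limited broadband carrier described earlier.

```python
import math
import random

FS = 30_000  # sampling rate in points/s, as stated in the text


def sam_envelope(n_samples, f_e, depth=0.5, fs=FS):
    """Sinusoidal envelope 1 + A*sin(2*pi*f_e*t), sampled at fs."""
    return [1.0 + depth * math.sin(2.0 * math.pi * f_e * i / fs)
            for i in range(n_samples)]


def sam_noise(carrier, f_e, depth=0.5, fs=FS):
    """Multiply a carrier n(t) sample by sample with the sinusoidal envelope."""
    env = sam_envelope(len(carrier), f_e, depth, fs)
    return [c * e for c, e in zip(carrier, env)]


# Assumption: Gaussian white noise stands in for the band-limited carrier.
rng = random.Random(0)
carrier = [rng.gauss(0.0, 1.0) for _ in range(3000)]  # 100 ms at 30 kHz
stimulus = sam_noise(carrier, f_e=100.0, depth=0.5)
```

With A = 0.5 the envelope swings between 0.5 and 1.5 times the carrier amplitude; with A = 0 the carrier is returned unchanged.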
TONE BURSTS. Tone bursts were constructed by multiplying a 100-ms pure tone produced by a waveform generator (Hewlett-Packard 3245A Universal Source) with the trapezoidal envelope determined by the onset and offset ramps.
Initial characterization of neurons
Action potentials from single units were isolated with
epoxy-insulated tungsten electrodes (Frederick Haer, 10 MΩ), and the time of their occurrence relative to stimulus onset was written to
computer mass-storage media at a resolution of 10 µs (M110, Modular
Instruments). The spikes evoked by a single stimulus presentation, or
"spike train," thus consisted of a series of event times. The set
of spike trains evoked by repetitions of a given stimulus were aligned
relative to stimulus onset and summed to generate poststimulus time
histograms (PSTHs).
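A PSTH of this kind can be built by binning the event times of all repetitions. The sketch below is illustrative only: the function name and the 1-ms default bin width are assumptions, and spike times are held as integer microseconds to mirror the 10-µs timing resolution mentioned above.

```python
def build_psth(spike_trains_us, duration_us, bin_us=1000):
    """Sum spike counts across repetitions into fixed-width bins.

    spike_trains_us: list of spike trains, each a list of spike times
    in microseconds relative to stimulus onset.
    """
    n_bins = duration_us // bin_us
    counts = [0] * n_bins
    for train in spike_trains_us:
        for t in train:
            idx = t // bin_us
            if 0 <= idx < n_bins:  # ignore spikes outside the window
                counts[idx] += 1
    return counts


# Two hypothetical repetitions of a 100-ms stimulus.
trains = [[1500, 20500], [1700, 50000]]
psth = build_psth(trains, duration_us=100_000)
```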
On isolating a cell, we first estimated the cell's firing threshold by monitoring the oscilloscope trace and audio-monitor and varying the average of the sound-pressure levels in the two ears. (These cells respond only poorly to monaural stimulation.) The average binaural level was set to a value 20-30 dB above threshold, and we ascertained that the cell was selective for both interaural level difference (ILD) and time difference (ITD), the two cues for sound localization in the owl. Neurons in the adjacent subdivision of the central inferior collicular nucleus, the ICc-core, are sensitive to ITD but not to ILD, and these tests allowed us to bypass cells in the ICc-core. On confirming the binaural selectivity of the cell, we identified its spatial receptive field (spatial RF) by counting the spikes evoked by a 100-ms broadband noise burst (below) presented from different loci in the frontal hemisphere. All cells had an RF within a 20° radius of the center of gaze (0° azimuth; 0° elevation), and some had additional RFs located more eccentrically (see RESULTS for details regarding multiple RFs). The cell's frequency response was then assessed from the center of the spatial RF closest to the midline by measuring the spike rates elicited by 100-ms tone bursts ranging from 2 to 10 kHz in steps of 200 Hz or by narrowband noises with center frequencies spaced 1/30th of an octave apart from 2 to 10 kHz. In the following text, we refer to plots of spike rate as a function of the frequency of the tone or the center frequency of the narrowband noise as "rate-frequency" curves and the frequency evoking the maximal rate as the "best frequency."
Data analyses
Data were analyzed using the MATLAB software package.
DRIFTING NOISE PARADIGM. The relationship between neuronal firing patterns and the temporal structure of stimuli was demonstrated qualitatively by a "drifting noise" paradigm illustrated schematically in Fig. 1, top. Noise bursts were synthesized by replacing different 40-ms segments of a reproducible broadband noise burst, N2(t), with a different and independent reproducible broadband noise burst, N1(t). This splicing procedure generated a series of 100-ms noise bursts in which the onset of N1(t) varied, or "drifted," relative to that of N2(t) in 2-ms steps from 0 to 38 ms. Each stimulus was presented to a cell 20 times, and the resulting spike trains were aligned relative to the onset of the 100-ms noise burst.
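The splicing procedure can be sketched as follows. This is a hedged illustration rather than the original synthesis code: the function name is hypothetical, and it assumes that the same fixed 40-ms token of N1(t) is inserted at every onset, which the text does not state explicitly.

```python
FS = 30_000  # sampling rate in points/s, as stated in the text


def drifting_noise_set(n2, n1, seg_ms=40, step_ms=2, max_onset_ms=38, fs=FS):
    """Return (onset_ms, stimulus) pairs with an N1 segment spliced into N2."""
    seg = seg_ms * fs // 1000          # segment length in samples
    stimuli = []
    for onset_ms in range(0, max_onset_ms + 1, step_ms):
        start = onset_ms * fs // 1000  # onset of N1 within N2, in samples
        spliced = list(n2)             # copy so the original N2 is untouched
        spliced[start:start + seg] = n1[:seg]
        stimuli.append((onset_ms, spliced))
    return stimuli


# Placeholder waveforms: zeros for N2 and ones for N1 make the splice visible.
n2 = [0.0] * 3000   # 100 ms
n1 = [1.0] * 1200   # 40 ms
stimuli = drifting_noise_set(n2, n1)
```

With 2-ms steps from 0 to 38 ms this yields 20 stimuli, each 100 ms long.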
STATISTICAL ANALYSIS OF SPIKE PATTERNS FROM DIFFERENT SPATIAL LOCI.
In the ICc-ls, some neurons have more than one RF from which
statistically equivalent (by Student's t-test) spike rates
can be evoked. We placed a sound source at azimuths ranging from
−85° (left) to +85° (right) along a cell's best elevation and
recorded the spike train evoked by a 100-ms broadband noise burst. The stimuli were repeated 10-20 times at each azimuth. Each spike train
from the RF closest to the midline (central RF) was cross-correlated with a PSTH built from the remaining spike trains at the central RF,
thus giving us N cross-correlation coefficients, where
N is the number of stimulus repetitions. The variance of
these correlation coefficients served as a measure of the variability
in the spike pattern evoked by repetitions of a reproducible noise
burst from a single location. Each of the spike trains from eccentric
RFs was then cross-correlated with the PSTH built from all spike trains evoked from the central RF. This also yielded N
correlation coefficients, which were averaged and compared using the
Student's t-test to the mean of the coefficients obtained
at the central RF.
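The two sets of coefficients can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: spike trains are taken to be already binned into equal-length count vectors, the zero-lag Pearson coefficient is assumed as the "cross-correlation coefficient," and all function names are hypothetical. The final t-test is omitted.

```python
import math


def pearson(x, y):
    """Zero-lag Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a flat trace; report 0
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_correlations(central_trains):
    """Correlate each central-RF train with the PSTH of the remaining trains."""
    total = [sum(col) for col in zip(*central_trains)]
    coeffs = []
    for train in central_trains:
        rest = [t - s for t, s in zip(total, train)]
        coeffs.append(pearson(train, rest))
    return coeffs


def cross_location_correlations(eccentric_trains, central_trains):
    """Correlate each eccentric-RF train with the full central-RF PSTH."""
    central_psth = [sum(col) for col in zip(*central_trains)]
    return [pearson(train, central_psth) for train in eccentric_trains]


central = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
within = leave_one_out_correlations(central)
across = cross_location_correlations([[1, 0, 1, 0]], central)
```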
COMPUTATION OF THE PRE-EVENT STIMULUS ENSEMBLE.
A spike in an auditory neuron signifies the occurrence of a specific
feature in the ensemble of stimuli received by the subject. The
pre-event stimulus ensemble (PESE) is the average stimulus that precedes
each spike. The stimulus ensemble and the PESE can be defined in a
variety of ways. Typically, the pressure waveform before each spike is
averaged, but it is also possible to use some transformation of the
stimulus ensemble, such as its spectrum or envelope (Eggermont
et al. 1983c).
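In computational terms, the PESE is a spike-triggered average. A minimal sketch, assuming the stimulus (pressure waveform or envelope) and the spike times are expressed on the same sample grid, is given below; the function name and window length are hypothetical.

```python
def pese(signal, spike_samples, window):
    """Average the `window` samples of `signal` that precede each spike.

    The returned list is in chronological order: its last element is the
    sample immediately before the spike.
    """
    acc = [0.0] * window
    n_used = 0
    for s in spike_samples:
        if s - window < 0 or s > len(signal):
            continue  # skip spikes without a complete pre-spike window
        for i, v in enumerate(signal[s - window:s]):
            acc[i] += v
        n_used += 1
    if n_used == 0:
        raise ValueError("no spike has a complete pre-spike window")
    return [a / n_used for a in acc]


# Toy envelope: every spike is preceded by a trough followed by a peak.
signal = [0.0] * 100
spikes = [2, 20, 50, 80]          # the spike at sample 2 is too early to use
for s in spikes[1:]:
    signal[s - 2], signal[s - 1] = -1.0, 2.0
average = pese(signal, spikes, window=4)
```

Passing the pressure waveform as `signal` gives the waveform PESE, whereas passing the stimulus envelope gives the PESE envelope.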
ANALYSIS OF RESPONSES TO SAM STIMULI.
The neurons' responses to SAM noise were plotted as period histograms
relative to the modulation period, and the magnitude (vector strength)
and phase angle of the Rayleigh vector were computed (Goldberg
and Brown 1969). The magnitude and phase, plotted as a function
of modulation frequency, are the modulation transfer function (MTF).
The modulation impulse response (MIR), which is the time-domain
equivalent of the MTF, was estimated by summing 500 sinusoids with
frequencies from 1 to 500 Hz, having the magnitude and phase angles
obtained from the MTF. Magnitude and phase values that were not
measured directly were obtained by linear interpolation.
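The steps above (Rayleigh vector, interpolation, and summation of sinusoids) can be sketched as follows. This is an illustration only: the function names are hypothetical, the phase sign convention and the use of cosines are assumptions, phases are assumed to be unwrapped before interpolation, and values outside the measured range are simply held at the nearest measured value.

```python
import math


def rayleigh_vector(spike_times_s, f_mod):
    """Return (vector strength, phase angle) of spikes re: the modulation."""
    n = len(spike_times_s)
    c = sum(math.cos(2.0 * math.pi * f_mod * t) for t in spike_times_s)
    s = sum(math.sin(2.0 * math.pi * f_mod * t) for t in spike_times_s)
    return math.hypot(c, s) / n, math.atan2(s, c)


def interp_to_integer_hz(freqs, values, f_max=500):
    """Linearly interpolate measured values onto 1, 2, ..., f_max Hz."""
    out = []
    for f in range(1, f_max + 1):
        if f <= freqs[0]:
            out.append(values[0])        # assumption: hold first value
        elif f >= freqs[-1]:
            out.append(values[-1])       # assumption: hold last value
        else:
            j = next(i for i in range(1, len(freqs)) if freqs[i] >= f)
            w = (f - freqs[j - 1]) / (freqs[j] - freqs[j - 1])
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


def modulation_impulse_response(magnitudes, phases, fs, duration_s):
    """Sum sinusoids at 1..len(magnitudes) Hz with the given MTF values."""
    n = int(round(duration_s * fs))
    mir = []
    for i in range(n):
        t = i / fs
        mir.append(sum(m * math.cos(2.0 * math.pi * (k + 1) * t + p)
                       for k, (m, p) in enumerate(zip(magnitudes, phases))))
    return mir


# Perfectly locked spikes give a vector strength of 1.
vs, phase = rayleigh_vector([k / 100 for k in range(50)], f_mod=100.0)
```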
RECONSTRUCTION OF THE PSTH FROM THE STRF. The STRF represents a linear model of how a neural system transforms the time-varying stimulus into spike trains. To judge the sufficiency of this linear model, we compared the cell's PSTH with a "reconstructed" response based on a convolution of the STRF with the stimulus envelope. We computed the STRF by the second method (see above) resulting in PESE envelopes for each of 81 different frequency bands between 2 and 10 kHz. For each frequency band, we filtered the stimulus with a gammatone filter and computed the envelope of the filtered stimulus using the Hilbert transform. Each envelope was then convolved with the PESE envelope. The results were summed across frequency bands and the mean value was subtracted. The PSTH was compiled at the same sampling rate as the stimulus and then smoothed using a moving 2-ms (60-point) boxcar window. The normalized PSTH was compared with the normalized reconstruction by calculating the maximum of their normalized covariance functions.
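The reconstruction and comparison can be sketched as below. This is a simplified illustration and not the original MATLAB analysis: the function names are hypothetical, each band's kernel is assumed to be indexed by lag (element k is the PESE-envelope value k samples before a spike), and gammatone filtering, Hilbert envelope extraction, and boxcar smoothing of the PSTH are assumed to have been done beforehand.

```python
import math


def predict_response(band_envelopes, band_kernels):
    """Convolve each band envelope with its kernel, sum, remove the mean."""
    n = len(band_envelopes[0])
    pred = [0.0] * n
    for env, kern in zip(band_envelopes, band_kernels):
        for t in range(n):
            # Causal convolution: only past envelope samples contribute.
            pred[t] += sum(h * env[t - lag]
                           for lag, h in enumerate(kern) if t - lag >= 0)
    mean = sum(pred) / n
    return [p - mean for p in pred]


def max_normalized_covariance(x, y, max_lag):
    """Maximum over lags of the normalized cross-covariance of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0.0:
        return 0.0
    best = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = sum(dx[t] * dy[t + lag] for t in range(n) if 0 <= t + lag < n)
        best = max(best, total / norm)
    return best


# Toy case: one band, an envelope impulse at sample 5, and a kernel that
# peaks 2 samples before a spike, so the predicted peak falls at sample 7.
envelope = [0.0] * 20
envelope[5] = 1.0
prediction = predict_response([envelope], [[0.0, 0.0, 1.0]])
observed = [0.0] * 20
observed[7] = 3.0
similarity = max_normalized_covariance(prediction, observed, max_lag=3)
```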
RESULTS |
---|
General neuronal properties
Neurons in our sample had best frequencies ranging from 3.5 to 9 kHz and a spatial RF within a 20° radius of the center of gaze. In
addition to the centrally located RF, the neurons also had eccentric
RFs. Typically, sources placed at these eccentric RFs evoked a
discharge rate that was about 60-100% of the rate evoked from the
central RF. These multiple RFs, which are spaced apart by distances
inversely proportional to a cell's best frequency, are due to the fact
that ITD, the cue for sound-source azimuth, is computed by a binaural
cross-correlation-like mechanism operating on narrowband inputs (see
Carr and Konishi 1990; Takahashi and Konishi 1986
for details). These characteristics are consistent with
those found in a survey of the ICc-ls (Mazer 1995).
Stimulus-related discharge patterns
The temporal pattern of discharge in cells of the owl's ICc-ls to
the repeated presentation of a reproducible noise burst has been shown
to be highly reproducible itself (Takahashi and Keller
1996). An example is shown in the dot raster display of Fig. 1,
in which each block of 20 rows (brackets) represents the spike trains
elicited by the repeated presentation of a reproducible noise burst
from the cell's central RF. Because the spike trains were aligned
relative to stimulus onset, the consistency of the firing pattern is
manifested within a block as vertically aligned columns of dots.
Such consistency could, of course, be an intrinsic property of the neuron, having little to do with stimulus features. Figure 1 illustrates the results of a drift test, which suggests that this is not the case. The onset of N1(t) drifted from 0 to 38 ms within N2(t). As the drifting segment, N1(t), replaced successively later portions of N2(t), identifiable columns of spikes were found correspondingly later, resulting in a diagonal pattern (arrows, Fig. 1). Evidence of this relationship between spike pattern and stimulus temporal structure was observed in all of 44 neurons tested with this stimulus.
Inspection of the dot raster displays shows, however, that the first ca. 10 ms of the spiking pattern is independent of stimulus structure. As can be seen in Fig. 1, the response pattern is the same in the first four blocks despite the fact that the drifting stimulus segment, N1(t), is at a different temporal position in each of these four blocks. Occasionally we noted that later features of the dot-raster display were also independent of the drifting noise segment. In Fig. 1, for instance, the gap at about 27 ms after stimulus onset is relatively constant despite the fact that the N1(t) drifts through this time period. Thus this gap appears to be locked to the stimulus onset and not to any other feature of the stimulus itself.
Spatial dependency of spike-train patterns
In the ICc-ls, neurons often have multiple spatial RFs
(Mazer 1995), as shown in Fig. 2. Each bracket on the left demarcates a
block of spike trains elicited by 11 repetitions of a reproducible 100-ms broadband noise burst from a speaker at the azimuths shown to
the left. The cell shown responded maximally when the speaker was
placed about 5° to the left (contralateral to the cell) of the
bird's midline at eye-level (−5° azimuth; 0° elevation). In addition, the cell had eccentric RFs at +55 and
−65°. Stimulus presentation from the central RF (at −5°) gave rise to vertically aligned rows of spikes. At the edges of this central RF, the firing rate declined, but many of the features appeared to be preserved, at
least, as judged by visual inspection. Similarly at the eccentric RFs,
many of the temporal features seen at the center of the main RF
appeared to be preserved.
The consistency of the spiking pattern across space was quantified (see
METHODS) in eight cells in which identical stimuli in the
central and at least one of the eccentric RFs evoked discharge rates
that were not statistically distinguishable (P > 0.05, Student's t-test). The RFs at −5 and +55° shown in
Fig. 2 are examples of RFs where the firing rates were not
distinguishable. The firing patterns evoked from these loci were not
significantly more different than were spike trains evoked by repeated
presentation at one spatial location. In fact, only one cell of the
eight tested responded with spike patterns that differed at two equally
effective RFs. For this cell, the difference in the spiking patterns
evoked from the two RFs, although significant statistically
(P < 0.05; Student's t-test), was small. Our
preliminary conclusion, based on this small sample, is that spike
patterns do not change significantly more when sources are placed at
different loci in space than when they are repeated from the same location.
Stimulus features encoded by the temporal pattern
The spike trains may be entrained to either the carrier or
envelope of the stimulus or to both. The possibility that the cell's firing pattern was determined by the carrier of a broadband stimulus was examined by reverse-correlating spike trains to the pressure waveform in 82 neurons. None of the neurons examined showed evidence of
being driven by a characteristic waveform in the carrier. The traces
were all noisy and devoid of large-scale peaks, troughs, or
oscillations of the type described in auditory-nerve fibers or
cochlear-nuclei neurons that are able to phase-lock to the carrier. The
lowest best frequency in our sample was about 3.5 kHz, and we cannot
rule out that the spiking patterns of cells with lower best frequencies
would be correlated with the carrier wave. In the ICc-ls, however,
cells with best frequencies less than 4 kHz are not common
(Mazer 1995).
We next examined the possibility that the spiking pattern of the cells was related to the envelope of a stimulus in 102 cells (63 stimulated with broadband noise; 20 with narrowband noise; and 19 with both). In each case, an average envelope waveform consistently preceded each spike. This average envelope waveform is referred to as the PESE envelope.
Four examples of PESE envelopes are shown in Fig. 3, A-D. These examples show a range of shapes, some of which are roughly monophasic (e.g., Fig. 3A) and others that are more biphasic (e.g., Fig. 3B). The biphasic PESE envelopes suggest that many of the neurons have a high probability of discharge when a peak in the envelope follows a trough. For such a cell, the low-amplitude portion of an envelope would decrease the probability of discharge, which, in turn, would decrease the likelihood that the cell was in a refractory period when a suitable envelope peak appeared. Thus the spikes that do occur would be associated, more often than not, with a trough-peak sequence.
Spectrotemporal receptive fields
The PESE envelopes described above were computed for a narrowband
carrier centered on a cell's best frequency. How is the PESE envelope
affected by the carrier frequency? A sound spectrogram describes the
distribution of energy in various frequency bands over time. If one
were to compute the average spectrogram preceding all spikes, one would
obtain the PESE spectrogram, which gives us an idea of how a cell might
respond to modulations of the amplitude of a variety of carrier
frequencies (Aertsen and Johannesma 1980; Hermes et al. 1981).
Features of the PESE spectrogram that stand out
from the mean level are referred to as the spectrotemporal receptive
field (STRF) (Aertsen et al. 1980). In theory, a PESE envelope can therefore be thought of as a cross section of the STRF
parallel to the time axis (Aertsen and Johannesma 1980; Eggermont et al. 1983b).
PESE spectrograms were computed from the responses evoked by 5- or 10-s bursts of broadband noise (10-30 repetitions) in 48 cells. PESE spectrograms estimated in the two ways described in METHODS are shown in Fig. 4, A and B, for a typical cell. The time preceding a spike is represented along the vertical axis, the frequency along the horizontal axis, and the magnitude of the signal on a pseudo-color scale. The time of spike occurrence is represented by the 0-ms mark at the top of the vertical axis. It is apparent that the two methods arrive at similar surfaces. Both showed that energy was consistently present at just above 7,000 Hz about 7 ms before a spike. The STRFs also had a region with relatively less energy than the mean in the 7,000-Hz band about 2 ms before the peak (about 9 ms before a spike).
The PESE envelope computed using a narrowband noise at a cell's best frequency should be similar to a vertical cross section of a STRF (computed by either method) provided that a cell responds similarly to broadband and narrowband noises. We explored the effects of the stimulus bandwidth on the PESE envelope in 19 cells, which were tested using both broadband and narrowband stimuli. Figure 4C shows the results from the typical cell, and exemplifies the similarity between a PESE envelope obtained with a narrowband noise (thick trace) and a cross section through the STRF (thin trace), which is equivalent to a PESE envelope obtained with a broadband noise. Coefficients of cross-correlation between the two waveforms for each of the 19 cells had a median of 0.92.
Figure 5 shows an assortment of PESE spectrograms, all of which were computed from the output of band-pass filters (2nd method). Qualitatively, the shapes of the STRFs varied along a continuum. Some STRFs had relatively narrow, well-defined peaks occurring about 6 ms before a spike, such as that in Fig. 5C. Others, like those in Fig. 5, B, D, and F, were more complex. It is clear from an inspection of these STRFs that the shape of the PESE envelope, i.e., a vertical slice through the STRF, can depend on the carrier frequency. For instance, in Fig. 5E, the PESE envelope estimated from the output of the filter centered at 6 kHz would show a deep and wide trough preceding a small peak, whereas at about 6.8 kHz, the PESE envelope would display a single peak and no trough.
The STRFs shown in Fig. 5, A-F, display other interesting features. Standard rate-frequency curves, assessed by counting the spikes elicited by 100-ms narrowband noise bursts or tones, are superimposed, and for cells in Fig. 5, A and C-F, their shapes correlate well with the cross sections of the STRF parallel to the frequency axis (taken at the latency of the STRF maximum). We obtained both types of data from 33 cells. For most of these cells, the shape of the rate-frequency curve and a horizontal cross section through the STRF are quite similar. Figure 6A shows this comparison for each of the remaining 27 cells not shown in Fig. 5. The frequency giving the peak response in the rate-frequency test correlates strongly with the peak frequency of the STRF for all cells (Fig. 6B; r = 0.88, n = 33). The tuning curves obtained by the two methods did not always agree, however. For example, in Fig. 5B, the STRF had a broad, irregular peak between 5 and 9 kHz, whereas the rate-frequency curve obtained with narrowband noises had a complex profile with a central region having fewer spikes. In other cases, the STRFs displayed frequencies of relative excitation and inhibition that could not be discerned from inspection of the rate-frequency tuning curves. In Fig. 5E, for instance, there is a trough at about 6 kHz, which corresponds to the low-frequency edge of the rate-frequency function.
In seven cells, we examined the effects of stimulus amplitude on the
STRFs. Figure 7A shows STRFs
obtained from a typical cell with stimulus levels near threshold
(assigned a value of 0 dB) and up to 25 dB above threshold. In general,
at higher amplitudes, the STRFs broadened along their frequency axis.
Earlier studies have shown that frequency tuning, assessed using
broadband stimuli, widens with increasing amplitude (Evans 1977; Møller 1977).
Our observation is
consistent with this phenomenon. A slight decrease in latency was also
detected (Fig. 7B), a finding consistent with results
obtained in the cochlear nucleus of the rat (Møller 1981).
Finally, we also noted that the peaks and troughs in the
STRFs became more pronounced with increasing stimulus amplitudes
(Møller 1976). This effect could be seen best by
superimposing cross sections of each STRF, which represent a PESE
envelope near the best frequency, for each stimulus amplitude
(Fig. 7B). Such increases in PESE envelope heights with
increases in stimulus amplitude would be expected if increases in
amplitude improved the locking of a cell's discharge to the envelope.
Alternatively, it may be due simply to the fact that spike number
increases with amplitude, which, in turn, increases the number of
envelope segments that are incorporated into the average. We therefore
re-computed the PESE envelopes for the different stimulus levels using
the number of spikes elicited by the quietest stimulus tested. For
instance, the cell shown in Fig. 7 fired 3,060 spikes at near-threshold
stimulus levels (0 dB). We randomly selected 3,060 spikes from the
spike trains elicited by the higher stimulus levels and re-computed the
PESE envelopes. If the estimate of the PESE envelope improved simply because of an increase in the number of samples averaged, then the PESE
envelopes estimated using only a subset of the spikes should become
less well-defined compared with those obtained using the full set of
spikes. This did not seem to be the case. In the seven cells thus
analyzed, the PESE envelope computed using a subset of spikes and that
computed from the full complement of spikes showed little difference.
This suggests that as stimulus level increases, envelope-locking
improves.
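The spike-count control described above amounts to drawing a random subset of spikes, matched in number to the quietest condition, before recomputing the PESE envelope. A minimal sketch follows; the function name and fixed seed are assumptions made for reproducibility of the illustration.

```python
import random


def subsample_spikes(spike_samples, n_keep, seed=0):
    """Randomly keep n_keep spikes (without replacement), in time order."""
    spikes = list(spike_samples)
    if n_keep >= len(spikes):
        return sorted(spikes)  # nothing to discard
    rng = random.Random(seed)
    return sorted(rng.sample(spikes, n_keep))


# Hypothetical example: match a louder condition to 3,060 spikes.
loud_spikes = list(range(0, 50_000, 5))      # 10,000 placeholder spike times
matched = subsample_spikes(loud_spikes, 3060)
```

The matched subset can then be passed to the same spike-triggered averaging routine used for the full spike set.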
Relationship to the modulation transfer function
The response of neurons to dynamic stimuli is often assessed by estimating their MTF using SAM stimuli. In 23 cells, we examined the synchrony and phase of firing to SAM noise modulated at different frequencies. The MTFs thus obtained were compared with the PESE envelopes in the frequency and time domains.
The black lines in Fig. 8,
A-D, represent the magnitude (left) and phase
(right) spectra of the PESE envelopes of the four cells
shown in Fig. 3. The magnitude spectra were, in general, asymmetrically
band-passed with a steeper slope on the low-frequency side. The phase
spectra typically showed a linear relationship. The MTFs assessed by
presenting SAM stimuli are superimposed. As exemplified by the
cells in Fig. 8, the high-frequency cutoff, i.e., the modulation
frequency at which the MTFs declined to 50% of their maxima, was below
256 Hz in all 23 neurons tested.
The comparison was quantified in several ways. Figure 8, E and F, compares, for all 23 cells, the peaks and half-widths of the magnitude spectra derived using each technique. In general, the matches were quite close (correlation coefficients of 0.81 and 0.84, respectively) and the slopes describing the relationships between the SAM- and PESE-based estimates were not significantly different from one.
We also constructed the time-domain equivalent of the MTF or modulation
impulse response (MIR) for each of 19 cells using the phase and vector
strengths measured with SAM stimulation and compared these MIRs with
the PESE envelopes themselves. As the MIR is a measure of a cell's
response and the PESE envelope is a stimulus subset, their time axes
are reversed relative to one another. Figure 8, A-D, right,
insets, shows this comparison for the four cells after having
reversed the MIR in time. As can be seen, they are well matched. Figure
8G plots the distribution of cross-correlation coefficients
between the MIRs and the PESE envelopes for each of the 19 cells
(median cross-correlation = 0.87). These data together suggest
that a cell's response to envelopes consisting of a wide range of
modulation frequencies could be predicted by its responses to envelopes
composed of individual sinusoidal components. Good agreement between
MTFs derived with SAM noises or tones and those derived by reverse
correlation with noise-modulated tones and noises has also been
reported for the rat cochlear nucleus and inferior colliculus
(Møller 1973; Møller and Rees 1986).
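One way to picture the construction of an MIR from SAM measurements is as a sum of sinusoids, one per modulation frequency, each scaled by its vector strength and shifted by its phase. The sketch below is an illustration under stated assumptions, not the authors' procedure: the phase sign convention (a lag), the absence of any frequency weighting, and the function names are all hypothetical.

```python
import math

def modulation_impulse_response(mod_freqs, strengths, phase_lags, times):
    """Sum sinusoids weighted by vector strength to approximate an MIR.

    Each modulation frequency (Hz) contributes a cosine whose amplitude
    is the vector strength and whose phase is delayed by the matching
    entry of `phase_lags` (radians). The sign convention and uniform
    weighting are assumptions of this sketch.
    """
    return [sum(vs * math.cos(2 * math.pi * f * t - lag)
                for f, vs, lag in zip(mod_freqs, strengths, phase_lags))
            for t in times]

def reversed_for_pese(mir):
    """Flip the MIR in time so it can be overlaid on a PESE envelope,
    whose time axis runs backward from the spike."""
    return mir[::-1]
```

With a phase lag that grows linearly with modulation frequency, as in a pure delay, the synthesized MIR peaks at that delay.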
Figure 9 shows the responses of one
neuron to SAM noises having different modulation depths and levels. The
MTFs in Fig. 9A show that when the level of the SAM noise
was increased, vector strengths increased overall, and a new peak
(arrow) emerged at a higher modulation frequency (* vs. ).
Similarly, as shown in Fig. 9B, increases in modulation
depth increased synchrony overall and emphasized higher modulation
frequencies (arrow). Analogous results were shown in Fig. 7, in which
the PESE envelopes became taller and sharper as the stimulus level was
increased.
Recently, Dent and colleagues assessed the behavioral sensitivity
of the European barn owl to SAM noise (Dent et al.
1999). These behavioral MTFs were found to be quite similar to
those measured in a variety of other birds and mammals with peak
sensitivity (lowest threshold) between 10 and 20 Hz. The modulation
depths at threshold ranged from 6% near 10 Hz to 22% near 500 Hz.
While direct comparison with our neuronal MTFs is difficult, it is
particularly interesting to note that at modulation depths closer to
the behavioral thresholds, the MTFs of a significant proportion of our
cells had peaks in the 10- to 20-Hz range.
Sufficiency of a linear model
The results above demonstrated a high degree of similarity between the MIR, a characteristic of the cell's response, and the PESE envelope, a stimulus subset. Given an impulse response, one can predict the response of a linear neural system to an arbitrary input. Conversely, given the average prespike stimulus, one can estimate the stimulus that must have caused the response. The similarity of the MIR and PESE envelope provides empirical evidence that the system is reversible, i.e., the same linear model can be used to predict the response given the stimulus or the stimulus given the response. Here, we use the PESE envelopes (STRF) to predict the response of the neuron to an arbitrary noise in an effort to determine the sufficiency of a linear model consisting of only a STRF. Since some features of the spike train (e.g., gap at ca. 27 ms in Fig. 1) are stimulus invariant, any prediction will necessarily be incomplete, and our purpose here was simply to quantify the contribution of the linear component.
The sufficiency of a model such as ours is typically addressed by
comparing the convolution of the stimulus and a kernel, such as the
STRF, with the cell's PSTH (Carney et al. 1999;
Eggermont et al. 1983a,b; Kowalski et al. 1996; Møller and Rees 1986;
Theunissen et al. 2000; Wickesberg et al. 1984; Yamada and Lewis 1999).
The PSTH represents the average
response of the neuron over many repetitions and allows one to
represent spike trains as continuous waveforms. If the STRF provides a
complete description of the cell's response to a given signal, then
the output of a linear model, i.e., a convolution of the stimulus with
the STRF, should be identical to the output of the neuron, i.e., the
PSTH (Rieke et al. 1993).
For such a test, we first constructed a PSTH from a cell's response to the first 2 s of the 10-s broadband noise using all 20 or 30 repetitions. The cell's STRF was constructed from its responses to the remaining 8 s of the sound. Thus, construction of the STRF was independent from the PSTH. We then attempted to reconstruct the cell's response to the first 2 s of the stimulus by convolving the stimulus envelope with the STRF. Figure 10, A and B, allows a visual comparison of this reconstructed response with the PSTH for two cells. The cell whose comparison is shown in part in Fig. 10A provided one of the most accurate reconstructions, resulting in a covariance coefficient of 0.78 for the 200 ms portion shown and 0.64 for the entire 2 s (arrow a in Fig. 10C, see following text). The response and reconstruction of a more typical cell are shown in Fig. 10B (r = 0.55 for 200 ms and r = 0.49 for 2 s; arrow b in Fig. 10C).
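The reconstruction test can be sketched as a per-band convolution followed by a correlation with the PSTH. This is a simplified illustration, not the authors' implementation: the list-of-lists layout for the envelope and the STRF, the sample-based lag axis, and both function names are assumptions.

```python
import math

def predict_response(band_envelopes, strf):
    """Linear prediction: sum over bands of envelope convolved with kernel.

    `band_envelopes[b][t]` is the stimulus envelope in frequency band b;
    `strf[b][lag]` is the PESE value `lag` samples before a spike.
    Indexing the kernel by pre-spike lag makes the filter causal.
    """
    n_t = len(band_envelopes[0])
    pred = [0.0] * n_t
    for env, kernel in zip(band_envelopes, strf):
        for t in range(n_t):
            for lag, w in enumerate(kernel):
                if t - lag >= 0:
                    pred[t] += w * env[t - lag]
    return pred

def pearson_r(x, y):
    """Correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r(predict_response(envelopes, strf), psth)` yields a coefficient r of the kind reported here; squaring r gives the fraction of variance in the PSTH accounted for by the linear model.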
In Fig. 10C, the covariance coefficients for the 32 cells that were tested with 20 or more repetitions of the noise are plotted against the maximum height of the peak in the STRF. Cells with a well-defined STRF, as indicated by a high peak value, gave coefficients (r) in the range of 0.42-0.68 (Fig. 10C). The scatter plot in Fig. 10C saturates, suggesting that the ability to predict a cell's response to an arbitrary noise is not limited by our ability to estimate the STRF. Other processes, perhaps nonlinear responses to envelope structure, or processes that generate stimulus-independent response features, may in fact limit the predictions. Thus we conclude that a linear model represented by the STRF of such cells can explain between about 18 and 46% of the variance (r²) in the spike train.
DISCUSSION
Our study was prompted by the observation that the discharge pattern evoked in the cells of the owl's ICc-ls by reproducible broadband noise is itself highly reproducible and was therefore a possible means of representing temporally complex stimuli. Our results show that this pattern was dependent, at least in part, on the envelope of the stimulus and was independent of the source's location.
The highly patterned activity evoked in the ICc by reproducible complex
stimuli is not unique to the owl. Hermes and colleagues reported that
cells in the midbrain of the grassfrog discharged with reproducible
patterns to reproducible noises (Hermes et al. 1981).
Carney and Yin demonstrated patterned activity evoked by reproducible
noise in the cat inferior colliculus (Carney and Yin 1989). Carney and
colleagues, working in the inferior colliculus of the awake rabbit,
have since demonstrated the relationship between the pattern of these
cells' spike trains and the stimulus envelope (Carney et al. 1999).
Thus the temporal coding of envelope structure may be a role that the
inferior colliculus assumes across species (Rose 1986).
Our reconstructions of neuronal spike trains showed that the simple
linear model represented by the STRF can account for up to 46% of the
variance in the pattern of neuronal responses to noises. Others working
at the level of the midbrain in the frog, rat, and cat report similar
findings (Delgutte et al. 1998; Eggermont et al. 1983b; Møller and
Rees 1986). Møller and Rees (1986) have further shown that predictions
of neuronal response can be improved by accounting for the possibility
that inferior collicular neurons may respond differently to amplitude
increments and decrements. Similar models applied to the 8th nerve and
cochlear nuclei generally yield better matches, suggesting that
neuronal responses are more linear at lower levels of the auditory
system (Carney et al. 1999; Delgutte et al. 1998; Wickesberg et al.
1984; Yamada and Lewis 1999).
How effective is a STRF derived from responses to noise at predicting
the response to more structured sounds, such as species-specific vocalizations? At higher levels of the auditory system, neurons often
are poorly driven by white-noise stimuli and may require highly
specific and structured stimuli for optimal firing. Eggermont and
colleagues showed that in the midbrain of the grassfrog, a noise-based
STRF is a poor predictor of responses to the frog's vocalizations
(Eggermont et al. 1983a). More recently, Theunissen and
colleagues have extended the STRF technique by using species-specific vocalizations as stimuli to obtain STRFs from neurons in the primary auditory area of the zebra finch forebrain (Theunissen et al. 2000). STRFs derived from these vocalizations predict the
neuronal responses to vocalizations with greater accuracy than a STRF
based on tone-pips and with about the same accuracy as we report for owl ICc-ls neurons. Theunissen and colleagues suggested that nonlinear neuronal responses may be more fully characterized by a series of STRFs
obtained with ethologically important subsets of the stimulus space.
Neurons within the owl's ICc-ls respond strongly to white-noise
stimuli, such as those that we used to gather the present STRFs,
perhaps because they are similar to sounds generated by movement of
prey in the leaf litter (Payne 1971). The responses of
these cells to owl vocalizations have not been described. However, if
responses to owl vocalizations could be predicted better by STRFs based
on vocalizations than by the present STRFs, it would suggest that the
nonlinearities that generate specificity to vocalizations are found as
early as the inferior colliculus.
Effects of stimulus location
Neurons within the owl's ICc-ls are nearly exclusively binaural and require specific values of ITD and ILD to fire optimally. The system that we have attempted to characterize therefore necessarily includes, first, the filtering properties of the owl's facial ruff and outer ear that generate these primary sound localization cues. The system also includes lower-level processing that underlies computation of ILD and ITD as well as the convergence of these binaural cues within the ICc-ls. The STRF cannot tell us about how each of these processes occurs, only about the cumulative linear relationship between the ICc-ls spiking pattern and the sound input to the system.
There is evidence suggesting that the temporal pattern of neuronal
discharge codes for the position of a sound source. For example,
Delgutte and colleagues have noted that the firing pattern of some
cells in the cat inferior colliculus changes with the position of a
source in virtual auditory space (Delgutte et al. 1999).
In the cat auditory cortex, the firing pattern can be shown to contain
information regarding the location of the source (Middlebrooks et al. 1994).
By contrast, in the ICc-ls of the owl, the
patterns of spikes evoked by a source at different azimuths
were found to be statistically indistinguishable, provided that the
overall firing rate was similar. However, because the spiking patterns in the owl are related to the stimulus envelope, we examined the envelopes of reproducible gammatone noises filtered with the HRTFs of
the two ears associated with loci on the horizon from
−90° to +90° at
eye level. The envelope from each azimuth was cross-correlated with
that obtained at the midline (0°). This analysis showed that the
cross-correlation coefficients deviate by less than 0.1 over the range
tested except at loci where the filtering properties of the ears create
prominent troughs in the amplitude spectra. It is therefore not
surprising that the firing patterns were found to be largely
independent of stimulus position, and without evidence to the contrary,
it is more parsimonious to assume that the firing pattern is better
attributed to the sound's temporal features instead of the source's
position in space. It is interesting to note that in the cat inferior
colliculus, spiking patterns elicited by noise remain constant when
ITD, a major sound localization cue, is varied without altering the
envelope (Carney and Yin 1989).
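The envelope comparison across azimuths can be illustrated with a small sketch. It assumes the envelope is taken as the magnitude of the analytic (Hilbert) signal, which may differ from the method actually used, and it compares envelopes at zero lag only. The function names and the dictionary keyed by azimuth are hypothetical.

```python
import cmath
import math

def hilbert_envelope(x):
    """Envelope as the magnitude of the analytic signal.

    Uses a plain O(n^2) DFT so the sketch needs no third-party
    packages; it is meant for short signals only.
    """
    n = len(x)
    spec = [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    for k in range(n):
        if 0 < k < n / 2:
            spec[k] *= 2
        elif k > n / 2:
            spec[k] = 0
    return [abs(sum(spec[k] * cmath.exp(2j * math.pi * k * m / n)
                    for k in range(n)) / n) for m in range(n)]

def correlation_with_midline(envelopes, midline=0):
    """Correlate each azimuth's envelope with the midline envelope.

    `envelopes` maps azimuth in degrees to a list of envelope samples.
    """
    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)
    ref = envelopes[midline]
    return {az: pearson(env, ref) for az, env in envelopes.items()}
```

Coefficients near 1 at every azimuth would indicate that HRTF filtering leaves the envelope, and hence the expected spike pattern, largely unchanged.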
Origin of temporal activity patterns
In the barn owl, the ICc-ls is the first stage at which
information from the pathways that compute ITD and ILD converges
(Mazer 1995). The nucleus laminaris, which is the first
site of binaural convergence in the time pathway, projects directly to
the core of the central nucleus of the inferior colliculus, which, in
turn, innervates the ICc-ls of the opposite side (Takahashi et al. 1989).
The neurons of nucleus laminaris are known to be
phase locked to the carrier (Carr and Konishi 1990), but
their ability to encode stimulus envelope has not been tested. Envelope
information could also be conveyed through the ILD pathway via neurons
of the nucleus ventralis lemnisci lateralis pars posterior (VLVp). The
nucleus angularis, a cochlear nucleus, is excited by the ipsilateral
ear and projects contralaterally to the ICc-ls (Sullivan and
Konishi 1984; Takahashi and Konishi 1988) and
the VLVp. The VLVp is the first site of binaural convergence and is
excited by a direct projection from the contralateral nucleus angularis
and inhibited by a commissural projection from the opposite VLVp
(Takahashi and Keller 1992; Takahashi et al. 1995). Evidence suggests
that it projects bilaterally to the ICc-ls (Adolphs 1993). The response
pattern of angularis neurons was examined with tones only, which gave
rise to a chopper pattern, but responses to wider-band stimuli have not
been described (Sullivan 1985). In VLVp, spike trains to repeated
presentation of a reproducible, band-limited noise (1-kHz bandwidth)
showed evidence of chopping, but in addition, consistent gaps in this
otherwise regular pattern were noted (Mogdans and Knudsen 1994).
Whether these gaps are related to the envelope is not
known, however. Neurons of the mammalian anteroventral cochlear nucleus
that are classified as choppers based on tonal stimuli respond to
reproducible noises with patterns that are irregular but are
nevertheless consistent from repetition to repetition (Carney et al. 1999).
A computational model of the inferior colliculus that incorporates chopper cells of the anteroventral cochlear nucleus
as inputs is able to reproduce the firing of collicular neurons to SAM
stimuli (Hewitt and Meddis 1994). The model's response to stimuli with more complex envelopes was not investigated.
Perceptual relevance
Temporal modulations in amplitude are highly salient cues.
Smearing of envelope modulations by low-pass filtering degrades speech
comprehension in noise (Drullman et al. 1994).
Conversely, speech recognition is well preserved when
frequency-specific information is largely removed but amplitude
modulation patterns are left intact (Shannon et al. 1995). Such results
suggest that time-varying amplitude
information plays a major role in speech perception. The song system of
the zebra finch provides a parallel at the neuronal level. In the
telencephalic nucleus, HVc, the selectivity of neurons for a bird's
own song is more resistant to degradation of spectral information than
to degradation of temporal information (Theunissen and Doupe 1998).
Although the temporal patterns described in the inferior colliculus are
robust, it is natural to ask whether they are meaningful to the
organism. It is not known whether the owl can discriminate the noises
that the ICc-ls cells can differentially encode. There is evidence that
human listeners can discriminate reproducible noise samples
(Coble and Robinson 1992; Hanna 1984).
This discriminability diminishes as the correlation between the noise
samples being compared is increased (Hanna 1984). Such a
discrimination could be made on the basis of the carrier or envelope.
Studies with narrowband noises showed that performance was similar with
low center frequencies (225-275 Hz) or high center frequencies
(2,975-3,025 Hz), suggesting the use of envelope cues, which are
equally available in both high- and low-frequency stimuli (Hanna 1984).
Were the discrimination based on the carrier waveform,
the performance would be expected to be better for low-frequency
stimuli for which phase-locking is more robust (Rose et al. 1966).
More recently, Coble and Robinson demonstrated that the ability of
human listeners to discriminate between two reproducible band-passed
noises was poorest when the two waveforms differed only at the
beginning (Coble and Robinson 1992). Citing the model of
Braida and colleagues (Braida et al. 1984), they
suggested that the auditory system differentially weights the
information within various segments of the stimuli. The report of
Coble and Robinson (1992) is particularly
interesting given the results of the drift test (Fig. 1), which showed
that the temporal pattern in the initial ca. 10 ms of a spike train
remains fixed regardless of the structure of the stimulus used. It is
possible that in the human auditory system too, the internal
representation of the initial stimulus segment is independent of the
stimulus's temporal structure. The situation, however, is probably
more complex because the duration over which the two noises must differ
represented a constant proportion of the total stimulus. In
other words, as the stimulus duration increased, the two noises needed
to differ over longer durations to achieve the same discriminability
(Coble and Robinson 1992). It seems highly unlikely that
the time period during which the spiking pattern in the ICc remains
fixed scales with the total stimulus duration. Nevertheless, our current
results predict that the owl would have difficulty discriminating noise bursts that differed only in their initial 10 ms.
Our results, and those of others, suggest that the ICc of the barn owl
functions much like a spatially tuned dynamic spectrum analyzer in
which the firing patterns represent the time-varying amplitude of
stimuli from a given spatial location. Given the importance of temporal
cues to the perception of complex behaviorally relevant signals, such a
view compels us to attend to spike-pattern as much as to spike rate
when studying hearing in naturalistic environments. For instance, the
question of how speech is masked by noise must be addressed in terms
not only of whether or not a neuron can signal the presence of a target
by its average firing rate but also in terms of the fidelity with which
the temporal pattern evoked by a target is preserved in the noise
(Bodnar and Bass 1999; Caird et al. 1991; Delgutte and Kiang 1984;
Narins and Wagner 1989; Simmons et al. 1992). An understanding of
the temporal code in the ICc is thus a step toward the wider goal of
understanding how the nervous system extracts useful information in the
presence of clutter and uncertainties in the physical signal.
ACKNOWLEDGMENTS
We thank Dr. Klaus Hartung for help during all stages of this study.
This work was supported by National Institute on Deafness and Other Communication Disorders Grant DC-03925 and National Science Foundation Learning and Intelligent Systems Initiative Grant CMS9720334.
FOOTNOTES
Address for reprint requests: C. H. Keller (E-mail: keller{at}uoneuro.uoregon.edu).
1 The average AC fluctuation is linearly related to the fourth moment of the probability density function of signal amplitude. To match the modulation of the SAMs and noises precisely, we would need to match the AC fluctuation of the noise at each modulation frequency. The computation of such a spectrum would require a model with parameters that could only be guessed at and was therefore not implemented.
Received 28 March 2000; accepted in final form 28 July 2000.
REFERENCES