Representation of Temporal Features of Complex Sounds by the Discharge Patterns of Neurons in the Owl's Inferior Colliculus

Clifford H. Keller and Terry T. Takahashi

Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403


    ABSTRACT

Keller, Clifford H. and Terry T. Takahashi. Representation of Temporal Features of Complex Sounds by the Discharge Patterns of Neurons in the Owl's Inferior Colliculus. J. Neurophysiol. 84: 2638-2650, 2000. The spiking pattern evoked in cells of the owl's inferior colliculus by repeated presentation of the same broadband noise was found to be highly reproducible and synchronized with the temporal features of the noise stimulus. The pattern remained largely unchanged when the stimulus was presented from spatial loci that evoke similar average firing rates. To better understand this patterning, we computed the pre-event stimulus ensemble (PESE), the average of the stimuli that preceded each spike. Computing the PESE by averaging the pressure waveforms produced a noisy, featureless trace, suggesting that the patterning was not synchronized to a particular waveform in the fine structure. By contrast, computing the PESE by averaging the stimulus envelope revealed an average envelope waveform, the "PESE envelope," typically having a peak preceded by a trough. Increasing the overall stimulus level produced PESE envelopes with higher amplitudes, suggesting a decrease in the jitter of the cell's response. The effect of carrier frequency on the PESE envelope was investigated by obtaining a cell's response to broadband noise and then either estimating the PESE envelope for each spectral band or computing a spectrogram of the stimulus prior to each spike. Either method yielded the cell's PESE spectrogram, a plot of the average amplitude of each carrier-frequency component at various pre-spike times. PESE spectrograms revealed surfaces with peaks and troughs at certain frequencies and pre-spike times. These features are collectively called the spectrotemporal receptive field (STRF). The shape of the STRF showed that in many cases, the carrier frequency can affect the PESE envelope.
The modulation transfer function (MTF), which describes a cell's ability to respond to time-varying amplitudes, was estimated with sinusoidally amplitude-modulated (SAM) noises. Comparison of the PESE envelope with the MTF in the time and frequency domains showed that the two were closely matched, suggesting that a cell's response to SAM stimuli is largely predictable from its response to a noise-modulated carrier. The STRF is considered to be a model of the linear component of a system's response to dynamic stimuli. Using the STRF, we estimated the degree to which we could predict a cell's response to an arbitrary broadband noise by comparing the convolution of the STRF and the envelope of the noise with the cell's post-stimulus time histogram to the same noise. The STRF explained 18-46% of the variance of a cell's response to broadband noise.


    INTRODUCTION

In nature, most sounds have spectra that vary over time, and auditory systems must encode and process these dynamic stimuli. In redwing blackbirds, for example, the temporal structure of song plays a large role in the discrimination and recognition of different species and subspecies (Brenowitz 1983). In echolocating bats, amplitude and frequency modulations of the returned echo encode the wing-beat of insects and serve as a basis for identification of prey (Schnitzler 1987). Evidence suggests that speech comprehension is possible with degraded spectral information as long as temporal information is left intact (Drullman et al. 1994; Shannon et al. 1995; Wright et al. 1997). Thus, coding of temporal auditory information is as fundamental to auditory function as is the coding of spatial location and spectral characteristics.

Textbook accounts of how the auditory system encodes the spectrum of sounds usually refer to tonotopic maps, wherein the energy in a particular frequency band is represented by the mean firing rates of frequency-tuned neurons within this tonotopic array. This, of course, is a static account. Tonotopic maps, found throughout the auditory system, must continuously update their output as the amplitudes in the various spectral bands change. The activity pattern of a single cell in the tonotopic map would therefore be expected to reflect the stimulus' temporal structure.

The present study examines this process in the lateral shell of the central nucleus of the inferior colliculus (ICc-ls) of the barn owl (Tyto alba), which, like that in mammals, is a nearly obligatory synaptic station for ascending auditory information. In the owl, ICc-ls cells are not only tuned to stimulus frequency but are also sensitive to source location (Knudsen and Konishi 1978; Mazer 1995; Wagner et al. 1987). As a result, the cells can convey the spectral characteristics of an auditory event at a particular location in space. The natural auditory environment, however, is cluttered with echoes and sounds from multiple sources, thereby complicating the process of segregating and analyzing the spectra of individual sources. Although the neural processing of spatial information in simple and complex acoustical environments has been described in the owl, little is known of how dynamic spectral information is represented with or without clutter (Keller and Takahashi 1996a,b; Takahashi and Keller 1994). The present study takes the first step, describing the relationship between the spiking pattern of ICc cells and the temporal structure of a single stimulus in a cell's spatial receptive field (RF).


    METHODS

All experiments have been approved by the Institutional Animal Care and Use Committee of the University of Oregon.

General procedures

Results are based on recordings from 10 adult barn owls of both sexes. A barn owl was anesthetized by intramuscular injections of ketamine (0.05-0.1 ml/h; Vetalar 100 mg/ml, Parke-Davis) and diazepam (0.025-0.05 ml/h; Diazepam C-IV 5 mg/ml, LyphoMed) and given a prophylactic dosage of ampicillin (0.2 ml intramuscular; Polyflex, 250 mg/ml Aveco). The owl was placed into a stereotaxic device that held its head tilted downward at a 45° angle. The scalp was infused with a local anesthetic (2% lidocaine HCl, Xylocaine, Astra Pharmaceuticals) and cut. A hole about 0.49 cm² in area was opened in the skull, through which a microelectrode was inserted. Silicone grease (Dow-Corning) was applied to the dural surface to prevent desiccation. Body temperature and heart rate were monitored, and the bird was warmed with a circulating-water heating pad. At the end of a recording session, the craniotomy was closed with dental cement (Vitrabond, 3M), and the scalp was sutured and covered with a topical antibacterial cream (Bacitracin-Neomycin-Polymyxin Ointment, E. Fougera). Typically, a recording session lasted about 12-15 h. Before the bird was returned to a recovery cage, it was given 0.2 ml of dexamethasone and 0.2 ml of vitamin B complex intramuscularly and 10-20 ml of 5% dextrose in lactated Ringer solution intravenously. The recovery cage was maintained at 34°C. The bird's activity was monitored until it fully recovered. A bird was typically used for four sessions with a minimum of 7 days between sessions.

All recordings were carried out in a sound-isolating booth (Industrial Acoustics, 1.8 × 1.8 × 1.8 m). For most tests, sounds were presented over earphones (Sony MDR with custom-built cones that fit snugly into the outer ear canal) after compensating for the filtering properties of the earphones in situ and filtering the signals with the individual bird's head-related transfer functions (HRTFs; Tucker Davis Technologies PD1). We showed earlier that this virtual auditory-space method effectively simulates a free-field source (Keller et al. 1998). For testing how a cell's spiking pattern depended on the sound's spatial location (see Fig. 2 in RESULTS), we presented sounds from a mobile speaker (Alpine 6020AX) located 90 cm from the center of the head.

In the following text, positive azimuths denote stimuli presented from the owl's right and positive elevations denote stimuli presented from above eye level.

Stimuli

Our stimuli consisted of tone bursts, narrowband noise bursts, broadband noise bursts, and sinusoidally amplitude-modulated (SAM) noise bursts. All stimuli were synthesized digitally at a sampling rate of 30,000 points/s. Individual narrowband and broadband noise bursts were stored in computer mass-storage media and could therefore be repeated. We refer to these as "reproducible noises." In addition to their specific envelopes (e.g., SAM), they had linear onset and offset ramps lasting 5 ms. Stimuli were scaled to 20-30 dB above neuronal threshold with programmable attenuators (Tucker Davis Technologies PA4), and amplified with a stereophonic amplifier (McIntosh 754 or Tucker Davis Technologies HB6). The stimuli had the following characteristics.

BROADBAND NOISE. Broadband noises had amplitude spectra that deviated by less than 1 dB from the mean value between 2 and 11 kHz and had random phase spectra. The cutoffs were such that between 1.5 and 2 kHz, the amplitude declined by 40 dB and between 11 and 11.5 kHz, the amplitude declined by 60 dB. Our sample of cells had best frequencies between 3.5 and 9 kHz.

NARROWBAND NOISE. Narrowband noise was constructed by filtering a broadband noise with a gammatone filter matched to the tuning curves of the owl's 8th nerve fibers (Köppl 1997). For a given cell, we presented 10 repetitions each of 10 different 10-s-long noise bursts (100 total stimuli) that had been filtered by a gammatone filter centered near the neuron's best frequency (BF, see below). The envelopes of each of the 10 different gammatone noises presented to a cell were computed using the Hilbert transform (MATLAB software package; The Mathworks, v 5.2). The magnitude spectra of the envelopes were computed and averaged. These average envelope spectra were broadly low-pass, declining linearly by 6 dB over a 2- to 500-Hz range of modulation frequencies. These spectra are superimposed on the modulation transfer functions (MTF) of some typical cells in Figs. 8, A-D, and 9 (gray lines). As exemplified in these figures, the envelope spectrum was wider than the MTFs of all our cells.

SAM NOISE. SAM noise was constructed by multiplying the broadband noise, $n(t)$, by a sinusoidal envelope: $\mathrm{SAMnoise}(t) = n(t)\,[1 + A \sin(2\pi f_e t)]$. The modulation frequency is $f_e$. The depth of modulation is determined by the constant $A$, which varies from 0 to 1.0. The SAM noises were presented at a level roughly 20 dB above neuronal threshold and a modulation depth of 50% ($A = 0.5$). The depth chosen approximates the average modulation depth (or more precisely the average AC fluctuation¹) of the noise stimuli used for the estimation of the modulation impulse response (MIR). In five cells, additional depths and levels were tested (see RESULTS).
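As an illustration of the SAM construction described above, the following is a minimal sketch rather than the authors' synthesis code. It assumes a Gaussian white-noise carrier in place of the band-limited noise used in the study; the function name `sam_noise` and its parameters are hypothetical.

```python
import math
import random

FS = 30000  # sampling rate stated in the text (points/s)

def sam_noise(n_samples, fs, f_mod, depth, seed=0):
    """Return (carrier, sam), where sam[i] = carrier[i] * (1 + depth * sin(2*pi*f_mod*t_i)).

    Assumption: the carrier is Gaussian white noise, a stand-in for the
    band-limited broadband noise described in the paper.
    """
    rng = random.Random(seed)
    carrier = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    sam = [
        x * (1.0 + depth * math.sin(2.0 * math.pi * f_mod * i / fs))
        for i, x in enumerate(carrier)
    ]
    return carrier, sam

# 100 ms of noise modulated at 100 Hz with a depth of 50% (A = 0.5)
carrier, sam = sam_noise(n_samples=3000, fs=FS, f_mod=100.0, depth=0.5)
```

With a depth of 0.5 the instantaneous gain applied to the carrier stays between 0.5 and 1.5, which is what a 50% modulation depth means here.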

TONE BURSTS. Tone bursts were constructed by multiplying a 100-ms pure tone produced by a waveform generator (Hewlett-Packard 3245A Universal Source) with the trapezoidal envelope determined by the onset and offset ramps.

Initial characterization of neurons

Action potentials from single units were isolated with epoxy-insulated tungsten electrodes (Frederick Haer, 10 MΩ), and the time of their occurrence relative to stimulus onset was written to computer mass-storage media at a resolution of 10 µs (M110, Modular Instruments). The spikes evoked by a single stimulus presentation, or "spike train," thus consisted of a series of event times. The set of spike trains evoked by repetitions of a given stimulus were aligned relative to stimulus onset and summed to generate poststimulus time histograms (PSTHs).
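The PSTH construction can be sketched as follows, assuming each spike train is a list of event times in seconds relative to stimulus onset. The function name `psth` and the 1-ms bin width in the example are illustrative choices, not values taken from the study.

```python
import math

def psth(spike_trains, duration_s, bin_s):
    """Sum onset-aligned spike trains into a histogram of spike counts per bin."""
    n_bins = int(round(duration_s / bin_s))
    counts = [0] * n_bins
    for train in spike_trains:
        for t in train:
            k = math.floor(t / bin_s)  # bin index of this spike
            if 0 <= k < n_bins:        # ignore spikes outside the analysis window
                counts[k] += 1
    return counts

# Three hypothetical repetitions of a 100-ms stimulus
trains = [[0.0105, 0.0302], [0.0108, 0.0551], [0.0109]]
histogram = psth(trains, duration_s=0.1, bin_s=0.001)
```

A response that is locked to the stimulus shows up as bins with counts close to the number of repetitions, as in bin 10 of this toy example.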

On isolating a cell, we first estimated the cell's firing threshold by monitoring the oscilloscope trace and audio-monitor and varying the average of the sound-pressure levels in the two ears. (These cells respond only poorly to monaural stimulation.) The average binaural level was set to a value 20-30 dB above threshold, and we ascertained that the cell was selective for both interaural level difference (ILD) and time difference (ITD), the two cues for sound localization in the owl. Neurons in the adjacent subdivision of the central inferior collicular nucleus, the ICc-core, are sensitive to ITD but not to ILD, and these tests allowed us to bypass cells in the ICc-core. On confirming the binaural selectivity of the cell, we identified its spatial receptive field (spatial RF) by counting the spikes evoked by a 100-ms broadband noise burst (below) presented from different loci in the frontal hemisphere. All cells had an RF within a 20° radius of the center of gaze (0° azimuth; 0° elevation), and some had additional RFs located more eccentrically (see RESULTS for details regarding multiple RFs). The cell's frequency response was then assessed from the center of the spatial RF closest to the midline by measuring the spike rates elicited by 100-ms tone bursts ranging from 2 to 10 kHz in steps of 200 Hz or by narrowband noises with center frequencies spaced 1/30th of an octave apart from 2 to 10 kHz. In the following text, we refer to plots of spike rate as a function of the frequency of the tone or the center frequency of the narrowband noise as "rate-frequency" curves and the frequency evoking the maximal rate as the "best frequency."
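The two frequency grids and the definition of best frequency can be made concrete with a short sketch. The function names are hypothetical, and the treatment of the upper edge of the 1/30-octave grid (stopping at the last center frequency that does not exceed 10 kHz) is an assumption, since the text does not specify it.

```python
def tone_frequencies(lo_hz=2000, hi_hz=10000, step_hz=200):
    """Tone frequencies from lo_hz to hi_hz inclusive, in linear steps."""
    return list(range(lo_hz, hi_hz + 1, step_hz))

def noise_center_frequencies(lo_hz=2000.0, hi_hz=10000.0, steps_per_octave=30):
    """Center frequencies spaced 1/steps_per_octave of an octave apart.

    Assumption: the grid stops at the last value that does not exceed hi_hz.
    """
    freqs = []
    k = 0
    while True:
        f = lo_hz * 2.0 ** (k / steps_per_octave)
        if f > hi_hz:
            break
        freqs.append(f)
        k += 1
    return freqs

def best_frequency(freqs_hz, spike_rates):
    """Frequency that evoked the maximal spike rate on a rate-frequency curve."""
    best_index = max(range(len(freqs_hz)), key=lambda i: spike_rates[i])
    return freqs_hz[best_index]

tones = tone_frequencies()
centers = noise_center_frequencies()
```

Under these assumptions the tone series contains 41 frequencies and the narrowband series contains 70 center frequencies.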

Data analyses

Data were analyzed using the MATLAB software package.

DRIFTING NOISE PARADIGM. The relationship between neuronal firing patterns and the temporal structure of stimuli was demonstrated qualitatively by a "drifting noise" paradigm illustrated schematically in Fig. 1, top. Noise bursts were synthesized by replacing different 40-ms segments of a reproducible broadband noise burst, N2(t), with a different and independent reproducible broadband noise burst, N1(t). This splicing procedure generated a series of 100-ms noise bursts in which the onset of N1(t) varied, or "drifted," relative to that of N2(t) in 2-ms steps from 0 to 38 ms. Each stimulus was presented to a cell 20 times, and the resulting spike trains were aligned relative to the onset of the 100-ms noise burst.
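The splicing procedure can be sketched as below, assuming both noises are stored as lists of samples at 30,000 points/s. The helper names and the use of Gaussian noise as a placeholder for the reproducible broadband noises are assumptions made for illustration.

```python
import random

FS = 30000  # sampling rate stated in the text (points/s)

def ms_to_samples(ms, fs=FS):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000.0))

def drifting_noise_series(n2, n1, seg_ms=40, step_ms=2, max_onset_ms=38, fs=FS):
    """Replace successive 40-ms segments of n2 with the same 40-ms segment of n1.

    Returns a list of (onset_ms, burst) pairs, one per drift position.
    """
    seg = ms_to_samples(seg_ms, fs)
    series = []
    for onset_ms in range(0, max_onset_ms + 1, step_ms):
        start = ms_to_samples(onset_ms, fs)
        burst = list(n2)                      # copy so n2 itself stays unchanged
        burst[start:start + seg] = n1[:seg]   # splice in the N1 segment
        series.append((onset_ms, burst))
    return series

# Placeholder noises: 100 ms for N2 and 40 ms for N1 (assumed Gaussian)
rng = random.Random(1)
n2 = [rng.gauss(0.0, 1.0) for _ in range(ms_to_samples(100))]
n1 = [rng.gauss(0.0, 1.0) for _ in range(ms_to_samples(40))]
series = drifting_noise_series(n2, n1)
```

Stepping the onset from 0 to 38 ms in 2-ms increments yields 20 distinct 100-ms bursts.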



Fig. 1. Drifting-noise test. Top: a series of noise bursts was synthesized by replacing different 40-ms segments of a reproducible broadband noise burst, N2(t), with a 40-ms segment of a different and independent reproducible broadband noise burst, N1(t), at varying temporal positions. This splicing procedure generated a series of 100-ms noise bursts in which the temporal position of the onset of N1(t) varied, or "drifted," in 2-ms steps from 0 to 38 ms. Bottom: each stimulus was presented to a cell 20 times, and the resulting spike trains (rows of dots) were aligned relative to the onset of N2(t). Brackets indicate a block of 20 trials (unit 829BU).

STATISTICAL ANALYSIS OF SPIKE PATTERNS FROM DIFFERENT SPATIAL LOCI. In the ICc-ls, some neurons have more than one RF from which statistically equivalent (by Student's t-test) spike rates can be evoked. We placed a sound source at azimuths ranging from -85° (left) to +85° (right) along a cell's best elevation and recorded the spike train evoked by a 100-ms broadband noise burst. The stimuli were repeated 10-20 times at each azimuth. Each spike train from the RF closest to the midline (central RF) was cross-correlated with a PSTH built from the remaining spike trains at the central RF, thus giving us N cross-correlation coefficients, where N is the number of stimulus repetitions. The variance of these correlation coefficients served as a measure of the variability in the spike pattern evoked by repetitions of a reproducible noise burst from a single location. Each of the spike trains from eccentric RFs was then cross-correlated with the PSTH built from all spike trains evoked from the central RF. This also yielded N correlation coefficients, which were averaged and compared using the Student's t-test to the mean of the coefficients obtained at the central RF.
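The two sets of coefficients can be sketched as follows. This is an illustration under stated assumptions: spike trains are represented as equal-length lists of spike counts per time bin, and "cross-correlation coefficient" is taken to mean the zero-lag Pearson coefficient; the text does not specify either. The function names are hypothetical, and the t-test itself is not shown.

```python
import math

def pearson(x, y):
    """Zero-lag Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant sequence; report 0 by convention
    return sxy / math.sqrt(sxx * syy)

def sum_trains(trains):
    """Bin-by-bin sum of binned spike trains (an unnormalized PSTH)."""
    return [sum(column) for column in zip(*trains)]

def leave_one_out_coeffs(central_trains):
    """Correlate each central-RF train with the PSTH of the remaining trains."""
    total = sum_trains(central_trains)
    coeffs = []
    for train in central_trains:
        rest = [t - s for t, s in zip(total, train)]  # PSTH without this train
        coeffs.append(pearson(train, rest))
    return coeffs

def cross_location_coeffs(eccentric_trains, central_trains):
    """Correlate each eccentric-RF train with the PSTH of all central-RF trains."""
    template = sum_trains(central_trains)
    return [pearson(train, template) for train in eccentric_trains]
```

Leaving the tested train out of the central-RF template avoids correlating a spike train with itself, which would inflate the within-location coefficients relative to the between-location ones.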

COMPUTATION OF THE PRE-EVENT STIMULUS ENSEMBLE. A spike in an auditory neuron signifies the occurrence of a specific feature in the ensemble of stimuli received by the subject. The pre-event stimulus ensemble (PESE) is the average stimulus that precedes each spike. The stimulus ensemble and the PESE can be defined in a variety of ways. Typically, the pressure waveform before each spike is averaged, but it is also possible to use some transformation of the stimulus ensemble, such as its spectrum or envelope (Eggermont et al. 1983c).

To compute the PESE with respect to the carrier wave of a sound, we presented a 5- or 10-s broadband-noise burst (10-30 repetitions) and averaged a 136.5-ms segment of the pressure waveform preceding each spike. If a cell was locked to the carrier of the noise stimulus, a waveform having a more or less consistent shape should precede each spike (deBoer and Kuyper 1968), and their average would be the PESE. If, on the other hand, a cell's discharge does not lock to the carrier wave, the PESE would be featureless.
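The averaging step is a spike-triggered average and can be sketched as follows. The function name `pese` and its arguments are hypothetical, and spikes that occur too early to have a full prespike window are skipped, which is an assumption about edge handling.

```python
def pese(stimulus, spike_times_s, fs, window_s):
    """Average the stimulus segment of length window_s that precedes each spike.

    The returned list runs from the earliest prespike sample (index 0) to the
    sample just before the spike (last index). Also returns the spike count used.
    """
    n_win = int(round(window_s * fs))
    acc = [0.0] * n_win
    n_used = 0
    for t in spike_times_s:
        k = int(round(t * fs))  # sample index at which the spike occurred
        if k < n_win or k > len(stimulus):
            continue            # no complete prespike window available
        for i, v in enumerate(stimulus[k - n_win:k]):
            acc[i] += v
        n_used += 1
    if n_used == 0:
        return acc, 0
    return [a / n_used for a in acc], n_used
```

The same function applies unchanged when `stimulus` holds an envelope instead of the pressure waveform. If spikes are not locked to the waveform, positive and negative excursions cancel in the average and the result is featureless.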

To characterize the neurons with respect to the time-varying stimulus amplitude, we computed the average envelope that preceded each spike, thus deriving the PESE envelope (Eggermont et al. 1983c; Møller 1973). This PESE envelope was obtained using either broadband or narrowband noises. The broadband noise bursts were 5 or 10 s long, and during off-line analysis, they were filtered by the neuron's tuning curve to isolate the frequency band to which the cell was presumably responding. We tested some cells with both broadband and narrowband stimuli. Whatever the stimulus type, the envelopes were extracted using the Hilbert transform (MATLAB v. 5.2) and 136.5-ms segments of the envelope preceding each spike were averaged to obtain the PESE envelope. The mean DC value of the PESE was subtracted for visualization and display.
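Envelope extraction with the Hilbert transform can be illustrated with a small standard-library sketch. It builds the analytic signal with a naive discrete Fourier transform, which is practical only for short signals; the study used MATLAB, so this is a stand-in and not the authors' code. The function names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = 0j
        for i, v in enumerate(x):
            s += v * cmath.exp(sign * 2j * math.pi * k * i / n)
        out.append(s / n if inverse else s)
    return out

def hilbert_envelope(x):
    """Magnitude of the analytic signal of a real sequence x."""
    n = len(x)
    spec = dft(x)
    gain = [0.0] * n          # negative frequencies are zeroed
    gain[0] = 1.0             # DC is kept as is
    if n % 2 == 0:
        gain[n // 2] = 1.0    # Nyquist bin is kept as is
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        gain[k] = 2.0         # positive frequencies are doubled
    analytic = dft([s * g for s, g in zip(spec, gain)], inverse=True)
    return [abs(z) for z in analytic]
```

For an amplitude-modulated carrier whose modulation is slow relative to the carrier frequency, the returned envelope tracks the modulating waveform; segments of this envelope preceding each spike are what would be averaged to form the PESE envelope.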

To study the effect of the carrier frequency on the shape of the PESE envelope, we computed the dynamic spectrum of the stimulus and averaged the spectrograms preceding each spike to obtain the PESE spectrogram. The analysis was carried out in each of two ways: In the first method (Hermes et al. 1981), a running Fourier amplitude spectrum was computed over the 20-ms segment of the noise signal preceding each spike, and these prespike spectrograms were averaged. Fourier spectra were computed within a 2.13-ms Hanning window, advanced in 1.07-ms increments. In the second method, we band-pass-filtered the broadband noise using a gammatone filter and computed PESE envelopes (as above) from the output of each filter band (Aertsen and Johannesma 1980; Theunissen et al. 2000). The PESE envelopes computed from the outputs of 81 such filters (evenly spaced between 2 and 10 kHz) were aligned relative to spike-occurrence times. Each method yielded a surface plot of the amplitude (normalized to the maximum) of the average prespike signal at any given carrier frequency and prespike time. The mean DC value was subtracted from this plot. The portions of the PESE spectrogram that have more or less energy than the mean are called the spectrotemporal receptive field (STRF) (Aertsen et al. 1980).
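The second method can be summarized in one expression. The notation is introduced here for illustration and does not come from the paper: $e_f(t)$ is the envelope of the output of the filter centered on carrier frequency $f$, $t_i$ is the time of the $i$-th of $N$ spikes, $\tau \ge 0$ is the prespike time, and the overbar denotes the mean of the surface. Taking the mean over the whole surface, as opposed to band by band, is an assumption.

```latex
\[
  \mathrm{PESE}(f,\tau) \;=\; \frac{1}{N}\sum_{i=1}^{N} e_f(t_i - \tau),
  \qquad
  \mathrm{STRF}(f,\tau) \;\propto\; \mathrm{PESE}(f,\tau) - \overline{\mathrm{PESE}}
\]
```

Read this way, a PESE envelope is the slice of the surface at a fixed $f$, and the proportionality sign stands for the normalization to the maximum mentioned in the text.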

ANALYSIS OF RESPONSES TO SAM STIMULI. The neurons' responses to SAM noise were plotted as period histograms relative to the modulation period, and the magnitude (vector strength) and phase angle of the Rayleigh vector were computed (Goldberg and Brown 1969). The magnitude and phase, plotted as a function of modulation frequency, are the modulation transfer function (MTF). The modulation impulse response (MIR), which is the time-domain equivalent of the MTF, was estimated by summing 500 sinusoids with frequencies from 1 to 500 Hz, having the magnitude and phase angles obtained from the MTF. Magnitude and phase values that were not measured directly were obtained by linear interpolation.
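The vector-strength computation and the synthesis of the MIR can be sketched as follows. The function names are hypothetical. The phase passed to the synthesis function is treated as a phase lag of the response relative to the modulation, so that a pure delay produces a peak at that delay; this sign convention is an assumption, as the text does not state one.

```python
import math

def rayleigh_vector(spike_times_s, f_mod):
    """Vector strength and mean phase (radians) of spikes relative to the modulation cycle."""
    n = len(spike_times_s)
    if n == 0:
        return 0.0, 0.0
    c = sum(math.cos(2.0 * math.pi * f_mod * t) for t in spike_times_s)
    s = sum(math.sin(2.0 * math.pi * f_mod * t) for t in spike_times_s)
    return math.hypot(c, s) / n, math.atan2(s, c)

def modulation_impulse_response(freqs_hz, magnitudes, phase_lags, times_s):
    """Sum sinusoids weighted by MTF magnitude and shifted by MTF phase lag.

    Assumed convention: each component is m * cos(2*pi*f*t - lag).
    """
    return [
        sum(m * math.cos(2.0 * math.pi * f * t - lag)
            for f, m, lag in zip(freqs_hz, magnitudes, phase_lags))
        for t in times_s
    ]
```

A vector strength of 1 means every spike fell at the same phase of the modulation cycle, and a value near 0 means spikes were spread evenly over the cycle.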

RECONSTRUCTION OF THE PSTH FROM THE STRF. The STRF represents a linear model of how a neural system transforms the time-varying stimulus into spike trains. To judge the sufficiency of this linear model, we compared the cell's PSTH with a "reconstructed" response based on a convolution of the STRF with the stimulus envelope. We computed the STRF by the second method (see above) resulting in PESE envelopes for each of 81 different frequency bands between 2 and 10 kHz. For each frequency band, we filtered the stimulus with a gammatone filter and computed the envelope of the filtered stimulus using the Hilbert transform. Each envelope was then convolved with the PESE envelope. The results were summed across frequency bands and the mean value was subtracted. The PSTH was compiled at the same sampling rate as the stimulus and then smoothed using a moving 2-ms (60-point) boxcar window. The normalized PSTH was compared with the normalized reconstruction by calculating the maximum of their normalized covariance functions.
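The reconstruction and comparison steps can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: band envelopes and kernels are plain lists, each kernel is ordered so that index 0 is the sample at the spike and higher indices lie further back in time, the boxcar is centered with truncated edges, and the covariance is searched over a limited range of lags. All function names are hypothetical.

```python
import math

def convolve(signal, kernel):
    """Causal convolution truncated to len(signal); kernel[k] weights the sample k steps back."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(kernel), n + 1)):
            acc += kernel[k] * signal[n - k]
        out[n] = acc
    return out

def predict_response(band_envelopes, band_kernels):
    """Convolve each band envelope with its kernel, sum across bands, subtract the mean."""
    n = len(band_envelopes[0])
    total = [0.0] * n
    for env, ker in zip(band_envelopes, band_kernels):
        for i, v in enumerate(convolve(env, ker)):
            total[i] += v
    mean = sum(total) / n
    return [v - mean for v in total]

def boxcar(x, width):
    """Moving average over a centered window of `width` samples (edges truncated)."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - width // 2)
        hi = min(len(x), i - width // 2 + width)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def max_normalized_covariance(x, y, max_lag):
    """Maximum over lags of the covariance of x and y, normalized by their energies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0:
        return 0.0
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(xc[i] * yc[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, s / norm)
    return best
```

At the stated sampling rate of 30,000 points/s, the 2-ms smoothing window corresponds to `boxcar(psth, 60)`. Searching over lags makes the comparison tolerant of a fixed latency offset between the measured PSTH and the reconstruction.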


    RESULTS

General neuronal properties

Neurons in our sample had best frequencies ranging from 3.5 to 9 kHz and a spatial RF within a 20° radius of the center of gaze. In addition to the centrally located RF, the neurons also had eccentric RFs. Typically, sources placed at these eccentric RFs evoked a discharge rate that was about 60-100% of the rate evoked from the central RF. These multiple RFs, which are spaced apart by distances inversely proportional to a cell's best frequency, are due to the fact that ITD, the cue for sound-source azimuth, is computed by a binaural cross-correlation-like mechanism operating on narrowband inputs (see Carr and Konishi 1990; Takahashi and Konishi 1986 for details). These characteristics are consistent with those found in a survey of the ICc-ls (Mazer 1995).

Stimulus-related discharge patterns

The temporal pattern of discharge in cells of the owl's ICc-ls to the repeated presentation of a reproducible noise burst has been shown to be highly reproducible itself (Takahashi and Keller 1996). An example is shown in the dot raster display of Fig. 1, in which each block of 20 rows (brackets) represents the spike trains elicited by the repeated presentation of a reproducible noise burst from the cell's central RF. Because the spike trains were aligned relative to stimulus onset, the consistency of the firing pattern is manifested within a block as vertically aligned columns of dots.

Such consistency could, of course, be an intrinsic property of the neuron, having little to do with stimulus features. Figure 1 illustrates the results of a drift test, which suggests that this is not the case. The onset of N1(t) drifted from 0 to 38 ms within N2(t). As the drifting segment, N1(t), replaced successively later portions of N2(t), identifiable columns of spikes were found correspondingly later, resulting in a diagonal pattern (arrows, Fig. 1). Evidence of this relationship between spike pattern and stimulus temporal structure was observed in all of 44 neurons tested with this stimulus.

Inspection of the dot raster displays shows, however, that the first ca. 10 ms of the spiking pattern is independent of stimulus structure. As can be seen in Fig. 1, the response pattern is the same in the first four blocks despite the fact that the drifting stimulus segment, N1(t), is at a different temporal position in each of these four blocks. Occasionally we noted that later features of the dot-raster display were also independent of the drifting noise segment. In Fig. 1, for instance, the gap at about 27 ms after stimulus onset is relatively constant despite the fact that the N1(t) drifts through this time period. Thus this gap appears to be locked to the stimulus onset and not to any other feature of the stimulus itself.

Spatial dependency of spike-train patterns

In the ICc-ls, neurons often have multiple spatial RFs (Mazer 1995), as shown in Fig. 2. Each bracket on the left demarcates a block of spike trains elicited by 11 repetitions of a reproducible 100-ms broadband noise burst from a speaker at the azimuths shown to the left. The cell shown responded maximally when the speaker was placed about 5° to the left (contralateral to the cell) of the bird's midline at eye-level (-5° azimuth; 0° elevation). In addition, the cell had eccentric RFs at +55 and -65°. Stimulus presentation from the central RF (at -5°) gave rise to vertically aligned columns of spikes. At the edges of this central RF, the firing rate declined, but many of the features appeared to be preserved, at least as judged by visual inspection. Similarly at the eccentric RFs, many of the temporal features seen at the center of the main RF appeared to be preserved.



Fig. 2. Dot raster displays of spike trains evoked from different locations. Brackets demarcate a block of spike trains evoked by 11 repetitions of a reproducible broadband noise presented from azimuths shown to the left. Time relative to stimulus onset is shown below. The cell, found in the right inferior colliculus, has a central receptive field (RF) located at -5° as well as eccentric RFs located at +55 and -65°. Neither the spike rate nor the spike pattern evoked from the RFs at -5 and +55° is statistically different (see text).

The consistency of the spiking pattern across space was quantified (see METHODS) in eight cells in which identical stimuli in the central and at least one of the eccentric RFs evoked discharge rates that were not statistically distinguishable (P > 0.05, Student's t-test). The RFs at -5 and +55° shown in Fig. 2 are examples of RFs where the firing rates were not distinguishable. The firing patterns evoked from these loci were not significantly more different than were spike trains evoked by repeated presentation at one spatial location. In fact, only one cell of the eight tested responded with spike patterns that differed at two equally effective RFs. For this cell, the difference in the spiking patterns evoked from the two RFs, although significant statistically (P < 0.05; Student's t-test), was small. Our preliminary conclusion, based on this small sample, is that spike patterns do not change significantly more when sources are placed at different loci in space than when they are repeated from the same location.

Stimulus features encoded by the temporal pattern

The spike trains may be entrained to either the carrier or envelope of the stimulus or to both. The possibility that the cell's firing pattern was determined by the carrier of a broadband stimulus was examined by reverse-correlating spike trains to the pressure waveform in 82 neurons. None of the neurons examined showed evidence of being driven by a characteristic waveform in the carrier. The traces were all noisy and devoid of large-scale peaks, troughs, or oscillations of the type described in auditory-nerve fibers or cochlear-nuclei neurons that are able to phase-lock to the carrier. The lowest best frequency in our sample was about 3.5 kHz, and we cannot rule out that the spiking patterns of cells with lower best frequencies would be correlated with the carrier wave. In the ICc-ls, however, cells with best frequencies less than 4 kHz are not common (Mazer 1995).

We next examined the possibility that the spiking pattern of the cells was related to the envelope of a stimulus in 102 cells (63 stimulated with broadband noise; 20 with narrowband noise; and 19 with both). In each case, an average envelope waveform consistently preceded each spike. This average envelope waveform is referred to as the PESE envelope.

Four examples of PESE envelopes are shown in Fig. 3, A-D. These examples show a range of shapes, some of which are roughly monophasic (e.g., Fig. 3A) and others that are more biphasic (e.g., Fig. 3B). The biphasic PESE envelopes suggest that many of the neurons have a high probability of discharge when a peak in the envelope follows a trough. For such a cell, the low-amplitude portion of an envelope would decrease the probability of discharge, which, in turn, would decrease the likelihood that the cell was in a refractory period when a suitable envelope peak appeared. Thus the spikes that do occur would be associated, more often than not, with a trough-peak sequence.



Fig. 3. The pre-event stimulus ensemble (PESE) envelopes of 4 representative cells. The abscissa represents the time before each spike, which occurs at 0 ms. A: unit 707AG PESE envelope estimate based on 4,828 spikes. B: unit 880AB, 21,529 spikes. C: unit 888CD, 24,275 spikes. D: unit 888EE, 17,906 spikes.

Spectrotemporal receptive fields

The PESE envelopes described above were computed for a narrowband carrier centered on a cell's best frequency. How is the PESE envelope affected by the carrier frequency? A sound spectrogram describes the distribution of energy in various frequency bands over time. If one were to compute the average spectrogram preceding all spikes, one would obtain the PESE spectrogram, which gives an idea of how a cell might respond to modulations of the amplitude of a variety of carrier frequencies (Aertsen and Johannesma 1980; Hermes et al. 1981). Features of the PESE spectrogram that stand out from the mean level are referred to as the spectrotemporal receptive field (STRF) (Aertsen et al. 1980). In theory, a PESE envelope can therefore be thought of as a cross section of the STRF parallel to the time axis (Aertsen and Johannesma 1980; Eggermont et al. 1983b).

PESE spectrograms were computed from the responses evoked by 5- or 10-s bursts of broadband noise (10-30 repetitions) in 48 cells. PESE spectrograms estimated in the two ways described in METHODS are shown in Fig. 4, A and B, for a typical cell. The time preceding a spike is represented along the vertical axis, the frequency along the horizontal axis, and the magnitude of the signal on a pseudo-color scale. The time of spike occurrence is represented by the 0-ms mark at the top of the vertical axis. It is apparent that the two methods arrive at similar surfaces. Both showed that energy was consistently present at just above 7,000 Hz about 7 ms before a spike. The STRFs also had a region with relatively less energy than the mean in the 7,000-Hz band about 2 ms before the peak (about 9 ms before a spike).



Fig. 4. PESE spectrograms. PESE spectrograms represent the prespike amplitude (pseudo-color scale) minus the mean DC level as a function of the frequency of spectral components in the signal (horizontal axis) and the time before the spikes (vertical axis). In the color scale, red represents amplitudes above the mean DC level, and blue represents amplitudes below the mean DC level. A: PESE spectrogram of a neuron obtained by computing a running Fourier amplitude spectrum within a 2.13-ms Hanning window advanced in 1.07-ms increments. B: PESE spectrogram of the same neuron obtained by computing the PESE envelope of 81 spectral bands and aligning them relative to the spikes. [Unit 888AP spectrotemporal receptive field (STRF) estimates based on 14,198 spikes.] C: PESE envelopes computed using narrowband and broadband stimuli. The blue trace is a PESE envelope computed from the presentation of a narrowband noise centered on the cell's best frequency (BF). (PESE envelope estimate based on 14,517 spikes.) The red trace is a cross-section through the peak of the STRF shown in B along the time axis at the BF.

The PESE envelope computed using a narrowband noise at a cell's best frequency should be similar to a vertical cross section of an STRF (computed by either method) provided that a cell responds similarly to broadband and narrowband noises. We explored the effects of the stimulus bandwidth on the PESE envelope in 19 cells, which were tested using both broadband and narrowband stimuli. Figure 4C shows the results from the typical cell and exemplifies the similarity between a PESE envelope obtained with a narrowband noise (thick trace) and a cross section through the STRF (thin trace), which is equivalent to a PESE envelope obtained with a broadband noise. Coefficients of cross-correlation between the two waveforms for each of the 19 cells had a median of 0.92.

Figure 5 shows an assortment of PESE spectrograms, all of which were computed from the output of band-pass filters (2nd method). Qualitatively, the shapes of the STRFs varied along a continuum. Some STRFs had relatively narrow, well-defined peaks occurring about 6 ms before a spike, such as that in Fig. 5C. Others, like those in Fig. 5, B, D, and F, were more complex. It is clear from an inspection of these STRFs that the shape of the PESE envelope, i.e., a vertical slice through the STRF, can depend on the carrier frequency. For instance, in Fig. 5E, the PESE envelope estimated from the output of the filter centered at 6 kHz would show a deep and wide trough preceding a small peak, whereas at about 6.8 kHz, the PESE envelope would display a single peak and no trough.



Fig. 5. Spectrotemporal receptive fields from 6 representative cells. The STRFs were obtained by computing the PESE envelopes from narrowband filters. All conventions are per Fig. 4. Rate-frequency curves obtained by presentation of narrowband noise bursts are superimposed (---). A: unit 888AG STRF estimate based on 10,921 spikes. B: unit 707AE, 9,851 spikes. C: unit 884CH, 3,242 spikes. D: unit 707BI, 9,970 spikes. E: unit 888ED, 4,032 spikes. F: unit 888AF, 6,215 spikes.

The STRFs shown in Fig. 5, A-F, display other interesting features. Standard rate-frequency curves, assessed by counting the spikes elicited by 100-ms narrowband noise bursts or tones, are superimposed, and for cells in Fig. 5, A and C-F, their shapes correlate well with the cross sections of the STRF parallel to the frequency axis (taken at the latency of the STRF maximum). We obtained both types of data from 33 cells. For most of these cells, the shape of the rate-frequency curve and a horizontal cross section through the STRF are quite similar. Figure 6A shows this comparison for each of the remaining 27 cells not shown in Fig. 5. The frequency giving the peak response in the rate-frequency test correlates strongly with the peak frequency of the STRF for all cells (Fig. 6B; r = 0.88, n = 33). The tuning curves obtained by the two methods did not always agree, however. For example, in Fig. 5B, the STRF had a broad, irregular peak between 5 and 9 kHz, whereas the rate-frequency curve obtained with narrowband noises had a complex profile with a central region having fewer spikes. In other cases, the STRFs displayed frequencies of relative excitation and inhibition that could not be discerned from inspection of the rate-frequency tuning curves. In Fig. 5E, for instance, there is a trough at about 6 kHz, which corresponds to the low-frequency edge of the rate-frequency function.



Fig. 6. Frequency tuning obtained by presentation of narrowband noise bursts or tones compared with estimates obtained from STRFs. A: rate-frequency curves (thick lines) and horizontal cross sections of STRFs at the latency of the STRF peak (thin lines) are presented for 27 cells. B: peak frequency obtained from the STRF (ordinate) is plotted for each cell against the peak frequency obtained by presentation of narrowband noise bursts or tones (abscissa). For 6 cells, the corresponding Fig. 5 subplot is identified by a letter above and to the left of the data point. Line of 1-to-1 correspondence is also plotted.

In seven cells, we examined the effects of stimulus amplitude on the STRFs. Figure 7A shows STRFs obtained from a typical cell with stimulus levels near threshold (assigned a value of 0 dB) and up to 25 dB above threshold. In general, at higher amplitudes, the STRFs broadened along their frequency axis. Earlier studies have shown that frequency tuning, assessed using broadband stimuli, widens with increasing amplitude (Evans 1977; Møller 1977). Our observation is consistent with this phenomenon. A slight decrease in latency was also detected (Fig. 7B), a finding consistent with results obtained in the cochlear nucleus of the rat (Møller 1981). Finally, we also noted that the peaks and troughs in the STRFs became more pronounced with increasing stimulus amplitudes (Møller 1976). This effect could be seen best by superimposing cross sections of each STRF, which represent a PESE envelope near the best frequency, for each stimulus amplitude (Fig. 7B). Such increases in PESE envelope heights with increases in stimulus amplitude would be expected if increases in amplitude improved the locking of a cell's discharge to the envelope. Alternatively, it may be due simply to the fact that spike number increases with amplitude, which, in turn, increases the number of envelope segments that are incorporated into the average. We therefore re-computed the PESE envelopes for the different stimulus levels using the number of spikes elicited by the quietest stimulus tested. For instance, the cell shown in Fig. 7 fired 3,060 spikes at near-threshold stimulus levels (0 dB). We randomly selected 3,060 spikes from the spike trains elicited by the higher stimulus levels and re-computed the PESE envelopes. If the estimate of the PESE envelope improved simply because of an increase in the number of samples averaged, then the PESE envelopes estimated using only a subset of the spikes should become less well-defined compared with those obtained using the full set of spikes. 
This did not seem to be the case. In the seven cells thus analyzed, the PESE envelope computed using a subset of spikes and that computed from the full complement of spikes showed little difference. This suggests that as stimulus level increases, envelope-locking improves.
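The spike-count control described above can be sketched as follows: draw a random subset of spikes equal in number to the spikes at the quietest level, recompute the spike-triggered envelope average, and compare it with the average from the full set. This is an illustrative sketch only; the function names, the fixed random seed, and the toy envelope are assumptions introduced here.

```python
import random

def spike_triggered_average(envelope, spike_samples, n_lags):
    """Mean envelope at each of n_lags samples before the spikes (lag 0 = spike)."""
    usable = [s for s in spike_samples if s >= n_lags - 1]
    if not usable:
        raise ValueError("no spike has a complete pre-spike window")
    return [sum(envelope[s - lag] for s in usable) / len(usable)
            for lag in range(n_lags)]

def subsampled_average(envelope, spike_samples, n_lags, n_keep, seed=0):
    """Recompute the average from n_keep spikes drawn at random without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    subset = rng.sample(spike_samples, n_keep)
    return spike_triggered_average(envelope, subset, n_lags)


# Toy data (made up): every spike is preceded by a bump 2 samples earlier.
spikes = list(range(10, 1000, 10))          # 99 spikes
envelope = [1.0] * 1000
for s in spikes:
    envelope[s - 2] = 3.0
full = spike_triggered_average(envelope, spikes, n_lags=6)
matched = subsampled_average(envelope, spikes, n_lags=6, n_keep=30)
```

When locking to the envelope is what sets the height of the average, as in this perfectly locked toy case, the subset and the full set give the same peak; if the height merely reflected the number of samples averaged, the subset estimate would be noisier.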



Fig. 7. Effects of stimulus level on STRFs. A: STRFs obtained from 1 cell (unit 707AI) at 4 different sound levels relative to a just-suprathreshold stimulus (assigned a level of 0 dB). (STRFs estimated on the basis of 3,060 spikes at 0 dB; 4,105 spikes at +5 dB; 7,693 spikes at +15 dB; and 8,588 spikes at +25 dB.) As the stimulus level is raised, the peaks widen in frequency and become taller, and the latency decreases. B: cross-sections taken through the peaks of the STRFs (shown in A) along their time axes.

Relationship to the modulation transfer function

The response of neurons to dynamic stimuli is often assessed by estimating their MTF using SAM stimuli. In 23 cells, we examined the synchrony and phase of firing to SAM noise modulated at different frequencies. The MTFs thus obtained were compared with the PESE envelopes in the frequency and time domains.

The black lines in Fig. 8, A-D, represent the magnitude (left) and phase (right) spectra of the PESE envelopes of the four cells shown in Fig. 3. The magnitude spectra were, in general, asymmetrically band-passed with a steeper slope on the low-frequency side. The phase spectra typically showed a linear relationship. The MTFs assessed by presenting SAM stimuli are superimposed (●). As exemplified by the cells in Fig. 8, the high-frequency cutoff, i.e., the modulation frequency at which the MTFs declined to 50% of their maxima, was below 256 Hz in all 23 neurons tested.
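For readers unfamiliar with the measures plotted in the MTFs, the sketch below computes a vector strength and mean phase from spike times at one modulation frequency, and reads a 50% high-frequency cutoff off a sampled MTF. It is a hedged illustration: the function names are hypothetical, the cutoff is taken at the first tested frequency without interpolation, and all numbers are made up.

```python
import math

def vector_strength(spike_times_s, mod_freq_hz):
    """Return (vector strength, mean phase in radians) relative to the modulation cycle."""
    if not spike_times_s:
        raise ValueError("need at least one spike")
    angles = [2.0 * math.pi * mod_freq_hz * t for t in spike_times_s]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s), math.atan2(s, c)

def high_frequency_cutoff(mod_freqs_hz, strengths, fraction=0.5):
    """First tested frequency above the MTF peak where the strength has fallen
    to fraction * maximum or less; None if it never does.
    Assumes mod_freqs_hz is sorted in ascending order."""
    peak = max(range(len(strengths)), key=lambda i: strengths[i])
    threshold = fraction * strengths[peak]
    for f, vs in zip(mod_freqs_hz[peak + 1:], strengths[peak + 1:]):
        if vs <= threshold:
            return f
    return None


# Made-up spike trains for a 50-Hz modulation.
locked = [k / 50.0 for k in range(100)]                      # one spike per cycle, same phase
spread = [k / 50.0 + (k % 4) / 200.0 for k in range(100)]    # four evenly spaced phases
vs_locked, _ = vector_strength(locked, 50.0)
vs_spread, _ = vector_strength(spread, 50.0)
cutoff = high_frequency_cutoff([16, 32, 64, 128, 256],
                               [0.30, 0.50, 0.60, 0.25, 0.10])
```

Perfectly locked spikes give a vector strength near 1, spikes spread evenly over the cycle give a value near 0, and the toy MTF peaks at 64 Hz and has fallen below half its maximum by 128 Hz.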



Fig. 8. Responses to sinusoidally amplitude-modulated (SAM) noise. A-D: solid lines, Fourier amplitude (left) and phase (right) spectra of the PESE envelopes shown in Fig. 3. Circles delineate the modulation transfer function (MTF) obtained by plotting the vector strengths (left) or phase (right) of the cells' responses to SAM noise as a function of the modulation frequency. MIRs, reversed in time (broad gray line), and PESE envelopes near the best frequency (black line) for each cell are presented as insets to the right column. Correlation coefficients between these signals are given (r). Gray lines indicate the spectra of the envelope of the narrowband noise. Time-scale bars represent 5 ms. E and F: quantitative comparison of the half-height widths and median frequencies of the MTFs and Fourier amplitude spectra of PESE envelopes for all 23 cells tested. Values obtained from SAM noise (ordinate) are plotted against the same measure obtained from the PESE envelope (abscissa). Correlation coefficients (r) are given and the line of 1-to-1 correspondence is plotted. G: distribution of cross-correlation coefficients between the PESE envelope and the modulation impulse response (MIR) for 19 cells.

The comparison was quantified in several ways. Figure 8, E and F, compares, for all 23 cells, the peaks and half-widths of the magnitude spectra derived using each technique. In general, the matches were quite close (correlation coefficients of 0.81 and 0.84, respectively) and the slopes describing the relationships between the SAM- and PESE-based estimates were not significantly different from one.

We also constructed the time-domain equivalent of the MTF, the modulation impulse response (MIR), for each of 19 cells using the phase and vector strengths measured with SAM stimulation and compared these MIRs with the PESE envelopes themselves. As the MIR is a measure of a cell's response and the PESE envelope is a stimulus subset, their time axes are reversed relative to one another. Figure 8, A-D, right, insets, shows this comparison for the four cells after having reversed the MIR in time. As can be seen, they are well matched. Figure 8G plots the distribution of cross-correlation coefficients between the MIRs and the PESE envelopes for each of the 19 cells (median cross-correlation = 0.87). These data together suggest that a cell's response to envelopes consisting of a wide range of modulation frequencies could be predicted by its responses to envelopes composed of individual sinusoidal components. Good agreement between MTFs derived with SAM noises or tones and those derived by reverse correlation with noise-modulated tones and noises has also been reported for the rat cochlear nucleus and inferior colliculus (Møller 1973; Møller and Rees 1986).
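One plausible way to picture the construction of an MIR is as a sum of cosines, one per tested modulation frequency, each weighted by its vector strength and shifted by its measured phase. The sketch below is an assumption-laden illustration and not necessarily the procedure used in this study: the function name, the sign convention for the phase, the sampling step, and the MTF values are all hypothetical.

```python
import math

def modulation_impulse_response(mod_freqs_hz, vector_strengths, phases_rad, times_s):
    """Sum one cosine per tested modulation frequency, weighted by vector strength.

    The phase sign convention (cos(2*pi*f*t + phase)) is an assumption.
    """
    mir = []
    for t in times_s:
        value = 0.0
        for f, vs, ph in zip(mod_freqs_hz, vector_strengths, phases_rad):
            value += vs * math.cos(2.0 * math.pi * f * t + ph)
        mir.append(value)
    return mir


# Made-up MTF for illustration only.
freqs = [50.0, 100.0, 150.0]
strengths = [0.3, 0.6, 0.2]
phases = [0.0, 0.0, 0.0]
times = [k * 0.0005 for k in range(41)]     # 0 to 20 ms in 0.5-ms steps
mir = modulation_impulse_response(freqs, strengths, phases, times)

# Reverse the time axis before comparing with a pre-spike stimulus average,
# because the MIR runs forward from an event and the PESE runs backward from a spike.
mir_reversed = mir[::-1]
```

The reversed waveform can then be compared sample by sample with a PESE envelope resampled onto the same time base.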

Figure 9 shows the responses of one neuron to SAM noises having different modulation depths and levels. The MTFs in Fig. 9A show that when the level of the SAM noise was increased, vector strengths increased overall, and a new peak (arrow) emerged at a higher modulation frequency (* vs. ●). Similarly, as shown in Fig. 9B, increases in modulation depth increased synchrony overall and emphasized higher modulation frequencies (arrow). Analogous results were shown in Fig. 7, in which the PESE envelopes became taller and sharper as the stimulus level was increased.



Fig. 9. A: effects of changing the level of the SAM noise. (Unit 707BP, PESE estimate based on 24,548 spikes.) SAM noise was presented at a just-suprathreshold level (assigned a value of 0 dB; *) and at 20 dB above this level (●). B: effects of changing modulation depth in the cell shown in A. In A and B, increases in overall level or modulation depth cause the peak in the MTF to shift to higher modulation frequencies. Gray lines indicate the spectrum of the envelope of the narrowband noise.

Recently, Dent and colleagues assessed the behavioral sensitivity of the European barn owl to SAM noise (Dent et al. 1999). These behavioral MTFs were found to be quite similar to those measured in a variety of other birds and mammals with peak sensitivity (lowest threshold) between 10 and 20 Hz. The modulation depths at threshold ranged from 6% near 10 Hz to 22% near 500 Hz. While direct comparison with our neuronal MTFs is difficult, it is particularly interesting to note that at modulation depths closer to the behavioral thresholds, the MTFs of a significant proportion of our cells had peaks in the 10- to 20-Hz range.

Sufficiency of a linear model

The results above demonstrated a high degree of similarity between the MIR, a characteristic of the cell's response, and the PESE envelope, a stimulus subset. Given an impulse response, one can predict the response of a linear neural system to an arbitrary input. Conversely, given the average prespike stimulus, one can estimate the stimulus that must have caused the response. The similarity of the MIR and PESE envelope provides empirical evidence that the system is reversible, i.e., the same linear model can be used to predict the response given the stimulus or the stimulus given the response. Here, we use the PESE envelopes (STRF) to predict the response of the neuron to an arbitrary noise in an effort to determine the sufficiency of a linear model consisting of only a STRF. Since some features of the spike train (e.g., gap at ca. 27 ms in Fig. 1) are stimulus invariant, any prediction will necessarily be incomplete, and our purpose here was simply to quantify the contribution of the linear component.

The sufficiency of a model such as ours is typically addressed by comparing the convolution of the stimulus and a kernel, such as the STRF, with the cell's PSTH (Carney et al. 1999; Eggermont et al. 1983a,b; Kowalski et al. 1996; Møller and Rees 1986; Theunissen et al. 2000; Wickesberg et al. 1984; Yamada and Lewis 1999). The PSTH represents the average response of the neuron over many repetitions and allows one to represent spike trains as continuous waveforms. If the STRF provides a complete description of the cell's response to a given signal, then the output of a linear model, i.e., a convolution of the stimulus with the STRF, should be identical to the output of the neuron, i.e., the PSTH (Rieke et al. 1993).

For such a test, we first constructed a PSTH from a cell's response to the first 2 s of the 10-s broadband noise using all 20 or 30 repetitions. The cell's STRF was constructed from its responses to the remaining 8 s of the sound. Thus, construction of the STRF was independent from the PSTH. We then attempted to reconstruct the cell's response to the first 2 s of the stimulus by convolving the stimulus envelope with the STRF. Figure 10, A and B, allows a visual comparison of this reconstructed response with the PSTH for two cells. The cell whose comparison is shown in part in Fig. 10A provided one of the most accurate reconstructions, resulting in a covariance coefficient of 0.78 for the 200 ms portion shown and 0.64 for the entire 2 s (arrow a in Fig. 10C, see following text). The response and reconstruction of a more typical cell are shown in Fig. 10B (r = 0.55 for 200 ms and r = 0.49 for 2 s; arrow b in Fig. 10C).
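The reconstruction step can be sketched as a causal convolution of each band's stimulus envelope with the corresponding row of the STRF, summed across bands, followed by a correlation of the result with the PSTH. This is a minimal sketch under stated assumptions: the function names, the layout `strf[band][lag]` (lag in samples before the present), and the toy data are hypothetical, and a zero-lag Pearson coefficient stands in for the covariance coefficient reported here.

```python
import math

def predict_from_strf(band_envelopes, strf):
    """Linear prediction: for each time t, sum strf[b][k] * envelope_b[t - k]."""
    n = len(band_envelopes[0])
    prediction = [0.0] * n
    for env, kernel in zip(band_envelopes, strf):
        for t in range(n):
            for k, weight in enumerate(kernel):
                if t - k >= 0:  # ignore lags reaching before stimulus onset
                    prediction[t] += weight * env[t - k]
    return prediction

def correlation_coefficient(x, y):
    """Zero-lag Pearson correlation between prediction and PSTH."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data (made up): two bands, each containing a single envelope impulse.
band0 = [0.0] * 12
band0[5] = 1.0
band1 = [0.0] * 12
band1[2] = 1.0
strf = [[0.0, 0.0, 1.0, -0.5],   # band 0: peak at lag 2, trough at lag 3
        [0.0, 1.0, 0.0, 0.0]]    # band 1: peak at lag 1
prediction = predict_from_strf([band0, band1], strf)

# A hypothetical PSTH that follows the prediction except for added baseline firing.
psth = [p + 0.2 for p in prediction]
r = correlation_coefficient(prediction, psth)
```

Keeping the STRF estimate and the PSTH on separate stimulus segments, as in the test described above, is what makes the resulting coefficient a measure of prediction rather than of fit.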



Fig. 10. Reconstruction of a cell's response by convolution of a STRF with the stimulus envelope. A and B: 200-ms sections of the reconstruction (thick lines) overlay the corresponding sections of the PSTH (thin lines) for 2 cells. Covariance coefficients (r) for the displayed sections are given to the upper left of each plot. (A: unit 888AG; B: unit 707BI.) C: covariance coefficients from the entire 2-s reconstruction are plotted as a function of the maximal height of the STRF. Points corresponding to the plots in A and B are labeled a and b. n = 32 cells.

In Fig. 10C, the covariance coefficients for the 32 cells that were tested with 20 or more repetitions of the noise are plotted against the maximum height of the peak in the STRF. Cells with a well-defined STRF, as indicated by a high peak value, gave coefficients (r) in the range of 0.42-0.68 (Fig. 10C). The scatter plot in Fig. 10C saturates, suggesting that the ability to predict a cell's response to an arbitrary noise is not limited by our ability to estimate the STRF. Other processes, perhaps nonlinear responses to envelope structure, or processes that generate stimulus-independent response features, may in fact limit the predictions. Thus we conclude that a linear model represented by the STRF of such cells can explain between about 18 and 46% of the variance (r²) in the spike train.
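The variance figures follow from squaring the two ends of the reported range of coefficients:

```latex
r^{2}_{\min} = 0.42^{2} = 0.1764 \approx 18\%, \qquad
r^{2}_{\max} = 0.68^{2} = 0.4624 \approx 46\%
```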


    DISCUSSION

Our study was prompted by the observation that the discharge pattern evoked in the cells of the owl's ICc-ls by reproducible broadband noise is itself highly reproducible and was therefore a possible means of representing temporally complex stimuli. Our results show that this pattern was dependent, at least in part, on the envelope of the stimulus and was independent of the source's location.

The highly patterned activity evoked in the ICc by reproducible complex stimuli is not unique to the owl. Hermes and colleagues reported that cells in the midbrain of the grassfrog discharged with reproducible patterns to reproducible noises (Hermes et al. 1981). Carney and Yin demonstrated patterned activity evoked by reproducible noise in the cat inferior colliculus (Carney and Yin 1989). Carney and colleagues, working in the inferior colliculus of the awake rabbit, have since demonstrated the relationship between the pattern of these cells' spike trains and the stimulus envelope (Carney et al. 1999). Thus the temporal coding of envelope structure may be a role that the inferior colliculus assumes across species (Rose 1986).

Our reconstructions of neuronal spike trains showed that the simple linear model represented by the STRF can account for up to 46% of the variance in the pattern of neuronal responses to noises. Others working at the level of the midbrain in the frog, rat, and cat report similar findings (Delgutte et al. 1998; Eggermont et al. 1983b; Møller and Rees 1986). Møller and Rees (1986) have further shown that predictions of neuronal response can be improved by accounting for the possibility that inferior collicular neurons may respond differently to amplitude increments and decrements. Similar models applied to the 8th nerve and cochlear nuclei generally yield better matches, suggesting that neuronal responses are more linear at lower levels of the auditory system (Carney et al. 1999; Delgutte et al. 1998; Wickesberg et al. 1984; Yamada and Lewis 1999).

How effective is a STRF derived from responses to noise at predicting the response to more structured sounds, such as species-specific vocalizations? At higher levels of the auditory system, neurons often are poorly driven by white-noise stimuli and may require highly specific and structured stimuli for optimal firing. Eggermont and colleagues showed that in the midbrain of the grassfrog, a noise-based STRF is a poor predictor of responses to the frog's vocalizations (Eggermont et al. 1983a). More recently, Theunissen and colleagues have extended the STRF-technique by using species-specific vocalizations as stimuli to obtain STRFs from neurons in the primary auditory area of the zebra finch forebrain (Theunissen et al. 2000). STRFs derived from these vocalizations predict the neuronal responses to vocalizations with greater accuracy than a STRF based on tone-pips and with about the same accuracy as we report for owl ICc-ls neurons. Theunissen and colleagues suggested that nonlinear neuronal responses may be more fully characterized by a series of STRFs obtained with ethologically important subsets of the stimulus space. Neurons within the owl's ICc-ls respond strongly to white-noise stimuli, such as those that we used to gather the present STRFs, perhaps because they are similar to sounds generated by movement of prey in the leaf litter (Payne 1971). The responses of these cells to owl vocalizations have not been described. However, if responses to owl vocalizations could be predicted better by STRFs based on vocalizations than by the present STRFs, it would suggest that the nonlinearities that generate specificity to vocalizations are found as early as the inferior colliculus.

Effects of stimulus location

Neurons within the owl's ICc-ls are nearly exclusively binaural and require specific values of ITD and ILD to fire optimally. The system that we have attempted to characterize therefore necessarily includes, first, the filtering properties of the owl's facial ruff and outer ear that generate these primary sound localization cues. The system also includes lower-level processing that underlies computation of ILD and ITD as well as the convergence of these binaural cues within the ICc-ls. The STRF cannot tell us how each of these processes occurs, only about the cumulative linear relationship between the ICc-ls spiking pattern and the sound input to the system.

There is evidence suggesting that the temporal pattern of neuronal discharge codes for the position of a sound source. For example, Delgutte and colleagues have noted that the firing pattern of some cells in the cat inferior colliculus changes with the position of a source in virtual auditory space (Delgutte et al. 1999). In the cat auditory cortex, the firing pattern can be shown to contain information regarding the location of the source (Middlebrooks et al. 1994). By contrast, in the ICc-ls of the owl, the pattern of spikes evoked by a source at different azimuths was found to be indistinguishable statistically, provided that the overall firing rate was similar. However, because the spiking patterns in the owl are related to the stimulus envelope, we examined the envelopes of reproducible gammatone noises filtered with the HRTFs of the two ears associated with loci on the horizon from -90 to +90° at eye level. The envelope from each azimuth was cross-correlated with that obtained at the midline (0°). This analysis showed that the cross-correlation coefficients deviate by less than 0.1 over the range tested except at loci where the filtering properties of the ears create prominent troughs in the amplitude spectra. It is therefore not surprising that the firing patterns were found to be largely independent of stimulus position, and without evidence to the contrary, it is more parsimonious to assume that the firing pattern is better attributed to the sound's temporal features instead of the source's position in space. It is interesting to note that in the cat inferior colliculus, spiking patterns elicited by noise remain constant when ITD, a major sound localization cue, is varied without altering the envelope (Carney and Yin 1989).

Origin of temporal activity patterns

In the barn owl, the ICc-ls is the first stage at which information from the pathways that compute ITD and ILD converges (Mazer 1995). The nucleus laminaris, which is the first site of binaural convergence in the time pathway, projects directly to the core of the central nucleus of the inferior colliculus, which, in turn, innervates the ICc-ls of the opposite side (Takahashi et al. 1989). The neurons of nucleus laminaris are known to be phase locked to the carrier (Carr and Konishi 1990), but their ability to encode stimulus envelope has not been tested. Envelope information could also be conveyed through the ILD pathway via neurons of the nucleus ventralis lemnisci lateralis pars posterior (VLVp). The nucleus angularis, a cochlear nucleus, is excited by the ipsilateral ear and projects contralaterally to the ICc-ls (Sullivan and Konishi 1984; Takahashi and Konishi 1988) and the VLVp. The VLVp is the first site of binaural convergence and is excited by a direct projection from the contralateral nucleus angularis and inhibited by a commissural projection from the opposite VLVp (Takahashi and Keller 1992; Takahashi et al. 1995). Evidence suggests that it projects bilaterally to the ICc-ls (Adolphs 1993). The response pattern of angularis neurons was examined with tones only, which gave rise to a chopper pattern, but responses to wider-band stimuli have not been described (Sullivan 1985). In VLVp, spike trains to repeated presentation of a reproducible, band-limited noise (1-kHz bandwidth) showed evidence of chopping, but in addition, consistent gaps in this otherwise regular pattern were noted (Mogdans and Knudsen 1994). Whether these gaps are related to the envelope is not known, however. Neurons of the mammalian anteroventral cochlear nucleus that are classified as choppers based on tonal stimuli respond to reproducible noises with patterns that are irregular but are nevertheless consistent from repetition to repetition (Carney et al. 1999). A computational model of the inferior colliculus that incorporates chopper cells of the anteroventral cochlear nucleus as inputs is able to reproduce the firing of collicular neurons to SAM stimuli (Hewitt and Meddis 1994). The model's response to stimuli with more complex envelopes was not investigated.

Perceptual relevance

Temporal modulations in amplitude are highly salient cues. Smearing of envelope modulations by low-pass filtering degrades speech comprehension in noise (Drullman et al. 1994). Conversely, speech recognition is well preserved when frequency-specific information is largely removed but amplitude modulation patterns are left intact (Shannon et al. 1995). Such results suggest that time-varying amplitude information plays a major role in speech perception. The song system of the zebra finch provides a parallel at the neuronal level. In the telencephalic nucleus HVc, the selectivity of neurons for a bird's own song is more resistant to degradation of spectral information than to degradation of temporal information (Theunissen and Doupe 1998).

Although the temporal patterns described in the inferior colliculus are robust, it is natural to ask whether they are meaningful to the organism. It is not known whether the owl can discriminate the noises that the ICc-ls cells can differentially encode. There is evidence that human listeners can discriminate reproducible noise samples (Coble and Robinson 1992; Hanna 1984). This discriminability diminishes as the correlation between the noise samples being compared is increased (Hanna 1984). Such a discrimination could be made on the basis of the carrier or envelope. Studies with narrowband noises showed that performance was similar with low center frequencies (225-275 Hz) or high center frequencies (2,975-3,025 Hz), suggesting the use of envelope cues, which are equally available in both high- and low-frequency stimuli (Hanna 1984). Were the discrimination based on the carrier waveform, the performance would be expected to be better for low-frequency stimuli for which phase-locking is more robust (Rose et al. 1966).

More recently, Coble and Robinson demonstrated that the ability of human listeners to discriminate between two reproducible band-passed noises was poorest when the two waveforms differed only at the beginning (Coble and Robinson 1992). Citing the model of Braida and colleagues (Braida et al. 1984), they suggested that the auditory system differentially weights the information within various segments of the stimuli. The report of Coble and Robinson (1992) is particularly interesting given the results of the drift test (Fig. 1), which showed that the temporal pattern in the initial ca. 10 ms of a spike train remains fixed regardless of the structure of the stimulus used. It is possible that in the human auditory system too, the internal representation of the initial stimulus segment is independent of the stimulus's temporal structure. The situation, however, is probably more complex because the duration over which the two noises must differ represented a constant proportion of the total stimulus. In other words, as the stimulus duration increased, the two noises needed to differ over longer durations to achieve the same discriminability (Coble and Robinson 1992). It seems highly unlikely that the time period during which the spiking pattern in the ICc remains fixed scales with the total stimulus duration. Nevertheless our current results predict that the owl would have difficulties in discriminating noise bursts that differed only in their initial 10 ms.

Our results and those of others suggest that the ICc of the barn owl functions much like a spatially tuned dynamic spectrum analyzer in which the firing patterns represent the time-varying amplitude of stimuli from a given spatial location. Given the importance of temporal cues to the perception of complex behaviorally relevant signals, such a view compels us to attend to spike pattern as much as to spike rate when studying hearing in naturalistic environments. For instance, the question of how speech is masked by noise must be addressed in terms not only of whether a neuron can signal the presence of a target by its average firing rate but also of the fidelity with which the temporal pattern evoked by a target is preserved in the noise (Bodnar and Bass 1999; Caird et al. 1991; Delgutte and Kiang 1984; Narins and Wagner 1989; Simmons et al. 1992). An understanding of the temporal code in the ICc is thus a step toward the wider goal of understanding how the nervous system extracts useful information in the presence of clutter and uncertainties in the physical signal.


    ACKNOWLEDGMENTS

We thank Dr. Klaus Hartung for help during all stages of this study.

This work was supported by National Institute on Deafness and Other Communication Disorders Grant DC-03925 and National Science Foundation Learning and Intelligent Systems Initiative Grant CMS9720334.


    FOOTNOTES

Address for reprint requests: C. H. Keller (E-mail: keller{at}uoneuro.uoregon.edu).

1 The average AC fluctuation is linearly related to the fourth moment of the probability density function of signal amplitude. To match the modulation of the SAMs and noises precisely, we would need to match the AC fluctuation of the noise at each modulation frequency. The computation of such a spectrum would require a model with parameters that could only be guessed at and was therefore not implemented.

Received 28 March 2000; accepted in final form 28 July 2000.


    REFERENCES

0022-3077/00 $5.00 Copyright © 2000 The American Physiological Society