Responses of Auditory Cortical Neurons to Pairs of Sounds: Correlates of Fusion and Localization

Brian J. Mickey and John C. Middlebrooks

Kresge Hearing Research Institute, University of Michigan, Ann Arbor, Michigan 48109-0506


    ABSTRACT

Mickey, Brian J. and John C. Middlebrooks. Responses of Auditory Cortical Neurons to Pairs of Sounds: Correlates of Fusion and Localization. J. Neurophysiol. 86: 1333-1350, 2001. When two brief sounds arrive at a listener's ears nearly simultaneously from different directions, localization of the sounds is described by "the precedence effect." At inter-stimulus delays (ISDs) <5 ms, listeners typically report hearing not two sounds but a single fused sound. The reported location of the fused image depends on the ISD. At ISDs of 1-4 ms, listeners point near the leading source (localization dominance). As the ISD is decreased from 0.8 to 0 ms, the fused image shifts toward a location midway between the two sources (summing localization). When an inter-stimulus level difference (ISLD) is imposed, judgements shift toward the more intense source. Spatial hearing, including the precedence effect, is thought to depend on the auditory cortex. Therefore we tested the hypothesis that the activity of cortical neurons signals the perceived location of fused pairs of sounds. We recorded the unit responses of cortical neurons in areas A1 and A2 of anesthetized cats. Single broadband clicks were presented from various frontal locations. Paired clicks were presented with various ISDs and ISLDs from two loudspeakers located 50° to the left and right of midline. Units typically responded to single clicks or paired clicks with a single burst of spikes. Artificial neural networks were trained to recognize the spike patterns elicited by single clicks from various locations. The trained networks were then used to identify the locations signaled by unit responses to paired clicks. At ISDs of 1-4 ms, unit responses typically signaled locations near that of the leading source in agreement with localization dominance. Nonetheless the responses generally exhibited a substantial undershoot; this finding, too, accorded with psychophysical measurements. As the ISD was decreased from ~0.4 to 0 ms, network estimates typically shifted from the leading location toward the midline in agreement with summing localization. Furthermore a superposed ISLD shifted network estimates toward the more intense source, reaching an asymptote at an ISLD of 15-20 dB. To allow quantitative comparison of our physiological findings to psychophysical results, we performed human psychophysical experiments and made acoustical measurements from the ears of cats and humans. After accounting for the difference in head size between cats and humans, the responses of cortical units usually agreed with the responses of human listeners, although a sizable minority of units defied psychophysical expectations.


    INTRODUCTION

The ability to localize sounds is found widely among animals, indicating its functional and evolutionary importance. Most vertebrates, including humans and cats, are thought to use the same acoustical cues for sound localization. Interaural time and intensity differences provide ample information about the azimuth of an isolated broadband sound source in the frontal hemisphere. Accordingly, humans and cats integrate these cues to localize single broadband sounds in a predictable way and with considerable accuracy (human: reviewed by Blauert 1997; Middlebrooks and Green 1991; cat: May and Huang 1996; Populin and Yin 1998). On the other hand, when multiple sounds arrive nearly simultaneously from different locations, the acoustical cues are confounded. Such sounds are generally not localized to their true locations. Investigation of how these sounds are represented in the brain might therefore provide valuable insight into the process of sound localization and stimulus coding.

When multiple sounds arrive in close succession, auditory mechanisms associated with the precedence effect are engaged (reviewed by Blauert 1997; Litovsky et al. 1999; Zurek 1987). If the inter-stimulus delay (ISD) separating two brief stimuli is below the echo threshold---about 3-8 ms for clicks, depending on the listener---only one sound is reported (Freyman et al. 1991; Thurlow and Parks 1961; Wallach et al. 1949). This perception is called fusion. Under most conditions, the fused spatial percept or image is fairly compact and localizable. The perceived location of a fused image depends systematically on the ISD. At ISDs in the range of ~1-5 ms, the reported location lies near the location of the leading sound (Chiang and Freyman 1998; Wallach et al. 1949); that phenomenon is called localization dominance.1 At those ISDs, the lagging sound is not heard as a separate sound and has relatively little influence on the location judgement. When the ISD is ~0.8 ms or less, in the domain of summing localization, both leading and lagging sounds strongly influence the perceived location (reviewed by Blauert 1997). The fused image falls between the two source locations and is biased toward the location of the leading sound. The presence of an inter-stimulus level difference (ISLD) also affects localization: at a given ISD, the location judgement is biased toward the more intense loudspeaker (Snow 1954). It follows that the shift due to an ISD can be compensated by applying an opposing ISLD---this balancing of ISD and ISLD has been termed time-intensity trading.2 These phenomena, and analogous effects under headphones (e.g., Gaskell 1983; Litovsky and Shinn-Cunningham 2001; Shinn-Cunningham et al. 1993; Wallach et al. 1949; Yost and Soderquist 1984; Zurek 1980), have been studied extensively in humans (reviewed by Blauert 1997; Litovsky et al. 1999; Zurek 1987). In addition, behavioral studies have demonstrated localization dominance in rats (Kelly 1974) and localization dominance and summing localization in cats (Cranford 1982; Populin and Yin 1998).

The fusion and localization of paired sounds is likely to involve the auditory cortex. Lesions of the auditory cortex in cats impair localization of single sounds in the contralateral hemifield (Jenkins and Masterton 1982). Furthermore paired sounds with delays of a few milliseconds are mislocalized following auditory cortical lesions (Cranford and Oberholtzer 1976; Cranford et al. 1971; Whitfield et al. 1972). Circumstantial evidence from developmental studies also implicates the cerebral cortex: human infants initially lack localization dominance, but they gain this behavior during a period of intense cortical development, i.e., in the first year of life (reviewed by Clifton 1985; Litovsky and Ashmead 1997; Litovsky et al. 1999). Finally, unit responses of many auditory cortical neurons are sensitive to sound-source location (reviewed by Middlebrooks et al. 2001), and those responses reliably signal the locations of single broadband sound sources (Furukawa et al. 2000; Middlebrooks et al. 1998). The question follows: how are perceptually fused sounds represented by the activity of cortical neurons, and under what conditions do neuronal responses signal the perceived locations of those sounds?

We examined cortical areas A1 and A2 in the present study. We chose to study area A1, which receives specific tonotopic projections from the thalamus (Andersen et al. 1980), because lesions of A1 impair localization of pure tones (Jenkins and Merzenich 1984) and because the spatial sensitivity of A1 neurons has been characterized previously (e.g., Imig et al. 1990; Middlebrooks and Pettigrew 1981). We included the dorsal zone of A1, which tends to exhibit broader frequency tuning than other areas of A1 (Middlebrooks and Zook 1983). Localization of most sounds requires integration across frequencies so we also studied area A2, an area that receives diffuse nontonotopic thalamic projections (Andersen et al. 1980). Neurons in area A2 tend to be broadly tuned in frequency, and their sensitivity to the location of broadband sounds has been studied previously in this laboratory (Furukawa et al. 2000; Middlebrooks et al. 1998).

In the present study, we recorded unit responses from the cortex of anesthetized cats while presenting single broadband clicks from loudspeakers at various frontal azimuths as well as pairs of clicks with various ISDs and ISLDs from a pair of loudspeakers. Previous cortical studies have examined responses to stimulus pairs with a wide range of ISDs (Fitzpatrick et al. 1999; Reale and Brugge 2000). In the present study, we focused on stimulus pairs with ISDs below echo threshold (<5 ms) and specifically asked what locations were signaled by unit responses to such stimuli. Because location judgements had not been measured previously using these specific stimuli, we also performed human psychophysical experiments. We found that cortical units typically responded to paired sounds with a single burst of spikes. Spike patterns were analyzed with artificial neural networks to derive estimates of location that could be directly compared with psychophysical results. With notable exceptions, units signaled locations that, after accounting for the difference in head size between cats and humans, agreed with the responses of human listeners. Finally, we implemented a simple model that included peripheral filtering and interaural cross-correlation. The model results suggested that physiological correlates of localization dominance and time-intensity trading require central auditory processing beyond interaural cross-correlation.


    METHODS

Animal preparation

Ten purpose-bred male young-adult cats (Harlan, Indianapolis, IN) with body weights ranging from 3.5 to 5.5 kg were used. All procedures complied with guidelines of the University of Michigan Committee on Use and Care of Animals. The animal preparation reviewed here was essentially identical to that detailed previously (Middlebrooks et al. 1998). Isoflurane anesthesia was used during surgery, and intravenous α-chloralose was used during unit recording. A skull opening ~1 cm in diameter exposed the middle ectosylvian gyrus of the right hemisphere. A plastic retainer was cemented to the ventral margin of the opening to create a recording chamber. The animal was positioned with its head in the center of the sound chamber, its body supported in a sling with a heating pad, and its head supported by a bar attached to a skull fixture. Thin wire supports held the pinnae symmetrically throughout the experiment. Experiments lasted 1-5 days and were ended when cortical responses became weak.

Physiological apparatus and stimulus generation

Physiological experiments were performed under free-field conditions with an apparatus that has been described previously (Middlebrooks et al. 1998). A sound-attenuating chamber (dimensions, 2.6 × 2.6 × 2.5 m) was lined with sound-absorbing foam to suppress reflections. A series of loudspeakers was positioned on a horizontal circular hoop. Loudspeakers were located 1.2 m from the cat's head at various frontal azimuths (-80°, -60° to +60° in 10° steps, and +80°). The location directly ahead of the animal was assigned an azimuth of 0°, negative azimuths were to the left, and positive azimuths were to the right. Experiments were controlled by custom MATLAB software (The Mathworks, Natick, MA) running on a Pentium-based personal computer with instruments from Tucker-Davis Technologies (Gainesville, FL). Two-way coaxial loudspeakers (Pioneer TS-879 or JBL GT0302) were used. Computer-controlled two-channel D/A converters and multiplexers allowed sounds to be presented from single loudspeakers or from pairs of loudspeakers simultaneously, and two attenuators allowed the levels at the two loudspeakers to be varied independently.

Physiological experiments employed clicks, noise bursts, and pure-tone bursts. The maximal passband of our system was 0.5-30 kHz. Because the loudspeakers generally differed in their detailed response properties, each loudspeaker was individually calibrated by obtaining an impulse response (Zhou et al. 1992). Click stimuli were created by convolution of a 100-µs rectangular pulse with the inverse impulse response of the intended loudspeaker. A 5-ms segment centered on the resulting transient was isolated, and 0.5-ms raised-cosine ramps were applied to the ends. About 80% of the energy of the click was concentrated within 100 µs. A few units were examined using 3-ms Gaussian noise bursts (with abrupt onsets and offsets), which also incorporated the loudspeaker correction. Each trial used an independently sampled token of noise; for paired noise bursts, the two tokens of noise were identical aside from the imposed ISD or ISLD. When stimulus pairs were presented with a nonzero ISD, an appropriate number of zeros was inserted in front of the waveform intended for the lagging loudspeaker. Stimulus waveforms were generated with 16-bit precision and a sampling rate of 100 kHz. During initial physiological characterization of units, we also delivered loudspeaker-corrected 80-ms Gaussian noise bursts (with abrupt onsets and offsets) and 80-ms pure-tone bursts (with 5-ms raised-cosine ramps applied to onsets and offsets).
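The click-generation procedure can be summarized in a short sketch. The Python/NumPy fragment below is an illustration only: the function and argument names are ours, and the inverse impulse response of the loudspeaker is assumed to have been measured and inverted beforehand. It builds a loudspeaker-corrected click and, for paired stimuli, delays the lagging waveform by prepending zeros, as described above.

import numpy as np

def corrected_click(inv_ir, fs=100_000, pulse_us=100, seg_ms=5.0, ramp_ms=0.5):
    """Sketch of the loudspeaker-corrected click described in METHODS.

    inv_ir : 1-D array, inverse impulse response of the intended loudspeaker
             (assumed to be available from a prior calibration).
    fs     : sampling rate in Hz (the study used 100 kHz)."""
    # 100-us rectangular pulse
    pulse = np.ones(int(round(pulse_us * 1e-6 * fs)))
    # convolve the pulse with the loudspeaker's inverse impulse response
    click = np.convolve(pulse, inv_ir)
    # isolate a 5-ms segment centered on the main transient
    n_seg = int(round(seg_ms * 1e-3 * fs))
    center = int(np.argmax(np.abs(click)))
    start = max(center - n_seg // 2, 0)
    seg = click[start:start + n_seg].copy()
    # apply 0.5-ms raised-cosine ramps to both ends of the segment
    n_ramp = int(round(ramp_ms * 1e-3 * fs))
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    seg[:n_ramp] *= ramp
    seg[-n_ramp:] *= ramp[::-1]
    return seg

def delay_lagging_waveform(wave, isd_s, fs=100_000):
    """Delay the waveform intended for the lagging loudspeaker by prepending
    zeros (the ISD is given in seconds)."""
    return np.concatenate([np.zeros(int(round(isd_s * fs))), wave])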

The levels of single stimuli are expressed relative to unit threshold for a single stimulus presented from 0° azimuth. In all but a few cases (10 units), we matched the levels of paired stimuli (L_R and L_L, for right and left loudspeakers, in dB) to the level of a single stimulus (L_S) by equalizing the sum of the amplitudes of the paired stimuli (A_R and A_L) and the amplitude of the single stimulus (A_S)

\[ A_S = A_R + A_L \]

For a desired ISLD, L_R - L_L, the inter-stimulus amplitude ratio is

\[ r = A_R / A_L = 10^{(L_R - L_L)/20} \]

Solving these two equations for A_R and A_L, and converting to levels in dB, we get the following relations

\[ L_R = L_S + 20 \log_{10} [r/(r + 1)] \]

\[ L_L = L_S + 20 \log_{10} [1/(r + 1)] \]
This procedure achieved the desired ISLD and also roughly matched the subjective loudness of the paired stimulus to that of the single stimulus.
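For concreteness, the level-matching relations above can be implemented in a few lines. The following sketch (function and variable names are ours) returns the right- and left-loudspeaker levels for a desired single-stimulus level and ISLD.

import numpy as np

def paired_levels(L_S, isld_db):
    """Split a single-stimulus level L_S (dB) into right and left levels so
    that A_R + A_L = A_S and L_R - L_L = isld_db (the relations given above)."""
    r = 10.0 ** (isld_db / 20.0)                 # amplitude ratio A_R / A_L
    L_R = L_S + 20.0 * np.log10(r / (r + 1.0))
    L_L = L_S + 20.0 * np.log10(1.0 / (r + 1.0))
    return L_R, L_L

# With isld_db = 0, each loudspeaker is set ~6 dB below L_S, so the two click
# amplitudes sum to the amplitude of the single click.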

Unit recordings and spike sorting

Unit activity was recorded extracellularly with silicon-substrate 16-channel probes (Anderson et al. 1989) that were provided by the University of Michigan Center for Neural Communication Technology. Each probe had a single shank with 16 recording sites arranged linearly at intervals of 100 or 150 µm. After probe insertion, the recording chamber was filled with silicone oil or with warm agarose (2% in Ringer solution) that subsequently solidified. To improve unit stability, we waited ≥30 min after probe placement before the start of recording. The activity at each site was amplified, digitized with 16-bit precision and a sampling rate of 25 kHz, sharply low-pass filtered below 6 kHz, resampled at 12.5 kHz, and stored on the computer hard disk.

Spike sorting was performed off-line using custom software (Furukawa et al. 2000) based on principal components analysis of spike shape. The quality of unit isolation was characterized based on scatterplots of weights of the first two principal components and on histograms of inter-spike intervals. In a minority of cases, distinct spike waveforms were inferred to be from single neurons, but more often we recorded unresolved spikes that were inferred to be from two or more neurons, i.e., multi-unit clusters (see Furukawa et al. 2000 for illustrations of unit isolation). In the present study, all such single- and multi-unit recordings are collectively referred to as "units." During initial screening, units were eliminated from further analysis if the mean spike rate (number of spikes per trial) across all conditions varied by more than a factor of two during the recording or the mean spike rate across all conditions was <0.5 per trial. When more than one distinct unit was isolated at a site, only the best-isolated single unit was retained for further analysis. Spikes of one neuron sometimes appeared at two adjacent recording sites, as indicated by sharp peaks near zero in histograms of between-unit spike times. We eliminated one member of each such pair. This paper describes 151 units that survived this screening. Sixteen units (11%) were reliably identified as single units; the remaining 135 units (89%) were multi-unit clusters. Figures 2, 3, and 7 of the present study show data from single units; Figs. 4, 5, and 6 represent multi-unit recordings.

Determination of cortical area

We recorded from three distinct cortical areas: A1, the dorsal zone of A1, and A2. Categorization of a unit was based on three factors: the unit's frequency bandwidth based on responses to 80-ms pure tones (tested frequency range, 0.5-30 kHz in 1/3-octave steps; azimuth, 0 or -40°); consistency with the expected tonotopic organization of area A1 (Merzenich et al. 1975; Reale and Imig 1980); and the unit's location relative to recognized sulci and to other characterized units. A unit was judged to be narrowly tuned if its bandwidth at half-maximal spike rate was ≤1.33 octaves at a level ≥40 dB relative to threshold at the best frequency. All narrowly tuned units were assigned to area A1, although we were not able to rule out the inclusion of some high-frequency units from field AAF. Units were designated as broadly tuned if they had multi-peaked frequency response areas or if they had bandwidths ≥1.67 octaves at a level ≤40 dB relative to threshold. Broadly tuned units located dorsal or dorsocaudal to area A1 were judged to be in the dorsal zone of A1 (Middlebrooks and Zook 1983); those located ventral to area A1 were judged to be in area A2. For some units, a cortical area could not be assigned because the tuning bandwidth could not be determined: the best frequency lay near the limits of the tested range, the bandwidth of the rate-frequency function was ambiguous, or the unit responded only weakly to pure tones. Of 151 units, 41 (27%; 3 single units) were from A1, 74 (49%; 12 single units) were from A2, 20 (13%; all multi-unit) were from the dorsal zone of A1, and 16 (11%; 1 single unit) could not be assigned a cortical area. Most of our units responded best to pure tones of high frequencies. Among units recorded from area A1, the median best frequency was 9.5 kHz (range 1.3-24 kHz), and 85% of units had best frequencies >6 kHz. For broadly tuned units recorded from area A2 and the dorsal zone of A1, we defined the half-maximal frequency band as the band of pure-tone frequencies that elicited a spike rate >50% of maximum for a level 20 dB above the lowest recorded level that gave a reliable response to any frequency. By this definition, the half-maximal frequency bands of 90% of units extended >6 kHz, the bands of 67% of units included frequencies between 2 and 6 kHz, and the bands of 32% of units extended <2 kHz.

Physiological procedure

We recorded from neurons in the middle ectosylvian gyrus of the right hemisphere. The probe was inserted approximately tangential to the surface of the cortex with the goal of placing all recording sites in active cortical layers. The penetration was usually oriented dorsoventrally but sometimes rostrocaudally. The number of sites with usable unit activity ranged across probe placements from 1 to 13 (median 4). The number of probe placements per animal ranged from 1 to 7 (median 3), totaling 34 across the 10 animals.

Search stimuli were 80-ms broadband noise bursts, typically presented from an azimuth of 0° at 30 dB SPL. After initial characterization of frequency tuning, single stimuli were presented from 0° at various levels in 5-dB steps, and unit thresholds were estimated on-line. These estimates guided the choice of stimulus levels. Units' actual thresholds for 0° azimuth were later determined to the nearest 5 dB by inspection of raster plots off-line. After estimating thresholds on-line, we presented single stimuli from various azimuths (-80°, -60° to +60° in 10° steps, and +80°); and paired stimuli, one stimulus from each of a pair of loudspeakers at -50° and +50°. For paired stimuli, we varied the ISD, the ISLD, or both. The ISD ranged from -4 ms (left loudspeaker leading) to +4 ms (right loudspeaker leading), and the ISLD ranged from -30 dB (left loudspeaker more intense) to +30 dB (right loudspeaker more intense). Stimuli were presented at two or three levels in steps of 10 dB, ~20-40 dB above unit threshold. Sounds were delivered every 1.1-1.5 s in pseudorandom order such that all stimulus conditions were tested once before repeating all stimuli again in a different pseudorandom order. Each stimulus condition was repeated a total of 10, 20, or 40 times. We typically presented a block of 20 repetitions of each single-stimulus condition and a block of 10 repetitions of each paired-stimulus condition and then repeated each block once more. The blocks were interleaved to reduce the effects of any potential variation of neuronal responsiveness during the 2-4 h stimulus set.

Physiological data analysis

Spike times were expressed relative to the onset of D/A conversion. Therefore latencies include 3.5 ms of acoustic travel time. For paired stimuli, spike times were expressed relative to the onset at the leading loudspeaker. Because we found stimulus-evoked responses only at poststimulus times between 10 and 50 ms, only spikes occurring within this range were included in the analysis.

We employed artificial neural networks to recognize spike patterns and associate them with particular azimuths using methods similar to those described previously (Middlebrooks et al. 1998). This approach has the advantages that it produces an output (estimated azimuth) that is directly comparable with psychophysical results and it does not require assumptions about the information-bearing features of spike patterns (e.g., spike rate, first-spike latency, or other features). The first step in the procedure was, for each unit, to train a naive network with responses evoked by single stimuli from azimuths in the range -80 to +80°; odd-numbered trials were used for training. To validate the procedure and evaluate that unit's ability to code azimuth, the trained network was then tested with responses to single stimuli collected on even-numbered trials. Finally the trained network was presented with responses to paired stimuli with various ISDs and ISLDs, and the outputs of the network were taken as the azimuths that were signaled by the unit's responses.

Networks were implemented with the MATLAB Neural Network Toolbox. Input to the networks consisted of bootstrap-averaged spike density functions (Middlebrooks et al. 1998), with four samples (trials) per bootstrap average. For analysis of individual units, 100 average spike density functions were created for each unit. For analysis of ensembles of units, a subset of units was drawn from the population (described in RESULTS), the spike density functions of these units were concatenated, and 40 average spike density functions were created. Network architecture and training were the same as described previously for individual units (Middlebrooks et al. 1998) and ensembles of units (Furukawa et al. 2000). Briefly, a single hidden layer contained four or eight units with hyperbolic tangent sigmoid transfer functions. The output layer had two units, representing the sine and cosine of azimuth, with linear transfer functions. The network was feed-forward and fully connected. Supervised training of the network used a mean-squared error performance function and the resilient backpropagation algorithm to adapt network weights and biases. The training was repeated three times, and the network with the smallest centroid error (defined in the following text) was retained. This trained network was then presented with average spike density functions created from responses to paired stimuli at various levels, ISDs, and ISLDs, resulting in multiple estimates of azimuth for each paired-stimulus condition.
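As an illustration of the network just described, the sketch below implements a comparable feed-forward network in Python/NumPy: one hidden layer of hyperbolic-tangent units and two linear outputs coding the sine and cosine of azimuth. The original analysis used the MATLAB Neural Network Toolbox with resilient backpropagation; this stand-in uses plain full-batch gradient descent, and all class, method, and parameter names are ours.

import numpy as np

class AzimuthNet:
    """Minimal stand-in for the network in METHODS: one hidden layer of tanh
    units and two linear outputs coding sin(azimuth) and cos(azimuth)."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 2))
        self.b2 = np.zeros(2)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)        # hidden layer
        Y = H @ self.W2 + self.b2                 # [sin(az), cos(az)]
        return H, Y

    def fit(self, X, az_deg, n_iter=2000, lr=0.05):
        """Supervised training on mean-squared error (simple gradient descent
        instead of the resilient backpropagation used in the study)."""
        T = np.column_stack([np.sin(np.radians(az_deg)),
                             np.cos(np.radians(az_deg))])
        for _ in range(n_iter):
            H, Y = self.forward(X)
            E = Y - T
            dW2 = H.T @ E / len(X)
            db2 = E.mean(axis=0)
            dH = (E @ self.W2.T) * (1.0 - H ** 2)
            dW1 = X.T @ dH / len(X)
            db1 = dH.mean(axis=0)
            self.W2 -= lr * dW2; self.b2 -= lr * db2
            self.W1 -= lr * dW1; self.b1 -= lr * db1

    def estimate_azimuth(self, X):
        """Read out azimuth (degrees) from the sine/cosine outputs."""
        _, Y = self.forward(X)
        return np.degrees(np.arctan2(Y[:, 0], Y[:, 1]))

Training such a network on bootstrap-averaged spike density functions from odd-numbered trials and then calling estimate_azimuth on responses from even-numbered trials, or on responses to paired stimuli, would mirror the procedure described in the text.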

Since we were interested in aspects of azimuth coding that are largely independent of stimulus intensity, we analyzed unit responses to sounds that varied in level. Unit responses at two or three levels, 10 dB apart, were used to train networks. Networks were tested with responses at levels that were similar to the training levels. Levels of paired stimuli were matched to those of single stimuli as described under Physiological apparatus and stimulus generation.

Estimates of azimuth were characterized in the same way for physiological signaling of location and for psychophysical responses (described in Psychophysical methods). The central tendency of multiple azimuth estimates was represented by the centroid (i.e., the circular mean), which was computed by treating each estimate as a unit vector, forming the vector sum, and finding the direction of the resultant. To characterize the spread or variability of the data, we calculated the quartile deviation by expressing azimuth estimates as values within ±180° of the centroid, and finding the 25th and 75th percentile values of the distribution. Azimuth estimates falling within the quartile deviation constituted the central 50% of the data. When evaluating the accuracy of responses to single stimuli, we calculated the centroid error, which is the unsigned difference between the centroid and the true source azimuth, averaged across source azimuth. The centroid error serves as a single measure of overall accuracy but does not indicate bias in responses or variation of errors across azimuth. For psychophysical data, the centroid error was calculated over the source azimuth range -70 to +70°. For physiological data, centroid error was calculated over a narrower source azimuth range of -60 to +60° because artificial neural networks were almost always less accurate at the extremes of the training range (i.e., -80 and +80°). Network estimates tended to fall near 0° in the face of uncertainty (instead of, e.g., falling uniformly between the extremes of the training range), so the chance-level centroid error was ~32.3° (the mean of the absolute values of the azimuths tested). When evaluating responses to paired stimuli, we calculated the centroid difference, which is the unsigned difference between the centroid estimate and a psychophysical template (described in Acoustical measurements, computational model, and psychophysical templates), averaged across a specified range of ISD or ISLD. Like the centroid error, the centroid difference characterizes a unit's responses with a single measure but does not indicate bias in responses or variation across stimulus conditions.
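The summary statistics defined above can be computed as in the sketch below. Function names are ours, and we interpret the quartile deviation literally as the 25th and 75th percentile values of estimates re-expressed within ±180° of the centroid.

import numpy as np

def centroid_deg(estimates_deg):
    """Circular mean: treat each azimuth estimate as a unit vector, sum the
    vectors, and return the direction of the resultant (degrees)."""
    a = np.radians(np.asarray(estimates_deg, dtype=float))
    return np.degrees(np.arctan2(np.sin(a).sum(), np.cos(a).sum()))

def quartile_deviation_deg(estimates_deg):
    """25th and 75th percentile azimuths of estimates expressed within
    +/-180 deg of the centroid; the enclosed interval contains the central
    50% of the data."""
    c = centroid_deg(estimates_deg)
    d = (np.asarray(estimates_deg, dtype=float) - c + 180.0) % 360.0 - 180.0
    return tuple(np.percentile(d, [25, 75]) + c)

def centroid_error_deg(estimates_by_azimuth):
    """Unsigned difference between centroid and true source azimuth, averaged
    across sources. `estimates_by_azimuth` maps each true azimuth (deg) to an
    array of azimuth estimates for that source."""
    errors = [abs((centroid_deg(v) - az + 180.0) % 360.0 - 180.0)
              for az, v in estimates_by_azimuth.items()]
    return float(np.mean(errors))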

Psychophysical methods

For human psychophysical experiments, five paid listeners (age 18-30, 3 female, 2 male) were recruited from students and staff of the University of Michigan. All had normal hearing as determined by standard audiometric screening. Two of the listeners (S75 and S79) had brief previous experience with psychoacoustic tasks.

Psychophysical experiments were performed under free-field conditions using an apparatus similar to that described previously (Middlebrooks 1999). Each listener stood on a platform in a sound-attenuating anechoic chamber (dimensions 2.6 × 3.7 × 3.2 m). The chamber walls, floor, and ceiling were lined with fiberglass wedges and sound-absorbing foam. A headrest was positioned directly below the listener's chin. Sounds were delivered from five two-way coaxial loudspeakers. A computer-controlled movable hoop with a radius of 1.2 m was equipped with two loudspeakers that could be positioned nearly anywhere on a spherical surface around the listener. In addition to the two movable loudspeakers, three stationary loudspeakers were located on the horizontal plane (i.e., 0° elevation): one at a distance of 1.6 m and an azimuth of 0° and two at a distance of 1.8 m and azimuths of -46 and +46°. The latter two loudspeakers were hidden behind acoustically transparent black cloth so that listeners would not be aware of them. Experiments were controlled by custom MATLAB software running on a Pentium-based personal computer with instruments from Tucker-Davis Technologies. Computer-controlled two-channel D/A converters and multiplexers allowed sounds to be presented from single loudspeakers or from pairs of loudspeakers simultaneously, and two attenuators allowed the levels at the two loudspeakers to be varied independently. Click stimuli were generated as described above for physiological experiments except that the passband was 0.3-18 kHz for human experiments.

Listeners reported the apparent location of sounds by orienting their heads. An electromagnetic tracking system (Polhemus Fastrak, Colchester, VT) measured head orientation. Prior to participating in localization experiments, each listener was trained in the localization task. This procedure is detailed elsewhere (Macpherson and Middlebrooks 2000) and is summarized here. First the listener completed one session (60 trials) during which he or she oriented to a visual target (a light-emitting diode) on the loudspeaker hoop, and visual feedback was provided by moving the target to the response location. Next the listener completed three sessions (60 trials each) during which he or she oriented to auditory targets (broadband noise bursts) and was provided with visual feedback; the overhead lights of the anechoic chamber were turned off for the latter two sessions. Finally, the listener completed two sessions with auditory targets without visual feedback.

After the training procedure, we measured the listener's threshold to a click stimulus at 0° azimuth. A one-up, three-down, two-interval, forced-choice procedure was used. Each measurement included eight reversals, with the step size decreasing progressively from 4 to 1 dB. The average level at the last four reversals was computed. Three such measurements were made during a single session; the range of the three values was no more than 5.5 dB for any listener. The listener's threshold was calculated as the mean of the three measurements.
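The adaptive track can be sketched as follows. The trial(level) callback, the starting level, and the exact step-size schedule (4, 2, then 1 dB) are our assumptions; the text above states only that the step size decreased progressively from 4 to 1 dB across the eight reversals.

def staircase_threshold(trial, start_level, steps=(4, 2, 1), n_reversals=8):
    """One-up, three-down adaptive track (sketch of the 2IFC threshold run).

    trial(level) -> True if the listener answered the forced-choice trial
    correctly (hypothetical callback; stimulus delivery is not modeled).
    Threshold is the mean level at the last four reversals."""
    level, correct_in_row, going_down = start_level, 0, True
    reversal_levels = []
    while len(reversal_levels) < n_reversals:
        step = steps[min(len(reversal_levels), len(steps) - 1)]
        if trial(level):
            correct_in_row += 1
            if correct_in_row == 3:              # three correct: step down
                correct_in_row = 0
                if not going_down:               # direction change: reversal
                    reversal_levels.append(level)
                    going_down = True
                level -= step
        else:
            correct_in_row = 0                   # one incorrect: step up
            if going_down:
                reversal_levels.append(level)
                going_down = False
            level += step
    return sum(reversal_levels[-4:]) / 4.0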

Following these preliminary sessions, each listener participated in sessions designed to measure summing localization and time-intensity trading. Listeners stood in the center of the anechoic chamber in complete darkness. They were not told of the hidden stationary loudspeakers at ±46° or that some stimuli would be presented from two loudspeakers. Each listener was instructed to orient his or her head to face the perceived location of the loudest or most prominent sound. Although we did not expect listeners to perceive the two clicks separately, they occasionally reported hearing more than one sound. The conditions under which this percept occurred were unclear because we did not systematically collect this information from listeners. Each trial was initiated when a continuous noise was presented from the centering loudspeaker. The noise cued the listener to face that loudspeaker, position his or her head on the headrest, and press a button on a hand-held response box. The button press terminated the noise. One second following the button press, the listener was presented with the stimulus, either a single click from one of the movable hoop loudspeakers or a click pair from the two stationary loudspeakers. The listener then oriented his or her head to face the perceived location of the sound and pressed the response button, which triggered measurement of head orientation. The hoop was then positioned for the next trial. To eliminate adventitious cues about the stimulus location, the hoop was moved after each trial, even when the hoop position was the same for two consecutive trials. Following hoop movement, noise was presented from the centering loudspeaker and the cycle began again.

Within each session, the stimulus set consisted of single clicks and click pairs interleaved in pseudo-random order with 63-67% of stimuli being click pairs. The sound level varied randomly from trial to trial within a range of 40-50 dB above threshold. Single clicks were presented from azimuths of -70 to +70° in 10° steps. To avoid obstruction of the centering loudspeaker by the hoop, it was necessary to present single clicks from an elevation of 5° above the horizon. Click pairs were presented from loudspeakers at azimuths of -46 and +46° (1 click from each loudspeaker) with a variable ISD, a variable ISLD, or both. Each session employed one of four stimulus sets: variable ISD (range -0.8 to +0.8 ms) with ISLD = 0; variable ISD (range -1.4 to +1.4 ms) with ISLD = 0; variable ISLD (range -27 to +27 dB) with ISD = 0; and variable ISD (range -0.8 to +0.8 ms) and variable ISLD (-5, 0, and +5 dB). Each session lasted ~10 min, and listeners completed three to six sessions per day. Each listener completed 3-12 repetitions of each stimulus set for a total of 12-27 sessions.

Acoustical measurements, computational model, and psychophysical templates

We made physical acoustical measurements from the ears of humans and cats to characterize the proximal stimulus created at the eardrums by paired clicks at ISDs <1 ms. At these short delays, incident sound waves from the two sources superposed as they interacted physically with the head and pinnae, resulting in complex interaural time and level differences. We measured the directional impulse response for each ear and each source location by presenting broadband sounds and recording from the ear canals with miniature microphones, essentially as previously described (Middlebrooks and Green 1990; Xu and Middlebrooks 2000). Four hundred uniformly spaced source locations at various elevations and azimuths were used for the human measurements; 24 source azimuths confined to the horizontal plane (20° spacing for rear locations, 10° spacing for frontal locations) were used for the cat measurements. The proximal stimulus at each eardrum was simulated by convolving the directional impulse response with a 100-µs rectangular impulse and, in the case of click pairs, summing the signals of the two sources after incorporating the desired ISD and ISLD.
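A sketch of this simulation for one ear is given below. The directional impulse responses are assumed to have been measured as described above; function and argument names are ours, and the sign conventions follow the text (positive ISD, right source leading; positive ISLD, right source more intense).

import numpy as np

def proximal_click_pair(dir_ir_left_src, dir_ir_right_src, fs=100_000,
                        isd_s=0.0, isld_db=0.0, pulse_us=100):
    """Simulate the signal at one eardrum for a pair of clicks (sketch).

    dir_ir_left_src, dir_ir_right_src : directional impulse responses measured
        in the ear canal for the left and right sources."""
    pulse = np.ones(int(round(pulse_us * 1e-6 * fs)))
    sig_l = np.convolve(pulse, dir_ir_left_src)
    sig_r = np.convolve(pulse, dir_ir_right_src) * 10.0 ** (isld_db / 20.0)
    delay = int(round(abs(isd_s) * fs))
    if isd_s > 0:                       # right source leads: delay the left
        sig_l = np.concatenate([np.zeros(delay), sig_l])
    else:                               # left source leads (or simultaneous)
        sig_r = np.concatenate([np.zeros(delay), sig_r])
    n = max(len(sig_l), len(sig_r))
    sig_l = np.pad(sig_l, (0, n - len(sig_l)))
    sig_r = np.pad(sig_r, (0, n - len(sig_r)))
    return sig_l + sig_r                # superposed proximal stimulus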

We implemented a simple computational model to determine which aspects of summing localization and time-intensity trading might be accounted for by filtering by the head and pinnae, critical-band filtering by the basilar membrane, low-pass filtering by hair cells, and delay-line cross-correlation (representing circuits in the lower brain stem). First, critical-band filtering of the proximal stimulus was achieved with a MATLAB implementation of a gammatone filterbank (Slaney 1993), using center frequencies of 625 Hz to 20 kHz in 1/3-octave steps. High-frequency channels (>1 kHz) were full-wave rectified; all channels were then low-pass filtered below 1 kHz using a fourth-order Butterworth filter. Finally, for each critical band, the signals from the two ears were cross-correlated over a lag range of -20 to +20 ms, and the lag of the maximum of the cross-correlation function was taken as the output of the model. At the lowest center frequencies, cross-correlation functions often had multiple prominent peaks, which led to discontinuities in model output such as those shown in Fig. 10B. This computational model is based on previous models that employ interaural cross-correlation (reviewed by Stern and Trahiotis 1997) and resembles models developed to describe free-field localization of paired sounds (Blauert and Cobben 1978; Macpherson 1991; Pulkki et al. 1999). It differs somewhat from binaural models developed to describe lateralization of stimuli presented over headphones because those models do not include filtering by the head and pinnae (Gaskell 1983; Lindemann 1986; Tollin and Henning 1999; Zurek 1980).
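The within-band processing of the model can be illustrated with the sketch below for a single critical band. The original implementation used Slaney's gammatone filterbank in MATLAB; here a 1/3-octave-wide Butterworth band-pass stands in for one gammatone channel, and zero-phase filtering is used for simplicity, so this fragment is an approximation rather than a re-implementation. Function and argument names are ours.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_itd(left, right, fs, cf_hz, max_lag_ms=20.0):
    """One critical band of the cross-correlation model (approximate sketch).

    Channels above 1 kHz are full-wave rectified, all channels are low-pass
    filtered below 1 kHz, and the lag of the maximum of the interaural
    cross-correlation (within +/-20 ms) is returned as the band's output."""
    lo, hi = cf_hz / 2 ** (1 / 6), cf_hz * 2 ** (1 / 6)
    sos_bp = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
    sos_lp = butter(4, 1000.0, btype='lowpass', fs=fs, output='sos')
    filtered = []
    for x in (left, right):
        y = sosfiltfilt(sos_bp, x)
        if cf_hz > 1000.0:
            y = np.abs(y)               # full-wave rectification
        filtered.append(sosfiltfilt(sos_lp, y))
    l, r = filtered
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    xc = [np.dot(l[max(0, -k):len(l) - max(0, k)],
                 r[max(0, k):len(r) - max(0, -k)]) for k in lags]
    return lags[int(np.argmax(xc))] / fs    # best-matching lag in seconds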

We sought to compare quantitatively our physiological results to the responses of listeners. Ideally, one would compare cat physiological results with cat behavioral responses obtained under similar stimulus conditions. Such data are available (Populin and Yin 1998), but they are not sufficiently detailed for our purposes. We therefore chose to compare our physiological results to human psychophysical data, which are more readily available. To make this comparison, we used human psychophysical data to construct psychophysical standard curves, or templates, of azimuth versus ISD and azimuth versus ISLD. First, mean responses at each value of ISD or ISLD were averaged across the five listeners. The data were then symmetrized by averaging responses to stimuli that were symmetric with respect to the midline (e.g., the absolute value of the response at ISD = +0.4 ms was averaged with the absolute value of the response at ISD = -0.4 ms). The averaged symmetrized data were then fit with a logistic function
\[ \sin \theta = \sin a \, \frac{10^{x/b} - 1}{10^{x/b} + 1} \]

where θ is the estimated azimuth, x is either ISD or ISLD, and a and b are fit parameters. This fitting procedure was chosen for three reasons: phasor analysis suggests a physical basis for this functional form for simultaneous low-frequency sounds with a variable ISLD (Bauer 1961); our psychophysical data showed a dependence on ISD and ISLD with the same general shape; and the procedure resulted in smooth curves that could be scaled to account for acoustical differences between cat and human (see following text). Fitting was performed in MATLAB using nonlinear optimization to minimize the squared error of θ. The resulting fit parameters based on human psychophysical responses were a = 31.2° and b = 0.477 ms for azimuth versus ISD, and a = 39.8° and b = 13.6 dB for azimuth versus ISLD. Finally, to create psychophysical templates for comparison with cat physiological data, two adjustments were made to these curves: the estimated azimuth was multiplied by a factor of sin (50°)/sin (46°) to account for the slightly different loudspeaker locations used in the cat experiments, and, for the ISD curve, the fit parameter b was divided by a factor of 1.64 to account for the smaller effective head size of a cat.
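A sketch of this fit in Python/SciPy is shown below; like the original procedure, it minimizes the squared error of θ. The starting values p0 are ours, and curve_fit's default least-squares optimizer stands in for the MATLAB optimization used originally.

import numpy as np
from scipy.optimize import curve_fit

def template_theta(x, a_deg, b):
    """theta = arcsin[ sin(a) * (10**(x/b) - 1) / (10**(x/b) + 1) ], in degrees;
    x is ISD (ms) or ISLD (dB), and a and b are the fit parameters."""
    s = np.sin(np.radians(a_deg)) * (10 ** (x / b) - 1) / (10 ** (x / b) + 1)
    return np.degrees(np.arcsin(np.clip(s, -1.0, 1.0)))

def fit_template(x, azimuth_deg, p0=(35.0, 0.5)):
    """Least-squares fit of the logistic template to symmetrized mean responses."""
    (a_deg, b), _ = curve_fit(template_theta, np.asarray(x, dtype=float),
                              np.asarray(azimuth_deg, dtype=float), p0=p0)
    return a_deg, b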

We calculated the factor of 1.64 as follows. Lag values were computed for azimuths of -60 to +60° in 10° steps using the cross-correlation model described in the preceding text. For each frequency band, the best-fitting line was determined by least-squares fitting; the slope (µs/°) and its variance were retained. Slopes were determined from both human and cat acoustics, and a ratio of slopes was calculated within each frequency band. This ratio varied irregularly with frequency. Finally, a weighted-mean ratio was computed across frequency bands; weighting was based on the variance of the slope obtained during least-squares fitting. Using acoustical data from one 4.5-kg cat, we found weighted-mean ratios of 1.62 and 1.67 for listeners S75 and S79; the average was 1.64. Given the inter-subject variability of interaural delays found among humans (Middlebrooks 1999) and cats (Roth et al. 1980), the factor of 1.64 should be viewed as a rough estimate. Nonetheless, considering that relatively large male cats were used in our study, this value is consistent with previous acoustical measurements of interaural delays in cats (Roth et al. 1980) and humans (Middlebrooks 1999; Middlebrooks and Green 1990).
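The slope-ratio computation can be sketched as follows. Each argument maps a band center frequency to (azimuths, model lags); the names are ours, and the exact weighting scheme (here, inverse variance of each ratio propagated from the slope variances) is an assumption, since the text states only that weighting was based on the slope variances.

import numpy as np

def slope_and_var(az_deg, lag_us):
    """Least-squares slope (us/deg) of model lag vs. azimuth, with its variance."""
    coef, cov = np.polyfit(az_deg, lag_us, 1, cov=True)
    return coef[0], cov[0, 0]

def head_size_factor(human_bands, cat_bands):
    """Weighted-mean ratio of human to cat slopes across frequency bands.

    Each argument maps a center frequency -> (azimuths_deg, lags_us)."""
    ratios, weights = [], []
    for cf in human_bands:
        s_h, v_h = slope_and_var(*human_bands[cf])
        s_c, v_c = slope_and_var(*cat_bands[cf])
        ratio = s_h / s_c
        var_ratio = ratio ** 2 * (v_h / s_h ** 2 + v_c / s_c ** 2)  # error propagation
        ratios.append(ratio)
        weights.append(1.0 / var_ratio)
    return float(np.average(ratios, weights=weights))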


    RESULTS

We performed human psychophysical experiments and cat physiological experiments under similar stimulus conditions. We first present the human psychophysical results and thereby demonstrate summing localization, time-intensity trading, and localization dominance using broadband click stimuli. Then we describe the responses of cortical neurons to the same stimuli and analyze these responses in a way that permits comparison to the psychophysical responses. Finally, we use a simple computational model to investigate the extent to which our physiological results might be explained by peripheral filtering and interaural cross-correlation.

Psychophysics

Each listener participated in a localization task in which single clicks and click pairs were presented and the listener oriented his or her head to face the perceived location of the sound. Figure 1 shows the responses of two individual listeners (Fig. 1, top and middle) and mean responses across five listeners (Fig. 1, bottom). As expected, when single clicks were presented from various frontal azimuths, listeners localized the sounds with considerable accuracy (Fig. 1, 1st column). To quantify localization accuracy, we calculated the centroid error, which is the unsigned error of the mean response at each target location, averaged across locations. The centroid error ranged from 3.8 to 5.8° among the five listeners tested. The quartile deviation, which is the range of azimuth that included half of a listener's responses (see METHODS), was used to characterize response variability (gray areas in Fig. 1). Values for single clicks, averaged across source azimuth, ranged from 9 to 16° among the five listeners.



Fig. 1. Psychophysical demonstration of summing localization, time-intensity trading, and localization dominance. Human subjects listened to single clicks from various frontal azimuths and click pairs from loudspeakers at -46 and +46° with a variable inter-stimulus delay and level difference (ISD and ISLD). The ordinate of each panel represents listeners' judgements of location. Open circles show the centroid (circular mean) response; gray areas show quartile deviations, which contain 50% of the responses. Horizontal lines indicate the source locations for paired clicks. First column: responses to single clicks as a function of azimuth. Second column: responses to paired clicks with a variable ISD and 0 ISLD. Third column: responses to simultaneous paired clicks with a variable ISLD. Fourth column: both the ISD and the ISLD of paired clicks were varied; ISLDs of -5 dB (bottom curve), 0 dB (middle curve), and +5 dB (top curve) were superposed at each ISD (quartile deviations have been omitted for clarity). Top and middle rows show responses for 2 of the 5 listeners. Bottom row shows the mean of centroid responses across the 5 listeners; error bars represent SDs. For clarity, error bars have been omitted for nonzero ISLDs in L. The asymmetry evident in K was not found consistently across subjects.

Trials using paired clicks were randomly interspersed with trials using single clicks. Paired clicks were always presented from two loudspeakers located 46° to the left and right of midline; one click was presented from each loudspeaker. The ISD, the ISLD, or both ISD and ISLD were varied. When click pairs were presented at equal intensity with a variable ISD, listeners localized click pairs to intermediate azimuths (Fig. 1, 2nd column). When the clicks were simultaneous (ISD = 0), listeners pointed near 0°. As the magnitude of the ISD was increased, listeners' judgements shifted laterally, reaching a maximum at an ISD of ~0.8 ms. At ISDs of 1.0-1.4 ms, listeners pointed near the leading loudspeaker, but all exhibited an appreciable undershoot. After compensating for small biases in responses to single clicks at azimuths of ±40 and ±50°, the mean undershoots were 7, 22, 18, and 18° for the four listeners tested at these delays. The variability in listeners' responses with this stimulus set, as measured by the quartile deviation averaged across stimulus conditions and listeners, was 13.8° for click pairs compared with 11.6° for single clicks tested during the same sessions.

When click pairs were presented simultaneously and only the ISLD was varied (Fig. 1, 3rd column), listeners' responses depended systematically on the ISLD. When the ISLD was zero, listeners pointed near 0°. As the absolute ISLD was increased, listeners pointed to increasingly lateral azimuths, in the direction of the more intense loudspeaker. Beyond a level difference of ~15 dB, listeners' estimates fell near the more intense source. At ISLDs beyond 15 dB, listeners commonly undershot the more intense source, but the undershoot was somewhat smaller than it was when the ISD was varied. Curiously, the averaged quartile deviation with this stimulus set was smaller for click pairs (9.3°) than for single clicks tested during the same sessions (13.4°).

When both a delay and a level difference were imposed, both variables influenced listeners' responses. When an ISLD of +5 dB was superposed on an ISD, so that the loudspeaker to the right (+46°) was more intense (Fig. 1, 4th column, top curve), judgements shifted toward the right (upward in the figure). Similarly, when the ISLD was -5 dB (bottom curve), so that the left loudspeaker (-46°) was more intense, judgements shifted toward the left (downward). Thus at a given ISD, a nonzero ISLD biased listeners' responses toward the more intense loudspeaker. Alternatively, at each ISLD tested (each curve in the figure), a nonzero ISD biased responses toward the leading loudspeaker. Although responses varied among listeners, when the ISD was 0.6-0.8 ms, an opposing ISLD of 5 dB generally shifted judgements back to the midline (Fig. 1L). The averaged quartile deviation with this stimulus set was 18.8° for click pairs compared with 13.2° for single clicks tested during the same sessions.

Physiology

We recorded spike activity of single units and multi-units from areas A1 and A2 on the right side of anesthetized cats while presenting click stimuli. Single clicks were presented from various frontal azimuths; paired clicks were presented from -50 and +50° while varying the ISD, the ISLD, or both.

We first characterized each unit's responses to single clicks. Most units exhibited some sensitivity to sound-source azimuth. Figure 2A shows a raster plot of responses of one such unit. This unit responded more strongly to clicks from locations contralateral to the recording site, and less strongly to ipsilateral clicks (Fig. 3A). To directly compare neuronal responses to the responses of psychophysical listeners in a localization task, we analyzed spike patterns using an artificial neural network as a general-purpose pattern-recognition algorithm. The network recognized location-specific spike patterns and produced estimates of sound-source location. For each unit, a naive network was trained to associate spike patterns recorded on odd-numbered trials with various frontal azimuths. Then the trained network, when presented with the spike patterns recorded on even-numbered trials, produced estimates of source azimuth. Such an analysis of the unit represented in Figs. 2A and 3A resulted in the estimates of azimuth shown in Fig. 3B. Network estimates fell near the perfect performance line for most locations, indicating that this unit's responses reliably signaled the locations of single clicks. The centroid error was 8.2°---worse than psychophysical values but significantly better than the chance level of ~32.3° (see METHODS).



Fig. 2. Raster plot of responses of a unit to single clicks and paired clicks. A: single clicks were presented from various frontal azimuths at a level 25 dB above unit threshold. Unit responses are displayed in raster format as a function of poststimulus time. Each rectangular box contains 10 trials at a particular sound-source azimuth. Leftward-pointing triangles mark the locations of the loudspeakers used to present paired sounds. B: paired clicks were presented from 2 loudspeakers at -50 and +50° azimuth at a level 31 dB above threshold. The ISD was varied from -4 ms (-50° leading) to +4 ms (+50° leading). Ten repetitions are shown.



Fig. 3. Responses of a unit to single clicks and paired clicks. This unit is the same one depicted in Fig. 2. Left column: responses to single clicks presented from various frontal azimuths. Right column: responses to paired clicks presented from two loudspeakers at -50 and +50° azimuth, with a variable ISD. A: the spike rate (mean number of spikes per trial) is plotted vs. azimuth for 40 repetitions at a level 25 dB above threshold. Error bars represent the SE of the mean; separate symbols mark responses to the individual loudspeakers that were used to present paired sounds. B: unit spike patterns obtained at 15, 25, and 35 dB above threshold were analyzed using an artificial neural network. The output of the network is plotted vs. azimuth. Open circles show the centroid (circular mean) of the network output, with quartile deviations containing 50% of network estimates; separate symbols mark network output for the individual loudspeakers that were used to present paired sounds. The diagonal line represents perfect performance. The centroid error for this unit was 8.2°. C: the spike rate is plotted vs. ISD for 20 repetitions at a level 31 dB above threshold. Error bars indicate the SE of the mean. D: unit spike patterns obtained at 21 and 31 dB above threshold were analyzed using an artificial neural network. The output of the network is plotted vs. ISD. Horizontal lines indicate the source locations for paired clicks. This unit was located in cortical area A2; it responded to pure-tone frequencies of 3 to 15 kHz.

This analysis was applied to each of the 149 units in our population that were tested with click stimuli. For the 61 units examined at two stimulus levels, the centroid error ranged from 7.4° (near psychophysical values) to 32.5° (near chance levels) with a median value of 18.3°. For the 88 units examined at three stimulus levels, the centroid error ranged from 7.3 to 32.3° with a median value of 16.4°. These distributions did not differ significantly (P = 0.56, χ² test), so we pooled the two groups of units together for subsequent analyses. Centroid errors for single units (median, 14.9°; range, 7.3 to 32.3°) were similar to those for multi-unit recordings (median, 17.2°; range, 7.9 to 32.5°).

Unit responses to click pairs at various ISDs resembled the responses to single clicks at various azimuths. The unit described in Figs. 2 and 3, for example, typically responded with a single spike or burst of spikes when paired clicks were presented from -50 and +50° (Fig. 2B). At negative ISDs (-50° loudspeaker leading), this unit responded more strongly, resembling the response to a single click from a contralateral location (Fig. 3C). At ISDs near zero, the unit responded with fewer spikes. At large positive ISDs (+50° loudspeaker leading), the spike rate was reduced, resembling the response to a single click from an ipsilateral location. That is, a leading click from +50° suppressed the response to a lagging click from -50°. We analyzed the responses to click pairs using the artificial neural network that had been previously trained with responses to single clicks as described in the preceding text. According to this analysis, the unit depicted in Figs. 2 and 3 associated click pairs with source azimuths much as a human listener would (Fig. 3D). When the absolute ISD was greater than or equal to ~1 ms, network estimates fell near the leading loudspeaker, although there was an undershoot when the ipsilateral source led. At smaller absolute ISDs, the unit signaled intermediate locations, with a general shift across the midline as the ISD progressed from about -1 to +1 ms.

Another unit is represented in Fig. 4. In response to single clicks from various azimuths, this unit showed an ipsilateral preference, which was unusual among our unit sample (Fig. 4A). The unit's spike patterns signaled the locations of single clicks fairly accurately: the centroid error was 12.9° (Fig. 4B). In response to click pairs at various ISDs, the unit responded with more spikes when the ipsilateral loudspeaker led (Fig. 4C). Although a click presented from -50° evoked very little response, it was sufficient to reduce the response to a lagging ipsilateral click at ISDs between -0.2 and -4 ms. Furthermore responses to click pairs over an ISD range of roughly -1 to +1 ms (Fig. 4C) resembled the responses to single clicks over an azimuth range of -50 to +50° (Fig. 4A). Analysis using an artificial neural network showed that the click-pair responses of this unit signaled contralateral locations when the contralateral loudspeaker led (at negative ISDs), and ipsilateral locations when the ipsilateral loudspeaker led by 0.2 to 1.2 ms (Fig. 4D). At ISDs greater than +1.2 ms, however, spike counts decreased and network estimates fell near the midline, in disagreement with psychophysical results. This reduced response at ISDs greater than +1.2 ms indicates a backward suppression of the response to the source at +50° caused by a stimulus presented from -50° (see DISCUSSION).



Fig. 4. Responses of a unit to single clicks and paired clicks. The conventions of Fig. 3 are used. A: the spike rate is plotted vs. azimuth for 40 repetitions at a level 20 dB above threshold. B: spike patterns obtained at 10, 20, and 30 dB above threshold were analyzed using an artificial neural network (centroid error: 12.9°). C: the spike rate is plotted vs. ISD for 40 repetitions at a level 20 dB above threshold. D: spike patterns obtained at 10, 20, and 30 dB above threshold were analyzed using an artificial neural network. This unit responded only weakly to pure tones and was located ventral to area A1. Thus the unit was likely in area A2, but by our criteria, the cortical area could not be determined.

Because pairs of noise bursts are known to elicit summing localization and localization dominance in a manner similar to click pairs, we tested a small number of units with 3-ms noise bursts and noise-burst pairs. Six of the eight units examined with noise bursts were among those examined with clicks. In response to single or paired noise bursts, units typically fired a brief burst of spikes. Responses to single noise bursts signaled source location with accuracy similar to that for clicks: among the eight units, centroid errors ranged from 9.4 to 26.0° (median, 16.0°). Units generally responded to paired noise bursts in a manner consistent with summing localization and localization dominance; analysis of one such unit is shown in Fig. 5. This finding indicated that units' signaling of location generalized to another type of broadband stimulus.



Fig. 5. Responses of a unit to single 3-ms noise bursts and paired 3-ms noise bursts. The conventions of Fig. 3 are used. A: the spike rate is plotted versus azimuth for 40 repetitions at a level 25 dB above threshold. B: spike patterns obtained at 25, 35, and 45 dB above threshold were analyzed using an artificial neural network (centroid error: 9.4°). C: the spike rate is plotted vs. ISD for 20 repetitions at a level 25 dB above threshold. D: spike patterns obtained at 25, 35, and 45 dB above threshold were analyzed using an artificial neural network. This unit was located in cortical area A2; it responded most strongly to pure-tone frequencies of 7.5 to 19 kHz.

In addition to sensitivity to ISD, cortical units showed sensitivity to the ISLD of click pairs. Figure 6 shows artificial-neural-network analysis of one unit. When click pairs were presented simultaneously (ISD = 0) with a variable ISLD, the unit signaled locations near the midline at small ISLDs, and locations closer to the more intense loudspeaker at greater absolute ISLDs (Fig. 6B). Conversely, when click pairs were presented at equal intensity (ISLD = 0) with a variable ISD, the unit signaled locations on the side of the leading loudspeaker (Fig. 6C, central curve). When a nonzero ISLD was superposed at a given ISD, both the ISD and the ISLD influenced network estimates (Fig. 6C). An ISLD biased network estimates toward the more intense loudspeaker. That is, the curve shifted downward (toward the left loudspeaker) when the ISLD was negative (left loudspeaker more intense) and upward when the ISLD was positive. Alternatively, at each ISLD (for each curve), introducing a nonzero ISD generally shifted network estimates toward the leading loudspeaker. The shift was particularly evident at ISDs between -1 and +1 ms. The flattening of the curves in Fig. 6C may be attributed to the severe undershoot seen in response to single clicks at ±50° (Fig. 6A); this effectively reduced the range of azimuth that was accessible to the network. Nonetheless the activity of this unit was at least qualitatively consistent with time-intensity trading.



Fig. 6. Artificial-neural-network analysis of the responses of a unit to single clicks and paired clicks. The conventions of Fig. 3, B and D, are used. A: unit responses to single clicks at 15, 25, and 35 dB above threshold were analyzed (centroid error: 12.9°). B: analysis is shown for unit responses to simultaneous paired clicks at 15, 25, and 35 dB above threshold. Network output is plotted vs. ISLD. C: unit responses to paired clicks at 15, 25, and 35 dB above threshold with various ISDs and ISLDs were analyzed. The centroid of the network output is shown as a function of ISD; quartile deviations have been omitted for clarity. From top to bottom, the 5 curves correspond to ISLDs of +18, +9, 0, -9, and -18 dB. This unit is the same one depicted in Fig. 5.

Among units that accurately localized single stimuli, a sizable minority responded to paired stimuli in ways inconsistent with localization dominance and summing localization. As described in the preceding text, the responses of the unit represented in Fig. 4 agreed with psychophysical results at most delays but not when the ISD was between +1.4 and +4 ms (Fig. 4D). An even clearer contrary example is shown in Fig. 7. This unit responded to single clicks from all azimuths tested with a preference for contralateral locations, and accurately signaled source location, with a centroid error of 7.9° (Fig. 7, A and B). Nonetheless, the unit showed little sensitivity to the ISD of click pairs (Fig. 7, C and D). Furthermore, when click pairs were presented simultaneously with a variable ISLD, this unit responded strongly at most ISLDs tested, showing a decrease in response only at ISLDs greater than +15 dB (Fig. 7, E and F). Thus this unit deviated markedly from expectations based on psychophysical results.



Fig. 7. Responses of a unit to single clicks and paired clicks. The conventions of Fig. 3 are used. Left: responses to single clicks presented from various frontal azimuths. Middle: responses to paired clicks with a variable ISD (and 0 ISLD). Right: responses to paired clicks with a variable ISLD (and 0 ISD). A: the spike rate is plotted vs. azimuth for 20 repetitions at a level 25 dB above threshold. B: spike patterns obtained at 15, 25, and 35 dB above threshold were analyzed using an artificial neural network (centroid error: 7.3°). C: the spike rate is plotted vs. ISD for 20 repetitions at a level 25 dB above threshold. D: spike patterns obtained at 15, 25, and 35 dB above threshold were analyzed using an artificial neural network. E: the spike rate is plotted vs. ISLD for 20 repetitions at a level 25 dB above threshold. F: spike patterns obtained at 15, 25, and 35 dB above threshold were analyzed using an artificial neural network. This unit was located in cortical area A1; the best frequency was 24 kHz.

We used the following procedure to quantify the extent to which each unit's responses agreed with psychophysical measurements of summing localization and localization dominance. First, we constructed psychophysical templates of azimuth versus ISD and azimuth versus ISLD (solid curves, Fig. 8, A-C, insets). The templates were based on our human psychophysical results and scaled according to physical acoustical measurements from humans and cats (see METHODS). Then for each unit, we computed the centroid difference: the unsigned difference between the mean physiological response and the psychophysical template at specified values of ISD or ISLD (open circles, Fig. 8, A-C, insets), averaged across those values of ISD or ISLD. The centroid difference thus measures the overall disagreement between the average physiological responses and the psychophysical template. ISD-based summing localization was evaluated at ISDs of -0.4, -0.2, 0, +0.2, and +0.4 ms; localization dominance at ISDs of -3, -2, -1, +1, +2, and +3 ms; and ISLD-based summing localization at ISLDs from -18 to +18 dB in 3-dB steps. The centroid difference for each unit was then plotted against the unit's centroid error for localizing single stimuli. The results are shown in the scatterplots of Fig. 8. In each panel, the symbol type indicates the cortical field from which a unit was recorded; filled symbols (including black triangles and black diamonds) mark units used as examples in other figures. Each of the centroid-difference measures showed a significant correlation with the centroid error (ISD-based summing localization: r² = 0.085, P < 0.01; localization dominance: r² = 0.61, P < 0.001; ISLD-based summing localization: r² = 0.16, P < 0.02). The positive correlation of the two measures indicates that units that localized single stimuli most accurately also tended to show the strongest summing localization and localization dominance. Nonetheless, among units that accurately localized single sounds, a sizable minority showed weak ISD-based summing localization, as indicated by symbols in the upper left quadrant of Fig. 8A. This minority of units contributed to the weakness of the correlation between centroid difference and centroid error noted in the preceding text. A smaller proportion of units that accurately localized single sounds failed to show localization dominance (Fig. 8B, top left quadrant) or ISLD-based summing localization (Fig. 8C, top left quadrant). By this analysis, no consistent differences were found between cortical areas or between responses to clicks and responses to 3-ms noise bursts (also plotted in Fig. 8, A and B).
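Stated as a formula (our notation, simply restating the definition above), the centroid difference D for one unit is

```latex
% Centroid difference for one unit: c(x_k) is the centroid of the network
% output and t(x_k) the psychophysical template at the k-th specified value
% x_k of ISD or ISLD (N values in all, e.g., ISD = -0.4, -0.2, 0, +0.2, +0.4 ms
% for ISD-based summing localization).
D \;=\; \frac{1}{N}\sum_{k=1}^{N}\bigl|\,c(x_k) - t(x_k)\,\bigr|
```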



Fig. 8. Locations signaled by unit responses compared with psychophysical templates. In each panel, the centroid difference (based on paired-click responses) is plotted against the centroid error (based on single-click responses). The centroid difference is the unsigned difference of the network output from the psychophysical template (solid curves in the insets), averaged across specified values of ISD or ISLD (open circles in the insets). The vertical axes of the insets range from -50 to +50°; tick marks represent intervals of 25°. Units tested with clicks were located in cortical areas A1 (open circles), A2, and dorsal A1 (+); for some units, the cortical area was not determined (triangles). Units tested with 3-ms noise bursts were located in cortical area A2 (diamonds). Filled symbols (including black triangles and black diamonds) mark units depicted as examples in other figures. Lines in the main panels indicate median values for the population; dotted lines indicate values expected by chance. A: ISD-based summing localization. The centroid difference was calculated using network estimates at ISDs of -0.4, -0.2, 0, +0.2, and +0.4 ms. B: localization dominance. The centroid difference was calculated using network estimates at ISDs of -3, -2, -1, +1, +2, and +3 ms. C: ISLD-based summing localization. ISLDs of -18 to +18 dB in 3-dB steps were used to calculate the centroid difference.

To determine the locations that were signaled by our unit population as a whole, we performed artificial-neural-network analyses of small ensembles of units. Similar analyses have shown that ensemble networks more accurately classify unit responses to single broadband noise bursts than do individual-unit networks (Furukawa et al. 2000). Furthermore, we have found that ensemble networks are particularly accurate for azimuthal targets near the extremes of the training range (data not shown). Note that ensemble analysis takes into account all units in the ensemble, including those, such as the unit in Fig. 7, that contradict psychophysical predictions. Among our unit population, 123 units were tested with a variable ISD only, 16 were tested with a variable ISD and a variable ISLD, and 46 were tested with a variable ISLD only. From each of these three subpopulations, we selected the 25% of units with the lowest centroid errors (derived from responses of individual units to single clicks) for use in ensemble analysis. This selection process resulted in three ensembles consisting of 30, 4, and 11 units, respectively. Networks were trained with a set of ensemble responses to single clicks and validated by testing with an independent set of ensemble responses to single clicks (Fig. 9, A-C, insets). We found that unit ensembles signaled the locations of single clicks with considerable accuracy: centroid errors for the respective ensembles were 8.1, 8.2, and 8.3°. When the first ensemble was tested with paired clicks at various ISDs, the signaled locations shifted from the midline toward the leading loudspeaker as the magnitude of the ISD was increased from 0 to 0.4-0.6 ms (Fig. 9A). At greater delays, network estimates fell near, but somewhat short of, the location of the leading loudspeaker. This undershoot was prominent at most delays, was greater when the ipsilateral source led, and could not be accounted for by an undershoot in response to single clicks (Fig. 9A, inset). The thin curve in Fig. 9A represents the psychophysical template (described in the preceding text); the network estimates fell close to the prediction at negative ISDs (contralateral leading) but deviated more from it at positive ISDs. The second ensemble was tested with click pairs over a range of ISDs at five ISLDs; each ISLD is represented by a distinct curve in Fig. 9B. At a given ISD, superposition of an ISLD biased network estimates toward the more intense source. The bias was notably asymmetric, being stronger at negative ISDs than at positive ISDs. The third ensemble, in response to paired clicks at various ISLDs, signaled locations spanning -50 to +50° (Fig. 9C). Network estimates reached an asymptote when the absolute ISLD reached 15-20 dB, and they showed little undershoot at extreme ISLDs. This ensemble signaled locations that agreed well with the psychophysical template (Fig. 9C, thin curve).
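The logic of the ensemble readout can be pictured with the brief sketch below. This is not the authors' implementation (the actual network architecture, training procedure, and spike-pattern representation are described in METHODS); the scikit-learn regressor, the placeholder data, and all parameter values are illustrative assumptions only.

```python
# Sketch of an ensemble readout (not the authors' implementation).  Placeholder
# data stand in for recorded spike patterns; parameters are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_trials, n_features = 200, 64                 # e.g., binned spike counts, all units concatenated

# Training set: ensemble responses to single clicks, labeled by source azimuth (deg).
train_azimuths = rng.choice(np.arange(-50, 51, 10), size=n_trials).astype(float)
X_single = rng.poisson(3.0, size=(n_trials, n_features)).astype(float)   # placeholder patterns

net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
net.fit(X_single, train_azimuths)              # train on single-click responses only

# Test set: ensemble responses to paired clicks at one ISD/ISLD condition.
X_paired = rng.poisson(3.0, size=(40, n_features)).astype(float)
estimates = net.predict(X_paired)              # azimuth signaled on each trial

centroid = np.median(estimates)                # compare with the psychophysical template
q1, q3 = np.percentile(estimates, [25, 75])
print(f"centroid {centroid:+.1f} deg, quartile deviation {(q3 - q1) / 2:.1f} deg")
```

Summarizing the trial-by-trial estimates by their median and quartiles parallels the centroids and quartile deviations plotted in Fig. 9.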



Fig. 9. Locations signaled by small ensembles of units in response to paired clicks. Each panel shows artificial-neural-network analysis of a small ensemble of units. Insets: analysis of ensemble responses to single clicks; network estimates are plotted vs. source azimuth; tick marks represent intervals of 25°. Analysis of ensemble responses to paired clicks is shown in the main part of each panel. Horizontal lines indicate the source locations for paired clicks. A: for an ensemble of 30 units, responses to paired clicks at various ISDs were examined. Open circles mark the centroid of the network output; gray shaded areas represent quartile deviations. The thin curve is the psychophysical template of azimuth versus ISD. B: for an ensemble of 4 units, responses to paired clicks at various ISDs and ISLDs were examined. Network output is plotted vs. ISD; quartile deviations have been omitted for clarity. From top to bottom, the 5 curves correspond to ISLDs of +18, +9, 0, -9, and -18 dB. C: for an ensemble of 11 units, responses to simultaneous paired clicks at various ISLDs were examined. Open circles mark the centroid of the network output; gray shaded areas represent quartile deviations. The thin curve is the psychophysical template of azimuth vs. ISLD.

Acoustical modeling

When a listener is presented with paired sounds such as those used in the current study, the acoustical cues that reach the ears can be quite complex. Several factors transform the signal before it even reaches the nervous system: summation of sound waves in the air, interaction of the resultant sound waves with the head and pinnae, and critical-band filtering in the cochleae. Under these conditions, it is important to distinguish the proximal interaural time differences (ITDs) and interaural level differences (ILDs) from the applied ISDs and ISLDs. For instance, because of phasor addition, paired sounds presented simultaneously (ISD = 0) with a nonzero ISLD will produce a nonzero ITD at low frequencies (Bauer 1961; Blauert 1997). Given such complications, we wondered to what extent the phenomena we studied might be explained by relatively well-described auditory processes such as acoustical filtering by the head and pinnae, cochlear filtering, and sensitivity of brain stem circuits to ITDs. We expected that summing localization would be explicable by such mechanisms, but we wondered, in particular, whether time-intensity trading and localization dominance might also be explained in this way.
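The low-frequency conversion of an ISLD into an interaural phase (and hence time) difference can be illustrated by elementary phasor addition, in an idealized sketch that ignores head and pinna filtering. If one ear receives the two source waveforms with amplitudes a1 and a2 and acoustic delays τ1 and τ2, then at a single low frequency ω the sum is again a sinusoid whose effective delay depends on the amplitude ratio:

```latex
% Effective delay of the sum of two delayed sinusoids at one ear (idealized):
a_1\cos\!\bigl(\omega(t-\tau_1)\bigr) + a_2\cos\!\bigl(\omega(t-\tau_2)\bigr)
  = A\cos\!\bigl(\omega(t-\tau_{\mathrm{eff}})\bigr),
\qquad
\omega\tau_{\mathrm{eff}}
  = \arctan\frac{a_1\sin\omega\tau_1 + a_2\sin\omega\tau_2}
                {a_1\cos\omega\tau_1 + a_2\cos\omega\tau_2}.
```

Because the near-source and far-source amplitudes differ between the two ears, an ISLD changes the effective delay by different amounts at the two ears, producing a nonzero ITD even when the ISD is zero, as in the phasor analysis cited above.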

We investigated the degree to which physiological correlates of summing localization, time-intensity trading, and localization dominance could be explained by a simple computational model. The model was based on simple superposition of sounds in the sound field, peripheral filtering, and delay-line cross-correlation of inputs to the two ears (see METHODS). We reasoned that after accounting for these factors, any physiological results that remained unexplained by the model were likely to arise from central auditory processing beyond interaural cross-correlation. For given binaural input waveforms, the model produced a frequency-dependent interaural lag. Figure 10 shows model output for three representative frequency bands. As expected, single clicks produced an interaural lag that shifted systematically with the azimuth of the source, regardless of the frequency band (Fig. 10, left).
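The final, cross-correlation stage of such a model can be sketched as follows. This is not the authors' code: it assumes that the left- and right-ear waveforms have already been constructed by superposing the two source signals (with the applied ISD and ISLD) and filtering them through measured head-related impulse responses, and the band edges, filter order, sampling rate, and toy signals are illustrative only.

```python
# Minimal sketch of band-limited interaural cross-correlation (not the authors' code).
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate

FS = 100_000  # Hz, assumed sampling rate

def interaural_lag(left, right, band, max_lag_us=500.0):
    """Band-pass both ear signals, cross-correlate them, and return the lag
    (in microseconds) of the largest peak within +/- max_lag_us."""
    sos = butter(4, band, btype="bandpass", fs=FS, output="sos")
    l = sosfiltfilt(sos, left)
    r = sosfiltfilt(sos, right)
    xc = correlate(l, r, mode="full")
    lags_us = np.arange(-len(r) + 1, len(l)) / FS * 1e6
    keep = np.abs(lags_us) <= max_lag_us       # restrict to plausible interaural lags
    return lags_us[keep][np.argmax(xc[keep])]

# Toy example: identical noise at the two ears except for a 200-microsecond shift.
rng = np.random.default_rng(1)
left = rng.standard_normal(4096)
right = np.roll(left, 20)                      # 20 samples at 100 kHz = 200 microseconds
for band in [(600, 2_000), (2_000, 6_000), (6_000, 20_000)]:   # the 3 domains of Fig. 10
    print(band, f"{interaural_lag(left, right, band):+.0f} us")
```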



Fig. 10. Model analysis of summing localization and time-intensity trading. Each panel shows the output (interaural lag) of a computational model (see METHODS). Left: interaural lag is plotted against source azimuth for single clicks. Middle: interaural lag is plotted vs. ISD for paired clicks. From top to bottom, the 5 curves represent ISLDs of +18, +9, 0, -9, and -18 dB. Right: interaural lag is plotted vs. ISLD for simultaneous paired clicks. Each row shows model output for a representative frequency band. A-C: the frequency band centered at 1 kHz typified 0.6-2 kHz. D-F: the 5-kHz frequency band typified 2-6 kHz. G-I: the 10-kHz frequency band typified 6-20 kHz. The center curve in B is interrupted because interaural lags with magnitudes >500 µs were omitted for clarity. These results derive from acoustical measurements from cats. Results obtained from human measurements were similar, except that the ISD strongly and systematically influenced interaural lag even at the lowest frequencies.

Paired clicks, on the other hand, produced outputs that were more varied and frequency dependent. Three frequency domains, with boundaries at ~2 and ~6 kHz, were distinguished on the basis of the model results. At the lowest frequencies (0.6-2 kHz), the interaural lag showed a systematic dependence on ISLD (Fig. 10C), as expected from the phase differences that are generated by ISLDs at low frequencies (Bauer 1961; Blauert 1997). In contrast, when the ISD was varied, the interaural lag tended to jump unstably at low frequencies (Fig. 10B) because the cross-correlation functions contained multiple, nearly equal peaks. At middle frequencies (2-6 kHz), the ISD influenced the interaural lag in a rather unstable manner, the dependence on ISLD was somewhat weaker than at low frequencies, and a weak effect that resembled time-intensity trading was apparent (Fig. 10, E and F). At high frequencies (6-20 kHz), the ISD dominated and the ISLD had relatively little influence on interaural lag (Fig. 10, H and I).

Because nearly all the units we studied were most sensitive to middle or high frequencies, the model appeared to account for physiological correlates of ISD-based summing localization (Fig. 10, E and H). In contrast, the model did not appear to explain time-intensity trading. Many of the units we studied, including in particular the units in Figs. 6 and 9B that showed correlates of time-intensity trading, did not respond to pure tones <6 kHz. Assuming that these units were not influenced by frequencies <6 kHz, responses based only on peripheral filtering and interaural cross-correlation would have been largely insensitive to ISLD (Fig. 10, H and I). Because these units did show ISLD sensitivity, explaining their responses requires brain mechanisms beyond interaural cross-correlation (e.g., processing of interaural level differences). The model also failed to account for localization dominance: at ISDs beyond ~0.4 ms, the interaural lag did not approach an asymptote in any frequency band but continued to increase beyond the lag values encountered with single clicks (data not shown).


    DISCUSSION

We have studied spatial signaling by cortical neurons in response to pairs of sounds that are known to fuse perceptually. With notable exceptions, most units signaled locations that would be reported by a listener. We begin this section by discussing the psychophysical findings in our study and in studies by other groups. We then compare our physiological results to the psychophysical predictions and to the results of previous physiological studies.

Psychophysics

Because many aspects of the precedence effect appear to be sensitive to the type of stimulus used (Blauert 1997; Litovsky et al. 1999), we felt it was important to compare our physiology to psychophysical results obtained in a localization task using the same stimuli. Therefore we performed human localization experiments using the same free-field click stimuli as were used in the physiological experiments. Furthermore the listeners performed a task in which they made absolute judgements of location (rather than a task based on discrimination). Thus we were able to directly compare artificial-neural-network estimates to psychophysical listeners' estimates because both are continuous measures in degrees of azimuth.

Our results largely support previous psychophysical studies of summing localization (reviewed by Blauert 1997). Of past studies of summing localization, the most comparable is one by Wendt (1963). He presented broadband impulses from loudspeakers at -30 and +30° azimuth and varied the ISD or ISLD. Wendt found that as the ISD was varied from 0 to ~0.5 ms, the reported azimuth progressed from 0 to ~20°. At delays of 0.5-1 ms, the image location moved slightly more laterally toward ~25°. Consistent with these results, we found that an ISD in the range of 0-0.8 ms systematically shifted location judgements toward the leading source. When Wendt presented impulses simultaneously with a variable ISLD, the perceived azimuth moved systematically from the midline toward the more intense loudspeaker, reaching nearly 90% of the distance to the more intense source at an ISLD of 20 dB. Our ISLD curves are in close agreement with these results.

Previous studies of time-intensity trading have generally used a procedure whereby the ISLD of a stimulus pair is adjusted to offset a constant ISD, thereby moving the sound image to the midline (e.g., Chiang and Freyman 1998; Snow 1954). In contrast, we measured listeners' judgements of location when an ISLD was superposed at a given ISD. Nonetheless, some rough comparisons are possible. Snow (1954) used paired click trains delivered from loudspeakers at -27 and +27°; he found that ISLDs of -5 to -8 dB were required to offset ISDs of +0.45 to +1.8 ms (time-intensity trading ratio, 0.06-0.40 ms/dB). In a more recent study, Chiang and Freyman (1998) found that an ISD of +2 ms between pairs of 4-ms broadband noise bursts was offset by an ISLD ranging from -9 to -19 dB among five listeners (trading ratio, 0.11-0.22 ms/dB). When our listeners were presented with click pairs at an ISD of 0.6-0.8 ms, an opposing level difference of 5 dB typically brought their responses back near the midline (trading ratio, 0.12-0.16 ms/dB). Thus our results fell within the range of previous free-field measurements of time-intensity trading.
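For reference, the trading ratios quoted above are simply the imposed delay divided by the magnitude of the opposing level difference that re-centers the image; for our listeners, for example,

```latex
% Trading ratio = ISD / |opposing ISLD| that returns the image to the midline.
\mathrm{trading\ ratio} = \frac{\mathrm{ISD}}{|\mathrm{ISLD}|}
\quad\Rightarrow\quad
\frac{0.6\ \mathrm{ms}}{5\ \mathrm{dB}} = 0.12\ \mathrm{ms/dB},
\qquad
\frac{0.8\ \mathrm{ms}}{5\ \mathrm{dB}} = 0.16\ \mathrm{ms/dB}.
```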

Our results support previous studies of localization dominance at ISDs of 1-4 ms. Since at least the 1930s, studies have reported the basic qualitative observation that when a sound is delivered from two loudspeakers with a delay of a few milliseconds between the loudspeakers, the sound is localized near the leading source (reviewed by Blauert 1997; Gardner 1968; Litovsky et al. 1999; Zurek 1987). Our results essentially replicated this observation: listeners' responses were strongly biased toward the leading loudspeaker at delays of 1.0-1.4 ms.

Although the leading sound dominated localization, listeners' responses uniformly undershot the leading location, demonstrating an appreciable influence of the lagging sound at delays of 1.0-1.4 ms. Similar findings have been reported previously. Headphone studies that have measured lateralization of lead-lag pairs of dichotic clicks or noise bursts (e.g., Shinn-Cunningham et al. 1993; Wallach et al. 1949; Yost and Soderquist 1984; Zurek 1980) have shown that a click is lateralized less strongly when followed 1-4 ms later by a lagging click than when presented alone. Free-field studies have also indicated that a lagging sound influences localization. Wendt (1963) found an undershoot of ~20% at an ISD of 1 ms in a localization task. Litovsky and Macmillan (1994) measured minimum-audible-angle thresholds for pairs of 6-ms noise bursts; their analysis showed that the lag stimulus had significant weight in listeners' azimuth discriminations when the ISD was 4 ms. Chiang and Freyman (1998) used a localization task in which listeners compared the apparent location of a test stimulus (a train of paired 4-ms noise bursts with a fixed ISD of 2 ms) with that of a movable reference stimulus (a train of single noise bursts). At sensation levels near those we used (40-50 dB), they found undershoots ranging up to ~20% among four listeners, with a fifth listener showing an overshoot of a few degrees (Chiang and Freyman 1998, Table 1). Compared with these studies, the undershoots we found were somewhat greater for most listeners. The difference might be explained by differences in the specific task performed or in the stimuli used.

The undershoot that we and others have found supports the notion that the lagging stimulus influences localization, even at delays at which the precedence effect is strongest. The lagging stimulus could contribute to localization judgements in several ways. It might simply shift the image of the leading stimulus toward the lagging location without affecting the quality or extent of the image. Alternatively, the lagging stimulus might broaden the image so that the image extends toward the lagging location; broadening would move the center of the image closer to the lagging location. Our data appear to favor image broadening over simple shifting because listeners' responses showed greater dispersion for paired clicks than for single clicks. A third explanation, however, cannot be ruled out: listeners might have perceived a complex (and possibly confusing) amalgam of the two sources and then pointed to a single location according to some as-yet-unknown cognitive strategy.

The human psychophysical results of the current study qualitatively agree with behavioral studies in cats. Cranford (1982) trained cats to lateralize single clicks and then tested the cats with paired clicks at various ISDs. At ISDs of ~0.1-5 ms, responses were significantly biased toward the side of the leading sound with the strongest effect at ISDs of ~0.5-2.0 ms. This result is consistent with localization dominance found with human listeners. Populin and Yin (1998) trained cats to respond to single clicks with an oculomotor saccade. When tested with click pairs, the cats moved their eyes toward the leading source at ISDs of 0.1-1 ms. At ISDs less than ~0.3 ms, saccades fell somewhat closer to the midline; at ISDs greater than ~0.3 ms, they fell more laterally but considerably short of the location of the leading source. Furthermore when an ISLD of 5 or 10 dB was imposed, eye movements shifted toward the more intense speaker. Thus the cats responded with saccades that were consistent with localization dominance, summing localization, and time-intensity trading seen in human listeners.

Limitations of the physiological findings

When comparing our cat physiological results to human psychophysical results, two aspects of our methodology should be considered: species differences and anesthesia. Cats are thought to use the same acoustical cues as humans to localize sounds, and behavioral studies have shown that cats localize isolated broadband sounds with an accuracy similar to that of humans (May and Huang 1996; Populin and Yin 1998). Furthermore cats localize paired clicks in a manner qualitatively similar to humans, suggesting that cats also experience the precedence effect over a similar range of ISDs (Cranford 1982; Populin and Yin 1998). Since a cat's head is smaller than a human head, one would expect differences in the time scale of ISD-based summing localization because this aspect of the precedence effect depends on precise differences in arrival times at the two ears. The existing behavioral data are consistent with this expectation (Populin and Yin 1998). In comparing our physiological results to human psychophysical results, we compensated for this expected difference by scaling the human psychophysical results by an amount that we determined by physical acoustical measurements. This scaling procedure is ad hoc but reasonable, and the results do essentially agree with the cat behavioral data that are available (Populin and Yin 1998).

The animals that we recorded from were anesthetized, so the cortical responses obtained are likely to differ from those of a human or cat that is actively localizing sounds. The effects of anesthesia on cortical responses are widely recognized but not well understood. It is unknown whether anesthesia alters spike patterns in important ways, but anesthesia is known to decrease overall response magnitude, and studies of responses to paired sounds at ISDs beyond echo threshold indicate that anesthesia increases the suppression of responses to lagging sounds (Fitzpatrick et al. 1999; Mickey and Middlebrooks 2000; Reale and Brugge 2000). Cortical responses may also be influenced by behavioral state (e.g., whether the listener is actively localizing sounds; Benson et al. 1981). Such effects cannot be addressed with an anesthetized preparation such as ours. Thus it is possible that we have failed to appreciate important aspects of cortical responses that would be apparent in the absence of anesthesia. Nevertheless, the close parallels between the present physiological results and psychophysical measurements attest to the robustness of cortical neuronal sensitivity, which survives even the depressive effects of anesthesia.

Cortical responses and perceptual fusion

When two similar, brief stimuli are presented at ISDs below ~5 ms, listeners usually report hearing a single fused sound. At these ISDs, we found that a unit's response to a paired click typically consisted of a single burst of spikes, much like the response to a single click. This finding is corroborated by other cortical studies (Fitzpatrick et al. 1999; Reale and Brugge 2000). Such a unified response to paired stimuli suggests a correspondence with perceptual fusion. Of course, the correspondence would be complete only if units showed a discrete lagging response at ISDs just beyond echo threshold (~5 ms for clicks). Such responses are rarely seen. The great majority of cortical units remains suppressed at ISDs of tens to hundreds of milliseconds; only a small proportion of units shows discrete lagging responses at shorter ISDs, even in the absence of anesthesia (awake rabbit: Fitzpatrick et al. 1999; anesthetized cat: Mickey and Middlebrooks 2000; Reale and Brugge 2000). That is, cortical neuronal "echo thresholds" (usually defined as the ISD at which lagging responses recover to 50% of the maximal response) are nearly always greater than psychophysical echo thresholds. The cortical-behavioral parallel might be rescued by supposing, as others have suggested (Fitzpatrick et al. 1999; Yin 1994), that the few neurons that do respond to lagging sounds at short delays are the neurons that underlie the behavioral echo threshold. Otherwise, echo suppression and fusion must be attributed to auditory structures of the brain stem, where responses to lagging sounds at short delays are more common (Fitzpatrick et al. 1999; Litovsky and Yin 1998; Yin 1994).
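The echo-threshold criterion used above can be stated concretely with a brief sketch; the recovery curve below is illustrative, not measured data.

```python
# Minimal sketch (our notation) of the neuronal "echo threshold" defined in the
# text: the ISD at which the lagging response recovers to 50% of its maximal value.
import numpy as np

isd_ms = np.array([1, 2, 5, 10, 20, 50, 100, 200], dtype=float)
lag_response = np.array([0.0, 0.02, 0.05, 0.10, 0.30, 0.60, 0.90, 1.00])  # fraction of max

half_max = 0.5 * lag_response.max()
# np.interp expects its x-coordinates (here, the lagging response) to increase,
# which holds for a monotonic recovery curve like this illustrative one.
echo_threshold_ms = np.interp(half_max, lag_response, isd_ms)
print(f"neuronal echo threshold ~ {echo_threshold_ms:.0f} ms")
```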

Although the precedence effect can occur for nonidentical sounds (Blauert and Divenyi 1988; Divenyi 1992; Shinn-Cunningham et al. 1995; Yang and Grantham 1997), perceptual fusion is strongest when the members of a stimulus pair are similar. For example, two simultaneous 50-ms noise bursts fuse into a single image when their waveforms are correlated; when the bursts are uncorrelated, listeners report two simultaneous, spatially distinct images (Perrott et al. 1987). How might these two images be represented simultaneously by the response of cortical neurons? One could posit two simultaneously active subpopulations of cortical neurons underlying the two images. A population code of this kind appears to exist in the topographical map of space in the barn owl's inferior colliculus. Inferences from single-neuron recordings suggest that correlated noise bursts produce a single focus of activity in the space map, whereas uncorrelated bursts produce two foci corresponding to the locations of the two sources (Takahashi and Keller 1994). A cortical version of this population code may exist, but one would expect it to be less obvious because cortical neurons appear to signal source locations over large regions of space and because no topographical map of space is known to exist in the cortex (Middlebrooks et al. 1998). To address this question, cortical studies using dissimilar sound sources, as well as models of populations of neurons, will be needed.

Localization of fused sounds by cortical neurons

In this study, we focused on the spatial aspects of cortical responses at delays below the echo threshold where the precedence effect is strongest. We found physiological correlates of two distinct effects: summing localization at the smallest ISDs (below ~0.5 ms in cats) and localization dominance at ISDs of ~1-4 ms.

When we presented paired clicks at ISDs between about -0.4 and +0.4 ms, most units signaled locations between the two loudspeaker locations with a more-or-less systematic progression from left to right. This observation agrees, at least qualitatively, with summing localization seen psychophysically in cats (Populin and Yin 1998) and, after accounting for the difference in head size, in humans (Wendt 1963; the current study). Studies of lower auditory areas have also described unit responses consistent with summing localization. Yin (1994) described several units recorded from the cat inferior colliculus. As the ISD of click pairs was varied from roughly -1 to +1 ms, the magnitude of unit responses changed much as it did when the azimuth of a single click was varied from -45 to +45°. Keller and Takahashi (1996) recorded responses to paired 100-ms noise bursts in the barn owl inferior colliculus. They found high spike rates when the loudspeaker azimuths and ISD were chosen so as to position the fused image within the unit's receptive field; response profiles were successfully predicted by cross-correlating the waveforms at the two ears.

We found that cortical responses to click pairs also depended on the ISLD. When the ISLD of simultaneous clicks was varied from about -20 to +20 dB, units signaled locations from -50 to +50°. This result closely mirrors the psychophysical results of the present study and previous studies (Bauer 1961; Wendt 1963). Furthermore we found that superposition of a nonzero ISLD at a given ISD shifted network estimates toward the more intense loudspeaker. When the ISLD opposed the ISD, the shift was smaller, consistent with time-intensity trading. This physiological time-intensity trading likely requires synthesis of ITDs with ILDs in the central auditory pathway. Using a model based on cross-correlation of inputs to the two ears, we showed that, at frequencies above ~6 kHz, the interaural lag did not change appreciably with moderate ISLDs. Because the units we studied (in particular, those shown in Figs. 6 and 9B) responded only to higher frequencies and because these units were nonetheless sensitive to moderate ISLDs, we conclude that their responses resulted from combination of ITDs and ILDs by the central auditory system. This synthesis may be complete as early as the inferior colliculus: Yin (1994) described a collicular unit with responses that were reminiscent of time-intensity trading. How might ITD and ILD cues be integrated? Yin and colleagues (1985) have proposed a mechanism whereby a change of ILD would alter a neuron's spike latency. This mechanism is appealing because ILDs could be processed by the same coincidence-detection circuits that presumably process ITDs.
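That latency mechanism can be caricatured with a short sketch; this is our own toy construction, not a model from the cited work, and the trading slope is an arbitrary illustrative value.

```python
# Toy illustration of the latency hypothesis for time-intensity trading
# (our construction, not a model from Yin et al. 1985).  Sign convention:
# positive values mean the left-ear input leads (ITD) or is more intense (ILD).
def effective_interaural_delay(itd_us: float, ild_db: float,
                               slope_us_per_db: float = 40.0) -> float:
    """Delay 'seen' by a coincidence detector if the more intense ear's input
    arrives earlier by slope_us_per_db microseconds per decibel."""
    return itd_us + slope_us_per_db * ild_db

# With these illustrative numbers, a 200-microsecond left lead is offset by a
# 5-dB right-ear level advantage, i.e., a trading ratio of 0.04 ms/dB.
print(effective_interaural_delay(200.0, -5.0))   # -> 0.0
```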

At ISDs of 1-4 ms, we found that most units signaled locations near the lead location in agreement with psychophysical demonstrations of localization dominance. Notably, network estimates fell slightly short of the lead location, much as listeners' estimates fell short. In neither case could the undershoot be accounted for by a general response bias toward undershooting because responses to single clicks showed no such tendency. Psychophysically such undershoots may be attributed to the influence of the lagging sound on the perceived location of the fused image. Physiologically the undershoot might arise from incomplete forward suppression of the lagging response by the leading stimulus (e.g., Figs. 2B and 3, C and D, positive ISDs). In other cases, a backward suppression of the leading response by the lagging stimulus might account for the undershoot (e.g., Fig. 4, C and D, positive ISDs). The cortex is thought to be required for localization dominance (Cranford and Oberholtzer 1976; Cranford et al. 1971; Whitfield et al. 1972), but much of the underlying processing is likely to occur subcortically. Yin (1994) described unit responses in the inferior colliculus that were consistent with localization dominance. At ISDs of ~1-5 ms, units typically responded strongly to click pairs if the leading stimulus evoked a strong response when presented in isolation and responded weakly to click pairs if the leading stimulus evoked a weak response when presented in isolation.

Our finding that cortical responses generally signaled the locations reported by listeners was not a foregone conclusion. Indeed, we found a small number of units whose responses clearly disagreed with psychophysical measurements of localization dominance and summing localization even though these same units accurately signaled the locations of single clicks (Figs. 7 and 8). What role might these neurons play in the perception of fused sounds? They may play no role at all in sound localization. On the other hand, these neurons might contribute to the behavioral undershoot in response to paired sounds. They might also contribute to other qualities of the fused percept such as spatial extent, timbre, or loudness---qualities in which single sounds and paired sounds generally differ (Blauert 1997).

Conclusion

In summary, when presented with pairs of sounds that fuse perceptually, most auditory cortical neurons responded in a manner consistent with the spatial judgements of human listeners. A number of questions remain unanswered. How might cortical neurons represent simultaneous non-fused sounds? Do cortical neurons exhibit a correlate of localization dominance when both sound sources lie in the median sagittal plane, where interaural cues are negligible? At delays near behavioral echo threshold, can cortical neurons localize lagging stimuli as accurately as psychophysical listeners do? Do dynamic aspects of the precedence effect (Clifton 1987; Freyman et al. 1991) have a parallel in cortical responses? We hope to address these issues in future studies.


    ACKNOWLEDGMENTS

We thank E. Macpherson for illuminating discussions throughout the course of this work, S. Furukawa and E. Macpherson for assistance with physiological experiments and for valuable comments on an earlier version of the manuscript, and Z. Onsan for assistance with psychophysical experiments and with preparation of the manuscript.

This work was supported by National Institutes of Health (NIH) Grants R01 DC-00420, T32 GM-07863, T32 DC-00011, and R01 RR-13619 and by the Scottish Rite Schizophrenia Research Program. Multichannel silicon probes were kindly provided by the University of Michigan Center for Neural Communication Technology, sponsored by NIH Grant P41 RR-09754.


    FOOTNOTES

Address for reprint requests: J. C. Middlebrooks, Kresge Hearing Research Institute, University of Michigan, 1301 East Ann St., Ann Arbor, MI 48109-0506 (E-mail: jmidd@umich.edu).

1 Although this particular observation is sometimes termed "the precedence effect," we choose to follow the terminology of Litovsky et al. (1999) in which localization dominance is considered just one aspect of a broader set of phenomena called the precedence effect. Localization dominance is also commonly referred to as "the law of the first wavefront."

2 The trading relation between ISD and ISLD for sounds delivered from a pair of loudspeakers should be distinguished from the rather different trading relation between interaural time difference and interaural level difference often studied under headphones. The ISD and ISLD are distinct from and not simply related to proximal interaural differences (Blauert 1997).

Received 23 February 2001; accepted in final form 21 May 2001.


    REFERENCES

0022-3077/01 $5.00 Copyright © 2001 The American Physiological Society