Sensitivity to Sound-Source Elevation in Nontonotopic Auditory Cortex

Li Xu1, 2, Shigeto Furukawa2, and John C. Middlebrooks1, 2

1 Department of Neuroscience, University of Florida, Gainesville, Florida 32610; and 2 Kresge Hearing Research Institute, University of Michigan, Ann Arbor, Michigan 48109-0506

    ABSTRACT

Xu, Li, Shigeto Furukawa, and John C. Middlebrooks. Sensitivity to sound-source elevation in nontonotopic auditory cortex. J. Neurophysiol. 80: 882-894, 1998. We have demonstrated that the spike patterns of auditory cortical neurons carry information about sound-source location in azimuth. The question arises as to whether those units integrate the multiple acoustical cues that signal the location of a sound source or whether they merely demonstrate sensitivity to a specific parameter that covaries with sound-source azimuth, such as interaural level difference. We addressed that issue by testing the sensitivity of cortical neurons to sound locations in the median vertical plane, where interaural difference cues are negligible. Auditory unit responses were recorded from 14 α-chloralose-anesthetized cats. We studied 113 units in the anterior ectosylvian auditory area and 82 units in auditory area A2. Broadband noise stimuli were presented in an anechoic room from 14 locations in the vertical midline in 20° steps, from 60° below the front horizon, up and over the head, to 20° below the rear horizon, as well as from 18 locations in the horizontal plane. The spike counts of most units showed fairly broad elevation tuning. An artificial neural network was used to recognize spike patterns, which contain both the number and timing of spikes, and thereby estimate the locations of sound sources in elevation. For each unit, the median error of neural-network estimates was used as a measure of the network performance. For all 195 units, the average of the median errors was 46.4 ± 9.1° (mean ± SD), compared with the expectation of 65° based on chance performance. To address the question of whether sensitivity to sound pressure level (SPL) alone might account for the neurons' modest sensitivity to elevation, we measured SPLs in the cat's ear canal and compared the neural elevation sensitivity with the acoustical data. In many instances, the artificial neural network discriminated stimulus elevations even when the free-field sounds produced identical SPLs in the ear canal. Conversely, two stimuli at the same elevation could produce the same network estimate of elevation, even when we varied sound-source SPL over a 20-dB range. There was a significant correlation between the accuracy of network performance in azimuth and in elevation. Most units that localized well in elevation also localized well in azimuth. Because the principal acoustic cues for localization in elevation differ from those for localization in azimuth, that positive correlation suggests that individual cortical neurons can integrate multiple cues for sound-source location.

    INTRODUCTION

We have shown that the spike patterns of auditory cortical neurons carry information about sound-source azimuth (Middlebrooks et al. 1994, 1998). The principal cues for the location of a sound source in the horizontal dimension (i.e., azimuth) are those provided by the differences in sounds at the two ears, i.e., interaural time difference (ITD) and interaural level difference (ILD). In contrast, the principal cues for location in the vertical dimension are spectral-shape cues that are produced largely by the interaction of the incident sound wave with the convoluted surface of the pinna (see Middlebrooks and Green 1991 for review). The question arises as to whether the spike patterns that we studied represent the output of a system that integrates these multiple cues for sound-source location or whether they merely demonstrate neuronal sensitivity to an interaural difference that covaries with sound-source azimuth, such as ILD. Sound sources located anywhere in the vertical midline produce small, perhaps negligible, interaural differences. For that reason, one would predict that a neuron that was sensitive only to interaural differences would show no sensitivity to the vertical location of a sound source in the midline and would be unable to distinguish front from rear locations. Alternatively, if cortical neurons integrate multiple types of location information, we would expect to observe sensitivity to both the horizontal and the vertical location of a sound source. We addressed this issue by testing the sensitivity of neurons to the vertical location of sound sources in the median plane.

The spatial tuning properties of cortical auditory neurons have been studied by several groups of investigators [area A1: Brugge et al. 1994, 1996; Imig et al. 1990; Middlebrooks and Pettigrew 1981; Rajan et al. 1990a,b; anterior ectosylvian auditory area (area AES): Korte and Rauschecker 1993; Middlebrooks et al. 1994, 1998]. Most of those studies were restricted to the azimuthal sensitivity of the neurons. Middlebrooks and Pettigrew (1981) described a few units that showed elevation sensitivity to near-threshold sounds, but the stimuli in that study were pure tone bursts that lacked the spectral information that is crucial for vertical localization of sounds that vary in sound pressure level (SPL). Using "virtual space" clicks that simulated 1,650 sound-source locations in three-dimensional space, Brugge and colleagues (1994, 1996) recently confirmed that most A1 cells are differentially sensitive to sound-source direction. Near threshold, many of the neurons in their study showed virtual space receptive fields that were restricted in the horizontal and vertical dimensions. When stimulus levels were increased, however, most of the spatial receptive fields enlarged and the vertical selectivity disappeared. Imig et al. (1997) found that, at the level of the medial geniculate body, neurons showed sensitivity to sound-source elevation when stimulated with broadband noise. Such elevation sensitivity disappeared when the neurons were stimulated with pure tones. They suggested that those neurons synthesized their elevation sensitivity from spectral cues that were present in the broadband noise stimuli.

The present study was undertaken to examine the coding of sound-source elevation by neurons in cortical areas AES and A2. The spike counts of most of these neurons showed rather broad tuning for sound-source elevation. Nevertheless, spike patterns (i.e., spike counts and spike timing) varied with sound-source elevation. Using an artificial neural network paradigm like the one that we used in the previous studies of azimuth coding (Middlebrooks et al. 1994, 1998), we found that it was possible to identify sound-source elevation by recognizing spike patterns. This result leads us to reject the hypothesis that neurons are merely sensitive to ITD or ILD. Our initial data all were collected from units in area AES (Xu and Middlebrooks 1995). Many of those units failed to discriminate among low elevations. When tested with tones, most of those AES neurons responded only to frequencies >15 kHz. We reasoned that the accuracy of coding of lower elevations might improve if we could find neurons that were sensitive to lower frequency tones, because spectral details in the range of 5-10 kHz are thought to signal lower elevations (Rice et al. 1992). Therefore we expanded our experiments to area A2, in which neurons sensitive to broader bands of frequency are found more often. In this paper, we compare results from areas AES and A2 in terms of their elevation-coding accuracy and their frequency tuning properties, address the role that source sound pressure level might play in elevation coding, and examine the relationship between network performance in azimuth and in elevation for the same neurons.

    METHODS

Methods of surgical preparation, electrophysiological recording, stimulus presentation, and data analysis were described in detail in the preceding paper (Middlebrooks et al. 1998). In brief, 14 cats were used for this study. Cats were anesthetized for surgery with isoflurane, then were transferred to alpha -chloralose for single-unit recording. The right auditory cortex was exposed for microelectrode penetration. Our on-line spike discriminator sometimes accepted spikes from more than one unit, so we must note the possibility that we have underestimated the precision of elevation coding by single units. We recorded from area AES and auditory area A2. Recordings from area AES were made from the portion of area AES that lies on the posterior bank of the anterior ectosylvian sulcus. Recordings from area A2 were made from the crest of the middle ectosylvian gyrus ventral to area A1. Area A2 was distinguished from neighboring A1 by frequency tuning curves that were at least one octave wide at 40 dB above threshold. After each experiment, the cat was euthanized and then perfused. The half brain was stored in 10% formalin with 4% sucrose and later transferred to 30% sucrose. Frozen sections stained with cresyl violet were examined with a light microscope to determine the electrode location in the cortex.

Sound stimuli were presented in an anechoic chamber from 14 loudspeakers that were located on the median sagittal plane, from 60° below the frontal horizon (-60°), up and over the head, to 20° below the rear horizon (+200°) in 20° steps. Stimuli were 100-ms bursts of broadband Gaussian noise with abrupt onsets and offsets. Loudspeaker frequency responses were closely equalized as described in the companion paper (Middlebrooks et al. 1998). All speakers were 1.2 m from the center of the cat's head. The stimulus levels were 20-40 dB above the threshold of each unit in 5-dB steps. A total of 24-40 trials was delivered for each combination of stimulus location and stimulus level; locations and levels were varied in a pseudorandom order. Whenever possible, the frequency tuning properties of the units also were studied, using pure tone stimuli. The pure tone stimuli were 100-ms tone bursts (with 5-ms onset and offset ramps) with frequencies ranging from 3.75 to 30.0 kHz in one-third-octave steps. They were presented at 10 and 40 dB above threshold from a speaker in the horizontal plane from which strong responses to broadband noise were obtained, usually at contralateral 20 or 40° azimuth.

Off-line, an artificial neural network was used to perform pattern recognition on the neuronal responses (Middlebrooks et al. 1998). Neural spike patterns were represented by estimates of spike density functions based on bootstrap averages of responses to eight stimuli, as described in the companion paper. The two output units of the neural network produced the sine and cosine of the stimulus elevation, and the arctangent of the two outputs gave a continuously varying estimate of elevation in degrees. We did not constrain the output of the network to any particular range, so the scatter in network estimates of elevation sometimes fell outside the range of locations to which the network was trained (i.e., from -60 to +200°).
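The decoding step can be sketched in a few lines. The following is a minimal illustration, not the authors' code; the function name and the choice of where to cut the circle when unwrapping the angle are our assumptions.

```python
# Minimal sketch of the output-decoding step (not the authors' code).
# The network's 2 output units approximate sin and cos of the stimulus
# elevation; atan2 recovers a continuous angle in degrees.
import numpy as np

def decode_elevation(sin_out, cos_out):
    """Convert paired network outputs to elevation estimates in degrees."""
    theta = np.degrees(np.arctan2(sin_out, cos_out))  # in (-180, 180]
    # Assumption: unwrap estimates toward the trained range (-60 to +200 deg)
    # by cutting the circle at -110 deg, the middle of the untrained gap
    # between +200 and +300 (= -60) deg.
    return np.where(theta < -110.0, theta + 360.0, theta)
```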

Measurement of directional transfer functions of the external ears was carried out in six of the cats after the physiological experiments. A 1/4-in tube microphone was inserted in the ear canal through a surgical opening at the posterior base of the pinna. The probe stimuli delivered from each of the 14 speakers in the median plane were pairs of Golay codes (Zhou et al. 1992) that were 81.92 ms in duration. Recordings from the microphone were amplified and then digitized at 100 kHz, yielding a spectral resolution of 12.2 Hz from 0 to 50 kHz. We subtracted from the amplitude spectra a common term that was formed by the root-mean-squared sound pressure averaged across all elevations. Subtraction of the common term left the component of each spectrum that was specific to each location (Middlebrooks and Green 1990). Those measurements permitted us to study in detail the directional transfer functions of the external ear; however, in the present study, we considered only the spatial patterns of sound levels of three one-octave frequency bands: low-frequency (3.75-7.5 kHz), midfrequency (7.5-15 kHz), and high-frequency (15-30 kHz).
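As a concrete illustration of the common-term subtraction and band-level analysis described above, the following sketch computes location-specific spectral components and one-octave band levels from a matrix of ear-canal spectra. Variable names and the dB bookkeeping are our assumptions, not the authors' code.

```python
# Sketch of the acoustical analysis described above (assumed variable
# names). `spectra_db` holds ear-canal amplitude spectra in dB: one row
# per source elevation, one column per frequency bin (12.2-Hz resolution;
# 81.92 ms sampled at 100 kHz gives 8,192 points).
import numpy as np

def directional_components(spectra_db):
    """Subtract the across-elevation common term from each spectrum."""
    amp = 10.0 ** (spectra_db / 20.0)             # back to linear amplitude
    rms = np.sqrt(np.mean(amp ** 2, axis=0))      # RMS across all elevations
    common_db = 20.0 * np.log10(rms)
    return spectra_db - common_db                 # location-specific part, dB

def band_level_db(spectra_db, freqs_hz, lo_hz, hi_hz):
    """Mean power (dB) in one band, e.g. lo_hz=7.5e3, hi_hz=15e3."""
    band = (freqs_hz >= lo_hz) & (freqs_hz < hi_hz)
    power = (10.0 ** (spectra_db[:, band] / 20.0)) ** 2
    return 10.0 * np.log10(power.mean(axis=1))    # one value per elevation
```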

    RESULTS

General properties of sound-source elevation sensitivity

A total of 195 units was recorded from areas AES (113 units) and A2 (82 units). Figure 1 shows the elevation sensitivity of two AES units (Fig. 1, A and B) and two A2 units (Fig. 1, C and D). Left and right columns of the figure plot data from 20 and 40 dB above threshold, respectively. The elevation tuning of the units in Fig. 1, A and C, was among the sharpest in our sample. Most often, however, units showed some selectivity at the lower sound pressure level, but the tuning broadened considerably at the higher sound pressure level. The units in Fig. 1, B and D, are typical. The region of stimulus elevation that produced the greatest spike counts from each unit was represented by the "best-elevation centroid," which was the spike-count-weighted center of mass of the peak response, with the peak defined by a spike count >75% of the unit's maximum. The rationale for representing elevation preferences by best-elevation centroids rather than by single peaks or best areas was that the location of a centroid is influenced by all stimuli that produced strong responses, not just by a single stimulus location (Middlebrooks et al. 1998). The primary centroids for the examples in Fig. 1 are marked by arrows. However, for the responses at 40 dB above threshold represented by Fig. 1, B and D (right column), no centroids could be computed because the spatial tuning became too flat.
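The centroid computation can be made concrete with a short sketch. This is an illustration under a simplifying assumption (a single contiguous peak region around the global maximum), not the authors' implementation.

```python
# Illustrative computation of a best-elevation centroid: the
# spike-count-weighted center of mass of the peak region, with the peak
# defined by mean counts >75% of the unit's maximum. Simplified sketch:
# assumes one contiguous supra-criterion run around the global peak.
import numpy as np

def best_elevation_centroid(elevs_deg, mean_counts, criterion=0.75):
    counts = np.asarray(mean_counts, dtype=float)
    above = counts > criterion * counts.max()
    lo = hi = int(np.argmax(counts))
    while lo > 0 and above[lo - 1]:          # extend peak region downward
        lo -= 1
    while hi < len(counts) - 1 and above[hi + 1]:  # and upward
        hi += 1
    region = slice(lo, hi + 1)
    return np.average(np.asarray(elevs_deg)[region], weights=counts[region])

# 14 speaker elevations: -60 to +200 deg in 20-deg steps
elevations = np.arange(-60, 201, 20)
```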


FIG. 1. Spike-count-vs.-elevation profiles. A and B: anterior ectosylvian auditory area (AES) units (950719 and 950984). C and D: A2 units (9607A2 and 960721). Left: spike-count-vs.-elevation profiles at stimulus level 20 dB above threshold; right: 40 dB above threshold. In these polar plots, the angular dimension gives the speaker elevation in the median plane, with 0° straight in front of the cat, 90° straight above the cat's head, and 180° straight behind, as marked in A. Radial dimension gives the mean spike counts (spikes per stimulus presentation). →, primary elevation centroids, which are the spike-count-weighted centers of mass with peaks defined by a spike count >75% of the unit's maximum. No centroids could be calculated for 40 dB data of B and D.

The elevation sensitivity of spike counts in our sample of units is summarized in Figs. 2 and 3. At stimulus levels 20 dB above threshold, 86% of the AES units and 66% of the A2 units showed >50% modulation of spike counts by sound-source elevation (Fig. 2, left), but that proportion of the sample dropped to 48% for AES units and 13% for A2 units when the stimulus level was raised to 40 dB above threshold (Fig. 2, right). The height of elevation tuning was represented by the range of elevations over which stimuli activated units to >50% of their maximal spike counts. Figure 3 shows the distributions of the heights of elevation tuning in our sample of units. Fifty-two percent of the AES units and 84% of the A2 units showed heights larger than 180° at stimulus levels 20 dB above threshold (Fig. 3, left), and the heights of nearly all units from either area AES or area A2 were larger than 180° at 40 dB above threshold (Fig. 3, right). In general, A2 units tended to show broader tuning in sound-source elevation than did AES units (Mann-Whitney U test, P < 0.01). Note that all measurements of elevation were made in the vertical midline. Elevation sensitivity might have appeared somewhat sharper if it had been tested in a vertical plane off the midline that passed through the peaks in units' azimuth profiles. That approach has been used, for instance, in studies of the superior colliculus (Middlebrooks and Knudsen 1984) and medial geniculate body (Imig et al. 1997).
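The two spike-count summaries used here reduce to simple statistics; a sketch follows. The exact formula for depth of modulation is our assumption ((max - min) / max), as is counting each supra-criterion speaker as one 20° step of tuning height.

```python
# Sketch of the 2 spike-count summary measures above (formulas assumed,
# not taken verbatim from the paper).
import numpy as np

def modulation_depth(mean_counts):
    """Assumed definition: (max - min) / max mean spike count."""
    c = np.asarray(mean_counts, dtype=float)
    return (c.max() - c.min()) / c.max()

def tuning_height_deg(mean_counts, step_deg=20.0):
    """Range of elevations with counts above half-maximum, in degrees."""
    c = np.asarray(mean_counts, dtype=float)
    return step_deg * np.count_nonzero(c > 0.5 * c.max())
```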


FIG. 2. Distribution of depth of modulation of spike count by elevation. □, area AES units; ■, area A2 units. Left: data at a stimulus level 20 dB above threshold. Right: data at a stimulus level 40 dB above threshold.


FIG. 3. Distribution of the range of elevations over which spike counts greater than half-maximum were elicited. Conventions as in Fig. 2.

The best-elevation centroids of our population of 195 units were distributed throughout the elevations of the median plane. However, more centroids were located in the frontal elevations from 20 to 80° than in any other locations (Fig. 4). For 14% of the AES units and 34% of the A2 units that were studied at 20 dB above threshold, best-elevation centroids were not computed because the modulation of the spike counts of the units by sound-source elevation was <50%. Such percentages increased to 51 and 87%, respectively, at stimulus levels 40 dB above threshold. These units were represented by the bars marked by NC in Fig. 4. No consistent orderly progression of centroids along electrode penetrations was evident in either area AES or area A2. Rarely, for low-intensity stimuli, we saw an orderly progression of centroids along a short distance of the penetration. However, this organization did not persist at higher stimulus levels.


FIG. 4. Distribution of locations of best-elevation centroids. Percentages of units for which no centroids could be calculated are marked NC on the abscissa. Conventions as in Fig. 2.

Neural network classification of spike patterns

Examples of the spike patterns of two AES units and an A2 unit are shown in Fig. 5 in a raster plot format. Each panel in the figure represents one unit, and only responses elicited at 40 dB above threshold are shown here. Sound-source elevation is plotted on the ordinate, and post-stimulus-onset time is plotted on the abscissa. Each filled circle represents one spike recorded from the unit. For each of the spike patterns, one can see subtle changes in the numbers and distribution of spikes and in the latencies of the patterns from one elevation to another. It is also apparent that spike patterns differ markedly from unit to unit.


FIG. 5. Raster plot of responses from 2 AES units (A: 950531 and B: 950754) and an A2 unit (C: 970821). Each dot represents 1 spike from the unit. Each row of dots represents the spike pattern recorded from 10 ms before the onset to 10 ms after the offset of 1 presentation of the stimulus at the elevation indicated along the vertical axis. Only 10 of the 40 trials recorded at each elevation are plotted. Stimuli were 100-ms noise bursts starting at 0 ms, represented by the thick bars. Stimulus level was 40 dB above threshold.

Figure 6 plots the results of artificial-neural-network analysis of the spike patterns of the same AES unit as in Fig. 5A, elicited at 40 dB above threshold. In Fig. 6A, each plus sign represents the network estimate of elevation based on one spike pattern, and the solid line indicates the mean direction of responses at each stimulus elevation. In general, the neural-network estimates were scattered around the line of perfect performance (- - -). Some large deviations from the targets were seen at certain elevations (e.g., -60 to -20° in this particular example). The neural network classification of the spike patterns of this unit yielded a median error of 32.2°, which was among the smallest in our sample. The distribution of errors in estimation of elevation for this unit is shown in Fig. 6B. Seventeen percent of network errors were within 10° of the targets. In contrast, the expected value for random chance performance given 14 speakers is 7.1%.
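The per-unit performance measures used here are simple statistics of the error distribution; the following sketch (with hypothetical variable names) shows how they can be computed.

```python
# Sketch of the per-unit performance measures (hypothetical names):
# median absolute error of the network estimates, and the percentage of
# estimates falling within 10 deg of the target elevation.
import numpy as np

def median_error_deg(estimates_deg, targets_deg):
    err = np.abs(np.asarray(estimates_deg) - np.asarray(targets_deg))
    return float(np.median(err))

def pct_within_deg(estimates_deg, targets_deg, tol_deg=10.0):
    err = np.abs(np.asarray(estimates_deg) - np.asarray(targets_deg))
    return 100.0 * float(np.mean(err <= tol_deg))
```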


FIG. 6. Network performance of the same unit (950531) as in Fig. 5A. A: each + represents the network output in response to input of 1 bootstrapped pattern. Abscissa represents the actual stimulus elevation, and the ordinate represents the network estimate of elevation. Solid line, mean directions of network estimates for each stimulus location. - - -, perfect performance. B: distribution of network errors. - - -, 7.1%, which is the expected random chance performance given 14 speaker elevations.

Results of neural-network analysis of responses of another AES unit are shown in Fig. 7; the spike patterns of this unit are plotted in Fig. 5B. The network estimates of elevation based on the responses of this unit were less accurate than the estimates shown in Fig. 6. The network scatter was larger, and at elevations -60 to -20°, the network estimates consistently pointed above the stimuli. Nevertheless, the network produced systematically varying estimates of elevation within the region of 0-140°. The unit represented in Fig. 7 was typical of many units in that network analysis of its spike patterns tended to undershoot elevations at the extremes of the range that we tested (e.g., -60 to -20° and 160 to 200° in this particular example). The median error for this unit was 47.5°, which is slightly larger than the mean of our entire population.


FIG. 7. Network performance of the same unit (950754) as in Fig. 5B. Conventions as in Fig. 6.

Undershoots at the extremes of the range were also common for A2 units. However, some of the A2 units could discriminate the lower elevations fairly well. Figure 8 shows the network analysis of the spike patterns shown in Fig. 5C. The mean directions of the responses were fairly accurate at all locations except at 160-200°, where undershoots were seen (Fig. 8A). The distribution of errors (Fig. 8B) shows a bias toward negative errors because of those undershoots.


FIG. 8. Network performance of the same unit (970821) as in Fig. 5C. Conventions as in Fig. 6.

For all the 195 units studied at 40 dB above threshold, the median errors of the network performance averaged 46.4°, ranging from 25.4 to 67.5°. The distribution of the median errors is shown in Fig. 9 (right). At 20 dB above threshold, the median errors of the network performance averaged 6° less than those at 40 dB above threshold (Fig. 9, left). The bulk of the distribution for all stimulus level conditions was substantially better than the chance performance of 65°, which is marked by arrows in Fig. 9. The chance performance of 65° is the theoretical median error when we consider the entire 260° range of elevation. When we tested the network with data in which the relation between spike patterns and stimulus elevations was randomized, we obtained an average median error of 66.5 ± 1.7° (mean ± SD) across all 195 units. In general, the median errors of network performance in elevation averaged 2-3° larger than those we found in network outputs in azimuth (Middlebrooks et al. 1998). This is consistent with an observation from a study of localization by human listeners (Makous and Middlebrooks 1990), in which vertical errors for stimuli in the frontal midline were roughly twice as large as horizontal errors. Results from behavioral studies in cats are difficult to compare in terms of localization accuracy in the vertical and horizontal dimensions because only a very limited range of elevations was employed in those studies (Huang and May 1996b; May and Huang 1996).
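One way to see the 65° figure: an estimator that ignores the stimulus minimizes its median absolute error by always answering the midpoint of the range (+70°), and for targets spread uniformly over the 260° range that strategy yields a median error of 260/4 = 65°. A quick simulation confirms this; it is a sketch of the arithmetic, not the authors' procedure.

```python
# Quick check of the 65-deg chance level (a sketch, not the authors'
# procedure): always guessing the midpoint (+70 deg) of the -60 to
# +200 deg range gives a median |error| of 260/4 = 65 deg for targets
# drawn uniformly over that range.
import numpy as np

rng = np.random.default_rng(0)
targets = rng.uniform(-60.0, 200.0, size=1_000_000)
print(np.median(np.abs(targets - 70.0)))  # ~65.0
```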


FIG. 9. Distribution of elevation coding performance across the entire sample of units. →, chance performance of 65°. Conventions as in Fig. 2.

We demonstrated in our preceding paper that coding of sound-source azimuth by spike patterns is more accurate than coding by spike counts alone (Middlebrooks et al. 1998). We evaluated the coding of sound-source elevation by those two coding schemes. Consistent with our previous paper, we found that median errors in neural network outputs obtained with spike counts were significantly larger than those obtained with complete spike patterns. Median errors in network output obtained in the spike-count-only condition averaged 8-12° larger than those obtained in the complete-spike-pattern condition, depending on cortical area (A2 or AES) and stimulus level (20 or 40 dB above threshold).

Comparison of elevation coding in areas AES and A2

We compared our sample of A2 units with our sample of AES units in regard to the accuracy of coding of elevation by spike patterns. Averaged across all elevations, the median errors at sound levels 20 dB above threshold were slightly smaller for A2 units than for AES units (t-test, P < 0.05), but the two areas did not differ significantly at 40 dB above threshold (compare Fig. 9, top and bottom). When we considered particular ranges of elevation, however, we often found that in area AES the median errors at locations below the front horizon were much larger than those at other elevations. In the case of A2 units, this difference was less prominent. Individual examples are given in Figs. 6-8. We then calculated the median errors at each of the 14 elevations for units from areas AES and A2. The means and standard errors of the median errors are plotted in Fig. 10. Locations at which the differences in the means of the median errors between the two cortical areas were statistically significant (t-test, P < 0.05) are marked (*). The median errors at elevations from 0 to 120° for A2 units and from 20 to 140° for AES units were fairly small. The median errors of AES units at -60 to 0° elevation were significantly larger than those of A2 units; the reverse was true at 120-200° elevation. Thus, compared with AES units, A2 units achieved a better balance between the network output errors at lower elevations and those at rear locations.


FIG. 10. Comparison of network performance of A2 and AES units. Plotted here are the means ± SE of the median errors from the network analysis of AES (□) and A2 units (■) at each individual elevation. * Locations where the means of A2 units are significantly different from those of AES units (t-test, P < 0.05).

Contribution of SPL cues to elevation coding

Spectral shape cues are regarded as the major acoustical cue for location in the median plane (Middlebrooks and Green 1991). However, the modulation of SPL in the cat's ear canal due to the directionality of the pinna also can serve as a cue. We refer to this cue as the SPL cue. We wished to test the hypothesis that SPL cues alone could account for our results. We measured the SPLs in the cat's ear canal and compared the acoustical data with the network performance. Specifically, we compared the network performance among sound-source elevations at which the stimuli produced similar SPLs in the ear canal. If the SPL cue played a dominant role, the artificial neural network would not be able to discriminate those elevations successfully. We also tested the network performance under conditions in which the SPL of the sound source was varied. If the SPL cue dominated, we would expect network performance to degrade substantially when the variation of the source SPL was large relative to the dynamic range of the modulation of SPL in the cat's ear canal.

The elevation sensitivity of SPLs varies somewhat with frequency, so we measured SPLs within three one-octave bands: low, 3.75-7.5 kHz; middle, 7.5-15 kHz; and high, 15-30 kHz. The spatial patterns of sound levels in these three frequency bands were similar among the six cats that were used in the acoustic measurement. Figure 11A plots the sound levels in those three frequency bands as a function of sound-source elevation from the measurement of one of the cats. The entire ranges of the sound level profiles for the low-, mid-, and high-frequency regions were 11.9, 17.8, and 29.2 dB, respectively (Fig. 11A). For the low- and high-frequency bands, sound from 0° elevation produced the maximal gain in the external ear canal of the cat. Sound levels decreased more or less monotonically when the sound source moved below or above the horizontal plane and behind the cat. For the midfrequency band, however, sounds from -20 and 0°, and those from 100 and 120°, produced the largest gains in the external ear canal. The sound levels dropped at locations behind the cat and at those below the frontal horizon.


FIG. 11. Sound levels and neural network performance. A: sound levels measured at the external ear canal as a function of sound-source elevation. Levels were measured in low- (3.75-7.5 kHz), mid- (7.5-15 kHz), and high-frequency (15-30 kHz) bands. B: sound levels in the low-frequency band are plotted (△) on the left ordinate. Mean directions of neural network responses of a unit (960553) that responded well to low-frequency tones are plotted (●) on the right ordinate. The two ordinates are scaled so that the ranges of the 2 curves roughly overlap. Small arrows mark pairs of sound-source elevations at which sound levels were similar to one another (within 1 dB) but at which network estimates of elevation differed. C: sound-level profile in the midfrequency band (□) and mean directions of the network responses (●) of a unit (950915) that responded well to midfrequency tones, plotted in the same format as B. D: sound-level profiles in the high-frequency band at 10 dB above and 10 dB below the actual one shown in A are plotted on the left ordinate (×) to simulate the 20-dB range of the roving levels. Mean directions of the network responses of a unit (950702) that responded well to high-frequency tones are plotted on the right ordinate. The network was trained with spike patterns from 5 sound pressure levels, from 20 to 40 dB above threshold. ● and ○, mean directions of network output when tested with spike patterns obtained with stimuli at 20 and 40 dB above threshold, respectively. →, examples at which the 2 network outputs point to the same correct locations.

We compared the elevation sensitivity of sound levels with the neural network estimates of elevation by plotting sound levels and neural network output on common abscissas (Fig. 11, B and C). Figure 11B shows the network analysis of a unit that responded best to frequencies in the low-frequency band (△, sound levels in that band). Figure 11C shows network data and midfrequency sound levels for a unit that responded best to the middle frequencies. The left ordinate, used for the SPL data, and the right ordinate, used for the neural network estimates, were scaled so that the two sets of data roughly overlapped. If the network identification of elevation were due simply to SPL variation, sound sources that differed in elevation but produced the same SPLs in the ear canal would yield the same elevations in the network output. In fact, the neural network could distinguish pairs of speakers that produced similar SPLs (within 1 dB). Examples of such pairs of locations are marked in Fig. 11, B and C (→). The results are inconsistent with the prediction based on the SPL cue.
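The search for such pairs can be expressed compactly. The sketch below uses hypothetical variable names and assumed thresholds (1 dB for "similar SPL"; 20° for "distinct network estimates"); it illustrates the comparison, not the authors' code.

```python
# Sketch of the comparison above: find pairs of source elevations whose
# ear-canal band levels agree within 1 dB, then check whether the mean
# network estimates at those elevations still differ (names and
# thresholds are assumptions).
import itertools
import numpy as np

def confusable_pairs(elevs_deg, band_levels_db, net_means_deg,
                     spl_tol_db=1.0, est_sep_deg=20.0):
    pairs = []
    for i, j in itertools.combinations(range(len(elevs_deg)), 2):
        same_spl = abs(band_levels_db[i] - band_levels_db[j]) <= spl_tol_db
        distinct = abs(net_means_deg[i] - net_means_deg[j]) >= est_sep_deg
        if same_spl and distinct:
            pairs.append((elevs_deg[i], elevs_deg[j]))
    return pairs  # nonempty => SPL cue alone cannot explain the estimates
```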

Next, we tested the effect of roving the source SPLs. Figure 11D plots data for another unit in a format similar to that of Fig. 11, B and C. This unit responded best to frequencies in the high-frequency band. Here, we plotted two high-frequency sound-level curves separated by 20 dB, simulating the SPL cues under conditions in which we varied the stimulus SPLs over a range of 20 dB. A neural network was trained with spike patterns from five SPLs between 20 and 40 dB above threshold in 5-dB steps. The network outputs based on spike patterns elicited with single source SPLs at 20 and 40 dB above threshold are plotted against the right ordinate. One can see from Fig. 11D that even though the high-frequency band provided the strongest SPL cues for localization in elevation, those SPL cues were confounded greatly when stimulus levels were roved over a range of 20 dB. For instance, a stimulus at 0° presented at 20 dB above threshold and a stimulus at 180° presented at 40 dB above threshold would produce similar sound levels in the ear canal. Nevertheless, neural-network recognition of spike patterns produced by the two single stimulus levels (20 and 40 dB above threshold) was fairly accurate and comparable at the two levels. Examples are shown (→) in which the network recognized two sets of spike patterns as responses to stimuli at the same elevation, even when the stimulus SPLs differed by 20 dB. The median error in network output for the unit represented in Fig. 11D was 29.0°; that is, one-half of the network outputs fell within a range of roughly 58.0° (i.e., ±29.0°) around the correct elevation. That range of errors is 22.3% of the 260° range of elevation that was tested. In contrast, SPL cues to sound-source elevation were confounded by source levels that roved over a range of 20 dB, which is 68.5% of the 29.2-dB range of variation in SPL produced by a constant-level source moved through 260° of elevation. We applied the same approach as in Fig. 11 to all the units in our sample that had median errors <40° and obtained results qualitatively similar to those shown in the figure. These results contradict the hypothesis that elevation sensitivity is due entirely to the elevation dependence of SPL.

Our systematic analysis of the effect of roving levels on network performance further supports the hypothesis that level-invariant information about sound-source location is present in the spike patterns. For the sample of 195 units, the average median errors of the network when trained and tested with responses to stimuli 20 and 40 dB above threshold were 40.3 and 46.4°, respectively. Neural network analysis yielded an average median error of 47.9° when trained and tested with 5 roving levels (20, 25, 30, 35, and 40 dB above threshold). The average median errors did not differ significantly between the single-level condition at 40 dB above threshold and the 5-roving-level condition (paired t-test, P > 0.05).

Frequency tuning properties and network performance

The coding of sound-source elevation requires integration of information across a range of frequencies, so a neuron's frequency tuning properties might be related to its elevation sensitivity. In this section, we explore the relation between frequency tuning properties and network performance in the two cortical areas. We found that A2 units showed broader frequency tuning than did AES units, mainly because the low-frequency cutoffs of the tuning curves of A2 units extended toward lower frequencies. Acoustic measurements of the cat's head-related transfer function (Rice et al. 1992) and behavioral studies in cats (Huang and May 1996b) suggest that spectral details in the lower frequency range (e.g., 5-10 kHz) might signal low elevations. In fact, as we showed earlier, the AES units tended to produce larger errors at the low elevations (-60 to 0°) than did A2 units (Fig. 10). Could the broader frequency tuning and lower low-frequency cutoffs of the A2 units account for their better performance at the low elevations?

First, we considered the frequency tuning properties of the units. The units that we encountered in areas AES and A2 responded well to broadband noise burst stimuli. We recorded frequency tuning responses to tone bursts of 100-ms duration in 173 of the 195 units. Among them, 91 units were from area AES and 82 from area A2. Most units showed stronger responses to higher frequency tones (>= 15 kHz) than to lower frequency tones (<15 kHz). Figure 12, A and B, shows, for our sample of AES and A2 units, respectively, the percentage of the population activated to levels at or above 25, 50, and 75% of maximal spike counts at various tonal frequencies, at a stimulus level 40 dB above threshold. At almost all frequencies, more than half of the population in both areas AES and A2 was activated to >25% of maximal spike counts. Tonal stimuli activated a larger fraction of the unit population in area A2 than in area AES, especially at lower frequencies. Hence, frequency tuning bandwidth appeared broader in our sample of A2 units than in the AES units. The conventional way of defining tuning bandwidth is to find thresholds at various frequencies and then to measure the bandwidth at a certain level above the lowest threshold. That might not provide an accurate description of tuning bandwidth under conditions of free-field sound stimulation, because the transfer function of the pinna is added to the frequency sensitivity of the unit. Instead, we defined the tuning bandwidth by measuring spike counts in response to tones at various frequencies at a fixed level 40 dB above the threshold at the best frequency. The tuning bandwidth was the frequency range over which the spike counts were >= 50% of the maximal spike count. That provided a somewhat more appropriate measure of the range of frequencies that influenced unit responses in our study. The distribution of the frequency tuning bandwidths in our sample of A2 and AES units is shown in Fig. 13, top. The mean bandwidth in A2 was 2.02 octaves and that in AES neurons was 1.49 octaves. This difference was statistically significant (t-test, P < 0.01).
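The bandwidth measure translates directly into code. The following sketch simplifies edge handling (it takes the outermost test frequencies that meet the criterion, with no interpolation between steps) and is an illustration, not the authors' analysis code.

```python
# Sketch of the spike-count-based bandwidth measure defined above:
# tones in 1/3-octave steps at a fixed level 40 dB above the threshold
# at the best frequency; bandwidth = frequency range with mean counts
# >= 50% of the maximum, expressed in octaves. Simplified: no
# interpolation between test frequencies.
import numpy as np

def tuning_bandwidth_octaves(freqs_khz, mean_counts, criterion=0.5):
    counts = np.asarray(mean_counts, dtype=float)
    supra = np.flatnonzero(counts >= criterion * counts.max())
    f_lo, f_hi = freqs_khz[supra[0]], freqs_khz[supra[-1]]
    return float(np.log2(f_hi / f_lo))

# Test frequencies: 3.75 to 30.0 kHz in 1/3-octave steps (10 values)
freqs = 3.75 * 2.0 ** (np.arange(10) / 3.0)
```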


FIG. 12. Percentage of unit sample activated as a function of stimulus tonal frequency. Three lines in each panel represent the percentage of units activated at or above 25, 50, and 75% of maximal spike counts. A: pooled data from 91 AES units. B: pooled data from 82 A2 units.


FIG. 13. Frequency tuning bandwidth and neural network performance. Top: distribution of bandwidth in AES units (left, □) and in A2 units (right, ■). Bottom: relation between the neural network performance at the lower elevations and the frequency tuning bandwidth. Left and right: areas AES and A2, respectively. Median errors were computed over the range of -60 to 0° elevation.

Next, to explore whether this difference in frequency tuning bandwidth could account for the difference between AES and A2 units in network performance at low elevations, we measured the correlation between the bandwidths of individual A2 and AES units and their network performance at the lower elevations. Figure 13, bottom, shows scatter plots of network performance at lower elevations as a function of frequency tuning bandwidth for our AES and A2 units, respectively. The lower elevations represented are -60 to 0°, the range in which differences between the two cortical areas were evident (Fig. 10). No correlation was evident between network performance, represented by the median errors, and frequency tuning bandwidth. Similarly, we measured the correlation between the low-frequency cutoffs of the tuning curves of individual A2 and AES units and their network performance at the lower elevations. We found a marginally significant correlation between the network output errors at low elevations and low-frequency cutoffs in the sample of A2 units (r = 0.24, P < 0.05) but not in the sample of AES units.

Relation between azimuth and elevation coding

For 175 units, responses to stimuli from both horizontal and vertical speakers were obtained. Across these 175 units, there was a significant positive correlation between network performance in azimuth and in elevation (Fig. 14). Each panel in Fig. 14 is a scatter plot of the median errors of the same units in encoding sound-source azimuth and elevation. AES units (n = 113) are presented in the top panels and A2 units (n = 62) in the bottom panels. Left panels plot data obtained at 20 dB above threshold; right panels, at 40 dB above threshold. Correlation coefficients (r) between median errors in azimuth and elevation ranged between 0.23 and 0.53, depending on cortical area and stimulus level. The correlation coefficients of the A2 units were larger than those of the AES units, especially at 40 dB above threshold. Among the units that coded elevation with median errors <= 40°, for example, the majority also showed median errors <= 40° in azimuth. The principal acoustic cues for localization in elevation differ from those for localization in azimuth. If neurons were sensitive only to a particular localization cue, no correlation, or perhaps a negative correlation, between network performance in the two dimensions would be expected. The fact that we observed positive correlations between the two dimensions indicates that many units can integrate information from multiple types of localization cues.
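The across-unit analysis amounts to a correlation of per-unit median errors; a minimal sketch follows, with hypothetical variable names.

```python
# Sketch of the across-unit correlation between azimuth and elevation
# coding accuracy (hypothetical variable names, not the authors' code).
import numpy as np
from scipy import stats

def azimuth_elevation_correlation(med_err_az_deg, med_err_el_deg):
    """Pearson r and p-value across units; 1 median error per unit."""
    r, p = stats.pearsonr(med_err_az_deg, med_err_el_deg)
    return r, p

def fraction_also_good_in_azimuth(med_err_az_deg, med_err_el_deg,
                                  cutoff_deg=40.0):
    """Of units accurate (<= cutoff) in elevation, the fraction that
    is also accurate in azimuth."""
    az = np.asarray(med_err_az_deg)
    el = np.asarray(med_err_el_deg)
    good_el = el <= cutoff_deg
    return float(np.mean(az[good_el] <= cutoff_deg))
```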


FIG. 14. Correlation between network performance in azimuth and elevation. Each symbol represents, for 1 unit, the median error of the network performance in elevation vs. that in azimuth. There is a positive correlation between network performance in both dimensions. ○, area AES units; ●, area A2 units. Left: data at a stimulus level 20 dB above threshold. Right: data at a stimulus level 40 dB above threshold.

    DISCUSSION

Results presented in the companion paper (Middlebrooks et al. 1998) support the hypothesis that sound-source azimuth is represented in the auditory cortex by a distributed code. In that code, responses of individual neurons carry information about 360° of azimuth, and the information about any particular sound-source location is distributed among units throughout entire cortical areas. The present study extends that observation to the dimension of sound-source elevation. The acoustical cues for sound-source elevation differ from those for azimuth, and identification of source azimuth and elevation presumably requires distinct neural mechanisms. The observation that units in areas AES and A2 show similar coding for azimuth and elevation supports the hypothesis that neurons integrate the multiple cues that signal the location of a sound source rather than merely coding a particular acoustical parameter that happens to covary with sound-source location. In this DISCUSSION, we consider the acoustical cues that could underlie the elevation sensitivity that we observed, we evaluate the similarities and differences between areas AES and A2 in regard to elevation and frequency sensitivity, and we comment on the significance of the correlation between azimuth and elevation coding accuracy.

Acoustical cues and localization in the median plane

Acoustical measurements of directional transfer functions in the ear canal and behavioral studies have provided insights into the acoustical cues for sound localization in the vertical dimension. Because of the approximate left-right symmetry of the head and ears, a stimulus presented in the median plane reaches both ears simultaneously and at equal levels. Interaural time differences and interaural level differences, which are important for localization in the horizontal plane, may contribute little if anything to localization in the median plane (Middlebrooks and Green 1991; Middlebrooks et al. 1989).

Sound pressure level, on the other hand, can be a cue for vertical localization if the source level is known and constant. The SPL in the ear canal varies with sound-source elevation. Earlier recordings in cats have shown that, within the range of -60 to +90° elevation, SPL varies by a few dB for lower frequency tones and by as much as 20 dB for high-frequency tones (Middlebrooks and Pettigrew 1981; Musicant et al. 1990; Phillips et al. 1982). In the present study, the directional transfer function at the entrance of the external ear canal of cats was recorded over the range of elevation from -60 to 200°. Instead of examining each individual frequency, we plotted the SPL profiles in three frequency bands (Fig. 11A). The high-frequency band (15-30 kHz) had the largest variation in SPL; the entire ranges of the sound level profiles for the low-, mid-, and high-frequency bands were 11.9, 17.8, and 29.2 dB, respectively. To test the degree to which SPL cues might have contributed to our physiological results, we compared the elevation sensitivity of unit responses with the elevation sensitivity of ear-canal SPLs. There were two indications that SPL cues are not the principal cues for the elevation sensitivity that we observed. First, we observed many instances in which sound sources at two locations produced roughly the same SPL in the ear canal yet produced unit responses that could be readily distinguished by an artificial neural network. Second, under conditions in which we roved stimulus SPLs over a range of 20 dB, a sound source at a single location produced SPLs ranging over 20 dB yet produced unit responses containing SPL-invariant features that resulted in roughly equal neural-network estimates of elevation. Although SPL cues might contribute to elevation sensitivity under certain conditions in which sound-source SPLs are constant, these two observations indicate that SPL cues alone could not have accounted for the neuronal elevation sensitivity that we observed.

A body of evidence suggests that spectral-shape cues are the principal cues for localization in the vertical dimension. Measurement of the directional transfer functions of human ears (Middlebrooks et al. 1989; Shaw 1974; Wightman and Kistler 1989) and of cat ears (Musicant et al. 1990; Rice et al. 1992) has shown that spectral shape features vary systematically with sound-source elevation. The most conspicuous features of the transfer functions of a cat ear are probably the spectral notches. The center frequencies of the spectral notches (5-18 kHz in cat) increase as sound-source elevation changes from low to high (Musicant et al. 1990; Rice et al. 1992). Recent behavioral studies in cats have provided evidence that midfrequency spectral-shape cues are important for vertical localization (Huang and May 1996a,b; May and Huang 1996). A recent report has demonstrated that at least some elevation-sensitive units in the medial geniculate body lose that sensitivity when tested with tonal stimuli, also suggesting a spectral basis for elevation sensitivity (Imig et al. 1997). We do not yet have any direct evidence that the elevation sensitivity that we observed was due to sensitivity to spectral-shape cues. Having ruled out SPL cues, however, we regard sensitivity to spectral-shape cues as the most likely explanation for the elevation sensitivity that we see.

A2 versus AES: elevation sensitivity and frequency tuning properties

Our initial data from area AES showed larger errors at frontal locations below the horizon than at higher elevations and in the rear. We explored auditory area A2 to test whether coding of low frontal elevations might be more accurate in another cortical area. Averaged across all elevations, the accuracy of elevation coding for units from areas A2 and AES was not significantly different. Nevertheless, differences between the cortical areas were found in the errors at low frontal and rear locations (i.e., -60 to 0° and +120 to +200°). For both cortical areas, errors of the network output at lower elevations and rear locations were much larger than those at other locations. These large errors almost always were caused by underestimation of the targets. These undershoots might be due to an edge effect of the neural network analysis; that is, the network would tend not to give mean outputs at locations beyond the limits of the training set. However, the edge effect could not explain why the accuracy of network output in various elevation ranges differed between the two cortical areas.

Because spectral-shape cues are important for localization in the vertical plane, it is conceivable that differences in the frequency tuning of neurons in areas AES and A2 might account for differences in elevation sensitivity. Previous studies showed that broadly tuned neurons are found in both areas (Andersen et al. 1980; Clarey and Irvine 1986; Reale and Imig 1980; Schreiner and Cynader 1984). In area AES, neurons were shown to respond to ranges of frequency that most often were weighted toward high frequencies (Clarey and Irvine 1986). In area A2, a dorsoventral gradient of frequency tuning bandwidth was demonstrated, with the lowest Q10 values found in the most ventral parts of A2; frequency bands often extended to low frequencies (Schreiner and Cynader 1984). Of our 91 AES units and 82 A2 units, most showed stronger responses to higher frequency tones (>= 15 kHz) than to lower frequency tones (<15 kHz). Frequency tuning bandwidth was broader in our sample of A2 units than in the AES units, and tonal stimuli activated a larger fraction of the unit population in area A2 than in area AES, especially at lower frequencies (Figs. 12 and 13). One could postulate that the broad frequency tuning in area A2 makes A2 neurons more suitable than AES neurons for detecting the spectral shape cues that are important for elevation coding. However, our results were not conclusive in this regard. No correlation was found between frequency tuning bandwidth and the network output errors at the locations at which differences between A2 and AES neurons were evident (Fig. 13). Only a marginally significant correlation was found between the low-frequency cutoffs and network output errors at low elevations in the sample of A2 units. Perhaps the overall frequency tuning bandwidth of cortical neurons is not as important as the details of frequency response areas, which consist of excitatory and inhibitory regions, as suggested by data obtained from the medial geniculate body (Imig et al. 1997). Our limited data, as well as earlier studies of frequency tuning of A2 and AES neurons, have shown that some neurons in either cortical area have irregular frequency tuning curves in which two or more peaks are present (Clarey and Irvine 1986; Schreiner and Cynader 1984). Such irregular frequency tuning may produce spectral regions of inhibition and facilitation, which in turn may provide the basis for the neurons' directional sensitivity.

Correlation between azimuth and elevation coding

We find that, in general, cortical units in areas AES and A2 that exhibit the most accurate elevation coding also tend to show good azimuth sensitivity. The psychophysical literature supports the view that azimuth sensitivity derives primarily from interaural difference cues and that elevation sensitivity derives from spectral shape cues (Middlebrooks and Green 1991). We would like to conclude that single cortical neurons receive information both from brain systems that perform interaural comparisons and from those that analyze details of the spectra at each ear. An alternative interpretation, however, is that the units that we studied were not sensitive to interaural differences and that both the azimuth sensitivity and the elevation sensitivity that we observed were derived from spectral shape cues. Indeed, acoustical studies in cat and human indicate that the spectra measured at each ear vary conspicuously as a broadband sound source is varied in azimuth (Rice et al. 1992; Shaw 1974). Moreover, human patients who are chronically deaf in one ear can show reasonably accurate localization in azimuth, presumably by exploiting monaural spectral cues for azimuth (Slattery and Middlebrooks 1994).

These conflicting conclusions can be resolved only by future studies in which specific acoustical cues are controlled directly. At this time, however, at least two lines of evidence lead us to reject the view that the spatial sensitivity of the units that we studied is derived entirely from spectral shape cues. First, Imig and colleagues (1997) searched for units in the cat's medial geniculate body that showed azimuth sensitivity derived predominantly from monaural spectral cues. Only ~17% of units in the ventral nucleus (VN) and the lateral part of the posterior group (PO) showed azimuth sensitivity that persisted after the ipsilateral ear was plugged. That study is not directly relevant to the current one because VN and PO project most strongly to cortical area A1, not A2 or AES. Nevertheless, those results argue that, in at least two divisions of the auditory thalamus, only a small minority of units shows azimuth sensitivity that is dominated by monaural spectral cues. Second, studies in area A2 that used dichotic stimulation have shown that about a third of area A2 units show excitatory/inhibitory binaural interactions (Schreiner and Cynader 1984). That type of binaural interaction would necessarily result in sensitivity to interaural level differences. About 40% of units in area A2 and ~69% of units in area AES show excitatory/excitatory binaural interactions (Clarey and Irvine 1986; Schreiner and Cynader 1984), and excitatory/excitatory interactions also can result in sensitivity to interaural level differences (Wise and Irvine 1984). Even if we consider only the excitatory/inhibitory units in area A2, at least a third of our A2 sample should have consisted of units that were sensitive to interaural level differences. It would be difficult to argue that both the elevation and the azimuth sensitivity shown by units in areas AES and A2 are due primarily to spectral shape sensitivity.

Concluding remarks

In the study reported in the companion paper (Middlebrooks et al. 1998), we demonstrated that the responses of single units in areas AES and A2 can code sound-source location in the horizontal plane throughout 360° of azimuth. That result raised the question of whether units in those cortical areas integrate multiple acoustical cues for sound-source location or whether they simply code the value of a single acoustical parameter, such as interaural level difference, that covaries with azimuth. In the present study, we have found that the responses of units also can code the elevation of a sound source in the median plane, in which interaural difference cues presumably are negligible. Moreover, the units that show the best elevation coding accuracy also code azimuth well. These results do not constitute conclusive evidence of a direct role of these neurons in sound-localization behavior. However, they do support the hypothesis that single cortical neurons can combine information from multiple acoustical cues to identify the location of a sound source in azimuth and elevation.

    ACKNOWLEDGEMENTS

  We acknowledge the expert technical assistance of Z. Onsan. We thank Dr. D. M. Green for insightful comments on the manuscript.

  This research was supported by National Institute of Deafness and Other Communication Disorders Grant RO1 DC-00420.

    FOOTNOTES

   Present address of J. Middlebrooks: Kresge Hearing Research Institute, University of Michigan, 1301 E. Ann St., Ann Arbor, MI 48109-0506.

  Present address and address for reprint requests: L. Xu, Kresge Hearing Research Institute, University of Michigan, 1301 E. Ann St., Ann Arbor, MI 48109-0506.

  Received 20 June 1997; accepted in final form 8 April 1998.
