1Wellcome Department of Cognitive Neurology, Institute of Neurology, London WC1N 3BG, United Kingdom; 2Laboratoire de Psychologie Expérimentale, Unité Mixte de Recherche, 8581 Centre National de la Recherche Scientifique, Unité de Formation et de Recherche Institut de Psychologie, Université Paris V, 75006 Paris; 3Service Otorhinolaryngology, Hôpital Avicenne, 93009 Bobigny Cédex, France; and 4Johann Wolfgang Goethe University, 60590 Frankfurt, Germany
ABSTRACT
Giraud, Anne-Lise, Christian Lorenzi, John Ashburner, Jocelyne Wable, Ingrid Johnsrude, Richard Frackowiak, and Andreas Kleinschmidt. Representation of the Temporal Envelope of Sounds in the Human Brain. J. Neurophysiol. 84: 1588-1598, 2000. The cerebral representation of the temporal envelope of sounds was studied in five normal-hearing subjects using functional magnetic resonance imaging. The stimuli were white noise, sinusoidally amplitude-modulated at frequencies ranging from 4 to 256 Hz. This range includes low AM frequencies (up to 32 Hz) essential for the perception of the manner of articulation and syllabic rate, and high AM frequencies (above 64 Hz) essential for the perception of voicing and prosody. The right lower brainstem (superior olivary complex), the right inferior colliculus, the left medial geniculate body, Heschl's gyrus, the superior temporal gyrus, the superior temporal sulcus, and the inferior parietal lobule were specifically responsive to AM. Global tuning curves in these regions suggest that the human auditory system is organized as a hierarchical filter bank, each processing level responding preferentially to a given AM frequency: 256 Hz for the lower brainstem, 32-256 Hz for the inferior colliculus, 16 Hz for the medial geniculate body, 8 Hz for the primary auditory cortex, and 4-8 Hz for secondary regions. The time course of the hemodynamic responses showed sustained and transient components with reverse frequency-dependent patterns: the lower the AM frequency, the better the fit with a sustained response model; the higher the AM frequency, the better the fit with a transient response model. Using cortical maps of best modulation frequency, we demonstrate that the spatial representation of AM frequencies varies according to the response type. Sustained responses yield maps of low frequencies organized in large clusters. Transient responses yield maps of high frequencies represented by a mosaic of small clusters.
Very few voxels were tuned to intermediate frequencies (32-64 Hz). We did not find spatial gradients of AM frequencies associated with any response type. Our results suggest that two frequency ranges (up to 16 Hz, and 128 Hz and above) are represented in the cortex by different response types. However, the spatial segregation of these two ranges is not systematic. Most cortical regions were tuned to low frequencies and only a few to high frequencies. Yet voxels that showed a preference for low frequencies were also responsive to high frequencies. Overall, our study shows that the temporal envelope of sounds is processed by both distinct (hierarchically organized series of filters) and shared (high and low AM frequencies eliciting different responses at the same cortical locus) neural substrates. This layout suggests that the human auditory system is organized in a parallel fashion that allows a degree of separate routing for groups of AM frequencies conveying different information and preserves the possibility of integrating complementary features in cortical auditory regions.
INTRODUCTION
Continuous speech shows pronounced low-frequency AM in its temporal envelope, with the most prominent modulation frequencies near the average syllabic rate of 3-4 Hz (Houtgast and Steeneken 1985). A number of studies conducted with normal-hearing listeners, listeners with sensorineural hearing loss, and cochlear implantees have shown that modulation frequencies below about 50 Hz are both necessary (Drullman et al. 1994a,b; Duquesnoy and Plomp 1980; Houtgast and Steeneken 1985) and almost sufficient (Hochmair and Hochmair-Desoyer 1984; Shannon et al. 1995; Van Tasell et al. 1987) for accurate speech recognition in silence and in noise. Various psychophysical methods have been developed to assess human sensitivity to AM. A common approach is to measure the auditory temporal modulation transfer function (TMTF), that is, the listener's threshold for detecting a sinusoidal AM applied to a noise carrier as a function of modulation frequency (Bacon and Viemeister 1985; Viemeister 1979). In such a task, detection is based only on temporal envelope cues, since modulating white noise does not alter its long-term magnitude spectrum. These measurements show that TMTFs obtained with normal-hearing listeners are typically low-pass in shape, with sensitivity preserved up to about 16-50 Hz.
A classical model used to account for human sensitivity to AM assumes that the temporal envelope of stimuli is smoothed by a single low-pass filter (or temporal integrator) operating at a postcochlear level (Strickland and Viemeister 1996; Viemeister 1979). The similarity of the TMTFs measured in normal-hearing listeners and in listeners with cochlear damage (Bacon and Viemeister 1985; Moore et al. 1992) indicates that such a low-pass filter is located at a central rather than a peripheral level. Lesion data in humans support this hypothesis and further suggest that this low-pass filter could be situated cortically (Albert and Bear 1974; Auerbach et al. 1982; Chocholle et al. 1975; Efron et al. 1985; Lorenzi et al. 2000; Phillips and Farmer 1990; Praamstra et al. 1991; Robin et al. 1990; Tanaka et al. 1987; Yaqub et al. 1988).
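Here is what the single-filter account predicts. Suppose the envelope passes through a first-order low-pass filter with cutoff f_c. A modulation at rate f_m is then attenuated by |H(f_m)| = 1/sqrt(1 + (f_m/f_c)^2). The predicted detection threshold, expressed as 20 log10(m), therefore rises by 10 log10(1 + (f_m/f_c)^2) dB above its low-rate value. The Python sketch below computes this predicted low-pass TMTF. The 50-Hz cutoff and the -25-dB low-rate threshold are hypothetical illustration values, not parameters fitted in the studies cited.

```python
import numpy as np

def predicted_tmtf_db(fm_hz, cutoff_hz=50.0, low_rate_threshold_db=-25.0):
    """Predicted AM detection threshold, 20*log10(m) in dB, for a single
    first-order low-pass envelope filter (temporal integrator).

    cutoff_hz and low_rate_threshold_db are hypothetical illustration values.
    The filter attenuates modulation at rate fm by 1/sqrt(1 + (fm/fc)^2), so the
    external depth needed to reach a fixed internal depth grows by
    10*log10(1 + (fm/fc)^2) dB.
    """
    fm = np.asarray(fm_hz, dtype=float)
    return low_rate_threshold_db + 10.0 * np.log10(1.0 + (fm / cutoff_hz) ** 2)

if __name__ == "__main__":
    for fm in [4, 8, 16, 32, 64, 128, 256]:
        print(f"{fm:4d} Hz: {float(predicted_tmtf_db(fm)):6.1f} dB")
```

Below the cutoff, the predicted thresholds stay nearly flat. They rise by about 3 dB at f_c and approach 6 dB per doubling of rate well above it, which is the low-pass TMTF shape described above.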
The results of psychophysical adaptation and masking experiments performed in humans with sinusoidally amplitude-modulated noises and tones (Bacon and Grantham 1989; Houtgast 1989; Tansley and Suffield 1983; Yost et al. 1989) have, however, pointed to an alternative model (Dau et al. 1997; Hewitt and Meddis 1994; Langner 1992; Lorenzi et al. 1995). In this model, a bank of perceptual channels, each tuned to a different low AM frequency, decomposes the temporal envelope of sounds at a central level. This alternative hypothesis is supported by electrophysiological recordings performed in the cochlear nucleus (e.g., Frisina et al. 1990; Møller 1976), the inferior colliculus (e.g., Langner and Schreiner 1988; Rees and Møller 1987; Rees and Palmer 1989), and the auditory cortex (e.g., Eggermont 1994; Schreiner and Urbas 1986, 1988) of various mammals (e.g., guinea pig, gerbil, cat). These studies show that most neurons of the cochlear nucleus and inferior colliculus are selectively tuned to high AM frequencies, with best modulation frequencies ranging from about 50 to 500 Hz. In comparison, most neurons of the auditory cortical fields are selectively tuned to low AM frequencies, with best modulation frequencies ranging from about 3 to 30 Hz. Such a cascade of AM filters would allow the temporal envelope to be decomposed at both subcortical and cortical levels.
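The filter-bank idea can be illustrated in a few lines. The Python sketch below is a simplified illustration, not any of the published models (e.g., Dau et al. 1997). It extracts the Hilbert envelope of a sinusoidally amplitude-modulated noise and measures envelope power in octave-wide modulation channels centered on the AM frequencies used in this study. The channel with the most power then identifies the modulation rate. The ideal (brick-wall) channel shape, the octave bandwidth, and the numpy-only Hilbert transform are assumptions made for brevity.

```python
import numpy as np

CENTERS_HZ = [4, 8, 16, 32, 64, 128, 256]   # channel centers = AM rates used here

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with numpy's FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def modulation_channel_power(envelope, fs, centers_hz=CENTERS_HZ):
    """Envelope power in ideal octave-wide channels [fc/sqrt(2), fc*sqrt(2))."""
    env = envelope - envelope.mean()            # drop DC: only fluctuations count
    power = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    out = []
    for fc in centers_hz:
        in_band = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        out.append(power[in_band].sum())
    return np.array(out)

if __name__ == "__main__":
    fs = 44100
    t = np.arange(fs) / fs                      # 1 s of signal
    rng = np.random.default_rng(0)
    x = (1.0 + np.sin(2 * np.pi * 16 * t)) * rng.standard_normal(fs)
    p = modulation_channel_power(hilbert_envelope(x), fs)
    print("most active channel:", CENTERS_HZ[int(np.argmax(p))], "Hz")
```

A single low-pass integrator would yield one number per stimulus. The channel bank instead yields a profile across channels, and that profile is what the filter-bank hypothesis attributes to central processing.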
Recent data obtained with functional magnetic resonance imaging (f-MRI) in humans show that each level of the auditory system responds preferentially to a given stimulus repetition rate within a low frequency range of 3-35 Hz (Harms et al. 1998). The inferior colliculus responds best to 35 bursts/s, the medial geniculate body to 20 bursts/s, Heschl's gyrus to rates between 2 and 10 bursts/s, and the superior temporal gyrus to a rate of 2 bursts/s. Thus f-MRI data in humans and single-unit recordings in other mammals generally agree on two points. First, processing of AM frequencies in humans may be subserved by different subcortical and cortical regions, each tuned to a different AM frequency. Second, this AM frequency decreases from the brainstem to the cortex.
Finally, f-MRI data also suggest that response properties depend on stimulus repetition rate. Cortical regions show transient responses at high repetition rates (35 bursts/s) and sustained responses at low rates (2 bursts/s) (Harms et al. 1998). Thus coding by temporal response properties, rather than by topographical representation, may also enable the decomposition of the temporal envelope of sounds.
In summary, the analysis of modulated sounds may depend on different
parts of the brain, each tuned to a different AM frequency. It may also
depend on specialized cortical regions that contain arrays of neurons
tuned to different AM frequencies and/or show different response
properties. In other words, each level of the auditory pathway could be
considered as a filter in the AM domain with a best frequency that
decreases from the periphery to the cortex. Alternatively, cortical
regions could contain neuronal maps representing AM frequencies. In
such a case, cortical neurons would be considered as AM filters and the
array of neurons would behave collectively as a modulation filter bank
(Dau et al. 1997; Langner 1992).
Despite the correspondence between psychophysical and electrophysiological data, the controversy persists about the location and nature of the temporal processor: a single temporal integrator, a single modulation filter bank, or a cascade of subcortical and cortical modulation filters. Recently, a magnetoencephalography (MEG) study using complex tones demonstrated a topographic organization of the human auditory cortex for high AM frequencies (from 50 to 400 Hz) (Langner et al. 1997). This so-called "periodotopic" organization (Langner 1992) is consistent with the modulation filter bank model suggested by electrophysiological studies in animals and psychophysical studies in humans. However, it is not clear whether the described maps indicate an inter-regional gradient (distinct auditory regions tuned to different AM frequencies) or a topographical organization of AM frequencies limited to a particular auditory field. Since that study aimed to investigate the cortical representation of periodicity pitch, the AM frequencies used were higher than those known to generate the best cortical response (Schreiner and Urbas 1988). Moreover, these AM frequencies, in the range 50-400 Hz, are less crucial to speech processing than lower ones (4-16 Hz), as degradation of high AM frequencies does not affect speech recognition (e.g., Drullman et al. 1994a,b). Given the number of competing hypotheses, techniques, and stimuli, further investigation of temporal envelope coding in humans is warranted.

Our study investigated the cortical representation of the temporal envelope of sounds using f-MRI and a set of white noises sinusoidally modulated in amplitude. The modulation frequencies covered the range crucial for speech recognition (4-32 Hz) as well as higher frequencies (up to 256 Hz) known to be important for the perception of voicing and prosody (see Rosen 1992 for a review). These stimuli were used to 1) identify the cerebral structures contributing to human sensitivity to the temporal envelope of sounds, 2) analyze their functional organization, and 3) investigate their response properties.
METHODS
Stimuli
All white-noise stimuli were generated using a 16-bit D/A converter at a sampling frequency of 44.1 kHz. White noises were either unmodulated or sinusoidally modulated in amplitude at 4, 8, 16, 32, 64, 128, and 256 Hz, with a modulation depth of 100% (Fig. 1A). All stimuli were shaped by 25-ms raised-cosine onset and offset ramps and equated in energy. They were presented to the right ear at 75-80 dB sound pressure level (SPL) via a plastic tube plugged into the outer ear canal. Scanner noise during functional imaging was 75 dB SPL under the headphones (attenuation by headphones = 30 dB). We could not measure the scanner noise level in the outer ear canal after insertion of the earplug. However, the earplug was estimated to provide another 30 dB of attenuation, subject to inter-individual variations in earplug positioning. The scanner noise reaching the ear was thus about 45 dB SPL, and the signal-to-noise ratio was estimated to be about 30 dB. Preliminary testing ensured that all stimuli were clearly audible and presented at a comfortable level.
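Here is a minimal Python sketch of the stimulus construction described above. It generates Gaussian white noise sampled at 44.1 kHz and multiplies it by 1 + m sin(2πf_m t) with m = 1. It then applies 25-ms raised-cosine ramps, scales every stimulus to a common RMS so that all are equated in energy, and quantizes the result to 16 bits. The duration, target RMS, starting phase of the modulator, and function names are assumptions for illustration; the original stimulus software is not described.

```python
import numpy as np

FS = 44100                          # sampling frequency (Hz), as stated in the text
AM_RATES_HZ = [4, 8, 16, 32, 64, 128, 256]

def raised_cosine_window(n, fs=FS, ramp_s=0.025):
    """Unit window with raised-cosine onset and offset ramps (25 ms by default)."""
    n_ramp = int(round(ramp_s * fs))
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    w = np.ones(n)
    w[:n_ramp] = ramp
    w[n - n_ramp:] = ramp[::-1]
    return w

def am_noise(fm_hz, duration_s=1.0, depth=1.0, fs=FS, target_rms=0.1, seed=None):
    """White noise, sinusoidally amplitude-modulated at fm_hz (None = unmodulated),
    ramped and scaled to target_rms so that all stimuli carry the same energy."""
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    x = rng.standard_normal(n)
    if fm_hz is not None:
        x = x * (1.0 + depth * np.sin(2.0 * np.pi * fm_hz * t))
    x = x * raised_cosine_window(n, fs)
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

def to_int16(x):
    """Quantize to 16-bit integers (full scale = +/-1.0), clipping any overload."""
    return np.clip(np.round(x * 32767.0), -32768, 32767).astype(np.int16)

if __name__ == "__main__":
    stimuli = {fm: am_noise(fm, seed=i) for i, fm in enumerate([None] + AM_RATES_HZ)}
    for fm, x in stimuli.items():
        print(fm, round(float(np.sqrt(np.mean(x ** 2))), 4), to_int16(x).dtype)
```

Scaling by RMS after modulation is one way to equate energy. Without it, a 100%-modulated noise would carry 1.5 times the power of the unmodulated baseline.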
A series of measurements of the output sounds, taken at the output of the plastic tube inserted in the outer ear canal, confirmed the stimulus properties under experimental conditions. All stimuli reached the ear with a flat spectrum between 20 Hz and 10 kHz and an effective modulation depth of at least 50-60% (Fig. 2). Under scanning conditions, the modulation depth was thus well above the detection thresholds, which were around 10% for the highest modulation frequency.
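The effective modulation depth of a recorded stimulus can be estimated from its envelope. If the envelope of an N-sample recording is e(t) ≈ c[1 + m sin(2πf_m t)], its DFT has magnitude cN at DC and cmN/2 at f_m, so m ≈ 2|E(f_m)|/E(0). The Python sketch below applies this estimate to a synthetic noise of known depth. It is an assumed procedure for illustration, not the measurement method used for Fig. 2, and it requires a recording that spans a whole number of modulation cycles.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with numpy's FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def effective_depth(x, fm_hz, fs):
    """Estimate modulation depth m from the envelope's DFT: m = 2|E(fm)| / E(0).
    Assumes the recording spans an integer number of modulation cycles."""
    env = hilbert_envelope(x)
    spec = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - fm_hz)))     # DFT bin closest to fm
    return 2.0 * np.abs(spec[k]) / np.abs(spec[0])

if __name__ == "__main__":
    fs = 44100
    t = np.arange(fs) / fs                        # 1 s: whole number of cycles
    rng = np.random.default_rng(3)
    recorded = (1 + 0.55 * np.sin(2 * np.pi * 64 * t)) * rng.standard_normal(fs)
    m = effective_depth(recorded, 64, fs)
    print(f"m = {m:.2f} ({20 * np.log10(m):.1f} dB re 100%)")
```

A depth of 0.55 corresponds to about -5.2 dB re 100% modulation. That is roughly 15 dB above a 10% (-20 dB) detection threshold.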
f-MRI acquisition
Functional imaging was performed in five normal-hearing subjects (1 female and 4 males, mean age 29 yr) at 2 T (Siemens Vision MR scanner) using gradient-echo EPI (48 slices, intervolume time = 4 s). The voxel size was 3 mm isotropic. A session included 42 condition epochs of 28-s duration, alternating unmodulated and modulated noises (Fig. 1B). We used a sinusoidal acquisition sequence whose noise peaked at 833 Hz and, because of slice selection, had a periodicity of around 10 Hz in the temporal domain. This periodicity of the scanner noise could have interfered with the detection of the stimulus modulated at 8 Hz. However, in a pilot study performed in one subject, we measured the detection thresholds for all AM rates. The threshold for white noise modulated at 8 Hz fell within the range of the thresholds for the other modulation rates (<10%). The functional results further confirmed that the response at this particular frequency was not obscured.
Data analysis
Data were realigned, normalized, and spatially smoothed with a
Gaussian filter of 6 mm, using the SPM99 software
(www.fil.ion.ucl.ac.uk/spm). Subsequent analyses were performed
based on single-subject data. The statistical analysis, performed with
SPM99, employed a general linear model (Worsley and Friston
1995) with a design matrix that comprised event- and
epoch-related regressors. These regressors were obtained by convolving
series of delta functions with a set of temporal basis functions (as
detailed below for each type of analysis).
In a first step, we identified brain regions responding to modulated as
opposed to unmodulated sounds by applying an epoch-related analysis
(Friston et al. 1995), where an epoch was modeled by a
hemodynamically smoothed box-car function (28-s baseline followed by
28 s of modulated noise, etc.). As single-subject data had been
co-registered into a standard stereotactic space, we were able to
specify the regions that were significantly activated in every subject
using a conjunction analysis across subjects. We set the level of
significance at P = 0.05 (corrected for multiple comparisons on a large number of volume elements).
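The epoch-related model can be sketched as follows. A box-car alternating 28 s of baseline with 28 s of modulated noise is convolved with a hemodynamic response function and sampled every 4 s. It is then entered, together with a constant, into an ordinary-least-squares fit, from which a t statistic is computed for the AM regressor. This Python sketch is a simplified stand-in for SPM99, not its implementation. The double-gamma HRF only mimics the shape of SPM's canonical HRF, and temporal filtering, serial-correlation modeling, and the multiple-comparison correction are omitted.

```python
import numpy as np
from math import gamma

TR = 4.0          # s, inter-volume time
DT = 0.1          # s, fine time grid used for convolution
EPOCH_S = 28.0    # s, duration of each baseline or AM epoch

def double_gamma_hrf(dt=DT, length_s=32.0):
    """HRF shaped like SPM's canonical one: gamma(6) peak minus gamma(16)/6
    undershoot (unit scale), normalized to unit sum."""
    t = np.arange(0.0, length_s, dt)
    pdf = lambda a: t ** (a - 1) * np.exp(-t) / gamma(a)
    h = pdf(6.0) - pdf(16.0) / 6.0
    return h / h.sum()

def epoch_regressor(n_scans, tr=TR, dt=DT):
    """Box-car (baseline first, then modulated noise, alternating every 28 s)
    convolved with the HRF and sampled at the acquisition times."""
    t = np.arange(0.0, n_scans * tr, dt)
    box = ((t % (2 * EPOCH_S)) >= EPOCH_S).astype(float)
    bold = np.convolve(box, double_gamma_hrf(dt))[: len(t)]
    return bold[np.round(np.arange(n_scans) * tr / dt).astype(int)]

def glm_t(y, regressor):
    """OLS fit of y on [regressor, constant]; returns (beta, t statistic)."""
    X = np.column_stack([regressor, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    c = np.array([1.0, 0.0])                       # contrast: AM > baseline
    se = np.sqrt(sigma2 * (c @ np.linalg.inv(X.T @ X) @ c))
    return float(beta[0]), float(c @ beta / se)

if __name__ == "__main__":
    n_scans = int(42 * EPOCH_S / TR)               # 42 epochs of 28 s -> 294 scans
    x = epoch_regressor(n_scans)
    rng = np.random.default_rng(0)
    y = 100.0 + 2.0 * x + 0.5 * rng.standard_normal(n_scans)   # synthetic voxel
    print(glm_t(y, x))
```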
Further analyses were performed to determine the following: 1) the shape of the hemodynamic response to AM sounds and its frequency dependency (coding of AM frequency by response properties rather than by topographical organization); 2) whether there was a topographic organization of AM frequencies across brain regions responsive to AM (hypothesis of a cascade of discrete filters); and 3) whether there was a topographical organization of modulation frequencies within each responsive cortical region (modulation filter bank hypothesis).
1) We investigated frequency-dependent effects in the time course of the response to AM by analyzing poststimulus-onset responses. According to Harms and Melcher (1999), the shape of the f-MRI signal varies with stimulus repetition rate, the response decaying more strongly at higher rates. We confirmed this finding and then modeled two responses separately: a sustained response (epoch-related analysis) and a transient response related to the onset of the modulated noise (event-related analysis). An event was modeled by a linear combination of the hemodynamic response function and its temporal derivative (Friston et al. 1998). To ensure that transient effects were specific to AM sounds and not related to any change in sound quality, we excluded regions that also responded to the onset of the unmodulated noise (baseline). Regions exhibiting transient or sustained responses to any AM frequency were selected using individual inclusive masks of regions responding to any modulation rate against baseline (P = 0.0005, uncorrected for multiple comparisons). Within these regions, we tested for preferential responsiveness to low (4, 8, 16 Hz) and high (64, 128, 256 Hz) AM frequencies and for high/low AM-frequency interactions. Fitted responses, corresponding to the f-MRI signal convolved with the response model, were used to assess frequency-dependency effects at a given location.
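The two response models can be compared on synthetic time series. In the Python sketch below, the sustained model is the HRF-convolved box-car. The transient model places a single event at each onset of modulated noise, represented by the HRF and its temporal derivative. The proportion of variance explained (R²) then shows which model better fits a sustained response and which better fits a rapidly decaying one. The double-gamma HRF, the 3-s decay constant of the synthetic transient response, and the noise level are assumptions for illustration only.

```python
import numpy as np
from math import gamma

TR, DT, EPOCH_S = 4.0, 0.1, 28.0   # scan interval, fine grid, epoch length (s)

def hrf(dt=DT, length_s=32.0):
    """Double-gamma HRF (SPM-like shape, an approximation), unit sum."""
    t = np.arange(0.0, length_s, dt)
    pdf = lambda a: t ** (a - 1) * np.exp(-t) / gamma(a)
    h = pdf(6.0) - pdf(16.0) / 6.0
    return h / h.sum()

def to_scans(neural, kernel, n_scans):
    """Convolve a fine-grid time course with a kernel and sample it every TR."""
    bold = np.convolve(neural, kernel)[: len(neural)]
    return bold[np.round(np.arange(n_scans) * TR / DT).astype(int)]

def fine_grid(n_scans):
    """Fine time axis, AM box-car (baseline first), and time since AM onset."""
    t = np.arange(0.0, n_scans * TR, DT)
    phase = t % (2 * EPOCH_S)
    box = (phase >= EPOCH_S).astype(float)
    since_onset = np.clip(phase - EPOCH_S, 0.0, None)
    return t, box, since_onset

def model_regressors(n_scans):
    """Sustained (epoch) regressor and transient (event: HRF + derivative) pair."""
    _, box, _ = fine_grid(n_scans)
    onsets = np.zeros_like(box)
    onsets[np.flatnonzero(np.diff(box, prepend=0.0) > 0)] = 1.0
    h = hrf()
    dh = np.gradient(h, DT)                       # temporal derivative of the HRF
    sustained = to_scans(box, h, n_scans)[:, None]
    transient = np.column_stack([to_scans(onsets, h, n_scans),
                                 to_scans(onsets, dh, n_scans)])
    return sustained, transient

def r_squared(y, regressors):
    """Proportion of variance explained by an OLS fit with a constant term."""
    X = np.column_stack([regressors, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))

if __name__ == "__main__":
    n = 294
    _, box, since = fine_grid(n)
    rng = np.random.default_rng(1)
    y_sus = to_scans(box, hrf(), n) + 0.02 * rng.standard_normal(n)
    y_tr = to_scans(box * np.exp(-since / 3.0), hrf(), n) + 0.02 * rng.standard_normal(n)
    S, T = model_regressors(n)
    for name, y in [("sustained data", y_sus), ("transient data", y_tr)]:
        print(name, "R2 epoch =", round(r_squared(y, S), 3),
              "R2 event =", round(r_squared(y, T), 3))
```

The derivative term lets the event model absorb small shifts in response latency. This is why it can fit an onset response that is delayed and smoothed by a few seconds of neural decay.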
2) As we found activations over a large extent of the auditory cortex, we assumed that they covered different auditory regions. An inter-regional gradient was assessed by calculating best modulation frequencies (BMFs) in regions of interest. Because regions of interest are difficult to define from anatomical landmarks with MRI (Jäncke et al. 1999), we used spheres (diameter = 1 cm) centered on the peaks of response observed in a group analysis. The BMF of a region was obtained from the averaged tuning curves of all voxels contained in that region. We considered only left-hemisphere regions because we used right monaural stimulation. Spheres of interest were located in the lower brainstem (superior olivary complex), the inferior colliculus, the medial geniculate body, Heschl's gyrus, the superior temporal sulcus, the superior temporal gyrus, and the supramarginal gyrus/inferior parietal lobule (coordinates given in Table 3). In all these regions, we also assessed the response type (decaying or sustained). We used a mixed model that modeled an epoch of the duration of the stimulus but additionally allowed an exponential decay to occur during that epoch. More precisely, the box-car function was linearly combined with an exponential function of peri-onset time to model within-epoch adaptation. Both were convolved with a hemodynamic response function (Friston et al. 1995). Responses were classified as sustained when the response decreased by <25% during the epoch and as decaying when it decreased by 25% or more.
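The mixed model and the 25% criterion can be sketched as follows. A box-car and an exponentially decaying function of peri-onset time are each convolved with the HRF and fitted jointly. From the fitted weights b_box and b_decay, the modeled neural response falls from b_box + b_decay at onset to b_box + b_decay·e^(-28/τ) at the end of the epoch. The relative decrease is then compared with 25%. The decay time constant τ = 8 s, the double-gamma HRF, and the function names are assumptions; the text does not state the time constant used.

```python
import numpy as np
from math import gamma

TR, DT, EPOCH_S = 4.0, 0.1, 28.0
TAU_S = 8.0            # hypothetical within-epoch decay constant (s)

def hrf(dt=DT, length_s=32.0):
    """Double-gamma HRF with an SPM-like shape (approximation), unit sum."""
    t = np.arange(0.0, length_s, dt)
    pdf = lambda a: t ** (a - 1) * np.exp(-t) / gamma(a)
    h = pdf(6.0) - pdf(16.0) / 6.0
    return h / h.sum()

def mixed_design(n_scans, tau_s=TAU_S):
    """Columns: HRF-convolved box-car, HRF-convolved decaying box-car, constant."""
    t = np.arange(0.0, n_scans * TR, DT)
    phase = t % (2 * EPOCH_S)
    box = (phase >= EPOCH_S).astype(float)
    decay = box * np.exp(-np.clip(phase - EPOCH_S, 0.0, None) / tau_s)
    idx = np.round(np.arange(n_scans) * TR / DT).astype(int)
    h = hrf()
    conv = lambda s: np.convolve(s, h)[: len(t)][idx]
    return np.column_stack([conv(box), conv(decay), np.ones(n_scans)])

def classify_response(y, tau_s=TAU_S, criterion=0.25):
    """Fit the mixed model and label the response 'sustained' (<25% decrease
    over the epoch) or 'decaying' (>=25%); returns (label, fractional decrease)."""
    X = mixed_design(len(y), tau_s)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    b_box, b_decay = beta[0], beta[1]
    start = b_box + b_decay                               # neural level at onset
    end = b_box + b_decay * np.exp(-EPOCH_S / tau_s)      # level at end of epoch
    if start <= 0:
        return "no positive response", float("nan")
    decrease = (start - end) / start
    return ("sustained" if decrease < criterion else "decaying"), float(decrease)

if __name__ == "__main__":
    n = 294
    X = mixed_design(n)
    rng = np.random.default_rng(2)
    for true_beta in ([1.0, 0.0, 100.0], [0.1, 1.0, 100.0]):
        y = X @ np.array(true_beta) + 0.02 * rng.standard_normal(n)
        print(true_beta, classify_response(y))
```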
3) Intra-regional gradients were assessed by calculating the best frequency in all voxels responsive to AM. Each voxel was assigned its best frequency. Maps of BMF were visualized using a color code (Fig. 7). Since we found different frequency-dependent effects according to the model used, we compared maps of the epoch-related analysis, the event-related analysis, and the combined model. The contour of the maps was defined for each analysis by the response to all AM frequencies in all subjects at P = 0.0005, uncorrected. This mask encompassed all regions sensitive to AM including Heschl's gyrus, the superior temporal gyrus, the superior temporal sulcus, and the inferior parietal lobule. This procedure was used to optimize our chances of detecting systematic spatial patterns. We also built up BMF maps using single-subject functional data to guide boundary definition. These individually shaped maps were used to count voxels across AM frequencies (Fig. 8).
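The voxel-wise and regional BMF computations can be sketched as follows. Each AM-responsive voxel is assigned the modulation frequency with the largest fitted response, and voxels are counted per frequency (as for Fig. 8). A regional BMF is read off the tuning curve averaged over a 1-cm-diameter sphere. In this Python sketch, several elements are assumptions for illustration: the input array of fitted response amplitudes, the voxel-grid sphere (3-mm voxels, radius 5 mm around a voxel index rather than stereotactic millimeter coordinates), and the function names.

```python
import numpy as np

AM_FREQS_HZ = np.array([4, 8, 16, 32, 64, 128, 256])

def bmf_map(responses, mask):
    """responses: (n_freqs, X, Y, Z) fitted response amplitudes, one volume per
    AM frequency; mask: boolean (X, Y, Z) of AM-responsive voxels.
    Returns each voxel's best modulation frequency in Hz (0 outside the mask)."""
    best = AM_FREQS_HZ[np.argmax(responses, axis=0)]
    return np.where(mask, best, 0)

def voxel_counts(bmf):
    """Number of voxels assigned to each AM frequency."""
    return {int(f): int(np.count_nonzero(bmf == f)) for f in AM_FREQS_HZ}

def sphere_mask(shape, center_vox, diameter_mm=10.0, voxel_mm=3.0):
    """Boolean mask of voxels whose centers lie within a sphere (default 1 cm)."""
    grid = np.indices(shape).reshape(3, -1).T
    dist_mm = np.linalg.norm((grid - np.asarray(center_vox)) * voxel_mm, axis=1)
    return (dist_mm <= diameter_mm / 2.0).reshape(shape)

def regional_bmf(responses, roi):
    """BMF of the tuning curve averaged over all voxels in the region."""
    tuning = responses[:, roi].mean(axis=1)
    return int(AM_FREQS_HZ[np.argmax(tuning)]), tuning

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (9, 9, 9)
    responses = 0.1 * rng.random((len(AM_FREQS_HZ),) + shape)
    responses[1] += 1.0                          # synthetic preference for 8 Hz
    bmf = bmf_map(responses, np.ones(shape, dtype=bool))
    print(voxel_counts(bmf))
    print(regional_bmf(responses, sphere_mask(shape, (4, 4, 4)))[0])
```

With 3-mm voxels, a 1-cm sphere contains only the center voxel plus its 6 face and 12 edge neighbors (19 voxels). This is why the regional average can hide the voxel-wise heterogeneity examined in the maps.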
RESULTS
Brain regions sensitive to AM
In all subjects, AM sounds activated the posterior regions of the superior temporal sulcus (BA 22) and the superior temporal gyrus (BA 42), Heschl's gyrus (BA 41), and a region of the inferior parietal lobule (BA 40) (Table 1 and Fig. 3). Pooling data across subjects, and thus enhancing sensitivity, we also found subcortical activations in the right lower brainstem (superior olivary complex), the right inferior colliculus, and the left medial geniculate body, as shown in Fig. 6. These activations are consistent with monaural stimulation of the right ear. They did not reach significance at P = 0.05, corrected, in every single-subject data set and were therefore not detected by the conjunction analysis.
Time course and response types
Figure 4 shows the time course of the MRI signal change in response to four AM frequencies in Heschl's gyrus of three of the subjects. AM frequency affected both the amplitude and the shape of these responses. The fitted response (solid line) indicated a significant effect of AM up to 16 Hz. The raw response (dotted line) showed a transient effect following the onset of the AM stimulus that was preserved at high AM frequencies. This observation motivated the event-related analysis, which sought brain regions showing such transient responses at the onset of modulated noise. The event-related analysis revealed the same set of regions as the epoch-related analysis but a different spatial distribution of BMFs.
The epoch-related analysis did not optimally model the responses to AM frequencies above 32 Hz and therefore failed to show a significant effect at high AM frequencies. The response to high AM frequencies was better modeled by an event related to stimulus onset. Using such a transient response model, we found cortical regions specifically responsive to 64, 128, and 256 Hz. Figure 5 shows a complementary pattern for high and low AM frequencies, and Table 2 gives the statistical results of the high versus low frequency comparison in these regions.
In individual data, we examined the effect of AM frequency with both transient and sustained response models in voxels showing a large AM-frequency effect in the event-related analysis. We found an interaction between frequency and response type. In the same voxels, the epoch- and event-related analyses revealed reverse AM-frequency dependency patterns. In the epoch-related analysis, the response decreased as AM frequency increased; in the event-related analysis, it increased as AM frequency increased. In other words, opposite AM-frequency gradients were found in the same volume element, depending on the response type. Such an interaction is presented in Fig. 4 for one subject. Note in this figure that the response to the 32-Hz AM frequency was fitted by both models. To achieve a good fit at these intermediate frequencies, we used an analysis combining both response types.
Global tuning and response patterns
We used the combined model to assess the global tuning of the different regions responsive to AM. For all subjects, BMFs in these regions are presented in Table 3 along with the dominant response pattern. In the lower brainstem, at a location compatible with that of the superior olivary complex, the responses were mostly tuned to the two highest AM frequencies (128 and 256 Hz) and showed a decaying response. In the inferior colliculus, BMFs were equal to 128 and 256 Hz with a decaying response in four subjects, and to 32 Hz with a sustained response in one subject. In the medial geniculate body, BMFs ranged between 16 and 32 Hz with both response patterns. Heschl's gyrus was consistently tuned to 8 Hz with a sustained response pattern. In four subjects, the activated regions of the superior temporal sulcus and superior temporal gyrus were tuned to the two lowest AM frequencies of 4 and 8 Hz with sustained response patterns. The inferior parietal lobule/supramarginal region showed no consistent tuning across subjects but consistently responded with a transient pattern to the whole range of AM frequencies. In most brain regions, we observed consistent tuning across subjects. Overall, an AM frequency of 4 Hz always produced a sustained response, and AM frequencies of 128 and 256 Hz always produced a transient response. AM frequencies of 16 and 32 Hz showed sustained and decaying responses depending on region and subject.
Figure 6 shows fitted responses in voxels sampled from different brain regions in subject 3. Although regions of interest were globally tuned to a given AM frequency, individual voxels sampled in these regions could be tuned to other AM frequencies. These frequencies, however, were always in the same range as the dominant BMF.
Intra-regional spatial gradients
The analysis of the global regional tuning was based on the average of the tuning curves of voxels contained in a region of interest. This method was therefore blind to the tuning characteristics of individual voxels. Possible intra-regional spatial gradients of AM frequencies were studied with cortical maps in which each voxel was assigned its best AM frequency. Because our previous analyses had shown an interaction between AM frequency and response type, we addressed the issue of spatial segregation at the level of the three response models applied (epoch-related, event-related, and combined). Maps from the epoch-related analysis were expected to reveal patterns in the low AM-frequency domain and maps from the event-related analysis patterns in the high AM-frequency domain. The combined model was expected to be sensitive to the whole range of AM frequencies, including the intermediate AM frequencies that were poorly modeled by a purely sustained or a purely transient response. Maps from all single subjects are shown in Fig. 7 on horizontal sections. We analyzed horizontal, coronal, and sagittal sections but found no consistent spatial gradient across subjects in any of these maps. However, a high degree of spatial segregation of high and low AM frequencies, as demonstrated by the group event-related analysis, was confirmed in every single subject.
As expected, the epoch-related analysis yielded homogeneous maps in the low AM-frequency domain, reflecting the fact that most cortical regions responded in a sustained way. In contrast, the event-related analysis revealed a mosaic-like pattern comprising mostly high AM frequencies. The combined model showed a homogeneous pattern in which high and low AM frequencies were segregated into clusters rather than arranged along intra-regional periodotopic gradients. Epoch-related and combined models both yielded coherent patterns in the low AM-frequency domain in four subjects. This similarity suggests that the sustained response component to low AM frequencies predominated in the cortex. The same cortical regions nevertheless remained responsive to high AM frequencies, but with a different (decaying) response mode. In summary, the patchy event-related patterns can be superimposed onto the sustained response pattern. Together, they reveal co-localized response properties that differ in time course and AM-frequency range.
Although the combined analysis provided a better fit to the responses at intermediate AM frequencies, these frequencies were still under-represented in the maps. We counted the number of voxels responsive to each AM frequency (Fig. 8) in all individually defined regions sensitive to AM. This count revealed a consistent "notch" between low and high AM frequencies, ranging from 16 to 64 Hz depending on the subject. Since the notch was not situated close to the repetition rate of the scanner noise (10 Hz), it was probably not due to an interaction between the scanner noise and the stimuli. Intermediate AM frequencies thus appear to have a genuinely poorer cortical representation than higher and lower AM frequencies.
In summary, we did not find consistent periodotopic gradients across subjects. However, the combined model revealed clustered segregation, indicating that voxels tuned to a given frequency are gathered in large clusters rather than being randomly distributed.
DISCUSSION
Brain regions subserving temporal envelope processing
We investigated the cortical response to the temporal envelope of
sounds using AM noises with a flat spectrum contrasted against white
noises with the same spectrum but no AM. This design ensures that the
observed brain responses specifically reflect temporal processing. AM
frequencies ranging from 4 to 256 Hz activated essentially identical
cortical regions, although, when tested in isolation, only the lowest
frequencies (4-16 Hz) yielded a response associated with a probability
below P = 0.001 (corrected). In all subjects, we found
bilateral activations in Heschl's gyrus, with locations consistent
with that of the primary auditory cortex (Penhune et al.
1996). Stronger responses, however, were observed lateral and
posterior to the primary auditory cortex, in association auditory
regions situated in the superior temporal sulcus (BA 22, for the
largest response) and externally on the lateral surface of the superior
temporal gyrus in BA 42. The location of these regions, in terms of
gross functional neuroanatomy, is consistent with the location of
regions assigned to the analysis of the fine temporal structure of
sounds identified by Griffiths et al. (1998), with
positron emission tomography, using a parametric increase in temporal
regularity of "delay-and-add" noises (i.e., iterated rippled noises).
The activation obtained in all five subjects in the supramarginal gyrus/inferior parietal lobule is more difficult to relate to other functional imaging data in the auditory perception domain. The right homologue of this region is considered a specific substrate for sound movement processing (Griffiths et al. 1998). The left inferior parietal lobule is activated by moving sounds but is not critical for sound motion perception. It seems instead to be largely multimodal, since it is also recruited during visual tasks involving written material (Menard et al. 1996; Rumsey et al. 1997). The left inferior parietal/supramarginal region is mostly involved in visual tasks requiring phonological processing. Whatever the modality, an involvement of this region in phonological segmentation is not implausible but currently remains speculative.
Pooling our data across subjects, we also found activations at the level of the right superior olivary complex, the right inferior colliculus, and the left medial geniculate body. This indicates that subcortical stages of the auditory pathway already play a specific role in the processing of periodicity.
Spatial representation of AM frequencies
HIERARCHICAL FILTER BANK.
The response of cortical regions sensitive to AM was mostly low-pass
with a cutoff frequency ranging from 16 to 32 Hz. We found a bottom-up
inverse gradient of AM frequency, the superior olivary complex
responding best to high AM frequencies up to 256 Hz, the inferior
colliculus to AM frequencies ranging from 32 to 256 Hz, depending on
the subject. The medial geniculate body preferred AM frequencies around
16 Hz, Heschl's gyrus AM frequencies around 8 Hz, and regions lateral
and posterior to Heschl's gyrus the lowest AM frequencies (4-8 Hz).
These results are consistent with the observation of Harms et
al. (1998), using different presentation rates of noise bursts,
with the exception of the inferior colliculus which these authors found
tuned to much lower rates. However, such differences in BMF might be
related to differences in the stimulus type. In the auditory cortex of
the cat, for instance, AM sounds produced higher BMFs than periodic
click trains (Eggermont 1998).
Clustered segregation and selective processing of AM
We found distinct clusters of voxels responding selectively to specific AM frequencies over the whole range under study (4-256 Hz). We found no consistent periodotopic gradient across subjects, however, either within auditory fields (AI, for example) or across the whole set of cortical regions responsive to AM. The existence of clusters of voxels globally tuned to specific AM frequencies is consistent with the notion of modulation channels (or AM band-pass filters) proposed by Tansley and Suffield (1983), Houtgast (1989), Langner (1992), and Dau et al. (1997). However, the absence of a clear periodotopic gradient suggests that the organization of such modulation channels at the cortical level may be more complex than previously thought. We review possible explanations for this negative finding.
Together with electrophysiological studies showing differential tuning across cortical auditory fields (Schreiner and Urbas 1988), the predictions of a mathematical model recently proposed by May (1999) suggest that periodicity is mapped in the auditory cortex. This model consists of coupled excitatory and inhibitory cells. Under tonic periodic input, the system behaves as a damped harmonic oscillator whose resonance frequency increases with the gain of the feedback loop: as inhibition becomes stronger, the wavelength of the periodic stimulation to which the system responds maximally decreases. If the gain varies systematically across space, the model reproduces cortical tonotopy, with one best wavelength per location and one characteristic place per wavelength. In response to different stimulus rates, however, the model generates a complex map of periodicity: each rate yields resonances at multiple harmonic places, with one best resonance locus, and conversely, several AM frequencies resonate at each location. A further prediction is that spatial resolution should be much coarser for periodicity maps than for tonotopic maps, because complete spatial segregation of rates requires a larger field. Such a difference in spatial resolution might account for the discrepancy between the scale of tonotopic maps (about 1.5 cm in AI, for example) and the scale of the periodotopic map proposed by Langner et al. (1997), which spreads over 4-5 cm and thus over several distinct auditory fields.
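The resonance argument can be illustrated with a minimal linear sketch. It assumes a single excitatory unit E and a single inhibitory unit I with a common time constant tau and a symmetric coupling gain g (tau dE/dt = -E - gI + u, tau dI/dt = -I + gE). This two-unit loop, tau = 10 ms, and the function names are simplifying assumptions for illustration, not May's (1999) actual model. The loop's transfer function, E/u = (1 + s tau)/((1 + s tau)^2 + g^2), has complex poles at (-1 ± ig)/tau, i.e., a damped oscillation whose frequency grows with g, so the modulation frequency that evokes the largest response should rise as the feedback gain increases.

```python
import math


def e_response(freq_hz, g, tau=0.01):
    # Steady-state gain |E/u| of the toy loop
    #   tau dE/dt = -E - g*I + u,   tau dI/dt = -I + g*E
    # for sinusoidal input u at freq_hz (s*tau = j*2*pi*f*tau).
    st = 2j * math.pi * freq_hz * tau
    return abs((1 + st) / ((1 + st) ** 2 + g ** 2))


def resonance_frequency(g, tau=0.01, f_max=500.0, step=0.5):
    # Grid search for the modulation frequency giving the largest response.
    freqs = [step * k for k in range(1, int(f_max / step) + 1)]
    return max(freqs, key=lambda f: e_response(f, g, tau))


for gain in (2, 5, 10):
    print(gain, resonance_frequency(gain))
```

The printed peaks should move to higher modulation frequencies as the gain grows; the 0.5 Hz grid step limits the precision of each estimate.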
Although the model proposed by May (1999) predicts a coherent map, no clear periodotopic gradient emerges from its simulations. The very notion of a gradient is questionable, since the model predicts variations not only in the location but also in the amplitude of responses depending on AM frequency. This is not the case for tonotopic maps, which show few amplitude differences as a function of wavelength. In the periodotopic map, low AM frequencies are represented by larger responses than high frequencies. This is in accordance with our data, which show separate clusters for different groups of AM frequencies and larger responses for low than for high AM frequencies. Such inhomogeneity in the amplitude and spread of responses argues against a true gradient (spatial segregation) and instead suggests mapping by clustered segregation (spatial clustering), in agreement with our findings. To model their MEG data, Langner et al. (1997) assumed a single dipole per periodicity frequency (50, 100, 200, and 400 Hz). Although necessary and common for technical reasons, this approach inadvertently biases the analysis toward detecting a spatial segregation that may appear as a systematic gradient. Their finding therefore probably reflects an incomplete picture of periodotopy.
Representation of AM frequencies by response patterns
We have seen that, despite a degree of spatial segregation, several AM frequencies are represented at each locus, with a gradient in amplitude. Indeed, we found that most AM frequencies are represented in each volume element and that they are grouped according to the response type analyzed. In most brain regions, high and low AM frequencies gave rise to different response patterns: sustained responses for low AM frequencies and decaying responses for high AM frequencies. Intermediate frequencies (32 and 64 Hz) gave rise to a mixed pattern but failed to produce activations as large as those evoked by higher and lower AM frequencies, even when a mixed contribution from both response types was modeled.
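The comparison between sustained and transient response models can be sketched as follows. This is a minimal illustration assuming a block design in which the sustained model is a boxcar over the stimulation period and the transient model is an exponentially decaying onset response, each convolved with a gamma-shaped hemodynamic response function. The repetition time, block timing, decay constant, and HRF shape are all hypothetical choices, not the parameters of the present study. The goodness of fit of each single-regressor model (with intercept) is its squared correlation with the voxel time course.

```python
import math

TR = 1.0  # hypothetical repetition time in seconds


def hrf(t):
    # Gamma-shaped hemodynamic response (shape 6, scale 1 s), peaking at t = 5 s.
    return t ** 5 * math.exp(-t) / 120.0 if t > 0 else 0.0


def convolve(signal, kernel):
    # Causal discrete convolution, truncated to the length of the signal.
    return [sum(signal[j] * kernel[i - j] for j in range(i + 1))
            for i in range(len(signal))]


def make_regressors(n_scans=60, onset=10, offset=40, decay_s=3.0):
    # Returns (sustained, transient) predicted BOLD time courses.
    kernel = [hrf(k * TR) for k in range(n_scans)]
    block = range(onset, offset)
    sustained = [1.0 if k in block else 0.0 for k in range(n_scans)]
    transient = [math.exp(-(k - onset) * TR / decay_s) if k in block else 0.0
                 for k in range(n_scans)]
    return convolve(sustained, kernel), convolve(transient, kernel)


def r_squared(y, x):
    # Variance explained by one regressor plus intercept (= squared correlation).
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

For a given voxel, one would compute `r_squared(voxel, sustained)` and `r_squared(voxel, transient)` and attribute the response to whichever model fits better; a combined model would instead fit both regressors jointly.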
Electrophysiological evidence is consistent with the idea that distinct response patterns may encode different AM-frequency ranges. In the superior olivary complex of the rabbit, for instance, Kuwada and Batra (1999) characterized two categories of neurons, showing a sustained response during the stimulus and a transient response at stimulus offset, respectively. These so-called "sustained" (excitatory) and "off" (inhibitory) neurons show clearly distinct response patterns for high AM frequencies (>50 Hz). For low AM frequencies (<50 Hz), all neurons respond synchronously to the stimulus period, since the offset neurons discharge at each fall in amplitude. Thus, in the superior olivary complex, AM frequencies are encoded by two populations of cells whose response patterns are similar or distinct depending on AM frequency.
In the cortex of the cat, Schreiner and Urbas (1988) showed that neurons with high BMFs (around 100 Hz) and high characteristic (audio) frequencies (>10 kHz) were mainly clustered in the anterior auditory field, whereas neurons with lower characteristic frequencies generally had BMFs below 20 Hz. Eggermont (1998) showed that most neurons respond synchronously to the AM period up to an AM frequency of 16 Hz in the anterior auditory field and AII, and up to 32 Hz in AI. Eggermont reported, however, variable correlations between characteristic frequency and BMF and a general predominance of low-BMF neurons in all cortical fields. This is consistent with the BMF patterns we obtained with the combined model, which showed an overall predominance of low AM frequencies across the whole set of cortical regions responsive to AM sounds. Our results thus agree with these electrophysiological studies: we demonstrate not only a partial spatial segregation between high and low BMFs but also that high and low BMFs are represented by different response patterns at the same cortical sites.
The notion of a single locus encoding two ranges of modulation rates with two distinct response types is supported by electrophysiological recordings in awake monkeys. Steinschneider et al. (1998) observed that multi-unit activity evoked by click trains changed from a phase-locked response at stimulation rates up to 100 Hz to a transient response dominated by stimulus onset at higher rates. The sustained activity observed with f-MRI may thus reflect phase-locked periodic synaptic discharges, and transient f-MRI responses may correspond to a transient response at stimulus onset.
It is also possible that a combination of distinct neuron populations and distinct response types within the same neurons accounts for the two categories of responses observed with f-MRI. Bieser and Muller-Preuss (1996) distinguished neurons that were able to code the AM of sounds from neurons that discharged only at the beginning and end of stimulation. The neurons able to code AM displayed two response modes, phase-locking and spike-rate coding, to encode variations in modulation rate. It remains uncertain, however, whether spike-rate coding of AM would be better captured by a sustained, transient, or mixed f-MRI response model.
Although there is much evidence for a coupling between the hemodynamic response and synaptic activity, relating the response types we observed to the response properties of single neurons remains speculative, because the time scales of electrophysiological studies differ greatly from that of the present study (milliseconds vs. seconds). It is therefore unclear whether the two components of the f-MRI responses arise from distinct neuronal populations (sustained vs. transient neurons) or from a single neuronal population whose response pattern changes as a function of the AM-frequency range.
May (1999) proposed a model for periodotopic mapping that invites speculation about the segregation and integration mechanisms involved in coding high (50-500 Hz) and low (<50 Hz) AM frequencies, referred to by Rosen (1992) as periodicity and temporal envelope, respectively. In this model, spatially inhomogeneous neuronal properties (such as spatial gradients of feedback strength) are not required: a simple timing mechanism that periodically increases the responsiveness of cells is enough to separate AM frequencies spatially. According to electrophysiological studies (Eggermont 1998; Schreiner and Urbas 1988), most cortical neurons discharge periodically at slow rates (2-16 Hz) in a sustained fashion. These regular oscillations could impose a rhythm on the responsiveness of the system and give rise to a second periodotopic mapping mechanism, operating in another range of AM frequencies. Slow oscillations (2-10 Hz) could signal the temporal boundaries of syllables and trigger maximal responsiveness at the very beginning of the syllabic segment, when the cues for voicing detection (which rely on higher AM frequencies) are present. With such a mechanism, the system's ability to segregate high AM frequencies spatially, and thereby to categorize phonemes, would be maximal exactly when the essential features are available.
Studies relying on hemodynamic measures lack the temporal resolution to test this speculative mechanism. However, in temporally low-pass filtered responses such as those recorded in our study, such a mechanism would indeed appear as a mixed contribution of sustained and transient response components to the responses to low and high AM frequencies. Studies combining hemodynamic and electrophysiological recordings in humans therefore appear warranted to address this issue.
ACKNOWLEDGMENTS
We thank K. Friston for his contribution to the data analysis.
This work was supported by The European Commission and The Wellcome Trust. A.-L. Giraud is funded by Alexander von Humboldt Stiftung.
FOOTNOTES
* A.-L. Giraud and C. Lorenzi contributed equally to the study.
Present address and address for reprint requests: A.-L. Giraud, Physiologisches Institut III, Universitätsklinikum, Theodor-Stern-Kai 7, 60590 Frankfurt/M, Germany (E-mail: Giraud@em.uni-frankfurt.de).
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 28 September 1999; accepted in final form 25 May 2000.
REFERENCES