1 Institute of Experimental Audiology, University Clinic Münster, Münster, Germany and 2 Center for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, Cambridge, UK
Abstract |
---|
Introduction |
---|
Bilsen (Bilsen, 1966) and later Yost (Yost, 1996) have shown that it is possible to manipulate the temporal structure of a random noise on the millisecond timescale and increase the regularity of time intervals between local waveform peaks, thereby introducing a pitch into the perception of the sound without changing the energy or producing harmonically spaced peaks in the tonotopic distribution of the neural activity elicited by the sound. The current study shows that this regular-interval (RI) sound makes it possible to segregate the cortical response to the onset of sound energy from that associated with the processing of temporal regularity, and thus to isolate the source associated with the processing of pitch in auditory cortex.
Griffiths and colleagues have used RI sounds and functional brain imaging to confirm the common hypothesis that there is a hierarchy of pitch processing in the auditory pathway beginning in sub-cortical structures (Griffiths et al., 2001) and extending up through Heschl's gyrus out onto planum polare (PP) and planum temporale (PT) (Griffiths et al., 1998). In the most recent study (Patterson et al., 2002), they showed that the antero-lateral part of Heschl's gyrus is particularly sensitive to the contrast between RI sounds and noise, and they concluded that this region was concerned with the extraction of pitch information from representations created in sub-cortical structures. They also inverted the contrast to try to identify regions where noise produced more activation than tonal sounds and, intriguingly, found none whatsoever, anywhere in the auditory pathway. The importance of lateral Heschl's gyrus in pitch processing has also been emphasized by Gutschalk et al. (Gutschalk et al., 2002), who contrasted the MEG responses to regular and irregular click trains (CTs) with varying sound levels. They found a double dissociation involving a source in lateral Heschl's gyrus that was sensitive to CT regularity but not to CT level and a source in PT that was sensitive to CT level but not to CT regularity.
In previous studies with RI sounds, the different stimulus conditions were presented separately in discrete trials with silence between them; in this case, the MEG onset response is dominated by the N100m. This paper introduces a new paradigm, in which a continuous sound is constructed from a segment of noise and a segment of RI sound with the same energy and a very similar spectral profile. Perceptually, the sound comes on with a hiss characteristic of random noise and then it changes to a musical note with a distinct pitch and a timbre rather like a cracked bassoon. The effect of the manipulation is limited to the temporal microstructure of the sound; the neural tonotopic representation and its gross temporal structure are essentially unchanged. This is illustrated in Figure 1. Figure 1a shows the waveform of a noise that becomes a RI sound at 2000 ms and Figure 1b shows the simulated neural response to the stimulus at the output of the cochlea. Each horizontal line in Figure 1b shows the spike probability in an individual auditory nerve fiber as a function of time. The ordinate shows the fiber's best frequency. The transition from the noise to the RI sound is not accompanied by marked changes either in the waveform (Fig. 1a) or the neural response (Fig. 1b). In particular, the transition does not produce a discontinuity in the activity averaged over frequency (Fig. 1c), or activity averaged over time (Fig. 1d).
In the current experiments, the neuromagnetic response to the transition from a noise to a RI sound was measured as a function of stimulus parameters that control the pitch of the RI sound and its salience, to determine whether the amplitude and/or latency of the magnetic response reflect pitch and/or pitch strength, and whether the location of the source is the same as that of the N100m. The stimulus in each trial of these experiments consisted of two segments: a 2000 ms standard segment intended to produce an onset response, followed by a 1000 ms test segment intended to produce a change-of-information response. In the first two experiments, the standard was a random noise and the test stimulus was a RI sound. In the third experiment, the standard and test sounds were reversed, so the standard was a RI sound and the test was a noise.
The RI sounds were produced from a random noise by a delay-and-add process (Yost, 1996). Imagine a broadband noise with infinite duration. It is possible to impart a temporal regularity to the noise by delaying a copy of the noise by d ms, adding it back to the original, and repeating the process n times. The sound has a pitch (in kHz) corresponding to the reciprocal of the delay (in ms). Each cycle of the delay-and-add process is referred to as an iteration. Iteration increases the degree of regularity in the waveform by increasing the probability of time intervals at the delay, and so the number of iterations, n, determines the strength, or salience, of the pitch percept. When n is 2, the tonal component of the sound is weak compared to the noise component; when n is 8 or more, the tonal component dominates the perception.
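The delay-and-add process can be illustrated with a minimal Python sketch. It is not the authors' stimulus-generation code: the function names, the 10 kHz sample rate implied by the example, and the choice to feed the output of each iteration back into the next one are assumptions made for illustration, and the band-pass filtering applied to the actual stimuli is omitted.

```python
import random


def iterated_delay_and_add(n_samples, delay_samples, n_iterations, seed=0):
    """Build a regular-interval sound from Gaussian noise by delay-and-add.

    Assumption: each iteration delays the current signal and adds it back
    to itself, so regularity accumulates with the number of iterations.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iterations):
        # Delayed copy: shift right by delay_samples, padding with silence.
        delayed = [0.0] * delay_samples + signal[:n_samples - delay_samples]
        signal = [s + d for s, d in zip(signal, delayed)]
    # Normalize to unit RMS so that all conditions have the same energy.
    rms = (sum(s * s for s in signal) / n_samples) ** 0.5
    return [s / rms for s in signal]


def normalized_autocorrelation(signal, lag):
    """Autocorrelation at a positive lag, relative to the zero-lag value."""
    num = sum(a * b for a, b in zip(signal[lag:], signal[:-lag]))
    den = sum(a * a for a in signal)
    return num / den


def pitch_hz(delay_ms):
    """Pitch in Hz as the reciprocal of the delay given in ms."""
    return 1000.0 / delay_ms


# Hypothetical example: a 16 ms delay at an assumed 10 kHz sample rate.
delay = 160
for n in (0, 2, 8):
    sound = iterated_delay_and_add(20000, delay, n)
    print(n, round(normalized_autocorrelation(sound, delay), 2))
```

In this sketch, the autocorrelation at the delay grows with the number of iterations, which is one way to quantify the increasing regularity that the text associates with pitch strength.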
Materials and Methods |
---|
The sounds used in the current experiments were presented at 65 dB hearing level and they were filtered to remove energy below 0.8 kHz and above 3.2 kHz. The sounds were produced by a speaker (compressor driver type) outside the magnetically shielded measuring room and delivered to the listener's right ear via 6.3 m of plastic tubing with an inner diameter of 16 mm. The passband in the transfer function of the plastic tubes approximately corresponded to the passband of the stimuli (0.8–3.2 kHz). Each stimulus was presented 100 times during the course of the experiment and the order of the conditions was randomized. The inter-trial interval was 5 s. The standard and test sounds were gated on and off with 5 ms cosine-squared ramps. At the transition from standard to test sound, the ramps overlapped so that the envelope of the composite stimulus remained flat (see Fig. 1a). Eight listeners participated in the first two experiments, where the test stimulus was a RI sound and the standard was a random noise. Nine listeners participated in the third experiment, six of whom had participated in the first two experiments; in the third experiment, the standard was a RI sound and the test sound was a noise. All listeners had normal audiological status and no history of neurological disease. Informed consent was obtained from each listener and the experimental procedures were approved by the Ethics Commission of the University of Münster.
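The overlapping cosine-squared ramps at the transition from standard to test sound can be sketched as follows. This is an illustrative example only: the function names and the number of samples per ramp are assumptions, and the sketch shows only that the two gating functions sum to a constant, not what happens to the instantaneous power of the summed waveforms.

```python
import math


def offset_ramp(n):
    """Cosine-squared ramp falling from 1 to 0 over n samples."""
    return [math.cos(0.5 * math.pi * i / (n - 1)) ** 2 for i in range(n)]


def onset_ramp(n):
    """Complementary ramp rising from 0 to 1 over n samples."""
    return [math.sin(0.5 * math.pi * i / (n - 1)) ** 2 for i in range(n)]


# Hypothetical example: a 5 ms ramp at an assumed 10 kHz sample rate.
n = 50
gate_sum = [down + up for down, up in zip(offset_ramp(n), onset_ramp(n))]
print(min(gate_sum), max(gate_sum))
```

Because cos² x + sin² x = 1, the offset gate of the standard and the onset gate of the test sound add up to a constant when they overlap completely.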
Neuromagnetic Recordings
The magnetic fields were recorded over the listener's left hemisphere using a 37-channel first-order gradiometer system (Biomagnetic Technologies) in a magnetically shielded room. The data were acquired with a sampling rate of 297.6 Hz, filtered online between 0.1 and 100 Hz, and stored in 4 s stimulus-related epochs. The listeners were asked to stay awake and they were allowed to watch silent video films during the experiments.
Data Analysis
The 100 data epochs acquired for each stimulus condition were averaged and low-pass filtered at 20 Hz using a zero-phase-shift filter. Epochs with amplitudes larger than 3 pT were considered artifactual and rejected. The sources of the N100m and the pitch onset response (POR) were analyzed with a single fixed dipole model assuming a spherical volume conductor. The center of the volume was estimated by approximating the scalp underneath the measuring coils by a sphere. Dipole parameters were derived using a maximum likelihood estimation procedure (Lütkenhöner, 1998a,b; Lütkenhöner et al., 2003). The estimation of the time-invariant dipole parameters was restricted to a time window of 40 ms around the maximum in the root-mean-square (RMS) amplitude of the respective deflection. In order to analyze the N100m and P200m responses, the traces were baseline-corrected to the 100 ms period of silence just before stimulus onset. In the first and second experiments, the standard was always a noise, so the traces for all trials in each experiment were averaged, and the averaged traces analyzed to determine the location of the source. The baseline for the POR was the 100 ms segment of noise just before the transition to the RI sound. Sources were fitted separately for the POR in each stimulus condition, i.e. each combination of delay and number of iterations, because the latency of the POR depended on these parameters. Representative dipole parameters for the POR were produced by taking the median over the parameters for individual stimulus conditions. In one of the eight listeners who participated in the first two experiments, the signal-to-noise ratio of the responses was so low that many of the conditions did not yield a stable dipole solution, so this listener was excluded from further analysis.
Psychophysical Pitch-discrimination Experiment
A psychophysical pitch-discrimination experiment was performed to measure the time required to form a stable estimate of the pitch of RI sounds and compare it to the latency of the POR. Four listeners with no history of hearing impairment or neurological disease participated in this experiment. The experiment was carried out in a sound-insulated room. The stimuli were RI sounds with 16 iterations and varying delays, d. They were gated on and off with 2.5 ms cosine-squared ramps and presented binaurally to the listeners through headphones (AKG K 240 DF). The pitch-discrimination threshold (PDT) was measured as a function of the duration of the RI sounds, using an adaptive two-alternative, forced-choice procedure. In each trial, two RI sounds were presented, separated by a silent gap of 700 ms. The delays of the two RI sounds differed slightly and the listener had to indicate which of the two sounds had the higher pitch, that is, the shorter delay. The duration and the mean delay of the two RI sounds were fixed throughout each threshold run. The delay difference between the two RI sounds was decreased by a factor after three consecutive correct responses and increased by the same factor after each incorrect response, tracking the delay difference that yields 79% correct responses (Levitt, 1971). The factor was 1.5 up to the first reversal of the delay difference, 1.3 up to the second reversal, and 1.15 for the remaining reversals of the 10 that made up each threshold run. Each threshold estimate is the geometric mean of the delay differences at the last eight reversals. Three to five threshold estimates were gathered for each stimulus condition, that is, each combination of the mean delay and stimulus duration, and averaged.
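The adaptive rule described above can be sketched as a three-down, one-up staircase. The code below is a minimal illustration, not the software used in the experiment: the function names, the starting delay difference and the idealized simulated listener are assumptions, and the exact trial at which the step factor changes is one possible reading of the description.

```python
import math


def three_down_one_up_target():
    """Proportion correct tracked by a three-down, one-up rule.

    The track is stationary when three consecutive correct responses are
    as likely as not, i.e. p ** 3 = 0.5, which gives p of about 0.794.
    """
    return 0.5 ** (1.0 / 3.0)


def run_staircase(respond, start_diff, n_reversals=10, n_last=8):
    """Run one threshold track; respond(diff) returns True if correct."""
    diff = start_diff
    correct_run = 0
    direction = 0  # -1 while the difference is decreasing, +1 while increasing
    reversals = []
    while len(reversals) < n_reversals:
        if respond(diff):
            correct_run += 1
            if correct_run < 3:
                continue  # no change until three correct in a row
            correct_run = 0
            step = -1
        else:
            correct_run = 0
            step = 1
        if direction != 0 and step != direction:
            reversals.append(diff)  # the track changed direction here
        direction = step
        # Step factor: 1.5 before the first reversal, 1.3 before the
        # second, 1.15 afterwards.
        if len(reversals) < 1:
            factor = 1.5
        elif len(reversals) < 2:
            factor = 1.3
        else:
            factor = 1.15
        diff = diff / factor if step < 0 else diff * factor
    last = reversals[-n_last:]
    # Geometric mean of the delay differences at the last reversals.
    return math.exp(sum(math.log(r) for r in last) / len(last))


# Idealized listener (assumption): always correct above a delay
# difference of 1.0 (arbitrary units), always wrong below it.
estimate = run_staircase(lambda diff: diff >= 1.0, start_diff=8.0)
print(round(three_down_one_up_target(), 3), round(estimate, 2))
```

With a real listener, `respond` would present the two RI sounds and return whether the answer was correct; the deterministic listener here only serves to show that the track settles around the assumed threshold.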
All stimuli were presented with a constant overall energy; when the stimulus duration was 512 ms, the intensity level was ~59 dB SPL. The shortest and the longest stimulus durations tested were 16 and 1024 ms corresponding to intensity levels of 74 and 56 dB SPL, respectively.
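The relation between duration and level for stimuli of constant overall energy can be checked with a short calculation. The sketch below is illustrative; the function name and its default arguments are assumptions, with the 512 ms, 59 dB SPL condition taken as the reference point from the text.

```python
import math


def level_for_constant_energy(duration_ms, ref_duration_ms=512.0,
                              ref_level_db=59.0):
    """Intensity level (dB SPL) needed to keep the overall energy constant.

    Energy is intensity times duration, so halving the duration requires
    doubling the intensity, i.e. raising the level by about 3 dB.
    """
    return ref_level_db + 10.0 * math.log10(ref_duration_ms / duration_ms)


for duration in (16, 512, 1024):
    print(duration, round(level_for_constant_energy(duration), 1))
```

The values obtained for 16 and 1024 ms round to the 74 and 56 dB SPL quoted in the text.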
Results |
---|
The main result of this study is illustrated in Figure 3 for one representative listener. The left column shows the evoked magnetic fields at the onset of a noise (Fig. 3a) and a RI sound (Fig. 3c); the right column shows the response to the transition from one sound to the other at 2000 ms. The onset responses to the noise and RI sound (Fig. 3a,c) have essentially the same latency and amplitude, and the value of the latency is a little less than 100 ms (vertical, dash-dotted lines), indicating that these are classic N100m responses. The transition from noise to RI sound (Fig. 3b) produces an enhanced response, referred to as the POR, with a much longer latency (~150 ms). In contrast, the transition from RI sound to noise (Fig. 3d) produces no discernible response whatsoever, despite the fact that it produces a perceptual change that is just as salient as the transition from noise to RI sound.
The amplitude and latency of the POR varied with the pitch and pitch strength of the RI sound. In the first experiment, the delay, d, was fixed at 16 ms, corresponding to a pitch of 62.5 Hz, and the number of iterations, n, was varied from 2 to 32 in doublings; in the second experiment, n was fixed at 16 and d was varied from 4 to 64 ms in octave steps. For each stimulus condition in each experiment, a single equivalent dipole model was used to estimate the strength and location of the source of the magnetic field during the POR. Figure 4a,b shows the average dipole moments for seven listeners plotted as a function of time relative to sound onset. Figure 4a shows that the number of iterations, which determines the salience of the pitch, has a large effect on the amplitude of the POR. Figure 4b shows that the delay, which determines the pitch, affects both the amplitude and the latency of the POR. The condition labeled "noise" was a control, where the transition was from one sample of noise to another. As expected, this condition produced no discernible response.
The statistical significance of the effects of number of iterations and delay on the latency and amplitude of the POR was verified by submitting the individual latency and amplitude data to a one-way ANOVA with repeated measures. Scheffé's post hoc test showed that the significant (P < 0.0001) main effect of delay on the POR amplitude (Fig. 4f) was due to significant differences between the amplitudes for delays of 64 and 32 ms and those for 16, 8 and 4 ms (P = 0.0036). The differences within each of these two groups were not significant (P = 0.6478). An analysis of covariance applied to the POR latencies for different delays confirmed that there was a significant difference between the gradients of the latency-delay function (Fig. 4d) for delays below and above 16 ms (P < 0.0001).
In the third experiment, the standard segment of the stimulus was a RI sound with 16 iterations and a delay of 4, 8 or 16 ms, and the test sound was a random noise. None of the transitions from a RI sound to a noise produced a measurable transient response in any listener (see Fig. 3d).
The Location of the Source of the POR
The presence of a strong magnetic response to the transition from noise to tone, and the absence of a response to the transition from tone to noise, suggest that the N100m and the POR are independent neural responses generated by largely different neural populations. This conjecture was supported by the analysis of the locations of the equivalent current dipoles for the N100m and the POR. On average, the POR dipole was 12.4 mm more anterior, 6.0 mm more medial and 10.9 mm more inferior than the N100m dipole. The orientations of the POR and the N100m dipoles, on the other hand, were essentially equal.
Each of the three Cartesian coordinates of the individual dipole locations for the N100m and the POR was submitted to a one-way ANOVA with repeated measures. The analysis showed that the anterior and inferior shifts of the POR dipole relative to the N100m dipole (12.4 and 10.9 mm) were both highly significant (P < 0.0001 and P = 0.0031); the medial shift (6.0 mm) was also significant, albeit with a slightly larger value of P (P = 0.0171).
Figure 5 shows the proportion of the field explained by these current dipoles in two time ranges, one about the noise onset at 0 ms (left column), the other about the transition from noise to RI sound at 2000 ms (right column); the data are from one representative listener with an intermediate signal-to-noise ratio. The gray shading in Figure 5 shows the RMS amplitude of the measured field from all 37 gradiometer channels. The black shading shows the RMS amplitude of the deviation of the measured field from the field predicted by the current dipole; for convenience, the RMS deviation will be referred to as the residual field of the dipole. Figure 5b shows that the magnetic field (gray shading) in the time range associated with the POR, marked by vertical dashed lines, is much larger than the residual field of the POR dipole (black shading), indicating that the POR dipole produces a good fit to the field of the POR. Figure 5a shows that the same dipole does not provide a good fit to the N100m; between the vertical dashed lines, marking the time range for the N100m, the residual field is as large as the field itself. The situation is essentially reversed for the N100m dipole shown in the middle row of Figure 5; the N100m dipole produces a good fit in the time range of the N100m response (between the dashed lines in Fig. 5c), and a poor fit in the time range of the POR (between the dashed lines in Fig. 5d).
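The comparison between the measured field and the residual field of a dipole can be expressed as a short calculation at a single time sample. The sketch below is illustrative only: the function names are hypothetical, the predicted field is taken as given rather than computed from a dipole model, and the "explained fraction" formula is a common goodness-of-fit measure assumed here, not one stated in the text.

```python
import math


def rms(values):
    """Root-mean-square amplitude over channels."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def residual_field(measured, predicted):
    """RMS deviation of the measured field from the dipole's predicted field."""
    return rms([m - p for m, p in zip(measured, predicted)])


def explained_fraction(measured, predicted):
    """Fraction of the measured field power accounted for by the dipole."""
    residual_power = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    measured_power = sum(m * m for m in measured)
    return 1.0 - residual_power / measured_power


# Made-up two-channel example: a perfect fit and a dipole predicting no field.
measured = [3.0, 4.0]
print(residual_field(measured, [3.0, 4.0]), explained_fraction(measured, [3.0, 4.0]))
print(residual_field(measured, [0.0, 0.0]), explained_fraction(measured, [0.0, 0.0]))
```

A residual field that is small relative to the measured field corresponds to an explained fraction near 1, whereas a residual as large as the field itself corresponds to a fraction near 0.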
Lütkenhöner and Steinsträter (Lütkenhöner and Steinsträter, 1998) performed a high-precision measurement of the source locations of the N100m and P200m responses in a single listener, using sinusoids with varying frequencies as stimuli; the sources were then co-registered with a three-dimensional reconstruction of the listener's auditory cortex. Their results suggest that the N100m arises mainly from planum temporale, whereas the source of the P200m reflects activity centered on Heschl's gyrus, anterior and inferior to the source of N100m. In order to determine whether the same is true for the POR, additional measurements were obtained from a listener with a large signal-to-noise ratio. The standard segment of the stimulus was a noise and the test segment was a RI sound with an 8 ms delay and 16 iterations that produces a strong pitch and thus a large POR. Four separate measurement sessions were performed, during each of which the stimulus was presented 420 times. Figure 6 shows a three-dimensional reconstruction of the listener's left temporal lobe derived from magnetic resonance images. The vertical lines with red arrows show the equivalent current dipoles for the N100m from the four measurement sessions; the vertical lines with blue arrows show the comparable dipoles for the POR. Despite the variability, it is clear that the POR dipoles are anterior and inferior to the N100m dipoles. The location of the N100m dipole is consistent with Lütkenhöner and Steinsträter's assumption that the N100m receives major contributions from planum temporale. The location of the POR dipole appears to be on Heschl's gyrus in a position similar to the dipole location that Lütkenhöner and Steinsträter reported for the P200m.
The results from the previous sections indicate that the POR reflects the activity of those neural elements in auditory cortex that are involved in pitch processing. The function relating POR latency to the delay of the RI sound (Fig. 4d) shows that the neural elements at the source of the POR integrate pitch-related information over about four times the delay before generating a response. The functional imaging data of Griffiths et al. (Griffiths et al., 2001) show that the processing of temporal pitch information is organized hierarchically in the auditory system. In this section, we report a psychophysical experiment designed to measure the perceptual integration time for pitch, that is, the time required to form a stable pitch estimate. The purpose was to try to determine the point in the pitch hierarchy represented by the POR, by comparing the latency of the POR to the perceptual integration time for pitch.
In the experiment, listeners were required to indicate which of two RI sounds had the higher pitch, and PDT was defined to be the minimum difference in delay required for statistically reliable discrimination. For each of four different delays of the RI sounds, ranging from 4 to 32 ms in octave steps, the PDT was measured as a function of stimulus duration. The data are presented in Figure 7; the parameter is the delay. The figure shows that threshold decreases rapidly as duration increases from ~4 to 8 times the delay of the RI sound. When the sounds were shorter than four times the delay, it was not possible to measure a stable threshold. This suggests that the auditory system has to integrate over a duration of at least four times the delay to derive a rough estimate of the pitch for these sounds, a period that is comparable to the POR latency. At the same time, the auditory system appears to be able to integrate over a period of up to eight times the delay to attain a more precise pitch estimate. Beyond eight times the delay, the PDT asymptotes and the value of the asymptote is considerably lower for the 4, 8 and 16 ms delays than it is for the 32 ms delay. This is probably because a RI sound with a 32 ms delay does not produce a precise pitch when filtered as in the current experiment (Krumbholz et al., 2000).
Discussion |
---|
The comparison of the physiological and perceptual data suggests that the neural elements at the source of the POR are involved in extracting an initial estimate of the pitch of a sound. The latency of the POR corresponds to the time that is required to determine that the sound has a unique pitch. At the same time, the POR occurs prior to the time required to refine the pitch value to the point where it could be used for melodic pitch perception (Krumbholz et al., 2000; Pressnitzer et al., 2001). The POR seems to represent a source, or sources, on medial Heschl's gyrus, adjacent to a larger region in the antero-lateral half of Heschl's gyrus where functional imaging studies have shown that activation is highly correlated with the degree of regularity in RI sounds (Griffiths et al., 1998, 2001; Patterson et al., 2002). In addition, a recent MEG study (Gutschalk et al., 2002) with click trains has shown that regular click trains produce much more activity than irregular click trains with the same average click rate in medial Heschl's gyrus. With regard to the hierarchy of pitch processing, these findings support the hypothesis that pitch is extracted and refined in centers progressing laterally along Heschl's gyrus and on out into adjacent areas.
Notes |
---|
Address correspondence to Dr Katrin Krumbholz, IME, AG Kognitive Neurologie, Forschungszentrum Jülich, 52425 Jülich, Germany. Email: k.krumbholz{at}fz-juelich.de
References |
---|
Bilsen FA (1966) Repetition pitch: monaural interaction of a sound with the repetition of the same, but phase-shifted sound. Acustica 17:295–300.
Cornette L, Dupont P, Spileers W, Sunaert S, Michiels J, Van Hecke P, Mortelmans L, Orban GA (1998) Human cerebral activity evoked by motion reversal and motion onset. A PET study. Brain 121:143–157.
Crottaz-Herbette S, Ragot R (2000) Perception of complex sounds: N1 latency codes pitch and topography codes spectra. Clin Neurophysiol 111:1759–1766.
Forss N, Mäkelä JP, McEvoy L, Hari R (1993) Temporal integration and oscillatory responses of the human auditory cortex revealed by evoked magnetic fields to click trains. Hear Res 68:89–96.
Griffiths TD, Buchel C, Frackowiak RS, Patterson RD (1998) Analysis of temporal structure in sound by the human brain. Nat Neurosci 1:422–427.
Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O, Patterson RD (2001) Encoding of the temporal regularity of sound in the human brainstem. Nat Neurosci 4:633–637.
Gutschalk A, Patterson RD, Rupp A, Uppenkamp S, Scherg M (2002) Sustained magnetic fields reveal separate sites for sound level and temporal regularity in human auditory cortex. Neuroimage 15:207–216.
Krumbholz K, Patterson RD, Pressnitzer D (2000) The lower limit of pitch as determined by rate discrimination. J Acoust Soc Am 108:1170–1180.
Langner G, Sams M, Heil P, Schulze H (1997) Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. J Comp Physiol A 181:665–676.
Levitt H (1971) Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49:467–477.
Lütkenhöner B (1998a) Dipole source localization by means of maximum likelihood estimation: I. Theory and simulations. Electroencephalogr Clin Neurophysiol 106:314–321.
Lütkenhöner B (1998b) Dipole source localization by means of maximum likelihood estimation: II. Experimental evaluation. Electroencephalogr Clin Neurophysiol 106:322–329.
Lütkenhöner B, Steinsträter O (1998) High-precision neuromagnetic study of the functional organization of the human auditory cortex. Audiol Neurootol 3:191–213.
Lütkenhöner B, Lammertmann C, Knecht S (2001) Latency of auditory evoked field deflection N100m ruled by pitch or spectrum? Audiol Neurootol 6:263–278.
Lütkenhöner B, Krumbholz K, Lammertmann C, Seither-Preisler A, Steinsträter O, Patterson RD (2003) Localization of primary auditory cortex in humans by magnetoencephalography. Neuroimage 18:58–66.
Näätänen R, Picton T (1987) The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24:375–425.
Niedeggen M, Wist ER (1999) Characteristics of visual evoked potentials generated by motion coherence onset. Cogn Brain Res 8:95–105.
Pantev C, Hoke M, Lütkenhöner B, Lehnertz K (1989) Tonotopic organization of the auditory cortex: pitch versus frequency representation. Science 246:486–488.
Pantev C, Elbert T, Ross B, Eulitz C, Terhardt E (1996) Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res 100:164–170.
Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M (1998) Increased cortical representations of musicians. Nature 392:811–814.
Patterson RD, Allerhand M, Giguère C (1995) Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform. J Acoust Soc Am 98:1890–1894.
Patterson RD, Uppenkamp S, Johnsrude IS, Griffiths TD (2002) The processing of temporal pitch and melody information in auditory cortex. Neuron 36:767–776.
Pressnitzer D, Patterson RD, Krumbholz K (2001) The lower limit of melodic pitch. J Acoust Soc Am 109:2074–2084.
Roberts TPL, Ferrari P, Stufflebeam SM, Poeppel D (2000) Latency of the auditory evoked neuromagnetic field components: stimulus dependence and insights toward perception. J Clin Neurophysiol 17:114–129.
Seither-Preisler A, Krumbholz K, Lütkenhöner B (2002) MEG-correlates of pitch and spectrum in the auditory cortex. Proceedings of the 13th International Conference on Biomagnetism, Jena, Germany, pp. 122–124.
Yost WA (1996) Pitch strength of iterated rippled noise. J Acoust Soc Am 100:3329–3335.