1 Department of Neurology, Albert Einstein College of Medicine, Bronx, NY 10461, USA, 2 Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA and 3 Department of Surgery (Division of Neurosurgery), University of Iowa College of Medicine, Iowa City, IA 52242, USA
Abstract
Key Words: auditory evoked potentials; Heschl's gyrus; intracortical recording; population encoding; speech
Introduction
A temporal processing mechanism likely serves as the primary means by which voiced stop consonants are distinguished from unvoiced stops, despite modulation of voice onset time (VOT) perceptual boundaries by spectral, visual and language-related lexical or linguistic manipulations (Stevens and Klatt, 1974; Lisker, 1975; Repp, 1979; Ganong, 1980; Kluender et al., 1995; Shannon et al., 1995; Borsky et al., 1998; Faulkner and Rosen, 1999; Holt et al., 2001; Lotto and Kluender, 2002; Brancazio et al., 2003). This mechanism was first proposed by Pisoni (1977), who presented subjects with two-tone stimuli that varied in the relative onset timing of the two tones in a manner mimicking that of VOT (tone onset time, TOT). Subjects were measured in their ability to identify whether the tones were presented simultaneously or sequentially. Results paralleled those seen for speech; identification was categorical with a boundary at ~20 ms, and discrimination between stimuli showed a peak at the same value. These findings led Pisoni (1977) to propose that the differential perception of voiced from unvoiced stop consonants is based on whether consonant release and voicing onset are perceived as occurring simultaneously or sequentially. This speech-related example of temporal encoding was further suggested to represent a specific instance of a more general rule governing the ability to temporally order the sequence of two sounds (Hirsh, 1959).
Temporally precise speech-evoked responses in auditory cortex support the importance of temporal processing mechanisms for VOT perception. Studies in monkeys and other animals reveal a characteristic pattern of activity, wherein syllables with a short VOT evoke a single response burst time-locked to consonant release, while syllables with a longer VOT evoke response bursts time-locked to both consonant release and voicing onset (e.g. Steinschneider et al., 1994, 1995b, 2003; Eggermont, 1995a,b, 1999; McGee et al., 1996; Schreiner, 1998). Importantly, several of these studies have shown a marked increase in the response time-locked to voicing onset at VOT intervals that cross the boundary between human perception of voiced and unvoiced stop consonants. These animal model findings gain further relevance by their similarity to speech-evoked response patterns recorded directly from human auditory cortex (Liégeois-Chauvel et al., 1999; Steinschneider et al., 1999). Furthermore, this activity profile offers a plausible physiological mechanism supporting categorical discrimination between voiced and unvoiced stop consonants. Perception of voiced stops would be facilitated when only a single response burst in auditory cortex is evoked, as by short duration VOT syllables. In contrast, perception of unvoiced stops would be promoted when two response bursts are sequentially elicited, one by consonant release and the other by voicing onset, as seen with longer duration VOTs. The border between these two response patterns would approximate the perceptual boundary.
If this cortically based temporal processing mechanism for VOT discrimination is derived from a general capacity to temporally order the sequence of two sounds through time-locked responses, then a 20 ms physiological boundary paralleling the psychoacoustic findings of Pisoni (1977) should be present. However, studies examining responses to two-tone sequences in auditory cortex have failed to demonstrate this degree of physiological temporal acuity (Calford and Semple, 1995; Brosch and Schreiner, 1997, 2000; Horikawa et al., 1997). While methodological considerations, such as the use of anesthetized animals, may be in part responsible for this discrepancy, the fact remains that a fundamental prerequisite for this physiological temporal processing mechanism has not been met.
A second potential shortcoming of this processing scheme for VOT perception is that it does not account for significant boundary shifts that occur with changes in stop consonant place of articulation. Perceptual boundaries are shortest for the differential perception of the bilabial stop consonants /b/ and /p/ (~20 ms), intermediate for the alveolar stops /d/ and /t/ (~30 ms), and longest for the velar consonants /g/ and /k/ (~40 ms) (Lisker and Abramson, 1964). A major acoustic consequence of differences in place of articulation is that for any VOT value occurring prior to the attainment of steady-state vowel frequencies, the first formant (F1) frequency is highest for the bilabial stops, lowest for the velar consonants and intermediate for the alveolar stops (see Parker, 1988). Multiple studies have demonstrated an inverse relationship between F1 frequency and the VOT boundary, and have suggested that this trading relations effect between F1 frequency and VOT is the perceptual basis for the boundary shifts observed with changes in consonant place of articulation (Lisker, 1975; Summerfield and Haggard, 1977; Summerfield, 1982; Soli, 1983; Hillenbrand, 1984).
Placed in a temporal processing framework, these findings imply that the lower F1 frequencies seen for velar stops would require a longer VOT interval for the onsets of consonant release and voicing to be perceived as sequential, and therefore as an unvoiced consonant. The progressively higher F1 frequencies observed for the alveolar and bilabial stops would need correspondingly shorter VOT intervals to identify sequential onsets and the unvoiced character of the consonant. An auditory processing basis for the trading relations effect between spectral and temporal speech components gains additional support when considering VOT perception in animals. Animals demonstrate categorical-like perception with boundaries and boundary shifts due to changes in consonant place of articulation or F1 frequency similar to those in humans, and show heightened sensitivity to incremental changes in VOT at the boundary in a manner that mirrors human perception (Kuhl, 1986; Kluender, 1991; Kluender and Lotto, 1994; Ohlemiller et al., 1999). Since a language-specific mechanism cannot be invoked to explain perceptual phenomena in animal models, these findings indicate that at least some of the fundamental neural mechanisms responsible for VOT perception must be based on auditory system processing.
Thus, the goal of this study is to test the hypothesis that temporal response patterns elicited by syllables in auditory cortex are key elements for VOT perception. We test this hypothesis by examining whether perceptual boundaries are paralleled by neural patterns of activity using two related experiments. In the first experiment, we examine whether temporal responses reflecting syllable VOT are modulated by spectral components of speech in a manner that can account for the VOT boundary shifts that occur with changes in F1 frequency, and by extension, consonant place of articulation. For this experiment, we examine auditory evoked potentials (AEP) elicited by synthetic syllables with variable F1s recorded directly from auditory cortex in a patient undergoing surgical evaluation for medically intractable epilepsy. In the second experiment, we examine whether the physiological boundary for detecting the sequence of two acoustic elements parallels the psychoacoustic result of 20 ms. For this experiment, we examine responses evoked by two-tone complexes with variable TOTs in primary auditory cortex (A1) of the monkey.
Materials and Methods
One right-handed man with medically intractable epilepsy was studied. Experimental protocols were approved by the University of Iowa Human Subjects Review Board and National Institutes of Health, and informed consent was obtained from the patient prior to his study participation. The patient's seizures often began with the perception of a tuning fork sound, and non-invasive studies suggested an epileptic focus within or near auditory cortex of the right hemisphere. Multicontact intracranial electrodes and subdural grid electrodes were implanted for acquisition of diagnostic electroencephalographic data required to plan subsequent surgical treatment. Research recordings were performed in parallel with the diagnostic evaluation, did not disrupt acquisition of medically required information and did not add any additional health risks.
Experimental recordings were obtained from two stereotaxically-placed hybrid-depth electrodes that contained evenly spaced low-impedance recording sites with higher impedance contacts interspersed along the shaft (Howard et al., 1996a,b). The first electrode was located in the anterior portion of Heschl's gyrus, while the second was positioned at the junction of the posterior rim of Heschl's gyrus and the planum temporale (Fig. 1). Responses elicited by musical chords from these electrodes have been previously reported (subject 1; Fishman et al., 2001b). The reference electrode was a subdural recording contact located on the ventral surface of the ipsilateral, anterior temporal lobe. Recordings were performed with the subject lying comfortably awake in a quiet room of the Epilepsy Monitoring Unit of the University of Iowa Hospitals and Clinics. The subject could abort the experimental session at any time.
Stimuli were presented to the left ear (contralateral to the recording sites) via an insert earphone (Etymotic Research) at a comfortable suprathreshold listening level determined by the subject (70 dB SPL). Stimuli were synthetic syllables, 175 ms in duration, generated by a parallel/cascade Klatt synthesizer (SenSyn, Sensimetrics) at a sampling rate of 10 kHz. Frequency values appropriate for the perception of /d/ and /t/ were chosen. A schematic of the syllables is shown in Figure 2. Syllables contained three formants. The second formant (F2) had a starting frequency of 1600 Hz and linearly decreased to a steady-state value of 1200 Hz, while the third formant (F3) began at 3000 Hz and linearly decreased to 2500 Hz. Both formant transitions were 40 ms in duration. These formants were excited by a noise source simulating frication for the first 5 ms of the syllables. F1 lacked a transition and was centered at 424, 600 or 848 Hz (1/2 octave intervals). It began after frication and after a variable period of aspiration that preceded voicing. Each syllable was presented with seven VOT values ranging from 5 to 60 ms. The subject was asked whether he heard a /d/ or a /t/ after presentation of 50 repetitions of each syllable.
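The timing structure described above can be illustrated with a minimal sketch. This is emphatically not a Klatt synthesizer (no cascade resonators, no F2/F3 transitions); it is a hypothetical skeleton showing only the temporal layout of the stimuli: a 5 ms frication burst, aspiration noise until voicing onset at the VOT, then a voiced portion carrying the F1 sinusoid.

```python
import numpy as np

def syllable_skeleton(vot_ms, f1_hz, fs=10_000, dur_ms=175):
    """Schematic of the stimulus timing only (not a Klatt synthesizer):
    5 ms of frication noise, aspiration noise until voicing onset at
    `vot_ms`, then a voiced segment represented here by a bare F1 sinusoid.
    Amplitudes and the use of white noise are illustrative assumptions."""
    n = int(fs * dur_ms / 1000)
    t = np.arange(n) / fs
    rng = np.random.default_rng(0)          # fixed seed for reproducibility
    sig = np.zeros(n)
    n_fric = int(fs * 0.005)                # 5 ms frication
    n_vot = int(fs * vot_ms / 1000)         # voicing onset sample
    sig[:n_fric] = rng.standard_normal(n_fric)                     # frication burst
    sig[n_fric:n_vot] = 0.3 * rng.standard_normal(n_vot - n_fric)  # aspiration
    sig[n_vot:] = np.sin(2 * np.pi * f1_hz * t[n_vot:])            # voiced F1
    return sig

stim = syllable_skeleton(vot_ms=25, f1_hz=600)
print(stim.shape)  # (1750,)
```

Sweeping `vot_ms` over the seven values (5-60 ms) and `f1_hz` over 424, 600 and 848 Hz reproduces the stimulus grid of this experiment, at the level of timing only.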
Five male macaque monkeys (Macaca fascicularis), weighing between 2.5 and 3.5 kg, were studied following approval by the Animal Care and Use Committee of Albert Einstein College of Medicine. Experiments were conducted in accordance with institutional and federal guidelines governing the use of primates, which were housed in our AAALAC-accredited Animal Institute. Other protocols were performed in parallel with this experiment to minimize the overall number of animals used. Monkeys were trained to sit comfortably in customized primate chairs with hands restrained. Surgery was then performed using sterile techniques and general anesthesia (sodium pentobarbital). Holes were drilled into the skull to accommodate epidural matrices that allowed access to the brain. Matrices consisted of 18-gauge stainless-steel tubes glued together into a honeycomb form, and were shaped to approximate the contour of the cortical convexity. The bottom of each matrix was covered with a protective layer of sterile silastic. Matrices were stereotaxically positioned to target A1 at an angle 30° from normal to approximate the anterior-posterior tilt of the superior temporal gyrus, thus guiding electrode penetrations to be orthogonal to the surface of A1. Matrices and Plexiglas bars permitting painless head fixation were embedded in dental acrylic secured to the skull with inverted bolts keyed into the bone. Peri- and post-operative anti-inflammatory, antibiotic and analgesic agents were given. Recordings began 2 weeks after surgery.
Recordings were conducted in a sound-attenuated chamber with the animals painlessly restrained. Monkeys maintained a relaxed, but alert state, facilitated by frequent contact and delivery of juice reinforcements. Later animals were also monitored by closed-circuit television. Recordings were performed with multicontact electrodes constructed in our laboratory. They contained 14 recording contacts arranged in a linear array and evenly spaced at 150 µm intervals (<10% error), permitting simultaneous recording across multiple A1 laminae. Contacts were 25 µm stainless steel wires insulated except at the tip, and were fixed in place within the sharpened distal portion of a 30-gauge tube. Impedance of each contact was maintained at 0.1-0.4 MΩ at 1 kHz. The reference was an occipital epidural electrode. Headstage pre-amplification was followed by amplification (×5000) with differential amplifiers (Grass P5, down 3 dB at 3 Hz and 3 kHz). Signals were digitized at a rate of 3400 Hz and averaged by Neuroscan software to generate auditory evoked potentials (AEPs). Data were also stored on a digital tape recorder (DT-1600, MicroData Instrument, Inc., sample rate 6 kHz) for 2/3 of the recording sessions. Positioning of the electrodes was performed with a microdrive whose movements were guided by online inspection of AEPs and multiunit activity (MUA) evoked by 80 dB clicks. Tone bursts and two-tone complexes were presented when the recording contacts of the linear-array electrode straddled the inversion of the cortical AEP and evoked MUA was maximal in the middle electrode contacts. Response averages were generated from 50-75 stimulus presentations.
One-dimensional current source density (CSD) analysis was used to define physiologically the laminar location of recording sites in A1. CSD was calculated from AEP laminar profiles using an algorithm that approximated the second spatial derivative of the field potentials recorded at three adjacent depths (Freeman and Nicholson, 1975). Depths of the earliest click-evoked and tone-evoked current sinks were used to locate lamina 4 and lower lamina 3 (e.g. Müller-Preuss and Mitzdorf, 1984; Steinschneider et al., 1992; Cruikshank et al., 2002). A later current sink in upper lamina 3 and a concurrent source located more superficially were almost always identified in the recordings and served as additional markers of laminar depth (e.g. Müller-Preuss and Mitzdorf, 1984; Steinschneider et al., 1992, 1994, 2003; Fishman et al., 2001b; Cruikshank et al., 2002). This physiological procedure was later checked by correlation with measured widths of A1 and its laminae at select electrode sites obtained from histological data (see below).
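The three-point second spatial derivative underlying one-dimensional CSD analysis can be sketched as follows. The function name, the unit conventions and the sign convention (sinks negative in the raw difference, so the result is negated to plot sinks as positive deflections) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def csd_1d(aep, spacing_um=150.0):
    """One-dimensional CSD sketch: second spatial difference of the laminar
    AEP profile (channels x time) across three adjacent depths, in the spirit
    of Freeman and Nicholson (1975). The two edge contacts are lost, so the
    output has two fewer channels than the input."""
    phi = np.asarray(aep, dtype=float)
    h = spacing_um * 1e-3  # contact spacing in mm
    # CSD(z) is proportional to -[phi(z+h) - 2*phi(z) + phi(z-h)] / h**2
    return -(phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / h**2

# Toy laminar profile: 14 contacts (as on the linear-array electrode), 100 time points
aep = np.random.randn(14, 100)
csd = csd_1d(aep)
print(csd.shape)  # (12, 100)
```

A useful sanity check is that a potential varying linearly with depth (a pure volume-conducted gradient) yields zero CSD, which is why the second difference isolates local transmembrane current flow.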
MUA was extracted in the first four animals by high-pass filtering the raw input at 500 Hz (roll-off 24 dB/octave), further amplifying (×8) and full-wave rectifying the derived signal, and computer averaging the resultant activity. In the last animal, rectification was followed by low-pass filtering at 600 Hz prior to digitization using newly acquired digital filters (RP2 modules, Tucker Davis Technologies). MUA measures the envelope of action potential activity generated by neuronal aggregates, weighted by neuronal location and size. MUA is similar to cluster activity, but has greater response stability (Nelken et al., 1994). We observe sharply differentiated MUA at a recording contact spacing of 75 µm (e.g. Schroeder et al., 1990), and other investigators have demonstrated a similar sphere of recording (Brosch et al., 1997). Due to limitations of the acquisition computer, sampling rates were less than the Nyquist frequency of the low-pass filter setting of the amplifiers in the first four animals. Empirical testing revealed negligible signal distortion, as almost all energy in the neural signals was <1 kHz. Samples of off-line data from the digital tape recorder were re-digitized at 6 kHz, and the resultant MUA had waveshapes and amplitudes nearly identical to those of data sampled at the lower rate (distortion <1%). MUA acquired from the digitally taped data was also low-pass filtered below 800 Hz (96 dB/octave) and then averaged at a sampling rate of 2 kHz to further test the accuracy of the initial measurements. Differences between these and initial measurements were negligible (see Fishman et al., 2001b).
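The high-pass, rectify, low-pass, average sequence for MUA extraction can be sketched in a few lines. The choice of a fourth-order Butterworth filter and zero-phase `filtfilt` is an assumption for illustration; the original hardware used analog filters with the roll-offs stated above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def mua_envelope(raw, fs, hp_hz=500.0, lp_hz=600.0):
    """Sketch of the MUA pipeline: high-pass the wideband trace at ~500 Hz,
    full-wave rectify, then low-pass the rectified signal (~600 Hz) to obtain
    the envelope of aggregate spiking. `raw` is trials x samples; returns the
    per-trial envelopes and their trial average."""
    b_hp, a_hp = butter(4, hp_hz / (fs / 2), btype="high")
    b_lp, a_lp = butter(4, lp_hz / (fs / 2), btype="low")
    hp = filtfilt(b_hp, a_hp, raw, axis=-1)       # isolate spike-band energy
    env = filtfilt(b_lp, a_lp, np.abs(hp), axis=-1)  # rectify, then smooth
    return env, env.mean(axis=0)

fs = 6000.0
trials = np.random.randn(50, 1200)  # e.g. 50 presentations, 200 ms at 6 kHz
env, avg = mua_envelope(trials, fs)
print(avg.shape)  # (1200,)
```

Averaging after rectification is the step that distinguishes MUA from the AEP: phase-variable spiking that would cancel in a raw average survives in the rectified envelope.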
Peristimulus-time-histograms (PSTHs) of multiunit cluster activity were constructed from data stored on digital tape to complement MUA measures. Data were band-pass filtered between 450 and 3000 Hz (54 dB/octave; RP2 modules) prior to spike analysis using Brainware software and hardware (Tucker Davis Technologies, Inc.). Sample rate was 65 kHz and bin width was 1 ms. Triggers for spike acquisition were set at 2.5 times the amplitude of the high-frequency background activity.
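The PSTH construction above reduces to threshold crossing detection followed by 1 ms binning across trials. The sketch below is a hypothetical reimplementation: the use of the median absolute value as the background-amplitude estimate is an assumption (the original trigger was set on the high-frequency background by the acquisition hardware).

```python
import numpy as np

def psth(bandpassed, fs, n_bins, thresh_mult=2.5):
    """PSTH sketch: detect spikes as upward crossings of a threshold set at
    `thresh_mult` times an estimate of background amplitude (here the median
    absolute value of the band-passed trace, an assumption), then count
    spikes per 1 ms bin across trials. `bandpassed` is trials x samples."""
    counts = np.zeros(n_bins)
    bin_samples = int(fs / 1000.0)  # samples per 1 ms bin
    for trial in bandpassed:
        thresh = thresh_mult * np.median(np.abs(trial))
        above = trial > thresh
        # upward crossings: below threshold at i, above at i+1
        crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
        bins = crossings // bin_samples
        np.add.at(counts, bins[bins < n_bins], 1)
    return counts
```

At the 65 kHz sample rate used here each 1 ms bin spans 65 samples, so `bin_samples` is exact and no spikes straddle bin edges.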
Isolated pure tones and two-tone complexes were generated and delivered at a sample rate of 100 kHz by a PC-based system using RP2 modules. Isolated pure tones ranged from 0.2 to 17.0 kHz and were 175 ms in duration, with linear rise/decay times of 10 ms. Two-tone complexes of the same duration, but with 5 ms rise/decay times, were presented with variable tone onset times (TOT) ranging from 0 to 50 ms in 10 ms increments. The two tones ended simultaneously. All stimuli were monaurally delivered via a dynamic headphone (MDR-7502, Sony, Inc.) to the ear contralateral to the recorded hemisphere with a stimulus onset asynchrony of 658 ms. Sounds were presented to the ear through a 3'' long, 60 cc plastic tube attached to the headphone. Pure tone intensity was 60 dB SPL measured with a Bruel and Kjaer sound level meter (type 2236) positioned at the opening of the plastic tube. Two-tone complexes were generated through the linear addition of two equal-amplitude 60 dB tones each beginning at 0 degree phase. The frequency response of the headphone was flattened (±3 dB) from 0.2 to 17.0 kHz by a graphic equalizer (GE-60, Rane, Inc.).
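The two-tone complexes described above can be sketched directly from the stated parameters: 175 ms total duration, 5 ms linear rise/decay ramps, equal amplitudes, both tones starting at 0° phase, the second tone delayed by the TOT, and both tones ending together. Function names are illustrative.

```python
import numpy as np

def ramp(n_on, n_off, n_total):
    """Linear rise/decay amplitude envelope."""
    env = np.ones(n_total)
    env[:n_on] = np.linspace(0.0, 1.0, n_on)
    env[-n_off:] = np.linspace(1.0, 0.0, n_off)
    return env

def two_tone(f1, f2, tot_ms, fs=100_000, dur_ms=175, ramp_ms=5):
    """Two-tone complex: equal-amplitude tones at 0-degree starting phase,
    linearly added; the second tone begins `tot_ms` after the first and
    both end simultaneously (so tone 2 is shortened by the TOT)."""
    n = int(fs * dur_ms / 1000)
    n_ramp = int(fs * ramp_ms / 1000)
    t = np.arange(n) / fs
    tone1 = np.sin(2 * np.pi * f1 * t) * ramp(n_ramp, n_ramp, n)
    delay = int(fs * tot_ms / 1000)
    n2 = n - delay
    t2 = np.arange(n2) / fs
    tone2 = np.zeros(n)
    tone2[delay:] = np.sin(2 * np.pi * f2 * t2) * ramp(n_ramp, n_ramp, n2)
    return tone1 + tone2  # linear addition of equal-amplitude tones

stim = two_tone(1000.0, 2000.0, tot_ms=20)
print(stim.shape)  # (17500,)
```

Stepping `tot_ms` from 0 to 50 in 10 ms increments reproduces the six TOT conditions of the experiment.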
After completion of a recording series, animals were deeply anesthetized with pentobarbital and perfused through the heart with physiological saline and 10% buffered formalin. A1 was physiologically delineated by its typically large amplitude responses and by a best frequency (BF) map that was organized with low BFs located anterolaterally and higher BFs posteromedially (e.g. Merzenich and Brugge, 1973; Morel et al., 1993). Electrode tracks were reconstructed from coronal sections stained for Nissl and acetylcholinesterase, and A1 was anatomically identified using published criteria (e.g. Morel et al., 1993).
Four adjacent channels of MUA and cell cluster activity (PSTHs) located in lamina 4 and lower lamina 3 were averaged together for analysis of responses to pure tones and tone pairs. BFs were defined as the tone frequency eliciting the largest amplitude MUA within the first 20 ms after stimulus onset.
Results
The subject's perception of the syllables varied as a function of the F1 frequency. When F1 was 600 or 848 Hz, syllables with a VOT of 25 ms or greater were heard as /ta/, while those with shorter VOTs were perceived as /da/. In contrast, when the F1 was lowered to 424 Hz, only the consonant with a VOT of 60 ms was identified as /t/, while all syllables with a VOT of 40 ms or shorter led to the perception of /d/. This effect of a later perceptual VOT boundary for the /d/-/t/ distinction when F1 is lowered parallels previously reported results (e.g. Lisker, 1975; Summerfield and Haggard, 1977).
Syllable perception is associated with multiple physiological response patterns recorded from the electrode located in anterior Heschl's gyrus and the more posterior electrode located at the border with the planum temporale. The most basic finding is that VOT is differentially represented in temporal response patterns recorded within different auditory cortical regions. This finding is illustrated in Figure 3, which depicts AEPs averaged across the three recording sites on each electrode and across the three F1 conditions. Temporal response patterns recorded in the anterior portion of Heschl's gyrus, corresponding to primary auditory cortex (e.g. Hackett et al., 2001; Wallace et al., 2002), are dramatically sensitive to the syllable VOT. A second response component following the initial activity is time-locked to voicing onset (arrows). This component shows a marked decrease in amplitude at VOTs of <30 ms, and merges with the initial response complex at shorter values. Simultaneous recordings from the posterior electrode, however, fail to exhibit a response to voicing onset, despite a threefold increase in AEP amplitude relative to that recorded from anterior Heschl's gyrus. This finding confirms a previous observation on differences between speech-evoked activity recorded from anterior Heschl's gyrus and more posterior areas (Steinschneider et al., 1999).
The opportunity to record directly from Heschl's gyrus is rare, necessitating studies generally based on, at most, observations obtained from only a few subjects. In this study, we report activity from only a single human subject. Without additional information, it is difficult to evaluate with confidence whether the Heschl's gyrus responses are representative indices of auditory cortical activity, or are aberrant findings in a patient with medically intractable epilepsy. One way to support the reliability of the AEPs is to compare responses from this subject with those obtained in other patients using identical stimuli. In the present case, we also obtained AEPs evoked by the speech sounds /da/ and /ta/ used in a previous study of Heschl's gyrus activity (Steinschneider et al., 1999). VOTs varied from 0 to 80 ms in 20 ms increments. Perceptually in this patient, /ta/ was heard when the VOT was 40-80 ms, and /da/ when the VOT was 0 or 20 ms. Figure 9 depicts the averaged AEP evoked by these syllables collapsed across the three anterior Heschl's gyrus recording sites. As previously reported, discrete time-locked components elicited by voicing onset are only observed for the three stimuli heard as /ta/ (solid arrows). The dotted arrow in the AEP for the 20 ms VOT sound marks the predicted time at which the absent response evoked by voicing onset should have occurred.
Monkey Physiological Data
Sample Characteristics
Data are based on responses obtained from 37 electrode penetrations into A1. BFs ranged from 0.4 to 12.5 kHz. The sample distribution of BFs is shown in Figure 10A. The average MUA response evoked by BF tones from these electrode penetrations has a stereotypic pattern with an onset at ~10 ms and a peak quickly reached by 15-20 ms, followed by a rapid decay that plateaus at low levels for the duration of the sound (Fig. 10B). Spectral sensitivity of onset responses to tones of moderate intensity, as defined by area measures within the first 10 ms of the responses, is fairly restricted (Fig. 10C; 3 and 6 dB down points of ~0.2 and 0.3 octaves from the BF, respectively). Similar values are obtained when peak measures are used. Computed as percentage change away from the BF, amplitude of the on response is 3 and 6 dB down at ~10 and 20%, respectively (Fig. 10D). The rapid decrement in MUA over the first 100 ms following BF stimulus onset can be accurately modeled by a single phase exponential decay curve (Fig. 10E, R2 = 0.99). This profile suggests that A1 detection of new acoustic events by synchronized onset responses will be manifested as deviations from this basic response pattern, and that goodness-of-fit (GOF) measures using a single phase exponential decay function can be a concise index to assess the magnitude of these deviations.
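The single phase exponential decay fit and its R2 goodness-of-fit index can be sketched as below. The parameterization a·exp(-t/τ)+c and the starting guesses are assumptions; the paper specifies only that a single phase exponential decay function was fit and R2 used as the GOF measure.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, a, tau, c):
    """Single phase exponential decay: a * exp(-t / tau) + c."""
    return a * np.exp(-t / tau) + c

def fit_decay(t_ms, resp):
    """Fit the decay model to the post-peak MUA and return the fitted
    parameters plus R^2 as the goodness-of-fit (GOF) index."""
    p0 = (resp[0] - resp[-1], 20.0, resp[-1])  # assumed initial guesses
    params, _ = curve_fit(decay, t_ms, resp, p0=p0)
    pred = decay(t_ms, *params)
    ss_res = np.sum((resp - pred) ** 2)
    ss_tot = np.sum((resp - resp.mean()) ** 2)
    return params, 1.0 - ss_res / ss_tot

# Noiseless synthetic decay: the fit should recover tau and give R2 near 1
t = np.linspace(0.0, 100.0, 101)
resp = decay(t, 1.0, 15.0, 0.1)
params, r2 = fit_decay(t, resp)
print(round(r2, 2))  # 1.0
```

Applied to responses to two-tone complexes, a drop in this R2 relative to the TOT = 0 ms condition quantifies the deviation produced by the second tone's onset response.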
Tone pairs with frequencies at various distances away from the BF of the recording sites were presented. Total sample responses computed from PSTHs are shown in Figure 11. Mean and standard deviations of the tone frequencies in terms of octave distance away from the BFs of the recording sites, and values for their separation in octaves, are also shown. The average response to the tone complex with a TOT of 0 ms reveals the same pattern of rapid rise in activation and subsequent exponential decay of activity as that seen for MUA evoked by isolated BF tones. PSTHs evoked by stimuli with all TOTs other than 0 ms show some evidence of response perturbation time-locked to the onset of the second tone (arrows). The degree of perturbation, however, increases nonlinearly as the TOT interval lengthens. While a small perturbation is evident when TOT = 10 ms, a clearly defined response peak to the second tone is first seen when TOT = 20 ms. Longer TOTs evoke peaks of similar amplitude. However, there is a progressive increase in activity evoked by the second tone manifested as a temporal widening of the response. This can best be appreciated in the superimposed waveforms shown at the bottom left of the figure. The enhanced response at longer TOT intervals is further revealed by the degree to which the GOF for a single phase exponential decay from the initial tone response is reduced (Fig. 11, bottom right). There is a shallow decrease in GOF with TOTs of 10-30 ms from an initial R2 of 0.99 when TOT equals 0 ms. This, in turn, is followed by a more pronounced decrement at TOTs of 40 and 50 ms. These features suggest that while A1 is capable of representing new sound events by discrete time-locked responses at tone separations of between 10 and 20 ms, intervals of 40 ms or greater lead to enhanced neural differentiation.
The previous data sets represent composites of evoked activity where the frequencies of the two tones vary widely with respect to BFs. To clarify the interaction between tone frequency and temporal response patterns, data were divided into four groups, based upon the distance in octaves each tone was from the BFs of the recording sites. Group 1 contains responses when both tones are less than their median distance from the BF, group 2 consists of responses when tone 1 is less than the median and tone 2 is greater than its median, group 3 is the reverse of group 2, and group 4 has both tones greater than the median. Thus, group 1 has both tones near the BF, group 2 has tone 2 farther away from the BF than tone 1, group 3 has tone 2 closer to the BF than tone 1, and group 4 has both tones at a distance from, and generally straddling, the BF.
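The four-way grouping described above is a median split on each tone's octave distance from the BF. A minimal sketch, with the handling of values exactly at the median treated as "far" (an assumption; the text says only "less than" and "greater than"):

```python
import numpy as np

def median_split_groups(d1, d2):
    """Assign each penetration to groups 1-4 by whether tone 1's and tone 2's
    octave distances from the BF fall below their respective medians.
    d1, d2: arrays of |octave| distances for tone 1 and tone 2."""
    m1, m2 = np.median(d1), np.median(d2)
    near1, near2 = d1 < m1, d2 < m2
    group = np.empty(len(d1), dtype=int)
    group[near1 & near2] = 1      # both tones near the BF
    group[near1 & ~near2] = 2     # tone 1 near, tone 2 far
    group[~near1 & near2] = 3     # tone 1 far, tone 2 near
    group[~near1 & ~near2] = 4    # both tones far from the BF
    return group

d1 = np.array([0.1, 0.1, 0.9, 0.9])
d2 = np.array([0.1, 0.9, 0.1, 0.9])
print(median_split_groups(d1, d2))  # [1 2 3 4]
```

This kind of split keeps the four groups roughly balanced in size while capturing the near/far contrast that drives the group differences reported below.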
Groups display a range of capabilities in representing both tones in a two-tone complex. Data for MUA are summarized in Table 1, which reports the statistical P values of the post hoc tests for whether the response amplitudes at 10-15 and 15-20 ms after the onset of the second tone in the tone complex are larger than the responses occurring 10 ms earlier. This convention is the same as that illustrated in Figure 12. Spectral distance of the tones from the BF of the recording sites, and their octave separation, are also shown. All initial responses occurring between 10 and 20 ms after the first tone are larger than baseline (data not shown). For all groups other than group 2, a statistically significant increase in activity evoked by the second tone is present at TOT intervals as small as 20 ms. Qualitatively similar results are obtained from analysis of the PSTH data (not shown). This effect is not due to a 6 dB increase in stimulus amplitude when the second tone is added to the first, as only trivial, non-significant increases in peak amplitude are seen when both tones are near the BF of the recording sites and the TOT interval is at its most prolonged value of 50 ms (data not shown). For group 2, when the first tone is near and the second tone is distant from the BF, a significant increase in activity occurs only when the TOT interval reaches 50 ms.
Discussion
Viability of this physiological hypothesis, however, also requires that it help account for the shorter 15-20 ms boundary that limits our ability to sequentially order two non-speech acoustic events (Hirsh, 1959; Stevens and Klatt, 1974; Miller et al., 1976; Pisoni, 1977). In this paper, we show that new onset responses evoked by both the first and second tones in a two-tone complex are reliably detected by population responses in A1 at a tone onset time separation as short as 20 ms. The minimal limit of ~20 ms is observed when both tones of the complex are near, or spectrally distant from, the BF of the recording sites, as well as when the second tone is near and the first tone is more distant from the BF. This physiological boundary parallels the perceptual data, thus supporting the relevance of a physiological processing mechanism based on synchronized onset responses for temporal order perception in audition.
The importance of synchronized, short-latency, stimulus-evoked responses within neuronal populations is a common theme in mammalian sensory cortex (e.g. Kreiter and Singer, 1996; Ehret, 1997; Phillips, 1998; Roy and Alloway, 2001; Temereanca and Simons, 2003). Consistent with present results, it has been estimated that most stimulus-related information in primary visual and somatosensory cortices is represented by synchronized responses within 20 ms after cortical activation (Petersen and Diamond, 2000; Petersen et al., 2001; Wyss et al., 2003). Furthermore, these synchronized responses are an especially powerful means by which A1 can effectively transmit information to secondary auditory areas for further sound processing (Eggermont, 1994; deCharms and Merzenich, 1996; see also Oram and Perrett, 1992). In addition to VOT, we have demonstrated how onset responses within A1 populations represent spectral features important for discrimination of stop consonant place of articulation, temporal pitch, musical consonance and dissonance, critical band behavior, and features of auditory scene analysis (Steinschneider et al., 1995a, 1998; Fishman et al., 2000a,b, 2001a,b). Other investigations extend these observations to include the rapid representation of complex species-specific vocalizations in A1 (e.g. Creutzfeldt et al., 1980; Wang et al., 1995; Gehr et al., 2000; Rotman et al., 2001; Nagarajan et al., 2002).
The relevance of synchronized onset responses in signaling temporal sound organization does not preclude the concurrent operation of other processing mechanisms. Synchronized, longer latency activity among neurons without an increase in firing rate, a property not examined in the present paper, occurs in A1 and likely plays an important role in the binding of multiple sound object attributes (deCharms and Merzenich, 1996). Neural mechanisms within A1 based on response rate instead of synchrony are an additional means by which temporal information can be physiologically encoded in cortex, especially for discrimination of rapidly changing stimuli (Lu et al., 2001a,b). With training or under low uncertainty psychoacoustical conditions, human subjects can discriminate speech stimuli with short VOTs which lie on the same side of a phonetic perceptual boundary (Carney et al., 1977; Kewley-Port et al., 1988). A rate code might facilitate this type of discrimination. Presumably, discrimination based on a rate code is more difficult than one based on the synchronous activation of large neural populations evoked by stimulus onsets. This would explain why only under specific, low uncertainty conditions or after extensive training can subjects make certain fine-grained VOT discriminations. A temporal mechanism based on synchronized onset responses would likely dominate in the typical acoustical environment of stimulus uncertainty.
In contrast to the present work, previous studies have reported that a period considerably longer than 20 ms is required for a neuronal response to be elicited by a probe tone after presentation of a masker tone (Calford and Semple, 1995; Brosch and Schreiner, 1997
, 2000
; Horikawa et al., 1997
). Reasons for the discrepancy likely include differences in the stimulation paradigms, their use of anesthetized animal preparations, and our examination of A1 populations as opposed to single units. In the previous studies, two brief tones were presented sequentially, such that the second tone was presented after the first tone terminated. Here, the second tone was initiated while the first tone was still being presented. Inhibition produced by the offset of the first tone might increase the duration of suppression produced by the masker in the previous studies. Furthermore, use of anesthetized animals in previous studies likely enhances suppression of activity to the probe tone (Brosch and Schreiner, 1997
). Finally, recordings in A1 populations might reveal processing sensitivities that are not observed in the activity of single cells or small neuronal clusters.
Despite the quantitative differences between the present and cited work, there is qualitative agreement on the masking effects of the first tone in suppressing activity to the second tone. In all studies, tones at or near the BF of a recording site are the most effective masker stimuli, whereas tones at a distance from the BF are the least effective in suppressing responses to a second tone (Brosch and Schreiner, 1997, 2000
). Previous studies did not examine physiological temporal acuity of A1 when both tones are distant from the BF of a recording site. We find the same 20 ms limit in the ability of synchronized neuronal activity to detect the onsets of both tones, though the strength of this activity is not as great as when the second tone is near the BF. Thus, animal model data indicate that two-tone complexes elicit multiple temporal response patterns in A1 that have varying capacity to represent both tones. Strength of response to each tone at any given site in A1 is based on the frequencies within the complex and their relationship to the BF at that site.
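The population-level acuity limit described above can be illustrated with a toy simulation. This is a minimal sketch, not a model fitted to the recordings: the decay time constant and the detection criterion are hypothetical values chosen only to make the logic concrete. Each tone onset is assumed to evoke a burst of synchronized activity that decays roughly exponentially, and the second tone registers as a separate event only once the first burst has decayed sufficiently.

```python
import math

TAU_MS = 15.0     # hypothetical decay time constant of an onset burst (ms)
CRITERION = 0.30  # hypothetical rule: activity must fall below 30% of
                  # peak before a new onset registers as a separate event

def tail(asynchrony_ms):
    """Residual activity from the first tone's onset burst, as a
    fraction of its peak, after asynchrony_ms of decay."""
    return math.exp(-asynchrony_ms / TAU_MS)

def second_onset_detected(asynchrony_ms):
    """A synchronized response to the second tone is resolvable only
    once the first burst has decayed below the criterion."""
    return tail(asynchrony_ms) < CRITERION

for a in (5, 10, 20, 40):
    print(a, second_onset_detected(a))
```

With these placeholder values the simulated detection limit falls just under 20 ms of onset asynchrony, in line with the physiological and perceptual boundaries discussed above.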
The human data complement findings in the monkey. Multiple temporal patterns with varying capacity to represent the onsets of consonant release and voicing occur across the three recording sites in anterior Heschl's gyrus. The most lateral site has the greatest capacity to represent voicing onset at the shortest VOT intervals, the most medial site the least, while the central site is intermediate. Human primary auditory cortex has a tonotopic organization with lower frequencies best represented laterally and progressively higher frequencies represented medially on Heschl's gyrus (Howard et al., 1996b; Liégeois-Chauvel et al., 2001
; Schönwiesner et al., 2002
; Formisano et al., 2003
). Discussed in terms of a simplified two-tone complex, the relative strength of the response to voicing onset is greatest at the lateral site because the first tone (higher formants) is at a spectral distance from the BF and the second tone (F1) is near the BF of the recording site. In contrast, the medial site with a higher BF is a location whose first tone (higher formants) is near the BF and whose second tone (F1) is at a distance from the BF. This combination produces the least capacity for a second stimulus component to elicit a response time-locked to its onset. Responses at the center site are intermediate between these two extremes.
While each location in Heschl's gyrus has a varying capacity to represent the onsets of consonant release and voicing, the temporal response pattern averaged across the three recording sites roughly mirrors the patient's perceptual boundary shifts as F1 frequency is modulated. Specifically, at the lowest F1 frequency of 424 Hz, the perceptual boundary shifts from 20–25 ms to between 40 and 60 ms. In parallel, a discrete response to voicing onset is only seen at a VOT of 60 ms. Contrasting this pattern are those observed when the syllables contain higher F1 frequencies. Discrete responses evoked by voicing onset are now observed at shorter VOTs, and they maintain statistically significant increases above preceding activity to within 5 ms of the perceptual boundaries. The absence of a perfect correlation between the AEPs evoked by the higher F1 syllables and the perceptual boundaries for these stimulus sets likely reflects, in addition to the low statistical power of the single subject analysis, the fact that the presence or absence of responses to voicing onset cannot be the only determinant of the voiced/voiceless distinction. For instance, the intensity and duration of aspiration noise are important cues for this perceptual discrimination (Sinnott and Adams, 1987; Lotto and Kluender, 2002
), yet their effects upon perceptual boundaries are not evident in these AEP recordings.
Even though the observations of rough parallels between perception and temporal response patterns are limited to a single subject, we were able to replicate parallels between physiological and perceptual boundaries using a different /da/–/ta/ series. As an additional check on whether averaged activity across auditory cortex can reflect perceptual boundaries, we reanalyzed our previously published data on VOT representation by examining averaged activity profiles across electrode sites in the human and across tonotopic regions in the monkey (Steinschneider et al., 1999, 2003
). We averaged activity from subject 1 in the human study, whose three low-impedance electrode sites spanning 20 mm were amenable to analysis. In both the human and monkey data, distinct differences were observed between the averaged responses evoked by /da/, with VOTs of 0 and 20 ms, and those elicited by /ta/, with VOTs of 40 and 60 ms. The differences reflected a new response time-locked to voicing onset for the longer VOT stimuli (data available upon request). Thus, physiological findings support a temporal processing mechanism for VOT encoding and further suggest that the perceptual boundary is partially determined by response patterns averaged across primary auditory cortex.
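The averaging analysis can be sketched in code. This is an illustrative toy model, not the published analysis: the per-site weights and the time constant are invented, with the F1/voicing component weighted toward the low-BF (lateral) site and the release burst toward the high-BF (medial) site, following the tonotopic logic described above.

```python
import math

TAU = 15.0  # hypothetical burst decay time constant (ms)

def site_response(t, vot_ms, w_release, w_voicing):
    """One site's response: bursts time-locked to consonant release
    (t = 0) and to voicing onset (t = vot_ms), weighted by how close
    each spectral component lies to the site's best frequency."""
    r = w_release * math.exp(-t / TAU) if t >= 0 else 0.0
    if t >= vot_ms:
        r += w_voicing * math.exp(-(t - vot_ms) / TAU)
    return r

# Hypothetical (release, voicing) weights for lateral, central and
# medial recording sites along the tonotopic gradient.
SITES = [(0.3, 1.0), (0.6, 0.6), (1.0, 0.2)]

def averaged_profile(vot_ms, dur_ms=100):
    """Across-site average at 1 ms resolution."""
    return [sum(site_response(t, vot_ms, wr, wv) for wr, wv in SITES) / len(SITES)
            for t in range(dur_ms)]

def has_voicing_response(profile, criterion=0.25):
    """A discrete voicing-onset response requires the average to decay
    below a criterion fraction of its peak and then rise again."""
    peak0 = profile[0]
    return any(profile[i] > profile[i - 1] and profile[i - 1] < criterion * peak0
               for i in range(1, len(profile)))

for vot in (0, 20, 40, 60):
    print(vot, has_voicing_response(averaged_profile(vot)))
```

With these placeholder parameters the across-site average shows a discrete voicing-onset response at VOTs of 40 and 60 ms but not at 0 and 20 ms, mirroring the /da/ versus /ta/ split described above.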
Several factors likely contribute to the decreased capacity of the syllables with the lowest F1 to generate an early response to voicing onset in the averaged population responses. First, consideration of spectral tuning characteristics in A1 means that there will be a decreased contribution of the response to the 424 Hz F1 spectral component relative to the higher F1s at all but the lowest BF areas. This smaller response contribution to the average will require a longer VOT for F1 onset to be physiologically detected above the exponentially decaying activity evoked by the earlier consonant release. Compounding this effect is the diminished auditory sensitivity to the 424 Hz F1 frequency relative to F1s centered at 600 and 848 Hz (e.g. Owren et al., 1988). This diminished sensitivity translates into a functionally less intense sound component that will lead to a smaller neural response that will require a more prolonged decay of earlier activity in order for F1 onset to be identified as a new acoustic event.
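The argument can be made concrete with a back-of-envelope calculation; the decay constant and the detection rule are assumptions chosen only for illustration. If release-evoked activity decays exponentially and F1 onset is detected once the F1-evoked response exceeds the residual tail, then a weaker F1 response requires a longer VOT:

```python
import math

TAU = 15.0  # hypothetical decay time constant of the release burst (ms)

def min_detectable_vot(f1_amp, release_amp=1.0):
    """Shortest VOT at which an F1-onset response of amplitude f1_amp
    exceeds the exponentially decaying release-evoked activity."""
    if f1_amp >= release_amp:
        return 0.0
    return TAU * math.log(release_amp / f1_amp)

# Lower auditory sensitivity at 424 Hz -> a functionally weaker F1
# component -> a longer minimum VOT before its onset is detected.
for amp in (1.0, 0.5, 0.25):
    print(amp, round(min_detectable_vot(amp), 1))
```

Halving the effective F1 amplitude in this sketch pushes the minimum detectable VOT out by about one decay time constant times ln 2, which captures, in simplified form, why the low-F1 syllables need longer VOTs to evoke a discrete voicing-onset response.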
Averaged population activity as a determinant for a behavioral or perceptual outcome has been repeatedly reported in both motor and sensory systems. For instance, perception of visual motion is guided by the averaged activity within area MT (Kruse et al., 2002; Ditterich et al., 2003
). Similarly, in motor and prefrontal areas, complex hand and finger movements are directed by the averaged activity of large neuronal ensembles (e.g. Georgopoulos et al., 1999
; Schwartz and Moran, 1999
; Averbeck et al., 2003
). Generally, template-matching procedures such as population vector or maximum likelihood estimations are used to approximate the population code (Pouget et al., 2000
). Ultimately, these procedures examine the overall shape and amplitude of the population activity in order to derive information regarding stimulus features or motor commands. By analogy, we propose that a physiologically plausible template in A1 for a single acoustic event is a fast rise in neuronal activity followed by a rapid exponential decay. Significant deviations from this template, as determined by the averaged activity across A1, would support a perceptual decision that more than one event has occurred in time.
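The proposed template logic can be sketched as follows. All parameters (burst decay, recovery from forward suppression, deviation threshold) are hypothetical, and the second burst's amplitude is modeled as recovering exponentially with VOT to capture the attenuation of voicing-onset responses at short asynchronies:

```python
import math

TAU = 15.0      # hypothetical burst decay time constant (ms)
TAU_REC = 30.0  # hypothetical recovery from forward suppression (ms)

def profile(vot_ms, dur_ms=100):
    """Averaged A1 response to a syllable: a burst at consonant release
    plus a forward-suppressed burst at voicing onset (1 ms sampling)."""
    amp2 = 1.0 - math.exp(-vot_ms / TAU_REC)  # suppression lifts with VOT
    out = []
    for t in range(dur_ms):
        r = math.exp(-t / TAU)
        if t >= vot_ms > 0:
            r += amp2 * math.exp(-(t - vot_ms) / TAU)
        out.append(r)
    return out

def n_events(prof, rel_threshold=0.5):
    """Template match: one acoustic event predicts a fast rise and
    exponential decay; a sufficiently large positive deviation from
    that template signals a second event."""
    tmpl = [prof[0] * math.exp(-t / TAU) for t in range(len(prof))]
    deviation = max(o - m for o, m in zip(prof, tmpl))
    return 2 if deviation > rel_threshold * prof[0] else 1

for vot in (0, 10, 20, 40, 60):
    print(vot, n_events(profile(vot)))
```

With these placeholder values the template analysis reports a single event for VOTs of 0–20 ms and two events at 40 and 60 ms, with the simulated boundary falling just above 20 ms, near the perceptual boundary.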
Before concluding, multiple issues deserve consideration. One regards the degree to which activity in Heschl's gyrus on the right/non-language dominant hemisphere is involved in speech perception. Both neuroimaging and behavioral studies support the importance of the right hemisphere for VOT processing (Simos et al., 1997; Laguitton et al., 2000
; Jäncke et al., 2002
; Papanicolaou et al., 2003
). Another issue relates to whether our recordings were sufficiently extensive to adequately sample patterns of evoked activity. Electrode sites spanned 6.7 mm in the subject in this study. Anatomical maps of human A1 suggest a maximum extent of ~10–12 mm (Hackett et al., 2001
; Wallace et al., 2002
). The volume-conducted nature of AEPs coupled to the recording span suggests that activity patterns were reasonably approximated by our sample.
A significant concern is whether activity profiles in a patient with epilepsy reflect normal or aberrant auditory processing. While caution must always be exercised in extrapolating normative physiologic processes from data obtained in subjects with epilepsy, there are several reasons to believe that the patterns observed in the present study represent reasonable indices of normal functions. First, temporal response patterns observed are similar to those reported by other studies examining intracranially acquired AEPs, and all conform to the known perceptual relevance of the VOT parameter (Liégeois-Chauvel et al., 1999; Steinschneider et al., 1999
). Secondly, these similarities include differential sensitivities for generating responses evoked by voicing onset in anterior Heschl's gyrus relative to more posterior regions. Thirdly, the greater amplitude of AEPs recorded from more posterior auditory cortex observed here has also been reported previously (Liégeois-Chauvel et al., 1994
). Finally, while these latter studies were all performed in patients with epilepsy, they, in turn, reveal temporal patterns of activity that are also mirrored in the magnetic responses and AEPs of neurologically normal subjects (e.g. Kaukoranta et al., 1987
; Joliot et al., 1994
; Kuriki et al., 1995
; Sharma and Dorman, 1999
). In summary, the reproducibility and similarities between present results and previously reported findings enhance their potential relevance as indices of normal auditory cortical functions.
An additional issue relates to the correspondence between the intracranial data and speech-evoked activity examined using surface-recorded evoked potentials and magnetic responses. There is a decrease in the amplitude of the N1m component of the magnetic responses to speech sounds or two-tone analogs with prolonged VOTs and TOTs that correlates with perceptual boundaries (Simos et al., 1998a–c
; see, however, Tremblay et al., 2003
). This effect is likely based on the truncation of the N1m evoked by consonant onset by new positive-going components evoked by voicing onset (for a detailed demonstration, see Steinschneider et al., 1999
). The intracranial data indicate that at short VOTs/TOTs a new response complex evoked by voicing onset is likely to be severely attenuated in amplitude, leading to an overall increase in the size of the resultant N1m component relative to that evoked by longer VOT/TOT stimuli. Thus, there is general agreement between the magnetic and intracranial responses in terms of identifying physiological activity patterns that roughly correlate with perceptual features.
Several studies, however, cast some doubt on this correlation between physiology and perception. For example, one study failed to find a parallel between the presence or absence of a single- or double-peaked N1 AEP component and shifting VOT perceptual boundaries that occurred with changes in consonant place of articulation (Sharma et al., 2000). This study needs to be carefully interpreted. The N1 component is a composite wave with multiple generators in primary and secondary auditory cortex (e.g. Wood and Wolpaw, 1982
; Näätänen and Picton, 1987
; Scherg et al., 1989
; Liégeois-Chauvel et al., 1994
; Krumbholz et al., 2003
). Furthermore, the dominant contributor to the scalp-recorded N1 is likely an extensive area of auditory cortex posterior to anterior Heschl's gyrus, including the planum temporale (Liégeois-Chauvel et al., 1994
). We find large differences in the capacity to respond to acoustic transients between anterior Heschl's gyrus and cortex located more posteriorly. More subtle differences are observed within anterior Heschl's gyrus. These distinct temporal patterns, which overlap in time, indicate that great caution must be exercised when suggesting detailed aspects of auditory cortical organization based on the modulation of a composite wave whose morphology is the result of activity in functionally disparate, yet closely spaced, auditory cortical regions.
More germane to the question of the correlation between physiology and perception are the findings of a study by Sharma and Dorman (2000). This study compared the morphology of N1 in response to bilabial consonant-vowel syllables varying in VOT from −90 to 0 ms in Hindi and English listeners. In the former language, syllables with prolonged pre-voicing (VOT < −30 ms) are perceived as /ba/, while those with short VOTs are perceived as /pa/. In English, all these syllables are perceived as /ba/. Shifts in N1 latency correlated with VOT, but not perception, as this effect was observed in listeners of both languages. This finding indicates that the obligatory temporal response patterns we observe in primary auditory cortex help shape, but are not the ultimate determinants of, phonetic perception. It further highlights the importance of language experience, and is in keeping with the known multiple auditory, visual, lexical and linguistic cues that all contribute to phonetic perception.
The nature of the neural elements recorded by our lower impedance electrodes in monkey A1 also needs to be addressed. Multiple lines of evidence support the conclusion that the major contributors to the MUA and PSTHs are cortical action potentials. First, the PSTHs recorded from middle laminae are derived from higher amplitude spikes, while the very small diameter of distal thalamocortical axons will generally produce lower amplitude spikes. The fact that the PSTHs and MUA have nearly identical latencies and other response characteristics supports their predominantly cortical origin. Secondly, these responses are concurrent with intracortical negativities in the AEP and with CSD sources and sinks whose dipolar spatial distribution is indicative of pyramidal cell activation (e.g. Steinschneider et al., 1994, 2003
; Kisley and Gerstein, 1999
; Rose and Metherate, 2001
; Cruikshank et al., 2002
). Thirdly, latency of the responses is in accord with other studies examining single cell activity in A1 (e.g. Phillips and Hall, 1990
; Heil, 1997
; Recanzone et al., 2000
; Cheung et al., 2001
). Finally, the earliest thalamocortical fiber volley in awake monkey A1 has an onset latency of 5–6 ms and a peak at 8–10 ms (Steinschneider et al., 1992
, 2003
). This activity is earlier than the responses seen in the MUA and PSTHs in the present study, indicating that cortical cells are the predominant elements recorded by our electrodes. However, it must be acknowledged that a small contribution from TC fibers to the neural responses cannot be excluded.
Finally, the relationship between the MUA/PSTH responses in the monkey and the AEP components in the human needs to be assessed. The monkey responses represent the initial activation of A1 with a peak at 15–20 ms post-stimulus onset (see Steinschneider et al., 1994, 2003
; Eggermont and Ponton, 2002
). In contrast, the principal AEP component examined is a positivity whose peak is at ~60 ms. We and others have suggested that the homolog of this component in monkeys is a large positive wave peaking around 28 ms that is primarily generated by polysynaptic depolarizations within upper lamina 3 (Steinschneider et al., 1994
; Eggermont and Ponton, 2002
). The resultant current sinks are balanced by more superficial sources, leading to the positivity recorded at the scalp. While the homology between the monkey and human responses is therefore not direct, similar patterns of activity with respect to VOT encoding occur in the upper lamina 3 sinks and more superficial sources of monkey A1 (Steinschneider et al., 2003
). This laminar profile suggests that initial activation in lower lamina 3 induces later synaptic events in upper lamina 3 that would be manifested in the large positive wave in the human.
In conclusion, physiological findings support a temporal processing mechanism in primary auditory cortex as important for neural encoding of VOT. Findings in the monkey bolster the hypothesis that VOT encoding represents, in part, a specific instance of a more general process governing the ability to identify the sequential order of sound events. Perceptual findings using non-speech stimuli modeling VOT indicate that a separation of 20 ms between the onsets of two acoustic events is required for this identification. We show a nearly identical capacity in physiological response patterns of A1 populations. These A1 temporal response patterns are also systematically modulated by interactions between temporal and spectral sound components. Within primary auditory cortex of both humans and monkeys, this modulation appears to be based on the relationship between the tonotopically-organized recording location and the specific frequency components of the sounds. When viewed across the array of activated tissue, the composite temporal response patterns of large-scale neural populations in human A1 vary in a manner that supports a physiologically plausible explanation for the trading relations effect between F1 frequency and VOT boundaries. Given these positive findings, it is essential to appreciate that primary auditory cortical temporal response patterns represent just one informational component that can be used to facilitate discrimination of voiced from unvoiced phonemes. This perceptual process is ultimately decided by the activity of large-scale neural networks utilizing multiple acoustic cues, visual inputs, and higher-order lexical and linguistic constructs.
Address correspondence to Mitchell Steinschneider, MD, PhD, Department of Neurology, Rose F. Kennedy Center, Room 322, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA. Email: steinsch{at}aecom.yu.edu.
Acknowledgments

References
Borsky S, Tuller B, Shapiro LP (1998) How to milk a coat: the effects of semantic and acoustic information on phoneme categorization. J Acoust Soc Am 103:26702676.[CrossRef][ISI][Medline]
Brancazio L, Miller JL, Paré MA (2003) Visual influences on the internal structure of phonetic categories. Percept Psychophys 65:591601.[ISI][Medline]
Brosch M, Schreiner CE (1997) Time course of forward masking tuning curves in cat primary auditory cortex. J Neurophysiol 77:923943.
Brosch M, Schreiner CE (2000) Sequence sensitivity of neurons in cat primary auditory cortex. Cereb Cortex 10:11551167.
Brosch M, Bauer R, Eckhorn, R. (1997) Stimulus-dependent modulations of correlated high-frequency oscillations in cat visual cortex. Cereb Cortex 7:7076.[Abstract]
Calford MB, Semple MN (1995) Monaural inhibition in cat auditory cortex. J Neurophysiol 73:18761891.
Cheung SW, Bedenbaugh PH, Nagarajan SS, Schreiner CE (2001) Functional organization of squirrel monkey primary auditory cortex: responses to pure tones. J Neurophysiol 85:17321749.
Carney AE, Widin GP, Viemeister NF (1977) Noncategorical perception of stop consonants differing in VOT. J Acoust Soc Am 62:961970.[ISI][Medline]
Creutzfeldt O, Hellweg F-C, Schreiner C (1980) Thalamocortical transformation of responses to complex auditory stimuli. Exp Brain Res 39:87104.[ISI][Medline]
Cruikshank SJ, Rose HJ, Metherate R (2002) Auditory thalamocortical synaptic transmission in vitro. J Neurophysiol 87:361384.
deCharms R, Merzenich MM (1996) Primary cortical representation of sounds by the coordination of action-potential timing. Nature 381:610613.[CrossRef][ISI][Medline]
Ditterich J, Mazurek ME, Shadlen MN (2003) Microstimulation of visual cortex affects the speed of perceptual decisions. Nat Neurosci 6:891898.[CrossRef][ISI][Medline]
Eggermont JJ (1994) Neural interaction in cat primary auditory cortex. II. Effects of sound stimulation. J Neurophysiol 71:246270.
Eggermont JJ (1995a) Representation of a voice onset time continuum in primary auditory cortex of the cat. J Acoust Soc Am 98:911920.[ISI][Medline]
Eggermont JJ (1995b) Neural correlates of gap detection and auditory fusion in cat auditory cortex. Neuroreport 6:16451648.[ISI][Medline]
Eggermont JJ (1999) Neural correlates of gap detection in three auditory cortical fields in the cat. J Neurophysiol 81:25702581.
Eggermont JJ, Ponton CW (2002) The neurophysiology of auditory perception: from single units to evoked potentials. Audiol Neurootol 7:7199.[ISI][Medline]
Ehret G (1997) The auditory cortex. J Comp Physiol A 181:547557.[ISI][Medline]
Faulkner A, Rosen S (1999) Contributions of temporal encodings of voicing, voicelessness, fundamental frequency, and amplitude variation to audio-visual and auditory speech perception. J Acoust Soc Am 106:20632073.[CrossRef][ISI][Medline]
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2000a) Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness. J Acoust Soc Am 108:235246.[CrossRef][ISI][Medline]
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2000b) Complex tone processing in primary auditory cortex of the awake monkey. II. Pitch versus critical band representation. J Acoust Soc Am 108:247262.[CrossRef][ISI][Medline]
Fishman YI, Reser DR, Arezzo JC, Steinschneider M (2001a) Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res 151:167187.[CrossRef][ISI][Medline]
Fishman YI, Volkov IO, Noh MD, Garell PC, Bakken H, Arezzo JC, Howard MA, Steinschneider M (2001b) Consonance and dissonance of musical chords: neural correlates in auditory cortex of monkeys and humans. J Neurophysiol 86:27612788.
Formisano E, Kim D-S, Di Salle F, van de Moortele P-F, Ugurbil K, Goebel R (2003) Mirror-symmetrical tonotopic maps in human primary auditory cortex. Neuron 40:859869.[CrossRef][ISI][Medline]
Freeman JA, Nicholson C (1975) Experimental optimization of current source density techniques for anuran cerebellum. J Neurophysiol 38:369382.
Ganang III WF (1980) Phonetic categorization in auditory word perception. J Exp Psychol Hum Percept Perform 6:110125.[CrossRef][ISI][Medline]
Gehr DD, Komiya H, Eggermont JJ (2000) Neuronal responses in cat primary auditory cortex to natural and altered species-specific call. Hear Res 150:2742.[CrossRef][ISI][Medline]
Georgopoulos AP, Pellizzer G, Poliakov AV, Schieber MH (1999) Neural coding of finger and wrist movements. J Comp Neurosci 6:279288.[CrossRef][ISI][Medline]
Hackett TA, Preuss TM, Kaas JH (2001) Architectonic identification of the core region in auditory cortex of macaques, chimpanzees and humans. J Comp Neurol 441:197222.[CrossRef][ISI][Medline]
Heil P (1997) Auditory cortical onset responses revisited. I. First-spike timing. J Neurophysiol 77:26162641.
Hillenbrand J (1984) Perception of sine-wave analogs of voice onset time stimuli. J Acoust Soc Am 75:231240.[ISI][Medline]
Hirsh IJ (1959) Auditory perception of temporal order. J Acoust Soc Am 31:759767.[ISI]
Holt LL, Lotto AJ, Kluender KR (2001) Influence of fundamental frequency on stop-consonant voicing perception: a case of learned covariation or auditory enhancement? J Acoust Soc Am 109:764774.[CrossRef][ISI][Medline]
Horikawa J, Hosokawa Y, Nasu M, Taniguchi, I (1997) Optical study of spatiotemporal inhibition evoked by two-tone sequences in the guinea pig auditory cortex. J Comp Physiol A 181:677684.[Medline]
Howard MA III, Volkov IO, Granner MA, Damasio HM, Ollendieck MC, Bakken HE (1996a) A hybrid clinical-research depth electrode for acute and chronic in vivo microelectrode recording of human brain neurons. J Neurosurg 84:129132.[ISI][Medline]
Howard MA III, Volkov IO, Abbas PJ, Damasio HM, Ollendieck MC, Granner MA (1996b) A chronic microelectrode investigation of the tonotopic organization of human auditory cortex. Brain Res 724:260264.[CrossRef][ISI][Medline]
Jäncke L, Wüstenberg T, Scheich H, Heinze H-J (2002) Phonetic perception and the temporal cortex. NeuroImage 15:733746.[CrossRef][ISI][Medline]
Joliot M, Ribary U, Llinás, R (1994) Human oscillatory brain activity near 40 Hz coexists with cognitive temporal binding. Proc Natl Acad Sci 91:1174811751.
Kaukoranta E, Hari R, Lounasmaa OV (1987) Responses of the human auditory cortex to vowel onset after fricative consonants. Exp Brain Res 69:1923.[ISI][Medline]
Kewley-Port D, Watson CS, Foyle DC (1988) Auditory temporal acuity in relation to category boundaries; speech and nonspeech stimuli. J Acoust Soc Am 83:11331145.[ISI][Medline]
Kisley MA, Gerstein GL (1999) Trial-to-trial variability and state-dependent modulation of auditory-evoked responses in cortex. J Neuroscience 19:1045110460.
Kluender KR (1991) Effects of first formant onset properties on voicing judgments result from processes not specific to humans. J Acoust Soc Am 90:8396.[ISI][Medline]
Kluender KR, Lotto AJ (1994) Effects of first formant onset frequency on [-voice] judgments result from auditory processes not specific to humans. J Acoust Soc Am 95:10441052.[ISI][Medline]
Kluender KR, Lotto AJ, Jenison RL (1995) Perception of voicing for syllable-initial stops at different intensities: does synchrony capture signal voiceless stop consonants? J Acoust Soc Am 97:25522567.[ISI][Medline]
Kreiter AK, Singer W (1996) Stimulus-dependent synchronization of neuronal responses in the visual cortex of the awake macaque monkey. J Neuroscience 16:23812396.[Abstract]
Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, Lütkenhöner B (2003) Neuromagnetic evidence for a pitch processing center in Heschl's gyrus. Cereb Cortex 13:765772.
Kruse W, Dannenberg S, Kleiser R, Hoffman K-P (2002) Temporal relation of population activity in visual areas MT/MST and in primary motor cortex during visually guided tracking movements. Cereb Cortex 12:466476.
Kuhl P (1986) Theoretical contributions of tests on animals to the special-mechanisms debate in speech. Exp Biol 45:233265.[ISI][Medline]
Kuriki S, Okita Y, Hirata Y (1995) Source analysis of magnetic field responses from the human auditory cortex elicited by short speech sounds. Exp Brain Res 104:144152.[ISI][Medline]
Laguitton V, De Graaf JB, Chauvel P, Liégeois-Chauvel C (2000) Identification reaction times of voiced/voiceless continua: a right-ear advantage for VOT values near the phonetic boundary. Brain Lang 75:153162.[CrossRef][ISI][Medline]
Liégeois-Chauvel C, Giraud K, Badier J-M, Marquis P, Chauvel P (2001) Intracerebral evoked potentials in pitch perception reveal a functional asymmetry of the human auditory cortex. Ann NY Acad Sci 930:117132.
Liégeois-Chauvel C, Musolino A, Badier JM, Marquis P, Chauvel P. (1994) Evoked potentials recorded from the auditory cortex in man: evaluation and topography of the middle latency components. Electroenceph Clin Neurophysiol 92:204214.[CrossRef][ISI][Medline]
Liégeois-Chauvel C, de Graaf JB, Laguitton V, Chauvel P (1999) Specialization of left auditory cortex for speech perception in man depends on temporal coding. Cereb Cortex 9:484496.
Lisker L (1975) Is it VOT or a first-formant transition detector? J Acoust Soc Am 57:15471551.[ISI][Medline]
Lisker L, Abramson AS (1964) A cross-language study of voicing in initial stops: acoustical measurements. Word 20:384422.[ISI]
Lotto AJ, Kluender KR (2002) Synchrony capture hypothesis fails to account for effects of amplitude on voicing perception. J Acoust Soc Am 111:10561062.[CrossRef][ISI][Medline]
Lu T, Liang L, Wang X (2001a) Neural representation of temporally asymmetric stimuli in the auditory cortex of awake primates. J Neurophysiol 85:23642380.
Lu T, Liang L, Wang X (2001b) Temporal and rate representations of time-varying signals in the auditory cortex of awake primates. Nat Neurosci 4:11311138.[CrossRef][ISI][Medline]
McGee T, Kraus N, King C, Nicol T (1996) Acoustic elements of speechlike stimuli are reflected in surface recorded responses over the guinea pig temporal lobe. J Acoust Soc Am 99:36063614.[ISI][Medline]
Merzenich MM, Brugge JF (1973) Representation of the cochlear partition on the superior temporal plane of the macaque monkey. Brain Res 50:275296.[CrossRef][ISI][Medline]
Miller JD, Wier CC, Pastore RE, Kelly WJ, Dooling RJ (1976) Discrimination and labeling of noise-buzz sequences with varying noise-lead times: an example of categorical perception. J Acoust Soc Am 60:410417.[ISI][Medline]
Morel A, Garraghty PE, Kaas JH (1993) Tonotopic organization, architectonic fields, and connections of auditory cortex in macaque monkeys. J Comp Neurol 335:437459.[ISI][Medline]
Müller-Preuss P, Mitzdorf U (1984) Functional anatomy of the inferior colliculus and the auditory cortex: current source density analyses of click-evoked potentials. Hear Res 16:133142.[CrossRef][ISI][Medline]
Näätänen R, Picton T (1987) The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24:375425.[ISI][Medline]
Nagarajan SS, Cheung SW, Bedenbaugh P, Beitel RE, Schreiner CE, Merzenich MM (2002) Representation of spectral and temporal envelope of twitter vocalizations in common marmoset primary auditory cortex. J Neurophysiol 87:17231737.
Nelken I, Prut Y, Vaadia E, Abeles M (1994) Population responses to multifrequency sounds in the cat auditory cortex: one- and two-parameter families of sounds. Hear Res 72:206222.[CrossRef][ISI][Medline]
Ohlemiller KK, Jones LB, Heidbreder AF, Clark WW, Miller JD (1999) Voicing judgements by chinchillas trained with a reward paradigm. Behav Brain Res 100:185195.[CrossRef][ISI][Medline]
Oram MW, Perrett DI (1992) Time course of neural responses discriminating different views of the face and head. J Neurophysiol 68:7084.
Owren MJ, Hopp SL, Sinnott JM, Petersen MR (1988) Absolute auditory thresholds in three Old World monkey species (Cercopithecus aethiops, C. neglectus, Macaca fuscata) and humans (Homo sapiens). J Comp Psychol 102:99107.[CrossRef][ISI][Medline]
Papanicolaou AC, Castillo E, Breier JI, Davis RN, Simos PG, Diehl RL (2003) Differential brain activation patterns during perception of voice and tone onset time series: a MEG study. Neuroimage 18:448-459.
Parker EM (1988) Auditory constraints on the perception of voice-onset time: the influence of lower tone frequency on judgments of tone-onset simultaneity. J Acoust Soc Am 83:1597-1607.
Petersen RS, Diamond ME (2000) Spatial-temporal distribution of whisker-evoked activity in rat somatosensory cortex and the coding of stimulus location. J Neurosci 20:6135-6143.
Petersen RS, Panzeri S, Diamond ME (2001) Population coding of stimulus location in rat somatosensory cortex. Neuron 32:503-514.
Phillips DP (1998) Sensory representations, the auditory cortex, and speech perception. Semin Hear 19:319-331.
Phillips DP, Hall SE (1990) Response timing constraints on the cortical representation of sound time structure. J Acoust Soc Am 88:1403-1411.
Pisoni DB (1977) Identification and discrimination of the relative onset time of two component tones: implications for voicing perception in stops. J Acoust Soc Am 61:1352-1361.
Pouget A, Dayan P, Zemel R (2000) Information processing with population codes. Nat Rev Neurosci 1:125-132.
Recanzone GH, Guard DC, Phan ML (2000) Frequency and intensity response properties of single neurons in the auditory cortex of the behaving macaque monkey. J Neurophysiol 83:2315-2331.
Repp BH (1979) Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Lang Speech 22:173-189.
Rose HJ, Metherate R (2001) Thalamic stimulation largely elicits orthodromic, rather than antidromic, cortical activation in an auditory thalamocortical slice. Neuroscience 106:331-340.
Rotman Y, Bar-Yosef O, Nelken I (2001) Relating cluster and population responses to natural sounds and tonal stimuli in cat primary auditory cortex. Hear Res 152:110-127.
Roy SA, Alloway KD (2001) Coincidence detection or temporal integration? What the neurons in somatosensory cortex are doing. J Neurosci 21:2462-2473.
Scherg M, Vajsar J, Picton TW (1989) A source analysis of the late human auditory evoked potentials. J Cogn Neurosci 1:336-355.
Schönwiesner M, von Cramon DY, Rübsamen R (2002) Is it tonotopy after all? Neuroimage 17:1144-1161.
Schreiner CE (1998) Spatial distribution of responses to simple and complex sounds in the primary auditory cortex. Audiol Neurootol 3:104-122.
Schroeder CE, Tenke CE, Givre SJ, Arezzo JC, Vaughan HG Jr (1990) Laminar analysis of bicuculline-induced epileptiform activity in area 17 of the awake macaque. Brain Res 515:326-330.
Schwartz AB, Moran DW (1999) Motor control activity during drawing movements: population representation during lemniscate tracing. J Neurophysiol 82:2705-2718.
Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303-304.
Sharma A, Dorman MF (1999) Cortical auditory evoked potential correlates of categorical perception of voice-onset time. J Acoust Soc Am 106:1078-1083.
Sharma A, Dorman MF (2000) Neurophysiologic correlates of cross-language phonetic perception. J Acoust Soc Am 107:2697-2703.
Sharma A, Marsh CM, Dorman MF (2000) Relationship between N1 evoked potential morphology and the perception of voicing. J Acoust Soc Am 108:3030-3035.
Simos PG, Molfese DL, Brenden RA (1997) Behavioral and electrophysiological indices of voicing-cue discrimination: laterality patterns and development. Brain Lang 57:122-150.
Simos PG, Breier JI, Zouridakis G, Papanicolaou AC (1998a) MEG correlates of categorical-like temporal cue perception in humans. Neuroreport 9:2475-2479.
Simos PG, Breier JI, Zouridakis G, Papanicolaou AC (1998b) Magnetic fields elicited by a tone onset time continuum in humans. Cogn Brain Res 6:285-294.
Simos PG, Diehl RL, Breier JI, Molis MR, Zouridakis G, Papanicolaou AC (1998c) MEG correlates of categorical perception of a voice onset time continuum in humans. Cogn Brain Res 7:215-219.
Sinnott JM, Adams FS (1987) Differences in human and monkey sensitivity to acoustic cues underlying voicing contrasts. J Acoust Soc Am 82:1539-1547.
Soli SD (1983) The role of spectral cues in discrimination of voice onset time differences. J Acoust Soc Am 73:2150-2165.
Steinschneider M, Tenke C, Schroeder C, Javitt D, Simpson GV, Arezzo JC, Vaughan HG Jr (1992) Cellular generators of the cortical auditory evoked potential initial component. Electroenceph Clin Neurophysiol 84:196-200.
Steinschneider M, Schroeder C, Arezzo JC, Vaughan HG Jr (1994) Speech-evoked activity in primary auditory cortex: effects of voice onset time. Electroenceph Clin Neurophysiol 92:30-43.
Steinschneider M, Reser D, Schroeder CE, Arezzo JC (1995a) Tonotopic organization of responses reflecting stop consonant place of articulation in primary auditory cortex (A1) of the monkey. Brain Res 674:147-152.
Steinschneider M, Schroeder CE, Arezzo JC, Vaughan HG Jr (1995b) Physiologic correlates of the voice onset time (VOT) boundary in primary auditory cortex (A1) of the awake monkey: temporal response patterns. Brain Lang 48:326-340.
Steinschneider M, Volkov IO, Noh MD, Garell PC, Howard MA III (1999) Temporal encoding of the voice onset time (VOT) phonetic parameter by field potentials recorded directly from human auditory cortex. J Neurophysiol 82:2346-2357.
Steinschneider M, Fishman YI, Arezzo JC (2003) Representation of the voice onset time (VOT) speech parameter in population responses within primary auditory cortex of the awake monkey. J Acoust Soc Am 114:307-321.
Stevens KN, Klatt DH (1974) Role of formant transitions in the voiced-voiceless distinction for stops. J Acoust Soc Am 55:653-659.
Summerfield Q (1982) Differences between spectral dependencies in auditory and phonetic temporal processing: relevance to the perception of voicing in initial stops. J Acoust Soc Am 72:51-61.
Summerfield Q, Haggard M (1977) On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. J Acoust Soc Am 62:435-448.
Temereanca S, Simons DJ (2003) Local field potentials and the encoding of whisker deflections by population firing synchrony in thalamic barreloids. J Neurophysiol 89:2137-2145.
Tremblay KL, Piskosz M, Souza P (2003) Effects of age and age-related hearing loss on the neural representation of speech cues. Clin Neurophysiol 114:1332-1343.
Wallace MN, Johnson PW, Palmer AR (2002) Histochemical identification of cortical areas in the auditory region of the human brain. Exp Brain Res 143:499-508.
Wang X, Merzenich MM, Beitel R, Schreiner CE (1995) Representation of a species-specific vocalization in the primary auditory cortex of the common marmoset: temporal and spectral characteristics. J Neurophysiol 74:2685-2706.
Wood CC (1976) Discriminability, response bias, and phoneme categories in discrimination of voice onset time. J Acoust Soc Am 60:1381-1389.
Wood CC, Wolpaw JR (1982) Scalp distribution of human auditory evoked potentials. II. Evidence for overlapping sources and involvement of auditory cortex. Electroenceph Clin Neurophysiol 54:25-38.
Wyss R, König P, Verschure PFMJ (2003) Invariant representations of visual patterns in a temporal population code. Proc Natl Acad Sci 100:324-329.