Brain Research Unit, Low Temperature Laboratory, Helsinki University of Technology, Espoo, Finland
Address correspondence to Tiina Parviainen, Brain Research Unit, Low Temperature Laboratory, PO Box 2200, FIN-02015 HUT, Finland. Email: tiina{at}neuro.hut.fi.
Abstract
Key words: dyslexia, MEG, N100, speech perception
Introduction
During the past decade, cortical areas specifically involved in speech sound analysis have been explored using functional magnetic resonance imaging (fMRI) and positron emission tomography (PET). Speech stimuli have been shown to evoke more widespread activation than nonspeech stimuli in the superior temporal cortex bilaterally or with slight left-hemisphere predominance (Demonet et al., 1992; Zatorre et al., 1992; Binder et al., 1994; Vouloumanos et al., 2001). When searching for the neural basis of phonetic processing, it is crucial to contrast speech sounds with acoustically comparable sounds to exclude the possibility of finding differences only based on complexity. Contrasting phonetic versus acoustic analysis has revealed activation in the left superior and middle temporal gyri (STG and MTG) and superior temporal sulcus (STS) (Binder et al., 2000; Benson et al., 2001; Vouloumanos et al., 2001).
Identification of the cortical loci selectively activated by speech sounds, however, provides only partial information. Speech perception is a very fast process: the signal is transformed from acoustic features to meaning within fractions of a second. Thus, especially for the early steps in the analysis of the speech signal, it is likely that the neural representations of the different stages and transformations are activated only very briefly. The time course of auditory processing can be followed using neurophysiological measures, electroencephalography (EEG) and magnetoencephalography (MEG).
Semantic processing of spoken language starts around 200–300 ms after sound onset, as demonstrated, e.g., by studies using sentences with semantically congruent or incongruent final words (cf. Connolly et al., 1994; Helenius et al., 2002b). Phonetic/phonological information must thus be accessible by this time. Within the first 200 ms, speech-specificity has been tested using oddball paradigms. In these setups, frequent (standard) stimuli are interspersed with infrequent (deviant) stimuli. The difference between the responses to deviant and standard stimuli in auditory cortex is known as the mismatch response, or mismatch negativity (MMN) in the EEG literature (Näätänen, 1992; Alho, 1995). The MMN typically reaches its maximum at about 150 ms after stimulus onset. It is seen as a reflection of auditory sensory memory at the neuronal level. The MMN behaves differently for speech and nonspeech stimuli (Aulanko et al., 1993; Phillips et al., 2000; Shtyrov et al., 2000; Vihla et al., 2000). Moreover, MMN responses to phoneme contrasts in the native language are stronger than those to non-native contrasts (Näätänen et al., 1997). A phonetic representation of the speech sound must thus be available in this time window to enable memory traces based on phonetic (or phonological) labels.
It is currently not established whether speech-specific analysis is reflected in neural processing before the MMN time window. The MMN signal is preceded by a robust activation of the auditory cortex at about 100 ms after sound onset, referred to as the N100m (or N100 in the EEG literature). Some studies suggest phonetic/phonological effects in this response, but others do not (Kuriki and Murase, 1989; Eulitz et al., 1995; Gootjes et al., 1999; Tiitinen et al., 1999). Gootjes et al. (1999) found significantly stronger N100m responses to vowels than to tones or piano notes over the left but not the right hemisphere. However, Eulitz et al. (1995) and Tiitinen et al. (1999) found no significant difference in the strength of the N100m response to speech and tone stimuli, although the N100m response was slightly later for speech sounds than for tones in both hemispheres. The variability of the results is likely due largely to variability of the stimulus materials. In many of these studies, the main research question did not require careful acoustic matching of the speech and nonspeech stimuli, or matching was not attempted. Thus, results differing for speech versus nonspeech sounds may reflect acoustic variation rather than sensitivity to speech sounds per se. It is also worth noting that in any single study the stimuli have typically been sounds with stable frequencies (i.e. vowel-type sounds) (Eulitz et al., 1995; Tiitinen et al., 1999; Vihla and Salmelin, 2003) or transition sounds (i.e. CV-syllable-type sounds) (Shtyrov et al., 2000), but not both. As natural language is a mixture of these sound types, it may be important to allow acoustic variation among the speech stimuli when evaluating cortical analysis of speech versus nonspeech sounds.
Characterization of the time windows and hemispheric balance of acoustic and phonetic/phonological analysis is essential not only for understanding normal speech perception but also for understanding the neural basis of dyslexia. Dyslexic individuals are known to have problems in tasks requiring auditory phonetic analysis (Bradley and Bryant, 1983; Shankweiler et al., 1995). At the neuronal level, dyslexic subjects show delayed semantic processing at 300–400 ms post-stimulus (Helenius et al., 2002b), and abnormalities in the preceding MMN response (Baldeweg et al., 1999; Schulte-Körne et al., 2000) and N100m response (Helenius et al., 2002b). These findings clearly point to problems within the first 200 ms after speech onset. It would be tempting to interpret the unusual cortical activation patterns in the dyslexic subjects as signatures of their known phonological problems but, obviously, they could equally well be associated with abnormalities in basic acoustic processing. The functional role of the N100m time window in speech versus nonspeech analysis is thus a pressing issue in dyslexia research, as well.
In the present study, we used whole-head MEG to focus on the role of the N100m auditory cortical response in acoustic and phonetic processing. First, we investigated whether the N100m response is sensitive to speech in a normal subject population, i.e. whether the strength or timing of the neural response differs between speech and nonspeech sounds. Our speech stimuli were two synthetic vowels and consonant–vowel (CV) syllables. The nonspeech stimuli were complex sounds and simple sine wave tones that were spectrally and temporally carefully matched with the speech stimuli. Second, we tested the same speech and nonspeech stimuli on a group of dyslexic individuals to investigate whether they deviate from the response pattern seen in controls, either for all sound types or specifically for speech sounds.
Materials and Methods
The stimuli were synthetic speech sounds, complex nonspeech sounds and simple sine wave tones (Fig. 1). The duration of all stimuli was 150 ms. The speech sounds were Finnish vowels (V; /a/, /u/) and consonant–vowel syllables (CV; /pa/, /ka/) created using a Klatt synthesizer (Klatt, 1980) for Macintosh (Sensimetrics, Cambridge, MA, USA). The fundamental frequency (F0) decreased steadily from 118 to 90 Hz, resembling a normal male voice. The formant frequencies F1, F2 and F3 for vowel /a/ were 700, 1130 and 2500 Hz, and for vowel /u/ 340, 600 and 2500 Hz, respectively. These values were based on studies of Finnish speech sounds and formant structure (Wiik, 1965; Iivonen and Laukkanen, 1993) and subjective evaluation of vowel and consonant quality and intelligibility. The formant bandwidths in both vowels were 90 Hz for F1, 100 Hz for F2 and 60 Hz for F3. The vowel envelopes had 15 ms fade-in and fade-out periods.
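The vowels themselves were generated with the Klatt synthesizer; as a rough illustration of the source-filter idea behind such stimuli, the following sketch drives a pulse train carrying the falling F0 contour (118 to 90 Hz) through three second-order resonators set to the formant frequencies and bandwidths listed above. The sampling rate and the simplified glottal source are assumptions made for the example, not parameters of the actual synthesis.

```python
import numpy as np
from scipy.signal import lfilter

FS = 22050          # sampling rate in Hz (an assumption; not reported in the paper)
DUR = 0.150         # stimulus duration, 150 ms
FADE = 0.015        # 15 ms fade-in/fade-out

def fade(x, fs=FS, fade_s=FADE):
    """Apply linear 15 ms onset and offset ramps, as described for the stimuli."""
    n = int(fade_s * fs)
    env = np.ones_like(x)
    env[:n] = np.linspace(0.0, 1.0, n)
    env[-n:] = np.linspace(1.0, 0.0, n)
    return x * env

def resonator(x, freq, bw, fs=FS):
    """Second-order resonator acting as one formant filter."""
    c = -np.exp(-2 * np.pi * bw / fs)
    b = 2 * np.exp(-np.pi * bw / fs) * np.cos(2 * np.pi * freq / fs)
    a = 1 - b - c
    return lfilter([a], [1, -b, -c], x)

def vowel_like(formants, bandwidths, f0_start=118.0, f0_end=90.0):
    """Pulse train with a falling pitch contour, filtered by three formants."""
    n = int(DUR * FS)
    f0 = np.linspace(f0_start, f0_end, n)                  # 118 -> 90 Hz contour
    phase = 2 * np.pi * np.cumsum(f0) / FS
    pulses = (np.diff(np.floor(phase / (2 * np.pi)), prepend=0.0) > 0).astype(float)
    y = pulses
    for f, bw in zip(formants, bandwidths):
        y = resonator(y, f, bw)
    return fade(y / np.max(np.abs(y)))

a_like = vowel_like([700, 1130, 2500], [90, 100, 60])      # /a/-like approximation
u_like = vowel_like([340, 600, 2500], [90, 100, 60])       # /u/-like approximation
```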
The nonspeech stimuli were created in Sound Edit (MacroMedia, San Francisco, CA, USA). They were simple sine wave tones and complex sounds combined from three sine wave tone components of exactly the same frequency as the formants of each of the four speech sounds. To retain the transition difference between /pa/ and /ka/ also in the sine wave tones, these stimuli were composed of the F2 frequency of each speech sound. The envelopes of the nonspeech sounds were similar to those of the speech sounds, including the 15 ms fade-in and fade-out periods and a sloped fade-in for the nonspeech equivalents of the CV stimuli. Although acoustically carefully matched, none of the nonspeech sounds were perceived as speech sounds.
The amplitudes of the different sounds were adjusted using elongated versions of the original sounds so that, at the end of the sound delivery system, measured with an artificial ear and a spectrum analyzer calibrated to ear sensitivity, the sound amplitudes differed by <2 dB (SPL).
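Continuing the sketch above, the nonspeech counterparts can be approximated as sums of sine components at the formant frequencies and as simple tones at the F2 frequency, with the same 15 ms fades. The RMS-matching helper is only a digital stand-in; in the experiment the levels were equalized acoustically at the end of the sound delivery system.

```python
def sine_complex(freqs, fs=FS, dur=DUR):
    """Sum of sine components at the given (formant) frequencies, with 15 ms fades."""
    t = np.arange(int(dur * fs)) / fs
    y = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return fade(y / np.max(np.abs(y)))

def rms_match(x, ref):
    """Scale x to the RMS level of a reference sound (digital stand-in for the
    acoustic <2 dB SPL matching done at the end of the sound delivery system)."""
    return x * np.sqrt(np.mean(ref ** 2) / np.mean(x ** 2))

complex_a = rms_match(sine_complex([700, 1130, 2500]), a_like)   # complex equivalent of /a/
tone_a = rms_match(sine_complex([1130]), a_like)                 # simple tone at the F2 of /a/
```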
Subjects
Subjects were 10 normally reading adults (23–39 years; five females) and 10 adults with developmental dyslexia (20–39 years; five females). The subjects gave their informed consent to participate in the study. They were native Finnish speakers, right-handed (except for one control subject), and had no history of hearing loss or neurological abnormalities. The dyslexic adults were selected on the basis of a self-reported early history of reading problems. They had all been tested for dyslexia or had received special tutoring for reading difficulties during their school years. The average education level of the control (14 years) and dyslexic (13 years) groups was similar.
Behavioral Tests
The dyslexic subjects were tested for general linguistic and non-linguistic abilities using a subset of the standardized Finnish version of the Wechsler Adult Intelligence Scale – Revised (WAIS-R) and the Wechsler Memory Scale – Revised (WMS-R) (Vocabulary, Comprehension, Similarities, Block Design, Digit Span, Visual Span) tests (Wechsler, 1981, 1987; Woods et al., 1998a, 1998b). The reading and naming speed of the dyslexic subjects were measured as well. Reduced reading speed (Leinonen et al., 2001) and naming speed (Wolf and Obregon, 1992) have been found to be reliable markers for dyslexia. In the Oral Reading test subjects were asked to read aloud a narrative printed on a sheet of paper. The reading speed was measured as words per minute. In the Rapid Automatized Naming test (RAN; Denckla and Rudel, 1976) and in the Rapid Alternating Stimulus naming test (RAS; Wolf, 1986) subjects were asked to name a 5 × 10 matrix of colors, numbers and letters, and the naming speed was measured. The results of these tests were compared against norm data from 38 (Oral Reading, RAS) and 15 (RAN) normally reading subjects.
In addition, the following auditorily presented phonological tests were administered. In the Phoneme Deletion test (Leinonen et al., 2001) 16 words with 4–10 letters and 2–4 syllables were presented via headphones. Subjects were asked to pronounce each stimulus without the second phoneme (e.g. studio → sudio, kaupunki → kupunki). The number of correct responses was calculated. In the Syllable Reversal test (Leinonen et al., 2001) 10 words and 10 pseudowords with 5–9 letters and 3–4 syllables were presented via headphones, and subjects were asked to change the order of the last two syllables and to say the new pseudoword aloud (e.g. aurinko → aukorin, rospiemi → rosmipie). The number of correct responses was calculated. For the Phoneme Deletion and Syllable Reversal tests the vocal reaction times to the stimuli were measured from a microphone signal. In the Spelling test (Leinonen et al., 2001) the subjects were asked to spell to dictation 10 pseudowords and 10 words with 6–14 letters and 2–7 syllables. The number of errors was calculated. These phonological tests were administered also to seven of the control subjects participating in this study.
MEG Measurement Procedure
Measurements were conducted in a magnetically shielded room. Stimulus presentation was controlled by the Presentation program (Neurobehavioral Systems Inc., San Francisco, CA) running on a PC. To normalize the stimulus intensities across subjects, individual hearing thresholds were determined before the actual measurement using simple 1 kHz tones of 50 ms with 15 ms rise and fall times. The stimuli were delivered to the subject through plastic tubes and earpieces at 65 dB above the subjective hearing threshold. The subjects were watching a silent film and were instructed to ignore the auditory stimuli.
There were two sessions. In the first session the subject heard a randomized sequence of vowel sounds and their nonspeech equivalents (synthetic /a/ and /u/, complex sound equivalents of /a/ and /u/, and tone equivalents of /a/ and /u/). In the second session, the stimuli were CV sounds and their nonspeech equivalents (synthetic /pa/ and /ka/, complex sound equivalents of /pa/ and /ka/, and tone equivalents of /pa/ and /ka/). The order of the sessions was randomized across subjects. Stimuli were separated by an interstimulus interval of 2 s and they were presented monaurally to the right ear to maximally engage the language-dominant left hemisphere. Each session lasted for 20–30 min and the sessions were separated by a 2–3 min break.
MEG Recordings
MEG signals were recorded using a helmet-shaped 306-channel whole-head system (Vectorview™, Neuromag Ltd, Helsinki, Finland) with two orthogonally oriented planar gradiometers and one magnetometer in 102 locations. Signals were bandpass filtered at 0.03–200 Hz, sampled at 600 Hz, and averaged on-line from 200 ms before stimulus onset to 800 ms after it. The horizontal and vertical electro-oculograms were recorded for on-line rejection of epochs contaminated by blinks or saccades. About 100 artifact-free epochs were gathered and averaged separately for each of the 12 stimulus categories. The position of the subject's head with respect to the measurement helmet was determined at the beginning of each measurement session by briefly energizing four head position indicator coils attached to the subject's head. The location of the coils was determined with respect to three anatomical landmarks (preauricular points and nasion) using a 3-D digitizer (Polhemus, Colchester, VT). The location of the active brain areas could thus be displayed on anatomical MR images after identification of the landmarks in the MR images.
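The on-line rejection and averaging can be illustrated with a minimal off-line equivalent in Python; the EOG threshold, the array layout and the prestimulus baseline correction are assumptions of the sketch rather than documented settings of the acquisition system.

```python
import numpy as np

FS_MEG = 600                  # sampling rate (Hz)
PRE, POST = 0.2, 0.8          # epoch window: -200 ms ... +800 ms around stimulus onset

def average_epochs(meg, eog, onsets, eog_limit=150e-6):
    """Average stimulus-locked epochs, rejecting those with large EOG deflections.

    meg    : (n_channels, n_samples) continuous MEG signals for one stimulus category
    eog    : (n_samples,) electro-oculogram used for blink/saccade rejection
    onsets : sample indices of stimulus onsets
    eog_limit : peak-to-peak EOG rejection threshold (an assumed value, in volts)
    """
    n_pre, n_post = int(PRE * FS_MEG), int(POST * FS_MEG)
    kept = []
    for s in onsets:
        seg = meg[:, s - n_pre:s + n_post]
        if np.ptp(eog[s - n_pre:s + n_post]) > eog_limit:    # blink or saccade -> reject
            continue
        baseline = seg[:, :n_pre].mean(axis=1, keepdims=True)
        kept.append(seg - baseline)                          # prestimulus baseline correction
    return np.mean(kept, axis=0), len(kept)                  # averaged response, epoch count
```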
Data Analysis
MEG signals were low-pass filtered at 40 Hz before further analysis. The activated areas were modeled as equivalent current dipoles (ECDs), which represent the mean location, direction and strength of the current flowing in a given cortical patch (Hämäläinen et al., 1993). The ECDs were determined from standard subsets of 46 planar gradiometers (= 23 pairs) that covered the N100m auditory field pattern over each hemisphere. A spherical volume conductor was used to describe the conductivity profile of the brain. The sphere model was fitted to optimally describe the curvature of the temporal areas, using the individual anatomical MR images when available (eight control subjects and four dyslexic subjects); otherwise, a sphere model averaged over the individual parameters of all our subjects with MRIs, calculated separately for males and females, was used.
In every subject, ECDs were first determined separately for each stimulus. The goodness-of-fit of the obtained two-dipole models (one dipole in each hemisphere) varied from 85 to 95% across subjects and stimuli. Within each subject, the source locations varied on average by 1 cm and the orientations of current flow by 2–5° across the different stimuli, in both hemispheres. The close similarity of the ECDs found in the different stimulus conditions made it possible to improve the signal-to-noise ratio by forming an average of the responses to all stimuli in each subject (four stimulus categories: two vowels and two syllables; three stimulus types: tone, complex sound, speech sound; 1090–1354 trials in total). The left- and right-hemisphere ECDs modeled in this averaged data set were then used to account for the MEG signals recorded for each stimulus. The locations and orientations of the two ECDs were kept fixed, while their amplitudes were allowed to vary to best explain the signals recorded by all sensors over the entire averaging interval. This common two-dipole model accounted for the MEG signals in each stimulus condition as well as the two-dipole models that had been determined separately for each stimulus condition (goodness-of-fit varied from 83 to 94%). The use of the common set of two ECDs for all conditions in each individual subject made it possible to directly compare the time course of activation in these cortical areas (source waveforms) across all stimuli.
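Keeping the two ECD locations and orientations fixed and letting only their amplitudes vary reduces, at each time point, to a linear least-squares problem. A minimal sketch, assuming the forward (lead) fields of the two dipoles in the sphere model have already been computed:

```python
import numpy as np

def fixed_dipole_fit(B, G):
    """Amplitude fit for two ECDs whose locations and orientations are kept fixed.

    B : (n_sensors, n_times) measured field over the averaging interval
    G : (n_sensors, 2) forward fields of the left- and right-hemisphere ECDs,
        computed from the spherical conductor model (not shown here)
    Returns the source waveforms Q (2, n_times) and the goodness-of-fit.
    """
    Q = np.linalg.pinv(G) @ B                              # least-squares amplitudes per sample
    residual = B - G @ Q
    gof = 1.0 - np.sum(residual ** 2) / np.sum(B ** 2)     # fraction of field variance explained
    return Q, gof
```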
Statistical Tests
A repeated-measures analysis of variance (ANOVA) with stimulus category (/a/, /u/, /pa/, /ka/), stimulus type (speech sound, complex nonspeech sound, simple tone) and hemisphere (left, right) as within-subjects factors was used to evaluate systematic effects on activation strengths and latencies within each subject population. Source locations were tested separately for each spatial dimension (x: axis from the left to the right ear; y: axis orthogonal to x in the axial plane, towards the nasion; z: axis from inferior to superior), as were the orientations of current flow. For group comparisons, a mixed-model ANOVA was employed with group (controls, dyslexics) as the between-subjects factor.
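For illustration, such a three-factor within-subject ANOVA could be set up as follows (a sketch using statsmodels; the file name and column names are hypothetical, and the mixed-design group comparison mentioned above would need a separate tool):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per subject and condition; the file and column names are hypothetical.
# Columns: subject, category (/a/, /u/, /pa/, /ka/), stim_type (speech, complex, tone),
# hemisphere (left, right), amplitude (N100m source strength in nAm).
df = pd.read_csv("n100m_amplitudes.csv")

res = AnovaRM(
    data=df, depvar="amplitude", subject="subject",
    within=["category", "stim_type", "hemisphere"],
).fit()
print(res)
```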
For behavioral tests, the reaction times and error scores between subject groups were analyzed using Student's t-test. To test for correlations between phonological abilities and cortical measures we calculated Pearson's correlation coefficient.
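A corresponding sketch of the behavioral comparisons, with randomly generated stand-in vectors in place of the actual per-subject scores:

```python
import numpy as np
from scipy.stats import ttest_ind, pearsonr

rng = np.random.default_rng(0)
# Random stand-ins; the real analysis uses the measured reaction times (s)
# and the N100m source strengths (nAm) of the individual subjects.
control_rt = rng.normal(2.0, 0.5, size=7)
dyslexic_rt = rng.normal(4.0, 1.0, size=10)
n100m_left = rng.normal(50.0, 7.0, size=10)

t_stat, p_val = ttest_ind(dyslexic_rt, control_rt)      # group difference in reaction times
r, p = pearsonr(dyslexic_rt, n100m_left)                 # behavior-MEG correlation
print(t_stat, p_val, r, p)
```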
Results
Figure 2 illustrates examples of MEG signals recorded in one subject. Responses to the different sound types (speech sound, complex nonspeech sound and simple tone) are presented for the MEG sensors that showed the maximum amplitude over the left and right auditory cortex. Figure 3 shows the group mean location of the equivalent current dipoles that best represented the activated cortical areas in each subject, superposed on an MR image averaged across the control subjects (Schormann et al., 1996; Woods et al., 1998a, 1998b). In a few cases, the dipoles were found in Heschl's gyrus, but mostly they were localized to Heschl's sulcus or posterolateral to it.
Strength of N100m Response
The strength of the N100m response (Table 1 and Fig. 4a) varied by stimulus type in the left hemisphere but not in the right hemisphere [stimulus type, F(2,18) = 10.2, P < 0.001; stimulus type-by-hemisphere interaction, F(2,18) = 13.4, P < 0.001]. In the left hemisphere, the responses were stronger to speech sounds than to complex nonspeech sounds and simple tones [F(2,18) = 14.7, P < 0.001]. The effect of stimulus type was significant for all stimulus categories (/a/: P < 0.001, /u/: P < 0.001, /pa/: P < 0.01, /ka/: P < 0.001).
Timing of N100m Response
The onset latency (the time point at which the signal crosses the standard deviation level of the prestimulus baseline) did not show systematic variation with sound type. However, the build-up of the N100m response in the left and right hemisphere differentiated between speech and nonspeech sounds (Fig. 5). For speech sounds, the ascending slope of the N100m response (increase of amplitude versus time) was steeper in the left than right hemisphere but, for the nonspeech sounds, there was no significant difference between the two hemispheres [stimulus type-by-hemisphere interaction, F(2,18) = 4.2, P < 0.05; hemisphere effect for speech sounds F(1,9) = 7.8, P < 0.05, complex sounds F(1,9) = 2.8, P = 0.1, and sine wave tones F(1,9) = 2.5, P = 0.2].
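These two measures can be made concrete with a short sketch; the exact operationalization of the slope (here, a straight line from onset to peak) is an assumption, since the fitting procedure is not spelled out above.

```python
import numpy as np

def onset_and_slope(waveform, times):
    """Onset latency and ascending slope of an N100m source waveform.

    waveform : source amplitude (nAm) as a function of time
    times    : time axis in seconds (negative values = prestimulus baseline)
    Onset is the first post-stimulus sample exceeding the standard deviation of
    the prestimulus baseline; the slope is the amplitude increase per unit time
    from that onset to the response peak.
    """
    threshold = np.std(waveform[times < 0])
    post = times >= 0
    onset_idx = np.flatnonzero(post & (waveform > threshold))[0]
    peak_idx = np.flatnonzero(post)[0] + np.argmax(waveform[post])
    slope = (waveform[peak_idx] - waveform[onset_idx]) / (times[peak_idx] - times[onset_idx])
    return times[onset_idx], slope
```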
The effect of stimulated ear was subsequently tested in 7 of the 10 subjects who participated in the original study. Stimuli presented to the left ear (/a/ and /pa/ and their nonspeech equivalents) evoked a similar activation pattern as stimuli presented to the right ear (Fig. 6). In the left hemisphere, activation was stronger to speech than to complex and simple nonspeech sounds, but in the right hemisphere no general effect of stimulus type was detected [effect of stimulus type, F(2,12) = 9.3, P < 0.01; stimulus type-by-hemisphere interaction, F(2,12) = 5.0, P < 0.05]. Thus, the sensitivity of the N100m strength in the two hemispheres to speech versus nonspeech sounds was not affected by changing the stimulated ear.
There were no systematic group differences in the location of the activated areas. As in controls, the source location was slightly affected by stimulus type (1–3 mm between speech and nonspeech conditions).
Comparison of N100m Strength in the Two Subject Groups
The N100m source strength showed no main effect of subject group, nor significant interactions. Thus, similarly to controls, also in the dyslexic subjects the N100m strength differentiated between speech and nonspeech sounds in the left hemisphere [F(2,18) = 8.2, P < 0.01] but not in the right hemisphere [F(2,18) = 1.5, P = 0.2] (Fig. 4b). However, in the right hemisphere there was a tendency towards generally weaker activation in the dyslexic than control subjects [effect of group in right hemisphere F(1,18) = 3.6, P = 0.08]. In a separate ANOVA for dyslexic subjects, the N100m strength differed significantly between the hemispheres [left 53 ± 7 nAm, right 40 ± 4 nAm, F(1,9) = 5.5, P < 0.05], while in the control subjects the overall level of activation between the hemispheres was very similar [left 54 ± 7 nAm, right 55 ± 7 nAm, F(1,9) = 0.01, P = 0.9].
Comparison of the N100m Timing in the Two Subject Groups
The build-up of the N100m response showed a subtle effect of subject group for speech sounds but not for nonspeech sounds [effect of group for speech sounds, F(1,18) = 4.9, P < 0.05; complex nonspeech sounds, F(1,18) = 0.9, P = 0.3; sine wave tones, F(1,18) = 1.9, P = 0.2]. The N100m for speech sounds was found to rise more gradually in dyslexic than control subjects, similarly in both hemispheres.
The peak latency of the N100m response (Table 1) showed a significant group-by-hemisphere interaction [F(1,18) = 5.4, P < 0.05]. In a separate analysis for each hemisphere the peak latency in the left hemisphere tended to be longer in dyslexic than control subjects, but this difference only approached significance [F(1,18) = 3.0, P = 0.1]. In the right hemisphere, the groups showed very similar timing of activation [F(1,18) = 0.007, P = 0.9]. When the dyslexic subjects were tested separately, the typical pattern of an earlier response in the contralateral left than ipsilateral right hemisphere found in controls was not evident (left 104 ± 5 ms, right 108 ± 5 ms) (see Fig. 5). Nevertheless, the response to simple tones reached the maximum first and the response to speech sounds last, similarly in both hemispheres, as in the control group [main effect of stimulus type F(2,18) = 7.1, P < 0.01].
Behavioral Results and Correlations to MEG Responses in Dyslexic versus Control Subjects
All the dyslexic subjects had normal intelligence, as measured by the general linguistic and non-linguistic cognitive tests (WAIS-R, WMS-R) (Table 2). The dyslexic subjects were significantly slower than normally reading controls in Oral Reading test [mean difference 59 words, t(46) = 5.8, P < 0.001] and Rapid Naming tests [mean difference in RAS 9 s, t(46) = 5.0, P < 0.001; and in RAN 5 s, t(23) = 2.5, P < 0.05]. Control subjects in the present study (7/10 tested) did not differ from the larger normative data set in either Oral Reading [t(35) = 0.7, P = 0.5] or Rapid Naming [RAS, t(35) = 1.1, P = 0.3; RAN, t(12) = 1.3, P = 0.2]. In the more specific phonological tests the dyslexic subjects were significantly slower and more error-prone than the control subjects. The reaction times of the dyslexic individuals were longer than those of the control subjects in the auditorily presented Phoneme Deletion test [difference on average 3.7 s, t(15) = 4.6, P < 0.001] and Syllable Reversal test [difference on average 5.4 s, t(15) = 4.8, P < 0.001]. Dyslexic subjects also made significantly more errors in the Phoneme Deletion [t(15) = 2.5, P < 0.05], Syllable Reversal [t(15) = 2.3, P < 0.05] and Spelling tests [t(15) = 2.9, P < 0.05] than did control subjects.
Discussion
The N100m response was fastest to simple tones. The peak latency was systematically delayed to complex sounds and, further, to speech sounds, similarly in both hemispheres. However, the strength of the N100m activation displayed interesting hemispheric specialization. The responses were stronger for speech than nonspeech sounds in the left auditory cortex but not in the right auditory cortex, independent of the stimulated ear. Thus, while both hemispheres were involved in the analysis of all sound types, the relative contribution of the left auditory cortex was increased when the stimuli were speech sounds.
The present findings agree with and extend earlier reports on speech/nonspeech processing and the N100m, which have shown stronger amplitude for vowels than piano notes or tones (Gootjes et al., 1999), longer latencies for vowels than tones (Eulitz et al., 1995; Tiitinen et al., 1999) or a leftward shift of hemispheric balance for natural vowels as compared with complex tones (Vihla and Salmelin, 2003). Using acoustically carefully matched speech and nonspeech sounds, we demonstrate that these effects are likely to be tied together. The increase of amplitude in speech sound analysis is lateralized to the left hemisphere, resulting in a leftward shift of activation when hearing speech sounds. The increase in latency for speech sounds occurs bilaterally. We also show that the leftward shift of activation is not markedly affected by the acoustic structure of the speech stimuli (vowels, CV syllables).
One may picture the build-up of the N100m response as a signature of a process where an ever-larger number of auditory cortical neurons are firing synchronously. For a constant rate of neuronal recruitment, a delay in the peak latency would be associated with stronger peak activation. The combined increase of peak latency and N100m strength for speech versus complex versus simple nonspeech sounds in the left hemisphere could certainly be interpreted this way. On the other hand, the right-hemisphere effect of increasing peak latency with no accompanying changes in activation strength suggests a slower rate of neuronal recruitment or less synchronous firing of neuronal populations for increasing sound complexity.
Interestingly, the ascending slope of the N100m response was significantly steeper in the left than right hemisphere for speech sounds but more similar in the two hemispheres for the nonspeech sounds. This observation speaks for a qualitative difference between the analysis of speech and nonspeech sounds in the left auditory cortex by 100 ms. It thus appears that, on top of acoustic processing per se which may be affected by varying the spectral composition or temporal structure of the sounds, the N100m response may also reflect speech-specific processing.
At the cellular level, speech-specificity could mean that neurons generating the response prefer sounds that form phonetically (linguistically) relevant combinations of acoustic features. Acoustically, speech sounds do not have any single unique property different from nonspeech sounds but rather represent particular (unique) combinations of different properties (Stevens, 1980). Although there is plenty of information available on how phonetically important features are encoded in the cochlear nucleus and auditory nerve (see e.g. Delgutte, 1999), the combinations of features in speech sounds that are critical for analysis at the cortical level are less well defined. The present study implies that a simple combination of formant frequencies does not suffice, as the N100m response to the complex nonspeech sounds, composed of the formant frequencies, differed from that evoked by the speech sounds.
Combination-sensitive neurons, originally proposed by Suga et al. (1978) in a study on the auditory system of echolocating bats, have been investigated in a number of animal species and recently also in nonhuman primates (Rauschecker et al., 1995). In the macaque, neurons located posterior to the primary auditory cortex of the left hemisphere (roughly corresponding to the location of our N100m source areas) responded better to complex sounds, e.g. species-specific calls, than to simple tones (Rauschecker et al., 1995). This kind of preference is suggested to be the result of nonlinear summation of inputs from more narrowly tuned neurons in the primary auditory cortex (Rauschecker et al., 1995; Rauschecker, 1998).
Some degree of correspondence between nonhuman primates and humans is suggested by the observation that increased stimulus complexity (band-passed noise versus pure tones) results in similarly enhanced activation in humans, in corresponding areas posterior to the primary auditory cortex (Wessinger et al., 2001). However, as the phonetics of human speech is not directly comparable to animal communication sounds, and it is not known whether the analysis of speech sounds uses the same computations as that of other complex sounds, these observations cannot be unequivocally linked to human speech perception.
In recent years, much has been learned about the functional anatomy of auditory processing of complex sounds in humans, but detailed information about the underlying neural processes remains largely unestablished. At the anatomical level, it is known that the primary auditory cortex, located in Heschl's gyrus, is surrounded by non-primary auditory areas anteriorly, laterally and posteriorly (for a review, see Hall et al., 2003). With time-sensitive imaging methods it has been shown that by 100 ms the activation is largely generated in nonprimary auditory areas posterior and lateral to the primary auditory cortex, in the planum temporale (PT) (Liegeois-Chavel et al., 1994; Lütkenhöner and Steinstrater, 1998).
Some hemodynamic studies of speech and nonspeech processing have suggested a linguistically specialized role for the PT and the surrounding cortex (Zatorre et al., 1992; Benson et al., 2001; Vouloumanos et al., 2001), while other studies have seen it as part of a basic acoustic analysis network and thus relevant for processing of both speech and nonspeech sounds (Binder et al., 1996, 2000). In agreement with the latter view, the N100 response is generated to any kind of abrupt change in the auditory environment (Hari, 1990). Here, we found a strong N100m response to both speech and nonspeech sounds which showed a small but significant modulation by the speech content of the stimulus. Taking into account the inertia of blood-flow measures, stimulus-dependent variation of transient neural responses like the N100m may well go undetected in PET or fMRI. The different time windows accessible with the different imaging methods may have a considerable effect on which part of the network is detected. Our MEG results suggest that at 100 ms after stimulus onset, activation of the PT and the adjacent auditory cortex reflects acoustic but also speech-specific analysis.
The exact nature of the linkage between speech-specific properties in sound and neuronal firing remains to be clarified. Based on her psychoacoustical experiments, Kuhl (2000) has proposed that the statistical properties of auditory input shape the auditory processing system in infancy to enhance language perception. This view would suggest that, whatever the critical feature combinations in speech may be, experience has a major role in creating the sensitivity for speech.
Implications for Acoustic versus Speech-specific Analysis in Dyslexia
The pattern of speech versus nonspeech differentiation in control subjects was reproduced in the dyslexic group. However, group differences emerged in the interhemispheric timing of the N100m response and in the overall balance of the N100m activation strength, similarly for speech and nonspeech sounds. In controls, the response was earlier in the left (contralateral) than right (ipsilateral) hemisphere, but in dyslexics the left hemisphere response was delayed and N100m reached the maximum at the same time in the left and right hemispheres. Furthermore, the right-hemisphere responses were weaker than the left-hemisphere responses whereas in the control group the overall level of activation was similar across the two hemispheres.
The unusual timing and amplitude effects could reflect separate processes but they can also be readily understood as components of a single process. As the activation in the contralateral auditory cortex is thought to modulate the ipsilateral auditory cortex via callosal connections (Mäkelä and Hari, 1992; Oe et al., 2002), a delay in the left-hemisphere N100m response could reduce the strength of the right-hemisphere N100m. This would result in the combination of timing and amplitude effects observed in our dyslexic subjects. Why is the left-hemisphere N100m response delayed in dyslexic individuals? Normally, the contra- and ipsilateral N100m responses are systematically slower in the left than in the right hemisphere for simple tones (Salmelin et al., 1999). The longer processing time in the left hemisphere may be related to stronger connections between Heschl's gyrus (primary auditory cortex) and the adjacent PT in the left than in the right hemisphere (Penhune et al., 1996). Any irregularities in this interaction could cause a delay in the build-up of the N100m response. Interestingly, abnormalities in the development of the left PT (or left versus right PT) and perisylvian regions have been suggested by post-mortem (e.g. Galaburda et al., 1985; for a review, see Galaburda, 1993), anatomical MRI (e.g. Hynd et al., 1990; Leonard et al., 1993) and animal studies (for a review, see Galaburda, 1994), which could affect the interaction between Heschl's gyrus and the PT and, further, the N100m response to auditory stimuli. However, it is important to note that the relationship between abnormalities of the planum temporale and dyslexia may be more complex, varying e.g. with hand preference and general verbal ability (see e.g. Rumsey et al., 1997; Eckert and Leonard, 2000).
The present data suggest changes in general auditory processing in dyslexia in the time window when speech-specific information is extracted and the (left) PT becomes involved in the process. As the stimuli were delivered to the right ear only, we must remain cautious about the hemispheric specificity of the effect. In a PET study of word repetition, McCrory et al. (2000) used binaural stimuli and found abnormally weak activation of the right auditory cortex in dyslexic adults, which would speak for hemisphere-specific effects. McCrory et al. (2000) interpreted their finding as reflecting particular emphasis on phonetic (left hemisphere) and de-emphasis on non-phonetic (right hemisphere) auditory processing in dyslexia. In the present data set, however, reduced right-hemisphere activation was detected for speech and nonspeech stimuli alike during passive listening, thus rendering a purely linguistic explanation rather unlikely.
To allow direct comparison between speech and nonspeech sounds, the stimuli were acoustically matched as well as possible, and they were as simple as possible. Therefore, it is not reasonable to directly compare the present data with previous MEG studies of speech or nonspeech processing in dyslexia which used rapidly successive nonspeech sounds (Nagarajan et al., 1999), paired speech or nonspeech sounds not matched for intensity (Helenius et al., 2002a), or natural speech sounds (Helenius et al., 2002b) on quite specific groups of dyslexics (pronounced auditory problems, strong family history of dyslexia). Nevertheless, the important common finding in all these studies is that differences in auditory processing between control and dyslexic groups were found in the N100m response.
To conclude, we provide evidence that activation arising from the PT and the surrounding auditory cortex at 100 ms after sound onset is sensitive to phonetic content in the speech signal. This claim is based on the significant increase in activation strength and rate of signal build-up in the left hemisphere for speech sounds as compared with complex and simple nonspeech sounds. In dyslexic subjects, the altered hemispheric balance in both activation strength and timing is proposed to be linked to abnormalities within the left PT, or in the communication between the PT and the primary auditory cortex, which affect all auditory processing, including phonetic analysis. A general auditory impairment within the time window of phonetic analysis is consistent with reports on both phonological impairment (Rumsey et al., 1992; Studdert-Kennedy and Mody, 1995; Mody et al., 1997; Helenius et al., 2002a) and basic auditory deficits (Tallal et al., 1993; Hari and Kiesilä, 1996; Fitch et al., 1997; Ahissar et al., 2000; Amitay et al., 2002; Renvall and Hari, 2002) in dyslexia.
Acknowledgments
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Alho K (1995) Cerebral generators of mismatch negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear Hear 16:38–51.
Amitay S, Ahissar M, Nelken I (2002) Auditory processing deficits in reading disabled adults. J Assoc Res Otolaryngol 3:302–320.
Aulanko R, Hari R, Lounasmaa O, Näätänen R, Sams M (1993) Phonetic invariance in the human auditory cortex. Neuroreport 4:1356–1358.
Baldeweg T, Richardson A, Watkins S, Foale C, Gruzelier J (1999) Impaired auditory frequency discrimination in dyslexia detected with mismatch evoked potentials. Ann Neurol 45:495–503.
Benson R, Whalen DH, Richardson M, Swainson B, Clark VP, Lai S, Liberman AM (2001) Parametrically dissociating speech and nonspeech perception in the brain using fMRI. Brain Lang 78:364–396.
Binder JR, Rao SM, Hammeke TA, Yetkin FZ, Jesmanowicz A, Bandettini PA, Wong EC, Estkowski LD, Goldstein MD, Haughton WM, Hyde JS (1994) Functional magnetic resonance imaging of human auditory cortex. Ann Neurol 35:662–672.
Binder JR, Frost JA, Hammeke TA, Rao SM, Cox RW (1996) Function of the left planum temporale in auditory and linguistic processing. Brain 119:1239–1247.
Binder J, Frost J, Hammeke T, Bellgowan P, Springer J, Kaufman J, Possing E (2000) Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10:512–528.
Bradley L, Bryant P (1983) Categorizing sounds and learning to read – a causal connection. Nature 301:419–421.
Connolly JF, Phillips NA (1994) Event-related potential components reflect phonological and semantic processing of the terminal word of spoken sentences. J Cogn Neurosci 6:256–266.
Delgutte B (1999) Auditory neural processing of speech. In: The handbook of phonetic sciences (Hardcastle W, Laver J, eds), pp. 507–538. Oxford: Blackwell Publishers.
Demonet J, Chollet F, Ramsay S, Cardebat D, Nespoulous JL, Wise R, Rascol A, Frackowiak R (1992) The anatomy of phonological and semantic processing in normal subjects. Brain 115:1753–1768.
Denckla M, Rudel R (1976) Rapid automatized naming (R.A.N.): dyslexia differentiated from other learning disabilities. Neuropsychologia 14:471–479.
Eckert M, Leonard C (2000) Structural imaging in dyslexia: the planum temporale. Ment Retard Dev Disabil Res Rev 6:198–206.
Elberling C, Bak C, Kofoed B, Lebech J, Saermark K (1982) Auditory magnetic fields from the human cerebral cortex. Location and strength of an equivalent current dipole. Acta Neurol Scand 65:553–569.
Eulitz C, Diesch E, Pantev C, Hampson S, Elbert T (1995) Magnetic and electric brain activity evoked by the processing of tone and vowel stimuli. J Neurosci 15:2748–2755.
Fitch RH, Miller S, Tallal P (1997) Neurobiology of speech perception. Annu Rev Neurosci 20:331–353.
Galaburda AM (1993) Neuroanatomic basis of developmental dyslexia. Neurol Clin 11:161–173.
Galaburda AM (1994) Developmental dyslexia and animal studies: at the interface between cognition and neurology. Cognition 50:133–149.
Galaburda AM, Sherman GF, Rosen F, Aboitiz N, Geschwind N (1985) Developmental dyslexia: four consecutive patients with cortical anomalies. Ann Neurol 18:222–233.
Gootjes L, Raij T, Salmelin R, Hari R (1999) Left-hemisphere dominance for processing of vowels: a whole-scalp neuromagnetic study. Neuroreport 10:2987–2991.
Hall D, Hart H, Johnsrude I (2003) Relationships between human auditory cortical structure and function. Audiol Neurootol 8:1–18.
Hari R (1990) The neuromagnetic method in the study of the human auditory cortex. In: Auditory evoked magnetic fields and potentials: advances in audiology (Grandori F, Hoke M, Romani G, eds), pp. 222–282. Basel: S. Karger.
Hari R, Kiesilä P (1996) Deficit of temporal auditory processing in dyslexic adults. Neurosci Lett 205:138–140.
Helenius P, Salmelin R, Richardson U, Leinonen S, Lyytinen H (2002a) Abnormal auditory cortical activation in dyslexia 100 ms after speech onset. J Cogn Neurosci 15:603–617.
Helenius P, Salmelin R, Service E, Connolly JF, Leinonen S, Lyytinen H (2002b) Cortical activation during spoken-word segmentation in nonreading-impaired and dyslexic adults. J Neurosci 22:2936–2944.
Hynd GW, Semrud-Clickman M, Larys AR (1990) Brain morphology in developmental dyslexia and attention deficit disorder/hyperactivity. Arch Neurol 47:919–926.
Hämäläinen M, Hari R, Ilmoniemi R, Knuutila J, Lounasmaa O (1993) Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev Modern Phys 65:413–497.
Iivonen A, Laukkanen AM (1993) Explanations for the qualitative variation of Finnish vowels. In: Studies in logopedics and phonetics 4 (Iivonen A, Lehtihalmes M, eds), pp. 29–54. Helsinki: University of Helsinki.
Kaukoranta E, Hari R, Lounasmaa OV (1987) Responses of the human auditory cortex to vowel onset after fricative consonants. Exp Brain Res 69:19–23.
Klatt D (1980) Software for a cascade/parallel formant synthesizer. J Acoust Soc Am 67:971–995.
Kuhl P (2000) A new view of language acquisition. Proc Natl Acad Sci USA 97:11850–11857.
Kuriki S, Murase M (1989) Neuromagnetic study of the auditory responses in right and left hemispheres of the human brain evoked by pure tones and speech sounds. Exp Brain Res 77:127–134.
Leinonen S, Müller K, Leppänen PHT, Aro M, Ahonen T, Lyytinen H (2001) Heterogeneity in adult dyslexic readers: relating processing skills to the speed and accuracy of oral text reading. Read Writ 14:265–296.
Leonard CM, Voeller KKS, Lombardino LJ, Morris MK, Hynd GW, Alexander AW, Andersen HG, Garofalakis M, Honeyman JC, Mao J, Agee OF, Staab EV (1993) Anomalous cerebral structure in dyslexia revealed with MRI. Arch Neurol 50:461–469.
Liegeois-Chavel C, Musolino A, Badier JM, Marquis P, Chauvel P (1994) Evoked potentials recorded from the auditory cortex in man: evaluation and topography of the middle latency components. Electroencephalogr Clin Neurophysiol 92:204–214.
Lütkenhöner B, Steinstrater O (1998) High-precision neuromagnetic study of the functional organization of the human auditory cortex. Audiol Neurootol 3:191–213.
Mäkelä J, Hari R (1992) Neuromagnetic auditory evoked responses after a stroke in the right temporal lobe. Neuroreport 3:94–96.
Mäkelä JP, Ahonen A, Hämäläinen M, Hari R, Ilmoniemi R, Kajola M, Knuutila J, Lounasmaa OV, McEvoy L, Salmelin R, Salonen O, Sams M, Simola J, Tesche C, Vasama J-P (1993) Functional differences between auditory cortices of the two hemispheres revealed by whole-head neuromagnetic recordings. Hum Brain Mapp 1:48–56.
McCrory E, Frith U, Brunswick N, Price C (2000) Abnormal functional activation during a simple word repetition task: a PET study of adult dyslexics. J Cogn Neurosci 12:753–762.
Mody M, Studdert-Kennedy M, Brady S (1997) Speech perception deficits in poor readers: auditory processing or phonological coding? J Exp Child Psychol 64:199–231.
Näätänen R (1992) Attention and brain function. Hillsdale, NJ: Erlbaum.
Näätänen R, Lehtokoski A, Lennes M, Cheour M, Huotilainen M, Iivonen A, Vainio M, Alku P, Ilmoniemi R, Luuk A, Allik J, Sinkkonen J, Alho K (1997) Language specific phoneme representations revealed by electric and magnetic brain responses. Nature 385:432–434.
Nagarajan S, Mahncke H, Salz T, Tallal P, Roberts T, Merzenich MM (1999) Cortical auditory signal processing in poor readers. Proc Natl Acad Sci USA 96:6483–6488.
Oe H, Kandori A, Yamada N, Miyashita T, Tsukada K, Naritomi H (2002) Interhemispheric connection of auditory neural pathways assessed by auditory evoked magnetic fields in patients with fronto-temporal lobe infarction. Neurosci Res 44:483–488.
Penhune VB, Zatorre RJ, MacDonald JD, Evans AC (1996) Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6:661–672.
Phillips C (2001) Levels of representation in the electrophysiology of speech perception. Cogn Sci 25:711–731.
Phillips C, Pellathy T, Marantz A, Yellin E, Wexler K, Poeppel D, McGinnis M, Roberts T (2000) Auditory cortex accesses phonological categories: an MEG mismatch study. J Cogn Neurosci 12:1038–1055.
Rauschecker J (1998) Cortical processing of complex sounds. Curr Opin Neurobiol 8:516–521.
Rauschecker J, Tian B, Hauser M (1995) Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268:111–114.
Renvall H, Hari R (2002) Auditory cortical responses to speech-like stimuli in dyslexic adults. J Cogn Neurosci 14:757–768.
Rumsey JM, Andreason P, Zametkin AJ, Aquino T, King C, Hamburger SD, Pikus A, Rapoport JL, Cohen R (1992) Failure to activate the left temporal cortex in dyslexia: an oxygen 15 positron emission tomographic study. Arch Neurol 49:527–534.
Rumsey JM, Donohue BC, Brady DR, Nace K, Giedd GN, Andreason P (1997) A magnetic resonance imaging study of planum temporale asymmetry in men with developmental dyslexia. Arch Neurol 54:1481–1489.
Salmelin R, Schnitzler A, Parkkonen L, Biermann K, Helenius P, Kiviniemi K, Kuukka K, Schmitz F, Freund H (1999) Native language, gender, and functional organization of the auditory cortex. Proc Natl Acad Sci USA 96:10460–10465.
Schormann T, Henn S, Zilles K (1996) A new approach to fast elastic alignment with applications to human brains. Lecture Notes Comput Sci 1131:337–342.
Schulte-Körne G, Deimel W, Bartling J, Remschmidt H (2000) Speech perception deficit in dyslexic adults as measured by mismatch negativity (MMN). Int J Psychophysiol 40:77–87.
Shankweiler D, Crain S, Katz L, Fowler A, Liberman A, Brady S, Thornton R, Lundquist E, Dreyer L, Fletcher J, Stuebing K, Shaywitz S, Shaywitz B (1995) Cognitive profiles of reading-disabled children: comparison of language skills in phonology, morphology, and syntax. Psychol Sci 6:149–156.
Shtyrov Y, Kujala T, Palva S, Ilmoniemi R, Näätänen R (2000) Discrimination of speech and of complex nonspeech sounds of different temporal structure in the left and right cerebral hemispheres. Neuroimage 12:657–663.
Stevens KN (1980) Acoustic correlates of some phonetic categories. J Acoust Soc Am 68:836–842.
Studdert-Kennedy M, Mody M (1995) Auditory temporal perception deficits in the reading-impaired: a critical review of the evidence. Psychon Bull Rev 2:508–514.
Suga N, O'Neill WE, Manabe T (1978) Cortical neurons sensitive to combinations of information-bearing elements of biosonar signals in the mustache bat. Science 200:778–781.
Tallal P, Miller S, Fitch R (1993) Neurobiological basis of speech: a case for the preeminence of temporal processing. Ann N Y Acad Sci 14:27–47.
Tiitinen H, Sivonen P, Alku P, Virtanen J, Näätänen R (1999) Electromagnetic recordings reveal latency differences in speech and tone processing in humans. Cogn Brain Res 8:355–363.
Vihla M, Salmelin R (2003) Hemispheric balance in processing attended and non-attended vowels and complex tones. Cogn Brain Res 16:167–173.
Vihla M, Lounasmaa O, Salmelin R (2000) Cortical processing of change detection: dissociation between natural vowels and two-frequency complex tones. Proc Natl Acad Sci USA 97:10590–10594.
Vouloumanos A, Kiehl K, Werker J, Liddle P (2001) Detection of sounds in the auditory stream: event-related fMRI evidence for differential activation to speech and nonspeech. J Cogn Neurosci 13:994–1005.
Wechsler D (1981) Wechsler adult intelligence scale – revised: manual. New York: Psychological Corporation. [Finnish translation, Psykologien Kustannus Oy, 1992.]
Wechsler D (1987) Wechsler memory scale – revised: manual. New York: Psychological Corporation. [Finnish translation, Psykologien Kustannus Oy, 1997.]
Wessinger C, Van Meter J, Tian B, Van Lare J, Pekar J, Rauschecker J (2001) Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J Cogn Neurosci 13:1–7.
Wiik K (1965) Finnish and English vowels. Turku: University of Turku.
Wolf M (1986) Rapid alternating stimulus naming in the developmental dyslexias. Brain Lang 27:360–379.
Wolf M, Obregon M (1992) Early naming deficits, developmental dyslexia, and a specific deficit hypothesis. Brain Lang 42:219–247.
Woods RP, Grafton ST, Holmes CJ, Cherry SR, Mazziotta JC (1998a) Automated image registration: I. General methods and intrasubject, intramodality validation. J Comput Assist Tomogr 22:139–152.
Woods RP, Grafton ST, Watson JDG, Sicotte NL, Mazziotta JC (1998b) Automated image registration: II. Intersubject validation of linear and nonlinear models. J Comput Assist Tomogr 22:153–165.
Zatorre R, Evans A, Meyer E, Gjedde A (1992) Lateralization of phonetic and pitch discrimination in speech processing. Science 256:846–849.