1 Cognitive Neurophysiology Laboratory, Program in Cognitive Neuroscience and Schizophrenia, Nathan S. Kline Institute for Psychiatric Research, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA, 2 Department of Psychology, City College of the City University of New York, New York, NY 10031, USA, 3 Department of Psychiatry, New York University School of Medicine, 550 First Avenue, New York, NY 10016, USA, 4 Department of Psychiatry and Behavioral Science, The Albert Einstein College of Medicine, Bronx, NY 10461, USA, 5 Department of Neuroscience, The Albert Einstein College of Medicine, Bronx, NY 10461, USA
Abstract
Key Words: auditory, electrical mapping, humans, multisensory, object recognition, visual
Introduction
The various sensory energies that originate from a single object usually provide complementary and/or redundant information about that object's identity. Oftentimes, object information entering the nervous system through any single sense is degraded by noise, such as the partial occlusion or camouflage of a visual object, or the hum of a vent masking speech. Clearly, the recombination of these potentially redundant and/or complementary inputs can benefit, and at times may even be essential to, accurate and timely object-recognition. Nevertheless, multisensory object-recognition has not been extensively examined outside the domain of speech perception. Tellingly, multisensory effects on speech perception are abundant and often profound. For instance, viewing articulatory gestures can dramatically affect the perception of a speech sound (e.g. McGurk and MacDonald, 1976), or enhance speech perception in a noisy environment (Sumby and Pollack, 1954; Campbell and Dodd, 1980; MacLeod and Summerfield, 1990; Thompson, 1995).
Functional imaging and event-related potential (ERP) studies of multisensory speech perception have detailed a network of cortical areas that plays a role in the integration of auditory–visual speech (e.g. Calvert et al., 1999, 2000). However, it is probable that this circuit for speech perception is a relatively specialized one, and that the cortical network involved in integrating information from the different sensory systems depends on the class of multisensory objects/events that are being considered (e.g. animals vs speech), as well as the specific demands of the tasks that are being performed.
To advance understanding of multisensory object-recognition, the present study examined the combined influence of visual and auditory inputs on the recognition of animals. High-density electrical mapping of the ERP was used to assess brain activity while participants performed an object-recognition task in which they were required to make a speeded button press to the occurrence of a target animal (one out of a possible eight) in the visual and/or auditory sensory modality. Both reaction times (RT) and error rates (accuracy) were computed as measures of performance. Stimuli were either unisensory (visual or auditory) or bisensory (visual plus auditory). The bisensory stimuli had visual and auditory elements that either belonged to the same animal or belonged to different animals.
To behaviorally assess the combined effect of visual and auditory information on object-recognition, we compared performance on the bisensory targets to performance on the unisensory targets. We expected that the neural interaction of visual and auditory information from the same object would facilitate performance compared to unisensory targets (visual unisensory targets and auditory unisensory targets). In contrast, we expected that the neural interaction of visual and auditory information that belonged to different objects might interfere with object-recognition processes, and result in poorer performance compared to unisensory targets.
Physiologically, we predicted that multisensory facilitation of object processing would be mediated in part by the multisensory modulation of sensory-specific object-recognition processes. In particular, we considered that object-recognition would be a visually dominated function for the class of objects used in the present design. Based on this proposal, we believed that the likeliest brain structures to mediate these multisensory object-recognition processes would be found in the ventral visual stream, which is well known for its role in object processing (e.g. Ungerleider and Mishkin, 1982; Allison et al., 1999; Doniger et al., 2000). These object processing functions have been extensively investigated through both functional imaging and ERP studies, and object processing in humans has been specifically associated with neuronal activation in a cluster of brain regions in the ventral visual stream known as the lateral occipital complex (LOC; e.g. Kohler et al., 1995; Malach et al., 1995; Puce et al., 1996, 1999; Kanwisher et al., 1997; Allison et al., 1999; Haxby et al., 1999; Ishai et al., 1999; Doniger et al., 2000, 2001, 2002; James et al., 2002a; Lerner et al., 2002; Murray et al., 2002). An ERP signature of the processing of the features of visual objects in the ventral visual stream may be found in the N1 component of the visual evoked potential (or one or more subcomponents of the N1 complex). Inverse source modeling has revealed that neuronal activity in ventral visual cortex, and in some cases specifically in LOC, substantially contributes to the generation of the visual N1 (Anllo-Vento et al., 1998; Di Russo et al., 2002, 2003; Murray et al., 2002; Schweinberger et al., 2002). We should note that source modeling also shows a contribution from the dorsal visual stream to the visual N1 (e.g. Di Russo et al., 2002, 2003). Further, this component has been implicated in ventral stream processing functions. It has, for example, been repeatedly shown to reflect visual processing of the structural features of objects (e.g. Bentin et al., 1999; Tanaka et al., 1999; Eimer, 2000; Rossion et al., 2000; Murray et al., 2002; but see Vogel and Luck, 2000).
We hypothesized that object-identity information provided in the auditory sensory modality would have its effect in visual object-recognition areas, with the presence of features that defined an auditory object affecting processing of visual features of the same object. Given the role of the visual N1 in the processing of visual features, and findings of auditory-based modulation of the visual N1 during bisensory detection and classification tasks (Giard and Peronnet, 1999; Molholm et al., 2002), it was predicted that the co-occurrence of object-congruent visual and auditory elements would result in the multisensory modulation of the visual N1 component of the ERP. The visual N1 is considered to represent relatively early processing, showing a characteristic scalp topography over lateral occipital cortices and typically peaking between 140 and 200 ms.
Visual and auditory selective attention componentry were anticipated in our electrophysiological data, as we expected that in performing the object-recognition task, subjects would selectively attend to the visual and auditory features of the target object (e.g. for a block of trials on which the designated target was dog, selectively attending to physical attributes specific to the picture of the dog and the sound of the dog). As such, we expected to record both the so-called selection negativity (SN), a component of the visual evoked potential that is elicited under circumstances where subjects selectively attend to relevant visual features (e.g. Anllo-Vento and Hillyard, 1996; Harter and Aine, 1984; Kenemans et al., 1993; Smid et al., 1999) [in ERPs, non-spatial visual selective attention effects are readily discerned in occasionally occurring targets (Woods and Alain, 2001)], and the so-called negative difference wave (Nd), an auditory evoked potential that is elicited under circumstances where subjects selectively attend to relevant auditory features (e.g. Hansen and Hillyard, 1980).
Multisensory modulation of the visual N1 was expected in turn to affect subsequent feature-based processing in the ventral visual stream, resulting in modulation of the SN. In contrast, and in keeping with our notion that multisensory effects would be biased towards the visual system under the stimulus and task conditions of the present experiment, we predicted that equivalent auditory selective attention effects, as indexed by the Nd component, would not be substantially modulated by multisensory object-recognition processes.
Materials and Methods
Fourteen neurologically normal, paid volunteers participated (mean age 23.6 ± 6.2 years; six female; all right-handed). All reported normal hearing and normal or corrected-to-normal vision. The Institutional Review Board of the Nathan Kline Institute for Psychiatric Research approved the experimental procedures, and each subject provided written informed consent. Data from two additional subjects were excluded, one for excessive blinking, and the other for failure to perform the task adequately.
Stimuli
There were four basic stimulus types, each presented equiprobably: (i) sounds alone; (ii) pictures alone; (iii) paired pictures and sounds belonging to the same object; and (iv) paired pictures and sounds belonging to different objects. In all there were 80 stimuli: eight sounds, eight pictures, and the 64 possible pairings of the sound and picture stimuli (each of the eight pictures paired with each of the eight sounds).
Pictures
There were eight line drawings of animals from Snodgrass and Vanderwart (1980), standardized on familiarity and complexity. These were of a dog, chimpanzee, cow, sheep, chicken, bird, cat and frog. They were presented on a 21 inch computer monitor located 143 cm in front of the subject, and were black on a gray background. The images subtended an average of 4.8° of visual angle in the vertical plane and 4.4° of visual angle in the horizontal plane, and were presented for a duration of 340 ms.
Sounds
There were eight complementary animal sounds, adapted from Fabiani et al. (1996). These sounds were uniquely identifiable vocalizations corresponding to the eight animal drawings. They were modified such that each had a duration of 340 ms, and were presented over two JBL speakers at a comfortable listening level of approximately 75 dB SPL.
Procedure
Participants were seated in a comfortable chair in a dimly lit and electrically shielded (Braden Shielding Systems) room and asked to keep head and eye movements to a minimum while maintaining central fixation. Eye position was monitored with horizontal and vertical electro-oculogram (EOG) recordings. Subjects were first presented with the stimuli and asked to identify them. The sounds were presented first, followed by the pictures and finally the sound–picture pairs. All subjects easily identified the sounds and pictures as the animals intended by the experimenter.
During the experiment, subjects performed an animal detection task (e.g. 'During this block, press the button to the cow'). They were instructed to make a button press response with their right index finger to the occurrence of a target, whether in the visual sensory modality, the auditory sensory modality, or both; it was further clarified that they were also to respond to bisensory trials in which only the visual or only the auditory element was a target. Five target stimulus types and four non-target stimulus types were derived from the four basic stimulus types (see Stimuli). These are delineated here, and in Table 1 for quick reference. The five target stimulus types were as follows: visual target (V+; e.g. a picture of a cow), auditory target (A+; e.g. the lowing sound of a cow), a picture and sound pair in which only the picture was a target (V+A-; e.g. a picture of a cow and a dog bark), a picture and sound pair in which only the sound was a target (V-A+; e.g. a picture of a chimpanzee and a cow lowing), and a picture and sound pair in which both were targets (V+A+; e.g. a picture of a cow and a cow lowing). The five target stimulus types were presented equiprobably, and targets occurred on 15.6% of trials. Each animal served as the target in two of 16 blocks, in randomized order both within and between subjects, with the exception that the full set of animals served as targets within the first eight blocks. The four non-target stimulus types were as follows: an animal picture (V-); an animal sound (A-); a paired picture and sound of the same animal (V-A- congruent); and a paired picture of one animal and sound of another animal (V-A- incongruent). Non-targets occurred on 84.4% of trials, with the V-A- incongruent stimulus type occurring at a slightly lower probability than the remaining three non-target stimulus types. This small decrease in probability arose because two target types (V+A- and V-A+) instead of one were drawn from the same pool of basic stimuli as the V-A- incongruent stimuli. This, along with the probabilities of the different stimulus types, is depicted in Figure 1. It should also be noted that the 16 elements that made up the stimuli, the eight animal pictures and the eight animal sounds, were presented equiprobably. Each block contained 56 instances of each basic stimulus type, such that a full set of mismatching stimulus pairs was presented in each block (eight visual animals × seven mismatching auditory animal sounds). Stimulus onset asynchrony varied randomly between 750 and 3000 ms. A total of 16 blocks were presented. Breaks were encouraged between blocks to maintain high concentration and prevent fatigue. See Figure 2 for a schematic of the experimental paradigm.
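As a cross-check on these counts, the following sketch (Python; the animal names come from the Stimuli section, all other identifiers are ours) reconstructs the composition of a single block and confirms the stated 15.6% target rate:

```python
# Minimal sketch of one block's trial composition, built from the counts
# stated above (56 instances of each basic stimulus type; equiprobable
# elements; all 8 x 7 mismatching pairs). Illustrative only.
import itertools
import random

ANIMALS = ["dog", "chimpanzee", "cow", "sheep", "chicken", "bird", "cat", "frog"]
TARGET = "cow"  # the designated target animal for this block

trials = []
trials += [("V", a, None) for a in ANIMALS for _ in range(7)]            # 56 pictures
trials += [("A", None, a) for a in ANIMALS for _ in range(7)]            # 56 sounds
trials += [("VA", a, a) for a in ANIMALS for _ in range(7)]              # 56 congruent pairs
trials += [("VA", v, a) for v, a in itertools.permutations(ANIMALS, 2)]  # 56 incongruent pairs
random.shuffle(trials)

n_targets = sum(v == TARGET or a == TARGET for _, v, a in trials)
print(len(trials), n_targets, round(n_targets / len(trials), 4))  # 224 35 0.1562
```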
Continuous EEG was acquired from 128 scalp electrodes (impedances < 5 kΩ), referenced to the nose, band-pass filtered from 0.05 to 100 Hz, and digitized at 500 Hz. The continuous EEG was divided into epochs (100 ms pre- to 500 ms post-stimulus onset). Trials with blinks and eye movements were automatically rejected off-line on the basis of the EOG. An artifact criterion of ±60 µV was used at all other scalp sites to reject trials with excessive EMG or other noise transients. The average number of accepted sweeps per non-target was 670, and per target was 95. EEG epochs were sorted according to stimulus type and averaged for each subject to compute the ERP. Baseline was defined as the epoch from 100 ms pre-stimulus to stimulus onset. Separate group-averaged ERPs for each of the stimulus types were calculated for display purposes and for identification of the visual N1 and the visual and auditory selective attention components, the SN and Nd. Button press responses to the five target stimuli were acquired during the recording of the EEG and processed offline. Responses falling between 250 and 950 ms post-stimulus onset were considered valid; this window was used so that a response could only be associated with a single trial.
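A minimal NumPy sketch of this pipeline (epoching, EOG- and amplitude-based trial rejection, baseline correction, trial averaging) is given below; the array layout, channel indices and the EOG rejection threshold are our own illustrative assumptions, not the authors' code:

```python
import numpy as np

FS = 500                   # digitization rate (Hz)
PRE, POST = 0.100, 0.500   # epoch: 100 ms pre- to 500 ms post-stimulus onset
SCALP_REJECT_UV = 60.0     # +/-60 uV artifact criterion at scalp sites
EOG_REJECT_UV = 100.0      # assumed EOG rejection threshold (not stated above)

def make_erp(eeg, onsets, eog_chans, scalp_chans):
    """eeg: (n_channels, n_samples) continuous data in uV; onsets in samples."""
    pre, post = int(PRE * FS), int(POST * FS)
    epochs = np.stack([eeg[:, t - pre:t + post] for t in onsets])  # (trial, ch, time)
    # Reject trials with blinks/eye movements (EOG) or excessive noise (scalp).
    bad_eog = np.ptp(epochs[:, eog_chans], axis=-1).max(axis=1) > EOG_REJECT_UV
    bad_amp = np.abs(epochs[:, scalp_chans]).max(axis=(1, 2)) > SCALP_REJECT_UV
    kept = epochs[~(bad_eog | bad_amp)]
    # Baseline-correct against the 100 ms pre-stimulus interval.
    kept = kept - kept[:, :, :pre].mean(axis=-1, keepdims=True)
    return kept.mean(axis=0)  # trial average -> ERP of shape (ch, time)
```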
Statistical Approach
Behavioral Analyses
For individual subjects, the per cent hits and average reaction time (RT) were calculated, and RT distributions were recorded, for each of the five types of target stimuli. To test for an effect of Target on RT, a one-way repeated measures analysis of variance (ANOVA) with five levels (A+, V+, V-A+, V+A- and V+A+) was performed. A significant effect was followed up with Tukey HSD tests. The same statistical analyses were performed to test for an effect of Target on per cent hits (i.e. accuracy).
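In modern Python tooling this analysis might look like the sketch below (statsmodels' AnovaRM followed by Tukey HSD); the long-format table and its column names are hypothetical, and pairwise_tukeyhsd ignores the repeated-measures structure, so this is a sketch rather than an exact reproduction of the test reported here:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# One mean RT per subject and target type (A+, V+, V-A+, V+A-, V+A+),
# in long format with columns: subject, target, rt (hypothetical file).
df = pd.read_csv("mean_rt_per_subject_and_target.csv")

# One-way repeated measures ANOVA: effect of Target on RT.
print(AnovaRM(df, depvar="rt", subject="subject", within=["target"]).fit())

# Follow-up pairwise comparisons.
print(pairwise_tukeyhsd(df["rt"], df["target"]))
```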
Enhanced object-recognition processes, due to the simultaneous presentation of visual and auditory elements that belonged to the same object, were expected to be indexed by a significant improvement in performance for the V+A+ targets when compared to the unisensory targets (V+ targets and A+ targets). Interference with object-recognition processes, due to the simultaneous presentation of visual and auditory elements that belonged to different objects, would be indexed by a significant decrement in performance for the V+A- targets and V-A+ targets when compared to, respectively, the V+ targets and the A+ targets.
RT facilitation for the V+A+ targets was followed by a test of the race model, because RT facilitation under the present conditions can be accounted for by one of two classes of models: race models or coactivation models (see Miller, 1982). In race models, each constituent of a pair of redundant targets independently competes for response initiation, and the faster of the two mediates the response on any given trial, resulting in the so-called redundant target effect (RTE). According to this model, probability summation produces the RTE, since the likelihood of either of the two targets yielding a fast RT is higher than that from one alone. By this account, the RTE need not occur because of any nonlinear neural interactions (i.e. multisensory interactions in this case). In contrast, in coactivation models, it is the interaction of neural responses to the simultaneously presented targets that facilitates response initiation and produces the RTE. It was therefore tested whether the RTE exceeded the statistical facilitation predicted by the race model (Miller, 1982). When the RTE is shown to exceed that predicted by the race model, the coactivation model is invoked to account for the facilitation. Such violation of the race model thus provides evidence that object information from the visual and auditory sensory modalities interacted to produce the RT facilitation.
In the test of the race model, the V+A+ targets were compared to the V+A- targets and V-A+ targets. The race model places an upper limit on the cumulative probability (CP) of an RT at a given latency for stimulus pairs with redundant targets. For any latency, t, the race model holds when the CP of the redundant-target response is less than or equal to the sum of the CPs from each of the single-target stimuli (the bisensory targets in which only the visual or auditory element corresponded to the target), minus an expression of their joint probability:

CP(t)V+A+ ≤ CP(t)V+A- + CP(t)V-A+ − [CP(t)V+A- × CP(t)V-A+].

For each subject, the RT range within the valid RTs (250–950 ms) was calculated over the three target types (V+A+, V+A- and V-A+) and divided into quantiles from the fifth to the hundredth percentile in 5% increments (5%, 10%, ..., 95%, 100%). t-tests comparing the actual facilitation, CP(t)V+A+, with the facilitation predicted by the race model, CP(t)V+A- + CP(t)V-A+ − [CP(t)V+A- × CP(t)V-A+], were performed on quantiles that exhibited violation of the race model, to assess the reliability of the violations across subjects. Violations were expected to occur for the quantiles representing the lower end of the RTs, because this is when it was most likely that interactions of the visual and auditory inputs would result in the fulfillment of a response criterion before either source alone satisfied the same criterion (Miller, 1982; for recent applications of Miller's test of the race model, see Molholm et al., 2002; Murray et al., 2002).
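A sketch of this test in Python is shown below; the function computes the cumulative probabilities at 5% quantile steps for one subject and returns the violation of the race-model bound at each quantile (input arrays and all names are ours):

```python
import numpy as np

def race_model_violation(rt_vpap, rt_vpam, rt_vmap):
    """rt_vpap: V+A+ RTs; rt_vpam: V+A- RTs; rt_vmap: V-A+ RTs.
    All in ms, restricted to the valid 250-950 ms window."""
    # Quantile points over the pooled RT range of the three target types.
    qs = np.arange(0.05, 1.0001, 0.05)
    t = np.quantile(np.concatenate([rt_vpap, rt_vpam, rt_vmap]), qs)
    cp = lambda rts: np.searchsorted(np.sort(rts), t, side="right") / len(rts)
    # Race-model bound: CP(t)V+A- + CP(t)V-A+ - CP(t)V+A- * CP(t)V-A+.
    bound = cp(rt_vpam) + cp(rt_vmap) - cp(rt_vpam) * cp(rt_vmap)
    return cp(rt_vpap) - bound  # > 0 at a quantile = race-model violation

# Stacking one row per subject gives an (n_subjects, 20) array on which
# per-quantile one-sample t-tests (e.g. scipy.stats.ttest_1samp against 0)
# assess the reliability of violations across subjects.
```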
Electrophysiological Analyses
The following approach was taken to constrain the analyses performed, without reference to the dependent variables. First, our primary hypothesis was that object information in the auditory sensory modality would result in modulation of visual object-processing as represented by the N1 component. Thus, we used the response to unisensory visual stimulation (V+ and V-) to define the latency window and scalp sites of maximal amplitude for this component a priori, before assessing whether multisensory effects were present or not. The latency window and scalp sites were consistent with our previous studies of the visual N1 evoked by these pictorial stimuli (see Doniger et al., 2000, 2001, 2002; Foxe et al., 2001).
A similar strategy was employed to identify latency windows and scalp sites for tests of the selection negativity (SN) and the negative difference wave (Nd). In the case of the SN, unisensory visual targets (V+) were compared with unisensory visual non-targets (V-), which allowed us to define the timecourse and topography of the SN independent of putative multisensory effects during the bisensory conditions. Similarly, the Nd was defined by comparing unisensory auditory targets (A+) to unisensory auditory non-targets (A-). Subsequently, these predefined windows and scalp sites were used to make measurements from all relevant multisensory conditions, and these data were entered as the dependent measures into our ANOVAs.
It should be noted, however, that while the use of broadly defined component peaks is a good means of limiting the number of statistical tests that are conducted, these components often represent the activity of many simultaneously active brain generators at any given moment (e.g. Foxe and Simpson, 2002). As such, effects may not necessarily be coincident with a given component peak, especially in the scenario that only certain brain generators are affected by a given experimental condition. Thus, limiting the analysis to a set of discrete component peaks represents a very conservative approach to the analysis of high-density ERP data.
For the main ERP statistical analyses, only responses elicited by the bisensory stimuli were considered. This allowed us to examine ERP effects as a function of the object congruency of the visual and auditory elements. Recall that when the cow was the target, the possible bisensory combinations were: (i) V+A+ (e.g. a picture of a cow and the lowing of a cow); (ii) V+A- (e.g. a picture of a cow and the barking of a dog); (iii) V-A+ (e.g. a picture of a chimpanzee and the lowing of a cow); (iv) V-A- congruent (e.g. a picture of a dog and the barking of a dog); and (v) V-A- incongruent (e.g. a picture of a chicken and the croaking of a frog).
Multisensory Object-recognition Effects over Posterior Scalp in the Latency Range of the Visual N1. It was hypothesized that visual object-recognition processes in the ventral visual stream would be affected by the co-occurrence of visual and auditory elements that belonged to the same object, and that this would be initially reflected by modulation of the visual N1. Multisensory object-recognition effects on the mean amplitude of the ERPs over a 20 ms window, centered at the peak of the visual N1 (defined in the unisensory response), were tested with a three-way repeated measures ANOVA. The factors were Stimulus (five levels: V+A+, V+A-, V-A+, V-A- congruent and V-A- incongruent), Electrode (three electrodes that best represented the visual N1 distribution over each hemisphere), and Hemisphere (left and right). When main effects were found, protected follow-up ANOVAs were conducted to unpack the effects. For these and all the following statistical tests, Geisser–Greenhouse corrections were used in reporting P values when appropriate, and the alpha level for significance was set at less than 0.05.
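The dependent measure itself is simple to express; a brief sketch follows (the sampling rate and 20 ms window come from the text, while the array layout and electrode picks are assumptions):

```python
import numpy as np

def mean_window_amplitude(erp, fs=500, t0=-0.100, center_s=0.150, half_s=0.010):
    """Mean amplitude of erp (n_channels, n_times) in a 20 ms window
    centered on the unisensory N1 peak; returns one value per channel."""
    lo = int(round((center_s - half_s - t0) * fs))
    hi = int(round((center_s + half_s - t0) * fs))
    return erp[:, lo:hi].mean(axis=1)
```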
Localizing the Underlying Neural Generators of the Multisensory Object-recognition Effect. Information about the intracranial generators contributing to the multisensory N1-effect was obtained using two methods. The first was scalp current density (SCD) topographic mapping, as implemented in the Brain Electrical Source Analysis (BESA, Ver. 5.0) multimodal neuroimaging analysis software package (MEGIS Software GmbH, Munich, Germany). SCD analysis takes advantage of the relationship between local current density and field potential defined by Laplace's equation; in SCD analysis the second spatial derivative of the recorded potential is calculated, which is directly proportional to the current density. This method eliminates the contribution of the reference electrode and reduces the effects of volume conduction on the surface-recorded potential caused by tangential current flow from dispersed cortical generators. This allows for better visualization of the approximate locations of intracranial generators that contribute to a given scalp-recorded ERP.
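For reference, the mapped quantity can be written as follows (a standard formulation in our own notation, for a locally flattened scalp coordinate system; not taken from the BESA documentation):

```latex
% SCD as the negative surface Laplacian of the scalp potential V(x, y):
\mathrm{SCD}(x, y) \;\propto\; -\nabla^{2} V(x, y)
  \;=\; -\left( \frac{\partial^{2} V}{\partial x^{2}}
              + \frac{\partial^{2} V}{\partial y^{2}} \right)
```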
The second was the method of dipole source analysis, also implemented in BESA 5.0. BESA models the best-fit location and orientation of multiple intracranial dipole generator configurations to produce the waveform observed at the scalp, using iterative adjustments to minimize the residual variance between the solution and the observed data (see, for example, Scherg and Von Cramon, 1985; Simpson et al., 1995). For the purpose of the modeling, an idealized three-shell spherical head model with a radius of 85 mm and scalp and skull thicknesses of, respectively, 6 and 7 mm was assumed. The genetic algorithm module of BESA 5.0 was used to freely fit a single dipole to the peak amplitude of the multisensory N1-effect. This was carried out first on the peak of the effect (within the tested latency window), and then across the whole of the tested latency window. This initial dipole was then fixed, and additional dipoles were successively free-fit to assess whether they improved the solution. Group-averaged ERP data were used to maintain the highest possible signal-to-noise ratio as well as to generalize our results across individuals. The fits of the dipoles were constrained to the gray matter of the cortex. We should point out that in dipole analysis, each of the modeled equivalent current dipoles represents an oversimplification of the activity in the areas, and therefore each should be considered as representative of a center of gravity and not necessarily a discrete neural location (Murray et al., 2002; Dias et al., 2003; Foxe et al., 2003).
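The fit criterion minimized here, residual variance (RV), whose complement is the explained-variance percentages reported in the Results, can be written in our own notation as:

```latex
% RV between recorded potentials d_{i,t} and model-predicted potentials
% m_{i,t}, summed over electrodes i and time points t within the fitting
% window; explained variance = 100% - RV.
\mathrm{RV} \;=\; 100\% \times
  \frac{\sum_{i,t} \left( d_{i,t} - m_{i,t} \right)^{2}}
       {\sum_{i,t} d_{i,t}^{2}}
```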
Multisensory Object-recognition Effects on the Visual Selective Attention Component, the SN. The multisensory object-recognition effect on the processing of objects at the feature level was expected to be passed on from the ventral visual object-recognition processes underlying the N1 to those underlying the SN, a negative-going potential over occipital scalp that is elicited by relevant visual stimuli when relevance is defined on the basis of one or more non-spatial features (Harter and Aine, 1984; Kenemans et al., 1993; Anllo-Vento and Hillyard, 1996; Smid et al., 1999). To establish the presence of the SN, a three-way repeated measures ANOVA was performed on the mean ERP amplitudes over lateral occipital scalp, for a 30 ms window centered at the midpoint of the SN (as defined in the difference wave of the unisensory visual target (V+) and non-target (V-) responses). This ANOVA had factors of Stimulus (five: V+A+, V+A-, V-A+, V-A- congruent and V-A- incongruent), Hemisphere (right and left), and Electrode (three scalp sites over lateral occipital scalp). A significant effect of Stimulus was followed up with two planned three-way repeated measures ANOVAs with factors of Stimulus (two), Hemisphere (right and left), and Electrode (three). It was expected that the responses elicited by target stimuli that included a target visual element (the V+A+ and V+A- responses) would be significantly more negative-going than the response elicited by non-target stimuli (the V-A- congruent and V-A- incongruent responses). (The two bisensory non-target waveforms were averaged for these tests; non-target classes that are equivalently different from the targets should not differ with respect to selective attention effects.) To test for a multisensory object-recognition effect on the SN, we similarly compared the V+A+ response and the V+A- response. Both these responses should include the unisensory visual selective attention effects, while only the V+A+ response should include any multisensory object-recognition effects on visual selective attention processes. In addition, this difference should include an Nd. SCD mapping was used to spatially dissociate the multisensory SN-effect from the Nd.
Multisensory Object-recognition Effects on the Auditory Selective Attention Component, the Nd. Multisensory object-recognition effects were also tested on the Nd, a negative-going potential over fronto-central/central scalp that is elicited by relevant auditory stimuli when relevance is defined along one or more physical dimensions (e.g. Hillyard et al., 1973; Näätänen et al., 1978; Hansen et al., 1983; Näätänen, 1992). To establish the presence of the Nd, a three-way repeated measures ANOVA was performed on the mean ERP amplitudes over fronto-central/central scalp, for a 30 ms window centered at the midpoint of the Nd (as defined in the difference wave of the unisensory auditory target (A+) and non-target (A-) responses). This ANOVA had factors of Stimulus (five: V+A+, V+A-, V-A+, V-A- congruent and V-A- incongruent), Hemisphere (right and left) and Electrode (three scalp sites). A significant effect of Stimulus was followed up with two three-way repeated measures ANOVAs, with factors of Stimulus (two), Hemisphere (right and left), and Electrode (three). It was expected that each of the responses elicited by stimuli that included a target auditory element (the V+A+ and V-A+ responses) would be significantly more negative-going than the response elicited by the non-targets (the V-A- congruent and V-A- incongruent responses; see footnote 3). To test for a multisensory object-recognition effect on the Nd, we similarly compared the V+A+ response and the V-A+ response. Both these responses should include the unisensory auditory selective attention effects, while only the V+A+ response would include any multisensory object-recognition effects on auditory selective attention processes.
Results

Behavioral Results
In the object-recognition task, both reaction times and accuracy rates (per cent hits) were affected by Target Type (see Table 2). Consistent with the hypothesis that object-recognition was enhanced by the co-occurrence of visual and auditory elements that belonged to the same object, object-recognition for the V+A+ targets was superior to object-recognition for the other targets, with the fastest mean reaction time and the highest mean hit rate. Visually based object-recognition was more accurate and faster than auditorily based object-recognition, with faster mean reaction times and higher mean per cent hits for targets that included a target visual element. Overall, the false alarm rate was low at 2%; false alarms were more or less evenly distributed among the non-target stimulus types.
Mean reaction times were substantially different among the target types (see Fig. 3 and Table 2), with the difference between the shortest and longest reaction times greater than 130 ms. A main effect of Target [F(4,52) = 90.16, P < 0.001] was followed up with Tukey HSD comparisons. These revealed significant RT differences between all the target stimuli except between the V+ and V+A- targets, and between the A+ and V-A+ targets (see Table 3). The mean RT to the V+A+ targets was significantly faster than the mean RTs to all the other targets, consistent with the hypothesis that targets were more rapidly recognized when the visual and auditory elements belonged to the same object. The comparisons also revealed that RTs to targets defined by their visual element were significantly faster than RTs to targets defined by their auditory element.
The race model was reliably violated for four successive quantiles in the early portion of the reaction time distribution (the third through the sixth: see Table 4 and Fig. 3, middle and right panels). Thus the reaction-time facilitation for the V+A+ target was not due to statistical summation, but rather to the neural interaction of visual and auditory object information.
Per cent hits also differed considerably among the target types, with a difference of 15% between the highest and the lowest hit rates (see Table 2). The pattern of differences generally paralleled that of the reaction times. For example, by both measures, the best performance was for the V+A+ targets and the worst performance was for the V-A+ targets. There was a main effect of Target type [F(4,52) = 17.96, P < 0.001]. Tukey HSD comparisons revealed that the per cent hits to targets defined by their visual element were significantly higher than the per cent hits to targets defined by their auditory element (see Table 3).
Interference effects for bisensory targets with visual and auditory elements that belonged to different objects were not apparent in our data, with a lack of significant differences between the V+A- targets and V+ targets, and between the V-A+ targets and A+ targets, by both the RT and accuracy measures (see Table 3).
Electrophysiological Results
The group-averaged electrophysiological responses elicited by the five bisensory stimuli are displayed in Figure 4. The responses showed classic visual and auditory sensory componentry. These included typical auditory components, a P1 peaking at ~60 ms and an N1 peaking at ~118 ms (see Picton et al., 1974; Vaughan and Ritter, 1970; Vaughan et al., 1980), and visual components, a P1 peaking at ~78 ms and an N1 peaking at ~150 ms. As would be expected, the auditory P1 and N1 components appeared maximal over fronto-central scalp. The visual P1 appeared maximal over lateral occipital scalp and the visual N1 appeared maximal over posterior-temporal/temporo-occipital scalp (see Fig. 5c).
The Multisensory Object-recognition Effect in the Latency Range of the Visual N1
In line with our primary hypothesis, that multisensory object-recognition would result in the modulation of visual object-recognition processes, starting at 145 ms the response elicited by the V+A+ targets appeared more negative-going than the responses elicited by the other bisensory conditions (see Fig. 4h and Fig. 5a). This net negative modulation fell within the latency range and general scalp region of the visual N1. Topographic mapping revealed a distribution over right lateral occipital scalp (Fig. 4a) that was posterior to the topography of the N1 proper, seen in the V+ response (compare Figs 5a and 5b). Since the peak latency of the unisensory visual N1 was 150 ms, our dependent measure for the tests evaluating the multisensory N1-effect was the mean ERP amplitude in the latency window from 140 to 160 ms.
A repeated measures ANOVA with factors of Stimulus (five), Electrode (three), and Hemisphere (two) resulted in a Stimulus by Hemisphere interaction [F(4,52) = 3.75, P = 0.02]. Three planned comparison three-way ANOVAs with factors of Stimulus (two), Electrode (three) and Hemisphere (two) were conducted to better understand this effect. Each of the ANOVAs compared the responses evoked by two of the stimulus types, one in which the visual and auditory elements belonged to the same object and the other in which the visual and auditory elements belonged to different objects. The V+A+ versus V+A- comparison revealed a significant Stimulus by Hemisphere interaction [F(1,13) = 8.08, P = 0.01; see Fig. 5a]. This was because the response to the V+A+ target was significantly more negative-going during the N1 timeframe over the right hemisphere. The V+A+ versus V-A+ comparison revealed a main effect of Stimulus [F(1,13) = 8.7, P = 0.01]. As with the previous test, the V+A+ response was significantly more negative-going during the N1 timeframe. The comparison between the V-A- congruent and V-A- incongruent responses revealed no significant differences [for the main effect of Stimulus, F(1,13) = 1.56, P = 0.23]. These data indicate that when the visual and auditory elements belonged to the same object, and were both targets, there was a significantly more negative-going response over right lateral occipital scalp, in the latency range and general scalp region of the visual N1 component. This effect was not due to the summation of unisensory selective attention effects and/or target effects; this control test is provided in the Results subsection, The Multisensory Object-recognition Effects Are Not Due to Summation Effects.
Mapping and Source Analysis of the Multisensory N1-effect
To obtain a more precise estimate of the location of the generators underlying the multisensory N1-effect, we examined SCD maps and performed dipole source modeling on the grand mean V+A+ minus V+A- difference wave.
SCD topographic mapping of the multisensory N1-effect revealed a stable source-sink distribution over right lateral occipital scalp (Fig. 5a). This topography is consistent with neural activity in LOC largely accounting for the multisensory effect. Comparison with the SCD topography of the response to the unisensory visual targets (V+) revealed that the multisensory N1-effect was in the same general scalp region as the visual N1 proper, but was clearly posterior in topography (compare Fig. 5a and b). These differences in topography may be due to (i) only some of the generators that contributed to the visual N1 contributing to the multisensory N1-effect (or their differential contribution), (ii) the presence of additional generators contributing to the N1 effect, or (iii) an altogether different set of neural generators accounting for the effect.
To confirm that the N1-effect was generated in LOC, we employed the genetic algorithm module of BESA to model the intracranial generators of the effect. A single dipole was allowed to fit freely to the peak of the N1-effect (156 ms). This resulted in a dipole situated in right temporo-occipital cortex, consistent with a generator in the general region of right LOC (see Fig. 5c). Talairach coordinates (49, -67.2, -14) for this dipole place it in or about the fusiform gyrus. This single dipole accounted for 73.3% of the variance in the data across all channels at this timepoint. Refitting the dipole at neighboring timepoints resulted in a similar localization and similar levels of explained variance, suggesting a stable fit across this timeframe. Therefore, we fixed the dipole in the location found at 156 ms, opened up the fitting window across an epoch from 140 to 160 ms (spanning the bulk of the N1-effect), and allowed the orientation parameter to be fit freely. Explained variance across this larger time-epoch was 63.0%. Finally, having fixed the location and orientation of this right LOC dipole, we added a second freely fitting dipole and allowed it to fit across the 140–160 ms time-window. Multiple starting positions were given. This dipole did not find a stable fit and resulted in no more than a 3–4% increase in explained variance. Addition of a third dipole gave a similar result. That these later dipoles make only marginal contributions to the explained variance suggests that no other robust signals outside of right LOC are being generated in this timeframe. It is important to point out at this juncture that manually shifting this single dipole within the general region of right LOC did not substantially change the explained variance. As such, the exact coordinates of this dipole should not be taken as a precise localization to the fusiform gyrus per se, but rather as a center of gravity for activity generated in this general region of right LOC.
In all, these results from SCD mapping and inverse source modeling are highly consistent with the interpretation that neural generators situated within the LOC of the ventral visual stream largely accounted for the N1-effect.
Multisensory Object-recognition Effects on Visual Selective Processing: the Selection Negativity (SN)
It was expected that selective processing of visual elements that were targets would be reflected in the so-called selection negativity (SN) over lateral occipital scalp, consistent with previous reports. Figure 4f–h shows that the responses evoked by the visual targets were more negative-going over occipital scalp than the responses to non-targets, with a maximal difference at 280 ms. This pattern of activity is suggestive of the elicitation of the SN. [An alternative N2 interpretation of the selective attention effects can be ruled out on the basis of the stimulus probability (Näätänen et al., 1982; for a detailed discussion, see Näätänen, 1992, pp. 236–244): the N2 is elicited by infrequently occurring stimuli, irrespective of target status; in our experiment each of the visual and auditory elements occurred with equal probability, whether target or non-target, and thus a differential N2 response is not expected.] Specific to the current multisensory design, we predicted that when the visual and auditory elements belonged to the same object (i.e. V+A+ targets), the SN would be enhanced. Beginning at ~210 ms, the V+A+ response became clearly more negative-going than the V+A- response. This negative difference exhibited a bilateral SCD topographic distribution over lateral occipital scalp (see Fig. 6a) that was highly similar to that of the N1-effect (see Fig. 5a), and extended to ~300 ms. In line with our hypothesis, this effect appeared to be an enhancement of the SN (compare the SCD maps in Fig. 6a,b). SN effects were evaluated by repeated measures ANOVA, where the dependent measure was the mean ERP amplitude in the latency window from 265 to 295 ms (based on the SN assessed in the unisensory V+ vs V- response).
Unexpectedly, the V-A+ response was also negative-going with respect to the response elicited by non-targets over lateral occipital scalp [F(1,13) = 7.2, P = 0.02], although to a somewhat lesser extent (see Fig. 4g). SCD topographic mapping showed that the most likely explanation for this negative-going response was volume conduction of the centrally focused Nd, which is associated with selective processing of auditory features (see the next section for analysis of the Nd). Consistent with such an explanation, topographic mapping of the unisensory Nd (the A+ response minus the A- response) revealed similar volume conduction of the Nd to electrodes over posterior scalp. We would also like to point out that a control test, reported two subsections below, ensured that the multisensory SN-effect described above was not due to volume conduction of the Nd.
Auditory Selective Processing over Central Scalp: the Negative Difference Wave (Nd)
Over central scalp, starting at 180 ms, the response elicited by stimuli that included a target auditory element (i.e. the V+A+ targets and V-A+ targets) became more negative-going than the responses elicited by the stimuli without a target auditory element (i.e. the V+A- targets and the non-targets; see Fig. 4d,e). This difference extended to about 350 ms. This response pattern is consistent with elicitation of the auditory selective attention component, the Nd. Since we hypothesized that object-recognition processes for the objects used in the current design would be primarily mediated through the visual system, we did not expect multisensory effects on the auditory Nd component. There was no evidence of a multisensory effect on the Nd, with the V+A+ and V-A+ waveforms overlapping in the timeframe of the Nd over central/fronto-central scalp (see Fig. 4d,e).
Nd effects were evaluated by repeated measures ANOVA, where the dependent measure was the mean ERP amplitude in the latency window from 225 to 255 ms (based on the Nd assessed in the unisensory A+ vs A- response). A repeated measures ANOVA with factors of Stimulus (five), Electrode (three), and Hemisphere (two) resulted in a main effect of Stimulus [F(4,52) = 8.35, P = 0.001]. Three planned comparison three-way ANOVAs with factors of Stimulus (two), Electrode (three) and Hemisphere (two) were conducted to better understand this effect. Two of these ANOVAs compared responses evoked by stimuli that included an auditory target element to non-targets, to assess the presence of the Nd. The V+A+ and the V-A+ responses were significantly more negative-going than the non-target response, with both ANOVAs showing a main effect of Stimulus [respectively, F(1,13) = 15.07, P = 0.002; and F(1,13) = 15.73, P = 0.002], and Stimulus by Hemisphere interactions [respectively, F(1,13) = 5.7, P = 0.03; and F(1,13) = 4.68, P = 0.05] due to a larger effect over left scalp. There was no evidence that multisensory object-recognition processes interacted with this auditory selective attention effect, with no significant difference between the V+A+ response and the V-A+ response (F < 1).
The Multisensory Object-recognition Effects Are Not Due to Summation Effects
To rule out the possibility that the multisensory N1-effect and SN-effect were simply the consequence of the summation of the visual and auditory selective attention effects (e.g. the SN and the Nd) and/or target effects, we performed a control test in which the V+A+ response was compared with the sum of the responses elicited by the visual and auditory unisensory targets (hereafter, the Summed response). The V+A+ response and the Summed response should each include the basic visual and auditory sensory evoked componentry, in addition to any visual and auditory selective attention effects (e.g. SN and Nd) or target effects: if the V+A+ response significantly differs from the Summed response, a simple summation explanation cannot account for the multisensory object-recognition effects.
In the latency range of the N1-effect, the V+A+ response was more negative-going than the Summed response (see Fig. 7a). Amplitude differences in this latency window (140–160 ms) were examined using a two-way ANOVA with factors of ERP (two: V+A+ vs Summed) and Electrode (three: the electrodes over right occipito-temporal scalp from the original test of the multisensory N1-effect). The V+A+ response was significantly more negative-going, with a main effect of ERP [F(1,13) = 5.33, P = 0.038]. This demonstrates that the multisensory N1-effect cannot be attributed to the summation of selective attention effects or target effects from the respective unisensory systems.
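The logic of this control reduces to a few lines; a hedged sketch follows (array shapes and names are ours, and a paired t-test stands in for the ERP-by-Electrode ANOVA reported above):

```python
import numpy as np
from scipy import stats

def summation_control(erp_vpap, erp_vp, erp_ap, win=slice(120, 130)):
    """Each erp_*: (n_subjects, n_channels, n_times) single-subject ERPs.
    win: time indices covering 140-160 ms (here assuming 500 Hz sampling
    and a -100 ms epoch start). Compares V+A+ with the V+ plus A+ sum."""
    summed = erp_vp + erp_ap                          # 'Summed' response
    amp_multi = erp_vpap[:, :, win].mean(axis=-1)     # (subjects, channels)
    amp_sum = summed[:, :, win].mean(axis=-1)
    # Average over the analysis electrodes, then compare across subjects.
    return stats.ttest_rel(amp_multi.mean(axis=1), amp_sum.mean(axis=1))
```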
This analysis serves as an important control ruling out a summation explanation of the multisensory object-recognition effects, and further supports the conclusion that these effects were indeed due to the neural interaction of object information presented in the visual and auditory sensory modalities.
Post Hoc Analysis
Visual inspection of the grand mean waveforms suggested a multisensory object-recognition effect in the ERPs to the non-targets, with the V-A- incongruent response more negative-going than the V-A- congruent response, starting at 390 ms (see Fig. 8). This difference had a relatively broad distribution over centro-parietal scalp. As we had no specific hypotheses regarding this later effect, the following analysis is exploratory and the findings should be treated with caution until a future study replicates them.
Discussion
The neurophysiological data from this study indicate that multiple sensory inputs interacted to affect object processing at a relatively early stage in the information-processing stream, modulating neural processes in what are generally considered to be unisensory cortices. Specifically, we found that visual and auditory inputs interacted to enhance processing in the ventral visual stream, which is known for its role in visual object-recognition (e.g. Ungerleider and Mishkin, 1982; Allison et al., 1999; Doniger et al., 2000). This initial multisensory object-recognition effect may reflect the multisensory modulation of a subset of the neural generators underlying the visual N1, given their similar timeframe and general topography. The visual N1 is thought to reflect, at least in part, the structural encoding of visual objects (e.g. Bentin et al., 1999; Eimer, 2000; Rossion et al., 2000; Murray et al., 2002). The right hemisphericity of this effect was in line with findings from Molholm et al. (2002) and Giard and Peronnet (1999) of apparent multisensory modulation of the visual N1 that was limited to, or greater over, the right hemisphere. Thus, it may be the case that auditory influences on ventral visual stream processes in this timeframe are largely a right hemisphere function. Consistent with multisensory inputs affecting successive stages of object processing in the ventral visual stream, the N1-effect appeared to be passed on to subsequent neural processes dealing with visual feature-level information, as ascertained by the apparent multisensory modulation of the SN.
The multisensory effect on the visual N1 was present in responses to the target stimuli, as we predicted, but to our surprise it was not at all apparent in the responses to non-target stimuli. Based on our original hypothesis that such an effect would occur as a consequence of the visual and auditory elements belonging to the same object, we fully expected the modulation to be present for both target and non-target stimuli. The restriction of the effect to targets, combined with the lack of evidence of an interference effect in the behavioral data for incongruent target trials (trials where the visual and auditory elements were mismatched), suggested an alternative explanation of the data. That is, it is possible that the behavioral and neurophysiological effects that we found resulted from the co-occurrence of task-relevant features in each of the visual and auditory sensory modalities, as opposed to resulting from the co-occurrence of visual and auditory features that were strongly associated with one another through long-term experience. There is some precedent in the literature for this: Czigler and Balázs (2001) found that the amplitude of the SN was larger when both simultaneously presented visual and auditory elements were task-relevant, compared with when only the visual element was task-relevant. Key here is that the visual and auditory elements were unrelated prior to the task. Further, although not specifically tested, inspection of the waveforms in Czigler and Balázs (2001) suggests that the same effect might also have been present over posterior scalp in the latency range of the visual N1 (see fig. 1 of Czigler and Balázs, 2001). The same argument could well be made for our previous study (Molholm et al., 2002) and the one by Giard and Peronnet (1999), where multisensory N1 effects were also found and, again, where there was no natural relationship between the visual and auditory elements. However, it is important to point out that both of these previous studies found a significant diminution of the ERP response during the timeframe of the visual N1. In contrast, in the present study, we found a response enhancement during the visual N1.
An alternative account, which preserves the original hypothesis that the effect is due to visual and auditory elements belonging to the same object, is that computationally costly operations, such as the early integration of visual and auditory features of the same object, were only performed on the elements that were relevant (i.e. targets). For example, subjects could have excluded non-target elements from relatively time-constrained processes once the absence of a relevant visual or auditory feature (or the presence of a non-target feature) was detected. Thus, reduced processing of the irrelevant non-target stimuli would account for the target specificity of the multisensory object-recognition effect (as well as the lack of behavioral interference effects). In this case, presumably, the multisensory object-recognition effect would be observed in non-targets under conditions where selective attention processes could not so constrain processing. Consider, for example, a task in which targets were wild animals, non-targets were domestic animals, and no animal was repeated; since selective attention could not be effectively instantiated, by this explanation such effects would be seen in the responses to both the wild and the domestic animals.
These data add to the expanding role that visual processes play in object-recognition outside the strictly visual domain. Recent functional imaging studies have shown that when an object is presented for identification in the tactile sensory modality, visual object-recognition processes in the ventral-occipital stream are substantially modulated (Amedi et al., 2001; James et al., 2002b). Zangaladze et al. (1999) showed that tactilely based object-orientation judgements were negatively affected by the application of transcranial magnetic stimulation (TMS) over occipital scalp (TMS is employed to momentarily disrupt cortical function in relatively localized cortical regions). Of particular note in this study was the latency of maximal interference, which at 180 ms coincided well with the general latency range of the visual N1 component.
Our working hypothesis, that object-recognition processes would be a visually dominated function for the class of stimuli and task that were employed, was supported by the finding of multisensory object-recognition effects on the selective processing of visual, but not of auditory, features. That is, multisensory object-recognition processes modulated the SN but not the Nd. This finding of multisensory modulation specific to visual object-recognition processes complements analogous findings in the domain of speech perception. Functional imaging has shown that the presentation of matching auditory–visual speech results in increased activation of auditory cortical areas involved in the processing of speech within the superior temporal sulcus (STS), when compared with mismatching auditory–visual speech (Calvert et al., 2000). Thus, it seems not only that multiple sensory inputs interact to modulate neural processes in what are generally considered to be unisensory cortices for the purpose of object-recognition, but also that the cortical locus of such effects depends upon the class of information to be recognized (e.g. for visual–auditory speech, auditory cortical areas; for visual–auditory animals, visual object-recognition areas). Such a model of the neuronal basis of multisensory effects on recognition processes generates specific testable predictions. For example, a multisensory effect referred to as the parchment-skin illusion, in which tactile judgements of surface texture are influenced by simultaneous auditory stimulation (Jousmäki and Hari, 1998), would be predicted to be mediated by the neural processes that underlie texture identification, presumably residing in somatosensory cortices. Findings from Foxe et al. (2000) showing early multisensory modulation of processes in somatosensory cortices by auditory stimulation (50–80 ms post-stimulus onset) demonstrate the feasibility of such a prediction (see also Lutkenhoner et al., 2002).
In addition to the predicted electrophysiological effects, a post hoc analysis suggested that there was a relatively late occurring congruency effect in the responses elicited by the bisensory non-targets (tested in the 400–500 ms latency window), where non-targets with mismatching visual and auditory elements elicited a more negative-going response than non-targets with matching visual and auditory elements. This effect was not predicted and obviously needs to be replicated before any serious interpretation can be made. However, we suspect that the effect was related to the semantic congruency between the simultaneously presented visual and auditory elements, and propose that it belongs to the class of components encompassed by the N400. The N400 is elicited in a variety of situations by unexpected words or objects as compared with expected words or objects. It is hypothesized to reflect semantic access and integration into the semantic context (Kutas and Federmeier, 2000), and has been proposed to reflect both sensory-specific and supramodal processes (e.g. Kutas and Federmeier, 2000). Similar to the timing and topography of the effect observed in the present data, the N400 consists of a negative-going response in the latency region of 400 ms, with a broad central/centro-parietal voltage topography (e.g. Kutas and Hillyard, 1980). Different from the typical N400, the present effect was elicited by simultaneously presented, as opposed to sequentially presented, semantic information (as in Ganis and Kutas, 2003, in the visual sensory modality).
Conclusions
Notes
Address correspondence to John Foxe, Cognitive Neurophysiology Laboratory, Program in Cognitive Neuroscience and Schizophrenia, Nathan S. Kline Institute for Psychiatric Research, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA. Email: Foxe@nki.rfmh.org.
References
Amedi A, Malach R, Hendler T, Peled S, Zohary E (2001) Visuo-haptic object-related activation in the ventral visual pathway. Nat Neurosci 4:324–330.
Andersen RA, Snyder LH, Bradley DC, Xing J (1997) Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annu Rev Neurosci 20:303–330.
Anllo-Vento L, Hillyard SA (1996) Selective attention to the color and direction of moving stimuli: electrophysiological correlates of hierarchical feature selection. Percept Psychophys 58:191–206.
Anllo-Vento L, Luck SJ, Hillyard SA (1998) Spatio-temporal dynamics of attention to color: evidence from human electrophysiology. Hum Brain Mapp 6:216–238.
Bentin S, Mouchetant-Rostaing Y, Giard MH, Echallier JF, Pernier J (1999) ERP manifestations of processing printed words at different psycholinguistic levels: time course and scalp distribution. J Cogn Neurosci 11:235–260.
Berman RA, Colby CL (2002) Both auditory and visual attention modulate motion processing in area MT+. Cogn Brain Res 14:64–74.
Calvert GA (2001) Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb Cortex 11:1110–1123.
Calvert GA, Brammer MJ, Bullmore ET, Campbell R, Iversen SD, David AS (1999) Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport 10:2619–2623.
Calvert GA, Campbell R, Brammer MJ (2000) Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol 10:649–657.
Campbell R, Dodd B (1980) Hearing by eye. Q J Exp Psychol 32:85–99.
Czigler I, Balázs L (2001) Event-related potentials and audiovisual stimuli: multimodal interactions. Neuroreport 12:223–226.
Dias EC, Foxe JJ, Javitt DC (2003) Changing plans: a high density electrical mapping study of cortical control. Cereb Cortex 13:701–715.
Di Russo F, Martinez A, Sereno MI, Pitzalis S, Hillyard SA (2002) Cortical sources of the early components of the visual evoked potential. Hum Brain Mapp 15:95–111.
Di Russo F, Martinez A, Hillyard SA (2003) Source analysis of event-related cortical activity during visuo-spatial attention. Cereb Cortex 13:486–499.
Doniger GM, Foxe JJ, Murray MM, Higgins BA, Snodgrass JG, Schroeder CE, Javitt DC (2000) Activation time-course of ventral visual stream object-recognition areas: high density electrical mapping of perceptual closure processes. J Cogn Neurosci 12:615–621.
Doniger GM, Foxe JJ, Schroeder CE, Murray MM, Higgins BA, Javitt DC (2001) Visual perceptual learning in human object recognition areas: a repetition priming study using high-density electrical mapping. Neuroimage 13:305–313.
Doniger GM, Foxe JJ, Murray MM, Higgins BA, Javitt DC (2002) Perceptual closure deficits in schizophrenia: a high-density electrical mapping study. Arch Gen Psychiatry 59:1011–1020.
Duhamel JR, Colby CL, Goldberg ME (1991) Congruent representation of visual and somatosensory space in single neurons of monkey ventral intraparietal cortex (VIP). In: Brain and space (Paillard J, ed.), pp. 223–236. Oxford: Oxford University Press.
Eimer M (2000) Effects of face inversion on the structural encoding and recognition of faces: evidence from event-related brain potentials. Brain Res Cogn Brain Res 10:145–158.
Fabiani M, Kazmerski VA, Cycowicz YM, Friedman D (1996) Naming norms for brief environmental sounds: effects of age and dementia. Psychophysiology 33:462–475.
Foxe JJ, Simpson GV (2002) Timecourse of activation flow from V1 to frontal cortex in humans: a framework for defining early visual processing. Exp Brain Res 142:139–150.
Foxe JJ, Morocz IA, Higgins BA, Murray MM, Javitt DC, Schroeder CE (2000) Multisensory auditorysomatosensory interactions in early cortical processing revealed by high density electrical mapping. Cogn Brain Res 10:7783.[ISI][Medline]
Foxe JJ, Doniger GM, Javitt DC (2001) Visual processing deficits in schizophrenia: impaired P1 generation revealed by high-density electrical mapping. Neuroreport 12: 38153820.[CrossRef][ISI][Medline]
Foxe JJ, Wylie GR, Martinez A, Schroeder CE, Javitt DC, Guilfoyle D, Ritter W, Murray MM (2002) Auditorysomatosensory multisensory processing in auditory association cortex: an fMRI study. J Neurophysiol 88:540543.
Foxe JJ, McCourt ME, Javitt DC (2003) Right hemisphere control of visuo-spatial attention: line-bisection judgments evaluated with high-density electrical mapping and source-analysis. Neuroimage 19:710726.[CrossRef][ISI][Medline]
Ganis G, Kutas M (2003) An electrophysiological study of scene effects on object identification. Brain Res Cogn Brain Res 16: 123144.
Giard MH, Peronnet F (1999) Auditoryvisual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci 11:473490.
Hansen JC, Hillyard SA (1980) Endogenous brain potentials associated with selective auditory attention. Electroencephalogr Clin Neurophysiol 49:277290.[CrossRef][ISI][Medline]
Hansen JC, Dickstein PW, Berka C, Hillyard SA (1983) Event-related potentials during selective attention to speech sounds. Biol Psychol 16:211224.[CrossRef][ISI][Medline]
Harter MR, Aine C (1984) Brain mechanisms of visual selective attention. In: Varieties of attention (Parasiraman P, David DR, eds), pp. 293321. New York: Academic Press.
Haxby JV, Ungerleider LG, Clark VP, Schouten JL, Hoffman EA, Martin A (1999) The effect of face inversion on activity in human neural systems for face and object perception. Neuron 22:189199.[ISI][Medline]
Hillyard SA, Hink RF, Schwent VL, Picton TW (1973) Evoked potential correlates of auditory signal detection. Science 182:177180.[Medline]
Ishai A, Ungerleider LG, Martin A, Schouten JL, Haxby JV (1999) Distributed representation of objects in the human ventral visual pathway. Proc Natl Acad Sci USA 96:93799384.
James T, Humphrey G, Gati J, Menon R, Goodale M (2002a) Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron 35:793801.[ISI][Medline]
James TW, Humphrey GK, Gati JS, Servos P, Menon RS, Goodale MA (2002b) Haptic study of three-dimensional objects activates extrastriate visual areas. Neuropsychologia 40:17061714.[CrossRef][ISI][Medline]
Jousmäki V, Hari R (1998) Parchment-skin illusion: sound-biased touch. Curr Biol 8:R190.
Kanwisher N, McDermott J, Chun MM (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci 17:43024311.
Kenemans JL, Kok A, Smulders FTY (1993) Event-related potentials to conjunctions of spatial frequency and orientation as a function of stimulus parameters and response requirements. Electroencephalogr Clin Neurophys 88:5163.[CrossRef][ISI]
Kohler S, Kapur S, Moscovitch M, Winocur G, Houle S (1995) Dissociation of pathways for object and spatial vision: a PET study in humans. Cogn Neurosci Neurophysiol 6:18651868.
Kutas M, Federmeier KD (2000) Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn Sci 4:463470.[CrossRef][ISI][Medline]
Kutas M, Hillyard SA (1980) Reading senseless sentences: brain potentials reflect semantic incongruity. Science 11:203205.
Lerner Y, Hendler T, Malach R (2002) Object-completion effects in the human lateral occipital complex. Cereb Cortex 12:163177.
Lutkenhoner B, Lammertmann C, Simoes C, Hari R (2002) Magnetoencephalographic correlates of audiotactile interaction. Neuroimage 15:509522.[CrossRef][ISI][Medline]
MacLeod A, Summerfield Q (1990) A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: rationale, evaluation, and recommendations for use. Br J Audiol 24:2943.[Medline]
Malach R, Reppas JB, Benson RR, Kwong KK, Jiang H, Kennedy WA, Ledden PJ, Brady TJ, Rosen BR, Tootell RB (1995) Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proc Natl Acad Sci USA 92:81358139.[Abstract]
McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746748.[ISI][Medline]
Meredith MA (2002) On the neuronal basis for multisensory convergence: a brief overview. Brain Res Cogn Brain Res 14:3140.[CrossRef][ISI][Medline]
Miller J (1982) Divided attention: evidence for coactivation with redundant signals. Cogn Psychol 14:247279.[ISI][Medline]
Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ (2002) Multisensory auditoryvisual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Res Cogn Brain Res 14:115128.[ISI][Medline]
Murray MM, Wylie GR, Higgins BA, Javitt DC, Schroeder CE, Foxe JJ (2002) The spatiotemporal dynamics of illusory contour processing: combined high-density electrical mapping, source analysis, and functional magnetic resonance imaging. J Neurosci 22:50555573.
Näätänen R (1992) Attention and brain function. Hillsdale, NJ: Lawrence Erlbaum Associates.
Näätänen R, Gaillard AWK, Mantysalo S (1978) Early selective attention effect on evoked potential reinterpreted. Acta Psychol (Amst) 42:313329.[CrossRef][Medline]
Näätänen R, Simpson M, Loveless NE (1982) Stimulus deviance and evoked potentials. Biol Psychol 14:5398.[CrossRef][ISI][Medline]
Olson IR, Gatenby JC, Gore JC (2002) A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Res Cogn Brain Res 14:129138.[ISI][Medline]
Picton TW, Hillyard SA, Krausz HI, Galambos R (1974) Human auditory evoked potentials. I. Evaluation of components. Electroencephalogr Clin Neurophys 36:179190.[CrossRef][ISI]
Puce A, Allison T, Asgari M, Gore JC, McCarthy G (1996) Differential sensitivity of human visual cortex to faces, letterstrings, and textures: a functional magnetic resonance imaging study. J Neurosci 16:52055215.
Puce A, Allison T, McCarthy G (1999) Electrophysiological studies of human face perception. III: Effects of top-down processing on face-specific potentials. Cereb Cortex 9:445458.
Rossion B, Gauthier I, Tarr MJ, Despland P, Bruyer R, Linotte S, Crommelinck M (2000) The N170 occipito-temporal component is delayed and enhanced to inverted faces but not to inverted objects: an electrophysiological account of face-specific processes in the human brain. Neuroreport 11:6974.[ISI][Medline]
Scherg M, Von Cramon D (1985) Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalogr Clin Neurophysiol 1:3244.
Schroeder CE, Foxe JJ (2002) The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Res Cogn Brain Res 14:187198.[CrossRef][ISI][Medline]
Schweinberger SR, Pickering EC, Jentzsch I, Burton AM, Kaufmann JM (2002) Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Brain Res Cogn Brain Res 14:398409.[CrossRef][ISI][Medline]
Simpson GV, Pflieger ME, Foxe JJ, Ahlfors SP, Vaughan HG Jr, Hrabe J, Ilmoniemi RJ, Lantos G (1995) Dynamic neuroimaging of brain function. J Clin Neurophysiol 12:432449.[ISI][Medline]
Smid HGOM, Jakob A, Heinze H-J (1999) An event-related brain potential study of visual selective attention to conjunctions of color and shape. Psychophysiology 36:264279.[CrossRef][ISI][Medline]
Snodgrass JG, Vanderwart M (1980) A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. J Exp Psychol Hum Learn 6:174215.[CrossRef][Medline]
Stein BE (1998) Neural mechanisms for synthesizing sensory information and producing adaptive behaviors. Exp Brain Res 123:124135.[CrossRef][ISI][Medline]
Stein BE, Dixon JP (1979) Properties of superior colliculus neurons in the golden hamster. J Comp Neurol. 183:269284.[ISI][Medline]
Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cogn Neurosci 8:497506.[ISI]
Sumby WH, Pollack I (1954) Visual contribution to speech intelligibility in noise. J Acoust Soc Am 26:212215.[ISI]
Tanaka J, Luu P, Weisbrod M, Kiefer M (1999) Tracking the time course of object categorization using event-related potentials. Neuroreport 10:829835.[ISI][Medline]
Thompson LA (1995) Encoding and memory for visible speech and gestures: a comparison between young and older adults. Psychol Aging 10:215228.[CrossRef][ISI][Medline]
Ungerleider LG, Mishkin M (1982) Two cortical visual systems. In: Analysis of visual behavior (Engle DJ, Goodale MA, Mansfield RJ, eds), pp. 549586. Cambridge, MA: MIT Press.
Vaughan HG, Ritter W (1970) The sources of auditory evoked responses recorded from the human scalp. Electroencephalogr Clin Neurophys 28:360367.[ISI]
Vaughan HG, Ritter W, Simson R (1980) Topographic analysis of auditory event-related potentials. Prog Brain Res 54:279285.[Medline]
Vogel EK, Luck SJ (2000) The visual N1 component as an index of a discrimination process. Psychophysiology 37:190203.[CrossRef][ISI][Medline]
Woods DL, Alain C (2001) Conjoining three auditory features: an event-related brain potential study. J Cogn Neurosci 13:492509.
Zangaladze A, Epstein CM, Grafton ST, Sathian K (1999) Involvement of visual cortex in tactile discrimination of orientation. Nature 401:587590.[CrossRef][ISI][Medline]