1Laboratory for Cognitive Neuroscience, Division of Biophysical Engineering, Graduate School of Engineering Science, Osaka University; 2Core Research for Evolutional Science and Technology, Japan Science and Technology Corporation, Osaka 560-8531; and 3Department of Cognitive Neuroscience, Osaka University Medical School, Osaka 565-0871, Japan
ABSTRACT
Tanaka, Hiroki, Takanori Uka, Kenji Yoshiyama, Makoto Kato, and Ichiro Fujita. Processing of Shape Defined by Disparity in Monkey Inferior Temporal Cortex. J. Neurophysiol. 85: 735-744, 2001. Neurons in the monkey inferior temporal cortex (IT) have been shown to respond to shapes defined by luminance, texture, or motion. In the present study, we determined whether IT neurons respond to shapes defined solely by binocular disparity, and if so, whether signals of disparity and other visual cues to define shape converge on single IT neurons. We recorded extracellular activity from IT neurons while monkeys performed a fixation task. Among the neurons that responded to at least one of eight random-dot stereograms (RDSs) containing different disparity-defined shapes, 21% varied their responses to different RDSs. Responses of most of the neurons were positively correlated between two sets of RDSs, which consisted of different dot patterns but defined the same set of eight shapes, whereas responses to RDSs and their monocular images were not correlated. This indicates that the response modulation for the eight RDSs reflects selectivity for shapes (or their component contours) defined by disparity, although responses were also affected by dot patterns per se. Among the neurons that showed selectivity for shapes defined by luminance or disparity, 44% were activated by both cues. Responses of these neurons to luminance-defined shapes and those to disparity-defined shapes were often positively correlated to each other. Furthermore the stimulus rank, which was determined by the magnitude of responses to shapes, generally matched between these cues. The same held true between disparity and texture cues. The results suggest that the signals of disparity, luminance, and texture cues to define the shapes converge on a population of single IT neurons to produce the selectivity for shapes.
INTRODUCTION
Binocular disparity is a
positional difference between the left and right retinal images of an
object. When binocular disparity in a retinal region differs from that
in its surroundings, shape is perceived binocularly, even though it
cannot be from monocular images (Julesz 1971). This
indicates that binocular disparity is a sufficient cue for the
perception of shape.
In studies on binocular vision, binocular disparity has been examined mainly as a cue for the perception of depth rather than as a cue for the perception of shape (Regan 1991; Wheatstone 1838). Previous physiological studies aimed at determining the neural mechanisms of stereopsis, which computes depth from binocular disparity cues, have revealed that many visual cortical areas in primates, including areas V1, V2, V3, V4, VP, MT, and MST, areas in the posterior parietal cortex, and the inferior temporal cortex (IT) contain disparity-selective neurons (Burkhalter and Van Essen 1986; Felleman and Van Essen 1987; Hubel and Wiesel 1970; Janssen et al. 1999; Maunsell and Van Essen 1983; Poggio and Fischer 1977; Poggio et al. 1988; Roy et al. 1992; Sakata et al. 1997; Uka et al. 2000; Watanabe et al. 2000), and examined the roles of these neurons in depth perception (Bradley et al. 1998; Cumming and Parker 1997, 2000; DeAngelis et al. 1998; Prince et al. 2000) or representation by these neurons of three-dimensional surface structure (Janssen et al. 1999, 2000a,b; Shikata et al. 1996; Taira et al. 2000; Uka et al. 1997). On the other hand, studies on the neural processing of two-dimensional shape based on disparity cues are very limited. von der Heydt et al. (2000) reported that some disparity-selective V2 neurons respond to edges defined by disparity in an orientation-selective manner. The edge information may be integrated into shape information in some area of the brain, although it is not known how neurons represent shapes defined by disparity.
The IT, the final stage of the ventral visual pathway in monkeys, is considered to be critically involved in shape processing (Mishkin et al. 1983). Many neurons in this area respond selectively to shapes; they prefer some shapes over others (Desimone et al. 1984; Gross et al. 1972; Tanaka et al. 1991). Sáry et al. (1993) showed that a population of IT neurons responded not only to shapes defined by luminance but also to those defined by texture or motion. They also showed that the shape selectivity of these neurons was similar across the cues that define shape. In the present study, we attempted to determine whether IT neurons responded to shapes defined by disparity and, if so, whether signals of disparity and other visual cues to define shape converge on single IT neurons. We recorded extracellularly the responses of neurons to eight random-dot stereograms (Julesz 1971), which contained different shapes defined by disparity, and to the same sets of shapes defined by luminance or texture. We show that some IT neurons were selective for shapes defined by disparity. Shape selectivity of these neurons tended to be similar between disparity and other cues, suggesting that signals of different cues to define shapes converge on these neurons to show their shape selectivity. Preliminary results have appeared elsewhere (Tanaka et al. 1999).
METHODS
Subjects and surgery
Two male monkeys (Macaca fuscata, 9 and 5 kg body wt)
were used. In the first surgery, a scleral search coil was implanted under the conjunctiva of one eye to monitor eye position (Judge et al. 1980), and a head post was attached to the skull using acrylic screws and dental cement to allow head fixation. After a
recovery period of >2 wk followed by 2-3 mo of training in a fixation
task, an eye coil was implanted in the other eye, and a recording
chamber was attached to the right side of the skull over the temporal
cortex. After the monkeys received supplementary training, neuronal
recordings were started. In monkey 1, another recording
chamber was attached to the skull over the left temporal cortex after
the recordings from the right IT were completed. All surgical
procedures were performed under surgical anesthesia (pentobarbital
sodium, 35 mg/kg ip) and aseptic conditions. After each surgery, the
monkeys were administered an antibiotic (piperacillin sodium, 30 mg/kg
im), analgesic (ketoprofen, 0.5 mg/kg im), and corticosteroid
(dexamethasone sodium phosphate, 0.1 mg/kg im) to minimize potential
inflammation. Surgical procedures and animal care conformed to the
Guidelines of the National Institutes of Health for the Care and Use of
Laboratory Animals (1996) and were approved by the animal experiment
committee of Osaka University Medical School.
Task and stimulus presentation
The monkeys were trained in a fixation task controlled by a
computer (PC486FS: EPSON, Suwa, Japan). They were seated on a primate
chair facing a 15-in color CRT monitor (frame rate, 70 Hz; size,
260 × 195 mm; resolution, 1,024 × 768) placed 57 cm away.
The monkey's head was fixed by screwing the head post to the chair.
Stimuli were presented on the monitor using a computer (Asus Computer
International, San Jose, CA). For monkey 1, the positions of
both eyes were sampled at a rate of 100 Hz using the search coil
technique (Judge et al. 1980) and stored for off-line analysis, although the position of one eye was monitored on-line. For
monkey 2, the position of only one eye was sampled and
monitored. Each trial was started with the presentation of a fixation
spot (0.2 × 0.2°) at the center of the monitor. The monkeys
were trained to fixate on it for 500 ms (fixation window: 2 × 2°). A visual stimulus was then presented for 2 s, and the
monkeys were rewarded with a drop of water if they maintained their
fixation for this duration. Otherwise the trial was aborted the moment
they broke their fixation. After the monkeys were returned to their
cages, they received an adequate amount of fruit. During the training and experimental sessions, the monkeys were deprived of water but were
allowed dry food ad libitum in their cages.
Visual stimuli
A static random-dot pattern consisting of bright and dark dots covered the entire screen of the monitor. Fifty percent of the dots were bright. Each dot occupied 2 × 2 pixels subtending a visual angle of 0.05 × 0.05°. The luminances of the bright and dark dots were 38 and 0.2 cd/m², respectively. The stimulus set consisted of eight shapes (Fig. 1A) subtending a visual angle of 2°. Each shape was defined by a difference in disparity, luminance, or texture between the shape region and its surroundings (Fig. 1B).
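The stated dot size can be checked against the monitor geometry given under "Task and stimulus presentation" (260 mm across 1,024 pixels, viewed from 57 cm). The following is a minimal sketch of that arithmetic; the function names are illustrative and not taken from the original study, and square pixels are assumed.

```python
import math

# Display geometry quoted in the text.
SCREEN_WIDTH_MM = 260.0
SCREEN_WIDTH_PX = 1024
VIEWING_DISTANCE_MM = 570.0


def pixels_to_degrees(n_pixels):
    """Visual angle (deg) subtended by n_pixels centered on the line of sight."""
    size_mm = n_pixels * SCREEN_WIDTH_MM / SCREEN_WIDTH_PX
    return math.degrees(2 * math.atan(size_mm / (2 * VIEWING_DISTANCE_MM)))


def degrees_to_pixels(angle_deg):
    """Number of pixels (possibly fractional) spanning angle_deg."""
    size_mm = 2 * VIEWING_DISTANCE_MM * math.tan(math.radians(angle_deg) / 2)
    return size_mm * SCREEN_WIDTH_PX / SCREEN_WIDTH_MM


dot_deg = pixels_to_degrees(2)    # one dot is 2 pixels wide
shape_px = degrees_to_pixels(2.0)  # a shape subtends 2 deg
print(f"2-pixel dot: {dot_deg:.3f} deg; 2-deg shape: {shape_px:.1f} px")
```

Under these assumptions a 2-pixel dot subtends roughly 0.05°, in agreement with the value given in the text.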
RANDOM-DOT STEREOGRAMS (RDSs). Disparity-defined shapes were generated by adding a crossed disparity of 0.2° to the shape region (Fig. 1B, top and middle). Dots in the shape region of the left-eye image were horizontally shifted relative to those of the right-eye image. The right-eye image was identical for all stimuli. A liquid crystal stereoscopic modulator and polarized glasses were used for dichoptic stimulation (Tektronix SGS610). In control experiments where we examined whether the differential responses of IT neurons to the different RDSs were due to response selectivity for shapes defined by disparity or caused by slightly different dot patterns in the left-eye images, only the left-eye images were presented to the left eye.
LUMINANCE-DEFINED SHAPES (LUMs). Luminance-defined shapes were constructed by making the bright dots inside the shapes darker than those in the surroundings (Fig. 1B, bottom left). The luminance of the bright dots in the shape region was 20 cd/m², yielding a contrast between the shape region and its surroundings of 50%.
TEXTURE-DEFINED SHAPES (TEXs). The dots in the shape region were four times as large as those in the surroundings (0.1 × 0.1°) and were arranged in a regular checkerboard pattern (Fig. 1B, bottom right). The average luminance of the shape region was the same as that of its surroundings. LUMs and TEXs did not have crossed disparity of 0.2° but contained only 0 disparity.
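The construction of the disparity-defined stimuli described above can be sketched as follows. This is an illustrative reconstruction, not the authors' stimulus code: it assumes one grid cell per dot, a rightward shift of the shape region in the left-eye image for crossed disparity, and fresh random dots in the strip uncovered by the shift (the text does not state how that strip was filled). The function `make_rds` and the mask `square` are hypothetical names.

```python
import random


def make_rds(width, height, shape_mask, shift, seed=0):
    """Return (left, right) binary dot images for a random-dot stereogram.

    shape_mask(x, y) -> True inside the shape region.
    shift: horizontal offset (in dots) applied to the shape region
           in the left-eye image only (assumed direction: rightward).
    """
    rng = random.Random(seed)
    # Right-eye image: 50% bright dots, no shape information.
    right = [[rng.random() < 0.5 for _ in range(width)] for _ in range(height)]
    left = [row[:] for row in right]

    # Refill the shape region with new dots so the uncovered strip stays random.
    for y in range(height):
        for x in range(width):
            if shape_mask(x, y):
                left[y][x] = rng.random() < 0.5

    # Copy the shape-region dots of the right-eye image to shifted positions.
    for y in range(height):
        for x in range(width):
            if shape_mask(x, y) and 0 <= x + shift < width:
                left[y][x + shift] = right[y][x]
    return left, right


def square(x, y):
    """Hypothetical shape: a 10 x 10 dot square."""
    return 10 <= x < 20 and 10 <= y < 20


left, right = make_rds(40, 30, square, shift=3, seed=1)
```

Each monocular image remains a random-dot field on its own; the shape is carried only by the horizontal offset between corresponding dots in the two images.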
When visual stimuli were presented, the dot pattern in the central 7.5 × 5° rectangular region containing the shape region was changed with no correlation of dot position between the prestimulus and stimulus patterns. The shape disappeared when the arrangement of dots in this central region was returned to the original pattern. This procedure was adopted to avoid apparent motion of dots, something that could be an additional cue for the perception of shapes. To distinguish whether neuronal responses to the visual stimuli were induced by the disparity-defined shapes or by the change of dot patterns in the central rectangular region, we recorded the neuronal responses when the rectangular region of dots contained no shape (referred to as the "no-shape pattern"). The no-shape pattern was identical to the dot pattern of the central 7.5 × 5° region of the RDSs presented to the right eye. This dot pattern was also identical to that of the central 7.5 × 5° region of all stimuli for LUMs and TEXs, except for the shape region. Therefore any difference between the neuronal responses to a visual stimulus and the no-shape pattern was considered as being evoked by the shape contained in the visual stimulus. All the neurons, except for two, which we tested with only LUMs and RDSs, were tested with 24 stimuli (8 shapes × 3 cues) and the no-shape pattern shown in a random sequence. Additional control stimuli were then presented as long as the recording remained stable.
Extracellular recording
The activity of IT neurons (mostly single units and some
multi-units) was recorded extracellularly from three hemispheres of the
two monkeys using glass-coated elgiloy electrodes (tip, 15-40 µm;
impedance, 2-3 MΩ at 1 kHz). The electrodes were controlled by a
microdrive (MO-95 s, Narishige, Tokyo) attached to the recording chamber. The electrodes penetrated the dura mater into the lateral surface of the IT. The recorded signals were amplified,
band-pass-filtered, and fed to a window discriminator and an
oscilloscope. The triggered spikes were sampled in a computer (PC486FS:
EPSON, Suwa, Japan). A rastergram and a peristimulus time histogram
were plotted and displayed on-line. The timings of spike occurrence and
the behavioral responses were stored for off-line analysis.
Histology
After the neuronal recording sessions were completed, monkey 1 was anesthetized with an overdose of pentobarbital sodium (60 mg/kg ip), the chest cavity was opened, and heparin (200 IU/kg) was injected into the heart. The animal was transcardially perfused with 500 ml of phosphate-buffered saline (PBS, 37°C), followed by perfusion of 2 l of ice-cold 4% paraformaldehyde in 0.1 M PBS. We implanted two pins in the brain at the anterior and posterior edges of the recording chambers on both the right and left sides. The brain was then removed, photographed, blocked, postfixed overnight in the fixative, and immersed in 0.1 M PBS containing a graded series of sucrose (10-30%). The location of the implanted pins was verified for reconstruction of the recording area. Monkey 2 is still alive and participating in a different experiment.
Data analysis
The spontaneous firing rate was calculated from the average spike count over 500 ms before stimulus onset. The net response was calculated by subtracting the spontaneous firing rate from the firing rate during a 2-s period starting 80 ms after stimulus onset. The net response was averaged across stimulus repetitions (5-10 times). This value was used as a measure of the neuronal responses to the stimulus.
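The net-response measure defined above can be sketched as follows. This is a minimal illustration, assuming spike times are given in milliseconds relative to stimulus onset and that the spontaneous rate is subtracted trial by trial before averaging; the function names and the example trial are hypothetical.

```python
def net_response(spike_times_ms, baseline_ms=500.0, latency_ms=80.0, window_ms=2000.0):
    """Net response (spikes/s) for one trial.

    Spontaneous rate: spikes in the 500 ms before stimulus onset (time 0).
    Evoked rate: spikes in a 2-s window starting 80 ms after onset.
    """
    n_baseline = sum(1 for t in spike_times_ms if -baseline_ms <= t < 0)
    n_evoked = sum(1 for t in spike_times_ms if latency_ms <= t < latency_ms + window_ms)
    spontaneous_rate = n_baseline / (baseline_ms / 1000.0)
    evoked_rate = n_evoked / (window_ms / 1000.0)
    return evoked_rate - spontaneous_rate


def mean_net_response(trials):
    """Average the net response across stimulus repetitions."""
    return sum(net_response(trial) for trial in trials) / len(trials)


# Hypothetical trial: 1 baseline spike and 10 spikes in the response window.
example_trial = [-250.0] + [100.0 + 200.0 * i for i in range(10)]
example_net = net_response(example_trial)
```

A negative value indicates that firing during the stimulus fell below the spontaneous rate, which corresponds to the inhibitory responses tabulated in the RESULTS.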
To examine whether the strength of the responses to the stimulus was significantly different from spontaneous firing, we used a t-test (2-tailed, P < 0.05). We also used a t-test to determine whether the responses to shape stimuli were significantly different from those to the no-shape pattern. To determine whether response modulation to different shapes was statistically significant, one-way ANOVA was performed. Other analyses will be mentioned where relevant. The significance level in all tests was P < 0.05.
RESULTS
Recording sites
Histological analysis showed that our recording site in monkey 1 was in the central portion of the dorsal IT (striped area in Fig. 2). The recording region included area TEd and possibly the most anterior part of area TEO. The two dots in Fig. 2 show the location of the pins implanted as histological landmarks.
Monkey 2 is still alive. Since the recording chamber was attached to a position on the skull just over the temporal cortex, similar to that in monkey 1, we believe that the recordings were made from a similar portion of the IT.
Database
We recorded from 225 units in three hemispheres of the two monkeys (n = 166 in monkey 1, n = 59 in monkey 2). All the units were tested with RDSs, LUMs, and TEXs except two, which were tested with only the first two. We sampled neurons that responded to at least one of the 24 stimuli (i.e., 8 shapes × 3 cues). If a neuron responded to the no-shape pattern and the response was not different from any of the responses to the visual stimuli, the neuron was considered to respond to the dot pattern and was discarded from further analysis. Using this criterion, 201 units responded to at least one of the RDSs, LUMs or TEXs. Of these, 105 units responded to RDSs, 142 responded to LUMs, and 138 responded to TEXs. All these stimuli generally evoked excitatory responses, although the ratio of excitatory to inhibitory responses was slightly different across the three cues (Table 1).
In Table 2, we classified the 201 units into the following seven groups according to the cues that evoked responses: units that responded to RDSs alone, LUMs alone, TEXs alone, both RDSs and LUMs, both RDSs and TEXs, both LUMs and TEXs, and RDSs, LUMs and TEXs. Sixty-eight percent of these units responded to two or more cues, and 24% responded to all three cues.
Of the 105 units responding to the RDSs, 22 (21%) showed statistically significant response modulation (i.e., response selectivity) for different RDSs. Eighteen were single isolated neurons, while the other four were multi-units. Of the 142 units responding to LUMs, 66 (46%) showed response selectivity to LUMs (53 single neurons). Of the 138 units responding to TEXs, 50 (36%) showed response selectivity to TEXs (39 single neurons).
Response selectivity for RDSs
Figure 3 shows an example of an IT neuron that responded to RDSs in a stimulus-selective manner. This neuron showed excitatory responses to five of the eight stimuli with a latency of ~150 ms. The magnitude of responses differed across different shapes (ANOVA, P < 0.01) with the maximum response to a square (stimulus 3; 10.8 spikes/s). Since this neuron showed a similar response modulation for LUMs (see Fig. 6), the response modulation for RDSs was considered to be largely due to the differences of the shapes defined by disparity and not due to the differences of the dot patterns in different RDSs.
The preferred shape and the degree of tuning differed among the units that showed response selectivity for RDSs. Neuron A in Fig. 4 responded only to a doughnut shape (stimulus 1), and little or no response was evoked by the other shapes. ANOVA revealed a highly significant modulation of responses by different shapes (P < 0.0005). Neuron B, responding to six different stimuli, showed significant, but rather broad, response selectivity (ANOVA, P < 0.01). To quantify the sharpness of the response selectivity for RDSs, we calculated the tuning width. This was defined as the number of stimuli to which a neuron responded with more than half the maximum response magnitude (taking a value from 1 to 8). The tuning width of neuron A was 1, indicating that this neuron belonged to the group of neurons showing the sharpest response selectivity. The tuning width of neuron B was 5. The median of the tuning width for the 17 single neurons that were selective for RDSs with excitatory responses was 2. Comparison of the distribution of the tuning widths for RDSs with that for LUMs (median = 3 for 47 single neurons that were selective for LUMs with excitatory responses) or TEXs (median = 2 for 31 single neurons that were selective for TEXs with excitatory responses) revealed no statistically significant differences (Mann-Whitney U test: P > 0.9 for LUMs vs. RDSs; P > 0.9 for TEXs vs. RDSs). Thus the frequency distribution of tuning widths for RDSs was comparable with that for LUMs or TEXs.
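The tuning-width measure defined above amounts to a simple count. The sketch below assumes excitatory responses (a positive maximum net response); the function name and the two response profiles are hypothetical and are not data from the study.

```python
def tuning_width(responses):
    """Number of stimuli evoking more than half the maximum response."""
    peak = max(responses)
    if peak <= 0:
        raise ValueError("tuning width is defined here for excitatory responses only")
    return sum(1 for r in responses if r > peak / 2)


# Hypothetical net responses (spikes/s) to the eight shapes.
sharp_profile = [12.0, 1.0, 0.5, 0.0, 2.0, 0.3, 1.1, 0.2]
broad_profile = [30.0, 20.0, 16.0, 15.5, 18.0, 5.0, 3.0, 1.0]

sharp_width = tuning_width(sharp_profile)
broad_width = tuning_width(broad_profile)
```

Because the stimulus evoking the maximum response always exceeds half of that maximum, the measure ranges from 1 (sharpest tuning) to 8 (no selectivity by this criterion).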
Neurons A and B in Fig. 4 also exhibited a contrast in their response strength. While neuron A responded rather weakly, neuron B showed vigorous and sustained responses. The maximum response magnitude of neuron B was 29.2 spikes/s. This was the strongest response among the units selective for the RDSs. The average of the maximum response magnitude to the RDSs among the 17 single neurons with excitatory responses was 9.4 ± 6.1 (SD) spikes/s. The average of the maximum response magnitude to LUMs among the single neurons which were selective for LUMs with excitatory responses was 13.6 ± 8.5 (n = 47) and that to TEXs was 11.3 ± 8.4 spikes/s (n = 31). When we compared the response magnitude to RDSs with those to LUMs and TEXs, the magnitudes of the responses to LUMs tended to be higher than those to RDSs, although the difference between the maximum response magnitudes to LUMs and RDSs fell slightly short of the significance level (Mann-Whitney U test, 0.05 < P < 0.1). Thus under the present stimulus condition, luminance tends to act as a stronger cue to evoke stimulus-selective responses in IT neurons than disparity cues, which is consistent with the observation that the number of single neurons selective for LUMs (n = 53) was about three times that for RDSs (n = 18).
Responses to monocular images and to new RDSs consisting of different dot patterns
Seventeen of the 22 units selective for RDSs were tested with monocular images of the RDSs. Only the left-eye images were used because the right-eye image was identical across the eight RDSs (see METHODS). Figure 5A compares the responses of a single neuron to the RDSs with those to the left-eye images. The responses to the RDSs and the monocular images are sorted according to the magnitudes of the responses to the RDSs. The response profile to the monocular images was markedly different from that to the RDSs. Therefore the response modulation for the RDSs of this neuron was not explained by the responses to slightly different dot patterns in the monocular images. To evaluate the similarity between the two response profiles, we calculated Pearson's correlation coefficient between the two sets of eight responses (referred to as "response correlation"). In the 17 units tested, there was no response correlation between the RDSs and the monocular images (0.01 ± 0.38, n = 17, P > 0.9, sign test). Hence, in general, the monocular responses to the dot patterns do not account for the response modulation for the RDSs.
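The "response correlation" and the sign test applied to its distribution can be sketched as follows. This is an illustrative implementation: Pearson's coefficient is computed directly from its definition, and the sign test is written as a two-sided exact binomial test that discards zero values, which is one common convention and is assumed here rather than taken from the source. Function names and example profiles are hypothetical.

```python
import math


def response_correlation(profile_a, profile_b):
    """Pearson's r between two response profiles (e.g., eight shapes each)."""
    n = len(profile_a)
    if n != len(profile_b) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / math.sqrt(var_a * var_b)


def sign_test_p(values):
    """Two-sided exact sign test of the null hypothesis 'median = 0'."""
    n_pos = sum(1 for v in values if v > 0)
    n_neg = sum(1 for v in values if v < 0)
    n = n_pos + n_neg  # zeros are dropped (assumed convention)
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical profiles: the second is the first scaled by one half.
rds_profile = [10.0, 8.0, 6.0, 5.0, 3.0, 2.0, 1.0, 0.5]
scaled_profile = [0.5 * r for r in rds_profile]
r_scaled = response_correlation(rds_profile, scaled_profile)
```

Because Pearson's r is invariant to scaling, a neuron that responds more weakly to one stimulus set but keeps the same shape preference still yields a high response correlation.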
Next we examined whether responses to the routinely used set of RDSs were correlated with those to a new set of RDSs consisting of totally different dot patterns, but defining the same set of eight shapes. Although the response magnitude to the new RDSs (Fig. 5A) was lower than that to the original RDSs, the two response profiles showed strong positive correlation (r = 0.76, P < 0.05). Together with the finding that the response profile to the monocular images was different from that to the RDSs, the results showed that the response profile for the eight RDSs of this neuron was mainly based on the shape defined by disparity, although the response magnitude was affected by the dot pattern.
For the 12 units tested with the two sets of RDSs, we calculated the ratio of the response magnitude for the less-effective RDSs to that for the more-effective RDSs. The average of the ratio was 0.62 ± 0.2. This indicates that, for many neurons, the dot pattern affects the response magnitude to the RDSs. However, response correlation between the two sets of RDSs was positively distributed (Fig. 5B; mean: 0.48, P < 0.01, sign test). The same was true when only single neurons were selected for analysis (- - -, n = 9, mean: 0.56, P < 0.005, sign test). We consider that the positive shift of the distribution reflects similarity between responses of a single neuron to the two sets of RDSs because such a positive shift of the distribution of correlation coefficients was observed only when the two sets of data were from the same neuron. The correlation coefficient between responses to the original RDSs of one neuron and those to the new RDSs of the same or a different neuron was distributed around 0 (· · ·, mean: 0.09, P > 0.2, sign test, n = 78 pairs). There was a statistically significant difference between this distribution and the distribution of the response correlation in the same neuron (P < 0.005, Mann-Whitney U test). In addition, there was no difference between the distribution of response correlation between two sets of RDSs in the same neuron and the distribution of correlation coefficient between responses to LUMs and those to TEXs (- - - in Fig. 7B, P > 0.1, Mann-Whitney U test) to which IT neurons have been shown to exhibit similar shape selectivity (Sáry et al. 1993). These results support the view that modulation of responses to the eight RDSs, at least partially, represents response selectivity for the shapes defined by disparity.
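The comparison between same-neuron and across-neuron correlations can be sketched as follows. The text does not spell out how the 78 pairs were formed; the sketch assumes unordered pairs of the 12 units with repetition allowed, which is one pairing scheme that reproduces that count. All function names and the randomly generated profiles are hypothetical.

```python
import math
import random
from itertools import combinations_with_replacement


def pearson(a, b):
    """Pearson's r between two equal-length response profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def same_neuron_correlations(original, new):
    """One coefficient per neuron: original-set profile vs. new-set profile."""
    return [pearson(o, n) for o, n in zip(original, new)]


def control_correlations(original, new):
    """Coefficients for neuron pairs (i, j) with i <= j (assumed pairing)."""
    indices = range(len(original))
    return [pearson(original[i], new[j])
            for i, j in combinations_with_replacement(indices, 2)]


# Hypothetical data: 12 units, 8 shapes, two stimulus sets.
rng = random.Random(0)
original_profiles = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)]
new_profiles = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)]

same = same_neuron_correlations(original_profiles, new_profiles)
control = control_correlations(original_profiles, new_profiles)
print(len(same), len(control))
```

Under this pairing scheme, n units give n(n + 1)/2 pairs, so 12 units give 78 pairs.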
Responses to the no-shape pattern
Because the central 7.5 × 5° region changed its dot pattern at stimulus presentation, the change of the dot pattern could also affect the neuronal activity in addition to the shapes defined by disparity. However, because we analyzed only neurons that did not respond to the no-shape pattern or neurons whose responses to the shapes were different from their responses to the no-shape pattern, these factors were considered not to have a substantial effect on the responses of the 22 units that were selective for RDSs. The average response to the no-shape pattern among the 22 units was 0.77 spikes/s. This value was less than one-tenth the average maximum response magnitude for the RDSs (8.5 spikes/s, n = 22), and the difference between these two values was significant (Wilcoxon paired-rank test: P < 0.0001, n = 22).
Vergence eye movement
We analyzed vergence eye position of monkey 1 during recordings of activities of 16 neurons that were selective for RDSs. Except for one neuron, there were no statistically significant differences in the average vergence eye position during 2 s of stimulus presentation across different RDSs (ANOVA, P < 0.05).
Vergence eye movement around the onset of the visual stimuli was
calculated by subtracting the average eye position over 500 ms before
stimulus presentation from that during 2 s of stimulus presentation. The vergence eye movement was, on average, 0.08 ± 0.1° (n = 16). We did not find statistically
significant differences in the average vergence eye movement across
different RDSs except for one neuron (ANOVA, P < 0.05). Thus it is unlikely that the neuronal selectivity for RDSs was
caused by vergence eye movement.
Cue-invariant shape coding between disparity and other cues
We next examined whether responses to disparity-defined shapes and those to shapes defined by other cues were similar to each other. Pearson correlation coefficient was used to evaluate the similarity. Only single neurons were used for the analysis in the following text.
Luminance versus disparity
Figure 6 shows an example of a neuron whose shape selectivity to RDSs and LUMs was similar. This is the neuron shown in Fig. 3. The responses to LUMs were nearly twice as strong as those to RDSs for most of the stimuli (Fig. 6B). However, the response profiles were similar in that the stimulus rank was largely preserved between the two cues, and the two response curves were strongly correlated (r = 0.83, P < 0.01).
Sixty-three neurons showed shape selectivity for LUMs or RDSs. Of these, 28 (44%) were activated by both cues. The distribution of the response correlation for LUMs and RDSs of this cue-convergent group of neurons is shown in a cumulative histogram (Fig. 7A). The distribution was shifted toward positive values (mean = 0.26, n = 28) compared with that for responses to RDSs of one neuron and those to LUMs of the same or a different neuron among this group (· · ·, mean = 0.03, n = 406, Mann-Whitney U test, P < 0.005). Furthermore the distribution was similar to the distribution for LUMs and TEXs (- - -, mean: 0.27, n = 33 single neurons that showed shape selectivity for LUMs or TEXs and showed excitatory responses for both, Mann-Whitney U test, P > 0.9) to which IT neurons have previously been shown to respond in a cue-invariant manner (Sáry et al. 1993). The results indicate that shape selectivity tends to be similar for luminance and disparity cues among the cue-convergent group of neurons.
To further evaluate to what extent shape selectivity to LUMs and RDSs
matched among the cue-convergent group, we calculated the average
responses to the RDSs as a function of the shape rank determined by the
responses to the LUMs (Fig. 7B). For each neuron, shapes
were ranked according to the magnitude of the responses to the LUMs,
and the responses to the RDSs were sorted in this rank order. Then the
responses to the RDSs were normalized by the maximum response to the
RDSs for each neuron. Finally the responses to RDSs were averaged over
the population of neurons in each rank. The average rank response to
the RDSs decreased almost monotonically as the rank order determined by
the responses to LUMs became lower. The correlation between the rank
and the average rank response to the RDSs was significant [Spearman's correlation coefficient: 0.23, P < 0.001, n = 224 (28 neurons × 8 shapes)]. Therefore
shape selectivity for RDSs and LUMs, on average, matched to the extent
that the ranks of shape preference were maintained.
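The rank-based population analysis described above can be sketched in three steps: rank the shapes by the LUM responses, normalize the RDS responses by their maximum, and average by rank across neurons. The sketch assumes excitatory responses (a positive maximum RDS response) and breaks ties in LUM responses by stimulus order; the function name and the example profiles are hypothetical.

```python
def rank_ordered_average(lum_profiles, rds_profiles):
    """Average normalized RDS response at each LUM-defined shape rank.

    lum_profiles[i] and rds_profiles[i] are the responses of neuron i
    to the same shapes defined by luminance and by disparity.
    Index 0 of the result corresponds to the best shape for LUMs.
    """
    n_shapes = len(lum_profiles[0])
    totals = [0.0] * n_shapes
    for lum, rds in zip(lum_profiles, rds_profiles):
        # Shapes ordered from the strongest to the weakest LUM response.
        order = sorted(range(n_shapes), key=lambda s: lum[s], reverse=True)
        peak = max(rds)  # normalization by the maximum RDS response
        for rank, shape in enumerate(order):
            totals[rank] += rds[shape] / peak
    return [total / len(lum_profiles) for total in totals]


# Hypothetical responses of two neurons to three shapes.
lum_example = [[3.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
rds_example = [[6.0, 2.0, 4.0], [1.0, 2.0, 4.0]]
rank_curve = rank_ordered_average(lum_example, rds_example)
```

If shape preference is shared between the two cues, the resulting curve declines with rank; if the cues are unrelated, it stays approximately flat.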
Texture versus disparity
Figure 8 shows an example of a neuron whose shape selectivity for TEXs and RDSs was similar. This is neuron A, which was shown in Fig. 4. Although the response correlation for the two cues did not reach a significance level (r = 0.60, 0.05 < P < 0.1), this neuron responded predominantly to the doughnut shape for both cues.
Forty-eight neurons showed shape selectivity for TEXs or RDSs. Among them, 20 (42%) showed excitatory responses to both TEXs and RDSs. Again this cue-convergent group tended to show a positive response correlation for these cues (Fig. 9; mean, 0.33). The distribution of response correlation for TEXs and RDSs among the cue-convergent group was similar to that for LUMs and TEXs (- - -, Mann-Whitney U test, P > 0.9). The distribution was also shifted toward positive values compared with that between responses to TEXs of one neuron and responses to RDSs of the same or a different neuron among the cue-convergent group (· · ·; mean = 0.02, n = 210, Mann-Whitney U test, P < 0.005). The results indicate that shape selectivity tends to be similar for texture and disparity cues.
The average response to the RDSs decreased roughly monotonically as the
rank of shape preferences determined by responses to TEXs became lower
(Fig. 9B). The correlation between the rank order and the
average response to RDSs was significant [Spearman's correlation
coefficient: 0.32, P < 0.0001, n = 160 (20 neurons × 8 shapes)]. Thus the shape rank was maintained
between RDSs and TEXs for the IT neurons.
DISCUSSION
A population of IT neurons showed differential responses to RDSs. Results of the monocular test and of the test with a new set of RDSs indicate that the differential responses to the eight RDSs containing different shapes largely or at least partially represent selectivity for shapes defined by disparity. Shape selectivity tended to be similar for disparity and luminance cues in the neurons that showed excitatory responses for both cues. The same held true for disparity and texture cues. This indicates that signals of the disparity cue and signals of luminance or texture cues to define shape converge on single IT neurons to show their shape selectivity. It should be noted, however, that the response magnitude was different between the two sets of RDSs. The response magnitude was also dependent on the dot pattern of the RDSs.
Selectivity for RDSs
In the present experiment, 21% of the neurons that responded to at least one of eight RDSs showed response selectivity for the RDSs. Before we discuss the neural processing of shape defined by disparity, we consider possible alternative interpretations for what the response selectivity reflects.
First, there is a possibility that the differential responses for RDSs were caused by different dot patterns in the left-eye images. The monocular images of different RDSs look all alike perceptually, but dots in the shape region of the left-eye images were shifted relative to those of the right-eye images. IT neurons might be sensitive to the subtle differences in the dot patterns in the left-eye images. This possibility is, however, ruled out because responses to RDSs and those to the left-eye images were dissimilar. In addition, responses for the two sets of RDSs, which consisted of totally different dot patterns but contained the same set of shapes, were positively correlated (Fig. 5B).
Second, it is possible that the neurons responded to components of the shapes, such as the disparity of the dots inside the shape or the contours defined by disparity. The former is unlikely because responses to RDSs and those to shapes defined by other cues (which contained only zero disparity) were correlated. On the other hand, it is difficult to determine strictly whether the neurons responded to the global shape or to the component contours of the shape. For example, neuron B in Fig. 4 may have responded to a horizontal edge defined by disparity, which was common to shapes 2, 4, 6, and 8. To determine this, the "reduction process," in which a critical stimulus feature essential for neuronal activation is determined by stepwise decomposition of the effective stimulus, is necessary (Fujita et al. 1992; Tanaka et al. 1991). It remains to be established whether neurons selective for disparity-defined shapes responded to the global shapes or their component contours.
Shape from disparity
The neural processing of shape has been studied mostly using luminance-defined bars or shapes as stimuli. These studies have shown that local edge information is first detected in V1, and more complex features are then processed along the ventral visual pathway (Gallant et al. 1993; Hegdé and Van Essen 2000; Hubel and Wiesel 1968; Kobatake and Tanaka 1994).
Several studies have examined the mechanism for processing of shape based on texture or motion cues. Lesions in V4 of monkeys resulted in deficits in the discrimination of the orientation of texture-defined gratings (De Weerd et al. 1996), and some IT neurons were shown to respond to texture-defined shapes (Sáry et al. 1993). These results suggest that the ventral visual pathway processes shape based on texture cues. In regard to motion cues, it was found that some V4 and IT neurons responded to motion-defined gratings and shapes, respectively (Logothetis and Charles 1990; Sáry et al. 1993). Lesions in these areas impaired the ability to discriminate these stimuli (Britten et al. 1992; De Weerd et al. 1996; Schiller 1993). On the other hand, motion signals themselves are mainly processed in the dorsal visual pathway. Lesions in MT, which belongs to the dorsal pathway, also caused moderate deficits in the ability to discriminate shapes defined by motion (Schiller 1993). Hence both the ventral and dorsal visual pathways are thought to be involved in the analysis of shape based on motion cues.
Compared with studies on shape defined by texture and motion cues, even fewer studies have addressed the question of how and where shape based on disparity is processed. We showed in the present study that some IT neurons were selective for shapes defined by disparity. It remains unclear whether shape based on disparity cues is processed only along the ventral visual pathway because, to our knowledge, no study has addressed this issue in areas in the dorsal visual pathway. von der Heydt et al. (2000) reported that some V2 cells were orientation selective for disparity-defined edges, but it is still unclear whether such neurons are found predominantly in the thin stripes or interstripes, which project to area V4. Since disparity-sensitive neurons are abundant in areas along the dorsal visual pathway (Maunsell and Van Essen 1983; Roy et al. 1992; Sakata et al. 1997), this pathway may also contribute to the processing of shape based on disparity cues. Such a contribution could take two forms. First, this pathway may merely provide disparity information to the ventral visual pathway. Second, neurons in the dorsal pathway may themselves represent shape defined by disparity. Recent reports have shown that some neurons in the posterior parietal cortex, the final stage of the dorsal pathway, responded to shapes defined by luminance (Murata et al. 1996; Sereno and Maunsell 1998; Taira et al. 1990). It is worthwhile examining whether these neurons also respond to shapes defined by disparity.
In the present study, we investigated the neural representation by IT neurons of two-dimensional "flat" shapes defined by disparity. The stimuli we used differ from three-dimensional (3-D) "slanted" or "curved" surfaces defined by a disparity gradient. Neurons selective for such disparity-defined surface slant have been found in the caudal part of the lateral bank of the intraparietal sulcus (Shikata et al. 1996; Taira et al. 2000). It has recently been shown that IT neurons also respond to disparity-defined 3-D surface structures. Janssen et al. (2000b) reported that many neurons in the lower bank of the superior temporal sulcus respond to disparity-defined curved or slanted shapes. Uka et al. (1997) have also reported that a population of IT neurons discriminated the depth order of two superimposed surfaces, irrespective of the type (i.e., crossed or uncrossed) of disparities added.
Convergence of signals of disparity and other cues on IT neurons
Sáry et al. (1993) found that shape
selectivity of a population of IT neurons was similar for luminance,
texture, and motion cues. In the present study, response selectivity
for disparity-defined shapes tended to be similar to that for shapes
defined by luminance or texture cues in the cue-convergent group of neurons.
However, not all IT neurons show such cue invariance. In our study, 56% of the neurons that were shape-selective for disparity or luminance cues were activated by only one of these cues. Similar results were observed between disparity and texture cues. This is not surprising if we consider that each visual attribute is, to some extent, processed separately in earlier visual cortical areas. How, then, do signals of disparity and other cues that define shape converge on IT neurons? One possibility is that cue-dependent IT neurons converge onto another neuron in IT. That is, an IT neuron that responds to shapes defined by disparity and another IT neuron that prefers similar shapes defined by other cues converge onto a third IT neuron. Another possibility is that the orientation selectivity for an edge is already cue-invariant in earlier visual cortices, and such cue-invariant information is conveyed to IT. A recent report by von der Heydt et al. (2000) provided evidence that supports the latter possibility. They reported that the orientation selectivity of some V2 neurons was invariant for disparity and luminance cues. If such neurons are abundant in V2 interstripes and V4 also contains such neurons, it is quite possible that the cue-invariant edge information created in these early areas is conveyed to IT. Further work is necessary to reveal the underlying basis for cue invariance in shape representation in the visual system.
ACKNOWLEDGMENTS |
---|
We thank Dr. Hiroshi Tamura for valuable comments on the manuscript, Dr. Yusuke Murayama for computer programming, and M. Watanabe for technical help. H. Tanaka and T. Uka were recipients of Japan Society for the Promotion of Science Research Fellowships for Young Scientists.
This work was supported by grants to I. Fujita from Core Research for Evolutional Science and Technology and Special Coordination Funds for Promoting Science and Technology of the Japan Science and Technology Agency, the Ministry of Education, Science, Sports and Culture (0926822), and Fujitsu Co.
FOOTNOTES |
---|
Address for reprint requests: I. Fujita, Laboratory for Cognitive Neuroscience, Division of Biophysical Engineering, Graduate School of Engineering Science, Osaka University, Machikaneyama 1-3, Toyonaka, Osaka 560-8531, Japan (E-mail: fujita{at}bpe.es.osaka-u.ac.jp).
Received 4 October 1999; accepted in final form 1 November 2000.
REFERENCES |
---|