Disparity Selectivity of Neurons in Monkey Inferior Temporal Cortex

Takanori Uka,1,2 Hiroki Tanaka,3 Kenji Yoshiyama,1 Makoto Kato,1,2 and Ichiro Fujita1,2,3

 1Department of Cognitive Neuroscience, Osaka University Medical School and  2Core Research for Evolutional Science and Technology, Japan Science and Technology Corporation, Osaka 565-0871; and  3Laboratory for Cognitive Neuroscience, Department of Biophysical Engineering, Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan


    ABSTRACT

Uka, Takanori, Hiroki Tanaka, Kenji Yoshiyama, Makoto Kato, and Ichiro Fujita. Disparity Selectivity of Neurons in Monkey Inferior Temporal Cortex. J. Neurophysiol. 84: 120-132, 2000. The inferior temporal cortex (IT) of the monkey, a final stage in the ventral visual pathway, has been known to process information on two-dimensional (2-D) shape, color, and texture. On the other hand, the dorsal visual pathway leading to the posterior parietal cortex has been known to process information on location in space. Likewise, neurons selective for binocular disparity, which convey information on depth, have been found mainly in areas along the dorsal visual pathway. Here, we report that many neurons in the IT are also selective for binocular disparity. We recorded extracellular activity from IT neurons and found that more than half of the neurons changed their response depending on the disparity added. The change was not attributed to monocular responses or eye movements. Most neurons selective for disparity were "near" or "far" cells; they preferred either crossed or uncrossed disparity, and only a small population was tuned to zero disparity. Disparity-selective neurons were also selective for shape. Most preferred the same type of disparity irrespective of the shape presented. Disparity preference was also invariant for the fronto-parallel translation of the stimuli in most of the neurons. Finally, nearby neurons exhibited similar disparity selectivity, suggesting the existence of a functional module for processing of binocular disparity in the IT. From the above and our recent findings, we suggest that the IT integrates shape and binocular disparity information, and plays an important role in the reconstruction of three-dimensional (3-D) surfaces.


    INTRODUCTION

When horizontal binocular disparity is added to a part of a two-dimensional (2-D) shape, we perceive a three-dimensional (3-D) structure consisting of multiple surfaces at different depths and orientations (Wheatstone 1838). Binocular disparity is thus an important cue that aids the visual system in reconstructing 3-D surface structures from 2-D retinal images.

The primary visual cortex (area 17 or V1) of the cat and monkey contains neurons that encode binocular disparity (Barlow et al. 1967; Poggio and Fischer 1977). In the monkey, disparity information is then thought to flow mainly along the magnocellular-dominated stream, and subsequently along the dorsal visual pathway. Disparity-selective neurons are found in a greater number in the thick stripes of V2 than in the other stripes (Hubel and Livingstone 1987; Peterhans and Von der Heydt 1993; Roe and T'so 1995), and many areas in the dorsal visual pathway have been shown to contain disparity-selective neurons (Maunsell and Van Essen 1983; Roy et al. 1992; Sakata et al. 1997). Furthermore, DeAngelis et al. (1998) provided evidence from microstimulation experiments that neuronal activity in area MT indeed contributes to the discrimination of stereoscopic depth.

Neurons in the ventral visual pathway, on the other hand, have been shown to respond to the surface characteristics of objects such as color, 2-D shape, and texture. Neurons in V4 are selective for color (Schein and Desimone 1990; Zeki 1978) and shape (Gallant et al. 1993; Kobatake and Tanaka 1994), and neurons in the inferior temporal cortex (IT) are selective for color (Komatsu et al. 1992), texture (Sáry et al. 1995), and shape (Desimone et al. 1984; Gross et al. 1972; Tanaka et al. 1991). Information on the location of a stimulus in the fronto-parallel plane is largely lost in the IT, a characteristic known as position invariance (Ito et al. 1995; Schwartz et al. 1983).

To perceive 3-D surfaces, however, the brain must know the depth as well as the other attributes of a surface, such as its shape. So how is information on shape and on location in depth integrated? One possibility is that the two types of information are integrated in a single neuron. Neurons selective for both disparity and shape might be found in either the dorsal or the ventral visual pathway.

Neurons in the dorsal visual pathway have long been thought to lack information on shape. Sereno and Maunsell (1998), however, recently reported shape-selective neurons in area LIP (also see Murata et al. 1996; Taira et al. 1990). Their finding suggests that neurons in the dorsal visual pathway may have the potential to integrate shape and disparity information, although they did not examine whether shape and disparity information converge in a single neuron.

A recent study by Janssen et al. (1999) showed that neurons in the IT, a higher stage in the ventral visual pathway, are selective for disparity gradients. Their finding suggests that IT neurons are sensitive to binocular disparity. In the present study, we address the potential role of the IT in stereopsis by quantitatively examining how IT neurons are tuned to binocular disparity. The results indicate that a large population of IT neurons are indeed selective for binocular disparity, as well as shape. Furthermore, we show that their disparity selectivity is invariant for position, and that neurons with similar disparity selectivity are clustered together. Preliminary results have been reported elsewhere (Uka et al. 1997).


    METHODS

Subjects

Two male Japanese monkeys (Macaca fuscata, 7 and 11 kg body wt) were used. The monkeys were subjected to a psychophysical study described in a different paper (Uka et al. 1999) and were confirmed to have stereoscopic vision. In short, the monkeys were trained on a two-alternative forced choice discrimination task, where they discriminated a cross with crossed disparity on its horizontal limb from a cross with uncrossed disparity on its horizontal limb. After extensive training, discrimination tests were conducted to investigate whether the training effect extended to crosses segmented into two crossing bars by occluding contours. Transfer of the training effect confirmed that the monkeys were discriminating depth reconstructed from the disparity cues.

All animal care and experimental procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals (1996) and were approved by the animal experiment committee of Osaka University Medical School. The monkeys were rewarded for correct responses with a drop of water during the experimental sessions. After each session, they were returned to their cages and given an adequate amount of vegetables or fruits. Water supply to the monkeys was restricted at their home cages throughout the experiments. Monkey chow was made available to them ad libitum.

Surgery

First, we attached a head holder to the monkeys' skulls to fix their heads to the chair and implanted a search coil under the conjunctiva of one eye for monitoring eye position (Judge et al. 1980), using standard aseptic surgical procedures and pentobarbital sodium anesthesia (35 mg/kg ip). After the surgery, the monkeys were treated with an antibiotic (piperacillin sodium, 30 mg/kg im), an analgesic (ketoprofen, 0.5 mg/kg im), and a corticosteroid (dexamethasone sodium phosphate, 0.1 mg/kg im) to reduce potential inflammation. They were allowed to recover for at least 3 wk before the first training session for a fixation task. After the monkeys were able to maintain their fixation for a 2-s period of stimulus presentation, we performed a second operation. A recording chamber was attached to one side of the monkeys' skull, and a search coil was implanted into the other eye. After a further 2 wk of recovery, we started electrophysiological recordings in the hemisphere with the recording chamber. After completing the recordings in this hemisphere, we performed a third surgery to attach a recording chamber to the other side of the skull.

Task and visual stimuli

The monkeys were made to sit on a primate chair and face a 15-in. color monitor (screen size: 260 mm × 195 mm) placed 57 cm away. The monkeys were trained for a computer-controlled fixation task (PC486FS: Epson, Suwa, Japan). The positions of both eyes were sampled at the rate of 100 Hz using the search coil technique (Judge et al. 1980) and stored for off-line analysis, although the position of only one of the eyes was monitored on-line. Stimuli were presented using a PC/AT computer (Asus Computer International, San Jose, CA, display resolution: 1,024 × 768 pixels).

A gray spot (0.2 × 0.2°) was presented at the center of the monitor on a black background (luminance 1.0 cd/m²), and the monkeys were required to fixate within a 2.0 × 2.0° "electronic" window within 500 ms. A stimulus appeared at the fixation point after another 500 ms. The monkeys were required to maintain their fixation within the fixation window throughout the 2-s period of visual stimulus presentation to receive a drop of water. Otherwise, the task was aborted the moment they broke their fixation.

For each neuron recorded, effective stimuli were first determined from a stimulus set consisting of bars, crosses, squares, a circle, an oval, and a star (Fig. 1B) at zero disparity. Except for the two short white bars (luminance 15.8 cd/m²), the color of the figures was red (luminance 5.7 cd/m²), because red phosphors are short-lived compared with those of other colors and therefore give better stereo separation between the two eyes. Only the neurons that responded to one or more stimuli in the set at zero disparity were further analyzed in this study. Disparity was then added to the most effective figure and, in some cases, to sub-optimal and ineffective figures, at nine levels: -0.8, -0.4, -0.2, -0.1, 0, 0.1, 0.2, 0.4, and 0.8°. A maximum of five figures was used to determine disparity selectivity for one neuron. Stereoscopic figures were displayed using a liquid crystal stereoscopic modulator (SGS610: Tektronix, Beaverton, Oregon, refresh rate: 70 Hz for each eye). The stimuli at each disparity level were presented 10 times in random order. For monocular stimulation, a blank screen with a fixation spot was presented to the unstimulated eye.
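To give a sense of scale for these disparities, the following minimal sketch converts each disparity level into a horizontal on-screen offset using the display geometry stated above (260 mm screen width, 1,024 pixels, 57 cm viewing distance). The function name, the assumption of uniformly spaced pixels, and the use of a simple tangent relation are illustrative choices, not a description of the stimulus software actually used.

```python
import math

# Display geometry quoted in the text.
SCREEN_WIDTH_MM = 260.0
SCREEN_WIDTH_PX = 1024
VIEWING_DISTANCE_MM = 570.0

# The nine disparity levels tested, in degrees.
DISPARITIES_DEG = [-0.8, -0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4, 0.8]


def disparity_to_pixels(disparity_deg: float) -> float:
    """Approximate horizontal offset between the two eyes' images, in pixels.

    Assumes pixels are evenly spaced across the screen width and that the
    offset subtends `disparity_deg` at the viewing distance (hypothetical
    simplification for illustration).
    """
    offset_mm = VIEWING_DISTANCE_MM * math.tan(math.radians(disparity_deg))
    return offset_mm * SCREEN_WIDTH_PX / SCREEN_WIDTH_MM


if __name__ == "__main__":
    for d in DISPARITIES_DEG:
        print(f"{d:+.1f} deg -> {disparity_to_pixels(d):+.2f} px")
```

Under these assumptions the smallest nonzero disparity (0.1°) corresponds to an offset of roughly 4 pixels, so every tested level is well above the pixel resolution of the display.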



Fig. 1. Recording site and initial stimulus set. A: lateral view of the right cerebral hemisphere of monkey 1. The striped area corresponds to the recording region. The region contained areas TEd and TEOd in the inferior temporal cortex, as well as the ventral bank of the superior temporal sulcus. The 2 dots indicate the location of the pins implanted at the rostral and caudal ends of the recording region. B: after isolating a neuron, we 1st presented these stimuli one by one at zero disparity. If a neuron responded to any one of the stimuli at zero disparity, we examined the disparity selectivity of the neuron by adding disparity to the most effective stimulus and a few other stimuli. Except for the 2 short white bars, the color of the figures was red, although shown in gray in this figure.

Electrophysiological recordings

We recorded the extracellular activity of single neurons from three hemispheres of the two monkeys. A small hole (3 mm diam) was made in the skull within the recording chamber a day before we started recordings. An elgiloy recording electrode (tip diameter 7-15 µm, impedance 2-3 MΩ at 1 kHz) was advanced from the lateral side of the skull with a micromanipulator (MO-95s, Narishige, Tokyo) mounted on the recording chamber. The electrode penetrated the dura mater to reach the lateral surface of the IT. Single neurons were isolated using a conventional amplifier and a window discriminator. The number of action potentials recorded during the task was counted by a computer for off-line analysis. After 1-2 wk of recording, the hole was closed with dental cement, and a new hole was made for another 1-2 wk of recording. This procedure was repeated until all the approachable areas in the IT were thoroughly surveyed.

Histology

After all the experiments were completed, we implanted two pins into the brain at the anterior and posterior edges of the recording chamber. The animals were anesthetized with an overdose of pentobarbital sodium (60 mg/kg ip), the chest cavity was opened, and heparin (200 IU/kg) was injected into the heart. The animals were transcardially perfused with 500 ml of phosphate-buffered saline (PBS, 37°C) and then with a fixative solution consisting of 1,000 ml of ice-cold 4% paraformaldehyde, 0.1% glutaraldehyde in 0.1 M PBS, and 800 ml of ice-cold 4% paraformaldehyde in 0.1 M PBS. The brains were removed, photographed, blocked, postfixed overnight in the last-mentioned fixative, and soaked in 0.1 M PBS containing a graded series of sucrose (10-30%). The location of the implanted pins was verified for reconstruction of the recording area. The primary visual cortex (V1) from these brains was used in a different anatomical study (Wang et al. 1998).

Data analysis

The spontaneous firing rate was calculated from the 500-ms period immediately prior to stimulus presentation, while the monkey was fixating. The response magnitude was calculated from the firing rate during a 2-s period starting 80 ms after the onset of stimulus presentation. Both were calculated for each trial; the spontaneous activity was then averaged over all trials for each neuron, and the response magnitude was averaged over the 10 trials for each stimulus. The standard error of the mean of the response magnitude was calculated for each stimulus. All statistical analyses were performed on the response magnitude.
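The windowing described above can be sketched as follows. This is a minimal illustration, assuming spike times and stimulus onset are given in seconds on a common clock; the function names and the data layout are hypothetical and do not reflect the original analysis software.

```python
import math
from statistics import mean, stdev

BASELINE_S = 0.5          # 500-ms window immediately before stimulus onset
RESPONSE_DELAY_S = 0.08   # response window starts 80 ms after onset
RESPONSE_DUR_S = 2.0      # response window lasts 2 s


def trial_rates(spike_times, stim_onset):
    """Return (spontaneous_rate, response_rate) in spikes/s for one trial."""
    base_start = stim_onset - BASELINE_S
    resp_start = stim_onset + RESPONSE_DELAY_S
    resp_end = resp_start + RESPONSE_DUR_S
    # Count spikes falling inside each half-open window.
    n_base = sum(base_start <= t < stim_onset for t in spike_times)
    n_resp = sum(resp_start <= t < resp_end for t in spike_times)
    return n_base / BASELINE_S, n_resp / RESPONSE_DUR_S


def mean_and_sem(values):
    """Mean and standard error of the mean across trials."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


if __name__ == "__main__":
    # One made-up trial with stimulus onset at t = 1.0 s.
    spikes = [0.6, 0.9, 1.1, 1.5, 2.0, 3.05]
    print(trial_rates(spikes, stim_onset=1.0))
```

In practice `trial_rates` would be applied to each of the 10 trials of a stimulus and the resulting response rates passed to `mean_and_sem`.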


    RESULTS

Recording site

Histological analysis revealed that our recording sites (striped area in Fig. 1A) were around the posterior middle temporal sulcus. The recording region contained areas TEd, TEOd, and the ventral bank of the superior temporal sulcus. The two dots in Fig. 1A show the location of the implanted pins. The data from all these areas were combined, since we found no difference in disparity selectivity among the areas.

Disparity selectivity of IT neurons

We recorded from 225 neurons (n = 89 in monkey 1, n = 136 in monkey 2), which responded to at least one stimulus in the initial stimulus set (Fig. 1B) at zero disparity. Of these, 142 (63.1%, n = 54 in monkey 1, n = 88 in monkey 2) changed their response depending on the binocular horizontal disparity added to at least one of the stimuli presented (ANOVA, P < 0.05). We will refer to these neurons as "disparity-selective neurons."
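The selectivity criterion is an ANOVA across disparity levels. As a minimal sketch, assuming a one-way design with one group of trial responses per disparity level, the F statistic can be computed as below. The function name and data layout are hypothetical; converting F to a P value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    `groups` is a list of lists, one list of per-trial responses for each
    disparity level. Raises ZeroDivisionError if there is no within-group
    variance at all.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of trials around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Three made-up groups of three trials each.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

With nine disparity levels and 10 trials per level, such a design would have 8 and 81 degrees of freedom.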

Figure 2 shows an example of the responses of a disparity-selective neuron. This neuron responded to a red cross at zero disparity (cf. Fig. 8A). In addition to the red cross, the neuron responded to a red vertically oriented bar at nonzero disparity. Figure 2 shows the responses of the neuron to the red vertical bar. ANOVA revealed a significant modulation of the response amplitude of the neuron by the addition of binocular disparity (P < 0.0001). The neuron responded more strongly when a crossed disparity was added to the bar than when an uncrossed disparity was added. The disparity tuning curve in Fig. 2B shows that the neuron is a "near" cell described by Poggio and Fischer (1977) or a "tuned near" cell described by Poggio et al. (1988).



Fig. 2. An example of the responses of a disparity-selective inferior temporal cortex (IT) neuron. A: responses of an IT neuron to a red vertical bar with disparity. The 1st vertical line in the rastergram indicates the onset of the stimuli, and the 2nd line indicates the offset of the stimuli. A dot in the rastergram indicates the occurrence of a spike in 30 ms. The bin of the poststimulus time histogram (PSTH) corresponds to 150 ms. Filled squares, top left, indicate that the neuron had a significant excitatory response compared with the spontaneous activity (nonparametric Kolmogorov-Smirnov test, P < 0.01); open squares indicate significant excitatory response of P < 0.05, and slashes indicate no significant response. The neuron responded more strongly to crossed disparity cues than to uncrossed disparity cues. B: disparity tuning curve (average ± SE) of the neuron in A. The neuron had a "near" or "tuned near" type of response. Tuning curves of the monocular responses are also shown. Monocular responses cannot explain the binocular responses. The dotted line indicates the spontaneous firing rate.

To determine whether this response modulation was caused simply by the translation of the stimulus figure in either eye, we examined the responses of the neuron to the disparity figures presented to each eye separately. The monocular tuning curves in Fig. 2B show that the response of the neuron was significantly modulated by the translation of the stimulus figure in the right eye (ANOVA, P < 0.0001 and P > 0.05 for the right and left eye, respectively). However, the modulation was small and did not account for the binocular effect, which rules out the possibility that the neuron was simply responding to the translation of the stimulus figure. We will further analyze the responses to monocular figures later.

Figure 3 shows an analysis of the eye movements during the presentation of the disparity stimuli to the neuron in Fig. 2. The traces of the vergence angle in Fig. 3A show that the monkey did not respond with vergence movements to disparity stimuli. The monkey's average vergence angle deviated within only 0.1° with the introduction of disparity stimuli, as can be seen in Fig. 3B. There was also no difference in the average vergence angle for various disparity stimuli (ANOVA, P > 0.3). This was also true for other neurons. In most cases (94%) in which eye positions were compared between zero-disparity trials and trials with each disparity level, the average vergence angle deviated within only 0.1° (within 0.05° in 71% of the cases). This indicates that for most trials with disparity cues, the monkeys looked at the same plane in depth as they did for zero-disparity trials. From these control studies, we conclude that the response modulation of the neuron was due neither to monocular responses nor to eye movements, but represented genuine selectivity for binocular disparity.
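The vergence control can be illustrated with a short sketch. It assumes that vergence is taken as the difference between the horizontal positions of the two eyes and that each trial provides one position trace per eye (200 samples for a 2-s stimulus at 100 Hz). The sign convention, function names, and data layout are assumptions made for illustration only.

```python
from statistics import mean

MAX_SHIFT_DEG = 0.1  # deviation criterion quoted in the text


def vergence_trace(left_h_deg, right_h_deg):
    """Sample-by-sample vergence angle in degrees.

    Computed as left minus right horizontal eye position; this sign
    convention is an assumption of the sketch.
    """
    return [l - r for l, r in zip(left_h_deg, right_h_deg)]


def vergence_shift(test_trials, zero_trials):
    """Absolute difference in mean vergence between a disparity condition
    and the zero-disparity condition.

    Each argument is a list of per-trial vergence traces.
    """
    test_mean = mean(mean(trace) for trace in test_trials)
    zero_mean = mean(mean(trace) for trace in zero_trials)
    return abs(test_mean - zero_mean)


if __name__ == "__main__":
    zero = [vergence_trace([1.0, 1.0], [0.0, 0.0])]
    test = [vergence_trace([1.05, 1.05], [0.0, 0.0])]
    shift = vergence_shift(test, zero)
    print(shift, shift <= MAX_SHIFT_DEG)
```

A condition would count as free of systematic vergence change when the returned shift stays within `MAX_SHIFT_DEG`.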



Fig. 3. Vergence eye movement during presentation of disparity-added stimulus. A: traces of the vergence angle are shown for 3 disparities added to the stimulus during recording of the neuron in Fig. 2. Eye traces are shown from the onset of the fixation point (Fix-on), and during the 2-s period of stimulus presentation (Stim-on to Stim-off). We chose the stimuli with 0.4° because the neuron responded strongly to the stimulus with -0.4° of disparity. B: the vergence angle (average ± SD) during the 2-s period of stimulus presentation is plotted for each disparity level. The monkey showed no systematic eye movements with presentation of disparity stimuli. Therefore the binocular response in Fig. 2 was not attributed to eye movements. The average standard deviation of the vergence angle within and across trials was 0.05 and 0.11°, respectively.

Types of disparity-selective neurons

Poggio and Fischer (1977) defined four types of disparity tuning. In Fig. 4, we show examples of disparity-selective IT neurons for each type. Figure 4A shows a typical "far" neuron that responded more strongly to uncrossed disparity cues than to crossed disparity cues. This neuron is complementary to the "near" neuron shown in Fig. 2. Figure 4B shows a typical "tuned excitatory" neuron that responded strongly to zero disparity: the tuning curve was sharp and symmetric around the peak. Figure 4C shows a typical "tuned inhibitory" neuron, inhibited at zero disparity, with a sharp symmetric tuning around the peak. Some neurons, however, did not fall into any of these four types. The neuron in Fig. 4D, for example, had a tuning curve that could be classified either as a near neuron or a tuned inhibitory neuron. We categorized these intermediate types of neurons as "unclassifiable." The classification procedure was purely subjective.



Fig. 4. Representative examples of various types of disparity-selective responses obtained in the IT. A: an example of a "far" neuron. B: an example of a "tuned excitatory" neuron. C: an example of a "tuned inhibitory" neuron. D: an example of an "unclassifiable" neuron. Disparity tuning curves (average ± SE) and their corresponding monocular tuning curves (average ± SE) are shown. Dotted lines indicate the spontaneous firing rate.

Most of the neurons in which we presented more than one figure exhibited the same type of disparity tuning for each figure and were classified as such. A small population of neurons that showed different types of disparity tuning for different shapes were classified as "mixed."

Figure 5 shows the distribution of each type of neuron. Most of the disparity-selective neurons were classified as near cells or far cells (40% in monkey 1, 39% in monkey 2), and there were only a few tuned excitatory and tuned inhibitory neurons (3% in monkey 1, 7% in monkey 2). The findings that more than half of the neurons in the IT were disparity selective, and that most of them were near or far cells, were consistent across the two monkeys. The distribution was similar to that observed in the MT (Maunsell and Van Essen 1983) and MST (Roy et al. 1992). In the present analysis, we have included "tuned near" and "tuned far" cells described by Poggio et al. (1988) in the near and far cells, respectively. We might have underestimated the percentage of tuned near or tuned far, as well as that of tuned inhibitory neurons, since we did not analyze neurons that did not respond at zero disparity. Although most of the disparity-selective neurons were classified into the discrete types of neurons described by Poggio and Fischer (1977), a substantial number (16% in monkey 1, 13% in monkey 2) were unclassifiable. These data lead one to question the existence of discrete types of disparity tuning in the IT.



Fig. 5. Distribution of each type of disparity-selective neurons in 2 monkeys. The neurons were classified into 4 groups according to the classification of Poggio and Fischer (1977). "Mixed" corresponds to a small population of neurons that showed different types of disparity selectivity when presented with different shapes. More than half of the neurons were disparity-selective in both monkeys. Most disparity-selective neurons were near or far neurons, and we observed only a few tuned excitatory and tuned inhibitory neurons.

Tuning of disparity-selective neurons

We next determined the disparity tuning index to show the degree of modulation by disparity of the response of these neurons. The disparity tuning index is calculated using the following formula
$$\text{Disparity tuning index} = \frac{\text{maximal response to disparity} - \text{minimal response to disparity}}{\text{maximal response to disparity} + \text{minimal response to disparity}}$$
We calculated this index for all of the 225 neurons recorded. If we tested disparity selectivity with more than one figure, we calculated the index using responses to the most effective figure. Figure 6 shows the distribution of the disparity tuning index. An index of 1 indicates a large modulation of the response of the neuron by disparity, while an index of 0 indicates that the response of the neuron is not modulated by disparity. An index of 0.33 indicates that the maximal response to disparity was twice as large as the minimal response to disparity. The results indicate that the modulation of response by disparity was large (median = 0.36) for many neurons.
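As a worked illustration of the index, the sketch below computes it from a list of mean responses, one per disparity level. It assumes the responses are non-negative mean firing rates, which keeps the index between 0 and 1; the function name is hypothetical.

```python
def disparity_tuning_index(responses):
    """(max - min) / (max + min) over the mean responses to each disparity.

    Assumes non-negative firing rates; raises ValueError when the maximal
    and minimal responses sum to zero, where the index is undefined.
    """
    r_max = max(responses)
    r_min = min(responses)
    if r_max + r_min == 0:
        raise ValueError("index undefined when max + min is zero")
    return (r_max - r_min) / (r_max + r_min)


if __name__ == "__main__":
    # Maximal response twice the minimal response.
    print(round(disparity_tuning_index([10.0, 15.0, 20.0]), 2))
```

A maximal response of 20 spikes/s against a minimal response of 10 spikes/s gives (20 - 10)/(20 + 10), or about 0.33, matching the twofold case described above.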



Fig. 6. Distribution of disparity tuning index. An index of 1 indicates a large modulation of the response of the neuron by disparity, while an index of 0 indicates that the response of the neuron is not modulated by disparity. The distribution of the index indicates that the responses of many neurons were modulated by disparity.

Monocular responses and binocular interaction

Most neurons in the IT were binocular in that they responded equally to monocular presentation of a figure to either eye, whether or not they were disparity selective. The ocular dominance histogram based on the responses of 59 neurons indicates that most neurons do not have a preference for the ipsilateral or contralateral eye (Fig. 7, class 4). A large population of neurons (36%, 21/59, class I) were not responsive to monocular figures, although they were responsive to binocular figures, while some neurons responded more strongly to monocular figures than to binocular figures. However, we do not know whether any neuron responded only to monocular and not to binocular presentation of a figure, because we did not analyze neurons that were not activated binocularly.



Fig. 7. Ocular dominance histogram of IT neurons. The histogram was constructed in accordance with Hubel and Wiesel (1962). Neurons were divided into 7 classes depending on their responses to stimuli presented to the ipsilateral and contralateral eyes. Class 4 corresponds to neurons that responded to stimuli presented to each eye nearly equally. Classes 1 and 7 correspond to neurons that responded only to stimuli presented to either the contralateral or ipsilateral eye, respectively. Other neurons were in between these classes. Class I (rightmost) corresponds to neurons that responded only to binocular presentation, but not to monocular presentation. Most neurons responded equally to stimulus presentation in each eye (class 4) and a sizable population was insensitive to monocular presentation (class I).

The presentation of monocular figures of the disparity stimuli to either the right or left eye, which results in a translation of the figure across the fronto-parallel plane, did modulate the responses of the neurons. A large population of neurons (69%, 41/59) modulated their responses for the right or left eye depending on the location of the stimuli as shown in Fig. 2B (ANOVA, P < 0.05). Although we do not know the exact cause of the modulation by the translation of the monocular figure across the fronto-parallel plane, one explanation could be that the neurons were responding to the combined shape of the fixation point and the stimulus. Because the fixation point was visible during the presentation of the figures, and because the figures were presented on the fixation point, the translation of monocular figures resulted in a change of the compound shape of the visual stimuli, which might have modulated the responses of the neurons. In any case, the modulation by translation in one eye did not explain the selectivity for binocular disparity of the cell shown in Fig. 2B, as described above. We now examine whether this holds true for other neurons.

We conducted a two-way ANOVA between each set of monocular responses (i.e., responses to the right or left eye) and the binocular responses to the most effective stimulus for each of the 39 disparity-selective neurons from which we were able to obtain monocular responses. The main factors were ocularity (binocular and right eye, or binocular and left eye) and translation across the fronto-parallel plane. We defined a neuron as having binocular selectivity (not explained by monocular modulation) if the two-way ANOVA showed significance for either ocularity or the interaction between ocularity and translation (P < 0.05). Ninety percent (35/39) of the neurons showed a significant effect for either ocularity or interaction, indicating that for the majority of neurons, the binocular responses did not correspond to responses to either the right or left eye.

We also tested whether a linear summation or an average of the monocular responses could explain the binocular responses. We conducted a two-way ANOVA between the summation or the average of the net monocular responses and the net binocular responses to the most effective stimulus for the 39 disparity-selective neurons. We defined a neuron as having binocular selectivity explainable by a linear summation or an average of the monocular responses if it showed no significance for both ocularity and the interaction between ocularity and translation (P > 0.05). Only 2 of the 39 neurons did not show significance for both ocularity and the interaction for the linear summation, and only one did not show significance for the average. The results show that for most neurons, a linear summation or an average of the monocular responses cannot explain the binocular responses.
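The two monocular predictions compared against the binocular responses can be sketched as follows. The sketch assumes that a "net" response is the mean response minus the spontaneous firing rate, and that left-eye and right-eye responses are listed in the same order of stimulus position; both points, and all names, are assumptions for illustration.

```python
def net_responses(responses, spontaneous_rate):
    """Subtract the spontaneous firing rate from each mean response."""
    return [r - spontaneous_rate for r in responses]


def monocular_predictions(left, right, spontaneous_rate):
    """Predicted binocular net responses from the two monocular tuning curves.

    Returns (summed, averaged): the linear sum and the average of the net
    left-eye and right-eye responses at each stimulus position.
    """
    net_left = net_responses(left, spontaneous_rate)
    net_right = net_responses(right, spontaneous_rate)
    summed = [l + r for l, r in zip(net_left, net_right)]
    averaged = [s / 2 for s in summed]
    return summed, averaged


if __name__ == "__main__":
    # Made-up firing rates (spikes/s) at two stimulus positions.
    summed, averaged = monocular_predictions(
        left=[12.0, 14.0], right=[11.0, 10.0], spontaneous_rate=10.0
    )
    print(summed, averaged)
```

Each prediction would then be compared with the net binocular tuning curve; a neuron whose binocular curve departs from both predictions is not explained by simple combination of its monocular responses.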

Shape selectivity of disparity-selective neurons

Next, we examined whether disparity-selective IT neurons were also selective for shape. If disparity-selective neurons were also selective for shape, the integration of shape and disparity information in a single neuron would enable them to encode the depth of a particular shape. Figure 8 shows two examples of neurons selective for both disparity and shape. Figure 8A shows the responses of the neuron in Fig. 2 to horizontal and vertical bars, a cross, a star, and a circle. The response magnitude changed depending on the shape of the stimulus. Two-way ANOVA revealed a main effect for shape and disparity (P < 0.0001), and also the interaction between shape and disparity (P < 0.0001).



Fig. 8. Examples of shape- and disparity-selective neurons. Disparity tuning curves (average ± SE) of 2 neurons to various shapes are shown. Dotted lines indicate the spontaneous firing rate. A: responses of the neuron in Fig. 2 showing a near type response irrespective of the shape presented. B: an example of another shape-selective disparity-selective neuron showing a far type response irrespective of the shape presented. H_bar, V_bar, and Square_rot correspond to a horizontal bar, a vertical bar, and a square rotated by 45°, respectively.

For 36 disparity-selective neurons, we examined the disparity selectivity for more than three shapes. Two-way ANOVA revealed that all of the 36 neurons were selective for shape in that they exhibited a main effect for shape (P < 0.05). A majority of neurons (78%, 28/36) also exhibited a significant interaction between shape and disparity (P < 0.05).

The interaction between shape and disparity information observed in the two-way ANOVA, however, seems to be a side effect of shape selectivity. The neuron in Fig. 8A exhibited a significant interaction (P < 0.0001), while it exhibited a near type response for most of the figures presented, and none of the figures produced a far type response. The same observation holds for the neuron in Fig. 8B. This neuron exhibited a far type response for all the figures and never exhibited a near type response (interaction, P < 0.0001). The interaction seems to be observed because some figures elicited no or weak response, or because responses to some shapes were more tuned to a particular disparity than those to others. Overall, the neurons seemed to have the same type of disparity preference (e.g., near preference or far preference) when presented with different shapes. Indeed, only 5 of the 36 neurons changed their type of disparity tuning (near, far, tuned excitatory, or tuned inhibitory) when presented with different shapes. Of the five neurons that did change their disparity tuning, three changed from near or far to tuned excitatory or tuned inhibitory, while only two changed from near to far. Thus the peak of the disparity tuning curve did not change much.

To show quantitatively that neurons did not change their disparity preference, we focused on 28 disparity-selective neurons that had a near or far preference and whose disparity selectivity we examined with more than 3 shapes. First, we calculated the average response to crossed and uncrossed disparity cues for the shape that gave the maximal response. We classified a neuron as having a near or far preference if the average responses to crossed disparity cues were significantly larger or smaller, respectively, than those to uncrossed disparity cues (t-test, 2-tailed, P < 0.05). Then we calculated the average responses to crossed and uncrossed disparity cues for the other shapes tested.

In Fig. 9, we plotted the average responses to crossed disparity cues after subtracting those to uncrossed disparity cues for the shape that gave the maximal response on the abscissa, and the average responses to crossed disparity cues after subtracting those to uncrossed disparity cues for other sub-optimal shapes on the ordinate. If the type of disparity tuning does not change, there should be a positive correlation between the abscissa and the ordinate. Open circles correspond to shapes for which the neuron exhibited a significant near or far preference, whereas closed circles correspond to those without significant preference. A majority of open circles (95%, 35/37) were found in the top right quadrant or the bottom left quadrant, indicating that they had the same type of disparity preference as the shape that gave the maximal response. Furthermore, there was a strong positive correlation between the abscissa and the ordinate (r = 0.81, P < 0.0001, n = 71), which supported the idea that most neurons have the same type of disparity preference even if they are presented with different shapes. Therefore although the two-way ANOVA revealed an interaction between shape and disparity information, a majority of neurons did not change their disparity preference depending on the shape presented, but rather only changed their response magnitude and/or tuning width.



Fig. 9. Similarity of disparity preference among various shapes. The average response to crossed disparity cues after subtracting those to uncrossed disparity cues for the shape that elicited the maximal response [Max shape (C - U)] is plotted on the abscissa, and that for other shapes [Other shapes (C - U)] is plotted on the ordinate. C is the average of the discharge rate for 4 crossed disparities (-0.1, -0.2, -0.4, and -0.8°), and U is the average of the discharge rates for 4 uncrossed disparities (0.1, 0.2, 0.4, and 0.8°). Data from 28 neurons are plotted. Open circles indicate a significant preference for crossed or uncrossed disparity added to the shape (t-test, 2-tailed, P < 0.05). Most of the dots are distributed in the top right or bottom left quadrants, indicating that most neurons respond strongly to the same type of disparity, irrespective of the shape presented. The correlation r between the abscissa and the ordinate was 0.81.
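The C - U measure defined in the legend above can be illustrated with a minimal sketch. The function name `c_minus_u`, the dictionary layout, and the firing rates are hypothetical and chosen only for illustration; the sketch computes the difference of the two averages and does not include the t-test used to assess significance.

```python
from statistics import mean

# Disparities in degrees, following the sign convention of the Fig. 9 legend:
# negative values are crossed (near), positive values are uncrossed (far).
CROSSED = (-0.1, -0.2, -0.4, -0.8)
UNCROSSED = (0.1, 0.2, 0.4, 0.8)


def c_minus_u(tuning):
    """Return C - U for a tuning curve given as {disparity_deg: mean_rate}.

    C is the mean discharge rate over the 4 crossed disparities and
    U is the mean discharge rate over the 4 uncrossed disparities.
    """
    c = mean(tuning[d] for d in CROSSED)
    u = mean(tuning[d] for d in UNCROSSED)
    return c - u


# Hypothetical mean firing rates (spikes/s) for one neuron and one shape.
example_tuning = {
    -0.8: 20.0, -0.4: 30.0, -0.2: 24.0, -0.1: 18.0,
    0.0: 12.0,
    0.1: 8.0, 0.2: 6.0, 0.4: 5.0, 0.8: 9.0,
}

index = c_minus_u(example_tuning)
# A positive index means stronger responses to crossed (near) disparities.
print(index)
```

Under this convention, a neuron with a near preference yields a positive value and a neuron with a far preference yields a negative value, so each point in Fig. 9 pairs the value for the maximal shape with the value for one other shape.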

Position invariance of disparity selectivity

Next, we examined whether the responses of the disparity-selective neurons were invariant for position. If a neuron was selective for disparity on a stimulus shape presented at the center of the monitor, we presented the same disparity-added shape outside the center of the monitor in random sequence. Figure 10A shows an example of a neuron responding to disparity-added shapes presented at different locations. The stimuli were presented 2° from the center to the ipsilateral, contralateral, upper, or lower visual fields. This neuron had a far cell type response, regardless of whether the stimulus was presented in the center, ipsilateral, contralateral, or lower visual field, although the response was weaker and modulated to the largest extent by disparity (i.e., the disparity tuning index was the largest) at the center. However, the different degrees of modulation between different stimulus locations might be attributable to the fact that we did not present the stimuli at the center in the same block of trials as those in which the stimuli were presented outside the center.



Fig. 10. Examples of neurons showing position invariance of their disparity selectivity. A: disparity selectivity was examined using the same shape at 5 locations: the center, and 2° into the ipsilateral, contralateral, upper, and lower visual fields. Disparity tuning curves (average ± SE) for each location are shown. Dotted lines indicate the spontaneous firing rate. The neuron exhibited a far type response irrespective of whether the figure was presented at the center, ipsilateral, contralateral, or lower visual field. B: disparity selectivity was examined at 3 locations: the center, and 2° into the ipsilateral and contralateral visual fields. The neuron exhibited a near type response irrespective of whether the figure was presented at the center, ipsilateral, or contralateral visual field.

We examined a total of nine disparity-selective neurons at five positions as above, and five other disparity-selective neurons at three positions (the center, and 2° into the ipsilateral and contralateral visual fields; an example is shown in Fig. 10B). Twelve of the 14 neurons exhibited a near or far response to the stimulus presented at the center (t-test, 2-tailed, P < 0.05). For these neurons, we plotted the average responses to crossed disparity cues after subtracting those to uncrossed disparity cues for the stimulus presented at the center on the abscissa, and the average responses to crossed disparity cues after subtracting those to uncrossed disparity cues for other positions on the ordinate (Fig. 11). If the type of disparity tuning does not change, there should be a positive correlation between the abscissa and the ordinate. Open circles correspond to locations where the neuron produced a significant near or far preference, whereas closed circles correspond to those without significant preference. A majority of open circles (77%, 23/30) were in the top right quadrant or the bottom left quadrant, indicating that they had the same type of disparity preference as the stimulus presented at the center. Furthermore, there was a strong positive correlation between the abscissa and the ordinate (r = 0.65, P < 0.0001, n = 40), which supported the idea that for most neurons, the type of disparity preferred is position invariant.
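The quadrant criterion used in this analysis amounts to a sign comparison: a point falls in the top right or bottom left quadrant when the crossed-minus-uncrossed difference has the same sign at the reference condition and at the comparison condition. The following is a minimal sketch of that count; the function name and the data values are hypothetical.

```python
def same_quadrant_fraction(points):
    """Fraction of (reference, other) difference pairs that share a sign.

    Each pair holds the crossed-minus-uncrossed response difference for the
    reference condition (abscissa) and for another condition (ordinate).
    A pair with a positive product lies in the top right or bottom left
    quadrant of the scatter plot.
    """
    agree = sum(1 for ref, other in points if ref * other > 0)
    return agree / len(points)


# Hypothetical (center, other position) differences in spikes/s.
points = [(12.0, 8.5), (12.0, 3.1), (-9.0, -4.2), (-9.0, 1.0), (5.0, 6.0)]

fraction = same_quadrant_fraction(points)
print(fraction)
```

In this toy data set, four of the five points agree in sign, so a single point lies off the two concordant quadrants.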



Fig. 11. Similarity of disparity preference among different positions. The average response to crossed disparity cues after subtracting those to uncrossed disparity cues for the shape at the center [Center (C - U)] is plotted on the abscissa, and that at other positions [Other positions (C - U)] is plotted on the ordinate. C is the average of the discharge rates for 4 crossed disparities (-0.1, -0.2, -0.4, and -0.8°), and U is the average of the discharge rates for 4 uncrossed disparities (0.1, 0.2, 0.4, and 0.8°). Data from 12 neurons are plotted. Open circles indicate a significant preference for crossed or uncrossed disparity added to the shape (t-test, 2-tailed, P < 0.05). Most of the dots are distributed in the top right and bottom left quadrants, indicating that most neurons responded strongly to the same type of disparity irrespective of the position at which the stimulus was presented. The correlation r between the abscissa and the ordinate was 0.65.

In the present study, we did not map the receptive field of the neurons. Thus we do not know how much of the receptive field was covered by the translation of the figures. The receptive fields of IT neurons are known to be large (Boussaoud et al. 1991; Kobatake and Tanaka 1994). According to Kobatake and Tanaka (1994), the average sizes of the receptive fields were 16.5 and 5.4° for areas TEd and TEOd, respectively. The receptive fields of most of the neurons covered the center of vision. Therefore the figures were presumably presented inside the receptive field of the neurons analyzed in the present study. For neurons in TEOd, the translation of 2° presumably covered most of the receptive field. For neurons in area TEd with large receptive fields, however, the translation of 2° presumably covered only a small part of the receptive field. Nonetheless, to the extent that we were able to examine, the type of disparity preference was position invariant for most IT neurons.

Local clustering of disparity-selective neurons

In some cases, we were able to record from more than one unit with a single electrode: either from two single units or from one single unit and a background multiple unit. We next report the results of such simultaneous recordings.

For multiple unit recordings, we used two conventional window discriminators. Recordings from more than one unit were made only when the isolation of one single unit (or 2 single units) was reliable, to prevent contamination of spikes. The upper level of one window discriminator was set substantially below the lower level of the other window discriminator to exclude the spikes of the single unit from the other single unit or the multiple unit. The spontaneous firing rate of the multiple unit was on average 7.3 times higher than that of the single unit. Therefore we estimate that our multiple unit recordings reflected the activity of approximately seven single neurons.

Figure 12A shows the disparity tuning curves from two single neurons recorded simultaneously from the same electrode. Both neurons were near tuned, had a large peak at -0.4° and a small peak at 0.4°, and exhibited a strong inhibition of responses below their spontaneous firing rates. To analyze the similarity of the responses of the two neurons, we calculated Pearson's correlation coefficient of the average response magnitude to disparity stimuli between the two neurons. Values near 1 indicate that the response selectivity for disparity was similar for the two neurons. We will refer to this value as the response correlation. The response correlation r for this pair of neurons was 0.95 (P < 0.0001).
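The response correlation described above is a Pearson correlation computed across disparities on the mean responses of two units. A minimal sketch follows, using only the Python standard library; the function name `pearson_r` and the two tuning curves are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical mean responses (spikes/s) of two simultaneously recorded
# units, listed in the same order of tested disparities for both units.
unit_a = [5.0, 22.0, 14.0, 9.0, 6.0, 4.0, 5.0, 10.0, 3.0]
unit_b = [8.0, 30.0, 20.0, 12.0, 9.0, 7.0, 6.0, 13.0, 5.0]

r = pearson_r(unit_a, unit_b)
print(round(r, 2))
```

Because the coefficient is insensitive to overall gain and offset, it compares the shape of the two tuning curves rather than their absolute firing rates, which is useful when a single unit is paired with a multiple unit that fires at a much higher rate.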



Fig. 12. Two examples of units recorded simultaneously from a single recording electrode. Disparity tuning curves (average ± SE) for each pair of units are shown. Dotted lines indicate the spontaneous firing rate. A: an example of 2 single units recorded simultaneously from the same site. Both neurons had a large peak at -0.4°, a small peak at 0.4°, and a strong inhibitory response below the spontaneous firing rate. The response correlation r for this pair of units was 0.95. B: an example of a single unit and a background multiple unit recorded simultaneously from the same site. Both units were near tuned, and peaked at -0.4°. The response correlation r for this pair of units was 0.85.

Figure 12B shows the disparity tuning curves from one single neuron and the background multiple unit recorded simultaneously from the same electrode. The single neuron is the same as the one shown in Fig. 2. Both of the units were near tuned, and peaked at -0.4°. The response correlation r for this pair of units was 0.85 (P < 0.005). The spontaneous firing rate of the multiple unit was 14 times larger than that of the single unit.

We recorded simultaneously from 30 pairs of units (13 pairs of 2 single neurons and 17 pairs of a single unit and a multiple unit) in which at least one of the neurons was selective for disparity. Figure 13A shows the distribution of the response correlation for these pairs of units. Only the responses to the most effective shape were analyzed for each unit. The overall distribution was shifted toward positive values (median = 0.30). The distribution was also similar for single unit-single unit pairs and single unit-multiple unit pairs. We also calculated the response correlation for pairs of units that were not recorded simultaneously; i.e., pairs of units from different locations. We calculated the response correlations for all possible pairs among the 60 units from 30 recording sites, and excluded those calculated from the same recording site (Fig. 13B). The distribution of response correlation from different recording sites peaked near 0 (median = -0.01), showing no overall correlation of disparity selectivity between pairs of neurons recorded apart. The distributions shown in Fig. 13, A and B, are significantly different (Mann-Whitney U test, P < 0.0005).
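The construction of the across-site comparison set can be sketched as follows. The sketch assumes that each of the 30 sites contributes exactly two units and that each unordered pair is counted once; the function name and the site labels are hypothetical, and the resulting pair count is an arithmetic consequence of these assumptions rather than a number reported in the text.

```python
from itertools import combinations


def split_pairs(site_of_unit):
    """Split all unordered unit pairs into same-site and different-site pairs.

    site_of_unit[i] is the recording-site label of unit i.
    """
    within, across = [], []
    for a, b in combinations(range(len(site_of_unit)), 2):
        if site_of_unit[a] == site_of_unit[b]:
            within.append((a, b))
        else:
            across.append((a, b))
    return within, across


# Assumed layout: 30 recording sites with 2 units each, 60 units in total.
site_of_unit = [site for site in range(30) for _ in range(2)]

within, across = split_pairs(site_of_unit)
print(len(within), len(across))
```

The response correlation would then be computed once for every pair in each list, giving the two distributions that are compared with the Mann-Whitney U test.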



Fig. 13. Distribution of response correlation. Response correlation was calculated as the Pearson's correlation coefficient of the average response magnitude to disparity stimuli between pairs of units. A: distribution of response correlation for pairs of units recorded from the same site. The distribution is shifted toward positive values. Single unit-single unit pairs and single unit-multiple unit pairs are indicated by different symbols. B: distribution of response correlation for pairs of units recorded from different sites. The distribution peaks near 0, indicating no overall correlation of disparity selectivity between units recorded from different sites. This indicates that nearby neurons have similar disparity selectivity.

The results indicate that nearby neurons have similar disparity selectivity; in other words, neurons with similar disparity selectivity are clustered together. This suggests that a functional module of disparity-selective neurons exists in the IT.


    DISCUSSION

The present study shows that IT neurons process binocular disparity in addition to color (Komatsu et al. 1992), texture (Sáry et al. 1995), and 2-D shape (Desimone et al. 1984; Gross et al. 1972; Tanaka et al. 1991). More than half of the neurons in the IT were disparity selective as well as shape selective, and their disparity selectivity was invariant for position. This suggests that similar disparity and shape information from different retinotopic locations converges onto a single IT neuron. Furthermore, we showed that neurons with similar disparity selectivity are clustered together. Together with our recent findings on the role of the interaction between shape and disparity information (Uka et al. 1997), the present results suggest that the IT plays an important role in the reconstruction of 3-D surfaces.

Disparity selectivity in IT and other visual areas

Poggio and Fischer (1977) were the first to describe disparity-selective neurons in areas V1, V2, and V3 in awake monkeys. Subsequent studies showed that there exists a larger population of disparity-selective neurons in the thick stripes than in the thin and pale stripes of V2 (Hubel and Livingstone 1987; Peterhans and Von der Heydt 1993; Roe and Ts'o 1995). Along with the findings that many neurons in the dorsal visual pathway leading to the posterior parietal cortex are disparity selective (Maunsell and Van Essen 1983; Roy et al. 1992; Sakata et al. 1997), the dorsal visual pathway has been thought to be important for stereopsis (e.g., Sakata et al. 1997). Results of positron-emission tomography (PET) studies in human subjects (Gulyas and Roland 1994) are also consistent with this idea. Activation was observed mainly in the occipital and parietal regions, but not in the temporal regions, during visual discrimination of binocular disparity.

The present results show that the dorsal visual pathway is not the only pathway that processes disparity information. This is consistent with a recent finding that IT neurons respond to disparity gradients (Janssen et al. 1999). Rather than having an independent pathway for disparity processing, the visual system seems to process disparity information in parallel in several visual cortical areas, each possibly playing different roles in stereopsis. Further studies are necessary to distinguish among the different roles of different visual areas in stereopsis.

DeAngelis and Newsome (1999) recently reported that disparity-selective neurons in MT are clustered to form a columnar organization. The observation that disparity-selective IT neurons are locally clustered is consistent with this finding, although we do not know whether disparity-selective IT neurons are clustered into columns. Clustering of disparity-selective neurons may be a common feature across different visual areas. It should be examined whether disparity-selective neurons in other areas are also clustered. It would also be interesting to investigate the relationship between disparity clusters and columns for visual features of objects (Fujita et al. 1992) in the IT.

How do IT neurons become disparity selective?

The binocular responses of disparity-selective IT neurons could not be explained by either a linear summation or an average of the monocular responses. This is unlike simple cells in V1 that encode binocular disparity with a characteristic receptive field structure (Anzai et al. 1997; DeAngelis et al. 1991). The responses of disparity-selective IT neurons are more similar to those of complex cells, in which neurons are sensitive to small binocular disparity change, although they are not due to the translation of stimuli in each eye (Ohzawa et al. 1990). Just as disparity-selective complex cells are thought to receive information from disparity-selective simple cells (Ohzawa et al. 1997), it is likely that IT neurons do not detect binocular disparity per se, but receive information on disparity from neurons in earlier visual areas. The fact that major inputs into the IT do not arise from neurons that carry information on monocular images, such as those in the LGN or V1, is also consistent with this idea (but see Yukie and Iwai 1981).

Where do disparity-selective IT neurons receive their inputs from? As a larger population of disparity-selective neurons exists in the thick stripes than in the thin and pale stripes in V2 (Hubel and Livingstone 1987; Peterhans and Von der Heydt 1993; Roe and Ts'o 1995), a majority of disparity information fed to the dorsal visual pathway is thought to arise from disparity-selective neurons in the thick stripes. On the other hand, projections to the IT are known to arise from the thin and pale stripes of V2, via V4 (Felleman et al. 1997). Thus a major candidate pathway for conveying disparity information into the IT would be from the disparity-selective neurons in the thin stripes, although these stripes contain a smaller population of disparity-selective neurons than the thick stripes (Peterhans and Von der Heydt 1993), through V4. Another candidate pathway would be from the disparity-selective neurons in the thick stripes, into V4, via some other cortical areas such as V3, V3A, or MT. In either case, disparity-selective neurons should exist in V4. DeYoe and Van Essen (1985) reported that some neurons in V4 have binocular interaction, although no study has ever extensively examined the effects of binocular disparity on the responses of V4 neurons (but see Felleman and Van Essen 1987). Yet another candidate pathway for conveying disparity information into the IT would be from the posterior parietal cortex to areas in the superior temporal sulcus (STS), which in turn project to the IT.

The result that nearby IT neurons have the same type of disparity selectivity irrespective of the shape or location of the stimulus imposes a constraint on the organization of the inputs into disparity-selective IT neurons. As discussed above, IT neurons possibly do not detect binocular disparity per se, but receive information on disparity from neurons in earlier visual areas. Since nearby IT neurons have the same type of disparity selectivity irrespective of the shape or location of the stimulus, they are likely to receive the same type of disparity information from earlier cortical areas. We propose that disparity information inputs from local disparity detectors into clusters of disparity-selective IT neurons, arising first from V1, come from neurons with similar disparity selectivity, irrespective of their orientation selectivity or receptive field location, at least in central vision.

Role of disparity-selective neurons in the IT

The presence of disparity-selective neurons in the IT does not necessarily mean that the IT is involved in stereoscopic depth perception. For example, Cumming and Parker (1997) showed that responses of disparity-selective neurons in monkey V1 do not correlate with perception of depth, but rather signal information possibly used for vergence eye movements (Masson et al. 1997).

Cowey and Porter (1979) found that lesions of the monkey IT impair the ability of the animal to discriminate depth in random dot stereograms. Ptito and Zatorre (1988) and Ptito et al. (1991) also found that lesions of the temporal lobe impair the ability in humans to discriminate depth in random dot stereograms without disturbing stereoacuity. These results suggest that the role of the IT or the ventral visual pathway in stereopsis is to reconstruct 3-D surfaces from local disparity cues.

Cowey (1985) later described that lesions of the monkey IT impaired stereoacuity. Although this finding contradicts those reported in humans by Ptito et al. (1991), it is possible that disparity-selective IT neurons are involved in stereoacuity. How the coarse tuning of disparity-selective neurons can achieve stereoacuity, however, remains to be explained (Lehky and Sejnowski 1990).

Recently, we found that some disparity-selective IT neurons are not selective only for local disparity cues. Some disparity-selective IT neurons responded to the depth order of surfaces, irrespective of the type of disparity added (Uka et al. 1997), and also irrespective of whether the cues defining the structure were binocular disparity or occlusion cues (our unpublished observation). Thus an important role of disparity-selective IT neurons may be to represent the depth order of surfaces reconstructed from local disparity cues, and not to detect and signal the local disparity cues per se. Consistent with these findings, Janssen et al. (1999) recently described evidence that some IT neurons respond to disparity gradients rather than to local disparity cues. This suggests that IT neurons may be involved in the reconstruction of 3-D shape.

Another possible role of disparity-selective IT neurons is to process disparity cues to reconstruct shape from disparity, such as in random dot stereograms. Indeed, we have recently found that some neurons in the IT respond to shape in random dot stereograms (Tanaka et al. 1999).

In spite of the various speculations, the role of disparity-selective IT neurons still remains unknown. Future studies correlating the responses of IT neurons with a particular behavior might clarify the role of disparity-selective IT neurons in stereopsis.


    ACKNOWLEDGMENTS

We thank Dr. Gregory C. DeAngelis for valuable comments on the manuscript.

This work was supported by grants to I. Fujita from Core Research for Evolutional Science and Technology of the Japan Science and Technology Corporation, Science and Technology Agency (Special Coordination Funds for Promoting Science and Technology), and the Ministry of Education, Science, Sports, and Culture (09268222). T. Uka and H. Tanaka are recipients of the Japan Society for the Promotion of Science Research Fellowship for Young Scientists.


    FOOTNOTES

Address for reprint requests: I. Fujita, Laboratory for Cognitive Neuroscience, Dept. of Biophysical Engineering, Graduate School of Engineering Science, Osaka University, Machikaneyama 1-3, Toyonaka, Osaka 560-8531, Japan (E-mail: fujita{at}bpe.es.osaka-u.ac.jp).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received 15 November 1999; accepted in final form 15 March 2000.


    REFERENCES

0022-3077/00 $5.00 Copyright © 2000 The American Physiological Society