1Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6196; 2Department of Physiology, Oxford University, Oxford OX1 3PT, United Kingdom; 3Xerox Palo Alto Research Center, Palo Alto 94034; and 4Department of Psychology, Stanford University, Stanford, California 94305-2130
ABSTRACT
Backus, Benjamin T., David J. Fleet, Andrew J. Parker, and David J. Heeger. Human Cortical Activity Correlates With Stereoscopic Depth Perception. J. Neurophysiol. 86: 2054-2068, 2001. Stereoscopic depth perception is based on binocular disparities. Although neurons in primary visual cortex (V1) are selective for binocular disparity, their responses do not explicitly code perceived depth. The stereoscopic pathway must therefore include additional processing beyond V1. We used functional magnetic resonance imaging (fMRI) to examine stereo processing in V1 and other areas of visual cortex. We created stereoscopic stimuli that portrayed two planes of dots in depth, placed symmetrically about the plane of fixation, or else asymmetrically with both planes either nearer or farther than fixation. The interplane disparity was varied parametrically to determine the stereoacuity threshold (the smallest detectable disparity) and the upper depth limit (largest detectable disparity). fMRI was then used to quantify cortical activity across the entire range of detectable interplane disparities. Measured cortical activity covaried with psychophysical measures of stereoscopic depth perception. Activity increased as the interplane disparity increased above the stereoacuity threshold and dropped as interplane disparity approached the upper depth limit. From the fMRI data and an assumption that V1 encodes absolute retinal disparity, we predicted that the mean response of V1 neurons should be a bimodal function of disparity. A post hoc analysis of electrophysiological recordings of single neurons in macaques revealed that, although the average firing rate was a bimodal function of disparity (as predicted), the precise shape of the function cannot fully explain the fMRI data. 
Although there was widespread activity within the extrastriate cortex (consistent with electrophysiological recordings of single neurons), area V3A showed remarkable sensitivity to stereoscopic stimuli, suggesting that neurons in V3A may play a special role in the stereo pathway.
INTRODUCTION
Since Wheatstone's report in 1838 that binocular disparity is sufficient to evoke a percept of depth (Wheatstone 1838), the remarkable computations that support stereopsis have been under study. Neurons selective for binocular disparity were first described in the primary visual cortex of the cat (Barlow et al. 1967; Nikara et al. 1968; Pettigrew et al. 1968). In nonhuman primates, disparity-selective cells have been identified in visual areas V1, V2, V3, V3A, V4, MT (V5), and MST (Burkhalter and Van Essen 1986; DeAngelis and Newsome 1999; Gonzalez and Perez 1998; Hinkle and Conner 2001; Maunsell and Van Essen 1983; Poggio 1995). The great majority of electrophysiological studies have been performed in V1, but disparity-selective activity in V1 is not always correlated with stereo depth perception (Cumming and Parker 1997). Although the neuronal signals in V1 might contribute directly as the sensory input to the vergence system, there are several respects in which they would need further processing to extract an unambiguous representation of stereoscopic depth (Cumming and Parker 2000; Fleet et al. 1996; Parker et al. 2000; Prince et al. 2000). For example, neurons in V1 respond to the absolute disparity of visual stimuli, showing essentially no sensitivity to relative disparity (Cumming and Parker 1999), whereas the finest stereoacuity judgments are generated psychophysically only by stimuli that contain relative disparity information (Kumar and Glaser 1992; Westheimer 1979). Absolute disparity reflects a disparity of features within the left and right retinal images with respect to anatomical landmarks on the left and right retinae, whereas relative disparity reflects the differences in the absolute disparities of two visual features in the three-dimensional (3-D) scene. It is therefore of particular interest to investigate how the signals from disparity-selective neurons in V1 are transformed by other visual areas in extrastriate cortex (analogous to the well-characterized visual motion pathway). As a step toward that goal, we have used functional magnetic resonance imaging (fMRI) to measure the response of several human visual areas to stimuli that contain binocular disparity.
Although human perceptual responses to binocular disparity have been studied extensively (Howard and Rogers 1995), there have been relatively few studies of how human cortical activity is related to stereo depth perception. Of these few studies, most relied on measurements of visual evoked potentials, a method that has limited spatial resolution (Braddick and Atkinson 1983; Fiorentini and Maffei 1970; Norcia and Tyler 1984; Norcia et al. 1985). A handful of positron emission tomography (PET) and fMRI experiments have been performed, focusing primarily on localizing those cortical areas that are most strongly activated by stereoscopic stimuli (Gulyas and Roland 1994; Khan et al. 1997; Nakadomari et al. 1999; Ptito et al. 1993; Rutschmann and Greenlee 1999). In contrast, we measured fMRI responses as a parametric function of disparity in each of several predefined visual cortical areas, analogous to parametric measurements of contrast (Boynton et al. 1996, 1999; Wandell et al. 1999) and motion coherence (Rees et al. 2000b).
Our goals in the present study were to quantify the disparity-related responses in early cortical visual areas, and to examine how the responses of these areas are related to stereoscopic depth perception. We focused on two psychophysical measures of stereoscopic vision: the stereoacuity threshold and the upper depth limit. These measures characterize, respectively, the smallest and largest disparities that can be detected by the visual system. With fMRI, we measured cortical activity as a function of stimulus disparity. We found that responses in each of several cortical areas covaried with psychophysics and perception, and that area V3A was more sensitive to binocular disparity than the other areas examined.
METHODS
Visual stimuli
Stimuli were dynamic random-dot stereograms containing 1,000 white dots on a black background (Fig. 1). Dots were repositioned randomly at 4 Hz. Each dot had a raised-cosine luminance profile, 0.5° in diameter. The display subtended 34 × 22° of visual angle. The left and right eyes' stimuli were displayed side by side on a flat-panel display (NEC, multisynch LCD 2000) in a Faraday box with a conducting glass front, positioned beyond the subjects' feet. Subjects lay on their backs and viewed the screen through approximately ×8 binoculars (320 cm from the display). A pair of angled mirrors, attached to the binoculars just beyond the two objective lenses, enabled the subjects to see the two halves of the display. Vergence posture of the eyes was set, by rotating the mirrors, to be comfortable for the subject. A septum between the subject's knees prevented each eye from seeing the other's image. Subjects fixated a binocular square marker at the center of the screen, with additional (horizontal and vertical) monocular Nonius lines to allow subjective monitoring of fixation accuracy, as shown in Fig. 1 (Sheedy 1980). The fixation square was 1° wide. Dots within 2° of the center were eliminated from each half of the display.
We chose to use transparent planes, rather than corrugated surfaces or other patterns with edges in depth, on the grounds that depth edges would be more likely to excite neuronal processes common to all aspects of contour identification (as were studied by Mendola et al. 1999), whereas we were interested in stereoscopic processing per se. Dots were assigned to one of two planes in depth by adding horizontal disparity to the images. Interplane disparity was varied between 0 and ±4°. Perceptually, as the disparity between the planes increases, one sees first a single plane of dots (for interplane disparities less than ±0.25 arcmin), then a thickened plane (at ±0.25 to ±1 arcmin), two distinct planes (±1 arcmin to ±1.5°), one plane either near or far (±1.5° to ±4°), and finally the display becomes indistinguishable from dots that are randomly placed (i.e., uncorrelated) in the two eyes' images (for disparities greater than ±4°). The uncorrelated display appears to have twice the dot density of the small-disparity, correlated displays. There is no obvious rivalry in the uncorrelated displays, but the dots (being monocular) have a lustrous quality and appear to be less bright.
The left and right 2° margins of the displays contained binocularly uncorrelated dots so that both the width of the binocularly correlated images (18° of visual angle) and the width of the cyclopean images (22°) were kept constant across disparities.
Acquisition of fMRI data
The experiments were undertaken with the written consent of each subject, and in compliance with the safety guidelines for magnetic resonance (MR) research. Subjects participated in multiple MR scanning sessions on different days: one to obtain a standard, high-resolution, anatomical scan; one to functionally define the retinotopic visual areas V1, V2, V3, and V3A; one to define area MT+ (in 6 of the 8 subjects); and one or more sessions to measure fMRI responses in the various experimental conditions (2 for ACH, 12 for DJH, 16 for BTB, and 1 for each of the other 5 subjects). All subjects had normal or corrected-to-normal vision. A bite bar stabilized the subjects' heads.
MR imaging was performed either on a GE 3T scanner (attention control experiment) or on a standard clinical GE 1.5 T Signa scanner (all other experiments), with custom-designed dual surface coils. Every fMRI scan consisted of 14 blocks, with 2 stimuli shown alternately (ABAB ...). Each block lasted 18 s. The entire scan therefore lasted 252 s. Subjects were instructed to hold fixation (monitoring Nonius alignment for fixation accuracy) throughout each scan while attending spatially to the entire stimulus. In the attention control experiment, subjects performed a depth discrimination task while holding fixation (see Experimental conditions).
fMRI scans were performed using a T2*-sensitive, gradient recalled echo, spiral pulse sequence (Glover 1999; Glover and Lai 1998; Noll et al. 1995; Sawyer-Glover and Glover 1998). Spiral fMRI pulse sequences compare favorably with echo-planar imaging on scanners in terms of sensitivity, and spatial and temporal sampling resolution (Sawyer-Glover and Glover 1998). Pulse sequence parameters varied across experiments (Table 1) to take advantage of several hardware and software upgrades that provided improvements in the fMRI signal-to-noise ratio. Slices were either coronal or oblique (oriented perpendicular to the calcarine sulcus), with the posterior slice near the occipital pole.
Each scanning session began by acquiring a set of T1-weighted
structural images using a spin echo pulse sequence (500-ms repetition time, 15-ms echo time, 90° flip angle) in the same slices as the functional images. These inplane anatomical images were aligned to the
high-resolution anatomical scan of each subject's brain using custom
software (Nestares and Heeger 2000), so that the functional data (across multiple scanning sessions) from a given subject were co-registered.
Experimental conditions
The full group of eight subjects was run on a subset of the stimulus conditions. Each subject in this population average experiment participated in one scanning session that included: 1) four repeated scans of a ±7.5 arcmin (2-plane) stimulus alternated with a zero disparity (1-plane) stimulus, and 2) five repeats of the ±7.5 arcmin (2-plane) stimulus alternated with a blank screen. After establishing through these measurements that cortical activity depended on stereoscopic depth, we proceeded to study this dependence in greater detail in two subjects (authors BTB and DJH) in the stereoacuity and upper depth limit experiments.
In the stereoacuity experiments, two-plane stimuli were alternated with a one-plane (zero-disparity) stimulus. The interplane disparities of the two-plane stimuli were systematically varied in separate scans. The off-horopter stereoacuity experiments were similar to the stereoacuity experiments except that the stimuli were displaced in depth (relative to fixation) by some common amount. In the upper depth experiments, two-plane stimuli were alternated with a stimulus consisting of dots whose positions were uncorrelated between the left and right images. These experiments were performed on only two of the subjects because of the large number of stimulus conditions; each subject repeated each of 15 experimental conditions between 4 and 12 times in separate fMRI scans (Table 2). The repeated measurements of each stimulus condition were typically distributed across multiple scanning sessions on different days.
Psychophysical stereoacuity and upper depth limit thresholds were
measured for comparison with the fMRI data. These psychophysical thresholds were measured in separate sessions using a standard forced-choice protocol. In a stereoacuity trial, 3 s of the
(1-plane) zero-disparity stimulus were followed by 5 s either of
the same zero-disparity stimulus or a two-plane stimulus with small
(±0.25 to ±1.0 arcmin) disparity. In an upper depth limit trial,
3 s of the uncorrelated stimulus were followed by 5 s of
either the uncorrelated stimulus or a two-plane stimulus with large
(±1 to ±6°) disparity. The subject made a yes-no response to
indicate whether the second interval contained stereoscopic depth. The experience was thus similar to being in the scanner, noticing that the
zero-disparity (or uncorrelated) stimulus had or had not been replaced
by a stimulus containing nonzero disparity. A discrimination index
(d') was computed from the hit and false alarm rates
(Green and Swets 1966). A value of d' = 1 corresponds approximately to 80% correct performance. Although these
psychophysical experiments and the fMRI measurements were performed in
separate sessions using different experimental apparatuses, the stimuli were as similar as possible: the two LCD monitors were calibrated to
have approximately the same mean luminance and display size (see
Visual stimuli), but the screen was viewed in the
psychophysical experiments with a modified Wheatstone stereoscope
(optical path length 40 cm) rather than binoculars and mirrors.
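The discrimination index can be illustrated with a minimal sketch, assuming the standard yes-no definition d' = z(hit rate) - z(false alarm rate). The function name and argument names are hypothetical, and the sketch does not attempt to reproduce how rates of exactly 0 or 1 were handled in the original analysis.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Yes-no discrimination index: z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1; the inverse normal
    CDF is undefined at the endpoints and raises StatisticsError there.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)
```

Equal hit and false alarm rates give d' = 0, which corresponds to no ability to detect the stereoscopic depth.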
Two further experiments served as controls. In the attention control experiment, subjects performed a demanding depth discrimination task throughout each scan. Each trial lasted 6 s and consisted of a pair of 2.7-s stimulus intervals, separated by a 50-ms blank interval, and followed by a 550-ms response interval. One stimulus interval contained an interplane disparity with a base value (either ±7.5 or 0 arcmin), and the other interval contained an increment over and above the base value. The subject indicated which interval had greater depth by a button press. Throughout each fMRI scan, subjects performed three successive trials of the depth discrimination task at a base disparity of ±7.5 arcmin, followed by three trials of the task at a base disparity of 0 arcmin, and so on. Subjects practiced the task extensively before scanning until their thresholds reached asymptotic performance. Feedback was not provided to subjects during the fMRI scans. Task difficulty was controlled by a 2-down 1-up staircase procedure (i.e., the disparity increment varied slightly from trial to trial) to keep the stimuli at the subjects' psychophysical threshold. The stimuli in this experiment were limited to the peripheral visual field (>4°) to minimize the possibility that subjects might rely on differential shifts of spatial attention to perform the task at the two different base disparities, e.g., to avoid the possibility of attending centrally for 0 arcmin and peripherally for ±7.5 arcmin. The attention control experiment was performed in one scanning session, for each of two subjects (BTB and DJH). 
During that scanning session, each subject participated in 1) four repeated scans of depth discrimination alternating between large (±7.5 to ±9.0 arcmin) and small (0 to ±2 arcmin) interplane disparities, 2) four repeated scans of essentially the same stimulus conditions, but without performing the depth discrimination task and without the threshold changes in interplane disparity (to prevent subjects from covertly performing the task), and 3) four repeated scans of the ±7.5-arcmin stimulus alternated with a blank screen.
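The 2-down 1-up rule used to hold the depth discrimination task near threshold can be sketched as follows. This is an illustration only: the step size, the floor, and the function name are assumptions, not values taken from the experiment. Under this rule the increment shrinks after two consecutive correct responses and grows after any error, which is generally understood to converge on roughly 71% correct.

```python
def two_down_one_up(increment, correct, streak, step=0.1, floor=0.0):
    """Update the disparity increment after one trial.

    increment: current disparity increment (units are hypothetical)
    correct:   True if the subject chose the interval with greater depth
    streak:    number of consecutive correct responses so far (0 or 1)
    Returns (new_increment, new_streak).
    """
    if correct:
        streak += 1
        if streak == 2:
            # Two correct in a row: make the task harder.
            return max(floor, increment - step), 0
        return increment, streak
    # Any error: make the task easier and reset the streak.
    return increment + step, 0
```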
The response saturation control experiment was similar to the two-plane/one-plane (±7.5 vs. 0 arcmin interplane disparity) condition of the population average experiment, except that stimulus contrast was lower. Light gray dots were shown against a medium gray background (15% Michelson contrast). The response saturation experiment was performed in one scanning session, for each of three subjects (BTB, DJH, and ACH). During that scanning session, each subject participated in 1) four repeats of the low-contrast, ±7.5 arcmin (2-plane) stimulus alternated with the low-contrast, zero disparity (1-plane) stimulus, and 2) four repeats of the low-contrast, ±7.5 arcmin (2-plane) stimulus alternated with a blank (gray) screen.
Analysis of fMRI data
Details of the analysis methods have been described previously (Heeger et al. 1999). Data from the first 36-s cycle were discarded to avoid effects of magnetic saturation and to allow the hemodynamics to reach steady state (noting that the full duration of the hemodynamic impulse response is well over 20 s). Data from each scan were analyzed separately in each of the identifiable visual areas (see Localization of visual areas). We computed the fMRI response amplitudes and phases by 1) correcting for any residual head movements during each scan using custom software (Nestares and Heeger 2000); 2) removing the linear trend in the time series to compensate for the fact that the fMRI signal tends to drift very slowly over time (Smith et al. 1999); 3) dividing each voxel's time series by its mean intensity (to convert the data from arbitrary intensity units to units of percent signal modulation, and because the mean image intensity varies substantially with distance from the surface coil); 4) averaging the resulting time series over the set of voxels corresponding to the stimulus representation within a visual area; 5) calculating the amplitude and phase of the best-fitting 36-s period sinusoid (the phase is a measure of the temporal delay of the hemodynamic response relative to the onset of the stimulus cycle and the amplitude is a measure of the level of modulation of cortical activity); and then 6) extracting the projected amplitude (as described in Heeger et al. 1999). Finally, we computed the mean and standard error of the mean (SE) of the amplitudes across repeated scans of each stimulus condition. The final mean amplitude represents our estimate of the response of a given visual area for a given stimulus condition.
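Steps 2, 3, and 5 of this procedure can be sketched in a few lines. This is a minimal illustration rather than the original analysis code: the function names are hypothetical, the sampling interval `tr` must be supplied by the caller (the pulse sequence parameters are in Table 1), and the mean is computed before detrending because removing the trend would otherwise remove it. Motion correction, voxel averaging, and the projected amplitude of Heeger et al. (1999) are not reproduced here.

```python
import math

def percent_modulation(series):
    """Remove the linear trend and express the series as percent of its mean."""
    n = len(series)
    mean = sum(series) / n
    t_mean = (n - 1) / 2.0
    # Ordinary least-squares slope of intensity against sample index.
    slope = (sum((i - t_mean) * (y - mean) for i, y in enumerate(series))
             / sum((i - t_mean) ** 2 for i in range(n)))
    return [100.0 * (y - mean - slope * (i - t_mean)) / mean
            for i, y in enumerate(series)]

def fit_sinusoid(series, tr, period=36.0):
    """Least-squares amplitude and phase of a sinusoid with the given period.

    Model: y(t) = amplitude * sin(2*pi*t/period - phase), so a positive
    phase corresponds to a delayed response. The series is assumed to
    have zero mean (for example, the output of percent_modulation).
    """
    w = 2.0 * math.pi / period
    s = [math.sin(w * i * tr) for i in range(len(series))]
    c = [math.cos(w * i * tr) for i in range(len(series))]
    # Solve the 2x2 normal equations for y ~ a*sin + b*cos.
    ss = sum(x * x for x in s)
    cc = sum(x * x for x in c)
    sc = sum(x * y for x, y in zip(s, c))
    sy = sum(x * y for x, y in zip(s, series))
    cy = sum(x * y for x, y in zip(c, series))
    det = ss * cc - sc * sc
    a = (sy * cc - cy * sc) / det
    b = (cy * ss - sy * sc) / det
    return math.hypot(a, b), math.atan2(-b, a)
```

With a 36-s stimulus cycle, a phase of pi/2 radians returned by `fit_sinusoid` would correspond to a delay of 9 s.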
In addition, correlation maps were computed by calculating a correlation coefficient between the best-fitting 36-s period sinusoid and the corresponding time series, separately for each voxel (Fig. 3). The correlation is a measure of signal-to-noise (Engel et al. 1997); it takes on a value near 1 when the signal modulation (the 36-s period component of the fMRI time series) is large relative to the noise (the other frequency components of the time series), and it takes on a value near 0 either when there is no signal modulation or when the signal is overwhelmed by noise. The correlation maps thus locate regions that responded reliably to the periodic changes in the stimuli. Amplitude and correlation differ in that measurement noise (both noise inherent in the MR signal and physiological noise) directly reduces correlation, but affects only the variance and not the true mean of the response amplitude measurements.
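The per-voxel correlation can be illustrated with the sketch below. It assumes that the time series spans a whole number of 36-s cycles, so that the sine and cosine regressors are orthogonal and the least-squares fit reduces to two projections. The function names and the sampling interval `tr` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sinusoid_correlation(series, tr, period=36.0):
    """Correlation between a voxel time series and its best-fitting sinusoid."""
    n = len(series)
    w = 2.0 * math.pi / period
    s = [math.sin(w * i * tr) for i in range(n)]
    c = [math.cos(w * i * tr) for i in range(n)]
    # Projections onto sine and cosine (valid for a whole number of cycles).
    a = 2.0 / n * sum(x * y for x, y in zip(s, series))
    b = 2.0 / n * sum(x * y for x, y in zip(c, series))
    fitted = [a * si + b * ci for si, ci in zip(s, c)]
    return pearson(series, fitted)
```

A series that contains only the 36-s component gives a correlation of 1, and adding power at other frequencies lowers the correlation without changing the fitted amplitude.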
Localization of visual areas
Following well-established methods (DeYoe et al. 1996; Engel et al. 1994, 1997; Sereno et al. 1995), the polar angle component of the retinotopic map was measured by recording fMRI responses as a stimulus rotated slowly (like the second hand of a clock) in the visual field. To visualize these retinotopy measurements, a high-resolution MRI of each subject's brain was computationally flattened (Teo et al. 1997; Wandell et al. 2000). In each hemisphere, areas V1, V2d, V2v, V3d, V3v (also known as VP), and V3A were identified. Area boundaries were drawn by hand on the flat maps near reversals of polar angle, leaving a gap of approximately 2 mm near the reversals that was unassigned to either area. We found neither ventral/dorsal nor left-/right-hemisphere differences in activity within a given cortical area. Hence, areas V2d and V2v from both hemispheres were combined for analysis into a single region designated V2, areas V3d and V3v were combined into V3, V1 was combined across the two hemispheres, and V3A was combined across the two hemispheres. Figure 2 shows the locations of some of the areas in the right hemisphere of BTB's brain. V3A can be seen at the fundus of the transverse occipital sulcus, in agreement with previous reports (Tootell et al. 1997). Our retinotopy measurements were too noisy to map areas V4v, V7, and V8 with complete confidence in all subjects.
For some of the experiments, the data were also analyzed in area MT+ (also known as V5), an area of the human brain that is believed to be homologous to monkey areas MT and MST. However, data collected from subject BTB using the eight-slice protocol (Table 1) could not be analyzed in MT+ because the slices did not cover this area. Following previous studies (Tootell et al. 1995; Watson et al. 1993; Zeki et al. 1991), area MT+ was identified based on fMRI responses to stimuli that alternated in time between moving and stationary dot patterns. The dots (small white dots on a black background) moved (10°/s) radially inward and outward for 18 s, alternating direction once every second. Then the dot pattern was stationary for the next 18 s. This moving/stationary cycle was repeated seven times. We computed the cross-correlation between each fMRI voxel's time series and a sinusoid with the same (36 s) temporal period. We then drew MT+ regions by hand around contiguous areas of strong activation, lateral and anterior to the retinotopically organized visual areas. MT+ was identified in this way for six of the eight subjects.
The procedures to define the visual areas were performed only once per subject. Because the fMRI data recorded during successive scanning sessions in a given subject were co-registered (see above), we could localize these areas from one scanning session to another.
Reference scans
We defined a subregion of each visual area based on responses to a reference stimulus. The reference scan responses were used to exclude unresponsive voxels, e.g., brain regions that would have responded to visual field locations outside the 34 × 22° stimulus aperture, and voxels that had too little overlap with gray matter. The reference scans (for all but the attention control experiment) consisted of a two-plane stimulus with an interplane disparity of ±7.5 arcmin shown in alternation with a blank screen. One reference scan was run during each scanning session, typically as the first scan in the session. Voxels that were unresponsive in the reference scans were discarded in the analysis of all subsequent scans in that scanning session. Responsive voxels were defined as those for which the fMRI time series was well correlated (r > 0.4 and, consistent with hemodynamics, a 0- to 9-s time lag) with a sinusoid of period 36 s. For the attention control experiment, the reference scan stimuli alternated between two planes (±7.5 arcmin) and one plane (zero disparity), instead of alternating with a blank screen. This was done because the goal of this experiment was to determine whether the areas that were activated during passive viewing would again be activated when subjects performed the depth discrimination task.
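The voxel selection criterion can be written as a short predicate. This sketch assumes that the phase of the fitted sinusoid is expressed in radians as a delay relative to stimulus onset, so that the time lag is the phase fraction of the 36-s period. The function and parameter names are hypothetical.

```python
import math

def is_responsive(r, phase, period=36.0, r_min=0.4, max_lag=9.0):
    """True if a voxel passes the reference-scan criterion.

    r:     correlation with the best-fitting sinusoid
    phase: response delay in radians (assumed convention)
    """
    # Convert the phase delay to a time lag in seconds.
    lag = (phase % (2.0 * math.pi)) / (2.0 * math.pi) * period
    return r > r_min and 0.0 <= lag <= max_lag
```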
Our results were not biased by subselecting voxels based on the reference scan responses. The reference scans activated large, contiguous regions of visual cortex, corresponding to the retinotopic representations of the stimuli within each visual area (Fig. 3, B and C). The particular interplane disparity used in the reference scans (±7.5 arcmin) was chosen because it gave stronger responses than did a single plane, in all of the studied visual areas. The correlation threshold was chosen to exclude only gray matter voxels that corresponded retinotopically to visual field locations outside the 34 × 22° stimulus aperture, and the results were similar when the data were reanalyzed for a range of different correlation thresholds from r > 0.2 to r > 0.5. There is evidence for spatial clustering of disparity-tuned neurons in macaque cortical visual areas V2 (Hubel and Livingstone 1987; Hubel and Wiesel 1970; Peterhans and von der Heydt 1993; Roe and Ts'o 1995) and MT (DeAngelis and Newsome 1999). Organization of this type is presumably invisible in our fMRI measurements because it occurs on a spatial scale (~1 mm) that is much smaller than the size of our voxels (~3 × 3 × 4 mm).
A second use of the reference scans was to validate comparisons of data collected with the different scanning protocols (Table 1). This comparison was performed for subject BTB because we measured reference scan responses for that subject using each of the protocols. The reference scan responses in all of the visual areas were highly reproducible; the 68% confidence interval obtained from one protocol contained the respective means from each of the other protocols.
Normalized responses
We normalized the responses of each visual area and each stimulus condition by dividing by the mean responses to a baseline stimulus condition in each visual area. The normalized responses are analogous to selectivity indices (e.g., disparity- or direction-selectivity indexes) that are commonly reported in single-unit electrophysiology studies. The normalized responses characterize how responsive a cortical area is to the change between two visual stimuli, relative to its response to a baseline stimulus condition. The normalized responses are thus complementary to the unnormalized responses, being particularly useful when comparing responses across subjects, cortical areas, and stimulus conditions.
The baseline responses were measured using stimulus conditions identical to those used for the reference scans (±7.5 arcmin alternated with a blank screen). The baseline responses were averaged over a set of repeated scans (between 4 and 9), that excluded those scans used to select the subregions of each visual area (see Reference scans). The three panels of Fig. 4 plot the unnormalized responses, the baseline responses, and the normalized responses. Figures 5-10 plot responses that have also been normalized in this way. The normalized responses are expressed in units of percent, that is the percentage of the baseline response evoked by each stimulus condition. For example, a normalized response of 50% in the stereoacuity experiment would mean that alternating the two-plane stimulus with the one-plane stimulus evoked one-half the modulation in cortical activity as alternating the two-plane stimulus with a blank screen. Normalizing the responses in this way simplified the interpretation of the results because it compensated for any differences in the hemodynamic response across individuals and/or across cortical areas within an individual. One visual area might have been more responsive to all stimulus conditions than another visual area for reasons unrelated to the stimulus disparity. First, the stimuli might have been more effective in driving one visual area than another (e.g., in terms of spatial or temporal frequency content). Second, the vasculature, and consequently the hemodynamic response, might have differed between the visual areas and/or between subjects (Aguirre et al. 1998). Third, errors in identifying the visual areas (for example, by including different fractions of unresponsive tissue, such as white matter or cerebrospinal fluid) could have introduced a systematic scaling of the measured responses in one of the areas. Fourth, some areas may have been more susceptible to the influences of attention. To the extent that such effects were multiplicative and of the same size for all stimulus conditions, they were mitigated by normalization.
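The normalization amounts to a ratio of mean amplitudes expressed in percent. A minimal sketch, with hypothetical function and argument names:

```python
def normalized_response(test_amplitudes, baseline_amplitudes):
    """Mean test amplitude as a percentage of the mean baseline amplitude.

    test_amplitudes:     amplitudes from repeated scans of one condition
    baseline_amplitudes: amplitudes from repeated two-plane/blank scans
    """
    test_mean = sum(test_amplitudes) / len(test_amplitudes)
    baseline_mean = sum(baseline_amplitudes) / len(baseline_amplitudes)
    return 100.0 * test_mean / baseline_mean
```

Because any multiplicative gain that is common to the test and baseline scans appears in both the numerator and the denominator, it cancels in the ratio.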
Statistics
One-tailed t-tests were used to determine the statistical significance of the responses by testing the null hypothesis that the mean response amplitudes were zero, i.e., that there was no modulation of cortical activity. Analogous t-tests were used to compare the relative responses across visual areas, e.g., to show that the responses in area V3A were larger than those in other visual areas. These statistical tests were always performed on the unnormalized responses. These were typically more conservative tests than the comparable tests on the normalized responses.
The error bars for the normalized responses in Figs. 4-10 were
computed using a parametric bootstrapping procedure (Efron and Tibshirani 1993). This procedure works by randomly resampling from the measured responses. In particular, we randomly sampled values
from the normal distributions defined by the mean and SE for each test
condition and for the baseline (reference scan) condition. The number
of samples was equal to the actual number of repeated measurements for
each condition. We then analyzed the resampled data as described above.
These steps were repeated 1,000 times for each condition in each visual
area. Finally, 68% confidence intervals were computed from the
resulting bootstrapped response distributions.
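The bootstrapping steps can be sketched as follows. This is an illustration under stated assumptions, not the original code: the function name, the seed, and the percentile indexing are choices made for the sketch, and the spread of each sampling distribution is passed in by the caller, since the exact relationship between that spread and the reported SE is not spelled out beyond the description above.

```python
import random

def bootstrap_ci(test_mean, test_sd, n_test, base_mean, base_sd, n_base,
                 n_boot=1000, level=0.68, seed=0):
    """Parametric bootstrap confidence interval for a normalized response (%).

    Each replicate draws n_test and n_base values from normal
    distributions, averages them, and forms the percent ratio.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        t = sum(rng.gauss(test_mean, test_sd) for _ in range(n_test)) / n_test
        b = sum(rng.gauss(base_mean, base_sd) for _ in range(n_base)) / n_base
        ratios.append(100.0 * t / b)
    ratios.sort()
    # Percentile bounds of the central `level` fraction of replicates.
    lo = ratios[int(round((0.5 - level / 2.0) * (n_boot - 1)))]
    hi = ratios[int(round((0.5 + level / 2.0) * (n_boot - 1)))]
    return lo, hi
```

The 68% level corresponds approximately to plus or minus one standard error for a normally distributed estimate.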
Error bars estimate different quantities in different figures. In Fig. 4, bars show confidence for the population mean (based on 8 subjects, 6 for area MT+). In Figs. 5-8, bars show confidence for the mean of a single subject in a given condition. In Figs. 9 and 10, bars show confidence for the mean of the two (or three) particular subjects, estimated as the square root of the summed variance for subject means, divided by the number of subjects.
Eye tracking
Eye-tracking measurements were performed to determine whether patterns of eye movements might account for some of our results. These experiments were performed in a psychophysical laboratory, not in the MR scanner, but the stimuli were identical to those displayed in the scanner, calibrated for the same luminance, contrast, and display size. Although it would have been ideal to record eye movements and acquire functional data simultaneously, that was not possible with the equipment we had available. We recorded eye movements using an infrared eye-tracking system (Ober 2, Timra, Sweden) that sampled horizontal and vertical eye positions at 100 Hz.
RESULTS
Population average
Activity in early visual cortex was larger for stimuli with stereoscopic depth than for a single flat plane. A representative example from one subject is shown in Fig. 3. Figure 3A plots the fMRI time series (red curve) averaged across the set of gray matter voxels corresponding to the stimulus representation within V3A, for scans that alternated between two planes (±7.5 arcmin) and a single plane (zero disparity). Note that the signal increased during the presentation of the two-plane stimulus and decreased during the presentation of the one-plane stimulus. The thick green curve is the best-fitting sinusoid. The amplitude of this sinusoid reflects the difference in cortical activity evoked by the two stimuli.
Figure 3, B and C, shows examples of correlation
maps (see METHODS) superimposed on the inplane anatomical
slices from one subject's brain. The correlation between the fMRI time
series and the best-fit sinusoid at each voxel is a measure of the
signal-to-noise ratio (Engel et al. 1997); it takes on a
value near 1 when the stimulus-driven signal modulation is large
relative to the noise in the fMRI time series, and it takes on a value
near 0 either when there is no signal modulation or when the signal is
overwhelmed by noise. Figure 3B shows regions, including V3A
(indicated by the red contour), where the cortical activity modulated
strongly with the two-plane/one-plane stimulus alternations. Figure
3C shows regions, including V1 (indicated by the blue
contour), where the cortical activity modulated strongly for stimuli
that alternated between two planes (±7.5 arcmin) and a blank screen.
We observed additional regions of visual cortex that also gave large responses in the two-plane/one-plane scans (Fig. 3B), including a ventral area, perhaps V4v or V8 (Hadjikhani et al. 1998), and a dorsal area adjacent to V3A, perhaps V7 (Mendola et al. 1999; Tootell et al. 1998a,b) or V3B (Smith et al. 1998). However, our retinotopy measurements were too noisy to map these areas with complete confidence in all subjects.
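The sinusoid fit and the correlation measure described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the synthetic time series, and the choice of 4 stimulus cycles over 64 time points are assumptions made for the example.

```python
import math

def fit_sinusoid(series, cycles):
    """Least-squares sinusoid at the stimulus alternation frequency.

    `cycles` is the number of full stimulus cycles in the scan. The projection
    below equals the least-squares fit when `cycles` is an integer with
    0 < cycles < len(series) / 2.
    """
    n = len(series)
    mean = sum(series) / n
    w = 2.0 * math.pi * cycles / n
    # Project the mean-removed series onto cosine and sine at the stimulus frequency.
    a = 2.0 / n * sum((y - mean) * math.cos(w * t) for t, y in enumerate(series))
    b = 2.0 / n * sum((y - mean) * math.sin(w * t) for t, y in enumerate(series))
    amplitude = math.hypot(a, b)
    fitted = [mean + a * math.cos(w * t) + b * math.sin(w * t) for t in range(n)]
    return amplitude, fitted

def correlation(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Synthetic, noise-free time series: 4 stimulus cycles in 64 samples (hypothetical).
series = [3.0 + 2.0 * math.sin(2.0 * math.pi * 4 * t / 64) for t in range(64)]
amplitude, fitted = fit_sinusoid(series, cycles=4)
r = correlation(series, fitted)
```

With noise added to `series`, the fitted amplitude would still estimate the stimulus-driven modulation, while the correlation would fall toward 0 as the noise grows relative to the signal.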
Similar results were evident across the eight subjects. Figure 4A plots fMRI response amplitudes (see METHODS) in each of several visual areas, averaged across subjects, for the two-plane/one-plane stimulus alternations. The fMRI responses were generally small in magnitude, but they could nevertheless be measured reliably. The mean responses were statistically significant in all visual areas except for MT+ (P < 0.001, 1-tailed t-tests of the unnormalized responses).
Area V3A was highly sensitive to stereoscopic depth. The responses were largest in V3A, smaller in V1, V2, and V3, and not significantly different from zero in MT+. The mean response in V3A tended to be larger than that in any of the other visual areas, although this was statistically significant only in comparison with V3 and MT+ (P < 0.01, 1-tailed t-tests on the unnormalized responses). The high sensitivity of area V3A was particularly evident after normalizing the responses. Figure 4B plots the responses from the two-plane/blank baseline scans. The baseline responses were largest in V1 and progressively smaller in the later visual areas (a 2-way ANOVA on the baseline responses showed significant effects of both subject, P < 0.01, and visual area, P < 0.0001). Figure 4C plots the normalized responses, i.e., the responses in Fig. 4A divided by the respective baseline responses in Fig. 4B. The responses were normalized separately for each subject, before averaging across subjects, to compensate for the inter-subject differences in the baseline responses. V1 was distinguished by giving large responses in the baseline condition (Fig. 4B) but small responses in the two-plane/one-plane condition (Fig. 4A), so it had small normalized responses (Fig. 4C). In particular, alternating the two-plane stimulus with the one-plane stimulus evoked a modulation of V1 activity that was only 5% of that evoked by alternating the two-plane stimulus with a blank screen. V3A, by contrast, gave small responses in the baseline condition (Fig. 4B) but the largest responses when alternating between two planes and one plane (Fig. 4A), so it had the largest (18%) normalized responses (Fig. 4C). The higher sensitivity of V3A to stereo disparity was evident in the normalized responses from all of the individual subjects.
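The normalization step (divide each subject's response by that subject's baseline response, then average across subjects) can be illustrated with a few lines of code. The function name and the per-subject values below are hypothetical and serve only to show the order of operations.

```python
def mean_normalized_response(depth_responses, baseline_responses):
    """Average of per-subject ratios (two-plane/one-plane response over baseline response).

    Each list holds one value per subject, in the same subject order.
    Baseline responses are assumed to be positive.
    """
    ratios = [d / b for d, b in zip(depth_responses, baseline_responses)]
    return sum(ratios) / len(ratios)

# Hypothetical amplitudes for two subjects in one visual area (not data from the study).
depth = [0.10, 0.30]      # two-plane/one-plane scans
baseline = [2.0, 4.0]     # two-plane/blank scans
normalized = mean_normalized_response(depth, baseline)

# For comparison: dividing the group means instead weights subjects by baseline size.
ratio_of_means = (sum(depth) / len(depth)) / (sum(baseline) / len(baseline))
```

Normalizing within each subject first keeps a subject with an unusually large baseline response from dominating the group result, which is the stated reason for normalizing before averaging.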
Having established that cortical activity depends on stereoscopic depth, we proceeded to study this dependence in greater detail in two subjects (BTB and DJH) in the stereoacuity and upper depth limit experiments.
Stereoacuity
Psychophysical performance in the stereoacuity task is plotted in
the pair of graphs on the left side of each panel in Fig. 5. Both
subjects reliably distinguished the two-plane stimuli from the
one-plane stimulus when the interplane disparities were greater than or
equal to ±0.5 arcmin. Below this disparity, performance dropped off;
at half this disparity, d' was estimated to be <1, with the
95% confidence interval for percent correct no longer including the 75 percent correct point (assuming a binomial distribution for the
behavioral responses). These psychophysical thresholds are
consistent with reports in the literature for stimuli like ours
(Stevenson et al. 1989).
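The threshold criterion above combines a binomial confidence interval on percent correct with an estimate of d'. The following sketch shows one way such quantities could be computed. It rests on assumptions not taken from the text: a Wilson score interval is used for the binomial proportion, the conversion to d' assumes a two-alternative forced-choice design, and the trial counts are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(n_correct, n_trials, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = n_correct / n_trials
    denom = 1.0 + z * z / n_trials
    center = (p + z * z / (2.0 * n_trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n_trials
                         + z * z / (4.0 * n_trials ** 2)) / denom
    return center - half, center + half

def dprime_2afc(proportion_correct):
    """d' under a two-alternative forced-choice assumption: d' = sqrt(2) * z(Pc)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(proportion_correct)

# Hypothetical session: 60 correct responses out of 100 trials.
low, high = wilson_interval(60, 100)
includes_75 = low <= 0.75 <= high   # does the interval still contain 75% correct?
d = dprime_2afc(0.60)
```

Under the two-alternative assumption, 75% correct corresponds to a d' slightly below 1, so an interval that excludes 75% from above is consistent with d' < 1.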
Cortical activity as a function of interplane disparity is plotted on the right side of each panel in Fig. 5. The measured activity rose quickly as disparity increased. For subject BTB, the responses in all visual areas were statistically significant (P < 0.05, 1-tailed t-tests on the unnormalized responses) when the interplane disparity was ±0.5 arcmin or more. For subject DJH, the responses in all areas were significant (P < 0.05, 1-tailed t-tests on the unnormalized responses) at ±1 arcmin or more.
We again found that area V3A was remarkably sensitive to binocular
disparity (Fig. 6). The responses in V3A
were statistically significant (P < 0.05, 1-tailed
t-tests on the unnormalized responses) in both subjects at
the smallest interplane disparities tested (±0.25 arcmin for
DJH and ±0.5 arcmin for BTB). Even though
DJH shows measurable V3A activity at a threshold-level
disparity, it would be a mistake to say that V3A activity was more
sensitive than the observer. The data show that V3A activity, averaged
across a duration of many minutes, was more sensitive than the observer on a single trial. In fact, the variability in the fMRI measurements was probably dominated not by the noise that limited psychophysical performance, but rather by variability in hemodynamic response due to extraneous physiological factors (Biswal and Ulmer 1999; Biswal et al. 1997; Mitra et al. 1997; Stillman et al. 1995). It seems likely that V3A would show smaller but still reliable activity at even lower disparities, if sufficient data could be collected to overcome measurement noise.
Off-horopter stereoacuity
Cortical activity covaried with psychophysical thresholds when the
stereoacuity stimuli were positioned off the horopter. Figure
7 plots normalized responses for stimuli
that alternated between two planes and one plane, for various
interplane disparities and for various displacements in depth in front
of or behind the fixation marker. When the interplane disparity was
large enough, activity was generally greater for the two-plane stimuli
than for one plane, even when all dots were in crossed (or all in
uncrossed) disparity. However, the responses were small or absent when
the interplane disparity was small (rightmost set of bars for each subject). These small interplane disparities were chosen to be above
the psychophysical stereoacuity threshold when the stimuli were
presented at the horopter, but below threshold off the horopter (Blakemore 1970; Ogle 1953). On the
horopter, these stimuli were perceptually different from the
zero-disparity stimulus and evoked measurable responses (Fig. 5,
P < 0.05 in V3A for DJH at ±0.5 arcmin,
P < 0.01 in all visual areas for BTB at ±1
arcmin, 1-tailed t-tests on the unnormalized responses). Off
the horopter, these stimuli were not perceptually distinguishable and
did not evoke significant activity (P > 0.35 in all
visual areas in both subjects, 1-tailed t-tests on the
unnormalized responses).
Area V3A again showed the highest sensitivity to interplane disparity. In all five conditions for which interplane disparity was suprathreshold, the measured responses tended to be greatest in V3A and smallest in V1. For the two conditions with subthreshold disparities, there were no differences between areas because there was no measurable activity in any area.
Upper depth limits
Cortical activity again covaried with psychophysical performance in the upper disparity limit experiment, where the two-plane stimuli were alternated with a binocularly uncorrelated stimulus. As interplane disparity was increased, the two-plane stimuli became indistinguishable from the uncorrelated stimulus (Fig. 8, left pair of graphs). The modulation of cortical activity evoked by alternating the two-plane stimulus and the uncorrelated stimulus dropped to zero just before psychophysical performance dropped to chance levels (Fig. 8, right pair of graphs).
The zero-disparity stimulus evoked greater activity than the uncorrelated stimulus in all visual areas in BTB (leftmost data points, P < 0.01, 1-tailed t-test on the unnormalized responses). The same trend was evident in DJH, but the responses were statistically significant at zero-disparity only in areas V2 and V3 (P < 0.01). As would be predicted from the stereoacuity data (Fig. 5), activity increased with disparity for small disparities. Once again the responses tended to be greatest in V3A and smallest in V1.
In an additional experiment (performed only on subject BTB in 4 scans), a different spatial structure within the two-plane stimulus evoked similar levels of activity. This two-plane stimulus contained a corrugation in depth (horizontal stripes, each 3° tall, with dots in alternate stripes at ±6 arcmin) instead of transparent planes, and was shown in alternation with the uncorrelated stimulus. Mean activity levels in V1, V2, V3, and V3A were almost identical to those in the most comparable condition using transparent planes (±7.5 arcmin). Thus significant changes in the spatial structure of the stimulus had little or no effect on our measurements of the cortical responses.
Attention control
Some of the difference in activity between the two-plane stimuli and the one-plane stimulus, or between two planes and the uncorrelated stimulus, might have had nothing to do with stereoscopic processing per se. Observers reported that the two-plane stimuli were more engaging, which could mean that they paid more attention during periods when the two-plane stimuli were displayed, resulting in greater cortical activity.
Although attention can strongly influence fMRI measurements of activity
in visual cortex (Brefczynski and DeYoe 1999; Gandhi et al. 1999; Kastner et al. 1999; Ress et al. 2000; Somers et al. 1999; Tootell et al. 1998a; Watanabe et al. 1998), there is evidence that our measurements are not entirely
the result of differential attention to the different stimulus
conditions. First, the absolute response in V1 was larger than that in
V3A when a two-plane stimulus was alternated with a blank screen, but
V3A responded more than V1 when the two-plane stimuli were alternated with one plane. This interaction between cortical area and stimulus condition cannot be explained by a nonspecific, attention-related increase in response to the two-plane stimuli. Second, to the extent
that attention evokes a multiplicative change in the gain of cortical
responses (Hillyard et al. 1998; McAdams and Maunsell 1999; Treue and Martinez Trujillo 1999), these attentional influences were mitigated by our
normalization procedure (dividing by the baseline responses). Third,
neural responses continued to increase with interplane disparity (Fig.
5), even after the two-plane stimuli were easily discriminated from one
plane. Likewise, responses in the upper depth limit experiment started
to drop well before the two-plane stimuli became indiscriminable from
the uncorrelated stimulus (Fig. 8). These findings are consistent with
a previous observation that the amplitudes of evoked potentials
increased with disparity for a single random-dot plane alternating in
depth (Norcia et al. 1985).
Nonetheless, we performed an additional experiment designed to control subjects' attention explicitly by having them perform a demanding depth discrimination task throughout each scan (see METHODS). The results, plotted in Fig. 9, suggest that the measured cortical signals reflect both sensory and attentional influences. Figure 9A plots the responses from scans in which subjects performed the depth discrimination task, and Fig. 9B plots the responses when subjects viewed similar stimuli without performing the task. The responses are smaller in Fig. 9A than in Fig. 9B, suggesting that exogenous attention to the stereoscopic two-plane stimulus during passive fixation contributed to our measured responses. Even so, both data sets show the same pattern across visual areas. Whether or not the task was performed, 1) the normalized responses were smallest in V1 and progressively larger in V2, V3, and V3A; 2) the responses were statistically significant in each of several visual areas, in both subjects (DJH task: V2, V3, V3A; BTB task: V1, V2, V3A; DJH no task: V1, V2, V3, V3A; BTB no task: V3A; P < 0.05, 1-tailed t-tests on the unnormalized responses); and 3) the responses in V3A were significantly larger than those in any of the other visual areas, in both subjects (P < 0.05, 1-tailed t-tests on the unnormalized responses).
Consistent with the results from our other experiments, certain cortical regions responded more strongly to the two-plane stimulus (i.e., containing stereoscopic depth) than the one-plane stimulus, whether or not subjects performed the depth discrimination task. However, there were adjacent subregions in several visual areas that responded more strongly to one-plane than two-planes only when subjects were performing the task. These subregions did not exhibit any preference for one stimulus over the other without the task. Further experiments will have to be performed to determine why this was the case.
If it is the very nature of depth-containing stimuli to compel greater attention, then of course we cannot dissociate bottom-up stimulus-evoked responses from top-down attentional effects, because the latter would be driven directly by the former. An attentional effect of this sort would have to vary parametrically in size with disparity to account for the data. We do not suggest that this is the case, but we must admit that it is a possibility.
Response saturation control
The fMRI responses evoked by our disparity manipulations were small relative to the large responses evoked when alternating the random-dot stimuli with a blank screen. In V1, for example, the response amplitude was 0.13 in the two-plane/one-plane scans (Fig. 4A) as compared with 2.6 in the two-plane/blank scans (Fig. 4B). This led us to be concerned about response saturation. If the hemodynamic response saturates (levels off) with increases in neuronal activity, then the presence/absence of the random-dot stimuli might evoke a nearly maximum fMRI response in V1, thereby leaving very little headroom to reveal any additional increment in neuronal activity as a function of stimulus disparity.
We performed a control experiment to test for effects of response saturation, using low contrast stimuli. The results, plotted in Fig. 10, demonstrate that our results are not confounded by response saturation. Even at low contrasts, the responses were statistically significant in each of several visual areas, in all three subjects (DJH: all areas except V1; BTB: all areas except MT+; ACH: V3, V3A; P < 0.05, 1-tailed t-tests on the unnormalized responses). Even at low contrasts, the responses in V3A were significantly larger than those in any of the other visual areas in subjects DJH and BTB, and the V3A responses were larger than those in all areas except MT+ in subject ACH (P < 0.05, 1-tailed t-tests on the unnormalized responses).
Critically, the low contrast stimuli avoided saturation by leaving
plenty of headroom available for larger responses. The V1 responses in
the low contrast baseline scans (Fig. 10B), averaged across
the three subjects, were 58% as large as those measured at high
contrast (culled from the data plotted in Fig. 4). This is consistent
with previous fMRI measurements of the contrast dependence of V1
activity (Boynton et al. 1996, 1999; Demb et al. 1998; Goodyear and Menon 1998; Tootell et al. 1995). The change from high
to low contrast caused the V1 responses in the two-plane/one-plane scans and the two-plane/blank scans to change by about the same scale
factor, so that the normalized V1 responses were roughly the same for
low (Fig. 10C) and high (Fig. 4C) contrasts. The
high contrast two-plane/blank scans evoked progressively smaller
responses in V1, V2, V3, V3A, and MT+, respectively; however, the low
contrast responses were similar across these areas (compare Figs.
4B and 10B). MT+ responses appear to be positive
in the low contrast experiment (Fig. 10A) and near-zero in
the high contrast experiment (Fig. 4A), but in fact the data
plotted in Figs. 10 and 4 are not directly comparable because the Fig.
4 data were collected from a larger group of subjects. Two of the three
subjects that were included in both experiments had similar MT+
responses in the two experiments. The third subject had larger MT+
responses at low contrast (P < 0.05). Indeed MT+
responses were generally highly variable across subjects and experiments.
Eye movement control
Differences in eye movements between conditions could potentially confound the interpretation of some of our results. A tendency to make more fixational, vergence, or pursuit eye movements while viewing the two-plane stimuli might have been sufficient to modulate the fMRI signal. Specifically, eye movements might have evoked differentially larger responses in areas (e.g., V3A and MT+) that have a greater proportion of motion-sensitive neurons than other visual areas.
We believe, however, that our measurements are not confounded by eye
movements. First, the bulk of the data were collected with balanced
disparity (crossed and uncrossed) to avoid just this problem; it is
known that crossed and uncrossed disparities cancel each other during
automatic vergence eye movements (Mallot et al. 1996).
Second, the Nonius lines in the stimuli were large and salient,
and their position was easy to monitor subjectively. Third, because the
dot patterns were updated at 4 Hz, it seems likely that the cortical
activity induced by small versional eye movements would be
insignificant relative to that induced by the motion energy present in
the stimuli at all times. Fourth, subjects DJH, BTB, and
ACH have previously shown the ability to accurately fixate
dynamic stimuli (Huk and Heeger 2000).
As a further precaution against differences due to eye movement patterns, we measured the eye movements of subjects DJH and BTB while they viewed a subset of the stimulus conditions, in the same blocked design as in the fMRI experiments. Inspection of the traces showed that across subjects and conditions, eye position was steady to within ±0.25° of fixation.
DISCUSSION
The main result of this study is that activity in early visual cortex covaries with stereoscopic depth perception. Perceptually, subjects cannot distinguish a one-plane stimulus from a two-plane stimulus when the interplane disparity is too small. Likewise, subjects cannot distinguish a two-plane stimulus from a binocularly uncorrelated stimulus when the interplane disparity is too large. Cortical activity in each of the studied visual areas followed the same pattern; activity first increased with disparity for interplane disparities from 0 to ~15 arcmin (Fig. 5), and then decreased with disparity for interplane disparities greater than ~30 arcmin (Fig. 8). Both perception and cortical activity depended on where the stimuli were placed in depth relative to fixation, that is, relative to the horopter. On the horopter, a small interplane disparity was perceptually detectable and evoked a measurable increase in cortical activity (Fig. 6). Off the horopter, a slightly larger interplane disparity was undetectable and did not evoke an increase in cortical activity (Fig. 7). Thus activity in visual cortex, pooled across a very large number of neurons, rises quickly in the vicinity of psychophysical threshold.
We also found that area V3A was highly sensitive to binocular disparity, exhibiting clear responses right down to the neighborhood of psychophysical threshold (Fig. 6). V3A was generally the most responsive of the five studied visual areas. V2 and V3 gave intermediate normalized responses, and V1 generally gave the smallest normalized responses. MT+ responses were highly variable across subjects and experiments; MT+ appeared to be sensitive to binocular disparity in some experiments (e.g., Fig. 6, subject DJH; Fig. 10), but not in other experiments (e.g., Fig. 4).
Possible interpretations
The interpretation of fMRI measurements is hampered by our lack of
understanding about how they relate to neural activity. The available
evidence suggests that fMRI responses are correlated with average
neural activity (Heeger et al. 1999, 2000; Logothetis et al. 2001; Rees et al. 2000a; Seidemann et al. 1999; Wandell et al. 1999). But even so, the interpretation of our fMRI data is limited by two issues. First, stereoscopic depth judgments involve multiple processes. Second, our fMRI measurements indiscriminately combined the activity of many neurons in each cortical area, whereas it is unlikely that subjects monitored total neuronal activity to distinguish two-plane from one-plane stimuli.
No fewer than six processes are active during stereoscopic vision, any
of which could in principle contribute to changes in neural activity
that we observed as a function of disparity. First, and most obvious,
is the computation (and neural representation) of the absolute
disparities of the dots in the stimulus. The relatively large responses
we observed in V3A might result from the presence of neurons that are
similar to disparity-responsive V1 neurons, but in greater numbers per
unit volume of cortex. Second is the explicit computation of relative
disparity. The finest stereoacuity judgments are generated
psychophysically only by stimuli that contain relative disparity
information (Kumar and Glaser 1992; Westheimer 1979). The two-plane stimuli in our experiments afforded the
extraction of relative disparities. The relatively high sensitivity we
observed in area V3A would be expected if neurons in this area represent relative disparity. Third is the spread of disparity information to initiate depth filling-in. The dots in our stimuli appear to lie on surfaces, which suggests a process that "fills in"
the depth for the blank spaces between the dots. This process might
involve more neural processing for two surfaces than one surface. A
fourth neural process is segmentation based on disparity (Parker and Yang 1989; Stevenson et al. 1989; Westheimer 1986). At intermediate interplane
disparities (roughly ±1 arcmin to ±1.5°) the two-plane stimulus
segregates perceptually into two distinct surfaces at different depths.
Fifth is calibration of disparity to estimate depth. For two planes, an
additional computation is needed to determine the depth between them,
which depends on the visual system's estimate of their distance from
the observer (Howard and Rogers 1995; Wallach and Zuckerman 1963). Sixth, some stimuli may, by virtue of the
percept they create, compel attention-related or other top-down
activity in early visual areas. Because we did not explicitly control
each of these neural processes related to stereoscopic vision, we
cannot distinguish between them as causes for the observed changes in
fMRI activity. Note, however, that the same interpretational
ambiguities would apply in an electrophysiology experiment.
The fMRI signal effectively integrates activity within a volume of cortex containing millions of neurons. fMRI cannot therefore distinguish a high firing rate in each of a few neurons from a low firing rate in many neurons. Nor can it reveal an increased firing rate in a subpopulation of neurons when offset by a decreased firing rate in other neurons nearby. The inverted-U fMRI data of Fig. 8 might be predicted from a detailed description of the neural population (as we do for area V1 in Comparison with V1 electrophysiology), but inference in the other direction is impossible.
Despite these complications, the current results provide a useful
platform for further imaging research on stereopsis, and constrain
models of the neural processing that support stereoscopic vision in
humans. For example, a straightforward implementation of the
Lehky and Sejnowski (1990) neural model of stereoacuity predicted greater average activity in cortical area V1 for the one-plane stimulus than for the two-plane stimulus, in disagreement with our results. We next consider the relationship between the fMRI
signal and single-unit physiology in the context of our data.
Comparison with V1 electrophysiology
The responses of individual macaque neurons to disparity are best
characterized for area V1. Given the published literature, it is
plausible but not obvious that average V1 firing rates would increase
with interplane disparity above the stereoacuity threshold, then
decrease as interplane disparity approaches the upper depth limit.
Prince et al. (2000) found significant changes in the
firing patterns of single V1 cortical neurons when single-plane random dot stimuli were altered in disparity by as little as 0.6-1.2 arcmin,
corresponding well to the smallest interplane disparities of 0.5-1
(±0.25 to ±0.5) arcmin in our two-plane stimuli. In V1 a large number
of neurons are tuned for zero disparity (Poggio et al. 1988). These neurons would fire maximally to the (single plane)
zero-disparity stimulus, and less to (2 plane) stimuli with nonzero
interplane disparities. Other V1 neurons fire maximally to near
disparities or far disparities. These neurons could respond best to
two-plane stimuli with appropriate interplane disparities and less to
the (single plane) zero-disparity stimulus. The pooled activity, as
measured with fMRI, depends on the relative sizes of these two effects,
i.e., the relative responsiveness and the relative number of neurons in
V1 with different disparity tuning. That V1 responded more strongly to
the two-plane stimulus than to the one-plane stimulus (Figs. 4 and 5)
therefore suggests that the mean activity of V1 neurons is a bimodal
function of absolute retinal disparity. That the fMRI response falls
off at large interplane disparities suggests that the local maxima are
<60 arcmin apart. That small interplane disparities evoke fMRI
responses suggests that the central trough in the bimodal response
function is narrow and centered at zero disparity.
Prince et al. (2001b) characterized disparity tuning for
180 disparity-selective neurons in macaque V1. These data permit a post
hoc test of the prediction that primate V1 neurons fire more, on
average, to nonzero than to zero disparity. Although the 180 neurons'
collective ability to detect a small change in disparity was unimodally
distributed near zero disparity (Prince et al. 2001a),
we report here the surprising discovery that average firing rate was
bimodal as a function of disparity (Fig.
11). This finding was robust, that is,
it did not depend on just a few neurons within the sample. We performed
a bootstrap analysis of the Prince et al. (2001a) data
by 1) repeatedly summing the responses of 180 neurons drawn
at random with replacement, to create a bootstrap sample of 1,000 curves, 2) subtracting out baseline differences between the
curves (baseline standard deviation, 405 spikes/s), 3)
separately ordering the 1,000 values from these curves at each disparity, and 4) plotting the 2.5th and 97.5th percentile
at each disparity (dotted curves in Fig. 11). Of 1,000 bootstrap sample curves, only 8 failed to show a central trough between 2 peaks. Figure
11 thus confirms the prediction of bimodality and that the local maxima
are spaced <60 arcmin apart. There were a number of differences
between experimental protocols in the human fMRI and monkey single-unit
experiments, including differences in the stimulus parameters, but the
sample of 180 neurons was selected on the basis that they showed
statistically significant changes in firing when stimulus disparity was
altered, so we suspect that bimodality characterizes the population of
V1 disparity-selective neurons as a whole.
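The four bootstrap steps listed above can be sketched in code. This is an illustrative reconstruction, not the authors' analysis: the function name, the toy tuning curves, and in particular the way baseline differences are removed (each bootstrap curve is shifted so that its mean across disparities equals the grand mean) are assumptions. The percentiles are taken by simple index lookup into the sorted values.

```python
import random

def bootstrap_population_curve(tuning_curves, n_boot=1000, seed=0):
    """Percentile band for the summed response of a resampled neuron population.

    `tuning_curves` is a list of per-neuron responses, each sampled at the
    same set of disparities.
    """
    rng = random.Random(seed)
    n_neurons = len(tuning_curves)
    n_disp = len(tuning_curves[0])

    # Step 1: sum the responses of neurons drawn at random with replacement.
    curves = []
    for _ in range(n_boot):
        sample = [rng.choice(tuning_curves) for _ in range(n_neurons)]
        curves.append([sum(c[d] for c in sample) for d in range(n_disp)])

    # Step 2 (assumed form): remove baseline differences between curves by
    # shifting each curve so its mean matches the grand mean.
    baselines = [sum(c) / n_disp for c in curves]
    grand = sum(baselines) / n_boot
    curves = [[v - b + grand for v in c] for c, b in zip(curves, baselines)]

    # Steps 3 and 4: order the values at each disparity and read off the
    # 2.5th and 97.5th percentiles.
    lower, upper = [], []
    for d in range(n_disp):
        column = sorted(c[d] for c in curves)
        lower.append(column[int(0.025 * (n_boot - 1))])
        upper.append(column[int(0.975 * (n_boot - 1))])
    return lower, upper

# Toy tuning curves for three hypothetical neurons at three disparities.
toy_curves = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [3.0, 2.0, 1.0]]
lower, upper = bootstrap_population_curve(toy_curves, n_boot=200)
```

A bimodality check like the one reported in the text would then ask, for each bootstrap curve, whether a central trough lies between two peaks. That check is not implemented here because its exact criterion is not given.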
Figure 11 also shows that the prediction of a narrow trough at zero
disparity was not confirmed. Instead, the trough was relatively broad
and was centered at 10-15 arcmin crossed disparity. Thus the response
of these neurons to absolute disparity cannot fully explain the fMRI
responses we observed in V1. One explanation for this failure is that
the fMRI response to small interplane disparities may have been due to
neurons coding central vision, where stereoacuity is highest
(McKee 1983; Rawlings and Shipley 1969),
whereas neurons in the macaque sample coded a range of eccentricities.
A second explanation is that some component of the fMRI response in V1
may be due to factors other than absolute disparity per se (see
Possible interpretations). This view is supported by the
fMRI responses we observed in V1 using off-horopter stimuli (Fig. 7,
subject BTB) and the reduced response of V1 when a task was
used to control attentional effects (Fig. 9).
As with disparity, it is plausible but not obvious that average V1
firing rates would be larger for correlated stimuli than for
binocularly uncorrelated stimuli, due to binocular facilitation (Freeman and Ohzawa 1990). Complex cells in V1 act
essentially as correlation detectors (Anzai et al. 1999; Fleet et al. 1996; Ohzawa et al. 1990)
and could therefore account for the greater activity evoked by our
correlated one- and two-plane stimuli, as compared with our
uncorrelated stimulus. On the other hand, the uncorrelated stimulus
ought to impinge on the receptive fields of more cortical neurons than
do the correlated stimuli. Thus the greater cortical activity observed
for correlated stimuli probably reflects a decrease in the total number
of neurons activated, together with a more than compensatory increase
in the response of those neurons. Average neural activity for an
uncorrelated stimulus is predicted in Fig. 11 by the asymptotes on
either side of the wiggle, as these asymptotes give the response to
stimuli with very large disparities, which is expected to equal the
uncorrelated response. Figure 8 (rightmost data) shows that
large-disparity and uncorrelated fMRI responses were indeed the same.
Stereo pathway beyond V1
Neurophysiological studies have shown that there is a generally
widespread distribution of disparity-selective neurons throughout the
striate and extrastriate cortex of nonhuman primates (Burkhalter and Van Essen 1986; Maunsell and Van Essen 1983; Poggio 1995). Little has been done to divide these
neurons into classes that can be more specifically associated with
identified visual areas. Nonetheless, it is known that there is a
columnar organization for disparity in macaque areas V2 (Hubel and Livingstone 1987; Hubel and Wiesel 1970; Peterhans and von der Heydt 1993; Roe and Ts'o 1995) and MT (DeAngelis and Newsome 1999), and also that binocular facilitation of neuronal responses may vary from area to area when tested with zero-disparity stimuli (Zeki 1978, 1979). These observations are clearly
relevant to understanding the neural processing that supports
stereoscopic depth perception, although they do not nearly provide an
account of the process.
Poggio et al. (1988) examined the disparity selectivity
of neurons in macaque visual cortex in some detail. Two of their
findings may be particularly relevant to the interpretation of our own data. First, they found that the ratio of disparity-responsive to
disparity-unresponsive neurons was 1:1 for V1, 2:1 for V2, and 4:1 in a
region that probably was V3-V3A. This is qualitatively consistent with
differences in responses we observed across these visual areas: we also
observed significantly larger responses to stereoscopic stimuli in
areas beyond V1 than in V1 itself. However, the V3-V3A neurons they
encountered had receptive fields centered more peripherally than their
V1 neurons, so although the comparison is suggestive, we cannot draw an ironclad
connection between those data and ours. Second, they found that in V1
many neurons were tuned to near-zero disparities, but that in V3-V3A
almost all neurons were tuned near or far. This finding is also
consistent with our data, but again the difference in eccentricities in
those samples makes it logically difficult to predict our responses from their data. In addition, there are known differences between visual processing in monkey and human V3A (Tootell et al. 1997).
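As a minimal illustration of the first finding, the ratios of disparity-responsive to disparity-unresponsive neurons quoted above can be restated as the fraction of sampled neurons that were disparity-responsive. The sketch below is only arithmetic on those three ratios; the function name and the dictionary are hypothetical and are not part of the original analysis.

```python
from fractions import Fraction


def responsive_fraction(responsive, unresponsive):
    """Fraction of neurons that are disparity-responsive,
    given a responsive:unresponsive ratio."""
    return Fraction(responsive, responsive + unresponsive)


# Ratios as quoted in the text from Poggio et al. (1988).
# The "V3-V3A" label follows the text's tentative identification of that region.
ratios = {"V1": (1, 1), "V2": (2, 1), "V3-V3A": (4, 1)}

fractions = {area: responsive_fraction(r, u) for area, (r, u) in ratios.items()}

for area, frac in fractions.items():
    print(f"{area}: {frac} ({float(frac):.0%}) disparity-responsive")
```

Read this way, the disparity-responsive proportion rises from one half in V1 to two thirds in V2 and four fifths in the region that was probably V3-V3A.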
Since areas V3 and V3A have been associated with stereoscopic depth
signals in earlier single-unit recording experiments (Poggio et
al. 1988), a consistent interpretation of the specific fMRI signal observed in V3A is that it could arise from a concentration of
neurons sensitive to relative disparity; if the neurons carry signals
about relative disparity, then they would respond specifically to the
presence of the interplane disparity in our two-plane stimulus. Further
experiments in the extrastriate cortex will be necessary for testing
this particular interpretation. Regardless of whether this speculation
is correct, the present results add considerably to the case that V3A
may be relatively specialized for stereoscopic processing.
ACKNOWLEDGMENTS
We thank H. Baseler, R. Dougherty, A. Huk, R. Khan, C. Tyler, and A. Wade for serving as subjects; G. H. Glover (and the Richard M. Lucas Center for Magnetic Resonance Spectroscopy and Imaging, supported by a National Center for Research Resources grant) for technical support; and S. Prince and B. Cumming for providing single unit data.
This research was supported by National Eye Institute Grant R01-EY-12741 to D. J. Heeger, an Alfred P. Sloan Research Fellowship to D. J. Fleet, a grant from the Wellcome Trust (UK) to A. J. Parker, and National Research Service Award Postdoctoral Fellowship F32-EY-06899 to B. T. Backus.
FOOTNOTES
Address for reprint requests: B. T. Backus, Dept. of Psychology, University of Pennsylvania, 3815 Walnut St., Philadelphia, PA 19104-6196.
Received 25 August 2000; accepted in final form 4 May 2001.
REFERENCES