¹The Rockefeller University, New York, New York 10021; and ²Department of Neuroscience, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104
ABSTRACT
Kapadia, Mitesh K., Gerald Westheimer, and Charles D. Gilbert. Spatial Distribution of Contextual Interactions in Primary Visual Cortex and in Visual Perception. J. Neurophysiol. 84: 2048-2062, 2000. To examine the role of primary visual cortex in visuospatial integration, we studied the spatial arrangement of contextual interactions in the response properties of neurons in primary visual cortex of alert monkeys and in human perception. We found a spatial segregation of opposing contextual interactions. At the level of cortical neurons, excitatory interactions were located along the ends of receptive fields, while inhibitory interactions were strongest along the orthogonal axis. Parallel psychophysical studies in human observers showed opposing contextual interactions surrounding a target line with a similar spatial distribution. The results suggest that V1 neurons can participate in multiple perceptual processes via spatially segregated and functionally distinct components of their receptive fields.
INTRODUCTION
An important task of the visual cortex is to integrate local information from different parts of a visual image into global percepts such as contours, surfaces, and three-dimensional shapes. Although the process of visuospatial integration has been traditionally ascribed to high-order cortical areas, there is a growing body of evidence that suggests that the primary visual cortex (V1) may play an important role. While the receptive fields (RFs) of V1 neurons, as measured by a simple stimulus, are quite small, stimuli outside of this region can have powerful modulatory influences when presented concurrently with stimuli inside the receptive field. The modulatory influences allow neurons to integrate information from large parts of the visual field and may allow neurons at this early stage of visual processing to participate in complex perceptual tasks such as contour integration and surface segmentation.
The existence of surround effects in V1 neurons, especially inhibitory effects, has been known for many years (Bishop et al. 1973; Gilbert 1977; Gulyas et al. 1987; Hubel and Wiesel 1965; Knierim and Van Essen 1992; Li and Li 1994; Maffei and Fiorentini 1976), but it is now clear that excitatory interactions may play an equally important role in neural responses (Allman et al. 1985; Kapadia et al. 1995; Nelson and Frost 1985; Polat et al. 1998; Sillito et al. 1995).
These findings at the cellular level have their counterpart in psychophysical studies where the perception of an object's attributes is dependent on the context in which a stimulus is presented. Contextual interactions affect many perceptual attributes including the detection of low contrast objects (Dresp 1993; Kapadia et al. 1995; Polat and Sagi 1993) and the perception of brightness (Heinemann 1955; Ito et al. 1998; Rossi et al. 1996), depth (Westheimer 1986), position (Badcock and Westheimer 1985), and orientation (Gibson and Radner 1937; Tyler and Nakayama 1983; Westheimer 1990).
While excitatory contextual interactions have been postulated to play a role in contour integration and saliency (Field et al. 1993), inhibitory interactions are thought to be important in the segmentation of surfaces and textures (Knierim and Van Essen 1992). If opposing neural interactions are needed for different perceptual processes, the question arises as to how these processes can exist in V1 neurons without canceling each other out. One possibility is that excitatory and inhibitory interactions are present in the same neurons but are located at different parts of the RF. To test this hypothesis, we designed physiological and psychophysical experiments to map the RF surround.
METHODS
Physiology
Experiments were performed using two alert macaque monkeys (Macaca mulatta). The monkeys were comfortably seated in a primate chair 1.5 m from a computer monitor with a resolution of 1,200 × 800 pixels refreshed at 60 Hz. Experiments were performed under photopic conditions with ambient light. All procedures complied with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the institutional animal review boards.
Experimental and surgical procedures were as previously described
(Kapadia et al. 1995). Briefly, stimuli were presented
in the near periphery while animals performed a foveal dimming task to
help them maintain tight fixations. The animals received a juice reward
if they held fixation within a 0.7-1.0° window and indicated the
dimming by releasing a lever at the appropriate time. Neural recordings
were obtained in 600-ms epochs. The stimulus was turned on 200 ms after
the start of the recording and presented for 100 ms. Each fixation
trial consisted of three to five recording epochs separated by at least
300 ms.
The time window used to calculate the evoked response from each recording site was adjusted according to that cell's composite temporal response profile over the entire set of experiments performed on that cell. The number of spikes that occurred in this window was converted to spikes/second by dividing the response by the window length. Spontaneous activity for each experiment was defined as the neural activity in the 200 ms prior to stimulus onset, averaged across all stimuli within an experiment and converted to spikes/second. The displayed results represent the mean evoked response of the unit over the 8-15 trials of each stimulus after subtracting spontaneous activity. Error bars show one standard error of the mean above and below this value.
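The response measure described above can be illustrated with a minimal Python sketch. The spike times, the 230-330 ms analysis window, and the function names are hypothetical; in the study the window was chosen per cell from its temporal response profile, and spontaneous activity was averaged across all stimuli within an experiment rather than across the three example trials used here.

```python
import math
from statistics import mean, stdev

def rate(spike_times_ms, start_ms, end_ms):
    """Firing rate (spikes/s) in the half-open window [start_ms, end_ms)."""
    n_spikes = sum(start_ms <= t < end_ms for t in spike_times_ms)
    return n_spikes / ((end_ms - start_ms) / 1000.0)

def evoked_response(trials, window_ms, spontaneous_rate):
    """Mean rate in the window minus spontaneous rate, plus the standard error of the mean."""
    rates = [rate(t, window_ms[0], window_ms[1]) for t in trials]
    sem = stdev(rates) / math.sqrt(len(rates))
    return mean(rates) - spontaneous_rate, sem

# Hypothetical spike times (ms from the start of each 600-ms epoch); stimulus onset at 200 ms.
trials = [
    [50, 240, 255, 270, 300],
    [120, 245, 260, 310],
    [230, 250, 265, 280, 295, 400],
]

# Spontaneous activity: rate in the 200 ms preceding stimulus onset.
spont = mean(rate(t, 0, 200) for t in trials)

# Assumed (hypothetical) analysis window of 230-330 ms.
resp, sem = evoked_response(trials, (230, 330), spont)
```

Here `resp` is the baseline-subtracted mean rate and `sem` is the half-length of the error bar, corresponding to the quantities plotted in the figures.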
Neural activity was recorded from the operculum of striate cortex using glass-coated platinum-iridium electrodes. Neural signals were amplified, band-pass filtered between 300 and 3,000 Hz and fed through a time-amplitude window discriminator to isolate individual units and small clusters of two to three units. The results from the single-unit recordings were indistinguishable from the multi-unit recordings. All recording sites had complex cell-like properties and were from superficial cortical layers, the neurons of which form the main output to extrastriate visual areas.
At each recording site, we quickly estimated the basic properties of the unit under study such as its orientation tuning and RF center by displaying bars at different positions and orientations and assessing the neuron's response on an audio monitor. The neuron's optimal orientation was then determined quantitatively by recording neural responses to long oriented bars (measuring 120 × 3') spaced 10-20° apart over a full range of 180°. Recording sites that showed poor orientation tuning were omitted from the study. The RF center was determined by presenting small optimally oriented bars (measuring 15 × 3') at adjacent positions along the length and width axes of the RF. The center was defined as the location that produced the largest response. The remainder of the experiments, including the two-dimensional mapping studies, was done with bars measuring 30 × 3'. The length of the stimulus corresponds to the approximate size of RFs at this eccentricity. The stimuli were not scaled according to the dimensions of individual RFs. We use the word "target" to represent a stimulus placed at the center of a neuron's RF and "flank" to represent peripheral stimuli.
The eccentricities of RF centers ranged from 2 to 7° from the fovea, averaging around 4°. The central target stimulus was always presented at the optimal orientation at the center of the units' RFs. Since the experiments were performed over a narrow range of eccentricities, RF sizes tended to be similar to one another, and we chose not to scale the size of the stimuli between individual recording sites.
The stimulus set used to create the two-dimensional maps consisted of 82 unique stimuli interleaved in a random block format. Thus a minimum of 656 stimulus presentations (82 stimuli × 8 trials for each stimulus) were needed to create each set of maps under each set of contrast conditions. Since the subtraction techniques used to create the context and nonlinearity maps depended strongly on the response to the central target presented alone, we often interspersed an additional 8-32 presentations of this stimulus to minimize the uncertainty in its mean response.
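One plausible reading of the "random block format" is that every block contains each stimulus exactly once in a freshly shuffled order. The following Python sketch works under that assumption (the function name and the seed are illustrative only) and reproduces the minimum count of 656 presentations.

```python
import random

def random_block_sequence(n_stimuli, n_blocks, seed=None):
    """Return stimulus indices for n_blocks blocks, each a shuffled pass through all stimuli."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(range(n_stimuli))
        rng.shuffle(block)  # new random order within every block
        sequence.extend(block)
    return sequence

# 82 stimuli x 8 trials each, as stated in the text.
seq = random_block_sequence(82, 8, seed=0)
```

A blocked design of this kind guarantees that all stimuli accumulate trials at the same rate, which matters when unit isolation may be lost before an experiment is complete.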
Although the fixation window provided limits for continuing a trial, the animal maintained much more precise fixation than that allowed by the window. Eye positions were monitored and recorded at 100 Hz using a scleral search coil system (C-N-C Engineering). An analysis of eye position data for 33 experiments is shown in Fig. 1. The data were summarized by calculating the mean eye position during each 100-ms stimulus presentation for a given experiment and calculating the mean and standard deviation of the distribution. The variability in eye position between trials (measured as 1 SD of the mean) for each of the 33 experiments is shown in Fig. 1A, and the average of the 33 values in Fig. 1B. The standard deviation in eye position, averaged across all experiments, was 2.7 min of arc in the horizontal direction and 2.9 min in the vertical position. This is much smaller than the grain of the grids used in the two-dimensional receptive field maps (15-30').
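The summary statistic described above (the mean eye position per presentation, followed by the standard deviation of those means) can be sketched in Python as follows. The sample values and the function name are hypothetical; at the 100-Hz sampling rate, each 100-ms presentation would contribute about 10 samples.

```python
from statistics import mean, stdev

def fixation_variability(presentations):
    """SD (horizontal, vertical) of the per-presentation mean eye position.

    presentations: one list of (x, y) samples in min of arc per stimulus presentation.
    """
    mean_x = [mean([x for x, _ in samples]) for samples in presentations]
    mean_y = [mean([y for _, y in samples]) for samples in presentations]
    return stdev(mean_x), stdev(mean_y)

# Hypothetical data: three presentations of 10 samples each.
presentations = [
    [(0.0, 1.0)] * 10,
    [(2.0, 1.0)] * 10,
    [(4.0, 4.0)] * 10,
]
sd_x, sd_y = fixation_variability(presentations)
```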
Psychophysics
Psychophysical experiments were performed on human observers using a nulling technique to measure the perceived orientation of a vertical line when a pair of additional, tilted lines was presented simultaneously. In these experiments, we use the word "target" to refer to the stimulus that the subject is asked to make a judgement about and "flank" to refer to the additional, contextual lines presented in the surround. Stimuli were presented on a CRT monitor with a resolution of 1,024 × 768 pixels refreshed at a rate of 60 Hz. Stimulus presentation was controlled by a Matrox Millennium video card and a PC-compatible computer. No error feedback was provided. Observation was binocular with normal pupils and a free head. All human experiments were approved by the institutional committees on human experiments.
The foveal tilt illusion experiments were performed at an observation distance of 6 meters. The central target and flanks consisted of white vertical lines, 8 × 1' in size. The line lengths used in this and other psychophysical experiments were chosen to approximate the size of V1 receptive fields. Unless otherwise noted, the Michelson contrast of the stimuli was 99%. All data points presented here are based on at least 450 total trials for both flank orientations distributed over a minimum of 2 days.
Each trial consisted of a 2,000-ms cycle. The target and a pair of tilted flanks were presented for 100 ms and followed by a 1,900-ms interval during which the subject reported whether the target appeared tilted clockwise or counterclockwise by pressing the appropriate button on a computer mouse. The target line was accompanied by two flanks, which were symmetrically positioned with respect to the target. To concentrate on the orientation signal and to obviate any position clues, the stimulus was designed so that the ends of the flanks had the same relationship to the end of the target for all target and flank orientations. During each session, the location and orientation of the flanking lines remained constant but by changing flank orientations and positions from session to session a full range of values of these parameters could be explored.
During each presentation, the target was shown randomly at one of seven equally spaced orientations centered on vertical, and the observer reported, if necessary by guessing, whether it appeared tilted clockwise or not. A psychometric curve was fitted to the proportion of "yes" responses at the seven orientations by the method of probits (Finney 1952), where the mean value, i.e., the orientation at which "clockwise" and "counterclockwise" responses are equally probable, provides an estimate of the target orientation at which it has no apparent tilt (see Fig. 3). To eliminate possible biases, stimuli with clockwise and counterclockwise flanks were randomly interleaved in each experiment. The induced tilt, defined as half the difference between the means under conditions of clockwise and counterclockwise flanks, is a bias-free measure of the tilt illusion. It is obtained by a nulling method based entirely on the observers' "yes" or "no" responses to the question of whether the target line appeared to have a clockwise tilt, in a minimum of 600 presentations with randomly distributed values of target line orientation and direction of flank tilt.
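The nulling computation can be sketched in Python as follows. This is a simplified illustration, not the fitting procedure used in the study: it fits an unweighted least-squares line to probit-transformed proportions instead of performing the iteratively weighted fit of classical probit analysis. The orientation values, trial counts, simulated observer, and function names are all hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def null_orientation(orientations_deg, n_clockwise, n_trials):
    """Orientation (deg) at which 'clockwise' responses reach 50%.

    Simplification: unweighted linear regression of probit(p) on orientation.
    """
    zs = []
    for k in n_clockwise:
        # Clip proportions away from 0 and 1 so the probit stays finite.
        p = min(max(k / n_trials, 0.5 / n_trials), 1 - 0.5 / n_trials)
        zs.append(_STD_NORMAL.inv_cdf(p))
    n = len(orientations_deg)
    mx = sum(orientations_deg) / n
    mz = sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(orientations_deg, zs))
    sxx = sum((x - mx) ** 2 for x in orientations_deg)
    slope = sxz / sxx
    return mx - mz / slope  # where the fitted line crosses probit = 0

def induced_tilt(null_cw_flanks, null_ccw_flanks):
    """Half the difference between the null orientations for the two flank tilts."""
    return (null_cw_flanks - null_ccw_flanks) / 2

# Hypothetical experiment: seven target orientations, noise-free simulated observer.
orientations = [-3, -2, -1, 0, 1, 2, 3]
n_trials = 1000

def simulate_counts(true_null, sd=1.5):
    observer = NormalDist(true_null, sd)
    return [round(n_trials * observer.cdf(x)) for x in orientations]

null_cw = null_orientation(orientations, simulate_counts(+0.5), n_trials)
null_ccw = null_orientation(orientations, simulate_counts(-0.5), n_trials)
tilt = induced_tilt(null_cw, null_ccw)
```

Because the induced tilt is a difference between two null points, any constant response bias shared by the two flank conditions cancels, which is the sense in which the measure is bias free.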
In general, there were no important qualitative differences between the fovea and the periphery in this class of psychophysical studies. To assure ourselves that the pattern of attractive and repulsive tilt illusions exists also in the near periphery, one observer performed the experiments described above at an eccentricity of 4° (view distance: 1 m, line lengths: 30'). Care was exercised to factor out asymmetries depending on retinal areas by accumulating data across eight meridia.
RESULTS
Two-dimensional contextual maps
PHYSIOLOGY. The physiological experiments were designed to study the spatial distribution of excitatory and inhibitory surround interactions around a neuron's RF. After quantitatively mapping the center of the RF and finding the units' optimal orientations, we presented combinations of stimuli inside and outside the RF. The total stimulus set was composed of 82 stimuli positioned in a 9 × 9 grid centered on the RF (Fig. 2A). The dimensions of this grid were fixed and were not scaled to the size of individual RFs. The stimulus set was designed to minimize the number of stimuli required to map the RF and its surround and therefore minimize the time required to maintain isolation of the recorded units. All stimuli were presented at the optimal orientation for each cell under study. The stimulus set consisted of conditions where either the central target was presented alone, the target was presented in conjunction with a pair of flanks located symmetrically with respect to the RF center, or the pair of flanks were presented in isolation.
PSYCHOPHYSICS. The spatial segregation of excitatory and inhibitory contextual interactions at the neural level led us to search for a similar dichotomy at the level of visual perception. One type of contextual interaction that has been studied extensively at the psychophysical level is the tilt illusion. In the classical description of this illusion, the presence of tilted, flanking lines causes a target to appear tilted in a direction opposite to the orientation of the flanking lines, which is a "repulsive" effect (Gibson and Radner 1937; Westheimer 1990). We suspected that if opposing contextual interactions were segregated in different positions around the receptive field, a perceptual effect observed when a context is placed along a target line's axis might be reversed for contextual stimuli placed along the orthogonal axis. Furthermore, plotting the direction and magnitude of induced tilts with the flanks in different spatial positions should produce two-dimensional maps of contextual interactions similar to those seen in the physiological experiments.
Contrast dependency
PHYSIOLOGY. In the next set of experiments, we examined the effects of changing stimulus contrast on the pattern of contextual interactions. Initially, the contrasts of the target and flanks were changed simultaneously. Figure 7 shows the results of experiments performed at one recording site using stimuli at three different contrasts. The raw neural response to the five main stimulus configurations at each contrast is shown in Fig. 7A. At a contrast of 20%, the central target presented in isolation produced a small response from the cell, but this response was greatly enhanced by the addition of collinear flanks. When the central target was presented with side-by-side flanks, there was little change in the neuron's firing rate. At a contrast of 50%, collinear flanks increased the target-alone response by a much smaller amount, while side-by-side flanks were inhibitory. The experiments at 30% contrast show intermediate results. Both collinear facilitation and lateral inhibition were evident at this contrast.
PSYCHOPHYSICS. The separation distance between target and surround at which one could see the maximum attractive tilt effect could be changed by altering the relative contrast of the target and surround. When the contrast of the target was lowered to 15% and the surround maintained at high contrast, the optimum separation for obtaining the tilt effect was 20-24', and the effect was maintained out to 32' (Fig. 10A). This is quite different from the condition in which all three lines were at high contrast, where the maximal effect was obtained with an 8-10' separation and disappeared beyond 16' separation (Fig. 10B). In the same manner, for both the physiology and the psychophysics, manipulating contrast influenced the spatial extent over which contextual influences were exerted.
DISCUSSION
The principal conclusion that comes from this work is that contextual influences, both at the physiological and psychophysical levels, are not uniform but rather are highly dependent on the spatial positioning of the surrounding stimuli relative to the receptive field or to the target. While there are somewhat contradictory studies on the sign (facilitatory vs. inhibitory) and character of contextual interactions, our findings show that the RF surround is composed of both excitatory and inhibitory regions and that individual neurons are capable of displaying both types of interactions. We show further that the balance between facilitation and inhibition changes in a stimulus-dependent fashion.
The two levels of analysis (physiology and psychophysics) show that interactions of opposing sign are found in lobes located along orthogonal axes. Excitatory neural interactions and attractive tilt illusions were observed along the orientation axis of line segments, while inhibitory interactions and repulsive tilt illusions were observed at the sides of lines. The physiological and psychophysical interactions show similar dependencies on contrast, such that as one changes the relative contrast of center and surround, the facilitatory surrounds and attractive tilt influences extend farther from the central line.
One might go further to relate the psychophysical and neurophysiological parts of this study via a simple, population-coding model of orientation discrimination, which can explain how tilt illusions can arise from excitatory and inhibitory contextual interactions at the neural level (Fig. 11). The neural basis of the tilt illusion, as supported by population coding models, may be the pattern of contextual interactions shown here. Previous implementations of this model have suggested iso-orientation inhibition as a possible basis for repulsive interactions in the orientation domain (Gilbert and Wiesel 1990). Physiological experiments support the feasibility of this class of models (Gilbert and Wiesel 1990; Knierim and Van Essen 1992; Li and Li 1994; Li et al. 2000; Nothdurft et al. 1999).
As indicated in the model, the orientation tuning curves of neurons are sufficiently broad that a single oriented line activates cells optimally tuned to a wide range of orientation preferences. In this model, each neuron acts as a labeled line, signaling the presence of a stimulus at its own preferred orientation. The perceived orientation of a stimulus is derived from the activity of all neurons that are activated by that line. In this model, at the neural level, attractive tilts arise from iso-orientation facilitation and repulsive tilts arise from iso-orientation inhibition (Fig. 11, B and C). The flanking lines facilitate or inhibit neighboring neurons with similar orientation preferences, which skews the population vector in a manner that induces a tilt in one direction or another.
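The logic of such a model can be illustrated with a minimal Python sketch. It is not the implementation behind Fig. 11: the Gaussian tuning curves, the multiplicative gain centered on the flank orientation, the tuning widths, and the gain values of ±0.5 are all assumptions chosen for illustration. Each model neuron votes for its preferred orientation in proportion to its response, and the perceived orientation is read out as a population vector on doubled angles.

```python
import math

def circ_diff(a, b):
    """Signed orientation difference a - b in degrees, wrapped to [-90, 90)."""
    return (a - b + 90.0) % 180.0 - 90.0

def decoded_tilt(target, flank, gain, tuning_sd=20.0, gain_sd=20.0):
    """Decoded minus true target orientation (deg); positive means toward the flank
    orientation when the flank is tilted in the positive direction.

    gain > 0: flanks facilitate neurons tuned near the flank orientation.
    gain < 0: flanks inhibit them (gain must stay above -1).
    """
    s = c = 0.0
    for pref in range(180):  # labeled lines, one neuron per degree of preferred orientation
        drive = math.exp(-circ_diff(pref, target) ** 2 / (2 * tuning_sd ** 2))
        modulation = 1.0 + gain * math.exp(-circ_diff(pref, flank) ** 2 / (2 * gain_sd ** 2))
        r = drive * modulation  # flanks change the height of the response, not its tuning
        s += r * math.sin(math.radians(2 * pref))
        c += r * math.cos(math.radians(2 * pref))
    decoded = math.degrees(math.atan2(s, c)) / 2.0
    return circ_diff(decoded, target)

# Vertical target (90 deg) with flanks tilted by +10 deg.
attract = decoded_tilt(90, 100, +0.5)  # iso-orientation facilitation
repel = decoded_tilt(90, 100, -0.5)    # iso-orientation inhibition
neutral = decoded_tilt(90, 100, 0.0)   # no contextual modulation
```

Under these assumptions, facilitation pulls the decoded orientation toward the flank orientation (an attractive tilt), inhibition pushes it away (a repulsive tilt), and the unmodulated population decodes the target veridically.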
Whether the dominant influence of contextual interactions is a change in the height of orientation tuning curves, as used in this model, or also involves changes in the tuning curves' peak orientation or width requires further study. Once a population coding model of orientation is accepted (and there is little choice in view of the broad orientation tuning of neurons and the high precision of orientation perception of contours), one can think of several other ways of implementing interactions to yield the tilt illusion, but distinguishing between them requires detailed knowledge of the changes in parameters of orientation tuning curves. The model presented here lends plausibility to the idea that the opposing tilt effects can be directly mapped onto the excitatory and inhibitory contextual zones around the receptive field and provides a suggestion for why the patterns of interactions observed in both parts of the study were so similar.
In the physiological experiments, we found that the balance between excitation and inhibition in the RF was not fixed but was regulated in a dynamic, stimulus-dependent manner. Low contrast stimuli tended to invoke strong excitatory interactions and weak inhibition, while the opposite was true at high contrasts. Contrast did not seem to change the positioning of the excitatory and inhibitory subregions but instead seemed to alter their strength and dimensions. Excitatory contextual interactions were observed at both high and low contrasts as evident in the context maps in Fig. 7. At low contrasts, the excitatory drive came from outside the neuron's receptive field; the neuron's response to the target and flanks presented simultaneously was more than the sum of the responses to the individual stimuli. At high contrasts, the excitation came from within the neuron's minimum response field; the neuron's response to the target and flanks presented simultaneously was more than the response to the target alone but less than the sum of the responses to the individual stimuli. Thus high and low contrast stimuli lead to a similar pattern of excitatory interactions in the context maps, but the nonlinearity maps look quite different under the two stimulus conditions. Also, while strong excitatory interactions were only observed close to the RF when the contrast of the flanks was low, excitation could be seen over much longer distances for higher contrast flanks.
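The distinction drawn here between the two kinds of map can be made concrete with a short Python sketch. The definitions are inferred from the comparisons in the paragraph above and should be read as assumptions: the context value compares the combined response with the response to the target alone, and the nonlinearity value compares it with the sum of the responses to target and flanks presented separately. The response values are hypothetical.

```python
def context_value(r_both, r_target):
    """Change in response produced by adding the flanks to the target."""
    return r_both - r_target

def nonlinearity_value(r_both, r_target, r_flanks):
    """Departure of the combined response from the linear sum of the parts."""
    return r_both - (r_target + r_flanks)

# Hypothetical mean responses (spikes/s, spontaneous activity already subtracted).
low = {"r_target": 5.0, "r_flanks": 2.0, "r_both": 20.0}     # low contrast
high = {"r_target": 40.0, "r_flanks": 25.0, "r_both": 50.0}  # high contrast

low_context = context_value(low["r_both"], low["r_target"])
low_nonlin = nonlinearity_value(low["r_both"], low["r_target"], low["r_flanks"])
high_context = context_value(high["r_both"], high["r_target"])
high_nonlin = nonlinearity_value(high["r_both"], high["r_target"], high["r_flanks"])
```

With these illustrative numbers both context values are positive, while the nonlinearity value is positive at low contrast (supralinear summation) and negative at high contrast (sublinear summation), mirroring the pattern described in the text.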
Inhibitory contextual interactions were strongest at high contrasts. Under these conditions, there were lobes of strong inhibition along the orientation and orthogonal axes of the RF, which are consistent with the well-known properties of end-inhibition (Hubel and Wiesel 1965) and side-band inhibition (Bishop et al. 1973), respectively. The current results lend further credence to the idea that end-inhibition is contrast dependent. High contrast stimuli invoke strong end-inhibition, while at low contrasts, the same regions at the ends of RFs are excitatory and little or no end-inhibition is observed (Kapadia et al. 1999; Sceniak et al. 1999).
While stimulus contrast is one factor that can influence the strength
and sign of contextual interactions, there are likely to be many
others. One factor that has been implicated in other studies is the
complexity of the environment in which a stimulus is presented
(Kapadia et al. 1995, 1999). When a high contrast stimulus is embedded in a complex surround, the neural response to that
stimulus is often much less than the response to the same stimulus
presented in isolation. The suppression induced by the complex surround
also alters the pattern of contextual interactions for the stimulus. In
effect, the suppression makes the neuron behave as if the central
stimulus is at a lower contrast and excitatory contextual interactions
become prominent at even the highest contrasts tested. This finding
suggests that under real-world conditions, where objects are likely to
be embedded in complex scenes, excitatory contextual interactions are
likely to be prominent at all levels of stimulus contrast not just at
the relatively low contrasts found using the simple stimuli used here.
A strong candidate for the anatomical substrate of the contextual interactions observed here is the network of intrinsic, long-range horizontal connections formed by pyramidal cells in V1 (Gilbert and Wiesel 1979, 1983; Martin and Whitteridge 1984; Rockland and Lund 1982), although feedback from extrastriate areas may also play a role. Long-range horizontal connections can extend over distances as large as 6-8 mm and tend to connect cells with similar orientation preferences (Bosking et al. 1997; Gilbert and Wiesel 1989; Kisvarday et al. 1997; Ts'o et al. 1986). The extent of long-range connections at the anatomical level correlates well with the extent of contextual interactions observed in the psychophysical and physiological experiments. At an eccentricity of 4°, the average eccentricity of the RFs in this study, the cortical magnification factor is 2.5 mm/° (Dow et al. 1981). This means that horizontal connections can integrate information over 2-3° of visual space, which is similar to the extent of interactions we observed here.
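The conversion behind this estimate can be written out explicitly. Dividing the 6-8 mm extent of the horizontal connections by the magnification factor of 2.5 mm/° gives roughly 2.4-3.2°, in line with the 2-3° quoted above. The calculation assumes, as a simplification, that the magnification factor is constant over the cortical region spanned by the connections:

```latex
\[
\frac{6\ \text{mm}}{2.5\ \text{mm}/^{\circ}} = 2.4^{\circ},
\qquad
\frac{8\ \text{mm}}{2.5\ \text{mm}/^{\circ}} = 3.2^{\circ}
\]
```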
Another important aspect of the long-range horizontal connections is
their ability to provide both excitatory and inhibitory inputs to their
postsynaptic neurons. Since the long-range connections arise from
glutamatergic pyramidal neurons, they provide excitatory input directly
but can also exert inhibitory effects through a disynaptic circuit
involving inhibitory interneurons (McGuire et al. 1991).
The dynamic change in long-range inputs from excitatory to inhibitory
as a function of stimulus contrast is reminiscent of similar effects
observed in intracellular recordings in cortical slices, where synaptic
potentials evoked by a stimulating electrode change from excitatory to
inhibitory depending on the intensity of electrical stimulation
(Hirsch and Gilbert 1991; Weliky et al. 1995). In these experiments, low intensity stimulation evoked excitatory synaptic potentials, while higher intensity stimulation also
activated inhibitory interneurons, resulting in strong, negative synaptic potentials that overwhelmed the smaller, excitatory
potentials. Changing the contrast of a visual stimulus is likely to
invoke similar cortical mechanisms as changing the intensity of an
electrical stimulus.
The experiments in this study help delineate the basic structure
of contextual interactions but likely reveal only a component of the
full complexity of the interactions. The stimulus set used here was
chosen to minimize the number of stimuli because of the difficulty in
maintaining stable neural recordings in alert animals for long periods
of time. For example, the stimulus imposes a mirror symmetry on the
data since each experiment uses a pair of flanks instead of a single
flank. Experiments with single flanks would be able to reveal
asymmetries in the distribution of contextual interactions in
individual cells. Also, the influence of a surround stimulus on a
central target is dependent on the presence of additional contextual
elements (Kapadia et al. 1995, 1999), and these
cascading nonlinearities may explain higher-order aspects of visual
processing. The experiments in this study were also limited to
iso-orientation interactions; to characterize fully the role of V1
cells in analyzing complex visual scenes, one must study interactions
between stimuli that differ in orientation.
A simple, conceptual model of how the observed pattern of contextual interactions might form the neural basis of contour integration and surface segmentation is shown in Fig. 12. In this model, the saliency of a stimulus element results from the neural response to that feature relative to other elements in the surround. Collinear, excitatory interactions enhance the neural activity of stimulus elements that form smooth contours, resulting in enhanced saliency of these elements relative to other features in their surrounds. Lateral inhibitory interactions suppress the neural response of stimulus elements whose neighbors have the same orientation. The loss of this suppression in areas where there is a change in orientation results in enhanced saliency of the texture boundary.
The results of the current study suggest that the contextual influences may allow cells at early stages in cortical visual processing to mediate complex processes in intermediate level vision. The spatial segregation of excitatory and inhibitory inputs may allow individual V1 neurons to participate in multiple perceptual tasks that require opposing neural interactions. Our results suggest further that there is no clear separation between the stages of visual processing that serve for the analysis of simple stimulus attributes, such as orientation, and those involved in higher order mechanisms of visuospatial integration.
ACKNOWLEDGMENTS
We thank S. Kane for help with eye coil implantation and A. Glatz and J. Lopez for expert technical assistance.
This work was supported by National Institutes of Health Grants EY-07968 to C. D. Gilbert and MH-11394 to M. K. Kapadia.
FOOTNOTES
Address for reprint requests: C. D. Gilbert, The Rockefeller University, 1230 York Ave., New York, NY 10021 (E-mail: gilbert{at}rockefeller.edu).
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 21 December 1999; accepted in final form 22 June 2000.
REFERENCES