INTRODUCTION
Temporal coding can, in principle, provide a way that individual visual neurons can signal more than one stimulus attribute simultaneously and at least partially independently (McClurkin and Optican 1996; McClurkin et al. 1991a,b; Purpura et al. 1993). We previously have shown (Purpura et al. 1993; Victor and Purpura 1996a) that in both simple and complex cells, temporal coding contributes significantly to the representation of contrast, spatial frequency, and orientation, but the relationship of spatial phase to temporal coding remains largely unexplored.
The extent to which the spatial phase of a grating influences a neuron's response plays a crucial role in the distinction between simple and complex receptive fields (Skottun et al. 1991; Spitzer and Hochstein 1985a,b). In qualitative terms, a simple cell's characteristic modulated response to a moving grating is consistent with linear combination of signals from subregions of the receptive field, leading to a marked dependence of responses on spatial phase. In contrast, a complex cell's characteristic steady elevation of firing rates in response to a moving grating is typically considered to be indicative of additive combination of signals across an array of rectifying subunits (Spitzer and Hochstein 1985b) and would lead to responses that are independent of spatial phase.
Thus one might expect that in simple cells, the attributes of contrast, orientation, and spatial frequency can be signaled only if spatial phase is fixed, whereas in complex cells, these attributes are signaled in a phase-invariant manner. However, this conclusion (and indeed, the classification of cells into simple and complex types) is based on an implicit assumption that a neuron's response is characterized by the number of spikes in some period of time (i.e., a rate, or spike count, code). It is clear that this assumption is not justified (McClurkin and Optican 1996; McClurkin et al. 1991a,b; Purpura et al. 1993; Victor and Purpura 1996a). Conceivably, simple cells might be able to exploit temporal coding to signal stimulus attributes in a manner that is not confounded by spatial phase even though the firing rate envelope might be strongly dependent on spatial phase. Conversely, temporal coding of multiple stimulus attributes in the discharge of a complex cell (Victor and Purpura 1996a) might be confounded by changes in spatial phase unless the putative subunits that make up a complex cell's receptive field (Spitzer and Hochstein 1985b) combine in a temporally coherent fashion.
From a functional point of view, spatial phase is critically important in extracting image features (Field 1987; Morgan et al. 1991; Oppenheim and Lim 1981; Shapley et al. 1990; Tadmor and Tolhurst 1992; Victor and Conte 1996). It is natural to assume that spatial phase, because of its close relationship to position, is encoded by the locus of activity across a population of neurons. However, to the extent that neurons may be regarded as local Fourier analyzers (De Valois and De Valois 1988; De Valois et al. 1985), spatial phase and spatial position are distinct entities (Ohzawa et al. 1996). From the point of view of local Fourier analyzers, features such as lines, edges, and smooth gradations are superpositions of grating patches in specific relative phases. Changing the phase but not the position of the patches changes the nature of the feature, whereas changing their position (but not their relative phases) translates the feature. Thus neural representation of the nature of a feature and its location might require more than a simple spatial code.
In our previous investigations (Victor and Purpura 1996a, 1997a) in V1, and in the work investigating the temporal encoding of gratings in V1 and V2 (K. P. Purpura and L. M. Optican, unpublished results), spatial phase was not explicitly examined, in part because of the limited control of eye position available in awake behaving animals (Creutzfeldt et al. 1987). In this paper, we analyze the temporal structure of responses of single neurons in V1 in anesthetized, paralyzed animals (conditions under which spatial phase is controlled precisely) to address the issues raised above.
A portion of this work was presented at the 1996 meeting of the Society for Neuroscience, Washington, DC (Victor and Purpura 1996b).
METHODS
Physiological methods
We recorded single-unit activity in the parafoveal representation in cortical area V1 of 10 anesthetized, paralyzed macaque monkeys. Single units (25) were isolated and stable recordings maintained for sufficient time (4-6 h) for the studies reported here. All procedures involving the animals were performed in accordance with National Institutes of Health guidelines for the care and use of laboratory animals.
GENERAL PREPARATION.
Anesthesia was induced with ketamine 15 mg/kg im potentiated by xylazine 2 mg/kg im (Rompun, Haver), supplemented as needed by methohexital boluses (0.5-1 mg/kg iv) during the preparatory surgery. Pupils were dilated with atropine 1% eyedrops, and flurbiprofen 2.5% (Ocufen, Allergan) was instilled as prophylaxis against ocular inflammation. Incision sites were prepped with betadine and infiltrated with xylocaine 1%. Venous access was obtained via bilateral femoral vein cannulation. The femoral artery was catheterized for continuous blood pressure monitoring, and the trachea was cannulated for mechanical ventilation. After transfer of the animal to a stereotaxic frame, anesthesia was maintained with sufentanil (Sufenta, Janssen), 3 µg/kg iv bolus, 1-6 µg·kg⁻¹·h⁻¹ iv. A few animals were refractory to sufentanil at 6 µg·kg⁻¹·h⁻¹, and for these animals, urethan (400-500 mg/kg iv loading, 200 mg/kg iv every 12 h) was substituted for sufentanil. Dexamethasone 1 mg/kg iv was administered at the start of the experiment and daily thereafter to reduce cerebral edema. Procaine penicillin G 25,000 U/kg im and benzathine penicillin G 25,000 U/kg im (Pen BP-48, Pfizer, New York, NY) were administered as prophylaxis against surgical infection. Gentamicin (5 mg/kg im daily) was given if fever, hypoxia, increased tracheal secretions, or chest auscultation suggested the development of infection. Every 12-24 h, the corneas were irrigated with Ringer and flurbiprofen was instilled. Local antibiotic (bacitracin, neomycin, and polymyxin B ointment) was applied to the conjunctivae if a discharge was present.
Eyelids were retracted with 6-0 chromic gut sutures and corneas were protected with contact lenses. The dura overlying V1 was exposed via a careful craniotomy centered at 12-15L, 12-15P. A portion of the dura just posterior to the lunate sulcus was removed under an operating microscope. After these surgical procedures, paralysis was induced with pancuronium bromide 1 mg iv bolus, 0.2-0.4 mg·kg⁻¹·h⁻¹ iv, and anesthesia with sufentanil or urethan was maintained. Core temperature, monitored with a rectal thermistor, was maintained at 37°C with a thermostatically controlled heating blanket. Ventilator settings were adjusted to maintain an end-expiratory CO2 at 30-35 mmHg. Supplemental oxygen was administered every 6 h, and electrocardiograms and oxygenation were monitored continuously. Hydration (lactated Ringer solution with 5% glucose, 2-3 ml·kg⁻¹·h⁻¹) was maintained throughout the experiment.
The positions of the foveae and optic disks were mapped onto a tangent screen with a modified hand-held fundus camera or direct ophthalmoscope. Refraction was optimized for the viewing distance of 114 cm with trial lenses as determined by slit retinoscopy and confirmed or refined if necessary by optimizing responses of individual cortical cells to drifting gratings. Artificial pupils (3-mm diam) were centered in front of the natural pupils.
SINGLE-UNIT RECORDING AND PRELIMINARY CHARACTERIZATION.
An Ainsworth tungsten-in-glass microelectrode (typical resistance, 2 MΩ) was advanced through a small durotomy until the action potential of a single neuron was discriminated reliably by a window discriminator (Winston Electronics, Millbrae, CA) either alone or augmented by one or more analog "hoops" (Tucker-Davis Technology, Gainesville, FL) that placed amplitude and latency criteria on later phases of the spike waveform. The receptive field was mapped onto a tangent screen and ocular dominance was determined by auditory criteria. In all subsequent recording, the nondominant eye was occluded. A first-surface mirror was adjusted to align the receptive field with the center of a computer-driven CRT display (mean luminance 150 cd/m² with a green phosphor, subtending 4 × 4° at the viewing distance of 114 cm). This display system, a modification of the system described by Milkman et al. (1980), provides for a 256 × 256-pixel raster at 270 Hz with look-up table correction of intensity-voltage nonlinearities. Although every attempt was made to align the center of the receptive field with the center of the display (spatial phase 0), it is recognized that this alignment is to some extent arbitrary, and our analysis strategy does not depend on absolute knowledge of spatial phase.
The general ranges of spatial frequency tuning, orientation tuning, and temporal tuning were estimated by rapid auditory assessment of the response to sinusoidal drifting gratings. Computer-controlled stimulation and recording then was begun. Orientation tuning was determined by responses to gratings at each of 16 orientations (equally spaced in steps of 22.5° or, for narrowly-tuned units, 11.25°), presented at a contrast [(Lmax − Lmin)/(Lmax + Lmin)] of 0.5-1.0, the spatial frequency and temporal frequency of which were determined by the auditory assessment. Spatial frequency tuning was determined by responses to gratings at each of eight spatial frequencies (typically 0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0, and 8.0 cycles/deg) at a contrast of 0.5-1.0, the orientation of which was determined by the quantitative orientation tuning run, and the temporal frequency of which was determined by the auditory assessment. In most units, a contrast response function also was determined by responses to drifting gratings at contrasts of 0.0625, 0.125, 0.25, 0.5, and 1.0 (optimal orientation and spatial frequency, temporal frequency determined by the auditory assessment), and temporal tuning was assessed by responses to 1-, 2-, 4-, and 8-Hz drifting gratings at the optimal orientation, spatial frequency, and contrast. In all of these tuning runs, stimuli were presented in randomized order in four to eight blocks. Each stimulus was presented continuously for 11 s, the last 10 s of which were Fourier analyzed at the stimulus frequency and its second harmonic to quantify responses. In a few cases, the quantitative characterization led to tuning functions for spatial or temporal frequency that differed substantially from the auditory assessment. In these cases, the quantitative characterization was repeated with these modified values.
Cells were classified as simple or complex (Skottun et al. 1991) on the basis of whether their response to a drifting grating of high spatial frequency was predominantly a modulated response at the fundamental frequency (simple cells) or elevation of the mean (complex cells). Confidence limits for Fourier coefficients were determined by the T2circ statistic (Victor and Mast 1991).
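As a rough illustration of the Fourier measures used in the tuning runs and in the simple/complex classification, the following Python sketch computes the mean rate (F0) and the amplitudes at the stimulus frequency (F1) and its second harmonic (F2) directly from spike times. The function names, the use of the F1/F0 ratio, and the threshold of 1.0 are assumptions made for illustration; the text only states that cells were classified by whether the modulated response or the mean elevation predominated, and this sketch does not subtract a spontaneous rate from F0.

```python
import numpy as np

def fourier_components(spike_times_s, stim_freq_hz, duration_s):
    """Return (F0, F1, F2) in spikes/s for spike times within [0, duration_s)."""
    t = np.asarray(spike_times_s, dtype=float)
    f0 = t.size / duration_s  # mean firing rate over the analysis window

    def amplitude(harmonic):
        # Each spike contributes a unit phasor at the analysis frequency.
        phasors = np.exp(-2j * np.pi * harmonic * stim_freq_hz * t)
        return 2.0 * abs(phasors.sum()) / duration_s

    return f0, amplitude(1), amplitude(2)

def classify_cell(f0, f1, ratio_threshold=1.0):
    """Label a cell from the F1/F0 ratio (threshold is an illustrative assumption)."""
    if f0 == 0:
        return "unresponsive"
    return "simple" if f1 / f0 > ratio_threshold else "complex"

# Hypothetical 10-s analysis window with a 4-Hz drifting grating.
locked = np.arange(40) * 0.25    # one spike per stimulus cycle
steady = np.arange(1000) * 0.01  # unmodulated 100 spikes/s discharge
print(classify_cell(*fourier_components(locked, 4.0, 10.0)[:2]))
print(classify_cell(*fourier_components(steady, 4.0, 10.0)[:2]))
```

A spike train locked to one phase of each cycle gives F1 larger than F0, whereas an unmodulated discharge gives F1 near zero.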
LESIONS, EUTHANASIA, AND HISTOLOGY.
At locations along the electrode track corresponding to recording sites and at an additional location at the end of the track, lesions were made by current passage (3 µA × 3 s). At the conclusion of the experiment, the animal was killed by rapid injection of a barbiturate (>15 mg/kg methohexital iv), exsanguinated via perfusion with phosphate-buffered saline, and perfused with 4% paraformaldehyde in phosphate-buffered saline. Cryostatic sections (40 µm) were stained by the Nissl method and examined under light microscopy to confirm track location in V1. Nearly all units were in granular and supragranular layers.
EXPERIMENTAL DESIGN.
Experiments to analyze the signaling of contrast, spatial frequency, and orientation were organized as diagrammed in Fig. 1. Stimuli consisted of transiently-presented full-field (4 × 4°) stationary sinusoidal luminance gratings. Stimuli were organized into runs of 16 grating presentations (237-ms duration, 710-1,026 ms between presentations), and there were 10-s gaps between runs. Between presentations of gratings within a run, and in the gaps between the runs, the display returned to a uniform field at the mean luminance. Within each run, all gratings had a fixed contrast, spatial frequency, and orientation (varied across runs as described later), and spatial phase was varied across 16 equally spaced values (in steps of 112.5 or 90°). As shown in Fig. 1, a sequence of 16 steps of 112.5° phase increments covers the same phases as would steps of 22.5° but in an order that reduces the sense of apparent motion.
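The claim that 16 steps of 112.5° visit the same phases as steps of 22.5° follows because 112.5° = 5 × 22.5° and 5 shares no common factor with 16. A minimal Python check (the function name and the starting phase of 0° are illustrative assumptions):

```python
def phase_sequence(step_deg=112.5, n_steps=16, start_deg=0.0):
    """Spatial phases (deg) visited by n_steps successive increments of step_deg."""
    return [(start_deg + k * step_deg) % 360.0 for k in range(n_steps)]

sequence = phase_sequence()
# Presentation order jumps around the cycle rather than creeping in 22.5-deg steps.
print(sequence[:4])  # [0.0, 112.5, 225.0, 337.5]
# Sorted, the sequence is exactly the 16 multiples of 22.5 deg.
print(sorted(sequence) == [k * 22.5 for k in range(16)])  # True
```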

FIG. 1. Layout of a spatial frequency experiment. Stimuli were presented transiently for 237 ms and grouped into runs of 16 presentations that included all spatial phases once. From run to run, spatial frequency and the initial spatial phase were varied in a balanced fashion. In other experiments, contrast, orientation, or two of the parameters (contrast, spatial frequency, orientation) were varied from run to run, along with initial spatial phase.
In each experiment, one or two of the stimulus parameters (contrast, spatial frequency, and orientation) were varied across runs. When orientation or spatial frequency was varied, it was allowed to assume the value at which the drifting grating response was maximal and two or more values that were clearly away from the peak. Spatial parameters that were not varied (i.e., orientation or spatial frequency) were held fixed at the values at which the grating response was maximal. Contrast was varied across a range consistent with the range at which the unit responded in the preliminary runs or was maintained at 1.0 for units with low contrast sensitivity. This strategy, rather than an attempt to cover densely the entire range of sensitivity of the neuron, was adopted to limit the number of different stimuli and thus reduce the upward bias of information-theoretic measures of clustering (Carlton 1969; Treves and Panzeri 1995). A typical experiment might thus consist of varying the contrast between two values (e.g., 0.5 and 1.0) and varying orientation across three values (e.g., peak, peak + 22.5°, peak + 45°). For each of these six contrast × orientation pairs, there were 16 runs (to provide each spatial phase with the opportunity to be presented first). This block of 6 × 16 = 96 unique runs, presented in randomized order, thus contained 16 presentations of each of the six grating stimuli at each of 16 spatial phases. For each run, the initial spatial phase and sequence of spatial phases was chosen in a pseudorandom fashion, so that for each contrast, spatial frequency, and orientation, each spatial phase was presented exactly the same number of times and presented in each serial position within runs exactly the same number of times. This arrangement was designed to counterbalance any effects of contrast adaptation. Several (typically 2-4) repetitions of the block of unique runs were obtained, with the order of runs within each block randomized. Runs were aborted if spike discrimination became unreliable or if there was a major change in responsiveness. Spike times were recorded with a resolution of 1.2 ms (1/3 of the frame time) by the DEC 11/93 computer that sequenced the runs and controlled the visual stimulator.
If recording stability permitted, we attempted to perform analyses of all three attributes (contrast, spatial frequency, and orientation) in each unit either individually or in pairs. Data sets collected with two attributes varied were "sliced" into data sets for analysis of coding of a single attribute, but only the optimal value of the second attribute was considered for this purpose. In some cases, we recorded two independent data sets for a single attribute (e.g., contrast covarying with spatial frequency in 1 data set and covarying with orientation in another), which resulted in two independent data sets for the overlapping attribute in that unit. A tally of the experiments performed and data sets analyzed is presented in Table 1.
In a few experiments only four spatial phases (in steps of 90°) were examined. These experiments are included with the 16-phase experiments because reanalysis of subsets of the 16-phase experiments restricted to 4 phases yielded results similar to results of the analysis of the full 16-phase experiments.
Data analysis
Our goal is to determine the extent to which spike trains elicited by transient presentations of static gratings can represent the contrast, spatial frequency, or orientation of the grating and to what extent this representation depends on spatial phase. We wanted to minimize assumptions about how stimulus attributes might be represented, both in terms of the relevant features of a spike train, and the mapping between these features and the parameterization of the stimulus. For example, the attribute of contrast naturally runs monotonically from 0 to 1, but one cannot assume that spike counts represent this attribute in a linear fashion, and there may be a contribution to the representation of contrast from bursts, latency changes, or other temporal features. Representation of spatial attributes raises additional issues. For example, the attribute of spatial frequency runs monotonically from low to high, but a typical neuron produces the largest response at an intermediate spatial frequency, and responses to spatial frequencies significantly below or above this optimum might have far fewer spikes. The time courses of these off-peak responses might (or might not) have consistent differences. Thus although in a formal sense spatial frequency is a monotonic variable, there is certainly no reason to assume that this attribute is represented in a "linear" fashion, and it is even unclear whether it is represented in a monotonic fashion.
To minimize assumptions concerning the neural representation of the stimulus space, we based our analysis on the metric-space approach we recently introduced (Victor and Purpura 1996a, 1997a) and describe briefly here. We consider a series of candidates for the notion of a "metric" (distance or dissimilarity) between spike trains. Our primary assumption is that if a candidate metric reflects the manner in which stimuli are represented by neural discharges, then distances between individual responses to the same stimulus will be small, whereas distances between individual responses to distinct stimuli will be large. Thus for each candidate metric, the analysis breaks into two stages: a calculation of distances between response pairs and an assessment of the extent to which these distances indicate stimulus-dependent clustering. Each metric is a way of comparing individual responses with each other; the clustering calculation examines the relationship between stimuli and these individual responses.
METRICS.
The metrics we consider include comparisons based solely on the number of spikes (called the "spike count" metric, Dcount), as well as a family of distances (parameterized by a quantity q) that are sensitive to the temporal pattern of the spikes ("spike time" metrics, Dspike[q]). The parameter q (s⁻¹) indicates the sensitivity of the distance to the precise timing of spikes. For spike trains to be seen as similar in the sense of Dspike[q], they must have a similar number of spikes and the times of these spikes must agree to within 1/q. More precisely, the distance between two spike trains Sa and Sb in the sense of Dspike[q] is defined as the minimal cost required to transform Sa into Sb via a sequence of any of the following transformations: insertion of a spike, which entails a cost of 1; deletion of a spike, which entails a cost of 1; and shifting a spike by an amount of time Δt, which entails a cost of qΔt. For q = 0, the metric Dspike[q] collapses into the spike count metric Dcount. The first step in data analysis thus consists of calculation of the distances between all pairs of individual responses, for each of the candidate metrics (Dcount and Dspike[q], for q ranging from 1 to 512 s⁻¹ in octave steps).
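The minimal-cost definition of Dspike[q] can be evaluated with a dynamic-programming recursion of the kind used for edit distances between strings. The sketch below is one such implementation, not a transcription of the authors' code; the function name is hypothetical, spike times are assumed to be sorted and in seconds, and Δt is taken as the absolute size of the shift.

```python
def spike_time_distance(train_a, train_b, q):
    """Minimal cost of transforming train_a into train_b.

    train_a, train_b: sorted spike times in seconds; q: cost per second of shift (1/s).
    Inserting or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    na, nb = len(train_a), len(train_b)
    # cost[i][j] is the distance between the first i spikes of train_a
    # and the first j spikes of train_b.
    cost = [[0.0] * (nb + 1) for _ in range(na + 1)]
    for i in range(na + 1):
        cost[i][0] = float(i)  # delete i spikes
    for j in range(nb + 1):
        cost[0][j] = float(j)  # insert j spikes
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            cost[i][j] = min(cost[i - 1][j] + 1.0,       # delete a spike
                             cost[i][j - 1] + 1.0,       # insert a spike
                             cost[i - 1][j - 1] + shift)  # shift a spike
    return cost[na][nb]

# q values in octave steps, as in the text, plus q = 0 for the spike count metric.
q_values = [0] + [2 ** k for k in range(10)]  # 0, 1, 2, ..., 512 (1/s)
```

With q = 0 shifts are free, so the result is the difference in spike counts; with large q it is cheaper to delete and reinsert a spike (cost 2) than to shift it by more than 2/q.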
STIMULUS-DEPENDENT CLUSTERING.
The second step of the analysis is a determination of the extent to which each metric induces stimulus-dependent clustering. The experimental set-up defines a set of stimulus classes s1, s2, . . . , sC (e.g., one for each spatial frequency). Based solely on the calculated pairwise distances, one can cluster the recorded spike trains into response classes r1, r2, . . . , rC (Victor and Purpura 1996a, 1997a). In essence, the clustering algorithm puts each spike train into the class that corresponds to the stimulus that elicited the closest set of observed responses in the other trials. Application of the clustering algorithm to each response yields a partition of the Ntot observed spike trains into an array N(s_α, r_β) that tallies the number of times that a response in class r_β was elicited by a stimulus in class s_α.
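The classification step can be sketched in Python as follows. This is a simplified stand-in rather than the published procedure: it assumes that "closest set of responses" means the smallest arithmetic-mean distance to the other trials of each stimulus class, and it breaks ties by taking the first minimum. The cited papers (Victor and Purpura 1996a, 1997a) define the exact averaging and tie-handling rules. The function name and argument layout are hypothetical.

```python
import numpy as np

def confusion_matrix(distances, stim_labels, n_classes):
    """Tally N[s, r]: responses to stimulus class s assigned to response class r.

    distances: (n_trials, n_trials) array of pairwise spike-train distances.
    stim_labels: integer stimulus class (0 .. n_classes - 1) of each trial.
    """
    distances = np.asarray(distances, dtype=float)
    stim_labels = np.asarray(stim_labels)
    tally = np.zeros((n_classes, n_classes))
    for i in range(stim_labels.size):
        mean_dist = np.full(n_classes, np.inf)
        for c in range(n_classes):
            members = stim_labels == c
            members[i] = False  # compare only with the other trials
            if members.any():
                # Assumption: arithmetic mean of distances to the class members.
                mean_dist[c] = distances[i, members].mean()
        tally[stim_labels[i], np.argmin(mean_dist)] += 1
    return tally
```

For example, four trials whose within-class distances are 1 and between-class distances are 5 yield a purely diagonal tally.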
Maximal stimulus-dependent clustering corresponds to an array N(s_α, r_β) that is nonzero only on the diagonal; that is, no stimuli are misclassified. The other extreme, an absence of stimulus-dependent clustering, corresponds to an array N(s_α, r_β) that is randomly filled. Between these extremes, the array N(s_α, r_β) is larger on the diagonal than off, corresponding to a situation in which individual responses to each stimulus form partially overlapping clouds (as illustrated in Victor and Purpura 1997a, Figs. 9 and 13). A dimensionless quantity to quantify clustering is the "transmitted information" H (Abramson 1963) of the matrix N(s_α, r_β). H is given by
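H = \frac{1}{N_{tot}} \sum_{\alpha,\beta} N(s_\alpha, r_\beta) \left[ \log N(s_\alpha, r_\beta) - \log \sum_{a} N(s_a, r_\beta) - \log \sum_{b} N(s_\alpha, r_b) + \log N_{tot} \right]   (1)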
where logarithms are taken to the base 2. Perfect clustering of C equally probable stimulus classes corresponds to H = log C, and random clustering corresponds to H = 0. We stress that we interpret the transmitted information H merely as an index of stimulus-dependent clustering. The calculation of H is in no way intended to be optimal (for this would entail additional assumptions concerning the nature of stimulus encoding and response variability) nor to reflect how biological processes extract information from spike trains.
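A minimal Python sketch of the transmitted information of a tally matrix, written in the equivalent probability form (each cell's joint probability compared with the product of its row and column totals). The function name is hypothetical; empty cells are skipped because they contribute nothing to the sum.

```python
import numpy as np

def transmitted_information(tally):
    """Transmitted information (bits) of a stimulus-by-response tally matrix."""
    tally = np.asarray(tally, dtype=float)
    joint = tally / tally.sum()                  # joint probability of (s, r)
    p_stim = joint.sum(axis=1, keepdims=True)    # row totals
    p_resp = joint.sum(axis=0, keepdims=True)    # column totals
    independent = p_stim @ p_resp                # product of marginals
    filled = joint > 0                           # skip empty cells
    return float(np.sum(joint[filled] * np.log2(joint[filled] / independent[filled])))

print(transmitted_information([[8, 0], [0, 8]]))  # 1.0 (perfect clustering, C = 2)
print(transmitted_information([[4, 4], [4, 4]]))  # 0.0 (no clustering)
```

The two printed cases correspond to the limits stated in the text: H = log C for perfect clustering of C equally probable classes and H = 0 for random clustering.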

FIG. 9. Analysis of encoding of contrast for the data set of Fig. 8. Hpooled and Hindiv are plotted as in Fig. 3. Analyses are performed on the responses restricted to the first 100 ms (A), the first 256 ms (B), and the first 473 ms (C).
BIASES DUE TO FINITE SAMPLES.
For finite data samples, the information estimate H given by Eq. 1 is upwardly biased (Carlton 1969; Panzeri and Treves 1996; Treves and Panzeri 1995). Thus we estimated conservatively (Panzeri and Treves 1996) the upward bias of the estimate (Eq. 1) for H by repeating the above calculations for synthetic data sets that consisted of a random reassignment of the observed responses to the stimulus categories. The mean of 10 such calculations will be denoted by H0, and all comparisons that we present will be based on the empirically corrected value H − H0, rather than H. A formula that asymptotically estimates the upward bias has recently been developed (Panzeri and Treves 1996; Treves and Panzeri 1995). Strictly speaking, this formula is not applicable here, because our clustering procedure violates the "independent-binning" assumption required for its derivation. Nevertheless, for a few example data sets, our empirical estimate of the bias by resampling and the estimate given by the analytic formula were similar. Finally, the similarity of results across 16- and 4-phase data sets was further evidence that our results were not merely due to sample-size biases in the estimates of H.
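The resampling correction can be sketched as below. The helper `info_fn` is a hypothetical callable that takes one assignment of stimulus labels to the recorded responses and returns H for that assignment (in the full analysis it would rerun the distance-based clustering). The seed and function names are illustrative assumptions; only the use of 10 reassignments comes from the text.

```python
import numpy as np

def bias_corrected_information(info_fn, stim_labels, n_shuffles=10, seed=0):
    """Return (H - H0, H, H0), with H0 the mean H over shuffled label assignments."""
    rng = np.random.default_rng(seed)  # fixed seed only for reproducibility
    labels = np.asarray(stim_labels)
    h = info_fn(labels)
    # Randomly reassign the observed responses to the stimulus categories.
    h_shuffled = [info_fn(rng.permutation(labels)) for _ in range(n_shuffles)]
    h0 = float(np.mean(h_shuffled))
    return h - h0, h, h0
```

Because the shuffled data sets keep the same responses and the same number of trials per class, H0 reflects how much apparent clustering arises from the finite sample alone.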
The measured value of H does not reflect the intrinsic coding capacity of the neuron but rather depends on the range of stimulus parameters explored. However, bias in the estimate of H increases in approximate proportion to the number of stimulus categories (Treves and Panzeri 1995). Because the overall aim of the study is to determine whether or not representation of stimulus parameters is disrupted by spatial phase, we chose a wide, but sparsely sampled, range of stimulus parameters (as described earlier) to make H large but to keep its bias small.
RELEVANCE OF H.
The clustering measure H specifies the extent to which stimuli that differ in one or more attributes lead to responses that have systematic differences. That is, statistically significant values of H imply a systematic representation of a stimulus attribute in the spike train (i.e., that it has been encoded) but do not necessarily imply that the visual system has the capacity to decode it.
However, the biological relevance of H is no more tenuous than measures based on spike counts and histograms. H is influenced by both spike counts and, to an extent that depends on q, temporal patterns of spikes. Overall firing rate is known to be biologically relevant (Salzman et al. 1990), but direct electrical stimulation necessarily induces a change in the temporal pattern of spikes as well. Conversely (Roelfsema et al. 1994), there are functional correlates associated with changes in temporal pattern of spikes, even when there is no change in overall firing rate.
The reader might be concerned that, because of the apparently sophisticated mathematical operations, this approach is unlikely to be relevant to neurophysiologic processes. But the sophistication of the mathematics does not reflect an assumption that the nervous system performs analogous calculations; rather, it is required by a loosening of assumptions about how neurons work. The relationship of our approach to traditional approaches like counting spikes and accumulating histograms is similar to the relationship of nonparametric statistics (such as the median) to parametric statistics (such as the arithmetic mean). The median is harder to compute than the arithmetic mean and cannot be described as readily in terms of the elementary mathematical operations. However, the arithmetic mean makes the assumption that the measured values of the parameter have a linear relationship to the quantities of interest, whereas the median only assumes that rank order is significant. In this context, the use of spike counts and response histograms carries the implicit assumption of a linear, additive structure for the response measure. Although this might be adequate for certain idealized neuron models, real neurons are not linear and are sensitive to coincidences (Softky and Koch 1993) among their inputs. The relevant time scale of these coincidences (and thus of the temporal structure of spike trains) may range from submillisecond to many milliseconds, depending on the biophysical mechanisms involved (Bourne and Nicoll 1993; Softky 1994). Our approach explicitly recognizes these possibilities and uses appropriate mathematical methods, including nonparametric elements, to address them. In view of the ongoing debate concerning the relevance of detailed firing patterns (Shadlen and Newsome 1995; Softky 1995), we consider this to be an appropriately conservative strategy.
COMPARISON WITH ASSESSMENT OF TUNING.
To assess tuning, one chooses a measure of response size, such as the spike count or a particular Fourier component, and asks how this measure depends on one or more stimulus parameters. Here we examine a set of measures of the dissimilarity (or difference) between two responses and ask how these measures depend on the choice of stimuli. This is a more general strategy. A measure of response size always can be turned into a measure of dissimilarity; for example, Dcount is the difference in the number of spikes contained in the two trains that it compares. However, measures of dissimilarity need not correspond to measures of response size, as in the case of Dspike[q] for q > 0.
In laboratory application of either technique, it is often necessary to sample discrete values of a parameter that conceptually spans a continuous range (e.g., contrast, spatial phase, spatial frequency). The implicit assumption is that the sampling captures the main features of what is fundamentally a continuous correspondence between stimuli and responses. In the "tuning" approach, this assumption is supported by the smooth appearance of tuning curves. For the present approach, this assumption is supported by the lawful behavior of multidimensional scaling based on the response metrics (Victor and Purpura 1997a; Victor et al. 1997).
Our rationale for choosing this more general approach is that we are interested in a neuron's role in suprathreshold vision, not just in detection tasks. For the latter, it could be argued that it is sufficient to consider measures of response size. But to understand suprathreshold vision (e.g., discrimination tasks and neural representation), consideration of response dissimilarity appears the more natural approach.
RESULTS
Interaction of spatial phase with contrast, spatial frequency, and orientation
We will begin by presenting an analysis of several data sets in detail and then will present a summary of our findings across the recorded units. The detailed presentation also will help clarify our approach to the analysis of how the encoding of contrast, spatial frequency, and orientation interacted with spatial phase. We will develop two clustering measures: Hpooled, in which responses from all spatial phases are pooled, and Hindiv, in which each spatial phase is treated individually. These information-theoretic measures indicate the extent to which the spike discharge represents (i.e., has the potential to signal) a particular spatial attribute in a context in which spatial phase is allowed to vary (Hpooled) or held fixed (Hindiv). As we have pointed out above, these quantities are not used as absolute measures of information, and we will focus on comparing them, comparing their evolution over time, and comparing their dependence on the stimulus parameter of interest.
SAMPLE DATA SET 1: SPATIAL FREQUENCY ENCODING IN A SIMPLE CELL.
Responses of a simple cell to gratings at three spatial frequencies (0.5, 2, and 4 cycles/deg) are shown in Fig. 2. This simple cell was directional and orientationally tuned and had a spatial frequency cutoff of 6 cycles/deg, and the illustrated responses were collected at its optimal orientation. At each spatial frequency, systematic phase dependence of the response is evident. For example, at 0.5 cycles/deg, the largest responses occur for spatial phases in the range 157.5-270°, and responses to gratings with spatial phases near 0° are minimal. At 2 cycles/deg, the largest responses occur for spatial phases 247.5-337.5°, and responses to gratings with spatial phases in the range 112.5-180° are small. At this spatial frequency, a prominent off-discharge also is present when the on-response is large. At 4 cycles/deg, responses are smaller, and the dependence of response on spatial phase is less marked, but there is still a response maximum for phases in the range 45-135°. Because of the joint dependence of response size on spatial frequency and spatial phase, the size of the response does not necessarily indicate the spatial frequency of the grating. For example, a moderate response could either indicate the presence of an 0.5 cycles/deg grating near a null spatial phase or a 4 cycles/deg stimulus near the peak spatial phase. This intuitive analysis, namely that spatial frequency and spatial phase are jointly encoded, is supported by the quantitative analysis we now describe.

FIG. 2. Responses of a V1 simple cell to gratings that vary in spatial frequency (rows: 0.5, 2, and 4 cycles/deg) and spatial phase (columns: steps of 22.5°). Stimulus onset is at time 0. Vertical line at 237 ms marks the disappearance of the stimulus. Contrast: 0.5. Orientation: 90° (preferred). Unit 21/2.
First the data set was analyzed independently of spatial phase. That is, the 48 different stimuli (16 spatial phases and 3 spatial frequencies) were pooled into three classes, ignoring the differences in spatial phase. Stimulus-dependent clustering for this pooled analysis, denoted by Hpooled, was taken to be the corrected value of the transmitted information H − H0. Hpooled was calculated for the spike count metric Dcount and each of the spike time metrics Dspike[q] (for q ranging from 1 to 512 s⁻¹ in octave steps). Second, the data set was partitioned into 16 subsets, 1 for each spatial phase. Within each of these 16 subsets, stimulus-dependent clustering was assessed by a calculation of H − H0. For this calculation, each stimulus class consisted of a single spatial frequency at a single spatial phase, and the only responses used to calculate H or H0 were responses obtained at that spatial phase. The resulting 16 values of H − H0, one for each spatial phase, were averaged, to obtain Hindiv. In essence, Hindiv indicates the extent to which the spike trains represent spatial frequency in a context in which spatial phase is held fixed, whereas Hpooled indicates the extent to which the spike trains represent spatial frequency in a context in which spatial phase is allowed to vary.
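The difference between the pooled and the per-phase analysis can be sketched in a few lines of Python. This is only an illustration of how responses are grouped, not the analysis used here: the function names and data layout are hypothetical, and `count_information` is a stand-in (an uncorrected plug-in mutual information between class label and spike count) for the metric-space estimate of H − H0.

```python
from collections import Counter, defaultdict
from math import log2


def count_information(labels, trains):
    """Stand-in for H - H0 (an assumption, not the paper's estimator):
    plug-in mutual information, in bits, between class label and spike count."""
    n = len(labels)
    counts = [len(t) for t in trains]
    joint = Counter(zip(labels, counts))
    n_label = Counter(labels)
    n_count = Counter(counts)
    return sum(
        (c / n) * log2(c * n / (n_label[lab] * n_count[k]))
        for (lab, k), c in joint.items()
    )


def h_pooled_and_indiv(responses, info_fn=count_information):
    """responses: list of (attribute_value, spatial_phase, spike_times).

    Hpooled: classes are attribute values, all spatial phases lumped together.
    Hindiv: the same calculation within each phase, averaged over phases.
    """
    labels = [a for a, _, _ in responses]
    trains = [t for _, _, t in responses]
    h_pooled = info_fn(labels, trains)

    by_phase = defaultdict(list)
    for a, phase, t in responses:
        by_phase[phase].append((a, t))
    per_phase = [
        info_fn([a for a, _ in group], [t for _, t in group])
        for group in by_phase.values()
    ]
    h_indiv = sum(per_phase) / len(per_phase)
    return h_pooled, h_indiv


# Toy example: spike count depends on spatial frequency, but with opposite
# sign at the two phases, so the attribute is confounded with phase.
toy = [
    ("low", 0, [0.05]), ("high", 0, [0.05, 0.10]),
    ("low", 180, [0.05, 0.10]), ("high", 180, [0.05]),
]
h_pooled, h_indiv = h_pooled_and_indiv(toy)
```

In this toy case the attribute is perfectly recoverable when phase is fixed and not at all when phases are pooled, which is the extreme form of the Hindiv > Hpooled pattern.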
Separate calculations of Hindiv and Hpooled were performed for the first 100 ms of each response, the first 256 ms, and the first 473 ms, and for each of the two contrast levels studied. A comparison of these quantities is shown in Fig. 3 for the three analysis intervals at a contrast of 0.5 (A-C) and a contrast of 1.0 (D-F). In all analyses, Hindiv exceeds Hpooled, both for the spike count metric (plotted at q = 0) and nearly all the spike time metrics. For most of the analyses (all but Fig. 3E), the maximal clustering is achieved for q > 0. This indicates that stimulus-dependent clustering is more prominent when the temporal structure of the spike train is taken into account (and spike trains are compared via Dspike[q]) than when it is ignored (and spike trains are compared only on the basis of the number of spikes they contain).
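To make the role of q concrete, the following Python sketch computes a spike time distance under the standard definition of this family of metrics from the authors' earlier work, which this section does not restate and is assumed here: deleting or inserting a spike costs 1, and shifting a spike by Δt costs q·|Δt|. The function name is hypothetical. At q = 0 the distance reduces to the difference in spike counts, which is why Dcount is plotted at q = 0.

```python
def spike_time_distance(a, b, q):
    """Minimum cost of transforming spike train a into b (times in seconds).

    Assumed cost rule: insert or delete a spike = 1, shift a spike = q * |dt|.
    Solved by dynamic programming over prefixes of the two trains.
    """
    na, nb = len(a), len(b)
    # g[i][j] = distance between the first i spikes of a and the first j of b
    g = [[0.0] * (nb + 1) for _ in range(na + 1)]
    for i in range(1, na + 1):
        g[i][0] = float(i)  # delete i spikes
    for j in range(1, nb + 1):
        g[0][j] = float(j)  # insert j spikes
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            g[i][j] = min(
                g[i - 1][j] + 1.0,                                  # delete a[i-1]
                g[i][j - 1] + 1.0,                                  # insert b[j-1]
                g[i - 1][j - 1] + q * abs(a[i - 1] - b[j - 1]),     # shift
            )
    return g[na][nb]


train_a = [0.010, 0.050]
train_b = [0.020]
d_count = spike_time_distance(train_a, train_b, 0.0)    # spike count difference
d_mid = spike_time_distance(train_a, train_b, 16.0)     # timing matters within ~1/16 s
d_fine = spike_time_distance(train_a, train_b, 512.0)   # shifting is never worthwhile
```

With these toy trains, the q = 16 s⁻¹ distance is one deletion plus a 10-ms shift, whereas at q = 512 s⁻¹ it is cheaper to delete both spikes and insert a new one.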

FIG. 3.
Analysis of encoding of spatial frequency for the data set of Fig. 2. Hpooled is a measure of the representation of spatial frequency in the context that spatial phase is allowed to vary; Hindiv is a measure of the representation of spatial frequency in the context that spatial phase is held fixed. A-C: contrast 0.5, with data analysis restricted to the first 100 ms (A), the first 256 ms (B), and the first 473 ms (C). D-F: contrast 1.0, with data analysis restricted to the first 100 ms (D), the first 256 ms (E), and the first 473 ms (F). Hpooled and Hindiv have been corrected for estimated bias via a resampling procedure and have been calculated for a range of spike time metrics Dspike[q], as well as the spike count metric Dcount, plotted at q = 0. Missing symbols indicate that the calculated values of H did not exceed the value expected by chance.
The value of q that provides optimal clustering typically ranges from 16 to 64 s⁻¹, corresponding to a temporal precision (1/q) of ~15-60 ms. As q increases beyond this point, Hindiv decreases, eventually to chance levels. This indicates that the pattern of spikes at a higher temporal resolution (<15 ms) does not appear to depend in a systematic way on the stimulus. Other data sets show a drop in values of Hindiv and Hpooled at lower values of q, indicating a proportionately more coarse temporal resolution. This timescale for the "informative" precision of a spike agrees with our previous findings in recordings in the awake macaque (Victor and Purpura 1996a) and results of others (Heller et al. 1995) using a different analytic technique. It shows that there is a substantial difference between the informative precision of a spike and the intrinsic precision of the neural spike-generating mechanism (Mainen and Sejnowski 1995; Reich et al. 1997).
Hindiv and Hpooled have different patterns of evolution in time. Hindiv is distinctly above zero even for an analysis restricted to the initial response segment (the first 100 ms; Fig. 3, A and D), but Hpooled is significantly greater than zero only when the entire on response (the first 256 ms; Fig. 3, B and E) is analyzed. The initial portion of the response represents spatial frequency in a phase-dependent manner, as would be expected from a linear receptive field with uniform dynamics. The response then evolves over time to include a representation of spatial frequency that is independent of spatial phase, suggesting inputs from other kinds of receptive field elements (see DISCUSSION). Inclusion of the off-component of the response does not produce a major change in either Hindiv or Hpooled (Fig. 3, C and F). However, there is a greater separation between Hindiv and Hpooled, suggesting that the off-response is strongly phase dependent in this cell.
SAMPLE DATA SET 2: ORIENTATION ENCODING IN A COMPLEX CELL.
Figure 4 shows the responses of a complex cell to gratings that varied in orientation and spatial phase. As seen in the figure, the complex cell was tuned orientationally in response to static presentations of gratings, as was demonstrated for neurons in both V1 and V2 in the study by K. P. Purpura and L. M. Optican (unpublished results). Responses to drifting gratings (not shown) were tuned to the same orientation and were direction selective as well. This unit had a spatial frequency cutoff of 2 cycles/deg, and the illustrated responses were collected at 1 cycle/deg, near its optimum. As is seen from the response histograms, there is a maximal response at an orientation of 90°, with smaller responses often present at the neighboring orientation of 112.5°, and smaller still at 67.5°. Although some dependence on spatial phase is present, these three orientations contained the largest responses at each spatial phase. At most spatial phases, the largest response was at 90°. Although some spikes are present during presentations of orientations that are removed from this peak, these spikes typically do not occur during the onset of the grating, and the rasters (not shown) suggest that they are not systematically present (i.e., they are noise).

FIG. 4.
Responses of a V1 supragranular complex cell to gratings that vary in orientation (rows: steps of 22.5°) and spatial phase (columns: steps of 22.5°). Responses to orientations of 0, 22.5, and 157.5° contained very few spikes and are not shown. Stimulus onset is at time 0. Vertical line at 237 ms marks the disappearance of the stimulus. Contrast: 1.0. Spatial frequency: 1 cycle/deg. Unit 20/1.
The formal analysis of phase-independent and phase-dependent clustering is shown in Fig. 5. For the three analysis intervals, Hpooled exceeds Hindiv for all metrics considered. Hpooled is near its maximal value within the first 100 ms (Fig. 5A) but Hindiv increases between 100 and 256 ms (Fig. 5B). Hindiv increases further when the off-discharge is included (Fig. 5, C compared with B). As in the example of Fig. 3, maximal clustering is achieved for a nonzero value of q, in the range 16-64 s⁻¹. But in this case, the increase of Hpooled for q > 0 over Hpooled for a spike count code (q = 0) is small, indicating that there is only a minimal contribution of the temporal structure of the response to the representation of orientation. Additionally, this increment is only seen for the higher of the information curves (Hpooled) and not for Hindiv. That is, for this unit, a spike count code is superior for representing orientation provided that each orientation is presented at a single spatial phase. The major finding, that Hpooled exceeds Hindiv, confirms the intuition that for this unit, the representation of orientation is phase independent. Hindiv is clearly not zero, but the fact that it is smaller than Hpooled indicates that there is no confounding of spatial phase and orientation, in contrast to the interaction between spatial phase and spatial frequency analyzed in Figs. 2 and 3.

FIG. 5.
Analysis of encoding of orientation for the data set of Fig. 4. Hpooled and Hindiv are plotted as in Fig. 3. Analyses are performed on the responses restricted to the first 100 ms (A), the first 256 ms (B), and the first 473 ms (C).
SAMPLE DATA SETS 3 AND 4: CONTRAST ENCODING.
Figure 6 shows the responses of a complex cell to gratings that varied in contrast. The unit was orientationally tuned and direction selective. The illustrated responses were collected at 1.0 cycles/deg, near its spatial cutoff and at the preferred orientation of 135°. Responses are somewhat noisy, and there is little dependence of response size on spatial phase.

FIG. 6.
Responses of a V1 supragranular complex cell to gratings that vary in contrast (rows: 0.125, 0.25, 0.5, and 1.0) and spatial phase (columns: steps of 22.5°). Stimulus onset is at time 0. Vertical line at 237 ms marks the disappearance of the stimulus. Spatial frequency: 1.0 cycles/deg. Orientation: 135° (preferred). Unit 19/2.
In the first two intervals analyzed (0-100 ms and 0-256 ms), Hindiv and Hpooled are comparable, as shown in Fig. 7, A and B. Hindiv slightly exceeds Hpooled for the initial portion of the response (Fig. 7A). Both quantities increase somewhat for the full stimulus-on period, with Hpooled slightly exceeding Hindiv for the full on response (Fig. 7B). That is, contrast-dependent clustering is comparable, whether or not spatial phase is held fixed, consistent with the notion that complex cells respond in a phase-invariant manner (Skottun et al. 1991). However, as opposed to the previous data sets, the effect of temporal coding is dramatic: Hpooled and Hindiv are near zero for the spike count code (q = 0) and only become substantial for distances that are sensitive to the temporal pattern of spikes (Dspike[q], for q in the range 8-32 s⁻¹).

FIG. 7.
Analysis of encoding of contrast for the data set of Fig. 6. Hpooled and Hindiv are plotted as in Fig. 3. Analyses are performed on the responses restricted to the first 100 ms (A), the first 256 ms (B), and the first 473 ms (C).
One additional aspect of this unit's response is worth noting because it was not a common feature of the other recordings. In most units, there is little change in Hpooled or Hindiv when the off-discharge is taken into account, but in this unit, there is a near-doubling of Hpooled (cf. Fig. 7, C with B). Inspection of the response histograms (Fig. 6) suggests the likely response feature responsible for this increase in stimulus-dependent clustering: a cessation of firing ~100 ms after the removal of the stimulus, most prominent for stimulus contrasts of 1.0, and present at most spatial phases.
The final data set, illustrated in Fig. 8, consists of the responses of a simple cell to gratings that varied in contrast. The unit was tuned orientationally but not direction selective. It had a spatial frequency cutoff of 1 cycle/deg, and the illustrated responses were collected at 0.5 cycles/deg, near its tuning peak, at the preferred orientation of 135°. At the two highest contrasts, there is a clear dependence of response size on spatial phase; this dependence is less obvious at the lowest contrast because responses are small overall. Additionally, this unit's peak response is relatively prolonged, with response onset occurring at ~70 ms and peaking at ~180 ms.

FIG. 8.
Responses of a V1 simple cell to gratings that vary in contrast (rows: 0.25, 0.5, and 1.0) and spatial phase (columns: steps of 22.5°). Stimulus onset is at time 0. Vertical line at 237 ms marks the disappearance of the stimulus. Spatial frequency: 0.5 cycles/deg. Orientation: 135° (preferred). Unit 26/2.
In the initial (0-100 ms) analysis interval, Hindiv exceeds Hpooled (Fig. 9A), which is not above chance. For the longer analysis intervals (Fig. 9, B and C), Hpooled exceeds Hindiv but this difference is minimal. Additionally, for the shortest analysis interval (Fig. 9A), simply counting spikes provides the largest value of H. However, in the longer analysis intervals (Fig. 9, B and C), both Hpooled and Hindiv achieve their maximal values for a nonzero value of q (in the range 8-32 s⁻¹). The dependence on q is relatively small and of unclear significance.
Inspection of Fig. 8 shows, not surprisingly, that the largest responses require not only a large contrast but also particular spatial phases. Intermediate responses are elicited either by a high contrast stimulus at a near-null spatial phase (e.g., a contrast of 1.0 at a spatial phase of 90°) or by a lower contrast stimulus in an optimal phase (e.g., a contrast of 0.5 at a spatial phase of 337.5°). What this analysis shows is that despite this coupling, contrast can be effectively represented even when spatial phase is ignored. However, this distinction requires more than the initial part of the response: at 100 ms, Hindiv exceeds Hpooled, which is not significantly different from 0 (Fig. 9A), whereas at 256 ms, Hpooled exceeds Hindiv (Fig. 9B).
SUMMARY ACROSS DATA SETS.
To collate the observations in individual data sets across the population of units recorded (Table 1), we proceeded as follows. For each data set, we identified peak values of Hpooled and Hindiv (as a function of q) for each analysis period (the first 100, 256, and 473 ms). These maximum values of H were averaged separately for simple cells and complex cells and for each attribute (contrast, spatial frequency, or orientation) that was studied. Averaged values were normalized by the maximal average value achieved for that attribute (for Hpooled and Hindiv, any of the 3 analysis periods, and either cell type). The rationale for this normalization is to compare encoding with spatial phase held fixed to encoding with spatial phase allowed to vary, in simple and complex cells, and to examine how encoding evolves over time. Averages for each attribute were separately normalized because our experimental protocol (different numbers and ranges of contrasts, spatial frequencies, and orientations) would confound a comparison of absolute values across modalities. The averaged, normalized values of Hpooled and Hindiv are presented in Fig. 10.
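The collation steps for one attribute can be sketched as follows in Python. The data layout, function names, and numerical values are hypothetical and chosen only for illustration: each information curve is reduced to its peak over q, peaks are averaged within each combination of cell type, measure, and analysis period, and the averages are divided by the largest of them.

```python
def peak_over_q(curve):
    """curve: dict mapping q (s^-1, with 0 for the spike count metric) to H."""
    return max(curve.values())


def normalized_summary(peaks):
    """peaks: dict mapping (cell_type, measure, interval_ms) to a list of
    per-data-set peak H values, all for a single stimulus attribute.

    Returns the same keys mapped to averages normalized by the largest average.
    """
    means = {key: sum(vals) / len(vals) for key, vals in peaks.items()}
    top = max(means.values())
    return {key: m / top for key, m in means.items()}


# Made-up values for illustration only.
curve = {0: 0.10, 16: 0.35, 64: 0.30, 512: 0.02}
best = peak_over_q(curve)

toy_peaks = {
    ("simple", "Hindiv", 100): [0.4, 0.6],
    ("simple", "Hpooled", 100): [0.1, 0.3],
    ("complex", "Hpooled", 256): [0.25],
}
summary = normalized_summary(toy_peaks)
```

Because the normalization is applied separately for each attribute, the largest entry in each summary is 1 by construction, and only relative comparisons within an attribute are meaningful.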

FIG. 10.
Summary across data sets. Normalized values of Hpooled and Hindiv are plotted as a function of the length of the analysis interval, for simple cells and complex cells. Averaging is carried out for contrast experiments (A), spatial frequency experiments (B), orientation experiments (C), and experiments in which 2 parameters were varied (D). D pools results from contrast × spatial frequency experiments, contrast × orientation experiments, and spatial frequency × orientation experiments.
For representation of contrast (Fig. 10A) within the stimulus-on period (the first 100 or 256 ms), Hindiv is greater than Hpooled for simple cells, consistent with the idea that a simple cell's response is phase dependent. That is, a single simple cell's discharge can only be considered to represent contrast if spatial phase is held fixed. For complex cells, within the first 100 ms, Hpooled is (just barely) greater than Hindiv, consistent with the idea that a complex cell's response is phase independent, and thus contrast can be represented reliably without a confound by spatial phase. However, later in the on response, complex cells apparently behave in a phase-dependent manner that confounds the representation of contrast; that is, Hindiv is greater than Hpooled for an analysis of the first 256 ms. Finally, when the off response is included (the first 473 ms), Hindiv is comparable with Hpooled for simple and complex cells, indicating phase-independent representation in both cell populations. This is primarily a result of an increase in Hpooled, indicating that the off response is relatively phase independent. Additionally, Hindiv decreases somewhat for both cell types. The significance of this decrease is unclear, but it may indicate that the contrast dependence of the off response is distinct from that of the on response and hence confounds the representation of contrast when it is included in the response.
For spatial frequency (Fig. 10B), the picture is more straightforward. For both simple and complex cells, Hindiv is greater than Hpooled for all analysis intervals, indicating that variations in spatial phase interact with (i.e., confound) the representation of spatial frequency. Regarded in this way, both simple and complex cells' responses are phase dependent. For orientation (Fig. 10C), a similar confounding effect of spatial phase is seen for simple cells. However, for complex cells, Hpooled is greater than Hindiv at all analysis intervals, indicating that representation of orientation is relatively independent of variations in spatial phase.
Joint encoding of two stimulus parameters
The preceding analysis investigated the extent to which the output of a neuron in primary visual cortex can represent a single stimulus attribute. However, contrast, spatial frequency, and orientation interact to determine a neuron's response. Next, we examine data sets in which two parameters were varied in addition to spatial phase to determine to what extent joint representation of multiple attributes is affected by variations in spatial phase.
DATA SET IN DETAIL.
Figure 11 shows responses of an oriented, directionally selective V1 simple cell to gratings at three contrasts and two orientations, the preferred orientation (Fig. 11A), and an off-peak orientation (Fig. 11B). This unit had a spatial frequency cutoff of 6 cycles/deg, and the responses illustrated were recorded with 2 cycles/deg gratings. This is one of the most clear-cut "simple" cells we encountered: responses are strongly dependent on spatial phase. There was a sufficiently high maintained discharge so that one could see a reduction in firing accompanying a stimulus 180° away from the peak spatial phase. Orientation and phase interact (cf. the 2 orientations and the spatial phases of 90 and 270°) but (within each phase) orientation tuning did not depend on contrast. The clustering analysis (Fig. 12) reflects this interaction of spatial phase with orientation in that Hindiv exceeds Hpooled for each of the analysis intervals.

FIG. 11.
Responses of a V1 simple cell to gratings that vary in contrast (rows: 0.125, 0.25, and 0.5), orientation (A: 67.5°, the preferred orientation; B: 112.5°) and spatial phase (columns: steps of 90°). Stimulus onset is at time 0. Vertical line at 237 ms marks the disappearance of the stimulus. Spatial frequency: 2 cycles/deg. Unit 28/5.

FIG. 12.
Analysis of encoding of contrast for the data set of Fig. 11. Hpooled and Hindiv are plotted as in Fig. 3. Analyses are performed on the responses restricted to the first 100 ms (A), the first 256 ms (B), and the first 473 ms (C).
The maximal response is elicited at a contrast of 0.5, an orientation of 67.5°, and a spatial phase of 90°. Submaximal responses are elicited by changing spatial phase, contrast, or orientation. There appears to be a subtle change in response dynamics elicited by an off-peak orientation (112.5°, Fig. 11B): at optimal spatial phase (180°), the response has a gradual buildup during the last half of the presentation of the grating; at other phases, the response is primarily contained in a transient just after stimulus onset. At the preferred orientation (67.5°, Fig. 11A), high-contrast responses are primarily transient, and responses to the lowest contrast are relatively sustained, a result seen for many different types of transiently presented stimuli in the work of K. P. Purpura and L. M. Optican (unpublished results). No responses in Fig. 11A show the buildup seen at the off-peak orientation. This difference in responses to preferred and nonpreferred gratings is reflected in an increase in clustering for metrics that are sensitive to temporal structure: maximal values of Hindiv are achieved for Dspike[q] with q in the range of 16-64. However, this change in temporal structure is phase dependent, and thus there is no corresponding increase in Hpooled. That is, the strongly phase-dependent nature of this simple cell's response confounds the representation of contrast and orientation, unless spatial phase is fixed.
SUMMARY ACROSS DATA SETS.
Observations from individual data sets were collated as described above for the single-parameter experiments. However, because there were relatively few experiments with each pair of parameters (Table 1), all two-parameter experiments were pooled (Fig. 10D). Although the degree of clustering in simple cells is higher, simple cells show a confounding effect of spatial phase (Hindiv greater than Hpooled for all analysis intervals), whereas complex cells do not (Hpooled greater than Hindiv for all analysis intervals).
Contribution of spatial phase to response variability
Several authors have reported that the variability in a single neuron's response is greater than that expected from a Poisson process (Holt et al. 1996; Softky and Koch 1993; Tolhurst et al. 1981, 1983; Victor and Purpura 1996a). One possible contributing factor to this (especially in studies in awake animals) is that small fluctuations in eye position effectively lead to changes in spatial phase, and hence, greater variability (Gur and Snodderly 1987). Thus reliable but phase-dependent responses might be mistaken for responses with intrinsically high variability. We investigated this possibility directly by comparing response statistics with and without explicit variation of spatial phase.
For this purpose, we examined responses obtained during the first 256 ms after the presentation of a grating of optimum contrast, spatial frequency, and orientation. We only considered data sets in which responses to 16 spatial phases were recorded and in which there was a clear systematic dependence of the response on spatial phase. Additionally, data sets in which inspection of the rasters showed a change in overall firing rate from the beginning to the end of the experimental run were excluded, as were data sets in which individual rasters contained a long run of "spikes," suggestive of possible artifact. With these restrictions, analysis was focused on nine simple cells and four complex cells (Tables 2 and 3).
One way of comparing our data with the expectations of a Poisson process is to examine the fraction of trials that contained specific numbers of spikes. For a Poisson process, the fraction of responses with exactly n spikes should be given by $f(n) = (N^n/n!)\,e^{-N}$, where N is the mean number of spikes per trial. For each data set, this distribution was compared with the observed fraction of trials with n spikes via the χ² test (Table 2). When responses to all spatial phases were pooled, all data sets deviated in a highly significant manner (P < 0.001) from the expectations of a Poisson process. When responses to each spatial phase were considered individually, 73% of the data sets (85% of those derived from simple cells, 44% of those derived from complex cells) had a response distribution that deviated in a highly significant manner (P < 0.001) from Poisson expectations. To ensure that these findings were not the result of inclusion of a small number of outliers, this analysis was repeated after exclusion of the responses that were in the upper quartile of the spike count distribution. This truncated distribution was compared with a similarly truncated Poisson distribution. Again, all pooled data sets were inconsistent (P < 0.001) with Poisson expectations, as were most phase-specific data sets (78% at P < 0.05, 60% at P < 0.001).
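A minimal Python sketch of this kind of comparison is given below. It is not the procedure used for Table 2: the binning (one bin per spike count, with the upper tail lumped into the last bin), the choice of degrees of freedom, and the function names are assumptions made for illustration, and the conversion of the statistic to a P value is omitted.

```python
from math import exp, factorial


def poisson_pmf(n, mean):
    """Poisson probability of exactly n spikes: f(n) = (N^n / n!) e^(-N)."""
    return mean ** n / factorial(n) * exp(-mean)


def chi_square_vs_poisson(counts):
    """counts: spike counts per trial (the largest count is assumed to be >= 2).

    Bins are n = 0, 1, ..., max-1, plus a final bin for n >= max (assumption).
    Returns (chi2, dof), with dof = bins - 1 - 1 because the mean is estimated.
    """
    trials = len(counts)
    mean = sum(counts) / trials
    top = max(counts)
    chi2 = 0.0
    for n in range(top + 1):
        observed = sum(1 for c in counts if c == n)
        if n < top:
            p = poisson_pmf(n, mean)
        else:
            # Lump the whole upper tail into the last bin.
            p = 1.0 - sum(poisson_pmf(k, mean) for k in range(top))
        expected = trials * p
        chi2 += (observed - expected) ** 2 / expected
    return chi2, top - 1


# Toy data: a roughly Poisson-shaped sample and a strongly bimodal one.
poisson_like = [0] * 37 + [1] * 37 + [2] * 18 + [3] * 6 + [4] * 2
bimodal = [0] * 50 + [4] * 50
chi2_a, dof_a = chi_square_vs_poisson(poisson_like)
chi2_b, dof_b = chi_square_vs_poisson(bimodal)
```

The bimodal sample, which mimics pooling a null phase with a peak phase, yields a far larger statistic than the Poisson-shaped one.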
This analysis shows that firing is not governed by a Poisson process but does not characterize the nature of the deviation. To characterize the nature of the deviation, we examined the variance in the number of spikes per trial (Table 3). If firing were governed by a Poisson process (even one whose mean rate varied with time), then the variance in the number of spikes in a trial should be equal to the average number of spikes. For processes that are more regular than Poisson (e.g., integrate and fire), the variance/mean ratio will be <1. A refractory period also will tend to decrease the variance/mean ratio because it "regularizes" the spike train. For processes that are more irregular than Poisson (e.g., more "bursty"), the variance/mean ratio will be >1. With responses pooled across spatial phases, this ratio was >1 for all cells examined (minimum: 1.85, maximum: 5.42, geometric mean: 3.05), and there was no significant difference (P > 0.05 by t-test) between simple cells (geometric mean 3.05) and complex cells (geometric mean 3.04). To determine the extent to which variation of spatial phase contributed to this excess variance, the same responses were analyzed with each spatial phase considered individually. The variance/mean ratio decreased from 3.05 to 2.63 (P < 0.01 by paired t-test) for simple cells and from 3.04 to 2.31 (P < 0.05 by paired t-test) for complex cells. However, after the removal of the variance due to spatial phase, the variance/mean ratio was still >1 for all cells examined (minimum 1.58, maximum 4.80, geometric mean 2.53), and again, there was no significant difference (P > 0.05 by t-test) between simple cells (geometric mean 2.63) and complex cells (geometric mean 2.31). (The lack of a difference between simple and complex cells does not imply that there is no difference in the dependence of responses on spatial phase between simple and complex cells in general; as noted earlier, this analysis only included data from either cell class recorded under conditions in which a phase dependence was apparent.) Across all data sets, variation in spike count due to variation in spatial phase accounted for an average of 14% of the variance (SD: 8%; range: 2-30%), but this source of variance was not nearly enough to account for the excess variance compared with the expected variance of a Poisson process.
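The logic of separating the phase-related variance from the remaining variance can be sketched in Python as follows. The exact estimator behind Table 3 is not specified here, so this is an assumed decomposition: population variances are used, an equal number of trials per phase is assumed, the fixed-phase ratio is taken as the mean within-phase variance divided by the overall mean count, and all names are hypothetical.

```python
from statistics import fmean, pvariance


def phase_variance_breakdown(counts_by_phase):
    """counts_by_phase: dict mapping spatial phase to a list of spike counts.

    Assumes equal numbers of trials per phase, so that
    total variance = mean within-phase variance + variance of the phase means.
    """
    pooled = [c for counts in counts_by_phase.values() for c in counts]
    grand_mean = fmean(pooled)
    total_var = pvariance(pooled)
    within_var = fmean([pvariance(c) for c in counts_by_phase.values()])
    between_var = total_var - within_var  # variance attributable to phase
    return {
        "pooled_ratio": total_var / grand_mean,        # variance/mean, phases pooled
        "fixed_phase_ratio": within_var / grand_mean,  # after removing phase variance
        "phase_fraction": between_var / total_var,     # share of variance due to phase
    }


# Toy data: two phases with different mean counts but equal trial-to-trial spread.
toy_counts = {0: [2, 6], 180: [0, 4]}
breakdown = phase_variance_breakdown(toy_counts)
```

In this toy case, phase accounts for 20% of the variance, and the variance/mean ratio remains above 1 after the phase-related variance is removed, which is the qualitative pattern described above.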
In sum, although variation in spatial phase does contribute to the variability in the response of simple and complex cells to gratings in random positions, this source of variation is relatively small. Even when it is removed, firing statistics of simple and complex cells show substantially greater variance than would be expected from a Poisson process. The amount of excess variance we observed was comparable with the threefold excess observed by others (Tolhurst et al. 1981, 1983), further indicating that changes in spatial phase are at most a minor contributor to response variability in cortical neurons.
DISCUSSION
Summary of results
Our major aim was to analyze how spatial phase interacted with contrast, spatial frequency, and orientation to produce changes in the temporal pattern of spike discharges in the transient response. We use the term "phase independent" to refer to the representation of a particular attribute if, in a statistical sense, the representation of the attribute of interest in the temporal pattern of the response is not degraded by including the responses to gratings with different spatial phases. That is, with phase-independent representation, the value of the attribute can be determined from the neural response even if spatial phase is not fixed. (A phase-independent representation may be the result of phase-invariant responses. But phase-independent representations also can arise if the stimulus attribute of interest and spatial phase both influence the response but in a manner in which the effects of spatial phase do not constitute a confound.) Conversely, we use the term "phase dependent" to refer to the representation of a particular attribute if the spatial phase must be fixed to determine the value of the attribute from the neural response. For contrast (Fig. 10A), representation was strongly phase dependent in simple cells. Complex cells represented contrast in a phase-independent manner in the initial response segment (the first 100 ms), but the full on response (256 ms) depended jointly on contrast and spatial phase. For spatial frequency (Fig. 10B), representation was phase dependent in simple and complex cells. For orientation (Fig. 10C), representation was phase dependent in simple cells but phase independent in complex cells. Finally, although changes in spatial phase influence grating responses in most neurons, they are not the source of the supra-Poisson variability across trials reported by us and by others (Tolhurst et al. 1981, 1983).
One rationale for transient stimulation was that it provides a convenient means to follow the pattern of evolution of the response over time. Steady-state stimulation protocols might have yielded other results, including a reduction or elimination of temporal coding (Mechler et al. 1997). However, transient presentation more closely mimics the time course of retinal stimulation that occurs during a sequence of physiological visual fixations (Viviani 1990). Moreover, the use of drifting gratings in these experiments necessarily would have induced a technical confound between spatial phase and time lags in neural circuits, because changing the initial spatial phase of a drifting grating is the same as shifting it in time.
Implications for receptive field organization
The classification (Skottun et al. 1991) of simple and complex cells on the basis of whether their responses were phase dependent or not might lead to the expectation that in simple cells all attributes are represented in a phase-dependent manner, whereas in complex cells, all attributes are represented in a phase-independent manner. As described above, our data do not conform to this expectation. There are several factors that likely underlie this departure. First, the classification of simple and complex cells is not dichotomous. Many cells show both phase-dependent and phase-independent behavior (Pollen et al. 1988; Spitzer and Hochstein 1985a): a cell may be classified as complex because of its phase-invariant responses at high spatial frequencies yet may display prominent phase-dependent responses at low spatial frequencies. Thus especially in experiments that compare responses across a range of spatial frequencies, complex cells may display hallmarks of phase dependence.
Second, the simple/complex classification implicitly ignores the possible informative value of the temporal pattern of responses. Phase-independent representation of visual attributes might appear to be phase dependent if timing were ignored. This can occur if a particular pattern of spikes (or the number of spikes within a brief interval) represents the attribute of interest, but these spikes comprise only a small portion of the total discharge, the bulk of which could be phase dependent. We did not see this phenomenon in most of the recorded cells. The converse, however, was a prominent finding: representation that is phase dependent could appear to be phase independent if timing were ignored. An example of this is shown in Fig. 7, A and B, in which Hpooled > Hindiv for the spike count metric Dspike[0], but Hindiv > Hpooled (Fig. 7A) or Hindiv ≈ Hpooled (Fig. 7B) for the optimal spike time metric Dspike[q]. That is, the number of spikes in the response to a grating may be relatively independent of spatial phase, even though their timing is strongly dependent on phase.
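To illustrate how a spike count metric can miss a difference that a spike time metric detects, the following Python sketch computes a cost-based spike time distance of the kind denoted Dspike[q]. It assumes the standard edit costs for this family of metrics (a cost of 1 to add or delete a spike, and q|Δt| to shift a spike by Δt). The function name and the example spike times are hypothetical and are not data from this study.

```python
import numpy as np

def spike_time_distance(train_a, train_b, q):
    """Minimal edit cost between two sorted spike trains (times in seconds).

    Assumed costs: 1 to insert or delete a spike, q * |dt| to shift a spike by dt.
    With q = 0 the distance reduces to the difference in spike counts.
    """
    n_a, n_b = len(train_a), len(train_b)
    # cost[i, j]: cheapest way to turn the first i spikes of a into the first j of b
    cost = np.zeros((n_a + 1, n_b + 1))
    cost[:, 0] = np.arange(n_a + 1)   # delete all i spikes
    cost[0, :] = np.arange(n_b + 1)   # insert all j spikes
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            cost[i, j] = min(cost[i - 1, j] + 1,          # delete a spike from a
                             cost[i, j - 1] + 1,          # insert a spike from b
                             cost[i - 1, j - 1] + shift)  # shift a spike in time
    return float(cost[n_a, n_b])

# Hypothetical responses at two spatial phases: same count, different timing.
early = [0.050, 0.060, 0.070]
late = [0.120, 0.130, 0.140]

d_count = spike_time_distance(early, late, q=0.0)    # spike count metric
d_timing = spike_time_distance(early, late, q=32.0)  # temporal precision 1/q ~ 31 ms
print(d_count, d_timing)
```

In this toy case the two responses are indistinguishable under the count metric but separated under the spike time metric, which is the situation described for Fig. 7.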
Finally, our analysis shows that the balance between phase-dependent and phase-independent representations evolves over time (Fig. 10). As a cortical neuron's response evolves, its inputs include contributions both from intrinsic cortical circuitry and feedback pathways between cortical areas (Bullier and Nowak 1995
). K. P. Purpura and L. M. Optican (unpublished results) found that the initial 50 ms after stimulus onset carried measurable amounts of information about the orientation and spatial frequency of transiently presented sinewave gratings. There was a rapid rise in information between 50 and 100 ms followed by a slower rise during the following 100 ms. This suggested that the rise in information between 100 and 200 ms was due to local recurrent and feedback circuits and that the prolonged tonic activity in feedforward pathways may contribute to temporal encoding through the activation of and interaction with membrane components in cortical neurons that produce bursts and other temporal patterns. In sum, whereas the initial visual response reflects the geniculocalcarine connections in a straightforward way, the remainder of the response is influenced heavily by inputs from other cortical neurons. We showed (Fig. 10A) that for complex cells, the initial 100 ms represents contrast in a phase-independent manner, but the full on response shows substantial phase dependence. The initial component may well be explained by a superposition of feedforward nonlinear receptive field subunits (Spitzer and Hochstein 1985b
); we hypothesize that the later, phase-dependent components represent the influence of intrinsic cortical activity.
To the extent that simple cells can be considered as approximately linear, the interdependence of spatial phase and the other stimulus parameters is easy to understand. For linear receptive fields, the average response (across all spatial phases) to gratings of any chosen contrast, spatial frequency, and orientation must be zero. That is, any change in the mean firing rate can always be offset by a change in spatial phase. Consequently, a linear representation of a stimulus attribute without a confound by spatial phase must rely on how the temporal structure of the response depends on that attribute. In other words, for a linear receptive field structure, phase-independent representation requires spatiotemporal inseparability and would only be seen for metrics that are sensitive to temporal pattern. [A "separable" receptive field is one with a spatiotemporal sensitivity profile that can be expressed as a single spatial function multiplied by a single temporal function. In response to pattern appearance, such receptive fields generate a stereotyped temporal response whose overall amplitude, but not shape (time course of activation), can be modified by the spatial pattern. An "inseparable" receptive field, which could be constructed by summing together two separable components, can produce response profiles with a shape, as well as overall size, that depends on the spatial pattern.] In our population of simple cells, representation was primarily phase dependent, indicating that, in a functional sense, simple cells behaved as if they were predominantly linear and separable. However, there were indications that inseparability and nonlinearity did contribute to phase-independent representation. The evidence for inseparability is that phase-independent clustering, when present, was generally higher for the spike time metrics (Dspike[q], q > 0) than for the spike count metric Dcount = Dspike[0] (e.g., Fig. 3).
This indicates that phase-independent representation exploits the time course of the response, and not just amplitude. The evidence for a contribution of response nonlinearity is that in some cases (as in the analysis of contrast in Fig. 7), a modest phase-independent representation was present for Dcount = Dspike[0]. Because for linear receptive fields the average response (across all spatial phases) must be zero, a phase-independent representation that is manifest in the overall firing rate implies a significant contribution from a receptive-field nonlinearity: in this case, the absence of a maintained discharge.
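The argument that a linear receptive field has zero phase-averaged response, and that a rectifying nonlinearity removes this constraint, can be checked numerically. The sketch below is a toy model under stated assumptions: the Gabor-shaped profile, its parameters, and the use of half-wave rectification to stand in for the absence of a maintained discharge are all illustrative choices, not a fit to the recorded cells.

```python
import numpy as np

# One-dimensional space in arbitrary units (assumed for illustration).
x = np.linspace(-2.0, 2.0, 801)
dx = x[1] - x[0]

# Hypothetical Gabor-like sensitivity profile of a linear receptive field.
rf = np.exp(-x**2 / (2 * 0.5**2)) * np.cos(2 * np.pi * 1.0 * x + 0.3)

contrast, spatial_freq = 0.5, 1.0
phases = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)  # 16 evenly spaced phases

def linear_response(phase):
    """Inner product of the receptive field with a grating at the given spatial phase."""
    grating = contrast * np.cos(2 * np.pi * spatial_freq * x + phase)
    return np.sum(rf * grating) * dx

linear = np.array([linear_response(p) for p in phases])

# With no maintained discharge, the firing rate cannot fall below zero.
rectified = np.maximum(linear, 0.0)

mean_linear = linear.mean()        # vanishes up to rounding error
mean_rectified = rectified.mean()  # positive, and it grows with contrast
print(mean_linear, mean_rectified)
```

In the linear case the phase-averaged response carries no information about contrast, whereas after rectification the phase-averaged rate scales with contrast, which is the sense in which a phase-independent representation in the overall firing rate implies a nonlinearity.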
For complex cells, the subunit model proposed by Spitzer and Hochstein in the cat (1985b) accounts readily for the interdependence of spatial phase and spatial frequency on the basis of the phase-dependent responses of the subunits. However, this model also predicts a comparable interdependence of spatial phase and orientation. These predictions hold if the signals within elongated subunits (Szulborski and Palmer 1990
) are summed before rectification (as originally proposed by Spitzer and Hochstein 1985b
) or if there is a local rectification within these elongated regions as well (Purpura et al. 1994
; Victor and Conte 1991
). Although there is some evidence that the orientation specificity of cortical cells in Layer IV is determined by their subcortical inputs (Ferster 1987
; Reid and Alonso 1995
), there is also evidence that intracortical processing plays a substantial role in orientation tuning (Bonds 1989
; Morrone et al. 1982
; Sillito 1975
; Sillito and Jones 1996
), particularly as the response evolves in time (Ringach et al. 1997
; Volgushev et al. 1995
; K. P. Purpura and L. M. Optican, unpublished data). Orientation-specific inputs from other cortical neurons (either excitatory or inhibitory) can lead to the phase-independent representation of orientation that we observe, provided that these inputs act as nonlinear subunits or have distinctive time courses. (Otherwise, their impact would merely be to change the effective sensitivity profile of the receptive field.) To remain consistent with our findings that spatial-frequency representation is phase dependent, we postulate that these subunit inputs span a broad range of spatial scales and thus do not contribute to spatial frequency tuning. In this way, intracortical connectivity among cells that share a common orientation could provide a mechanism for phase-independent representation of orientation but not spatial frequency.
The existence of a system of intracortical connections in V1 that primarily involves neurons of similar orientation preferences but different spatial frequency tunings is supported by independent experimental studies. Connections between neurons of like orientation preference were demonstrated by a cross-correlation method (Ts'o et al. 1986
). Interactions of oriented subunits that span a range of spatial scales were shown to be the crucial computational element required to account for isodipole texture selectivity (Purpura et al. 1994
).
A neuron with a receptive field built from nonlinear subunits with a common orientation tuning but a range of spatial frequency tunings could function as a feature detector for specific nonsinusoidal one-dimensional profiles, such as edges. These profiles have Fourier components that share a common orientation but span a wide range of spatial frequencies. If the relative phases of these Fourier components match those of the corresponding subunits, a large response would result.
Cells classified as simple and complex have both phase-dependent and phase-independent response features that evolve over time (Fig. 10, A-C). This is not a statement that the simple/complex distinction has no value. Rather, the analysis of temporal pattern adds to the understanding of this classification and the receptive-field structures that might underlie it.
Our finding (Fig. 10D) that complex cells can represent at least two spatial parameters in the face of variations in spatial phase provides additional evidence that their receptive field structure is functionally elaborate. Within the framework of a subunit model, representation of two attributes independently requires that the subunits themselves are spatiotemporally inseparable and that the subunit signals combine in a temporally coherent fashion.
Spatial phase, spatial frequency, and location
Early cortical circuitry must process visual information for a variety of purposes. For some purposes (e.g., resolution), point-like receptive fields represent information in the most immediately useful form, while for other purposes (e.g., texture analysis), receptive fields that are tuned to specific spatial frequencies represent information in the most immediately useful form. However, a cortex that contained, at every spatial location, a full complement of cortical neurons that behaved as local Fourier analyzers subserving every orientation, spatial frequency, spatial phase, and bandwidth would be highly redundant. The Gabor-like spatial structure of simple cortical cell receptive field profiles is well recognized (Kulikowski and Vidyasagar 1986
; Kulikowski et al. 1982
; Ohzawa et al. 1996
), and theorists have advanced arguments for a variety of evolutionary and developmental pressures that favor this kind of structure (Atick 1992
; Daugman 1990
; Field 1987
; Olshausen and Field 1996
) as a compromise between the demands of analyses localized in space and analyses localized in the Fourier domain.
A similar tension exists between spatial phase and spatial position. For some purposes, spatial phase is a crucial stimulus attribute. For example, a superposition of sinusoidal components forms an edge (or any other local feature) only if the phase relationships are appropriate. Another example is stereopsis, the neural mechanism of which appears to rely critically on spatial phase (Ohzawa et al. 1996
). For other purposes, orientation and spatial position are key but spatial phase can be ignored. For example, an object's boundary can be demarcated by a thin line or an edge (luminance step). From the point of view of a local analyzer centered at this boundary, a thin line would appear to have cosine phase (or antiphase, depending on polarity), whereas the edge would appear to have rising or falling sine phase (depending on the direction of the luminance gradient). Each of these local features has distinct phase characteristics, but once they have been extracted for image segmentation, only their orientations and positions are important.
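The phase signatures of a thin line and an edge, as seen by a local analyzer centered on the boundary, can be illustrated with a short numerical sketch. The Gaussian window, the probe frequency, and the line width below are arbitrary assumptions chosen only to make the symmetry argument concrete.

```python
import numpy as np

# Positions relative to the boundary at x = 0 (exactly symmetric grid).
x = np.arange(-1000, 1001) / 1000.0

freq = 2.0                                   # probe spatial frequency (assumed)
window = np.exp(-x**2 / (2 * 0.2**2))        # local Gaussian aperture (assumed)
cos_probe = window * np.cos(2 * np.pi * freq * x)  # even-symmetric analyzer
sin_probe = window * np.sin(2 * np.pi * freq * x)  # odd-symmetric analyzer

line = np.where(np.abs(x) < 0.01, 1.0, 0.0)  # thin bright line at the boundary
edge = np.sign(x)                            # luminance step at the boundary

def phase_components(profile):
    """Return the (cosine-phase, sine-phase) projections of a luminance profile."""
    return float(np.sum(profile * cos_probe)), float(np.sum(profile * sin_probe))

line_even, line_odd = phase_components(line)
edge_even, edge_odd = phase_components(edge)
print(line_even, line_odd)   # line: cosine-phase component only
print(edge_even, edge_odd)   # edge: sine-phase component only
```

Reversing the polarity of the line or the direction of the step changes the sign of the nonzero component, corresponding to antiphase or to rising versus falling sine phase.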
It has been suggested that the visual system meets the need to analyze both spatial phase and spatial position by limiting the spatial phases represented at each point to a stereotyped few, i.e., even- and odd-symmetric receptive fields or quadrature pairs (Emerson 1997
; Emerson and Huang 1997
; Field and Tolhurst 1986
; Liu et al. 1992
; Rentschler and Treutwein 1985
). Our data support another strategy to resolve the conflicting demands of phase-independent and phase-dependent representation. As summarized in Fig. 10, cortical neurons can represent contrast, orientation, and, to a limited extent, spatial frequency in a phase-independent manner. Optimal clustering is achieved for values of q in the range 16-64 s⁻¹, corresponding to a temporal precision (1/q) of ~15-60 ms. On the other hand, when spike trains are viewed with a somewhat higher temporal resolution (up to q = 128 s⁻¹), then spatial phase itself is represented in a systematic manner (Victor et al. 1997
). This finding was present in both simple and complex cells and thus appears to be more general than the well-known spatiotemporal progression within the receptive field thought to underlie direction selectivity in simple cells (McLean and Palmer 1989
; McLean et al. 1994
; Movshon et al. 1978
; Reid et al. 1987
, 1991
; Saul and Humphrey 1992b
). We point out that although spatiotemporal quadrature has theoretical advantages for motion analysis (Adelson and Bergen 1985
), mere spatial and temporal offsets of separable receptive field elements suffice to represent spatial phase via the temporal structure of the response. The temporal asynchrony across phases could be generated by lagged geniculate cells (Saul and Humphrey 1992a
), but much smaller temporal offsets (of only 20 ms) also would suffice to account for our observations. Offsets in this range could be generated by the intrinsic dynamics of signal integration in cortical neurons (Frégnac et al. 1996
). A receptive field the spatial components of which are offset but not orthogonal in time plays the dual role of representing a range of spatial phases (when its discharge is viewed as a high-resolution spike time code) and the effective contrast at a single spatial phase (when its discharge is viewed as an overall rate code). When the discharge is viewed with an intermediate temporal resolution, our data show that certain stimulus attributes may be represented in a phase-independent manner.
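A minimal sketch can show how spatial and temporal offsets of separable elements let the time course of a response carry spatial phase. The model below sums two separable components whose spatial profiles are offset by a quarter cycle and whose temporal kernels are offset by 20 ms (the value mentioned above). The kernel shape, its 30-ms time constant, and the quarter-cycle spatial offset are assumptions made for illustration only.

```python
import numpy as np

t = np.arange(0.0, 0.3, 0.001)   # time after pattern appearance, in seconds
tau = 0.03                        # assumed kernel time constant (30 ms)

def kernel(delay):
    """Assumed temporal kernel: (s / tau) * exp(-s / tau), starting at `delay`."""
    s = np.clip(t - delay, 0.0, None)
    return (s / tau) * np.exp(-s / tau)

def response(phase, delay):
    """Linear response of two separable elements to a grating at spatial phase `phase`.

    Element 1 has spatial weight cos(phase); element 2 is offset by a quarter
    cycle in space (weight sin(phase)) and by `delay` seconds in time.
    """
    return np.cos(phase) * kernel(0.0) + np.sin(phase) * kernel(delay)

phases = np.deg2rad([0.0, 30.0, 60.0, 90.0])

# Time of the response peak for each spatial phase.
peak_separable = np.array([t[np.argmax(np.abs(response(p, 0.0)))] for p in phases])
peak_offset = np.array([t[np.argmax(np.abs(response(p, 0.02)))] for p in phases])

print(peak_separable)  # identical peak times: only the amplitude depends on phase
print(peak_offset)     # peak time shifts with phase by up to the 20-ms offset
```

With no temporal offset the receptive field is separable and spatial phase changes only the response amplitude; with the offset, the response shape varies with phase, so a metric with sufficient temporal resolution can recover spatial phase from spike timing.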
Feature extraction is a nonlinear process that requires joint processing of spatial phase, spatial frequency, and orientation in localized patches. This work indicates that in V1, spatial phase and spatial frequency are represented jointly but orientation is represented in a largely phase-independent manner. Temporal structure may improve the efficiency of these representations. Moreover, we speculate that neural mechanisms sensitive to spike timing, such as coincidence detection, provide the nonlinear interactions that are required for the extraction of one-dimensional features.