Reward-Predicting and Reward-Detecting Neuronal Activity in the Primate Supplementary Eye Field

Nelly Amador,1 Madeleine Schlag-Rey,2 and John Schlag2

1Interdepartmental Program in Neuroscience and 2Department of Neurobiology and Brain Research Institute, UCLA School of Medicine, Los Angeles, California 90095-1763


    ABSTRACT

Amador, Nelly, Madeleine Schlag-Rey, and John Schlag. Reward-Predicting and Reward-Detecting Neuronal Activity in the Primate Supplementary Eye Field. J. Neurophysiol. 84: 2166-2170, 2000. In addition to cells specifically active with visual stimuli, saccades, or fixation, the supplementary eye field contains cells that fire in precise temporal relationship with the occurrence of reward. We studied reward-related activity in two monkeys performing a prosaccade/antisaccade task and in one monkey trained in memory prosaccades only. Two types of neurons were distinguished by their reciprocal firing pattern: reward-predicting (RP) and reward-detecting (RD). RP neurons linearly increased their firing as early as 150 ms before saccade onset until the occurrence of reward, at which time they abruptly ceased firing. In contrast, RD neurons fired in phase with reward delivery, even when its duration was varied and when it was repeated at different frequencies. RD discharges were little affected or unaffected by the position of a visual cue that briefly anchored the goal at the onset of reward. The complementary firing patterns of the RP and RD neurons could provide a feedback mechanism necessary for learning and performing the task.


    INTRODUCTION

To a monkey performing a task, the reward given on correct performance is perhaps the most significant event of the trial. Until the monkey "learns the rules of the game," she proceeds by repeating only those behaviors that are rewarded and by avoiding those that are not. In primates, reward-dependent neuronal activity has been found in the prefrontal cortex, parietal cortex, rostral cingulate cortex, and basal ganglia (see DISCUSSION). The present study is especially concerned with neuronal signals time-locked to the reward event that were recorded in the supplementary eye field (SEF). A preliminary report appeared in Amador et al. (1999).

Patterns of neuronal activity concomitant with reward were analyzed in three monkeys who were trained to perform delayed and nondelayed saccade tasks, including prosaccades and antisaccades. In our experiments, monkeys often had to make a saccade to an unmarked location. When the saccade terminated in a window centered on the required goal and gaze remained in the window for 300-500 ms, a flash of light paired with juice delivery indicated this goal (but no flash/juice appeared on incorrect trials). The concomitant occurrence of the feedback flash and the juice is here defined as reward. One objective of this study was to determine how these events were represented in the reward-related neuronal activity.

Reward-related neurons were recorded in the SEF region containing saccade-related and/or visual fixation cells similar to those previously described (Schlag and Schlag-Rey 1987; Schlag et al. 1992). Two types of neurons were distinguished by their reciprocal firing patterns: reward-predicting (RP) and reward-detecting (RD).


    METHODS

Single-unit recordings were made from the SEF of three rhesus monkeys (implanted with scleral search coils) performing memory-guided saccades and pro/antisaccades (Amador et al. 1998; Schlag-Rey et al. 1997). Receptive fields and/or movement fields were first mapped by presenting light spots (0.3°) on a tangent screen. Saccade targets were then presented at the center of, or diametrically opposite to, the response field. Correct performance was rewarded by 50% diluted apple juice (sweetened with aspartame) paired with a flash of light at the prescribed saccade goal. An infrared camera was used to observe licking and drinking movements. All task events were controlled by computer. Prior electrical stimulation (≤40 µA) ascertained that the recording sites were in the SEF. Final verification by histological examination was obtained in two monkeys. All procedures conformed to National Institutes of Health guidelines for the care and use of laboratory animals and were approved by the UCLA Animal Research Committee.

When a neuron showed a pattern of firing that was time-locked to reward, one or more tests were applied: 1) the flash or the juice, or both, was omitted; 2) within a trial, the frequency and/or the duration of the compound reward was varied; 3) juice and feedback light were repeated at asynchronous frequencies. A Fourier analysis was then used to determine the relative power of light-related and juice-related signals.
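As an illustration of test 3, the relative power of light-related and juice-related signals can be estimated by evaluating the Fourier transform of the binned spike train at the two stimulus repetition frequencies. The following Python sketch is only one plausible way to do this; it is not the analysis code used in the study. The 1-ms bin width, the function names, and the synthetic spike train are all assumptions introduced for the example.

```python
import cmath
import math

BIN_S = 0.001  # 1-ms bins (an assumed resolution, not taken from the paper)

def bin_spikes(spike_times_s, duration_s, bin_s=BIN_S):
    """Convert spike times (in seconds) into spike counts per time bin."""
    n_bins = int(round(duration_s / bin_s))
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = math.floor(t / bin_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the analysis window
            counts[idx] += 1
    return counts

def power_at(counts, freq_hz, bin_s=BIN_S):
    """Power of the mean-subtracted spike-count signal at a single frequency,
    computed as a direct discrete Fourier sum."""
    mean = sum(counts) / len(counts)
    acc = 0j
    for k, c in enumerate(counts):
        acc += (c - mean) * cmath.exp(-2j * math.pi * freq_hz * k * bin_s)
    return abs(acc) ** 2 / len(counts)

def light_vs_juice_power(spike_times_s, duration_s, light_hz, juice_hz):
    """Return (power at the light frequency, power at the juice frequency)."""
    counts = bin_spikes(spike_times_s, duration_s)
    return power_at(counts, light_hz), power_at(counts, juice_hz)

# Hypothetical example: a neuron firing a 5-spike burst 50 ms after each light
# onset, with light repeated at 2 Hz and juice at 3 Hz over a 4-s fixation.
light_hz, juice_hz, duration_s = 2.0, 3.0, 4.0
spikes = [onset / light_hz + 0.050 + 0.010 * j
          for onset in range(int(duration_s * light_hz))
          for j in range(5)]
p_light, p_juice = light_vs_juice_power(spikes, duration_s, light_hz, juice_hz)
print("light-locked power:", p_light, "juice-locked power:", p_juice)
```

Because the light and the juice are repeated at different frequencies, a neuron driven mainly by one of them should show a larger spectral component at that stimulus frequency than at the other.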


    RESULTS

Three types of reward-related activity were distinguished. One of them (n = 11/65) was invariably associated with oral movements. Because it was observed at posterior locations, possibly at sites in the supplementary motor area (SMA) (see Schall 1991), it will not be considered here. The other two types were complementary: one predicting the imminence of reward (RP neurons; n = 23/65), the other signaling its occurrence (RD neurons; n = 31/65). RP and RD neurons were recorded within the physiologically and anatomically defined SEF (Schlag and Schlag-Rey 1987; Shook et al. 1990), i.e., at a level slightly anterior to the bend of the arcuate sulcus, at sites 3-4 mm distant from the midline in superficial and deep layers of the dorsomedial cortex. RP and RD neurons were found intermingled with saccade-related (SR) neurons within the same or adjacent tracks. In recent experiments, the ratio of reward-related (RP + RD) neurons to saccade-related neurons was 2:1. However, this proportion may have been influenced by the complexity of the task and/or by its recent learning (see Hollerman et al. 1998).

The firing of RP neurons often started before the eye movement but culminated at the onset of reward (Fig. 1B). This smooth increase in firing resembled that of SR neurons commonly encountered in the SEF. However, SR and RP neurons were easily distinguished by the timing of their peak firing. Typical SR firing (Fig. 1A) peaked during the saccade and then progressively declined whereas the RP discharge (Figs. 1B and 2A) kept increasing throughout the movement and throughout a subsequent fixation period (300-500 ms), peaking, on average, 133 ± 32 ms before reward onset. For 9/23 RP neurons, the prelude started after the saccade (Fig. 2B). RP neurons were further characterized by a striking inhibition starting with and lasting through the reward epoch. The difference between peak firing rate and activity following reward onset (both measured over 100 ms, paired t-test) was significant at P < 0.001 for all but one neuron (P < 0.05).
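The paired comparison described above can be sketched in a few lines of Python. The sketch assumes that, for a given neuron, the two 100-ms firing rates are paired trial by trial; the text does not state the pairing unit, so this is an assumption. The function names and the firing rates in the example are hypothetical and are not data from this study. Only the t statistic and its degrees of freedom are computed; a P value would be read from the t distribution.

```python
import math
import statistics

def rate_in_window(spike_times_s, start_s, width_s=0.1):
    """Firing rate (spikes/s) in a window of width_s starting at start_s."""
    n = sum(1 for t in spike_times_s if start_s <= t < start_s + width_s)
    return n / width_s

def paired_t(peak_rates, post_reward_rates):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(peak_rates) != len(post_reward_rates):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(peak_rates, post_reward_rates)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical per-trial rates (spikes/s) for one RP-like neuron:
# peak 100-ms window versus the 100 ms following reward onset.
peak = [50.0, 62.0, 48.0, 55.0, 60.0]
post = [5.0, 8.0, 4.0, 10.0, 6.0]
t_stat, df = paired_t(peak, post)
print("t =", t_stat, "df =", df)
```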



Fig. 1. Contrast between saccade-related (A) and reward-predicting (B) activity. Rasters and spike density profiles are aligned on saccade start. Red bar, duration of reward epoch. Icons represent presence or absence of light and juice. A: saccade-related neuron increasing its firing frequency before a saccade in the preferred direction. The peak activity was reached before the end of the saccade and sharply dropped within 100 ms following saccade offset. B: reward-predicting (RP) neuron increasing its firing frequency before, during, and after the eye movement until the occurrence of the reward silenced the neuron (solid curve). When the reward was omitted unexpectedly on interleaved trials (even though performance was correct), the activity was prolonged and gradually subsided (dotted curve).
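For readers unfamiliar with spike density profiles, the sketch below shows one common way of computing them: each spike is replaced by a Gaussian kernel, and the kernels are summed and averaged across trials aligned on a behavioral event. The kernel shape, its 10-ms width, and all names are assumptions made for this example; the kernel used for the figures is not specified in the text.

```python
import math

def spike_density(spike_times_ms, t_start_ms, t_end_ms,
                  sigma_ms=10.0, step_ms=1.0):
    """Gaussian-kernel spike density (spikes/s), sampled every step_ms.

    spike_times_ms are expressed relative to the alignment event
    (for example, saccade start at time 0).
    """
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))  # unit-area kernel
    n = int((t_end_ms - t_start_ms) / step_ms) + 1
    times = [t_start_ms + i * step_ms for i in range(n)]
    density = []
    for t in times:
        total = sum(math.exp(-0.5 * ((t - s) / sigma_ms) ** 2)
                    for s in spike_times_ms)
        density.append(1000.0 * norm * total)  # per-ms density to spikes/s
    return times, density

def mean_density(trials_ms, t_start_ms, t_end_ms, sigma_ms=10.0):
    """Average the spike density over trials already aligned on one event."""
    times = None
    summed = None
    for spikes in trials_ms:
        times, d = spike_density(spikes, t_start_ms, t_end_ms, sigma_ms)
        summed = d if summed is None else [a + b for a, b in zip(summed, d)]
    return times, [v / len(trials_ms) for v in summed]

# Hypothetical example: three trials with spike times in ms relative to the
# alignment event.
trials = [[-40.0, -5.0, 20.0], [-30.0, 0.0, 15.0, 60.0], [-10.0, 10.0]]
times, profile = mean_density(trials, -100.0, 200.0)
print("peak of mean profile (spikes/s):", max(profile))
```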



Fig. 2. Reciprocal activity of 3 different RP neurons (A-C) and 2 different reward-detecting (RD) neurons (D-E) showing antiphase inhibition/activation time-locked to reward (solid red bar). Rasters and spike density profiles are aligned on saccade end. Green ticks mark saccade start. A: an RP neuron abruptly stopped firing shortly after reward onset. D: an RD neuron was activated for the duration of the reward. B and E: timing contrast between the firing of an RP neuron (B) and an RD neuron (E) when, after the first usual reward (light and juice), juice was delivered alone. Without the joint occurrence of the light, the response of the neuron in E was weaker. C: RP neuron showing a cyclic activation/inhibition pattern with multiple rewards of shorter duration. F: average spike density profiles of 3 groups of RD neurons (n = 19) activated during a reward lasting, respectively, 300 ms (solid curve, n = 10), 400 ms (dashed curve, n = 4), and 500 ms (dotted curve, n = 5).

To probe the role of the reward event in inhibiting RP neurons, trials ending with and without reward were interleaved. The monkey could not predict when this reward/no reward paradigm would be introduced. When reward was omitted, the RP activity was prolonged, as if the monkey still expected the reward, but then decreased much more slowly (Fig. 1B, dotted curve) than when reward was given (Fig. 1B, solid curve). Thus the RP profile appears not only to predict the timing of reward but also to reflect the subject's confidence in performing correctly because, in previous trials, correct performance was always rewarded.

In contrast to RP neurons, RD neurons (Fig. 2, D-F) were silent before reward onset, but they fired at a time when the RP neurons were inhibited; compare Fig. 2, A and D. The difference between peak firing and activity preceding reward (measured as for RP neurons) was significant at P < 0.001 (n = 13), P < 0.01 (n = 7), and P < 0.05 (n = 1). Whereas the RP neurons stopped firing just before reward onset (mean = 3 ± 9.3 ms, SE; n = 22), RD neurons produced a few spikes before that event (mean = 10 ± 4.9 ms; n = 20; this group excludes neurons tested with asynchronous light and juice). As expected, RD neurons were completely silent on incorrect trials. All of the 31 RD neurons signaled the occurrence of the reward regardless of the location of the visual stimulus or the direction of eye movement. RD bursts depended neither on visual fixation per se (as evident from other periods of fixation during a trial; see Fig. 2E), nor on small corrective saccades (which eventually occurred after reward onset, but not always; see Fig. 3A), nor on licking and swallowing movements (with which RD bursts were visibly asynchronous).



Fig. 3. A: seven eye-movement traces showing stable fixation for periods during which the juice and feedback light were given asynchronously. B and C: comparison of 2 neurons, one activated with light (B), the other activated with juice (C). Light (blue ticks) and juice (red bars) were asynchronously presented. Rasters and spike density profiles (B1 and C1) above corresponding results of Fourier analysis. B2 and C2, top: peak frequency components in each neuron's spike train; bottom: frequency of the juice (red line) and that of the light (blue line).

RP and RD neurons remained contrasted when juice was given with, and then without, feedback light (compare Fig. 2, B and E). Typically, RP firing increased in expectation of the reward (light and juice) and was inhibited afterwards. Juice given alone (Fig. 2B, second reward) also inhibited RP firing, but not immediately. As for the RD response, it was still present but significantly weaker in the juice-alone condition.

When we varied the frequency and duration of the compound reward, RP neurons still exhibited their predictive pattern (Fig. 2C). For RD neurons, increasing the duration of the reward epoch prolonged the activity (Fig. 2F, group profiles) whereas increasing the frequency of shorter rewards increased the number of discrete bursts (not illustrated).

For 10 neurons in 28 experiments with repeated reward, juice and light of various durations were combined at different frequencies. A Fourier analysis was used to discriminate signals related to juice delivery from those related to feedback light during stable fixation (Fig. 3A). The majority of neurons (n = 8/10) responded exclusively to or more strongly to the light (Fig. 3B) than to the juice (even when the light was given for 2 ms) whereas two neurons responded exclusively to the juice (Fig. 3C).


    DISCUSSION

The major finding of the present study is that neurons displaying reciprocal firing patterns hinged on reward for correct eye movements can be found in the SEF. Signals related to reward expectation or detection of correct hand movements have been reported in many studies and in diverse areas such as the caudate nucleus (Hikosaka et al. 1989), prefrontal cortex (Rosenkilde et al. 1981; Tremblay and Schultz 1999; Watanabe 1989), and rostral cingulate motor area (Shima and Tanji 1998).

In contrast with studies showing modulations of visual or motor signals resulting from differential reward (e.g., Hollerman et al. 1998; Kawagoe et al. 1998; Platt and Glimcher 1999; Shima and Tanji 1998; Watanabe 1996), in our experiments, the reward remained invariant during a session.

The RP profile of excitation/inhibition could not be confounded with a presaccadic prelude of activity because it had a clearly different time course and was time-locked to a different event. RD bursts were specifically triggered by the reward onset and, in most cases, they lasted for the duration of the reward epoch. These activities did not depend on fixation, corrective saccades, or oral movements. More data are needed to specify the exact temporal relationship between RP inhibition and RD excitation, but their striking coincidence suggests that they might be complementary elements in a control system used to self-evaluate performance of a task (see also Stuphorn et al. 1999).

Except in recent tests, our monkeys always saw a feedback light appear simultaneously with juice delivery. The juice served to sustain the monkey's motivation for performing the task and the feedback light provided an error estimate of the difference between the required goal and the achieved goal of the saccade. Because this light was constantly paired with the primary reinforcer (juice), it played an important role as a secondary reinforcer. What was the respective influence on RD firing of the motivational and cognitive elements confounded in a compound reward occurrence? Our signal analysis suggests that, in our experiments, the feedback light had a greater influence than the juice. Accurate feedback may be particularly important when making a saccade to an internally defined goal, such as in antisaccade trials. The fact that the neurons described were found in the SEF, where sensori-oculomotor signals are processed, is consistent with a cognitive role for this region (Chen and Wise 1995). Further research will have to determine whether this cognitive aspect of reward is more prominently represented in the SEF than, for example, in the orbitofrontal cortex, where neurons appear to be more attuned to motivational parameters of reward (for a recent review see Schultz et al. 2000).

The SEF may be one of the brain sites that encode the expected timing of reward and record its actual occurrence, as distinct from the quality of reward. Whether this activity is unique to the SEF or whether it is prompted by the complexity of the task remains to be seen.


    ACKNOWLEDGMENTS

We thank the anonymous referees for constructive suggestions, A. Dorfsman for computer expertise, and Y. Kwon for general assistance.

This work was supported by National Eye Institute Grants EY-02305 and EY-05879.


    FOOTNOTES

Address for reprint requests: M. Schlag-Rey, Dept. of Neurobiology, UCLA Medical Center (CHS), 10833 Le Conte Ave., Box 951763, Los Angeles, CA 90095-1763 (E-mail: msr@ucla.edu).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received 28 March 2000; accepted in final form 19 June 2000.


    REFERENCES

0022-3077/00 $5.00 Copyright © 2000 The American Physiological Society