Influence of Expectation of Different Rewards on Behavior-Related Neuronal Activity in the Striatum

Oum K. Hassani, Howard C. Cromwell, and Wolfram Schultz

Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland


    ABSTRACT

Hassani, Oum K., Howard C. Cromwell, and Wolfram Schultz. Influence of Expectation of Different Rewards on Behavior-Related Neuronal Activity in the Striatum. J. Neurophysiol. 85: 2477-2489, 2001. This study investigated how different expected rewards influence behavior-related neuronal activity in the anterior striatum. In a spatial delayed-response task, monkeys reached for a left or right target and obtained a small quantity of one of the two juices used in a given block of trials (drawn from apple, grenadine, orange, lemon, black currant, and raspberry). In each trial, an initial instruction picture indicated the behavioral target and predicted the reward. Nonmovement trials served as controls for movement relationships. Consistent preferences in special reward choice trials and differences in anticipatory licks, performance errors, and reaction times indicated that animals differentially expected the rewards predicted by the instructions. About 600 of >2,500 neurons in anterior parts of caudate nucleus, putamen, and ventral striatum showed five forms of task-related activations, comprising responses to instructions, spatial or nonspatial activations during the preparation or execution of the movement, and activations preceding or following the rewards. About one-third of the neurons showed different levels of task-related activity depending on which liquid reward was predicted at trial end. Activations were either higher or lower for rewards that were preferred by the animals as compared with nonpreferred rewards. These data suggest that the expectation of an upcoming liquid reward may influence a fraction of task-related neurons in the anterior striatum. Apparently, the information about the expected reward is incorporated into the neuronal activity related to the behavioral reaction leading to the reward. The results of this study are in general agreement with an account of goal-directed behavior according to which the outcome should be represented already at the time at which the behavior toward the outcome is performed.


    INTRODUCTION

One of the central structures involved in the motivational control of behavior appears to be the striatum (Beninger 1983; Fibiger and Phillips 1986; Robbins and Everitt 1996; Wise 1982). Studies searching for neurophysiological correlates of motivational functions in primates describe neurons in the striatum (caudate nucleus, putamen, ventral striatum including nucleus accumbens) that process reward information in several distinctive forms. Some striatal neurons respond phasically following the delivery of a drop of liquid reward during well-established behavior, thus detecting the outcome of a behavioral action (Aosaki et al. 1994; Apicella et al. 1991a, 1997; Bowman et al. 1996; Hikosaka et al. 1989; Shidara et al. 1998). Other striatal neurons are activated for several seconds before the occurrence of a reward, suggesting access to stored information about the expected outcome (Apicella et al. 1992; Hikosaka et al. 1989; Hollerman et al. 1998; Schultz et al. 1992). With changing reward contingencies during learning, striatal neurons show changes in reward expectation activity that correspond closely to adaptations in the animals' behavior (Tremblay et al. 1998). Particularly interesting reward influences are seen in the striatum during the preparation and execution of movement: neuronal activity in these task periods is more prominent when a reward, rather than no reward, is expected (Hollerman et al. 1998; Kawagoe et al. 1998). These studies suggest that the striatum may constitute one of the prominent reward centers in the brain.

To elucidate the role of the primate striatum in the processing of reward information, we investigated to what extent neurons in the anterior striatum might discriminate between different rewards and how such information could influence neuronal activity related to the behavior leading to these rewards. We aimed primarily for the anterior, or "associative," striatum, which encompasses all three major striatal subdivisions (caudate, putamen, ventral striatum including nucleus accumbens), and made comparisons among these parts. Delayed-response tasks can be considered as crucial tests for assessing the behavioral functions of the striatum and frontal cortex (Divac et al. 1967; Jacobsen and Nissen 1937). We used a modified delayed-response task to test, separately, the behavioral reaction to be performed (arm movement to the left vs. right, movement vs. nonmovement reaction) and the type of liquid reward to be obtained (different juices). These procedures allowed us to investigate how an expected reward may influence neuronal activity during the decision, preparation, initiation, and execution of the behavioral reactions. The results were previously presented in abstract form (Cromwell et al. 1998).


    METHODS

The study was performed on three macaque monkeys (A: Macaca fascicularis, female, 3.4 kg; B: M. fascicularis, male, 2.8 kg; C: M. mulatta, male, 6.0 kg). The activity of single neurons was recorded with moveable microelectrodes during performance of a spatial delayed-response task with different liquid rewards while monitoring arm muscle activity, licking movements, and eye movements. Electrode positions were reconstructed from small electrolytic lesions on 50-µm-thick, cresyl-violet-stained histological brain sections. Most methods were similar to those described in detail before (Hollerman et al. 1998). All experimental protocols conformed to National Institutes of Health guidelines and the Swiss Animal Protection Law; they were supervised by the Fribourg Cantonal Veterinary Office.

Behavioral procedures

Animals performed in a spatial delayed-response task for liquid reward. The monkey kept its right hand relaxed on an immovable, touch-sensitive resting key. It faced a 13-in computer monitor positioned behind a transparent plastic wall in which two small levers were mounted to the left and right of the midline. At the start of each trial, a color instruction picture (13 × 13°) appeared for 1 s on a computer screen above the left or right lever (Fig. 1, top). The instruction indicated both the target of a future arm movement and the kind of liquid reward obtained at trial end. After a randomly varying delay of 3.5-4.5 s following instruction onset, two identical red squares appeared simultaneously as movement trigger at the left and right positions of the instruction. The trigger determined the time of the behavioral response without indicating the spatial target or the specific reward. The animal released the resting key, touched the lever at the position previously indicated by the instruction, and received the liquid reward indicated by the instruction. Both trigger squares extinguished on correct lever touch. Muscle contractions during the instruction-trigger delay, premature key release, incorrect lever touch, or failure to touch the correct lever within 2 s after trigger onset were considered errors; such trials were canceled and went unrewarded.
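
For readers who want to follow the timing constraints, the sketch below simulates the event sequence of a single trial under the parameters given above (Python; the function and variable names are illustrative assumptions, not the laboratory's actual control software):

import random

def run_trial(instructed_side, response_side=None, rt=0.35, mt=0.30):
    """Simulate the event times (in s) of one delayed-response trial."""
    events = {"instruction_on": 0.0, "instruction_off": 1.0}   # instruction shown for 1 s
    events["trigger_on"] = random.uniform(3.5, 4.5)            # delay measured from instruction onset
    # Premature key release and muscle contractions during the delay also
    # counted as errors in the experiment; they are omitted here for brevity.
    if response_side is None or rt + mt > 2.0:                 # no lever touch within 2 s of trigger
        return "error_no_response", events
    events["key_release"] = events["trigger_on"] + rt          # reaction time
    events["lever_touch"] = events["key_release"] + mt         # movement time
    if response_side != instructed_side:                       # wrong lever: trial canceled, unrewarded
        return "error_wrong_lever", events
    events["reward"] = events["lever_touch"] + 2.0             # reward 2 s after correct lever touch
    return "correct", events

outcome, times = run_trial("left", "left")
print(outcome, {k: round(v, 2) for k, v in times.items()})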



Fig. 1. Behavioral task and instruction picture sets. Top: spatial delayed response task. An initial instruction picture of 1-s duration indicated the left or right movement target and the liquid reward delivered at trial end. After a random delay of 3.5-4.5 s following instruction onset, two identical squares appeared and elicited the movement from the resting key to the left or right target lever indicated by the instruction. Correct performance was rewarded after a delay of 2 s with a small quantity of the liquid indicated by the instruction. Bottom: the most commonly used instruction pictures. Each instruction set contained 3 pictures for 3 different rewards, which are shown inside the vertical rectangles. Only 2 rewards with their corresponding associated pictures of a single picture set were used in a given block of trials.

Liquid rewards (0.10-0.20 ml) were dispensed 2.0 s after lever touch by computer-controlled liquid valves from spouts at the animal's mouth. Three spouts (2 mm ID) were positioned in a horizontal arrangement (distance, 7.5 mm) in front of the animal's mouth. Each spout delivered only a single liquid during any given day. Liquids were apple, grenadine, orange, lemon, black currant, and raspberry juices. A specific instruction picture indicated at trial onset the juice to be delivered for correct performance at trial end. Each picture remained constantly associated with the same specific juice throughout experimentation with the same animal. However, we used two to six different sets of three instruction pictures for the same three rewards in each animal to distinguish between the influences of visual features versus rewards on neuronal responses (Fig. 1). Only two instruction pictures with their associated two liquid rewards were used in a given block of trials. All neurons were tested with at least one picture set in one block of trials, and the majority of neurons was tested with all three pictures of a given set, necessitating at least two trial blocks. About 100 neurons were tested with two or three sets of pictures, including neurons showing responses to the pictures. All rewards were used in combinations in which animals showed reliable and persistent preferences throughout at least one block of trials and usually during one or several days or weeks. The effects of satiation on particular juices were not tested.

The two spatial targets and two liquid rewards alternated semi-randomly with the consecutive occurrence of same trial types being restricted to three trials. Trials lasted 12 s irrespective of behavioral performance; intertrial intervals were 2-3 s. Closed-circuit video systems served to continuously supervise limb and mouth movements. Animals were partially fluid-deprived during weekdays and were returned to their home cages after each daily session.
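
A semi-random schedule with this repetition limit can be generated as in the following sketch (illustrative Python; the trial-type labels are placeholders):

import random

def schedule_trials(n_trials, trial_types, max_run=3):
    """Draw trial types at random while forbidding more than max_run repeats in a row."""
    seq = []
    for _ in range(n_trials):
        choices = list(trial_types)
        if len(seq) >= max_run and all(t == seq[-1] for t in seq[-max_run:]):
            choices.remove(seq[-1])          # forbid a 4th consecutive occurrence
        seq.append(random.choice(choices))
    return seq

types = [("left", "rewardA"), ("left", "rewardB"),
         ("right", "rewardA"), ("right", "rewardB")]
print(schedule_trials(12, types))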

Two additional trial types served for control purposes. First, we used nonmovement trials to assess further the movement relationships of selected neurons. These trials were semi-randomly interspersed with left and right movement trials in the delay task. The instruction picture was presented for 1.0 s at the center of the monitor, instead of the left or right position, and the trigger stimulus was the same as in movement trials. The animal kept its hand on the resting key for 2.0 s beyond the instruction-trigger delay to receive the liquid reward indicated by the instruction. In the second task variation, we assessed preferences for liquid rewards in blocks of two-reward choice trials several times on each day on which neurons were recorded. In otherwise unchanged delayed-response trials, two different instructions for two rewards, instead of one instruction for one reward, were shown simultaneously, their left and right positions alternating semi-randomly. This allowed the animal to choose its reward by touching the appropriate lever following the trigger stimulus. Each pair of instructions was composed of one picture associated with a preferred and one with a nonpreferred reward.
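
One simple way to quantify such a preference from choice trials is the fraction of choices of one reward, computed separately for the left and right cue positions so that spatial biases can be ruled out. The snippet below is a sketch under that assumption (the data layout and names are illustrative):

def preference_fraction(choices):
    """choices: list of (chosen_reward, position_of_chosen_cue) tuples."""
    by_position = {"left": [0, 0], "right": [0, 0]}      # [choices of reward A, total choices]
    for reward, position in choices:
        by_position[position][1] += 1
        if reward == "A":
            by_position[position][0] += 1
    return {pos: a / n for pos, (a, n) in by_position.items() if n > 0}

demo = [("A", "left"), ("A", "right"), ("B", "left"), ("A", "right"), ("A", "left")]
print(preference_fraction(demo))   # fraction of choices of reward A at each cue position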

Data acquisition

Following behavioral conditioning, animals were implanted under deep pentobarbital sodium anesthesia and aseptic conditions with two horizontal cylinders for head fixation and a stainless steel chamber permitting vertical access with microelectrodes to the left striatum. The dura was left intact. Teflon-coated, multistranded, stainless steel wires were implanted into the extensor digitorum communis and biceps brachii muscles of the right arm for electromyographic (EMG) recordings. The implant was fixed to the skull with stainless steel screws and several layers of dental cement. Animals received postoperative analgesics and antibiotics.

Glass-insulated, platinum-plated tungsten microelectrodes positioned inside a metal guide cannula served to record extracellularly the activity of single neurons, using conventional electrophysiological techniques. Inspections of histological sections revealed that the tips of the guide cannulas ended above the most dorsal parts of striatum. Although guide cannulas damaged more tissue than solid microelectrodes, they permitted the use of thin microelectrodes causing very little damage to the areas investigated. Discharges from neuronal perikarya were converted into standard digital pulses by means of an adjustable Schmitt trigger. EMGs were converted into standard digital pulses by a Schmitt trigger. EMGs and horizontal and vertical eye positions (infrared oculometer, Iscan) were collected during neuronal recordings. Licking movements were recorded during neuronal recordings as standard digital pulses produced by tongue interruptions of an infrared light beam at the liquid spout.

Pulses from neuronal discharges and EMGs were sampled together with digital signals from the behavioral task by a computer, together with analog signals from electrooculograms. Only data from neurons sampled and displayed by the computer for at least 10 trials in each of the four trial types (2 spatial targets and 2 liquid rewards) are reported. All data from neurons suspected to covary with some task component, and occasionally from unmodulated neurons, were stored uncondensed on computer disks.

Data analysis

Task-related increases of activity from the group of slowly discharging striatal neurons were assessed during individual task periods with the nonparametric one-tailed Wilcoxon signed-rank test incorporated into the evaluation software (P < 0.01). Data are only reported from neurons showing statistically significant activity increases in relation to at least one task event compared with a 1- or 2-s control period. This period was immediately before the instruction as first task event or at a period of apparent lack of modulation in cases of suspected activations preceding the instruction. Preinstruction activations are not reported. Depressions of the low background activity were difficult to assess during any task period and were not further studied.
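
As a rough illustration of this criterion, the snippet below applies a one-tailed Wilcoxon signed-rank test to per-trial impulse counts from a task period and a control period of matching duration (a scipy-based sketch; the original analysis used custom evaluation software):

import numpy as np
from scipy.stats import wilcoxon

def task_related_increase(task_counts, control_counts, alpha=0.01):
    """Return (significant, P) for an activity increase in the task period."""
    task = np.asarray(task_counts, dtype=float)
    control = np.asarray(control_counts, dtype=float)
    # one-tailed test: impulse counts in the task period exceed control counts
    stat, p = wilcoxon(task, control, alternative="greater")
    return p < alpha, p

task = [4, 6, 5, 7, 3, 8, 6, 5, 7, 6]    # impulses per trial, task period
ctrl = [1, 2, 1, 3, 0, 2, 1, 2, 1, 2]    # impulses per trial, control period
print(task_related_increase(task, ctrl))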

Task-related increases of individual neurons were compared on the basis of impulse counts from individual trials during identical task periods and durations with the two-tailed Mann-Whitney U test (P < 0.01). We compared between two different rewards, left and right targets, movement and nonmovement trials, and corresponding instructions of different picture sets. Comparisons between different rewards used the following standard time windows: 0-1.0 s after instruction, 0-3.0 s before the trigger, 0-1.0 s after trigger, 0-2.0 s before the reward, and 0-2.0 s after reward. Only data from neurons with insignificant differences in control periods were considered (P > 0.05).
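
The between-reward comparison can be sketched in the same way: per-trial counts taken from one of the standard windows listed above are compared between the two rewards with a two-tailed Mann-Whitney U test (illustrative data and names only):

from scipy.stats import mannwhitneyu

WINDOWS = {
    "instruction": (0.0, 1.0),    # s after instruction onset
    "pretrigger":  (-3.0, 0.0),   # s before trigger
    "posttrigger": (0.0, 1.0),    # s after trigger
    "prereward":   (-2.0, 0.0),   # s before reward
    "postreward":  (0.0, 2.0),    # s after reward
}

def reward_discrimination(counts_reward1, counts_reward2, alpha=0.01):
    """Return (significant, P) for a difference between the two rewards."""
    stat, p = mannwhitneyu(counts_reward1, counts_reward2, alternative="two-sided")
    return p < alpha, p

r1 = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # counts per trial, reward 1 (e.g., instruction window)
r2 = [7, 9, 6, 8, 10, 7, 9, 8, 6, 9]            # counts per trial, reward 2
print(reward_discrimination(r1, r2))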

Durations of anticipatory licks were measured from instruction onset to reward onset in each trial. Reaction times (from trigger stimulus onset to key release) and movement times (from key release to lever touch) were collected during neuronal recording sessions. Licks, reaction times, and movement times were distributed in a manner suitable for parametric testing and were tested for predicted differences between two rewards with the one-tailed Student's t-test. Error rates were measured for movements to the left or right target levers. Their skewed distributions were compared with the one-tailed Wilcoxon test. Anatomical distributions of task-related activations were assessed with the chi-square test.
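
As a rough sketch of these behavioral comparisons (illustrative numbers only), pooled reaction times could be compared with a one-tailed t-test and per-block error counts with a one-tailed Wilcoxon test, assuming the two rewards are paired within each block:

from scipy.stats import ttest_ind, wilcoxon

# Reaction times pooled over trials (ms); illustrative values only.
rt_pref = [310, 295, 320, 305, 300, 315, 290, 325]
rt_nonpref = [340, 355, 330, 345, 360, 350, 335, 345]
t, p_two = ttest_ind(rt_pref, rt_nonpref)
p_one = p_two / 2 if t < 0 else 1 - p_two / 2    # one-tailed: shorter reaction times with preferred reward

# Errors per trial block, one value per block and reward (assumed pairing).
err_pref = [1, 0, 2, 1, 0, 1, 1, 0]
err_nonpref = [3, 2, 4, 2, 3, 2, 1, 3]
_, p_err = wilcoxon(err_pref, err_nonpref, alternative="less")

print(round(p_one, 4), round(p_err, 4))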


    RESULTS

Behavioral performance

All three animals performed the task >95% correctly throughout neuronal recording periods. In two-reward choice trials, animals reliably preferred the same reward in 65-99% of trials irrespective of left or right instruction positions. Animal A preferred grenadine or lemon over orange over apple juice. Animal B preferred black currant or raspberry over orange over grenadine or lemon juice. Animal C preferred raspberry over black currant over orange or lemon juice. Some of the preferences changed over periods of days or weeks, but none changed reliably on satiation during individual days.

Licks were assessed during trial periods preceding reward as a measure of reward expectation. They began mostly after instruction onset and became more frequent toward the reward (Fig. 2). When pooled over all trials of a given two-reward comparison, anticipatory licks lasted significantly longer with preferred than nonpreferred rewards in 7 of 13 comparisons (P < 0.025 to P < 0.0005; 1-tailed t-test; Fig. 3, top). Errors in task performance were significantly lower with preferred rewards in 9 of 14 comparisons (P < 0.1 to P < 0.001; 1-tailed Wilcoxon test on errors in individual trial blocks; Fig. 3, middle). Reaction times of arm movements were significantly shorter with preferred rewards in 8 of 14 comparisons (P < 0.025 to P < 0.0005; 1-tailed t-test in pooled trials; Fig. 3, bottom). Movement times differed inconsistently between rewards.



Fig. 2. Lick movements during performance of the spatial delayed response task for 2 different liquid rewards. The instruction cue prepared the animal for a reaching movement to 1 of 2 levers in response to a trigger stimulus. Reward was delivered 2 s after lever touch from a spout at the animal's mouth. Horizontal lines indicate the timing of licking in each trial, as determined by the animal's tongue interrupting an infrared light beam at the liquid spout. Small vertical bars indicate onset of reward delivery. The 2 rewards alternated semirandomly during the experiment. Trials in the displays are separated according to rewards and ranked vertically according to instruction-reward intervals.



Fig. 3. Behavioral indices of reward discrimination. The 2-reward comparisons used in the 3 animals are indicated below the graphs. Total durations of anticipatory licks were measured between instruction onset and reward onset. Error rates were measured for movements to the left or right target levers. Reaction times were measured for the right hand reaching to the right lever. Licks and reaction times were pooled over several trial blocks and compared between 2 rewards with the 1-tailed Student's t-test (*P < 0.025; **P < 0.005; ***P < 0.0005). Errors were computed for each trial block and compared with the 1-tailed Wilcoxon test (+P < 0.1; ++P < 0.05; +++P < 0.001). Juice names in capitals indicate the preferred juices in a given combination. In some reward combinations, 1 juice was used at different times, and then both names are indicated. n, number of trials. Time data are means ± SE.

Behavioral reactions in individual trial blocks rarely showed significant differences between rewards. Anticipatory licks differed in 87 of 444 blocks (20%), and reaction times differed in 49 of 610 neuronal recording blocks (8%; P < 0.025; 1-tailed t-test).

Behavioral differences were slightly more pronounced in trials in which neurons discriminated between rewards, as compared with trials with nondiscriminating neurons. This was seen in pooled trials (licks, 7 vs. 4 of 13 reward comparisons, P < 0.025; errors, 6 vs. 4 of 14 comparisons, P < 0.1; reaction times, 9 vs. 5 of 14 comparisons, P < 0.025) and in individual blocks (licks, 26% of 156 vs. 16% of 288 blocks; reaction times, 9% of 221 vs. 7% of 389 blocks; P < 0.025).

Muscle activity differed during the reaching movement between left and right targets in the extensor digitorum communis and biceps but failed to vary systematically between different rewards (Fig. 4). Despite occasional activity before trial onset, these muscles were relaxed during the instruction-trigger delay. Gaze and eye movements were comparable for the different rewards (Fig. 5). The instruction elicited an ocular saccade to a relatively fixed position on each instruction picture unless the gaze was already there. The trigger stimulus in both types of movement trials elicited a saccade to the appropriate response lever. In no case were differences of neuronal activity between rewards clearly related to differences in eye movements.



Fig. 4. Activity of the biceps muscle of the right arm during task performance. Trials alternated semirandomly between left and right target positions and between the 2 liquid rewards indicated. They were separated for display. Activity differed between left and right targets but not between the 2 rewards. Raster dots correspond to rectified activity above a threshold level. Individual trials are presented as horizontal lines and are ranked vertically according to instruction-trigger intervals.



Fig. 5. Eye movements during performance in the delayed response task for different juice rewards. Traces show horizontal and vertical eye positions during a single trial. Eye movements occurred most consistently after instruction onset and offset as well as after trigger onset.

Neuronal database

We studied >2,500 slowly discharging striatal neurons with control rates of 0.1-2.5 imp/s in the spatial delayed response task, and a subset of 105 of them in the delayed go-nogo task. Of the >2,500 neurons, 610 showed 716 statistically significant task-related activations (63, 251, and 296 neurons recorded in animals A, B, and C, respectively, requiring 302 recording days). The remaining neurons failed to show task-related modulations in raster displays during the experiment and were not further investigated. Five forms of task relationship were found during the different trial periods and consisted of responses to instructions, activations preceding the trigger, activations following the movement trigger, activations preceding rewards, and responses to rewards. The separate group of tonically active striatal neurons (Apicella et al. 1991b; Kimura et al. 1984) with discharge rates of 3-8 imp/s was not further studied.

A total of 216 of the 610 task-related neurons (35%) showed statistically significant differences in 242 task-related activations between at least two liquid rewards. Reward-discriminating activity was seen in 17-49% of task-related neurons (Table 1). Activations were higher with either preferred (62% of discriminations; Figs. 7-9, 12B, and 14, A and B) or nonpreferred rewards (38%; Figs. 6, 10-12A, and 13) in all five forms of task relationships. Reward-discriminating differences in activity were 112 ± 18.3% (mean ± SE) for the 61 instruction responses, 76 ± 6.5% for the 50 activations preceding the trigger, 65 ± 12.4% for the 28 activations following the trigger, 69 ± 12.5% for the 25 activations preceding rewards, and 109 ± 16.9% for the 78 reward responses.


                              
Table 1. Numbers of task-related striatal activations discriminating between two liquid rewards



Fig. 6. Discriminating responses to reward-predicting instructions in a spatially differentiating ventral striatal neuron. A and B: comparable responses of the same neuron to 2 different instruction sets. This neuron differentiated between the 2 instruction positions when raspberry juice was indicated (left vs. right) and between rewards with the left instruction position irrespective of which instruction picture set was used. Insets: the employed instruction pictures at left or right positions for the 2 liquid rewards (raspberry juice, preferred, and black currant juice, nonpreferred reward). Perievent time histograms are composed of neuronal impulses shown as dots below and are aligned to instruction onset. Each line of dots shows 1 trial. For each instruction set, trials alternated semi-randomly between the 2 rewards and 2 instruction positions and are separated for analysis. Original trial sequence is displayed from top to bottom.

Responses to instructions

A total of 124 neurons responded to the instructions with transient activations which subsided within 500 ms after stimulus offset (Table 1). Of these, 61 neurons (49%) discriminated between two liquid rewards. Responses discriminating between left and right instruction positions were seen in 29 neurons, 18 of which discriminated between rewards, all of them only on one side of instruction presentation (Fig. 6). Responses to instructions presented on the other side either failed to discriminate between rewards (10 neurons) or were entirely absent (8 neurons). Spatially unselective neurons discriminated frequently also between rewards (Fig. 7). A subset of 22 nonspatial neurons were tested in nonmovement trials, and 6 of them showed higher responses in movement than nonmovement trials. Of these, one neuron also discriminated between rewards in nonmovement trials (Fig. 7C). Fourteen reward-discriminating neurons were tested with multiple instruction picture sets (3 spatial, 11 nonspatial neurons). Responses in nine of these neurons maintained the same reward discriminations and varied insignificantly between instructions indicating the same rewards (Figs. 6, A vs. B, and 7, A vs. B).



Fig. 7. Differential responses to reward-predicting instructions in a nonspatial ventral striatal neuron. A and B: comparable responses of the same neuron to 2 different instruction sets. This neuron failed to differentiate between the 2 spatial instruction positions but discriminated between juice rewards. Pictures above histograms show the employed instructions for the 2 liquid rewards (raspberry juice, preferred, and black currant juice, nonpreferred reward). C: response in nonmovement trials discriminating between reward-indicating instructions. Trials alternated semi-randomly between 2 juice rewards, 2 instruction positions and movement vs. nonmovement trials and are separated for analysis while pooling over left and right instruction positions. Trials are rank-ordered according to instruction-reinforcer intervals. A and C are from the same trial block.

Activations during the instruction-trigger delay

A total of 153 task-related neurons showed activations which began during the instruction-trigger interval and terminated >500 ms after instruction offset, either before or immediately after the trigger stimulus (Table 1). Activations in 50 of these neurons differed between rewards (33%). Of 31 neurons with spatially discriminating activations, 10 discriminated between rewards (Fig. 8A). Reward discriminations occurred only on one side of instruction presentations in all but three neurons. Activations with instructions on the other side either failed to discriminate (1 neuron) or were entirely absent (6 neurons). A subset of 48 nonspatial neurons was tested in nonmovement trials, and 24 neurons showed higher activations in movement than nonmovement trials. Of these, eight neurons discriminated between rewards in movement trials (Fig. 9A). Spatially unselective neurons with substantial nonmovement activity discriminated also between rewards (Fig. 10). Reward-discriminating activity was unrelated to reaction times, which overlapped considerably (Figs. 8B and 9B).



Fig. 8. Spatial delay activity discriminating between juice rewards in a caudate neuron. A: activity during the instruction-trigger delay discriminated between left and right movement targets and between rewards with the left movement target. This neuron showed no task-related activity in nonmovement trials. Insets: the employed instruction pictures at left, right, or center (nonmovement) positions for the 2 liquid rewards (raspberry juice, preferred, and black currant juice, nonpreferred reward). All 6 trial types alternated semi-randomly and are separated for analysis. Trials are rank-ordered according to instruction-reward intervals. B: differences in neuronal activity between rewards despite overlapping reaction times (from trigger stimulus onset, reference line at 0, to movement onset on key release marked by short vertical lines). Reaction times differed insignificantly between trials with different rewards (278-448 ms with raspberry juice, preferred, and 284-480 ms with black currant juice, nonpreferred). Trials are rank-ordered according to reaction time. Same neuron as shown in A (left instruction position). Note different time scales in A and B.



Fig. 9. Nonspatial movement vs. nonmovement delay activity discriminating between juice rewards in a ventral striatal neuron. A: activations during the instruction-trigger delay discriminated between raspberry juice (preferred) and black currant juice (nonpreferred). This neuron failed to differentiate between left and right movement targets (not shown) but differentiated between movement and nonmovement trials when raspberry was used as reward. Top: data pooled from left and right movement targets. B: differences in neuronal activity between rewards for movements were unrelated to reaction times, which overlapped and differed insignificantly between rewards (284-358 ms with raspberry juice, preferred, and 264-390 ms with black currant juice, nonpreferred). Trials are rank-ordered according to reaction time. Same neuron as in A but different time scale.



Fig. 10. Nonspatial delay activity of a ventral striatal neuron discriminating between juice rewards also in nonmovement trials. This neuron failed to differentiate between left and right movement targets and showed substantial activity in both movement and nonmovement trials. It discriminated between raspberry (preferred reward) and lemon juices (nonpreferred). Top: data pooled from left and right movement targets.

Activations following the trigger stimulus

A total of 168 task-related neurons showed activations that closely followed the movement trigger stimulus (Table 1). Activations in 28 of these neurons discriminated between rewards (17%). Ranking of trials according to the interval between the trigger stimulus and movement onset allowed us to determine the temporal relationships to these two events. Accordingly, the activations were classified as movement-related (123 neurons, 22 of them reward-discriminating; Fig. 11), undefined (39 neurons, 6 of them reward-discriminating; Fig. 12, A and B), or trigger responses (6 neurons, none of them reward-discriminating). Twelve of the 53 neurons differentiating between left and right movement targets discriminated between rewards, 9 of them on one side only (Fig. 11). Activations with movement targets on the other side either failed to discriminate (7 neurons) or were entirely absent (2 neurons). A subset of 35 nonspatial neurons were tested in nonmovement trials, and 27 of them showed higher responses in movement than nonmovement trials. Of these, four neurons discriminated between rewards in movement trials (Fig. 12A). Reward-related differences in neuronal activity were unrelated to reaction times, which overlapped considerably (Fig. 11B).



Fig. 11. Reward discrimination in a movement-related putamen neuron. Activity differed significantly between left and right movement targets for both juice rewards and between rewards for left and right movement targets. Reaction times differed insignificantly between rewards (286-452 ms with raspberry juice, preferred, and 276-620 ms with black currant juice, nonpreferred reward). Trials alternated semi-randomly among all 4 types and are separated for analysis. Trials are rank-ordered according to reaction time.



Fig. 12. Reward-discriminating activations following the movement trigger stimulus in 2 nonspatial ventral striatal neurons. A: this neuron differentiated between movement and nonmovement trials and showed higher activity with the less preferred juice reward in movement trials. Reaction times overlapped largely but differed significantly between rewards (280-400 ms with lemon juice, preferred, and 286-484 ms with black currant juice, nonpreferred reward). B: higher activation with the preferred juice reward. Reaction times overlapped largely and differed insignificantly between rewards (310-1,304 ms with black currant juice, preferred, and 316-810 ms with orange juice, nonpreferred reward).

Activations preceding rewards

A total of 101 task-related neurons showed activations that usually began well before the reward (Table 1). Most of these activations remained present until the reward was delivered and terminated <500 ms afterward, even when reward occurred before or after the usual time. These activations showed similar, insignificantly varying magnitudes in trials with left versus right movement targets. They occurred in both movement and nonmovement trials. Prereward activations in 25 of the 101 neurons (25%) discriminated between rewards (Fig. 13). Studies of the licking behavior suggested that the reward-discriminating, anticipatory neuronal activities were not due to differences in anticipatory licking. In the example of Fig. 13, higher neuronal activity was associated with the less preferred reward in anticipation of which the animal licked less.



Fig. 13. Reward-discriminating activation preceding juice rewards in a single ventral striatal neuron. This neuron discriminated between raspberry (preferred) and black currant juice (nonpreferred reward). It failed to differentiate between left and right movement targets. Graphs below raster displays show simultaneously recorded licking behavior. Horizontal lines indicate periods during which the monkey's tongue interrupted an infrared light beam immediately below the lick tube. Thus licks occurred after the instruction, during the trigger-reward interval, and after reward delivery. Durations of anticipatory licks in individual trials between instruction onset and reward were 1,000-3,425 ms. They were slightly longer in trials with preferred as compared with nonpreferred reward (raspberry vs. black currant), the differences being significant only in nogo trials (P < 0.01). Lick durations were significantly longer in nogo as compared with go trials with either reward (P < 0.01). All 4 trial types alternated semi-randomly and are separated for analysis. Trials are rank-ordered according to intervals between instructions and reward onset.

Activations following rewards

A total of 170 task-related neurons showed responses that followed the delivery of a reward and subsided before the instruction of the subsequent trial (Table 1). Activations showing close temporal relationships to licking movements (Apicella et al. 1991a) were discarded from the data sample. The reward responses varied insignificantly in magnitude between left and right movement targets and occurred in both movement and nonmovement trials. Activations in 78 of the 170 neurons (46%) discriminated between rewards (Fig. 14) irrespective of the side of the movement target. Reward-discriminating neuronal activations occurred in trial periods in which there were no major differences in licking behavior (Fig. 14A). They were also observed in nonmovement trials (Fig. 14B).



Fig. 14. Reward-discriminating activations following juice rewards. A: caudate neuron showing higher activity for the preferred reward (orange juice). Graphs below rasters show that licks following the reward lack major differences between the 2 rewards. B: activity in ventral striatal neuron is higher for the preferred reward (orange juice) in both movement and nonmovement trials.

Recording positions

Histological reconstructions of recording positions revealed that neurons were sampled in caudate nucleus, putamen, and ventral striatum, including nucleus accumbens, between rostrocaudal levels A18 and A25 and thus mostly rostral to the anterior commissure. Recordings were made throughout the entire dorsoventral extent of these structures and were mediolaterally concentrated around the internal capsule (Fig. 15).



Fig. 15. Positions of all task-related striatal neurons recorded in the 3 monkeys. A: data from the 2 Macaca fascicularis monkeys. B: data from the M. mulatta monkey. Neurons showing statistically significant differences in activity are indicated by , and undiscriminating neurons by -. - - -, approximate borders between caudate nucleus (CD), putamen (PUT), and ventral striatum (VST). Standard coronal sections from the left hemisphere are labeled in rostrocaudal stereotaxic planes according to distances from the interaural line (A18-A25). AC: anterior commissure.

Reward-discriminating neurons were found in the caudate (53 of 165 task-related neurons; 32%), putamen (73 of 219 neurons; 33%), and ventral striatum (90 of 226 neurons; 40%). Their distribution failed to vary significantly between the three structures (P = 0.19; chi-square test) and among the rostrocaudal levels explored (A18-A25; P = 0.17). The distribution of spatially discriminating neurons varied significantly among these structures (P < 0.0005), being lower in the ventral striatum (10% of neurons with instruction, delay, or trigger activations) as compared with the caudate (22%) and putamen (24%). The distribution of reward-discriminating neurons among the spatially discriminating neurons varied insignificantly among the three structures (P = 0.26). Very similar results were obtained when the same comparisons were made separately for the two monkey species employed.
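
For illustration, the structure comparison reported above can be reproduced from the counts in the text with a chi-square test of independence (scipy sketch; only the printed counts are taken from the paper):

from scipy.stats import chi2_contingency

#                  (discriminating, nondiscriminating) task-related neurons
counts = {"caudate":          (53, 165 - 53),
          "putamen":          (73, 219 - 73),
          "ventral striatum": (90, 226 - 90)}

table = [list(v) for v in counts.values()]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p:.2f}")   # expected P around 0.2, consistent with the reported P = 0.19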


    DISCUSSION

The present data show that some of the behavior-related activity of neurons in the anterior striatum distinguished among different liquid rewards in a spatial delayed response task. All forms of task-related activity differed depending on the type of liquid reward expected at trial end. Reward relationships were observed during the preparation, initiation, and execution of the movement leading to the reward. They occurred apparently on the basis of differential reward expectations. These data extend comparable results from a previous study on rewarded versus unrewarded movements in the striatum (Hollerman et al. 1998) and suggest that information about expected reward may influence neuronal activity related to the behavioral action leading to the reward. These activities may contribute to brain mechanisms directing behavioral reactions to rewarding goals.

Task and behavior

As judged from the effects of lesions, the spatial delayed response task tests the mnemonic and movement preparatory functions of primate prefrontal cortex and striatum (Divac et al. 1967; Jacobsen and Nissen 1937). The initial, occasion-setting instruction cue determines the operant response to a subsequent stimulus. In our task, the instruction in addition contained information about the expected liquid reward. This association was acquired by a Pavlovian mechanism, as the animal could disregard the instruction picture and still receive the predicted reward. The consistent reward preferences in choice trials with two different instruction pictures, the durations of anticipatory licks, the errors in behavioral performance, and the reaction times suggest that the animals discriminated between the different rewards and had established expectations of the upcoming reward. Apparently, the animals expected the type of reward produced by the movement.

Mechanisms of reward discrimination

DIFFERENTIAL REWARD EXPECTATIONS VERSUS BEHAVIORAL DIFFERENCES. Although animals appeared to distinguish between the different liquid rewards, the neuronal differences did not seem to be predominantly due to behavioral differences. The ranking of trials according to reaction time revealed that neuronal activity related to the preparation or execution of movement differed between rewards despite comparable reaction times in individual trials (Figs. 8B, 9B, 11, and 12B). A similar result was obtained with reward-dependent activity in the striatum in rewarded versus unrewarded movements (Hollerman et al. 1998). Differences in anticipatory licking were also an unlikely factor, and some neurons showed even higher activity with less licking toward the nonpreferred reward (Fig. 13).

IMPORTANCE OF REWARD EXPECTATION. All forms of task-related activity found in our experiments discriminated between the different liquid rewards employed. However, the frequencies of reward discrimination varied (Fig. 16). They were highest for responses following rewards and for responses to the initial, reward-predicting instruction. They were lowest for activity during the execution of movement following the trigger stimulus. Reward discriminations were not better for activity closer to the reward as compared with earlier periods during which the behavior toward the reward was being prepared. This suggests that the expectation of reward exerted a strong influence during task periods in which the reward information was used for organizing the behavioral reactions toward the reward.



Fig. 16. Relative proportions of reward-discriminating neuronal activity in anterior striatum among the 5 different task relationships.

The influence of reward expectation on behavior-related activity is not unique for the anterior striatum. Similar reward-discriminating activities have been reported with different food or liquid rewards or different reward quantities in the dorsolateral prefrontal cortex (Leon and Shadlen 1999; Watanabe 1996) and posterior parietal cortex (Platt and Glimcher 1999). Reward expectation activities during movement preparation and initiation change systematically in the striatum and orbitofrontal cortex when the animal's expectations adapt to modified reward contingencies during learning (Tremblay and Schultz 2000; Tremblay et al. 1998), suggesting a contribution to the monitoring of the appropriate outcome of behavior.

REWARD VERSUS OBJECT RELATIONSHIPS. The use of different instruction cues allowed us to assess the contribution of visual features to reward-discriminating activity. Object-related visual responses may occur in the anterior striatum due to inputs from ventrolateral prefrontal cortex and inferotemporal cortex (Liu and Richmond 2000; Miller et al. 1996; Selemon and Goldman-Rakic 1985) and are reported in more posterior striatal regions (Brown et al. 1995). More than half of our neurons tested with multiple instruction sets for the same rewards maintained their discriminating activity with the rewards, suggesting that these neurons discriminated between instructions on the basis of the associated reward rather than visual features. The impact of reward associations over visual features may be even higher in the orbitofrontal cortex, where a considerable fraction of instruction responses reflected the predicted type of food or liquid reward in a delayed response task (Tremblay and Schultz 1999).

REWARD VERSUS MOVEMENT RELATIONSHIPS. Spatially discriminating neurons frequently distinguished between rewards for only one of the two spatial targets. Such neurons could show strong activations for both rewards on one side and for only one reward on the other side (Figs. 6 and 11), or they were activated only on one side and only with one reward. However, more gradual differences were also observed (Fig. 8). Likewise, neurons discriminating between movement and nonmovement trials were activated with both rewards on one trial type and only one reward on the other trial type (Fig. 10), in only one trial type with only one reward (Figs. 7 and 9), or showed more gradual differences. Selective striatal activations in rewarded movement trials, as opposed to unrewarded movement trials or nonmovement trials, were also observed in a delayed go-nogo task (Hollerman et al. 1998). When more than two different rewards are compared, neurons in the prefrontal cortex may show even finer differences with rewards that may occur on both sides in a two-target delayed response task (Watanabe 1996).

Although one might expect that spatial and reward stimulus components are processed by different neurons, no such evidence was presently found in the anterior striatum. If at all, spatial neurons had a slight, statistically insignificant, tendency for more rather than less reward-discriminating activity (Table 1). A similar absence of separation was observed in the prefrontal cortex (Watanabe 1996). These data reinforce the notion that reward activity is integrated into behavior-related activity in these structures. By contrast, neurons in orbitofrontal cortex failed to show spatial relationships, including neurons discriminating between rewarded and unrewarded movements or between different rewards (Tremblay and Schultz 1999, 2000), suggesting that orbitofrontal neurons are particularly tuned to rewards rather than movements.

LOW REWARD DISCRIMINATION DURING MOVEMENT EXECUTION. The lowest degree of reward discrimination was found in neurons showing activations related to the execution of movement following the movement-triggering stimulus. This corresponds to the comparatively low fraction of movement execution neurons discriminating between rewarded and unrewarded trials in the anterior striatum and the putamen motor region (Hollerman et al. 1998). In a similar way, activity during the execution of eye movements, as compared with other task periods, showed relatively little discrimination between different reward magnitudes in the posterior parietal cortex (Platt and Glimcher 1999). These data suggest that the expectation of reward has a weaker influence on activity related to the execution of movement than other behavioral components.

REWARD DISCRIMINATION VERSUS AROUSAL. The expectation of highly valued rewards may be accompanied by high arousal levels, and differences in arousal might contribute to differential reward-related activities. This argument might be made when neuronal activity is higher in rewarded than unrewarded trials (Hollerman et al. 1998; Tremblay et al. 2000) or with larger than smaller rewards (Leon and Shadlen 1999; Platt and Glimcher 1999). However, about one-third of the presently reported activations were stronger with less preferred rewards (Figs. 6, 10, 11, 12A, and 13), suggesting that the differential reward activities may not in a simple way reflect different arousal levels.

REWARD PREFERENCE VERSUS PHYSICAL PROPERTIES. A previous study showed that neurons in the orbitofrontal cortex may distinguish between different rewards on the basis of the motivational value, as expressed by the animal's preference behavior, rather than the physical properties of the reward objects (Tremblay and Schultz 1999). It is interesting to note that neuronal activity in the present study on the anterior striatum was higher for preferred rather than nonpreferred rewards in nearly two-thirds of the reward-discriminating neurons. This may indicate that a similar coding mechanism may also exist in some striatal neurons. However, a more explicit assessment of motivational value would require a different experimental plan, and a good distinction between striatal coding of motivational value versus physical identity is difficult to make from the present data.

Generation of striatal reward activities

The present results reveal several ways in which rewards are processed by different groups of striatal neurons. The differential responses to rewards and reward-predicting stimuli may be involved in the perception of rewarding events. The integrated reward-discriminating and behavior-related activities may provide information about the type of reward expected for a particular behavioral reaction. This combination of reward and behavioral processing follows the general idea of an anatomically based limbic-motor convergence in the basal ganglia by which information about behavioral reactions is combined with the motivational aspects to execute the behavior (Mogenson et al. 1980).

The observed heterogeneous activities may reflect inputs from various reward-related neurons in cortical and subcortical structures (Schultz et al. 2000). Dopamine neurons seem to detect an error in the prediction of reward and produce a neuronal signal suitable for approach learning (Schultz 1998). Probably all striatal neurons receive dopaminergic inputs (Freund et al. 1984; Smith et al. 1994) and may be influenced by the dopamine reward signal. Although dopamine neurons discriminate between rewarded and unrewarded stimuli, they respond similarly to different food and liquid rewards. They might contribute to striatal reward processes and even be involved in sustained neuronal activities (Durstewitz et al. 2000), but they are unlikely to be responsible for the reward-discriminating capacities of striatal neurons.

Neurons in the anterior orbitofrontal cortex and the amygdala detect rewards and reward-predicting stimuli, are active during the expectation of immediate rewards, and differentiate well between different reward objects, possibly on the basis of relative reward value (Critchley and Rolls 1996; Hikosaka and Watanabe 2000; Nishijo et al. 1988; Schoenbaum et al. 1998, 1999; Thorpe et al. 1983; Tremblay and Schultz 1999). Inputs from such neurons may contribute to the reward-discriminating responses and sustained reward expectation activities reported here.

Neurons in the dorsolateral prefrontal cortex and posterior parietal cortex show sustained activations related to mnemonic processes and the preparation of movements. Some of these activities appear to be influenced by expected rewarding outcomes (Leon and Shadlen 1999; Platt and Glimcher 1999; Watanabe 1996), suggesting coding of both the reward and the behavioral reaction toward the reward. Activities in these neurons may contribute to the sustained reward and movement preparation-related activities reported presently.

Given the existence of partly closed frontal cortex-basal ganglia loops, it cannot be ruled out that striatal reward activities are generated through reverberating loop activity with additional inputs from the amygdala and possibly other structures in which reward information is processed. This suggestion would require that reward-related activities exist also in other components of the loops, such as the globus pallidus and anterior thalamus, which remain to be shown.

Neurophysiological basis for goal-directed behavior?

Rewards may serve as goals for voluntary behavior when the behavior is intentionally directed at obtaining the reward. According to motivational theory (Dickinson and Balleine 1994), there should be representations of the outcome already at the time at which behavioral reactions toward the reward are being prepared and executed (knowing the outcome when doing the action). Behavioral choices are made according to the motivational value of rewards (quality, quantity, and probability of reward). Monkeys can estimate the outcome of behavior and consistently choose among different outcomes irrespective of spatial positions or visual features of cues, as observed with video games (Washburn et al. 1991), food versus cocaine discrimination (Nader and Woolverton 1991), and nutrient rewards (Baylis and Gaffan 1991; Tremblay and Schultz 1999; present data).

The present data on differential reward-related activity may point to neurophysiological mechanisms in the striatum related to the representation of goals before and during the execution of actions. Groups of striatal neurons not only carry the information that a given movement will produce a reward, as opposed to no reward (Hollerman et al. 1998; Kawagoe et al. 1998); their activity also reflects which of several rewards will likely be obtained. Such detailed representations would permit striatal neurons to contribute important information to mechanisms involved in making choices between different movements toward different rewards. A similar coding of specific, basic reward aspects, namely quality, quantity, and probability of reward, appears to take place in the prefrontal and posterior parietal cortex (Leon and Shadlen 1999; Watanabe 1996), where such neurons may be involved in making decisions between different outcomes (Platt and Glimcher 1999). The localization of reward-dependent, behavior-related activity in closely related striatal and cortical circuits may contribute to an anatomical and functional basis for a future framework of neuronal mechanisms of decision-making.


    ACKNOWLEDGMENTS

We thank B. Aebischer, J. Corpataux, A. Gaillard, B. Morandi, and F. Tinguely for expert technical assistance.

This study was supported by the Swiss National Science Foundation (Grants 31.43331.95 and NFP38.4038-43997), the Biomed 2 program of the European Community via the Swiss Office of Education and Science (BMH4-CT95-0608 via 95.0313-1), and by an International Research Fellowship Award from the National Science Foundation (U.S.) to H. C. Cromwell (INT-9802538).


    FOOTNOTES

Address for reprint requests: W. Schultz (E-mail: Wolfram.Schultz{at}unifr.ch).

Received 2 November 2000; accepted in final form 14 March 2001.


    REFERENCES
