Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland
Abstract

Introduction
We were interested in studying the neuronal processing of reward information to advance the understanding of the neuronal basis of voluntary, goal-directed behavior. After investigating the responses of dopamine neurons to primary rewards and reward-predicting stimuli (Schultz, 1998), we aimed to address the question of how information about rewards is integrated into the neuronal processes underlying reward-directed behavioral actions. We focused on the orbitofrontal cortex and basal ganglia as presumed key structures for the processing of reward information and investigated how rewards expected at trial end influenced neuronal activity related to various components of delayed response tasks, which test some of the typical functions of these structures (Jacobsen and Nissen, 1937; Divac et al., 1967). This article presents a summary of our recently published results and interprets the different reward-related activities found in the orbitofrontal cortex, striatum and dopamine neurons. A summary figure presented at the end of this review compares the different forms of reward-related activity observed (Fig. 9).
Neurophysiological Methods
Results
Behavioral Tasks
A modified delayed go/no-go task employed with two monkeys allowed us to compare neuronal processing in liquid-rewarded versus unrewarded trials (Fig. 1, top) (Tremblay and Schultz, 2000a). When the animal kept its right hand relaxed on a resting key, one of three colored, fractal instruction pictures appeared on a computer monitor for 1.0 s and indicated one of three trial types, namely rewarded movement, rewarded non-movement and unrewarded movement. A red square trigger stimulus presented 2.5–3.5 s after instruction onset required the animal to execute or withhold a reaching movement according to the trial type. The trigger stimulus was the same in all three trial types. In rewarded movement trials, the animal released the resting key and touched a small lever below the trigger to receive a small quantity of apple juice after a delay of 1.5 s. In rewarded non-movement trials, the animal remained motionless on the resting key for 2.0 s and received the same liquid reward after a further 1.5 s. In unrewarded movement trials, the animal reacted as in rewarded movement trials, but correct performance was followed not by liquid reward but by a 1 kHz sound after 1.5 s. One of the two rewarded trial types followed each correctly performed unrewarded trial. Animals were required to perform the unrewarded movement trials, as incorrect performance resulted in trial repetition. The sound constituted a conditioned auditory reinforcer, as it improved task performance and predicted a rewarded trial, but it was not an explicit reward, hence the simplifying term unrewarded movement. Thus, each instruction was the unique stimulus in each trial indicating the behavioral reaction to be performed following the trigger (execution or withholding of movement) and predicting the type of reinforcer (liquid or sound). Each trial contained two delay periods, namely the instruction–trigger delay, during which the animal remembered the type of instruction and prepared for the behavioral reaction (delay 1 in Fig. 1), and the trigger–reinforcer delay, during which the animal could expect the reinforcer (delay 2). Trials lasted 12–14 s and intertrial intervals were 4–6 s.
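The trial structure can be summarized in a brief sketch (an illustrative Python summary with names of our own choosing; this is not the laboratory's task-control code):

```python
# Sketch of the delayed go/no-go task described above.
# Field names and representation are illustrative assumptions.
TRIAL_TYPES = {
    "rewarded movement":    {"response": "release key, touch lever", "reinforcer": "liquid reward"},
    "rewarded nonmovement": {"response": "hold key for 2.0 s",       "reinforcer": "liquid reward"},
    "unrewarded movement":  {"response": "release key, touch lever", "reinforcer": "1 kHz sound"},
}
INSTRUCTION_DURATION_S = 1.0                # instruction picture on screen
INSTRUCTION_TRIGGER_DELAY_S = (2.5, 3.5)    # delay 1: instruction -> trigger
TRIGGER_REINFORCER_DELAY_S = 1.5            # delay 2: correct response -> reinforcer
```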
In the delay task, each instruction picture specifically indicated one reward at trial onset. We used five different pairs of instruction pictures in liquid trials and one set of three pictures in food trials to assess the influence of visual features on neuronal responses. Only two instruction pictures with their associated two liquid or two food rewards were presented in a given block of trials. Liquid rewards were grenadine and apple juice in one animal, and orange and grape juice in a second animal. Food rewards became available in a box located to the right of the monitor following computer-controlled opening of its door (40 × 40 mm frontal aperture). Foods were raisins (most preferred), small apple morsels (intermediately preferred) and sugar-honey cereal (least preferred). Rewards and target positions alternated randomly in each block of trials, with a maximum of three consecutive identical trials.
Reward preferences were assessed in separate blocks of choice trials in the spatial delay task before or after recording from each neuron. In these trials two different instructions for two rewards appeared simultaneously at randomly alternating left and right target positions, allowing the animal to touch the lever of its choice following the trigger stimulus.
Types of Task-related Activity
Orbitofrontal neurons in area 11, rostral area 13 and lateral area 14 displayed only three major types of event relationships in the present experiments, although the employed delayed tasks comprised many more behavioral events. These three types consisted of responses to instructions, activations preceding reinforcers and responses to reinforcers (Fig. 9, top). Neurons with instruction responses were widely distributed, being significantly more frequent in the medial than in the lateral parts of the explored region (P < 0.05; chi-square test). Neurons with activations preceding reward were predominantly found in rostral area 13 and were significantly more frequent in the posterior than the anterior orbitofrontal cortex (P < 0.001). Neurons responding to reinforcers were significantly more frequent in the lateral than the medial orbitofrontal cortex (P < 0.01). In addition, a few neurons showed activations preceding the instructions, during the entire instruction–trigger delay or following the trigger stimulus. Nearly all activations in the orbitofrontal cortex showed pronounced relationships to rewards. This was true for the responses to instructions, activations preceding reinforcers and responses to reinforcers, which differed between rewarded and unrewarded trials in the go/no-go task and distinguished between different rewards in the spatial task.
Reward Versus No Reward
In the delayed go/no-go task we tested how task-related activity differed between rewarded and unrewarded trials in 188 of 505 neurons tested (37%) (Tremblay and Schultz, 2000a). Rewarded nonmovement trials served as controls for movement relationships.
Instruction responses occurred in 99 of 188 task-related neurons (54%). Two-thirds of the 99 responses reflected the type of reinforcer, occurring preferentially in rewarded movement trials, rewarded nonmovement trials or both rewarded trial types irrespective of the execution or withholding of movement, but not in unrewarded movement trials (38 neurons; Fig. 2A). Conversely, 22 responses occurred preferentially in unrewarded movement trials but not in any rewarded trial type. Only three neurons responded preferentially in both types of movement trial irrespective of the type of reinforcer. Instruction responses in 35 of the 99 neurons occurred unselectively in all three trial types.
Responses following the delivery of the reinforcer were found in 67 of the 188 task-related neurons (36%) (Fig. 2C). These responses were unlikely to be related to mouth movements, as licking movements also occurred during other task periods in which these neuronal responses were not observed. All of these responses reflected the type of reinforcer, occurring mostly in both liquid-rewarded trial types but not in sound-reinforced trials (62 neurons), or only in rewarded movement trials (two neurons). Only three responses occurred preferentially after sound reinforcement in unrewarded movement trials.
Different Rewards
In the spatial delayed response task we tested how task-related orbitofrontal neurons discriminated between liquid rewards (294 of 1095 neurons, 27%) and between food rewards (138 of 329 neurons, 42%) (Tremblay and Schultz, 1999).
All three principal types of orbitofrontal activations discriminated between liquid rewards and between food rewards (Fig. 3). Activations occurred exclusively with, or were significantly higher for, one liquid reward than the other liquid rewards in 150 of 218 instruction responses (69%), 65 of 160 activations preceding reward (41%) and 76 of 146 reward responses (52%). These differences were unrelated to eye or licking movements before or after reward delivery, which varied inconspicuously between trials using the two rewards. By contrast, only seven of 218 instruction responses (3%) discriminated between left and right movement targets, and none of the 50 neurons tested with several instruction sets showed significantly different responses to different instructions indicating the same reward. Thus about one-half of reward-related activations of orbitofrontal neurons discriminated between different liquid and food rewards. The instruction responses of orbitofrontal neurons thus reflected the predicted rewards much better than the spatial or visual features of the instructions in the present task situation.
We used different food rewards in the spatial delayed response task to test the relationship of 65 reward-discriminating task-related neurons to the animals' reward preferences (Tremblay and Schultz, 1999).
We assessed reward preferences behaviorally in a choice version of the spatial delayed response task in which two instruction pictures instead of one were shown simultaneously, above the left and right levers, respectively. Each picture was associated with a different reward. Following the trigger stimulus, animals chose one of the rewards by touching the corresponding lever. We used three food rewards (A–C) but only presented two of them in a given trial block (A–B, B–C or A–C). Animals showed clear preferences in every comparison, choosing reward A over B, B over C and A over C in 90–100% of trials, independent of the side of lever touch. Thus reward B was chosen less frequently when reward A was available but more frequently when reward C was available in a given trial block. Apparently the preference for reward B was relative and depended on the reward with which it was being compared.
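The transitivity of these choices can be illustrated with a minimal sketch; the choice fractions below are hypothetical values consistent with the reported 90–100% rates, not the actual behavioral data:

```python
# Sketch: inferring a transitive preference ranking from pairwise choice
# fractions (fraction of trials on which the first reward was chosen).
choice_fraction = {("A", "B"): 0.95, ("B", "C"): 0.93, ("A", "C"): 0.97}

def preferred(x: str, y: str) -> str:
    """Return the reward chosen on the majority of x-versus-y trials."""
    p = choice_fraction.get((x, y))
    if p is None:
        p = 1.0 - choice_fraction[(y, x)]
    return x if p > 0.5 else y

# Pairwise winners imply the consistent ranking A > B > C.
assert preferred("A", "B") == "A" and preferred("B", "C") == "B" and preferred("A", "C") == "A"
```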
Neuronal activity was studied in the standard delay task with single instruction pictures indicating a single, preferred or non-preferred food reward. Two rewards alternated randomly in each trial block. The activity of 40 of 65 reward-discriminating orbitofrontal neurons did indeed depend on the relative preference for the food reward. Activations were significantly higher with the relatively more preferred reward than with the less preferred reward, no matter which of the different reward combinations was used (Fig. 4). Likewise, neuronal activations with an intermediately preferred reward were significantly higher when this reward was the relatively preferred one than in trials in which it was the non-preferred one (reward B in Fig. 4, top versus bottom). Neuronal activity reflected the preferred reward in 12 of 28 instruction responses, eight of 20 pre-reward activations and seven of 17 reward responses. Conversely, activations reflecting less preferred rewards were observed in six of 28 instruction responses, five of 20 pre-reward activations and two of 17 reward responses. Thus, reward discrimination occurred in some orbitofrontal neurons on the basis of relative preference rather than physical properties.
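A minimal sketch of such relative-preference coding, assuming a fixed behavioral ranking A > B > C; the firing rates and the function itself are hypothetical illustrations, not fits to the recorded activity:

```python
# Hypothetical neuron whose activity depends on whether a reward is the
# preferred one *within the current block*, not on the reward's identity.
PREFERENCE_RANK = {"A": 3, "B": 2, "C": 1}  # higher = more preferred

def reward_related_activity(reward: str, alternative: str,
                            high: float = 20.0, low: float = 5.0) -> float:
    """Illustrative firing rate (imp/s) of a relative-value coding neuron."""
    return high if PREFERENCE_RANK[reward] > PREFERENCE_RANK[alternative] else low

# Reward B drives strong activity when paired with C but weak activity when
# paired with A, mirroring the pattern of Fig. 4 (top versus bottom).
assert reward_related_activity("B", "C") > reward_related_activity("B", "A")
```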
We studied neuronal responses to liquid reward in the delayed go/no-go task in which the reward was predicted by the task stimuli and compared them with responses in free liquid trials in which the same reward was delivered at unpredictable times outside of any behavioral task (Tremblay and Schultz, 2000a).
A total of 76 neurons responding to reward were tested in both the go/no-go task and free liquid trials. Of these, 27 neurons were activated in free liquid trials but failed to respond to liquid in the task (Fig. 5). Of the remaining neurons, 46 responded indiscriminately to the liquid during task performance and in free liquid trials, and three responded only in the task but not in the free liquid trials. Thus reward responses in some orbitofrontal neurons depended on the temporal unpredictability of reward.
We tested neuronal processing in rewarded versus unrewarded trials in slowly discharging neurons of the caudate nucleus, putamen and ventral striatum including the nucleus accumbens in two monkeys, using the same delayed go/no-go task as in the study of the orbitofrontal cortex (1487 neurons in the striatum; Fig. 1, top) (Hollerman et al., 1998). The investigation followed earlier studies in which we tested, in two monkeys, neuronal activity in temporal relation to reward in a symmetrically reinforced delayed go/no-go task without unrewarded movement trials [1173 neurons in the caudate and putamen (Apicella et al., 1991, 1992); 420 neurons in the ventral striatum (Schultz et al., 1992)]. Slowly discharging neurons comprise 95% of the striatal neuronal population. Statistically significant task-related activations occurred in 991 of the 3080 neurons tested (32%).
Types of Task-related Activity
Striatal neurons displayed a considerable variety of task relationships. These consisted of responses to instructions, sustained activations preceding the trigger stimulus and movement, responses to the trigger stimulus, activations immediately preceding or accompanying the arm movement, sustained activations immediately preceding the reinforcers, and responses following the reinforcers (Fig. 9, middle). We studied reward processing with the two types of changes immediately preceding or following the reinforcers and with the remaining task relationships.
Direct Reward Relationship
In the different versions of the delayed go/no-go task we tested how many of a total of 991 task-related striatal neurons showed changes in activity in direct temporal relation to reward (Apicella et al., 1991, 1992; Schultz et al., 1992) or a conditioned auditory reinforcer (Hollerman et al., 1998).
One type of reward relationship consisted of activations preceding the reward in both rewarded trial types irrespective of the go or no-go reaction. These activations were found in 164 of the 991 task-related striatal neurons (17%). They were more frequent in the ventral striatum (43 of 117 task-related neurons; 37%) than in the dorsal regions of the caudate and putamen (87 of 615 neurons; 14%). Activations usually began >1 s before the liquid reward (mean 1200 ms), remained present until the reward was delivered and terminated 0.5–1.0 s afterwards, even when the reward occurred before or after the usual time (Fig. 6). Some of these activations discriminated between different reward liquids. They were usually absent in sound-reinforced trials, with the exception of 11 of 259 task-related neurons (4%), which showed mostly weak activations preferentially preceding the conditioned auditory reinforcer.
Reward Dependence
In the delayed go/no-go task with unrewarded movement trials we tested how activity in 259 task-related striatal neurons differed between rewarded and unrewarded trials (Hollerman et al., 1998). Rewarded nonmovement trials served as controls for movement relationships.
The prediction of liquid reward at trial end influenced all forms of task relationship. In one form, movement preparatory activity depended on the reward expected at trial end. Relationships to movement preparation were seen with transient responses to the initial instructions and with sustained activations during the instruction–trigger delay which occurred frequently only in movement but not in nonmovement trials (54 and 50 of the 259 task-related neurons, 20 and 19%, respectively). Nearly all of these neurons showed transient instruction responses or sustained delay activations that occurred selectively either in rewarded (33 and 41 neurons) or in unrewarded movement trials (19 and eight neurons, respectively) (Fig. 7A). A purely reward-predicting property was seen in 18 and 10 of the 259 task-related neurons (7 and 4%) which showed transient instruction responses and sustained delay activity, respectively, in both rewarded trial types irrespective of the movement, but not in unrewarded movement trials.
Reward Coding by Dopamine Neurons
We tested dopamine neurons in several behavioral tasks. (i) In free liquid trials tested in four monkeys, fluid-deprived animals received a small quantity of apple juice at the mouth (0.15 ml) in the absence of reward-predicting phasic stimuli and without performing in any behavioral task (Mirenowicz and Schultz, 1994; Hollerman and Schultz, 1998). Intervals between rewards were irregular and >10 s. (ii) When testing self-initiated movements in two monkeys, animals had one hand on a touch-sensitive, immovable resting key below a food box (Romo and Schultz, 1990). A cover prevented vision into the interior of the box while permitting access of the hand from below. At a self-chosen moment, animals released the resting key without any phasic external stimuli and reached into the box to obtain a small morsel of apple. (iii) In reaction time tasks tested in four monkeys, stimuli consisted of a light emitted by a diode or a sound emitted by a small loudspeaker mounted slightly below eye level in front of the animals (Ljungberg et al., 1992; Mirenowicz and Schultz, 1994). Following the stimulus, animals released the resting key, touched a small lever mounted immediately below the stimulus and received apple juice. (iv) In an aversive, active avoidance task tested in two monkeys, animals released a small key following conditioned visual or auditory stimuli to avoid a mild air puff to the hand (Mirenowicz and Schultz, 1996). (v) In a spatial delayed response task tested in two monkeys, we used small light-emitting diodes of different colors, forms and spatial positions as instruction and movement-triggering stimuli (Schultz et al., 1993). The other aspects of the task were similar to those described for testing orbitofrontal neurons (Fig. 1, bottom). This task was learned via two intermediate tasks which served to introduce the spatial and temporal delay components. (vi) In a visual discrimination task tested in two monkeys, two colored pictures were presented side by side on a computer monitor, and animals touched a lever below one of the pictures to receive a drop of liquid after a delay of 1.0 s (Hollerman and Schultz, 1998).
More than 2000 dopamine neurons were tested with free primary liquid reward in the absence of any specific task, with primary food reward during self-initiated movements, in reaction time tasks using visual and auditory stimuli, and during the learning of reaction time, delayed response and visual discrimination tasks. A sample of 314 dopamine neurons was also tested with primary aversive stimuli and conditioned visual and auditory stimuli in an active avoidance task.
Dopamine neurons in and close to the pars compacta of the substantia nigra and ventral tegmental area (groups A8, A9 and A10) were characterized by their typical electrophysiological properties, namely polyphasic, initially negative or positive waveforms with relatively long durations (1.8–5.5 ms) and low basal discharge rates (0.5–8.0 imp/s), which contrasted with those of pars reticulata neurons of the substantia nigra (70–90 imp/s and <1.1 ms duration) and neighboring fibers (<0.4 ms duration).
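These identification criteria amount to a simple decision rule, sketched below with the thresholds quoted above; the function is illustrative and is not the original unit-classification procedure:

```python
# Classify a recorded unit from its impulse waveform duration and basal
# firing rate, using the electrophysiological criteria quoted in the text.
def classify_unit(waveform_ms: float, rate_imp_s: float) -> str:
    if waveform_ms < 0.4:
        return "fiber"
    if waveform_ms < 1.1 and rate_imp_s >= 70.0:
        return "pars reticulata neuron"
    if 1.8 <= waveform_ms <= 5.5 and 0.5 <= rate_imp_s <= 8.0:
        return "presumed dopamine neuron"
    return "unclassified"

print(classify_unit(2.4, 3.0))   # -> presumed dopamine neuron
```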
Effective Stimuli
Dopamine neurons responded to two types of reward-related events in a similar manner. They were activated (i) by primary liquid and food rewards and (ii) by conditioned, reward-predicting visual and auditory stimuli (Fig. 8). Thus, ~80% of dopamine neurons showed rather homogeneous, short-latency activations following free liquid (Mirenowicz and Schultz, 1994; Hollerman and Schultz, 1998) and when touching food during self-initiated movements (Romo and Schultz, 1990). Most of these dopamine neurons were also activated by conditioned, reward-predicting visual and auditory stimuli but lost their response to the primary rewards (Ljungberg et al., 1992; Mirenowicz and Schultz, 1994). During learning of the reaction time and delayed response tasks, dopamine neurons were activated by the primary reward whenever reward contingencies changed with a new learning step, whereas reward responses occurred only exceptionally when learning was completed (Ljungberg et al., 1992; Schultz et al., 1993; Mirenowicz and Schultz, 1994). A similar result was obtained when the same dopamine neurons were studied both during and after learning of new pictures in the discrimination task. Here, 50% of dopamine neurons were activated by reward during the rising phase of the learning curve, whereas only 12% of them were activated when the curve approached an asymptote (Hollerman and Schultz, 1998). By contrast, primary or conditioned visual or auditory aversive stimuli were rather ineffective, activating <15% of dopamine neurons (Mirenowicz and Schultz, 1996).
Reward Prediction Error
A closer inspection of the reward responses suggested that dopamine neurons were very sensitive to the unpredictability of rewards (Schultz et al., 1993, 1997; Schultz, 1998). They were only activated by rewards that were not predictable in time by phasic stimuli, and showed no response to fully predictable rewards (Fig. 8). If, however, a fully predictable reward was omitted because of an error of the animal or an intervention by the experimenter, dopamine neurons were depressed in their activity at the exact time at which the reward would have occurred (Fig. 9, bottom). However, the more general prediction of reward in an experimental context did not seem to influence their responses. This temporal aspect of reward prediction was tested by suddenly modifying the time of reward delivery (Hollerman and Schultz, 1998). A reward delivered 0.5 s earlier than usual induced an activation at the earlier time for a few trials, and a delayed reward induced a depression at the old reward time and an activation at the new time. Thus dopamine neurons appear to code the temporal discrepancy between the occurrence and the prediction of reward, which is also termed an error in the temporal prediction of reward.
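This behavior corresponds closely to the temporal-difference (TD) error of reinforcement learning models (Sutton and Barto, 1981; Schultz et al., 1997). The following minimal sketch reproduces the three signatures described above, namely activation by unpredicted reward, transfer of the response to the reward-predicting stimulus and depression at the time of an omitted reward; the time bins, parameters and tapped-delay-line state representation are illustrative modeling assumptions, not a description of the recorded data:

```python
import numpy as np

T, CS, R = 20, 5, 15            # time bins per trial; CS onset; reward time
alpha, gamma = 0.1, 1.0         # learning rate; discount factor
w = np.zeros(T)                 # learned values of post-CS states

def value(t):
    # CS onset is unpredictable, so states before the CS keep a fixed value of 0.
    return w[t] if t >= CS else 0.0

def run_trial(rewarded=True):
    delta = np.zeros(T)
    for t in range(T - 1):
        r = 1.0 if (rewarded and t == R) else 0.0
        delta[t] = r + gamma * value(t + 1) - value(t)  # TD error: putative dopamine signal
        if t >= CS:
            w[t] += alpha * delta[t]
    return delta

d_first = run_trial()               # novel pairing: positive error at the reward
for _ in range(1000):
    run_trial()                     # overtraining: the CS comes to predict the reward
d_trained = run_trial()
d_omitted = run_trial(rewarded=False)

# delta[CS-1] is the error on the transition into the CS state (CS onset).
print(f"unpredicted reward: delta = {d_first[R]:+.2f} at reward time")
print(f"after learning:     delta = {d_trained[CS-1]:+.2f} at CS onset, {d_trained[R]:+.2f} at reward time")
print(f"omitted reward:     delta = {d_omitted[R]:+.2f} at the usual reward time")
```

With these assumptions the sketch yields the reported pattern: a positive error at the reward before learning, a positive error at the conditioned stimulus and none at the reward after learning, and a negative error at the usual reward time when the reward is omitted.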
Discussion
Reward Signals
Predominating Reward Relationships in Orbitofrontal Neurons
One striking finding in the presently employed tasks was the predominantly reward-related nature of orbitofrontal task-related activations. These neurons showed a limited spectrum of covariations with behavioral events compared with the dorsolateral and ventrolateral prefrontal cortex, where many neurons process, respectively, spatial positions and visual features of environmental objects (the 'where' or 'how', and the 'what') (Kubota and Niki, 1971; Fuster and Alexander, 1971; Funahashi et al., 1990; Wilson et al., 1993; Ungerleider et al., 1998). These functions may be derived from inputs from the posterior parietal cortex and inferotemporal cortex (Petrides and Pandya, 1988; Pandya and Yeterian, 1996). As motivational processes determine the probability and intensity of behavioral reactions, orbitofrontal neurons coding motivational aspects of goal-directed behavior would influence why a behavioral reaction is performed (Fig. 10). These activities may result from trans-synaptic inputs from the striatum, with the prominent relationships to reward expectation described in this report, and from inputs from the amygdala and rostral and medial temporal lobe (Barbas, 1988; Carmichael and Price, 1995; Cavada et al., 2000; Öngür and Price, 2000). That this scheme is an oversimplification is demonstrated by the observations that spatial and visual features are not treated in complete separation in the prefrontal convexity (Rao et al., 1997; Owen et al., 1998) and that the delay activity of some dorsolateral prefrontal neurons is sensitive to reward information (Watanabe, 1996; Hikosaka and Watanabe, 2000).
Orbitofrontal neurons discriminated very well between rewards and conditioned reinforcers, and between different liquid or food rewards. Many of these discriminations appeared to reflect the motivational value relative to the available alternative reward, as inferred from the animals' preference in overt choice behavior. Apparently these orbitofrontal neurons did not primarily code the pure physical aspects of the reward objects. Physical properties of rewards may be coded in more caudal orbitofrontal areas where neurons discriminate well between various tastes and smells of rewards (Rolls et al., 1990; Critchley and Rolls, 1996; Rolls, 2000).
Striatal Activities in Comparison
Similar to orbitofrontal neurons, slowly firing neurons in the striatum showed activities preceding or following rewards, some of which also discriminated between different reward liquids. However, striatal neurons showed a large spectrum of other behavioral relationships (Schultz et al., 1995). These neurons responded to signals instructing arm and eye movements, and were activated during the delay periods of tasks testing short-term memory and the preparation of externally instructed or self-initiated arm and eye movements (Hikosaka et al., 1989a; Alexander and Crutcher, 1990; Johnstone and Rolls, 1990; Apicella et al., 1992; Schultz and Romo, 1992). They were also activated during the initiation and execution of arm and eye movements (Crutcher and DeLong, 1984; Hikosaka et al., 1989a; Crutcher and Alexander, 1990; Montgomery and Buchholz, 1991; Romo and Schultz, 1992). Our data now suggest that a considerable proportion of these activities depended on liquid reward delivered at trial end rather than on a conditioned auditory reinforcer (Hollerman et al., 1998). A comparable result was obtained in a task using a fixed ratio response schedule, in which some striatal neurons responded only to the stimulus closest to the final reward (Shidara et al., 1998). Eye movement-related activations in the striatum showed a comparable dependence on liquid reward rather than no reward, and their directional tuning was strongly modified by the spatial positions of rewarded as compared with unrewarded stimuli (Kawagoe et al., 1998). These data suggest that the expectation of reward heavily influenced striatal activities related to the behavior producing these rewards.
Dopamine Responses in Comparison
Dopamine neurons showed a much smaller spectrum of task-related activities than orbitofrontal and striatal neurons. They were driven by primary rewards, reward-predicting stimuli and attention-inducing stimuli. The responses were similar between different neurons and between different stimuli eliciting them. By contrast, different neurons in the orbitofrontal cortex and striatum showed different responses to the same events or responded to different events. The strong reward relationships of dopamine responses were similar to orbitofrontal responses, although responses to certain attention-inducing stimuli were also seen in dopamine neurons. Dopamine neurons failed to show the sustained activations preceding rewards which occurred in orbitofrontal and striatal neurons.
Reward Unpredictability
Most reward-related activations in the orbitofrontal cortex and striatum occurred in relation to well-predictable task events. However, some orbitofrontal neurons responded to reward only when it occurred unpredictably outside of any behavioral task. The dependence on event predictability was the rule for dopamine neurons, which were activated by unpredictable reward, uninfluenced by predictable reward and depressed by omitted reward. Thus, a subset of orbitofrontal neurons and most dopamine neurons appeared to code an error in reward prediction.
Distributed Processing of Reward Features
Learning
Rewards have an important function in learning, as they increase the frequency and intensity of behavior leading to rewards. This function depends on the unpredictability of rewards, as formal learning theories and reinforcement learning models indicate that only surprising rewards have the capacity to support learning (Rescorla and Wagner, 1972; Sutton and Barto, 1981). Neuronal responses signalling selectively unpredictable rewards would be particularly effective as teaching signals in learning networks. The rather homogeneous reward response in the majority of dopamine neurons would be broadcast as a global reinforcement signal along divergent projections to large numbers of neurons in the striatum and frontal cortex, where it could exert an immediate or long-term influence on the efficacy of other inputs to these neurons or modify synaptic strengths further downstream (Schultz, 1998). By contrast, orbitofrontal reward responses occur in selected groups of cortical neurons which, unlike dopamine neurons, project in a highly specific and selective manner to postsynaptic neurons. These responses might serve as selective teaching signals for certain groups of neurons, rather than having a global reinforcing effect.
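The surprise-dependence of learning expressed by the cited Rescorla–Wagner rule can be made concrete with a short sketch of the classic blocking effect; the learning rate, asymptote and trial numbers are arbitrary illustrative choices:

```python
# Rescorla-Wagner sketch of blocking (Rescorla and Wagner, 1972): once
# stimulus A alone fully predicts the reward, the compound AB yields almost
# no prediction error, so B acquires little associative strength.
alpha, lam = 0.2, 1.0            # learning rate; asymptote for a rewarded trial
V = {"A": 0.0, "B": 0.0}         # associative strengths

for _ in range(50):              # phase 1: A -> reward
    V["A"] += alpha * (lam - V["A"])
for _ in range(50):              # phase 2: AB -> reward
    error = lam - (V["A"] + V["B"])   # reward is already predicted by A
    V["A"] += alpha * error
    V["B"] += alpha * error

print(V)   # V["A"] ~ 1.0, V["B"] ~ 0.0: the predicted reward taught B almost nothing
```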
In contrast to the potential function of teaching signals, dopamine responses did not appear to discriminate between different rewards. (Orbitofrontal responses to unpredictable rewards have not yet been tested with different rewards.) However, correct learning about specific rewards requires discrimination between different rewards. The reward-discriminating neurons described above have been found in the anterior orbitofrontal cortex. They have also been reported in the posterior, gustatory and olfactory parts of the orbitofrontal cortex (Rolls et al., 1990; Critchley and Rolls, 1996; Rolls, 2000), striatum (Cromwell et al., 1998) and amygdala (Nishijo et al., 1988). These neurons detected rewards irrespective of any prediction, indicating that their responses may not constitute ideal teaching signals. As reward unpredictability and reward discrimination may be processed separately, the optimal use of rewards for learning may involve the simultaneous processing of both reward aspects in different brain structures. Deficiencies in any of these systems would lead to slowed, inaccurate or redundant learning.
Established Task Performance
Activities of neurons in all three structures investigated here reflected the expectation of predictable reward (Fig. 9). In the first form, neurons showed phasic activations following the occurrence of, or depressions during the omission of, predictable rewards. Thus orbitofrontal and striatal neurons were activated by predictable rewards, and dopamine neurons were depressed by omitted rewards. In the second form, dopamine and some orbitofrontal and striatal neurons responded phasically to signals predicting rewards. In the third form, orbitofrontal and striatal neurons showed sustained activations during the expectation immediately preceding a predictable reward. In the fourth form, striatal activations related to the preparation and execution of movements were specifically modulated by expected rewards. Similar activities have been reported for the dorsolateral prefrontal cortex (Watanabe, 1996). Thus various aspects of predictable rewards were processed simultaneously in different brain structures. The optimal use of predictable rewards for controlling behavior would be based on combining this heterogeneous information.
From Reinforcers to Goals of Intentional Behavior
Activities following Rewards
One of the main functions of rewards is to act as positive reinforcers. In initial learning situations, neuronal mechanisms of reinforcement could only use responses following the occurrence of reward, as established reward predictions do not yet exist. Any activity following reward may contribute to these mechanisms. Responses following reward were presently found in the orbitofrontal cortex, striatum and dopamine neurons. The responses of dopamine and orbitofrontal neurons may constitute effective teaching signals because of their relationship to reward unpredictability, a requirement for appetitive learning (Rescorla and Wagner, 1972). By contrast, rewards occur predictably in established tasks and during learning-related modifications of reward predictions. Only neurons responding to predictable rewards would be activated in these situations. Responses which discriminate between different rewards could be essential for the selectivity of learning. Such responses were shown here in the orbitofrontal cortex, and have also been reported for the posterior orbitofrontal cortex (Rolls et al., 1990; Critchley and Rolls, 1996; Rolls, 2000), striatum (Cromwell et al., 1998) and amygdala (Nishijo et al., 1988).
Activations Related to the Expectation of Predictable Rewards
A higher form of reward processing was found with activations which preceded the occurrence of predictable rewards in orbitofrontal, striatal and dopamine neurons. The occurrence of activations before reward in each trial suggests that these neurons had access to stored representations of the upcoming reward. The representations were evoked by the presentation of instruction and trigger stimuli which had gained a predictive value for the future occurrence of reward through the past experience of the animal.
Activations preceding rewards were found in two forms, namely phasic responses to reward-predicting stimuli in orbitofrontal, striatal and dopamine neurons, and sustained activations in orbitofrontal and striatal neurons which lasted between the last event before the reward and the reward itself. Similar phasic and sustained reward-predicting activations have been reported for orbitofrontal and amygdala neurons (Nishijo et al., 1988; Schoenbaum et al., 1998), and for striatal neurons during oculomotor tasks (Hikosaka et al., 1989b) and in relation to the distance to the predicted reward (Shidara et al., 1998). Neurons in the orbitofrontal cortex and striatum adapted their reward prediction-related activities when learning involved the modification of existing reward predictions (Tremblay et al., 1998; Tremblay and Schultz, 2000b).
The activations related to the expectation of a predictable reward provide information about the upcoming event. This information allows subjects to identify objects of vital importance, discriminate valuable from less valuable objects before they appear, mentally evaluate various ways of reacting, compare the gains and losses from each possible reaction, and select and prepare behavioral reactions. The neuronal processing of predictive reward information would be an enormous advantage for the control of behavior and would increase the likelihood of obtaining reward objects.
Activations Reflecting Representations of Goals?
Rewarding outcomes can be considered as goals of voluntary behavior when the behavior is intentionally directed at obtaining the reward. According to motivational theory, at least two basic criteria should be fulfilled before calling an action goal-directed (Dickinson and Balleine, 1994): (i) there should already be representations of the outcome at the time the behavior is executed (knowing the outcome when performing the action), and (ii) the behavior should be based on knowledge about the contingency between choice and outcome (knowing the causal relationship between the action and the outcome obtained by that action). Behavioral choices are made according to the motivational value of rewards (quality, quantity and probability of reward). Monkeys can apparently make decisions about the outcome of behavior. They show consistent choices among different outcomes irrespective of spatial positions or visual features of cues, as observed with video games (Washburn et al., 1991), food versus cocaine discrimination (Nader and Woolverton, 1991) and nutrient rewards (Baylis and Gaffan, 1991; Tremblay and Schultz, 1999).
The present experiments revealed two forms of reward processing that may reflect the neuronal coding of goals of intentional behavior. First, the expectation of reward had pronounced influences on all forms of task-related activity in the striatum (Hollerman et al., 1998). Reward expectation modified the responses to instructions and the sustained activations related to the preparation of movement. Previous studies related these activities to parameters of upcoming movements (Hikosaka et al., 1989a; Alexander and Crutcher, 1990), to execution versus withholding of movement (Apicella et al., 1992) and to stimulus-triggered versus self-initiated movements (Schultz and Romo, 1992). Similar reward influences were seen on activations following the trigger stimulus, which may be related to stimulus detection, movement initiation and movement execution (Aldridge et al., 1980; Hikosaka et al., 1989a; Montgomery and Buchholz, 1991; Romo and Schultz, 1992; Gardiner and Nelson, 1992). Comparable reward influences were seen with arm and eye movements in the dorsolateral prefrontal cortex and caudate nucleus (Watanabe, 1992a,b, 1996; Kawagoe et al., 1998). These data suggest that these neurons already have access to representations of the future reward at the time they are coding the preparation and execution of the action leading to this outcome. Thus striatal and prefrontal neurons could fulfill criterion (i) of goal-directed behavior stated above. However, the data do not reveal whether these neurons would code the causal relationship between the action and its outcome.
In the second form of reward processing in the context of goal-directed behavior, reward discriminations in some orbitofrontal neurons reflected the motivational value of reward relative to available alternatives, as inferred from the preferences animals showed during choice behavior (Tremblay and Schultz, 1999). These neurons showed higher or lower reward-related activations associated with the more preferred of two rewards, irrespective of which combination of rewards was used. By contrast, very few orbitofrontal activations reflected physical reward characteristics irrespective of motivational value, although such relationships probably exist in the posterior, gustatory and olfactory parts of the orbitofrontal cortex (Rolls et al., 1990; Critchley and Rolls, 1996; Rolls, 2000). These results are consistent with the effects of lesions of the orbitofrontal cortex: human patients make errors in decisions about outcomes of actions (Bechara et al., 1998), and monkeys change their reward preferences (Baylis and Gaffan, 1991).

The coding of reward preferences in the orbitofrontal cortex suggests that neurons have access to representations of outcomes relative to other outcomes. The neuronal activations might fulfill criterion (ii) of goal-directed behavior stated above, as the representations seem to take into account the animal's own choice. However, it is unclear whether the activations reflect the relative motivational value of the outcome irrespective of any action or also concern the contingencies of the choice behavior leading to the preferred outcome. A further interesting aspect of the coding of action–outcome relationships is found in neurons of the cingulate motor area, which become active selectively when monkeys change to a different movement after the quantity of reward diminishes with the current movement (Shima and Tanji, 1998). The preference-related activations of orbitofrontal neurons may serve to identify the reward with the relatively higher or lower value when a decision is made between alternatives. These activations would be advantageous for neuronal mechanisms involved in making decisions about immediate behavioral choices leading to rewarding goals, as they indicate the most profitable outcome among available alternatives. Processing of information relative to other objects is also found in neurons of the supplementary eye field, which code spatial positions of eye movement targets relative to other objects (Olson and Gettner, 1995). This mechanism could facilitate quick orientations relative to established landmarks without computing a full spatial map of the environment. Relative information coding in general would allow subjects to reach quick decisions without going through the lengthy process of computing every aspect of the objects present in the environment.
Notes
Address correspondence to W. Schultz, Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland. Email: wolfram.schultz@unifr.ch.
Footnotes
2 Present address: Department of Psychology, Allegheny College, Meadville, PA 16335, USA
References
Alexander GE, Crutcher MD (1990) Preparation for movement: neural representations of intended direction in three motor areas of the monkey. J Neurophysiol 64:133–150.
Apicella P, Ljungberg T, Scarnati E, Schultz W (1991) Responses to reward in monkey dorsal and ventral striatum. Exp Brain Res 85:491–500.
Apicella P, Scarnati E, Ljungberg T, Schultz W (1992) Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J Neurophysiol 68:945–960.
Barbas H (1988) Anatomic organization of basoventral and mediodorsal visual recipient prefrontal regions in the rhesus monkey. J Comp Neurol 276:313–342.
Baylis LL, Gaffan D (1991) Amygdalectomy and ventromedial prefrontal ablation produce similar deficits in food choice and in simple object discrimination learning for an unseen reward. Exp Brain Res 86:617–622.
Bechara A, Damasio H, Tranel D, Anderson SW (1998) Dissociation of working memory from decision making within the human prefrontal cortex. J Neurosci 18:428–437.
Carmichael ST, Price JL (1995) Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. J Comp Neurol 363:615–641.
Cavada C, Compañy T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suárez F (2000) The anatomical connections of the macaque monkey orbitofrontal cortex: a review. Cereb Cortex 10:220–242.
Critchley HD, Rolls ET (1996) Olfactory neuronal responses in the primate orbitofrontal cortex: analysis in an olfactory discrimination task. J Neurophysiol 75:1659–1672.
Cromwell HC, Hassani OK, Schultz W (1998) Reward discrimination in primate striatum. Soc Neurosci Abstr 24:1652.
Crutcher MD, Alexander GE (1990) Movement-related neuronal activity selectively coding either direction or muscle pattern in three motor areas of the monkey. J Neurophysiol 64:151–163.
Crutcher MD, DeLong MR (1984) Single cell studies of the primate putamen. II. Relations to direction of movement and pattern of muscular activity. Exp Brain Res 53:244–258.
Damasio AR (1994) Descartes' error. New York: Putnam.
Dias R, Robbins TW, Roberts AC (1996) Dissociation in prefrontal cortex of affective and attentional shifts. Nature 380:69–72.
Dickinson A, Balleine B (1994) Motivational control of goal-directed action. Anim Learn Behav 22:1–18.
Divac I, Rosvold HE, Szwarcbart MK (1967) Behavioral effects of selective ablation of the caudate nucleus. J Comp Physiol Psychol 63:184–190.
Eblen F, Graybiel AM (1995) Highly restricted origin of prefrontal cortical inputs to striosomes in the macaque monkey. J Neurosci 15:5999–6013.
Fibiger HC, Phillips AG (1986) Reward, motivation, cognition: psychobiology of mesotelencephalic dopamine systems. In: Handbook of physiology. Vol IV. The nervous system (Bloom FE, ed), pp 647–675. Baltimore, MD: Williams & Wilkins.
Fujita K (1987) Species recognition by five macaque monkeys. Primates 28:353–366.
Funahashi S, Bruce CJ, Goldman-Rakic PS (1990) Visuospatial coding in primate prefrontal neurons revealed by oculomotor paradigms. J Neurophysiol 63:814–831.
Funahashi S, Chafee MV, Goldman-Rakic PS (1993) Prefrontal neuronal activity in rhesus monkeys performing a delayed anti-saccade task. Nature 365:753–756.
Fuster JM (1973) Unit activity of prefrontal cortex during delayed-response performance: neuronal correlates of transient memory. J Neurophysiol 36:61–78.
Fuster JM, Alexander GE (1971) Neuron activity related to short-term memory. Science 173:652–654.
Gardiner TW, Nelson RJ (1992) Striatal neuronal activity during the initiation and execution of hand movements made in response to visual and vibratory cues. Exp Brain Res 92:15–26.
Haber SN, Lynd E, Klein C, Groenewegen HJ (1990) Topographic organization of the ventral striatal efferent projections in the rhesus monkey: an autoradiographic tracing study. J Comp Neurol 293:282–298.
Haber S, Kunishio K, Mizobuchi M, Lynd-Balta E (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15:4851–4867.
Hikosaka K, Watanabe M (2000) Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cereb Cortex 10:263–271.
Hikosaka O, Sakamoto M, Usui S (1989a) Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J Neurophysiol 61:780–798.
Hikosaka O, Sakamoto M, Usui S (1989b) Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J Neurophysiol 61:814–832.
Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neurosci 1:304–309.
Hollerman JR, Tremblay L, Schultz W (1998) Influence of reward expectation on behavior-related neuronal activity in primate striatum. J Neurophysiol 80:947–963.
Iversen SD, Mishkin M (1970) Perseverative interference in monkeys following selective lesions of the inferior prefrontal convexity. Exp Brain Res 11:376–386.
Jacobsen CF, Nissen HW (1937) Studies of cerebral function in primates. IV. The effects of frontal lobe lesions on the delayed alternation habit in monkeys. J Comp Physiol Psychol 23:101–112.
Johnstone S, Rolls ET (1990) Delay, discriminatory, and modality specific neurons in the striatum and pallidum during short-term memory tasks. Brain Res 522:147–151.
Kawagoe R, Takikawa Y, Hikosaka O (1998) Expectation of reward modulates cognitive signals in the basal ganglia. Nature Neurosci 1:411–416.
Kubota K, Niki H (1971) Prefrontal cortical unit activity and delayed alternation performance in monkeys. J Neurophysiol 34:337–347.
Kubota K, Iwamoto T, Suzuki H (1974) Visuokinetic activities of primate prefrontal neurons during delayed-response performance. J Neurophysiol 37:1197–1212.
Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol 67:145–163.
Mirenowicz J, Schultz W (1994) Importance of unpredictability for reward responses in primate dopamine neurons. J Neurophysiol 72:1024–1027.
Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449–451.
Montgomery EB Jr, Buchholz SR (1991) The striatum and motor cortex in motor initiation and execution. Brain Res 549:222–229.
Nader MA, Woolverton WL (1991) Effects of increasing the magnitude of an alternative reinforcer on drug choice in a discrete trials choice procedure. Psychopharmacology 105:169–174.
Nishijo H, Ono T, Nishino H (1988) Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. J Neurosci 8:3570–3583.
Olson CR, Gettner SN (1995) Object-centered direction selectivity in the macaque supplementary eye field. Science 269:985–988.
Öngür D, Price JL (2000) The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex 10:206–219.
Owen AM, Stern CE, Look RB, Tracey I, Rosen BR, Petrides M (1998) Functional organization of spatial and nonspatial working memory processing within the human lateral prefrontal cortex. Proc Natl Acad Sci USA 95:7721–7726.
Pandya DN, Yeterian EH (1996) Comparison of prefrontal architectures and connections. Phil Trans R Soc Lond B 351:1423–1432.
Petrides M, Pandya DN (1988) Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. J Comp Neurol 273:52–66.
Rao SC, Rainer G, Miller EK (1997) Integration of what and where in the primate prefrontal cortex. Science 276:821–824.
Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Classical conditioning. Vol II. Current research and theory (Black AH, Prokasy WF, eds), pp 64–99. New York: Appleton Century Crofts.
Robbins TW, Everitt BJ (1996) Neurobehavioural mechanisms of reward and motivation. Curr Opin Neurobiol 6:228–236.
Rolls ET (1996) The orbitofrontal cortex. Phil Trans R Soc Lond B 351:1433–1444.
Rolls ET (2000) The orbitofrontal cortex and reward. Cereb Cortex 10:284–294.
Rolls ET, Yaxley S, Sienkiewicz ZJ (1990) Gustatory responses of single neurons in the caudolateral orbitofrontal cortex of the macaque monkey. J Neurophysiol 64:1055–1066.
Romo R, Schultz W (1990) Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J Neurophysiol 63:592–606.
Romo R, Schultz W (1992) Role of primate basal ganglia and frontal cortex in the internal generation of movements. III. Neuronal activity in the supplementary motor area. Exp Brain Res 91:396–407.
Rosenkilde CE, Bauer RH, Fuster JM (1981) Single cell activity in ventral prefrontal cortex of behaving monkeys. Brain Res 209:375–394.
Schoenbaum G, Chiba AA, Gallagher M (1998) Orbitofrontal cortex and basolateral amygdala encode expected outcome during learning. Nature Neurosci 1:155–159.
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27.
Schultz W, Romo R (1992) Role of primate basal ganglia and frontal cortex in the internal generation of movements. I. Preparatory activity in the anterior striatum. Exp Brain Res 91:363–384.
Schultz W, Apicella P, Scarnati E, Ljungberg T (1992) Neuronal activity in monkey ventral striatum related to the expectation of reward. J Neurosci 12:4595–4610.
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913.
Schultz W, Apicella P, Romo R, Scarnati E (1995) Context-dependent activity in primate striatum reflecting past and future behavioral events. In: Models of information processing in the basal ganglia (Houk JC, Davis JL, Beiser DG, eds), pp 11–28. Cambridge, MA: MIT Press.
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
Selemon LD, Goldman-Rakic PS (1985) Longitudinal topography and interdigitation of corticostriatal projections in the rhesus monkey. J Neurosci 5:776–794.
Shidara M, Aigner TG, Richmond BJ (1998) Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J Neurosci 18:2613–2625.
Shima K, Tanji J (1998) Role for cingulate motor area cells in voluntary movement selection based on reward. Science 282:1335–1338.
Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135–170.
Thorpe SJ, Rolls ET, Maddison S (1983) The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp Brain Res 49:93–115.
Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398:704–708.
Tremblay L, Schultz W (2000a) Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol (in press).
Tremblay L, Schultz W (2000b) Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J Neurophysiol (in press).
Tremblay L, Hollerman JR, Schultz W (1998) Modifications of reward expectation-related neuronal activity during learning in primate striatum. J Neurophysiol 80:964–977.
Ungerleider LG, Courtney SM, Haxby JV (1998) A neural system for human visual working memory. Proc Natl Acad Sci USA 95:883–890.
Washburn DA, Hopkins WD, Rumbaugh DM (1991) Perceived control in rhesus monkeys (Macaca mulatta): enhanced video-task performance. J Exp Psychol: Anim Behav Proc 17:123–129.
Watanabe M (1992a) Frontal units coding the associative significance of visual and auditory stimuli. Exp Brain Res 89:233–247.
Watanabe M (1992b) Prefrontal unit activity during associative learning in the monkey. Exp Brain Res 80:296–309.
Watanabe M (1996) Reward expectancy in primate prefrontal neurons. Nature 382:629–632.
Wilson FAW, O'Scalaidhe SP, Goldman-Rakic PS (1993) Dissociation of object and spatial processing domains in primate prefrontal cortex. Science 260:1955–1958.
Wise RA (1996) Neurobiology of addiction. Curr Opin Neurobiol 6:243–251.
Wise RA, Rompre PP (1989) Brain dopamine and reward. Annu Rev Psychol 40:191–225.