Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland
ABSTRACT
Hollerman, Jeffrey R., Léon Tremblay, and Wolfram Schultz. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80: 947-963, 1998. Rewards constitute important goals for voluntary behavior. This study aimed to investigate how expected rewards influence behavior-related neuronal activity in the anterior striatum. In a delayed go-nogo task, monkeys executed or withheld a reaching movement and obtained liquid or sound as reinforcement. An initial instruction picture indicated the behavioral reaction to be performed and the reinforcer to be obtained after a subsequent trigger stimulus. Movements varied according to the reinforcers predicted by the instructions, suggesting that animals differentially expected the two outcomes. About 250 of nearly 1,500 neurons in anterior parts of caudate nucleus, putamen, and ventral striatum showed typical task-related activations that reflected the expectation of instructions and trigger, and the preparation, initiation, and execution of behavioral reactions. Strikingly, most task-related activations occurred only when liquid reward was delivered at trial end, rather than the reinforcing sound. Activations close to the time of reward showed similar preferences for liquid reward over the reinforcing sound, suggesting a relationship to the expectation or detection of the motivational outcome of the trial rather than to a "correct" or "end-of-trial" signal. By contrast, relatively few activations in the present task occurred irrespective of the type of reinforcement. In conclusion, many of the behavior-related neurons investigated in the anterior striatum were influenced by an upcoming primary liquid reward and did not appear to code behavioral acts in a motivationally neutral manner. Rather, these neurons incorporated information about the expected outcome into their behavior-related activity. 
The activations influenced by reward several seconds before its occurrence may constitute a neuronal basis for the retrograde effects of rewards on behavioral reactions.
Much animal behavior appears to be directed toward obtaining specific goals. How could a reward occurring at the end of a trial influence the preceding behavioral reaction that led to the reward? Theories of goal-directed behavior emphasize the importance of expectations of motivational outcomes evoked before the goal is attained (Dickinson 1980
The study was conducted on two male Macaca fascicularis monkeys (A, 4.4 kg weight; B, 5.4 kg weight) performing a behavioral task under computer control. Activity of single neurons was recorded with moveable microelectrodes while monitoring arm muscle activity through chronically implanted electrodes. On termination of recording, electrode positions were reconstructed on histological brain sections.
Behavioral procedures
Animals were seated in a primate chair inside a completely enclosed behavioral apparatus. An immovable, touch-sensitive resting key was mounted on the right-hand side in front of the animal such that the elbow joint rested at ~90° when its hand contacted the key. Key release was detected by a frequency-sensing circuit that reacted to a change in electrical capacity induced by the touch of the animal's hand. Visual stimuli of 13 × 13° were presented as instruction or trigger stimuli in the center of a 13-in. computer monitor positioned immediately behind a transparent vertical wall. A small transparent response lever (7 × 15 mm) protruding by 20 mm from the vertical wall was positioned in the center of the midsagittal plane, at 35° below the eye level of the animal and within easy reaching distance (250 mm from the animal's shoulder). Thus the lever was located immediately below the projection of the visual stimuli. A 1-kHz sound of rectangular waveform was delivered from a distant sound source with ~68 dB intensity. Small quantities of apple juice (0.15-0.20 ml) were delivered by an electronically controlled solenoid valve and arrived at a spout at the animal's mouth 55 ms after the electronic feeder pulse. All task events were controlled by a suitably interfaced laboratory computer. A closed-circuit video system served to continuously supervise limb movements from above. Animals were fluid- and partly food-deprived during weekdays. They received apple juice as reward during task performance and cookies during breaks. Recording sessions on each weekday lasted 3-4 h, after which animals were returned to their home cages.
Data acquisition
After termination of behavioral conditioning, animals underwent surgery under deep pentobarbital sodium anesthesia and aseptic conditions. Two cylinders for head fixation and a stereotaxically positioned, stainless steel chamber were fixed to the skull to permit vertical access with microelectrodes to the left striatum. The dura was left intact. Teflon-coated, multistranded, stainless steel wires were implanted into the left and right extensor digitorum communis and biceps brachii muscles and led subcutaneously to the head. Ag-AgCl electrodes were implanted into the outer, upper, and lower canthi of the orbits. All metal components, including plugs for the muscle and periorbital electrodes, were imbedded in several layers of dental cement and fixed to the skull with surgical grade stainless steel screws.
Data analysis
Off-line data inspection was performed on the basis of raster dots, perievent time histograms, and cumulative frequency distributions of neuronal and EMG impulses, and with displays of single-trial or averaged analog data, in reference to any of the task components.
Histological reconstruction
During the last recording sessions with each animal, small marking lesions were placed by passing negative currents (5-10 µA for 5-20 s) through the microelectrode, while larger lesions (20 µA for 20 or 60 s) were positioned at a few locations above in the same track. This produced distinct patterns of vertically oriented histological marks. Animals were deeply anesthetized with pentobarbital and conventionally perfused with paraformaldehyde through the heart. Guide cannulae were inserted into the brain at known coordinates of the implant system to delineate the general area of recording. The tissue was cut in 40-µm-thick serial coronal sections on a cryotome, and every third section was stained with cresyl violet. All histological sections were projected on paper, and outlines of brain structures and marks from lesions and recent electrode tracks were drawn. Recording positions in tracks marked by electrolytic lesions were reconstructed by using distances to lesions according to microdrive readings entered into the protocol. Positions in parallel adjacent tracks were reconstructed at comparable vertical levels. In the internal capsule, no attempts were made to reconstruct the recording positions of neurons in reference to individual fiber bundles. The discrimination between neuronal and fiber impulses relied on the electrophysiological criteria described above. Differences in distributions of activations among anatomic structures were determined with the
Behavioral performance
Both animals showed >95% correct task performance throughout the period of neuronal recording (monkey A: 99.7, 99.9, and 98.2%; monkey B: 98.1, 99.8, and 91.6% for rewarded movement, rewarded nonmovement, and unrewarded movement trials, respectively). Unrewarded movements were followed by a sound reinforcer and a subsequent rewarded trial. Because they did not lead to immediate reward, they are referred to as "unrewarded" for reasons of simplicity. Both rewarded and unrewarded movements involved reaching from the same starting position toward the same response lever. Reaction times in both animals were significantly shorter in rewarded as compared with unrewarded movement trials, whereas movement times differed inconsistently (Table 1). In rewarded movement trials, animal A often and animal B always kept pressing the response lever after the reaching movement until the liquid was delivered, whereas they immediately returned to the resting key in unrewarded movement trials. This resulted in significantly longer return times in rewarded as compared with unrewarded movement trials. Correspondingly, arm muscles were activated during the return movement after liquid reward in rewarded but before the reinforcing sound in unrewarded movement trials (Fig. 2A). However, in some blocks of trials, monkey A performed rewarded movements with similar parameters and muscle activity as unrewarded movement trials (Fig. 2B). All of these differences concerned predominantly the timing of movement, whereas major differences in patterns of arm muscle activation or visible postural differences were not observed between rewarded and unrewarded movement trials.
Neuronal data base
A total of 1,487 slowly discharging striatal neurons with a median of 1.02 impulses/s during the control period were tested during task performance. Of these, 259 neurons (17%) exhibited 507 statistically significant task-related activations, of which 386 occurred selectively, 108 preferentially, and 13 nonpreferentially in any of the 3 trial types. For reasons of simplicity, selective and preferential relationships are referred to together as preferential. Combinations of activations were frequently seen between activations preceding and following the same task events in the same trial types, including both rewarded trial types, unrewarded movement trials, or nonmovement trials. Other combinations concerned different task events. About one-third of neurons responding to the trigger also responded to the instruction in the same trial types, and about one-fifth of neurons responding to reinforcers showed additional instruction responses. Tonically active neurons with discharge rates of 3-8 impulses/s were not studied.
Responses to instructions
Transient or sustained responses to the instructions were found in 101 of the 259 task-related neurons (39%) (Table 2). Many responses occurred selectively in movement or nonmovement trials, and nearly all responses were influenced by the type of reinforcer delivered at trial end. Responses frequently reflected both the type of behavioral reaction and the reinforcer, responding only in rewarded or unrewarded movement trials (Fig. 4, A and B). Other neurons responded in both rewarded trial types irrespective of the execution or withholding of the movement but not in unrewarded movement trials (Fig. 4C). Some neurons responded preferentially in nonmovement trials. In contrast, only 2 of the 101 neurons responded preferentially in both movement trials without being influenced by the reinforcer. Responses had mean latencies of 180-204 ms and durations of 355-425 ms (transient responses) and 1,630-2,396 ms (sustained), without varying significantly among the three trial types (P > 0.05; ANOVA).
Activations preceding the trigger stimulus
Of the 259 task-related neurons, 80 (31%) showed activations that began slowly and at varying times during the instruction-trigger interval, had their peak at or before the trigger, lasted mostly until the trigger, and terminated abruptly thereafter (Table 2). More than one-half of these activations occurred predominantly in movement trials and were in addition influenced by the type of reinforcer, appearing either in liquid-rewarded trials (Fig. 5A) or, less frequently, in sound-reinforced trials (Fig. 5B). Only a single neuron was activated in both rewarded and unrewarded movement trials and thus independent of the type of reinforcement. We tested whether the neuronal differences between rewarded and unrewarded movement trials could be due to behavioral differences by ranking trials according to reaction times (Fig. 6A). Only a single neuron showed a clear relationship between activation strength and reaction time (Fig. 6B). Some neurons were activated in both rewarded trials (Fig. 5C) or exclusively in nonmovement trials. Most activations began >1 s before trigger presentation, their means varying insignificantly from 1,050 to 1,500 ms among trial types (Fig. 7).
Activations following the trigger stimulus
Of the 259 task-related neurons, 93 (36%) showed activations that closely followed the trigger stimulus. About one-half of these activations occurred predominantly in movement trials and depended in addition on liquid reward or sound reinforcement (Fig. 8, A and B). However, several movement-related activations occurred irrespective of the type of reinforcer (Fig. 8C). These activations closely followed the trigger stimulus (Fig. 8A) or occurred in better temporal relation to the subsequent execution of movement (Fig. 8C), although many activations were ambiguous in this respect (Fig. 8B; Table 2). Some neurons were preferentially activated in both liquid-rewarded trials irrespective of the execution or withholding of the movement or responded mainly in nonmovement trials. Responses had mean latencies of 120-260 ms and durations of 460-485 ms (transient responses) and 1,390-1,920 ms (sustained; Fig. 9). Differences among trial types were statistically insignificant, except for longer lasting sustained responses in nonmovement versus unrewarded movement trials (P < 0.05; ANOVA with post hoc Fisher's PLSD test).
Activations preceding reinforcers
Of the 259 task-related neurons, 91 (35%) showed activations that usually began well before the liquid reward or the conditioned auditory reinforcer (Table 2). Most of these activations remained present until the liquid or sound was delivered and terminated in <500 ms afterward, even when these events occurred before or after the usual time. Most activations occurred in both liquid-rewarded trial types but not in sound-reinforced trials (Fig. 10A), although several activations were in addition restricted to one of the rewarded trial types (Fig. 10B). Some usually weak activations preceded only the reinforcing sound (Fig. 10C). Most activations began >1 s before the reinforcers (mean 1,200 ms) and varied insignificantly among trial types.
Responses to reinforcers
Of the 259 task-related neurons, 85 (33%) showed transient or sustained responses after the delivery of a reinforcer (Table 2). Most of them occurred only in both liquid-rewarded trial types irrespective of the movement (Fig. 11, A and B), although some were further restricted to movement or nonmovement trials. Very few neurons responded preferentially to the sound in unrewarded movement trials (Fig. 11C). Only a single neuron responded nonpreferentially to all reinforcers. Responses had mean latencies of 210-380 ms and durations of 430-510 ms (transient responses) and 1,265-1,920 ms (sustained), varying insignificantly among trial types.
Activations preceding instructions
Of the 259 task-related neurons, 57 (22%) showed activations that began slowly and at varying times after the reinforcer of the preceding trial, showed their peak <500 ms before the instruction, and terminated abruptly afterward (Table 2). They thus differed distinctively from sustained responses to the preceding reinforcer. Because the instruction was the first stimulus in each trial and trials of different types followed each other in a semirandom sequence, neuronal activations were analyzed relative to the preceding trial type. Most activations occurred preferentially after both rewarded trial types and not after unrewarded movement trials (Fig. 12A). Because only rewarded trials could be followed by an unrewarded trial, these activations preceded the instruction for a possibly unrewarded trial. In a smaller group of neurons, activations occurred preferentially after nonmovement trials (Fig. 12B). Because nonmovement trials in this animal were always followed by a movement trial, these activations preceded the instruction for a movement trial. Most preinstruction activations began >1 s before the instructions (mean 1,150-1,220 ms), this being 2-7 s after the preceding reinforcer and varying insignificantly among trial types.
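As a reader's cross-check on the incidences reported in the preceding subsections, the short sketch below recomputes each percentage from the stated neuron counts. The category labels and variable names are ours, introduced only for illustration. The six counts also add up to 507, the number of task-related activations given under "Neuronal data base"; this is an arithmetic observation and assumes each neuron contributes one activation per category.

```python
# Counts of task-related neurons per activation category, as stated in the text.
# The dictionary keys are illustrative labels, not terms defined by the authors.
TASK_RELATED_NEURONS = 259

counts = {
    "responses to instructions": 101,
    "activations preceding trigger": 80,
    "activations following trigger": 93,
    "activations preceding reinforcers": 91,
    "responses to reinforcers": 85,
    "activations preceding instructions": 57,
}

# Percentage of the 259 task-related neurons, rounded to the nearest integer.
percentages = {
    name: round(100 * n / TASK_RELATED_NEURONS) for name, n in counts.items()
}

# Sum over categories (a neuron can appear in several categories).
total_activations = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n} of {TASK_RELATED_NEURONS} ({percentages[name]}%)")
print("sum over categories:", total_activations)
```

Because single neurons frequently showed combinations of activations, the category counts exceed the number of neurons.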
Recording positions
Histological reconstructions of recording positions revealed that neurons were sampled in caudate nucleus, putamen, and ventral striatum, including nucleus accumbens, between rostrocaudal levels A18 and A25. Recordings were made throughout the entire dorsoventral extent of these structures and were mediolaterally concentrated around the internal capsule (Fig. 13). Recordings in monkey A were focused at more rostral levels (A21-A25) than in monkey B (A18-A22). Task relationships did not differ between monkeys at the levels studied in both animals. Task-related changes were found in 130 of 796 caudate neurons (16%), 94 of 475 putamen neurons (20%), and 35 of 216 ventral striatal neurons (16%). Incidences of task-related changes did not vary significantly between these three structures (P = 0.6;
The present data show that many of the previously described behavior-related activations in anterior striatum show pronounced relationships to reward. Many task-related activations, not only those immediately preceding or following the reward, occurred only in trials leading to liquid reward. These activations occurred several seconds before the reward and were related to various behavioral processes, such as the expectation and detection of instruction and trigger stimuli, and the preparation, initiation, and execution of movement. In each trial, the type of reinforcement was indicated by the initial instruction signal. Animals were sensitive to this predictive information, as judged from the subtle movement differences. Apparently the reward relationships occurred on the basis of differential expectations of reinforcement. Thus the activity of many anterior striatal neurons reflected the expectation of outcome of specific behavioral reactions together with the performance of behavioral reactions necessary to obtain the outcome. The preference for reward over conditioned auditory reinforcement suggests a particular influence of primary reward for neuronal processing in anterior striatum. These activations contrasted with activations immediately following or preceding reward delivery, which reflected the detection or expectation of imminent reward.
Behavior
Animals differentiated in movement trials between the two reinforcers on the basis of the initial instruction. In most liquid rewarded trials, animals kept the hand on the touch lever after the movement until the liquid arrived, whereas they immediately returned to the resting key in trials reinforced by the sound. Apparently the predictive information provided by the instructions induced an expectation of the type of reinforcer as specific trial outcome. This expectation influenced the execution of the movement, as judged from the consistently shorter reaction times in rewarded as compared with unrewarded movement trials. This is reminiscent of the "differential outcome effect," according to which behavioral performance is ameliorated when different actions lead to different outcomes, apparently on the basis of differential expectations of outcome (Trapold 1970
Reward influence on behavior-related activity
FORMS OF REWARD INFLUENCE.
The most frequently observed influence of reinforcement on behavior-related neuronal activity in the anterior striatum consisted in the preferential occurrence of activations in rewarded movement trials but not in the other trial types. A second form consisted in preferential occurrences in both rewarded trial types, irrespective of the execution or withholding of movement. In a third form, a few activations occurred preferentially in movement trials reinforced by a sound rather than in the other trial types. The preferential activations in rewarded movement trials and in both rewarded trial types suggest a relative importance of primary liquid reward over the conditioned auditory reinforcer. These neurons appeared to be sensitive to the "appetitive weight" of events, with higher activity related to more explicit reward values like liquid. By contrast, relatively few task-related neuronal activations in anterior striatum occurred in both movement trial types irrespective of the type of reinforcer. This suggests a much greater sensitivity to the type of reinforcer in these neurons as compared with the motor information conveyed by the instructions.
INFLUENCE ON BEHAVIOR-RELATED ACTIVITY.
The predictive information about the reinforcer had a pronounced effect on several forms of behavior-related neuronal activity known from previous studies. These concerned movement-dependent transient and sustained instruction responses that in striatal neurons probably reflect a preparatory motor set preceding the behavioral reaction (Alexander and Crutcher 1990
VISUAL RESPONSES.
It might be conjectured that the present instruction responses simply reflected the visual features of instructions rather than reward. However, similar trial selectivities related to reward and not to individual instructions were observed in the same neurons during learning trials with different sets of visual instructions (Tremblay et al. 1998
MOVEMENT RELATIONSHIPS.
Rewarded and unrewarded movements were often performed with different parameters, namely reaction time and return time. It is conceivable that the present postinstruction and pretrigger activations related to movement preparation simply reflected differences in movements rather than reinforcement. However, preferential pretrigger activations were five times more frequent in rewarded as compared with unrewarded movement trials, which would be difficult to reconcile with differential preparatory processes for different movement parameters. It might then be that activations are stronger during the preparation of faster reactions, but only one neuron in fact showed this phenomenon. The differences in movement times between rewarded and unrewarded movements varied inconsistently between the two monkeys, but both monkeys showed similar proportions of reward-related pretrigger activations. Although we did not record activity from postural muscles, the visual inspection of the animal's trunk failed to reveal systematic postural differences between rewarded and unrewarded movements that could be related to the systematic differences in neuronal activations. Taken together, movement differences were unlikely to account for the higher incidence of pretrigger activations in rewarded movement trials.
AROUSAL.
The prominent reward-related activations might simply reflect heightened arousal accompanying the expectation of the motivating liquid reward, as compared with the less interesting sound reinforcer. In arousal-sensitive neurons, this could be manifested in increased neuronal activity in rewarded trials. However, most reward-related activations consisted of trial-selective, all-or-none activations that were related to particular task events and appeared too strong to reflect differences in arousal levels. Arousal appeared to be higher with the expected absence, rather than the presence, of reward in unrewarded movement trials. Animals apparently disliked unrewarded movement trials, as judged from the least correct task performance in these trials, particularly during later periods of daily experiments. This obvious increase of arousal in unrewarded movement trials was not associated with comparably increased neuronal activations. Further arguments disfavoring arousal are provided by the learning task in which most of the presently described neurons were also studied (Tremblay et al. 1998
Neuronal mechanisms underlying the reward influence
The present study revealed that many neurons in the anterior striatum showed activations that were related to behavioral reactions and were influenced by the expectation of future reward. These conjoint behavior and reward relationships could arise from similarly conjoint activations entering the striatum, or from convergence of separate inputs to the striatum.
INPUTS OF CONJOINT ACTIVATIONS.
Previous studies reported only a limited extent of reward influences on behavior-related activity in structures projecting to the anterior striatum. Neurons in primate orbitofrontal cortex responded differentially between appetitive and aversive conditioned visual stimuli that, however, did not constitute preparatory instructions similar to those employed presently (Thorpe et al. 1983
CONVERGENCE OF SEPARATE INPUTS.
The full range of behavior-related striatal activations influenced by reward may be based on convergence between behavior-related and reward-related inputs at the level of the striatum itself. All subdivisions of prefrontal cortex, premotor cortex, and primary motor cortex project to different parts of anterior striatum (Arikuni and Kubota 1986
DOPAMINE INPUTS.
Another input mediating an influence of reward could arise from phasic responses of dopamine neurons to primary rewards and reward-predicting stimuli. Dopamine neurons were also activated by instructions in comparable delay tasks, as well as by trigger stimuli occurring with variable delays after instructions (Hollerman and Schultz 1993
INTRODUCTION
; Dickinson and Balleine 1994
). This is seen in the "differential outcome effect" in which expectations of differential reinforcers lead to improved behavioral performance (Trapold 1970
). However, few neurophysiological investigations have so far dealt with the problem that information about rewards should propagate backward in time and influence neuronal mechanisms related to the behavior to be rewarded.
; Fibiger and Phillips 1986
; Robbins and Everitt 1992
; Wise 1982
). Neurophysiological studies revealed two forms of reward-related activity in the striatum. Neurons showed expectation-related activations that began shortly after a reward-predicting stimulus and terminated after reward was delivered, and detection-related responses that followed reward delivery (Apicella et al. 1991
, 1992
; Hikosaka et al. 1989c
; Schultz et al. 1992
). These activations occurred close to the time of reward but too late for influencing the numerous forms of behavior-related activity in the striatum that concern responses to movement-eliciting stimuli, activations during the preparation and execution of movements, and activations related to the expectation of future events (Alexander and Crutcher 1990
; Apicella et al. 1992
; Crutcher and DeLong 1984
; Hikosaka et al. 1989c
; Rolls et al. 1983
; Schultz and Romo 1992
).
, 1992
). In keeping with theories of goal-directed behavior, reward information should influence neuronal activity earlier during a trial, at a time when behavioral reactions are decided, planned, initiated, and executed. To study the influence of reward on these processes, we used a conditional, go-nogo delayed response task typical for striatal functions. In one trial type, animals performed an arm movement and received a drop of liquid reward. In the second trial type, the same movement was reinforced by a conditioned sound instead of the liquid. The third trial type served as a control for the behavioral reaction by requiring animals to withhold the movement for a drop of liquid reward. Rewarded withholding of movement may constitute an active behavioral reaction, as opposed to the unrewarded absence of movement (Petrides 1986
). Specific instruction pictures at trial onset indicated both the behavioral reaction and the reinforcer. This task allowed comparisons on trial types along a single dimension, namely reinforcement (rewarded movements vs. unrewarded movements) and behavioral reaction (rewarded execution vs. rewarded withholding of movement). The results obtained were previously presented in abstract form (Hollerman et al. 1994
). The subsequent report describes changes in reward-related activity while animals learned novel instruction pictures within the same task structure (Tremblay et al. 1998
).
METHODS
FIG. 1.
Behavioral task. The monkey sat with its right hand immobile on the immovable resting key and faced a computer monitor positioned behind a transparent wall in which a nearly transparent lever was mounted centrally. The task consisted of 3 trial types alternating semirandomly. All trials began with a 2-s control period during which the monitor was blank, followed by a 1-s presentation of a fractal instruction picture at monitor center immediately above the lever. After a random delay of 2.5-3.5 s after instruction onset, the red square trigger stimulus appeared at the center of the monitor. In rewarded (top) and unrewarded movement trials (bottom), the trigger elicited the movement and disappeared when the animal touched the lever after release of the resting key, or stayed on for 1.5 s in erroneous trials without key release or lever touch. In rewarded movement trials, a small quantity of liquid reward, and in unrewarded movement trials the reinforcing sound, were presented for 1.5 s after lever touch. In nonmovement trials (middle), the same trigger stimulus was presented for 1.5 s while the animal maintained its hand on the resting key, and liquid reward was delivered after a further 1.5 s.
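The timing described in this legend can be laid out as a small event-schedule sketch. This is our reading of the legend and not the authors' control software: the function name, the trial-type labels, the `touch_latency` parameter (time from trigger onset to lever touch), and the choice of the start of the control period as time zero are all assumptions made for illustration. We also assume the reinforcer in movement trials arrives 1.5 s after lever touch.

```python
import random

def trial_timeline(trial_type, touch_latency=0.6, rng=None):
    """Return event times (s) for one trial, with t = 0 at control-period onset.

    trial_type: 'rewarded_movement', 'unrewarded_movement' or
                'rewarded_nonmovement' (hypothetical labels).
    touch_latency: assumed time from trigger onset to lever touch.
    """
    rng = rng or random.Random()
    instruction_on = 2.0                                  # after the 2-s control period
    trigger_on = instruction_on + rng.uniform(2.5, 3.5)   # delay from instruction onset
    events = {
        "instruction_on": instruction_on,
        "instruction_off": instruction_on + 1.0,          # 1-s instruction picture
        "trigger_on": trigger_on,
    }
    if trial_type in ("rewarded_movement", "unrewarded_movement"):
        lever_touch = trigger_on + touch_latency
        events["lever_touch"] = lever_touch
        events["trigger_off"] = lever_touch               # trigger disappears on touch
        reinforcer = "liquid" if trial_type == "rewarded_movement" else "sound"
        events[reinforcer] = lever_touch + 1.5            # assumed 1.5 s after touch
    elif trial_type == "rewarded_nonmovement":
        events["trigger_off"] = trigger_on + 1.5          # trigger shown for 1.5 s
        events["liquid"] = trigger_on + 3.0               # a further 1.5 s later
    else:
        raise ValueError(f"unknown trial type: {trial_type}")
    return events

# Example: one nonmovement trial with a reproducible random delay.
example = trial_timeline("rewarded_nonmovement", rng=random.Random(0))
```

Erroneous trials (trigger left on for 1.5 s without key release or lever touch) are not modeled in this sketch.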
).
3 dB), and monitored with oscilloscopes and earphones. Somatodendritic discharges of 0.8-1.2 ms duration were discriminated against those originating from fibers using earlier established criteria, in particular the very short durations of fiber impulses (0.1-0.3 ms) (Hellweg et al. 1977
). Data obtained from fiber impulses are not reported. Neuronal discharges were converted into standard digital pulses by means of an adjustable Schmitt-trigger, the output of which was continuously monitored on a digital oscilloscope together with the original waveform.
12 dB at 1 kHz), rectified, monitored on conventional oscilloscopes, and converted into standard digital pulses by a Schmitt-trigger. Horizontal and vertical electrooculograms (EOGs) were collected during neuronal recordings from the implanted periorbital electrodes.
). In each trial, two epochs were defined and the numbers of impulses contained in each epoch were normalized over time and considered as a matched pair. One epoch was the 2-s control period immediately before the instruction; the second epoch consisted of a time window of 250 ms that was moved in steps of 25 ms through the time period of a suspected change. For activations preceding the instruction, the control period was placed individually for each neuron toward trial end at a position without obvious neuronal changes. The Wilcoxon test was performed at each step of 25 ms, using the signed difference from each matched pair over all trials as input. Onset of activation was determined as the midwindow time of the first of seven consecutive steps showing an increase at P < 0.01. Offset of activation was determined in analogy by searching for the loss of statistically significant increase over seven steps. Subsequently, the Wilcoxon test was performed on the total duration between onset and offset of activation (P < 0.005). Neurons not showing an onset of activation or failing in the total duration test were considered as unmodulated. The magnitude of activation was assessed by counting neuronal impulses between onset and offset of activation and expressed as percentage above control period activity. Activations are defined as statistically significant increases of activity in the sliding window procedure. Depressions of activity were difficult to assess objectively because of the low background activity and are not reported.
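The sliding-window procedure for activation onset can be illustrated with a minimal sketch. It is not the authors' analysis code. The function names are ours, the Wilcoxon signed-rank test is implemented with a one-sided normal approximation and without tie correction of the variance (the text does not specify these details), and the synthetic spike trains exist only to exercise the code. Offset detection and the subsequent total-duration test at P < 0.005 would follow the same pattern and are omitted.

```python
import math

def wilcoxon_increase_p(diffs):
    """One-sided P (normal approximation) that paired differences exceed 0."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 0.5 * math.erfc((w_plus - mean) / sd / math.sqrt(2))

def rate(spikes, start, stop):
    """Impulses per second in [start, stop)."""
    return sum(start <= t < stop for t in spikes) / (stop - start)

def find_onset(trials, search_start, search_stop, control=(-2.0, 0.0),
               window=0.25, step=0.025, alpha=0.01, run=7):
    """Midwindow time of the first of `run` consecutive significant steps.

    trials: list of spike-time lists (s), aligned to instruction onset at 0.
    Returns None when no such run exists.
    """
    n_steps = int(round((search_stop - window - search_start) / step)) + 1
    significant = []
    for s in range(n_steps):
        w0 = search_start + s * step
        # Matched pair per trial: test-window rate minus control-period rate.
        diffs = [rate(tr, w0, w0 + window) - rate(tr, *control) for tr in trials]
        significant.append(wilcoxon_increase_p(diffs) < alpha)
    for s in range(len(significant) - run + 1):
        if all(significant[s:s + run]):
            return search_start + s * step + window / 2
    return None

# Synthetic data: 20 identical trials, 1 impulse/s in the control period
# and a 20 impulses/s burst starting 0.51 s after instruction onset.
burst = [0.51 + 0.05 * i for i in range(20)]
trials = [[-1.5, -0.5] + burst for _ in range(20)]
onset = find_onset(trials, search_start=0.0, search_stop=2.0)
```

Note that the onset is reported at the middle of the first significant window, so it can precede the first impulse of an abrupt burst by up to half a window width (125 ms with the parameters above).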
χ² test, and variations along the rostrocaudal extent of striatum were assessed with Spearman correlation analysis.
RESULTS
TABLE 1.
Movement parameters in rewarded and unrewarded movement trials
FIG. 2.
Activity of the extensor digitorum communis muscle of the right arm during rewarded and unrewarded movement trials. A: major differences were seen when the hand returned to the key after reward in rewarded movement trials but before the sound with unrewarded movements, as often observed in monkey A and always in monkey B. B: monkey A occasionally performed the movement similarly in the 2 trial types. Raster dots correspond to rectified activity above a threshold level. Individual trials are presented as horizontal lines, ranked vertically according to reaction time.
FIG. 3.
Eye movements during performance in the 3 trial types. Each curve in the 2 top parts shows horizontal and vertical eye positions during a single trial, respectively. Eye movements were most consistently observed after instruction onset and offset, as well as at trigger onset in movement trials. Neuronal activity recorded simultaneously with these measurements is presented in Fig. 4A. The polar plots (bottom) show superimposed eye positions during 4 s after instruction onset (10 trials). Top, upward; right, rightward.
TABLE 2.
Numbers of striatal neurons differentially influenced by the type of reinforcement
FIG. 4.
Responses to instructions influenced by reward. A: sustained, preferential response of caudate neuron in rewarded movement trials. B: sustained response of caudate neuron restricted to unrewarded movement trials. C: sustained response of putamen neuron in both rewarded trial types, but absence of response in unrewarded movement trials. Perievent time histograms in A-C are composed of neuronal impulses shown as dots below. Each dot denotes the time of a neuronal impulse, and distances to instruction onset correspond to real time intervals. Each line of dots shows 1 trial. Trials in A-C alternated semirandomly during the experiment and are separated for analysis according to trial type and rearranged according to instruction-trigger intervals. Vertical calibration is 20 impulses/bin for all histograms.
FIG. 5.
Activations preceding the trigger stimulus influenced by reward. A: selective activation of caudate neuron restricted to rewarded movement trials. B: selective activation of caudate neuron restricted to unrewarded movement trials. This neuron shows additional, separate activations following the instruction and the trigger with similar selectivity. C: selective activation of caudate neuron in both rewarded trial types irrespective of movement. Neuronal activity is referenced to trigger onset, which in movement trials elicited the reaching movement. Reward or sound reinforcement was delivered 1.5 s after lever touch in movement trials, and 3.0 s after the trigger in nonmovement trials. Trials are rank-ordered according to instruction-trigger intervals.
FIG. 6.
Absence of pretrigger activations in unrewarded movement trials unrelated to reaction time. A: selective activation of ventral striatal neuron occurred in all rewarded movement trials regardless of reaction time (from trigger stimulus to key release) but did not appear in unrewarded movement trials, even when reaction times were within the range observed in rewarded trials. This neuron is typical of all but 1 neuron showing activations selectively preceding the trigger in rewarded movement trials. B: a single putamen neuron showed a relationship between strength of activation and reaction time. Longer reaction times (bottom left) were accompanied by lower pretrigger activations. In this neuron, the longer reaction times with unrewarded as compared with rewarded movements might explain the weaker activations in unrewarded movements. Trials are rank-ordered according to reaction time.
FIG. 7.
Line graphs showing the timing of neuronal activations preceding the trigger stimulus in the 3 trial types. Individual horizontal lines represent the durations of statistically significant activations of individual striatal neurons. Lines are grouped vertically according to trial selectivities. In each group, lines are rank-ordered according to onset times of activations, starting with the leftmost column. Activations from the same neurons in multiple columns are presented at corresponding horizontal positions. Only selective responses are shown for purpose of clarity.
FIG. 8.
Activations following the trigger stimulus influenced by reward. A: activation of caudate neuron during movement occurring only in rewarded movement trials. B: activation of caudate neuron during movement restricted to unrewarded movement trials. C: activation of putamen neuron in both movement trial types irrespective of the type of reinforcer. Activation of this neuron showed a better temporal relationship to movement onset (key release) than to the trigger stimulus. Neuronal activity is referenced to trigger onset. The trigger was preceded by 2.5-3.5 s by onset of the instruction stimulus. Movement trials are rank-ordered according to reaction time.
FIG. 9.
Line graphs showing the timing of neuronal activations following the trigger stimulus in the 3 trial types. Sustained responses had offsets of >1 s after trigger onset (vertical dashed line). Bars below time scales show trigger stimulus durations, with mean values in movement trials. Only selective responses are shown for clarity.
FIG. 10.
Selective activations preceding reinforcers. A: activation in putamen neuron preceding the delivery of liquid reward in the 2 rewarded trial types but not before the reinforcing sound in unrewarded movement trials. B: activation in caudate neuron preceding liquid reward only in movement trials. C: weak activation in caudate neuron preceding the reinforcing sound in unrewarded movement trials. Neuronal activity is referenced to onset of reinforcement. The trigger was preceded by 2.5-3.5 s by the instruction. Trials are rank-ordered according to trigger-reinforcer intervals.
FIG. 11.
Selective responses to reinforcers. A and B: transient response in putamen neuron (A) and sustained response in caudate neuron (B) to liquid reward in both rewarded trial types, but absence of response to auditory reinforcer in unrewarded movement trials. C: sustained response in caudate neuron to auditory reinforcer in unrewarded movement trials, but absence of response to reward in both rewarded trial types.
FIG. 12.
Activations preceding instructions influenced by reward and behavioral reaction. A: activation in caudate neuron following rewarded movement and nonmovement trials but not unrewarded movement trials. Trials are grouped according to preceding trial type. In the task, any rewarded trial could be followed by an unrewarded trial, whereas unrewarded trials were not presented consecutively. A': same neuron as in A, but with trials grouped according to current trial type. Corresponding with activations preceding the instruction for possible unrewarded movements (A), this neuron showed an additional selective response to the instruction in unrewarded movement trials. B: activation in caudate neuron following nonmovement trials but neither movement trial in monkey A. With this animal, nonmovement trials were always followed by a movement trial. B': same neuron as in B, but with trials grouped according to current trial type. Corresponding with activations preceding movement trials (B), this neuron showed additional selective activations during the instruction-trigger interval in both movement trial types. Trials alternated semirandomly during the experiment and are separated for analysis according to previous trial types in A and B and current trial types in A' and B'.
χ² test) nor along the rostrocaudal extent (P = 0.06).
FIG. 13.
Positions of all striatal neurons recorded in the 2 monkeys. Neurons showing task relationships or unmodulated activity are indicated by dots and horizontal lines, respectively. Dashed lines show approximate borders between caudate nucleus, putamen and ventral striatum. Standard coronal sections from the left hemisphere are labeled in rostrocaudal stereotaxic planes according to distances from the interaural line (A18-A25). Cd, caudate nucleus; Put, putamen; VSt, ventral striatum including nucleus accumbens; AC, anterior commissure.
, 1992; Schultz et al. 1992). Trigger-related activations in unrewarded movement trials were found significantly more frequently in the head of caudate close to the internal capsule, as compared with other trial types and striatal areas (Fig. 15). Differences were also observed along the rostrocaudal extent of striatum. Neurons at more rostral levels showed significantly more activations preceding the instruction in all three trial types, more sustained reward responses, and more instruction responses in unrewarded movement trials, as compared with more posterior levels (Fig. 16). The remaining response classes lacked statistically significant rostrocaudal heterogeneity.
FIG. 14.
Positions of neurons with different reward relationships in the 2 monkeys. The 3 parts of striatum are separated schematically (Cd, caudate nucleus; Put, putamen; VST, ventral striatum including nucleus accumbens; AC, anterior commissure). "Reward-influenced" refers to all instruction and trigger responses and to activations preceding the trigger preferentially in rewarded movement trials or in both rewarded trial types. "Prepost-reward" comprises activations preceding the reward and responses to the reward in rewarded trials. "Unrewarded movement" refers to instruction and trigger responses and activations preceding the trigger preferentially in unrewarded movement trials. "Both movement" refers to activations occurring preferentially in both movement trial types. "Mixed" refers to combined activations and selectivities related to different task events.
FIG. 15.
Regional distributions of different types of task-related changes in the 2 monkeys. There were significantly higher fractions of total reward-related activities (white + hatched + black columns) in the ventral striatum as compared with the other 2 striatal structures (P < 0.001 and P = 0.007; χ² test). Neurons showing trigger-related activations in unrewarded movement trials were found significantly more often in caudate, as compared with the other trial types and the other 2 structures (P = 0.006). The number of task-related changes exceeded the number of neurons because of multiple task relationships. "Sustained" and "transient" refer to the duration of neuronal responses to task events; "preceding" refers to activations preceding task events. PUT, putamen; CD, caudate; VST, ventral striatum; n is number of neurons.
FIG. 16.
Rostrocaudal distributions of task-related changes in the 2 monkeys. Only significant regional variations are shown. These were found with activations preceding the instructions, transient and sustained responses to instructions, and sustained reward responses. The entire rostrocaudal extent explored in the experiment was subdivided into 4 levels (A18-A19, A20-A21, A22-A23, and A24-A25). P was obtained with a χ² test; R indicates the correlation coefficient over the 4 rostrocaudal levels obtained from Spearman's correlation analysis.
DISCUSSION
).
; Apicella et al. 1992; Brown et al. 1995; Hikosaka et al. 1989b,c; Schultz and Romo 1992). Comparable instruction responses in trials with liquid reward as opposed to no reinforcer were found in prefrontal cortex (Watanabe 1990). A similar reward influence was seen on activations during the instruction-trigger delay, which probably reflected the preparation of movement or the expectation of a movement-triggering stimulus. Previous studies related such activations to parameters of upcoming movements (Alexander and Crutcher 1990; Hikosaka et al. 1989a), to execution versus withholding of movement (Apicella et al. 1992), and to stimulus-triggered versus self-initiated movements (Schultz and Romo 1992). Thus the presently studied anterior striatal neurons rarely reflected the preparation of movement irrespective of the type of reinforcer.
; Gardiner and Nelson 1992; Montgomery and Buchholz 1991; Rolls et al. 1983; Romo et al. 1992). However, several posttrigger activations occurred in both movement trials irrespective of the type of reinforcer. In addition, the reward-related differences in some of the posttrigger activations may have reflected the differences in movement parameters associated with the two reinforcers rather than different reinforcers themselves. The posttrigger and movement-related activations were the most likely among all activations to be influenced by motor aspects of the task and the least likely to reflect the type of reinforcer.
), were related to regularly alternating trial types (Hikosaka et al. 1989c), or depended on the employed dimensions of discriminations (Sakagami and Niki 1994). Some of the present preinstruction activations occurred preferentially after all rewarded trials. They could be related to the absence of liquid reward in an upcoming unrewarded movement trial that could follow a rewarded trial. Thus they appeared to be related to the expectation of no reward, in contrast to most of the other task-related activations that were stronger when reward was expected. Other preinstruction activations occurred preferentially after nonmovement trials and could be related to an upcoming movement trial.
; Hikosaka et al. 1989c; Schultz et al. 1992). Some of the present prereward activations were restricted to movement or nonmovement trials and thus also reflected the preceding behavioral reaction, reminiscent of prereward activations differentiating between arm and eye movements (Hikosaka et al. 1989c). However, the main result concerning the present activations was their predominant restriction to trials reinforced by liquid rather than conditioned sound. This suggests a relationship to the "appetitive weight" of the reinforcer and not to information about correct task performance or trial end contained in the reinforcer.
). In addition, responses to the same instructions varied systematically during learning when their reward prediction changed. Thus the trial selectivities were more likely due to differences of reinforcement than to visual features. Neurons coding predominantly visual features of stimuli irrespective of behavioral significance were found in the tail of caudate (Brown et al. 1995; Johnstone and Rolls 1990).
). Although learning situations with behavioral errors and erroneous reward expectations are usually accompanied by increased arousal, most task relationships showed fewer differences between learning and familiar trials than between rewarded and unrewarded movement trials. Taken together, the observed influences of reward on neuronal activations in the anterior striatum were unlikely to be due to arousal mechanisms.
). Neurons in the amygdala displayed reinforcer-specific appetitive responses to noninstructional visual and auditory stimuli (Nishijo et al. 1988). More comparable with the present results, neurons in dorsolateral prefrontal cortex showed reward-dependent instruction responses (Watanabe 1990, 1992), as well as food and liquid reward-related sustained activity in a spatial delayed response task (Watanabe 1996). In the same task as used here, orbitofrontal neurons responded differentially to instructions predicting primary liquid reward or conditioned auditory reinforcement. However, they failed to discriminate between behavioral reactions and rarely showed relationships to movement preparation, trigger stimuli, and movement execution (Tremblay and Schultz 1995).
; Eblen and Graybiel 1995; Haber et al. 1995; Künzle 1975; Selemon and Goldman-Rakic 1985; Yeterian and Pandya 1991), and the amygdala projects mainly to ventral parts of striatum (Russchen et al. 1985). Despite the notion of segregated, parallel corticobasal ganglia loops (Alexander et al. 1986), individual subsystems may show degrees of ordered convergence at the high numbers of synaptic inputs to medium spiny striatal neurons (Eblen and Graybiel 1995; Flaherty and Graybiel 1993; Groves et al. 1995; Parthasarathy and Graybiel 1992; Percheron et al. 1984).
; Fuster 1973; Komatsu 1982; Kubota et al. 1974; Nakamura et al. 1992; Romo and Schultz 1992; Watanabe 1986a,b; Weinrich and Wise 1982; for references on posttrigger activations, see Romo et al. 1992).
; Markowitsch and Pritzel 1976; Nakano et al. 1987; Niki and Watanabe 1979; Nishijo et al. 1988; Rosenkilde et al. 1981; Thorpe et al. 1983; Tremblay and Schultz 1995), but activations during the expectation of immediate reward were also described (Komatsu 1982; Tremblay and Schultz 1995). These activations, particularly those related to the expectation of reward, might mediate the reward influence on striatal behavior-related activations. However, it is unclear how exactly such convergence could lead to reward influences several seconds before the rewards.
; Schultz and Romo 1990; Schultz et al. 1993). A reward influence by dopamine neurons on transient and sustained striatal activations related to instructions, movement preparation, trigger stimuli, and movement execution would in most cases require a prolonged facilitatory action of phasically released dopamine on behavior-related striatal activity, which is presently rather hypothetical.
ACKNOWLEDGEMENTS
We thank M. Watanabe for helpful comments on the text and B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz, and F. Tinguely for expert technical assistance.
This study was supported by the Swiss National Science Foundation (Grants 31-28591.90, 31.43331.95, and NFP38.4038-43997), the Roche Research Foundation, Switzerland, postdoctoral fellowships from the National Institute of Mental Health, MH-10282 to J. R. Hollerman, and the Fondation pour la Recherche Scientifique of Quebec to L. Tremblay.
Present addresses: J. Hollerman, Dept. of Psychology, Allegheny College, Meadville, PA 16335; L. Tremblay, INSERM Unit 289, Hôpital de la Salpêtrière, 47 Boulevard de l'Hôpital, F-75651 Paris, France.
FOOTNOTES
Address reprint requests to W. Schultz.
Received 11 February 1998; accepted in final form 9 April 1998.
REFERENCES