Abstract
The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located in the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds of D1- and D2-mediated synaptic plasticity which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression, leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively. Our model also shows how D1- or D2-receptor blocking experiments affect selectively either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
Introduction
Many of our skillful daily actions are a result of constant positive and negative reinforcements. It is postulated that the basal ganglia (BG) contribute to this kind of reinforcement learning (see Hikosaka et al., for a review). Accordingly, reward-related activities have been observed in most of the BG components including dorsal striatum (Hikosaka et al., ; Apicella et al., ; Kawagoe et al., ; Lauwereyns et al., ; Costa et al., ; Samejima et al., ; Oyama et al., ), ventral striatum (Schultz et al., ; Kalenscher et al., ), subthalamic nucleus (STN; Darbaky et al., ), and even along the border region of the globus pallidus (GPb; DeLong, ; Hong and Hikosaka, ). Consequently, an insult in the BG, such as Parkinson's disease (PD), severely affects the patient's learning ability (Frank et al., ; Ell et al., ; Voon et al., ).
Previous studies have provided insight into how the BG may contribute to this kind of learning. In particular, neurons in the caudate nucleus (CD; part of the striatum) flexibly encode visual cues that predict different amounts or probabilities of reward (Apicella et al., ; Kawagoe et al., ; Lauwereyns et al., ; Samejima et al., ). For example, when a monkey performs a visually guided saccade task with positionally biased reward outcomes, called the “one direction reward (1DR)” task (Figure 1A), many CD neurons respond to a visual cue and the responses are often enhanced (and occasionally depressed) when the cue indicates a larger-than-average amount of reward during a block of trials. Also, there was a tight block-to-block correlation between the changes in CD neuronal activity preceding target onset and the changes in saccade latency (Lauwereyns et al., ). This relatively rapid modulation of CD neuronal activity seems to reflect a mechanism underlying reward-based learning. It has thus been hypothesized that these neuronal changes in the BG facilitate eye movements toward reward (Hikosaka et al., ).
Figure 1
It has also been shown that dopamine (DA) plays a crucial role in learning in the BG. Phasic DA signals, in particular, have been hypothesized to cause reinforcement learning (Montague et al.,
However, there is evidence that DA has complex effects on BG neurons during reinforcement learning, including different effects on different BG pathways. In the BG there are two anatomically distinct pathways: the “direct” pathway whose striatal neurons have abundant D1 receptors, and the “indirect” pathway whose striatal neurons have abundant D2 receptors (Deng et al.,
Numerous studies have examined the influence of the D1 and D2 mediated processes in the BG on animal learning behavior (e.g., Frank et al.,
In addition to the quantitative behavioral data, we have accumulated a rich set of data on the neuronal activity in many brain areas in the BG that relay visuo-oculomotor information including the CD, substantia nigra pars reticulata (SNr), STN, globus pallidus external segment (GPe), superior colliculus (SC), as well as frontal cortical areas (see Hikosaka et al.,
In the following we propose a formal version of our theory of BG, where the BG “orients” the eyes to reward (Hikosaka,
Materials and Methods
Implementation of the model
We examined the possibility that the plasticity mediated by DA actions on direct pathway MSNs and indirect pathway MSNs is responsible for the observed saccadic latency changes in normal and Parkinsonian monkeys. The model circuit was implemented with cell membrane differential equations (see Appendix) in Visual C++ on a PC. Our model implements only one hemisphere of the brain. This is because, during leftward saccades, for example, the right part of the BG is assumed to be active in learning because of the prevalent frontal eye field (FEF)-to-striatum activation in the right hemisphere. This permits striatal learning on the right side of the brain while the left side is not affected. Because the 1DR task alternates the left-side-reward and right-side-reward blocks of trials in a symmetrical way, implementing one side with alternating blocks can represent the learning processes happening on both sides of the brain. Below, we describe the basic architecture of the model, including how it generates saccades and how it is modulated by DA-independent and DA-dependent synaptic plasticity. For full details of the model equations, see the Appendix.
One direction rewarded task
Our model simulates the data from the 1DR task. In the task, a visual target was presented randomly on the left or right, and the monkey had to make a saccade to it immediately. Correct saccades were signaled by a tone stimulus after the saccade. Saccades to one position were rewarded, whereas saccades to the other position were not rewarded. The rewarded position was the same in a block of 20–30 consecutive trials and was then changed to the other position abruptly for the next block with no external instruction. Thus, the target instructed the saccade direction and also indicated the presence or absence of the upcoming reward.
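The trial structure described above can be sketched as a short schedule generator. This is only an illustration of the task layout, not the authors' simulation code; the function name, the seed, and the choice of "left" as the first rewarded side are assumptions made here.

```python
import random

def make_1dr_schedule(n_blocks, min_len=20, max_len=30, seed=0):
    """Return a list of (target_side, rewarded) tuples for a 1DR session.

    Hypothetical helper: block lengths of 20-30 trials come from the task
    description; everything else (seed, starting side) is an assumption.
    """
    rng = random.Random(seed)
    rewarded_side = "left"
    trials = []
    for _ in range(n_blocks):
        block_len = rng.randint(min_len, max_len)  # inclusive bounds
        for _ in range(block_len):
            target = rng.choice(["left", "right"])  # target side is random
            trials.append((target, target == rewarded_side))
        # The rewarded position switches abruptly, with no external instruction.
        rewarded_side = "right" if rewarded_side == "left" else "left"
    return trials

schedule = make_1dr_schedule(n_blocks=4)
```

Because the reward depends only on the target side within a block, the target itself carries both the saccade direction and the reward prediction, as stated in the task description.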
While the monkey was performing 1DR task, the latency was consistently shorter for the saccade to reward target than for the saccade to no-reward target. Such a bias evolved gradually becoming more apparent as trials progressed (Figures 1B and 3B,E). The slow change in saccadic latency was particularly evident initially (Figure 3B). After experiencing 1DR task extensively the monkey became able to switch the bias rapidly (Figure 3E; Takikawa et al.,
Learning in the basal ganglia
Figure 2A shows neuronal circuits in and around the BG included in our model. In the BG, there are two opposing pathways: (1) direct pathway which facilitates movement initiation and is under the control of D1 DA receptors, and (2) indirect pathway which suppresses movement initiation and is under the control of D2 DA receptors (Kravitz et al.,
Figure 2

Dopamine-mediated learning mechanisms in the striatum. DA-dependent LTP and LTD are labeled in red; DA-independent LTP and LTD are in blue. (A) Null state, where free saccades happen with no-task. Indirect pathway MSNs, which express D2 receptors, show weak DA-dependent LTD and DA-independent LTP. Direct pathway MSNs, which express D1 receptors, show DA-independent LTD. (B) Hypothesized D1 and D2 thresholds in relation to the levels of DA during big-reward and no (or small) reward trials. (C) In big-reward trials, the increased level of DA causes DA-dependent LTP in the direct pathway and enhances DA-dependent LTD in the indirect pathway. (D) In no-reward trials, the decreased level of DA causes an attenuation of DA-dependent LTD in the indirect pathway. The changes in DA level are assumed to coincide with the activation of the cortical input and the activation of the connected MSN neuron to enable the DA-dependent LTP and LTD (see the eligibility traces in Eqs 6–8). In both cases (C and D) DA-independent LTD in the direct pathway and DA-independent LTP in the indirect pathway remain unchanged, but their effects become relatively weak in big-reward trials (C) and relatively strong in no-reward trials (D). The red arrows indicate the amplitudes (large: 2 arrows, small: 1 arrow) and directions (up, down) of the change of neural activity compared to the null state shown in (A). Note, that even though the DA level in figure (A) and (D) are both under the threshold, LTD in the case of figure (D) happens more vigorously because of enabled eligibility (Eqs 6–8). The thickness of the connections indicates the resulting output in each module. Black and open circles indicate inhibitory and excitatory neurons, respectively. In (C) and (D), the GPb–LHb–SNc circuit is omitted for clarity. LTP/LTD, long-term potentiation/depression; GPb, border region of globus pallidus; LHb, lateral habenula.
Following the findings by Shen et al. (
We define “DA-dependent synaptic plasticity” as synaptic changes facilitated by an above-threshold DA level. This situation occurs mostly during positive learning experiences, when DA neurons burst phasically, notably due to changes in reward expectation (Figure 2C). In contrast, DA-independent synaptic plasticity acts as an opposing process that constantly antagonizes DA-dependent synaptic plasticity and becomes prominent whenever DA-dependent synaptic plasticity loses its strength. For this reason, DA-independent synaptic plasticity acts as a “forgetting” mechanism.
Figure 2A shows the BG circuit of the model in its no-task (null) state where the subject makes saccades without any DA modulation. It has been shown that DA affinity is higher for D2 receptors than for D1 receptors (Richfield et al.,
When the animal detects a signal indicating an upcoming reward, DA neurons exhibit a short burst of spikes (Eq. 17), causing a phasic increase in the concentration of DA in the CD which temporarily exceeds the threshold of D1 receptors (Figure 2C). This phasic elevation of DA concentration, together with co-occurrence of pre- and post-synaptic activations, leads to the emergence of LTP in the direct pathway and an enhancement of LTD in the indirect pathway. Following the DA-induced changes in either the direct or indirect pathway, SNr neurons are inhibited and therefore SC neurons are activated (through disinhibition), leading to the facilitation of the saccade toward the target (Figure 2C). The changes in activity through the direct or indirect pathway are illustrated by the directions of arrows (upward: increase, downward: decrease). Note that the direction of arrows remains unchanged after an excitatory connection (shown by open “cell body” with an arrow ending, as in STN–SNr connection), but reverses after an inhibitory connection (filled “cell body” with a rectangular ending, as in CD–GPe connection).
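The arrow convention described above (a change keeps its direction across an excitatory connection and reverses across an inhibitory one) amounts to multiplying signs along each pathway. The sketch below illustrates this under stated assumptions: the unit names are labels chosen here, and the inhibitory GPe-to-STN link is taken from standard BG anatomy rather than from the text of this section.

```python
# +1 = excitatory connection, -1 = inhibitory connection.
# Connection lists are an illustrative reading of Figure 2, not model code.
DIRECT = [("CD_D1", "SNr", -1), ("SNr", "SC", -1)]
INDIRECT = [
    ("CD_D2", "GPe", -1),
    ("GPe", "STN", -1),   # assumed from standard anatomy
    ("STN", "SNr", +1),
    ("SNr", "SC", -1),
]

def propagate(change_at_source, pathway):
    """Follow a change (+1 increase, -1 decrease) along a chain of connections."""
    change = change_at_source
    trace = [(pathway[0][0], change)]
    for _pre, post, sign in pathway:
        change *= sign  # inhibitory links flip the direction of the change
        trace.append((post, change))
    return trace

# Reward trial: direct pathway MSNs increase, indirect pathway MSNs decrease.
direct_trace = propagate(+1, DIRECT)
indirect_trace = propagate(-1, INDIRECT)
```

In this reading, both pathways end with an increase at the SC in reward trials, which is the convergence onto saccade facilitation that the paragraph describes.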
When the animal detects a signal of no-reward, the level of DA in CD will go below the threshold of D2 receptors (Figure 2D). In the indirect pathway this leads to an attenuation of LTD leaving DA-independent LTP intact. In the direct pathway, this leads to only DA-independent LTD. As a result, the activity of SNr neurons increases and the saccadic eye movement toward the target is suppressed (Figure 2D).
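The threshold logic of the two preceding paragraphs can be summarized in a toy update rule. This is a minimal sketch, not the Appendix equations: the rectified-linear form, the rate constants, the DA levels, and the use of a simple pre-times-post eligibility term are all assumptions chosen only to reproduce the qualitative directions of change.

```python
def weight_updates(da, pre=1.0, post=1.0, d1_thr=0.6, d2_thr=0.3,
                   dep_rate=0.10, indep_rate=0.02):
    """Return (dw_direct, dw_indirect) for one trial.

    Hypothetical rule: DA-dependent terms grow with the amount by which DA
    exceeds the receptor threshold; DA-independent terms oppose them.
    """
    elig = pre * post  # co-activation of cortical input and MSN (assumed form)
    d1_term = dep_rate * elig * max(0.0, da - d1_thr)  # DA-dependent LTP (direct)
    d2_term = dep_rate * elig * max(0.0, da - d2_thr)  # DA-dependent LTD (indirect)
    dw_direct = d1_term - indep_rate * elig    # opposed by DA-independent LTD
    dw_indirect = indep_rate * elig - d2_term  # opposed by DA-independent LTP
    return dw_direct, dw_indirect

# Illustrative DA levels: the D2 threshold lies below baseline, D1 above it.
DA_BASELINE, DA_BURST, DA_PAUSE = 0.4, 0.9, 0.1

reward = weight_updates(DA_BURST)     # net LTP in direct, net LTD in indirect
no_reward = weight_updates(DA_PAUSE)  # net LTD in direct, net LTP in indirect
```

With these placeholder values, a DA burst yields potentiation of the direct pathway and depression of the indirect pathway, and a DA pause leaves only the DA-independent terms, which reverses both signs.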
Switching mechanism
After experiencing 1DR task extensively the monkey became able to switch the saccade latency bias more rapidly (Takikawa et al.,
We hypothesize that this rapid switching is enabled by a population of neurons on each side of the hemisphere which becomes active when a reward is available on the contralateral side but not on the ipsilateral side. Such neurons, which we hereafter call “reward-category neurons,” are assumed to have excitatory connections to neurons in the FEF and to the direct pathway MSNs on the same side. This assumption is based on our previous findings: presumed projection neurons in the CD (Lauwereyns et al.,
The model implements the reward-category neurons, tentatively, as a module in the cerebral cortex, as illustrated in Figure 3D. When a reward is expected on the left side, for example, the reward-category neurons in the right cortex (red circle in Figure 3D; CgRWD in Eq. 4 in Appendix) will ramp up their activity before the execution of a saccade. This will excite the right FEF neuron and direct pathway MSNs, therefore boosting the activity of these neurons. Note that there will be no boost of activity in the FEF and MSNs in the left (ipsilateral) hemisphere. Due to this construction, the striatum receives strong cortical inputs boosted by the excitatory reward-category neurons only during contralateral reward trials. The reward-category activity also affects the SC directly via the FEF–SC excitatory connection (Figure 2) making the SC react more rapidly during reward trials (Ikeda and Hikosaka,
Figure 3

Experience-dependent emergence of a switching mechanism that allows rapid changes of saccade latency in response to the change in reward location: before (A–C) and after (D–F) sufficient experience of the 1DR task. We hypothesize the presence of “reward-category neurons” (RWD), a key driver of the switching, that have excitatory connections to FEF neurons and direct pathway MSNs in the CD in the same hemisphere. They would become active before target onset selectively when a reward is expected on the contralateral side (see Figure 4), an assumption based on experimental observations of neuronal activity in the FEF, CD, SNr, and SC. Before sufficient experience of the 1DR task (A–C), the saccade latency changes gradually in both the small-to-big-reward transition [red in (B,C)] and the big-to-small-reward transition [blue in (B,C)] similarly by experimental observation (B) and computer simulation (C). The saccade latency data in (B) is from monkeys C, D, and T. After sufficient experience of the 1DR task (D–F), the saccade latency changes quickly as shown in experiments (E) and computer simulation (F). This is mainly due to the additional excitatory input from the reward-category neurons. Note, however, that the decrease in saccade latency in the small-to-big-reward transition [red in (E,F)] is quicker than the increase in saccade latency in the big-to-small-reward transition [blue in (E,F)]. This asymmetry is due to the asymmetric learning algorithm operated by two parallel circuits in the basal ganglia illustrated in Figure 2. Figure (E) from Matsumoto and Hikosaka (
In the following, we first simulate the eye movements in the 1DR showing the baseline performance of the model. Next, we simulate the influence of D1 and D2 antagonist injections in the CD showing how the DA-mediated learning leads to behavioral manifestation. The simulation results for PD are presented to show the potential application of our model to understanding neurological disorders.
Results
Simulation of saccade latency in the 1DR task
In one block of trials in the 1DR task a saccade to a given target is followed by a reward, and in the next block of trials the saccade to the same target is followed by no-reward (Figure 1A). Hence, in each block of trials the monkey learns a new position-reward association, and the learning is evidenced as changes in the saccade reaction time (or latency): decrease in saccade latency for the rewarded target and increase in saccade latency for the unrewarded target (Figures 3B,E). The changes in saccade latency became quicker as the monkey experienced 1DR task extensively (compare Figure 3B and Figure 3E; Takikawa et al.,
Our model simulates these changes in saccade latency reasonably well (Figures 3C,F).
In the early stage of the monkey's experience with the 1DR task, the saccade latency decreased gradually after a small-to-big-reward transition and increased gradually after a big-to-small-reward transition (Figure 3B). These slow changes in saccade latency are simulated by the model (Figure 3C) by assuming that there is no reward-category activity (Figure 3A), which would otherwise act as a switching mechanism. In other words, these changes in saccade latency, at this stage, are controlled solely by the striatal plasticity mechanisms which are described in the Section “Learning in the Basal Ganglia.”
After sufficient experience with the 1DR task, the changes in saccade latency occur more quickly (Figure 3E). This was simulated by assuming the emergence of reward-category neurons which, before the target comes on, exert an excitation on FEF neurons as well as on the direct pathway MSNs when a reward is expected on the contralateral side (see Switching Mechanism).
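The gating assumed for the reward-category neurons can be sketched as follows. This is an illustrative simplification with hypothetical names and values (base drive, gain, FEF-to-striatum weight); it only encodes the stated connectivity: the boost appears when a reward is expected on the contralateral side, reaches the FEF and the direct pathway MSNs directly, and reaches the indirect pathway MSNs only through the FEF.

```python
def pretarget_inputs(expected_reward_side, hemisphere, experienced,
                     base=0.1, gain=0.3, w_fef=0.5):
    """Pre-target drive to FEF, direct (D1) and indirect (D2) pathway MSNs.

    All parameter values are placeholders, not fitted model constants.
    """
    contralateral = {"left": "right", "right": "left"}[hemisphere]
    # Reward-category neurons are assumed absent before extensive experience.
    active = experienced and expected_reward_side == contralateral
    rwd = gain if active else 0.0
    fef = base + rwd
    return {
        "RWD": rwd,
        "FEF": fef,
        "MSN_D1": w_fef * fef + rwd,  # FEF input plus direct RWD input
        "MSN_D2": w_fef * fef,        # FEF input only
    }

# Right hemisphere, reward expected on the left (contralateral) side.
early = pretarget_inputs("left", "right", experienced=False)
late = pretarget_inputs("left", "right", experienced=True)
```

Comparing `early` and `late` shows the intended effect: only the experienced network carries extra pre-target activity, and it is largest in the direct pathway MSNs.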
The performance of our model in an advanced stage of learning (Figure 3D) is illustrated in Figure 4. Our model combines two kinds of neuronal mechanisms: (1) learning in the BG (i.e., plasticity at cortico-striatal synapses), and (2) a switching mechanism (i.e., reward-category activity). Here, the activity of individual neurons (or brain areas) is compared between two reward contexts: one in which a contralateral saccade is followed by a reward (Figure 4A) and one in which it is followed by no reward (Figure 4B). Only the contralateral saccade is considered because the neuronal network simulates one hemisphere and is assumed to control only contralateral saccades.
Figure 4

Simulated neural components of the model performing reward and no-reward trials of 1DR task. In reward trials (A) the reward-category unit (REW category) ramps up its activity shortly after the presentation of the fixation point. The activity shuts off in response to the burst activity of DA unit (DA) signaling the reward value of the target. The FEF unit combines the tonic reward-category activity and the phasic target signal. In the BG, both the direct pathway MSN unit (D1) and the indirect pathway MSN unit (D2) receive an input from the FEF. The direct pathway MSN unit (D1), in addition, receives an input directly from the reward-category unit and therefore shows larger ramping activity than the indirect pathway MSN unit (D2). The activity of the direct pathway MSN unit (D1) is further enhanced by DA-dependent LTP, which is triggered by the DA burst, and mediated by D1 receptors. This results in a stronger disinhibition of the SC by the SNr leading to a stronger activity in the SC. In contrast, the activity of indirect pathway MSN unit (D2) is further depressed by DA-dependent LTD, which is triggered by the DA burst, and mediated by D2 receptors. This results in the suppression of the excitatory input from the STN to the SNr, further enhancing the SC activity. The combined effects from the direct and indirect pathways lead to a shorter latency saccade (see the arrow head on top, indicating the time of saccade initiation). In no-reward trials (B) the activity of the reward-category unit is much weaker, thus lowering the activity of the FEF unit and the direct pathway MSN unit (D1). The activity of the direct pathway MSN unit (D1) is further depressed by DA-independent LTD. In contrast, the activity of D2 MSN increases because DA-dependent LTD is attenuated due to the “pause” of DA activity (DA) and thus is dominated by DA-independent LTP. 
The combined effects from the direct and indirect pathways lead to a weaker activation of the SC unit and hence a longer latency saccade. The scale of all the ordinate axes is from 0 to 1.
According to our model, the learning in the BG controls, mainly, the phasic response component to target onset. The response of direct pathway MSNs (D1) to the post-target input from the FEF increases when the contralateral saccades were rewarded repeatedly (Figure 4A); this is mainly due to the development of DA-dependent LTP at the corticostriatal synapses. In contrast, the response decreases when the contralateral saccades were unrewarded repeatedly (Figure 4B); this is mainly due to the development of DA-independent LTD at the corticostriatal synapses. Such reward-facilitated visual responses in CD neurons have been reported repeatedly using 1DR task (Kawagoe et al.,
Roughly opposite effects occur through the indirect pathway. The response of indirect pathway MSNs (D2) to the post-target input from the FEF decreases when the contralateral saccades were rewarded repeatedly, mainly due to the development of DA-dependent LTD at the corticostriatal synapses (Figure 4A). In contrast, the response increases when the contralateral saccades were unrewarded repeatedly, mainly due to the development of DA-independent LTP at the corticostriatal synapses (Figure 4B). Such reward-suppressed visual responses in CD neurons have been reported (Kawagoe et al.,
The effects of the switching mechanism mainly lead to tonic changes in neuronal activity before target onset. When a reward is expected on the contralateral side, the reward-category neurons (RWD category in Figure 4) ramp-up their activity shortly after the presentation of a fixation point (Figure 4A). FEF neurons (FEF in Figure 4) receive excitatory input from the reward-category neurons in addition to a phasic excitatory input encoding the onset of the target (Ding and Hikosaka,
In summary, the learning mechanism and the switching mechanism, when working together, enable quick adaptation of oculomotor behavior depending on expected reward. It is important to note that the two mechanisms interact in a mutually facilitatory manner. First, the reward-category activity facilitates the development of DA-dependent LTP in direct pathway MSNs (Figure 4A) because it increases the likelihood of the co-occurrence of the pre-synaptic activity (i.e., FEF activity) and the post-synaptic activity (i.e., MSN activity) which is thought (and here assumed) to be a pre-requisite of this type of LTP (Wickens,
Influence of D1 antagonist on saccadic latency
Our computational model has simulated reward-dependent oculomotor behavior successfully. Central to our model is the DA-dependent plasticity at the cortico-striatal synapses. Therefore, experimental manipulations of DA transmission in the striatum could provide critical tests of our model. Such experiments were done by Nakamura and Hikosaka (
After the D1 antagonist injection, latency increased for the saccades made toward the reward position without affecting the saccades toward the no-reward position (Figure 5C left). Simulation results correctly follow this trend (Figure 5C right). As explained above (Figure 2B) the model assumes that, in a normal condition, the threshold for D1 receptor activation (hereafter called “D1 threshold”) is above the default concentration of DA level in the CD, and the threshold for D2 receptor activation (hereafter called “D2 threshold”) is below it. After the injection of the D1 antagonist, the D1 threshold increases significantly while the D2 threshold remains unchanged (compare Figure 5D with Figure 2C). This leads to a selective suppression of DA-dependent LTP in the direct pathway which would be triggered by a phasic increase of DA concentration in reward trials (Figure 5D). In consequence, the activation of direct pathway MSNs by the reward-predicting visual input becomes weaker. In turn, this causes SNr neurons to be less inhibited, SC saccadic neurons to be less disinhibited, and saccades to occur at longer latencies.
Figure 5

Influence of D1 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades, before (black) and after (red) injection of a D1 antagonist into the CD. Data are from Nakamura and Hikosaka (
In the case of no-reward trials, the situation remains unchanged after D1 antagonist injection because the D1 threshold, while elevated by the D1 antagonist, remains higher than the DA concentration (compare Figure 5E with Figure 2D) and the D1 antagonist does not affect the D2 threshold. Consequently, the saccade latency remains unchanged in no-reward trials (Figure 5C right), similar to the experimental data (Figure 5C left).
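The selective effect of the D1 antagonist follows directly from the threshold account, as the toy calculation below illustrates. It is a sketch under assumptions: the rectified-linear plasticity rule and every numerical value (DA levels, thresholds, rates) are placeholders chosen for illustration, not parameters of the model.

```python
def direct_pathway_dw(da, d1_thr, elig=1.0, dep_rate=0.10, indep_rate=0.02):
    """Net weight change at direct pathway synapses for one trial (toy rule)."""
    ltp = dep_rate * elig * max(0.0, da - d1_thr)  # DA-dependent LTP
    ltd = indep_rate * elig                        # DA-independent LTD
    return ltp - ltd

DA_BURST, DA_PAUSE = 0.9, 0.1   # reward and no-reward trials (illustrative)
D1_THR_NORMAL = 0.6
D1_THR_ANTAGONIST = 1.2         # antagonist modeled as a raised D1 threshold

results = {
    ("reward", "control"): direct_pathway_dw(DA_BURST, D1_THR_NORMAL),
    ("reward", "antagonist"): direct_pathway_dw(DA_BURST, D1_THR_ANTAGONIST),
    ("no_reward", "control"): direct_pathway_dw(DA_PAUSE, D1_THR_NORMAL),
    ("no_reward", "antagonist"): direct_pathway_dw(DA_PAUSE, D1_THR_ANTAGONIST),
}
```

In this sketch the raised threshold removes the DA-dependent LTP in reward trials, so the net change there turns into depression, whereas no-reward trials give the same value in both conditions because DA was already below the D1 threshold.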
In the preceding section we showed that our model can simulate the time course of saccade latency changes during the 1DR task (Figure 3). As seen in Figure 5A the D1 antagonist injection in the CD alters the saccade latency over time and our model simulates this change (Figure 5B).
Influence of D2 antagonist on saccadic latency
In contrast, after the D2 antagonist injection in the CD, the saccadic latency increased selectively in no-reward trials (Nakamura and Hikosaka,
Figure 6

Influence of D2 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades, before (black) and after (blue) injection of a D2 antagonist into the CD. Data are from Nakamura and Hikosaka (
Disrupted plasticity mechanisms in parkinsonian subjects
Our model predicts altered reward-related learning in PD subjects. We first modeled the changes in synaptic plasticity that occur during PD. In animal models of PD, the synaptic plasticity of the BG is disrupted (Figure 7A) such that LTP is induced in indirect pathway MSNs (green dots) and LTD is induced in direct pathway MSNs (purple dots) after stimulation protocols that normally induce LTD and LTP, respectively (Shen et al.,
Figure 7

Simulation of disrupted plasticity in Parkinson's disease (PD). (A) Disrupted plasticity in MSNs (green and purple dots) in a rat PD model. When input stimulation was followed by excitation of a MSN repeatedly, the response of the MSN to the input changed gradually, in the directions opposite to control subjects. Data are from Shen et al. (
Given these assumptions, our model predicts that direct pathway MSNs undergo LTD during either reward or no-reward trials (orange curves in Figure 7C). This is because DA-dependent LTP, which is rendered minimal due to the low DA level, is dominated by DA-independent LTD. In reward trials, however, the slight increase in the DA level can trigger weak LTP because the D1 threshold is lowered due to hypersensitivity (Figure 7B). As a consequence, the net LTD is bigger after no-reward trials than reward trials (orange curves in Figure 7C).
An opposite reaction occurs in indirect pathway MSNs. They undergo LTP in either reward or no-reward trials (black curves in Figure 7C) because DA-dependent LTD, which is rendered minimal due to the low DA level, is dominated by DA-independent LTP. In reward trials, however, the slight increase in the DA level can trigger weak LTD because the D2 threshold is lowered due to hypersensitivity (Figure 7B). As a consequence, the net LTP is bigger after no-reward trials than reward trials (black curves in Figure 7C).
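The PD predictions in the two preceding paragraphs can be checked with a toy calculation. This is a sketch under stated assumptions, not the model itself: the rectified-linear rule, the low DA levels, the lowered thresholds, and the rate constants are placeholder values chosen only to illustrate the argument.

```python
def pd_weight_updates(da, d1_thr=0.15, d2_thr=0.10, elig=1.0,
                      dep_rate=0.10, indep_rate=0.02):
    """Return (dw_direct, dw_indirect) in a simulated PD state (toy rule).

    Thresholds are lowered to mimic receptor hypersensitivity (assumed values).
    """
    d1_term = dep_rate * elig * max(0.0, da - d1_thr)  # weak DA-dependent LTP
    d2_term = dep_rate * elig * max(0.0, da - d2_thr)  # weak DA-dependent LTD
    dw_direct = d1_term - indep_rate * elig     # DA-independent LTD dominates
    dw_indirect = indep_rate * elig - d2_term   # DA-independent LTP dominates
    return dw_direct, dw_indirect

# Low DA overall, with only a slight increase in reward trials (illustrative).
PD_DA_REWARD, PD_DA_NO_REWARD = 0.20, 0.05

pd_reward = pd_weight_updates(PD_DA_REWARD)
pd_no_reward = pd_weight_updates(PD_DA_NO_REWARD)
```

With these values the direct pathway shows net LTD and the indirect pathway shows net LTP in both trial types, and both changes are larger after no-reward trials, which is the ordering that preserves the correct direction of learning.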
Our model predicts that these changes in synaptic plasticity would cause several changes in the pattern of behavior during the 1DR task (Figure 7D). The results indicate that in reward trials the saccadic latency in the PD subject (red curve in Figure 7D) is longer than in the normal subject (red curve in Figure 3E). The saccade latencies during no-reward trials are even longer, as shown by the blue curve in Figure 7D. Interestingly, while both latencies are longer than those of normal subjects, the latencies during reward trials are still shorter than those in no-reward trials in PD subjects. This means that even with the reversed directions of plasticity, the subjects show the correct direction of learning.
Our model also predicts the impact of l-DOPA in the PD subject. Figure 7B illustrates the hypothesized learning situation in PD with l-DOPA, showing the elevated DA level (green trace) that enables positive reinforcement learning with the assistance of increased sensitivity (e.g., Gerfen,
Discussion
This study explored the possible neuronal mechanisms underlying adaptive changes in oculomotor behavior in response to changes in reward location. We did so by constructing a computational model and simulating the animal's normal and experimentally manipulated behaviors. Our model, which combines a learning mechanism and a switching mechanism in the cortico-striatal circuit, simulates experimental results obtained using a saccade task with positional reward bias (1DR task) reasonably well. In the following we discuss possible physiological mechanisms presumed to be the bases of these phenomena, as well as the limitations of our model.
Neural correlates of reinforcement learning in BG
Basal ganglia are well known for their involvement in motor and cognitive functions. It is also known that many neurons in the BG are sensitive to expectation of reward (see Hikosaka et al.,
Plasticity mechanisms in direct and indirect pathways
We have constructed a model that implements a lumped LTP/LTD, which simplifies the complicated underlying intracellular processes. Here we discuss some probable mechanisms underlying these synaptic changes. The mechanisms of the synaptic plasticity in the BG have been studied extensively, yet there are conflicting experimental results (for reference, see Calabresi et al.,
In indirect pathway MSNs, D2 receptor activation is known to promote dephosphorylation processes in a variety of channels, including AMPA, NMDA, and Na+ channels, by suppressing adenylyl cyclase. It has also been reported that DA-independent LTP (or repotentiation) happens in indirect pathway MSNs when the afferents are stimulated with a following post-synaptic depolarization (Shen et al.,
In direct pathway MSNs, D1 receptor activation by DA induces LTP by stimulating adenylyl cyclase, thereby promoting phosphorylation processes in a variety of channels, such as AMPA, NMDA, and Na+ channels. Note that D1 and D2 receptors target the same chemical agent, adenylyl cyclase, in opposite ways. (Picconi et al.,
These LTP and LTD processes in direct and indirect pathway MSNs seem to depend, directly or indirectly, on the level of DA in the BG. For example, when DA was depleted, the direction of plasticity changed dramatically: direct pathway MSNs showed only LTD and indirect pathway MSNs showed only LTP regardless of the protocol used (Shen et al.,
Dopamine hypotheses of reinforcement learning and behavior
The simulation results of our model predict that while having significant learning deficit, PD patients still show some learning, consistent with the literature (e.g., Behrman et al.,
Our model also predicts the impact of l-DOPA in the PD subject. As Figure 7D shows, the simulated PD subject with l-DOPA shortens the saccadic latency compared to the non-medicated counterpart, consistent with previous reports (Highstein et al.,
It was reported that, compared to normal subjects, PD subjects on l-DOPA medication are better in positive learning and worse in negative learning, and that PD subjects off medication are better in negative learning and worse in positive learning (Frank et al.,
One interesting question arises in our model: Why are there two pathways (the direct and indirect pathways) in the BG even though their jobs could apparently be done by just one pathway? It is possible that the two pathways exist to flexibly control the output of the BG. In other words, while many situations require cooperative operations of the direct and indirect pathways, some other situations may call for separate operations of these two pathways. For example, if an animal meets a conflicting situation, such as when food is in sight while a predator is also nearby, an indirect-pathway-specific “no go” command may save the animal from a recklessly daring action. Another possible benefit of having two separate pathways comes from the connectional anatomy of the BG. In the rat, the indirect pathway of the BG receives a majority of its inputs from neurons in deep layers of the cerebral cortex which also project to the motoneurons in the spinal cord, whereas the direct pathway receives a majority of its inputs from neurons in the intermediate layers of the cerebral cortex, some of whose axons also contact contralateral BG (Lei et al.,
Statements
Acknowledgments
We are grateful to M. Isoda and L. Ding for providing data (monkeys T and D, respectively) and to C. R. Hansen and E. S. Bromberg-Martin for helpful comments. This work was supported by the intramural research program of the National Eye Institute.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
1. Apicella, P., Scarnati, E., Ljungberg, T., and Schultz, W. (1992). Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J. Neurophysiol. 68, 945–960.
2. Behrman, A. L., Cauraugh, J. H., and Light, K. E. (2000). Practice as an intervention to improve speeded motor performance and motor learning in Parkinson's disease. J. Neurol. Sci. 174, 127–136.
3. Breitenstein, C., Korsukewitz, C., Floel, A., Kretzschmar, T., Diederich, K., and Knecht, S. (2006). Tonic dopaminergic stimulation impairs associative learning in healthy subjects. Neuropsychopharmacology 31, 2552–2564. doi: 10.1038/sj.npp.1301167
4. Bromberg-Martin, E. S., Matsumoto, M., Hong, S., and Hikosaka, O. (2010). A pallidus-habenula-dopamine pathway signals inferred stimulus values. J. Neurophysiol. 104, 1068–1076.
5. Brown, J. W., Bullock, D., and Grossberg, S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Netw. 17, 471–510. doi: 10.1016/j.neunet.2003.08.006
6. Calabresi, P., Picconi, B., Tozzi, A., and Di Filippo, M. (2007). Dopamine-mediated regulation of corticostriatal synaptic plasticity. Trends Neurosci. 30, 211–219. doi: 10.1016/j.tins.2007.03.001
7. Costa, R. M., Cohen, D., and Nicolelis, M. A. (2004). Differential corticostriatal plasticity during fast and slow motor skill learning in mice. Curr. Biol. 14, 1124–1134.
8. Cunnington, R., Lalouschek, W., Dirnberger, G., Walla, P., Lindinger, G., Asenbaum, S., Brucke, T., and Lang, W. (2001). A medial to lateral shift in pre-movement cortical activity in hemi-Parkinson's disease. Clin. Neurophysiol. 112, 608–618. doi: 10.1016/S1388-2457(01)00467-9
9. Darbaky, Y., Baunez, C., Arecchi, P., Legallet, E., and Apicella, P. (2005). Reward-related neuronal activity in the subthalamic nucleus of the monkey. Neuroreport 16, 1241–1244. doi: 10.1097/00001756-200508010-00022
10. DeLong, M. R. (1971). Activity of pallidal neurons during movement. J. Neurophysiol. 34, 414–427.
11. Deng, Y. P., Lei, W. L., and Reiner, A. (2006). Differential perikaryal localization in rats of D1 and D2 dopamine receptors on striatal projection neuron types identified by retrograde labeling. J. Chem. Neuroanat. 32, 101–116. doi: 10.1016/j.jchemneu.2006.07.001
12. Ding, L., and Hikosaka, O. (2006). Comparison of reward modulation in the frontal eye field and caudate of the macaque. J. Neurosci. 26, 6695–6703. doi: 10.1523/JNEUROSCI.0836-06.2006
13. Ell, S. W., Weinstein, A., and Ivry, R. B. (2010). Rule-based categorization deficits in focal basal ganglia lesion and Parkinson's disease patients. Neuropsychologia 48, 2974–2986. doi: 10.1016/j.neuropsychologia.2010.06.006
14. Fearnley, J. M., and Lees, A. J. (1991). Ageing and Parkinson's disease: substantia nigra regional selectivity. Brain 114(Pt 5), 2283–2301.
15. Fino, E., Glowinski, J., and Venance, L. (2005). Bidirectional activity-dependent plasticity at corticostriatal synapses. J. Neurosci. 25, 11279–11287. doi: 10.1523/JNEUROSCI.4476-05.2005
16. Frank, M. J., Samanta, J., Moustafa, A. A., and Sherman, S. J. (2007). Hold your horses: impulsivity, deep brain stimulation, and medication in Parkinsonism. Science 318, 1309–1312.
17. Frank, M. J., Seeberger, L. C., and O'Reilly, R. C. (2004). By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943. doi: 10.1126/science.1102941
18. Gerdeman, G. L., Ronesi, J., and Lovinger, D. M. (2002). Postsynaptic endocannabinoid release is critical to long-term depression in the striatum. Nat. Neurosci. 5, 446–451.
19. Gerfen, C. R. (2003). D1 dopamine receptor supersensitivity in the dopamine-depleted striatum animal model of Parkinson's disease. Neuroscientist 9, 455–462. doi: 10.1177/1073858403255839
20. Gibson, J. M., Pimlott, R., and Kennard, C. (1987). Ocular motor and manual tracking in Parkinson's disease and the effect of treatment. J. Neurol. Neurosurg. Psychiatr. 50, 853–860.
21. Highstein, S., Cohen, B., and Mones, R. (1969). Changes in saccadic eye movements of patients with Parkinson's disease before and after L-dopa. Trans. Am. Neurol. Assoc. 94, 277–279.
22. Hikosaka, O. (2007). Basal ganglia mechanisms of reward-oriented eye movement. Ann. N. Y. Acad. Sci. 1104, 229–249.
23. Hikosaka, O., Nakamura, K., and Nakahara, H. (2006). Basal ganglia orient eyes to reward. J. Neurophysiol. 95, 567–584. doi: 10.1152/jn.00458.2005
24. Hikosaka, O., Sakamoto, M., and Miyashita, N. (1993). Effects of caudate nucleus stimulation on substantia nigra cell activity in monkey. Exp. Brain Res. 95, 457–472.
25. Hikosaka, O., Sakamoto, M., and Usui, S. (1989). Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol. 61, 814–832.
26. Hikosaka, O., Takikawa, Y., and Kawagoe, R. (2000). Role of the basal ganglia in the control of purposive saccadic eye movements. Physiol. Rev. 80, 953–978.
27. Hikosaka, O., and Wurtz, R. H. (1983). Visual and oculomotor functions of monkey substantia nigra pars reticulata. IV. Relation of substantia nigra to superior colliculus. J. Neurophysiol. 49, 1285–1301.
28. Hikosaka, O., and Wurtz, R. H. (1985). Modification of saccadic eye movements by GABA-related substances. I. Effect of muscimol and bicuculline in monkey superior colliculus. J. Neurophysiol. 53, 266–291.
29. Hong, S., and Hikosaka, O. (2008). The globus pallidus sends reward-related signals to the lateral habenula. Neuron 60, 720–729. doi: 10.1016/j.neuron.2008.09.035
30. Hood, A. J., Amador, S. C., Cain, A. E., Briand, K. A., Al-Refai, A. H., Schiess, M. C., and Sereno, A. B. (2007). Levodopa slows prosaccades and improves antisaccades: an eye movement study in Parkinson's disease. J. Neurol. Neurosurg. Psychiatr. 78, 565–570.
31. Ikeda, T., and Hikosaka, O. (2003). Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39, 693–700. doi: 10.1016/S0896-6273(03)00464-1
32. Isoda, M., and Hikosaka, O. (2008). A neural correlate of motivational conflict in the superior colliculus of the macaque. J. Neurophysiol. 100, 1332–1342. doi: 10.1152/jn.90275.2008
33. Jaber, M., Robinson, S. W., Missale, C., and Caron, M. G. (1996). Dopamine receptors and brain function. Neuropharmacology 35, 1503–1519. doi: 10.1016/S0028-3908(96)00100-1
34. Kalenscher, T., Lansink, C. S., Lankelma, J. V., and Pennartz, C. M. (2010). Reward-associated gamma oscillations in ventral striatum are regionally differentiated and modulate local firing activity. J. Neurophysiol. 103, 1658–1672. doi: 10.1152/jn.00432.2009
35. Kawagoe, R., Takikawa, Y., and Hikosaka, O. (1998). Expectation of reward modulates cognitive signals in the basal ganglia. Nat. Neurosci. 1, 411–416.
36. Kawagoe, R., Takikawa, Y., and Hikosaka, O. (2004). Reward-predicting activity of dopamine and caudate neurons-a possible mechanism of motivational control of saccadic eye movement. J. Neurophysiol. 91, 1013–1024. doi: 10.1152/jn.00721.2003
37. Kravitz, A. V., Freeze, B. S., Parker, P. R., Kay, K., Thwin, M. T., Deisseroth, K., and Kreitzer, A. C. (2010). Regulation of Parkinsonian motor behaviours by optogenetic control of basal ganglia circuitry. Nature 466, 622–626. doi: 10.1038/nature09159
38. Kreitzer, A. C., and Malenka, R. C. (2007). Endocannabinoid-mediated rescue of striatal LTD and motor deficits in Parkinson's disease models. Nature 445, 643–647. doi: 10.1038/nature05506
39. Lauwereyns, J., Watanabe, K., Coe, B., and Hikosaka, O. (2002). A neural correlate of response bias in monkey caudate nucleus. Nature 418, 413–417. doi: 10.1038/nature00892
40. Lei, W., Jiao, Y., Del Mar, N., and Reiner, A. (2004). Evidence for differential cortical input to direct pathway versus indirect pathway striatal projection neurons in rats. J. Neurosci. 24, 8289–8299. doi: 10.1523/JNEUROSCI.1990-04.2004
41. Lo, C. C., and Wang, X. J. (2006). Cortico-basal ganglia circuit mechanism for a decision threshold in reaction time tasks. Nat. Neurosci. 9, 956–963.
42. Mallol, R., Barros-Loscertales, A., Lopez, M., Belloch, V., Parcet, M. A., and Avila, C. (2007). Compensatory cortical mechanisms in Parkinson's disease evidenced with fMRI during the performance of pre-learned sequential movements. Brain Res. 1147, 265–271. doi: 10.1016/j.brainres.2007.02.046
43. Matsumoto, M., and Hikosaka, O. (2007). Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115. doi: 10.1038/nature05860
44. Mehta, M. A., Montgomery, A. J., Kitamura, Y., and Grasby, P. M. (2008). Dopamine D2 receptor occupancy levels of acute sulpiride challenges that produce working memory and learning impairments in healthy volunteers. Psychopharmacology (Berl.) 196, 157–165.
45. Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947.
46. Morris, G., Arkadir, D., Nevet, A., Vaadia, E., and Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143. doi: 10.1016/j.neuron.2004.06.012
47. Muslimovic, D., Post, B., Speelman, J. D., and Schmand, B. (2007). Motor procedural learning in Parkinson's disease. Brain 130, 2887–2897. doi: 10.1093/brain/awm211
48. Nakamura, K., and Hikosaka, O. (2006). Role of dopamine in the primate caudate nucleus in reward modulation of saccades. J. Neurosci. 26, 5360–5369. doi: 10.1523/JNEUROSCI.4853-05.2006
49. Nakamura, K., Matsumoto, M., and Hikosaka, O. (2008). Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. J. Neurosci. 28, 5331–5343. doi: 10.1523/JNEUROSCI.0021-08.2008
50. Oyama, K., Hernadi, I., Iijima, T., and Tsutsui, K. (2010). Reward prediction error coding in dorsal striatal neurons. J. Neurosci. 30, 11447–11457. doi: 10.1523/JNEUROSCI.1719-10.2010
51. Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., and Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045. doi: 10.1038/nature05051
52. Picconi, B., Centonze, D., Hakansson, K., Bernardi, G., Greengard, P., Fisone, G., Cenci, M. A., and Calabresi, P. (2003). Loss of bidirectional striatal synaptic plasticity in L-DOPA-induced dyskinesia. Nat. Neurosci. 6, 501–506.
53. Pizzagalli, D. A., Evins, A. E., Schetter, E. C., Frank, M. J., Pajtas, P. E., Santesso, D. L., and Culhane, M. (2008). Single dose of a dopamine agonist impairs reinforcement learning in humans: behavioral evidence from a laboratory-based measure of reward responsiveness. Psychopharmacology (Berl.) 196, 221–232.
54. Reynolds, J. N., Hyland, B. I., and Wickens, J. R. (2001). A cellular mechanism of reward-related learning. Nature 413, 67–70. doi: 10.1038/35092560
55. Richfield, E. K., Penney, J. B., and Young, A. B. (1989). Anatomical and affinity state comparisons between dopamine D1 and D2 receptors in the rat central nervous system. Neuroscience 30, 767–777. doi: 10.1016/0306-4522(89)90168-1
56. Robinson, D. A. (1972). Eye movements evoked by collicular stimulation in the alert monkey. Vision Res. 12, 1795–1808. doi: 10.1016/0042-6989(72)90070-3
57. Samejima, K., Ueda, Y., Doya, K., and Kimura, M. (2005). Representation of action-specific reward values in the striatum. Science 310, 1337–1340. doi: 10.1126/science.1115270
58. Sato, M., and Hikosaka, O. (2002). Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J. Neurosci. 22, 2363–2373.
59. Schall, J. D., Hanes, D. P., Thompson, K. G., and King, D. J. (1995). Saccade target selection in frontal eye field of macaque. I. Visual and premovement activation. J. Neurosci. 15, 6905–6918.
60. Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annu. Rev. Psychol. 57, 87–115. doi: 10.1146/annurev.psych.56.091103.070229
61. Schultz, W. (2007). Behavioral dopamine signals. Trends Neurosci. 30, 203–210. doi: 10.1016/j.tins.2007.03.007
62. Schultz, W., Apicella, P., Scarnati, E., and Ljungberg, T. (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci. 12, 4595–4610.
63. Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599. doi: 10.1126/science.275.5306.1593
64. Schwarzschild, M. A., Agnati, L., Fuxe, K., Chen, J. F., and Morelli, M. (2006). Targeting adenosine A2A receptors in Parkinson's disease. Trends Neurosci. 29, 647–654. doi: 10.1016/j.tins.2006.09.004
65. Shen, W., Flajolet, M., Greengard, P., and Surmeier, D. J. (2008). Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321, 848–851. doi: 10.1126/science.1160575
66. Shimo, Y., and Hikosaka, O. (2001). Role of tonically active neurons in primate caudate in reward-oriented saccadic eye movement. J. Neurosci. 21, 7804–7814.
67. Sommer, M. A., and Wurtz, R. H. (2001). Frontal eye field sends delay activity related to movement, memory, and vision to the superior colliculus. J. Neurophysiol. 85, 1673–1685.
68. Surmeier, D. J., Ding, J., Day, M., Wang, Z., and Shen, W. (2007). D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends Neurosci. 30, 228–235. doi: 10.1016/j.tins.2007.03.008
69. Takikawa, Y., Kawagoe, R., and Hikosaka, O. (2002). Reward-dependent spatial selectivity of anticipatory activity in monkey caudate neurons. J. Neurophysiol. 87, 508–515.
70. Takikawa, Y., Kawagoe, R., and Hikosaka, O. (2004). A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. J. Neurophysiol. 92, 2520–2529. doi: 10.1152/jn.00238.2004
71. Tereshchenko, L. V., Yudin, A. G., Kuznetsov, Y., Latanov, A. V., and Shul'govskii, V. V. (2002). Disturbances of saccadic eye movements in monkeys during development of MPTP-induced syndrome. Bull. Exp. Biol. Med. 133, 182–184.
72. Vermersch, A. I., Rivaud, S., Vidailhet, M., Bonnet, A. M., Gaymard, B., Agid, Y., and Pierrot-Deseilligny, C. (1994). Sequences of memory-guided saccades in Parkinson's disease. Ann. Neurol. 35, 487–490.
73. Voon, V., Pessiglione, M., Brezing, C., Gallea, C., Fernandez, H. H., Dolan, R. J., and Hallett, M. (2010). Mechanisms underlying dopamine-mediated reward bias in compulsive behaviors. Neuron 65, 135–142. doi: 10.1016/j.neuron.2009.12.027
74. Wang, Z., Kai, L., Day, M., Ronesi, J., Yin, H. H., Ding, J., Tkatch, T., Lovinger, D. M., and Surmeier, D. J. (2006). Dopaminergic control of corticostriatal long-term synaptic depression in medium spiny neurons is mediated by cholinergic interneurons. Neuron 50, 443–452. doi: 10.1016/j.neuron.2006.04.010
75. Watanabe, K., and Hikosaka, O. (2005). Immediate changes in anticipatory activity of caudate neurons associated with reversal of position-reward contingency. J. Neurophysiol. 94, 1879–1887. doi: 10.1152/jn.00012.2005
76. Watanabe, K., Lauwereyns, J., and Hikosaka, O. (2003). Neural correlates of rewarded and unrewarded eye movements in the primate caudate nucleus. J. Neurosci. 23, 10052–10057.
77. White, O. B., Saint-Cyr, J. A., Tomlinson, R. D., and Sharpe, J. A. (1983). Ocular motor deficits in Parkinson's disease. II. Control of the saccadic and smooth pursuit systems. Brain 106(Pt 3), 571–587.
78. Wickens, J. R. (2009). Synaptic plasticity in the basal ganglia. Behav. Brain Res. 199, 119–128.
79. Yin, H. H., Mulcare, S. P., Hilario, M. R., Clouse, E., Holloway, T., Davis, M. I., Hansson, A. C., Lovinger, D. M., and Costa, R. M. (2009). Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341.
Appendix
Model Equations
Our model examined the possibility that the plasticity mediated by dopamine (DA) actions on direct pathway and indirect pathway medium spiny neurons (MSNs) is responsible for the observed saccadic latency changes in normal and Parkinsonian monkeys. The model circuit was implemented with cell membrane differential equations in Visual C++ on a PC. Below, we describe the architecture of the model, including how it generates the saccade latency.
Cortical process
The cortico-striatal signal, FEF, is represented as follows:
where a, b are constants of 0.7 and 0.6 respectively. For the initial stage of one direction reward (1DR) learning where the reward-category (CgRWD; Eq. 4) has not been formed, a = 1 and b = 0 are used. I is the visual input representing the target signal given as follows:
tstart and tend above represent the beginning (1000 ms) and ending (1100 ms) of the target signal, respectively, coming from the visual area. tdelay (50 ms) is the signal delay between the presentation of the visual target and the activation of the frontal eye field (Schall et al.,
The function above is an accelerating function of x that ensures that the output (FEF in Eq. 1) responds linearly to the input (I in Eq. 2).
To simulate the recognition of the reward trial, we used a “reward-category neuron” that has a ramping activity leading to a saccade (Lauwereyns et al.,
where τ (500 ms) is a time constant for the slow ramping activity of the category neuron. IFIX represents the fixation signal that drives the neuron (IFIX = 1 for 800 ms at the beginning of a trial, 0 otherwise). The constant b (100) shuts off the activity of the category neuron once the activity of the substantia nigra pars compacta (SNc; see Eq. 18), which generates DA, deviates from the current DA level (DA; see Eq. 9), indicating the presence or absence of a future reward. |x| denotes the absolute value of x. The constant a was set to 1 and 0.4 for contralateral reward and no-reward blocks, respectively. This gave a larger ramping activity during contralateral reward trials than during no-reward trials.
Direct pathway
The neural activity in the direct pathway of the caudate (CDdr) is simulated as follows:
where ( )* indicates a conduction function as in Eq. 3. a and b are constants of 0.7 and 0.3 respectively. For the initial stage of 1DR learning where the reward-category (CgRWD; Eq. 4) has not been formed, a = 1 and b = 0 are used. wdr above is the synaptic weight between the cortex and the CDdr as follows:
where Edr~ denotes the eligibility trace of the direct pathway neuron (see below); A (1 when I > 0, 0 otherwise; also 1, for 100 ms beginning from the start of the outcome, when there has been a block change) is a cholinergic action in the caudate, deemed to facilitate plasticity mechanism (Shimo and Hikosaka,
Edr~ denotes the eligibility trace of the direct pathway caudate neuron:
where τ is a time constant of 33 ms. The eligibility trace acts as a time window where the plasticity is allowed to occur.
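The time-window behavior of the eligibility trace can be illustrated with a minimal sketch. Since the trace equation itself is not reproduced here, the code assumes a first-order leaky form in which the trace relaxes toward its driving input with the stated time constant of 33 ms; the function names, the unit driving input, and the 1 ms step are hypothetical choices for illustration, not the authors' implementation.

```python
import math

TAU_MS = 33.0  # eligibility-trace time constant stated in the text


def update_trace(trace, drive, dt_ms=1.0, tau_ms=TAU_MS):
    """Advance an assumed first-order trace by dt_ms toward `drive`.

    Uses the exact exponential solution of tau * dE/dt = -E + drive,
    so the result does not depend on a small-step approximation.
    """
    decay = math.exp(-dt_ms / tau_ms)
    return drive + (trace - drive) * decay


def simulate_trace(drive_series, dt_ms=1.0, tau_ms=TAU_MS):
    """Return the trace value after each sample of the driving input."""
    trace = 0.0
    values = []
    for drive in drive_series:
        trace = update_trace(trace, drive, dt_ms, tau_ms)
        values.append(trace)
    return values


# Hypothetical input: 100 ms of unit drive followed by 100 ms of silence.
drive = [1.0] * 100 + [0.0] * 100
trace = simulate_trace(drive)
```

Under these assumptions the trace falls to 1/e of its value 33 ms after the drive ends, so plasticity gated by the trace is effectively confined to a few tens of milliseconds after the activity that set it.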
The concentration of DA was calculated using a simple integrating function:
where SNc is the activity of the DA neurons in the substantia nigra pars compacta.
Indirect pathway
The neural activity of the caudate neuron in the indirect pathway (CDid) is described as follows:
where wid is the synaptic weight between the CX and CDid as follows:
where Eid~ denotes the eligibility trace of the indirect pathway caudate neuron and has the same form and parameters as in Eq. 8. A is the cholinergic input as in Eq. 6. wL (0.1) is the lower bound of the weight. θD2 (normal: 0.25, DA depletion: 0.75, Parkinsonian: 0.23) is the threshold of the D2 receptor activation; τ (71 ms) is a time constant for the weight change; a and b are constants of 0.9 and 12, respectively; a = 0.06 and b = 0.06 were used to explain the inefficient learning in Figure 3C.
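The interaction between a D2 threshold and DA-independent plasticity can be sketched as follows. The weight equation is not reproduced here, so this is only one possible reading of the mechanism: LTD is assumed to scale with the amount by which DA exceeds θD2 and to vanish at the lower bound wL, while a DA-independent LTP term pushes the weight toward a hypothetical upper bound of 1. Mapping a to the LTP gain and b to the LTD gain, the upper bound, and the function names are all assumptions made for illustration.

```python
def d2_weight_step(w, da, trace=1.0, ach=1.0, dt_ms=1.0,
                   theta_d2=0.25, tau_ms=71.0, a=0.9, b=12.0,
                   w_lower=0.1, w_upper=1.0):
    """One forward-Euler step of an assumed indirect-pathway weight rule.

    ltp: DA-independent potentiation toward w_upper (hypothetical bound).
    ltd: D2-mediated depression, active only when DA exceeds theta_d2,
         and vanishing as the weight approaches w_lower.
    The eligibility trace and cholinergic signal gate the whole update.
    """
    ltp = a * (w_upper - w)
    ltd = b * max(da - theta_d2, 0.0) * (w - w_lower)
    return w + dt_ms * trace * ach * (ltp - ltd) / tau_ms


def run(w0, da, steps=200):
    """Apply the rule for `steps` milliseconds at a constant DA level."""
    w = w0
    for _ in range(steps):
        w = d2_weight_step(w, da)
    return w


# DA below threshold (dip): LTD ceases, so the weight grows.
w_after_dip = run(0.5, da=0.1)
# DA above threshold (burst): LTD outweighs LTP, so the weight shrinks.
w_after_burst = run(0.5, da=0.5)
```

In this reading a phasic DA decrease does not drive potentiation directly; it removes the D2-mediated depression, and the DA-independent term then raises the weight, which matches the qualitative description of the indirect pathway given in the abstract.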
The thresholding mechanism for D1 and D2 receptors is similar to the one proposed by Brown et al. (
Globus pallidus external segment
The simulated GPe neuron receives inhibition from the CDid and has its own tonic component as follows:
where TGPe (of 10) represents a tonic component. 1/(CDid*+ 1) denotes a shunting form of suppression by the striatum.
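The shunting term 1/(CDid* + 1) divides the drive rather than subtracting from it. A minimal sketch of that property follows; it assumes the tonic component is simply multiplied by the shunting term and does not model how the result enters the membrane equation, which is not reproduced here. The function name is hypothetical.

```python
T_GPE = 10.0  # tonic component stated in the text


def shunted_drive(tonic, inhibition):
    """Scale a tonic drive by the shunting term 1 / (inhibition + 1).

    `inhibition` stands for the (non-negative) striatal output after the
    conduction function. Division keeps the result positive for any
    inhibition level, unlike subtractive inhibition.
    """
    if inhibition < 0.0:
        raise ValueError("inhibition must be non-negative")
    return tonic / (inhibition + 1.0)


# Drive remaining at a few hypothetical levels of striatal activity.
levels = [0.0, 1.0, 4.0]
drives = [shunted_drive(T_GPE, level) for level in levels]
```

With no striatal activity the full tonic drive passes through; each increase in striatal activity scales the drive down without ever pushing it below zero.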
Subthalamic nucleus
The activity of the subthalamic nucleus (STN) is simulated as follows:
where TSTN (4.0) lumps together a tonic STN component and the cortical activity that becomes high when there is more than one plan to execute (conflict). The lumped version of cortical activity was used because it is assumed that there is no coactivation of plans at a given time (Frank et al.,
Substantia nigra pars reticulata
The simulated substantia nigra pars reticulata (SNr) gets its excitatory input from STN and inhibitory input from CDdr as follows:
where a is a threshold of 0.1 and TSNr (1.5) is a tonic component. The conduction times from the CD and STN to the SNr are set to 9 ms (Hikosaka et al.,
Border region of the globus pallidus (GPb)
To simulate the known physiology of the lateral habenula (LHb)-projecting neurons in the border region of the globus pallidus (GPb), the following equation is used.
tstart represents the onset time of the target stimulus; 115 and 100 are the known delay of GPb neurons and their firing duration in ms, respectively (Hong and Hikosaka,
Lateral habenula
The lateral habenula is simulated to simply follow the input activity of the LHb-projecting GPb neurons:
Substantia nigra pars compacta
The substantia nigra pars compacta (SNc) is assumed to get inhibitory inputs from the LHb during trials as follows:
where TSNc (normal: 0.5, PD: 0.25) is a tonic component that defines the DA tone in the caudate. a is a constant that defines the upper limit of the SNc activity. It was set to be 1 and 0.27 in the normal subject and Parkinsonian subject, respectively. The time constant τ was set to 3.3 ms.
Superior colliculus
Superior colliculus (SC) is assumed to integrate excitatory inputs from the cortex and inhibitory inputs from SNr as follows:
where FEF*/(SNr* + 1) represents a possible shunting nature of the SNr signal to the cortical input. The conduction time delays from the cortex and SNr to the SC are set to be 1 ms (assumed) and 0.7 ms (Hikosaka and Wurtz,
Saccadic reaction time
Reaction time (in ms) of the saccade was calculated as follows.
where tSC is the time point when the SC activity has reached the threshold of saccade initiation (of 0.2); tstart, the beginning of the target signal; M is a scaling factor (173, to consider the different data samples (monkeys) used by Nakamura and Hikosaka (
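The threshold-crossing step described above can be sketched as follows. The code only finds tSC and subtracts tstart; the scaling factor M is deliberately not applied because the way it enters the formula is not reproduced here. The SC trace is a made-up ramp, and the function names and 1 ms sampling are hypothetical.

```python
def first_crossing_ms(sc_activity, threshold=0.2, dt_ms=1.0):
    """Return the time (ms) of the first sample at or above threshold.

    Returns None if the activity never reaches the threshold, which
    corresponds to a trial without a saccade.
    """
    for index, value in enumerate(sc_activity):
        if value >= threshold:
            return index * dt_ms
    return None


def raw_latency_ms(sc_activity, t_start_ms=1000.0, threshold=0.2, dt_ms=1.0):
    """Unscaled latency: threshold-crossing time minus target onset."""
    t_sc = first_crossing_ms(sc_activity, threshold, dt_ms)
    if t_sc is None:
        return None
    return t_sc - t_start_ms


# Hypothetical SC trace sampled every 1 ms: silent until 1100 ms,
# then a linear ramp that reaches 0.2 after a further 50 ms.
sc = [0.0 if t < 1100 else (t - 1100) / 250.0 for t in range(2000)]
latency = raw_latency_ms(sc)
```

For this toy trace the threshold of 0.2 is reached at 1150 ms, giving an unscaled latency of 150 ms relative to the target onset at 1000 ms.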
Summary
Keywords: LTP, LTD, model, saccade, latency, reaction time, reward, motivation
Citation: Hong S and Hikosaka O (2011) Dopamine-Mediated Learning and Switching in Cortico-Striatal Circuit Explain Behavioral Changes in Reinforcement Learning. Front. Behav. Neurosci. 5:15. doi: 10.3389/fnbeh.2011.00015
Received: 19 October 2010; Accepted: 09 March 2011; Published: 21 March 2011.
Volume: 5 - 2011
Edited by: Paul E. M. Phillips, University of Washington, USA
Reviewed by: Kenji Doya, Okinawa Institute of Science and Technology, Japan; Michael J. Frank, Brown University, USA
Copyright: © 2011 Hong and Hikosaka. This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
*Correspondence: Simon Hong, Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, 49 Convent Drive, Bethesda, MD 20892, USA. e-mail: hongy@nei.nih.gov