Event Abstract

The basal ganglia and the 3-factor learning rule: reinforcement learning during operant conditioning

1 University of Sheffield, Adaptive Behaviour Research Group, Department of Psychology, United Kingdom

Operant conditioning paradigms that explore interactive, or ‘trial-and-error’, learning in animals have provided evidence that the basal ganglia embody a form of reinforcement learning algorithm, with phasic activity in midbrain dopaminergic neurons constituting an internally generated training signal. In the present work we employ a biologically constrained computational model of the basal ganglia and related circuitry (see supplementary Fig. 1) to explore the proposal of Redgrave and Gurney (supplementary ref. [1]) that the phasic dopamine signal represents a ‘sensory prediction error’ rather than the more commonly posited ‘reward prediction error’. Under this scheme, the sensory prediction error (the reinforcement learning signal) trains the basal ganglia to preferentially select any action that reliably precedes a novel outcome, irrespective of whether that outcome is associated with genuine reward. In other words, this neuronal signal switches the normal basal ganglia action-selection mechanism into a temporary ‘doing-it-again’ mode, increasing the probability of selecting the key (biased) action causally associated with the novel outcome. We propose that, through the purposeful repetition of such actions, the brain rapidly forms robust action-outcome associations, rendering previously novel outcomes predictable. Consistent with the proposal of Redgrave and Gurney, we further suggest that through this policy of temporary ‘repetition bias’, a naive animal populates a library of action-outcome associations in long-term memory, and that these associations subsequently form the foundation for voluntary goal-seeking behaviour.
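The repetition-bias idea can be illustrated with a deliberately abstract toy loop. This sketch is not the authors’ basal ganglia model: the function names, the delta-rule outcome predictor, the softmax action selection and all parameter values are illustrative assumptions. A phasic ‘sensory prediction error’ (outcome minus predicted outcome) potentiates the selection bias of whichever action produced a novel outcome, while every bias relaxes toward a baseline on each trial; once the outcome has become predictable, the error vanishes and selection drifts back to unbiased.

```python
import math
import random


def softmax_choice(weights, rng, temperature=1.0):
    """Sample an index with probability proportional to exp(w / temperature)."""
    top = max(weights)  # subtract the max for numerical stability
    exps = [math.exp((w - top) / temperature) for w in weights]
    return rng.choices(range(len(weights)), weights=exps, k=1)[0]


def simulate_repetition_bias(n_actions=4, key_action=0, n_trials=500,
                             alpha=0.2, eta=2.0, lam=0.05, w0=0.0, seed=1):
    """Toy repetition-bias loop; all parameters are hypothetical.

    predicted[a]: learned probability that action a produces the outcome
    bias[a]:      selection bias, potentiated by the prediction error and
                  relaxed toward the baseline w0 on every trial
    """
    rng = random.Random(seed)
    predicted = [0.0] * n_actions
    bias = [w0] * n_actions
    choices = []
    for _ in range(n_trials):
        a = softmax_choice(bias, rng)
        outcome = 1.0 if a == key_action else 0.0
        # Sensory prediction error: large while the outcome is novel,
        # shrinking toward zero as the action-outcome association is learned.
        spe = outcome - predicted[a]
        predicted[a] += alpha * spe
        bias[a] += eta * spe  # LTP-like potentiation of the chosen action
        # LTD-like relaxation of every bias toward the unbiased baseline.
        bias = [b - lam * (b - w0) for b in bias]
        choices.append(a)
    return choices, predicted, bias


if __name__ == "__main__":
    choices, predicted, bias = simulate_repetition_bias()
    early = choices[:50].count(0) / 50
    late = choices[-100:].count(0) / 100
    print(f"key-action rate: first 50 trials {early:.2f}, last 100 {late:.2f}")
    print(f"predicted[0] = {predicted[0]:.3f}, bias[0] = {bias[0]:.4f}")
```

With these assumed settings the key action (index 0) should dominate the early trials and fall back to roughly chance (1 in 4) in the late ones, while `predicted[0]` approaches 1 and `bias[0]` returns close to the baseline: a transient ‘doing-it-again’ phase followed by unbiased selection.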

The computational model that we present tests the idea that a ‘repetition-bias’ policy is encoded at cortico-striatal synapses through underlying modulatory plasticity, with long-term potentiation (LTP) producing repetition and long-term depression (LTD) returning the basal ganglia to an unbiased state (see supplementary Fig. 2). To this end, we have constructed a novel learning rule (see supplementary Eq. 1) based upon the 3-factor synaptic plasticity framework proposed by Reynolds & Wickens (supplementary ref. [4]). The rule combines a dopamine factor (supplementary ref. [2]), which captures both phasic and tonic characteristics of the dopamine signal, with a stable Hebbian-like Bienenstock-Cooper-Munro (BCM) factor (supplementary ref. [3]). Together, these two elements account for synaptic renormalization, which returns cortico-striatal weights to near the baseline set as their initial condition.
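A generic dopamine-gated BCM update conveys the three-factor structure described above: the weight change is the product of presynaptic activity, a BCM postsynaptic term with a sliding threshold, and a dopamine factor. This is an assumed form, not a reproduction of supplementary Eq. 1; in particular, here the BCM term sets the sign of the change while the dopamine factor (tonic level plus phasic deviation, rectified at zero) only sets its magnitude, and the names `dopamine_factor`, `three_factor_step`, `relax_to_baseline`, `eta`, `y0` and `tau_theta` are hypothetical.

```python
def dopamine_factor(da_tonic, da_phasic):
    """Hypothetical dopamine factor: the tonic level sets a baseline learning
    gain, phasic bursts (> 0) raise it and dips (< 0) lower it; never negative."""
    return max(0.0, da_tonic + da_phasic)


def three_factor_step(w, theta, x, da_tonic, da_phasic,
                      eta=0.05, y0=1.0, tau_theta=5.0):
    """One update of an illustrative dopamine-gated BCM rule (not Eq. 1).

    Factor 1: presynaptic activity x
    Factor 2: BCM postsynaptic term y * (y - theta), with y = w * x
    Factor 3: dopamine gain from dopamine_factor()
    Returns the new weight, the new threshold and the weight change.
    """
    y = w * x
    m = dopamine_factor(da_tonic, da_phasic)
    dw = eta * m * x * y * (y - theta)
    # Sliding threshold relaxes toward y**2 / y0: sustained high activity raises
    # the bar for potentiation, which is what keeps the BCM term stable.
    theta_new = theta + (y * y / y0 - theta) / tau_theta
    return w + dw, theta_new, dw


def relax_to_baseline(w_start, x=1.0, y0=1.0, da_tonic=1.0, steps=600):
    """Apply tonic-only updates to a weight perturbed away from baseline."""
    w, theta = w_start, y0  # threshold starts at its baseline value
    for _ in range(steps):
        w, theta, _ = three_factor_step(w, theta, x, da_tonic, 0.0, y0=y0)
    return w


if __name__ == "__main__":
    # With the defaults, the baseline weight is w0 = y0 / x = 1.0.
    print(relax_to_baseline(1.3), relax_to_baseline(0.7))
```

Because the sliding threshold tracks y²/y0, the rule’s non-trivial fixed point is y = y0, so under tonic dopamine alone a weight displaced from the baseline w0 = y0/x drifts back to it; a phasic burst scales the size of any change. In this sketch the initial displacement is imposed by hand rather than generated by the rule.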

We present results from the simulation of an operant conditioning task that uses abstract sensory and motor signals, demonstrating that the model successfully implements the repetition-bias hypothesis. We then compare the behavioural consequences of this policy with natural animal behaviour using an embodied robot simulation in which the agent is free to explore an open environment containing interactive objects. In addition, our results demonstrate a biologically plausible relationship between the robot’s behavioural performance and simulated plasticity at cortico-striatal synapses.

Conference: Bernstein Conference on Computational Neuroscience, Frankfurt am Main, Germany, 30 Sep - 2 Oct, 2009.

Presentation Type: Poster Presentation

Topic: Learning and plasticity

Citation: Bolado-Gomez R, Chambers J and Gurney K (2009). The basal ganglia and the 3-factor learning rule: reinforcement learning during operant conditioning. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience. doi: 10.3389/conf.neuro.10.2009.14.092

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 27 Aug 2009; Published Online: 27 Aug 2009.

* Correspondence: Rufino Bolado-Gomez, University of Sheffield, Adaptive Behaviour Research Group, Department of Psychology, Sheffield, United Kingdom, r.bolado@sheffield.ac.uk