Event Abstract

Habits, Action Sequences And Working Memory From A Behavioral And A Computational Perspective

  • 1 UCLouvain, Belgium

Habitual and goal-directed behaviors have long been investigated (1–3). Habits are extensively trained behaviors that are efficient in a stable environment and insensitive to contingency degradation or to a change of the associative structure (CAS) of the environment (4–6). In contrast, goal-directed actions are more efficient in a new or ever-changing environment but are subject to interference from dual tasks, presumably because they are more computationally demanding. The exact computational and neural correlates of these two types of behavior remain debated. The now “classical” view is that a model-free reinforcement-learning algorithm could reflect habitual control, whereas a model-based reinforcement-learning algorithm could reflect goal-directed control (7–13). However, other factors are also likely to influence the reliance on habitual versus goal-directed behavior. The goal-directed controller is supposed to lose influence as habits gain in reliability, but working memory (WM) capacity and WM load could also influence the weighting between the two systems (14,15). It is also unknown whether the learning of action sequences could, in whole or in part, account for habit acquisition, as opposed to the step-by-step formation of single stimulus-action associations assumed by the classical theory (16–18). The goal of the present study was to investigate (1) whether action sequence formation could account, at least in part, for habit acquisition and (2) whether WM load relative to WM capacity could reflect the propensity to switch from the goal-directed to the habitual controller. We used a two-step task aimed at inducing habitual behavior through extensive training in a stable environment. Participants (n=10) had to perform two different actions in succession in response to four visual stimuli in order to reach one of the four desired final states associated with each stimulus.
Actions consisted of key presses on a computer keyboard and a mouse. Action sequences and stimuli were divided into two groups (i.e. stimuli 1 and 2 required action sequences 1 and 2, whereas stimuli 3 and 4 required action sequences 3 and 4). Each of the four stimuli was associated with a preferential action sequence, which had to be performed with a defined probability (.5, .66, .83 or 1). Subjects were not aware of these probabilities. The experiment was conducted in two sessions on two successive days, and each subject performed a total of 1200 trials in approximately 150 min. After a training period of 700 trials (175 trials per stimulus), a WM-loading task (Stroop task) was added in parallel to the main task. After a short retraining in this condition (225 trials), the contingencies were changed for the last 275 trials: each shape stimulus was henceforth associated with all possible action sequences with equal probability. WM capacity was assessed at the beginning of the experiment with a spatial updating task. Pupil size was also measured throughout the experiment. We expected habitual behavior (i.e. the difficulty in adapting to change) to be proportional to the probability of the corresponding action sequence: a stimulus trained with a 100% action sequence should be more habitual than the 83% one, the 83% more habitual than the 66%, and so on. Preliminary results show that performance after the CAS depended on the strength of the association between the stimuli and the corresponding action sequence. They also show that WM capacity is correlated with overall performance and with the ability to adapt to the new contingency after the CAS. We tested four computational models, and hybrid forms of them, on the present task: a model-free algorithm (SARSA learner), a model-based algorithm (FORWARD learner), a WM-dependent reinforcement-learning algorithm and an action-sequence, model-free algorithm.
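The trial structure described above (700 training trials, 225 dual-task trials and 275 post-change trials) can be sketched as a schedule generator. This is only an illustrative sketch and not the authors' experimental code. The phase lengths, group structure and probabilities come from the abstract; the assignment of a probability and a preferred sequence to each stimulus number, the rule that a non-preferred trial requires the other sequence of the same group, the near-balanced stimulus counts outside training, and all names are assumptions made for the example.

```python
import random

# Phase lengths as stated in the abstract (700 + 225 + 275 = 1200 trials).
PHASES = [("training", 700), ("dual_task", 225), ("post_change", 275)]
STIMULI = (1, 2, 3, 4)
ALL_SEQUENCES = (1, 2, 3, 4)

# Grouping from the abstract: stimuli 1-2 use sequences 1-2, stimuli 3-4 use 3-4.
GROUP_SEQUENCES = {1: (1, 2), 2: (1, 2), 3: (3, 4), 4: (3, 4)}

# ASSUMPTION: which stimulus gets which preferred sequence and probability
# is not given in the abstract; this mapping is hypothetical.
PREFERRED_SEQ = {1: 1, 2: 2, 3: 3, 4: 4}
PREFERRED_PROB = {1: 0.5, 2: 0.66, 3: 0.83, 4: 1.0}


def required_sequence(stimulus, phase, rng):
    """Return the action sequence required on one trial."""
    if phase == "post_change":
        # After the change, every sequence is equally likely for every stimulus.
        return rng.choice(ALL_SEQUENCES)
    preferred = PREFERRED_SEQ[stimulus]
    if rng.random() < PREFERRED_PROB[stimulus]:
        return preferred
    # ASSUMPTION: otherwise the other sequence of the same group is required.
    return next(s for s in GROUP_SEQUENCES[stimulus] if s != preferred)


def build_schedule(seed=0):
    """Build a list of (phase, stimulus, required_sequence) tuples."""
    rng = random.Random(seed)
    schedule = []
    for phase, n_trials in PHASES:
        # Cycle through the stimuli, then shuffle: exactly 175 per stimulus
        # in training, as balanced as possible in the other phases.
        stimuli = [STIMULI[i % len(STIMULI)] for i in range(n_trials)]
        rng.shuffle(stimuli)
        for stimulus in stimuli:
            schedule.append((phase, stimulus, required_sequence(stimulus, phase, rng)))
    return schedule


if __name__ == "__main__":
    schedule = build_schedule(seed=1)
    print(len(schedule), "trials; first five:", schedule[:5])
```

Under these assumptions, a stimulus with probability 1 always requires its preferred sequence until the change, which is the condition the abstract expects to produce the strongest habit.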
Some computational considerations are also proposed to account for the pupil size results.
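Of the four models listed above, the model-free SARSA learner has the most compact update rule, so it is sketched here for a two-step trial. This is a generic tabular SARSA update and not the authors' fitted model. The learning rate `alpha`, the discount factor `gamma`, the state and action labels, and the reward of 1 for reaching the desired final state are all assumptions chosen for illustration.

```python
from collections import defaultdict


class SarsaLearner:
    """Generic tabular SARSA learner (illustrative; parameters are assumptions)."""

    def __init__(self, alpha=0.1, gamma=1.0):
        self.alpha = alpha            # learning rate (hypothetical value)
        self.gamma = gamma            # discount factor (hypothetical value)
        self.q = defaultdict(float)   # Q-values keyed by (state, action), default 0

    def update(self, state, action, reward, next_state=None, next_action=None):
        """Apply one SARSA update and return the prediction error.

        Pass next_state=None for the last step of a trial (terminal transition).
        """
        next_q = 0.0 if next_state is None else self.q[(next_state, next_action)]
        delta = reward + self.gamma * next_q - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * delta
        return delta


if __name__ == "__main__":
    learner = SarsaLearner(alpha=0.5, gamma=1.0)
    step1, step2 = ("stim1", "step1"), ("stim1", "step2")
    for trial in range(2):
        # First action: no immediate reward, value bootstrapped from step 2.
        learner.update(step1, "key_a", 0.0, step2, "key_b")
        # Second action: reward 1 for reaching the desired final state.
        learner.update(step2, "key_b", 1.0)
    print(learner.q[(step1, "key_a")], learner.q[(step2, "key_b")])
```

Because the first-step value is learned only through bootstrapping from the second step, reward information propagates backwards one step per trial, which is one reason a purely step-by-step learner can be contrasted with a learner that treats the whole action sequence as a single unit.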

References

1. Dolan, R. J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–25 (2013).
2. Dickinson, A. Actions and Habits: The Development of Behavioural Autonomy. Philos. Trans. R. Soc. B Biol. Sci. 308, 67–78 (1985).
3. Dayan, P. Goal-directed control and its antipodes. Neural Netw. 22, 213–9 (2009).
4. Valentin, V. V., Dickinson, A. & O’Doherty, J. P. Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci. 27, 4019–26 (2007).
5. Tricomi, E., Balleine, B. W. & O’Doherty, J. P. A specific role for posterior dorsolateral striatum in human habit learning. Eur. J. Neurosci. 29, 2225–32 (2009).
6. Yin, H. H. & Knowlton, B. J. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–76 (2006).
7. Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–11 (2005).
8. Lee, S. W., Shimojo, S. & O’Doherty, J. P. Neural Computations Underlying Arbitration between Model-Based and Model-free Learning. Neuron 81, 687–699 (2014).
9. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. (2012).
10. Gläscher, J., Hampton, A. N. & O’Doherty, J. P. Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making. Cereb. Cortex 19, 483–95 (2009).
11. Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–95 (2010).
12. Wunderlich, K., Dayan, P. & Dolan, R. J. Mapping value based planning and extensively trained choice in the human brain. Nat. Neurosci. 15, 786–91 (2012).
13. Wunderlich, K., Smittenaar, P. & Dolan, R. J. Dopamine enhances model-based over model-free choice behavior. Neuron 75, 418–24 (2012).
14. Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A. & Daw, N. D. Working-memory capacity protects model-based learning from stress. Proc. Natl. Acad. Sci. U. S. A. (2013). doi:10.1073/pnas.1312011110
15. Collins, A. G. E. & Frank, M. J. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur. J. Neurosci. 35, 1024–35 (2012).
16. Dezfouli, A. & Balleine, B. W. Habits, action sequences and reinforcement learning. Eur. J. Neurosci. 35, 1036–51 (2012).
17. Dezfouli, A. & Balleine, B. W. Actions, action sequences and habits: evidence that goal-directed and habitual action control are hierarchically organized. PLoS Comput. Biol. 9, e1003364 (2013).
18. Keramati, M., Dezfouli, A. & Piray, P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput. Biol. 7, e1002055 (2011).

Keywords: habits, goal-directed behavior, working memory, computational biology, pupil size

Conference: Belgian Brain Council 2014 MODULATING THE BRAIN: FACTS, FICTION, FUTURE, Ghent, Belgium, 4 Oct 2014.

Presentation Type: Poster Presentation

Topic: Basic Neuroscience

Citation: Moens V, Zénon A and Olivier E (2014). Habits, Action Sequences And Working Memory From A Behavioral And A Computational Perspective. Conference Abstract: Belgian Brain Council 2014 MODULATING THE BRAIN: FACTS, FICTION, FUTURE. doi: 10.3389/conf.fnhum.2014.214.00048

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 01 Jul 2014; Published Online: 13 Jul 2014.

* Correspondence: Dr. Vincent Moens, UCLouvain, Bruxelles, Belgium, vincentmoens@gmail.com