Event Abstract

Learning to plan: planning as an action in simple reinforcement learning agents

  • 1 Columbia University, United States
  • 2 University of Minnesota, Department of Neuroscience, United States

Current neuroscientific theories of decision making emphasize that behavior can be controlled by different brain systems with different properties. A common distinction is that between a model-free, stimulus-response, "habit" system on the one hand, and a model-based, flexible "planning" system on the other. Planning tends to be prominent early during learning before transitioning to more habitual control, and is often specific to important choice points (e.g. Tolman, 1938), implying that planning processes can be selectively engaged as circumstances demand. Current models of model-based decision making lack a mechanism to account for selective planning; for instance, the influential Daw et al. (2005) model plans at every action, using an external mechanism to arbitrate between planned and model-free control. Thus, there is currently no model of planning that defines its relationship to model-free control while respecting how humans and animals actually behave. To address this, we explored a "T-maze grid world" reinforcement learning model where the agent can choose to plan. The value of planning is learned along with that of other actions (turn left, etc.) and is updated after an N-step fixed policy (the "plan") is executed, offset by a fixed planning cost. The contents of the plan consist of either a random sequence of moves (random plan control) or the sequence of moves that leads to the highest-valued state under the agent's internal value function (true plan). Consistent with previous results (Sutton, 1990), we find that planning speeds learning. Furthermore, while agents plan frequently during initial learning, the non-planning actions gradually increase in value with experience and win out. Interestingly, even in this simple environment, the agent shows a selective increase in planning actions specifically at the choice point under appropriate conditions.
We explore a number of variations of the model, including a hierarchical version where action values are learned separately for the model-free and model-based controllers. Thus, a simple Q-learning model which includes an added planning action, the value of which is learned alongside that of other actions, can reproduce two salient aspects of planning data: planning is prominent early but gives way to habitual control with experience, and planning occurs specifically at points appropriate to the structure of the task. The fact that these phenomena can be learned in a simple reinforcement learning architecture suggests this as an alternative to models that use a supplemental arbitration mechanism between planning and habitual control. By treating planning as a choice, the model can generate specific predictions about when, where in the environment, and how far ahead or for how long agents may choose to plan. More generally, the current approach is an example of blurring the boundary between agent and environment, such that actions (like planning) can be internal to the agent rather than having to affect the environment, and a demonstration that the state space can include internal variables (such as the contents of a plan), similar to the role of working memory (O'Reilly and Frank, 2006; Zilli and Hasselmo, 2007).
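The core mechanism described above can be sketched as a tabular Q-learner whose action set contains an extra "plan" action. Everything in the sketch below is an illustrative assumption rather than the authors' implementation: the environment is a linear track instead of a T-maze, the parameter values are arbitrary, and the "true plan" is approximated by greedy hill-climbing on the agent's internal value function.

```python
import random

random.seed(0)

# Minimal sketch (illustrative assumptions only: linear track, parameter
# values, and greedy plan construction are not from the actual simulations).
N_STATES   = 6                    # states 0..5, reward of +1 at state 5
GOAL       = N_STATES - 1
MOVES      = {"L": -1, "R": +1}
ACTIONS    = ["L", "R", "plan"]   # "plan" is an action like any other
PLAN_STEPS = 3                    # length of the N-step fixed policy
PLAN_COST  = 0.05                 # fixed cost offsetting the planning update
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def V(s):
    """Internal value of a state: the best primitive-action value."""
    return max(Q[s]["L"], Q[s]["R"])

def env_step(s, move):
    s2 = max(0, min(GOAL, s + MOVES[move]))
    return s2, (1.0 if s2 == GOAL else 0.0)

def execute_plan(s):
    """Run the 'plan': up to PLAN_STEPS moves chosen greedily toward the
    highest-valued successor under the agent's own value function."""
    ret, disc = -PLAN_COST, 1.0
    for _ in range(PLAN_STEPS):
        vals = {m: V(env_step(s, m)[0]) for m in MOVES}
        best = max(vals.values())
        move = random.choice([m for m in vals if vals[m] == best])
        s, r = env_step(s, move)
        ret += disc * r
        disc *= GAMMA
        if s == GOAL:
            break
    return s, ret, disc

def run_episode():
    s, steps = 0, 0
    while s != GOAL and steps < 100:
        if random.random() < EPS:                 # epsilon-greedy choice
            a = random.choice(ACTIONS)
        else:
            best = max(Q[s].values())
            a = random.choice([x for x in ACTIONS if Q[s][x] == best])
        if a == "plan":
            s2, ret, disc = execute_plan(s)
            target = ret + disc * V(s2)           # n-step return, cost included
        else:
            s2, r = env_step(s, a)
            target = r + GAMMA * V(s2)
        Q[s][a] += ALPHA * (target - Q[s][a])     # same Q-update for all actions
        s, steps = s2, steps + 1
    return steps

for _ in range(200):
    run_episode()
```

Because planning is just another entry in the Q-table, a single update rule covers both planned and primitive control; no external arbitration mechanism is required, and the planning action's learned value determines when it is worth paying the planning cost.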

Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.

Presentation Type: Poster Presentation

Topic: Poster session II

Citation: Wimmer GE and Van Der Meer M (2010). Learning to plan: planning as an action in simple reinforcement learning agents. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010. doi: 10.3389/conf.fnins.2010.03.00246

Received: 05 Mar 2010; Published Online: 05 Mar 2010.

* Correspondence: G E Wimmer, Columbia University, New York, United States, elliott@caa.columbia.edu
