Original Research ARTICLE
From semantics to execution: Integrating action planning with reinforcement learning for robotic causal problem-solving
- 1Universität Hamburg, Germany
Reinforcement learning is an appropriate and successful method to learn robot control. Symbolic action planning is useful to resolve causal dependencies and to break a causally complex problem down into a sequence of simpler high-level actions. A problem with the integration of both approaches is that action planning is based on discrete high-level action- and state spaces, whereas reinforcement learning is usually driven by a continuous reward function. Recent advances in model-free reinforcement learning, specifically, universal value function approximators and hindsight experience replay, have focused on goal-independent methods based on sparse rewards that are only given at the end of a rollout, and only if the goal has been fully achieved. In this article, we build on these novel methods to facilitate the integration of action planning with model-free reinforcement learning. Specifically, the article demonstrates how the reward-sparsity can serve as a bridge between the high-level and low-level state- and action spaces. As a result, we demonstrate that the integrated method is able to solve object manipulation problems that involve tool use and non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
Keywords: Hierarchical Architecture, planning, Robotics, neural networks, causal puzzles, reinforcement learning
Received: 22 May 2019;
Accepted: 04 Nov 2019.
Copyright: © 2019 Eppe, Nguyen and Wermter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Manfred Eppe, Universität Hamburg, Hamburg, Germany, email@example.com