Original Research Article. Provisionally accepted. The full text will be published soon.

Front. Robot. AI | doi: 10.3389/frobt.2018.00049

Bootstrapping of Parameterized Skills Through Hybrid Optimization in Task and Policy Spaces

  • 1Faculty of Technology, Bielefeld University, Germany
  • 2Faculty of Technology, Bielefeld University, Germany
  • 3Institute for Robotics and Process Control, Technische Universität Braunschweig, Germany

Modern robotic applications place high demands on the adaptation of actions to
variance in a given task. Reinforcement learning can optimize for these changing conditions,
but relearning from scratch is hardly feasible because of the high number of required rollouts. We
propose a parameterized skill that generalizes to new actions for changing task parameters.
The skill is encoded as a meta-learner that provides parameters for task-specific dynamic motion
primitives. Our work shows that using parameterized skills to initialize the optimization
process leads to more effective incremental task learning. In addition, we introduce a hybrid
optimization method that combines a fast, coarse optimization on a manifold of policy parameters
with a fine-grained parameter search in the unrestricted space of actions. The proposed algorithm
reduces the number of rollouts required for adaptation to new task conditions. Applications in
illustrative toy scenarios, on a 10-DOF planar arm, and in a humanoid robot point-reaching task
validate the approach.
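
The two-stage idea in the abstract can be illustrated on a toy problem. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the linear map `skill`, the quadratic `reward`, the simple (1+1) random-search `hill_climb`, and all dimensions, step sizes, and rollout budgets are hypothetical stand-ins chosen for brevity. Stage 1 searches coarsely over the skill's low-dimensional input, so candidates stay on the manifold of policy parameters the skill can produce; stage 2 refines the result in the unrestricted parameter space.

```python
import numpy as np

rng = np.random.default_rng(0)
TASK_DIM, POLICY_DIM = 2, 10  # hypothetical sizes, chosen for illustration

# Hypothetical parameterized skill: a fixed linear map standing in for a
# learned meta-learner from task parameters to policy parameters.
W = rng.normal(size=(POLICY_DIM, TASK_DIM))

def skill(tau):
    """Map task parameters tau to policy parameters theta."""
    return W @ tau

# Toy ground truth: the best policy lies near, but not on, the skill manifold.
W_TRUE = W + 0.3 * rng.normal(size=W.shape)
OFFSET = 0.2 * rng.normal(size=POLICY_DIM)

def reward(theta, tau):
    """Negative squared distance to the optimal policy for task tau."""
    return -np.sum((theta - (W_TRUE @ tau + OFFSET)) ** 2)

def hill_climb(f, x0, sigma, n_rollouts):
    """(1+1) random search: keep a perturbation only if it improves f."""
    x, fx = x0.copy(), f(x0)
    for _ in range(n_rollouts):
        cand = x + sigma * rng.normal(size=x.shape)
        fc = f(cand)
        if fc > fx:
            x, fx = cand, fc
    return x, fx

def hybrid_optimize(tau, n_coarse=50, n_fine=200):
    # Stage 1 (coarse): search over the skill's input, so every candidate
    # policy stays on the low-dimensional manifold spanned by the skill.
    tau_hat, r_coarse = hill_climb(
        lambda t: reward(skill(t), tau), tau, sigma=0.2, n_rollouts=n_coarse)
    # Stage 2 (fine): search the unrestricted policy parameter space,
    # starting from the best on-manifold policy found in stage 1.
    theta, r_fine = hill_climb(
        lambda th: reward(th, tau), skill(tau_hat), sigma=0.05, n_rollouts=n_fine)
    return theta, r_coarse, r_fine

tau_new = np.array([0.5, -1.0])           # a new task condition
r_init = reward(skill(tau_new), tau_new)  # reward of the skill's initial guess
theta_opt, r_coarse, r_fine = hybrid_optimize(tau_new)
print(f"initial {r_init:.3f}  coarse {r_coarse:.3f}  fine {r_fine:.3f}")
```

Because the hill climber accepts only improving candidates, the reward cannot decrease from the skill's initial guess through the coarse stage to the fine stage. How much each stage contributes depends entirely on the toy setup assumed here.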

Keywords: reinforcement learning, policy optimization, memory, learning, hybrid optimization, dimensionality reduction, parameterized skills

Received: 13 Jun 2017; Accepted: 11 Apr 2018.

Edited by:

Alexandre Bernardino, Instituto Superior Técnico, Universidade de Lisboa, Portugal

Reviewed by:

John Nassour, Technische Universität Chemnitz, Germany
Erol Sahin, Middle East Technical University, Turkey  

Copyright: © 2018 Queißer and Steil. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mr. Jeffrey F. Queißer, Bielefeld University, Faculty of Technology, CoR-Lab Research Institute for Cognition and Robotics, Bielefeld University, Universitätsstr. 25, Bielefeld, 33615, Germany