Event Abstract

Biologically-inspired neural controller based on adaptive reward learning

  • 1 Technische Universität München, Germany

Humans learn tasks from their own experience, through self-exploration and by observing others' actions. The evaluation of an achieved task is driven by rewards, and humans improve their skills in order to gain more rewards (e.g. happiness, food, money). Neurobiological studies of cortical activity suggest that the orbitofrontal cortex (OFC) is involved in reward processing in the human brain [1]. OFC neurons are a key reward structure of the brain, in which reward is coded in an adaptive and flexible way [2]. Studies of the anterior cingulate cortex (ACC) suggest that it is responsible for avoiding repeated mistakes [3]. This cortical area acts as an early warning system (EWS) that adjusts behavior to avoid dangerous situations. It responds not only to the sources of errors (external error feedback) but also to the earliest sources of error information available (internal error detection) [4]. The EWS has been shown to be affected by tolerance to risk, and psychological studies provide further evidence that people’s strategies fall into two classes: risk taking and risk aversion [5].

“NeuroRobotics” research draws on human learning methods to improve the autonomy and robustness of robots in dealing with environmental changes. In connection with these neurological studies, we propose a learning method based on human learning from experience (ACC) and inspired by the way the human brain codes rewards (OFC), in order to allow a humanoid robot to learn a walking task. With the vigilance threshold concept, which represents tolerance to risk, the method guarantees a balance between exploration and exploitation, unlike other search methods (e.g. Q-learning, Monte Carlo). Furthermore, it is able to converge to multiple learning targets.
Most reward-based task learning methods use predefined parameters in their reward function [6], which cannot be obtained without previous experience of achieving the desired task. Learning based on adaptive reward does not require any prior information about the reward: starting from scratch, it builds its experience using only the reward information that becomes available.
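As an illustration only, the idea of a reward that needs no predefined parameters can be sketched by rescaling each raw reward against the range of rewards observed so far. This is not the method of the abstract: the class `AdaptiveReward`, the function `is_acceptable`, the min/max rescaling rule, and the way the vigilance threshold is applied are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveReward:
    """Hypothetical helper: rescales raw rewards to [0, 1] using only
    the rewards seen so far, so no reward scale is fixed in advance."""
    low: float = math.inf
    high: float = -math.inf

    def update(self, raw: float) -> float:
        # Widen the observed range with the new raw reward.
        self.low = min(self.low, raw)
        self.high = max(self.high, raw)
        span = self.high - self.low
        if span == 0.0:
            # Only one distinct value seen so far: no basis for ranking yet.
            return 0.5
        return (raw - self.low) / span


def is_acceptable(scaled: float, vigilance: float) -> bool:
    """Assumed use of a vigilance threshold: a higher value means lower
    tolerance to risk, so fewer trials count as acceptable."""
    return scaled >= vigilance


reward = AdaptiveReward()
raw_rewards = [0.2, 1.0, 0.6, 3.0]  # made-up values, e.g. progress per trial
scaled = [reward.update(r) for r in raw_rewards]
accepted = [is_acceptable(s, vigilance=0.8) for s in scaled]
```

In this sketch the same raw reward can receive a different scaled value later on, once better trials have widened the observed range, which is one simple way to read "adaptive" coding of reward.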
Our approach has been implemented on the NAO humanoid robot, controlled by a bio-inspired neural controller based on a central pattern generator (CPG). The learning system adapts the oscillation frequency and the motor neuron gain in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.
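The parameters that the learning system adapts can be pictured with a deliberately simplified oscillator. The sketch below assumes a plain sinusoid in place of the neural CPG, and the function names, parameter values, and the quarter-cycle phase offset between pitch and roll are hypothetical choices for illustration, not values from the abstract.

```python
import math


def joint_command(t: float, frequency_hz: float, gain: float,
                  phase: float = 0.0) -> float:
    """Sinusoidal stand-in for one CPG output: gain scales the amplitude,
    frequency_hz sets the rhythm of the gait."""
    return gain * math.sin(2.0 * math.pi * frequency_hz * t + phase)


# Hypothetical parameter sets that a learner might converge to.
flat_terrain = {"frequency_hz": 1.0, "pitch_gain": 0.30, "roll_gain": 0.10}
sloped_terrain = {"frequency_hz": 0.8, "pitch_gain": 0.20, "roll_gain": 0.15}


def step(t: float, params: dict) -> tuple:
    """Return (pitch, roll) commands at time t for one parameter set."""
    pitch = joint_command(t, params["frequency_hz"], params["pitch_gain"])
    # Roll is offset by a quarter cycle here; this offset is an assumption.
    roll = joint_command(t, params["frequency_hz"], params["roll_gain"],
                         phase=math.pi / 2)
    return pitch, roll


pitch, roll = step(0.25, flat_terrain)
```

Switching between terrains then amounts to switching between stored parameter sets, which is how the multiple learning targets mentioned above could be represented in such a simplified model.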


[1] Iversen, S. D. & Mishkin, M. Perseverative interference in monkeys following selective lesions of the inferior prefrontal convexity. Experimental Brain Research, Springer Berlin / Heidelberg, 1970, 11, 376-386.
[2] Kobayashi, S.; de Carvalho, O. P. & Schultz, W. Adaptation of reward sensitivity in orbitofrontal neurons. The Journal of Neuroscience, 2010, 30, 534-544.
[3] Brown, J. W. & Braver, T. S. Learned Predictions of Error Likelihood in the Anterior Cingulate Cortex. Science, 2005, 307, 1118-1121.
[4] Mars, R. B.; Coles, M. G.; Grol, M. J.; Holroyd, C. B.; Nieuwenhuis, S.; Hulstijn, W. & Toni, I. Neural dynamics of error processing in medial frontal cortex. Neuroimage, 2005, 28, 1007-1013.
[5] Wang, X.; Kruger, D. & Wilke, A. Towards the development of an evolutionarily valid domain-specific risk-taking scale. Evolutionary Psychology, 2007, 5, 555-568.
[6] Li, T.-H. S.; Su, Y.-T.; Lai, S.-W. & Hu, J.-J. Walking Motion Generation, Synthesis, and Control for Biped Robot by Using PGRL, LPI, and Fuzzy Logic. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2011, 41, 736-748.

Keywords: Adaptive reward, Learning, neurorobotics, robot control

Conference: Bernstein Conference 2012, Munich, Germany, 12 Sep - 14 Sep, 2012.

Presentation Type: Poster

Topic: Learning, plasticity, memory

Citation: Nassour J and Cheng G (2012). Biologically-inspired neural controller based on adaptive reward learning. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference 2012. doi: 10.3389/conf.fncom.2012.55.00164

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 11 May 2012; Published Online: 12 Sep 2012.

* Correspondence: Mr. John Nassour, Technische Universität München, München, 80333, Germany, john.nassour@informatik.tu-chemnitz.de