# NEURAL COMPUTATION IN EMBODIED CLOSED-LOOP SYSTEMS FOR THE GENERATION OF COMPLEX BEHAVIOR: FROM BIOLOGY TO TECHNOLOGY

EDITED BY: Poramate Manoonpong and Christian Tetzlaff

PUBLISHED IN: Frontiers in Neurorobotics

#### Frontiers Copyright Statement

© Copyright 2007-2018 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

ISSN 1664-8714

ISBN 978-2-88945-605-5

DOI 10.3389/978-2-88945-605-5



Topic Editors:

Poramate Manoonpong, University of Southern Denmark, Denmark; Vidyasirimedhi Institute of Science and Technology, Thailand; Nanjing University of Aeronautics and Astronautics, China

Christian Tetzlaff, Georg-August-Universität Göttingen, Germany

Brain and Body by Poramate Manoonpong under CC-BY

How can neural and morphological computations be effectively combined and realized in embodied closed-loop systems (e.g., robots) such that they can become more like living creatures in their level of performance? Understanding this will lead to new technologies and a variety of applications.

To tackle this research question, we bring together experts from different fields (including Biology, Computational Neuroscience, Robotics, and Artificial Intelligence) to share their recent findings and ideas and to update our research community. This eBook collects 17 cutting-edge research articles, covering neural and morphological computations as well as the transfer of results to real-world applications, like prosthesis and orthosis control and neuromorphic hardware implementation.

Citation: Manoonpong, P., Tetzlaff, C., eds. (2018). Neural Computation in Embodied Closed-Loop Systems for the Generation of Complex Behavior: From Biology to Technology. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-605-5

# Table of Contents

*Editorial: Neural Computation in Embodied Closed-Loop Systems for the Generation of Complex Behavior: From Biology to Technology*

Poramate Manoonpong and Christian Tetzlaff

### SECTION 1

#### EMBODIED CLOSED-LOOP SYSTEMS

*Adaptive Control Strategies for Interlimb Coordination in Legged Robots: A Review*

Shinya Aoi, Poramate Manoonpong, Yuichi Ambe, Fumitoshi Matsuno and Florentin Wörgötter

*A Minimal Model Describing Hexapedal Interlimb Coordination: The Tegotae-Based Approach*

Dai Owaki, Masashi Goda, Sakiko Miyazawa and Akio Ishiguro


Frank Pasemann

### SUBSECTION 1.1

#### SENSORY AREAS

*An Adaptive Neural Mechanism for Acoustic Motion Perception With Varying Sparsity*

Danish Shaikh and Poramate Manoonpong

*Real-Time Biologically Inspired Action Recognition From Key Poses Using a Neuromorphic Architecture*

Georg Layher, Tobias Brosch and Heiko Neumann

### SUBSECTION 1.2

#### MOTOR AREAS

*Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control*

Timo Nachstedt, Christian Tetzlaff and Poramate Manoonpong

*Development and Training of a Neural Controller for Hind Leg Walking in a Dog Robot*

Alexander Hunt, Nicholas Szczecinski and Roger Quinn

### SUBSECTION 1.3

#### HIGHER INTEGRATIVE AREAS


Laura Martin, Bulcsú Sándor and Claudius Gros

*ReaCog, a Minimal Cognitive Controller Based on Recruitment of Reactive Systems*

Malte Schilling and Holk Cruse

*Motor-Skill Learning in an Insect Inspired Neuro-Computational Control System*

Eleonora Arena, Paolo Arena, Roland Strauss and Luca Patané

### SUBSECTION 1.4

#### BODY

*Morphological Properties of Mass–Spring Networks for Optimal Locomotion Learning*

Gabriel Urbain, Jonas Degrave, Benonie Carette, Joni Dambre and Francis Wyffels

### SECTION 2

#### TECHNOLOGY TRANSFER

*Modular Neural Mechanisms for Gait Phase Tracking, Prediction, and Selection in Personalizable Knee-Ankle-Foot-Orthoses*

Jan-Matthias Braun, Florentin Wörgötter and Poramate Manoonpong

*Biomechanical Reconstruction Using the Tacit Learning System: Intuitive Control of Prosthetic Hand Rotation*

Shintaro Oyama, Shingo Shimoda, Fady S. K. Alnajjar, Katsuyuki Iwatsuki, Minoru Hoshiyama, Hirotaka Tanaka and Hitoshi Hirata

*Obstacle Avoidance and Target Acquisition for Robot Navigation Using a Mixed Signal Analog/Digital Neuromorphic Processing System*

Moritz B. Milde, Hermann Blum, Alexander Dietmüller, Dora Sumislawska, Jörg Conradt, Giacomo Indiveri and Yulia Sandamirskaya

# Editorial: Neural Computation in Embodied Closed-Loop Systems for the Generation of Complex Behavior: From Biology to Technology

Poramate Manoonpong<sup>1,2,3</sup>\* and Christian Tetzlaff<sup>4</sup>

<sup>1</sup> Embodied AI and Neurorobotics Lab, Centre for BioRobotics, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark, <sup>2</sup> Bio-inspired Robotics and Neural Engineering Lab, School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand, <sup>3</sup> Institute of Bio-Inspired Structure and Surface Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, <sup>4</sup> Bernstein Center for Computational Neuroscience, Third Institute of Physics, Georg-August-Universität Göttingen, Göttingen, Germany

Keywords: embodiment, neural computation, sensorimotor coordination, orthosis and prosthesis, motor planning/control, learning and memory, autonomous robot, neuromorphic computing

**Editorial on the Research Topic**

**Neural Computation in Embodied Closed-Loop Systems for the Generation of Complex Behavior: From Biology to Technology**

# 1. INTRODUCTION

The brain of biological organisms is a highly complex and very efficient computing unit. It can deal with a multitude of tasks from low-level sensorimotor coordination to high-level cognition. Specifically, it can process high-dimensional sensory information and, depending on this, generate coordinated motor commands in real time, resulting in actions (like locomotion and manipulation). Simultaneously, it can also perform cognitive functions (such as navigation, goal-oriented behavior, reasoning and decision making, interaction, and communication). This amazing performance is achieved by using the full capacity of its neural dynamics, learning, memory, and adaptation as well as by interacting with the environment through its body (i.e., sensory-motor system). Thus, actions and cognition require dynamical brain-body-environment interactions and therefore cannot be disembodied. A traditional view of embodiment has also emphasized that complex behavior emerges from continuous and dynamical interactions between computational and physical means with the environment (Wilson, 2002; see also the embodiment scheme in Pfeifer et al., 2007). While this radical scientific concept has been promoted for the last three decades (Brooks, 1991; Chiel and Beer, 1997; Calvo and Gomila, 2008; Pfeifer et al., 2014), the detailed interaction of the (neural) computation within and across different brain areas, such as the sensory, motor, and higher integrative areas, with the environment to generalize complex and adaptive behaviors has not been fully addressed.

Accordingly, this Research Topic called on researchers from different fields (including Biology, Computational Neuroscience, Robotics, and Artificial Intelligence) to share their recent developments and results and to update our research community on remaining open questions. The topic comprises 17 articles in total, which cover neural and morphological computations as well as the transfer of results to real-world applications, like prosthesis and orthosis control and neuromorphic hardware implementation. Eight articles focus on the three main areas (sensory, motor, and integrative areas) of the controller or brain (**Figure 1**). Among these, two focus on neural computation mechanisms in sensory areas for action recognition of a human agent (Layher et al.) and for acoustic motion perception (Shaikh and Manoonpong), two on motor areas for locomotion control (Nachstedt et al.; Hunt et al.), and four on higher integrative areas for motor-skill learning (Arena et al.), navigation learning (Goldschmidt et al.), motor planning and internal representations (Schilling and Cruse), and self-organized complex locomotion patterns (Martin et al.). In addition, two articles present neural closed-loop architectures that link sensory and motor areas for reaching and grasping (Knips et al.) and for, e.g., obstacle avoidance behavior (Pasemann). Three articles consider a tight interaction between the body and the sensory and motor areas for sensorimotor coordination of legged robots (Aoi et al.; Owaki et al.) and a robot arm (Der and Martius). One article provides insight into the morphological computation of the body for optimal locomotion learning (Urbain et al.). Regarding technology transfer, two articles show the transfer of the principles of the nervous system to orthosis (Braun et al.) and prosthesis control (Oyama et al.), and one shows the transfer to neuromorphic hardware-based control (Milde et al.). Based on these contributions, we organize the subsections into two main categories: Embodied closed-loop systems and their technology transfer.

#### Edited and reviewed by:

Florian Röhrbein, Technische Universität München, Germany

#### \*Correspondence:

Poramate Manoonpong poma@mmmi.sdu.dk

Received: 14 June 2018 Accepted: 09 August 2018 Published: 30 August 2018

#### Citation:

Manoonpong P and Tetzlaff C (2018) Editorial: Neural Computation in Embodied Closed-Loop Systems for the Generation of Complex Behavior: From Biology to Technology. Front. Neurorobot. 12:53. doi: 10.3389/fnbot.2018.00053

# 2. OVERVIEW

# 2.1. Embodied Closed-Loop Systems

An embodied closed-loop system or a brain-body-environment system (Chiel and Beer, 1997) is generally formed by three main ingredients: Nervous system (or controller), body, and the environment. The nervous system has in general three subareas: Sensory, motor, and higher integrative areas. Environmental information is perceived through sensors and processed in the sensory areas. The sensory areas can also be influenced by forward internal models (Kawato, 1999) embedded in the higher integrative areas for sensory prediction and noise cancellation (von Holst and Mittelstaedt, 1950; Blakemore et al., 1999) as well as state estimations (Frens and Donchin). The outputs of the sensory areas are transmitted to motor and higher integrative areas. The motor areas translate the sensory information into motor commands to control the body. They also send a copy of motor commands (efference copy) to the forward models. The higher integrative areas have multiple complex functions with plasticity on multiple timescales (i.e., long-term and short-term synaptic plasticity) for motor-skill learning, navigation learning, motor planning, and internal representations. Their outputs project to both sensory and motor areas. Through these internal (inside the controller) and external (between the body and the environment) interactions in the closed-loop system, adaptive behaviors emerge.

FIGURE 1 | The diagram of an embodied closed-loop system. The system concerns an agent that is situated in the environment. It can perceive the environmental information through its sensors and perform its actions using its motors. In principle, the agent consists of two main components: Nervous system (or controller) and body. There are three main areas inside the nervous system: Sensory, motor, and higher integrative areas. In the embodied perspective, while all external and internal stimuli are processed in the nervous system, this computational processing can be offloaded to the body (i.e., morphological computation) for a successful interaction with the environment (see text for more details).
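The efference-copy pathway described in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, a scalar linear forward model trained with a least-mean-squares rule; the names `ForwardModel`, `train_forward_model`, `true_gain`, and `learning_rate`, as well as all numerical values, are hypothetical and are not taken from any article in this Research Topic. The model predicts the self-generated part of a sensory signal from the copy of the motor command and subtracts it, so that only the externally caused component remains.

```python
import math


class ForwardModel:
    """Scalar linear forward model: predicted sensation = gain * motor command."""

    def __init__(self, learning_rate=0.1):
        self.gain = 0.0  # learned estimate of the motor-to-sensor gain (assumed linear)
        self.learning_rate = learning_rate

    def predict(self, efference_copy):
        """Predict the sensory consequence of the agent's own motor command."""
        return self.gain * efference_copy

    def cancel(self, sensation, efference_copy):
        """Return the part of the sensation not explained by the agent's own action."""
        return sensation - self.predict(efference_copy)

    def update(self, sensation, efference_copy):
        """One least-mean-squares step that reduces the prediction error."""
        error = self.cancel(sensation, efference_copy)
        self.gain += self.learning_rate * error * efference_copy


def train_forward_model(true_gain=2.0, steps=2000):
    """Train on purely self-generated input (no external stimulus present)."""
    model = ForwardModel()
    for t in range(steps):
        motor = math.sin(0.1 * t)        # hypothetical rhythmic motor command
        sensation = true_gain * motor    # sensory consequence of the command
        model.update(sensation, motor)
    return model


if __name__ == "__main__":
    model = train_forward_model()
    motor, external = 0.8, 0.5
    sensation = 2.0 * motor + external   # self-generated plus external component
    print("learned gain:", model.gain)
    print("residual after cancellation:", model.cancel(sensation, motor))
```

After training, the residual returned by `cancel` approximates the external stimulus alone, which is the sense in which a forward model supports sensory prediction and noise cancellation. A real forward model would usually be nonlinear and multidimensional; the scalar form is chosen only to keep the loop easy to follow.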

Regarding interactions in embodied closed-loop systems for adaptive behavior generation, in the Research Topic, Aoi et al. review various adaptive locomotor behaviors that emerge from the interactions between the body dynamics, the nervous system, and the environment in animals and legged robots. They identify key factors and mechanisms for adaptations to different speeds, environmental situations, body properties, and tasks. The factors and mechanisms include CPGs, sensory feedback, forward models, learning models, and muscle stiffness. Owaki et al. present a novel and minimal Tegotae-based approach that exploits the interactions between foot contact sensor signals, body dynamics, and the environment for adaptive interlimb coordination of a hexapod robot. The approach can generate various insect gait patterns that allow the robot to adapt to different locomotion speeds, changes in the weight distribution, and leg amputation. Der and Martius report self-organized behavior of an anthropomorphic musculoskeletal robot arm. The behavior emerges from the interaction between a self-learning neural controller (nervous system), the elastic musculoskeletal arm (body), and an object (environment) through proprioceptive sensory feedback. Through the agent-environment coupling, the robot can perform handshaking, pendulum swinging, bottle shaking, rotating a wheel, wiping a table, and hand-eye coordination.

As a part of embodied closed-loop systems, dynamical system-based architectures that link sensory and motor areas are introduced in the Research Topic. Knips et al. present a neural dynamic architecture for reaching and grasping objects. It integrates several modules, having functions for scene representation, concurrent object classification and pose estimation, behavioral organization, and movement generation, into a large dynamical system of an anthropomorphic robot arm equipped with a Kinect sensor. In addition to perception, integration, and movement generation, the architecture also allows for online adaptation of the robot's movement for successful completion of the grasp. Pasemann proposes the exploitation of the discrete-time neurodynamics of networks in a sensorimotor loop for generating adaptive behavior, like adaptive obstacle avoidance of mobile robots. The behavior generation results from a projection of attractor transients or meta-transients, embedded in the neurodynamics, to the motor space.

#### 2.1.1. Sensory Areas

In a closed-loop scenario, agents have to extract and process relevant information from the environment in which they are situated. For this, the initial step is to perceive at least parts of the environment via the sensory system. The sensory modality can be multifaceted and requires various types of sensors in the system, such as for visual, touch, or sound perception. As the next step, the sensory system has to preprocess the perceived environmental information to support the computation done in subsequent areas. Based on experimental insights from the lizard auditory system (Fletcher and Thwaites, 1979; Christensen-Dalsgaard and Manley, 2005), Shaikh and Manoonpong developed a model of the auditory system that can learn to perceive a sound and to process the information to localize its source and to estimate the speed and direction of the motion of the source. Unlike previous approaches, the model can track the source even when it is occluded for a certain duration. When the model was used as the auditory system of a wheeled robot, the robot was always able to perceive, to locate, and, by its sensor-motor interaction, to face the sound source.

The extraction of more abstract but complex concepts from the environmental stimulus stream requires, in general, a larger and more sophisticated sensory system. Layher et al. trained a deep neural network to recognize human poses from a stream of images. The pose recognition is based only on features of human body motions and shapes and does not require feedback from higher integrative areas. By implementing the network on neuromorphic hardware, the recognition process runs in real time at about 1,000 frames per second. Remarkably, the system already shows indications of generalization of poses.

#### 2.1.2. Motor Areas

Central pattern generators (CPGs) have been identified as one of the key mechanisms in the motor areas, particularly for locomotion control. The principle of biological CPGs has been widely used for robot locomotion control (Ijspeert, 2008). Although CPGs do not need any external input or feedback to produce basic rhythmic activity, they still require sensory feedback to adapt and tune their produced activity, e.g., their frequency or phase. Reactive and adaptive mechanisms have been introduced for this purpose (Buchli et al., 2006). A reactive mechanism can entrain the CPG signal so that its frequency matches that of the sensory feedback. However, if the feedback disappears, the CPG signal will return to its intrinsic frequency. By contrast, an adaptive mechanism modifies the intrinsic frequency of the CPG permanently. Here, Nachstedt et al. propose a novel frequency adaptation mechanism through fast dynamical coupling (AFDC) of a CPG model. It is an extension of the standard frequency adaptation mechanism (Righetti et al., 2009) and is based on dynamically adapting the coupling strength of sensory feedback to a CPG model. Using this AFDC technique, they achieve fast and precise adaptation for a wide range of initial intrinsic and target frequencies without the need for parameter fine tuning.
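The difference between reactive and adaptive mechanisms can be made concrete with a small simulation. The sketch below is not the AFDC mechanism of Nachstedt et al.; it assumes a generic adaptive-frequency Hopf oscillator, a form commonly used in the adaptive-oscillator literature, integrated with the forward Euler method. The function name `simulate_adaptive_hopf` and all parameter values are illustrative assumptions. With `adapt=True`, the intrinsic frequency drifts toward the frequency of a periodic feedback signal and keeps the learned value after the feedback is switched off; with `adapt=False`, the intrinsic frequency never changes, so the oscillator can at most be entrained while the feedback lasts.

```python
import math


def simulate_adaptive_hopf(omega_init=8.0, omega_feedback=10.0, eps=1.0,
                           gamma=8.0, mu=1.0, dt=1e-3,
                           t_feedback=100.0, t_total=110.0, adapt=True):
    """Return the intrinsic frequency (rad/s) at the end of the simulation.

    Feedback cos(omega_feedback * t) is applied only while t < t_feedback.
    """
    x, y, omega = 1.0, 0.0, omega_init
    for k in range(int(round(t_total / dt))):
        t = k * dt
        feedback = math.cos(omega_feedback * t) if t < t_feedback else 0.0
        r2 = x * x + y * y
        r = math.sqrt(r2)
        # Hopf oscillator with the feedback injected into the x variable
        dx = gamma * (mu - r2) * x - omega * y + eps * feedback
        dy = gamma * (mu - r2) * y + omega * x
        # Adaptive rule: the feedback also changes the intrinsic frequency
        domega = -eps * feedback * y / r if adapt else 0.0
        x += dt * dx
        y += dt * dy
        omega += dt * domega
    return omega


if __name__ == "__main__":
    print("adaptive:", simulate_adaptive_hopf(adapt=True))
    print("reactive:", simulate_adaptive_hopf(adapt=False))
```

In this sketch the learned frequency is retained because the adaptation term vanishes once the feedback is zero. How quickly it converges depends on `eps` and on the initial frequency mismatch, which is the kind of sensitivity that motivates faster adaptation schemes.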

Hunt et al. report a CPG-based motor control circuit with sensory feedback and an automatic process for neural parameter setting. It is based on the known connectivity of mammalian locomotor systems. The process, which is faster and more reliable than manual tuning, can tune neural parameters to generate adaptive locomotion in the rear legs of a dog-like robot driven by artificial muscles. Using the CPG-based control approach, they show that the dog-like robot can adapt its stepping continuously and maintain rhythmic walking.

#### 2.1.3. Higher Integrative Areas

Unlike sensory and motor areas, higher integrative areas are associated with cognitive processes such as learning, planning, navigation, or generalization. For instance, Goldschmidt et al. developed a system that learned in a reward-based manner to represent the path an agent has walked. With this representation, the agent is able to robustly localize itself within the environment and to find its way back to the home position. The resulting behaviors are quite similar to the behaviors of insects such as desert ants.

Interestingly, the neuronal network underlying cognitive processes can be quite small if it uses the computational resources of other areas or the temporal dynamics of the system (Buonomano and Maass, 2009). Martin et al. use the ongoing dynamics of short-term synaptic plasticity (Tsodyks and Markram, 1997) in a three-neuron system to switch between different, complex motor patterns. Schilling and Cruse show that even a small neuronal network is sufficient for successful planning within an environment and for generalization to other environments, if the system recruits and orders the resources of the downstream motor areas. In other words, the small network reorders diverse reactive behaviors, each stored in a different part of the motor area, to adapt to new environments.

There is a clear indication that higher integrative areas are not mandatory for cognitive processes (Cruse and Wehner, 2011). Arena et al. show in a theoretical model that learning within the Drosophila mushroom body, which is in general associated with the sensory processing of olfactory inputs, adapts motor commands or primitives in the motor area. Thus, through changes in the sensory area, the sensory-motor relations are updated, yielding new behaviors. This was demonstrated on a six-legged robot, which can learn by this mechanism to climb stairs. The authors also address the role of Neural Reuse as one of the possible keys for the emergence of complex behaviors in simple brains.

#### 2.1.4. Body

Apart from neural computation in the nervous system, morphological computation also contributes to the generation of complex behavior. Morphological computation considers that certain processes can be performed by the body instead of the nervous system (Pfeifer and Bongard, 2006). In other words, it captures conceptually the observation that biological systems utilize their flexible and compliant morphology to conduct computations required for a successful interaction with their complex environments. There are numerous illustrative applications of morphological computation and embodiment for efficient locomotion in biological systems (Dickinson et al., 2000) and artificial systems (McGeer, 1990; Jayaram and Full, 2016; Manoonpong et al., 2016). Here, Urbain et al. present an analysis of the trade-offs between morphology, efficiency of locomotion, and the ability of a mechanical body. This is done by using a detailed dynamical model of a Mass-Spring-Damper (MSD) network. They also analyze the computational capacity of an MSD body to generate motor control signals and integrate the signals as feedback to a locomotion controller.

# 2.2. Technology Transfer

Analyzing the neural computation in closed-loop systems, on the one hand, yields insights into the underlying neural dynamics and principles and, on the other hand, provides new solutions for technological control problems. Braun et al. developed a neural controller that tracks and predicts the gait of a patient to control a gait-supporting knee-ankle-foot orthosis. This controller is independent of the actual environmental situation, such as walking on flat terrain or climbing stairs, and requires minimal feedback from the patient.

Based on adaptive principles in neural circuits, Oyama et al. developed an adaptive controller for a hand prosthesis. A standard controller requires the user to intervene to avoid errors arising under different environmental conditions. By contrast, the adaptive controller self-adapts to a new environmental state or to different hand poses.

Milde et al. transferred the neural controller and the whole neural sensory processing onto neuromorphic hardware. By implementing this hardware architecture, they developed an autonomous, neuromorphic robotic agent that is able to avoid obstacles and to acquire targets. Due to the neural nature of the controller, the agent behaves robustly in response to unexpected changes in the environment.

# 3. CONCLUSION

The Research Topic presents an embodied closed-loop approach that considers the interaction of (neural) computation across sensory, motor, and higher integrative areas with the agent's body and the environment. The studies in this Topic cover the broad spectrum of this approach and show that, indeed, complex behaviors emerge from the interplay between different parts of an agent. Most of these studies focus on the interplay between a subset of the available parts. The results from these studies confirm that the embodied approach can be a powerful method to develop autonomous robotic agents performing complex behaviors and that it can even be a key to understanding high-level cognition. Given these and further studies, it is now possible to address the interaction between all parts of an agent's controller (brain), body, and the environment.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

We thank all authors who contributed their work to this Research Topic.

We acknowledge funding by the H2020-FETPROACT project Plan4Act (no. 732266). PM acknowledges support from the Human Frontier Science Program under grant agreement no. RGP0002/2017, from Vidyasirimedhi Institute of Science and Technology (VISTEC)-research funding (Thailand) on Bio-inspired Robotics, and from the Thousand Talents program of China. CT acknowledges support provided by the SFB 1286 from the German Research Foundation (DFG).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Manoonpong and Tetzlaff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Adaptive Control Strategies for Interlimb Coordination in Legged Robots: A Review

Shinya Aoi<sup>1</sup>\*†, Poramate Manoonpong<sup>2</sup>\*†, Yuichi Ambe<sup>3</sup>, Fumitoshi Matsuno<sup>4</sup> and Florentin Wörgötter<sup>5</sup>

<sup>1</sup> Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University, Kyoto, Japan, <sup>2</sup> Embodied AI & Neurorobotics Lab, Centre for Biorobotics, Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark, <sup>3</sup> Department of Applied Information Sciences, Graduate School of Information Sciences, Tohoku University, Aoba-ku, Japan, <sup>4</sup> Department of Mechanical Engineering and Science, Graduate School of Engineering, Kyoto University, Kyoto, Japan, <sup>5</sup> Bernstein Center for Computational Neuroscience, Third Institute of Physics, Georg-August-Universität Göttingen, Göttingen, Germany

Walking animals produce adaptive interlimb coordination during locomotion in accordance with their situation. Interlimb coordination is generated through the dynamic interactions of the neural system, the musculoskeletal system, and the environment, although the underlying mechanisms remain unclear. Recently, investigations of the adaptation mechanisms of living beings have attracted attention, and bio-inspired control systems based on neurophysiological findings regarding sensorimotor interactions are being developed for legged robots. In this review, we introduce adaptive interlimb coordination for legged robots induced by various factors (locomotion speed, environmental situation, body properties, and task). In addition, we show characteristic properties of adaptive interlimb coordination, such as gait hysteresis and different time-scale adaptations. We also discuss the underlying mechanisms and control strategies to achieve adaptive interlimb coordination and the design principle for the control system of legged robots.

#### Edited by:

Tom Ziemke, University of Skövde and Linköping University, Sweden

#### Reviewed by:

Auke Ijspeert, École Polytechnique Fédérale de Lausanne, Switzerland

Holk Cruse, Bielefeld University, Germany

#### \*Correspondence:

Shinya Aoi shinya\_aoi@kuaero.kyoto-u.ac.jp

Poramate Manoonpong poma@mmmi.sdu.dk

† These authors have contributed equally to this work.

Received: 26 November 2016 Accepted: 31 July 2017 Published: 23 August 2017

#### Citation:

Aoi S, Manoonpong P, Ambe Y, Matsuno F and Wörgötter F (2017) Adaptive Control Strategies for Interlimb Coordination in Legged Robots: A Review. Front. Neurorobot. 11:39. doi: 10.3389/fnbot.2017.00039

Keywords: legged robot, interlimb coordination, adaptation, sensorimotor interaction, central pattern generator

# 1. INTRODUCTION

Animals produce adaptive motor behaviors by skillfully manipulating their complicated and redundant musculoskeletal systems. Locomotion is an important behavior required in daily life. Gait selection in accordance with the situation, such as speed and environment, is a prominent adaptive motor function. Humans walk bipedally and use walking and running gaits. Quadruped animals use four legs and produce walking, trotting, and galloping gaits. Hexapod insects use six legs and create metachronal (wave), tetrapod, and tripod gaits as well as intermediate stepping patterns forming a continuum. These gaits, including the transitions and the intermediate stepping patterns for hexapods, are generated through the intralimb and interlimb coordination of leg movements. Intralimb coordination is the relationship between segments or joints within one leg, whereas interlimb coordination is the relationship between legs. For example, in the adaptive control of intralimb coordination, peak timings of ankle plantar flexion, knee extension, and hip extension are out of phase during the human walking gait, but they are shifted and almost in phase during the human running gait (Diedrich et al., 1998). In the adaptive control of interlimb coordination, the footfall sequence between legs changes (Muybridge, 1957), and the sequence is mainly explained by the relative phases between leg movements, because the leg movements are periodic with almost the same period for each leg (note that different frequencies between the legs have been sometimes observed in insects due to the flexibility in the stepping patterns, Pearson and Franklin, 1984). In the quadrupedal walking gait, although the left and right legs move in anti-phase, the ipsilateral front and hind legs do not. In contrast, in the quadrupedal trotting gait, the ipsilateral front and hind legs as well as the left and right legs move in anti-phase; that is, the diagonal legs move in phase (Hildebrand, 1965). 
Measured data analyses performed to clarify the gait mechanisms have suggested that gaits are selected based on metabolic and biomechanical factors (Margaria, 1938; Hoyt and Taylor, 1981; Farley and Taylor, 1991). However, reports on the roles of these factors in determining the gaits (Hreljac, 1993; Minetti et al., 1994; Raynor et al., 2002; Wickler et al., 2003) are conflicting, and so the underlying mechanism remains unclear.
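The description of walking and trotting in terms of relative phases can be made concrete with a small sketch. The function names, the leg labels (`LF`, `RF`, `LH`, `RH`), and the tolerance `tol` are illustrative assumptions and are not taken from the cited studies; the classification uses only the two criteria stated above (left–right anti-phase, and ipsilateral front–hind anti-phase for the trot).

```python
def relative_phase(t_ref, t_leg, period):
    """Touchdown phase of a leg relative to a reference leg, in [0, 1)."""
    return ((t_leg - t_ref) / period) % 1.0


def is_anti_phase(phase, tol=0.1):
    """True when a relative phase is within tol of 0.5 (half a cycle)."""
    return abs(phase - 0.5) < tol


def classify_quadruped_gait(touchdown, period, tol=0.1):
    """Classify a gait from one touchdown time per leg (hypothetical helper).

    touchdown: dict with keys 'LF', 'RF', 'LH', 'RH' (times in seconds).
    """
    left_right = relative_phase(touchdown["LF"], touchdown["RF"], period)
    ipsilateral = relative_phase(touchdown["LF"], touchdown["LH"], period)
    if is_anti_phase(left_right, tol) and is_anti_phase(ipsilateral, tol):
        return "trot"  # diagonal legs move in phase
    if is_anti_phase(left_right, tol):
        return "walk"  # left-right anti-phase, ipsilateral legs not anti-phase
    return "other"


# Illustrative footfall timings for a 1 s stride (assumed values).
trot = {"LF": 0.0, "RF": 0.5, "LH": 0.5, "RH": 0.0}
walk = {"LH": 0.0, "LF": 0.25, "RH": 0.5, "RF": 0.75}
print(classify_quadruped_gait(trot, 1.0), classify_quadruped_gait(walk, 1.0))
```

Real footfall data are noisy and the periods of the legs are only approximately equal, so a practical analysis would average relative phases over many strides rather than use a single touchdown per leg.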

To elucidate adaptive motor functions in animals, neurophysiological and biomechanical studies have been independently conducted. Neurophysiological studies mainly investigate the configurations and activities of the neural system, whereas biomechanical studies generally examine the functional roles of the musculoskeletal system. However, locomotion is generated through dynamic interactions among the neural system, the musculoskeletal system, and the environment. It is thus difficult to fully analyze the locomotion mechanism from a single perspective. In addition, gaits are viewed as self-organized patterns in such complex dynamical systems (Schöner et al., 1990; Diedrich et al., 1998; Griffin et al., 2004; Schilling et al., 2013a). The stability structure of gaits has been identified from responses to perturbations, especially by using phase oscillators and phase response curves (Couzin-Fuchs et al., 2015; Funato et al., 2016) based on the phase reduction theory (Kuramoto, 1984). However, it is difficult to understand how the stability structure is generated because of the complex interactions between the dynamic factors in locomotion. To fully elucidate the locomotion mechanism, integrated studies of neural and musculoskeletal systems are required to find the processes that create adaptive locomotor behavior.

Recently, legged robots have attracted attention as a tool for revealing the locomotion mechanism. A robot's mechanical system with actuators, such as electric motors and pneumatic and hydraulic actuators, has been used to investigate the dynamic role of the musculoskeletal system in locomotion. The control system of the robot has been developed based on neurophysiological findings and employs various sensors, such as a touch sensor, load cell, acceleration sensor, gyro sensor, laser range scanner, and vision system. This approach allows us to emulate and investigate gait generation through dynamic interactions between the neural system, the musculoskeletal system, and the environment. In particular, central pattern generators (CPGs), which are located in the spinal cord of vertebrates and in the thoracic ganglia of invertebrates, are an important factor for elucidating the locomotion mechanism (Grillner, 1975; Orlovsky et al., 1999; MacKay-Lyons, 2002) and have aided the development of locomotion control systems of legged robots. A CPG is a group of interconnected neurons that can be activated to generate a motor pattern without the requirement of sensory feedback. Evidence supporting this hypothesis was first presented by Brown (1911). In addition to the open-loop control function, CPGs receive sensory feedback to modulate motor commands. This closed-loop structure of sensory feedback is crucial to achieve adaptive behavior depending on the situation. Various CPG models have been proposed by using neural or oscillator networks and implemented to control legged robots [see review by Ijspeert (2008)]. For example, Taga and Shimizu (1991) and Taga (1995) conducted a pioneering study of a CPG model for human bipedal locomotion. They employed an articulated multi-link system for the body mechanical model and neural oscillators developed by Matsuoka (1985) for the CPG model. This CPG model received sensory signals of local and global information for locomotion.
They demonstrated that adaptive locomotion is established through the interaction between body dynamics, oscillator dynamics, and environment; they called this "global entrainment." Although complex and robust locomotion behavior can be achieved by purely reflexive control mechanisms (Cruse et al., 1998; Manoonpong et al., 2007; Lewinger and Quinn, 2011; Schilling et al., 2013a,b) and classical machine learning control (Bongard et al., 2006; Cully et al., 2015) instead of using CPG models, the CPG concept and modeling have had a large influence on the studies of legged robots.
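To illustrate what a neural oscillator of the Matsuoka type looks like in practice, the following is a minimal sketch of two mutually inhibiting neurons with adaptation (a half-center), integrated with the forward Euler method. It runs open loop, without the body model or sensory signals used in the cited work. All parameter values (`tau`, `tau_adapt`, `beta`, `w`, `s`), the initial state, and the helper `count_switches` are assumptions chosen only so that the pair oscillates; they are not taken from Taga's or Matsuoka's papers.

```python
def simulate_matsuoka(n_steps=10000, dt=0.001, tau=0.1, tau_adapt=0.2,
                      beta=2.5, w=2.0, s=1.0):
    """Two-neuron half-center oscillator (illustrative parameters).

    u: membrane states, v: adaptation states, y = max(0, u): firing rates.
    Returns a list of (y1, y2) output pairs, one per time step.
    """
    u = [0.1, 0.0]  # small asymmetry so that one neuron takes the lead
    v = [0.0, 0.0]
    outputs = []
    for _ in range(n_steps):
        y = [max(0.0, u[0]), max(0.0, u[1])]
        # Each neuron is driven by tonic input s, inhibited by the other
        # neuron's output (weight w) and by its own adaptation (gain beta).
        du = [(-u[i] - beta * v[i] - w * y[1 - i] + s) / tau for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_adapt for i in range(2)]
        u = [u[i] + dt * du[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        outputs.append((y[0], y[1]))
    return outputs


def count_switches(outputs):
    """Count how often the more active neuron changes (hypothetical helper)."""
    switches = 0
    previous = outputs[0][0] > outputs[0][1]
    for y1, y2 in outputs[1:]:
        current = y1 > y2
        if current != previous:
            switches += 1
            previous = current
    return switches


outputs = simulate_matsuoka()
print("dominance switches in 10 s:", count_switches(outputs))
```

In a locomotion controller, the two outputs would typically drive antagonistic (flexor and extensor) commands of a joint, and sensory signals would enter as additional input terms; that closed-loop part is omitted here.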

In this review, we focus on the adaptive control of interlimb coordination in locomotion. We introduce adaptive interlimb coordination for animals and legged robots induced by various factors (locomotion speed, environmental situation, body properties, and task). In addition, we show characteristic properties of adaptive interlimb coordination in animals and robots, such as gait hysteresis and different time-scale adaptations. Finally, we discuss the underlying mechanisms and control strategies to achieve adaptive interlimb coordination and the design principle for the control system of legged robots.

# 2. ADAPTIVE INTERLIMB COORDINATION IN ANIMALS AND ROBOTS

#### 2.1. Speed-Dependent Adaptation

The most general adaptive interlimb coordination appears when varying the locomotion speed in legged animals. This has been observed even in spinal cats on treadmills (Forssberg and Grillner, 1973; Orlovsky et al., 1999), in which the phase relationship between the legs changes and the gait varies among walking, trotting, and galloping. In reported studies, the spinal cords of cats were transected from the brain, but they still received sensory feedback through the contact between their feet and the belt. The sensory signals changed with the belt speed change, which induced their gait transitions. This result highlights the important contribution of sensorimotor interaction to adaptive interlimb coordination. Quadruped robots have achieved adaptive interlimb coordination that depends on locomotion speed by modeling spinal CPGs with local sensory feedback (Maufroy et al., 2010; Aoi et al., 2011, 2013b; Owaki et al., 2013; Fukuoka et al., 2015; Owaki and Ishiguro, 2017). This can be seen in the following examples. **Figure 1** shows a quadruped robot, the control system, and the experimental results of the walk–trot transition in Aoi et al. (2013b) (this robot showed hysteresis in the gait transition, as discussed in Section 3.1). **Figure 2**, which is from the work by Fukuoka et al. (2015), presents quadruped gaits transitioned from a walk at slow speeds to a trot at medium speeds, and a transverse gallop at high speeds. **Figure 3**, which is from the work by Owaki and Ishiguro (2017), also shows spontaneous gait transitions from a lateral-sequence (L-S) walk to a trot and even to a gallop of a quadruped robot with respect to the locomotion speed without neural coupling. These robotics studies used simple neural oscillators or phase oscillators for the CPG model and produced leg motions from the oscillator phases. More specifically, one oscillator created one leg motion and the phase relationship between the oscillators determined the gait. 
Each oscillator phase was regulated through local sensory information of the leg, such as foot contact and leg loading, occurring only within one leg.
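The idea of one decoupled phase oscillator per leg, regulated only by that leg's own load signal, can be sketched as follows. The update rule `omega - sigma * load * cos(phi)` is an assumed form of continuous phase modulation by local load; the exact equations, gains, and mapping from phase to leg trajectory differ between the cited studies, and the names `phase_rate`, `step_oscillators`, `omega`, and `sigma` are illustrative.

```python
import math


def phase_rate(phi, load, omega=2.0 * math.pi, sigma=1.0):
    """Phase velocity of one leg oscillator.

    omega: intrinsic angular frequency (assumed value, rad/s).
    sigma: feedback gain (assumed value).
    load:  load sensed by this leg only; zero while the leg is unloaded.
    """
    return omega - sigma * load * math.cos(phi)


def step_oscillators(phases, loads, dt, omega=2.0 * math.pi, sigma=1.0):
    """Advance all leg oscillators by one Euler step.

    There is no coupling term: each phase depends only on its own load,
    so any interlimb coordination has to arise through the body.
    """
    return [
        (phi + dt * phase_rate(phi, load, omega, sigma)) % (2.0 * math.pi)
        for phi, load in zip(phases, loads)
    ]


# Four legs, two of them loaded (illustrative values).
phases = [0.0, 1.0, 2.0, 3.0]
loads = [5.0, 0.0, 5.0, 0.0]
print(step_oscillators(phases, loads, dt=0.01))
```

With this sign convention, load slows the oscillator where `cos(phi)` is positive and speeds it up where `cos(phi)` is negative. In a robot, the loads themselves depend on what all the other legs are doing, which is how the phase relationship can settle without any direct connection between the oscillators.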

As an important control architecture in these robotics studies, the phase relationship between the oscillators was not predefined and the oscillators were only weakly coupled or decoupled. That is, the gait was not determined by the oscillator dynamics using strong coupling (Schöner et al., 1990; Canavier et al., 1997; Ito et al., 1998; Golubitsky et al., 1999), but by the interaction between whole-body dynamics and oscillator dynamics through local sensory feedback. The interlimb coordination was generated only in a self-organizing manner among the neural dynamics, the body dynamics, and the environment.

Similar adaptive interlimb coordination in accordance with gait speed also appears in hexapod insects, such as stick insects (Wilson, 1966; Graham, 1972; Cruse, 1990; Grabowska et al., 2012), cockroaches (Hughes, 1952; Delcomyn, 1971; Pearson, 1976; Bender et al., 2011), and flies (Strauß and Heisenberg, 1990; Wosnitza et al., 2013; Berendes et al., 2016). In particular, stick insects and flies smoothly change their interlimb coordination in accordance with gait speed (Wilson, 1966; Graham, 1972; Wosnitza et al., 2013). More specifically, the relative phases between the legs change continuously and linearly with gait speed. This is similar to some mammals, including sheep, but is different from other mammals, including dogs, in which the relative leg phases change suddenly in a sigmoid fashion at gait transitions (Alexander and Jayes, 1983). Although it is suggested that cockroaches achieve interlimb coordination mainly by the CPG itself (Fuchs et al., 2011), the CPG by itself does not produce a coordinated motor pattern for stick insect walking, because sensory feedback is important (Bässler and Wegner, 1983; Büschges et al., 1995; Büschges et al., 2008). Cruse and his colleagues proposed an artificial neural network, named Walknet, which controls leg movements based on six different rules to regulate interlimb coordination by sensory information (note that the controller of the individual leg operates without a CPG). The rules were empirically derived from the behavioral experiments of stick insects [see reviews by Cruse et al. (1998), Dürr et al. (2004) and Schilling et al. (2013a)]. Three of the rules were designed by disturbing leg movements on a slippery surface. The rules changed the cycle duration of a leg based on sensory information of the neighboring legs.
As a result of sensorimotor interaction, the insect models controlled by Walknet produced a continuum of locomotion patterns, such as tripod, tetrapod, and wave gaits, and intermediate stepping patterns, as observed in stick insects. In addition, the models were used for various situations, such as walking on uneven surfaces (Kindermann, 2002), leg amputation (Schilling et al., 2007), negotiating curves (Schilling et al., 2013b), and climbing over large gaps (Bläsing, 2006), and the locomotor behavior was comparable to that of stick insects. Tóth and Daun-Gruhn (2016) developed neural network models based on Hodgkin-Huxley dynamics and integrated them with musculoskeletal models to explain the interlimb coordination mechanism of insects. Although their models did not produce intermediate stepping patterns as observed in stick insects (Wilson, 1966; Graham, 1972) and flies (Wosnitza et al., 2013), their results suggest that the connection between the levator-depressor neuromuscular systems of the different legs is necessary to replicate the primary features of tripod and tetrapod gaits. Ambe et al. (2013, 2015) used simple phase oscillators with local sensory feedback of foot contact information for a hexapod robot, in a manner similar to the quadruped robots mentioned above. They produced a continuum of locomotion patterns, such as metachronal and tripod gaits and intermediate stepping patterns, through embodied sensorimotor interaction, without predefining the patterns in accordance with the locomotion speed. In addition, positive velocity feedback during the stance phase has been shown to be an important aspect of stick insect walking (Bässler, 1976), and it has been tested on a robot (Schmitz et al., 2008).

Similarly, myriapods, such as centipedes, change their interlimb coordination depending on gait speed. Myriapods have a long and flexible body axis and produce body undulations when the gait speed increases (Manton, 1965). In addition to the amplitude increase of the undulations, the phase relationship between ipsilateral leg movements changes in synchronization with the body segment movements of the undulations. Aoi et al. (2007, 2013a) developed a multilegged robot with six body segments and twelve legs that uses torsional springs for body axis flexibility. As the locomotion speed increased, body undulations appeared through a supercritical Hopf bifurcation of straight walking, so the robot showed a speed dependence of body undulations similar to that of centipedes.

#### 2.2. Environment-Dependent Adaptation

The advantage of using legs in mobile motion for animals and machines is to gain high traversability even in complex environments by manipulating the foot contact positions. However, the traversability of legged robots is still far from reaching the level of animals. During locomotion, the leg motion consists of the stance phase, in which the foot is in contact with the ground, and the swing phase, in which the foot is lifted off the ground. In the stance phase, the leg supports the body against gravity and produces propulsive and decelerating forces to move the body through the interaction between the foot and the ground. The geometric properties of the ground vary: terrain may be flat, sloped, or irregular and rough. The physical properties of the ground also vary: they include hard and slippery ground like stone, soft ground like loose soil, and flowable and penetrable ground like sand. The interaction between the foot and the ground is crucial to create locomotion, and real-time adaptation of motor behavior is required according to the ground situation. Animals actually show adaptive interlimb coordination depending on the environmental situation. To control legged robots, it is crucial to clarify and apply the dynamical principles of animals.

Manoonpong and his colleagues developed a series of modular neural CPG-based locomotion control for legged robots (Manoonpong et al., 2008, 2013; Steingrube et al., 2010; Goldschmidt et al., 2014; Xiong et al., 2014, 2015; Dasgupta et al., 2015; Grinke et al., 2015). They showed that using this control approach leads to adaptive interlimb coordination that allows the robots to deal with complex environments, such as walking over difficult terrain (Steingrube et al., 2010; Manoonpong et al., 2013; Goldschmidt et al., 2014; Xiong et al., 2014, 2015; Dasgupta et al., 2015) and avoiding obstacles in an unknown cluttered area (Manoonpong et al., 2008; Grinke et al., 2015), as observed in insects. For example, they implemented modular neural control with an adaptive chaotic CPG-based network and sensory feedback on a hexapod robot (**Figures 4A,B**; Steingrube et al., 2010). Due to the intrinsically chaotic dynamics of the CPG similar to that observed in certain biological CPGs (Rabinovich and Abarbanel, 1998), the dynamics were exploited to generate various walking patterns depending on the environmental condition. In their setup, the robot showed a tetrapod gait for standard walking, a wave gait for up-slope walking, a mixture gait between wave and tetrapod gaits for down-slope walking, and a tripod gait for fast walking to perform fast phototaxis (**Figure 4C**). However, this implementation of discrete gaits does not necessarily correspond to the situation found in insects. In addition to these multiple gaits, the chaotic dynamics especially contributed to self-untrapping of a leg from a hole in the ground (**Figure 4B**) and thereby enhanced foothold searching behavior. In Dasgupta et al. (2015), Goldschmidt et al. (2014), and Manoonpong et al. (2013), integrating forward models into the modular neural control enabled the robot to effectively predict its walking state in order to extend or elevate its legs during the swing and stance phases while walking on complex terrains. With this setup, the robot walked on uneven terrain by using a tetrapod gait and climbed over high obstacles as well as up a flight of stairs by using a wave gait. Moreover, it successfully crossed a large gap by using a caterpillar gait, where each left and right pair of legs moved simultaneously. In this situation, however, stick insects show more complex behavior than caterpillar coordination, which is adopted only rarely, if at all (Blaesing and Cruse, 2004). In Xiong et al. (2014), modular neural control was extended by introducing muscle models based on virtual agonist-antagonist mechanisms (VAAM), and neuromechanical control was produced to achieve leg compliance. Combining neuromechanical control with sensorimotor learning results in energy-efficient walking using different gaits with corresponding leg compliances (Xiong et al., 2015). The robot efficiently walked on different surfaces including sponge, gravel, fine gravel, and grass. For adaptation to the avoidance of obstacles in a cluttered environment, an adaptive neural sensory processing network with synaptic plasticity was introduced to the modular neural control (Grinke et al., 2015). The adaptive processing network could drive different turning behaviors with short-term robot memory. As a consequence, the robot walked around and adapted its turning behavior to avoid obstacles in different situations and to avoid sharp corners or deadlocks (**Figure 4D**). In addition to the modular neural control approach, Schneider et al. (2012) developed bio-inspired control, which combines Walknet (mentioned above) with higher level control and planning (**Figures 5A,B**), for adaptive interlimb coordination of the hexapod robot HECTOR. By using this control technique, versatile behaviors (e.g., gap crossing, obstacle crossing, and global planning to avoid or attack obstacles) can be generated to deal with complex environments (**Figure 5C**). Furthermore, Schilling and Cruse (2017) expanded Walknet to invent new behaviors and test them by internal simulation before using them in reality. Arena et al. (2017) proposed multilayered CPG-based locomotion control with insect inspired motor-skill learning. It can adaptively coordinate the limbs of a Drosophila-like hexapod robot for stable walking and obstacle climbing.

FIGURE 2 | [...] is denoted as the color meter on the top right gradient bar. The convex curves indicate the flexor half-center outputs for left foreleg (LF, blue), left hindleg (LH, red), right foreleg (RF, green), and right hindleg (RH, purple), which lead to the swing phase. The thick lines indicate the stance phase and the thin dashed lines refer to the swing phase. These figures were modified from Fukuoka et al. (2015) with permission.

FIGURE 3 | [...] generated by non-wired simple phase oscillators (CPGs) with continuous phase modulation. The oscillator phases are modulated with respect to the magnitude of local load sensing N<sub>i</sub>. (B) Walking speed and gait diagrams of different locomotion modes (walk, trot, canter, and gallop). The pink area shows the change of the treadmill speed with respect to the value of ω. The colored areas in the gait diagrams mean the stance phase, during which the sensor value N<sub>i</sub> becomes greater than a threshold value. These figures were modified from Owaki and Ishiguro (2017) with permission.

When horses walk up an incline (Wickler et al., 2003) or when they carry weights (Farley and Taylor, 1991), the trot-to-gallop transition speed is reduced. Hexapod insects, such as stick insects, cockroaches, and beetles, change their gait depending on the slope of the ground (Spirito and Mushrush, 1979; Pelletier and Caissie, 2001; Grabowska et al., 2012). Furthermore, while cockroaches use the tripod gait during normal walking, the gait changes to metachronal when they are tethered on a supported ball to decrease loading (Spirito and Mushrush, 1979); uphill slope and loading induce similar effects on their gaits (Tang and Macmillan, 1986). Fujiki et al. (2013a) extended the control system of a quadruped robot (**Figure 1B**) for a hexapod walker and showed that the gait changed between tripod and metachronal gaits through the sensorimotor interaction depending on the loading and slope angle, as observed in insects.

Fukuoka et al. (2003), Fukuoka and Kimura (2009), and Kimura et al. (2007a,b) used the neural oscillators developed by Matsuoka (1985) to control quadruped robots (Tekken series). They incorporated models of various reflexes, such as the flexor reflex, extensor reflex, and vestibulospinal reflex, based on sensory information. In addition, they modeled the tonic labyrinthine response to adjust the rolling motion to synchronize with the pitching motion. The robots produced robust locomotion over irregular terrain, such as steps and slopes, while inducing the gait transition between walking and trotting.

When the ground is flowable like sand, the leg penetrates deeply into the ground during locomotion. Consequently, the interaction with the ground to produce lift, drag, and thrust forces becomes complicated [see review by Aguilar et al. (2016)]. Li et al. (2009, 2013) used a tripod gait for a hexapod robot and produced locomotor performance similar to that in hard ground by adjusting the leg shape and leg motion with a force model of the robot moving in granular media.

FIGURE 5 | Walknet with higher level control and planning for different locomotion behavior generation of the hexapod robot HECTOR for different environments. (A) Bio-inspired control Walknet with interlimb coordination rules [rules 1, 2, and 3; see Schilling et al. (2013a) and Schneider et al. (2012) for details]. (B) Setup of an individual leg controller with higher-level control and planning. Its outputs drive the leg joints of HECTOR. (C) Different desired locomotion behaviors that can be generated by the control approach to deal with complex environments. These figures were modified from Schneider et al. (2012) with permission.

Along with the adaptation to slopes, rough terrain, cluttered areas, and flowable areas, interlimb adaptation dealing with an asymmetric environmental condition has been investigated. For the asymmetric condition, split-belt treadmills have been used in studies of humans (Dietz et al., 1994; Reisman et al., 2005; Morton and Bastian, 2006), cats (Yanagihara and Kondo, 1996; Frigon et al., 2013), crayfish (Müller and Cruse, 1991a,b), and stick insects (Bässler and Wegner, 1983; Foth and Graham, 1983). The treadmills have two parallel belts with independently controlled speeds and thus are capable of artificially creating left–right symmetric and asymmetric environments for walking (tied configuration: same speed between the belts, split-belt configuration: different speeds between the belts). Although the details are discussed later in Section 3.2, adaptive interlimb coordination has been observed in accordance with the belt speed condition. Such an adaptation appeared even in spinal cats (Forssberg et al., 1980; Frigon et al., 2013). Otoda et al. (2009) developed a sensory-driven controller without a CPG model for a two-dimensional biped robot, and Fujiki et al. (2013b) used simple phase oscillators for the CPG model of a biped robot with local sensory feedback of the foot contact information, as was done with the above-mentioned quadruped and hexapod robots that achieved adaptive interlimb coordination (Aoi et al., 2011, 2013b; Ambe et al., 2013, 2015; Fujiki et al., 2013a). The biped robots achieved adaptive interlimb coordination on split-belt treadmills.

#### 2.3. Body-Dependent Adaptation

Animals show adaptive motor behavior also due to changes in their body properties. As mentioned above, they change walking patterns when carrying weights or reducing their loads (Tang and Macmillan, 1986; Farley and Taylor, 1991). For fast locomotion, such as the galloping gait of cursorial quadrupeds and the undulatory walk of centipedes, the appearance of trunk and body-segment movements suggests that body flexibility is crucial for adaptive locomotion (Alexander, 1988). In Aoi et al. (2011), a quadruped walker was controlled by simple phase oscillators with local sensory foot contact information (**Figure 1B**) and the change in trunk flexibility induced the walk–trot transition, where walking and trotting gaits appeared for a hard trunk and a soft trunk, respectively. In Aoi et al. (2007, 2013a, 2016), a centipede-like multilegged robot showed the gait transition from straight walking to body undulatory walking through a Hopf bifurcation by changing the body axis flexibility.

One of the advantages to using many legs for mobile motion, as in insects and myriapods, is the avoidance of losing mobility completely by leg damage due to injury and predation. Through adaptive control of interlimb coordination, even complete leg loss does not prevent walking (Grabowska et al., 2012). To clarify how interlimb coordination changes with leg loss, amputations of single legs of stick insects have been performed in order to investigate changes of the relative phases between the legs depending on which leg is amputated (Graham, 1977). Dasgupta et al. (2015) used neural CPG-based control with distributed adaptive forward models for the hexapod robot, as mentioned above, and demonstrated that the robot successfully kept walking straight with a slightly modified tetrapod gait through adaptation despite the damaged right middle leg. Ren et al. (2015) extended the chaotic CPG controller introduced above (Steingrube et al., 2010) to a controller of multiple chaotic CPGs depending on the number of legs (**Figure 6A**). They demonstrated that the six-legged robot (AMOSII) could continue walking by changing the interlimb coordination in accordance with the disabled leg(s) (**Figures 6B,C**). Walknet, which identifies the behavior of stick insects, as introduced above, was able to coordinate the movements of the remaining legs so that a six-legged walker could continue walking when some legs were amputated (Kindermann, 2002; Schilling et al., 2007). Besides these bio-inspired control approaches, Cully et al. (2015) proposed an alternative machine learning based approach consisting of two main parts: an automatically generated, precomputed, behavior-performance map, and a trial-and-error learning algorithm (**Figure 7**). The behavior-performance map contains a number of interlimb coordination parameters that can generate approximately 13,000 different gaits. 
The trial-and-error learning algorithm is used to search for successful robot locomotion behaviors from the map with respect to robot body condition. They showed that this approach allows a hexapod robot to walk and rapidly find a walking behavior that can compensate for damage. Although all these approaches predefined interlimb connections, another approach based on the concept of emergent locomotion (i.e., walking patterns appearing as a result of stabilization in a self-organizing manner, Schilling et al., 2013a) from tight interaction between neural systems, musculoskeletal systems, and the environment has been explored for body-dependent adaptation. For example, Barikhan et al. (2014) proposed multiple decoupled neural CPGs with local sensory feedback (**Figure 6D**). This approach exploited the interaction between neural and body dynamics through foot contact feedback to achieve self-organized locomotion and to allow a hexapod robot to quickly adapt its locomotion to deal with morphological changes [e.g., leg damage (**Figure 6E**) or asymmetric leg lengths between the front and hind legs]. Tsuchiya et al. (2002) used simple phase oscillators with local sensory foot contact information to control a ten-legged robot to establish adaptive interlimb coordination, as mentioned above for quadruped and hexapod robots. The leg loss induced the change in interlimb coordination, and the change reduced the degradation of locomotion performance, such as gait speed.

#### 2.4. Task-Dependent Adaptation

Animals often encounter a situation in which they have to change locomotor behavior. For example, when an obstacle appears in a walking path, they step over the obstacle, or turn to the right or the left to avoid collision with the obstacle (this is also related to environment adaptation). Such a task is mainly generated by modulating the leg movements, and thus adaptive control of intralimb coordination is important. However, adaptive control of interlimb coordination is also important. To step over an obstacle, the leading limb first clears the obstacle and then the trailing limb follows it. The foot of the leading limb must be raised higher than usual to avoid collision with the obstacle, and this motion delays foot contact. Especially for bipedal and quadrupedal animals, the foot of the trailing limb must be raised after foot contact of the leading limb; otherwise, the obstacle avoidance task will fail because the contralateral limb does not support the body at the onset of raising the trailing limb (Aoi et al., 2013c).

Turning behavior to change walking direction is used for various tasks, such as target pursuit (Szczecinski et al., in press) and obstacle avoidance (**Figures 4D**, **5C**). Knops et al. (2013) controlled a mechanical model of a stick insect's middle legs by using a neural network model based on Hodgkin-Huxley dynamics and produced turning behavior with two different strategies observed in stick insects walking on a slippery surface: switching the inner middle leg from forward to sideward, or from forward to backward stepping. In Aoi et al. (2016), the turning maneuverability of a centipede-like multilegged robot was enhanced via straight walk instability induced by the Hopf bifurcation by changing the body axis flexibility. Although arthropods with sprawling legs have a low center of mass and thus cannot effectively lean, mammals with erect legs have a high center of mass and can use body leaning to help turning. The relative phase between legs in human turning shifts from anti-phase due to the left–right asymmetry of the turning movement (Courtine and Schieppati, 2003). In Aoi and Tsuchiya (2007), simple phase oscillators with local sensory feedback about foot contact information were used for turn walking of a biped robot, as was done for walking on a split-belt treadmill. The relative phase between legs shifted depending on the turning radius to compensate for the left–right asymmetry induced by body leaning; this shift allowed the robot to achieve high turning performance.

The transition from quadrupedal gait to upright and bipedal gait is a challenging task for legged robots, because it requires drastic changes in locomotor movements (Asa et al., 2009; Aoi et al., 2012; Kobayashi et al., 2015). In particular, because the robot has to raise its trunk so that the arms leave the ground, an adequate relationship between the supporting limb locations and the center of mass location is important. That is, adequate interlimb and trunk coordination is crucial; otherwise, the robot easily falls over. In Aoi et al. (2012), simple phase oscillators with sensory regulation by ground contact information of the arms and legs were used for a biped robot (**Figures 8A,B**). The controller was extended based on the concept of kinematic synergy (Freitas et al., 2006; Ivanenko et al., 2007; Latash, 2008; Funato et al., 2010) to change the robot movements for gait transition and allowed the robot to successfully change the gait from quadrupedal to upright and bipedal (**Figure 8C**).

Legged robots are useful for search and rescue missions. In this case, the ground is not only irregular but also fragile, like an area with scattered debris and collapsed buildings, on which surfaces may collapse when put under external forces, such as the pressure from a robot's leg. It is important to check the ground condition in such situations by using haptic information of the legs to secure stable walking. In Ambe and Matsuno (2016), a control mechanism with haptic sensory feedback for terrain determination was proposed. With the control mechanism, a quadruped robot can sense whether the foothold is stable through its force sensor when it puts its leg on the ground. In addition, this mechanism produces adequate interlimb coordination so that the robot never stumbles, even if the foothold collapses in the probe motion. As a result, the robot can effectively walk on unstable terrain and avoid stumbling and causing a large collapse of the surrounding area (**Figure 9**). Other methods have also been proposed to estimate fragile and slippery footholds based on haptic feedbacks and image information (Tokuda et al., 2003; Hoepflinger et al., 2010, 2013).

# 3. CHARACTERISTIC PROPERTIES OF ADAPTIVE INTERLIMB COORDINATION

#### 3.1. Hysteresis in Gait Transition

As discussed in Section 2.1, animals change their walking patterns depending on their locomotion speed. In general, locomotion speed has a large sudden change at gait transition in overground walking. However, using treadmills, which can control gait speed, we can investigate the speed-dependent gait transition mechanism by smoothly and continuously changing the belt speed of the treadmills. It has been reported in humans and some quadruped animals that the gait changes at different speeds depending on whether the speed is increasing or decreasing, and that a speed range exists in which different gaits are used. In other words, gait transitions may exhibit hysteresis (Diedrich et al., 1998; Heglund and Taylor, 1998; Raynor et al., 2002; Griffin et al., 2004). **Figure 10A** shows the relative phase between the right front and hind legs of a dog walking on a treadmill for walk-to-trot and trot-to-walk transitions induced by changing the belt speed (Aoi et al., 2013b). This figure shows hysteresis in the walk–trot transition. Such a phenomenon is difficult to explain by triggering the gait transition based on metabolic and biomechanical factors. The dynamical system approach might provide useful insights into such a gait transition mechanism (Diedrich et al., 1998).

Quadruped robots controlled by simple phase oscillators with local sensory foot contact information, as introduced in Section 2.1, showed hysteresis in the walk–trot transition induced by changing the locomotion speed (**Figure 10B**; Aoi et al., 2011, 2013b). Because walking and trotting gaits are mainly distinguished by the relative phases of the ipsilateral legs, a stability analysis using the return maps of the relative phases clarified the stability structure of the gaits. **Figure 10C** shows the return maps obtained at three different speeds. While only one stable relative phase exists in the left and right figures, two stable and one unstable relative phases exist in the middle figure. The stable and unstable relative phases explain that hysteresis is generated through two saddle-node bifurcations induced by changing the locomotion speed (**Figure 10D**). From this result, a potential function is derived, as shown in **Figure 10E**. It suggests that gait transition is explained by switching the stability of self-organized patterns in the complex dynamical system.
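The way two saddle-node bifurcations produce hysteresis can be illustrated with a generic one-dimensional system. The following is a minimal sketch, not the robot model analyzed in Aoi et al. (2013b): it assumes the textbook normal form dx/dt = μ + x − x³, where the hypothetical state `x` stands in for the relative phase and the hypothetical parameter `mu` stands in for locomotion speed. Sweeping `mu` slowly upward and then downward leaves the state on different stable branches inside the bistable range.

```python
import numpy as np

def settle(x, mu, dt=0.01, steps=5000):
    """Integrate dx/dt = mu + x - x**3 with forward Euler until it settles."""
    for _ in range(steps):
        x += dt * (mu + x - x**3)
    return x

def sweep(mus, x0):
    """Quasi-static parameter sweep: each run starts from the previous settled state."""
    xs = []
    x = x0
    for mu in mus:
        x = settle(x, mu)
        xs.append(x)
    return np.array(xs)

mus = np.linspace(-1.0, 1.0, 41)
up = sweep(mus, x0=-1.5)                 # parameter increasing
down = sweep(mus[::-1], x0=1.5)[::-1]    # parameter decreasing, re-ordered to match mus

# The saddle-node bifurcations of this normal form lie at mu = +/- 2 / (3 * sqrt(3)).
bistable = np.abs(mus) < 0.3
print(np.all(up[bistable] < 0), np.all(down[bistable] > 0))
```

Inside the bistable range the upward sweep stays on the lower branch and the downward sweep stays on the upper branch, so the switching point depends on the direction of the parameter change, which is the signature of hysteresis.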

Gait transition hysteresis also appears in other legged robots controlled by CPG models with sensory feedback, e.g., in the walk–run transition of a biped model (Taga and Shimizu, 1991) and the metachronal–tripod gait transition of hexapod robots (Kimura et al., 1993; Fujiki et al., 2013a; note that insects do not clearly show abrupt transitions, but a continuum of locomotion patterns).

#### 3.2. Two Different Time-Scale Adaptations

When the environment suddenly changes, locomotor behavior is rapidly modulated to adapt to the environmental variation and successively shows gradual regulation for gaining a new locomotor pattern. This behavior suggests that motor learning occurs. This has been observed in interlimb coordination during locomotion. In particular, the split-belt treadmill walking mentioned above is a good example.

The regulation of motor behavior in split-belt treadmill walking appears in various locomotor factors. However, the factors related to interlimb coordination, such as the relative phase between the legs, step length, and center of pressure profile, and those related to intralimb coordination, such as the duty factor and stride length, show different trends (**Figure 11**). A sudden environmental variation rapidly changes the factors; this is called "early adaptation". Although the intralimb coordination factors do not show further change, the interlimb coordination factors tend to gradually return to their original state after early adaptation; this is called "late adaptation". This means that interlimb coordination has two types of adaptations with different time scales. Furthermore, when the environment is returned to its original state, the interlimb coordination factors move in the opposite direction to the early adaptation, which shows the after-effects.

Rapid changes in the locomotor factors have been observed during split-belt treadmill walking of spinal cats (Forssberg et al., 1980; Frigon et al., 2013). These rapid changes suggest that early adaptation is induced by sensorimotor interaction in the spinal cord. On the other hand, humans with cerebellar damage do not show late adaptation or after-effects during split-belt treadmill walking, and it appears that the cerebellum contributes to late adaptation and the after-effects (Morton and Bastian, 2006; although split-belt experiments have been performed for arthropods (Bässler and Wegner, 1983; Foth and Graham, 1983; Müller and Cruse, 1991a,b), the results showed that they do not necessarily need learning, which may underestimate their adaptation ability). Otoda et al. (2009) modeled the stepping reflex to modulate the touchdown angle of the swing leg and introduced the adjustment of proportional control gain at the hip joint of the stance leg as the cerebellar function producing split-belt treadmill walking of a two-dimensional biped robot, although they did not use a CPG model with adaptation. In contrast, Fujiki et al. (2015) incorporated a cerebellar learning model into the spinal CPG model (**Figure 12B**). The CPG model was composed of simple phase oscillators with sensory reflex by local foot contact information and was used in Fujiki et al. (2013b) as mentioned above. The learning model modulated the foot contact timing of each leg through the evaluation of prediction error by using the local sensory foot contact information of each leg. Biped robot experiments on a split-belt treadmill (**Figure 12A**) showed adaptive intralimb and interlimb coordination (**Figures 12C,D**). In particular, despite the lack of direct interlimb coordination control, early and late adaptations and after-effects were observed in interlimb coordination, and showed strong similarities to those observed in humans.

Rapid modulation by the sensory reflex model and gradual modulation by the learning model changed the pitching moment depending on the belt speed condition through the body dynamics of the robot (**Figure 13**). The pitching moment change induced spatiotemporal modification of the robot movements and altered various locomotor factors. The sensory reflex model secured the ability to continue walking against the environmental change, and the cerebellar learning model modulated the robot movements under those conditions to make walking smoother and more efficient through optimization (minimization of prediction error of foot contact timing). For simple human behaviors, such as arm reaching movements, learning models that aim to minimize jerk or torque-change have been proposed (Flash and Hogan, 1985; Uno et al., 1989). However, for human locomotion, it remains unclear what factors are predicted and how to facilitate the learning. This is partly because locomotion is a whole-body movement through limb movement and posture controls, and is governed by complicated dynamics including foot contact and lift off, which change the physical constraints. Robot experiments with neurophysiologically inspired control models are useful for examining potential control models through the comparison of results obtained from human measured data and clarification of dynamical mechanisms.
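The qualitative pattern of early adaptation, late adaptation, and after-effects can be reproduced by a very small error-driven learner. The sketch below is a toy illustration under simplifying assumptions, not the cerebellar learning model of Fujiki et al. (2015): the perturbation `p`, the learned compensation `w`, the learning rate `eta`, and the trial counts are all hypothetical. The fast "reflex" response follows the perturbation immediately, while the slow update gradually cancels the remaining error in the interlimb factor only.

```python
import numpy as np

def simulate(n_base=50, n_split=300, n_post=300, eta=0.02):
    """Toy error-driven learner: fast reflex response plus slow compensation."""
    # Perturbation: 0 = tied belts, 1 = split belts (hypothetical units).
    p = np.concatenate([np.zeros(n_base), np.ones(n_split), np.zeros(n_post)])
    inter = np.zeros_like(p)  # interlimb factor, e.g. deviation of relative phase
    intra = np.zeros_like(p)  # intralimb factor, e.g. deviation of duty factor
    w = 0.0                   # slowly learned compensation
    for t, pt in enumerate(p):
        intra[t] = pt         # reflex only: changes immediately and then stays
        inter[t] = pt - w     # reflex response minus learned compensation
        w += eta * inter[t]   # slow update that reduces the remaining error
    return p, inter, intra

p, inter, intra = simulate()
early = inter[50]    # first split-belt step: large sudden deviation
late = inter[349]    # last split-belt step: deviation has decayed
after = inter[350]   # first tied-belt step: deviation in the opposite direction
print(early, late, after)
```

In this toy setting the intralimb factor shows only the immediate change, whereas the interlimb factor decays back toward its baseline and overshoots in the opposite direction once the perturbation is removed.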

FIGURE 10 | [...] (E) Possible potential function that shows hysteresis. These figures were modified from Aoi et al. (2013b).

# 4. KEY FACTORS AND MECHANISMS FOR ADAPTIVE INTERLIMB COORDINATION

In the previous sections, we presented adaptive interlimb coordination of animals and legged robots to deal with different locomotion speeds, environmental situations, body properties, and tasks. Here, we discuss key factors and mechanisms underlying the adaptive control of interlimb coordination.

One of the key mechanisms is the CPGs, which are located in the spinal cord of vertebrates and in the thoracic ganglia of invertebrates. Except for the anti-phase activity of antagonistic excitatory motoneurones, no feature of the pilocarpine-induced rhythm appears to correspond to any motor output observed in stick insects (Büschges et al., 1995). However, neurophysiological studies have revealed that CPGs are important for locomotion (Grillner, 1975; Orlovsky et al., 1999; MacKay-Lyons, 2002). A CPG is a group of interconnected neurons that can be activated to generate a motor pattern without the requirement of sensory feedback. As described in Ijspeert (2008), various CPG models with different levels of complexity have been proposed, from detailed biophysical models using Hodgkin-Huxley neurons (Traven et al., 1993; Cataldo et al., 2006; Holmes et al., 2006; Bungay and Campbell, 2009) and connectionist models using leaky-integrator neurons or integrate-and-fire neurons (Buchanan, 1992; Arena, 2000) to abstract models using coupled oscillators (Ijspeert et al., 2007; Chung and Slotine, 2010; Yu et al., 2014). Although some robot studies have shown that complex insect behavior, such as continuous gait transition, walking over irregular ground including a large gap, and curve walking with an irregular step pattern, can be replicated without CPG models (Cruse et al., 1998; Lewinger and Quinn, 2011; Schilling et al., 2013a,b), these CPG models have improved locomotion control of legged robots, such as the control of speed (Ijspeert, 2008) and robustness against sensory noise as well as sensory failure (Di Canio et al., 2016). In particular, key issues for controlling legged robots are the design of feedforward and feedback controllers and the integration of these controllers. The CPG models give us useful ideas for the design and integration so that the integrated controller works in a biologically plausible fashion (a comparison between controllers with and without CPG models would be useful to identify the contribution of the CPG models).

FIGURE 11 | [...] coordination factors. This shows only early adaptation when the environment changes. These figures were modified from Fujiki et al. (2015).

Most research has employed abstract CPG models with hardwired connections to motor units for generating different basic locomotor behaviors, such as walking and swimming. Switching between different gaits or locomotion modes can be done by using simple external input signals (Kirchner et al., 2002; Ijspeert et al., 2007; Manoonpong et al., 2008). Though CPGs acting as open-loop control are the key for production of basic rhythmic locomotion, sensory feedback is a very important factor needed for adaptations to different speeds, environments, bodies, and tasks, as described in previous sections for adaptive interlimb coordination. Combining CPGs with sensory feedback results in closed-loop control with adaptability. For robotic implementation, different sensory feedback affecting CPG activities includes proprioceptive feedback (e.g., joint/leg movement and force) and exteroceptive feedback (e.g., foot contact and vision). Such feedback can modulate the frequency, phase, and magnitude of CPG activities [see review by Buschmann et al. (2015)].

Frequency modulation (also known as entrainment, Buchli et al., 2006) uses feedback information to adapt the frequency of the CPG so that the frequencies of the feedback and the CPG are synchronized (Nachstedt et al., 2017). Usually, joint angle feedback is used for this process in robotics studies (Endo et al., 2004; Buchli and Ijspeert, 2008; Di Canio et al., 2016) and frequency modulation has been mainly employed for adaptations of locomotion speed (Harischandra et al., 2011; Di Canio et al., 2016) and body change (Ren et al., 2015). In contrast, phase modulation typically uses foot contact and foot loading feedback to adjust the phase of CPGs to regulate the swing and stance phase durations, depending on the situation. In particular, the phase resetting mechanism, which has often been used for phase modulation in legged robots, was developed from the phase shift and rhythm resetting behaviors by the tactile sensor feedback in cats (Conway et al., 1987; Duysens, 1997; Schomburg et al., 1998; Rybak et al., 2006; Frigon et al., 2010) and stick insects (Büschges, 1995; Bässler and Büschges, 1998). The functional role of phase resetting has been investigated by the integration with musculoskeletal models and muscle synergy hypothesis (Aoi et al., 2010, 2013c; Aoi and Funato, 2016), and the control strategy was implemented in legged robots and helped to improve the robustness of their walking (Tsuchiya et al., 2002; Aoi and Tsuchiya, 2005, 2007; Nomura et al., 2009; Aoi et al., 2011, 2012, 2013b; Ambe et al., 2013, 2015; Fujiki et al., 2013a,b, 2015).
Phase modulation has also been widely used for different adaptations including locomotion speed (Tsuchiya et al., 2002; Aoi et al., 2011, 2013b; Ambe et al., 2013, 2015; Fujiki et al., 2013a; Owaki et al., 2013; Fukuoka et al., 2015; Owaki and Ishiguro, 2017), environmental condition (Aoi and Tsuchiya, 2005; Aoi et al., 2010; Fujiki et al., 2013a,b, 2015), body properties (Tsuchiya et al., 2002; Aoi et al., 2011; Fujiki et al., 2013a; Owaki et al., 2013; Barikhan et al., 2014), and task (Aoi and Tsuchiya, 2007; Aoi et al., 2012, 2013c). Magnitude modulation uses different types of feedback, such as force and vision, to regulate the magnitude of the CPG. This regulation is indirectly achieved through premotor neuron networks (Buschmann et al., 2015). Goldschmidt et al. (2014) and Grinke et al. (2015) employed this strategy by using visual feedback for environment-dependent adaptation, such as hexapod robots climbing over an obstacle or turning away from it.
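As a rough illustration of phase resetting, the following sketch advances a single phase oscillator and resets its phase when a foot-contact event arrives. It is a simplified example under stated assumptions, not the controller of any study cited above: the function `step_phase`, the nominal touchdown phase `phi_td`, the cycle duration, and the timing of the early touchdown are all hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def step_phase(phi, omega, dt, touchdown, phi_td=math.pi):
    """Advance one oscillator; on touchdown, reset the phase to its nominal contact value."""
    if touchdown:
        return phi_td  # phase resetting: jump to the phase expected at foot contact
    return (phi + omega * dt) % TWO_PI

# Hypothetical scenario: 0.5 s gait cycle, 10 ms control step,
# and the foot lands at step 20 instead of the nominal step 25.
omega, dt = TWO_PI / 0.5, 0.01
phi = 0.0        # oscillator with phase resetting
phi_free = 0.0   # same oscillator without sensory feedback, for comparison
for k in range(1, 31):
    phi = step_phase(phi, omega, dt, touchdown=(k == 20))
    phi_free = step_phase(phi_free, omega, dt, touchdown=False)

print(phi / math.pi, phi_free / math.pi)
```

The reset shifts the oscillator ahead of the feedforward-only oscillator, so the commanded stance phase starts when the foot actually touches the ground rather than when the open-loop rhythm expects it.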

One can also achieve adaptive interlimb coordination by integrating these CPG modulation techniques with other bio-inspired approaches, such as adaptive muscle stiffness control (Xiong et al., 2015). Manoonpong et al. (2013) showed that bio-inspired forward models that translate motor commands or efference copies into expected sensory feedback are important components for environment-dependent adaptation, i.e., walking on different terrains. By using a split-belt treadmill, Fujiki et al. (2015) showed that cerebellar learning models to regulate motor commands while minimizing the prediction error are also important for environment-dependent adaptation. **Table 1** roughly categorizes the key mechanisms that have been used for different adaptations.

TABLE 1 | Key mechanisms used for different adaptations.

CPG, Central pattern generator; CF, CPG frequency modulation; CP, CPG phase modulation; CM, CPG magnitude modulation; PF, Pure feedback; FM, Forward model; LM, Learning model; MS, Muscle stiffness adaptation; ML, Other machine learning approaches.

In addition to these bio-inspired key factors (CPGs, sensory feedback, forward model, learning model, and muscle stiffness), which are usually applied to independent control of individual legs or joints, most of the studies explicitly design complete interlimb connections to obtain the desired locomotor behaviors. This results in limitations of adaptive and flexible interlimb coordination (e.g., Kirchner et al., 2002; Ijspeert et al., 2007; Harischandra et al., 2011; Manoonpong et al., 2013; Ren et al., 2015). To overcome these limitations, a proposed alternative paradigm achieves interlimb coordination by local sensing, body-environment interactions, and weakly-coupled or decoupled CPGs (Tsuchiya et al., 2002; Aoi et al., 2011, 2013b; Shim and Husbands, 2012; Ambe et al., 2013, 2015; Fujiki et al., 2013a; Owaki et al., 2013; Barikhan et al., 2014; Owaki and Ishiguro, 2017), rather than by predefined interlimb connections. Although the proposed paradigm leads to high flexibility and adaptability in interlimb coordination, it sometimes encounters unstable locomotion. Phase resetting, which modulates the CPG phase based on the sensory reflex, as mentioned above, is one of the solutions to obtain flexible and adaptive interlimb coordination while keeping stability in locomotion (Tsuchiya et al., 2002; Aoi and Tsuchiya, 2007; Aoi et al., 2011, 2012, 2013b,c; Ambe et al., 2013, 2015; Fujiki et al., 2013a,b, 2015). However, this uses only phase modulation and has limitations. Thus, one direction for future research is to find a method that can autonomously form plastic connections for stable but still flexible and adaptive locomotion. Furthermore, the interactions of CPGs, sensory feedback, body dynamics, forward model, learning model, and muscle stiffness for highly adaptive, robust, and energy-efficient locomotion remain to be explored.

# 5. CONCLUSION

Although walking animals create adaptive locomotor behavior by skillfully manipulating their complicated and redundant musculoskeletal systems, the underlying mechanisms are still unclear. Designing the control architecture for legged robots to autonomously achieve such adaptability is still a challenge. Although some legged robots produced adaptive locomotor behaviors by purely engineering approaches without inspiration from biological systems, neurophysiological findings such as CPG organizations and sensorimotor interactions are useful for designing the control system of legged robots. Robot experiments with CPG models and sensory feedbacks are insightful from a dynamic viewpoint for understanding gait generation and adaptation in a self-organizing manner among neural dynamics, body dynamics, and environment. In this review, we showed adaptive interlimb coordination in the locomotion of animals and legged robots induced by various factors, such as locomotion speed, environmental situation, body properties, and tasks. We also showed characteristic properties of adaptive interlimb coordination, such as gait hysteresis and different time-scale adaptations. Legged robots are becoming a valuable tool for understanding the locomotion mechanism including interlimb coordination. In the future, together with the improvement of robotics systems, such as actuators and sensors, it will be important to enhance biological plausibility and feasibility by the integration with sophisticated models of neural and musculoskeletal systems, such as the Hodgkin-Huxley model and the muscle-tendon unit model, and to extract dynamical features by integration with simple models, such as the template model (Full and Koditschek, 1999; Holmes et al., 2006). 
Furthermore, it will also be important to further improve and develop analytical methods, such as phase reduction theory (Kuramoto, 1984) and synergy analysis (Ivanenko et al., 2004; Latash, 2008), to clarify essential factors from multiple and redundant data.


#### AUTHOR CONTRIBUTIONS

SA and PM contributed to the conception and design of the paper. SA, PM, and YA reviewed the relevant literature and wrote the paper. FM and FW revised the paper critically for important intellectual content. All authors approved the paper for publication.

#### ACKNOWLEDGMENTS

This paper is supported in part by Grant-in-Aid for Young Scientists (A) 17H04914 from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan, by the Inamori Foundation, Japan, by the Kyoto Technoscience Center, Japan, and by the Centre for BioRobotics (CBR) at the University of Southern Denmark (SDU, Denmark).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Aoi, Manoonpong, Ambe, Matsuno and Wörgötter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Minimal Model Describing Hexapedal Interlimb Coordination: The Tegotae-Based Approach

Dai Owaki<sup>1</sup>\*, Masashi Goda<sup>1</sup>, Sakiko Miyazawa<sup>1</sup> and Akio Ishiguro<sup>1,2</sup>

*<sup>1</sup> Research Institute of Electrical Communication, Tohoku University, Sendai, Japan, <sup>2</sup> Japan Science and Technology Agency, CREST, Saitama, Japan*

Insects exhibit adaptive and versatile locomotion despite their minimal neural computing. Such locomotor patterns are generated via coordination between leg movements, i.e., an interlimb coordination, which is largely controlled in a distributed manner by neural circuits located in thoracic ganglia. However, the mechanism responsible for the interlimb coordination still remains elusive. Understanding this mechanism will help us to elucidate the fundamental control principle of animals' agile locomotion and to realize robots with legs that are truly adaptive and could not be developed solely by conventional control theories. This study aims at providing a "minimal" model of the interlimb coordination mechanism underlying hexapedal locomotion, in the hope that a single control principle could satisfactorily reproduce various aspects of insect locomotion. To this end, we introduce a novel concept we named "Tegotae," a Japanese concept describing the extent to which a perceived reaction matches an expectation. By using the Tegotae-based approach, we show that a surprisingly systematic design of local sensory feedback mechanisms essential for the interlimb coordination can be realized. We also use a hexapod robot we developed to show that our mathematical model of the interlimb coordination mechanism satisfactorily reproduces various insects' gait patterns.

#### Edited by:

*Poramate Manoonpong, University of Southern Denmark Odense, Denmark*

#### Reviewed by:

*Jean-Baptiste Mouret, Institut National de Recherche en Informatique et en Automatique, France Sakyasingha Dasgupta, IBM Research–Tokyo, Japan*

#### \*Correspondence:

*Dai Owaki owaki@riec.tohoku.ac.jp*

Received: *11 November 2016* Accepted: *22 May 2017* Published: *09 June 2017*

#### Citation:

*Owaki D, Goda M, Miyazawa S and Ishiguro A (2017) A Minimal Model Describing Hexapedal Interlimb Coordination: The Tegotae-Based Approach. Front. Neurorobot. 11:29. doi: 10.3389/fnbot.2017.00029*

Keywords: hexapedal locomotion, interlimb coordination, local sensory feedback, central pattern generator (CPG), Tegotae

# 1. INTRODUCTION

Insects exhibit tremendously versatile gait patterns depending on their locomotion speed and physical/environmental conditions (Hughes, 1957; Graham, 1972, 1977; Cruse, 1976; Foth and Graham, 1983a,b; Dean, 1991; Zollikofer, 1994a,b,c; Noah et al., 2004; Goldman et al., 2006; Sponberg and Full, 2008; Grabowska et al., 2012; Wosnitza et al., 2013). These locomotor patterns are generated via their interlimb coordination mechanism. Biological findings suggest that interlimb coordination in hexapedal locomotion is controlled largely in a decentralized manner by neural networks located in thoracic ganglia (Pearson and Iles, 1969, 1973; Bässler and Wegner, 1983; Dean, 1989; Brekowitz and Laurent, 1996). Thus, clarifying this interlimb coordination mechanism is expected to provide the key to understanding the control principle underlying animals' agile locomotion and to realizing truly adaptive legged robots that could not be realized solely by conventional control methods.

Aiming to elucidate the mechanism responsible for the interlimb coordination in hexapedal locomotion, various studies have been conducted to date by focusing on specific insects, e.g., stick insects (Graham, 1972, 1977; Cruse, 1976; Foth and Graham, 1983a,b; Dean, 1991; Grabowska et al., 2012) and cockroaches (Hughes, 1957; Pearson and Iles, 1969; Noah et al., 2004; Goldman et al., 2006; Sponberg and Full, 2008) and/or by focusing on control paradigms, e.g., central pattern generators (CPGs) (Pearson and Iles, 1973; Bässler and Wegner, 1983; Bässler, 1986, 1993; Ryckebusch and Laurent, 1993; Büschges et al., 1995, 2004; Bässler and Büschges, 1998; Büschges, 2005; Borgmann et al., 2009; Daun-Gruhn and Büschges, 2011; Marder and Bucher, 2011) and chains of reflexes (Cruse, 1983, 1990; Cruse et al., 1998; Dürr et al., 2004; Schilling et al., 2013). The knowledge obtained from these past studies deepened biological understanding of the interlimb coordination mechanism greatly; however, the diversity of these approaches may have confused roboticists who want to build adaptive insect-like hexapod robots via bio-inspired approaches (Kimura et al., 1993; Beer et al., 1997; Altendorfer et al., 2001; Ritzmann et al., 2004; Ambe et al., 2013; Manoonpong et al., 2013).

In order to address this problem, in this study, we attempt to capture the control principle essential to understanding the interlimb coordination in a concise form that could help bridge the gap between biologists and roboticists, in the hope that a single control principle could adequately reproduce various aspects of insect locomotion. Since reduction is required for understanding the essence, we build a "minimal model" of the interlimb coordination mechanism on the basis of a mathematically tractable highly abstract model. To this end, we employ a unique approach in this study. We introduce a novel concept we named "Tegotae," a Japanese concept describing the extent to which a perceived reaction matches an expectation. We then introduce a Tegotae function, which is a function that quantitatively measures Tegotae, whereby we can design a decentralized interlimb coordination mechanism in a systematic manner. We validated the Tegotae-based interlimb coordination model by using a physical hexapod robot that we developed. We confirmed that the model adequately reproduced various aspects of insect locomotion patterns. We expect that our minimal model, systematically derived from the concept of Tegotae, will provide substantial insight into the essence of the interlimb coordination mechanism to roboticists as well as biologists.

The following section presents the materials and methods used in this study. First, we describe a basic building block for the interlimb coordination mechanism. Second, we explain the Tegotae concept and the design scheme of local sensory feedback using the Tegotae-based approach. Third, we explain the developed robotic platform in detail. Section 3 presents the experimental results to validate our Tegotae-based control for the interlimb coordination mechanism. Finally, in Section 4, we discuss our results and future work.

# 2. MATERIALS AND METHODS

#### 2.1. Basic Building Block of Interlimb Coordination Mechanism Employed

To capture the control principle essential for the interlimb coordination mechanism, which works largely in a decentralized manner in insects' thoracic ganglia, it is important to determine a basic building block to be used for the distributed control system. From a control perspective, past studies have intensively argued mainly from the viewpoint of two distinct control paradigms: chains of reflexes (Cruse, 1983, 1990; Cruse et al., 1998; Dürr et al., 2004; Schilling et al., 2013) and CPGs (Pearson and Iles, 1973; Bässler and Wegner, 1983; Bässler, 1986, 1993; Ryckebusch and Laurent, 1993; Büschges et al., 1995, 2004; Bässler and Büschges, 1998; Büschges, 2005; Borgmann et al., 2009; Daun-Gruhn and Büschges, 2011; Marder and Bucher, 2011). In the chain-of-reflex approach, a control system is modeled by using many chained discontinuous reflexive events, in which locomotion can be generated purely from the interaction between sensory feedback signals and the body. However, the discontinuity in this approach may impede mathematical tractability (Daun-Gruhn and Büschges, 2011). In contrast, in the CPG approach, a control system is modeled by using directly coupled oscillators to generate feedforward motor commands, based on a continuous dynamical system, i.e., a set of differential equations, for the interlimb coordination. Considering the mathematical tractability stemming from a continuous model, we employ the CPG approach as a control paradigm. The CPG approach offers various ways to model a basic building block at different levels of abstraction (Ijspeert, 2008), ranging from detailed models using a single cell (Hodgkin and Huxley, 1952; Hellgren et al., 1992) to abstract oscillator models (Fitz-Hugh, 1969; Van der Pol, 1972; Kuramoto, 1984). Here we use a phase oscillator (Kuramoto, 1984) for each leg to build a minimal model of the interlimb coordination mechanism on the basis of a highly abstract model.

The time evolution of the oscillator phase is described by a differential equation as follows:

$$
\dot{\phi}_i = \omega + f_i, \tag{1}
$$

where ω is the intrinsic angular velocity; φ<sub>i</sub> is the phase of the oscillator implemented in the ith leg; and f<sub>i</sub> is a local sensory feedback term, which plays an essential role in the interlimb coordination. This equation is an abstract oscillator model, namely the Kuramoto model (Kuramoto, 1984) in the case without coupling between oscillators and with local sensory feedback f<sub>i</sub>, which is a one-dimensional reduced model of oscillatory behavior. Using trigonometric functions of the oscillator phases (sin φ<sub>i</sub>, cos φ<sub>i</sub>, etc.) enables us to generate periodic motor commands to control the legs of a robot. As an example of implementation, we describe the target angles θ̃<sub>yaw,i</sub> and θ̃<sub>roll,i</sub> for the proportional-derivative (PD) control of the motors (explained in detail in Section 2.4 and **Figure 6**) through the following equations:

$$
\tilde{\theta}_{\mathrm{yaw},i} = -A \cos \phi_i, \tag{2}
$$

$$
\tilde{\theta}_{\mathrm{roll},i} = \begin{cases} B \sin \phi_i, & \text{when } 0 \le \phi_i < \pi, \\ B' \sin \phi_i, & \text{when } \pi \le \phi_i < 2\pi, \end{cases} \tag{3}
$$

where A, B, and B′ are user-defined parameters describing the amplitudes in the yaw and roll directions of leg motion (see Section 2.4 and **Table 1**). Thus, the ith leg is actively controlled according to φ<sub>i</sub> such that it is in the swing phase when 0 ≤ φ<sub>i</sub> < π, i.e., sin φ<sub>i</sub> > 0, and in the stance phase when π ≤ φ<sub>i</sub> < 2π, i.e., sin φ<sub>i</sub> < 0, as shown in **Figure 1**. Below, we explain how we systematically design the local sensory feedback f<sub>i</sub> by introducing the concept of "Tegotae".
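Equations (1)–(3) can be sketched in a few lines of code. This is a minimal illustration rather than the robot's actual control software: the function names, the forward-Euler integration, the time step, and the numerical values of `A`, `B`, and `B_prime` are placeholders chosen for the example and are not the parameter values used in the experiments.

```python
import math

def target_angles(phi, A, B, B_prime):
    """Target yaw and roll angles from the oscillator phase, following Eqs. (2)-(3)."""
    phi = phi % (2.0 * math.pi)
    yaw = -A * math.cos(phi)
    # Swing phase (0 <= phi < pi) uses B; stance phase (pi <= phi < 2*pi) uses B'.
    roll = (B if phi < math.pi else B_prime) * math.sin(phi)
    return yaw, roll

def advance_phase(phi, omega, f, dt):
    """One forward-Euler step of Eq. (1): dphi/dt = omega + f."""
    return (phi + (omega + f) * dt) % (2.0 * math.pi)

# Placeholder amplitudes and frequency (hypothetical values).
A, B, B_prime = 0.3, 0.2, 0.05
phi = 0.0
for _ in range(100):
    # With f = 0 the oscillator runs open loop at its intrinsic angular velocity.
    phi = advance_phase(phi, omega=2.0 * math.pi, f=0.0, dt=0.0025)
yaw, roll = target_angles(phi, A, B, B_prime)
print(phi, yaw, roll)
```

After a quarter of a cycle the phase reaches π/2, which is the middle of the swing phase: the roll target is at its swing amplitude B and the yaw target passes through zero.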

#### 2.2. Tegotae and Tegotae Function

Here we explain the core concept Tegotae in detail. Tegotae is a novel concept describing the extent to which a perceived reaction matches an expectation (intention) of a controller. For ease of understanding, let us explain it metaphorically. Imagine you want to lean against a wall nearby. Note that what you want to do, i.e., leaning against the wall, is regarded as the intention of the controller, i.e., your nervous system. When you lean against the wall, if you feel that the reaction force from the wall is sufficient for supporting your body, we say "good" Tegotae is obtained. If the reaction force you receive is insufficient (imagine the wall were a curtain/screen for example), "bad" Tegotae is obtained. Notice that Tegotae stems not only from the reaction received from the environment, but also from the consistency between the perceived reaction and the intention/expectation of the controller, i.e., what the controller wants to do.

Now the question is how to quantify Tegotae. Of course, there are various ways to accomplish this. As the initial step of the investigation, we quantify Tegotae in the simplest mathematical form, i.e., a function based on the type of separation of variables as follows:

$$T\_i(\phi\_i, N) = C(\phi\_i)S(N). \tag{4}$$

Hereafter, we refer to the function T<sup>i</sup> as the "Tegotae function," a function that quantitatively measures Tegotae. φ<sup>i</sup> is a control variable (in this case the phase of the oscillator), and N is the sensory information obtained from multiple sensors embedded in the body. Note that the Tegotae function T<sup>i</sup> is expressed as the product of two functions, C(φ<sup>i</sup>) and S(N): the former expresses the intention of the controller, and the latter denotes the reaction obtained from the environment. Here, we design T<sup>i</sup> such that it becomes more positive when better Tegotae is detected. Next, we explain how we can design the sensory feedback term f<sup>i</sup> by using T<sup>i</sup>.

### 2.3. Tegotae-Based Control

Given that the Tegotae function is defined, the local sensory feedback term f<sup>i</sup> is designed in such a way that the control system modulates φ<sup>i</sup> in order to increase the amount of Tegotae received. Thus, because a continuous system is used, f<sup>i</sup> is expressed simply as the partial derivative of the Tegotae function T<sup>i</sup> with respect to the control variable φ<sup>i</sup> , as follows:

$$f\_i = \frac{\partial T\_i(\phi\_i, N)}{\partial \phi\_i}.\tag{5}$$

Note that we can systematically design decentralized controllers by only designing the Tegotae functions required.

Now, the question is how to define Ti(φ<sup>i</sup> , N) to satisfactorily reproduce the hexapedal interlimb coordination observed in insect locomotion. In this study, we define Ti(φ<sup>i</sup> , N) as follows:

$$T\_i(\phi\_i, N) = \sigma\_1 T\_{i,1}(\phi\_i, N) + \sigma\_2 T\_{i,2}(\phi\_i, N), \tag{6}$$

$$T\_{i,1}(\phi\_i, N) = (-\sin \phi\_i) N\_i^V,\tag{7}$$

$$T\_{i,2}(\phi\_i, N) = \sin \phi\_i \left( \frac{1}{n\_L} \sum\_{j \in L(i)}^{n\_L} k\_j N\_j^V \right). \tag{8}$$

As Equation (6) indicates, Ti(φ<sup>i</sup>, N) consists of two Tegotae functions, Ti,1(φ<sup>i</sup>, N) and Ti,2(φ<sup>i</sup>, N), which are linearly combined via the positive constants σ<sup>1</sup> and σ<sup>2</sup>. The suffix i denotes the leg number (i = 1, 2, ..., 6). Sensory information N consists of the vertical ground reaction forces (GRFs) acting on each leg, $N = [N\_1^V, N\_2^V, \ldots, N\_6^V]^T$. L(i) denotes the set of legs neighboring the ith leg, n<sup>L</sup> is the number of elements in L(i), and k<sup>j</sup> (k<sup>a</sup>, k<sup>p</sup>, k<sup>c</sup> ≥ 0) denotes the weight for each GRF $N\_j^V$, as shown in **Figure 2**. Next, we present a detailed explanation of the approach we followed when designing these two Tegotae functions.

Ti,1 quantifies Tegotae on the basis of the information that is only locally available at the corresponding leg; when the local controller intends the leg to be in the stance phase (−sin φ<sup>i</sup> > 0) and this results in receiving a ground reaction force ($N\_i^V > 0$) (**Figure 3**, top), Ti,1 evaluates this situation as "good" Tegotae and returns a positive value.

FIGURE 1 | Schematic of the basic building block for the control system. We used a *phase oscillator* (Kuramoto, 1984) with local sensory feedback for each leg for hexapedal interlimb coordination. The *i*th leg is actively controlled (see Figure 5) according to φ*<sup>i</sup>* such that the *i*th leg is in the swing phase when 0 ≤ φ*<sup>i</sup>* < π and in the stance phase when π ≤ φ*<sup>i</sup>* < 2π.

On the other hand, Ti,2 quantifies Tegotae on the basis of the relationship between the movements of the corresponding leg and its neighboring legs; when the local controller intends the leg to be in the swing phase (sin φ<sup>i</sup> > 0) and its neighboring legs offer good support to the body at that time ($\frac{1}{n\_L} \sum\_{j \in L(i)} k\_j N\_j^V > 0$) (**Figure 3**, bottom), Ti,2 evaluates that the corresponding leg has adequately established a relationship with its neighboring legs and returns a positive value.
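As an intermediate step, the contribution of each Tegotae function to the feedback term can be written out explicitly. The following is our own worked differentiation of Equations (7) and (8) with respect to φ<sup>i</sup>, treating the measured GRFs as given quantities that do not depend on φ<sup>i</sup> at the instant of evaluation, as the partial derivative in Equation (5) implies:

$$\frac{\partial T\_{i,1}}{\partial \phi\_i} = -N\_i^V \cos \phi\_i, \qquad \frac{\partial T\_{i,2}}{\partial \phi\_i} = \left( \frac{1}{n\_L} \sum\_{j \in L(i)} k\_j N\_j^V \right) \cos \phi\_i.$$

Because the vertical GRFs are non-negative, the first contribution lowers the phase velocity wherever cos φ<sup>i</sup> > 0, which within the stance half of the cycle is 3π/2 < φ<sup>i</sup> < 2π, so a loaded leg tends to remain in stance; the second contribution has the opposite sign and is driven by the load on the neighboring legs.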

By substituting Equations (6–8) into Equations (1) and (5), we obtain our interlimb coordination mechanism as follows:

$$\dot{\phi}\_i = \omega - \sigma\_1 N\_i^V \cos \phi\_i + \sigma\_2 \left( \frac{1}{n\_L} \sum\_{j \in L(i)}^{n\_L} k\_j N\_j^V \right) \cos \phi\_i. \tag{9}$$

Introduction of the Tegotae-based approach enables us to easily design a minimal model for hexapedal interlimb coordination in a systematic manner.
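To make Equations (6)–(9) concrete, the following Python sketch evaluates the Tegotae function of one leg, its analytical feedback term, and one explicit Euler step of the phase dynamics. It is an illustration under stated assumptions: the function names, the Euler integrator, and the step size are ours, and the gains and weights passed in are placeholders rather than the values in **Table 1**.

```python
import math


def weighted_neighbor_load(n_neighbors, weights):
    """(1/n_L) * sum_j k_j * N_j^V over the neighboring legs L(i)."""
    total = sum(k * n for k, n in zip(weights, n_neighbors))
    return total / len(n_neighbors)


def tegotae(phi, n_local, n_neighbors, weights, sigma1, sigma2):
    """Tegotae function T_i of Equations (6)-(8)."""
    t1 = -math.sin(phi) * n_local                                      # Eq. (7)
    t2 = math.sin(phi) * weighted_neighbor_load(n_neighbors, weights)  # Eq. (8)
    return sigma1 * t1 + sigma2 * t2                                   # Eq. (6)


def feedback(phi, n_local, n_neighbors, weights, sigma1, sigma2):
    """f_i = dT_i/dphi_i, i.e., the last two terms of Equation (9)."""
    load = weighted_neighbor_load(n_neighbors, weights)
    return (-sigma1 * n_local + sigma2 * load) * math.cos(phi)


def phase_step(phi, omega, dt, n_local, n_neighbors, weights, sigma1, sigma2):
    """One explicit Euler step of Equation (9); the integrator is an assumption."""
    dphi = omega + feedback(phi, n_local, n_neighbors, weights, sigma1, sigma2)
    return (phi + dt * dphi) % (2.0 * math.pi)
```

On a robot, `phase_step` would be called once per control cycle for each leg with the latest force readings; each leg needs only its own GRF and those of its neighbors, which is what makes the controller decentralized.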

### 2.4. Robotic Platform for the Validation of Proposed Control Scheme

**Figure 4** shows the structure of our hexapod robot. The robot consists of six leg segments (**Figure 5**) and a body segment. The robot is 0.40 m long, 0.30 m wide, 0.20 m high, and weighs 2.4 kg. The leg and body consist of carbon fiber rods and acrylonitrile butadiene styrene (ABS) resin printed using a 3-D printer. For each leg, we used two servo motors (Futaba Corporation, Japan: RS405CB), which generate leg motion during the swing and stance phases according to the corresponding oscillator phase (**Figure 5B**). As shown in **Figure 6**, we describe the target angles θ˜ yaw,i and θ˜ roll,i for proportional and derivative (PD) control of the motors through the following equations:

$$
\tilde{\theta}\_{yaw,i} = -A \cos \phi\_i,\tag{10}
$$

$$\tilde{\theta}\_{roll,i} = \begin{cases} B\sin\phi\_i, \text{when } 0 \le \phi\_i < \pi, \\ B'\sin\phi\_i, \text{when } \pi \le \phi\_i < 2\pi. \end{cases} \tag{11}$$

Based on this control scheme, we can generate periodic leg motion as shown in **Figure 5B**. From the viewpoint of neurophysiological findings for a locomotor CPG system in animals (Lafreniere-Roula and McCrea, 2005; Rybak et al., 2016), Equation (9) corresponds to the rhythm generator (RG) and Equations (10) and (11) correspond to a pattern formation (PF) network in the two-level CPG concept. For the robot, we choose parameter values A, B, B ′ for the geometric path of the foot by tuning them through trial and error as shown in **Table 1**. We employ passive springs (MISUMI Corporation: WM8-20, 2.9 N/mm) in each leg for shock absorption. Furthermore, we use three-axis force sensors (OptoForce Ltd., Hungary: OMD-20-SE-40N) in the feet of the robot to detect ground reaction forces (GRFs), as shown in **Figure 5A**.

The body contains a main control board. We calculate the oscillator phase in each leg by using microcontrollers (mbed NXP LPC1768) on the main control board. We manipulated each servo motor installed in the legs using proportional-derivative (PD) control as explained above.

#### 3. EXPERIMENTAL RESULTS

To verify the proposed control scheme in the real world, we conducted five experiments: (i) steady walking, (ii) gait transition according to locomotion speed, (iii) adaptability to change in weight distribution, (iv) adaptability to leg amputation, and (v) effect of local sensory feedback. The control parameters that were used in experiments with the hexapod robot (Sections 3.1–3.5) are listed in **Table 1**. We conducted over 10 trials for each experiment: each trial was conducted on a treadmill for a period of 50 s using randomly selected initial phases.

## 3.1. Steady Walking

**Figure 7** shows the results of measurements conducted when our robot was engaged in steady walking. Here, we set the parameter ω = 2.0 rad/s. **Figure 7** shows the gait diagram (upper graph) and time evolution of the oscillator phases of the legs (lower graph, sin φi) for the period 0.0–20.0 s. In the gait diagram, the colored regions represent the stance phase, which is distinguished by using the threshold data value (1.5 N: less than 10% of the maximum force detected) from the force sensor.

FIGURE 4 | Hexapod robot developed for the study. The robot is 0.40 m long, 0.30 m wide, 0.20 m high, and weighs 2.4 kg.

Hereafter, we use the gait diagrams and movies (i.e., Movies S1–S3) recorded by a video camera as a qualitative evaluation index and the average duty factors (the ratio of the stance phase to one period) as a quantitative evaluation index. For the quantitative analysis, the duty factors obtained from the gait diagrams reflect the direction of the robot motion (i.e., straightness) because asymmetric duty factors in the left and right legs indicate turning during locomotion. Moreover, the duty factors indirectly represent the foot point velocity during locomotion because the leg trajectory of our robot is determined by the oscillator phases (**Figure 6**). Thus, the duty factors from the gait diagrams indirectly include physical information about the speed and the direction of locomotion (see SM for more details). The gait pattern rapidly converges from the initial phase relationship to a tetrapod gait, in which the ipsilateral feet touch the ground in the order of hind, middle, and fore legs, within approximately two periods. Furthermore, we tested the effect of the variation in the initial oscillator phases on the gait patterns. The results confirmed that the gait converged to the same pattern from any initial phase relationship (in 10 out of 10 trials: 100%).
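The two evaluation steps described above, thresholding the vertical GRF to obtain the stance phase and computing the duty factor from it, can be sketched as follows. This is a simplified illustration and not the analysis code used in the study: the function names are ours, and we assume uniformly sampled force data covering a whole number of gait periods, with stance defined as the force exceeding the 1.5 N threshold.

```python
def stance_mask(grf_samples, threshold=1.5):
    """Mark each sample as stance (True) when the vertical GRF exceeds
    the threshold in newtons (1.5 N is the value quoted in the text)."""
    return [force > threshold for force in grf_samples]


def duty_factor(grf_samples, threshold=1.5):
    """Fraction of samples spent in stance.

    Assumes uniform sampling over a whole number of gait periods,
    so that the sample fraction equals the time fraction.
    """
    mask = stance_mask(grf_samples, threshold)
    return sum(mask) / len(mask)
```

Comparing `duty_factor` between left and right legs then gives a rough indication of straightness, in line with the reasoning in the paragraph above.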

# 3.2. Gait Transitions According to Locomotion Speed

FIGURE 5 | Detailed structure of the leg segment of the robot. (A) The leg consists of carbon fiber rods and ABS resin printed using a 3-D printer. The feet contain three-axis force sensors to detect GRFs. (B) Each leg is equipped with two servo motors, which generate leg motion during the swing and stance phases according to the corresponding oscillator phase.

We tested the ability of the proposed control scheme to change the gait patterns according to the locomotion speed by linearly changing the parameter ω from 2.0 to 4.0 rad/s during the time period 40.0 to 42.0 s. **Figure 8A** shows the gait diagram (upper graph) and the time evolution of the oscillator phases of the legs (lower graph, sin φ<sup>i</sup>) during the time period 30.0–50.0 s in this experiment. After ω was changed, the gait pattern spontaneously changed from a tetrapod gait to a tripod gait, in which the (L1, R2, L3) and (R1, L2, R3) feet alternately touch the ground in anti-phase (Movie S1). **Figure 8B** shows the profile of the vertical and horizontal GRFs ($N\_i^V$ and $N\_i^H$) in the same experiment. In this figure, the upper, middle, and lower graphs show the GRF profile of the front (L1), middle (L2), and hind (L3) legs, respectively. Furthermore, we confirmed this result for the gait transition in all 10 trials (10/10: 100%). The results indicate that leg coordination is appropriately modified according to the locomotion speed via Tegotae-based control.
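The speed command used in this experiment, a linear change of ω from 2.0 to 4.0 rad/s between 40.0 and 42.0 s, can be written as a small helper. The function name and signature are hypothetical; only the numerical values come from the text.

```python
def omega_schedule(t, omega_start=2.0, omega_end=4.0, t_start=40.0, t_end=42.0):
    """Intrinsic angular velocity (rad/s) at time t (s): constant before
    t_start, linear ramp between t_start and t_end, constant afterwards."""
    if t <= t_start:
        return omega_start
    if t >= t_end:
        return omega_end
    fraction = (t - t_start) / (t_end - t_start)
    return omega_start + (omega_end - omega_start) * fraction
```

Note that ω is the only quantity that is changed; the gait transition itself is not commanded but emerges from the sensory feedback.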

# 3.3. Adaptability to Change in Weight Distribution

Here, we show the adaptability of our robot to changes in weight distribution by applying a load (500 g) to the hind portion of the body (upper photograph in **Figure 9**). The lower graphs in **Figure 9** show the experimental result. Here, we changed the parameter ω from 2.0 to 4.0 rad/s during the period 40.0 to 42.0 s as in the previous gait transition experiments (Section 3.2). After changing ω, the gait pattern did not change to that of a tripod; instead, a tetrapod gait was maintained (Movie S2). We obtained the same results in 10 out of 10 trials (100%). **Figure 10** compares the average duty factor of the front, middle, and hind legs without and with the load for 10 trials (ω = 4.0 rad/s). The duty factor, which is the ratio of the stance phase to one period, was calculated by using the gait patterns during six periods for each trial. This result indicates that the duty factor of the loaded hind legs is larger than that of legs that do not bear any load. This result demonstrates the adaptability of our proposed control scheme to changes in the weight distribution without requiring prior data about these changes.

# 3.4. Adaptability to Leg Amputation

**Figure 11** shows the experimental results of the leg amputation test after both of the middle legs were amputated. In spite of the amputation, the robot was able to continue walking. Furthermore, the gait patterns converged to a trot or an L-S walk gait observed in quadrupeds, i.e., the (L1, R3) and (R1, L3) feet alternately touch the ground in nearly anti-phase or, more precisely, focusing on the timing of touchdown, the feet touch the ground in the order L1, R3, R1, L3 (Movie S3). **Figure 12** compares the average duty factor of the front, middle, and hind legs for 10 trials of the leg amputation experiment. The duty factor of each leg was modulated according to the remaining number of legs, which mainly resulted in an increase in the duty factor of the hind legs. Furthermore, we confirmed that the gait converged to the same patterns from any initial phase relationship (in 10 out of 10 trials: 100%). These results also indicate that the proposed control scheme can achieve interlimb coordination according to the physical properties of the robot's body in a self-organizing manner, without any predefined gait patterns.

# 3.5. Effect of Local Sensory Feedback Concerning Neighboring Legs

The usefulness of our proposed local sensory feedback based on the Tegotae approach was verified by conducting experiments under the following conditions: we set the parameters ω = 2.0, σ<sup>1</sup> = 0.2, σ<sup>2</sup> = 0, which yields a model similar to our previous model for quadrupeds (Owaki et al., 2012; Owaki and Ishiguro, 2017) or Barikhan's model for hexapods (Barikhan et al., 2014). We conducted 10 trials in this experiment using randomly selected initial phases. **Figure 13** shows the experimental results obtained using these parameters. The gait patterns mostly did not converge to insect-like gaits, e.g., tetrapod/tripod gaits, but converged to other patterns under many initial conditions (in 7 out of 10 trials: 70%) in this model. In these gaits, the left legs touched the ground in the order L3, L2, and L1 (hind to fore), whereas the right legs touched in the order R1, R2, and R3 (fore to hind). This result indicates that the model with only the second term in Equation (9) (similar to Barikhan's model) sometimes reproduced a gait pattern similar to that of insects, but its robustness against the initial conditions was insufficient.

#### 4. DISCUSSION

The purpose of this study was to provide a minimal model for the interlimb coordination in hexapedal locomotion based on a novel concept named Tegotae. Using the Tegotae-based approach has enabled us to show how we can design the local sensory feedback for a decentralized interlimb coordination mechanism in a systematic manner. Moreover, we have demonstrated that our hexapod robot, which was developed for the validation of the proposed control scheme, satisfactorily reproduced various aspects of insect locomotion, i.e., steady walking, gait transition according to locomotion speed, and adaptability to changes in weight distribution and to leg amputation. As shown in **Figure 8B**, the role arrangement of the fore, middle, and hind legs can be achieved via the interlimb coordination mechanism: (i) the fore legs mainly generate braking forces ($N\_i^H$ was mainly negative), (ii) the middle legs mainly support the body ($N\_i^V$ was larger than those for the other legs), and (iii) the hind legs mainly generate propulsion forces ($N\_i^H$ was mainly positive). Such adaptive behaviors are commonly observed for various species of insects, as shown in **Table 2**. This suggests that our Tegotae-based interlimb coordination model captures the essential mechanism for hexapedal interlimb coordination. As a control experiment, if we set the parameters σ<sup>1</sup> = σ<sup>2</sup> = 0, i.e., a condition without local sensory feedback, we can easily imagine that interlimb coordination does not occur and the phase relationship between the legs simply remains at its initial condition. Thus, in order to determine the usefulness of the proposed local sensory feedback, we verified the effect of the second and third terms of Equation (9) in Section 3.

phase sin φ*<sup>i</sup>*. We found spontaneous transition from the gait of a tetrapod to that of a tripod, in which the (L1, R2, L3) and (R1, L2, R3) feet alternately touch the ground in anti-phase, by changing only parameter ω from 2.0 to 4.0 rad/s in the period from 40.0 to 42.0 s (yellow highlight in the graph, Movie S1). We confirmed the same result for the gait transition in all 10 trials (10/10: 100%). (B) The profile of vertical and horizontal GRFs ($N\_i^V$ and $N\_i^H$). The upper, middle, and lower graphs show the GRF profile of the front (L1), middle (L2), and hind (L3) legs, respectively.

In our previous studies on quadruped locomotion (Owaki et al., 2012; Owaki and Ishiguro, 2017), we proposed a simple interlimb coordination rule that reproduced various quadruped gait patterns well and explained the underlying mechanism. The second term in Equation (9) corresponds to the quadruped interlimb coordination rule. Inspired by our model, Barikhan et al. (2014) also implemented an almost identical mechanism for a hexapedal interlimb coordination model and verified its usefulness by reproducing some insect-like locomotion in simulations. However, our experiments on the effect of the third term in Equation (9) (Section 3.5) indicate that, although the model with only the second term in Equation (9) sometimes reproduces a gait pattern similar to that of insects, its robustness against the initial conditions is insufficient. This is because the local load information on quadrupeds is totally reflected by physical information throughout the whole body (Owaki et al., 2012; Owaki and Ishiguro, 2017), whereas that on hexapods does not sufficiently include physical information for interlimb coordination. Thus, we concluded that the third term in Equation (9), which uses sensory information about load distribution in neighboring legs, is essential for the reproduction of insect-like gait patterns and gait transitions. Moreover, we have already reported the local sensory feedback mechanism in Equation (9), but we did not previously confirm the gait transition from tetrapod to tripod or the adaptability to changes in the weight distribution and to leg amputation (Goda et al., 2016). Here, we newly introduce anterior-posterior asymmetry in the parameters k<sup>a</sup> and k<sup>p</sup>, which mainly resulted in the stable gait transition according to locomotion speed, i.e., from tetrapod to tripod, as well as the adaptability to changes in the weight distribution and to leg amputation. Our main contribution is the versatility of reproduced behaviors concerning insects' locomotion: Barikhan's model (Barikhan et al., 2014) differs from ours in that it did not reproduce the gait transition from tetrapod to tripod and did not exhibit adaptability against changes in the weight distribution and robustness against initial conditions. Furthermore, our approach is unique; we have discussed the common underlying mechanism of interlimb coordination in the locomotion of both vertebrates and arthropods by using legged robots.

FIGURE 10 | Average duty factor of each leg without and with a load through 10 trials (ω = 4.0 rad/s). This result indicates that the duty factors of the loaded hind legs and middle legs are larger than those of legs without a load, whereas the duty factor of the front legs becomes smaller.

FIGURE 12 | Average duty factor of front, middle, and hind legs in the leg amputation experiment in 10 trials. The duty factor of the hind legs mainly increased in the case of two-leg amputation experiments.

patterns mostly did not converge to insect-like gaits, e.g., tetrapod/tripod gaits, but converged to other patterns under many initial conditions (in 7 out of 10 trials: 70%). In these gaits, the left legs touched the ground in the order L3, L2, and L1 (hind to fore), whereas the right legs touched in the order R1, R2, and R3 (fore to hind).

The proposed interlimb coordination model shows adaptability to changes in the weight distribution of the robot's body: the gait pattern did not change to a tripod gait but maintained a tetrapod gait after ω was changed, and the average duty factor of the loaded hind legs automatically became larger than that of the unloaded fore legs. These results were reproduced in a self-organizing manner by using Tegotae-based control, without any need to provide prior data about these changes. We additionally obtained biological evidence for the adaptability to changes in the weight distribution by conducting experiments using two crickets (Gryllus bimaculatus). These experiments are described in detail in the Supplementary Material. Our results using the robot show good agreement with our biological evidence of the influence of the load on the leg coordination in crickets: with a load, they (1) exhibit a tetrapod gait and (2) increase the duty factor of the middle and hind legs. Furthermore, another experiment using fruit flies confirmed the same effect of a vertical load (Mendes et al., 2014), which suggests that such adaptability is observed for various species of insects. This fact strongly supports the essentiality of using the vertical GRFs $N\_i^V$ as the sensory information S(N) when designing a Tegotae function for hexapedal interlimb coordination.

Furthermore, our model exhibited adaptability to the physical conditions resulting from a two-leg amputation. If we used a predefined neural connection for a tripod gait, where the (L1, R2, L3) and (R1, L2, R3) legs are in-phase, we could not reproduce a trot or an L-S walk pattern, where the (L1, R3) and (R1, L3) feet alternately touch the ground in nearly anti-phase, when the two legs are amputated (**Figure 11**). Owing to the Tegotae-based interlimb coordination mechanism using both local ($N\_i^V$) and neighboring ($N\_j^V$) load information (Equation 9), gait patterns were self-organized in response to the load distribution stemming from the remaining number of legs, which is one of the advantages of our approach. Some biological studies have suggested that insects generally exhibit the L-S walk when their two middle legs are amputated. Hughes (1957) showed that cockroaches with both middle legs amputated exhibited a gait with the touch-down order (L3, L1, R3, R1), i.e., the L-S walk in quadrupeds. Graham (1977) and Grabowska et al. (2012) showed that stick insects with both middle legs amputated exhibited the same gait as cockroaches (Hughes, 1957) because the contralateral touch-down timing became the same such that gaits could be symmetric about the body axis to ensure stability. Although we did not conduct other leg-amputation tests here, we can expect adaptability to some extent under some conditions, e.g., amputation of a front or hind leg, owing to the potential of our model, as we have shown. However, because our model did not include any directional or posture controls or learning algorithms as in Ren et al. (2015) and Cully et al. (2015) (here, we mainly focus on real-time adaptability), its direction of motion would vary according to the physical properties: a robot with its front-left leg amputated will turn left when moving forward.
According to the patterns of leg amputation, insects exhibit modulation of their spatial footfall patterns, i.e., they change the landing location of a stance leg to maintain their posture stability (Hughes, 1957; Graham, 1977; Cruse, 1983; Grabowska et al., 2012); thus, we intend to apply an additional Tegotae-based controller for the modulation of spatial footfall patterns, resulting in the adaptation to a large number of leg amputations.


TABLE 2 | Observed adaptive behavior various species of insects have in common.

In insect locomotion, it is well known that two types of sensory signals play an essential role in leg coordination: (1) sensory signals about the position and velocity of joints during movement (Büschges, 2005; Pearson et al., 2006) and (2) force signals from the leg segments (Pearson, 1972; Bässler, 1977; Cruse, 1985a,b; Duysens et al., 2000; Zill et al., 2004). Such sensory signals modulate not only the timing (phase) but also the magnitude of the neural output stemming from the nervous system, e.g., CPGs (Grillner, 2003; Büschges, 2005). In our Tegotae-based approach, as a first step of the investigation, we use only the vertical GRFs $N\_i^V$ detected by force sensors installed in the legs to modulate the phase of the oscillators. The obtained control principle, where both local and neighboring leg load information is essential for the interlimb coordination, agrees with biological evidence (Pearson, 1972; Bässler, 1977; Cruse, 1985a,b; Duysens et al., 2000; Zill et al., 2004). To reproduce insect-like adaptability to different surfaces and types of movement, e.g., uneven terrain and uphill/downhill walking, we would need to design additional Tegotae functions that use other types of sensory signals, e.g., horizontal GRFs. Furthermore, modulation of the magnitude of motor output from neural systems will also lead to a change in the landing location of a stance leg for negotiating various leg amputation patterns, as discussed in the above paragraph. These topics seem to be of general interest and will also be studied in further investigations.

Over the past two decades, various hexapod robots have been developed with the aim of reproducing the adaptive functions of insects and understanding their control mechanisms (Kimura et al., 1993; Beer et al., 1997; Altendorfer et al., 2001; Ritzmann et al., 2004; Steingrube et al., 2010; Ambe et al., 2013; Manoonpong et al., 2013; Dasgupta et al., 2015; Ramdya et al., 2017). Ours was the first study of its kind to demonstrate various aspects of insect locomotion with a minimal control principle without any interlimb neural communication between oscillators. To the best of our knowledge, no studies have been reported in which adaptability was reproduced in a completely self-organized manner by only using local and neighboring load information. In the CPG approach used as the control paradigm in this study, local sensory feedback f<sup>i</sup> is described simply as the partial derivative of the Tegotae function T<sup>i</sup> with respect to the control variable φ<sup>i</sup>. This aspect of our model also suggests a new design scheme of local sensory feedback in the chain-of-reflex approach based on the discontinuous basic process, which should also be discussed as a next step. Our minimal model, which is systematically derived from the concept of Tegotae, is expected to provide substantial insight into the essence of the hexapedal interlimb coordination mechanism to roboticists as well as biologists.

#### AUTHOR CONTRIBUTIONS

AI and DO conceived the research and managed the data collection. MG and SM designed the robot and conducted the experiments. MG, SM, and DO conducted the analyses. All authors wrote the manuscript together.

#### FUNDING

We acknowledge the support provided by the Japan Science and Technology Agency (CREST).

#### ACKNOWLEDGMENTS

We are grateful to R. Kobayashi (Hiroshima University), H. Aonuma (Hokkaido University), and T. Kano (Tohoku University) for their helpful comments.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00029/full#supplementary-material

#### REFERENCES

oscillators with a phase modulation mechanism," in IEEE/RSJ International Conference on Intelligent Robots and Systems (Tokyo), 5087–5092.

femur-tibia-joint. J. Comp. Physiol. 121, 99–113. doi: 10.1007/BF00614183


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Owaki, Goda, Miyazawa and Ishiguro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Self-Organized Behavior Generation for Musculoskeletal Robots

Ralf Der <sup>1</sup> and Georg Martius <sup>2, 3</sup> \*

<sup>1</sup> Institute for Computer Science, University of Leipzig, Leipzig, Germany, <sup>2</sup> IST Austria, Klosterneuburg, Austria, <sup>3</sup> Autonomous Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany

With the accelerated development of robot technologies, control becomes one of the central themes of research. In traditional approaches, the controller, by its internal functionality, finds appropriate actions on the basis of specific objectives for the task at hand. While very successful in many applications, self-organized control schemes seem to be favored in large complex systems with unknown dynamics or which are difficult to model. Reasons are the expected scalability, robustness, and resilience of self-organizing systems. The paper presents a self-learning neurocontroller based on extrinsic differential plasticity introduced recently, applying it to an anthropomorphic musculoskeletal robot arm with attached objects of unknown physical dynamics. The central finding of the paper is the following effect: by the mere feedback through the internal dynamics of the object, the robot is learning to relate each of the objects with a very specific sensorimotor pattern. Specifically, an attached pendulum pilots the arm into a circular motion, a half-filled bottle produces axis oriented shaking behavior, a wheel is getting rotated, and wiping patterns emerge automatically in a table-plus-brush setting. By these object-specific dynamical patterns, the robot may be said to recognize the object's identity, or in other words, it discovers dynamical affordances of objects. Furthermore, when including hand coordinates obtained from a camera, a dedicated hand-eye coordination self-organizes spontaneously. These phenomena are discussed from a specific dynamical system perspective. Central is the dedicated working regime at the border to instability with its potentially infinite reservoir of (limit cycle) attractors "waiting" to be excited. Besides converging toward one of these attractors, variate behavior is also arising from a self-induced attractor morphing driven by the learning rule. 
We claim that experimental investigations with this anthropomorphic, self-learning robot not only generate interesting and potentially useful behaviors, but may also help to better understand what subjective human muscle feelings are, how they can be rooted in sensorimotor patterns, and how these concepts may feed back on robotics.

Keywords: self-organization, robot control, musculoskeletal, tendon-driven, learning, anthropomimetic, self-exploration

Edited by: Poramate Manoonpong, University of Southern Denmark, Odense, Denmark

Reviewed by: Subramanian Ramamoorthy, University of Edinburgh, UK; Hazem Toutounji, ZI Mannheim, Germany

\*Correspondence: Georg Martius, georg.martius@tuebingen.mpg.de

Received: 19 August 2016 Accepted: 07 February 2017 Published: 16 March 2017

Citation: Der R and Martius G (2017) Self-Organized Behavior Generation for Musculoskeletal Robots. Front. Neurorobot. 11:8. doi: 10.3389/fnbot.2017.00008

Frontiers in Neurorobotics | www.frontiersin.org March 2017 | Volume 11 | Article 8

# 1. INTRODUCTION

Control is a ubiquitous theme of life and technology. When reaching for a cup of coffee or walking through the mountains, our nervous system controls all movements with great ease, despite the great uncertainty involved in controlling the muscles, the complexity of the task and many other factors. That this simplicity is an illusion becomes apparent as soon as one tries to program a robot to perform a task. While the complexity of programming has stood as a challenge for decades, considerable progress has recently been achieved through new materials (Kim et al., 2013), powerful actuators (Raibert et al., 2008), and improved control theory (Siciliano et al., 2009), but in particular through the tremendous increase in computational power, which allows modeling and physically realistic simulations of very complex systems to improve planning and control (Mordatch et al., 2012; Erez et al., 2013; Posa et al., 2014), and even allows simulating large controlled muscular body systems (Yamane and Nakamura, 2011) or finding new perspectives for artificial evolution (Bongard, 2015) by exploiting supercomputer power. There are also a variety of new control paradigms, best demonstrated by the amazing locomotion abilities of the Boston Dynamics robots, like BigDog, PETMAN and others. These are ingeniously engineered systems for realizing a specific set of tasks with their highly specialized bodies. The DARPA challenge also presents numerous examples of progress but also reveals a realm of failures of these systems even under remote control. Alternatively, the so-called embodied AI recognizes that the body can be very helpful in reducing both design efforts and computational load on the controller. 
The exploitation of the specific properties of the body, sometimes called morphological computation (Paul, 2004; Pfeifer and Gómez, 2009; Hauser et al., 2012) is an active field of research with many impressive results, see Pfeifer and Bongard (2006) and Pfeifer and Scheier (1999), opening new perspectives for both robot control and our understanding of human sensorimotor intelligence (Pfeifer et al., 2012).

The embodied approach seems to be favored in systems with strong physical effects, like soft robotic systems or elastically actuated robots, where the engineering approaches may run into severe difficulties. There are a number of interesting results, for instance in employing neural learning to obtain goal-directed behavior, e.g., Manoonpong et al. (2007), Shim and Husbands (2012), Toutounji and Pasemann (2014), and Tetzlaff et al. (2014) using fast synaptic plasticity as in this work, or in using simplified spring models (Park and Kim, 2015). Nevertheless, a systematic embodied approach for controlling such systems is not available so far. This is not a surprise, given the aim of exploiting the physical dynamics, which is strongly embodiment specific. In this paper we will not aim at a general solution to physics-based deliberate control but will investigate the possible role of self-organization (SO) and its general phenomenology in robotics. We devote this paper to systems with extended embodiment, consisting of a Myorobotics arm connected to a physical subsystem with an internal dynamics of its own. The arm is a muscle-tendon driven (MTD) mechanical system with strong embodiment effects. The controller is a one-layer feed-forward neural network which may drive systems into self-organization by a specific learning rule, differential extrinsic plasticity (DEP), as introduced recently in Der and Martius (2015). There, it was applied to a number of systems in simulation, producing a great variety of behavior. In a slightly modified form, it faces a new challenge here: MTD systems with their strong embodiment effects.

To introduce this paper's topics and claims, imagine that you get an object, say a half-filled bottle, attached to the tip of your forearm such that you can know neither the orientation nor the identity of the object. Sitting in the dark, you will probably start doing something, trying to find out about the object's properties. The idea is that, while moving the bottle around, you feel the reaction from the water when it hits the walls of the bottle. Intrigued by this signal and driven by curiosity, you may vary the direction of the shaking motion to end up with shaking parallel to the bottle axis, as the strongest and most coherent force response comes from there. Without vision or any other external information on the attached object, motor signals are based on the sensor values, i.e., the muscle tensions, modulated by the force responses of the subsystem's internal dynamics. Humans will describe this as feeling the muscles (or the embodiment in general) and generating actions out of this feeling. Generally, behavior is a direct result of the agent-environment coupling, here the dynamical contact between the agent, the arm with its "brain," and the attached object.

Similarly, with DEP learning, the self-excited motion patterns of the arm are guided, or piloted, by the object's internal dynamics. Specifically, an attached pendulum drives the arm into a circular motion, a half-filled bottle produces axis-oriented shaking behavior, a wheel is rotated, and wiping patterns emerge automatically in a table-plus-brush setting. This is of interest for the self-organized acquisition of behavioral primitives, but there is more: as the emerging patterns are object specific, we may say that the robot was able to identify the object by just the feedback through the (unknown) internal dynamics of the object. Identifying means that our self-learning system responds with a specific sensorimotor pattern for each object attached to the arm. So, this is a cognitive act closely related to the self-organized discovery of Gibson's object affordances, in particular for dynamical interactions, see below. The observation that DEP learning elicits just these subtle, so far unknown effects is the central result of this paper.

Acquired with an anthropomorphic robot (arm), these findings may also provide answers to more general questions in human related cognitive science. Specifically, while the phenomenon of feeling the embodiment (and acting out of this feeling) is easy to grasp from the subjective human perspective, understanding it from the objective scientific perspective becomes very demanding. We claim that our experimental investigation with the self-learning anthropomorphic robot may help to better understand what the subjective human feelings are and how they relate to artificial beings so that this knowledge eventually will help building machines that are in behavior closer to humans.

The paper is organized as follows: In the next section we introduce the DEP learning rule for the controller and give a first discussion of its properties, in particular of balancing at the edge of instability, which is loosely related to the edge of chaos concept. We present the experiments with the robot in Section 3; see **Figure 3** for an overview of the experimental settings and **Table 1** for a list of videos documenting the various experiments. Throughout the paper, we present different methods for the theoretical analysis based on dynamical system theory. Specifically, we introduce in Section 3.5.1 the eigenvalue spectrum of the linearized dynamical operator, in Section 3.5.2 parametric plots for visualizing the "purity" of a behavior, in Section 3.6 local Lyapunov exponents, and in Section 3.7 Hilbert transforms for analyzing more quantitatively the emerging sensorimotor patterns. Central to the paper is the piloting effect introduced in Section 3.3, which explains how the robot may develop a feeling for the internal dynamics of an object, see also Section 3.6 for its relation to the concept of object affordances. This is followed by Section 4 discussing the findings. Some mathematical details are provided in Section 5 (Supplementary Material).

#### TABLE 1 | Experiments.

The videos can be watched at http://playfulmachines.com/MyoArm-1.

# 2. ROBOT BEHAVIOR AS A SELF-EXCITED PHYSICAL MODE

The controller we propose is a function that receives at time t a vector of sensor values x<sub>t</sub> ∈ ℝ<sup>n</sup> and sends a vector of motor values y<sub>t</sub> ∈ ℝ<sup>m</sup>. In the applications, we use a neurocontroller realized by a one-layer feed-forward network as

$$y\_i = g\left(\kappa\_i z\_i\right) \tag{1}$$

for neuron i, where

$$z\_i = \sum\_{j=1}^{n} C\_{ij} x\_j \tag{2}$$

is the postsynaptic potential and C<sub>ij</sub> is the synaptic connection strength to input j. We use tanh-neurons, i.e., the activation function g(z) = tanh(z), to get motor commands between −1 and +1. This is also the reason why we did not include a bias term in Equation (1).

An important ingredient for the intended self-excitation of behavioral modes is a controlled destabilization of the system. With a fixed C, this destabilization is controlled by the gain factors κ<sub>i</sub> in Equation (1), which regulate the feedback strength for each motor channel i individually. In the experiments we used the definition<sup>1</sup> κ<sub>i</sub> = κ/‖C<sub>i</sub>‖, where κ regulates the overall feedback strength and ‖C<sub>i</sub>‖ is the norm of the synaptic vector of neuron i. The setup is displayed in **Figure 1**.
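As an illustration of Equations (1) and (2) with the per-channel gain, the controller can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `controller_output` and the value of the regularizer `lam` (the small λ from the footnote) are assumptions made for the example.

```python
import numpy as np

def controller_output(C, x, kappa=0.5, lam=1e-6):
    """Motor commands y_i = tanh(kappa_i * z_i) with z = C x (Equations 1 and 2).

    kappa_i = kappa / (||C_i|| + lam) rescales each row of C individually;
    lam is a small regularizer whose value here is only an assumption.
    """
    C = np.asarray(C, dtype=float)
    x = np.asarray(x, dtype=float)
    z = C @ x                                # postsynaptic potentials, Eq. (2)
    row_norms = np.linalg.norm(C, axis=1)    # ||C_i|| for each motor neuron i
    kappa_i = kappa / (row_norms + lam)      # per-channel gain factor
    return np.tanh(kappa_i * z)              # Eq. (1), bounded in (-1, 1)
```

Because of the normalization, rescaling C by a constant leaves the motor commands essentially unchanged, so only κ sets the feedback strength. With C = 0 all outputs are zero.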

#### 2.1. Learning Dynamics

As we aim at self-organization of behavior, we have to define the control signals in a self-consistent way on the basis of the history of sensor signals alone. Let us introduce x′<sub>t</sub> = x<sub>t+θ</sub>, the vector of the sensor values received in the next time step, where θ is a time lag with θ = 1 in the derivations given below (time is measured in discrete update steps, here 1/100 s).

The self-organized definition of the controller outputs is realized in the following way. Let us postulate the existence of a forward model given by the (possibly state dependent) matrix A so that

$$\mathbf{x}'\_t = A\_t \mathbf{y}\_t + \xi\_t \tag{3}$$

where ξ is the modeling error. This describes the physical dynamics over one time step. Introducing M, which is the inverse or pseudoinverse of A, we require y to be a function of the future sensor values x′,

$$\mathbf{y}\_t \stackrel{!}{=} M\_t \mathbf{x}\_t' \tag{4}$$

Together with the destabilization, Equation (4) displays the essential idea of our approach: to make the system active while keeping motor signals compliant with the world dynamics. In a sense, Equation (4) means that the world's responses, represented by x′, signal to the controller what to do. But of course the world (i.e., the future sensor values x′<sub>t</sub>) is also controlled by the controller through the actions y (Equation 3). The interplay of these effects is the ultimate reason for the self-excitation of modes by self-amplification of system responses.

<sup>1</sup>This needs a regularization, i.e., in the experiments we use κ<sub>i</sub> = κ/(‖C<sub>i</sub>‖ + λ) where λ > 0 is very small.

However, we cannot use Equation (4) directly for generating the control signal y as it contains the future. So, we must find a model for relating the future sensor signals x′<sub>t</sub> to their past, i.e., x<sub>t</sub>, x<sub>t−1</sub>, …. In other words, we need a time series predictor for the sensor dynamics. Following the derivation in Section 5.1 (Supplementary Material), we eventually obtain the update rule

$$
\tau \Delta C\_t = M\_t \dot{\mathbf{x}}\_t' \hat{\mathbf{x}}\_t^\top - C\_t \tag{5}
$$

or in coordinate representation (omitting the time index)

$$\tau\,\Delta C\_{ij} = \sum\_{k} M\_{ik}\dot{x}\_{k}^{\prime}\hat{x}\_{j} - C\_{ij} \tag{6}$$

where x̂ = ẋ‖ẋ‖<sup>−2</sup>, see also **Figure 1**. The matrix M defines the sensor-to-motor mapping, which is one-to-one for normal sensors and negated one-to-one for the delay sensors in the experiments of this paper, see Section 5.2 in Supplementary Material, so the sum in Equation (6) reduces to two terms. In general, M can be more complicated and can be learned in a prior step.
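A single step of the update rule in Equation (5) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `dep_update`, the time scale in update steps (`tau=100`, i.e., 1 s at 100 Hz), and the regularizer `reg` that guards the normalization when the sensors are at rest are illustrative choices, not taken from the paper. The normalized velocity x̂ is the sensor velocity divided by its squared norm.

```python
import numpy as np

def dep_update(C, M, dx_next, dx, tau=100.0, reg=1e-3):
    """One DEP step: tau * dC = M dx' x_hat^T - C  (Equation 5).

    dx_next : sensor velocity one time lag ahead (x'-dot), length n
    dx      : current sensor velocity (x-dot), length n
    tau     : learning time scale in update steps (assumed value)
    reg     : hypothetical regularizer avoiding division by zero at rest
    """
    C = np.asarray(C, dtype=float)
    dx_next = np.asarray(dx_next, dtype=float)
    dx = np.asarray(dx, dtype=float)
    x_hat = dx / (dx @ dx + reg)             # x-dot divided by its squared norm
    target = np.outer(M @ dx_next, x_hat)    # M x'-dot x_hat^T, shape (m, n)
    return C + (target - C) / tau            # relax C toward the correlation
```

Written this way, the rule is an exponential moving average: for a persistent correlation pattern, C relaxes toward the averaged product M ẋ′ x̂ᵀ on the time scale τ. In an online loop the "future" velocity is not available, so the pair (ẋ′, ẋ) would presumably be formed from the current velocity and the one stored θ steps earlier.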

In accordance with earlier work (Der and Martius, 2015), we call this update rule differential extrinsic plasticity (DEP), though there is a difference with ẋ replaced with x̂ as the second factor in the update. Equation (5) becomes stationary if

$$C\_{ij} = \sum\_{k} M\_{ik} \langle \dot{x}\_{k}^{\prime} \hat{x}\_{j} \rangle \tag{7}$$

where ⟨…⟩ is the moving time average. Equation (7) is an important consequence of the update rule, showing that learning converges toward behaviors with a fixed point in correlation space, here a fixed pattern of velocity correlations in sensor space, corresponding to specific attractors in state space. In principle, such a fixed correlation pattern corresponds to any behavior like crawling, walking, running, hopping, or the like, of any amplitude and frequency. If the controller were sufficiently expressive and the sensor-to-motor mapping appropriate, any (cyclic) mode could potentially be realized by this correlation learning. With the matrix M used in this paper, the spectrum of (stable) behaviors is of course restricted, but the variety of the observed motion patterns, see below, is still interesting. To enhance self-organization into periodic patterns, we introduce additional sensors which are copies of the primary sensors but are delayed by a fixed time delay d, see Section 5.3 in Supplementary Material for technical details.
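The delay sensors can be pictured as a ring buffer that appends, to each sensor reading, a copy of the reading from d update steps earlier. The following is only one plausible way to build them (the actual construction is given in the Supplementary Material). The class name `DelaySensors`, the zero pre-fill, and the requirement d ≥ 1 are assumptions of this sketch.

```python
from collections import deque

class DelaySensors:
    """Augment a sensor vector with copies delayed by d update steps (d >= 1)."""

    def __init__(self, n_sensors, d):
        # Pre-fill with zeros so that delayed values exist from the first step.
        self.buffer = deque([[0.0] * n_sensors for _ in range(d)], maxlen=d)

    def step(self, x):
        delayed = self.buffer[0]           # oldest entry: reading from d steps ago
        self.buffer.append(list(x))        # newest reading evicts the oldest one
        return list(x) + list(delayed)     # primary sensors, then delayed copies
```

With a 100 Hz control loop, a lag of 0.5 s would correspond to d = 50, doubling the length of the sensor vector seen by the controller.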

For the analysis in terms of dynamic systems theory to be given below, we will need the dynamic operator

$$L = MC \tag{8}$$

which describes the mapping from state x to x′ for the linearized dynamics (Jacobian of the linearized system), see Section 5.1 in Supplementary Material for details. The above learning rule differs from the DEP rule introduced in Der and Martius (2015) by the normalization factor ‖ẋ‖<sup>−2</sup> introduced with Equation (6) above. In the experiments this leads to a more continuous activity in the behaviors, avoiding potential pauses of inactivity. In relation to our earlier work on predictive information maximization (PiMax) (Martius et al., 2013) there are several differences: the DEP rule uses derivatives of the sensor values for learning where PiMax uses the raw ones; PiMax requires inverting the noise-correlation matrix, which is not needed here; and finally the behaviors obtained from PiMax become high-dimensional (in terms of attractor dimension, see Martius and Olbrich, 2015 for details), whereas the DEP rule yields low-dimensional behaviors, as we will see in the analysis below.

#### 2.2. Properties

The irreducible conjunction of state and parameter dynamics creates a meta-system—formed by controller, body, and environment—with a rich variety of all kinds of attractors. These can be deliberately switched by manipulative disturbances, creating an attractor meta-dynamics (Gros et al., 2014). This explains why we observe so many different behaviors in the experiments.

#### 2.2.1. Meta-Parameters

Furthermore, there are three parameters in this approach, κ, τ, and d, which act as meta-parameters for changing the "character" of the SO process. κ roughly determines the amplitude of behavior. In the experiments, the appropriate value for κ is easily found: when increasing κ gradually, a critical value κ<sub>c</sub> ≈ 1 is eventually reached. Using κ > κ<sub>c</sub>, the amplitude a of an emerging motion pattern is roughly a ∝ κ − κ<sub>c</sub> for small a. For larger κ the non-linearities come into play more strongly, such that the amplitude is never above 1. The time lag d of the delay sensors determines the preferred frequency. The parameter τ determines the time scale for taking previous sensor values into account. This affects how quickly the controller parameters wander around if not yet in a stationary behavior. It is advisable to choose it similar to or larger than the period of the expected behavior.

#### 2.2.2. Least Biasing

The implementation of the controller is explicitly given by Equation (1) together with the update rule Equation (5) which obviously has no system specific components. In the experiments we start always with the least biased initial condition, putting the controller matrix C = 0 so that all actuators are in their central position. A basic requirement for a "genuine" approach to SO is its independence of specific properties of the controlled system. Obviously, this is realized here in an ideal manner by both the structure of the approach and because there is no specific goal, no target signal, no platform specific information and no biasing.

#### 2.2.3. Theoretical Analysis

It would be interesting and helpful if the wide spectrum of self-organizing behavior could be given a quantitative analysis. In goal-oriented learning this can be done by some performance criterion, assessing the difference between actual and intended behavior. However, this does not seem appropriate in a true self-organization scenario like that of the present paper. Still, one may ask for a profound theoretical analysis of what these systems actually are doing. This paper contributes to that task by presenting several such measures, which are partly a bit unorthodox but were quite successful for analyzing behavior generated by the DEP learning rule. Central is the use of dynamical systems theory in several aspects. Specifically, we investigate below the eigenvalue spectrum of the linearized dynamical operator L = MC as introduced in Equation (8), using it for assessing the nature and the stability of periodic motions, the prevalent modes in this paper. We use local Lyapunov exponents as a more quantitative concept of dynamical system theory, arguing that they may be a first guess for the claimed realization of an edge of chaos system, see Section 3.6 below. Also, parametric plots have proven a viable tool for visualizing the nature of behavior, and last but not least, Hilbert transforms of the sensor signals were used for analyzing the phase relations between sensor and motor signals, thereby quantifying the closure of the sensorimotor loop, see Section 3.7.

The nature of the dynamical system generated by the learning rule may also be quantified by a number of methods from complexity theory, information theory (Bialek et al., 2001), and more evolved tools from non-linear dynamics (Kantz and Schreiber, 2004). Akin to this paper are methods for analyzing emergent behavior (Lungarella and Sporns, 2006; Ay et al., 2008; Wang et al., 2012; Schmidt et al., 2013) using information theory. A new quantification based on excess entropy (predictive information) and attractor dimension was recently proposed in Martius and Olbrich (2015) and applied to self-organizing behavior similar to that found in this paper. However, in that work long traces of repetitive behavior were recorded in simulations to estimate entropies. Unfortunately, it is impossible to perform this analysis for the fast online learning of the synaptic dynamics, given the time scale of a few seconds or minutes for the behavior generation.

There is some pioneering work in using dynamical systems theory for analyzing behavior generation by fast synaptic plasticity. In Sándor et al. (2015) and Gros (2015), the interesting concept of an attractor metadynamics was introduced which is close to the scenario of this paper. However, their analysis, while pointing in the right direction, is restricted so far to rather simple physical systems in simulation, so that we did not apply it in this paper. Related ideas may also be found in Toutounji and Pasemann (2014, 2016).

#### 2.2.4. Edge of Chaos—The Working Regime for Self-Organization

An essential feature of our approach is the possibility to choose, by the parameter κ, the working regime at the boundary between stable and unstable dynamics. This working regime may be associated with the somewhat vague "edge of chaos" concept (Langton, 1990; Mitchell et al., 1993; Kauffman, 1995; Bertschinger and Natschläger, 2004; Natschläger et al., 2005). As is known from dynamical system theory, this region is not well defined but is otherwise of eminent interest for understanding both life and creativity in natural and artificial beings. Unfortunately, with systems of the physical complexity considered here, a strict mathematical analysis of this region, e.g., by global Lyapunov coefficients, is beyond the scope of this paper. Nevertheless, in a sense, one can observe in the videos what the edge of chaos hypothesis describes, i.e., living somewhere between order and fully developed chaos. In fact, on the one hand the systems react very sensitively to weak perturbations; in particular, one may observe that the further development of behavior is determined by the initial kick the system experiences or by the interaction with attached objects with an internal dynamics. This extremely sensitive reaction to perturbations is a signature of chaos. On the other hand, see the pendulum video or the bottle shaking experiments, the system also has a high degree of organization, as demonstrated by the emergence of long-lived regular orbits. This is the order aspect of the scenario.

Developing quantitative measures for the edge of chaos regime may get the robotic community interested in this very rich, intellectually appealing, and potentially highly useful branch of dynamical system theory based robotics. But this is a topic of future research.

#### 2.2.5. Platforms for Embodied AI

Finally, let us discuss on which platforms our controller is likely to create useful behavior. First of all, the system has to provide sensory feedback about the acting physical forces to make embodiment effects perceivable by the controller. This is, for instance, not the case if all perturbations are perfectly compensated by a low-level PID controller. Secondly, there should be sensors reporting a quantity similar to the one used to control the actuators, e.g., position sensors for position control or force sensors for force control. Additional sensors are typically integrated into the loop if they show a definite response (correlation) to the motor patterns. Thirdly, the behaviors of interest should be oscillatory. Since we only need the main sensor-to-motor wiring information about the particular robot (which can also be learned) and do not require any other specific information, we expect our system to work with a wide variety of machines including soft robots, but this remains for future research.

# 3. EXPERIMENTS

The controller defined above was used in the experiments with a tendon-driven arm-shoulder system from the Myorobotics toolkit (Marques et al., 2013), see **Figure 2**. The system has 11 artificial muscles, 8 in the shoulder, 2 in the elbow, and 1 affecting both. However, two of the shoulder muscles were disconnected. The muscles are composed of a motor winding up a tendon connected to a spring, see **Figure 2B**. The length of a tendon l is given by the motor encoders and the spring compression by f, which is in the interval [−α, 1 − α], where α defines the pretension (here α = 0.1). The length of the tendons is normalized to l ∈ [−1, 1]. We define the sensor values as

$$x\_i = l\_i + \beta f\_i \tag{9}$$

where β regulates the integration of the spring compression. In the experiments, β was simply set to 1 without further tuning. It is expected that this choice is not critical. After the initialization, where the arm is put in a defined initial position, all tendons are tightened to their pretension, and all l<sub>i</sub> are set to zero, the system is put into a position control mode where the controller output y<sub>i</sub> defines a target tendon length for each tendon. In the experiments we used the following parameter settings: κ = 0.5, τ = 1 s (Equations 1, 5), delay sensor lag: 0.5 s (Section 5.3 in Supplementary Material), a time distance between x and x′ of 0.08 s, r = 10<sup>−3</sup> (Equation 22), and an update frequency of the control loop of 100 Hz.
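To make the settings concrete, the following sketch collects the quoted parameters, converts the durations into update steps at 100 Hz, and evaluates Equation (9). The names `ExperimentParams` and `sensor_value` are hypothetical and only serve the illustration. The parameter r is left out because its role is defined in the Supplementary Material.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentParams:
    kappa: float = 0.5       # overall feedback strength
    tau_s: float = 1.0       # learning time scale in seconds
    delay_s: float = 0.5     # lag of the delay sensors in seconds
    lag_s: float = 0.08      # time distance between x and x' in seconds
    rate_hz: float = 100.0   # update frequency of the control loop
    beta: float = 1.0        # weight of the spring compression in Eq. (9)

    def steps(self, seconds):
        """Convert a duration in seconds into whole control-loop steps."""
        return round(seconds * self.rate_hz)

def sensor_value(length, compression, beta=1.0):
    """Sensor value x_i = l_i + beta * f_i (Equation 9)."""
    return length + beta * compression
```

Under these settings, τ corresponds to 100 update steps, the delay sensors lag by 50 steps, and x′ is taken 8 steps after x. Right after initialization (l = 0, spring at pretension, f = −α = −0.1) the sensor value is −0.1.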

#### 3.1. Peculiarities of Muscle-Tendon Driven Systems

There are a number of features which make muscle-tendon driven (MTD) systems different from classical robots with joints under rigorous motor control, where the motor positions directly translate into joint angles and into poses. Naively one could think that control is very easy, realized by just pulling the right strings (tendons) to get a desired arm pose. However, life is much more difficult due to a number of annoying effects. The most obvious effect is seen when tendons become slack so that contact with the physical state of the arm is lost altogether. This has to be avoided by keeping a permanent tension on the tendons, which poses another problem: the tension can only be achieved by tightening each tendon up against all the others, each individual tension being reported by the spring length. This means that (i) there are infinitely many combinations of tension forces for a single arm pose and (ii) the action of a single motor will be reflected in a change of spring length of all other muscles. In other words, actuating a single muscle is reflected by a pattern of sensory stimulation: a whole-body answer.

FIGURE 2 | Myorobotic arm (A), a single muscle element (B), and a dislocated shoulder (C). The dislocation happens promptly as soon as the tendons become slack.

Furthermore, the combination of friction effects and muscle-pose ambiguity leads to a hysteresis effect. After driving the arm by a sequence of motor commands from pose A to pose B and then moving back by reversing the motor commands, one ends up in a pose and muscle configuration different from A. In general, this makes the translation of a kinematic trajectory for the arm into motor programs difficult, even more so if loads and high velocities are involved. Also, the classical approach of learning a model by motor babbling becomes problematic because actions cannot be chosen independently.

We conducted several experiments listed in **Table 1** which demonstrate the essential features of the control scheme. All experiments are done with the same controller with the same initialization (C = 0) so that it is only the physical situation that differs between the experiments.

For a better understanding, we strongly recommend consulting the videos, which can be found at http://playfulmachines.com/MyoArm-1.

#### 3.2. Self-Regulated Working Regime

Before presenting the experiments in more detail, let us take a look at the sensorimotor coupling that is created by our controller. One of the crucial features is the self-regulation into a working regime where the tendons are kept under tension even in very rapid motions with notable loads. This is very important as it guarantees that the signals from the controller are executed in a definite way. As a result, in all experiments we never had to face a shoulder dislocation, see **Figure 2C**, which may happen promptly if tendons become loose. This is of some importance, as this sensible working regime emerges without any additional tuning or calibration of the system (Wittmeier et al., 2012). For that, the specific sensor configuration (Equation 9) seems to be important, but we have not yet studied it systematically and expect other configurations to work as well. A more rigorous analysis in terms of the local Lyapunov exponents will be given in Section 3.6 below.

#### 3.3. The Piloting Effect. Feeling the Embodiment

In the Introduction, we presented a thought experiment illustrating the main features of this work. We have not yet carried out this experiment with humans, but the scenario of getting piloted by the subsystem toward activities of strongest response is just what we observe with the learning arm for a series of very different objects, ranging from the pendulum to the wheel to the table-wiping setting. In each of those situations we could not only observe the piloting effect but also support it by quantitative analysis. Recall that any motion of the arm acts on the internal dynamics of the object, which reacts back on the arm via its force response, like the water hitting the wall of the bottle. These force responses modulate the sensor values (measuring the length of the tendons) and may become self-amplifying under the learning rule, as substantiated by the following arguments (which still need more theoretical support). The first point is that these signals, though tiny, may generically be systematic, building correlations over space and time. Examples are the slow swaying motion of the pendulum or the inertial motions of the water. As the DEP rule enhances correlations through the learning process, any systematic signal persisting over the time scale of learning contributes to the correlation pattern with an enhanced strength. In the experiments, the time scale set by τ was one second, about the same as that of the internal dynamics of the subsystems. This seems to be the main cause of the piloting effect. Furthermore, the learning system was seen to host, without preferences, a wide spectrum of attractors, giving rise to a kind of attractor morphing: the learning rule changes the dynamics such that the attractors continuously change, all modulated by the systematic force responses from the subsystem. In other words, the learning system has no resistance to being piloted into a resonance with the subsystem. The piloting by the subsystem is the leading mechanism in the experiments described in the following.

#### 3.4. Manipulability

The dominance of the physical responses makes the system manipulable, as any externally applied forces (like a physical human-robot interaction) leave their footprint in the sensor values via the changing spring tension. For instance, the arm can always be stopped by simply holding it. The reason is not that the motors are too weak. Instead, ẋ = 0 is a fixed point of the dynamics of the meta-system to which it relaxes if the mechanical degrees of freedom are frozen manually<sup>2</sup>.

Moreover, the system can be entrained by manual interaction into specific behaviors. We demonstrate this in the handshake experiment, see **Figure 3A** and Video 1 in Supplementary Material, where the user is trying to move the arm in a periodic pattern. Besides the possibility to train a robot in this way, the most interesting point is the subjective feeling that comes about when interacting with the robot. In the beginning of such an interplay, the robot seems to have a will of its own as it resists the motions the user is trying to impose. But after a short time the robot follows the human more and more and eventually is able (and "willing") to uphold the imposed motion by itself, see **Figure 4**. Otherwise, depending also on the human partner, the meta-system of robot and human may "negotiate" a joint motion pattern which might be left if the human quits the loop. This can be understood by realizing that any periodic pattern creates a fixed correlation pattern in Equation (7). If the imposed pattern matches one of the stable ones, the robot controls this pattern by itself. In fact, in the experiments, one can well observe that a "compliant" human is tempted to follow the system as much as their own intentions, ending up in an orchestrated human-machine dynamical pattern.

Training of a robot by directly imposing motions is not new. The common approaches generate a kinematic trajectory which is afterwards translated into the motor commands by well known engineering methods. This method may run into some difficulties due to the peculiarities of our MTD system discussed in Section 3.1. With DEP learning, imposing the patterns is a process of creative interaction with the system, see also the training of wiping patterns in Section 3.7.

#### 3.5. Emerging Modes

As already mentioned above, DEP learning as formulated in Equation (5) drives systems toward attractors in state space corresponding to fixed velocity correlation patterns in sensor space. The selection of a specific attractor may be realized by the self-amplification of a dynamical seed, generically provided by an initial perturbation, e.g., from gravitational forces or from tipping the arm.

#### 3.5.1. Self-Excited Pendulum Modes

In a first experiment, we suspend a weight (the bottle) from the tip of the arm, see **Figure 3B**. With the pivot point (arm) at rest, the pendulum may realize ellipsoidal or circular motion patterns with fixed frequency. In general, a pendulum with a moving pivot can perform chaotic motions under certain trajectories of the pivot point. With the pendulum attached to the MyoArm, the motions of the weight exert small inertial forces on the arm which change the spring tensions and thereby leave a footprint in the sensor values. To illustrate this point, **Figure 5** displays the sensor readings for the swinging pendulum with the motors being stopped. While tiny, these reactions are systematic, leading to the self-excitation of resonant modes according to the piloting effect described in Section 3.3 above.

<sup>2</sup>This effect involves the normalization factors and fades away once the regularization comes into play. After that, the system tries to move to the global attractor ẋ = x = 0.

coordination (F). All experiments are performed with the same controller.

In Video 2 (Supplementary Material) it can be seen<sup>3</sup> directly how latent velocity correlations are being amplified to end up in stable circular motion patterns of the pendulum. The experiment starts in a situation where the motor activities have settled to rest, interrupted by occasional bursts leaving irregular footprints in the sensor values. As to the piloting effect, we have to verify that, starting with this irregular behavior, the compound system is driven into a resonance with the pendulum and that this resonance behavior is dominated by the (tiny) force responses of the pendulum. This may be supported by analyzing the time lag between measured force and driving signal (motor commands). As shown by **Figure 6A**, the incipiently rather irregular phase relation is followed by a constant phase from time t > 40 on. This convergence to a stable mode is also seen by the time evolution of the controller matrix C, see **Figure 6C**.
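One way to estimate such a phase relation between two recorded signals is through their analytic signals, obtained with a discrete Hilbert transform. The sketch below is not necessarily the procedure behind Figure 6A. The function names are hypothetical, and the FFT-based construction assumes roughly stationary, narrow-band signals.

```python
import numpy as np

def analytic_signal(s):
    """Analytic signal of a real series via the FFT (discrete Hilbert transform)."""
    s = np.asarray(s, dtype=float)
    n = s.size
    spectrum = np.fft.fft(s - s.mean())      # remove the offset before transforming
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0                # Nyquist bin is kept once
        weights[1:n // 2] = 2.0              # positive frequencies are doubled
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)   # negative frequencies are zeroed

def phase_lag(force, motor):
    """Instantaneous phase of `force` minus that of `motor`, wrapped to (-pi, pi]."""
    dphi = np.angle(analytic_signal(force)) - np.angle(analytic_signal(motor))
    return np.angle(np.exp(1j * dphi))
```

A phase lag that fluctuates early in a recording and then settles to a constant value would indicate the kind of locking between force response and motor command described above.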

Let us consider here, as a further bit of analysis, the eigenvalue spectrum of the dynamical operator L = MC, which has proven very useful in this work. If the system obeyed the linearized dynamics, any cyclic behavior should be reflected by the existence of a pair of complex eigenvalues. There might be more such pairs if different frequencies are involved. Though questionable due to nonlinearities and deficiencies of the linear operator, this analysis may yield reliable results as seen in the pendulum case: **Figure 6B** clearly displays just such a pair of eigenvalues with absolute value (not shown) a little above one. All other eigenvalues have an absolute value significantly smaller than one, which makes the corresponding modes short-lived<sup>4</sup>. The latter point was investigated in terms of the local Lyapunov

<sup>3</sup>Note that later in the experiment, the string of the pendulum was shortened such that a different sensorimotor coordination emerges.

<sup>4</sup>This is true in particular for the other complex eigenvalue with roughly half the value, apparently belonging to a subharmonic, but this still needs some more analysis.

exponents, see Section 3.6 below, for remarks on that method. Apart from identifying the oscillatory modes, this eigenvalue analysis also confirms the substantial dimensionality reduction which is also known as a signature of self-organization.
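The spectral analysis described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `oscillatory_modes`, the toy matrices, and the tolerance are assumptions made for illustration, and M and C are assumed to have shapes such that MC is square. The sketch extracts the complex-conjugate eigenvalue pairs of L = MC and reports their modulus and angle per update step.

```python
import numpy as np

def oscillatory_modes(M, C, tol=1e-9):
    """Return (modulus, angle) for each complex-conjugate eigenvalue pair of L = M @ C.

    A modulus slightly above one marks a weakly unstable (self-excited) mode,
    a modulus well below one a short-lived mode. The angle is the phase advance
    per update step, so the period is 2*pi/angle steps.
    """
    L = M @ C
    eig = np.linalg.eigvals(L)
    # Keep one representative of each conjugate pair (positive imaginary part).
    pairs = eig[eig.imag > tol]
    return [(float(abs(mu)), float(np.angle(mu))) for mu in pairs]

# Toy example (hypothetical values): one rotating mode plus one decaying mode.
theta, rho = 0.2, 1.02
C_toy = np.array([
    [rho * np.cos(theta), -rho * np.sin(theta), 0.0],
    [rho * np.sin(theta),  rho * np.cos(theta), 0.0],
    [0.0,                  0.0,                 0.3],
])
M_toy = np.eye(3)
modes = oscillatory_modes(M_toy, C_toy)
print(modes)
```

In this toy case only one oscillatory pair is present, which mirrors the dimensionality reduction discussed in the text: a single pair near the unit circle dominates while the remaining mode decays.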

#### 3.5.2. Bottle Shaking Modes

In a next series of experiments we attached a bottle filled with some liquid to the tip of the arm in either horizontal or vertical orientation, see **Figure 3C**. These experiments are meant to support our hypothesis on the piloting effect, i.e., that, under the DEP learning rule, the emerging motion patterns are defined eventually by force responses of the subsystem. With the bottle, the force response is solely generated by the internal motions of the water, i.e., when the water is hitting either the walls or top and bottom of the bottle. Similar to the pendulum, starting with spontaneous movements, the arm soon reaches an oscillatory mode with strong force answers. In the experiment, the emerging shaking motions are indeed more or less aligned with the axis orientation of the bottle, see Videos 4, 5 in Supplementary Material, in correspondence to the piloting effect.

We also performed a more quantitative analysis by using parametric plots to characterize the state dynamics. Oriented at the arm's geometry, we identified two pairs of motor values (y1, y3) and (y6, y9) which are expected to be discriminating the direction of the arm movement, i.e., to have different phase relations for the horizontal and vertical arm movements, respectively. When plotting the time course of (y1, y3) and (y6, y9) in the plane, fixed phase relations translate into typical ellipsoidal figures. In **Figures 7C–F** we compare the phase relation for the horizontal and vertical setup (violet and orange line, respectively) for two behavioral modes (see **Figures 7A,B** for the time course and intervals) and indeed find that they are different and often orthogonal to each other. The emerging motion pattern is determined by the axis direction of the bottle, with the reactive forces of the water as the only information for that direction. Metaphorically, the robot can "read" the information about the nature of the environment by just getting into dynamical contact with the latter in a completely self-organized way.

In **Figures 7G,H** we present the time evolution of the matrix elements C<sub>3j</sub> representing the connections to motor unit 3. Starting from the zero initialization, one can see how correlations first build up due to the dynamics of the C matrix (Equation 5). The subsequent behavior is highly transient until convergence is (roughly) reached, after which the dynamics becomes more stationary. Any perturbation or change in conditions leads to an adjustment of the controller, always aiming for a mode where high velocity correlations appear.

## 3.6. Rotating a Wheel

A further example of the piloting mechanism (Section 3.3) and the discovery of dynamic object affordances (as discussed below) is the robot arm connected to a wheel, see **Figure 3D**. In Der and Martius (2015), the emergence of rotational modes was demonstrated for a humanoid robot with revolute joints and in simulation. With the MyoArm, we have a much more challenging situation. In the experiments, the tip of the arm is attached to the crank of a wheel, implemented as a revolvable bar with weights for giving it the necessary moment of inertia. In Video 6 (Supplementary Material), initially the connection between the arm and the wheel was rather loose so that for small movements there is no definite response from the rotation of the wheel. After improving this connection, an initial push by the experimenter was sufficient to excite a rotation mode that persists over time and is stable under mild perturbations. It is as if the controller "understood" how to rotate the wheel, although it is just the result of force exchange in combination with correlation learning, i.e., by the mechanism described in Section 3.3. When positioning the wheel in parallel to the arm, the modes were emerging even more readily as seen in Video 7 (Supplementary Material). Furthermore, the frequency of the system may be changed by changing just the time delay d as shown earlier (Martius et al., 2016).

For an analysis, we may use here the method of local Lyapunov exponents, obtained from the eigenvalues of the dynamical operator L = MC transforming sensor states x to x′ under the linearized dynamics. **Figure 8A** displays the results. The points of interest are the two largest exponents, which are slightly above zero. They represent the rotational mode. Being above zero means that they are actually unstable, which was to be expected given the slight destabilization of the system controlled by the parameter κ. However, the system dynamics is kept from exploding by the nonlinearities, so that the rotation modes are stable while all other modes have to die out, i.e., their Lyapunov exponents have to be below zero. It is also illustrative to consider the absolute change of the controller matrix as displayed in **Figure 8B** (top). At the beginning of a new mode the changes are large and then settle to a background level. When, for instance, the rotation is externally changed (seconds 40 and 71), a high rate of change is again observed. The coupling of the sensors to motors also changes qualitatively between the modes, as illustrated by the example of motor 6 in **Figure 8B** (bottom).
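For orientation, the relation between the eigenvalues and the exponents can be written out. This assumes the usual convention for a discrete-time linear map, which the text does not state explicitly; the symbols μ<sub>i</sub> and λ<sub>i</sub> are introduced here for illustration only.

```latex
x' = L\,x, \qquad L = MC, \qquad
\lambda_i = \ln\lvert \mu_i \rvert, \quad \mu_i \in \operatorname{spec}(L)
% |mu_i| > 1  <=>  lambda_i > 0  (growing mode, bounded only by the nonlinearities)
% |mu_i| < 1  <=>  lambda_i < 0  (decaying mode)
```

Under this convention, an eigenvalue pair with absolute value a little above one, as reported for the pendulum in Section 3.5.1, corresponds to a pair of exponents slightly above zero.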

The constitutive role of the body-environment coupling is also seen if a torque is applied to the axis of the wheel. Through this external force we may give the robot a hint of what to do. When in the fluctuating phase, the torque immediately starts the rotation which is then taken over by the controller. Otherwise, we can also "advise" the robot to rotate the wheel in the opposite direction. This can be considered as a kinesthetic training procedure, helping the robot in finding and realizing its task through direct mechanical influences.

Finally, these results can also be of interest for elucidating the spontaneous discovery of object affordances. Following Gibson's (1977) theory of affordances, object affordances are defined as a relation between an agent and its environment through its motor and sensing capabilities (e.g., graspable, movable, or eatable and so on). In this sense, in the same way as a chair affords sitting or a knob affords twisting, the wheel in our experiment affords rotating it, the bottle affords shaking and pouring and so on. This is of immediate interest for embodied AI as affordances are prerequisites for planning complex actions. Because our controller generates dynamic and typically oscillatory movements, it can only discover dynamic affordances, such as shaking, turning, etc., but will not find static ones such as sitting on a chair or leaning against a wall.

## 3.7. Wiping

In the case of the wheel setup, above, the embodiment strongly constrains the possible motion patterns. In the next setup the agent-environment coupling imposes a much milder restriction on the behavior: the robot is equipped with a brush and a table is placed in its work-space, see **Figure 3E**. The table height is about 5 cm above the initialized resting position. Video 8 in Supplementary Material demonstrates how, by the combination of the restricting table surface and the manual force, the robot is guided into the two-dimensional wiping mode. Actually, even without this guidance the system typically learns a wiping behavior, because movements perpendicular to the table are strongly damped such that the directions along the table plane may create the highest velocity correlation and thus dominate the generated motion patterns. Later in this video, the robot is forced by hand into a different behavior.

The analysis of the dynamics during this experiment revealed that the wiping patterns were not stationary as it appeared in the video, but are actually slowly drifting. We devised a method to quantify such high-dimensional oscillatory behavior. It considers the phase difference between the different degrees of freedom. For each oscillatory signal we can associate a phase variable that continuously runs from −π to π using the Hilbert transform. Now we can compute the phase difference between the signals from different sensors, for instance. Post-processing is applied to avoid unnecessary 2π phase jumps and to smooth the signal for better visibility.
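The phase analysis described above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `phase_difference`, the mean removal, and the moving-average smoothing are choices made here, and the test signals are synthetic.

```python
import numpy as np
from scipy.signal import hilbert

def phase_difference(a, b, smooth=1):
    """Unwrapped phase difference (in radians) between two oscillatory signals."""
    # Remove the mean so that the analytic signal circles the origin.
    za = hilbert(a - np.mean(a))
    zb = hilbert(b - np.mean(b))
    # The angle of za * conj(zb) equals phase(a) - phase(b), wrapped to (-pi, pi].
    dphi = np.angle(za * np.conj(zb))
    # Post-processing: remove artificial 2*pi jumps ...
    dphi = np.unwrap(dphi)
    # ... and optionally smooth with a moving average (edges are distorted).
    if smooth > 1:
        kernel = np.ones(smooth) / smooth
        dphi = np.convolve(dphi, kernel, mode="same")
    return dphi

# Synthetic check: two 2 Hz sinusoids with a fixed lag of pi/3.
t = np.linspace(0.0, 10.0, 2000, endpoint=False)
a = np.sin(2 * np.pi * 2.0 * t)
b = np.sin(2 * np.pi * 2.0 * t - np.pi / 3)
dphi = phase_difference(a, b)
print(float(np.median(dphi)))
```

For a stable oscillation the returned curve stays flat; a slow drift of the curve is the signature of the morphing behavior discussed in the following paragraphs.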

In a stable oscillation, the phase difference should stay constant over time. In **Figure 9A**, these phase differences are presented for the wiping experiment. One can see that already before manual interaction, the meta-system is in a transient behavior, with phase relations changing slowly over time. We interpret this as a wandering through the metastable cyclic attractors induced by the learning dynamics. We may also call this a self-induced attractor morphing. During interaction (second 11 onward) the changes are initially stronger, fading out later. After releasing the arm (second 22), the behavior persists for a few seconds and then drifts away again. The corresponding controller matrices also show a significantly different structure in the course of the experiment. With the phase analysis using the Hilbert transform we can thus analyze pseudo-stationary high-dimensional motion patterns, and we believe this method is also helpful for analyzing other systems where attractor morphing occurs.

So, what appeared as stationary actually was a transient behavior. As explained above, there is a potentially infinite reservoir of attractors in C-space, with the learning dynamics

FIGURE 7 | Horizontal and vertical bottle shaking experiment. Depicted are the time traces of the motor values for the horizontal setup (A), see Video 4 in Supplementary Material, and the vertical setup (B), see Video 5 in Supplementary Material. At the marked regions (gray and red bar) both setups are compared in (C–F) with respect to their motor relation (motor 1 vs. 3 and 6 vs. 9). It is visible that the motions in both setups are mostly orthogonal to each other. (G,H) shows the evolution of the coupling of the 18 sensors to muscle 3 over time (corresponding to row 3 in C). In both cases the system starts at C = 0. In the horizontal case the arm was stopped and released at times indicated by vertical lines.

slowly and continuously morphing these attractors. Being more or less a speculation so far, this opens a view into a fascinating species of dynamical systems generated by the learning rule in specific agent-environment couplings. Moreover, this also should substantially improve our understanding of the edge of chaos hypothesis as an overarching concept.

Otherwise, by simply storing the weights (C) of the controller, these patterns can be collected into a repertoire. Video 9 in Supplementary Material shows the recall of and switching between such wiping modes, see **Figure 9B**. For the transition into a different mode the controller was changed abruptly; nevertheless a smooth transition into the new behavior occurs, suggesting that most static controllers have a large basin of attraction.

(bottom) and the controller matrices (top) (times, see green dots). See corresponding Video 9 in Supplementary Material. Observe the transients between the behaviors, which are sometimes long, e.g., 15 s for controller 4.

## 3.8. Hand-Eye Coordination

In the previous experiments, the sensorimotor loop was closed in proprioceptive space alone, muscle lengths and tensions generating muscle feelings with the ensuing piloting effect, see Section 3.3. This section investigates the integration of additional sensors given by a camera reporting the spatial coordinates of a green colored object connected to the tip of the arm, called the fist in the following. The camera was positioned to observe the arm from the front, see **Figure 3F**, but other positions would also work. The x − y coordinates of the object are obtained from the green pixels' center of gravity, whereas the z coordinate is given by the size of the pixel cluster. These coordinates are scaled between −1 and +1 like all the other sensors. To better compete with the 9 proprioceptive sensors, the corresponding synaptic weights were multiplied by a factor of 3 (before normalization). No other measures were taken; in particular, all entries for the vision channels in the model matrix M were set to zero in accordance with the least biasing commitment described in Section 2.2. In the experiments, we observed that the robot engaged in all kinds of trajectories similar to those of the purely proprioceptive case, i.e., as if the camera were not present. However, a simple inspection of the C matrix reveals a strong involvement of the vision channels in the generation of the modes, see the red-framed rows in **Figures 10C,D**. The constitutive role of the camera can also be seen in the following experiment.
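The construction of the three vision sensor values can be sketched as below. The text only states that x and y come from the center of gravity of the green pixels, that z is given by the size of the pixel cluster, and that all values are scaled to the range −1 to +1. Everything else is an assumption for illustration: the function name `vision_sensors`, the boolean input mask, the linear scaling, and the parameter `max_area` (the pixel count treated as the closest possible view). The weight scaling by a factor of 3 is not covered here.

```python
import numpy as np

def vision_sensors(mask, max_area):
    """Map a boolean mask of 'green' pixels to three sensor values in [-1, 1].

    mask     : 2D boolean array (True where a pixel was classified as green);
               assumed to be larger than one pixel in each direction
    max_area : hypothetical pixel count that is mapped to z = +1
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # object not visible, e.g., covered by a cap
    # Center of gravity of the green pixels, rescaled from pixel indices to [-1, 1].
    x = 2.0 * xs.mean() / (w - 1) - 1.0
    y = 2.0 * ys.mean() / (h - 1) - 1.0
    # Cluster size as a proxy for distance, clipped and rescaled to [-1, 1].
    z = 2.0 * min(xs.size / max_area, 1.0) - 1.0
    return np.array([x, y, z])

# Tiny example: a single green pixel in the middle of a 5 x 5 image.
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
sensors = vision_sensors(mask, max_area=4)
print(sensors)
```

Returning `None` for an empty mask is one possible convention; in the dummy-fist experiment described later, the sensors are instead held at their last values when the fist is covered.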

#### 3.8.1. Adaptation to Sensor Transformations—Rotating the Camera

In this setting we rotate the camera about its optical axis while the system is running and DEP learning is on, with a time scale of a few seconds. Initially the camera is rotated about its axis to -90 degrees, see **Figure 10E**. When a relatively stable motion occurs (limit cycle), the camera is slowly rotated to a normal orientation (0 degrees). During that process, the motion pattern of the arm changes until, after stopping the camera rotation, a new attractor behavior is reached. Together with **Figure 10** this shows that the emerging patterns are generated with the camera closely integrated<sup>5</sup> . Eventually, upon rotating the camera further to +90 degrees, the motion of the arm even stops until, after about 15 s, a new consistent behavior emerges, see Video 10 in Supplementary Material and **Figure 10**. The experiment shows that DEP learning generates motion patterns with the camera tightly integrated, i.e., proprioceptive and vision channels are strongly mixed. We remark that readaptation and reorganization of behavior takes place on a time scale of a few seconds.

#### 3.8.2. Hand-Eye Coordination. Emerging Central Pattern Generator

As discussed above, DEP learning potentially integrates all sensor channels, converging toward a fixed point in correlation space which corresponds to a periodic motion pattern in state space. This is seen from the parametric plots in **Figure 11C**, first row displaying a proprioceptive vs. one of the vision channels. Despite the strong perturbations in the complex physical setting, a distinct phase relation between vision and proprioception is seen. This is another corroboration of the integrative strength of DEP.

In a next experiment, we investigate the acquired sensorimotor mappings in more detail. During learning the camera delivers a periodic trajectory in a 3D space, correlated tightly with proprioception. What if we substitute the camera coordinates by those of a fake, or virtual, trajectory? In the experiment, we wait until the system, with the camera included, has settled into a stable motion pattern. Then, we freeze the controller matrix C and cover the fist with a white cap making

<sup>5</sup>During a periodic motion pattern the controller matrix C stays relatively constant, i.e., a fixed point in correlation space is reached.

it invisible to the camera's green object detector so that the vision sensors are frozen. Now we use a dummy fist (green ball attached to a stick) to generate camera coordinates by hand, see **Figures 11D,E** for a normal and a dummy fist camera view, respectively.

As demonstrated in Video 11 (Supplementary Material), moving the dummy generates defined movements of the arm, although the arm would typically not follow the dummy if it is arbitrarily moved. However, if the dummy is moved along a path similar to the original movement, the arm follows the dummy; it can even be driven into trajectories with various velocities and can be stopped deliberately, see Video 11 in Supplementary Material. In **Figure 11A** the time trace of one of the vision sensors and a proprioceptive sensor for the course of the experiment visualizes this behavior. By comparing the parametric plots in **Figures 11B,C**, first and second row, we confirm the similarity between the original and the virtual camera trajectory. On the other hand, **Figures 11B,C**, third row, shows that a different relation between the sensors occurs if the dummy trajectory is in the opposite direction.

Another interesting point is that behaviors can not only be replayed and combined, as demonstrated in the wiping case, but also be driven by virtual trajectories with (moderately) varying shapes and velocities. This can be operationalized for deliberate control. For instance, a central pattern generator could be used to generate the virtual trajectory, giving the opportunity to systematically vary frequency and shape of the emerging behaviors. Furthermore, the emergence of hand-eye coordination and the possibility to deliberately control the arm using virtual trajectories could be of some interest for the development in infants during Piaget's first phase.
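As a rough illustration of how a pattern generator could supply such a virtual trajectory, the sketch below uses a single phase oscillator as a stand-in for a central pattern generator. This is not part of the reported experiments: the function name `virtual_trajectory`, the amplitudes, and the phase offsets are hypothetical, and in practice they would have to be matched to the trajectory observed during learning.

```python
import numpy as np

def virtual_trajectory(n_steps, dt, freq,
                       amp=(0.5, 0.5, 0.2),
                       phase=(0.0, np.pi / 2, 0.0)):
    """Generate a periodic 3D trajectory (n_steps x 3) from one phase oscillator.

    freq  : oscillation frequency in Hz, the parameter to vary systematically
    amp   : per-coordinate amplitudes; values stay in [-1, 1] if |amp| <= 1
    phase : per-coordinate phase offsets shaping the trajectory
    """
    t = np.arange(n_steps) * dt
    theta = 2.0 * np.pi * freq * t          # common oscillator phase
    amp = np.asarray(amp, dtype=float)
    phase = np.asarray(phase, dtype=float)
    # Each vision coordinate is a sinusoid of the common phase.
    return amp * np.sin(theta[:, None] + phase)

# One second of a 1 Hz trajectory sampled at 100 Hz.
traj = virtual_trajectory(n_steps=100, dt=0.01, freq=1.0)
print(traj.shape)
```

Feeding such values into the vision channels in place of the camera coordinates would correspond to the dummy-fist procedure, with frequency and shape set by `freq`, `amp`, and `phase` rather than by hand.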

## 3.9. Perspectives for Goal Oriented Behavior

Though this paper is devoted to robotic self-organization, let us make a remark on generating user-chosen behaviors. The basic idea is the following: the classical control setting is a two-level hierarchy where the goal-driven controller is applied directly to the low-level PID controller realizing the action execution. Here, we advocate the inclusion of a third, intermediary level, meaning that the higher-level controller realizes its goals by manipulating the above-mentioned meta-system with its wealth of latent behaviors waiting to be excited. How this could be effectively done is still to be investigated. However, the potential success of this extended hierarchy of control is suggested by the experiments. In fact, if we are able to influence the meta-system by hand, why not by just superimposing additional motor signals on the self-regulated meta-system? The use of the approach is encouraged by the mentioned ability of the meta-system to uphold a resilient working regime even under extreme external perturbations, preventing, for instance, shoulder dislocations.

FIGURE 11 | Experiment with camera input. Hand-eye coordination and tracking. (A) A proprioceptive sensor x<sub>3</sub> and a vision sensor x<sub>11</sub> (up-down direction) over time. The vertical line indicates when the fist was covered with a cap (see E). Black bars indicate time intervals used in (B,C). The yellow bar indicates the cut out part of the corresponding Video 11 in Supplementary Material. (B) Trajectory in vision sensor space for different parts. Left: original movement (with normal camera sight, D), middle: two similar driven behaviors, right: inverted direction movement. (C) The same trajectory relating vision to proprioception sensors x<sub>11</sub> → x<sub>3</sub> and x<sub>10</sub> → x<sub>6</sub>. (D,E) Camera picture for normal and dummy-fist case.

# 4. DISCUSSION

This paper is seen as a further step toward a general theory and practical realization of self-organization (SO) for embodied AI. There are many facets to such a general idea worth investigating. In many cases, SO is considered as either self-exploration for scrutinizing the gross properties of the system (to be deliberately controlled afterwards), or (wishfully) used for the acquisition of behavior primitives. While SO is often dismissed as superfluous, to be replaced by well known methods like motor babbling, it definitely has its realm if systems become larger. This has been demonstrated by a number of successful examples (Der and Martius, 2012, 2013, 2015; Der, 2016) attributing to SO a much wider range of applicability. We claim that the results of this paper are a further step as they extend that range to composed systems consisting of the actual robot connected to a subsystem with unknown internal dynamics. In the paper we ask how a robot may establish dynamical contact with a subsystem, eventually recognizing its identity, if there is no information or model of the subsystem's inner dynamics. Humans seem to have no problems there as they develop a feeling, by their muscle tensions, for the reactions of the subsystem. However, it is not clear what this subjective feeling is and how it is used for controlling the interacting system.

As a first insight offered by our DEP controlled robot, we note that the artificial system does not need any curiosity or other higher level concepts for producing the observed human-like behaviors. Guided by the similarity between our anthropomorphic robot and human behavior, we may question the ontological status of these higher level concepts also in humans. Furthermore, we could reveal a very subtle but dominating effect: by the mere feedback through the internal dynamics of the object, the robot is learning to answer with a very specific sensorimotor pattern to each of the objects. So, the robot discovers the identity of the attached object without knowing anything of its dynamical properties, which may be very complex like the water in the bottle. This may be a further example of how the robot can both model and substantiate concepts from cognitive science, here Gibson's object affordances. Furthermore, as we could uncover by the analytical tools developed in this work, the emergence of the combined mode and the eventual identification of the attached object, by establishing dynamical contact, is explained by a subtle mechanism which we call piloting.

Unfortunately, due to the high complexity of the system and the subtlety of the effect, a rigorous mathematical analysis is not possible so far. Nevertheless, using some concepts of dynamical system theory, we could establish tentative findings. By keeping the system at the border to instability we find a potentially infinite reservoir of (limit cycle) attractors "waiting" to be excited. Besides converging toward one of these attractors, the rich reservoir of further phenomena could possibly be related to concepts like attractor meta-dynamics (Gros, 2015; Sándor et al., 2015), the so called meta-transients (Negrello and Pasemann, 2008) and the mentioned self-induced attractor morphing. Altogether, these concepts may serve as a characteristic for self-organized behavior in the sensorimotor loop, possibly endowing even the edge of chaos concept with a new realm. Here again, we emphasize that a sound mathematical analysis of these concepts, which is still outstanding, could more reliably reveal their enormous potential for constructing such self-learning machines with their creative properties.

It is also important to note that "reading" the object's properties through the mere feedback from its internal dynamics is a direct consequence of those dynamical system properties. Considering the similarity with human behavior again, we may ask if humans also work in this dynamical regime at the border of instability and what the possible consequences are. It must be left to future work to reveal the thereby expected cross fertilization between robotics and cognitive science. Furthermore, the spontaneous identification of dynamical object affordances may be also of some interest for both robotics and embodied AI.

In short, we claim that experimental investigation with anthropomorphic, self-learning robots not only generates interesting behaviors in complex robotic systems. It may also help to better understand what subjective human feelings of physical interactions are, how they can be rooted in sensorimotor patterns, and how these concepts may feed back onto robotics. Hopefully, this knowledge may eventually help building machines that are as close to humans as possible.

Last but not least, let us briefly compare our results with the literature on SO in robotics. While this paper focuses on the SO of behavior for robots of a given morphology, much of the literature is devoted to SO for self-assembling and self-repairing (Murata and Kurokawa, 2012), and eventually self-replicating (Griffith et al., 2005) systems. Very influential for the topic is the paper by Pfeifer et al. (2007) presenting the whole spectrum of bioinspired robotics. The central idea is that control is outsourced to the morphological and material properties, see also Hauser et al. (2012), Pfeifer and Gómez (2009), Paul (2004), Pfeifer and Bongard (2006), Pfeifer and Scheier (1999), and Pfeifer et al. (2012). This is in line with our work, as our controller is developing everything from the interplay with the physics of the system. However, to our knowledge previous work does not reach robots of such complexity as demonstrated here. Related to our work is the multiple attractor concept (Tani and Ito, 2003; Gros, 2015; Sándor et al., 2015), which was not yet applied to real robots. Another body of literature exists on SO in swarms (Bonabeau et al., 1997, 1999; Rubenstein et al., 2014; Blum and Groß, 2015) to get swarm intelligence (Engelbrecht, 2006; Nouyan et al., 2008), but there is no relation to our work which is devoted to the development of individual robots.

#### AUTHOR CONTRIBUTIONS

RD and GM conceived and conducted the experiments. GM analyzed the data. RD and GM wrote the paper.

#### ACKNOWLEDGMENTS

We thank Alois Knoll for inviting us to work with the Myorobotic arm-shoulder system at the TUM. Special thanks go also to Rafael Hostettler for helping us with the robot and control framework. GM received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. [291734].

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00008/full#supplementary-material

#### REFERENCES

Bialek, W., Nemenman, I., and Tishby, N. (2001). Predictability, complexity and learning. Neural Comput. 13:2409. doi: 10.1162/089976601753195969


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Der and Martius. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Neural Dynamic Architecture for Reaching and Grasping Integrates Perception and Movement Generation and Enables On-Line Updating

#### Guido Knips <sup>1</sup> , Stephan K. U. Zibner <sup>1</sup> , Hendrik Reimann<sup>2</sup> and Gregor Schöner <sup>1</sup> \*

<sup>1</sup> Institute for Neural Computation, Ruhr-University Bochum, Bochum, Germany, <sup>2</sup> Department of Kinesiology, Temple University, Philadelphia, PA, USA

Reaching for objects and grasping them is a fundamental skill for any autonomous robot that interacts with its environment. Although this skill seems trivial to adults, who effortlessly pick up even objects they have never seen before, it is hard for other animals, for human infants, and for most autonomous robots. Any time during movement preparation and execution, human reaching movements are updated if the visual scene changes (with a delay of about 100 ms). The capability for online updating highlights how tightly perception, movement planning, and movement generation are integrated in humans. Here, we report on an effort to reproduce this tight integration in a neural dynamic process model of reaching and grasping that covers the complete path from visual perception to movement generation within a unified modeling framework, Dynamic Field Theory. All requisite processes are realized as time-continuous dynamical systems that model the evolution in time of neural population activation. Population level neural processes bring about the attentional selection of objects, the estimation of object shape and pose, and the mapping of pose parameters to suitable movement parameters. Once a target object has been selected, its pose parameters couple into the neural dynamics of movement generation so that changes of pose are propagated through the architecture to update the performed movement online. Implementing the neural architecture on an anthropomorphic robot arm equipped with a Kinect sensor, we evaluate the model by grasping wooden objects. Their size, shape, and pose are estimated from a neural model of scene perception that is based on feature fields. The sequential organization of a reach and grasp act emerges from a sequence of dynamic instabilities within a neural dynamics of behavioral organization that effectively switches the neural controllers from one phase of the action to the next. Trajectory formation itself is driven by a dynamical systems version of the potential field approach. We highlight the emergent capacity for online updating by showing that a shift or rotation of the object during the reaching phase leads to the online adaptation of the movement plan and successful completion of the grasp.

#### Edited by:

Christian Tetzlaff, Max Planck Institute for Dynamics and Self Organization (MPG), Germany

#### Reviewed by:

Florentin Wörgötter, University of Göttingen, Germany Thomas Wennekers, Plymouth University, UK

> \*Correspondence: Gregor Schöner gregor.schoener@ini.rub.de

Received: 02 November 2017 Accepted: 14 February 2017 Published: 02 March 2017

#### Citation:

Knips G, Zibner SKU, Reimann H and Schöner G (2017) A Neural Dynamic Architecture for Reaching and Grasping Integrates Perception and Movement Generation and Enables On-Line Updating. Front. Neurorobot. 11:9. doi: 10.3389/fnbot.2017.00009

Keywords: neural dynamics, dynamic field theory, autonomous reaching, autonomous grasping, online updating

# 1. INTRODUCTION

Object-oriented reaching and grasping in natural settings, a key element of human-robot cooperation, continues to be a challenge for autonomous robots (Herzog et al., 2012). Humans grasp and handle objects fluently, of course, although these are among the harder movement tasks, learned in infancy (Thelen et al., 1996), but with continued development for close to 10 years of life (Schneiberg et al., 2002). Humans easily reach and grasp objects that they see for the first time or that are partially occluded. They may grasp an object after closing their eyes. Anytime during movement preparation or execution, humans may update the motor plan when the object shifts or rotates (Desmurget and Grafton, 2000). This performance entails, in humans, a close coupling among perceptual processes including gaze control, shift of attention, segmentation, recognition, and pose estimation of the object, as well as between perception and motor processes including initiating, coordinating, and terminating reach and grasp movements.

Robotic approaches to grasping (reviewed in Carbone, 2013) have traditionally made strong demands on what perception delivers, often based on object models. Except for visual servoing, those approaches are most appropriate for static situations with well-known objects. In contrast, recent work has employed simpler perceptual processes that deliver fast estimates of pose and grasp parameters and enable grasping objects that move on a conveyor belt (Cowley et al., 2013). Another recent line of work learns to extract grasp parameters that are linked to probabilistic models, which enable generalization beyond the trained poses and lead to impressive real-time grasping performance (Huang et al., 2013). Related work learns grasp primitives from demonstration (Herzog et al., 2014), from exhaustive simulation (Curtis and Xiao, 2008), from examples of object categories (Madry et al., 2012), or based on tactile feedback (Platt et al., 2006). Explicit modeling of the uncertainty of grasp parameters provides a potential solution (Li et al., 2016).

This paper is based on two hypotheses. First, we think that we may learn from how humans generate reaching and grasping movements. For instance, as a major theme that we address here, we believe that reaching and grasping are possible in humans with much simpler, lower-level perceptual representations than traditionally assumed in autonomous robotics. The perceptual processes engage attention and enable continuous online coupling to the sensory surface. Another example is at the level of control: The nature of actuation through muscles that act as relatively soft, tunable springs makes it possible to grasp without a precise estimate of grasp points. It is enough to set the equilibrium length of muscles in the hand to a posture inside the object, and the muscles will then generate grip forces through their peripheral reflex loops (Santello et al., 2016). In this paper, we address the first, but not yet the second idea.

The other hypothesis is, in a sense, the converse. Many of the neural processes underlying human movement that is directed at objects have not yet been comprehensively understood in neuroscience (Andersen and Cui, 2009; Lisman, 2015). This means that neurally based process models do not stand ready to be imported into robotics. But this also means that how the component processes work together in the nervous system needs to be better understood. Integrated models demonstrate reaching and grasping in neurally grounded ways that may make a contribution to understanding neural function.

Our research agenda is thus to build an integrated model of reaching and grasping based on neural process accounts inspired by the human mind. We do this based on the theoretical framework of Dynamic Field Theory (DFT, see Schöner, 2008 for an introduction, Schöner et al., 2015 for a systematic tutorial), a neurally grounded set of concepts that address visual representations, coordinate transforms, attentive selection, working memory, and behavioral organization. To build and implement a complete model of reaching for and grasping novel objects, we propose a neurally inspired computational architecture.

All processes are modeled as neural dynamics, so that the entire architecture is essentially one big dynamical system. The theoretical framework of Dynamic Field Theory (DFT) provides the means to represent information, to perform detection and selection decisions, to model attention, to track time-varying input, and to store information in working memory. Instabilities of the neural dynamics create, from time-continuous processes, the discrete events at which processes are initiated and terminated (Sandamirskaya et al., 2013). The neural dynamics interfaces with attractor dynamics that generate movements and control the robotic arm and hand (Reimann et al., 2011). The model builds on earlier work on scene representation (Zibner et al., 2011a), and on the simultaneous recognition of objects and estimation of their pose (Faubel and Schöner, 2009). We show how neural dynamics enable integrating and organizing all component processes, from perception to the initiation and termination of robotic movements (Richter et al., 2012).

The approach is tested on a robotic agent called CAREN, which consists of a Kuka LWR4 with seven degrees of freedom and an attached Schunk Dextrous Hand (SDH) that features an additional seven degrees of freedom and tactile sensors. The arm is mounted on a Schunk PR 90 rotary module with one degree of freedom. We use a Kinect camera to perceive the scene (see **Figure 1**).

This work is innovative in two different ways. On the one hand, this work is part of a research program in which robotic demonstrations are used to evaluate theoretical models of human cognition and behavior (Adams et al., 2000). Neural dynamics is a theoretical perspective within this program in which process models are formulated that may be linked to real sensory and motor systems (Erlhagen and Bicho, 2006). Previously, neural dynamics has been used to demonstrate reaching (Strauss and Heinke, 2012; Fard et al., 2015; Strauss et al., 2015). We expand on this work by including the autonomous sequential organization of the behavior and addressing grasping as well. Ours is one of the first demonstrations that cover the complete path from sensing to acting in a difficult task that includes attention, recognition, estimation, executive control, movement planning, and control. In this demonstration, we integrate four separate neural dynamics models of component processes for scene representation (Zibner et al., 2011a), object classification with concurrent pose estimation (Faubel and Schöner, 2009), behavioral organization (Richter et al., 2012), and movement generation (Reimann et al., 2011).

On the other hand, in direct comparison to approaches to grasping that are unconstrained by analogies with human cognition, the strength of the present work is the capacity to accommodate online updating to changing sensory information, while at the same time addressing the sequential organization of behavior and perception. For instance, work like Huang et al. (2013) has powerful online updating of the grasping action itself, but has a highly simplified perceptual system and limited behavioral flexibility. We think of online updating as a characteristic and attractive property of the neural organization of reaching and grasping, and this is why we focus on demonstrating it here.

# 2. METHODS

We begin by providing an overview of the component processes involved in autonomous grasping and the overall flow of activation in the neural dynamics architecture (**Figure 2**). Perception (on the left) consists of scene representation and object recognition. Scene representation entails the processes of visual exploration, which sequentially attends to subregions of the scene that may contain objects and commits an estimate of local height at each attended location to working memory. Visual exploration is a precondition of the query behavior, which processes a cue that defines a target object, brings matching locations into the attentional foreground, and thus enables the process of object recognition to take over. Object recognition entails two interacting processes, shape classification and pose estimation. Shape classification determines the type of grasp that will be used for the current target object, while pose estimation specifies parameters of the reach and the grasp such as hand orientation. Once both processes have converged, a sequence of actions executes the grasp (illustrated on the right). Initially, two behaviors are activated: "Open hand" does what the name suggests and "approach" drives the hand to a point close to the target object while orienting the hand based on a pose estimate. After both behaviors are completed, the "grasp" behavior moves the fingers. Up to that point, online updating of the classification and pose estimation processes is possible; after this point, online updating is suppressed. After detecting contact of the hand on the object's surface through tactile feedback, the "lift" behavior is activated, which raises the arm with the grasped object upwards from the table surface.

Although this description suggests that the individual behaviors and processes are separate modules, in reality they are all just subsets of one large system of differential and integro-differential equations, the neural dynamics, whose solutions evolve continuously in time. These equations are coupled internally according to the architecture and to online sensory inputs. Online updating is thus a pervasive property of the architecture and of neural dynamics approaches in general. We now take a closer look at the elementary building blocks of the architecture to illustrate how neural dynamics and, specifically, DFT, organize the interaction of the behaviors and processes.

## 2.1. Dynamic Neural Fields

Dynamic neural fields are the building blocks of Dynamic Field Theory (DFT). Continuous neural activation patterns, u(x, t), defined over a feature dimension, x, evolve in time according to an integro-differential equation that has been proposed as a simplified model of cortical neural dynamics (Amari, 1977):

$$
\tau \dot{u}(x, t) = -u(x, t) + h + s(x, t) + \int w(x - x')\, \sigma(u(x', t))\, dx'.
$$

Here, τ determines the time scale on which activation evolves. The −u term endows this neural dynamics with the fundamental stability mechanism that creates different kinds of attractor solutions under different conditions. The attractor at the resting level, h < 0, is stable in the absence of external input, s(x, t). When such input from other neural fields or from sensory surfaces remains small, the attractor is shifted to h + s(x, t). When inputs become sufficiently strong so that this solution reaches the threshold given by the sigmoidal nonlinearity, σ(u) = 1/(1 + exp(−βu)), this attractor becomes unstable. The system switches to a new attractor state, a localized peak of activation that is sustained by local excitatory and global inhibitory interaction characterized by the interaction kernel, w(Δx). The instability at which a switch to such a self-stabilized peak solution occurs is the detection instability, used to implement detection decisions in DFT. Localized peaks become unstable at the reverse detection instability at lower levels of input. Multi-modal inputs may lead to the formation of a self-stabilized peak at a single location in the field. This is how selection decisions are realized in DFT. Under appropriate conditions (for resting level and interaction strength), self-stabilized peaks may remain stable once the inducing localized input, s(x, t), is removed. Such sustained peaks of activation are the model of working memory in DFT. Dynamic fields may be analogously defined over multi-dimensional spaces. See Schöner et al. (2015) for a systematic exposition of the mathematical and conceptual structure of DFT. The stability regimes described here depend, of course, on parameter values. Typical values of the main parameters of the neural field dynamics used throughout the architecture are: τ = 100 ms, β = 100, h between −15 and −5, global inhibition between 0.01 and 0.5, excitatory interaction 1, width of excitatory interaction kernel between 3 and 5.
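The detection instability and the sustained-peak (working memory) regime can be illustrated with a small numerical experiment. The following is a minimal sketch, not the implementation used in the architecture: it discretizes the field equation on a one-dimensional grid with forward Euler steps, and the grid size, kernel strengths, input shape, and input amplitudes are illustrative assumptions chosen so that a self-sustained peak exists.

```python
import math

def sigmoid(u, beta=100.0):
    """Numerically safe logistic function 1 / (1 + exp(-beta * u))."""
    z = beta * u
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gaussian(d, width):
    return math.exp(-d * d / (2.0 * width * width))

def simulate_field(input_amplitude, n=41, tau=100.0, dt=10.0, h=-5.0,
                   c_exc=2.0, c_inh=0.3, kernel_width=3.0,
                   input_width=3.0, t_input=1000.0, t_total=2000.0):
    """Euler integration of the field equation on n grid points (times in ms).

    A Gaussian input centered on the field is applied for t_input ms and
    then removed. All parameter values are illustrative assumptions.
    Returns the activation at the end of the input phase and at the end
    of the simulation.
    """
    center = n // 2
    # Interaction kernel: local excitation minus global inhibition.
    kernel = [[c_exc * gaussian(i - j, kernel_width) - c_inh
               for j in range(n)] for i in range(n)]
    stimulus = [input_amplitude * gaussian(i - center, input_width)
                for i in range(n)]
    u = [h] * n
    u_with_input = None
    for step in range(int(t_total / dt)):
        input_on = step * dt < t_input
        out = [sigmoid(v) for v in u]
        u = [u[i] + dt / tau * (-u[i] + h
                                + (stimulus[i] if input_on else 0.0)
                                + sum(kernel[i][j] * out[j] for j in range(n)))
             for i in range(n)]
        if input_on:
            u_with_input = list(u)
    return u_with_input, u

# Weak input: activation stays below threshold and returns to rest.
weak_on, weak_off = simulate_field(4.0)
# Strong input: a peak forms and persists after the input is removed.
strong_on, strong_off = simulate_field(7.0)
print(max(weak_on) > 0, max(strong_on) > 0, max(strong_off) > 0)
```

With these assumed values, the weak input only shifts the resting-level attractor, whereas the strong input pushes the field through the detection instability. Lowering `c_exc` or raising `c_inh` sufficiently would move the field out of the working memory regime, so that the peak decays once the input is removed.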

## 2.2. Neural Dynamics of Behavioral and Process Organization

Zero-dimensional neural activation fields are essentially discrete activation nodes described by a differential equation analogous to the neural field equation of Section 2.1:

$$\tau \dot{u}(t) = -u(t) + h + s(t) + \omega \sigma(u(t))$$

This dynamics may have an "off" attractor at negative levels of activation, and an "on" attractor at positive levels of activation. The "off" attractor may disappear in a detection instability at sufficiently high levels of input, s. The "on" attractor may disappear in a reverse detection instability at sufficiently low levels of input, s. Both attractors may co-exist bistably for intermediate levels of input. Such nodes are used in DFT to represent the activation and deactivation of categories, processes, or behaviors. For the organization of processes and behaviors, pairs of such activation nodes form an executive control unit (ECU, see Richter et al., 2012). When the intention node of an ECU is "on," it provides spatially homogeneous excitatory input (a "boost") to the parts of the architecture that are responsible for executing an associated process or behavior. The Condition of Satisfaction (CoS) node is activated when sensory or internal inputs are detected that indicate the completion of a process or behavior. CoS nodes inhibit the intention node, turning "off" the associated process or behavior. A third node may be joined to an ECU to represent a working memory of CoS activation, which maintains a record of the past completion of a processing step. Typical values of the parameters of the neural dynamic nodes used throughout the architecture are: τ = 100 ms, β = 100, h between −1 and −2, global inhibition 0.01.
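The bistability of a single node and the interplay of an intention node with its CoS node can be sketched numerically. The example below is an illustration under stated assumptions rather than the authors' implementation: the self-excitation strength `w_self`, the coupling strengths `c_pre` and `c_inh`, and the input levels are hypothetical values chosen to make the two instabilities visible.

```python
import math

def sigmoid(u, beta=100.0):
    """Numerically safe logistic function 1 / (1 + exp(-beta * u))."""
    z = beta * u
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def step(u, s, h=-2.0, w_self=3.0, tau=100.0, dt=10.0):
    """One Euler step of tau * du/dt = -u + h + s + w_self * sigmoid(u)."""
    return u + dt / tau * (-u + h + s + w_self * sigmoid(u))

def settle(u, s, n_steps=200):
    """Let a single node relax under constant input s."""
    for _ in range(n_steps):
        u = step(u, s)
    return u

def hysteresis_demo():
    """Apply a sequence of input levels and record whether the node is on."""
    u = -2.0
    states = []
    for s in (1.0, 3.0, 1.0, -1.5):
        u = settle(u, s)
        states.append(u > 0)
    return states

def ecu_demo(task=3.0, sensor=1.5, c_pre=1.5, c_inh=6.0, n_steps=200):
    """Intention node and CoS node of one ECU (assumed coupling strengths).

    Phase 1: task input only. Phase 2: the sensory signal that indicates
    completion is added. Returns (intention_on, cos_on) after each phase.
    """
    u_int, u_cos = -2.0, -2.0
    log = []
    for sensor_on in (False, True):
        for _ in range(n_steps):
            s_int = task - c_inh * sigmoid(u_cos)   # CoS inhibits intention
            s_cos = c_pre * sigmoid(u_int) + (sensor if sensor_on else 0.0)
            u_int, u_cos = step(u_int, s_int), step(u_cos, s_cos)
        log.append((u_int > 0, u_cos > 0))
    return log

print(hysteresis_demo())
print(ecu_demo())
```

In the hysteresis run, the same input level (s = 1) leaves the node "off" before the strong input and "on" after it, which is the bistable regime described above. In the ECU run, the CoS node needs both the input from the intention node and the sensory signal to switch on; once active, it switches the intention node off.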

## 2.3. Visual Processing Pathway

The autonomous neural dynamics of visual processing controls exploratory attentional processes that build a working memory representation of the scene, which can be queried to activate a particular target object. A second block of processes determines object identity through classification and estimates object pose to determine grasp parameters.

#### 2.3.1. Scene Representation

The architecture contains an expanded version of a neural dynamic system for scene representation (Zibner et al., 2011a), in which neural dynamic nodes implement a form of process organization (Richter et al., 2012) to enable the autonomous visual exploration of the scene which can transition into a query mode that focusses attention on a target object in the scene. **Figure 3** expands this part of the complete architecture. As a cue to locations on a table surface, at which objects may be placed, we use color and visual depth estimates obtained from a Kinect sensor that views the scene in the work space of the robot arm. The idea is that color saturation on the homogeneous table surface guides attention to candidate locations. The height over the table surface estimated at these locations is then used to decide if an object is present (Petsch and Burschka, 2010).

Specifically, we use the Point Cloud Library (Rusu and Cousins, 2011) to find the largest surface in the RGB-D data, which is then identified as the table surface. Height and color maps are extracted in world coordinates. The distribution of saturation in the color map is passed through a sigmoid function and provides input to a neural field defined over the table surface (the space field in the green box of **Figure 3**). Only regions on the table at which saturation reaches a threshold level drive the neural field through a detection instability and induce a self-stabilized local peak of activation. This effectively suppresses outliers and filters out the noise that is typical of RGB-D data. The field is operated in a dynamic regime in which multiple self-stabilized peaks may coexist. It functions as a salience map for color (Itti et al., 1998).

The color salience space field provides input to a second neural field, the attention field, also defined over the table surface. This field is operated in the dynamic regime in which a single localized peak is stable at any time, implementing a selection decision. A self-stabilized peak in this field implements, therefore, selective attention and provides the attentional focus for the rest of the architecture. Height estimates from the subregion on the table, at which activation in the attention field is above threshold, are input into a one-dimensional neural field, the height field. The selected spatial region and the neural activation pattern representing height estimates are crossed to provide input into a three-dimensional field, the space-height field (on the top right in the red box of **Figure 3**). For details of how the combination of two lower-dimensional inputs can be used to drive a higher-dimensional field, please refer to Zibner et al. (2011a) or Chapter 9 of Schöner et al. (2015). The space-height field is operated in multi-peak working memory mode, so that it represents the location on the table and height of a potential object as a self-sustained peak, even after the attentional and height inputs are removed. It provides input to a second, three-dimensional field, the height query field, that is operated in single-peak mode and thus selects location and the associated height. Input from the attention field controls the location at which input from the space-height field may induce a peak. The height query field thus serves to retrieve a stored object location and height from scene memory.

To guide visual exploration, a multi-peak field over the table surface, the space memory field, keeps track of all locations that have come into the attentional focus of the system. A sustained peak of activation is induced each time selective attention is focussed at a location. The space memory field in turn inhibits the attention field and thus biases the process of attentional selection away from locations that have previously been the focus of attention. Autonomous exploration is now organized by a Condition of Satisfaction connection from the height query field into the attention field. Every time a peak has been successfully selected in the height query field, this signals that a memory has been created that matches the currently selected location and currently estimated height. This is the CoS of memory formation and inhibits the attention field, deleting the self-stabilized peak there in a reverse detection instability. As a result, the peak in the height query field is no longer supported by selective attention and also decays, releasing the attention field from inhibition. The attention field is ready to select the next location for spatial attention. Inhibitory input from the space memory field now tends to inhibit return to the same location or other recently attended locations, biasing the selection process to new locations with salient color input. This process of visual exploration is continuously ongoing, confirming past memories in the space-height field, updating such memories, or creating new memories as needed.
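The effect of the space memory field on attentional selection can be caricatured in a few lines. The sketch below is a deliberately abstract, discrete-time stand-in and not the neural dynamics itself: the locations and salience values are hypothetical, selection by `max` replaces the competition in the attention field, and the decay of the inhibitory memory trace is an assumption introduced here so that previously visited locations can be revisited.

```python
def explore(salience, steps, inhibition=1.0, decay=0.5):
    """Sequentially attend to locations, biased away from recent ones.

    salience: dict mapping a location label to its color salience.
    The inhibitory trace (stand-in for the space memory field) is set
    when a location is attended and decays by a factor `decay` at each
    step (an assumption of this sketch).
    """
    memory = {loc: 0.0 for loc in salience}
    visited = []
    for _ in range(steps):
        for loc in memory:
            memory[loc] *= decay
        # Selection decision: strongest salience minus memory inhibition.
        attended = max(salience, key=lambda loc: salience[loc] - memory[loc])
        visited.append(attended)
        # Stand-in for the CoS of memory formation: store the location.
        memory[attended] = inhibition
    return visited

order = explore({"A": 1.0, "B": 0.9, "C": 0.8}, steps=6)
print(order)
```

In this toy version, all three locations are visited before any location is attended a second time, and the same location is never selected twice in a row.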

Autonomous visual exploration can be interrupted at any time by a query for a target object that triggers the estimation of grasp parameters. The target object can be specified by a spatial cue or by cues of characteristic object features, such as color (for a more detailed description of the querying behavior, see Zibner et al., 2011b). There is a set of neural nodes that activate and deactivate parts of the architecture by boosting or deboosting the resting levels of the associated fields. For simplicity, not all of those nodes are plotted in the overview of the architecture (see a description in the first part of the Results Section for the functional role of these nodes).

#### 2.3.2. Shape Classification and Pose Estimation

Estimation of grasp parameters is based on a recurrent architecture for object recognition (Faubel and Schöner, 2009). In the original work, a weighted sum of object templates, one for each known object, is compared to the current input image. Applying cascaded transformation operations (shift, rotation, and scaling) to the current input and matching the transformed input to each of the memorized templates (by cross-correlation, "C") yields a competitive weight of each template. Dynamic neural nodes compete with each other, leading to the selection of one template in a classification decision. In a concurrent process, all templates are weighted with the current activation level of their dynamic neural node and summed. The inverse cascade of image transformations is applied to this sum, and a match to the input image in each possible pose provides input into neural activation fields defined over the pose parameters for shift, rotation, and scaling. These fields are operated in a single-peak mode so that an emerging self-stabilized peak represents a selection decision among poses. The concurrent upward classification and downward pose estimation processes converge in closed loop, activating an object identity representation in the set of neural nodes, and a pose estimate in the set of neural fields.

For the present purpose, we replace learned object templates with simple geometric shapes (square, circle, oblong rectangles). The subregion on the table that the attentional focus defines provides the visual input to the shape classification and pose estimation system. The two-layer decision architecture of the original model was further simplified into single-layer decision fields, each connected to a single inhibitory node that slows down the decision process, allowing multiple candidate peaks to form before a decision emerges. **Figure 4** gives an overview of the resulting architecture. The different stages of pose estimation are highlighted by the background color: translation (red), rotation (yellow), and scaling (green). The set of neural nodes that performs shape classification is highlighted in blue.

As the shape classification and pose estimation process converges, it delivers a shape candidate whose location is specified more precisely within the table surface than the attentional system does. The scaling and rotation estimates together with features of the shape category are used to determine the grasp parameters, represented in the grasp decision field. Oblong objects with a low height are grasped from above, while cylindrical objects and cuboids with a square base with sufficient height are grasped from the side. The latter objects need different approach movements prior to grasping, since cylinders, unlike cuboids, can be grasped sideways equally well from any direction.

Note that the estimation process is continuously coupled to visual input through the attentional channel. As a result, changes in the scene are fed into the pose fields, enabling online updating of the grasp parameters. In the current version of the model, online updating occurs only with respect to two dimensions of the task, translating and rotating the gripper.

## 2.4. Reaching and Grasping

This section explains how data from the scene representation and the shape classification/pose estimation systems are used to generate movement and to grasp an object. The overall scheme is as follows. Depending on the object pose parameters (position, height, rotation, and shape) and the current arm configuration, a desired wrist position and orientation for the hand are computed. These desired values are then set as attractors in a dynamical system that generates movement for the arm. The movement unfolds autonomously in three phases organized by a neural dynamics of the type reviewed earlier (Section 2.2). First, the hand is opened, brought close to the object, and oriented in a way that enables grasping the object. Second, the hand is moved through the remaining distance to the object, and is closed. The third phase begins when the object has been grasped as signaled by the tactile sensors on the fingers. The hand is then moved upward in space, lifting the object. This sequence of actions is generated by a neural dynamics of behavioral organization that is illustrated in **Figure 5**.

#### FIGURE 4 | A sketch of the shape/pose estimation system used to classify the attended part of the visual scene into a shape category and to concurrently estimate its pose. Along the downward pathway on the left, the input image is transformed based on the current estimates of translation, rotation, and scaling before being compared to the stored shape templates at the bottom. Along the upward path on the right, the current weighted sum of shape templates is inversely transformed by scaling and rotation operations. Cross-correlations with the input image yield updates to pose estimates. The pose fields in the center column feed into the representation of grasp parameters.

#### 2.4.1. Generating Motor Commands

Motor commands are generated from desired values for the wrist position and hand orientation using the attractor dynamic approach (Reimann et al., 2011). To move the wrist, movement speed and direction are controlled separately. The rate of change of movement direction depends on the angle between the current movement velocity, $\vec{v}$, and the vector, $\vec{k}$, from the wrist position to the target position,

$$\phi = \arccos\left(\frac{\langle \vec{v}, \vec{k} \rangle}{|\vec{v}||\vec{k}|}\right). \tag{1}$$

Reducing this angle to zero corresponds to changing the movement direction into the direction in which the target lies. This constraint is imposed by the dynamics of that angle, given by

$$
\dot{\phi} = -\alpha_{\text{dir}} \phi, \tag{2}
$$

which is linear, simplifying Reimann et al. (2011). Here, $\alpha_{\text{dir}}$ is a rate factor.

To translate this constraint into a motion command for the robotic arm, consider the direction, $\vec{v}_{\perp}$, in which the movement vector, $\vec{v}$, is changed. It is perpendicular to $\vec{v}$, lies in the plane spanned by $\vec{v}$ and $\vec{k}$, and is normalized to have the same length as $\vec{v}$:

$$\vec{v}_{\perp} = |\vec{v}| \frac{(\vec{k} \times \vec{v}) \times \vec{v}}{|(\vec{k} \times \vec{v}) \times \vec{v}|}. \tag{3}$$

Combining the two equations, we determine the direction in which the wrist's velocity vector in Cartesian space should change so as to bring the hand closer to the target location:

$$
\vec{f}_{\text{dir}} = \vec{v}_{\perp} (\dot{\phi} - \dot{\phi}_{\text{dev}}). \tag{4}
$$

Here, $\dot{\phi}_{\text{dev}}$ is the rate at which the direction from the hand to the target changes due to the movement, $\vec{v}$, of the hand in space. The direction of change lies in the appropriate plane and is proportional to the rate of change of the direction to the target, corrected for the rate of change of that direction that is induced by the movement of the wrist in space.

To control movement speed, its rate of change, $\dot{v}$, is proportional to the difference between the current speed, $v = |\vec{v}|$, and a desired speed, $v_{\text{des}}$:

$$\vec{f}_{\text{vel}} = \frac{\vec{v}}{v} \left(-\alpha_{\text{vel}} (v - v_{\text{des}})\right), \tag{5}$$

where $\alpha_{\text{vel}}$ is a rate constant. As a contribution to the rate of change of the 3D velocity vector, this contribution lies in the direction of the current velocity.
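The direction and speed dynamics of Equations 1 to 5 can be explored in a small Cartesian simulation. This is a sketch under stated assumptions, not the controller that runs on the robot: the rate constants, start state, and target are hypothetical, the integration is plain forward Euler, and the expression used for the deviation rate, |v| sin(φ)/|k| for a stationary target, is a derivation made for this example rather than a formula taken from the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def scale(a, c):
    return tuple(c * x for x in a)

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def angle(v, k):
    """Equation 1: angle between velocity v and target vector k."""
    c = dot(v, k) / (norm(v) * norm(k))
    return math.acos(max(-1.0, min(1.0, c)))

def v_perp(v, k):
    """Equation 3: perpendicular to v, in the plane of v and k, length |v|."""
    d = cross(cross(k, v), v)
    n = norm(d)
    if n < 1e-12:          # v already (anti)parallel to k
        return (0.0, 0.0, 0.0)
    return scale(d, norm(v) / n)

def simulate_heading(position, target, v, alpha_dir=5.0, alpha_vel=5.0,
                     v_des=0.1, dt=0.001, duration=2.0):
    """Integrate dv/dt = f_dir + f_vel; assumes the speed stays nonzero."""
    angles = []
    for _ in range(int(duration / dt)):
        k = sub(target, position)
        phi = angle(v, k)
        angles.append(phi)
        phi_dot = -alpha_dir * phi                          # Equation 2
        # Assumed deviation rate for a stationary target (derived here).
        phi_dot_dev = norm(v) / norm(k) * math.sin(phi)
        f_dir = scale(v_perp(v, k), phi_dot - phi_dot_dev)  # Equation 4
        speed = norm(v)
        f_vel = scale(v, -alpha_vel * (speed - v_des) / speed)  # Equation 5
        v = add(v, scale(add(f_dir, f_vel), dt))
        position = add(position, scale(v, dt))
    return angles, v, position

angles, v, position = simulate_heading(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.1, 0.0))
print(angles[0], angles[-1], norm(v))
```

The hand starts out moving at a right angle to the target direction. Over the simulated two seconds, the heading angle relaxes toward zero while the speed stays close to the desired speed, which is the separation of direction and speed control described above.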

A third contribution to the dynamics of the hand velocity vector slows down the hand when it is close to the target object in order to reduce any impact in case of misestimation and collision. A local safe control law is proportional to the distance between hand position, $\vec{g}$, and target position, $\vec{p}$:

$$\vec{v}_{\text{local}} = -\beta_{\text{pos}}(\vec{g} - \vec{p}), \tag{6}$$

and is expanded in vector form as

$$\vec{f}_{\text{pos}} = -\alpha_{\text{pos}}\left(\vec{v} - \min\{|\vec{v}_{\text{local}}|, v_{\text{des}}\} \frac{\vec{v}_{\text{local}}}{|\vec{v}_{\text{local}}|}\right), \tag{7}$$

where $\alpha_{\text{pos}}$ and $\beta_{\text{pos}}$ are two rate factors. The introduction of $v_{\text{des}}$ is a change over the approach of Reimann et al. (2011), intended as a safety measure to limit movement speeds of the arm.
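The speed-limited approach defined by Equations 6 and 7 can be checked with a short simulation. This is a minimal sketch with hypothetical rate factors, target, and step size, integrated in Cartesian space only (no arm kinematics); it follows the naming of this section, with `hand` for the hand position and `target` for the target position.

```python
import math

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def f_pos(v, hand, target, alpha_pos=10.0, beta_pos=2.0, v_des=0.1):
    """Equations 6 and 7 with assumed rate factors and desired speed."""
    v_local = [-beta_pos * (g - p) for g, p in zip(hand, target)]
    n = norm(v_local)
    if n == 0.0:
        v_ref = [0.0, 0.0, 0.0]          # already at the target
    else:
        capped = min(n, v_des)           # limit the commanded speed
        v_ref = [capped * x / n for x in v_local]
    return [-alpha_pos * (vi - ri) for vi, ri in zip(v, v_ref)]

def simulate_approach(hand, target, dt=0.001, duration=10.0):
    """Euler integration of dv/dt = f_pos, starting at rest."""
    v = [0.0, 0.0, 0.0]
    max_speed = 0.0
    for _ in range(int(duration / dt)):
        a = f_pos(v, hand, target)
        v = [vi + dt * ai for vi, ai in zip(v, a)]
        hand = [g + dt * vi for g, vi in zip(hand, v)]
        max_speed = max(max_speed, norm(v))
    return hand, max_speed

final_hand, max_speed = simulate_approach([0.0, 0.0, 0.0], [0.3, 0.2, 0.1])
print(final_hand, max_speed)
```

Far from the target the hand travels at the capped speed; inside the region where the local law commands less than the desired speed, it decelerates smoothly into the target. With the assumed values, the linear regime is overdamped, so the hand does not overshoot.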

The rates of change of the hand's velocity vector in Cartesian space are transformed into joint space with the help of the pseudo-inverse, $J_p^{+}$, of the Jacobian matrix of the wrist position. The three contributions are then summed after each contribution is weighted with a sigmoidal factor that reflects the distance of the hand to the target. The result is the planned angular acceleration, $\vec{F}$, of the robotic arm in joint space:

$$\begin{split} \vec{F} &= \sigma(|\vec{k}| - d_{\text{thr}}) \left( J_p^{+} \cdot \vec{f}_{\text{dir}} + J_p^{+} \cdot \vec{f}_{\text{vel}} \right) \\ &+ \left(1 - \sigma(|\vec{k}| - d_{\text{thr}})\right) J_p^{+} \cdot \vec{f}_{\text{pos}}. \end{split} \tag{8}$$

This control strategy for the hand's position largely follows Reimann et al. (2011). The control law of the hand's orientation is formulated for the three Euler angles of the hand used as target angles for the three most distal joints of the arm. The desired rotation matrix R can thus be split into three subsequent rotations around three fixed axes. For each of these three most distal joints, $\theta_i$, the angular acceleration, $\ddot{\theta}_i$, is proportional to the deviation between the current joint angle, $\theta_i$, and the desired joint angle, $\theta_{i,\text{des}}$, corrected for by the current angular velocity, $v_{\theta_i}$, induced by the movement of the hand in space according to Equation 9:

$$\ddot{\theta}_i = -\alpha_{\text{rot}}\left(v_{\theta_i} - \beta_{\text{rot}}(\theta_i - \theta_{i,\text{des}})\right). \tag{9}$$

Here, $\alpha_{\text{rot}}$ and $\beta_{\text{rot}}$ are rate constants of the dynamics.

Finally, the opening and closing of the hand is controlled through a linear first-order dynamical system:

$$\dot{\vec{\theta}} = -\omega_{\text{hand}}\left(\vec{\theta} - (w_{\text{grasp}}\vec{\theta}_{\text{closed}} + w_{\text{approach}}\vec{\theta}_{\text{open}})\right). \tag{10}$$

This dynamical system has attractors either at a joint angle configuration, $\vec{\theta}_{\text{open}}$, corresponding to an open hand or at a joint angle configuration, $\vec{\theta}_{\text{closed}}$, corresponding to a closed hand. These joint configurations depend on the shape template of the object to be grasped.
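How the finger joints follow whichever behavior is active can be sketched with a direct Euler integration of the first-order dynamics. The joint angle values, the rate `omega_hand`, the number of joints, and the idealized on/off weights are hypothetical and serve only to illustrate the switch of the attractor from the open to the closed configuration.

```python
def hand_step(theta, w_grasp, w_approach, theta_closed, theta_open,
              omega_hand=5.0, dt=0.01):
    """One Euler step of the linear hand dynamics (Equation 10)."""
    attractor = [w_grasp * c + w_approach * o
                 for c, o in zip(theta_closed, theta_open)]
    return [th - dt * omega_hand * (th - a) for th, a in zip(theta, attractor)]

def run_phase(theta, w_grasp, w_approach, theta_closed, theta_open,
              n_steps=300):
    for _ in range(n_steps):
        theta = hand_step(theta, w_grasp, w_approach, theta_closed, theta_open)
    return theta

# Hypothetical joint configurations (radians) for three finger joints.
THETA_OPEN = [-0.5, -0.5, -0.5]
THETA_CLOSED = [0.3, 0.3, 0.3]

theta = [0.0, 0.0, 0.0]
# "Approach"/"open hand" active: the joints relax to the open configuration.
theta_after_approach = run_phase(theta, 0.0, 1.0, THETA_CLOSED, THETA_OPEN)
# "Grasp" active: the attractor switches to the closed configuration.
theta_after_grasp = run_phase(theta_after_approach, 1.0, 0.0,
                              THETA_CLOSED, THETA_OPEN)
print(theta_after_approach, theta_after_grasp)
```

Because the attractor is a weighted combination of the two configurations, a gradual change of the behavior activations would move the attractor continuously between the open and the closed hand.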

#### 2.4.2. Target Positions and Orientations

Desired positions, $\vec{g}$, for the wrist are defined for the approach, grasp, and lift behaviors, as well as for different grasp types. All approach points for the different object types are updated online. The target point, $\vec{g}_{\text{approach}}$, for the approach behavior depends on the grasp type. For vertical objects, it lies in a horizontal plane at two thirds of the object's height at a certain distance from the object that depends on the object's shape. For cylindrical objects, the vector, $\vec{k}$, from the current wrist position to the object position is projected onto the table plane to obtain the direction from which to grasp. For objects with a square base, one of the four sides is selected. This entails computing the inner product of $\vec{k}$ with each of four vectors that are orthogonal to each side. Using four competing neural nodes, the vector that best matches is selected. For objects that are grasped from above, the approach point is at a fixed distance above the object. A weighted sum

$$
\vec{g}_{\text{approach}} = \frac{1}{n} \sum_{i} w_{i} \vec{g}_{i}, \tag{11}
$$

over the n different object types is used to calculate the instantaneous approach point. The values for $w_i$ are the output values of the grasp decision field.

For the target point of the grasping behavior, we use a point on the object vector, $\vec{k}$, at a certain distance, $d_i$, from the object:

$$\vec{g}_{\text{grasp}} = \frac{1}{n} \sum_{i} w_{i} d_{i} \frac{-\vec{k}}{|\vec{k}|}. \tag{12}$$

To lift the object, a position, $\vec{g}_{\text{lift}}$, is set to a point 50 cm above the table surface, located directly above the current position.

The current target position for the movement generation system is then set to the weighted sum over all these different target points,

$$\vec{g} = w_{\text{approach}} \, \vec{g}_{\text{approach}} + w_{\text{grasp}} \, \vec{g}_{\text{grasp}} + w_{\text{lift}} \, \vec{g}_{\text{lift}}, \tag{13}$$

in which the weight factors are the activation states, $w_i$, of the corresponding behavior.
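Two of the computations in this subsection are simple enough to spell out: the blending of behavior-specific targets in Equation 13 and the choice of the side of a square-based object from inner products with the side normals. The sketch below uses hypothetical coordinates and an object yaw angle, and it replaces the four competing neural nodes by a plain `max`, on the assumption that "best match" means the largest inner product.

```python
import math

def blend_targets(weights, targets):
    """Equation 13: sum of behavior targets weighted by behavior activation."""
    return tuple(sum(weights[b] * targets[b][i] for b in targets)
                 for i in range(3))

def select_side(k, yaw):
    """Index (0..3) of the side normal that best matches k.

    k is projected onto the table plane by dropping its z component;
    the four normals are spaced by 90 degrees, starting at the object's
    yaw angle. Taking the maximum stands in for the competition among
    four neural nodes.
    """
    kx, ky = k[0], k[1]
    normals = [(math.cos(yaw + i * math.pi / 2),
                math.sin(yaw + i * math.pi / 2)) for i in range(4)]
    scores = [kx * nx + ky * ny for nx, ny in normals]
    return max(range(4), key=lambda i: scores[i])

# Hypothetical target points in meters (x, y, z).
TARGETS = {
    "approach": (0.4, 0.0, 0.1),
    "grasp": (0.5, 0.0, 0.1),
    "lift": (0.5, 0.0, 0.5),
}

only_approach = blend_targets({"approach": 1.0, "grasp": 0.0, "lift": 0.0},
                              TARGETS)
mid_transition = blend_targets({"approach": 0.5, "grasp": 0.5, "lift": 0.0},
                               TARGETS)
print(only_approach, mid_transition)
print(select_side((1.0, 0.0, 0.0), 0.0), select_side((0.0, 1.0, 0.0), 0.0))
```

When exactly one behavior is active, the blended target coincides with that behavior's target. During a transition between behaviors, the target moves continuously between the two points, which gives the attractor dynamics a smoothly changing goal.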

The orientation of the hand at grasp is chosen so that the opening of the hand points toward the object and the fingers are aligned with the object's surfaces. For tall, narrow objects that are grasped from the side, the palm is chosen to be oriented perpendicular to the table surface. For flat objects that are grasped from above, the palm is oriented parallel to the table. Again a sum is used to obtain the desired orientation of the hand from these contributions, weighted with the activation level of the associated shape class.

#### 3. RESULTS

A first goal of our experimental work is to illustrate how the neural dynamic architecture generates the time courses of visual exploration, shape classification and pose estimation, and movement generation. In each case, we aim to show how transitions between different phases of behavior emerge autonomously from the space-time continuous dynamical systems. Although we inspect the three components of scene representation, shape classification, and movement generation one by one, these components are tightly coupled in the overall neural architecture and evolve in parallel. The second goal is to demonstrate and assess the properties of the neural architecture in achieving reaching and grasping actions. We report three sets of experiments that probe online updating with respect to three dimensions of the task (grasping, translating, rotating). In the following sections, we first give a detailed account of the general flow of neural activation through the dynamic fields and nodes. Then we report the results of the three experiments set up to probe specific characteristics of the system.

## 3.1. Time Course of Scene Representation

As long as there is no active cue, the neural architecture of scene representation (**Figure 3**) performs visual exploration, which can be described as follows. The distribution of color over the table surface is captured by the space field that forms one peak at each location with salient color. These peaks provide localized input to the attention field, which generates a single peak and inhibits all alternative locations. This peak masks input from the height map to the height field so that only height measurements within this window contribute. A neural node that detects a peak in the attention field provides a boost to the height field, which together with significant input from the masked height input may induce a peak in this field.

the raw and transformed input image is shown. At the beginning (time passes from left to right) the transformed input is blurred out and the estimation fields only contain sub-threshold activity. While the process converges, the estimation fields select pose candidates. With the fixed pose, the shape field converges onto a classification of the base shape. Note that this is a recurrent process, that is, pose estimates and shape classification converge in parallel and support each other.

The attention and height fields now contain separate representations of spatial position and height. Spatial input projects as a cylinder localized in space, elongated along height into the three-dimensional space-height field. Height input projects as a slice localized along height, extended along space. Where these inputs intersect, a localized peak arises that binds height to location. This peak induces localized input into the three-dimensional space-height query field, which receives at the same time a cylinder of input localized in space, extended along height from the peak in the attention field. These inputs overlap and create a matching peak in the height query field. A CoS node detects this peak and inhibits the attention field, triggering a cascade of reverse detections in the attention, height, and height query fields, followed by de-activation of the CoS node itself, and a release from inhibition of the attention field. Parallel to this cascade of instabilities, the looking memory field has stabilized a sustained peak at the currently attended location, which projects inhibitorily back onto that same location in the attention field. Upon the release from inhibition by the CoS node, the attention field selects a new salient location for activation, which is typically not the same as the previously examined location.

This form of visual exploration runs continuously and completely autonomously, in an ongoing sequence of shifts of attention. This ongoing sequence is interrupted when a cue is given from the outside, for example, by a human operator. The cue resets the attention field through a short burst of inhibition and acts as a mask to the input path from the color map, amplifying the specified color. When the attention field recovers from inhibition, it now selects a location matching the cued color. This attentional peak induces activation from working memory of the height value associated with that location, which can now be handed on to the reach and grasp module.
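The autonomous sequence of attention shifts can be caricatured in a few lines. The sketch below is a discrete abstraction, not the neural field dynamics of the architecture: the attention field is reduced to picking a single winner, the looking memory to one trace value per location, and the parameters `memory_decay` and `inhibition` (as well as the decay of memory traces itself) are hypothetical choices made only so that the example runs.

```python
def explore(saliency, steps, memory_decay=0.5, inhibition=2.0):
    """Return the sequence of attended locations.

    saliency: dict mapping a location label to the salience of its color input.
    """
    # Stand-in for the looking memory field: one trace per location.
    memory = {loc: 0.0 for loc in saliency}
    visited = []
    for _ in range(steps):
        # Attention field: a single winner among the salient locations,
        # reduced by inhibitory back-projection from the looking memory.
        winner = max(
            saliency, key=lambda loc: saliency[loc] - inhibition * memory[loc]
        )
        visited.append(winner)
        # After the CoS has fired, older traces decay (an assumption), the
        # attended location is remembered, and attention is released.
        for loc in memory:
            memory[loc] *= memory_decay
        memory[winner] = 1.0
    return visited

sequence = explore({"a": 1.0, "b": 0.9, "c": 0.8}, steps=4)
```

With these values the most salient location is attended first, the inhibition from the memory trace then shifts attention to the other locations, and the first location is revisited once its trace has decayed.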

# 3.2. Time Course of Shape Classification and Pose Estimation

With the activation of the cueing behavior, a peak in the attention field defines a window of attention that channels input to the shape classification and pose estimation portion of the architecture (**Figure 4**). The CoS node of the cueing behavior provides a boost to the resting level of all estimation fields, which gets the estimation process started. The classification nodes are all equal and at resting level.

At the beginning of the process (see left column of **Figure 6**), the sum of shape templates in the top-down path is a homogeneous mixture of every known shape. Since all shapes are stored in a centered fashion, even this sum provides a cue to translation estimates. Over time, the pose estimation fields build up peaks, which compete within the fields for selection. As these estimates sharpen, the cross-correlations at every stage of pose transformation produce increasingly precise input to the pose fields. The match between the transformed input image and the stored shapes improves at the same time (middle column of **Figure 6**). The pose estimates converge somewhat earlier than the neural nodes that perform shape selection, which operate on a slightly slower time scale (right column of **Figure 6**). At this point, both the top-down and the bottom-up pathway have fully converged onto candidate estimates, but are still reactive to changes in the input (e.g., caused by rotating or shifting the target object). Both bottom-up and top-down pathways participate in this bootstrap process.

# 3.3. Time Course of Movement Generation

**Figure 7** illustrates the time line of the neural dynamics of behavioral organization of movement generation. Initially, none of the movement intention nodes is active, since no object has been recognized yet. When all fields of the pose estimation system have stabilized a peak, movement generation is initiated. The approach behavior and the open hand behavior become active at the same time and unfold in parallel. The open hand behavior terminates once the hand is open, while the approach behavior continues until the wrist of the arm has reached a certain target point and the hand is oriented correctly. The successful completion of either behavior is signaled through the respective CoS node. Once both CoS nodes become activated, the grasp behavior is activated. The arm moves the remaining distance to the object while the hand is closing. Pressure sensors in the fingers signal to the CoS node of the grasp behavior, which is activated once a grasp is detected. The grasp intention node is deactivated by its CoS, and the lift behavior is activated. A series of snapshots of the robot arm during the reaching toward and grasping of an object is shown in **Figure 8**. This instance of reaching and grasping contains online updating as the object is moved and rotated by the experimenter after the movement has been initiated. We examine online updating next.
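The order in which the behaviors become active can be summarized as a small lookup. This is a deliberately discrete abstraction: in the architecture the transitions emerge from continuous neural dynamics of intention and CoS nodes, whereas the function below only encodes the resulting logic. The function name and the event labels (`"pose_estimated"`, `"approach"`, `"open_hand"`, `"grasp"`) are hypothetical.

```python
def active_behaviors(events):
    """Return the set of behaviors that are active.

    events: set of labels; "pose_estimated" stands for stabilized peaks in all
    pose estimation fields, the other labels for CoS nodes that have fired.
    """
    if "pose_estimated" not in events:
        return set()  # no object recognized yet, nothing is initiated
    if not {"approach", "open_hand"} <= events:
        # Approach and open hand start together and run in parallel;
        # each one ends when its own CoS node fires.
        return {"approach", "open_hand"} - events
    if "grasp" not in events:
        return {"grasp"}  # both preconditions are satisfied
    return {"lift"}  # grasp detected by the pressure sensors
```

For example, `active_behaviors({"pose_estimated", "open_hand"})` returns `{"approach"}`: the hand is already open, but the wrist has not yet reached its target point.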

# 3.4. Three Experiments to Probe Online Updating

The task is to successfully reach for and grasp an object that is positioned on the table in front of the robot and then lift it up without losing grip, even if the object's pose is changed after the beginning of a trial. To assess performance, we count a grasp and lift as successful, if the object is lifted without losing grip. Failures include tipping over the object, closing the fingers without grasping, and not lifting the object. In addition, we also count trials as failed if the experimenters have to intervene with a safety stop due to singular arm configurations or any form of collision. In some cases, the grasp was executed successfully even without precise estimates (e.g., orientation estimate is off, base shape is not detected correctly). We count such trials as errors in classification.

FIGURE 8 | This figure shows snapshots of a reaching and grasping trial. The third and fourth snapshot show a human intervening in the scene by moving and rotating the target object. Shortly after this intervention, the grasp approach adapts to the new pose leading to a successful grasp in the new pose followed by lifting up of the object.

For the experiments, we used a set of three simple wooden objects. The objects relate to the different grasps that the architecture is capable of executing: One cylinder and two cuboids, one with square base shape, the other with an oblong base shape (see **Figure 9** and **Table 1**). The object recognition system uses three geometric shapes that loosely fit the base shapes of the objects, that is, scale and aspect-ratio of the templates are close to those of the objects.

For practical reasons, the trunk degree of freedom of the robot was kept constant at 0° or −45° during all trials. This led, in a small number of trials, to singular arm configurations, which our approach did not explicitly avoid. This limitation should be overcome in future implementations and illustrates how the trunk degree of freedom helps to cover a large workspace.

#### 3.4.1. Grasping without Online Updating

In a first experiment, we placed a single object from the object pool onto the table in front of the robot. We picked five different positions, P1–P5 (see **Figure 10**), and multiple orientations for the square cuboid (0°, 30°, 60°) and the oblong cuboid (0°, 45°, 90°, 135°). For the cylinder, we repeated each trial three times, for a total of 50 trials in experiment 1.

The performance of plain grasps without online updating is shown in **Table 2**. To minimize singular arm configurations, the sideways grasps were executed with the trunk joint at −45°, while the top grasps were executed with a trunk joint angle of 0°. Of the 50 trials, 46 were successful (92% success rate). Individual trials failed due to a singularity in the arm configuration (twice) or failed recovery from a lost peak in the estimation architecture (twice). **Table 2** also contains the classification rate in all successful trials. In three trials, the base shape of the object was not detected correctly, but nonetheless the object was grasped successfully.

FIGURE 9 | The three simple wooden objects used in the experiments are shown. Each object is colored blue on its top surface. Blue was used as query cue to indicate the target object.

#### TABLE 1 | Object sizes.

#### 3.4.2. Online Updating of Position

The second experiment investigates the tracking capabilities for position changes. For this experiment, we placed the cylindrical object in one of the five starting positions P1–P5. For positions P1–P4, we moved the object by hand toward position P5 once the arm started moving, covering a distance of 10 cm in roughly one to two seconds. When the object started in position P5, we instead moved the object in the direction of one of the four other starting positions by 10 cm. These eight conditions were tested three times for a total of 24 trials, which the robot performed at a fixed trunk joint angle of 0°.

#### TABLE 2 | Results of first experiment.

Of the 24 trials, 21 were successful (87.5% success rate). In seven trials, the cylinder was erroneously recognized as a square cuboid from the start or after the hand of the experimenter had touched the object to move it to the new position (see **Table 3** for a listing of successful trials per condition). Three trials were counted as failed due to a safety stop of the experiment. In two of these cases, the arm configuration reached a singularity; in the third case, the fingers almost collided with the object due to an erroneous position estimate. **Table 3** also lists the rate of correct classification in successful trials.

#### 3.4.3. Online Updating of Orientation

For the third experiment, we picked the two cuboids, which require a distinct hand orientation for grasping. Each object was placed in one of the starting positions P1–P5. Once the arm started moving, we turned the object in place by about 45° within about one second. We repeated this three times for each object and starting position, altering the starting orientation and turning direction, ending up with 30 trials. Grasps of the square cuboid were performed with a trunk joint angle of −45°, while the top-grasp object (the oblong cuboid) was grasped with the trunk at 0° (see first experiment).

#### TABLE 4 | Results of third experiment.

Out of 30 trials, 25 were successful (83.3% success rate, see **Table 4**). We repeated two trials for the square cuboid due to an erroneous estimate of base shape (circle instead of square). Of the two failed trials for the square object, one was a safety stop near a singular arm configuration, while the other failed due to an error in behavioral organization (the fingers did not open). The three failed trials of the oblong cuboid comprise two wrongly estimated orientations (with safety stops before collision) and one approach that was aborted by the behavioral organization because of a reverse detection in an estimation field.
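The trial counts and success rates reported for the three experiments can be re-derived with a few lines of arithmetic. The breakdown of experiment 1 into one trial per position and orientation for each cuboid is an assumed reading of the protocol that reproduces the reported total of 50 trials; the variable names are illustrative.

```python
def success_rate(successes, trials):
    """Success rate in percent."""
    return 100.0 * successes / trials

positions = 5
# Experiment 1: square cuboid with 3 orientations, oblong cuboid with 4
# (assumed one trial each per position), cylinder with 3 repetitions.
exp1_trials = positions * (3 + 4) + positions * 3
# Experiment 2: eight conditions, three repetitions each.
exp2_trials = 8 * 3
# Experiment 3: two cuboids, five positions, three repetitions each.
exp3_trials = 2 * positions * 3

for name, ok, n in [
    ("experiment 1", 46, exp1_trials),
    ("experiment 2", 21, exp2_trials),
    ("experiment 3", 25, exp3_trials),
]:
    print(f"{name}: {ok}/{n} = {success_rate(ok, n):.1f}%")
```

This prints 92.0%, 87.5%, and 83.3% for the three experiments.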

# 4. CONCLUSION

The neural dynamics architecture presented in this paper integrates modules that have previously been developed for scene representation, concurrent object classification and pose estimation, behavioral organization, and movement generation into one large dynamical system. Sequences of perceptual events induce reach and grasp actions, as the architecture goes through controlled instabilities. As a result, the system is open to time-varying sensor information at all times. We demonstrated online updating of reaching and grasping movements to shifts and rotations of the object. The architecture also responds flexibly at the level of organization. When a target object is removed, the perceptual and motor actions are abandoned and the system returns to scene exploration. When the concurrent object classification and pose estimation fail to converge, for instance, because the object is too different from a learned template, then the perceptual process terminates and the system similarly returns to scene exploration.

The stability of all relevant states in the neural dynamics is critical for both integration and online updating. Attractor states are robust to the changes in the dynamics of a component that occur as the component is coupled into the larger architecture. Instabilities, at which attractor states disappear, are controlled through the mechanism of a condition of satisfaction.

Although we have evaluated the implementation of the architecture quantitatively, the current model is a demonstration of principle that has not yet fully exploited all features of the approach. We did not use the size estimates obtained from the object classification system, for instance, and have made use of only a small number of shape templates, which in addition resemble the target objects and do not show generalization to objects of different shape. The avoidance of obstacles was not a focus of this work. We believe that the human-like organization and the smooth temporal structure of behavior in the neural dynamics architecture will prove most useful when cognitive robots cooperate with humans. Online updating is critical there, as human users will not always wait for their turn.

Finally, we did not yet address the issue of learning to grasp. There are two obvious parts of the architecture that could benefit from learning. One is the set of geometric shapes used during classification and pose estimation, the other is the grasp type associated with each geometric body. Naturally, the set of geometric shapes should arise from exposure to a large number of graspable objects. Any learning process has to address the challenge of deciding whether the shape of an object can be sufficiently matched by an existing shape from the set or whether the object shape should be added to the set of templates. This may also include a pruning process to remove shapes if they become obsolete by adding better-fitting shapes to the set. The links between grasp types and geometric shapes will also have to be established by a learning process. To decide if a grasp type is suitable for a geometric body (considering the base shape and the height), one may use a reinforcement learning approach by trying different grasp types for the same object and using the CoS activation (or its absence) of the grasping and lifting behaviors as positive or negative reinforcement signals. The links may also be established by learning from demonstration (see, for example, Herzog et al., 2012). If both learning of geometric shapes and links to grasp types are done concurrently, one might run into a chicken-and-egg problem of not being able to learn one without a mature state of the other. A developmental process of first restricting possible shapes and executable grasps to small and primitive sets and bootstrapping the architecture with increasing complexity over time is a possible procedure to overcome this dilemma.

#### AUTHOR CONTRIBUTIONS

All authors designed the study. GK, SZ, and HR defined the model. GK implemented the model and performed the simulations and experiments. GS wrote the paper, using input from the other authors.

### FUNDING

We gratefully acknowledge funding through the EU project "NeuralDynamics" (GS, coordinator).

#### REFERENCES

Conference on Robotics and Automation (ICRA) (St. Paul, MN: IEEE), 2379– 2384.


in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Vilamoura: IEEE), 2457–2464.


visual attention to movement-relevant information. Neural Netw. 72, 3–12. doi: 10.1016/j.neunet.2015.10.005


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Knips, Zibner, Reimann and Schöner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neurodynamics in the Sensorimotor Loop: Representing Behavior Relevant External Situations

#### Frank Pasemann\*

Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany

In the context of the dynamical systems approach to cognition and supposing that brains or brain-like systems controlling the behavior of autonomous systems are permanently driven by their sensor signals, the paper approaches the question of neurodynamics in the sensorimotor loop in a purely formal way. This is done by addressing the problem in three steps, using the time-discrete dynamics of standard neural networks and a fiber space representation for clarity. Furthermore, concepts like meta-transients, parametric stability and dynamical forms are introduced, where meta-transients describe the effect of realistic sensor inputs, parametric stability refers to a class of sensor inputs all generating the "same type" of dynamic behavior, and a dynamical form comprises the corresponding class of parametrized dynamical systems. It is argued that dynamical forms are the essential internal representatives of behavior relevant external situations. Consequently, it is suggested that dynamical forms are the basis for a memory of these situations. Finally, based on the observation that not all brain processes have a direct effect on the motor activity, a natural splitting of neurodynamics into vertical (internal) and horizontal (effective) parts is introduced.

#### Edited by:

Poramate Manoonpong, University of Southern Denmark, Denmark

#### Reviewed by:

Ralf Der, Leipzig University, Germany

Jörn Fischer, Mannheim University of Applied Sciences, Germany

#### \*Correspondence:

Frank Pasemann frank.pasemann@uni-osnabrueck.de

Received: 14 November 2016; Accepted: 17 January 2017; Published: 03 February 2017

#### Citation:

Pasemann F (2017) Neurodynamics in the Sensorimotor Loop: Representing Behavior Relevant External Situations. Front. Neurorobot. 11:5. doi: 10.3389/fnbot.2017.00005

Keywords: neurodynamics, behavior control, sensorimotor loop, mathematical concepts, neural representations

# 1. INTRODUCTION

From a neurocybernetics perspective, the dynamical systems approach to embodied cognition can be traced back to the work of Ashby (Ashby, 1960) and von Foerster (Von Foerster, 1960). The assumption is that a living organism, in order to survive, must be able to develop internally some stable "entities" (von Foerster) which refer to or classify objects and situations in the physical world. These "entities" are the result of cognitive and sensorimotor processes developing through continuous interactions of an individual with its specific environment. On the other hand, cognitive and sensorimotor processes, relevant for the behavior of the individual, depend on the formation of these stable structures; i.e., they are complementary in the sense that one defines or implies the other. The assumption was that an organism must be able to relate discrete internal structures to relevant aspects of its own interaction with its environment.

Although the underlying processes are continuous, these internal "entities" have to be discrete because the referenced objects or situations are discrete features of the environment. They also have to be "stable" in a certain time domain. On the other hand, due to changing sensorimotor or cognitive processes, they have to become "unstable" in the sense that different references have to be built up; i.e., new "stability domains" have to be visited or formed.

To pursue the dynamical systems approach to embodied cognition in this spirit, this paper will consider an individual as an autonomous system called an animat. An animat (Dean, 1998; Guillot and Meyer, 2001) is a simulated or physical robot equipped with sensors and actuators, and a neural network for behavior control. The neural controllers then have to operate in the so-called sensorimotor loop, getting inputs from sensory signals and generating motor signals, which in turn will lead to new sensor inputs. The essential role of these closed loop processes for living or life-like systems has been discussed over several decades now from various points of view (Bishop, 1960; Beer, 1995; Di Paolo, 2003; Philipona et al., 2004; Hülse et al., 2007; Zahedi et al., 2010; Sándor et al., 2015). Here we use a purely formal approach and carefully analyze the dynamical description by making successive approximations to these processes.

Neurocontrollers, mimicking their biological counterparts, are considered as recurrent neural networks which in general allow for dynamical properties. That is, for fixed synaptic weights, bias terms and inputs such a network can be described as a dynamical system. Then, assuming that a neurocontroller is driven by slow sensor inputs, it will be properly described as a parametrized family of dynamical systems, where sensor inputs (and proprioceptive signals as well) are considered in a first approximation as parameters of such a family. Furthermore, for every parameter value the corresponding dynamical system may have a multitude of different attractors. The postulated internal "entities" then will be identified with the basins of attraction of parametrically stable neurodynamical systems. The interaction with the environment then may change the references to situations in the external world by changing parameter values given, for instance, by the sensor signals. This process of changing references will be described by so-called bifurcations.

For theoretical reasons, parameters are assumed to change so slowly that the system can approach its asymptotic states. This is often not the case for realistic sensor inputs. So, in a second step we will introduce sequences of neural states called meta-transients as, for instance, in Negrello and Pasemann (2008), Negrello (2011), and Toutounji and Pipa (2014).

In general, these meta-transients cannot be given an interpretation as trajectories of a dynamical system, mainly because the inducing sequence of sensor signals is not a trajectory of a dynamical system on sensor space. Instead, because of the closed loop, it is a superposition of movements in the environment and the result of motor actions. The case where one has access to controlling parameters has often been discussed in geometric control theory (Gardner, 1983; Sussmann, 1983; Respondek, 1996; Kloeden et al., 2013). There, one can then generalize the concept of attractors and the like. Although we do not find this approach applicable for the dynamics in the sensorimotor loop, we will work with a comparable view.

Finally, these meta-transients have to be mapped to motor neurons, which then induce actions of the animat's body; i.e., its behavior. Due to this projection, not all elements of the neural system will be involved directly in the generation of motor signals. This leads naturally to a fiber structure over the motor space, which allows introducing the concepts of a vertical or internal neurodynamics, having no direct effect on behavior, and a horizontal or effective neurodynamics, the projection of which generates the movements of the animat.

To clarify concepts, the paper will address the discrete-time neurodynamics of networks composed of standard sigmoidal neurons of additive type. Using this simplifying setup, it is assumed that the aspects described in the following are transferable also to neural systems employing more biologically plausible or other types of neurons. The basic concern here is to specify the role of, for example, attractors, basins of attraction, transients, bifurcations and stability properties in the context of systems acting in a sensorimotor loop.

Approaching the description of neurodynamics in the sensorimotor loop in three steps, we will first define the type of neurodynamics studied in this paper (Section 2), exemplifying it by some well-known results. Assuming that sensor inputs are slow when compared to the activity dynamics of the neural system, we argue in Section 3 that neural systems in the sensorimotor loop are effectively described by parametrized families of dynamical systems, where parameters correspond to the sensor inputs. Other parameters, not considered here, are, for instance, signals coming from proprioceptors and the synaptic weights of the network, the change of which usually is associated with learning. Referring to more realistic situations, meta-transients are introduced in Section 4. Section 5 then discusses the generation of motor signals resulting from a projection of attractor transients or meta-transients, respectively, to the motor space; this allows us to differentiate between so-called effective and internal neurodynamics. Finally, the sensorimotor loop is closed through the environment by a formal mapping from motor space M to sensor space S. The paper concludes with a discussion of the possible role the introduced concepts can play for understanding neural representations of behavior relevant situations in the external world and, correspondingly, for a notion of memory which is not based on specific attractors like, for instance, fixed point attractors in Hopfield networks.

## 2. NEURODYNAMICS

Besides the body of an animat, three different parts of it will be discerned. The "brain" is considered as a recurrent neural network N with n neurons. Its sensor neurons will prescribe the sensor space S, and the output neurons will define the motor space M. Sensor space S and motor space M are the interfaces of the "brain" N to the physical world. Assuming strictly the point of view of an animat, the world for an animat is what happens on its sensor surface. We describe these parts more concretely as follows.

A state a(t) ∈ A ⊂ R<sup>n</sup> of the neural system N at time t is characterized by the activation of all its n neurons. Correspondingly, the state space A is called the activation space or phase space of N. It is a manifold of dimension dim(A) = n. Neural states may be represented in an equivalent way by the outputs o(t) of the n neurons, and we call the corresponding state space of the network N its output space A<sup>∗</sup>.

The sensor space S consists of all possible sensor inputs, i.e., a sensor state s(t) ∈ S at time t consists of all sensor values at time t. The sensor space is assumed to be a bounded manifold S of dimension dim S = m, where m denotes the number of distinct sensor elements. S may be subdivided into modality spaces corresponding, for example, to visual, acoustic, or haptic inputs.

A motor state m(t) ∈ M at time t is given by the activation of the motor neurons at time t driving the various actuators of the animat. Thus, the motor space of the animat is an open bounded manifold M of dimension dim M = k, the number of all its motor neurons. The motor space M may be segmented into different domains responsible, e.g., for head movement, eye movement, driving wheels, arms, and so forth. Special domains may be related to corresponding domains in sensor space: fixed infrared sensors, for instance, may be related only to the wheels domain, whereas vision with a pan-tilt camera is related to both the wheels and the pan-tilt motors, and so on.

#### 2.1. Discrete-Time Neurodynamics

For a general introduction to the theory of dynamical systems, see, for example, Abraham and Shaw (1992), Hirsch et al. (2012), and Strogatz (2014). Here, in a first approximation, we will understand the neural system N as a discrete-time dynamical system (Kloeden et al., 2013); i.e., on its activation space A there exists a differentiable map φ : Z × A → A, called the flow, with the following properties:


$$\phi(0, a) = a, \qquad \phi(t + s, a) = \phi(t, \phi(s, a)), \qquad a \in A, \; t, s \in Z,$$

where Z denotes the set of nonnegative integers. In the following we consider a neural network N with activation space A ⊂ R<sup>n</sup>, writing it as N(A), which is composed of n standard additive neurons with sigmoid transfer function τ := tanh. The flow of this system is then generated by a diffeomorphism f : A → A given in component form by

$$a\_i(t+1) \colon= \theta\_i + \sum\_{j=1}^n w\_{ij}\,\tau(a\_j(t)), \quad i = 1, \ldots, n,\tag{1}$$

where θ<sub>i</sub> represents a constant bias term of neuron i, w<sub>ij</sub> the synaptic strength or weight from neuron j to neuron i, and τ denotes the transfer function. Thus, the output of neuron i is given by o<sub>i</sub> := τ(a<sub>i</sub>), and for the output space we have A<sup>∗</sup> ⊂ (−1, 1)<sup>n</sup>.
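One application of the map f in Equation (1) can be written directly in code. The following is a minimal sketch with plain Python lists; the function names `step` and `outputs` are illustrative and not part of the paper.

```python
import math

def step(a, theta, w):
    """One application of the map f: a_i(t+1) = theta_i + sum_j w_ij * tanh(a_j(t))."""
    n = len(a)
    return [
        theta[i] + sum(w[i][j] * math.tanh(a[j]) for j in range(n))
        for i in range(n)
    ]

def outputs(a):
    """Neuron outputs o_i = tanh(a_i), i.e., a point in the output space A*."""
    return [math.tanh(x) for x in a]
```

Since tanh(0) = 0, a network started at zero activation simply takes on its bias terms after one step: `step([0.0, 0.0], [0.1, -0.2], [[1.0, 2.0], [3.0, 4.0]])` returns `[0.1, -0.2]`.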

The neural system N(A), considered as a dynamical system, will be denoted by (A, f). In this section, the bias terms θ<sub>i</sub> and synaptic weights w<sub>ij</sub> are assumed to be constant. This means that we consider an isolated system; i.e., there is no neural plasticity involved, and sensor inputs are not considered.

Furthermore, we endow the vector space A with a Euclidean metric d<sub>τ</sub> induced by the transfer function τ; i.e.,

$$d\_{\tau}(a, a') \colon= d(\tau(a), \tau(a')) = \sqrt{\sum\_{i=1}^{n} (\tau(a\_i) - \tau(a'\_i))^2}.$$

Due to the saturation domains of the sigmoid τ, the distance between activity states that both have very high activations of the same sign (positive or negative) is very small.
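The effect of saturation on this metric is easy to see numerically. The sketch below evaluates the distance for two pairs of one-neuron states that are both one unit apart in activation; the function name `d_tau` and the chosen values are illustrative.

```python
import math

def d_tau(a, b):
    """Distance between two activation states, measured between their outputs."""
    return math.sqrt(
        sum((math.tanh(x) - math.tanh(y)) ** 2 for x, y in zip(a, b))
    )

near_zero = d_tau([0.0], [1.0])  # equals tanh(1), roughly 0.76
saturated = d_tau([5.0], [6.0])  # below 1e-4, although activations differ by 1
```

Both pairs differ by the same amount in activation space, but in the saturated regime the outputs are nearly indistinguishable.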

The flow on the state space A is then defined by

$$\phi(t, a\_0) \colon= a(t) = f^t(a\_0) = \underbrace{f \circ f \circ \dots \circ f \circ f}\_{t \text{ times}}(a\_0),$$

where a<sub>0</sub> ∈ A is called the initial state. The flow φ satisfies the group property; i.e., with initial condition φ(0, a<sub>0</sub>) = a<sub>0</sub> one has

$$
\phi(n, \phi(m, a_0)) = f^n \circ f^m(a_0) = f^{n+m}(a_0) = \phi(n+m, a_0)\,.
$$
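The flow and its composition property can be checked numerically. The following is a sketch with hypothetical parameters; the helper name `flow` is an assumption.

```python
import numpy as np

def f(a, theta, W):
    """The map of Equation (1)."""
    return theta + W @ np.tanh(a)

def flow(t, a0, theta, W):
    """phi(t, a0): apply f exactly t times to the initial state a0."""
    a = np.asarray(a0, dtype=float)
    for _ in range(t):
        a = f(a, theta, W)
    return a

# Hypothetical parameters and initial state (illustrative only).
theta = np.array([0.1, -0.2])
W = np.array([[0.5, -1.0],
              [1.0,  0.5]])
a0 = np.array([0.3, -0.7])

lhs = flow(3, flow(4, a0, theta, W), theta, W)   # phi(3, phi(4, a0))
rhs = flow(7, a0, theta, W)                      # phi(3 + 4, a0)
```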

**Example 1:** The dynamics of 2-neuron networks have been analyzed extensively, in the continuous-time case as well as the discrete-time case, because already these simple systems, under certain conditions, can show all possible dynamical features: They can exhibit fixed point attractors as well as periodic, quasiperiodic and chaotic attractors, and even show co-existing attractors for one and the same condition (Wilson and Cowan, 1972; Marcus and Westervelt, 1989; Wang, 1991; Beer, 1995). Here we recall some of the results, which can be found for example in Pasemann (2002), to demonstrate basic properties of recurrent neural networks for this most simple case. So, let (A, f) denote the two-dimensional system given by two neurons (compare **Figure 1**) satisfying the equations

$$a_1(t+1) := \theta_1 + w_{11}\,\tau(a_1(t)) + w_{12}\,\tau(a_2(t)),$$

$$a_2(t+1) := \theta_2 + w_{21}\,\tau(a_1(t)) + w_{22}\,\tau(a_2(t)).\tag{2}$$

For a bounded dissipative dynamical system such as this one, the time development of neural states can be characterized by attractors and transients. We first recall some basic definitions.

A time-sequence of states

$$O(a\_0) \colon= \{a\_0, a(1), \dots, a(t), \dots\}, \quad a\_0 \in A,\tag{3}$$

is called an orbit or a trajectory of the system starting from a<sub>0</sub> ∈ A. An orbit O(a<sub>0</sub>) is called periodic of period p ≥ 1 if a(p) = a<sub>0</sub>, and p is the smallest integer such that this equation holds. For p = 1 the orbit is called a stationary state or a fixed point of the system. A p-periodic point is a state on a p-periodic orbit O(a<sub>0</sub>) = {a<sub>0</sub>, a(1), ... , a(p − 1)}. It corresponds to a fixed point of the p-th iterate f<sup>p</sup> of the map f:

$$f^{p}(a) := \underbrace{f \circ f \circ \cdots \circ f}_{p \text{ times}}(a) = a\,, \quad a \in A\,.$$
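The definition of a p-periodic orbit suggests a simple numerical test: discard a transient, then look for the smallest p with f<sup>p</sup>(a) = a up to a tolerance. The sketch below makes this concrete. The function name, the tolerance, and the weights are assumptions made for illustration, and the test cannot distinguish quasiperiodic from chaotic orbits.

```python
import numpy as np

def f(a, theta, W):
    """The map of Equation (1)."""
    return theta + W @ np.tanh(a)

def attractor_period(a0, theta, W, transient=2000, max_period=64, tol=1e-8):
    """Smallest p <= max_period with |f^p(a) - a| < tol after a transient, else None."""
    a = np.asarray(a0, dtype=float)
    for _ in range(transient):          # let the orbit approach its attractor
        a = f(a, theta, W)
    ref = a.copy()
    for p in range(1, max_period + 1):  # search for the first return to ref
        a = f(a, theta, W)
        if np.linalg.norm(a - ref) < tol:
            return p
    return None                         # no short period found

# Hypothetical weights: neuron 1 has an inhibitory self-connection below -1.
theta = np.zeros(2)
W_osc = np.array([[-2.0, 0.0],
                  [ 0.0, 0.5]])
W_fix = 0.5 * np.eye(2)

p_osc = attractor_period([0.5, 0.5], theta, W_osc)
p_fix = attractor_period([0.5, 0.5], theta, W_fix)
```

With these weights the first network settles on a period-2 orbit and the second on a fixed point.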

Let U ⊂ A denote a subset which is invariant under the action of f; i.e., f(U) = U. A closed and bounded set Γ ⊂ U is called an attractor of the dynamical system (A, f) if f(Γ) = Γ and there exists an ε > 0 such that

$$d(a_0, \Gamma) \le \varepsilon, \ a_0 \in U, \text{ implies that } \ d(a(t), \Gamma) \to 0 \text{ as } t \to \infty.$$

There are different types of attractors: fixed points, periodic orbits (a finite set of periodic points) as in **Figure 2**, quasiperiodic orbits represented by a dense set of points on a closed curve, and so-called chaotic attractors which are characterized, for instance, as a fractal set in A (compare also **Figure 3** and Abraham and Shaw, 1992; Hirsch et al., 2012; Strogatz, 2014). If Γ is the only attractor of a system (A, f), then it is called a global attractor.

The basin of attraction B(Γ) of an attractor Γ is the set of all initial conditions a<sub>0</sub> ∈ A such that d(a(t), Γ) → 0 as t → ∞. Thus, the basin of attraction of Γ is considered as the set of all orbits attracted by Γ. A transient O(Γ) of a system (A, f) is an orbit in the basin of an attractor Γ.
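Basins of attraction can be approximated by iterating a grid of initial states and recording which attractor each one reaches. The following sketch assumes that all attractors are fixed points and uses a hypothetical network of two decoupled neurons with excitatory self-connections; neither the weights nor the helper names come from the paper.

```python
import numpy as np

def f(a, theta, W):
    """The map of Equation (1)."""
    return theta + W @ np.tanh(a)

def reached_fixed_point(a0, theta, W, steps=500, decimals=6):
    """Iterate from a0 and return the (rounded) state reached; assumes fixed-point attractors."""
    a = np.asarray(a0, dtype=float)
    for _ in range(steps):
        a = f(a, theta, W)
    return tuple(np.round(a, decimals))

# Hypothetical weights: two decoupled neurons, each with self-connection 2.
theta = np.zeros(2)
W = np.diag([2.0, 2.0])

grid = np.linspace(-2.0, 2.0, 8)   # 8 points per axis, none of them at 0
basins = {}                        # attractor -> list of initial states in its basin
for x in grid:
    for y in grid:
        label = reached_fixed_point([x, y], theta, W)
        basins.setdefault(label, []).append((x, y))
```

For this example the four basins are the four open quadrants of the activation plane, so the basin boundaries are the coordinate axes.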

A dynamical system (A, f) can have more than one attractor. Then we say that the system has several co-existing attractors. For instance, in **Figure 4** four co-existing period-2 attractors and their basins with regular boundaries are shown. **Figure 5** displays several co-existing attractors separated by fractal basin boundaries.

Often one uses the metaphor "landscape" to describe a dynamical system (A, f) qualitatively. This refers exactly to what we defined as the flow of the dynamical system (A, f). One can think about water running downhill into a sink when referring to transients approaching an attractor. Basin boundaries then correspond to watersheds. An attractor-landscape, denoted by [A], then visualizes the different types of attractors present in the system together with their basins of attraction and basin boundaries as shown in the figures above.

Two different dynamical systems can have similar landscapes in the sense that there is the same number and type of attractors involved; but attractors, as well as the corresponding basin boundaries, may be deformed with respect to each other. If one can map the attractor-landscape of one system onto the attractor-landscape of the other system such that orbits are mapped one-to-one onto each other by preserving the time direction, then the qualitative behavior of such systems is comparable. This situation is formalized by the following

FIGURE 2 | Examples of attractors in (o<sub>1</sub>, o<sub>2</sub>)-output space for a two-neuron system (2). (Left) A fixed point attractor. (Right) A period-5 attractor. (Parameters are given in Table A1 in the Appendix referring to networks sys1 and sys2.)

FIGURE 4 | (Left) Four co-existing period-2 attractors in (o<sub>1</sub>, o<sub>2</sub>)-output space. (Right) Their basins of attraction (for parameters see Table A1, network sys5).

FIGURE 5 | (Left) A period-3 attractor (green) and a period-7 attractor (red) in (o<sub>1</sub>, o<sub>2</sub>)-output space, co-existing with two chaotic attractors, one cyclic with period 14. (Right) The corresponding basins of attraction; the two basins of the chaotic attractors are white. The system is given as sys6 in Table A1.

Definition 1. Two discrete-time dynamical systems (A, f) and (B, g) are said to be topologically conjugate if there exists a homeomorphism ψ : A → B such that ψ ◦ f = g ◦ ψ; i.e., such that the following diagram commutes:

$$\begin{array}{ccc} A & \xrightarrow{\;f\;} & A\\ \psi \downarrow & & \downarrow \psi\\ B & \xrightarrow{\;g\;} & B \end{array}$$
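A simple instance of topological conjugacy can be verified numerically. Because tanh is odd, the network with bias θ and the network with bias −θ (same weights) are conjugate via the reflection ψ(a) = −a. The sketch below checks the condition ψ ◦ f = g ◦ ψ on sample states. The parameter values are hypothetical, and the example is an illustration chosen here, not one discussed in the paper.

```python
import numpy as np

# Hypothetical parameters (illustrative only).
theta = np.array([0.3, -0.6])
W = np.array([[ 1.2, -0.8],
              [ 0.7,  0.4]])

def f(a):
    """Network with bias theta."""
    return theta + W @ np.tanh(a)

def g(b):
    """Network with bias -theta and the same weights."""
    return -theta + W @ np.tanh(b)

def psi(a):
    """Candidate conjugacy: reflection through the origin (a homeomorphism of R^2)."""
    return -a

# Largest violation of psi(f(a)) = g(psi(a)) over a grid of sample states.
samples = [np.array([x, y]) for x in np.linspace(-3, 3, 7) for y in np.linspace(-3, 3, 7)]
residual = max(np.linalg.norm(psi(f(a)) - g(psi(a))) for a in samples)
```

The residual vanishes up to rounding, so every orbit of f is mapped by ψ onto an orbit of g with the time direction preserved.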

#### 3. PARAMETRIZED FAMILIES OF DYNAMICAL SYSTEMS

In the last section the bias terms θ<sub>i</sub> and synaptic weights w<sub>ij</sub>, i, j = 1, ... , n, were held constant, and one can consider them as parameters of the neural system (A, f). For different bias terms or synaptic weights one gets different dynamical systems. Thus, we introduce a parameter space Q ⊂ ℝ<sup>q</sup> for a neural system (A, f) as a q-dimensional Euclidean manifold (Q, h) with metric h. A parameter vector ρ = (θ, w) ∈ Q is given by the bias vector θ and the weight matrix w of the network N(A). Thus, one has dim Q = q = n · (n + 1).

As a next step we argue that the sensor inputs to the neural system N(A) can be assumed to act as parameters of the neurodynamics. Because brain-like systems will always act in a sensorimotor loop, the sensor signals s(t) ∈ S will always drive the neurodynamical system N(A). If we assume, in a first approximation, that the sensor signals s(t) change so slowly that the orbits of the neural system are always able to converge to an attractor, then they can be considered as varying parameters. For that reason we will subsume the sensor signals s(t) under the bias terms, θ(t) := θ̂ + s(t), with θ̂ constant.

A neural system then has to be described as a parametrized family of discrete-time dynamical systems denoted by (A, f; Q), with A ⊂ ℝ<sup>n</sup> the activation space, Q ⊂ ℝ<sup>q</sup> the parameter space, and a differentiable map f : Q × A → A. For a specific parameter vector ρ ∈ Q, we write f<sub>ρ</sub> : A → A for the corresponding dynamical system, and denote the q-parameter family of neurodynamical systems also by (A, f<sub>ρ</sub>), ρ ∈ Q. The only varying parameters considered in the following are the bias terms θ<sub>i</sub>, i = 1, ... , n. As stated above, other parameters of the animat's brain, like the synaptic weights w<sub>ij</sub>, are constant.

We may now look at the "brain" as a fiber structure over parameter space Q (compare **Figure 6**): To every ρ ∈ Q there is attached the activation space A together with the flow φ<sub>ρ</sub> corresponding to ρ ∈ Q; i.e., there is a whole attractor-landscape, denoted by [A]<sub>ρ</sub>, attached to every parameter ρ ∈ Q.

#### 3.1. Parametric Stability

Now, given two different parameter vectors ρ and ρ′ in Q, one may ask if the corresponding attractor-landscapes are similar or not in the sense that there exists a homeomorphism carrying oriented orbits onto oriented orbits, especially attractors onto attractors. Using Definition 1 we introduce the following

Definition 2. Given a neurodynamical system (A, f; Q). Two different parameters ρ, ρ′ ∈ Q are said to be homologous if the corresponding dynamical systems (A, f<sub>ρ</sub>) and (A, f<sub>ρ′</sub>) are topologically conjugate; i.e., if the following diagram commutes:

$$\begin{array}{ccc} A & \xrightarrow{\;f_{\rho}\;} & A\\ \psi \downarrow & & \downarrow \psi\\ A & \xrightarrow{\;f_{\rho'}\;} & A \end{array}$$

If two parameter vectors ρ, ρ′ ∈ Q are homologous, then the corresponding neurodynamics have qualitatively the same behavior; i.e., attractors and basin boundaries may be deformed, but their number and type are preserved. In **Figures 7**, **8**, for example, attractors and output signals of an oscillatory 2n-network with two different bias terms are displayed. The two attractor-landscapes [A]<sub>ρ</sub> and [A]<sub>ρ′</sub> corresponding to homologous parameters θ, θ′ are qualitatively, i.e., topologically, the same.

This leads us to an essential concept, that of parametric stability, which we define in correspondence to the concept of structural stability in the general theory of dynamical systems (Thom, 1989).

Definition 3. Given a neurodynamical system (A, f; Q) and a parameter vector ρ<sub>0</sub> ∈ Q. Then the system (A, f<sub>ρ<sub>0</sub></sub>) is called parametrically stable, if there exists an ε > 0 such that for every ρ ∈ Q satisfying ||ρ − ρ<sub>0</sub>|| < ε the systems (A, f<sub>ρ</sub>) are topologically conjugate to (A, f<sub>ρ<sub>0</sub></sub>).

Definition 4. Given a neurodynamical system (A, f; Q). The domain of parametric stability corresponding to a parameter vector ρ<sub>0</sub> ∈ Q, denoted by P(ρ<sub>0</sub>) ⊂ Q, is the maximally connected parameter set in Q containing all ρ ∈ Q which are homologous to ρ<sub>0</sub> ∈ Q.

Thus, all systems (A, f<sub>ρ</sub>) with ρ ∈ P(ρ<sub>0</sub>) are topologically conjugate to (A, f<sub>ρ<sub>0</sub></sub>).

Parametrically stable systems are essential for modeling experimental situations: If the experimental inaccuracy is smaller than a domain of parametric stability, then the model remains valid in spite of experimental perturbations. More generally, parametric stability is an essential concept, because interesting real (i.e., physical, biological, etc.) phenomena are of course those which are stable under small perturbations of their defining conditions. For instance, a convergent neural network may stay convergent under a small perturbation of its parameters.

#### 3.2. Bifurcations

As a second step to describe the dynamics of neural systems we have assumed that the dynamics depends on control parameters, that is, on variables that vary much more slowly than the states of the system. Suppose these parameters change along a smooth path ρ(t) ∈ Q. If all ρ(t) for t ∈ [t<sub>1</sub>, t<sub>2</sub>] are homologous, the corresponding neurodynamical systems will show qualitatively the same behavior, although the attractors and their basins in activation space A will move and deform. We refer to such a situation as a morphing attractor-landscape with its morphing attractors (Negrello and Pasemann, 2008; Negrello, 2011; Toutounji and Pipa, 2014).

But the path ρ(t) may reach a point ρ<sub>c</sub> in parameter space Q where the behavior of the system changes qualitatively; i.e., the type and/or number of attractors will change when the path crosses ρ<sub>c</sub>. Such points ρ<sub>c</sub> ∈ Q are called critical parameters or bifurcation points. Thus, bifurcation points are associated with the appearance of topologically non-conjugate systems. The values of ρ<sub>c</sub> ∈ Q are called the bifurcation values. The appearance of bifurcations in a system is often studied with the help of bifurcation diagrams. These depict attractor sequences resulting from the variation of only one control parameter (compare **Figure 10**).

The (closed) subspace K ⊂ Q of all bifurcation points is called the bifurcation set of the system (A, f; Q). Bifurcation sets are sets in Q (i.e., curves, surfaces, hypersurfaces) which separate different domains of parametric stability.

**Example 2:** As the simplest example we will discuss a single neuron with self-connection w as a 2-parameter family of dynamical systems (A, f; Q) given by

$$a(t+1) = \theta + w \cdot \tau(a(t)), \quad t \in \mathbb{Z}\,, \tag{4}$$

(compare also Pasemann, 1993a for a single neuron with logistic function σ(x) = (1 + e<sup>−x</sup>)<sup>−1</sup> as transfer function). Stability analysis tells us that for |w| < 1 there exist only global fixed point attractors. Otherwise one will find bi-stable systems for w > 1, and a domain with global period-2 attractors for w < −1. Typical bifurcation diagrams are shown in **Figure 10**.
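The statement for |w| < 1 can be recovered by a short contraction argument. The following is a standard sketch written in the notation of Equation (4); it is not quoted from the cited reference. Writing f<sub>θ,w</sub>(a) = θ + w τ(a) with τ = tanh, one has

$$f'_{\theta,w}(a) = w\,\bigl(1 - \tau(a)^2\bigr), \qquad \bigl|f'_{\theta,w}(a)\bigr| \le |w| \quad \text{for all } a \in \mathbb{R}\,.$$

By the mean value theorem this gives

$$\bigl|f_{\theta,w}(a) - f_{\theta,w}(a')\bigr| \le |w|\,|a - a'|\,,$$

so for |w| < 1 the map is a contraction, and the Banach fixed-point theorem yields a unique fixed point that attracts every orbit. A fixed point a<sup>∗</sup> can lose stability only where

$$w\,\bigl(1 - \tau(a^{*})^2\bigr) = +1 \quad (\text{possible only for } w \ge 1) \qquad \text{or} \qquad w\,\bigl(1 - \tau(a^{*})^2\bigr) = -1 \quad (\text{possible only for } w \le -1)\,,$$

which is consistent with bi-stability arising only for excitatory and period-2 oscillations only for inhibitory self-connections.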

In **Figure 9** the three different domains of parametric stability in Q ⊂ ℝ<sup>2</sup> are shown: Here P<sup>0</sup> (white) denotes the parameter domain for systems having a global fixed point attractor, P<sup>+</sup> (red) refers to bi-stable systems, and P<sup>−</sup> (green) to oscillatory systems. They are separated by bifurcation sets K<sup>+</sup> and K<sup>−</sup> in Q. Thus, a single neuron with self-connection comes in three dynamical forms (compare Definition 5).

At K<sup>+</sup>, that is for w ≥ 1, there are saddle-node bifurcations, and at K<sup>−</sup>, that is for w ≤ −1, there are period-doubling bifurcations. This can be clearly seen in the bifurcation diagrams of **Figure 10**. They show that a single neuron with positive self-coupling can act as a hysteresis element (short term memory), whereas a neuron with negative self-connection can serve as a switchable oscillator (compare also Pasemann, 1993a).
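The three domains can be explored numerically by brute force. The sketch below iterates Equation (4) from several initial states and classifies a parameter pair (θ, w) by what the orbits do. The function name, the grid of initial states, and the thresholds are assumptions made for illustration; the labels follow the domain names P<sup>0</sup>, P<sup>+</sup>, P<sup>−</sup> used above.

```python
import math

def single_neuron_domain(theta, w, transient=2000, tol=1e-6):
    """Classify (theta, w) for a(t+1) = theta + w * tanh(a(t)) by direct iteration.

    Returns "P-" if some orbit does not settle on a fixed point (taken here as
    oscillation), "P+" if different initial states reach different fixed points,
    and "P0" if all tested initial states reach the same fixed point.
    """
    finals = []
    # 20 initial states in [-5, 5]; the even count keeps a0 = 0 off the grid.
    for k in range(20):
        a = -5.0 + 10.0 * k / 19.0
        for _ in range(transient):
            a = theta + w * math.tanh(a)
        if abs(theta + w * math.tanh(a) - a) > tol:
            return "P-"
        finals.append(a)
    return "P+" if max(finals) - min(finals) > 1e-3 else "P0"

labels = {
    (0.0, 0.5): single_neuron_domain(0.0, 0.5),
    (0.0, 2.0): single_neuron_domain(0.0, 2.0),
    (0.0, -2.0): single_neuron_domain(0.0, -2.0),
    (2.0, 2.0): single_neuron_domain(2.0, 2.0),
}
```

The last parameter pair has w > 1 but a large bias and is classified as P<sup>0</sup>, which illustrates that the domains are regions of the two-dimensional parameter space (θ, w) and not intervals of w alone.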

What should be taken from this simple example is that, in situations where there are parameter domains with coexisting attractors, the resulting behavior depends on the direction from which a path ρ(t) in parameter space Q hits a bifurcation set K ⊂ Q (compare **Figure 10**). This leads to phenomena called generalized hysteresis effects, demonstrating that the development of the system depends crucially on its history. Therefore the behavior of these path-dependent systems cannot be deduced explicitly from the knowledge of their current state alone. This is one reason for the "complexity" of neural systems, and a source of their fascinating faculties.
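The path dependence can be reproduced with a slow parameter sweep of the single neuron from Equation (4). The sketch below assumes a self-connection w = 2 and a bias range of [−3, 3]; these values, the step counts, and the helper name `sweep` are illustrative choices.

```python
import math
import numpy as np

def sweep(thetas, w, a0, steps_per_value=200):
    """Quasi-static sweep: for each bias value iterate Equation (4) and record the state reached."""
    a = a0
    states = []
    for theta in thetas:
        for _ in range(steps_per_value):
            a = theta + w * math.tanh(a)
        states.append(a)
    return np.array(states)

w = 2.0                                        # hypothetical excitatory self-connection
thetas = np.linspace(-3.0, 3.0, 121)           # slowly varying bias (the "sensor input")

a_up = sweep(thetas, w, a0=-3.0)               # bias increasing from -3 to 3
a_down = sweep(thetas[::-1], w, a0=3.0)[::-1]  # bias decreasing, re-ordered to match thetas

mid = int(np.argmin(np.abs(thetas)))           # index of the bias value closest to 0
```

At the same bias value near zero the neuron sits on the lower branch when the bias is increasing and on the upper branch when it is decreasing, while both sweeps agree outside the bi-stable range.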

Having clarified the decisive role of domains of parametric stability P ⊂ Q for the behavior of a parametrized family of dynamical systems, it is natural to associate to a non-critical parameter vector ρ<sup>∗</sup> ∈ Q a set of dynamical systems (A, f<sub>ρ</sub>) which are parametrically stable with respect to ρ<sup>∗</sup> ∈ Q. With reference to the terminology of Thom (1989), we give the following

Definition 5. Given a system (A, f; Q), and let ρ<sub>0</sub> ∈ Q denote a non-critical parameter vector. A dynamical form of (A, f; Q) is a connected set F<sub>ρ<sub>0</sub></sub> ⊂ Diff(A) of dynamical systems (A, f<sub>ρ</sub>) which are topologically conjugate to (A, f<sub>ρ<sub>0</sub></sub>).

Assuming that changing parameter values correspond to changing sensor signals, one can deduce that if a sequence of signals stays in a certain domain of parametric stability P, the dynamics of the neural system stays qualitatively the same. And therefore we can assume that the resulting behavior of the controlled system, the animat, will not change dramatically.

#### 4. META-TRANSIENTS

In the next step we will have to ease the restrictions on the parameters by assuming that the sensor signals can change so fast that the activations a(t) of the neurodynamical system (A, f<sub>ρ</sub>) cannot approach an attractor Γ ⊂ A asymptotically.

In the following the considered parameters will be the sensor inputs s(t) of an animat, and all other parameters are fixed. Due to properties of the environment, or due to the behavior of the animat, its sensor inputs may change so fast that they cannot be considered as parameters in the strict mathematical sense.

Such a situation is often described in terms of the dynamics of non-autonomous systems. But it is different from the situations covered by control theory (Gardner, 1983; Sussmann, 1983; Respondek, 1996) or by skew-product systems (Kloeden et al., 2013) insofar as a sequence of such sensor inputs is neither the trajectory of a dynamical system in parameter space, nor is it a well-defined sequence leading to a preexisting goal. Here the sensor inputs depend on the dynamics of the physical environment (exo-motion) as well as on the movements/actions of the animat itself (ego-motion). We will return to this point later.

Assuming that parameters change almost as fast as the internal states, the resulting sequence of states is no longer that of a transient to one and the same attractor. Suppose the neural system at time t is in a state a(t) on a definite transient O(Γ<sub>ρ(t)</sub>) to an attractor Γ<sub>ρ(t)</sub> of the neural system (A, f<sub>ρ(t)</sub>). If the parameter vector a short time later satisfies ρ(t + k) ≠ ρ(t), the corresponding state a(t + k) will be an element of a different transient O(Γ<sub>ρ(t+k)</sub>) to a different attractor Γ<sub>ρ(t+k)</sub> ⊂ A.

So, let σ<sub>θ</sub> := {s(t), s(t + 1), s(t + 2), ...} denote such a sequence of sensor inputs represented by a sequence of parameter vectors θ(t) in Q. This will induce a sequence of states α(σ<sub>θ</sub>) := {a(t), a(t + 1), a(t + 2), ...} on A with

$$a(t+1) = f_{\rho(t)}(a(t)), \quad \text{that is,} \quad a_i(t+1) = \theta_i(t) + \sum_{j=1}^n w_{ij}\,\tau(a_j(t))\,. \tag{5}$$

Such a sequence α(σ<sub>θ</sub>) in A will be called a meta-transient (Negrello and Pasemann, 2008). Thus, a meta-transient is not a transient of a dynamical system, but it is a sequence of states a(t) ∈ A following the morphing attractors of a sequence of the parametrized dynamical systems (A, f<sub>ρ(t)</sub>). The projection of such a meta-transient on A back to the parameter space Q then gives the sequence of "driving" parameter values σ<sub>θ</sub>.
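A meta-transient can be generated directly from its definition: at every time step the map f<sub>ρ(t)</sub> of Equation (1) is applied with the bias θ(t) = θ̂ + s(t) that is current at that step. The sketch below uses hypothetical weights and a hypothetical sinusoidal sensor signal acting on the first neuron; the function name is an assumption.

```python
import numpy as np

def meta_transient(a0, theta_hat, sensor_seq, W):
    """States a(t), a(t+1), ... driven by time-varying biases theta(t) = theta_hat + s(t)."""
    a = np.asarray(a0, dtype=float)
    states = [a]
    for s in sensor_seq:
        a = (theta_hat + s) + W @ np.tanh(a)   # one step of f_rho(t)
        states.append(a)
    return np.array(states)

# Hypothetical network and sensor signal (illustrative only).
theta_hat = np.array([0.1, -0.2])
W = np.array([[0.5, -1.0],
              [1.0,  0.5]])
t = np.arange(200)
sensor_seq = np.stack([0.8 * np.sin(0.9 * t), np.zeros_like(t, dtype=float)], axis=1)

alpha = meta_transient(np.zeros(2), theta_hat, sensor_seq, W)

# With a vanishing sensor signal the construction reduces to an ordinary orbit.
orbit = meta_transient(np.zeros(2), theta_hat, np.zeros((200, 2)), W)
```

Here the sensor signal changes on the same time scale as the neural states, so the sequence `alpha` is not an orbit of any single system (A, f<sub>ρ</sub>).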

If we define a map Φ : Q × A → A associated with the given parametrized family of dynamical systems by

$$\Phi(\rho, a) = f_{\rho}(a)\,, \quad a \in A\,, \ \rho \in Q\,,$$

then the elements of a meta-transient α(σ<sub>θ</sub>) are generated by this map according to

$$
\alpha(\sigma_\theta) = \cdots \circ f_{\rho(t+2)} \circ f_{\rho(t+1)} \circ f_{\rho(t)}(a(t))\,.
$$

For example, if the input to a neuron with excitatory self-connection is slow when compared with the internal dynamics, one will observe a clear hysteresis signal as in **Figure 10**. If the input signal changes much faster, then there will not be "jumps" at the boundaries of the hysteresis domains but a kind of "squashed" hysteresis loop will appear, as was observed for instance in Manoonpong et al. (2010) for the dynamics resulting from audio input signals.

Furthermore, if all the parameter values, corresponding to the sequence σ<sup>θ</sup> of sensor inputs, lie in one and the same domain of parametric stability P, the behavior of the animat's body will not change dramatically, and one may describe it as "the same." But if a sequence of parameter values crosses a bifurcation set K in parameter space Q the system may behave in a very different way.

#### 5. PROJECTIONS TO MOTOR SPACE M

All the dynamics discussed so far serves to generate appropriate body movements. Therefore, the only interesting thing here is the effect of those activities of the neural system which activate the motor neurons. Thus, we have to project the meta-transients α(σ<sub>θ</sub>) on the phase space A to the motor space M with dim(M) = k < n. This projection, denoted by Π : A → M, is assumed here to correspond to the application of a one-layer feedforward network (compare **Figure 11**). The activations of the k motor neurons then span the output layer, and we define

$$\Pi(a)_j := \sum_{i=1}^{n} w_{ji}\,\tau(a_i)\,, \quad a \in A, \quad j = 1, \ldots, k,\tag{6}$$

where w<sub>ji</sub>, i = 1, ... , n, j = 1, ... , k, denote the weights from the n internal neurons to the k motor neurons. The activation of the j-th motor neuron m<sub>j</sub> ∈ M having a bias value θ<sup>M</sup><sub>j</sub> is then given by

$$m\_j = \theta\_j^M + \Pi(a)\_j, \quad a \in A, \quad j = 1, \ldots, k. \tag{7}$$

Such a motor neuron in general will not be connected to all of the brain's neurons. Therefore, there will be many internal states a ∈ A which will project to identical motor activations m ∈ M. This will give the second fiber structure of the sensorimotor loop, where the fiber F<sub>m</sub> ⊂ A over m ∈ M is given by

$$F\_m \colon = \{ a \in A \mid \Pi(a) = m \}, \quad m \in M. \tag{8}$$
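Equations (6) and (7) and the notion of a fiber can be illustrated with a small sketch. The network size, the motor weights, and the two internal states below are hypothetical; only the first two internal neurons are assumed to project to the motor layer.

```python
import numpy as np

def motor_activation(a, W_M, theta_M):
    """Equations (6) and (7): m = theta_M + Pi(a), with Pi(a) = W_M @ tanh(a)."""
    return theta_M + W_M @ np.tanh(a)

# Hypothetical setup: n = 4 internal neurons, k = 2 motor neurons.
# Columns 2 and 3 are zero, i.e., internal neurons 2 and 3 have no motor connection.
W_M = np.array([[1.0, -0.5, 0.0, 0.0],
                [0.3,  0.8, 0.0, 0.0]])
theta_M = np.array([0.1, -0.1])

# Two internal states that differ only in the unconnected neurons.
a1 = np.array([0.2, -0.4,  1.5, -2.0])
a2 = np.array([0.2, -0.4, -0.7,  0.9])

m1 = motor_activation(a1, W_M, theta_M)
m2 = motor_activation(a2, W_M, theta_M)
```

Both states yield the same motor activation, so they lie in the same fiber over that motor state.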

Then, what is observable is the behavior of the animat generated by a sequence of motor states

$$\mu(\sigma\_{\theta}) \colon= (m(t), m(t+1), m(t+2), \dots) \tag{9}$$

which corresponds to a given meta-transient α(σ<sub>θ</sub>) on A; that is, with ρ(t) ∈ Q, a(t) ∈ A, and bias terms of motor neurons θ<sup>M</sup> ∈ ℝ<sup>k</sup> one has

$$m(t) = \theta^M + \Pi \circ \Phi(\rho(t), a(t))\,. \tag{10}$$

From the projection argument it is clear that not the whole state space A is of direct relevance for the behavior of the animat. It is obvious that the activity of neurons not connected to the motor neurons does not have a direct effect on the behavior of the animat. Therefore an attractor in A, whether it is a fixed point, a periodic orbit or even a chaotic attractor, may be projected to only one and the same motor state m ∈ M; attractors, their transients or meta-transients may then have little or no effect on motor activities at all.

To reflect this property we introduce a splitting of every state a ∈ A into a so-called horizontal and a vertical part; i.e.,

$$a = a^{v} + a^{h}, \quad \text{with} \quad \Pi(a^{v}) := 0\,. \tag{11}$$

Due to this splitting we have a direct decomposition of the space of brain states A into horizontal and vertical parts; i.e.,

$$A = A^{v} \oplus A^{h},\tag{12}$$

where A<sup>v</sup> is given as A<sup>v</sup> = ker Π.
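One simple way to realize such a splitting in code is to assign the coordinates of neurons without any motor connection to the vertical part and all other coordinates to the horizontal part. This coordinate-wise reading is an assumption made for the sketch below and covers only the case where vertical states are carried by unconnected neurons; the weights and the state are hypothetical.

```python
import numpy as np

def Pi(a, W_M):
    """Projection of Equation (6): Pi(a) = W_M @ tanh(a)."""
    return W_M @ np.tanh(a)

def split(a, W_M):
    """Split a into (a_v, a_h): a_v lives on neurons without motor connections."""
    connected = np.any(W_M != 0.0, axis=0)   # True for neurons projecting to some motor neuron
    a_h = np.where(connected, a, 0.0)        # horizontal part
    a_v = np.where(connected, 0.0, a)        # vertical part
    return a_v, a_h

# Hypothetical motor weights: only internal neurons 0 and 1 are connected.
W_M = np.array([[1.0, -0.5, 0.0, 0.0],
                [0.3,  0.8, 0.0, 0.0]])
a = np.array([0.2, -0.4, 1.5, -2.0])

a_v, a_h = split(a, W_M)
```

With this choice Π(a<sup>v</sup>) = 0 holds because tanh(0) = 0 on the connected coordinates and the remaining weight columns vanish, and the motor activation depends on a<sup>h</sup> only.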

Let there be l ≥ k internal neurons being directly connected with neurons in the motor layer; they serve as an l-dimensional input space B ⊂ A<sup>h</sup> of the feedforward network (compare **Figure 11**). Furthermore, due to the geometry of feedforward networks (Pasemann, 1993b), in general there is an (l − k)-dimensional linear subspace C<sub>m</sub> ⊂ B on which the activation of the motor neurons is constant.

The dynamics directly relevant for behavior then will actually live in the horizontal state space A<sup>h</sup> ⊂ A. Correspondingly, what will lead to an effective behavior is a sequence of horizontal states given by a horizontal meta-transient on A

$$\alpha^h(\sigma\_\theta) \colon= \{a^h(t), a^h(t+1), a^h(t+2), a^h(t+3), \ldots\}.\tag{13}$$

Going back to section 3, let us consider again a discrete-time dynamical system f<sub>ρ</sub> : A → A with fixed parameter vector ρ ∈ Q. Then, post hoc, we can introduce a well-defined splitting of the dynamical system f<sub>ρ</sub> into vertical and horizontal parts by

$$f_{\rho}(a) = f_{\rho}^{v}(a^{v}) + f_{\rho}^{h}(a^{h}), \quad f_{\rho}^{v}(a^{h}) := 0, \quad f_{\rho}^{h}(a^{v}) := 0. \tag{14}$$

It is obvious that only the horizontal dynamics f<sup>h</sup><sub>ρ</sub> : A<sup>h</sup> → A<sup>h</sup> contributes to the observable behavior of an animat, whereas the vertical dynamics f<sup>v</sup><sub>ρ</sub> : A<sup>v</sup> → A<sup>v</sup> will describe brain processes which may be associated to a dynamical kind of memory, to association, planning, dreaming, contemplation, and the like; that is, to the cognitive faculties of the brain.

Furthermore, suppose that two dynamical systems f<sub>ρ</sub> and f<sub>ρ′</sub> with ρ, ρ′ ∈ P, P ⊂ Q a domain of parametric stability (compare section 3.1), are topologically conjugate. Then it is reasonable that their horizontal components f<sup>h</sup><sub>ρ</sub> and f<sup>h</sup><sub>ρ′</sub> will generate motor states in M which lead to variants of a specific behavior. The next example gives a demonstration of this situation.

**Example 3:** In evolutionary robotics one often uses the motor dynamics of a system as a fitness criterion to reduce the "ineffective" higher dimensional neurodynamics of evolved controllers to analyzable, minimalistic solutions for which the discussed effects can be studied (Wischmann and Pasemann, 2006; von Twickel et al., 2011; Pasemann et al., 2012). Here we give only a simple example of a neurocontroller, the recurrent neural network shown in **Figure 11**. It provides an obstacle avoiding behavior of a Khepera-like robot (Toutounji and Pasemann, 2016). It uses five distance sensors (sensor layer) and two motor neurons (motor layer).

The hidden layer (the "brain") has eight neurons, but only two of them project to the two-dimensional motor space M. Though the brain dynamics runs in an 8-dimensional state space A, only a 2-dimensional subspace B ⊂ A determines the motor activity directly. What is going on in the 6-dimensional vertical state space has no immediate effect on the behavior of the robot. Indirectly, of course, the dynamics on A can influence the behavior of the robot; for instance, the over-critical excitatory self-connections of the input neurons I (**Figure 11**) control the turning angle of the robot at walls. The submodule in N(A) composed of the two neurons A and B has an interesting dynamics not influencing the motor behavior. They display a "chaotic" meta-transient while the robot is turning, ending up in a period-2 attractor after a complete right turn, and in a period-4 attractor after a complete left turn. This internal (vertical) dynamics does not contribute to the behavior of the robot, but can be used as a kind of memory for subsequent decisions. The overall performance of this controller is comparable to that of the 2-neuron network called the MRC (minimal recurrent controller) in Hülse et al. (2004).

#### 5.1. Closing the Loop

Every activity of the motor neurons will change the sensor input to the system (compare **Figure 12**). In this sense we have a closed loop, and one may call it the ego-motion-loop. But the essential point is that this loop has to go through the environment of the system; i.e., how the motor activity is reflected by the sensor input depends, first, on the appearance and properties of the environment, and second, on processes in the environment itself, called exo-motion. This may lead to a discrimination of sensor input variations into those which are due to changes of the motor signals, and those which are due to changes in the environment only (Philipona et al., 2003).

That this inextricable fusion of two influences cannot be described as a control theoretical type of closed loop with an additional noise term is clear from two facts: First, what is happening in the physical environment of an animat in general will not be a well-defined process, and, second, the motor outputs, as we have seen, are not necessarily a direct reflex of the sensor inputs. Planning, focusing, and ignoring, performed by the vertical brain activation dynamics, modulate the reaction to sensor inputs. Thus, even formally it is difficult to describe the neural dynamics in the sensorimotor loop in terms of a control-theoretical model.

#### 6. DISCUSSION

The description of biological brains as dynamical systems is often assumed to be an appropriate approach to describe cognition and the behavior of animals (Port and Van Gelder, 1995; Thelen and Smith, 1996). Based on the observation that the typical activity of an animat is a reaction to its environment, we used the sensorimotor loop to carefully approach the dynamics hypothesis in three steps. Relying on experiences in the field of evolutionary robotics (Nolfi and Floreano, 2000) we used discrete-time neurodynamics to, first, describe (isolated) brains as dynamical systems. Having realized that (living) brains are always driven by sensor inputs, we made clear that the description of brains as parametrized families of dynamical systems is more appropriate. This allowed us to introduce the concept of parametric stability, which helped to formalize the general observation that a certain behavior is robust against "noise," and can be classified as "the same," although the initializing sensor inputs vary over a larger domain.

In a third step, assuming that sensor inputs may change so fast that they cannot be assumed to serve as parameters in the mathematical sense (compare for instance Manoonpong et al., 2005), we were compelled to introduce the concept of meta-transients to describe the brain's activity in a sensorimotor loop. These meta-transients in general will not be describable as orbits of a dynamical system. Finally, we used the fact that not all of the brain's activity is directly reflected in the motor performance to discern between the brain's effective (horizontal) and internal (vertical) activations.

In a more general sense the horizontal part is associated more with the sensorimotor pathways, whereas the vertical part is assigned to the higher centers of the brain, associated with cognitive faculties of a system. Of course horizontal and vertical processes are not decoupled and depend on each other; they are processes on one and the same highly recurrent network. As usual, higher centers are assumed to check the adequacy of the activities along the sensorimotor pathways; they are modulating the sensorimotor flow of signals. On the other hand, the vertical processes are permanently restricted by the "horizontal" flow of signals; otherwise, that is, without sensor inputs, they will run freely into perhaps noxious states of brain and body.

Following a purely formal approach to neurodynamics, we introduced in Section 3 the concept of parametric stability and the associated concept of a dynamical form. We think that these concepts may help to discuss questions concerning the representation of objects or, in this context better, behavior-relevant situations in the external world.

From the dynamical point of view certain patterns of sensor inputs will be associated with the existence of certain attractors in activation space A; or otherwise stated, with the existence of a certain attractor-landscape. Because one has to assume that the brain's dynamics is always driven by sensor inputs (including proprioception), it is more plausible to refer to a basin of attraction as a candidate for representing an external situation. Taking our argument for meta-transients seriously, it becomes obvious that a dynamical form, associated with a certain type of behavior, is a reasonable representative for behavior-relevant situations in the external world. Thus, taking parametric stability as essential for the reproducible identification of "the same" situations gives a reasonable conceptual basis for treating brain dynamics induced by an ever changing complex environment.

If one approves this interpretation then it will also allow for a less restrictive dynamical view on memory. Neural memories usually are represented by asymptotically stable fixed points, like in Hopfield's associative-memory model, or are conceived as periodic, quasiperiodic, or even chaotic attractors of neural networks. In fact, the correspondence between attractors and memories is one of the fundamental aspects of neural networks. But, as we have seen, situated in a sensorimotor loop and driven by sensor inputs, the best we can expect is that attractors of a neural network serve as kinds of symbols, while the system always runs on transients to these attractors (or on meta-transients). So in a first step memory should be associated with the basins of certain attractors. Given that the natural situation is such that neural systems in the sensorimotor loop run on meta-transients, we have to assume that the union of all basins of attraction, belonging to the possibly morphing attractors of a dynamical form, should be identified with the memory of certain behavior-relevant external situations. We will call this kind of memory model a blurred memory. The relation between learning and blurred memory will be the subject of further research.

#### 7. AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

#### ACKNOWLEDGMENTS

The author would like to thank the Institute of Cognitive Science, University of Osnabrück, for its hospitality and support.

#### REFERENCES


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Pasemann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

The following is a list of parameters corresponding to the designated 2-neuron networks used for demonstrations in this paper.

TABLE A1 | Parameters for 2-neuron neural networks (Equation 2) discussed in the paper.


# An Adaptive Neural Mechanism for Acoustic Motion Perception with Varying Sparsity

#### Danish Shaikh\* and Poramate Manoonpong

*Embodied AI and Neurorobotics Laboratory, Centre for BioRobotics, Maersk Mc-Kinney Moeller Institute, University of Southern Denmark, Odense, Denmark*

Biological motion-sensitive neural circuits are quite adept in perceiving the relative motion of a relevant stimulus. Motion perception is a fundamental ability in neural sensory processing and crucial in target tracking tasks. Tracking a stimulus entails the ability to perceive its motion, i.e., extracting information about its direction and velocity. Here we focus on auditory motion perception of sound stimuli, which is poorly understood as compared to its visual counterpart. In earlier work we have developed a bio-inspired neural learning mechanism for acoustic motion perception. The mechanism extracts directional information via a model of the peripheral auditory system of lizards. The mechanism uses only this directional information obtained via specific motor behaviour to learn the angular velocity of unoccluded sound stimuli in motion. In nature however the stimulus being tracked may be occluded by artefacts in the environment, such as an escaping prey momentarily disappearing behind a cover of trees. This article extends the earlier work by presenting a comparative investigation of auditory motion perception for unoccluded and occluded tonal sound stimuli with a frequency of 2.2 kHz in both simulation and practice. Three instances of each stimulus are employed, differing in their movement velocities: 0.5°/time step, 1.0°/time step and 1.5°/time step. To validate the approach in practice, we implement the proposed neural mechanism on a wheeled mobile robot and evaluate its performance in auditory tracking.

Edited by:

*Shuai Li, Hong Kong Polytechnic University, Hong Kong*

#### Reviewed by:

*Xiaosu Hu, University of Michigan, USA Noman Naseer, Air University, Pakistan Muhammad Umer Khan, Air University, Pakistan*

> \*Correspondence: *Danish Shaikh danish@mmmi.sdu.dk*

Received: *03 December 2016* Accepted: *20 February 2017* Published: *09 March 2017*

#### Citation:

*Shaikh D and Manoonpong P (2017) An Adaptive Neural Mechanism for Acoustic Motion Perception with Varying Sparsity. Front. Neurorobot. 11:11. doi: 10.3389/fnbot.2017.00011*

Keywords: acoustic motion perception, binaural acoustic tracking, sound localisation, correlation-based learning, lizard peripheral auditory system

# 1. INTRODUCTION

Historically motion perception has been extensively studied in the context of visual tracking. This comes as no surprise as it is the dominant sense for humans and most animal species. In humans it plays an important role in visuomotor coordination tasks such as catching a ball (Oudejans et al., 1996). In the animal kingdom, motion perception is a crucial element that is relevant to sustenance and survival. It is particularly important in conditions where the target being tracked is sporadically occluded (Morgan and Turnbull, 1978) such as a predator tracking a moving prey that occasionally disappears from view behind trees.

A simple correlation-based neural circuit for motion detection in vision that selectively responds to direction and velocity given monocular visual input was proposed decades ago by Reichardt (1969). Such low-level motion detectors however have not been reported for audition. Auditory motion perception has therefore been suggested by Carlile and Leung (2016) to exist as a higher level system, similar to binocular, attention-modulated third-order visual motion detectors. The authors furthermore suggest that such third-order systems likely respond to snapshots of location information extracted from binaural cues. However, visual tracking experiments in the context of smooth eye pursuit that utilised periodic occlusion of the target indicate that target velocity may be a significant spatial information source (Barnes and Asselman, 1992; Churchland et al., 2003; Orban de Xivry et al., 2008).

Given a means to estimate a moving target's relative location and information regarding the time during which subsequent estimates are determined, the target's velocity can be derived. Here we demonstrate that the target velocity for continuous unoccluded as well as occluded acoustic targets could be learned based on the determination of these two pieces of information. We frame the problem of acoustic motion perception as an active acoustic tracking task. Active acoustic tracking entails movement of the acoustic organs to track an object, which is a natural auditory tracking behaviour. The dynamics of auditory tracking in cats with disconnected optical nerves, which disabled visual processing, have been behaviourally investigated (Beitel, 1999). The recorded head motion of these animals while tracking a series of click sounds emitted by a rotating loudspeaker suggested sound localisation being performed in a series of steps. The animals first displayed a rapid saccade-like head-orienting response to localise the target within the frontal sound field. This was followed by successive head movement cycles where the head would overshoot and pause, ensuring that the target's location remained close to the median plane.

### 1.1. Auditory Localisation Cues for Spatial Motion Perception

There are three types of cues available for auditory localisation: the difference in arrival times of a sound (interaural time difference or ITD), the difference in sound level (interaural level difference or ILD) and spectral information (direction-dependent energy minimisation over the entire frequency spectrum due to filtering by the outer ear). Several animals such as frogs, crickets and lizards utilise only ITD cues for sound localisation. For these animals spectral and ILD cues are unavailable due to lack of pinnae and the diffraction of sound around the head respectively. Using difference cues for localisation requires two ears with a frequency-dependent displacement between them. Generating ILD cues requires a sufficiently large head between the ears. The dimensions of the head should however be at least greater than the half-wavelength of the sound signal to successfully generate ILD cues. This creates an acoustic shadow inside which the relative sound amplitude is reduced. ITD cues can however be generated without the need of such obstructions, but do depend on the displacement between the ears and the angle of incidence of the sound with respect to the median plane. Here we restrict ourselves to acoustic tracking of a moving sound signal using only ITD cues extracted from microphones.
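The dependence of ITD cues on receiver separation and angle of incidence can be illustrated with a minimal sketch. It uses the standard far-field (plane-wave) approximation ITD = d·sin(θ)/c, which is an assumption of this example rather than the model used in this work; the speed of sound, the function names and the 17 mm spacing are likewise hypothetical placeholders.

```python
import math

# Assumed speed of sound in air (m/s); the text does not state a value.
SPEED_OF_SOUND_M_S = 340.0


def itd_seconds(separation_m: float, azimuth_deg: float) -> float:
    """Far-field ITD between two receivers (hypothetical helper).

    azimuth_deg is the angle of incidence measured from the median plane:
    0 is straight ahead, positive values lie toward one side.
    """
    return separation_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S


def phase_difference_rad(separation_m: float, azimuth_deg: float,
                         frequency_hz: float) -> float:
    """Interaural phase difference of a pure tone for the same geometry."""
    return 2.0 * math.pi * frequency_hz * itd_seconds(separation_m, azimuth_deg)


if __name__ == "__main__":
    spacing_m = 0.017  # hypothetical microphone separation
    for azimuth in (0.0, 30.0, 60.0, 90.0):
        itd_us = itd_seconds(spacing_m, azimuth) * 1e6
        print(f"azimuth {azimuth:5.1f} deg -> ITD {itd_us:6.2f} us")
```

Under this approximation the cue vanishes on the median plane and changes sign between the two sides, which is why a sound source moving across the frontal field produces a continuously varying ITD.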

A sound signal moving in a given direction with a constant velocity with respect to the microphones generates dynamically varying ITD cues. The instantaneous values of these cues are dependent on the relative instantaneous position of the sound signal, while the rate with which they vary is dependent on the relative movement speed of the sound signal. Actively tracking a moving sound signal therefore requires transforming these relative position- and velocity-dependent cues into a desired behaviour, for example robotic orientation or phonotaxis. One must first determine the instantaneous spatial location of the sound signal to within the desired threshold of the instantaneous tracking error. This localisation must then be successively repeated sufficiently quickly to minimise the tracking error.

### 1.2. Relevance of Acoustic Motion Perception

There are several applications where actively tracking an acoustic target can be of interest. In robot phonotaxis applications, the robot could localise acoustic signals and navigate toward them (Reeve and Webb, 2003; Oh et al., 2008). In audio-visual teleconferencing systems, dynamically-steered microphone systems that automatically orient toward a speaker as they move about in a room could maximise the power of the incoming audio signal or orient a video camera toward the current speaker (Wang and Chu, 1997; Brandstein and Ward, 2001). Social robots that respond to sound and/or speech input from the human are another example. The verbal human-robot interaction element in social robots is deemed to be more natural and richer if the robot's acoustomotor response orients and maintains its gaze as well as auditory focus on the subject of interest (Nakadai et al., 2000; Okuno et al., 2003) in motion. For example, a human walks around in a room while addressing the robot via either directed or undirected speech commands.

Conventional acoustic tracking techniques (Liang et al., 2008a; Tsuji and Suyama, 2009; Kwak, 2011; Ju et al., 2012, 2013; Nishie and Akagi, 2013) are passive in that they require no movement of the listener. All of these techniques extract ITD cues for localisation by utilising multi-microphone arrays with at least four microphones. Typical arrays comprise an order of magnitude more microphones arranged in various geometric configurations such as linear, square, circular or in distributed arrays. ITD-based sound localisation and tracking techniques also tend to utilise computationally intensive algorithms such as particle filtering to compute the relative sound signal location from raw ITD data (Ward et al., 2003; Lehmann, 2004; Valin et al., 2007; Liang et al., 2008b; Ning et al., 2015). More conventional approaches are based on the generalised cross-correlation technique (Knapp and Carter, 1976) or the more recent steered response power technique (DiBiase, 2000; DiBiase et al., 2001; Zotkin and Duraiswami, 2004; Dmochowski et al., 2007; Cai et al., 2010; Wan and Wu, 2010; Marti et al., 2013; Zhao et al., 2013; Lima et al., 2015). Employing a larger number of microphones can improve localisation accuracy but at the expense of greater computational complexity and costly hardware for synchronisation and processing of multi-channel acoustic signals.

### 1.3. Contribution of the Present Work

We have previously reported a system for acoustic motion perception (Shaikh and Manoonpong, 2016) employing two microphones that implements a neural learning mechanism. The learning utilises a mathematical model that mimics the functionality of the auditory processing performed by the lizard peripheral auditory system (Wever, 1978). The system provides sound direction information and has been characterised via biofaithful mathematical modelling (Zhang, 2009). The parameters of the model have been determined from biophysical data recorded from live lizards (Christensen-Dalsgaard and Manley, 2005). The model has also been implemented on a number of robotic platforms as reviewed in Shaikh et al. (2016). The neural learning mechanism has been adapted from the Input Correlation (ICO) learning approach (Porr and Wörgötter, 2006), which itself has been derived from a class of differential Hebbian learning rules (Kosko, 1986). The neural mechanism is considered to be a first step toward the development of a biologically-plausible neural learning mechanism for acoustic motion perception. The mechanism has been validated in simulation for tracking a continuous unoccluded acoustic signal moving with a constant and unknown angular velocity along a semi-circular trajectory. It has also been shown to learn various target angular velocities in separate simulated trials.

Here we implement the neural learning mechanism and compare its tracking performance for three different types of sound signals–continuous unoccluded, periodically occluded and randomly occluded. We first implement the neural mechanism in simulation that allows a robotic agent to learn to track a virtually-moving continuous unoccluded sound signal for a set of three different and unknown target angular velocities. As earlier the virtual sound signal is a pure tone moving along a semi-circular trajectory. To validate the tracking performance in practice, the learned synaptic weights representing a given target angular velocity are then used directly on a wheeled mobile robot that also implements the neural mechanism.

Next we implement another instance of the neural mechanism in simulation to learn to track a periodically occluded acoustic signal, moving with a constant but unknown angular velocity along a semi-circular trajectory. The occluded acoustic signal is implemented as an intermittent signal, i.e., it has a continuous unoccluded sound for a constant interval followed by complete silence for a constant interval. The silence implies that the signal is occluded and therefore inaudible. The acoustic tracking performance is evaluated in simulation for a constant "duty cycle" of sound emission. In this manner, the acoustic tracking performance is again evaluated for a set of three different target angular velocities identical to those used earlier. An instance of the simulation results is validated in practice via robotic trials with the wheeled mobile robot.

Finally, we implement a third instance of the neural mechanism in simulation to learn to track an occluded acoustic signal as described earlier, however with a randomly varying duty cycle. The signal moves as before with a constant but unknown angular velocity along a semi-circular trajectory. We evaluate the acoustic tracking performance for a set of three different target angular velocities identical to those used earlier. The main contribution of this work lies in systematically investigating the comparative performance of a neural closed-loop learning mechanism in learning the angular velocity of an acoustic stimulus with varying sparsity.

This article is organised in the following manner. Section 2 provides background information about the lizard peripheral auditory system and its equivalent model as well as about ICO learning. Section 3 presents the adaptive neural acoustic tracking architecture, the experimental setup and the robot model. Section 4 shows the experimental results in both simulation and practice. Section 5 summarises the work and discusses future directions.

# 2. BACKGROUND

### 2.1. The Lizard Peripheral Auditory System

The remarkable directional sensitivity of the peripheral auditory system (Christensen-Dalsgaard and Manley, 2005; Christensen-Dalsgaard et al., 2011) of lizards such as the bronze grass skink (*Mabuya macularia*) and the tokay gecko (*Gekko gecko*), as depicted in **Figure 1A**, is quite well understood. This "directionality" enables the animal to extract the relative position of a relevant sound signal. The lizard ear achieves a directionality higher than that of any other known vertebrate (Christensen-Dalsgaard and Manley, 2005). This is due to an internal acoustical connection between the animal's two eardrums, formed by efficient sound transmission through internal pathways in the head as depicted in **Figure 1B**.

In spite of the peripheral auditory system's relatively small dimensions (the eardrums for most lizard species are separated by 10–20 mm), the range of sound wavelengths over which it exhibits strong directionality (Christensen-Dalsgaard et al., 2011) is relatively wide (340–85 mm, corresponding to 1–4 kHz). Within this range of frequencies the sound pressure difference between the eardrums is negligible due to acoustic diffraction around the animal's head, thus generating only very small (1–2 dB) ILD cues. The system thus relies on µs-scale interaural phase differences between incoming sound waves at the two ears due to the physical separation. These phase differences, corresponding to ITDs, are used to extract information about sound direction relative to the animal. The system essentially converts these relatively tiny phase differences into relatively larger (up to 40 dB) interaural vibrational amplitude differences (Christensen-Dalsgaard and Manley, 2005). These amplitude differences encode sound direction information. Each eardrum's vibrations are the result of the superposition of two acoustic components generated due to sound interference in the internal pathways: an external sound pressure acting on the eardrum's periphery and an equivalent internal sound pressure acting on its interior. This leads to the ipsilateral (toward the sound signal) amplification of eardrum vibrations and contralateral (away from the sound signal) cancellation of eardrum vibrations. In other words, the ear closer to the relevant sound signal vibrates more strongly as compared to the ear further away from it. The relative phase difference between the incoming sound waves at the two eardrums determines the relative strengths of their vibrations.
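The orders of magnitude quoted above can be checked with a short calculation. It assumes a speed of sound of roughly c ≈ 340 m/s (the text does not state the value used) and takes the straight-line eardrum separation d as the path difference for sound arriving from the side, which gives an upper bound τ on the interaural delay:

$$\lambda = \frac{c}{f}, \qquad \lambda(1\ \text{kHz}) = \frac{340\ \text{m/s}}{1000\ \text{Hz}} = 340\ \text{mm}, \qquad \lambda(4\ \text{kHz}) = \frac{340\ \text{m/s}}{4000\ \text{Hz}} = 85\ \text{mm}$$

$$\tau = \frac{d}{c}, \qquad \frac{10\ \text{mm}}{340\ \text{m/s}} \approx 29\ \mu\text{s}, \qquad \frac{20\ \text{mm}}{340\ \text{m/s}} \approx 59\ \mu\text{s}$$

The eardrum separation is thus well below the half-wavelength across the whole 1–4 kHz range, and the available delays are a few tens of microseconds, consistent with the µs-scale phase differences described above.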

FIGURE 1 | (A) An eardrum visible on the side of the gecko head (redrawn from Christensen-Dalsgaard et al., 2011). (B) Early cross-sectional diagram of the lizard (*Sceloporus*) auditory system (taken from Christensen-Dalsgaard and Manley, 2005). (C) Ideal lumped-parameter circuit model (based on Fletcher and Thwaites, 1979; Fletcher, 1992 and redrawn from Zhang, 2009). Voltages *V*<sub>I</sub> and *V*<sub>C</sub> respectively represent sound pressures *P*<sub>I</sub> and *P*<sub>C</sub> at the ipsilateral and contralateral eardrums. Currents *i*<sub>I</sub> and *i*<sub>C</sub> respectively represent the vibrations of the ipsilateral and contralateral eardrums due to the sound pressures acting upon them. Impedances *Z*<sub>r</sub> model the combined acoustic filtering due to the mass of the eardrums and stiffness of the Eustachian tube through the central cavity connecting the tympani to each other. Impedance *Z*<sub>v</sub> models the acoustic filtering effects of the central cavity itself. Voltage *V*<sub>cc</sub> represents the resultant sound pressure in the central cavity due to the interaction of the internal sound pressures experienced from either side. This causes current *i*<sub>cc</sub> to flow, representing the movement of sound waves inside the central cavity as the pressure inside it varies. (D) Contour plot (redrawn from Zhang, 2009) modelling binaural subtraction of the ipsilateral and contralateral responses as defined by Equation (2).

An equivalent electrical circuit model of the peripheral auditory system as depicted in **Figure 1C** (Fletcher and Thwaites, 1979; Fletcher, 1992) allows the directionality to be visualised as shown in **Figure 1D** as a difference signal computed by subtracting the vibrational amplitudes of the eardrums. Labelling the vibrational amplitudes of the ipsilateral and contralateral eardrums respectively as i<sub>I</sub> and i<sub>C</sub>, the difference signal can be formulated as

$$\left| \frac{i\_{\rm I}}{i\_{\rm C}} \right| = \left| \frac{G\_{\rm I} \cdot V\_{\rm I} + G\_{\rm C} \cdot V\_{\rm C}}{G\_{\rm C} \cdot V\_{\rm I} + G\_{\rm I} \cdot V\_{\rm C}} \right| \,, \tag{1}$$

where frequency-dependent gains G<sub>I</sub> and G<sub>C</sub> respectively model the effect of sound pressure on the motion of the ipsilateral and contralateral eardrum. These gains are essentially analogue filters in signal processing terminology with their coefficients determined experimentally from eardrum vibration measurements for individual lizards via laser vibrometry (Christensen-Dalsgaard and Manley, 2005). Expressing i<sub>I</sub> and i<sub>C</sub> in decibels,

$$i\_{\rm ratio} = 20 \left( \log |i\_{\rm I}| - \log |i\_{\rm C}| \right) \text{ dB }. \tag{2}$$

The model responds well for sound frequencies within the range 1–2.2 kHz, with a peak response at approximately 1.6 kHz. i<sub>ratio</sub> is positive for |i<sub>I</sub>| > |i<sub>C</sub>| and negative for |i<sub>C</sub>| > |i<sub>I</sub>|. The model's symmetry implies that the model's response |i<sub>ratio</sub>| is identical on either side of the centre point θ = 0° as well as locally symmetrical within the sound direction range [−90°, +90°] (considered henceforth as the range of interest of sound direction). The difference signal expressed as Equation (2) provides information about sound direction in that its sign indicates whether the sound is coming from the ipsilateral side (positive sign) or from the contralateral side (negative sign), while its magnitude corresponds to the relative angular displacement of the sound signal with respect to the median.
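Equations (1) and (2) can be sketched numerically as follows. This is an illustration only: the gains passed in are arbitrary complex numbers standing in for the measured filters G<sub>I</sub> and G<sub>C</sub> at a single frequency, the function names are hypothetical, and the logarithm in Equation (2) is assumed to be base 10 in line with the decibel convention.

```python
import math


def eardrum_responses(g_ipsi: complex, g_contra: complex,
                      v_ipsi: complex, v_contra: complex):
    """Numerator and denominator of Equation (1) as complex phasors.

    g_ipsi and g_contra are placeholder gains at one frequency, not the
    measured lizard filters; v_ipsi and v_contra are the sound pressures.
    """
    i_ipsi = g_ipsi * v_ipsi + g_contra * v_contra
    i_contra = g_contra * v_ipsi + g_ipsi * v_contra
    return i_ipsi, i_contra


def i_ratio_db(i_ipsi: complex, i_contra: complex) -> float:
    """Equation (2), assuming base-10 logarithms (decibel convention)."""
    mag_ipsi, mag_contra = abs(i_ipsi), abs(i_contra)
    if mag_ipsi == 0.0 or mag_contra == 0.0:
        raise ValueError("both vibration amplitudes must be non-zero")
    return 20.0 * (math.log10(mag_ipsi) - math.log10(mag_contra))
```

With equal pressures at both eardrums the two responses coincide and the difference signal is 0 dB, while exchanging the two pressures flips its sign, which mirrors the symmetry described above. An amplitude ratio of 100:1 corresponds to 40 dB.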

### 2.2. Input Correlation (ICO) Learning

Since the proposed neural mechanism is derived from the ICO learning algorithm (Porr and Wörgötter, 2006), this section gives a brief introduction to the algorithm. The algorithm, depicted as a neural mechanism in **Figure 2**, performs online unsupervised learning. Its synaptic weight update is driven by cross-correlation of two types of input signals: one or multiple "predictive" signal(s), which are stimuli occurring earlier in time, and a "reflex" signal, which is a stimulus occurring later in time that arrives after a finite delay and drives an unwanted response or reflex. The learning goal of ICO learning is to predict the occurrence of the reflex signal by utilising the predictive signal. This allows an agent to react earlier, before the reflex signal occurs. The agent essentially learns to execute an anticipatory action to avoid the reflex.

The output O<sub>ICO</sub> of the ICO learning mechanism is a linear combination of the reflex input x<sub>0</sub> and the N predictive input(s) x<sub>k</sub> where k = 1, . . . , N and N ∈ ℕ. O<sub>ICO</sub> is formulated as

$$O\_{\rm ICO} = \rho\_0 x\_0(t) + \sum\_{k=1}^{N} \rho\_k(t) x\_k(t) \,. \tag{3}$$

The synaptic weight ρ<sub>0</sub> of the reflex input is assigned a constant positive value such as 1.0, representing a reflex signal whose strength does not change over time. During learning, the synaptic weight(s) ρ<sub>k</sub> of the predictive signal(s) x<sub>k</sub>(t) are updated through differential Hebbian learning (Kosko, 1986; Klopf, 1988) using the cross-correlation between the predictive and reflex inputs. The synaptic weight update rule is given by

$$\frac{d\rho\_k(t)}{dt} = \mu x\_k(t) \frac{dx\_0(t)}{dt}, \quad k = 1, \dots, N. \tag{4}$$

The learning rate µ, usually set to a value less than 1.0, determines how fast the neural mechanism can learn to prevent the reflex signal from occurring. The synaptic weights ρ<sub>k</sub> tend to stabilise when the reflex signal is nullified, which implies that the reflex signal has been successfully avoided. ICO learning is characterised by its fast learning speed and stability of synaptic weight updates and has been successfully applied to real robots to generate adaptive behaviour (Manoonpong et al., 2007; Porr and Wörgötter, 2007; Manoonpong and Wörgötter, 2009).
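A discrete-time reading of Equations (3) and (4) might look like the following minimal sketch. It assumes a forward-difference approximation of the derivative of the reflex input with a unit time step, which is a choice made for this example; the class name, its methods and the default parameter values are hypothetical placeholders.

```python
from typing import List, Optional, Sequence


class ICOLearner:
    """Illustrative discrete-time ICO learner (hypothetical helper class)."""

    def __init__(self, n_predictive: int, rho0: float = 1.0, mu: float = 0.01):
        self.rho0 = rho0                              # fixed reflex weight
        self.mu = mu                                  # learning rate
        self.rho: List[float] = [0.0] * n_predictive  # plastic weights
        self._prev_x0: Optional[float] = None         # previous reflex sample

    def output(self, x0: float, xk: Sequence[float]) -> float:
        """Equation (3): weighted sum of reflex and predictive inputs."""
        self._check(xk)
        return self.rho0 * x0 + sum(r * x for r, x in zip(self.rho, xk))

    def update(self, x0: float, xk: Sequence[float]) -> None:
        """Equation (4), with dx0/dt approximated by x0(t) - x0(t - 1)."""
        self._check(xk)
        if self._prev_x0 is not None:
            dx0 = x0 - self._prev_x0
            self.rho = [r + self.mu * x * dx0 for r, x in zip(self.rho, xk)]
        self._prev_x0 = x0

    def _check(self, xk: Sequence[float]) -> None:
        if len(xk) != len(self.rho):
            raise ValueError("expected one predictive input per plastic weight")
```

In this form the weights change only while the reflex input is changing: once it stays constant (for example at zero), its difference vanishes and the weights stop moving, which matches the stabilisation behaviour described above.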

# 3. MATERIALS AND METHODS

We define the task of acoustic tracking as follows: a robotic agent must learn to track a moving acoustic signal. The robot learns the target's angular velocity by matching it with its (the robot's) own angular turning velocity. The correct angular turning velocity should allow the agent to rotate about a fixed axis sufficiently quickly so as to align itself toward the instantaneous position of the acoustic signal. The signal is moved in the horizontal plane along a pre-defined semi-circular arc-shaped trajectory with an unknown velocity in an unknown but fixed direction. To solve this task we employ an adaptive neural architecture (Shaikh and Manoonpong, 2016) that combines the auditory preprocessing of the lizard peripheral auditory model with a neural ICO-based learning mechanism.

### 3.1. The Neural Architecture

The neural mechanism is embedded within the task environment as a closed-loop circuit as depicted in **Figure 3**. The goal of the learning algorithm is to learn the temporal relationship between the perceived position of the target sound signal before turning and after turning. The synaptic weights of the neural mechanism encode this temporal relationship and they can then be used to calculate the correct angular turning velocity. A given set of learned synaptic weights can however only represent a given angular velocity. This is because the temporal relationship between the perceived position of the target sound signal before turning and after turning depends on the angular turning velocity. Therefore, the synaptic weights must be re-learned to obtain a new angular turning velocity.

The output of the neural mechanism is the angular velocity ω, defined as the angular deviation per time step, required to turn the robot quickly enough to orient toward the target sound signal in one time step. The rotational movements of the robotic agent translate ω into corresponding ITD cues. The peripheral auditory model (PAM), based on these cues, computes a difference signal x(t) which encodes information regarding sound direction. Practically, x(t) is the difference between the modelled vibrational amplitudes of the left and right eardrums in response to sound input, i.e., it is essentially i<sub>ratio</sub> as defined by Equation (2). A filter bank decomposes x(t) into sound frequency-dependent components x<sub>k</sub>(t), where k = 1, . . . , N, to extract frequency information. Each of these components encodes the extracted sound direction information within a specific frequency band. Practically, these components are the difference between the modelled vibrational amplitudes of the left and right eardrums in response to sound input. This step is necessary since the peripheral auditory model provides ambiguous information regarding the sound direction in the absence of sound frequency information. The ambiguity is a result of the difference signal x(t) having identical values for multiple positions of the sound signal if the sound frequency is unknown (see **Figure 1D**).

The filter bank comprises five bandpass filters. The centre frequencies of these filters lie at 1.2, 1.4, 1.6, 1.8, and 2.0 kHz within the relevant response range. Each filter has a 3 dB cut-off frequency of 200 Hz. This results in N = 5 filtered difference signals at the output of the filter bank. The magnitude responses of the individual filters in the filter bank represent the receptive fields of individual auditory neurons. These spectro-temporal receptive fields (Aertsen et al., 1980) are essentially the range of sound frequencies over which the neurons are optimally stimulated. The filtered difference signals x<sub>k</sub>(t) are then used as inputs that are correlated with the derivative of the unfiltered difference signal x<sub>0</sub>(t). The input signals x<sub>k</sub>(t) represent the earlier-occurring predictive stimuli used to estimate the instantaneous sound direction before turning, while the unfiltered difference signal x<sub>0</sub>(t) represents the later-occurring "reflex" stimuli or the retrospective signal generated after turning.
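A five-band filter bank with the centre frequencies listed above could be sketched as follows. The filter type is not specified in the text, so this example substitutes a generic second-order (biquad) bandpass design with unity gain at the centre frequency; the 200 Hz figure is interpreted here as the −3 dB bandwidth, the 16 kHz sampling rate is arbitrary, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CENTRE_FREQS_HZ = (1200.0, 1400.0, 1600.0, 1800.0, 2000.0)
BANDWIDTH_HZ = 200.0      # assumed reading of the 200 Hz specification
SAMPLE_RATE_HZ = 16000.0  # hypothetical sampling rate

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def bandpass_coeffs(f0: float, bandwidth: float, fs: float) -> Coeffs:
    """Normalised biquad bandpass coefficients (0 dB gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * (f0 / bandwidth))
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(signal: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Run a direct-form I biquad over the signal."""
    b, a = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def filter_bank(signal: Sequence[float]) -> List[List[float]]:
    """Return one filtered copy of the signal per centre frequency."""
    return [biquad(signal, bandpass_coeffs(f0, BANDWIDTH_HZ, SAMPLE_RATE_HZ))
            for f0 in CENTRE_FREQS_HZ]


def tone(frequency_hz: float, n_samples: int) -> List[float]:
    """Unit-amplitude sine wave sampled at SAMPLE_RATE_HZ."""
    step = 2.0 * math.pi * frequency_hz / SAMPLE_RATE_HZ
    return [math.sin(step * n) for n in range(n_samples)]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Feeding a 1.6 kHz tone through `filter_bank` leaves the third band close to its original level once the start-up transient has decayed, while the neighbouring bands are attenuated; the pattern of levels across the five outputs is what carries the frequency information.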

In traditional ICO learning the synaptic weights are stabilised once the reflex signal is nullified, thereby creating a behavioural response that prevents future occurrences of the reflex signal. In our case, as soon as the target sound signal moves to a new position along its trajectory, a new and finite retrospective signal x<sub>0</sub> corresponding to the new position is generated. This signal is then nullified after turning if the correct synaptic weights have been learned, and then the target sound signal moves to a new position along its trajectory. Our approach can therefore be considered as one successful step of ICO learning being successively repeated for each new position of the target sound signal as it moves along its trajectory. This implies that the synaptic weights can grow uncontrollably if the learning is allowed to continue indefinitely. A stopping criterion for the learning was therefore introduced to avoid this condition: the learning stops when the tracking error θ<sub>e</sub> becomes less than 0.5°. θ<sub>e</sub> is defined as the difference between the orientation of the robot and the angular position of the sound signal in one time step. In other words, the learning stops when the robot is able to orient itself toward a position that is within 0.5° from the position of the sound signal within one time step.

### 3.2. The Experimental Setup

The experimental setup in simulation comprises a virtual loudspeaker array as depicted in **Figure 4** which generates relevant pure tone sounds at a 2.2 kHz frequency. This frequency is chosen because sufficient directional information from the peripheral auditory model is available at this frequency. The array consists of 37 loudspeakers numbered #1–#37 from right to left, arranged in a semi-circle in the azimuth plane. The angular displacement between consecutive loudspeakers is 5°. The loudspeakers are turned on sequentially, starting from the loudspeaker at one of the ends of the array, to simulate the motion of a continuously moving sound signal (albeit in discrete steps). To maintain the continuity of the sound the next loudspeaker plays immediately after the previous loudspeaker has stopped. A given tone can therefore be moved with a given angular velocity across the array along a semi-circular trajectory from either the left or the right side. The angular velocity of the sound signal is defined as the angular displacement in radians per time step. A given loudspeaker, when turned on, plays a tone for 10 time steps before turning off and at the same instant the next consecutive loudspeaker turns on. This process is repeated until the sound reaches the last loudspeaker in the array. The movement of sound from loudspeaker #1 to loudspeaker #37 is defined as one complete learning iteration. Since one iteration may be insufficient to learn the correct angular velocity of the target sound signal, the learning is repeated over multiple iterations until the stopping criterion is met. After the completion of one learning iteration, the sound signal starts again from loudspeaker #1 in the next learning iteration. The direction of movement of sound is chosen to be from the right side (+90°) to the left side (−90°) of the array.
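The geometry and timing of the virtual array can be summarised in a few lines of code. This is a sketch under stated assumptions: time steps are counted from zero within one learning iteration, and the `speakers_per_hop` parameter (advancing by more than one loudspeaker every 10 time steps to obtain a faster source) is a hypothetical mechanism introduced for illustration; all names are placeholders.

```python
NUM_SPEAKERS = 37
SPACING_DEG = 5.0
STEPS_PER_SPEAKER = 10


def speaker_angle_deg(number: int) -> float:
    """Azimuth of loudspeaker #number: #1 at +90 (right), #37 at -90 (left)."""
    if not 1 <= number <= NUM_SPEAKERS:
        raise ValueError("loudspeaker number must lie between 1 and 37")
    return 90.0 - SPACING_DEG * (number - 1)


def active_speaker(time_step: int, speakers_per_hop: int = 1) -> int:
    """Loudspeaker playing at a zero-based time step of one iteration.

    speakers_per_hop is an assumed way of moving the sound faster; once
    the last loudspeaker is reached, it remains the active one.
    """
    if time_step < 0:
        raise ValueError("time_step must be non-negative")
    number = 1 + (time_step // STEPS_PER_SPEAKER) * speakers_per_hop
    return min(number, NUM_SPEAKERS)


def mean_angular_velocity_deg(speakers_per_hop: int = 1) -> float:
    """Average source velocity in degrees per time step."""
    return SPACING_DEG * speakers_per_hop / STEPS_PER_SPEAKER
```

With one loudspeaker per hop, a 5° displacement every 10 time steps corresponds to an average of 0.5° per time step, even though the source position itself changes in discrete 5° jumps.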

The robot that should track the moving target sound signal is positioned at the mid-point of the diameter of the semi-circle and is only allowed to rotate in the azimuth plane about a fixed axis. The robot must turn with a sufficiently large angular turning velocity to orient toward the instantaneous position of the sound signal before the sound signal moves to the next position along its trajectory. The angular velocity of the robot is defined as the angular rotation in radians per time step. The goal of the learning algorithm is to learn the correct angular velocity that allows the robot to turn and orient toward the current loudspeaker in one time step, starting from the time step at which that loudspeaker started playing the tone.

The learning at every time step occurs as follows. The robotic agent is initially oriented toward a random direction toward the right side of the array. Loudspeaker #1 emits a tone and the robot uses the sound direction information extracted by the peripheral auditory model to turn toward the currently playing loudspeaker with an angular velocity ω (computed using the initial values of the synaptic weights) given by

$$
\omega = \rho_0 x_0 + \sum_{k=1}^{N} \rho_k x_k, \text{ where } N = 5. \tag{5}
$$

After the turn is complete, the robot once again extracts sound direction information via the peripheral auditory model and computes the retrospective signal x<sub>0</sub>(t + δt). The strength of x<sub>0</sub>(t + δt) depends on the position of the sound signal relative to the orientation of the robotic agent after it has performed a motor action in the task environment. This retrospective signal therefore acts as the feedback information that is used to update the synaptic weights.

The synaptic weights ρ<sub>k</sub> are then updated according to the learning rule

$$\frac{d\rho_k(t)}{dt} = \mu\, x_k(t) \frac{dx_0(t)}{dt}, \text{ where } k = 1, \dots, N. \tag{6}$$

After 10 time steps loudspeaker #1 is deselected and the next loudspeaker in the array (loudspeaker #2) is selected. This learning procedure is repeated for all loudspeakers in succession.
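As a rough illustration of Equations (5) and (6), the sketch below computes the turning velocity and performs one forward-Euler step of the weight update. It is not the authors' implementation: the unit time step, the function names and the example signal values are assumptions, and the predictive signals x<sub>k</sub> are simply taken as given.

```python
# Minimal sketch of Equations (5) and (6); names and discretisation are assumptions.
MU = 0.0001      # learning rate (value stated in the text)
RHO_0 = 0.00001  # fixed weight of the retrospective signal x0 (value stated in the text)


def turning_velocity(x0, xs, rhos, rho0=RHO_0):
    """Equation (5): omega = rho0 * x0 + sum_k rho_k * x_k."""
    if len(xs) != len(rhos):
        raise ValueError("need one weight per predictive signal")
    return rho0 * x0 + sum(r * x for r, x in zip(rhos, xs))


def update_weights(rhos, xs, x0_now, x0_prev, mu=MU, dt=1.0):
    """Equation (6), forward Euler: rho_k += mu * x_k * (dx0/dt) * dt."""
    dx0_dt = (x0_now - x0_prev) / dt
    return [r + mu * x * dx0_dt * dt for r, x in zip(rhos, xs)]


# Usage with made-up signal values (not data from the experiments)
rhos = [0.0] * 5                      # N = 5 plastic weights, initially zero
xs = [1.0, 0.5, 0.25, 0.0, -0.5]      # hypothetical predictive signals
rhos = update_weights(rhos, xs, x0_now=2.0, x0_prev=1.0)
omega = turning_velocity(2.0, xs, rhos)
```

Each weight changes in proportion to the product of its predictive signal and the change in the retrospective signal, so a weight whose predictive signal is zero is left untouched.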

We use three different angular velocities for the sound signals: 0.5°/time step, 1.0°/time step and 1.5°/time step. These values were chosen primarily because the loudspeaker array in the experimental setup is restricted in practice to sound signal displacements that are multiples of 5°. The neural parameters for all trials are set as follows: the learning rate µ = 0.0001 and the synaptic weight for the retrospective signal x<sub>0</sub>, ρ<sub>0</sub> = 0.00001. All plastic synaptic weights ρ<sub>k</sub> are initially set to zero and updated according to Equation (6).

We first implement a new instance of the neural learning mechanism in simulation. The mechanism allows a robotic agent to learn the synaptic weights required to track a continuous unoccluded sound signal. The initial orientation of the robotic agent is randomly chosen to be 116° to emphasise that the learning is independent of any specific initial orientation. The continuous unoccluded sound can be viewed as a sound with a 100% sound emission duty cycle, i.e., there are no breaks in the sound emission. We evaluate the acoustic tracking performance in simulation for a set of three different target angular velocities: 0.5°/time step, 1.0°/time step and 1.5°/time step. We then verify the simulation results in practice for a target angular velocity of 1.5°/time step by recreating the experimental setup in the form of robotic trials. We employ a wheeled mobile robot, as described in Section 3.3, to track a continuous unoccluded pure tone sound signal that is moved along a semi-circular virtual loudspeaker array as depicted in **Figure 5**. The array has the same configuration as the one used in the simulation setup and is located in a sound-dampening chamber to minimise acoustic reflections. The synaptic weights used on the robot are those learned offline in simulation.

We then use another identical instance of the neural mechanism in the same simulation setup as before to learn to track a virtual pure tone sound signal that is periodically occluded, i.e., it is structured as a constant sound for a constant interval followed by complete silence for a constant interval. The sound emission duty cycle is set to 60%. The target sound signal again moves with a constant but unknown angular velocity along a semi-circular trajectory as described earlier. The initial orientation of the robotic agent in simulation is randomly chosen to be 97° to emphasise that the learning is independent of any specific initial orientation. In this manner, the acoustic tracking performance is evaluated in simulation for a set of three different target angular velocities: 0.5°/time step, 1.0°/time step and 1.5°/time step. The simulation results are validated for a target angular velocity of 1.5°/time step in practice via robotic trials with the mobile robot as described earlier. The synaptic weights used on the robot are again those learned offline in simulation.

Finally, we use a third instance of the neural mechanism in simulation to learn to track a virtual pure tone sound signal that is occluded as described earlier but with a randomly varying duty cycle of sound emission. During learning, the sound emission duty cycle for every loudspeaker is chosen from a uniform random distribution between 10 and 90%. As before, the target sound signal moves with a constant but unknown angular velocity along a semi-circular trajectory. The initial orientation of the robotic agent in simulation is randomly chosen to be 97°, once again to emphasise that the learning is independent of any specific initial orientation. We evaluate the acoustic tracking performance in simulation for a set of three different target angular velocities: 0.5°/time step, 1.0°/time step and 1.5°/time step. We once again validate the simulation results in practice for a target angular velocity of 1.5°/time step on the mobile robot as described earlier. The sound emission duty cycles for each loudspeaker in the robotic trial are again randomly chosen from a uniform random distribution between 10 and 90%. This implies that the sequence of duty cycles is not identical to that used in the simulated trials. As earlier, the synaptic weights used on the robot are those learned offline in simulation.
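The three stimulus types differ only in how many of the 10 time steps per loudspeaker carry sound. The sketch below illustrates this under the assumption, consistent with the description above, that the sound phase precedes the silent phase. The rounding of the duty cycle to whole time steps, the seeded generator and all names are hypothetical choices made for illustration.

```python
import random

STEPS_PER_SPEAKER = 10  # time steps allotted to each loudspeaker


def emission_mask(duty_cycle, steps=STEPS_PER_SPEAKER):
    """True for time steps with sound, False for silent steps (sound first)."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    on_steps = round(duty_cycle * steps)  # assumption: round to whole steps
    return [i < on_steps for i in range(steps)]


def random_duty_cycle(rng):
    """Duty cycle drawn uniformly between 10% and 90%, one per loudspeaker."""
    return rng.uniform(0.1, 0.9)


continuous = emission_mask(1.0)   # unoccluded: sound in all 10 time steps
periodic = emission_mask(0.6)     # 60% duty cycle: sound in 6 of 10 time steps
rng = random.Random(0)            # seeded only to make the sketch repeatable
randomised = [emission_mask(random_duty_cycle(rng)) for _ in range(37)]
```

Only the time steps marked `True` provide directional input, so they are the only steps in which the robot turns and the weights change.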

#### 3.3. The Robot Model

**Figures 6A,B** respectively depict the mobile robot used in the robotic trials and its kinematics. The basic platform is assembled with components from the Robotics Starter Kit from Digilent Inc.: the chassis, the DC motors (6 V), the corresponding H-bridge motor drivers, the rear wheels and a front omnidirectional ball caster wheel. The peripheral auditory model and the neural mechanism are implemented on a Raspberry Pi 2 (Model B+ from the Raspberry Pi Foundation) controller, which is paired with an FPGA board (model LOGI Pi from ValentFX). A dual-channel analogue-to-digital converter (ADC) driver is implemented on the FPGA IC (Integrated Circuit) in VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very-High-Speed Integrated Circuits). The VHDL design for the ADC driver is synthesised or compiled via a proprietary software tool (Xilinx Integrated Synthesis Environment or ISE from Xilinx Inc.) into a hardware-level binary "bitstream" containing all the necessary information to properly configure and program the logic into the FPGA chip. The driver reads in raw audio data from a dual-channel 12-bit simultaneous ADC that digitises the signals from two omnidirectional microphones (model FG-23329-P07 from Knowles Electronics LLC) mounted 13 mm apart at the front of the robot (see inset in **Figure 6**). Since the peripheral auditory model's parameters have been derived from laser vibrometry measurements from a lizard with 13 mm separation between its eardrums, the microphone separation must match that value. Any other separation would create a mismatch between the ITD cues to which the peripheral auditory model is tuned and the actual ITD cues. A WiFi access point (model TL-WR802N from TP-LINK Technologies Co. Ltd.) allows wireless access to the robot controller for programming purposes. A 12,000 mAh lithium polymer power bank (model Xtorm AL450 from A-solar bv) serves as the power source for the robot.

The robot's kinematics are used to convert the learned angular turning velocity into the rotational speed in revolutions per minute (rpm) for the robot's wheels. One time step in simulation corresponds to 0.2 s, such that a learned angular turning velocity of θ degrees per time step implies that the robot should turn by θ degrees in a time period t<sub>θ</sub> = 0.2 s. This value is chosen because in the experimental setup the software controlling the loudspeaker array can switch between consecutive loudspeakers no faster than once every 2 s, which corresponds to the 10 time steps for which each loudspeaker plays. Since the robot can only rotate about a fixed axis, the wheels travel along a circular arc of length ⌢L when the robot performs a turn. Therefore, an angular displacement of θ degrees corresponds to the arc length ⌢L in millimetres as given by

$$
\widehat{\mathcal{L}} = 2\pi R \frac{\theta}{360^\circ},
\tag{7}
$$

where R is the radius in millimetres of the arc along which the wheels travel, i.e., the distance between the centre of rotation of the robot and the centre of either wheel. Let v<sub>l</sub> and v<sub>r</sub> denote the rotational velocities of the left and right wheels respectively. To rotate through an arc length ⌢L, the two wheels must turn with identical speeds |v<sub>l</sub>| = |v<sub>r</sub>| = v but in opposite directions (for a leftward rotation, v<sub>l</sub> is taken as negative and v<sub>r</sub> as positive). The corresponding velocity along the arc is ω = ⌢L/t<sub>θ</sub> mm/s. The wheel rotational velocity v in rpm is given by

$$\begin{split} v &= \frac{\widehat{\mathcal{L}}}{t_{\theta}} \cdot \frac{1}{\pi d_{\mathrm{w}}} \cdot 60 \,\mathrm{s} \\ &= 2\pi R \frac{\theta}{360^{\circ}} \cdot \frac{1}{t_{\theta}} \cdot \frac{1}{\pi d_{\mathrm{w}}} \cdot 60 \,\mathrm{s}, \end{split} \tag{8}$$

where d<sub>w</sub> is the diameter of the wheel. For the robot, R is measured to be 80 mm and d<sub>w</sub> is measured to be 70 mm. Substituting for R, d<sub>w</sub> and t<sub>θ</sub> into Equation (8), the conversion from the robot's angular turning velocity θ in degrees per time step into the corresponding wheel velocity v in rpm can be formulated as

$$v = 2\pi \cdot 80 \,\mathrm{mm} \cdot \frac{\theta}{360^\circ} \cdot \frac{1}{0.2 \,\mathrm{s}} \cdot \frac{1}{\pi \cdot 70 \,\mathrm{mm}} \cdot 60 \,\mathrm{s} \approx 3.81 \cdot \theta. \tag{9}$$

Using Equation (9), the wheel velocities required by the robot corresponding to the three angular velocities 0.5°/time step, 1.0°/time step and 1.5°/time step are calculated to be approximately 19 rpm, 38 rpm and 57 rpm respectively. These rpm values represent the no-load wheel velocities, i.e., when the DC motor shafts experience zero load. In practice these "ideal" rpm values will be adversely affected by the weight of the robot, the friction between the wheels and the ground and the instantaneous battery capacity. To approach real-life motion constraints during tracking, the effects of these physical quantities are deliberately not modelled. The speed commands for the wheels are therefore manually matched to the corresponding wheel velocities under load. This is done by making the robot perform an on-the-spot turn on the ground in the experimental arena, and determining via trial and error the speed command (which is the duty cycle for the signals controlling the motor drivers) for which the wheels complete the necessary revolutions in 1 min. This ensures that the effects of the aforementioned quantities are taken into account while the robot is tracking the sound signal during the robotic trials. Furthermore, there may be a mismatch between the characteristics of the individual DC motors of the robot, which may result in a mismatch between the angular velocities of the motor shafts even though both motors receive identical speed commands. To compensate for any potential mismatch, the robot is once again made to perform on-the-spot turns on the arena floor and the speed commands are fine-tuned via trial and error to generate turns of 0.5°, 1.0°, and 1.5° in 0.2 s.
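The symbolic forms of Equations (7) and (8) can be written as two small functions. This is a hedged sketch with hypothetical function names. It uses the measured values quoted in the text as defaults and does not model load, friction or battery effects, which the text handles by manual calibration.

```python
import math

R_MM = 80.0       # distance from centre of rotation to a wheel (from the text)
D_W_MM = 70.0     # wheel diameter (from the text)
T_THETA_S = 0.2   # duration of one time step in seconds (from the text)


def arc_length_mm(theta_deg, r_mm=R_MM):
    """Equation (7): arc travelled by each wheel for a turn of theta degrees."""
    return 2.0 * math.pi * r_mm * theta_deg / 360.0


def wheel_rpm(theta_deg, r_mm=R_MM, d_w_mm=D_W_MM, t_theta_s=T_THETA_S):
    """Equation (8): wheel speed in rpm for a turn of theta degrees in t_theta."""
    speed_mm_per_s = arc_length_mm(theta_deg, r_mm) / t_theta_s
    revolutions_per_s = speed_mm_per_s / (math.pi * d_w_mm)
    return revolutions_per_s * 60.0
```

Because Equation (8) is linear in θ, doubling the turn angle doubles the required wheel speed. Evaluated literally with the measurements quoted above, the sketch gives about 1.9 rpm per degree, which differs by a factor of two from the constant in Equation (9). The sketch follows Equation (8) as written and makes no attempt to reproduce the calibrated speed commands.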

Video footage of the robotic trials was recorded from an overhead camera (Raspberry Pi camera module from the Raspberry Pi Foundation). The footage was analysed with a video analysis software tool (Tracker version 4.95 from Open Source Physics; Open Source Physics, 2016) to determine the amount by which the robot turned for each loudspeaker. The robot's rotation angles were extracted by manually tracking a green LED (Light Emitting Diode) on the robot. The tracking was done for relevant video frames in which the robot was completely still after completing a turn, to determine its deviation from the reference. **Figure 7** depicts a screenshot of the Tracker software.

#### 4. RESULTS AND DISCUSSION

#### 4.1. Simulation Trials

**Figure 8** depicts the evolution of the tracking error θ<sub>e</sub> during learning for a target angular velocity of 1.5°/time step as an example. Corresponding data for target angular velocities of 1.0°/time step and 0.5°/time step are illustrated respectively in Figures 8-1, 8-2 (see files "image1.pdf" and "image2.pdf" respectively in the Supplementary Materials). The insets show θ<sub>e</sub> for a single iteration as an example. θ<sub>e</sub> reduces exponentially over time for all three types of acoustic stimuli: continuous unoccluded sound (see **Figure 8A**), periodically occluded sound with a 60% sound emission duty cycle (see **Figure 8B**) and randomly occluded sound with a random sound emission duty cycle (see **Figure 8C**).

The spikes in θ<sub>e</sub> visible in the insets are a result of a mismatch between the last angular position toward which the robotic agent was pointing and the new angular position of the target sound signal as it moves along its trajectory. This mismatch generates finite ITD cues from which the robotic agent extracts sound direction information using the peripheral auditory model. The robotic agent then turns toward the sound signal with the last learned angular turning velocity, thereby reducing the maximum tracking error. This process repeats for each time step, exponentially reducing the overall tracking error, until the stop criterion is met.

FIGURE 7 | Example screenshot of Tracker software used for extracting the robot's turning angles from overhead video footage of the robotic trials. The red circles indicate the location of the LED on the robot that is used for computing its angular rotation.

The number of iterations required for the synaptic weights to converge toward their final values is lower for sparse or occluded sound signals than for unoccluded sound signals. For an occluded signal, the number of time steps for which the sound is emitted per loudspeaker decreases. Consequently, the number of weight updates per loudspeaker also decreases. For example, for a 60% duty cycle, the weights are updated in six of the ten time steps per loudspeaker. When the loudspeaker stops playing there is no sound input and the peripheral auditory model's outputs are balanced, resulting in the difference signal x(t) being nullified. This implies that both the retrospective signal x<sub>0</sub>(t) and the predictive signals x<sub>k</sub>(t) become zero. The weight increment given by the update rule in Equation (6) is therefore also zero when there is no sound present. From the perspective of the robotic agent's behaviour, this implies that the robotic agent does not move in the absence of sound. This is because there is no directional information available and the robotic agent "assumes" that it is already oriented toward the target sound position.

Thus for occluded sound signals the robotic agent takes fewer turns for each loudspeaker than it does for the unoccluded sound signal. This means that when the sound moves to a new position along its trajectory, the tracking error is larger for occluded sound signals than for the unoccluded sound signal. This implies a greater mismatch between the actual angular position of the loudspeaker and the orientation of the robotic agent, and therefore larger values for both the predictive and retrospective signals. Consequently, the synaptic weight update is also larger for occluded sound signals than for the unoccluded sound signal from the very first iteration, as evident from **Figures 8D,E**. These large changes at the very beginning of the learning bring the synaptic weights closer to their optimal values earlier in the learning process, and thus fewer subsequent iterations are required to bring the weights to their optimal values. Therefore, for a given target angular velocity the total number of iterations required for the synaptic weights to converge is smaller for occluded sound signals than for the unoccluded sound signal.

The change in tracking error θ<sub>e</sub> for a pure tone sound signal that is randomly occluded with a sound emission duty cycle between 10 and 90% for each loudspeaker is depicted in **Figure 8C** for the target angular velocity of 1.5°/time step. The insets show θ<sub>e</sub> for a single iteration as an example. The uneven spikes visible in the insets indicate that the tracking error θ<sub>e</sub> is different for different angular positions of the target sound signal. For each new angular position of the target sound signal, the learning algorithm increments the synaptic weights corresponding to the randomly selected sound emission duty cycle currently in effect for that angular position. Therefore, the synaptic weight increments are also random as evident in **Figure 8F**. As discussed earlier, a relatively smaller sound emission duty cycle results in relatively fewer weight updates. This implies that when the current duty cycle is relatively small, the robotic agent makes relatively fewer turns and thus the tracking error may only decrease to a finite non-zero value. When the target sound signal subsequently moves to the next angular position along its trajectory, the tracking error increases again due to the mismatch between the last angular position estimated by the robotic agent and the new angular position of the target sound signal. The amount of mismatch depends on the last learned angular turning velocity of the robotic agent. This in turn depends on the sound emission duty cycle for the last target angular position and that for the current target angular position. If the new sound emission duty cycle is relatively larger than the last one then there are relatively more weight updates. The tracking error may either reduce to zero or to another finite but non-zero value for that particular target angular position.

As an example, the relationship between the predictive signal x<sub>5</sub>(t) and the derivative of the retrospective signal dx<sub>0</sub>(t)/dt, and the corresponding weight updates, can be seen in **Figure 9** over one iteration of the learning. In the figure the sound signal is moving with an angular velocity of 1.5°/time step. Similar relationships corresponding to target angular velocities of 1.0°/time step and 0.5°/time step are illustrated respectively in Figures 9-1, 9-2 (see files "image3.pdf" and "image4.pdf" respectively in the Supplementary Materials). The respective weight updates (normalised for comparison) for all three types of acoustic stimuli, namely continuous unoccluded sound (see **Figure 9D**), periodically occluded sound with a 60% sound emission duty cycle (see **Figure 9E**) and randomly occluded sound with a random sound emission duty cycle (see **Figure 9F**), reflect the dependence of the size of the weight increments on the sound emission duty cycle as discussed earlier. The small initial spikes seen in the weight updates are a result of the dx<sub>0</sub>(t)/dt term in Equation (6) being initially positive and then becoming negative in the subsequent time step. The retrospective signal x<sub>0</sub>(t) is first positive due to the sound signal moving further away from the robotic agent. In the subsequent time step the robot reacts by turning toward the sound signal, which reduces x<sub>0</sub>(t) and makes the derivative dx<sub>0</sub>(t)/dt negative. This leads to a negative weight increment which decreases the weight in the subsequent time steps after the spike.

A more thorough investigation of the effect of decreasing sound emission duty cycle on the number of iterations required to learn the target angular velocity within the given error bounds is depicted in **Figure 10**. The number of iterations required for the synaptic weights to converge decreases with decreasing sound emission duty cycle, i.e., with increasing sparsity of sound stimulus as described earlier.

The number of iterations required for the synaptic weights to converge also decreases for increasing angular velocity of the sound signal. This can be seen in Figures 8-1, 8-2 (see files "image1.pdf" and "image2.pdf" respectively in the Supplementary Materials). For increasing target angular velocity, the mismatch between the angular position toward which the robot was oriented after its last turn and the current position of the sound signal is relatively greater. This results in relatively larger predictive signals x<sub>k</sub>(t), and therefore a relatively larger correlation term x<sub>k</sub>(t) dx<sub>0</sub>(t)/dt per time step in Equation (6). This consequently leads to relatively faster weight updates, reducing the total number of time steps and thus iterations taken to learn the correct angular velocity.

FIGURE 9 | Example snapshots of the synaptic weight updates (right column) corresponding to the correlation between the predictive signal x<sub>5</sub>(t) (solid line, left column) and the derivative of the retrospective signal dx<sub>0</sub>(t)/dt (dashed line, left column) for a sound signal moving with an angular velocity of 1.5°/time step. (A) Continuous unoccluded sound. (B) Periodically occluded sound with 60% duty cycle. (C) Randomly occluded sound with random duty cycle. (D) Synaptic weights for continuous unoccluded sound. (E) Synaptic weights for periodically occluded sound with 60% duty cycle. (F) Synaptic weights for randomly occluded sound with random duty cycle.

#### 4.2. Real Robot Implementation

Individual robotic trials are conducted for continuous unoccluded as well as occluded sound signals. In all trials, the sound signal is moved virtually in the experimental arena as depicted in **Figure 5** with an angular velocity of 1.5°/0.2 s. We present video footage of the trials in which the robot's tracking behaviour after learning can be seen. Supplementary Videos #1, #2 and #3 (see files titled "video1.mp4", "video2.mp4" and "video3.mp4" respectively in the Supplementary Materials) respectively show the tracking behaviour for the continuous unoccluded sound signal with a duty cycle of 100%, the periodically occluded signal with a duty cycle of 60% and the randomly occluded signal with a random duty cycle between 10 and 90% for each loudspeaker. As evident from the video footage, in all trials the robot is able to successfully perceive the acoustic motion of the sound signal and orient toward the currently playing loudspeaker as indicated by a green LED mounted on the top of the loudspeaker.

**Figure 11** depicts the tracking performance during the robotic trials for all three sound signals in terms of the tracking error θ<sub>e</sub>. The robot tracks the sound signal well in the robotic trials, although small tracking errors are observed, as evident from the recorded video footage. Even after undertaking compensatory actions as described in Section 3.3, errors in tracking are unavoidable under real-life conditions due to ambient noise introduced in the sound signals. The robot manages to compensate for any positive or negative tracking errors (introduced by turning either too fast or too slow, respectively) for a given loudspeaker by making correspondingly smaller or larger turns for the next loudspeaker. This is because the difference signal x(t) generated by the peripheral auditory model also provides some information regarding the sound direction (see Section 2.1) that the neural mechanism uses to automatically compensate for tracking errors, even though the synaptic weights are fixed.

The tracking errors are greater for the randomly occluded sound signal than for the unoccluded and periodically occluded signals. There is a consistent offset from the reference, which implies that the robot's turns consistently lag behind the currently playing loudspeaker. This is in agreement with the consistent offset between the alignment of the robotic agent and the angular location of the sound signal (i.e., a consistently non-zero tracking error) observed in simulation, as evident in the inset in **Figure 8C**. This is because the synaptic weights learned for a randomly occluded sound do not correspond to any single sound emission duty cycle. Instead, the algorithm learns the "best possible" values for the synaptic weights that fit all the sound emission duty cycles. This implies that in the robotic trials there will always be an offset between the angular location of the loudspeaker and the orientation of the robot as well.

In our experiments the loudspeaker sequence was never broken during a trial. If the sequence is broken, for example by not playing a given loudspeaker, the sound signal will jump forward along the trajectory by twice the nominal displacement. For example, if the angular velocity of the sound signal is 0.5°/time step (or 5°/10 time steps), skipping a single loudspeaker would displace the sound source by 10° in 20 time steps. Assuming that the number of time steps between consecutive loudspeakers is unchanged, the angular speed will however remain unchanged. This forward jump in the sound signal will cause the synaptic weights to be updated by a relatively larger amount than usual. This would accelerate the learning during any such forward jumps in the sound signal, resulting in relatively faster convergence toward the optimal synaptic weights.

Furthermore, the loudspeaker sequence cannot be random because that would imply a sound signal moving with a randomly varying angular velocity in a randomly varying direction. For example, if the loudspeaker sequence is #1 → #2 → #7 → #4 → …, then the angular velocity of the sound signal will vary as 0.5°/time step → 2.5°/time step → 1.5°/time step → …, and the direction of motion will vary as left → left → right → …. This would cause the synaptic weights to increase when the sound signal moves from left to right and decrease when it moves from right to left. The size of the weight update, which corresponds to the angular speed, would vary randomly as well. As a consequence of these effects, the synaptic weights will not converge. This implies that the neural mechanism cannot learn a target angular velocity that is not constant.
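The numbers in this example follow directly from the array geometry, as the short sketch below illustrates. It assumes that every hop between loudspeakers takes the usual 10 time steps and that a higher loudspeaker number lies further to the left. The function names are hypothetical.

```python
SPACING_DEG = 5.0        # angular spacing between neighbouring loudspeakers
STEPS_PER_SPEAKER = 10   # time steps each loudspeaker plays


def hop_velocities(sequence):
    """Signed velocity (deg/time step) of each hop; positive means leftward."""
    return [
        (nxt - cur) * SPACING_DEG / STEPS_PER_SPEAKER
        for cur, nxt in zip(sequence, sequence[1:])
    ]


def describe(sequence):
    """(speed, direction) for each hop in a loudspeaker sequence."""
    result = []
    for v in hop_velocities(sequence):
        direction = "left" if v > 0 else "right" if v < 0 else "none"
        result.append((abs(v), direction))
    return result


hops = describe([1, 2, 7, 4])
# hops == [(0.5, "left"), (2.5, "left"), (1.5, "right")]
```

Both the speed and the sign of the velocity change from hop to hop, which is why no single set of synaptic weights can match such a sequence.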

We have employed a semi-circular trajectory for the sound signal in all experiments to simplify the problem of motion perception such that there is a 1:1 relationship between the agent's learned angular turning velocity and the target's angular velocity. The problem of motion perception is essentially the same in the case of a target moving along a linear trajectory with a constant velocity. This is because the temporal relationship between the perceived position of a target sound signal before turning and after turning depends only on the signal's velocity and not on the shape of the trajectory. Therefore, a robotic agent using the proposed neural mechanism can still learn an angular turning velocity that corresponds to the target linear velocity. In the case of more complicated target trajectories comprising both linear and angular components, the neural mechanism may only learn the average velocity over the entire trajectory.

The neural mechanism is furthermore not limited to a specific sound frequency, as its functionality is independent of the sound frequency. For a different sound frequency the peripheral auditory model generates a different difference signal that still encodes sound direction. The neural mechanism essentially uses the direction information, in terms of the sign of the difference signal, to drive the synaptic weight updates in the right direction. However, the size of the weight updates depends on the absolute magnitude of the difference signal. Therefore, for a different sound frequency with all other neural parameters unchanged, the number of iterations taken for the synaptic weights to converge will be different.

### 5. CONCLUSIONS AND FUTURE DIRECTIONS

We present an adaptive neural learning mechanism, derived from ICO learning, that employs a synaptic weight update rule adapted from differential Hebbian learning. The neural mechanism was able to successfully learn the constant and unknown angular velocity of a continuous unoccluded pure tone virtual sound signal moving along a semi-circular trajectory in simulation. We also investigated the performance of the neural mechanism in the presence of sparsity in the acoustic stimulus. We used three different types of acoustic stimuli, each having a sound frequency of 2.2 kHz: continuous unoccluded sound, periodically occluded sound with a 60% sound emission duty cycle, and randomly occluded sound with a random sound emission duty cycle chosen from a uniform distribution within the range 10–90%.

We first implemented an instance of the neural mechanism in simulation. The neural mechanism was able to learn the angular velocity of the continuous unoccluded sound signal in simulation for three different target angular velocities: 1.5°/time step, 1.0°/time step and 0.5°/time step. We validated the acoustic tracking performance of the neural mechanism after learning via robotic trials in tracking a virtually-moving continuous unoccluded sound signal with an angular velocity of 1.5°/time step.

We then investigated whether a second instance of the neural mechanism could learn the angular velocity of a target sound signal that was periodically occluded with a 60% sound emission duty cycle. The sound signal moved with a constant and unknown angular velocity along a semi-circular trajectory as before. The neural mechanism was able to learn the angular velocity of the periodically occluded sound signal in simulation for three different target angular velocities: 1.5°/time step, 1.0°/time step and 0.5°/time step. We validated the acoustic tracking performance of the neural mechanism after learning via robotic trials in tracking a virtually-moving but periodically occluded sound signal with an angular velocity of 1.5°/time step.

Finally, we investigated whether a third instance of the neural mechanism could learn the angular velocity of a target sound signal that was randomly occluded with a randomly varying duty cycle uniformly distributed within the range 10–90%. Once again, the sound signal moved with a constant and unknown angular velocity along a semi-circular trajectory. The neural mechanism was able to learn the angular velocity of the randomly occluded sound signal in simulation for three different target angular velocities: 1.5°/time step, 1.0°/time step and 0.5°/time step. We validated the acoustic tracking performance of the neural mechanism after learning via robotic trials in tracking a virtually-moving but randomly occluded sound signal with an angular velocity of 1.5°/time step.

In all robotic trials the robot was relatively successful in tracking the sound signal in spite of the absence of compensation for possible detrimental effects such as ambient noise, mismatch between the robot's motor characteristics, ground friction and depletion rate of the battery.

The neural mechanism implements a purely reactive closed-loop system; the robot only turns after the target sound signal has moved to a new location along its trajectory, and it always follows the sound signal. There is an unavoidable positive and finite delay between the target sound signal moving to its new location and the robot completing its turn. In the simulation this delay is one time step, and in practice with the real robot it is the sum of the time step and the non-deterministic processing time in the sensorimotor loop. Predatory animals that utilise tracking behaviour to catch prey, for example, tend to be able to predict the prey's future position. Such prediction is clearly advantageous for the predator, both to minimise the effect of neural sensorimotor delays (Nijhawan and Wu, 2009; Franklin and Wolpert, 2011) and to maximise its chances of success. Behavioural evidence for predictive tracking mechanisms has been reported in salamanders (Borghuis and Leonardo, 2015) and dragonflies (Dickinson, 2015; Mischiati et al., 2015a,b) that use vision for prey capture. In the auditory domain, the barn owl, Tyto alba, is well known for auditory prey capture (Konishi, 1973). Both behavioural (Langemann et al., 2016) and neurophysiological (Witten et al., 2006; Weston and Fischer, 2015) evidence has been reported for auditory motion representation in the barn owl. Lizards such as the Mediterranean house geckos, Hemidactylus tursicus, are known to prey on crickets and have been observed to orient and navigate toward loudspeakers playing male cricket calls (Sakaluk and Belwood, 1984). However, to the best of our knowledge, no study has reported evidence of predictive mechanisms involved in lizard acoustic prey capture. The presented neural mechanism may be used to predict the future position of the sound signal by allowing the learning to continue such that the synaptic weights increase beyond those that correspond to the actual angular velocity of the target sound signal. After successful learning the robot would then turn quickly enough to orient toward a future position of the sound signal. Such a mechanism could be considered as an internal forward model (Wolpert et al., 1995) for acoustic motion perception.

#### AUTHOR CONTRIBUTIONS

DS: Scientific problem formulation, implementation, experimentation, and manuscript preparation. PM: Support in scientific discussion and manuscript preparation.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00011/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Shaikh and Manoonpong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Real-Time Biologically Inspired Action Recognition from Key Poses Using a Neuromorphic Architecture

Georg Layher\*, Tobias Brosch and Heiko Neumann

*Institute of Neural Information Processing, Ulm University, Ulm, Germany*

Intelligent agents, such as robots, have to serve a multitude of autonomous functions. Examples include collision avoidance, navigation and route planning, active sensing of the environment, and interaction and non-verbal communication with people in the extended reach space. Here, we focus on the recognition of the action of a human agent based on a biologically inspired visual architecture for analyzing articulated movements. The proposed processing architecture builds upon coarsely segregated streams of sensory processing along different pathways which separately process form and motion information (Layher et al., 2014). Action recognition is performed in an event-based scheme by identifying representations of characteristic pose configurations (key poses) in an image sequence. In line with perceptual studies, key poses are selected in an unsupervised manner using a feature-driven criterion which combines extrema in the motion energy with the horizontal and the vertical extendedness of a body shape. Per-class representations of key pose frames are learned using a deep convolutional neural network consisting of 15 convolutional layers. The network is trained using the *energy-efficient deep neuromorphic networks* (*Eedn*) framework (Esser et al., 2016), which realizes the mapping of the trained synaptic weights onto the *IBM Neurosynaptic System* platform (Merolla et al., 2014). After the mapping, the trained network achieves real-time capabilities for processing input streams and classifies input images at about 1,000 frames per second while the computational stages consume only about 70 mW of power (without spike transduction). Particularly for mobile robotic systems, a low energy profile might be crucial in a variety of application scenarios. Cross-validation results are reported for two different datasets and compared to state-of-the-art action recognition approaches.
The results demonstrate that (I) the presented approach is on par with other key pose based methods described in the literature, which select key pose frames by optimizing classification accuracy, (II) compared to training on the full set of frames, representations trained on key pose frames result in a higher confidence in class assignments, and (III) key pose representations show promising generalization capabilities in a cross-dataset evaluation.

Keywords: action recognition, key pose selection, deep learning, neuromorphic architecture, IBM neurosynaptic system

#### Edited by:

*Poramate Manoonpong, University of Southern Denmark Odense, Denmark*

#### Reviewed by:

*Johannes Bill, Heidelberg University, Germany Quansheng Ren, Peking University, China*

> \*Correspondence: *Georg Layher georg.layher@uni-ulm.de*

Received: *25 November 2016* Accepted: *21 February 2017* Published: *22 March 2017*

#### Citation:

*Layher G, Brosch T and Neumann H (2017) Real-Time Biologically Inspired Action Recognition from Key Poses Using a Neuromorphic Architecture. Front. Neurorobot. 11:13. doi: 10.3389/fnbot.2017.00013*

# 1. INTRODUCTION

Analyzing and understanding the actions of humans is one of the major challenges for future technical systems aiming at visual sensory behavior analysis. Acquiring knowledge about what a person is doing is of importance and sometimes even crucial in a variety of scenarios. In the context of automated surveillance systems, action analysis is an essential ability, allowing the identification of potential threats emanating from an individual or a group of persons. In Human-Computer-Interaction (HCI), action analysis helps in understanding the objectives and intentions of a user and increases the potential of a system to adapt to the specific context of an interaction and appropriately support, guide or protect the user. Moreover, recognizing actions in the surrounding area is an integral part of interpreting a system's own situational context and environment, and is thus particularly crucial for mobile robotic systems, which may find themselves embedded in a variety of different situations.

In the presented work, as the first main contribution, a feature-driven key pose selection method is proposed, which is driven by combining two features in the biological motion input, namely extrema in the temporal motion energy signal and the relative extent of a subject's pose. Such temporally defined features (from the motion stream) help to automatically select key pose representations. The use of these dynamic features has been motivated by psychophysical investigations (Thirkettle et al., 2009) which demonstrate that humans select specific poses in a continuous sequence of video input based on such criteria. We first show how such key poses define events within articulated motion sequences and how these can be reliably and automatically detected. The proposed processing architecture builds upon coarsely segregated streams of sensory processing along different pathways which separately process form and motion information (Giese and Poggio, 2003). An interaction between the two processing streams enables an automatic selection of characteristic poses during learning (Layher et al., 2014). To use such recognition functionality in an autonomous neurobiologically inspired recognition system various constraints need to be satisfied. Such neurobiological systems need to implement the underlying processes along the processing and recognition cascade which defines the parts of their cognitive functionality.

As the second key contribution, we employ here an energy efficient deep convolutional neural network (Eedn; Esser et al., 2016) to realize the key pose learning and classification, which achieves a computationally efficient solution using a sparse and energy efficient implementation based on neuromorphic hardware. This allows us to establish a cascaded hierarchy of representations with an increasing complexity for key pose form and motion patterns. After their establishment, key pose representations allow an assignment of a given input image to a specific action category. We use an offline training scheme that utilizes a deep convolutional neural network with 15 convolutional layers. The trained network runs on IBM's TrueNorth chip (Merolla et al., 2014; Akopyan et al., 2015). This solution makes it possible to approach faster-than-real-time capabilities for processing input streams and to classify articulated still images at about 1,000 frames per second while the computational stages consume only about 70 mW of power. We present cross-validation results on an action recognition dataset consisting of 14 actions, 22 subjects and about 29,000 key pose frames, which show a recall rate for the presented approach of about 88%, as well as a comparison to state-of-the-art action recognition approaches on a second dataset. To show the generalization capabilities of the proposed key pose based approach, we additionally present the results of a cross-dataset evaluation, where the training and the testing of the network were performed on two completely separate datasets with overlapping classes.

# 2. RELATED WORK

The proposed key pose based action recognition approach is motivated and inspired by recent evidence about the learning mechanisms and representations involved in the processing of articulated motion sequences, as well as hardware and software developments from various fields of visual sciences. For instance, empirical studies indicate that special kinds of events within a motion sequence facilitate the recognition of an action. Additional evidence from psychophysics, as well as neurophysiology, suggests that both form and motion information contribute to the representation of an action. Modeling efforts propose functional mechanisms for the processing of biological motion and show how such processing principles can be transferred to technical domains. Deep convolutional networks make it possible to learn hierarchical object representations, which show an impressive recognition performance and enable the implementation of fast and energy efficient classification architectures, particularly in combination with neuromorphic hardware platforms. In the following sections, we will briefly introduce related work and results from different scientific fields, all contributing to a better understanding of action representation and the development of efficient action recognition approaches.

# 2.1. Articulated and Biological Motion

Starting with the pioneering work of Johansson (1973), the perceptual sciences have gained more and more insight into how biological motion might be represented in the human brain and what the characteristic properties of an articulated motion sequence are. In psychophysical experiments, humans show a remarkable performance in recognizing biological motions, even when the presented motion is reduced to a set of points moving coherently with body joints (point light stimuli; PLS). In a detection task, subjects were capable of recognizing a walking motion within about 200 ms (Johansson, 1976). These stimuli, however, are not free of (at least configurational) form information, and the discussion about the contributions of form and motion in biological motion representation is still ongoing (Garcia and Grossman, 2008). Some studies indicate a stronger importance of motion cues (Mather and Murdoch, 1994), others emphasize the role of configurational form information (Lange and Lappe, 2006). Even less is known about the specific nature and characteristics of the visual cues which facilitate the recognition of a biological motion sequence. In Casile and Giese (2005), a statistical analysis as well as the results of psychophysical experiments indicate that local opponent motion in horizontal direction is one of the critical features for the recognition of PLS. Thurman and Grossman (2008) conclude that there are specific moments in an action performance which are "more perceptually salient" compared to others. Their results emphasize the importance of dynamic cues in moments when the distance between opposing limbs is the lowest (corresponding to local opponent motion; maxima in the motion energy). In contrast, more recent findings by Thirkettle et al. (2009) indicate that moments of a large horizontal body extension (co-occurring with minima in the motion energy) facilitate the recognition of a biological motion in a PLS.

In neurophysiology, functional imaging studies (Grossman et al., 2000), as well as single-cell recordings (Oram and Perrett, 1994), indicate the existence of specialized mechanisms for the processing of biological motion in the superior temporal sulcus (STS). STS has been suggested to be a point of convergence of the separate dorsal "where" and the ventral "what" pathways (Boussaoud et al., 1990; Felleman and Van Essen, 1991), containing cells which integrate form and motion information of biological objects (Oram and Perrett, 1996) and selectively respond to, e.g., object manipulation, face, limb and whole body motion (Puce and Perrett, 2003). Besides the evidence that both form and motion information contribute to the registration of biological motion, action specific cells in STS are reported to respond to static images of articulated bodies which in parallel evoke activities in the middle temporal (MT) and medial superior temporal (MST) areas of the dorsal stream (implied motion), although there is no motion present in the input signal (Kourtzi and Kanwisher, 2000; Jellema and Perrett, 2003). In line with the psychophysical studies, these results indicate that poses with a specific feature characteristic (here, articulation) facilitate the recognition of a human motion sequence.

Complementary modeling efforts in the field of computational neuroscience suggest potential mechanisms which might explain the underlying neural processing and learning principles. In Giese and Poggio (2003) a model for the recognition of biological movements is proposed, which processes visual input along two separate form and motion pathways and temporally integrates the responses of prototypical motion and form patterns (snapshots) cells via asymmetric connections in both pathways. Layher et al. (2014) extended this model by incorporating an interaction between the two pathways, realizing the automatic and unsupervised learning of key poses by modulating the learning of the form prototypes using a motion energy based signal derived in the motion pathway. In addition, a feedback mechanism is proposed in this extended model architecture which (I) realizes sequence selectivity by temporal association learning and (II) gives a potential explanation for the activities in MT/MST observed for static images of articulated poses in neurophysiological studies.

# 2.2. Action Recognition in Image Sequences

In computer vision, the term vision-based action recognition summarizes approaches to assign an action label to each frame or a collection of frames of an image sequence. Over the last decades, numerous vision-based action recognition approaches have been developed and different taxonomies have been proposed to classify them by different aspects of their processing principles. In Poppe (2010), action recognition methods are separated by the nature of the image representation they rely on, as well as the kind of the employed classification scheme. Image representations are divided into global representations, which use a holistic representation of the body in the region of interest (ROI; most often the bounding box around a body silhouette in the image space), and local representations, which describe image and motion characteristics in a spatial or spatio-temporal local neighborhood. Prominent examples for the use of whole body representations are motion history images (MHI) (Bobick and Davis, 2001), or the application of histograms of oriented gradients (HOG) (Dalal and Triggs, 2005; Thurau and Hlavác, 2008). Local representations are, e.g., employed in Dollar et al. (2005), where motion and form based descriptors are derived in the local neighborhood (cuboids) of spatio-temporal interest points. Classification approaches are separated into direct classification, which disregards temporal relationships (e.g., using histograms of prototype descriptors; Dollar et al., 2005), and temporal state-space models, which explicitly model temporal transitions between observations (e.g., by employing hidden Markov models, HMMs; Yamato et al., 1992, or dynamic time warping, DTW; Chaaraoui et al., 2013). For further taxonomies and an exhaustive overview of computer vision action recognition approaches we refer to the excellent reviews in Gavrila (1999), Aggarwal and Ryoo (2011), and Weinland et al. (2011).

The proposed approach uses motion and form based feature properties to extract key pose frames. The identified key pose frames are used to learn class specific key pose representations using a deep convolutional neural network (DCNN). Classification is either performed framewise or by temporal integration through majority voting. Thus, following the taxonomy of Poppe (2010), the approach can be classified as using global representations together with a direct classification scheme. Key pose frames are considered as temporal events within an action sequence. This kind of action representation and classification is inherently invariant against variations in (recording and execution) speed. We do not argue that modeling temporal relationships between such events is unnecessary in general. The very simple temporal integration scheme was chosen to focus on an analysis of the importance of key poses in the context of action representation and recognition. Because of their relevance to the presented approach, we will briefly compare key pose based action recognition approaches in the following.
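The temporal integration by majority voting mentioned above can be illustrated with a minimal sketch. It assumes that a classifier has already produced one label per key pose frame; the function name `majority_vote`, the example labels, and the tie-breaking rule are hypothetical choices made for illustration and are not taken from the presented implementation.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the most frequent label among per-frame predictions.

    Ties are broken in favour of the label that occurs first in the
    sequence (an arbitrary choice made for this sketch).
    """
    if not frame_labels:
        raise ValueError("need at least one framewise prediction")
    counts = Counter(frame_labels)
    best = max(counts.values())
    for label in frame_labels:  # first label that reaches the top count
        if counts[label] == best:
            return label


# Hypothetical framewise outputs for the key pose frames of one sequence.
predictions = ["walk", "walk", "run", "walk", "run"]
print(majority_vote(predictions))  # walk
```

Because only label counts enter the decision, the result does not depend on when the key pose frames occur, which is consistent with the speed invariance noted above.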

# 2.3. Key Pose Based Action Recognition

Key pose based action recognition approaches differ in their understanding of the concept of key poses. Some take a phenomenological perspective and define key poses as events which possess a specific feature characteristic giving rise to their peculiarity. There is no a priori knowledge available about whether, when and how often such feature-driven events occur within an observed action sequence, neither during the establishment of the key pose representations during training, nor while trying to recognize an action sequence. Others regard key pose selection as the result of a statistical analysis, favoring poses which are easy to separate among different classes or maximally capture the characteristics of an action sequence. The majority of approaches rely on such statistical properties and either consider the intra- or the inter-class distribution of image-based pose descriptors to identify key poses in action sequences.

#### Intra-Class Based Approaches

Approaches which evaluate intra-class properties of the feature distributions regard key poses as the most representative poses of an action, and measures of centrality are exploited on agglomerations in pose feature spaces to identify the poses which are most common to an action sequence. In Chaaraoui et al. (2013), a contour based descriptor following Dedeoğlu et al. (2006) is used. Key poses are selected by repetitive k-means clustering of the pose descriptors and evaluating the resulting clusters using a compactness metric. A sequence of nearest neighbor key poses is derived for each test sequence and dynamic time warping (DTW) is applied to account for different temporal scales. The class of the closest matching temporal sequence of key poses from the training set is used as the final recognition result. Based on histograms of oriented gradients (HOG) and histograms of weighted optical flow (HOWOF) descriptors, Cao et al. (2012) adapt a local linear embedding (LLE) strategy to establish a manifold model which reduces descriptor dimensionality, while preserving the local relationship between the descriptors. Key poses are identified by interpreting the data points (i.e., descriptors/poses) on the manifold as an adjacency graph and applying a PageRank (Brin and Page, 1998) based procedure to determine the vertices of the graph with the highest centrality, or relevance.

In all, key pose selection based on an intra-class analysis of the feature distribution has the advantage of capturing the characteristics of one action in isolation, independent of other classes in a dataset. Thus, key poses are not dataset specific and, in principle, can also be shared among different actions. However, most intra-class distribution based approaches build upon measures of centrality (i.e., as a part of cluster algorithms) and thus key poses are dominated by frequent poses of an action. Because they are part of transitions between others, frequent poses tend to occur in different classes and thus do not help in separating them. Infrequent poses, on the other hand, are not captured very well, but are intuitively more likely to be discriminative. The authors are not aware of an intra-class distribution based method which tries to identify key poses based on their infrequency or abnormality (e.g., by evaluating cluster sizes and distances).

#### Inter-Class Based Approaches

Approaches based on inter-class distribution, on the other hand, consider highly discriminative poses as key poses to separate different action appearances. Discriminability is here defined as resulting in either the best classification performance or in maximum dissimilarities between the extracted pose descriptors of different classes. To maximize the classification performance, Weinland and Boyer (2008) propose a method of identifying a vocabulary of highly discriminative pose exemplars. In each iteration of the forward selection of key poses, one exemplar at a time is added to the set of key poses by independently evaluating the classification performance of the currently selected set of poses in union with one of the remaining exemplars in the training set. The pose exemplar which increases classification performance the most is then added to the final key pose set. The procedure is repeated until a predefined number of key poses is reached. Classification is performed based on a distance metric obtained by either silhouette-to-silhouette or silhouette-to-edge matching. Liu et al. (2013) combine the output of the early stages of an HMAX inspired processing architecture (Riesenhuber and Poggio, 1999) with a center-surround feature map obtained by subtracting several layers of a Gaussian pyramid and a wavelet Laplacian pyramid feature map into framewise pose descriptors. The linearized feature descriptors are projected into a low-dimensional subspace derived by principal component analysis (PCA). Key poses are selected by employing an adaptive boosting technique (AdaBoost; Freund and Schapire, 1995) to select the most discriminative feature descriptors (i.e., poses). A test action sequence is matched to the thus reduced number of exemplars per action by applying an adapted local naive Bayes nearest neighbor classification scheme (LNBNN; McCann and Lowe, 2012).
Each descriptor of a test sequence is assigned to its k nearest neighbors and a classwise voting is updated by the distance of a descriptor to the respective neighbor weighted by the relative number of classes per descriptor. In Baysal et al. (2010), noise reduced edges of an image are chained into a contour segmented network (CSN) by using orientation and closeness properties and transformed into a 2 adjacent segment descriptor (k-AS; Ferrari et al., 2008). The most characteristic descriptors are determined by identifying k candidate key poses per class using the k-medoids clustering algorithm and selecting the most distinctive ones among the set of all classes using a similarity measure on the 2-AS descriptors. Classification is performed by assigning each frame to the class of the key pose with the highest similarity and sequence-wide majority voting. Cheema et al. (2011) follow the same key pose extraction scheme, but instead of selecting only the most distinctive ones, key pose candidates are weighted by the number of false and correct assignments to an action class. A weighted voting scheme is then used to classify a given test sequence. Thus, although key poses with large weights have an increased influence on the final class assignment, all key poses take part in the classification process. Zhao and Elgammal (2008) use an information theoretic approach to select key frames within action sequences. They propose to describe the local neighborhood of spatiotemporal interest points using an intensity gradient based descriptor (Dollar et al., 2005). The extracted descriptors are then clustered, resulting in a codebook of prototypical descriptors (visual words). The pose prototypes are used to estimate the discriminatory power of a frame by calculating a measure based on the conditional entropy given the visual words detected in a frame. The frames with the highest discriminatory power are marked as key frames. 
Chi-square distances of histogram based spatiotemporal representations are used to compare key frames from the test and training datasets and majority voting is used to assign an action class to a test sequence.
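The forward selection of exemplars described above for Weinland and Boyer (2008) follows a generic greedy pattern, which the sketch below illustrates on toy data. It is not their implementation: poses are reduced to hypothetical 1-D features, a plain nearest-exemplar rule stands in for silhouette matching, and all names (`nearest_label`, `accuracy`, `forward_select`) are assumptions made for this example.

```python
def nearest_label(x, exemplars):
    """Label of the exemplar closest to x (toy 1-D features)."""
    return min(exemplars, key=lambda e: abs(e[0] - x))[1]


def accuracy(exemplars, samples):
    """Fraction of (feature, label) samples that are classified correctly."""
    hits = sum(nearest_label(x, exemplars) == y for x, y in samples)
    return hits / len(samples)


def forward_select(candidates, samples, n_key_poses):
    """Greedily add the candidate that yields the highest accuracy."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < n_key_poses:
        # Evaluate the current selection in union with each remaining candidate.
        best = max(remaining, key=lambda c: accuracy(selected + [c], samples))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy exemplars and validation samples for two hypothetical classes.
candidates = [(0.0, "a"), (0.2, "a"), (1.0, "b"), (1.1, "b")]
samples = [(0.1, "a"), (0.3, "a"), (0.9, "b"), (1.2, "b")]
key_poses = forward_select(candidates, samples, n_key_poses=2)
print(key_poses, accuracy(key_poses, samples))
```

Since every candidate is scored against the other classes in the dataset, the selected set depends on which classes are present, which is the dataset dependence of inter-class methods discussed in the text.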

For a given pose descriptor and/or classification architecture, inter-class based key pose selection methods in principle minimize the recognition error, either for the recognition of the key poses (e.g., Baysal et al., 2010; Liu et al., 2013) or for the action classification (e.g., Weinland and Boyer, 2008). But, on the other hand, key poses obtained by inter-class analysis inherently do not cover the most characteristic poses of an action, but the ones which are the most distinctive within a specific set of actions. Applying this class of algorithms on two different sets of actions sharing one common action might result in a different selection of key poses for the same action. Thus, once extracted, key pose representations do not necessarily generalize over different datasets/domains and, in addition, sharing of key poses between different classes is not intended.

#### Feature-Driven Approaches

Feature-driven key pose selection methods do not rely on the distribution of features or descriptors at all and define a key pose as a pose which co-occurs with a specific characteristic of an image or feature. Commonly employed features, such as extrema in a motion energy based signal, are often correlated with pose properties such as the degree of articulation or the extendedness. Compared to statistical methods, this is a more pose centered perspective, since parameters of the pose itself are used to select a key pose instead of parameters describing the relationship or differences between poses.

Lv and Nevatia (2007) select key poses in sequences of 3D-joint positions by automatically locating extrema of the motion energy within temporal windows. Motion energy in their approach is determined by calculating the sum over the L<sup>2</sup> norm of the motion vectors of the joints between two temporally adjacent timesteps. 3D motion capturing data is used to render 2D projections of the key poses from different view angles. Single frames of an action sequence are matched to the silhouettes of the resulting 2D key pose representations using an extension of the Pyramid Match Kernel algorithm (PMK; Grauman and Darrell, 2005). Transitions between key poses are modeled using action graph models. Given an action sequence, the most likely action model is determined using the Viterbi Algorithm. In Gong et al. (2010), a key pose selection mechanism for 3D human action representations is proposed. Per action sequence, feature vectors (three angles for twelve joints) are projected onto the subspace spanned by the first three eigenvectors obtained by PCA. Several instances of an action are synchronized to derive the mean performance (in terms of execution) of an action. Motion energy is then defined by calculating the Euclidean distance between two adjacent poses in the mean performance. The local extrema of the motion energy are used to select the key poses, which after their reconstruction in the original space are used as the vocabulary in a bag of words approach. During recognition, each pose within a sequence is assigned to the key pose with the minimum Euclidean distance resulting in a histogram of key pose occurrences per sequence. These histograms serve as input to a support vector machine (SVM) classifier. In Ogale et al. (2007), candidate key poses are extracted by localizing the extrema of the mean motion magnitude in the estimated optical flow. 
Redundant poses are sorted out pairwise by considering the ratio between the intersection and the union of two registered silhouettes. The final set of unique key poses is used to construct a probabilistic context-free grammar (PCFG). This method uses an inter-class metric to reject preselected key pose candidates and thus is not purely feature-driven.
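The pairwise rejection of redundant poses via the ratio between the intersection and the union of two silhouettes can be sketched as follows. The representation of silhouettes as sets of pixel coordinates, the threshold `max_overlap`, and the greedy keep-first order are assumptions made for this illustration; the exact criterion of Ogale et al. (2007) is not specified here.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two silhouettes given as sets of pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0


def prune_redundant(silhouettes, max_overlap=0.8):
    """Keep a silhouette only if it overlaps every kept one by at most max_overlap."""
    kept = []
    for mask in silhouettes:
        if all(iou(mask, other) <= max_overlap for other in kept):
            kept.append(mask)
    return kept


# Two identical toy silhouettes and one that is clearly different.
square = {(0, 0), (0, 1), (1, 0), (1, 1)}
bar = {(5, 5), (5, 6)}
print(len(prune_redundant([square, set(square), bar])))  # 2
```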

Feature-driven key pose selection methods are independent of the number of different actions within a dataset. Thus, retraining is not necessary if, e.g., a new action is added to a dataset, and the sharing of key poses among different actions is in principle possible. Naturally, there is no guarantee that the selected poses maximize the separability of pose or action classes.

# 3. MODEL/METHODS

To realize an energy efficient implementation for key pose based action recognition, the proposed model uses a neuromorphic deep convolutional neural network (DCNN) to selectively learn representations of key poses which are assigned to different action classes. In the preprocessing phase, optical flow is calculated on the input sequences and key pose frames are selected in an unsupervised manner. Form and motion information is calculated for each key pose frame. The concatenated form and motion information is then used as the input to the DCNN. In the following, detailed information about the image preprocessing, the key pose selection automatism and the structure and functionality of the DCNN are presented. All simulations were carried out using a neuromorphic computing paradigm and mapped to the IBM TrueNorth hardware platform (Merolla et al., 2014).

# 3.1. Key Pose Selection and Image Preprocessing

During preprocessing, two elementary processing steps are performed. First, the key pose selection is performed by automatically analyzing simple motion and form parameters. Second, the final input to the network is calculated by combining the form and motion representations I<sup>form</sup> and I<sup>motion</sup> obtained by simple image-based operations.

#### Key Pose Selection

The key pose selection process operates upon two parameters, namely (I) local temporal extrema in the motion energy and (II) the extendedness of a subject at a given timestep. Optical flow is calculated using a differential method, as suggested in the Lucas-Kanade optical flow estimation algorithm (Lucas and Kanade, 1981). Given an image sequence I(**x**, t), the optical flow **u**(**x**, t) = (u(**x**, t), v(**x**, t)) at timestep t and position **x** = (x, y) is estimated in a local neighborhood N(**x**) by minimizing

$$\sum\_{\mathbf{y}\in N(\mathbf{x})} W(\mathbf{x} - \mathbf{y})^2 [I\_x(\mathbf{y}, t)\,u(\mathbf{x}, t) + I\_y(\mathbf{y}, t)\,v(\mathbf{x}, t) + I\_t(\mathbf{y}, t)]^2,\tag{1}$$

where W(**x** − **y**) increases the influence of the optical flow constraints within the center of the local neighborhood (for details see Barron et al., 1994). The spatiotemporal derivatives I<sub>x</sub>, I<sub>y</sub> and I<sub>t</sub> are estimated by convolving the image sequences with the fourth-order central difference [−1, 8, 0, −8, 1]/12 and its transpose in the spatial domain and with the first-order backward difference [−1, 1] in the temporal domain. A separable 2D kernel with 1D coefficients of [1, 4, 6, 4, 1]/16 is used to realize the weighted integration of the derivatives within a 5 × 5 spatial neighborhood N(**x**)<sup>1</sup>. The use of the Lucas-Kanade algorithm is not a hard prerequisite for the proposed approach. Other types of optical flow estimators might be applied as well (e.g., Brosch and Neumann, 2016, whose estimator can be executed on neuromorphic hardware). The overall motion energy E<sup>flo</sup> is then calculated by integrating the speed of all estimated flow vectors within the vector field.

$$E^{\text{flo}}(t) = \sum\_{\mathbf{x} \in I(\mathbf{x}, t)} \|\mathbf{u}(\mathbf{x}, t)\|\_{2} = \sum\_{\mathbf{x} \in I(\mathbf{x}, t)} \sqrt{u(\mathbf{x}, t)^2 + v(\mathbf{x}, t)^2}. \tag{2}$$
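As a rough illustration of Equations 1 and 2, the following sketch solves the weighted least-squares problem for a single neighborhood and sums the flow speeds of one frame. It is not the MATLAB implementation used in the simulations; the function names `lucas_kanade_flow` and `motion_energy`, the flat-list data layout, and the zero-flow fallback for ill-conditioned neighborhoods are assumptions made for this example.

```python
import math


def lucas_kanade_flow(ix, iy, it, weights):
    """Least-squares flow (u, v) for one neighborhood N(x), following Eq. 1.

    ix, iy, it: derivative samples I_x, I_y, I_t at the pixels y in N(x).
    weights:    window weights W(x - y) for the same pixels.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for gx, gy, gt, w in zip(ix, iy, it, weights):
        w2 = w * w                      # Eq. 1 uses the squared window weight
        a11 += w2 * gx * gx
        a12 += w2 * gx * gy
        a22 += w2 * gy * gy
        b1 -= w2 * gx * gt
        b2 -= w2 * gy * gt
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:                # assumed fallback: no reliable estimate
        return 0.0, 0.0
    # Solve the 2 x 2 normal equations in closed form.
    u = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return u, v


def motion_energy(flow_vectors):
    """Eq. 2: sum of the speeds of all flow vectors (u, v) of one frame."""
    return sum(math.hypot(u, v) for u, v in flow_vectors)
```

In a full pipeline, `lucas_kanade_flow` would be evaluated once per pixel with the 5 × 5 window weights, and `motion_energy` once per frame on the resulting vector field.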

The motion energy is smoothed by convolving it with a Gaussian kernel, Ẽ<sup>flo</sup>(t) = (E<sup>flo</sup> ∗ G<sub>σ</sub>)(t). In the performed simulations, σ = 2 and σ = 4 were used depending on the dataset<sup>2</sup>. Potential key pose frames are then marked by identifying the local extrema of the motion energy signal.

$$\mathcal{K}^{\text{flo}} = \{ I(t),\ t \in [1, \dots, T] \mid t \text{ is a local extremum of } \tilde{E}^{\text{flo}}(t) \}. \tag{3}$$
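The smoothing and extremum detection described above can be sketched as follows. The kernel radius of 3σ, the replication of border values, and the strict-inequality test for extrema are assumptions of this example; the text does not specify them. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian; the default radius of 3 * sigma is an assumption."""
    radius = int(3 * sigma) if radius is None else radius
    kernel = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]


def smooth(signal, sigma):
    """Convolve the motion energy with a Gaussian (borders are replicated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    smoothed = []
    for t in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(t + j - radius, 0), n - 1)   # clamp to valid frames
            acc += weight * signal[idx]
        smoothed.append(acc)
    return smoothed


def local_extrema(signal):
    """Indices t that are strict local minima or maxima of the signal."""
    extrema = []
    for t in range(1, len(signal) - 1):
        prev, cur, nxt = signal[t - 1], signal[t], signal[t + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            extrema.append(t)
    return extrema
```

Calling `local_extrema(smooth(energy, sigma))` on a per-frame motion energy list yields the candidate key pose frame indices.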

The relative horizontal and vertical extent of a given pose at time t is then used to reject local extrema whose extent deviates from the reference extent by less than a predefined percentage threshold λ, as defined by:

$$
\mathcal{K} = \mathcal{K}^{\text{flo}} \cap \mathcal{K}^{\text{ext}},\tag{4}
$$

with

$$\begin{aligned} \mathcal{K}^{\text{ext}} = \{ I(t),\ t \in [1, \ldots, T] \mid\ & (\text{Ext}^{\text{ver}}(t) > (1 + \lambda)\, \overline{\text{Ext}}^{\text{ver}}) \\ & \vee (\text{Ext}^{\text{ver}}(t) < (1 - \lambda)\, \overline{\text{Ext}}^{\text{ver}}) \\ & \vee (\text{Ext}^{\text{hor}}(t) > (1 + \lambda)\, \overline{\text{Ext}}^{\text{hor}}) \\ & \vee (\text{Ext}^{\text{hor}}(t) < (1 - \lambda)\, \overline{\text{Ext}}^{\text{hor}}) \}. \end{aligned} \tag{5}$$

In the performed simulations, values of λ = 0.1 and λ = 0.05 were used for the two different datasets. The percentage thresholds were determined manually with the aim of compensating for differences in the temporal resolution of the datasets. The horizontal and vertical extent Ext<sup>hor</sup> and Ext<sup>ver</sup> are derived framewise by estimating the width and the height of the bounding box enclosing the body shape. The extent of a neutral pose is used as the reference extent $\overline{\text{Ext}}^{\text{hor}}$ and $\overline{\text{Ext}}^{\text{ver}}$, which is derived from the width and height of the bounding box in the first frame of a sequence. Silhouette representations, and thus the bounding boxes of the bodies, are available for both datasets used in the simulations. In constrained recording scenarios, silhouettes can be extracted by background subtraction or by using the optical flow fields calculated for the selection of the key pose frames. **Figure 1A** shows the motion energy signal Ẽ<sup>flo</sup> together with the extent Ext<sup>hor</sup> and Ext<sup>ver</sup> and their reference values. A strong correlation between the motion energy and the extent of the pose can be seen. In **Figure 1B**, examples of the horizontal and the vertical extent are displayed for a neutral and an extended posture. While the motion energy allows an identification of temporal anchor points in a motion sequence, the extent helps in selecting the most characteristic ones.
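The extent criterion can be sketched in a few lines. The example assumes binary silhouette masks given as nested lists and takes the first frame as the neutral reference pose, as described above; `bounding_box_extent` and `select_key_poses` are hypothetical names.

```python
def bounding_box_extent(mask):
    """Width and height of the bounding box of a binary silhouette mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return 0, 0
    return max(cols) - min(cols) + 1, max(rows) - min(rows) + 1


def select_key_poses(extrema, ext_hor, ext_ver, lam):
    """Keep motion-energy extrema whose extent deviates enough from the reference.

    extrema:          frame indices of local extrema of the motion energy.
    ext_hor, ext_ver: per-frame horizontal and vertical extent.
    lam:              relative threshold (0.1 or 0.05 in the simulations).
    """
    ref_hor, ref_ver = ext_hor[0], ext_ver[0]   # first frame = neutral pose

    def deviates(value, reference):
        return value > (1 + lam) * reference or value < (1 - lam) * reference

    return [t for t in extrema
            if deviates(ext_ver[t], ref_ver) or deviates(ext_hor[t], ref_hor)]
```

For instance, with a reference width of 10 and λ = 0.1, an extremum is kept if the width exceeds 11 or drops below 9, or if the height deviates by the same relative margin.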

#### Form and Motion Representations

For each selected key pose frame I<sup>key</sup> ∈ K, a form representation is derived by estimating the spatial derivatives I<sub>x</sub><sup>key</sup> and I<sub>y</sub><sup>key</sup> and combining them into one contour representation I<sup>con</sup> by concatenating the orientation selective maps (see **Figure 2**, second row). The final form representation is then obtained by applying a logarithmic transformation that emphasizes low range values and by normalizing the response amplitudes, using the transformation:

$$I\_{\log}^{\rm con} = \log(1 + 5|I^{\rm con}|) \tag{6}$$

$$I^{\text{form}} = \frac{I\_{\text{log}}^{\text{con}}}{\max(I\_{\text{log}}^{\text{con}})} \tag{7}$$

Likewise, for each key pose frame I<sup>key</sup>, the optical flow is separated into vertical and horizontal components, which are then concatenated (see **Figure 2**, first row). The resulting motion representation I<sup>flo</sup> is log-transformed and normalized. As for the contrast mapping, the transformation is given through:

$$I\_{\log}^{\text{flo}} = \log(1 + 5|I^{\text{flo}}|) \tag{8}$$

$$I^{\text{motion}} = \frac{I\_{\text{log}}^{\text{flo}}}{\max(I\_{\text{log}}^{\text{flo}})} \tag{9}$$

The form representations I<sup>form</sup> and the motion representations I<sup>motion</sup> are combined into an overall input representation I<sup>input</sup> (**Figure 2**, last column). I<sup>input</sup> is then used as the input for the training of the DCNN described in the following section.
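Equations 6 to 9 apply the same log transform and peak normalization to the contour and flow maps. The sketch below illustrates this with nested lists; the vertical stacking used to concatenate the maps in `build_input` is an assumption, since the exact layout of I<sup>input</sup> is only shown graphically in Figure 2. Both function names are hypothetical.

```python
import math


def log_normalize(image, gain=5.0):
    """log(1 + gain * |I|) followed by division by the maximum (Eqs. 6-9)."""
    logged = [[math.log(1.0 + gain * abs(v)) for v in row] for row in image]
    peak = max(max(row) for row in logged)
    if peak == 0.0:                      # all-zero map: nothing to normalize
        return logged
    return [[v / peak for v in row] for row in logged]


def build_input(i_x, i_y, flow_u, flow_v):
    """Combine form and motion maps into one input map.

    Maps are concatenated by stacking rows (an assumed layout).
    """
    form = log_normalize(i_x + i_y)          # contour map -> form representation
    motion = log_normalize(flow_u + flow_v)  # flow map -> motion representation
    return form + motion
```

Because of the normalization, every entry of the resulting map lies in [0, 1], with the strongest contour and the fastest flow component each mapped to 1.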


## 3.2. Learning of Class Specific Key Pose Representations

A neuromorphic deep convolutional neural network was used to establish classwise representations of the preselected key pose frames using a supervised learning scheme. The network was implemented using the energy-efficient deep neuromorphic networks (Eedn) framework (Esser et al., 2016), which adapts and extends the training and network functions of the MatConvNet toolbox (Vedaldi and Lenc, 2015). In the following, for the reader's convenience, we briefly recapitulate key aspects of the framework and its extensions presented in Esser et al. (2016). In the framework, the weights established through learning match the representation scheme and processing principles used in neuromorphic computing paradigms. The structure of the DCNN follows one of the network parameter sets presented by Esser et al. (2016), which show close to state-of-the-art classification performance on a variety of image datasets and allow the trained network to be run on a single IBM TrueNorth chip (Merolla et al., 2014).

<sup>1</sup>In the presented simulations, the MATLAB® implementation of the Lucas-Kanade flow estimation algorithm was used.

<sup>2</sup>The values of σ were chosen manually to take different temporal resolutions into account.

(*SP2*/jumping jack). At the bottom, body poses for several frames are shown. Local minima in the motion energy are marked by a circle, local maxima by a diamond. Extrema which are rejected as a key pose frame because of an insufficient extent are additionally marked with ×. (B) Shows an example for the horizontal and vertical extent of a neutral and a highly articulated body pose. The first frame of each action sequence is defined to be the neutral pose. (C) Shows the relative number of identified key poses per action sequence for the uulmMAD dataset used for the simulations (see Section 4.1). Written informed consent for the publication of the exemplary images was obtained from the displayed subject.

A deep convolutional neural network is typically organized as a feedforward cascade of layers composed of artificial neurons (LeCun et al., 2010), which process the output of the preceding layer (afferent synaptic connections) and propagate the result to the subsequent one (efferent synaptic connections). Following the definition in Esser et al. (2016), an artificial cell j in a DCNN calculates a weighted sum over the input to that cell, as defined by:

$$s\_{j} = \sum\_{xy} \sum\_{f} in\_{xyf}\, w\_{xyfj},\tag{10}$$

where in<sub>xyf</sub> are the signals in the input field of cell j at locations (xy) in the spatial and (f) in the feature domain, and w<sub>xyfj</sub> the respective synaptic weights. In the following, we will use the linear index i to denote locations in the (xyf) space-feature cube. Normalizing the weighted sum over a set of input samples (batch normalization) accelerates the training of the network by standardizing s<sub>j</sub>, as defined through:

$$
\tilde{s}\_{j} = \frac{s\_{j} - \mu\_{j}}{\sigma\_{j} + \epsilon} + b\_{j},
\tag{11}
$$

with s̃<sub>j</sub> the standardized weighted sum, µ<sub>j</sub> the mean and σ<sub>j</sub> the standard deviation of s<sub>j</sub> calculated over the training examples within a batch (Ioffe and Szegedy, 2015). b<sub>j</sub> is a bias term, which allows shifting the activation function φ(•), and ε guarantees numerical stability. The output activation of the artificial neuron is calculated by applying an activation function to the standardized filter response:

$$r\_{j} = \phi(\tilde{s}\_{j}).\tag{12}$$
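The forward computation of a single cell over a batch (Equations 10 to 12) might be sketched as follows. The flattened input vectors, the value of ε, and the use of the population standard deviation are assumptions of this example, and the function names are hypothetical; the sketch is not the Eedn implementation.

```python
import math


def weighted_sum(inputs, weights):
    """Eq. 10 with the (xyf) cube flattened to a linear index i."""
    return sum(x * w for x, w in zip(inputs, weights))


def batch_standardize(s_batch, bias=0.0, eps=1e-5):
    """Eq. 11: standardize the weighted sums of one cell over a batch."""
    n = len(s_batch)
    mean = sum(s_batch) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in s_batch) / n)
    return [(s - mean) / (sigma + eps) + bias for s in s_batch]


def forward(batch_inputs, weights, activation, bias=0.0):
    """Eq. 12: activation applied to the standardized filter responses."""
    sums = [weighted_sum(x, weights) for x in batch_inputs]
    return [activation(s) for s in batch_standardize(sums, bias)]
```

Passing a threshold function as `activation` gives the binary response used in the hidden layers, while the output layer would use a softmax over all output cells instead.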

Weight adaptation is performed through gradient descent by applying error backpropagation with momentum (Rumelhart et al., 1986). In the forward phase, an input pattern is propagated through the network until the activations of the cells in the output layer are obtained. In the backward phase, the target values of an input pattern are used to calculate the cross entropy C given the current and the desired response of the output layer cell activations, as defined by:

$$C = -\sum\_{j=1}^{M} v\_j \ln(r\_j) = -\sum\_{j=1}^{M} v\_j \ln(\phi(\tilde{s}\_j)),\tag{13}$$

with M denoting the number of cells in the output layer. Here, v<sub>j</sub> is the one-hot encoded target value (or teaching signal) of a cell j with activation r<sub>j</sub>. A softmax function is employed as the activation function in the output layer, as defined through:

$$\phi(\tilde{s}\_{j}) = \frac{e^{\tilde{s}\_{j}}}{\sum\_{k=1}^{M} e^{\tilde{s}\_{k}}}.\tag{14}$$

FIGURE 2 | Input generation. Each frame of an action sequence is transformed into a combined motion and form representation. In the top row, the estimated optical flow is displayed in the second column (direction γ color-encoded) for two consecutive frames (first column). The optical flow field is then separated into horizontal and vertical components (third column) and their absolute value is log-transformed (fourth column) and normalized (fifth column). Form representations (bottom row) are derived framewise by estimating the horizontal and vertical derivatives *I<sub>x</sub>* and *I<sub>y</sub>* (second column, gradient orientation with polarity β color-encoded). The resulting contrast images are then log-transformed and normalized. The form and motion representations are combined into a single feature map *I*<sup>input</sup> which is then fed into the convolutional neural network. Image sizes are increased for better visibility. Written informed consent for the publication of exemplary images was obtained from the shown subject.
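The softmax output and the cross entropy of Equations 13 and 14 can be illustrated with a short sketch. Subtracting the maximum before exponentiation is a standard numerical-stability measure that is not part of the equations, and the function names are hypothetical. `output_delta` gives the output-layer error term r<sub>j</sub> − v<sub>j</sub> that results from combining softmax with cross entropy.

```python
import math


def softmax(s):
    """Eq. 14, with the maximum subtracted for numerical stability."""
    m = max(s)
    exps = [math.exp(x - m) for x in s]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(r, v):
    """Eq. 13 for output activations r and one-hot targets v."""
    return -sum(vj * math.log(rj) for rj, vj in zip(r, v) if vj > 0)


def output_delta(r, v):
    """Error term of the output layer: r_j - v_j."""
    return [rj - vj for rj, vj in zip(r, v)]
```

For two classes with equal evidence, the softmax assigns 0.5 to each class and the cross entropy equals ln 2 regardless of which class is the target.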

The cross entropy error E(t) = C is then propagated backwards through the network and the synaptic weight adaptation is calculated for all cells in the output and hidden layers by applying the chain rule. The strength of weight adaptation Δw<sub>ij</sub> is given through:

$$
\Delta w\_{ij}(t) = -\eta \frac{\partial E(t)}{\partial w\_{ij}} + \alpha \Delta w\_{ij}(t-1) = -\eta\, \delta\_{j}\, in\_{ij} + \alpha \Delta w\_{ij}(t-1), \tag{15}
$$

$$\text{with } \delta\_{j} = \begin{cases} (r\_{j} - v\_{j}) & \text{if } j \text{ is a neuron in the output layer} \\ \phi'(\tilde{s}\_{j}) \sum\_{k} \delta\_{k} w\_{jk} & \text{if } j \text{ is a neuron in a hidden layer,} \end{cases} \tag{16}$$

which includes a momentum term for smoothing instantaneous weight changes. Here, k is the index of cells in the layer succeeding cell j, t describes the current training step, or iteration, and η denotes the learning rate. The momentum factor 0 ≤ α ≤ 1 helps the network to handle local minima and flat plateaus on the error surface. After the backward pass, weights are finally adapted by:

$$
w\_{ij}(t+1) = w\_{ij}(t) + \Delta w\_{ij}(t). \tag{17}
$$
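The momentum update of Equations 15 and 17 reduces to two lines for a single weight. The sketch below uses a hypothetical function `momentum_step` and takes the gradient ∂E/∂w<sub>ij</sub> = δ<sub>j</sub> in<sub>ij</sub> as an already computed input.

```python
def momentum_step(weight, gradient, prev_delta, learning_rate, momentum):
    """One gradient descent step with momentum for a single weight.

    gradient:   dE/dw_ij, i.e., delta_j * in_ij.
    prev_delta: weight change applied in the previous iteration.
    Returns the updated weight and the weight change of this iteration.
    """
    delta = -learning_rate * gradient + momentum * prev_delta   # Eq. 15
    return weight + delta, delta                                # Eq. 17
```

With a constant gradient, successive weight changes grow toward −η · gradient / (1 − α), which is how the momentum term carries the weights across flat plateaus of the error surface.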

To ensure compatibility with neuromorphic processing principles, a binary activation function φ(s̃<sub>j</sub>) is applied in the hidden layers (for details see Section 3.3).

Within a convolutional layer, the weights w<sub>ij</sub> of a cell j are shared over multiple input fields, which are arranged as a regular grid in the source layer. The calculation of the weighted sum during the forward pass, as well as the integration of the error derivative during the backward pass, can be formulated as a convolution with the input from the source layer or the error signal from the succeeding layer. The weights w<sub>ij</sub> act as the filter (or convolution) kernel, s̃<sub>j</sub> as the filter response and r<sub>j</sub> as the output of an artificial cell. The size and stride of a filter determine the size and the overlap of the input fields of a filter in the source layer. A small stride results in an increased overlap and thus a large number of output values. The number of features defines how many different filters are employed in a layer. The weight matrices of the cells within a layer can be separated into groups of filters, which define the set of input features from the source layer covered by a filter<sup>3</sup>.
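The relation between filter size, stride, and the number of output values, as well as the effect of filter groups on the weight dimensionality, can be made concrete with a small sketch. It assumes a convolution without padding, which the text does not state, and the function names and the 32-pixel example are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride):
    """Number of filter positions along one dimension (no padding assumed)."""
    return (input_size - filter_size) // stride + 1


def grouped_filter_shape(filter_size, in_features, groups):
    """Weight dimensionality of one filter when input features are grouped."""
    if in_features % groups != 0:
        raise ValueError("in_features must be divisible by groups")
    return (filter_size, filter_size, in_features // groups)
```

For a 32-pixel-wide input and a 3 × 3 filter, a stride of 1 gives 30 positions per row, whereas a stride of 2 gives 15. A 3 × 3 filter reading 256 feature maps split into 32 groups has a weight dimensionality of 3 × 3 × 8, which matches the configuration described in footnote 3.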

It is common practice to construct deep neural networks by employing convolutional layers for feature extraction in the lower layers and connecting them with (one or more) fully connected layers (equivalent to Multilayer Perceptrons/MLPs) on top for classification purposes. In contrast, the proposed network follows the strategy of global average pooling (gap) proposed in Lin et al. (2013) and applied in Esser et al. (2016). In the final convolutional layer of the network, one feature map is generated for each category of the classification problem. Instead of a full connectivity, the average value of each class-associated feature map is propagated to the output (softmax) layer. Due to their association with classes, the feature maps can directly be interpreted as confidence maps. Following the softmax layer, the cross-entropy error is calculated using one-hot encoded target values v<sub>j</sub> and propagated back through the network (according to Equation 16). Networks using parameter-free global average pooling layers in combination with softmax are less prone to overfitting (compared to MLPs) and increase the robustness to spatial translations (for details see Lin et al., 2013).
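Global average pooling and the resulting class decision can be sketched as follows. The example assumes one 2D feature map per class given as nested lists, which simplifies the actual layer 15 layout; `global_average_pool` and `predict_class` are hypothetical names.

```python
def global_average_pool(feature_maps):
    """Average activation of each class-associated feature map."""
    return [
        sum(sum(row) for row in fmap) / sum(len(row) for row in fmap)
        for fmap in feature_maps
    ]


def predict_class(feature_maps):
    """Index of the class with the highest average activation, plus all averages."""
    averages = global_average_pool(feature_maps)
    best = max(range(len(averages)), key=averages.__getitem__)
    return best, averages
```

Since the pooling has no trainable parameters, the class averages depend only on the feature maps themselves, which is why they can be read directly as confidence values.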

<sup>3</sup> In **Figure 3**, the weight matrices in the convolutional layer 5 have a dimensionality of 3 × 3 × 8, since they receive input from 256 feature maps in layer 4 which are separated into 32 groups of filters, each receiving input from 8 feature maps.

FIGURE 3 | Deep convolutional neural network structure. The implemented DCNN follows the structure proposed in Esser et al. (2016) and employs three different convolutional layer types (layers 1–15). *Spatial layers* (SPAT; colored in blue) perform a linear filtering operation by convolution. *Pooling layers* (POOL; colored in red) decrease the spatial dimensionality while increasing the invariance and diminishing the chance of overfitting. *Network-in-network* layers (NIN; colored in green) perform a parametric cross channel integration (Lin et al., 2013). The proposed network consists of a data (or input) layer, 15 convolutional layers and a prediction and softmax layer on top. Each of the cells in the last convolutional layer (layer 15) is associated with one class of the classification problem. In the prediction layer, the average class-associated activations are derived (*global average pooling*/gap) and fed into the softmax layer (i.e., one per class), where the cross-entropy error is calculated and propagated backwards through the network. The parameters used for the convolutional layers of the network are given in the central rows of the table. In the last row, the number of artificial cells per layer is listed. The cell count in the *prediction* and *softmax* layer depends on the number of categories of the classification task (i.e., the number of actions in the dataset).

The employed network consists of 15 convolutional layers, which implement three different types of convolutional operations. Spatial layers (SPAT) perform a standard convolution operation, pooling layers (POOL) reduce the spatial dimensions by applying a convolution with a large stride (Springenberg et al., 2014), and network-in-network layers (NIN) are realized by convolutional layers with a size of 1 × 1 and a stride of 1 and act as cross channel integration layers (Lin et al., 2013). The network structure is summarized in **Figure 3**. Each of the cells in the last convolutional layer (layer 15) is assigned to one class. During learning, activities in this layer are averaged per feature map and fed into the softmax layer. For recognition, the average outputs of the cell populations associated with the individual classes are used as prediction values and serve as the final output r<sub>c</sub><sup>class</sup> of the network (prediction layer in **Figure 3**).

## 3.3. Neuromorphic Implementation

Because actual spikes are processed in hardware, the execution of a DCNN on a neuromorphic platform poses several constraints on the activity and weight representation schemes. Since the processing architecture of the TrueNorth neuromorphic platform is based on event-based representations, the graded activations need to be mapped onto a spike-based mechanism. To conform to these processing principles, Esser et al. (2016) employ a binary activation function, as defined by:

$$\phi(\tilde{s}\_j) = \begin{cases} 1 & \text{if } \tilde{s}\_j \ge 0 \\ 0 & \text{otherwise,} \end{cases} \tag{18}$$

and ternary synaptic weights (w<sub>xyf</sub> ∈ {−1, 0, 1}). For the backpropagation of the error signal, the derivative of the binary activation is approximated linearly in the range of [0, 1], as given through:

$$\frac{\partial \phi(\tilde{s}\_{j})}{\partial \tilde{s}\_{j}} \approx \max(0, 1 - \left| \tilde{s}\_{j} \right|). \tag{19}$$
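The binary activation of Equation 18 and its approximated derivative from Equation 19 can be written down directly. The helper `hidden_delta` shows how the approximation would enter the hidden-layer case of Equation 16; all three function names are hypothetical.

```python
def binary_activation(s):
    """Eq. 18: the cell emits a spike (1) if the standardized sum is >= 0."""
    return 1 if s >= 0 else 0


def binary_activation_grad(s):
    """Eq. 19: triangular approximation of the derivative of Eq. 18."""
    return max(0.0, 1.0 - abs(s))


def hidden_delta(s_j, deltas_next, weights_to_next):
    """Hidden-layer error term of Eq. 16 using the approximated derivative."""
    back = sum(d * w for d, w in zip(deltas_next, weights_to_next))
    return binary_activation_grad(s_j) * back
```

The approximated derivative is largest at the threshold and vanishes for |s̃<sub>j</sub>| ≥ 1, so only cells whose standardized input is close to the threshold receive an error signal.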

During training, a copy of the model weights is held in a shadow network, which allows gradual weight adaptation. Weight updates are performed on the values in the shadow network using high precision. For the forward and backward pass, the hidden weights w<sub>ij</sub><sup>h</sup> in the shadow network are clipped to [−1, 1] and mapped to the ternary values using rounding and hysteresis, following:

$$\boldsymbol{w}\_{ij}(t) = \begin{cases} -1 & \text{if } \boldsymbol{w}\_{ij}^h(t) \le -0.5 - h \\ 0 & \text{if } \boldsymbol{w}\_{ij}^h(t) \ge -0.5 + h \land \boldsymbol{w}\_{ij}^h(t) \le 0.5 - h \\ 1 & \text{if } \boldsymbol{w}\_{ij}^h(t) \ge 0.5 + h \\ \boldsymbol{w}\_{ij}(t-1) & \text{otherwise} \end{cases} \tag{20}$$

(for details refer to Esser et al., 2016). The hidden weights w<sub>ij</sub><sup>h</sup> allow the synaptic connection strengths to switch between the ternary values based on small changes in the error gradients obtained during backpropagation, while the hysteresis factor h prevents them from oscillating. The parameters for the training of the network were chosen according to Esser et al. (2016), using a momentum factor of α = 0.9 and a learning rate of η = 20 (reduced by a factor of 0.1 after 2/3 and 5/6 of the total training iterations). The hysteresis factor h was set to 0.1. The mapping of the training network on the TrueNorth platform was performed by the Eedn framework. Training was carried out on Nvidia GPUs, testing was performed on the IBM TrueNorth NS1e board.
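The ternary mapping with hysteresis (Equation 20) and the stepwise learning rate schedule can be sketched as follows. This is an illustration rather than the Eedn code: the function names are hypothetical, and the schedule assumes that the two reductions are cumulative and take effect exactly at 2/3 and 5/6 of the iterations.

```python
def ternarize(w_hidden, w_prev, h=0.1):
    """Map a high-precision shadow weight to {-1, 0, 1} with hysteresis (Eq. 20)."""
    w = max(-1.0, min(1.0, w_hidden))          # clip to [-1, 1]
    if w <= -0.5 - h:
        return -1
    if -0.5 + h <= w <= 0.5 - h:
        return 0
    if w >= 0.5 + h:
        return 1
    return w_prev                               # inside a hysteresis band


def learning_rate(iteration, total_iterations, base=20.0, factor=0.1):
    """Base rate, reduced by `factor` after 2/3 and again after 5/6 of training."""
    rate = base
    if 3 * iteration >= 2 * total_iterations:
        rate *= factor
    if 6 * iteration >= 5 * total_iterations:
        rate *= factor
    return rate
```

A shadow weight of 0.55 lies inside the hysteresis band around 0.5, so the ternary weight keeps its previous value; it only switches to 1 once the shadow weight reaches 0.6, and back to 0 once it falls to 0.4.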

The IBM TrueNorth chip consists of 4,096 interconnected neurosynaptic cores with 1 million spiking neurons and 256 million configurable synaptic connections. For the execution of the network on the TrueNorth chip, the trained network parameters are mapped to hardware using an abstraction of a TrueNorth program called Corelet (Amir et al., 2013). The platform independent Corelets translate the network parameters into a TrueNorth specific configuration, which can be used to program the parameters of the neurons and synaptic connection strengths on the chip. For details on Corelets and the mapping of the DCNN on neuromorphic hardware platforms refer to Amir et al. (2013) and Esser et al. (2016).

## 3.4. Temporal Integration of Framewise Class Predictions

After the training of the DCNN, classification is either performed framewise by directly selecting the class corresponding to the cell population in layer 15 with the maximum average activation, or by integrating the individual framewise classification results using majority voting in temporal windows or over the full sequence.

For framewise classification, a key pose frame is identified in an input image sequence I(**x**, t) and preprocessed as described in Section 3.1. The resulting input map I<sup>input</sup> is fed into the DCNN and the class label c associated with the cell population in layer 15 with the maximum average output r<sub>c</sub><sup>class</sup> defines the class prediction for I<sup>input</sup>. The value of r<sub>c</sub><sup>class</sup> can directly be interpreted as the confidence in the prediction.

In sliding window based classification, the predicted class labels for key pose frames are collected within temporal windows of size n [frames], which are shifted over the input sequence I(**x**, t). The class with the most frequent occurrence of key pose frames determines the class predicted for the window (majority voting). At the moment, we do not use the confidence r<sub>c</sub><sup>class</sup> of the predictions as weights for the voting. Note that it is not guaranteed that key pose frames occur in all temporal windows. Windows which do not contain key poses are not used for evaluation.

Full sequence classification follows the same principle as sliding window based classification, but collects all key pose frames within a sequence. Thus, the amount of temporal information integrated in the voting process might differ substantially from sequence to sequence.
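The two voting schemes might be sketched as follows. The dictionary mapping frame indices to predicted labels, the window shift of one frame, and the tie-breaking behavior (the label counted first wins) are assumptions of this example; the text does not specify how ties are resolved. The function names are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; returns None if there are no key pose frames."""
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


def sliding_window_predictions(key_pose_predictions, num_frames, window):
    """Majority vote per temporal window, shifted by one frame.

    key_pose_predictions: dict mapping a key pose frame index to its label.
    Windows without any key pose frame are skipped.
    """
    results = []
    for start in range(num_frames - window + 1):
        labels = [key_pose_predictions[t]
                  for t in range(start, start + window)
                  if t in key_pose_predictions]
        if labels:
            results.append((start, majority_vote(labels)))
    return results


def full_sequence_prediction(key_pose_predictions):
    """Majority vote over all key pose frames of a sequence."""
    ordered = [key_pose_predictions[t] for t in sorted(key_pose_predictions)]
    return majority_vote(ordered)
```

Using the confidence values as vote weights, which the text mentions as a possible extension, would only require replacing the counter by a weighted sum per label.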

# 4. DATASETS

The proposed action recognition approach was evaluated using two different action datasets. Due to the higher number of subjects and actions, we focused our analysis on the uulm multiperspective action dataset (uulmMAD). In addition, we analyzed the performance on the widely used Weizmann dataset to allow a comparison to other approaches and to perform a cross-dataset evaluation of overlapping classes. In the following, we will briefly describe the main characteristics of the two datasets.

## 4.1. uulmMAD

The uulm multiperspective action dataset<sup>4</sup> (uulmMAD; Glodek et al., 2014) consists of data from 31 subjects performing actions from the areas of everyday life (ED), sport/fitness (SP) and stretching (ST). Eight of the actions are repeated three times, six actions are performed four times with varying speed. Altogether, each action is performed either 93 or 124 times. Actions were recorded in front of a greenscreen using three synchronized cameras and the body posture was captured in parallel by an inertial motion capturing system worn by the subjects. To decrease the likelihood of similar visual appearances, the motion capture suit was covered by additional clothes whenever possible. **Figure 4** shows the 14 actions together with a characteristic picture, an abbreviation and a short description for each action.

<sup>4</sup>Available via https://www.uni-ulm.de/imagedb.

FIGURE 4 | uulmMAD – uulm multiperspective action dataset. The uulmMAD dataset contains 14 actions in the area of everyday activities, fitness/sports and stretching performed by 31 subjects. Per subject, eight of the actions are repeated three times, six actions are performed four times with varying speed. Actions were recorded by three synchronized cameras (frontal, diagonal and lateral) with a frame rate of 30 Hz and an inertial motion capturing system with a sample rate of 120 Hz. Silhouettes were extracted using chromakeying. At the time we carried out the simulations, silhouettes were available for 22 subjects. In the first row exemplary pictures are shown for all actions. The number of videos (green) and total sum of frames (blue) which were available for the evaluation are displayed in the second row. At the bottom, an abbreviation for each action is defined and a short description is given. Written informed consent for the publication of exemplary images was obtained from the displayed subjects.

At the time we carried out the simulations, silhouette representations were available for all sequences of 22 subjects. Since the silhouettes are used to calculate an estimate of the horizontal and vertical extent of a pose, only the frontal recordings of this subset of subjects were used within the evaluation. Some action pairs (e.g., ED2 and ST4) in the dataset are deliberately intended to appear visually similar and thus be difficult to separate. In total, the sequences used for evaluation contain 381,194 frames, of which 28,902 are selected by the key pose selection procedure.

## 4.2. Weizmann Dataset

To allow a comparison with different action recognition approaches, simulations were additionally carried out using a well established action dataset. The Weizmann dataset<sup>5</sup> (see **Figure 5**; Gorelick et al., 2007) consists of ten actions performed by nine subjects. Actions are mostly performed once per subject, although some actions are occasionally performed twice. Actions are captured at 25 Hz from a frontoparallel perspective in front of a uniform background.

Silhouettes are available for all subjects and sequences. In total, the sequences contain 5,594 frames, 1,873 of which are identified as key pose frames by using the procedure described in Section 3.1.

# 5. RESULTS

Several simulations were carried out to evaluate the performance of the proposed key pose based action recognition approach. The simulations were intended to address questions related to (I) the overall performance of the approach on different datasets using framewise as well as windowed and full sequence majority voting recognition schemes, (II) a comparison to other action recognition methods, (III) a juxtaposition of key pose based and full sequence learning, and (IV) cross-dataset evaluation. Since action recognition datasets are often highly imbalanced, in particular in the case of framewise recognition, we provide different types of performance measures, as well as classwise performance values for the most essential results. Since the nomenclature and definition of performance measures vary widely in the pattern recognition and machine learning community, we briefly define and describe the reported measures to allow better comparability. For a comprehensive discussion of performance measures, we refer to Sokolova and Lapalme (2009) and the contributions of D. Powers (e.g., Powers, 2013).

<sup>5</sup>Available via http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html.

TABLE 1 | Performance measures.

In a multiclass classification problem with N classes, tp<sub>i</sub> (true positives) are commonly defined as the number of correct acceptances (hits) for a class C<sub>i</sub> (i ∈ [1, ..., N]), fn<sub>i</sub> (false negatives) as the number of false rejections (misses), tn<sub>i</sub> (true negatives) as the number of correct rejections of samples of different classes C<sub>j≠i</sub> and fp<sub>i</sub> (false positives) as the number of false acceptances (false alarms). Together, these four counts constitute the confusion matrix and allow a variety of measures describing the performance of a trained classification system to be derived. The measures used for the evaluation of the presented results are listed together with an abbreviation and their definition in **Table 1**.

All multiclass performance measures are calculated using macro averaging (M), since with micro averaging, classes with a large number of examples would dominate the average. Rec<sub>M</sub>, often referred to as (average) recognition rate or, somewhat misleadingly, as (classification) accuracy, might be the performance measure most frequently used in the action recognition literature and describes the average percentage of correctly identified positive examples per class. Inf<sub>M</sub> reflects how informed the decision of a classifier is in comparison to chance, whereas Mark<sub>M</sub> follows the inverse concept by describing how likely the prediction variable is marked by the true variable (Powers, 2013). Note that when calculating the average, the per class values of Inf<sub>M</sub> and Mark<sub>M</sub> are weighted by Bias<sub>i</sub> = (tp<sub>i</sub> + fp<sub>i</sub>) / (tp<sub>i</sub> + fn<sub>i</sub> + tn<sub>i</sub> + fp<sub>i</sub>) and Prevalence<sub>i</sub> = (tp<sub>i</sub> + fn<sub>i</sub>) / (tp<sub>i</sub> + fn<sub>i</sub> + tn<sub>i</sub> + fp<sub>i</sub>), respectively. The Matthews Correlation Coefficient MCC<sub>M</sub> can be derived by calculating the geometric mean of Inf<sub>M</sub> and Mark<sub>M</sub> and expresses the correlation between predicted classes and true values.
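The measures discussed above can be derived from a confusion matrix as in the following sketch. Since the definitions of Table 1 are not reproduced here, the per class informedness (recall plus inverse recall minus 1) and markedness (precision plus inverse precision minus 1) are taken from Powers (2013) and should be read as an assumption about the exact formulas used. The function name and the dictionary keys are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def macro_measures(cm):
    """Multiclass measures from a confusion matrix cm[true][predicted]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rec = inf = mark = 0.0
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[j][i] for j in range(n)) - tp
        tn = total - tp - fn - fp
        rec += _ratio(tp, tp + fn) / n                      # macro-averaged recall
        bias = _ratio(tp + fp, total)
        prevalence = _ratio(tp + fn, total)
        inf += bias * (_ratio(tp, tp + fn) - _ratio(fp, fp + tn))
        mark += prevalence * (_ratio(tp, tp + fp) - _ratio(fn, fn + tn))
    # Geometric mean; set to 0.0 when either factor is not positive.
    mcc = math.sqrt(inf * mark) if inf > 0 and mark > 0 else 0.0
    return {"RecM": rec, "InfM": inf, "MarkM": mark, "MCCM": mcc}
```

For a balanced two-class problem with 75% of the samples of each class classified correctly, this yields a recall of 0.75 and an informedness, markedness, and correlation coefficient of 0.5 each.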

Leave-one-subject-out cross-validation (LOSO) was performed in all test scenarios and the resulting average performance measures are reported together with the corresponding standard deviations. In the following, rates are either reported in a range of [0, 100] or [0, 1] (due to limited space).
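A leave-one-subject-out split can be sketched as a generator over subjects. The representation of samples as (subject, item) pairs and the use of the population standard deviation in `summarize` are assumptions of this example; the text does not state which estimator was used. Both function names are hypothetical.

```python
import statistics


def leave_one_subject_out(samples):
    """Yield (held-out subject, training items, test items) for each subject.

    samples: list of (subject_id, item) pairs.
    """
    subjects = sorted({subject for subject, _ in samples})
    for held_out in subjects:
        train = [item for subject, item in samples if subject != held_out]
        test = [item for subject, item in samples if subject == held_out]
        yield held_out, train, test


def summarize(scores):
    """Mean and (population) standard deviation over all folds."""
    return statistics.mean(scores), statistics.pstdev(scores)
```

Each fold trains on all other subjects and tests on the held-out one, so no subject contributes frames to both the training and the test set of the same fold.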

## 5.1. Classification Performance

The equivalent network structure (see Section 3.2) was used to train the network on the two datasets described in Section 4. In case of the uulmMAD dataset, 28, 902 key pose frames (per class average 2, 064.43, std 1, 097.16) were selected and used as the training input. 576 cells in the last convolutional layer (layer 15) of the CNN were assigned to each of the 14 classes in the dataset. The network was trained in 150, 000 iterations. Testing was performed using the preselected key pose frames of the test subject as input. The average population activation of the cells assigned to each class was used to infer the final classification decision (for an exemplary activation pattern see **Figure 8**). **Figure 6** summarizes classification results obtained for different temporal integration schemes of single frame classification results. A framewise classification scheme allows to recognize an action in an instant when the key pose frame is presented to the network. This kind of immediate decision might be crucial for systems which rely on decisions in real time. Not only the processing speed, but also the time necessary to sample and construct the action descriptors is relevant in this context. **Figure 6A** summarizes the framewise

FIGURE 6 | uulmMAD classification performance. The proposed network was trained on the key pose frames extracted from the uulmMAD action recognition dataset. (A) Shows the per class classification rates obtained by single key pose frame classification. This allows the recognition of an action in the instant a key pose frame emerges in the input sequence. Average classwise recall (on the diagonal) ranges from 0.78 to 0.98. Some of the notable confusions between classes can be explained by a large visual similarity (e.g., between ED2 and ST4). In (B) sequence level majority voting was applied. The final decision is made after evaluating all key pose frames within an action sequence and determining the class with the most frequent occurrence of key poses. The resulting per class values of RecM range from 0.94 to 1.00. A sliding window based classification scheme was evaluated in (C). The best and worst per class average recall values together with the average value of RecM are displayed for temporal window sizes from 1 to 60 frames. In addition, the percentage of windows containing one or more key pose frames (and thus allow a classification of the action) is shown (blue line).

#### TABLE 2 | uulmMAD classification performance.


\**Size [overlap]*

classification rates per class (average Rec<sup>M</sup> of 0.887, std 0.057). Some of the confusions between classes might be explained by similar visual appearances of the key poses (e.g., ED2 and ST4). Accumulating the classified key poses over a sequence by majority voting increases the classification performance (average Rec<sup>M</sup> of 0.967, std 0.028, compare **Figure 6B**), but requires analyzing all frames of a sequence and is thus not well suited for real-time applications. As a compromise between classification speed and performance, a sliding window based approach was evaluated. In **Figure 6C**, the best and worst average per class recall is displayed together with the Rec<sup>M</sup> for window sizes of n = [1, ..., 60], each with an overlap of n − 1. In addition, the relative number of windows which contain at least one key pose (and thus allow a classification) is shown. **Table 2** summarizes the classification performance for different single frame and temporal integration schemes. Single frame performance is, in addition, reported for the evaluation of not only the key pose frames but the full set of frames. As can be seen, the classification performance decreases significantly, but the average recall Rec<sup>M</sup> of 67.56 (std 6.06) indicates that the learned key pose representations are still rich enough to classify a majority of the frames correctly. Note that the relative number of correct classifications clearly exceeds the percentage of key pose frames in the dataset (per class average of 7.46 %, std 2.19 %, compare **Figure 1C**).
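The three decision schemes described above (framewise decision from the average population activation, sequence level majority voting, and sliding window voting with an overlap of n − 1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat class-by-class layout of the activations, and the use of `None` to mark frames that are not key pose frames are assumptions made for illustration only.

```python
from collections import Counter


def classify_frame(activations, cells_per_class):
    """Return the class index whose cell population has the highest mean activation.

    Assumes (hypothetically) that `activations` is a flat list ordered class by
    class, with `cells_per_class` cells assigned to each class.
    """
    n_classes = len(activations) // cells_per_class
    means = [
        sum(activations[c * cells_per_class:(c + 1) * cells_per_class]) / cells_per_class
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=means.__getitem__)


def majority_vote(labels):
    """Most frequent label among key pose frames; None entries are ignored."""
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None  # window contains no key pose frame: no decision possible
    return votes.most_common(1)[0][0]


def sliding_window_votes(labels, n):
    """Majority vote within every window of n frames (step 1, i.e., overlap n - 1)."""
    return [majority_vote(labels[i:i + n]) for i in range(len(labels) - n + 1)]


# Framewise labels of a short sequence; None marks frames that are not key poses.
labels = [None, 2, 2, None, None, 5, None]
print(majority_vote(labels))            # decision for the full sequence
print(sliding_window_votes(labels, 3))  # one decision per window
```

Windows that return `None` correspond to the windows without any key pose frame, whose relative number is reported alongside the recall values in Figure 6C.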

The model was additionally trained using the Weizmann dataset (Gorelick et al., 2007, see Section 4.2). 1,873 frames (per class average 187.30, std 59.51) were selected as key pose frames utilizing the combined criterion developed in Section 3.1. Except for the number of output features encoding each class (806), the same network and learning parameters were applied. As for the uulmMAD dataset, **Figure 7** gives an overview of the classification performance by showing confusion matrices for single key pose frame evaluation (**Figure 7A**), full sequence majority voting (**Figure 7B**), as well as best and worst class recall for different sized windows of temporal integration (**Figure 7C**). In comparison to the results reported for the uulmMAD dataset, the gap between the best and worst class recall is considerably increased. This might be explained by a different overall number of available training examples in

FIGURE 7 | Weizmann classification performance. The network was evaluated on the Weizmann dataset to allow a comparison to other approaches. As in Figure 6, (A) shows the classification rates for a classification of single key pose frames per class. (B) Displays classwise recognition results for a full sequence evaluation using majority voting. Similar visual appearances might explain the increased rate of confusions for some of the classes (e.g., *run* and *skip*). In (C) the average best and worst per class recall values and Rec<sup>M</sup> are reported for temporal window sizes between 1 and 30 frames together with the relative number of windows which contain at least one frame classified as key pose.


#### TABLE 3 | Weizmann classification performance.

\**Size [overlap]*

the datasets (the per class average of training examples in the uulmMAD dataset exceeds the Weizmann dataset by a factor of 11.02), higher visual similarities between the classes (the most prominent confusions are observed for skip, jump and pjump), the lack of a sufficient number of descriptive key poses, or a combination of these factors. A direct relationship between the classwise performance and the per class number of key pose frames available for training cannot be observed. Even though the smallest number of key pose frames was extracted for the class bend, the second best recall value was achieved. As for the uulmMAD dataset, performance measures are reported for different single frame and temporal integration schemes in **Table 3**. Again, the trained key pose representations achieve a considerable performance even when tested per frame on all frames of the action sequences (Rec<sup>M</sup> = 77.15, std 6.46). **Table 4** compares the reported classification results on the Weizmann dataset to state-of-the-art single frame based (second block) and sequence level approaches (third block). In particular, other key pose based action recognition approaches are listed (first block). The direct comparison of different classification architectures, even when evaluated on the same dataset, is often difficult, since different evaluation strategies may have been applied. Thus, whenever possible, the number of considered classes (sometimes the class skip is excluded) and the evaluation strategy are listed together with classification performance and speed. Evaluation strategies are either leave-one-subject-out (LOSO), leave-one-action-out (LOAO) or leave-one-out (LOO, not specifying what is left out) cross-validation.

On a sequence level, the classification performance of the proposed approach is on par with almost all other key pose based methods. Only Liu et al. (2013) achieved a notably higher performance (recall of 100). It is important to stress that the compared methods substantially differ in their key pose selection procedures and thus in the underlying conceptual definition of key poses. For example, Weinland and Boyer (2008) and Liu et al. (2013) select key poses that maximize the classification performance in a validation subset of the dataset, whereas Baysal et al. (2010) and Cheema et al. (2011) select and weight candidate pose descriptors dependent on their distinctiveness with respect to the other classes contained in the dataset. In Chaaraoui et al. (2013), key poses are selected independently per class using clustering in combination with a compactness metric. All the above-mentioned approaches, except the last one, rely on inter-class distributions of pose descriptors to identify key poses, implicitly stating that representativeness is equivalent to distinctiveness (among a known set of classes). If the task at hand is to separate an a priori defined set of actions, this seems to be the superior way of defining key poses for the establishment of temporally sparse representations of actions. On the other hand, such poses always describe differences based on comparisons and do not necessarily capture characteristic poses of an action.

The presented approach follows a different principle. Certain properties of image or skeleton based pose features are assumed to co-occur with characteristic body configurations and thus are used to identify key pose frames. The feature characteristic indicating a key pose and the representations/descriptors used for the recognition of a pose do not necessarily have a close relationship. In doing so, we accept the fact that the selected poses are not guaranteed to be very distinctive and some may even occur in more than one action in exactly the same way. Key poses are assumed to be the most representative poses of a particular action, not in comparison, but in general. Nevertheless, the presented results demonstrate that a feature-driven, pose-centered key pose selection mechanism is capable of achieving the same level of performance without losing generality.

*Bold values indicate maximum recall/fps values per column.*

Most key pose based approaches in the literature try to assign single frames of an image sequence to key pose frames with a high similarity, temporally integrate the result (e.g., by using histograms or majority voting) and perform a classification of the action on a sequence level. The result of single frame action recognition based on the extracted key poses (directly linking key poses to actions) is rarely reported. Single frame based approaches (see **Table 4**, second block), however, try to perform action classification using information solely extracted within one frame (two frames if optical flow is part of the descriptor) and achieve impressive results. In direct comparison, the single frame performance of the presented approach (Rec<sup>M</sup> of 82.15 for key pose evaluation and 77.15 for the classification of all single frames, compare **Table 3**) cannot compete with the other methods, which, on the contrary, utilize all frames during learning to maximize classification performance in the test training dataset. The presented approach, however, achieves a single frame performance of Rec<sup>M</sup> = 77.15 when evaluated over all frames, although in case of the Weizmann dataset only a per class average of 33.84 % (std 8.63 %) of all frames is used for training.
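For readers comparing the numbers above, the following minimal sketch shows how a macro-averaged recall can be computed from a confusion matrix. It assumes that Rec<sup>M</sup> denotes the unweighted mean of the per class recall values; the function name and the row/column convention are illustrative choices, not taken from the original work.

```python
def macro_recall(confusion):
    """Unweighted mean of per-class recalls.

    confusion[i][j] counts samples of true class i that were predicted as
    class j (assumed convention). Classes without any samples are skipped.
    """
    recalls = [
        row[i] / sum(row)
        for i, row in enumerate(confusion)
        if sum(row) > 0
    ]
    return sum(recalls) / len(recalls)


# Two-class toy example: per-class recalls are 0.8 and 0.9.
confusion = [
    [8, 2],
    [1, 9],
]
print(macro_recall(confusion))
```

Because every class contributes equally, a class with few frames influences this measure as much as a class with many frames, which matters when the number of key pose frames differs strongly between classes.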

In the third block of **Table 4**, selected approaches performing action recognition on a sequence level using a variety of different representations and classification architectures are listed. Note that in an overall comparison, (I) due to the transfer to neuromorphic hardware, the presented approach achieves the highest processing speed<sup>6</sup> while consuming a minimal amount of energy, and (II) due to the fact that we aim at executing the model on a single TrueNorth chip, we only use input maps with a resolution of 32 × 32 (using 4,064 of the 4,096 cores available on one chip). This is no limitation of the employed Eedn framework, which allows the realization of models that run on systems with more than one chip (Esser et al., 2016; Sawada et al., 2016). An increased input resolution, as well as the use of more than two flow direction and contour orientation maps, might help in separating classes with a high visual similarity (e.g., skip, jump, and run).

#### 5.2. Comparison to Full Sequence Learning

To address the question whether and how the proposed classification architecture might benefit from using all frames (as opposed to only key pose frames) during training, we performed exactly the same training and testing procedure twice on the uulmMAD dataset. First, only key pose frames were presented during training, while second, all frames were provided during the training phase. Likewise, testing was performed just on the preselected key pose frames, as well as the full set of frames. **Table 5** compares the average recall under the different training (rows) and testing conditions (columns) for single frame evaluation and sequence level majority voting.

In both cases, training and testing on key pose frames achieves the highest performance. However, the observed differences between the two training conditions could not be shown to be significant, neither when testing on key poses nor on the full set of frames. Nevertheless, a closer look at the activation patterns of the network reveals some insights into the effectiveness of the two variants of trained representations. **Figure 8** shows the average activation pattern of the 14 cell populations in layer 15 assigned to the individual classes of a network trained on key pose frames and tested on all frames of the action SP2 (jumping jack). The displayed activation levels clearly show how

#### TABLE 5 | uulmMAD key pose versus all frame learning.


*Bold values indicate the maximum average recall for framewise and full sequence majority voting classification schemes.*

FIGURE 8 | Activation of cell populations. The activations of the cell populations in the last convolutional layer of the DCNN assigned to the 14 classes of the uulmMAD dataset are displayed for a network trained only on key pose frames and tested on all frames of the action SP2 (jumping jack). The activation level of the cell population with the maximum activation (red) and the remaining populations (blue) is encoded by color intensity. Corresponding poses are displayed for selected frames (bottom row). Key pose frames are marked by asterisks. The activation pattern shows how the cell population assigned to the class SP2 selectively responds to frames in the temporal neighborhood of the corresponding key pose frames. At the beginning and the end of the sequence, as well as in between the occurrence of key pose frames, different cell populations achieve the maximum average activation and thus result in misclassifications. Written informed consent for the publication of exemplary images was obtained from the shown subject.

<sup>6</sup> Image preprocessing and key pose selection are not included in the estimated processing time. Optical flow estimation can be performed on a second TrueNorth chip (Brosch and Neumann, 2016).

the trained representations of the corresponding class selectively respond within the temporal neighborhood of the key pose frames. Frames sampled from periods without the presence of key pose frames (at the beginning and the end of the sequence, as well as in between key pose frames) result mostly in a large activation of other cell populations and thus in misclassifications. This is in line with the results shown in **Table 5**, which indicate that classification performance increases under both training conditions when testing is only performed on key pose frames. At this point we can conclude that, compared to training on the full set of frames, key pose based learning of class specific representations performs at least at an equal level. Whether there is any benefit of training exclusively on key pose frames besides an increased learning speed remains, however, an open question. **Figure 9** summarizes the per class activation levels of the cell populations which resulted in a correct classification. For almost all classes (except ED3), the activation level is significantly increased when training was performed on key pose frames only. This might become a very important property in situations where it is not an option to accept any false negatives. Applying a threshold on the activation levels would make it possible to eliminate false negatives, while key pose based training would decrease the number of positive examples rejected by the fixed threshold. Thus, thresholding might further increase the performance for the key pose based training reported so far. Taken together, key pose based learning achieves a slightly increased classification performance with an increased selectivity of the cell populations and thus a higher confidence of the classification decisions.
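The thresholding idea mentioned above was not applied in the reported experiments. As a minimal sketch of how a fixed threshold on the winning population activation could be used to withhold low-confidence decisions, consider the following; the function name and the threshold value are hypothetical.

```python
def classify_with_rejection(mean_activations, threshold):
    """Return the winning class index, or None if the decision is rejected.

    `mean_activations` holds one average population activation per class.
    The decision is rejected when the winning activation stays below the
    (hypothetical) fixed `threshold`.
    """
    best = max(range(len(mean_activations)), key=mean_activations.__getitem__)
    if mean_activations[best] < threshold:
        return None
    return best


print(classify_with_rejection([0.2, 0.7, 0.1], threshold=0.5))  # accepted
print(classify_with_rejection([0.2, 0.7, 0.1], threshold=0.8))  # rejected
```

Under such a scheme, higher activation levels for correct classifications, as observed for key pose based training, mean that fewer correct decisions fall below the threshold.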

#### 5.3. Cross-Dataset Evaluation

Learning to classify input samples, together with the associated representations, aims at robustly predicting future outputs and thus at generalizing to new input data. Here, we assess this capability by cross-dataset evaluation, i.e., by testing how the representations learned on one dataset generalize to another. The preselected key pose frames of the uulmMAD and the Weizmann dataset were used for both training and testing constellations. Performance is reported for two classes, one being one-handed wave (ED1 and wave1), which is available in both datasets. The second class was formed by combining the visually similar classes SP2/SP6 and jack/wave2 during evaluation into one joint class raising two hands. Training was performed on the full set of classes in both cases. Thus, for one-handed wave a random guess classifier would achieve a recall of either 7.14 (uulmMAD) or 10.00 (Weizmann). In case of the combined class raising two hands, the recall chance level increases to 14.29 (uulmMAD) and 20.00 (Weizmann), respectively. **Table 6** shows the result for one-handed wave for the two testing (row) and training (column) setups alongside exemplary pictures of the classes from both datasets. When training was performed on the Weizmann dataset, the recall performance for examples from the uulmMAD dataset is still considerable (loss of 24.07). Training on the uulmMAD and testing on the Weizmann dataset results in an increased performance loss, but still achieves a recall of 53.03.
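The chance levels quoted above follow directly from the number of classes a network was trained on (14 for uulmMAD, 10 for Weizmann) and the number of original classes merged into the evaluated class. A short sketch of this arithmetic, with a hypothetical helper name, is:

```python
def chance_recall(n_merged, n_classes):
    """Recall (in percent) of a uniform random guess for a class that was
    formed by merging `n_merged` of the `n_classes` trained classes."""
    return 100.0 * n_merged / n_classes


for name, n_classes in (("uulmMAD", 14), ("Weizmann", 10)):
    one_handed = round(chance_recall(1, n_classes), 2)  # single class
    two_hands = round(chance_recall(2, n_classes), 2)   # two merged classes
    print(name, one_handed, two_hands)
```

A uniform guess hits a single class with probability 1/N and a class merged from two original classes with probability 2/N, which reproduces the values 7.14 and 14.29 for N = 14 and 10.00 and 20.00 for N = 10.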

In case of the combined class raising two hands, the performance loss is below 30 for both training and testing configurations. **Table 7** shows the achieved performance in detail for each of the four classes in isolation and their combination. Note that when trained on the uulmMAD dataset, jumping jack is recognized almost without any loss of performance. Vice versa, SP2 is often confused with wave2 when training was performed on the Weizmann dataset. This may be explained by the large visual similarities between the classes.

The proposed approach shows promising generalization capabilities, which might partially be explained by the class-independent, feature-driven selection of the key pose frames.

# 6. CONCLUSION AND DISCUSSION

The presented work consists of two main contributions. First, a feature-driven key pose selection mechanism is proposed, which builds upon evidence about human action perception. The selection mechanism does not utilize any information about the inter- or intra-class distribution of the key poses (or key pose descriptors) to optimize the classification accuracy. It is demonstrated that the classification accuracy is on par with state-of-the-art key pose based action recognition approaches, while only motion and form related feature characteristics are used to select a key pose frame. Second, we propose a biologically inspired architecture combining form and motion information to learn hierarchical representations of key pose frames. We expect such hierarchical feature representations to make the recognition more robust against clutter and partial occlusions, in comparison to holistic shape representations of the full body configurations used in previous approaches. Form and motion pattern representations are established employing a neuromorphic deep convolutional neural network. The trained network is mapped onto the IBM Neurosynaptic System platform, which enables a computationally and energy efficient execution.

#### 6.1. Relation to Other Work

The presented results demonstrate that classifying actions using a minimal amount of temporal information is in principle

#### TABLE 6 | Cross-dataset evaluation one-handed wave.


*Bold values indicate the maximum recall values per column. Written informed consent for the publication of exemplary images was obtained from the shown subjects (uulmMAD).*



possible. This is in line with results from other action recognition approaches. For example, Schindler and van Gool (2008) reported that actions can be successfully recognized using snippets of three or even fewer frames. In their work, the length of the temporal window used for the classification of an action sequence was systematically varied. The most important result was that reliable action recognition can be achieved by only using individual snippets, i.e., up to three consecutive frames in temporal order. The question whether there are special "key snippets" of frames, which are particularly useful for the recognition of an action and how they might be defined, however, remains open.

Inspired by evidence from perceptual studies (Thurman and Grossman, 2008; Thirkettle et al., 2009), key poses are potential candidates for representing such special events in articulated motion sequences. Unlike the majority of other approaches reported in the literature (e.g., Baysal et al., 2010; Liu et al., 2013), the proposed key pose selection mechanism identifies key pose frames without optimizing the inter-class distinctiveness or classification performance of the selected key poses. The feature-driven selection criterion proposed in this work combines form and motion information and allows the identification of key poses without any knowledge about other classes. It extends a previous proposal utilizing local temporal extrema in the motion energy as a function of time (Layher et al., 2014) by additionally taking a measure of extendedness of the silhouette shape into account. Given that these features are entirely data-driven, this has two major implications. On the one hand, the selected poses are independent of any other class and thus are more likely to generalize over different sets of actions. This property is valuable in many applications since it does not require any prior knowledge about the distribution of classes/poses in other datasets. On the other hand, there is no guarantee that a learned key pose representation is not part of more than one action and thus results in ambiguous representations. This may reduce the classification rates of the model for rather ambiguous sequences with similar pose articulations. We argue that, although the proposed key pose selection criterion might not result in the best classification performance on all action recognition datasets in isolation, it selects key pose frames which capture the nature of an action in general (independent of a specific dataset).
In addition, the reported results demonstrate that there is no substantial loss in performance when comparing the proposed feature-driven key pose selection mechanism to performance optimizing key pose approaches in the literature. In contrast to other action recognition approaches building upon convolutional neural networks, the proposed model does not aim at establishing representations which capture the temporal relationship between successive frames. This can be accomplished, e.g., by directly feeding spatiotemporal input to the network and applying 3D convolutions (e.g., Baccouche et al., 2011; Ji et al., 2013) or by applying a multiple spatio-temporal scales neural network (MSTNN; Jung et al., 2015). Instead, in this work, the employed DCNN exclusively aims at identifying class specific key pose frames as events in an image (and optical flow) stream.

The investigation reported in this work adds an important piece to the debate of how representations for action sequence analysis might be organized. Some previous approaches have utilized motion and form information for the classification of action categories. For example, Giese and Poggio (2003) proposed that the processing of biological motion sequences representing articulated movements of persons is subdivided into two parallel streams in primate visual cortex. In particular, the authors argue that motion patterns are represented in a hierarchy and that these are paralleled by regular temporal sampling of static frames from the same input sequence. This model architecture has been extended in Layher et al. (2014), suggesting that instead of representing sequences of static frames only key poses need to be selected. As a candidate criterion, the motion energy is calculated over time and local energy minima indicate reversal points of bodily articulation. Such reversals, in turn, most likely coincide with extremal articulations and thus can be utilized to select a key pose in such articulation sequences. While these models focus on the cortical architecture of the visual dorsal and ventral streams, other computer vision approaches also consider combinations of motion and form information for action recognition. While the proposal of Jhuang et al. (2007) builds on a hierarchy of cascaded form and motion representations, the approach of Schindler and van Gool (2008) also utilized two parallel streams of motion and form processing. Both streams generate feature vectors of equal length which are subsequently concatenated, including a weighting of the relative strength of their contribution. An evaluation of the relative weights showed that a fusion with 70 % motion against a 30 % form feature concatenation yielded the best performance on the Weizmann dataset. On the contrary, Schindler et al. (2008) demonstrated that emotion categories can be classified using static images only, which are processed by a multi-scale bank of filters with subsequent pooling operation and dimension reduction. Our findings add new insights to the investigation of utilizing form/shape and motion information in biological/articulated motion analysis for action recognition. Our findings highlight that key poses defined by events of temporal extrema in motion energy and dynamic object silhouette features reliably reflect a high information content regarding the whole action sequence. In other words, key poses can be detected by an entirely feature-driven approach (without utilizing any a priori model of actions in the sequence), and the associated temporal events contain a high proportion of the information about the main components of the action sequence.
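The feature-driven selection principle discussed above can be illustrated with a minimal sketch. It detects local temporal extrema in a motion energy signal and keeps those minima at which a silhouette extendedness measure is sufficiently large. This is not the combined criterion of Section 3.1: the function names, the gating by a threshold `min_extendedness`, and the toy values are assumptions made purely for illustration.

```python
def local_extrema(signal):
    """Indices of strict local minima and maxima of a 1D sequence.

    The first and last samples are never reported, since they have only one
    neighbor.
    """
    minima, maxima = [], []
    for t in range(1, len(signal) - 1):
        if signal[t] < signal[t - 1] and signal[t] < signal[t + 1]:
            minima.append(t)
        elif signal[t] > signal[t - 1] and signal[t] > signal[t + 1]:
            maxima.append(t)
    return minima, maxima


def select_key_pose_frames(motion_energy, extendedness, min_extendedness):
    """Hypothetical combination: keep motion energy minima (reversal points)
    at which the silhouette extendedness reaches `min_extendedness`."""
    minima, _ = local_extrema(motion_energy)
    return [t for t in minima if extendedness[t] >= min_extendedness]


# Toy per-frame signals (illustrative values only).
motion_energy = [5.0, 3.0, 1.0, 2.0, 4.0, 2.0, 0.5, 3.0]
extendedness = [0.2, 0.3, 0.9, 0.5, 0.4, 0.3, 0.1, 0.2]
print(local_extrema(motion_energy))
print(select_key_pose_frames(motion_energy, extendedness, min_extendedness=0.5))
```

Note that the selection uses only the signals of the sequence itself and no class labels, which is the property that distinguishes a feature-driven criterion from selection schemes that optimize inter-class distinctiveness.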

We successfully trained a DCNN of 15 convolutional layers on the key pose frames used as input, which were assigned to different action classes. The network was trained using the energy-efficient deep neuromorphic networks (Eedn) framework (Esser et al., 2016) and executed on a TrueNorth NS1e board (Merolla et al., 2014). The results show that action recognition can be performed on mobile robotic platforms under real-time constraints while consuming a minimal amount of energy. The reduced energy consumption and the high performance in classification rate (compare **Table 4**) make such a model architecture a valuable candidate for applications in mobile or remote control scenarios in which autonomy in energy supply and external control are constraints of core importance. The automatic selection of key pose information for the classification mechanism is a key step to make use of the demonstrated parameters.

Although some classes contained examples with highly similar visual appearances, the network shows an impressive single frame recognition performance when tested on key frames. Even when tested on the full set of frames, recognition performance is still significantly above chance level. Using a simple temporal integration scheme, we show that the results are on par with competing key pose based action recognition approaches (**Table 4**). Cross-dataset evaluation of classes with the same/a similar visual appearance in both datasets shows how the learned representations generalize over the different datasets (training was performed on the full set of classes).

#### 6.2. Shortcomings and Possible Further Improvements

Currently, the optical flow estimation and the key pose selection are performed prior to the training and the classification of input sequences. To realize a complete neuromorphic implementation of the presented approach, optical flow can also be estimated on neuromorphic hardware following the principles described in Brosch and Neumann (2016). A neuromorphic implementation of localizing the local extrema in the motion energy and the extendedness of a person's silhouette could be realized on top of the flow estimation process. In addition, dynamic vision sensors (e.g., iniLabs DVS128) are an option to directly feed a network similar to the proposed one with spike-based sensory streams. First attempts to realize an action recognition system using such sparse asynchronous data streams have already shown promising results (Tschechne et al., 2014).

The presented approach does not make use of any temporal relationship between the identified events (key poses) in an action sequence. Thus, a reversed or scrambled presentation of the images (and optical flow) of a sequence would still result in an assignment to the same action class, although the visual appearance of the sequence is totally different. Modeling or learning the temporal relationships between the key pose frames, e.g., their temporal order, would help in reducing ambiguities and thus increase sequence-wide or windowed classification rates. In case of the proposed approach, this could be achieved by employing, e.g., long short-term memory cells (LSTM; Hochreiter and Schmidhuber, 1997), which are candidates to realize the learning of temporal relationships without losing the invariance against changes in speed. The simple majority voting based integration scheme was chosen because of hardware limitations and to focus on an analysis of the importance of key poses in the context of action representation and recognition.

We also did not apply a weighted majority voting scheme using the confidences of the frame-wise predictions or apply thresholding on the predictions. Both strategies might further increase the classification performance but again would weaken the focus on the analysis of key pose based representations of action sequences.
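To make the first of these alternatives concrete, a confidence-weighted voting scheme could be sketched as follows. It was not part of the reported experiments; the function name and the use of the winning population activation as the confidence value are assumptions for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(predictions):
    """Return the label with the largest summed confidence.

    `predictions` is an iterable of (label, confidence) pairs, one per
    classified key pose frame. Returns None for an empty input.
    """
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)


# Class 1 wins a plain majority vote (two frames against one), but the single
# high-confidence frame of class 2 outweighs it in the weighted scheme.
frames = [(1, 0.4), (1, 0.3), (2, 0.9)]
print(weighted_majority_vote(frames))
```

Compared with plain majority voting, a few confident frame decisions can override many uncertain ones, which is why such a scheme would shift the analysis away from the contribution of the key pose representations themselves.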

The proposed architecture of a deep convolutional neural network (DCNN) as depicted in **Figure 3** builds increasingly complex feature representations through learning from initial simple features. It would be interesting to investigate the selectivities of the feature representations that have been established by the learning. Such a study would potentially shed light on the structure of the feature compositions (and their hierarchical organization) which lead to the selectivity of the key poses in relation to the action sequences to be classified. Some approaches analyzing the low-, intermediate-, and higher-level feature representations have recently been proposed in the literature (Zeiler and Fergus, 2014; Güçlü and van Gerven, 2015; Mahendran and Vedaldi, 2016). Such approaches have so far investigated CNNs for static inputs only. For that reason, some principles might also be useful for the analysis of key pose representations. In addition, the consideration of short-term spatio-temporal feature representations will help to extend the scope of the overall study of visualizing internal representations after learning. We expect that major efforts will be necessary to carefully develop an extended set of tools, which is beyond the scope of the modeling investigation presented here.

Overall, the presented results show that the learned key pose representations allow the classification of actions using a minimal amount of temporal information. By implementing the proposed DCNN on the TrueNorth chip, we show that real-time action recognition relying on the proposed principles is possible while consuming a minimal amount of energy, as reported for the runtime environments of the IBM Neurosynaptic System (Esser et al., 2016).

#### AUTHOR CONTRIBUTIONS

Conceived and designed the approach: GL, TB, HN; Implemented the architecture: GL, TB; Performed the simulations: GL; Analyzed the data: GL; Wrote the paper: GL, HN.

#### FUNDING

This research has been supported by the Transregional Collaborative Research Centre SFB/TRR 62 "A Companion Technology for Cognitive Technical Systems" funded by the German Research Foundation (DFG). HN has been supported by the collaborative project "SenseEmotion" funded by the German Federal Ministry of Education and Research (BMBF).

#### ACKNOWLEDGMENTS

The authors gratefully acknowledge the support via a field test agreement between Ulm University and IBM Research Almaden, and particularly the TrueNorth team at IBM for their support. In addition, the support of NVIDIA Corporation with the donation of a Tesla K40 GPU used for this research is also gratefully acknowledged. The authors would like to express their gratitude for the reviewers' efforts and suggestions that helped to improve the manuscript.

### REFERENCES

(CVPR), 2008 (IEEE), 1–8. Available online at: http://ieeexplore.ieee.org/abstract/document/4587735/

Lin, M., Chen, Q., and Yan, S. (2013). Network in network. CoRR, abs/1312.4400.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Layher, Brosch and Neumann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control

Timo Nachstedt<sup>1,2\*</sup>, Christian Tetzlaff<sup>2,3</sup> and Poramate Manoonpong<sup>4</sup>

<sup>1</sup> Third Institute of Physics, Universität Göttingen, Göttingen, Germany, <sup>2</sup> Bernstein Center for Computational Neuroscience, Göttingen, Germany, <sup>3</sup> Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany, <sup>4</sup> Embodied AI and Neurorobotics Lab, Centre for BioRobotics, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark

Rhythmic neural signals serve as the basis of many brain processes, in particular of locomotion control and the generation of rhythmic movements. It has been found that specific neural circuits, named central pattern generators (CPGs), are able to autonomously produce such rhythmic activities. In order to tune, shape and coordinate the produced rhythmic activity, CPGs require sensory feedback, i.e., external signals. Nonlinear oscillators are a standard model of CPGs and are used in various robotic applications. A special class of nonlinear oscillators are adaptive frequency oscillators (AFOs). AFOs are able to adapt their frequency toward the frequency of an external periodic signal and to keep this learned frequency once the external signal vanishes. AFOs have been successfully used, for instance, for resonant tuning of robotic locomotion control. However, the choice of parameters for a standard AFO is characterized by a trade-off between the speed of the adaptation and its precision and, additionally, is strongly dependent on the range of frequencies the AFO is confronted with. As a result, AFOs are typically tuned such that they require a comparably long time for their adaptation. To overcome this problem, we improve the standard AFO by introducing a novel adaptation mechanism based on dynamical coupling strengths. The dynamical adaptation mechanism enhances both the speed and the precision of the frequency adaptation. In contrast to standard AFOs, in this system, the interplay of dynamics on short and long time scales enables fast as well as precise adaptation of the oscillator for a wide range of frequencies. Among other options, a very natural implementation of this mechanism is in terms of neural networks. The proposed system enables robotic applications which require fast retuning of locomotion control in order to react to environmental changes or conditions.

Keywords: adaptive frequency oscillator, central pattern generator, neural networks, resonance tuning, locomotion control

#### Edited by:

Shuai Li, Hong Kong Polytechnic University, Hong Kong

#### Reviewed by:

Paolo Arena, University of Catania, Italy; Patrick Henaff, University of Lorraine, France

#### \*Correspondence:

Timo Nachstedt timo.nachstedt@phys.uni-goettingen.de

Received: 18 October 2016; Accepted: 24 February 2017; Published: 21 March 2017

#### Citation:

Nachstedt T, Tetzlaff C and Manoonpong P (2017) Fast Dynamical Coupling Enhances Frequency Adaptation of Oscillators for Robotic Locomotion Control. Front. Neurorobot. 11:14. doi: 10.3389/fnbot.2017.00014

# 1. INTRODUCTION

Rhythmic processes are of central importance for many aspects of biological life (Winfree, 1967; Barkai and Leibler, 2000; Goldbeter et al., 2012). Examples include the cardiac rhythm, various circadian rhythms and, in particular, all forms of biological locomotion like walking, flying or swimming. The latter are controlled by specific neural circuits, so-called central pattern generators (CPGs) (Hooper, 2001; Marder and Bucher, 2001). Theoretical models of CPGs range from detailed biophysical models (Hellgren and Grillner, 1992) to pure mathematical oscillators (Matsuoka, 1985). In general, CPGs can be described as nonlinear oscillators which have been used in numerous applications for different variants of robotic control problems (Nakamura et al., 2007; Ijspeert, 2008; Pinto et al., 2012; Nassour et al., 2014; Santos et al., 2017). For instance, compared to purely reflexive control schemes (Foth and Bässler, 1985; Cruse et al., 1995), oscillator-controlled robots enable more stable and robust locomotion (Kimura et al., 2001; Righetti and Ijspeert, 2008).

CPGs do not require any external input or feedback to produce basic rhythmic activity. However, they still require feedback signals to adapt and tune their produced activity, for instance its frequency. For the theoretical concept of nonlinear oscillators, a universal mechanism to adapt the intrinsic frequency of an oscillator according to the frequency of an external periodic signal, which is coupled to the oscillator, was formulated by Righetti et al. (2006). This frequency adaptation schema is applicable to many different types of oscillators. In contrast to the well-known phenomenon of entrainment, which is a purely reactive mechanism with only transient effect on the oscillatory system (Buchli et al., 2006), the frequency adaptation schema modifies the intrinsic frequency of the system permanently. Oscillators with this schema are commonly called adaptive frequency oscillators (AFOs). Several applications of AFOs have been proposed including adaptive control of compliant robots (Righetti et al., 2009), pendulum swing-up problems (Spong, 1995; Furuta, 2003), understanding, simulation and support of human locomotion (Ronsse et al., 2011a; Tropea et al., 2015; Santos et al., 2017), mimicking of fish swimming (Wang et al., 2013), frequency analysis of an input signal (Buchli et al., 2008), and construction of limit cycles of arbitrary shape (Righetti et al., 2009). However, all of these applications suffer from significantly long adaptation times.

For a given oscillatory system, the dynamics of a standard AFO is determined by only two parameters: the strength of the coupling of the external signal to the oscillator and the learning rate of the parameter determining the intrinsic frequency of the system. Here, we show that, when choosing these two parameters, one has to make a compromise between speed and precision of the resulting adaptation dynamics. Furthermore, we demonstrate that the optimal parameters for a certain balance of speed and precision strongly depend on the initial intrinsic frequency of the oscillator and on the target frequency, i.e., the frequency of the external signal. As a result, situation-specific fine-tuning of the parameters is necessary.

In contrast, we propose an extension of the standard frequency adaptation mechanism which provides both fast and precise adaptation for a wide range of initial intrinsic and target frequencies without the need for parameter fine-tuning. In the following, we call this mechanism "Adaptation through Fast Dynamical Coupling" (AFDC). It is based on dynamically adapting the coupling strength of the external signal. If the difference between the current intrinsic frequency and the target frequency is high, the coupling strength is increased in order to accelerate the adaptation. If the difference between the current intrinsic frequency and the target frequency becomes small, the coupling strength is reduced to increase the precision of the adaptation. This process is autonomous and can be integrated into the dynamical equations of the system. Neither the current intrinsic nor the target frequency needs to be explicitly available as the mechanism solely relies on signal correlations. We compare the adaptation processes obtained by regular AFOs with those obtained with the new AFDC mechanism by means of quantitative measures of speed and precision of the adaptation. We find that the AFDC mechanism clearly outperforms standard AFOs within the tested frequency interval covering two orders of magnitude.

### 2. RESULTS

### 2.1. Standard Adaptive Frequency Oscillator

In very general terms, an oscillator is an autonomous dynamical system with at least one limit cycle attractor (Buchli et al., 2006). Naturally, every two-dimensional oscillatory system (x, y) can be expressed as a system of two equations ẋ(t) = g<sub>x</sub>(x(t), y(t), θ) and ẏ(t) = g<sub>y</sub>(x(t), y(t), θ), where the functions g<sub>x</sub> and g<sub>y</sub> define the dynamics of the system. We require that these two functions depend not only on the state variables x and y but also explicitly on a variable θ which determines the intrinsic oscillation frequency f of the system. The function f(θ) may be of arbitrary shape and in many cases is not explicitly known. We only assume it to be monotonic. The system can be transformed into an adaptive frequency oscillator (AFO) by coupling it to an external signal F(t):

$$\begin{aligned} \dot{x}(t) &= g_x(x(t), y(t), \theta(t)) + \epsilon F(t) \\ \dot{y}(t) &= g_y(x(t), y(t), \theta(t)). \end{aligned} \tag{1}$$

Here, ϵ denotes the coupling strength. Furthermore, additional dynamics of the θ-variable are introduced (Righetti et al., 2006):

$$\dot{\theta}(t) = \pm \eta F(t) \frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}} \tag{2}$$

with a learning rate η. The sign on the right-hand side depends on the direction of oscillation of the actual oscillatory system in the phase space. Note that in the original publication (Righetti et al., 2006), η = ϵ is always chosen, as this choice emerges naturally when deriving the adaptation rule from an analysis of the effect of the periodic external signal F on the phase velocity of the oscillator. Apart from this, however, there is no a priori reason why this choice should provide optimal adaptation results. It has been shown that, using this rule, a wide range of oscillators can adapt their intrinsic frequencies to the frequency of basically any external periodic signal F(t). In this contribution, we consider the Hopf oscillator (**Figure 1A**), which possesses a harmonic limit cycle, and the Van der Pol oscillator (Van der Pol, 1920) (**Figure 1B**), which, depending on the choice of parameters, exhibits highly non-harmonic oscillations.

For analyzing a given adaptation process, we start with an oscillator with an initial frequency variable θ<sub>0</sub> corresponding to an initial intrinsic frequency f<sub>0</sub> = f(θ<sub>0</sub>). Here, the function f(θ) is not explicitly known but can be numerically approximated. We denote the target frequency, i.e., the frequency of the external signal, by f<sub>ext</sub>. Furthermore, we define the target value θ<sub>ext</sub> as the value of θ such that f<sub>ext</sub> = f(θ<sub>ext</sub>) for the given oscillator. The frequency variable θ is not modified by the adaptation rule (Equation 2) as long as the external signal F is zero (t < 100 in **Figure 1A**). After the onset of the external signal, θ is slowly adapted toward the target value θ<sub>ext</sub> (100 < t < 700 in **Figure 1A**). The adaptation rate increases as θ gets closer to θ<sub>ext</sub>. The final adaptation phase is typically characterized by a small overshoot of θ before it converges toward a quasi-constant state with only small periodic fluctuations (600 < t < 700 in **Figure 1A**). Now, when removing the external signal, i.e., setting F = 0, the oscillator maintains oscillations at the adapted frequency (t > 700 in **Figure 1A**). Note that it is not guaranteed that the finally reached value of θ corresponds exactly to θ<sub>ext</sub>. On the contrary, in some cases significant deviations can be observed (**Figure 1B**). As it turns out, reducing this deviation is only possible when accepting longer adaptation times.

FIGURE 1 | Adaptation of two standard adaptive frequency oscillators. The upper panels show the time course of the frequency determining parameter θ. The time during which the external signal is applied to the system is indicated by the yellow shaded area. The dashed horizontal lines indicate the values θ<sub>0</sub> and θ<sub>ext</sub> corresponding to the initial intrinsic frequency f<sub>0</sub> and the target frequency f<sub>ext</sub> of the external signal, respectively. The panels below show the oscillating state variables x and y and the external signal F at different short time windows during the adaptation process. In both cases, the initial intrinsic frequency of the oscillator is f<sub>0</sub> = 4.0 and the external signal is a sine wave with unit amplitude and frequency f<sub>ext</sub> = 2.0. (A) Adaptive frequency Hopf oscillator with µ = 1.0 and ϵ = η = 1.0 (see Methods). The initial value of the parameter θ is given by θ<sub>0</sub> = 2πf<sub>0</sub> ≈ 25.1. Accordingly, the value corresponding to the frequency of the external signal is θ<sub>ext</sub> = 2πf<sub>ext</sub> ≈ 12.6. The external signal is applied for 100 ≤ t < 700. (B) Adaptive frequency Van der Pol oscillator with µ = 100.0 and ϵ = η = 0.7 (see Methods). The values of the parameter θ corresponding to f<sub>0</sub> and f<sub>ext</sub> are θ<sub>0</sub> ≈ 34.8 and θ<sub>ext</sub> ≈ 22.0 (see Methods). The external signal is applied for 100 ≤ t < 500.
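The adaptation process described above can be reproduced numerically. The following is a minimal sketch, not the authors' implementation: it assumes the common Hopf normal form ẋ = (µ − r²)x − θy, ẏ = (µ − r²)y + θx with r² = x² + y² (the exact form used in the paper is given in its Methods, which are not reproduced here), takes the minus sign in Equation (2) for this direction of rotation, and uses a hand-written fourth-order Runge-Kutta integrator. The function names and the step size are illustrative choices.

```python
import math

def rk4_step(deriv, state, t, dt):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = deriv(state, t)
    k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), t + 0.5 * dt)
    k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), t + 0.5 * dt)
    k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)), t + dt)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate_hopf_afo(f0, f_ext, eps=1.0, eta=1.0, mu=1.0,
                      t_on=100.0, t_off=700.0, t_end=800.0, dt=2e-3):
    """Return the time course of theta for an adaptive frequency Hopf oscillator."""
    def signal(t):
        # Unit-amplitude sine wave, present only for t_on <= t < t_off.
        return math.sin(2.0 * math.pi * f_ext * t) if t_on <= t < t_off else 0.0

    def deriv(state, t):
        x, y, theta = state
        r2 = x * x + y * y
        F = signal(t)
        dx = (mu - r2) * x - theta * y + eps * F            # Equation (1), assumed Hopf form
        dy = (mu - r2) * y + theta * x
        dtheta = -eta * F * y / max(math.sqrt(r2), 1e-12)   # Equation (2), minus sign
        return (dx, dy, dtheta)

    state = (math.sqrt(mu), 0.0, 2.0 * math.pi * f0)  # start on the limit cycle
    thetas = []
    for i in range(int(round(t_end / dt))):
        state = rk4_step(deriv, state, i * dt, dt)
        thetas.append(state[2])
    return thetas
```

Calling `simulate_hopf_afo(4.0, 2.0)` uses the settings quoted for Figure 1A (f<sub>0</sub> = 4.0, f<sub>ext</sub> = 2.0, ϵ = η = 1.0). In this sketch θ stays exactly constant while F = 0 and drifts only slowly toward θ<sub>ext</sub> once the signal is applied, which illustrates why many oscillation periods are needed.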

#### 2.1.1. Speed vs. Precision Trade-Off

In many applications, for instance in robotic systems, it is usually desired to have systems that are able to adapt to new situations or circumstances quickly. In contrast, AFOs with the usual choice of parameters require many periods of oscillations to complete a given adaptation process. The convergence time of the adaptation process, i.e., the time between the onset of the external signal and the quasi-convergence of the frequency parameter θ of the oscillator, can be adjusted by manipulating the coupling strength ϵ in Equation (1) or the learning rate η in Equation (2) (**Figure 2**). However, increasing ϵ or η does not only increase the speed of the frequency adaptation but also increases the general influence of the external signal on the oscillatory system. As a result, the dynamics of the parameter θ, once it has converged to a quasi-stable state, is affected as well (**Figure 3**). On the one hand, high learning rates η lead to increased fluctuations of the parameter θ in the finally reached state. On the other hand, higher values of ϵ result in a higher offset of the finally reached mean value θ̄ from the value θ<sub>ext</sub>. Therefore, shorter convergence times in the standard AFO systems go hand in hand with a loss of precision. Naturally, this trade-off complicates real-world applications of the mechanism.

#### 2.1.2. Quantitative Adaptation Quality Measures

In order to quantitatively capture the trade-off between speed and precision, we introduce three measures characterizing the quality of a given adaptation process (**Figure 3**). As already discussed, in many applications fast adaptation is desired. This is captured by the convergence time Δ which measures the time interval between the onset of the external signal and the last deviation of the intrinsic frequency f of the system (determined by θ) of more than 5% (10% for the Van der Pol oscillator) from the finally reached mean value f̄. The precision of the adaptation, in turn, is reflected by two measures. First, the intrinsic frequency to which the system converges should be as close as possible to the frequency of the external signal. This is measured by the frequency offset δ which is given by the offset of the finally reached mean value of the intrinsic frequency from the frequency of the external signal. Second, the fluctuations of the intrinsic frequency around its mean value should be low as otherwise the value of the intrinsic frequency when switching off the external signal depends on the exact point of time of this event. The magnitude of these fluctuations is measured by σ which equals the standard deviation of the intrinsic frequency f in the converged state.

To allow interpretation of these measures independently of the chosen internal and external frequencies, we introduce relative measures scaled by the frequency f<sub>ext</sub> or the cycle duration f<sub>ext</sub><sup>−1</sup> of the external signal, respectively: Δ̃ = Δ/f<sub>ext</sub><sup>−1</sup>, δ̃ = δ/f<sub>ext</sub> and σ̃ = σ/f<sub>ext</sub>. In addition, we define a quality index Q combining these three relative measures into a single scalar value:

$$Q = \max\left(1 - \frac{\tilde{\Delta}}{\tilde{\Delta}_{\text{max}}} - \frac{|\tilde{\delta}|}{\tilde{\delta}_{\text{max}}} - \frac{\tilde{\sigma}}{\tilde{\sigma}_{\text{max}}}, 0\right). \tag{3}$$

Here, Δ̃<sub>max</sub>, δ̃<sub>max</sub>, and σ̃<sub>max</sub> are the maximum values of the respective measures which we allow for a reasonably good adaptation process. Accordingly, a Q value close to 1 corresponds to a very fast as well as very precise adaptation process. A value of Q = 0, in contrast, indicates that Δ̃ > Δ̃<sub>max</sub>, δ̃ > δ̃<sub>max</sub>, σ̃ > σ̃<sub>max</sub> or the weighted sum (Equation 3) of the individual measures is larger than 1. In the following, if not stated otherwise, we use Δ̃<sub>max</sub> = 100, δ̃<sub>max</sub> = 0.05 and σ̃<sub>max</sub> = 0.05.
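As an illustration of how these measures could be evaluated on a sampled trace of the intrinsic frequency, consider the following sketch. It is not the authors' evaluation code. In particular, the text does not specify how the "finally reached" state is delimited, so the hypothetical parameter `final_fraction` (the last 10% of the samples) is an assumption, as are the function and argument names. The trace is assumed to end when the external signal is switched off.

```python
from statistics import fmean, pstdev

def adaptation_quality(times, freqs, t_on, f_ext, tol=0.05, final_fraction=0.1,
                       conv_max=100.0, offset_max=0.05, fluct_max=0.05):
    """Return (Delta, delta, sigma, Q) for a sampled intrinsic-frequency trace."""
    # Assumption: the last `final_fraction` of the samples is the converged state.
    n_final = max(1, int(len(freqs) * final_fraction))
    final = freqs[-n_final:]
    f_mean = fmean(final)      # finally reached mean frequency
    sigma = pstdev(final)      # final frequency fluctuation
    delta = f_mean - f_ext     # frequency offset

    # Convergence time: last sample after onset deviating more than tol from f_mean.
    last_deviation = t_on
    for t, f in zip(times, freqs):
        if t >= t_on and abs(f - f_mean) > tol * abs(f_mean):
            last_deviation = t
    conv = last_deviation - t_on

    # Relative measures, scaled by the cycle duration 1/f_ext or by f_ext.
    conv_rel = conv * f_ext
    delta_rel = delta / f_ext
    sigma_rel = sigma / f_ext

    # Quality index, Equation (3).
    q = max(1.0 - conv_rel / conv_max - abs(delta_rel) / offset_max
            - sigma_rel / fluct_max, 0.0)
    return conv, delta, sigma, q
```

For the Van der Pol oscillator, the text uses a 10% tolerance, which corresponds to `tol=0.1` here.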

#### 2.1.3. Finding Optimal Parameters

For an easy application of an adaptive oscillator in a given setup, no fine-tuning of the system parameters for the specific application context should be necessary. It is therefore necessary to find a system which is able to adapt its intrinsic frequency to a wide range of external frequencies without the need for any parameter adaptation other than the one of the frequency determining parameter θ. It turns out, however, that already for the comparatively simple case of the harmonic Hopf oscillator, the range of frequencies for which a given set of parameters allows fast as well as precise adaptations is very limited (**Figure 4**). Higher values of ϵ and η increase the intervals of initial intrinsic frequencies f<sub>0</sub> and external frequencies f<sub>ext</sub> for which fast adaptation is achieved (left column in **Figure 4**). In contrast, small frequency offsets δ̃ are achieved only for small values of the coupling strength ϵ (second column in **Figure 4**) and small values of the learning rate η enable small fluctuations as measured by σ̃ (third column in **Figure 4**). The combination of these observations is reflected by only small intervals of initial intrinsic and external frequencies for which the quality index Q attains non-zero values (right column in **Figure 4**).

FIGURE 3 | Quantitative measures to capture the quality of an adaptation process. Shown is the time course of the intrinsic frequency of an adaptive frequency oscillator during the adaptation to an external periodic signal with high coupling constant ϵ and high learning rate η. The yellow shaded area indicates the time during which the external signal is applied. The inset shows a close-up of the data in the indicated area. We introduce three measures to quantify the quality of a given frequency adaptation process. The convergence time Δ is the time interval between the onset of the external signal at time t<sub>0</sub> and the last deviation of the intrinsic frequency of the oscillator of more than 5% (orange horizontal lines) from the finally reached average intrinsic frequency f̄. The frequency offset δ measures the difference between the final average intrinsic frequency f̄ and the target frequency of the external signal f<sub>ext</sub>. In order to also capture the periodic fluctuations of the intrinsic frequency from the average value f̄, we additionally introduce the final frequency fluctuation σ given by the standard deviation of the oscillations of the intrinsic frequency f in the finally reached state (area shaded in light red in the inset). The shown time course of the intrinsic frequency is taken from an adaptive frequency Hopf oscillator with µ = 1.0, ϵ = 5.0, η = 5.0, and f<sub>0</sub> = 2.0 adapting to an external unit sine-wave signal with frequency f<sub>ext</sub> = 1.0.

Trying to find parameters that allow fast and precise adaptation for a range of initial intrinsic and external target frequencies spanning two orders of magnitude reveals that no ϵ-η-combination allows for an average adaptation quality index ⟨Q⟩ higher than approximately 0.12 (**Figure 5**). We conclude that a standard AFO with a fixed set of parameters is not capable of providing fast as well as precise adaptation over a wide range of frequencies.
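The frequency-space average ⟨Q⟩ can be sketched as a mean of Q over a logarithmically sampled grid of (f<sub>0</sub>, f<sub>ext</sub>) pairs. The snippet below is only an illustration: the grid size `n`, the inclusion of the diagonal f<sub>0</sub> = f<sub>ext</sub>, and the callable `quality_fn` (a hypothetical function that runs one adaptation and returns its Q value) are assumptions, not details taken from the text.

```python
def log_samples(lo, hi, n):
    """Return n logarithmically spaced values from lo to hi (inclusive)."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** i for i in range(n)]

def average_quality(quality_fn, f_min=0.1, f_max=10.0, n=9):
    """Average quality_fn(f0, f_ext) over a logarithmic grid of frequency pairs."""
    freqs = log_samples(f_min, f_max, n)
    values = [quality_fn(f0, f_ext) for f0 in freqs for f_ext in freqs]
    return sum(values) / len(values)
```

Scanning ϵ and η then amounts to calling `average_quality` once per parameter pair, with `quality_fn` closing over the parameters under test.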

values. For the same reason, even on the diagonal f<sub>0</sub> = f<sub>ext</sub>, high convergence times are measured for low values of f<sub>ext</sub>.

#### 2.2. Fast Dynamical Coupling Mechanism

As discussed, no fixed value pair for the coupling strength ϵ and the learning rate η suffices for fast and precise adaptation over a wider range of initial intrinsic and external target frequencies. In order to obtain a system without the requirement for application-specific fine-tuning, the down- or up-scaling of coupling strength and learning rate has to be accomplished in a self-organized manner. Here, we propose such a system. Instead of coupling the external signal F(t) directly to the oscillator, we now use a filtered signal P(t):

$$\begin{aligned} \dot{x}(t) &= g_x(x(t), y(t), \theta(t)) + P(t) \\ \dot{y}(t) &= g_y(x(t), y(t), \theta(t)). \end{aligned} \tag{4}$$

Accordingly, also the adaptation of θ is based on P(t):

$$\dot{\theta}(t) = \pm \eta P(t) \frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}}.\tag{5}$$

P(t) is given by a weighted difference of the external signal F(t) and the oscillator variable x(t):

$$P(t) = \epsilon(t)F(t) - \beta(t)x(t) \tag{6}$$

with the adaptive coupling strengths ϵ(t) and β(t). Following the discussion of the quality measures introduced before, for an optimal adaptation process, the dynamics of ϵ(t) and β(t) have to fulfill two requirements: as long as the difference between the intrinsic frequency f and the target frequency f<sub>ext</sub> of the external signal is high, P(t) should basically be an amplified version of F(t) in order to accelerate the adaptation process. In contrast, when f is already close to f<sub>ext</sub>, P(t) is supposed to attain values close to zero so as to reduce the influence of the external signal to a minimum. Both of these requirements can be fulfilled by adapting β(t) and ϵ(t) according to a combination of correlation-based growth and a passive decay toward a low resting value. For β(t), we propose the following dynamics:

$$
\tau \dot{\beta}(t) = \beta_0 - \beta(t) + \kappa P(t)x(t) \tag{7}
$$

with time constant τ and correlation learning rate κ. The value of β scales the subtraction of the system variable x from the external signal F(t) in Equation (6). The product of P and x (averaged over time) is large if the difference between the intrinsic frequency f and the external target frequency f<sub>ext</sub> is low. At this stage, the influence of the external signal on the oscillator should be reduced, i.e., the amplitude of P should be decreased, as done by increasing β. The proposed dynamics for ϵ(t) are very similar:

$$
\tau \dot{\epsilon}(t) = \epsilon_0 - \epsilon(t) + \kappa F(t)P(t). \tag{8}
$$

FIGURE 6 | Adaptation of two oscillators with the AFDC mechanism. The upper-most panels show the time course of the frequency determining parameter θ. The time during which the external signal is applied to the system is indicated by the yellow shaded area. The dashed horizontal lines indicate the initial value θ<sub>0</sub> and the value θ<sub>ext</sub> corresponding to the exact frequency of the external signal. The second panels from the top show the time course of the adaptive coupling strengths β and ϵ. The third panels from the top show the time course of the filtered external signal P. The panels on the bottom show the oscillating state variables x and y and the external signal F at different short time windows during the adaptation process. In both cases, the initial intrinsic frequency of the oscillator is f<sub>0</sub> = 4.0 and the external signal is a sine wave with unit amplitude and frequency f<sub>ext</sub> = 2.0. (A) Hopf oscillator with AFDC mechanism with µ = 1.0, η = 0.5, κ = 5.0, τ = 2.0, β<sub>0</sub> = 0.0 and ϵ<sub>0</sub> = 0.01. The initial value of the parameter θ is given by θ<sub>0</sub> = 2πf<sub>0</sub> ≈ 25.1, the value corresponding to the frequency of the external signal is θ<sub>ext</sub> = 2πf<sub>ext</sub> ≈ 12.6. The external signal is applied for 5 ≤ t < 30. (B) Van der Pol oscillator with AFDC mechanism with µ = 100.0, η = 2.0, κ = 5.0, τ = 15.0, β<sub>0</sub> = 0.0 and ϵ<sub>0</sub> = 0.01. The values of the parameter θ corresponding to f<sub>0</sub> and f<sub>ext</sub> are determined to be θ<sub>0</sub> ≈ 34.8 and θ<sub>ext</sub> ≈ 22.0 (see Methods). The external signal is applied for 5 ≤ t < 150.

The value of ϵ scales the influence of the external signal F on the filtered signal P (Equation 6). If the averaged product of F and P is large, this implies that the subtraction of x in Equation (6) cannot cancel the addition of F, i.e., the internal frequency of the oscillator is different from the target frequency of the external signal. Thus, an increase of ϵ is desired to increase the influence of the signal on the system and thereby increase the adaptation speed. However, for β(t) ≈ 0, the last term of Equation (8) can be approximated by κF(t)<sup>2</sup>ϵ(t). Hence, without adaptation of β(t), the value of ϵ(t) would not return to ϵ<sub>0</sub> as long as the external signal is present and therefore would not allow precise adaptation. Only the interplay of the dynamics of ϵ(t), which detects the onset of an external signal with a frequency different from the intrinsic frequency of the oscillator, and of β(t), which detects when the adaptation is nearly completed, allows fast as well as precise adaptation. In the following, we call the described mechanism "Adaptation through Fast Dynamical Coupling" (AFDC).

The process of frequency adaptation supported by the AFDC mechanism can be separated into several stages (**Figure 6**) as qualitatively described in the following: Before the onset of an external signal (F = 0), the average product of P and F is zero and the adaptive coupling constants β and ϵ converge toward their resting values β<sub>0</sub>/(1 + κ⟨x<sup>2</sup>⟩) and ϵ<sub>0</sub>. Here, ⟨x<sup>2</sup>⟩ is the mean over time of the squared signal x<sup>2</sup>. As soon as the external signal is applied, the average product of P and F gets positive (Equation 6). As a result of this, ϵ starts to increase (Equation 8). A higher value of ϵ, in turn, increases the average product of P and F. This establishes a positive feedback loop that leads to a fast increase of the amplitude of P. The high amplitude of P results in a large influence of the external signal on the oscillator (Equation 4) as well as in a fast adaptation of the frequency determining variable θ toward the frequency of F (Equation 5). As a consequence of both of these effects, the oscillator follows the external frequency implying a positive correlation between P and x. This correlation leads to an increase of β (Equation 7). Higher values of β decrease the amplitude of P (Equation 6) and, as such, also the average product between P and x. This is a negative feedback loop. Note that a decrease of the amplitude of P also reduces the average product of P and F and therefore breaks the positive feedback loop between ϵ and the average product of P and F (Equation 8) yielding a decline of both β and ϵ to their respective resting values. At this point, switching off the external signal does not significantly change the system dynamics as the influence of the external signal has already been reduced to a minimum.
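The resting values and the onset of the positive feedback loop can be made explicit with a short time-averaging argument based on Equations (6)–(8). This is a sketch under an assumption that is not stated in the text, namely that β and ϵ vary slowly compared to the oscillation period, so that products of fast signals can be replaced by their time averages ⟨·⟩:

$$\begin{aligned}
F = 0:\quad & P = -\beta x \;\Rightarrow\; \tau\dot{\beta} = \beta_0 - \beta\left(1 + \kappa x^2\right) \;\Rightarrow\; \beta \to \frac{\beta_0}{1 + \kappa \langle x^2 \rangle}, \qquad \tau\dot{\epsilon} = \epsilon_0 - \epsilon \;\Rightarrow\; \epsilon \to \epsilon_0, \\
\beta \approx 0:\quad & P \approx \epsilon F \;\Rightarrow\; \tau\dot{\epsilon} \approx \epsilon_0 - \left(1 - \kappa \langle F^2 \rangle\right)\epsilon .
\end{aligned}$$

In this approximation, ϵ grows as long as κ⟨F<sup>2</sup>⟩ > 1. For a unit-amplitude sine wave, ⟨F<sup>2</sup>⟩ = 1/2, so the condition reads κ > 2. The growth is stopped only by the increase of β, in line with the qualitative description above.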

In summary, the described interplay of the dynamics of the two adaptive coupling constants β and ϵ scales up the magnitude of the external signal as long as adaptation of θ is needed and reduces it once the value corresponding to the frequency of the external signal is reached.
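The interplay summarized above can be explored with a small simulation of Equations (4)–(8). As before, this is a hedged sketch rather than the authors' code: the Hopf normal form ẋ = (µ − r²)x − θy, ẏ = (µ − r²)y + θx, the minus sign in Equation (5), the Runge-Kutta integrator, the step size, and all function names are assumptions. The default parameter values are those quoted for Figure 6A.

```python
import math

def rk4_step(deriv, state, t, dt):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = deriv(state, t)
    k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), t + 0.5 * dt)
    k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), t + 0.5 * dt)
    k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)), t + dt)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate_hopf_afdc(f0, f_ext, eta=0.5, kappa=5.0, tau=2.0, beta0=0.0, eps0=0.01,
                       mu=1.0, t_on=5.0, t_off=30.0, t_end=40.0, dt=1e-3):
    """Return a list of (theta, beta, eps) samples for a Hopf oscillator with AFDC."""
    def signal(t):
        # Unit-amplitude sine wave, present only for t_on <= t < t_off.
        return math.sin(2.0 * math.pi * f_ext * t) if t_on <= t < t_off else 0.0

    def deriv(state, t):
        x, y, theta, beta, eps = state
        F = signal(t)
        P = eps * F - beta * x                               # Equation (6)
        r2 = x * x + y * y
        dx = (mu - r2) * x - theta * y + P                   # Equation (4), assumed Hopf form
        dy = (mu - r2) * y + theta * x
        dtheta = -eta * P * y / max(math.sqrt(r2), 1e-12)    # Equation (5), minus sign
        dbeta = (beta0 - beta + kappa * P * x) / tau         # Equation (7)
        deps = (eps0 - eps + kappa * F * P) / tau            # Equation (8)
        return (dx, dy, dtheta, dbeta, deps)

    # Start on the limit cycle with both couplings at their nominal resting values.
    state = (math.sqrt(mu), 0.0, 2.0 * math.pi * f0, beta0, eps0)
    history = []
    for i in range(int(round(t_end / dt))):
        state = rk4_step(deriv, state, i * dt, dt)
        history.append(state[2:])
    return history
```

Plotting the three returned traces for `simulate_hopf_afdc(4.0, 2.0)` is a convenient way to inspect the stages described in the text: the growth of ϵ after signal onset, the subsequent rise of β, and the decline of both once θ approaches θ<sub>ext</sub>.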

#### 2.2.1. Adaptation Quality in Frequency Space

The dynamics of the AFDC mechanism is mainly dominated by three free parameters: the time scale τ of the adaptive coupling strengths, the correlation learning rate κ and the learning rate η of the frequency determining variable θ. While, in general, an oscillator equipped with an AFDC mechanism shows more tolerance with respect to large frequency ranges, certain parameter combinations allow a faster or more precise adaptation over a larger frequency range (**Figure 7**). Some combinations (for instance η = 1.0, κ = 100.0, τ = 1.0) result in a very good performance, as indicated by high values of Q, for the complete range of initial intrinsic frequencies f<sub>0</sub> and frequencies f<sub>ext</sub> of the external signal analyzed here.

This is also reflected by the frequency space averaged quality ⟨Q⟩ (**Figure 8**). For a sufficiently high κ (κ ≳ 3), parameters ϵ and η can be found with an average quality value close to the theoretical maximum of 1 corresponding to very fast adaptation without significant frequency offset or frequency oscillations in the finally reached state (**Figure 8**). A comparison of the performance of the best found configuration of the regular adaptive Hopf oscillator with the performance of the best found configuration of the Hopf oscillator with AFDC mechanism shows that the AFDC mechanism outperforms the regular AFO mechanism in terms of all quality measures (**Figure 9A**). The same holds true for the comparison of the regular adaptive Van der Pol oscillator with the respective AFDC implementation (**Figure 9B**). In contrast to a regular AFO, the AFDC mechanism manages to provide fast and precise frequency adaptation over a wide frequency range with a fixed set of parameters.

Note that the values of the additional parameters ϵ<sub>0</sub> and β<sub>0</sub> do not significantly influence the dynamics of the mechanism as long as they are chosen reasonably low.

#### 2.2.2. Neural Implementation

The AFDC mechanism relies on dynamically adapting the coupling strengths ϵ and β. In terms of signal flow, ϵ can be interpreted as a feedforward coupling from the external signal to the filtered signal P. The value of β, in turn, determines the strength of feedback coupling from the oscillator back to P. A standard way to implement this kind of signal flow between different entities is in terms of artificial neural networks. Neural networks are composed of multiple comparably simple computational units, the neurons, which project signals to each other via so-called synapses. Every synapse is characterized by a scalar value, the synaptic weight, which determines the efficacy of the synaptic signal transmission.

There exist neuron models on many different levels of abstraction, ranging from simple binary units to complex biophysically plausible spiking models. Here, we restrict ourselves to a very basic model of point-like neurons described by time-discrete dynamics. It has been shown that already a fully connected network of only two of these very simple neurons suffices to autonomously produce oscillatory signals (Pasemann et al., 2003). In every time step, each neuron sums up the incoming outputs from other neurons as well as from itself weighted by the respective synaptic weights. This sum is transformed into the new neural output by means of a sigmoidal transfer function. The weight matrix of this two-neuron network is given by a scaled rotation matrix for a rotation angle ϕ. The value of ϕ monotonically controls the frequency of the obtained oscillatory signal of this so-called SO(2)-oscillator.
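A minimal sketch of such a two-neuron oscillator is given below. The text only specifies a sigmoidal transfer function and a scaled rotation matrix. The concrete choices made here (tanh as transfer function, a scaling factor `alpha` slightly above 1, the initial state, and the zero-crossing frequency estimate) are assumptions for illustration.

```python
import math

def so2_oscillator(phi, alpha=1.01, steps=2000, o0=(0.1, 0.0)):
    """Iterate a two-neuron network whose weights form a scaled rotation matrix."""
    w00, w01 = alpha * math.cos(phi), alpha * math.sin(phi)
    w10, w11 = -alpha * math.sin(phi), alpha * math.cos(phi)
    o = o0
    outputs = []
    for _ in range(steps):
        # Each neuron sums the weighted outputs of both neurons (including itself)
        # and applies the sigmoidal transfer function (assumed here to be tanh).
        o = (math.tanh(w00 * o[0] + w01 * o[1]),
             math.tanh(w10 * o[0] + w11 * o[1]))
        outputs.append(o)
    return outputs

def estimate_frequency(outputs):
    """Estimate cycles per time step from upward zero crossings of neuron 0."""
    xs = [o[0] for o in outputs]
    crossings = sum(1 for a, b in zip(xs, xs[1:]) if a < 0.0 <= b)
    return crossings / (len(xs) - 1)
```

For a scaling factor close to 1, the estimated frequency is close to ϕ/(2π) cycles per time step, so larger rotation angles yield faster oscillations.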

As already shown earlier (Nachstedt et al., 2012), a neural SO(2)-oscillator with neurons H<sub>0</sub> and H<sub>1</sub> can be extended by an AFDC mechanism by introducing an additional neuron H<sub>2</sub> into the system (**Figure 10A**). Now, this neural implementation can be understood as a special implementation of the general AFDC mechanism. The additional neuron H<sub>2</sub> is used to calculate the filtered external signal P by receiving synapses from both the external input F and the output of neuron H<sub>0</sub>. The latter takes the role of the variable x of the general oscillators discussed above. The synaptic weight w<sub>2F</sub> of the synapse from the external signal F to the additional neuron H<sub>2</sub> implements the dynamics of the ϵ coupling. The weight w<sub>20</sub> of the synapse from the oscillator neuron H<sub>0</sub> to the neuron H<sub>2</sub> takes the role of β. Adapting the synaptic weights according to Equations (7) and (8)

FIGURE 8 | Average combined quality measure hQi for different values of κ in the ǫ-η-parameter space of the Hopf oscillator with AFDC mechanism. For every parameter triple of coupling strength ǫ, frequency learning rate η and correlation learning rate κ, the average adaption quality measure hQi over the logarithmically sampled space of initial intrinsic frequencies f<sup>0</sup> and frequencies of the external signal fext is shown (0.1 < f0, fext < 10). In each case, the external signal is a sine-wave with unit amplitude. The red circles indicate the four cases shown in Figure 7. Comparing these results to the ones obtained for the standard AFO in Figure 5, the AFDC mechanism provides significant higher quality values indicating versatility with respect to different initial intrinsic and external target frequencies.

FIGURE 9 | Comparison of the frequency space averaged adaptation quality measures for the best found configurations of the regular adaptive oscillators and the respective oscillators with AFDC mechanism. Note that the averages of the relative convergence time 1˜ , the final frequency offset |δ˜| and the relative final frequency fluctuation σ˜ include only values from (f0, fext)-frequency pairs in which the combined quality measure Q has a nonzero value. The ratio of the number NQ><sup>0</sup> of (f0, fext)-pairs for which the quality Q has a nonzero value and the total number Ntot of frequency pairs is shown on the very right. All numbers are rounded. See methods for the used parameter values. (A) For the Hopf oscillator, all parameters are identical to the ones used in Figures 7, 8. (B) For the Van der Pol oscillator, we adapt the maximal allowed values of the quality measures. We use 1˜ max = 200, δ˜max = 0.10 and σ˜max = 0.05. In addition, we calculate δ˜ and σ˜ directly from the frequency determining variable θ and consider the last deviation of θ of more than 10% from the finally reached mean value θ¯ to determine the adaptation time 1. (\*) Values shown as 0.00 are too small to be resolved in the figure. For the Hopf oscillator with AFDC mechanism, we find h ˜|δ|i/δ˜max ≈ 5.3 · 10−<sup>7</sup> and h ˜σi/σ˜max ≈ 8.5 · 10−<sup>8</sup> . For the Van der Pol oscillator with AFDC mechanism, the average normalized final frequency fluctuation is h ˜σi/σ˜max ≈ 2.5 · 10−<sup>3</sup> .

effectively introduces synaptic plasticity into the system (Abbott and Nelson, 2000). In contrast to earlier publications (Nachstedt et al., 2012), here, the weight w<sup>02</sup> of the synapse feeding the filtered signal P into the oscillator is simply kept constant.

The adaptation of the intrinsic oscillation frequency by modifying the parameter ϕ and hereby the synaptic weights of the neural SO(2)-oscillator is a long-lasting change. The discussed plasticity of the synaptic weights w<sub>20</sub> and w<sub>2F</sub>, in contrast, has a transient character. The combination of these two different kinds of dynamics results in a fast and precise adaptive neural oscillator (**Figure 10B**) (Nachstedt et al., 2012). This shows that the AFDC mechanism can be easily integrated into existing neural control schemes, for instance, in robotic applications. In addition, the successful implementation of the AFDC mechanism in a time-discrete system shows that the concept can be generalized to this class of dynamical systems.

#### 2.2.3. Closed-Loop Locomotion Control

In addition to the open-loop scenarios studied so far, the AFDC mechanism also allows adaptive oscillators to be applied in closed-loop scenarios where fast adaptation toward a specific frequency is required. A classical problem of robotic locomotion control is finding the optimal frequency to drive the legs of a walking machine. For animals, it has been found that the frequency during locomotion is tightly related to the resonant frequency of the freely swinging leg (Holt et al., 1990). This way, animals are able to maintain energy efficiency during locomotion (Ahlborn and Blake, 2002). Furthermore, it has been proposed that animals actively modify the resonant frequency of their legs in order to optimize for different walking speeds (Ahlborn and Blake, 2002).

Given that CPGs control locomotion, adaptation of CPGs toward a system's resonant frequency to optimize locomotion has been repeatedly investigated and modeled (Verdaasdonk et al., 2006, 2009). A simplistic model of this control problem is given by a mathematical pendulum which is driven by a torque signal according to the output of an oscillator (Nachstedt et al., 2012). The most energy-efficient control is achieved if the pendulum is driven at its resonant frequency determined by its physical length l and its mass m as well as the current amplitude of its oscillation. Here, a neural SO(2)-oscillator with AFDC mechanism is used to control the torque applied to the pendulum. The control loop is closed by feeding the current position of the pendulum as external signal back to the oscillator (**Figure 11A**).

FIGURE 10 | Neural implementation of the AFDC mechanism. (A) The neurons H<sub>0</sub> and H<sub>1</sub> are fully connected by the synapses w<sub>00</sub>, w<sub>01</sub>, w<sub>10</sub>, and w<sub>11</sub> and form a neural SO(2)-oscillator. The neuron H<sub>2</sub> calculates the signal P which is the weighted difference between the external signal F and the activity value of H<sub>0</sub>. Accordingly, the weight w<sub>2F</sub> corresponds to the coupling strength ϵ and the weight w<sub>20</sub> represents the variable β of the AFDC mechanism. The weight w<sub>02</sub> can either be fixed at a positive value or adapted with similar dynamics as w<sub>20</sub> and w<sub>2F</sub>. (B) Example adaptation of the neural oscillator. It is initialized with an intrinsic frequency of f<sub>0</sub> = 0.04 corresponding to a value of ϕ<sub>0</sub> = 0.25 of the internal frequency determining variable. At time step t = 100, an external signal with a frequency of f<sub>ext</sub> = 0.02 is applied until time step t = 1,000 (yellow shaded area). For 1,000 < t < 1,900, the frequency of the external signal is changed to f<sub>ext</sub> = 0.04 (green shaded area). For t ≥ 1,900, there is no external signal. Shown from top to bottom are the activities o<sub>i</sub> of the neurons H<sub>i</sub> (i ∈ {1, 2, 3}), the external signal F, the synaptic weights w<sub>20</sub> and w<sub>2F</sub> and the frequency determining variable ϕ of the SO(2)-oscillator.

FIGURE 11 | … if the pendulum is driven at its resonant frequency. (A) The output o<sub>1</sub> of neuron H<sub>1</sub> controls the torque M driving the pendulum with length l and mass m. The current angular displacement λ is converted into the external signal F which is fed back to the adaptive oscillator. The neural network is updated with a frequency of 25 Hz. (B) Simulation of the system with varying pendulum length l. The initial length of the pendulum is l<sub>0</sub> = 0.2 m. At t = 30 s, the length is changed to l<sub>1</sub> = 0.4 m. At t = 50 s, the original length l<sub>0</sub> is restored. At t = 70 s, the feedback connection from the pendulum to the oscillator is cut to demonstrate that the oscillator has indeed learned the correct frequency to drive the pendulum. Shown are the current angular displacement λ of the pendulum, the outputs o<sub>0</sub>, o<sub>1</sub>, and o<sub>2</sub> of the three neurons, the synaptic weights w<sub>2F</sub> and w<sub>20</sub> of the plastic synapses of the AFDC mechanism, and the intrinsic frequency of the oscillator and the resonant frequency of the undamped and undriven pendulum (target frequency for the oscillator). The resonant frequency of the pendulum does not only depend on the current physical properties of the pendulum but also on the current amplitude of its oscillations.

In this closed-loop system, the current frequency of the pendulum is completely determined by the current frequency of the driving neural oscillator. The observed oscillation frequencies of the pendulum and the neural oscillator are therefore always identical. Still, it is possible to adapt the intrinsic frequency of the neural oscillator to the target frequency given by the resonant frequency of the pendulum. The information about the difference between the intrinsic frequency of the oscillator and the resonant frequency of the pendulum is encoded in the phase relation between the internal oscillation and the feedback signal received as external signal by the oscillator. In particular, driving the pendulum at its resonant frequency is characterized by a phase shift of π/2 between the applied torque and the current angular position of the pendulum. In the neural SO(2)-oscillator, the same phase shift is found between the outputs of the neurons H<sub>0</sub> and H<sub>1</sub>. Therefore, when using the output of H<sub>1</sub> to control the torque applied to the pendulum, at resonant frequency, the current angular position of the pendulum is exactly in phase with the output of neuron H<sub>0</sub>. This corresponds to the converged state of the AFDC mechanism. If, in turn, the oscillation frequency of the neural SO(2)-oscillator is different from the resonant frequency of the pendulum, the output of H<sub>0</sub> and the angular position of the pendulum are not in phase. The respective phase difference encodes the information about the difference between the intrinsic frequency of the neural oscillator and the resonant frequency of the pendulum and allows the adaptation of the former into the direction of the latter.
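The π/2 phase relation at resonance can be made explicit with a short worked derivation. As a sketch only, assume small angles (sin λ ≈ λ in Equation 20) and a harmonic driving torque; the symbols M<sub>0</sub> (torque amplitude), ω (driving angular frequency), γ, A and δ are introduced here for illustration and do not appear in the original analysis:

$$\begin{aligned} \ddot{\lambda} + \gamma\dot{\lambda} + \omega_0^2\lambda &= \frac{M_0}{ml^2}\cos(\omega t), \qquad \gamma = \frac{D}{ml^2}, \quad \omega_0 = \sqrt{g/l} \\ \lambda(t) &= A\cos(\omega t - \delta), \qquad A = \frac{M_0/(ml^2)}{\sqrt{(\omega_0^2-\omega^2)^2 + \gamma^2\omega^2}}, \qquad \tan\delta = \frac{\gamma\omega}{\omega_0^2-\omega^2} \end{aligned}$$

In this linearized picture, the steady-state displacement lags the torque by δ = π/2 exactly when ω = ω<sub>0</sub>. Below resonance δ < π/2 and above resonance δ > π/2, so the sign of the deviation from π/2 indicates in which direction the driving frequency has to be changed.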

In our simulation, we first let the neural SO(2)-oscillator with AFDC mechanism adapt its intrinsic frequency toward the pendulum's resonant frequency (0 s < t < 30 s in **Figure 11B**). We then simulate a change of the physical properties of the driven system by abruptly changing the length l of the pendulum. Accordingly, the neural oscillator readapts its intrinsic frequency to the new resonant frequency of the pendulum (30 s < t < 50 s in **Figure 11B**). Afterwards, we change the length l back to its original value. Finally, we cut the feedback connection from the pendulum to the oscillator (t > 70 s in **Figure 11B**) demonstrating that the oscillator has actually learned the proper frequency to drive the pendulum.

# 3. DISCUSSION

Transferring key concepts of biological solutions for complex control problems to robotic applications has proven to be a promising approach to achieving the adaptivity, robustness, versatility and agility found in biological organisms (Pfeifer et al., 2007). One especially successful concept is the use of oscillators, i.e., CPGs, to control complex locomotion. As such, the study of nonlinear oscillators, their entrainment and adaptation properties and possible applications in robotics has gained a lot of interest. The AFDC mechanism presented here overcomes the demonstrated trade-off between speed and precision inherent to regular AFOs as introduced by Righetti et al. (2006). As a result, the AFDC mechanism allows fast and precise adaptation to external signals for a wide range of frequencies with a fixed set of parameters.

Since the discovery of the AFO mechanism, various different mechanisms to improve or extend the adaptation capabilities have been proposed. Subtracting the output of an oscillator from the external signal, as also done in the AFDC mechanism, was used to decompose a signal into its Fourier components (Ronsse et al., 2011b) with the help of an array of AFOs. In order to more reliably find the basic frequency of the external signal, it was proposed to combine a single adaptive frequency oscillator with a Fourier decomposition (Petric et al., 2011). The detailed interaction between multiple AFOs has been studied in the context of networks of self-adaptive dynamical systems (Rodriguez and Hongler, 2014). As an alternative to adapting the system parameters in order to modify the frequency of an oscillator, switching between different oscillation frequencies of an oscillator operated in the chaotic regime by dynamically stabilizing different periods was demonstrated (Steingrube et al., 2010).

The main novelty of the mechanism presented here is the introduced dynamics of the adaptive coupling strengths between the external signal and the filtered signal as well as between the output of the oscillator and the filtered signal. These dynamics temporarily increase the influence of the signal on the oscillator as long as it is necessary to achieve fast adaptation and decrease it once precision is needed toward the end of the adaptation process. Adaptive coupling strengths have been proposed earlier as a method to increase the synchronization in a network of phase oscillators with fixed intrinsic frequencies (Ren and Zhao, 2007). The interaction of the transient dynamics of the adaptive coupling strengths on the one hand and the permanent change of intrinsic frequency on the other hand resembles the interplay of long-term (Wood et al., 2011) and short-term (Zucker and Regehr, 2002) plasticity in biological organisms. The interplay of long-term and short-term plasticity in biological systems has already been shown to be highly relevant for biological motor control, in particular for fast network reconfiguration (Nadim and Manor, 2000).

The AFDC mechanism increases the complexity of the oscillatory system by the addition of two dynamical equations. Their interplay is required to first scale up the influence of the external signal and later on reduce it again. In particular, this interplay is enabled by the weighted difference P of the external signal F and the oscillator variable x. The correlation of P and F determines the growth of the adaptive coupling constant ϵ which, in turn, increases the correlation of P and F. To counterbalance this self-enhancing dynamics, a second dynamic variable, i.e., β, is required. To make β increase, F and x have to be correlated which is the case once the oscillator has attained the externally applied frequency. This delay of the onsets of the growth processes of ϵ and β is crucial for the AFDC mechanism and cannot be realized by a single variable.

In this contribution, we focused on the Hopf oscillator and the Van der Pol oscillator for the detailed analyses of the regular AFO and the AFDC mechanism. It remains an interesting question for future research to what extent the results obtained for these oscillators, regarding the frequency-independent choice of parameters as well as the quality measures of the adaptation process, generalize to other types of nonlinear oscillators (Rayleigh, 1877; Duffing, 1918; Fitzhugh, 1961).

As the dynamics of the coupling strengths in the AFDC mechanism is solely correlation based, we showed that it is easy to implement the mechanism in neural networks. We demonstrated this by discussing the previously published neural time-discrete SO(2)-oscillator with AFDC mechanism (Nachstedt et al., 2012). This special realization of the AFDC mechanism has already been successfully applied in different robotic applications including self-organized control of a snake-like robot (Nachstedt et al., 2013), adaptive control of a robot leg with compliant tarsus (Canio et al., 2016b), and bipedal locomotion with robustness against global loss of sensory feedback (Canio et al., 2016a). In contrast to the neural implementation, the new general formulation of the AFDC mechanism makes it possible to apply the mechanism to all kinds of existing applications of regular AFOs (Buchli et al., 2005). Additionally, it allows the usage of adaptive oscillators in completely new scenarios where, up to now, regular AFOs could not provide sufficiently fast as well as precise adaptation.

# 4. MATERIALS AND METHODS

#### 4.1. Hopf Oscillator

The regular Hopf oscillator with the state variables x and y is given by the following system of dynamical equations:

$$\begin{aligned} \dot{x}(t) &= \left(\mu - r(t)^2\right)x(t) - \theta y(t) \\ \dot{y}(t) &= \left(\mu - r(t)^2\right)y(t) + \theta x(t) \end{aligned} \tag{9}$$

with r(t) = √(x(t)<sup>2</sup> + y(t)<sup>2</sup>). The variable µ > 0 determines the amplitude of the oscillations. Without an external signal (F(t) = 0), this system possesses an asymptotically stable and harmonic limit cycle with an angular frequency of exactly θ.
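The limit-cycle behavior of Equation (9) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it uses a fixed-step fourth-order Runge-Kutta integrator as a stand-in for the adaptive LSODA integration described in the methods, and the function names, step size and parameter values are illustrative assumptions.

```python
import math

def hopf_rhs(x, y, mu, theta):
    """Right-hand side of the regular Hopf oscillator (Equation 9)."""
    r2 = x * x + y * y                      # r(t)^2
    dx = (mu - r2) * x - theta * y
    dy = (mu - r2) * y + theta * x
    return dx, dy

def rk4_step(x, y, dt, mu, theta):
    """One classical Runge-Kutta step (illustrative fixed-step integrator)."""
    k1x, k1y = hopf_rhs(x, y, mu, theta)
    k2x, k2y = hopf_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, mu, theta)
    k3x, k3y = hopf_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, mu, theta)
    k4x, k4y = hopf_rhs(x + dt * k3x, y + dt * k3y, mu, theta)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x_new, y_new

def simulate_hopf(mu=1.0, theta=2.0 * math.pi, x0=0.5, y0=0.0,
                  dt=0.001, t_end=20.0):
    """Integrate from (x0, y0) up to t_end and return the final state."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        x, y = rk4_step(x, y, dt, mu, theta)
    return x, y

if __name__ == "__main__":
    x, y = simulate_hopf()
    # The radius should have relaxed to sqrt(mu).
    print(math.hypot(x, y))
```

In polar coordinates the radius obeys ṙ = (µ − r<sup>2</sup>)r and the phase advances at the constant rate θ, so the trajectory started at r = 0.5 is expected to settle on a circle of radius √µ.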

#### 4.1.1. Adaptive Frequency Hopf Oscillator

The Hopf oscillator can be turned into an adaptive frequency oscillator by coupling an external signal F to the system and introducing the dynamics described by Equation (2) to the parameter θ. The complete system is given by

$$\begin{aligned} \dot{x}(t) &= \left(\mu - r(t)^2\right)x(t) - \theta(t)y(t) + \epsilon F(t) \\ \dot{y}(t) &= \left(\mu - r(t)^2\right)y(t) + \theta(t)x(t) \\ \dot{\theta}(t) &= -\eta F(t) \frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}}. \end{aligned} \tag{10}$$

#### 4.1.2. Hopf Oscillator with Fast Dynamical Coupling

The Hopf oscillator equipped with the AFDC mechanism is given by the following system of differential equations:

$$\begin{aligned} \dot{x}(t) &= \left(\mu - r(t)^2\right)x(t) - \theta(t)y(t) + P(t) \\ \dot{y}(t) &= \left(\mu - r(t)^2\right)y(t) + \theta(t)x(t) \\ \tau\,\dot{\beta}(t) &= \beta_0 - \beta(t) + \kappa P(t)x(t) \\ \tau\,\dot{\epsilon}(t) &= \epsilon_0 - \epsilon(t) + \kappa F(t)P(t) \\ \dot{\theta}(t) &= -\eta P(t)\frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}} \end{aligned} \tag{11}$$

with P(t) = ϵ(t)F(t) − β(t)x(t).
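As a hedged illustration, the system of Equation (11) can be written as a single right-hand-side function. The sketch below is an assumption about how one might code it, not the authors' code; the function name and the argument order are hypothetical. The signature follows the `func(y, t, ...)` convention expected by `scipy.integrate.odeint`, so the function could be passed to that routine with the remaining parameters supplied through `args`.

```python
import math

def hopf_afdc_rhs(state, t, F, mu, tau, kappa, eta, beta0, eps0):
    """Hopf oscillator with AFDC mechanism (Equation 11).

    state = (x, y, beta, eps, theta); F is a callable returning the
    external signal F(t). All names are illustrative assumptions.
    """
    x, y, beta, eps, theta = state
    f = F(t)
    P = eps * f - beta * x                  # filtered signal P(t)
    r2 = x * x + y * y                      # r(t)^2, must be nonzero
    dx = (mu - r2) * x - theta * y + P
    dy = (mu - r2) * y + theta * x
    dbeta = (beta0 - beta + kappa * P * x) / tau
    deps = (eps0 - eps + kappa * f * P) / tau
    dtheta = -eta * P * y / math.sqrt(r2)
    return [dx, dy, dbeta, deps, dtheta]

if __name__ == "__main__":
    # Evaluate the derivatives once for an arbitrary, made-up state.
    print(hopf_afdc_rhs([1.0, 0.0, 0.5, 2.0, 2.0], 0.0, lambda t: 1.0,
                        mu=1.0, tau=4.0, kappa=10.0, eta=1.5,
                        beta0=0.0, eps0=1.0))
```

Note that the frequency update divides by r(t), so the state (x, y) = (0, 0) has to be avoided as an initial condition.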

#### 4.2. Van der Pol Oscillator

The Van der Pol oscillator with the state variables x and y is defined as follows:

$$\begin{aligned} \dot{x}(t) &= y(t) \\ \dot{y}(t) &= \mu \left(1 - x(t)^2\right) y(t) - \theta^2 x(t). \end{aligned} \tag{12}$$

The parameter µ > 0 determines the "degree of nonlinearity" of the system. For µ = 0, the system is harmonic. The intrinsic frequency f depends in a nonlinear and non-trivial way on the parameter θ. We use a Fourier transform in conjunction with a sequence of nested intervals to determine the values of θ corresponding to a given frequency f .

#### 4.2.1. Adaptive Frequency Van der Pol Oscillator

The adaptive frequency formulation of the Van der Pol Oscillator coupled to a time-dependent external signal F(t) requires a positive sign in Equation (2):

$$\begin{aligned} \dot{x}(t) &= y(t) + \epsilon F(t) \\ \dot{y}(t) &= \mu \left(1 - x(t)^2\right) y(t) - \theta(t)^2 x(t) \\ \dot{\theta}(t) &= +\eta F(t) \frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}}. \end{aligned} \tag{13}$$

#### 4.2.2. Van der Pol Oscillator with Fast Dynamical Coupling

The Van der Pol oscillator equipped with the AFDC mechanism is described by the following system of differential equations:

$$\begin{aligned} \dot{x}(t) &= y(t) + P(t) \\ \dot{y}(t) &= \mu \left(1 - x(t)^2\right) y(t) - \theta(t)^2 x(t) \\ \tau\,\dot{\beta}(t) &= \beta_0 - \beta(t) + \kappa P(t)x(t) \\ \tau\,\dot{\epsilon}(t) &= \epsilon_0 - \epsilon(t) + \kappa F(t)P(t) \\ \dot{\theta}(t) &= +\eta P(t) \frac{y(t)}{\sqrt{x(t)^2 + y(t)^2}} \end{aligned} \tag{14}$$

with P(t) = ϵ(t)F(t) − β(t)x(t).

#### 4.3. Neural SO(2)-Oscillator

We use standard additive time-discrete neurons H<sub>i</sub>, i ∈ {0, . . . , N − 1}, where N is the number of neurons in the network. The activation a<sub>i</sub> of neuron H<sub>i</sub> at time t + 1 is given by the sum of incoming presynaptic neural firing rates o<sub>j</sub> weighted by the synaptic weights w<sub>ij</sub> at time t:

$$a_i(t+1) = \sum_{j=0}^{N-1} w_{ij}(t) o_j(t), \quad i = 0, \ldots, N-1. \tag{15}$$

The activation a<sub>i</sub> of neuron H<sub>i</sub> is transformed into its firing rate o<sub>i</sub> by a sigmoidal transfer function:

$$o_i(t) = \tanh\left(a_i(t)\right). \tag{16}$$

The pure SO(2)-network consists of N = 2 fully connected neurons H<sub>0</sub> and H<sub>1</sub>. The synaptic weight matrix is chosen according to

$$
\begin{pmatrix}
w_{00}(t) & w_{01}(t) \\
w_{10}(t) & w_{11}(t)
\end{pmatrix} = \alpha \cdot \begin{pmatrix}
\cos\varphi(t) & \sin\varphi(t) \\
-\sin\varphi(t) & \cos\varphi(t)
\end{pmatrix} \tag{17}
$$

with 0 < ϕ(t) < π the frequency determining parameter. The factor α determines the amplitude as well as the nonlinearity of the oscillations. We use α = 1.01 to obtain very harmonic oscillations and an approximately linear relationship between ϕ and the intrinsic frequency of the oscillator.
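The two-neuron network can be iterated directly from Equations (15) to (17). The following is a minimal sketch under stated assumptions: the weight matrix is taken to be α times a standard rotation matrix with second row (−sin ϕ, cos ϕ), the initial outputs (0.1, 0) are arbitrary, and all function names are hypothetical.

```python
import math

def so2_weights(phi, alpha=1.01):
    """Weight matrix of Equation (17): alpha times a rotation matrix (assumed form)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[alpha * c, alpha * s],
            [-alpha * s, alpha * c]]

def simulate_so2(phi=0.25, alpha=1.01, steps=1000, o_init=(0.1, 0.0)):
    """Iterate Equations (15) and (16) and return the list of (o0, o1) outputs."""
    w = so2_weights(phi, alpha)
    o0, o1 = o_init
    outputs = []
    for _ in range(steps):
        a0 = w[0][0] * o0 + w[0][1] * o1    # activation of H0
        a1 = w[1][0] * o0 + w[1][1] * o1    # activation of H1
        o0, o1 = math.tanh(a0), math.tanh(a1)
        outputs.append((o0, o1))
    return outputs

def count_sign_changes(values):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

if __name__ == "__main__":
    o0_trace = [o[0] for o in simulate_so2()]
    # Two sign changes per period, so this estimates the frequency per time step.
    print(count_sign_changes(o0_trace) / (2.0 * len(o0_trace)))
```

For small amplitudes the map is approximately a rotation by ϕ per time step, so the estimated frequency should be close to ϕ/(2π), which is about 0.04 for ϕ = 0.25.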

Frontiers in Neurorobotics | www.frontiersin.org March 2017 | Volume 11 | Article 14

#### 4.3.1. SO(2)-Oscillator with Fast Dynamical Coupling

In order to equip the neural SO(2)-oscillator with the AFDC mechanism, an additional neuron H<sub>2</sub> is introduced. The external signal F(t) is fed into the neuron H<sub>2</sub> via a synapse w<sub>2F</sub>. The neuron H<sub>2</sub> calculates the filtered version of the external signal and receives signals via the synapses w<sub>20</sub> (= β) and w<sub>2F</sub> (= ϵ) governed by the following plasticity rules:

$$\begin{aligned} w_{20}(t+1) &= w_{20}(t) + (\beta_0 - w_{20}(t) - \kappa o_2(t)o_0(t))/\tau \\ w_{2F}(t+1) &= w_{2F}(t) + (\epsilon_0 - w_{2F}(t) - \kappa F(t)o_2(t))/\tau. \end{aligned} \tag{18}$$

In accordance with our earlier publication (Nachstedt et al., 2012), we simplify the frequency adaptation rule of the AFDC mechanism and reformulate it in terms of the signals arriving at neuron H0:

$$
\varphi(t+1) = \varphi(t) + \eta\, w_{02}(t)\, o_2(t)\, w_{01}(t)\, o_1(t). \tag{19}
$$

For the example adaptation process (**Figure 10B**), we use α = 1.01, η = 1, κ = 100, τ = 100, β<sub>0</sub> = 0 and ϵ<sub>0</sub> = 0.01.

#### 4.4. Mathematical Pendulum

The angular displacement λ of a mathematical pendulum with length l and mass m is described by the following differential equation:

$$\ddot{\lambda} = -\frac{g}{l}\sin\lambda - \frac{D}{ml^2}\dot{\lambda} + \frac{M}{ml^2} \tag{20}$$

with the gravitational acceleration g, the external torque M evoked on the system and the damping constant D. The resonant frequency fres of the undamped (D = 0) and undriven (M = 0) mathematical pendulum is given by Ochs (2011):

$$f_{\rm res} = \frac{\omega_0}{4K(k)}\tag{21}$$

with

$$k = \frac{\dot{\lambda}^2 + 4\omega_0^2 \left(\sin \frac{\lambda}{2}\right)^2}{4\omega_0^2} \tag{22}$$

and ω<sub>0</sub> = √(g/l). K(k) is the complete elliptic integral of the first kind. In Equation (22), the current values of the angular displacement λ and the angular velocity λ̇ are used to obtain the current total energy of the system. For our simulations, we use g = 9.81 m s<sup>−2</sup> and D = 0.005 kg m<sup>2</sup> s<sup>−1</sup>.
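Equations (21) and (22) can be evaluated with a few lines of code. The sketch below is an illustration under two assumptions that are not stated in the text: the value k of Equation (22) is treated as the parameter (squared modulus) of the elliptic integral, and K is computed with the arithmetic-geometric mean instead of a library routine. The function names are hypothetical.

```python
import math

def ellipk_agm(m):
    """Complete elliptic integral of the first kind for parameter m (squared modulus).

    Uses K(m) = pi / (2 * AGM(1, sqrt(1 - m))).
    """
    if not 0.0 <= m < 1.0:
        raise ValueError("m must satisfy 0 <= m < 1 (oscillating pendulum)")
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(40):                     # AGM converges quadratically
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def resonant_frequency(lam, lam_dot, length, g=9.81):
    """Resonant frequency of the undamped, undriven pendulum (Equations 21, 22)."""
    omega0 = math.sqrt(g / length)
    k = (lam_dot ** 2 + 4.0 * omega0 ** 2 * math.sin(lam / 2.0) ** 2) \
        / (4.0 * omega0 ** 2)
    return omega0 / (4.0 * ellipk_agm(k))

if __name__ == "__main__":
    # Small-amplitude limit and a larger swing for the l = 0.2 m pendulum.
    print(resonant_frequency(0.0, 0.0, 0.2))
    print(resonant_frequency(1.0, 0.0, 0.2))
```

In the small-amplitude limit K approaches π/2 and the expression reduces to ω<sub>0</sub>/(2π); larger amplitudes give a lower resonant frequency, which is the amplitude dependence mentioned for the closed-loop experiment.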

#### 4.5. Numerical Integration

The different systems of differential equations are integrated using the odeint function of the SciPy Python package (Jones et al., 2001). This function relies on the LSODA algorithm (Brown and Hindmarsh, 1989) from the FORTRAN library odepack (Hindmarsh, 1983). The LSODA algorithm utilizes an adaptive step size.

#### 4.6. Frequency and Parameter Scans

For the frequency scans performed for the adaptive Hopf oscillator (**Figures 4**, **5**) and the adaptive Van der Pol oscillator as well as for the respective oscillators with AFDC mechanism (**Figures 7**, **8**), we sample the frequency space in the range 0.1 ≤ f<sub>0</sub>, f<sub>ext</sub> ≤ 10.0. We consider 21 sample values uniformly spaced on a logarithmic axis of f<sub>0</sub> and f<sub>ext</sub> and investigate the behavior of the oscillators for all possible 21<sup>2</sup> (f<sub>0</sub>, f<sub>ext</sub>)-pairs. For every frequency pair, in the case of the regular adaptive oscillators, we sample 21 parameter values, again uniformly spaced on a logarithmic axis, for each of ϵ and η in the range 0.01 ≤ ϵ, η ≤ 100. Therefore, we investigate a total of 21<sup>4</sup> (f<sub>0</sub>, f<sub>ext</sub>, ϵ, η) configurations for each regular adaptive oscillator. In the case of the oscillators with AFDC mechanism, the parameters are investigated in the ranges 0.01 ≤ η, τ ≤ 100 and 1 ≤ κ ≤ 1,000, again with 21 samples in every parameter dimension, yielding a total of 21<sup>5</sup> sampled (f<sub>0</sub>, f<sub>ext</sub>, η, κ, τ)-configurations each.
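The sampling scheme for the regular adaptive oscillators can be sketched as follows. This is an illustrative reconstruction of the grid only, with hypothetical names; it does not run any oscillator simulation and covers just the frequency, ϵ and η axes.

```python
import itertools

def log_grid(lo_exp, hi_exp, n=21):
    """n values uniformly spaced on a logarithmic axis from 10**lo_exp to 10**hi_exp."""
    step = (hi_exp - lo_exp) / (n - 1)
    return [10.0 ** (lo_exp + i * step) for i in range(n)]

freqs = log_grid(-1, 1)         # 0.1 <= f0, fext <= 10
eps_values = log_grid(-2, 2)    # 0.01 <= epsilon <= 100
eta_values = log_grid(-2, 2)    # 0.01 <= eta <= 100

def regular_afo_configurations():
    """Iterate over all (f0, fext, epsilon, eta) configurations."""
    return itertools.product(freqs, freqs, eps_values, eta_values)

if __name__ == "__main__":
    print(len(freqs) ** 2)                               # frequency pairs
    print(sum(1 for _ in regular_afo_configurations()))  # all configurations
```

With 21 samples over four decades, neighboring ϵ or η values differ by a factor of 10<sup>0.2</sup>, which sets the resolution at which the best parameter values can be reported.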

The best sampled parameter values of the frequency space averaged combined quality measure ⟨Q⟩ are ϵ ≈ 15.85 and η ≈ 15.85 for the regular adaptive Hopf oscillator and η ≈ 1.58, κ ≈ 398.11 and τ ≈ 3.98 for the Hopf oscillator with AFDC mechanism (**Figure 9**). For the regular adaptive Van der Pol oscillator, we find ϵ ≈ 0.0158 and η = 1.0 to perform best while η ≈ 0.158, κ = 100 and τ ≈ 1.585 yield the best result for the Van der Pol oscillator with AFDC mechanism.

#### AUTHOR CONTRIBUTIONS

The AFDC mechanism was developed by TN and PM. TN and CT planned the presented analyses and experiments. Implementation and analysis of the data was done by TN. TN, CT, and PM wrote and reviewed the manuscript.

#### FUNDING

The research leading to these results has received funding from the Federal Ministry of Education and Research (BMBF) Germany to the Göttingen Bernstein Center for Computational Neuroscience under grant numbers 01GQ1005A [TN, PM] and 01GQ1005B [CT] and from the International Max Planck Research School for Physics of Biological and Complex Systems (IMPRS-PBCS) by stipends of the country of Lower Saxony with funds from the initiative Niedersächsisches Vorab and of the University of Göttingen [TN].

#### ACKNOWLEDGMENTS

We thank Florentin Wörgötter for fruitful discussions. We acknowledge support by the Open Access Publication Funds of the Göttingen University.

#### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Nachstedt, Tetzlaff and Manoonpong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development and Training of a Neural Controller for Hind Leg Walking in a Dog Robot

#### Alexander Hunt <sup>1</sup> \*, Nicholas Szczecinski <sup>2</sup> and Roger Quinn<sup>2</sup>

*<sup>1</sup> Department of Mechanical and Materials Engineering, Portland State University, Portland, OR, USA, <sup>2</sup> Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH, USA*

Animals dynamically adapt to varying terrain and small perturbations with remarkable ease. These adaptations arise from complex interactions between the environment and biomechanical and neural components of the animal's body and nervous system. Research into mammalian locomotion has resulted in several neural and neuro-mechanical models, some of which have been tested in simulation, but few "synthetic nervous systems" have been implemented in physical hardware models of animal systems. One reason is that the implementation into a physical system is not straightforward. For example, it is difficult to make robotic actuators and sensors that model those in the animal. Therefore, even if the sensorimotor circuits were known in great detail, those parameters would not be applicable and new parameter values must be found for the network in the robotic model of the animal. This manuscript demonstrates an automatic method for setting parameter values in a synthetic nervous system composed of non-spiking leaky integrator neuron models. This method works by first using a model of the system to determine required motor neuron activations to produce stable walking. Parameters in the neural system are then tuned systematically such that it produces similar activations to the desired pattern determined using expected sensory feedback. We demonstrate that the developed method successfully produces adaptive locomotion in the rear legs of a dog-like robot actuated by artificial muscles. Furthermore, the results support the validity of current models of mammalian locomotion. This research will serve as a basis for testing more complex locomotion controllers and for testing specific sensory pathways and biomechanical designs. Additionally, the developed method can be used to automatically adapt the neural controller for different mechanical designs such that it could be used to control different robotic systems.

Keywords: central pattern generator, dog, artificial muscle, locomotion, walking

#### Edited by:

*Poramate Manoonpong, University of Southern Denmark Odense, Denmark*

#### Reviewed by:

*Daniel P. Ferris, University of Michigan, USA; Horacio Rostro Gonzalez, Universidad de Guanajuato, Mexico*

#### \*Correspondence:

*Alexander Hunt ajh26@pdx.edu*

Received: *25 November 2016* Accepted: *15 March 2017* Published: *04 April 2017*

#### Citation:

*Hunt A, Szczecinski N and Quinn R (2017) Development and Training of a Neural Controller for Hind Leg Walking in a Dog Robot. Front. Neurorobot. 11:18. doi: 10.3389/fnbot.2017.00018*

### 1. INTRODUCTION

Controlling complex robots using traditional control methods with on-line optimization and "single brain" control becomes increasingly difficult and computationally intensive as more degrees of freedom and more points of contact are added. This is in stark contrast with the animal kingdom, in which high redundancy is the norm, and complex interactions with the environment are often accomplished with ease. For example, having more feet on the ground makes an individual animal's control and balance easier, rather than harder. Big or small, it takes little mental effort on the part of the animal to change from fast speeds to slow speeds, change gaits, start turning, step over an object, respond to ground slip, or move from concrete to loose dirt.

Unfortunately, animals are immensely complex, and the majority of our current robots barely resemble any animals in the world today. Instead of muscles for actuation, our most agile robots use electric motors (Seok et al., 2015) or hydraulics (Raibert et al., 2008; Boaventura et al., 2013). For determining body states and sensing the world, modern robots rely on a few strategically placed sensors instead of an animal's wide net of somatic sensory neurons spread across its whole body. For control, instead of a highly distributed and hierarchical network of neurons, a single algorithm is often used to calculate the exact position of each joint needed to maintain stability and provide locomotion.

All this is beginning to change, however, as details on how biomechanics and neural systems provide advantages to moving around in the world are being uncovered. The compliant nature of muscles can automatically reject perturbations and significantly reduce the burden on the control system (Loeb et al., 1999; Jindrich and Full, 2002). To take advantage of this, actuators which add compliance and greater control of force are being developed (Pratt and Williamson, 1995; Thorson and Caldwell, 2011; Rollinson et al., 2013; Schilling et al., 2013b). A compliant actuator combined with the tri-segmented shape of the legs (Fischer and Blickhan, 2006) produces a mechanical system which is robust to perturbations and capable of performing dynamic walking with open-loop control (Spröwitz et al., 2014).

Neural control of locomotion is a complex interaction of rhythm generation, sensory processing, feed-forward muscle activation, and sensory feedback systems. Central pattern generators (CPGs) are sub-circuits located in the spinal cord which are responsible for repetitive behaviors such as walking and breathing. CPGs are capable of oscillating and providing a patterned output either with or without external input. CPGs coordinate complex muscle activations to help the animal achieve proper timing to accomplish a given task. They have been found to be involved in a large variety of movement behaviors including the leech heartbeat (Arbas and Calabrese, 1987), human breathing and gasping (Tryba et al., 2006), lobster digestion (Meyrand et al., 1994), turtle scratching (Mortin and Stein, 1989), and locomotion in stick insects (Bässler and Büschges, 1998), lamprey (Cohen et al., 1992), cats (Brown, 1914), and mice (Hägglund et al., 2013).

Modeling of these circuits shows that CPGs coordinate multiple segments into predictable patterns during locomotion through entrainment of the CPG to the mechanical systems they control (Iwasaki and Zheng, 2006; Markin et al., 2010). For example, a set of CPGs that are coupled similarly to that of a lamprey have been shown to produce a traveling wave along the body that provides forward locomotion (Ekeberg, 1993). It was shown that this wave can be easily modified by sensory feedback to allow the model to adapt to its surroundings and produce more robust waves for both water and land (Ekeberg and Grillner, 1999; Ijspeert et al., 1999; Bicanski et al., 2013). Similar evidence has shown that sensory feedback can be used to coordinate multiple CPGs in leech swimming and stick insect, cricket, and cockroach walking without direct coupling of the CPGs (Bässler and Büschges, 1998; Ekeberg et al., 2004; Akay and Büschges, 2006; Chen et al., 2011; Szczecinski et al., 2014).

Less is known about the organization of CPGs in mammals than in insects and other invertebrates. Early theories hypothesized the existence of a single CPG per leg, driving transitions between stance and swing (Brown, 1914). However, more recent models utilize multiple oscillating circuits at multiple hierarchical levels (McCrea and Rybak, 2008) supported by recent neurological data (Zhong et al., 2012). Mammalian CPG systems may look more similar to those in insects than previously hypothesized (Büschges and Borgmann, 2013). Models of CPGs coordinated through sensory feedback pathways have been shown to successfully replicate many behaviors in mammalian systems and produce coordinated motion for multiple joints (Ekeberg and Pearson, 2005; Amrollah and Henaff, 2010; Markin et al., 2010; Hunt et al., 2014, 2015a; Li et al., 2016). However, these models have not been tested on a robot, and it is difficult to determine whether they hold under real-world physics or are exploiting the simplified physics of a simulation.

These advances in understanding of the neuro-mechanical control of locomotion have led to an increase in bio-inspired robots (see Ijspeert, 2008, 2014; Iada and Ijspeert, 2016 for recent reviews) with simultaneous goals of building more advanced and adaptable robots in addition to developing a better understanding of the theories produced from the experimental work. Modern biologically inspired walking robots fall into one of two categories: abstracted biologically-inspired or biology-first. Several abstracted biologically-inspired approaches have effectively demonstrated many principles of animal locomotion. Hopf oscillator-driven robots such as Amphibot and Salamandra Robotica II provide valuable insights into how changing sensory feedback can be used to adapt CPGs and produce rhythmic movement entrained to the mechanics of the robot and its surroundings (Crespi et al., 2005, 2013). AMOS and HECTOR are two robots which are built around machine learning of specific tasks. AMOS is controlled by a large recurrent neural network trained by reservoir computing methods to estimate the leg's state and anticipate future sensory information (Dasgupta et al., 2015). HECTOR uses many feedforward artificial neural networks to map between different states, such as mapping joint angles to the height of a leg (Schilling et al., 2013a). Both these robots are also able to effectively integrate sensory information to produce adaptive, rhythmic output. Additionally, several robots have been controlled with dynamic spiking neural networks (Rostro-Gonzalez et al., 2015; Espinal et al., 2016). All these robots produce adaptive locomotion over diverse terrain, but their controllers abstract many principles of animals' nervous systems, limiting their applications as neurobiological research tools.

Other robots use a biology-first approach to controller design. Biology-first approaches begin with known connectivity from the animal, and set parameter values in the control networks to match data from the animal. RoboLobster, Bill-Ant and LegConNet control walking with finite state controllers based on previous state-based models of locomotion (Ayers, 2004; Lewinger and Quinn, 2010; Rutter et al., 2011). Locomotion direction is changed by modifying local reflexes that cause transitions between the finite states of leg motion. OCTAVIO uses an artificial neural network assembled from modular subnetworks, much like the work we present in this paper (von Twickel et al., 2011). The biped built by Klein and Lewis and Redbot both demonstrated how a spiking neural network can be used to produce locomotion in a biped robot (Klein and Lewis, 2012). These robots have controllers that mimic the logic and structure of the animal's nervous systems and, as such, serve as tools for investigating neurobiological hypotheses. However, all these controllers were developed by hand tuning parameter values, and are limited by the engineer's ability to calibrate the system.

To improve the applicability and performance of these robots, methods are being developed for setting parameter values in these networks. A major component of these methods focuses on breaking the problem into several more easily solved subproblems. These subproblems are solved individually, and often in a specific order to build up the complexity of the network. Redbot uses a staged genetic algorithm process to set stepping frequency, gait, and finally joint angle profiles (Russell et al., 2007). This controller, however, does not use sensory feedback, an important component for adaptive locomotion. The controller for MantisBot is formulated around steady state activity of the neural system; however, walking has not yet been demonstrated with this robot (Szczecinski et al., 2015). In previous work, we developed a training process which utilizes many of the same tools as MantisBot (Szczecinski et al., 2017) and sets parameter values in a locomotory network for forward locomotion of a rat simulation (Hunt et al., 2015b). In the work presented in this paper, we demonstrate the broader applicability of this process by applying the same procedure to a dog-like robot to generate adaptive, forward walking.

The key contributions of this paper are (1) the testing of a synthetic nervous system for dynamic walking on a hardware model of a dog's rear legs actuated by artificial muscles, and (2) the validation of an automatic, repeatable method for setting parameter values in a synthetic neural system composed of a CPG locomotion network without requiring a mechanical simulation. Additionally, this work demonstrates the validity of using synthetic neural controllers for controlling dynamic robotic locomotion and acts as a launching point for developing more complex controllers for adaptive locomotion.

#### 2. METHODS

#### 2.1. Robot Architecture

Puppy (**Figure 1**) is a four legged robot with 12 planar joint degrees of freedom (three per leg), first introduced in Aschenbeck et al. (2006). It is 57.5 cm tall, 60 cm long, 23 cm wide, and weighs 6.8 kg (15 lbs). Each joint has an antagonistic pair of 10 mm Festo MXAM-10-AA (Festo Inc.) actuators, also known as "fluidic artificial muscles," that are energized by compressed air. Motion is constrained to the sagittal plane by two plastic sheets (see **Figure 1**). A 2.3 kg (5 lb) counterweight was hung through a pulley on a linear slider and attached to the center of the robot, partially supporting the robot's weight for the trials presented in this manuscript. The robot's hind legs walked on the treadmill. The front legs were suspended above the belt to prevent interference.

Each actuator has separate input and exhaust valves controlled by a single board real-time, reconfigurable input output module, sbRIO-9602 (National Instruments), with an embedded field programmable gate array (FPGA). The sbRIO was connected via a 10/100 Ethernet port to a host computer running Windows 7 on an Intel i7-2770K. Each actuator is connected in parallel to a Freescale MPX5700 GP gauge pressure sensor. Joint angles are collected from a Vishay Spectrol 140-0-0-103 potentiometer placed at each joint. Analog data from the joints and pressure sensors is converted to digital data for the sbRIO with a custom board developed by Osmisys, Inc. Velocity data, calculated by differentiating length data, was filtered by a 2nd order lowpass Butterworth filter with a normalized cutoff frequency of 0.01 Hz, applied after differentiation.

The overall control architecture is illustrated in **Figure 2**. The neural control system is simulated using Animatlab (Cofer et al., 2010). The neural controller outputs motor neuron activations for each of the muscles and receives muscle afferent feedback values via virtual serial ports with Labview. Labview uses the motor neuron values to calculate desired muscle force output and then calculates the pressure required to produce that force. Desired muscle force is calculated by adapting the Hill muscle model (Hill, 1970) (**Figure 3**) parameter values to the artificial muscle where tension, T, is developed in the muscle according to:

$$\frac{dT}{dt} = \frac{k_{se}}{b} \left( k_{pe} x + b \dot{x} - \left( 1 + \frac{k_{pe}}{k_{se}} \right) \cdot T + A \right), \tag{1}$$

where x is the muscle length, k<sub>se</sub> and k<sub>pe</sub> are the series and parallel stiffnesses, and b is the viscous damping constant. A is the activation level of the muscle,

$$A = A_m \cdot A_l. \tag{2}$$

A<sub>m</sub> is the sigmoid adapter equation,

$$A_m = \frac{F_{\text{max}}}{1 + \exp(C(V_o - V))} + B. \tag{3}$$

F<sub>max</sub> is the maximum muscle force, C is the maximum slope of the sigmoid, V is the membrane voltage of the motor neuron, and V<sub>o</sub> and B describe the voltage and force offsets of the sigmoid. A<sub>l</sub> is the length-tension relationship,

$$A_l = 1 - \frac{(l - l_{\text{rest}})^2}{l_{\text{width}}^2}, \tag{4}$$

where l<sub>rest</sub> is the length at which the muscle can provide the most force and l<sub>width</sub> is the length from l<sub>rest</sub> at which the muscle can provide no force.

FIGURE 1 | The robot is constrained to motion in the sagittal plane and a counterweight pulley system is used to reduce the effective weight of the robot and encourage a center position on the belt.

The series spring element, k<sub>se</sub>, simulates the tendon and is very stiff (10<sup>7</sup> N/m). k<sub>pe</sub> is calculated such that all stretching under the maximum expected load is absorbed by the parallel and series elements,

$$k_{pe} = \frac{k_{se} \cdot F_{\text{max}}}{k_{se}(l_{\text{max}} - l_{\text{min}}) - F_{\text{max}}}. \tag{5}$$
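
Equation (5) can be read as a static stretch budget: under a load F<sub>max</sub>, the series element stretches by F<sub>max</sub>/k<sub>se</sub> and the parallel element by F<sub>max</sub>/k<sub>pe</sub>, and the two must sum to l<sub>max</sub> − l<sub>min</sub>. The sketch below checks this numerically. The muscle lengths are hypothetical placeholders chosen only for illustration, and the 509 N load is the maximum output force used in this work.

```python
K_SE = 1.0e7    # N/m, series (tendon) stiffness from the text
F_MAX = 509.0   # N, maximum muscle force used in this work


def parallel_stiffness(l_max, l_min, k_se=K_SE, f_max=F_MAX):
    """Equation (5): parallel stiffness k_pe in N/m for lengths in metres."""
    return k_se * f_max / (k_se * (l_max - l_min) - f_max)


# Hypothetical longest and shortest muscle lengths (m), not taken from the paper
L_MAX, L_MIN = 0.15, 0.12

k_pe = parallel_stiffness(L_MAX, L_MIN)

# Stretch of the series and parallel elements under F_MAX should add up
# to the full length range.
total_stretch = F_MAX / K_SE + F_MAX / k_pe
print(f"k_pe = {k_pe:.1f} N/m, total stretch = {total_stretch:.5f} m")
```

Because k<sub>se</sub> is several orders of magnitude stiffer than k<sub>pe</sub> for lengths of this scale, almost all of the stretch in this example is taken up by the parallel element.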

To develop the length-tension relationship, the maximum output force was set to 509 N (based on extrapolation of the actuator fit curve found in Hunt, 2015 at 90 psi). Length parameter values were unique for each muscle and set such that l<sub>rest</sub> was equal to the length with no pressure and no load, and l<sub>width</sub> was set such that A<sub>l</sub> = 0 when the muscle was at its shortest length with no load under 90 psi. The peak velocity of the muscles (v<sub>max</sub>) was calculated from empirical testing and used to set b such that b = F<sub>max</sub>/v<sub>max</sub>. The values V<sub>o</sub> = −50 mV, C = 121.46, and B = −1.17 are found by solving Equation (3) for the conditions: A<sub>m</sub>(−100 mV) = 0; A<sub>m</sub>(−10 mV) = 0.99 · F<sub>max</sub>; and A<sub>m</sub>(−50 mV) = 0.5 · F<sub>max</sub>.
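
The reported sigmoid constants can be checked against the three stated conditions with a few lines of code. This is a minimal sketch under two assumptions that are not spelled out in the text: C is taken to be in units of 1/V (so voltages are expressed in volts), and B is taken as an additive force offset in newtons outside the fraction, which is the reading under which the conditions are satisfied. The function names are illustrative only.

```python
import math

F_MAX = 509.0   # N, maximum muscle force
V_O = -0.050    # V, voltage offset (-50 mV)
C = 121.46      # assumed to be in 1/V so that C * (V_O - v) is dimensionless
B = -1.17       # assumed to be a force offset in N


def sigmoid_adapter(v):
    """Equation (3): motor neuron voltage v (volts) to force A_m (newtons)."""
    return F_MAX / (1.0 + math.exp(C * (V_O - v))) + B


def length_tension(l, l_rest, l_width):
    """Equation (4): 1 at l_rest, falling to 0 at a distance l_width from it."""
    return 1.0 - (l - l_rest) ** 2 / l_width ** 2


def activation(v, l, l_rest, l_width):
    """Equation (2): muscle activation A = A_m * A_l."""
    return sigmoid_adapter(v) * length_tension(l, l_rest, l_width)


for v_mv in (-100.0, -50.0, -10.0):
    a_m = sigmoid_adapter(v_mv / 1000.0)
    print(f"A_m({v_mv:.0f} mV) = {a_m:.2f} N ({a_m / F_MAX:.3f} of F_max)")
```

With these constants the −100 mV and −10 mV conditions are met closely, while the −50 mV value differs from 0.5 · F<sub>max</sub> by the magnitude of B, which is small relative to F<sub>max</sub>.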

The commanded pressure values are calculated from the empirical model of the actuators derived in Hunt (2015). In this model, the commanded tension from Animatlab and current geometry of the robot are used to calculate the commanded pressure for each of the artificial muscles with the equation

$$\begin{aligned} P ={}& 254\ \text{kPa} + 1.23\ \frac{\text{kPa}}{\text{mN}} \cdot T + 15.6\ \text{kPa} \cdot S \\ &+ 192\ \text{kPa} \cdot \tan \left( 2.03 \left( \frac{k}{-0.33\ \frac{1}{\text{mN}} \cdot F + \max(k)} - 0.46 \right) \right), \end{aligned} \tag{6}$$

where S is the state of the artificial muscle in which 1 indicates the muscle is shortening and −1 indicates lengthening. For stability, this value was changed from the binary values calculated originally to continuous linearly scaled values based on the maximum velocity of the muscle. This commanded pressure is sent to the FPGA. Because of limited bandwidth, the valve controller on the FPGA opens the inlet or exhaust valve until the actual pressure reading is within ±15 kPa of the commanded pressure, and then closes the valve.
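
The valve logic and the continuous muscle state described above can be sketched as follows. This is an illustrative reading rather than the FPGA implementation: the function names are hypothetical, and the sign convention for S (shortening velocity is negative length rate, clipped to ±1 at the maximum velocity) is an assumption.

```python
DEADBAND_KPA = 15.0   # tolerance around the commanded pressure, from the text


def valve_command(p_cmd, p_meas, deadband=DEADBAND_KPA):
    """Return (inlet_open, exhaust_open) for one muscle.

    The inlet opens when pressure is too low, the exhaust opens when it is
    too high, and both stay closed inside the deadband.
    """
    error = p_cmd - p_meas
    if error > deadband:
        return (True, False)
    if error < -deadband:
        return (False, True)
    return (False, False)


def shortening_state(length_rate, v_max):
    """Continuous S in [-1, 1]: +1 at full-speed shortening, -1 at full-speed
    lengthening. Assumes length_rate = dl/dt, so shortening is negative."""
    return max(-1.0, min(1.0, -length_rate / v_max))


print(valve_command(300.0, 250.0))    # pressure too low
print(valve_command(300.0, 290.0))    # within the deadband
print(shortening_state(-0.05, 0.1))   # shortening at half the maximum velocity
```

A deadband of this kind trades pressure accuracy for fewer valve switches, which suits a controller with limited update bandwidth.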

The sbRIO collects joint angle data and muscle pressure data and passes this information to the Labview computer program for processing. Labview converts the joint angle data to muscle lengths such that

$$l_m = a_m + b_m \cos \left(\alpha_m + \theta_m \right). \tag{7}$$

a<sub>m</sub>, b<sub>m</sub>, and θ<sub>m</sub> are unique constants based on the specific geometry of the robot and α<sub>m</sub> is the joint angle. Muscle force is then calculated from pressure and length using a lookup table built on Equation (6). Types Ia, Ib, and II muscle afferents are calculated for the neural control system. Though this feedback is simplified, it captures the main function of each type,

$$\text{Ia} = k_a \dot{x}, \qquad \text{Ib} = k_b T, \qquad \text{II} = k_c x, \tag{8}$$

where k<sub>a</sub>, k<sub>b</sub>, and k<sub>c</sub> are gain parameters whose values are set such that the injected current is 20 nA when the muscle is at its maximum velocity, tension, and length, respectively.
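
Equations (7) and (8) amount to a small amount of arithmetic per muscle, sketched below. The geometry constants and the maximum velocity and length are hypothetical placeholders, and the function names are illustrative; only the 20 nA scaling and the 509 N maximum force come from the text.

```python
import math

I_MAX_NA = 20.0   # nA, injected current at the maximum of each signal


def muscle_length(alpha, a_m, b_m, theta_m):
    """Equation (7): muscle length from joint angle alpha (radians)."""
    return a_m + b_m * math.cos(alpha + theta_m)


def afferent_gains(v_max, t_max, x_max, i_max=I_MAX_NA):
    """Gains (k_a, k_b, k_c) so each afferent reaches i_max at its maximum."""
    return i_max / v_max, i_max / t_max, i_max / x_max


def afferents(x, x_dot, tension, gains):
    """Equation (8): Ia (velocity), Ib (tension), and II (length) currents in nA."""
    k_a, k_b, k_c = gains
    return k_a * x_dot, k_b * tension, k_c * x


# Hypothetical maxima: 0.2 m/s peak velocity and 0.15 m maximum length
gains = afferent_gains(v_max=0.2, t_max=509.0, x_max=0.15)
ia, ib, ii = afferents(x=0.15, x_dot=0.2, tension=509.0, gains=gains)
print(ia, ib, ii)   # each is approximately 20 nA at the maxima
```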

#### 2.2. Neural Network Architecture

The neurons in the control network have leaky integrator dynamics. The leaky integrator model captures the most basic behavior of neurons and allows for more complex dynamics to be added without increasing the complexity of the rest of the network. It is capable of modeling individual non-spiking interneurons, the firing rate of a population of neurons, or a single spiking neuron after a spiking threshold is included. This work is not concerned with the specifics of how action potentials are generated and has neglected Hodgkin-Huxley sodium and potassium currents. In this work, each neuron is used to model the average firing rate of a population of spiking neurons. The dynamical equations that describe their behavior can be found in Szczecinski et al. (2017).
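
For readers unfamiliar with leaky integrator dynamics, the sketch below integrates a generic single-compartment form, τ dV/dt = −(V − E<sub>rest</sub>) + R · I. It is not the exact formulation of Szczecinski et al. (2017), which also includes synaptic and persistent sodium currents. The −60 mV resting voltage, 5 ms time constant, and 20 nA input are values mentioned elsewhere in this paper, while the 1 MΩ membrane resistance is a hypothetical placeholder.

```python
def simulate_leaky_integrator(i_app, t_end=50.0, dt=0.1,
                              e_rest=-60.0, tau=5.0, r_m=1.0):
    """Forward-Euler integration of tau * dV/dt = -(V - e_rest) + r_m * i_app.

    Units: mV, ms, nA, and MOhm (nA * MOhm = mV). r_m is a hypothetical value.
    Returns the membrane voltage trace, starting at rest.
    """
    v = e_rest
    trace = [v]
    for _ in range(int(round(t_end / dt))):
        v += (dt / tau) * (-(v - e_rest) + r_m * i_app)
        trace.append(v)
    return trace


trace = simulate_leaky_integrator(i_app=20.0)
print(f"start {trace[0]:.1f} mV, after 50 ms {trace[-1]:.2f} mV")
```

With a constant input the voltage relaxes exponentially toward E<sub>rest</sub> + R · I, which is the behavior that lets a single unit stand in for the average firing rate of a population.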

#### 2.2.1. Joint Control

The connectivity of the Zhong locomotor model (Zhong et al., 2012) was chosen as the basis for the neural control system for low level control. Since our focus is on understanding how sensory feedback affects the timing and activation of motor neurons, the presented model neglects the highest level CPG, and is simplified to a single network for each joint with a pattern formation layer and lower level afferent feedback networks (**Figure 4**).

Intra-joint sensory feedback controls each joint. Positive force feedback (Prochazka et al., 1997) provides self-exciting Ib feedback to each muscle. As tension within a muscle increases, the motor neuron is excited further to apply even more tension. Though this leads to a destabilizing influence in most control systems, the length-tension properties of the muscles and geometric alignment of the musculoskeletal system prevent unstable behavior. This influence helps the animal compensate for unexpected increased loads during walking. Cross inhibitory velocity feedback through Ia pathways limits muscle speed (McCrea et al., 1980; Lundberg, 1981; Pratt and Jordan, 1987; Jankowska, 1992; Geertsen et al., 2011). When a muscle is stretched quickly, it inhibits the antagonist via the Flexor or Extensor Ia - IN interneuron.

control through Ia interneurons or directly onto the motor neuron. Synapses that terminate in a closed circle are inhibitory, while those that terminate in an open triangle are excitatory.

#### 2.2.2. Leg Control

Intra-leg sensory feedback connections are derived from proposed coordination mechanisms in mammalian literature. Stance-to-swing transition is the most studied phenomenon, and is caused both by reduced firing in Ib Golgi tendon afferents and increased firing from hip flexor stretching (Pearson, 2008). This integration of signals is shown in **Figure 4** as inhibitory connections from the "Hip Flexor Ia" and "Hip Flexor II" afferent feedbacks and an excitatory connection from the "Ankle Extensor Ib" afferent feedback onto the "Extensor Interneuron" for each joint. Stance is initiated by reduced firing of the hip flexor type II afferent or increased firing of hip extensor type II afferent (McVea et al., 2005; Akay et al., 2014). This indicates that the hip is forward, causing contraction of the hip and ankle extensors. This is realized as an inhibitory connection from the "Hip Extensor II" afferent feedback onto the "Flexor Interneuron" for each joint.

#### 2.2.3. Inter-Leg Control

Commissural interneurons encourage an alternating gait between the legs. These connections mimic those that have been found in mice (Talpalar et al., 2013) and cats (Jankowska, 2008), and further described with neural modeling (Rybak et al., 2013). In these models, the interneuronal connections are between high level leg CPGs, which are not included in our model. Because we have a CPG for each joint, our commissural interneurons are made to act on the most proximal joint, which drives the protraction and retraction of the leg. The hip joint CPGs are connected with inhibitory and excitatory commissural interneurons (CINi and CINe), and the rest of the CPGs remain unconnected. These pathways are set such that the CINi pathways provide three times as much inhibition as the CINe provides excitation, similar to related models (Rybak et al., 2013), and are more than an order of magnitude weaker than other synapses within the model. These connections are illustrated in **Figure 5**. Parameter values were taken from our previous work (Hunt et al., 2015a).

#### 2.3. Calculating MN Activations

The motor neurons are the interface between the neural and mechanical systems. The motion of the robot and the dynamics of the actuators dictate the motor neuron activations during locomotion, which the neural system must be tuned to produce. This section describes how we calculate the motor neuron activations.

#### 2.3.1. Joint Torques and Kinematic Motions

To determine kinematic and dynamic motions for the robot, models of the hind and fore legs during stance and swing were developed in Simulink-SimMechanics (Mathworks, Inc.). A cubic spline was fit to predetermined angles and duty cycles for touchdown, midstance, liftoff, and midswing based on walking whippets, a species of dog with similar limb proportions and body mass to Puppy (Fischer and Lilje, 2011). The data for the walking kinematics was averaged from 7 dogs with an average stepping period of 0.54 s, and a speed of 1.01 m/s, or 1.97 body lengths/s.

Swing torques were calculated by adding friction to the joints and doing a forward dynamic analysis using the equations of motion. The calculation of stance torques was done by building a closed chain system with a fore and hind leg on the ground at one time. A proportional-derivative (PD) controller at each joint was used to produce a kinematic trajectory similar to that collected from whippets (Fischer and Lilje, 2011). The PD controller torques are the torques required to produce whippet-like locomotion with Puppy. The stance data and swing data were concatenated assuming a 50% duty cycle and smoothed non-linearly to remove discontinuities at the edges (**Figure 6**).

#### 2.3.2. Calculating Muscle Tension and MN Activation

Muscle tensions during locomotion were calculated using the joint torques in the previous section and the active lengths of the muscles during locomotion. A unique solution was obtained by assuming only one muscle per joint is activated at a time (Hooper et al., 2009). The active muscle must produce the previously calculated torques as well as overcome torques created by the passive forces produced in each muscle.

Passive forces were calculated using Equation (1) with A = 0. Muscle length (x) and muscle velocity (ẋ) were calculated using a forward kinematic model of the Festo attachment points and joint kinematics. The derivative was discretized and T was solved for at the next time step based on the previous tension,

$$T_{i+1} = T_i + \Delta t \cdot \frac{k_{se}}{b} \left( k_{pe} x_i + b \dot{x}_i - \left( 1 + \frac{k_{pe}}{k_{se}} \right) \cdot T_i \right). \tag{9}$$

Starting with T = 0 and repeating this process for several step cycles produces a periodic steady-state tension profile that resists the ground-force and dynamic torques. The active muscle, then, must overcome this passive muscle force, the ground force, and dynamic forces. The active muscle force is calculated by using a bisection root-finder to balance the static and dynamic forces acting on each joint for each time step. The motor neuron activation is calculated by solving Equations (2) and (9) with a bisection root-finder.
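
The two numerical steps in this section, the forward-Euler update of Equation (9) and the bisection root-finder, can be sketched as below. The stiffness, damping, time step, and length-tension value are hypothetical placeholders; the sigmoid follows Equation (3) with B read as an additive force offset and C in 1/V, both of which are assumptions. The example inverts A = A<sub>m</sub> · A<sub>l</sub> for the motor neuron voltage and is meant as an illustration, not the authors' code.

```python
import math


def passive_tension_step(t_i, x_i, x_dot_i, dt, k_se, k_pe, b):
    """One forward-Euler step of Equation (9), i.e. Equation (1) with A = 0."""
    rate = (k_se / b) * (k_pe * x_i + b * x_dot_i
                         - (1.0 + k_pe / k_se) * t_i)
    return t_i + dt * rate


def passive_tension_profile(xs, x_dots, dt, k_se, k_pe, b, cycles=3):
    """Start at T = 0, replay one step cycle several times, and return the
    tension over the final cycle, after the start-up transient has decayed."""
    t = 0.0
    profile = []
    for _ in range(cycles):
        profile = []
        for x, x_dot in zip(xs, x_dots):
            profile.append(t)
            t = passive_tension_step(t, x, x_dot, dt, k_se, k_pe, b)
    return profile


def bisect(f, lo, hi, tol=1e-12):
    """Bisection root-finder; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def sigmoid_adapter(v, f_max=509.0, c=121.46, v_o=-0.050, b_off=-1.17):
    """Equation (3) with volts as the voltage unit (assumed)."""
    return f_max / (1.0 + math.exp(c * (v_o - v))) + b_off


# Hypothetical parameters: N/m, N/m, N*s/m, and a time step (s) small
# enough for the stiff series spring to integrate stably.
K_SE, K_PE, B_DAMP, DT = 1.0e7, 1.7e4, 1.0e3, 1.0e-5

# Passive tension for a constant 1 cm stretch, compared with the analytic
# steady state of Equation (1) with A = 0 and zero velocity.
profile = passive_tension_profile([0.01] * 500, [0.0] * 500,
                                  DT, K_SE, K_PE, B_DAMP)
steady_state = K_PE * 0.01 / (1.0 + K_PE / K_SE)

# Motor neuron voltage that yields A = 200 N when A_l = 0.8 (hypothetical).
voltage = bisect(lambda v: sigmoid_adapter(v) * 0.8 - 200.0, -0.100, -0.010)
print(f"T = {profile[-1]:.2f} N, V = {voltage * 1000.0:.2f} mV")
```

The explicit Euler step is only stable when Δt is small relative to b/k<sub>se</sub>, which is why such a short time step is used with a tendon this stiff.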

#### 2.4. Training CPG Network Output

Training the CPG network output is performed with the same four-step process presented in Hunt et al. (2015b) for the simulation of a walking rat. This process is similar to the staged evolution technique used to evolve parameters for Redbot locomotion and other systems (Inada and Ishii, 2004; Russell et al., 2007). A review of the process is below.

Each leg network (which includes three joints) consists of 82 neurons with 12 parameters each, and 134 synapse connections with 4 parameters each. The large number of parameters is a result of the complexity of the biologically-based model that we use to control each joint (see **Figure 4**) (Zhong et al., 2012). Many parameter values were set using basic heuristics such as resting voltage (−60 mV), time constant (5 ms), and relative size (1). Even after these simplifications, approximately 90 parameters per leg, mostly synapse strengths, still needed to be set. Because of the large number of possible local solutions, the design and training of the CPG network was done over the course of four iterations in which progressively more complete networks were tuned. First, parameter values within the CPG were tuned to generate appropriate rhythm and response properties. Second, synapses from sensory neurons to the CPG were tuned to generate the intended CPG activity during walking. Third, synapses from the CPG to the MNs were tuned to obtain the proper MN activation. Finally, afferent feedback from the muscles to the MNs was tuned to further refine MN activation. This entire tuning process was performed without a physics-based simulation and then the results were tested on the Puppy robot.
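
As a rough check on the size of the search space, the neuron and synapse counts above give the following total per leg, assuming every listed parameter is counted individually:

$$82 \times 12 + 134 \times 4 = 984 + 536 = 1520.$$

The approximately 90 parameters that remain after applying the heuristics are therefore about 6% of the total.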

#### 2.4.1. CPG Design

The first step is designing a CPG for the pattern formation layer of a single joint which is capable of producing the desired phase transitions in response to sensory feedback. The system is composed of two mutually inhibitory neurons called half-centers (HCs), each with persistent sodium channels. It has the same basic set of equations as has been used in other recent modeling work (Daun-Gruhn et al., 2009). These channels provide nonlinear positive reinforcement to membrane voltage fluctuations, which make sustained oscillation possible. Mutual inhibition is implemented via non-spiking interneurons (INs). Each HC excites an IN, which inhibits the other HC, as shown in **Figure 4**. Though this CPG is composed of only 4 non-spiking neurons, it exhibits many of the same shapes, behaviors, and responses to perturbations that exist in the average spiking frequency of reciprocally inhibited spiking neurons with postinhibitory rebound (Perkel and Mulloney, 1974; Pinsker, 1977; Ayers and Selverston, 1979). It also has the same network architecture as the pattern formation neural pools used in the Zhong locomotor model, and the oscillatory dynamics are also governed by a slowly activating and deactivating persistent sodium current.

Our previous work described a bifurcation parameter, δ, which controls the CPG's endogenous rhythm and sensitivity to inputs (Szczecinski et al., 2017). The CPG oscillates endogenously if δ > 0. When δ is near 0, it easily entrains with incoming sensory signals. As δ increases, it less easily entrains with sensory signals. Each joint of Puppy is controlled by a CPG in which δ = 0.1. In addition, the slope of m<sub>∞</sub>, h<sub>∞</sub>, and G<sub>Na</sub> were adjusted until the CPG's bursts peaked approximately 20% above the high equilibrium point, and the endogenous period was twice that of the robot's intended stepping period.

#### 2.4.2. CPG Entrainment

The second step in choosing parameter values for the network to produce the intended MN activations is to tune the synapses from sensory neurons to the CPG, such that the CPG both entrains to the sensory information and produces the MN activations calculated in the previous section. In our network, sensory feedback synapses onto the CPG according to rules discovered in vertebrates, described in Section 2.2 (e.g., hip flexor stretch encourages a transition from stance to swing; Pearson, 2008). The synaptic conductance and threshold of these pathways determine how they affect the CPG's phase (Szczecinski et al., 2017), meaning that they must be carefully calculated for Puppy to walk properly.

Two steps are required to tune the synapses from sensory neurons to the CPGs. First, the intended walking kinematics are used to find the type Ia, Ib, and II afferents during normal walking motion. These are the signals that entrain the CPG into the proper phase for walking. Second, a neural simulation is assembled in which the calculated muscle afferents are input to the CPG. A fitness function, f<sub>1</sub>(V<sub>thresh</sub>, G<sub>syn</sub>), is calculated that describes how well the CPG entrained to the sensory information,

$$f_1(V_{\text{thresh}}, G_{\text{syn}}) = (P - P_o)^2 + (S_e - S_{e,o})^2 + (S_f - S_{f,o})^2 + \sum G_{\text{syn}}, \tag{10}$$

where P is the oscillation period, S<sub>e</sub> is the timing of the extensor MN's rising edge, S<sub>f</sub> is the timing of the flexor MN's rising edge, and G<sub>syn</sub> is a vector of conductance values for the synapses under consideration. V<sub>thresh</sub> is a vector of the conductance thresholds for the same synapses. Terms with the subscript "o" are the intended values. Note that synaptic conductances are penalized, preventing synapse conductances from becoming too large.
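
A minimal sketch of how Equation (10) might be evaluated from a simulated trace is given below. The threshold-crossing helper and all names are hypothetical, since the text does not specify how the period and rising-edge times are extracted, and the example values are placeholders.

```python
def rising_edge_times(signal, threshold, dt):
    """Times at which the signal crosses the threshold from below."""
    return [i * dt for i in range(1, len(signal))
            if signal[i - 1] < threshold <= signal[i]]


def fitness_f1(period, s_e, s_f, g_syn, period_o, s_e_o, s_f_o):
    """Equation (10): squared timing errors plus a penalty on conductances."""
    return ((period - period_o) ** 2
            + (s_e - s_e_o) ** 2
            + (s_f - s_f_o) ** 2
            + sum(g_syn))


# Toy square-wave "MN voltage" sampled every 0.1 s (placeholder data)
toy_signal = [0, 0, 1, 1, 0, 0, 1, 1]
edges = rising_edge_times(toy_signal, threshold=0.5, dt=0.1)
toy_period = edges[1] - edges[0]

# Perfect timing leaves only the conductance penalty
score = fitness_f1(period=0.83, s_e=0.0, s_f=0.415, g_syn=[0.1, 0.2],
                   period_o=0.83, s_e_o=0.0, s_f_o=0.415)
print(toy_period, score)
```

Because the conductance sum enters linearly, the optimizer is pushed toward the weakest synapses that still achieve the intended timing.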

G<sub>syn</sub> and V<sub>thresh</sub> were found to minimize f<sub>1</sub> with a two-step optimization process. First, a genetic algorithm (GA) was used as a global search of the parameter space. The GA was initialized with a population of 1,500 possible parameter value combinations. At the end of every generation, the worst 50% of solutions were eliminated, and the others were randomly selected for mating with a performance-based weighting. Mating was performed with single-crossover, and the mutation rate was 0.1%. Once the GA completed five generations, the best solution was used as the starting point for a Nelder-Mead simplex minimizer. Thus, the parameter space was first globally sampled, and then serially refined to find a promising solution.
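
The global search stage described above can be sketched in plain Python as follows. Several details are assumptions because the text leaves them open: survivors are carried into the next generation unchanged, the mating weights are the cost gap to the worst survivor, and a mutation redraws a gene uniformly within its bounds. The quadratic test function is a stand-in for f<sub>1</sub>. The Nelder-Mead refinement is not implemented here; one option would be SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"`, started from the returned solution.

```python
import random


def genetic_search(fitness, bounds, pop_size=1500, generations=5,
                   mutation_rate=0.001, seed=0):
    """Minimize fitness over box bounds with a simple truncation-selection GA."""
    rng = random.Random(seed)

    def random_gene(k):
        lo, hi = bounds[k]
        return rng.uniform(lo, hi)

    population = [[random_gene(k) for k in range(len(bounds))]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Eliminate the worst 50% of solutions.
        survivors = sorted(population, key=fitness)[:pop_size // 2]
        costs = [fitness(ind) for ind in survivors]
        worst = max(costs)
        # Performance-based weighting: lower cost gives a higher mating chance.
        weights = [worst - c + 1e-12 for c in costs]

        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.choices(survivors, weights=weights, k=2)
            # Single-point crossover.
            cut = rng.randrange(1, len(bounds)) if len(bounds) > 1 else 0
            child = mum[:cut] + dad[cut:]
            # Each gene mutates with probability mutation_rate.
            child = [random_gene(k) if rng.random() < mutation_rate else g
                     for k, g in enumerate(child)]
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


def toy_cost(p):
    """Stand-in cost with its minimum at (1, -2); not the paper's f_1."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


TOY_BOUNDS = [(-5.0, 5.0), (-5.0, 5.0)]
best = genetic_search(toy_cost, TOY_BOUNDS, pop_size=400)
print(best, toy_cost(best))
```

Running only a few generations with a large population, as in the text, favors broad coverage of the parameter space over convergence, leaving the fine adjustment to the local minimizer.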

#### 2.4.3. CPG Output

In the third step, the CPG output synapse strength was trained to produce activations of the motor neurons with a peak magnitude that matches each desired motor neuron activation and a minimum of no activation at some point in the cycle. Similar to entraining the CPGs, we used the GA from the previous section with a population of 300 parameter value combinations for 5 generations and refined the best solution with a Nelder-Mead routine. The fitness function is

$$f_2(x) = (\max(E) - \max(E_o))^2 + (\max(F) - \max(F_o))^2 + \min(E) + \min(F), \tag{11}$$

where E and F are a single cycle of extensor and flexor motor neuron patterns and E<sub>o</sub> and F<sub>o</sub> are the desired patterns.

#### 2.4.4. Afferent Influence of MN Activation

In the last step, afferent feedback was trained to help shape the MN output and provide additional force if necessary to overcome changes in foot placement (excitatory Ib feedback), or reduced force if the leg is moving too quickly (inhibitory Ia feedback). All neurons and pathways involved in these networks were designed to be completely continuous over all possible ranges. The fitness function for the final training is

$$f_3(x) = (\max(E) - \max(E_o))^2 + (\max(F) - \max(F_o))^2 + \min(E) + \min(F) + (E - E_o)^2 + (F - F_o)^2. \tag{12}$$
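
The fitness functions for the third and fourth training steps can be sketched as below. Two assumptions are made: the activation patterns are sampled lists of non-negative values over one cycle, so that the min terms penalize any pattern that never switches off, and the squared pattern differences in the final fitness function are summed over the samples. The example patterns are placeholders.

```python
def fitness_f2(e, f, e_o, f_o):
    """Fitness for the CPG output step: match peak activations and reward
    patterns that return to zero at some point in the cycle."""
    return ((max(e) - max(e_o)) ** 2
            + (max(f) - max(f_o)) ** 2
            + min(e) + min(f))


def fitness_f3(e, f, e_o, f_o):
    """Fitness for the afferent step: the terms of fitness_f2 plus a
    sample-by-sample shape error (summed over the cycle, an assumption)."""
    shape_error = (sum((a - b) ** 2 for a, b in zip(e, e_o))
                   + sum((a - b) ** 2 for a, b in zip(f, f_o)))
    return fitness_f2(e, f, e_o, f_o) + shape_error


# Placeholder desired patterns and an extensor pattern offset by 0.5
e_desired = [0.0, 1.0, 2.0, 1.0, 0.0]
f_desired = [0.0, 2.0, 0.0]
e_shifted = [v + 0.5 for v in e_desired]

print(fitness_f2(e_shifted, f_desired, e_desired, f_desired))
print(fitness_f3(e_shifted, f_desired, e_desired, f_desired))
```

The peak and minimum terms alone cannot distinguish two patterns with the same extremes, which is why the last step adds the shape terms.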

#### 3. RESULTS

#### 3.1. Offline Training Results

The final results of the training can be seen in **Figure 7**. A clear relationship between the training data and the network output is observed. A step cycle with the desired period is produced based on expected sensory feedback. All muscles are active at the correct point of the step cycle, with extensors active during stance and flexors active during swing. The transitions between the stance and swing phases are close to the desired transition point of the step cycle based on expected sensory feedback. Additionally, five of the six activation curves follow within 10% of the magnitudes for the inverse dynamics calculated activation values.

For the hip, extensor output at the beginning of stance and flexor output at the beginning of swing are both a little high, but final output is within 5% of the training curve. The transition from stance to swing in the hip occurs 10% earlier than the training data anticipates; however, this is a phenomenon observed in kinematic data for dogs (Fischer and Lilje, 2011) and other mammals (Fischer et al., 2002). Additionally, knee extensor output is initially within a few percent of the desired angle, and it maintains much higher output during stance than the training data. The knee flexor output peaks at a higher magnitude than the training data, but only briefly. The transition timings from stance to swing and swing back to stance are directly in time with the expected feedback and training data. For the ankle, the trained outputs for both extensor and flexor activity follow the training data shape and are within a few percent of the desired output. Here, like the knee, the transitions from stance to swing and back to stance are directly timed with the expected sensory feedback and training data.

#### 3.2. Robot Results

The trained network output MN activity based on expected sensory feedback is nearly as expected and results in robot walking. With the applied trained network and the commissural inter-leg network, the hind legs perform sustained, alternating stepping at a period of 0.83 s. The walking robot had approximately a 50–50% stance to swing duty cycle. Data presented in **Figures 8**–**11** is for a stepping speed of 1 m/s or 1.67 body lengths/s. A screen capture of a step sequence is shown in **Figure 8** (See Supplementary Material Video 1).

The average MN activations, muscle tensions, and joint kinematics for 38 right and left steps can be seen in **Figure 9**. Average extensor MN activations have peaks that are within 10% of intended magnitude, while flexor activity peaks are lower. Relative timing between the joints is as expected, with hip, knee, and ankle flexors transitioning to swing at about the same time, and knee and ankle extensors activating mid-swing before the hip extensors at the beginning of stance. When comparing averaged activity, overall activity is more spread out than the desired activations; however, activity during single steps shows sharp transitions and distinct off periods, as can be seen in **Figure 10**.

Sensory signals produce adaptive motions by changing step timing. The transition from swing to stance occurs with increasing Hip Extensor II feedback (**Figure 10**, column one, solid arrow). The transition from stance to swing occurs with increasing Hip Flexor II feedback and a drop in Ankle Extensor Ib feedback (**Figure 10**, column one, dashed arrow). These sensory changes cause the CPGs to rapidly change phase between extension and flexion. The CPG change produces a corresponding rapid change in MN voltage and change in motion. These transitions vary in timing depending on the voltage values and rate of change for sensory feedback neurons (**Figure 10**, column 2).

Afferent feedback also shapes MN activation. During walking, the contribution to MN output from the CPGs drops over time due to the decreased level of activity in the CPG neurons. However, the desired MN activation at the end of swing and stance increases over time for the hip muscles (**Figure 7**). The synthetic neural controller achieves this with local hip extensor and flexor Ib excitatory feedback pathways, as is seen in row three of **Figure 10**. This activation is even more pronounced in the robot than was calculated with inverse dynamics or predicted by the offline training and neural simulation.

Comparisons between the right and left leg show activations and joint angles with similar shapes and peak amplitudes within a few percent of each other, except with a small phase delay (**Figure 11**).

# 4. DISCUSSION

The robotic system demonstrated here shows the sufficiency of the known neural system for timing joints and producing the necessary kinematic motions. Our work reaffirms the work by Klein and Lewis (2012) that dynamic neural systems are effective tools for controlling dynamic walking systems. Our work expands upon this by implementing a more detailed model of intra-leg sensory pathways and demonstrates that the proposed mechanisms are effective for regulating stance and swing timing, as well as muscle force production for forward walking by adapting each step individually. Additionally, our work demonstrates a network controller that can produce locomotion at faster speeds and with less external support than this previous work.

FIGURE 7 | Trained network output MN activity compared with desired motor neuron activations. This output is simulated using expected feedback and is not the actual MN output of the walking robot. The transitions between stance and swing phases are close to the desired transition point of the step cycle and most activation curves follow within 10% of the magnitudes for the activation values calculated with inverse dynamics.

Our work also demonstrates the larger applicability of the parameter value setting method first presented in Hunt et al. (2015b). This method was first developed for setting locomotion parameter values in a simulation model of a rat actuated by a Hill muscle model. Compared with the dog robot, the rat simulation has a different kinematic configuration, different stepping frequency, different actuators, and different torque demands. Despite all these differences, the same method is effective for setting parameter values in the rat simulation and the dog robot.

The method for setting parameter values in the stepping network presented here significantly reduces time to application in two ways. First, by having an autonomous method for setting parameter values, the computer is able to remove the guesswork involved and evaluate possible parameter values at a much faster speed than a human. Second, by eliminating the need for physics-based simulations or hardware, the method is able to iterate through possible parameter value choices several orders of magnitude faster than with a simulation or hardware in the loop. This method works by evaluating the network with expected sensory feedback, assuming locomotion speed, kinematics, and forces are occurring as designed.

Despite differences in sensory signals that occur when the robot actually walks vs. those that were predicted, the simulated neural system maintains effective control of locomotion. We believe this is the case because of the robust design of the locomotory circuit combined with the stable design of the legs. The central pattern generator ensures that stepping remains continuous despite deviations in sensory signals. Additionally, the sensory feedback pathways are able to adapt the locomotion steps and maintain stability while there are variations in stepping behavior. This confirms the effectiveness of the neural organization and different sensory signals and pathways implemented in our neural model for rhythm generation (Zhong et al., 2012), joint coordination (McVea et al., 2005; Akay et al., 2006; Pearson, 2008; Akay et al., 2014), leg coordination (Jankowska, 2008; Rybak et al., 2013; Talpalar et al., 2013), and motor neuron activity regulation (Jankowska, 1992; Prochazka et al., 1997; Zhong et al., 2012). The Ib and Ia feedback pathways that modulate motor neuron output add significant control to the robot. Positive Ib feedback adds additional MN activation when load is encountered on a muscle, enabling it to pull harder to overcome obstacles. In terms of walking, this means pushing harder on the ground if the stance leg is in a position where the muscles have low mechanical advantage. Negative Ia feedback reduces MN activation when the joint is moving too quickly, slowing down stance or swing.
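The qualitative effect of the two feedback pathways can be illustrated with a minimal rate-model sketch in Python. This is not the authors' neuron model: the function `mn_activation`, the gains `k_ib` and `k_ia`, the use of joint speed as a stand-in for the Ia signal, and the clipping to the range 0 to 1 are all hypothetical choices made only for illustration.

```python
def mn_activation(cpg_drive, load, velocity, k_ib=0.5, k_ia=0.2):
    """Hypothetical motor neuron output.

    cpg_drive : baseline drive from the pattern generator
    load      : muscle load signal (Ib-like), adds activation (positive feedback)
    velocity  : joint speed signal (Ia-like proxy), removes activation (negative feedback)
    The gains k_ib and k_ia are illustrative values, not taken from the paper.
    """
    raw = cpg_drive + k_ib * load - k_ia * abs(velocity)
    return min(1.0, max(0.0, raw))  # keep the activation in [0, 1]

baseline = mn_activation(0.4, load=0.0, velocity=0.0)
loaded = mn_activation(0.4, load=0.6, velocity=0.0)   # muscle meets resistance
fast = mn_activation(0.4, load=0.0, velocity=1.0)     # joint moves quickly

print(baseline, loaded, fast)
```

With the same CPG drive, the loaded case yields a higher activation than baseline and the fast-moving case a lower one, which is the direction of modulation described above.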

This work also demonstrates a method for determining the required motor neuron activations from desired kinematics and a model of the robot. Though these torques were within 20% of peak torques recorded in the greyhound (another dog of similar limb proportions and body mass to Puppy) (Colborne et al., 2006), the method required the implementation of a PD controller, which can be very sensitive to parameter values. Recent advances in the fields of biology and biomechanics have led to more sophisticated methods for calculating joint torques using both kinematic and dynamic (force) data from the animal itself, leading to interesting implementations of biorobotic systems (Andrada et al., 2013; Karakasiliotis et al., 2016). As this data becomes available for dogs, we can use it to refine the required joint torque output of the robot similar to what we did in the simulation of rat locomotion (Hunt et al., 2015b). However, when this data is not available, e.g., when it has not yet been collected for a particular animal or when a robot has a unique morphology, our work demonstrates the effectiveness of this approach nonetheless.

#### 4.1. Possible Causes of Error

Though there are some observable differences between the animal data and robot motion and control, the presented controller is a starting point for developing further improvements. For example, the Hip Extensor motor neuron activity has significant additional activation early in stance phase, which is a result of training the CPG output synapse to match the highest desired MN activity. This could be improved through the inclusion of additional pathways and different training methods. The training of the output strength could be based on the highest point of initial MN activity, or additional feedback pathways may be required to limit the knee extensor activity during stance.

activation is even more pronounced than predicted by the offline training and neural simulation. Column two shows the same three steps beginning at foot touchdown. Differences in sensory signals provide adaptation and changes in the CPG level transition timing as well as MN activity levels.

All joint peak angles are accurate within 5–15°. The largest errors occur with the hip. Errors in hip peaks are possibly due to the delays in communication between AnimatLab and LabVIEW and the robot. The hip is the only joint to provide feedback on position, and this delay would impact the sensory signal, which causes transitions in the neural system to lag behind the real-time state of the robot. The response of the robot would then be additionally delayed by the returning communication. There is no such delay built into the training of the neural system. In the future, we could simulate such a delay in our training method, or improve the bandwidth between the robot and the neural controller.
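How a fixed communication delay shifts a feedback-triggered transition can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not part of the authors' training method: the ramp-shaped hip signal, the threshold of 0.95, and the delay of three samples are all hypothetical.

```python
from collections import deque

def delayed(signal, delay_steps, initial=0.0):
    """Return the signal as seen through a fixed transport delay of delay_steps samples."""
    buffer = deque([initial] * delay_steps)
    seen = []
    for sample in signal:
        buffer.append(sample)          # newest sample enters the channel
        seen.append(buffer.popleft())  # oldest sample reaches the controller
    return seen

def first_crossing(signal, threshold):
    """Index of the first sample at or above the threshold, or None if never reached."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i
    return None

hip_angle = [0.1 * i for i in range(20)]              # hypothetical rising hip signal
seen_by_controller = delayed(hip_angle, delay_steps=3)

t_true = first_crossing(hip_angle, 0.95)              # when the leg really crosses
t_seen = first_crossing(seen_by_controller, 0.95)     # when the controller notices
print(t_true, t_seen)
```

Feeding a delayed copy of the expected feedback into the offline training in this manner would be one way to expose the network to the lag it encounters on the hardware.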

Observations of individual step data reveal larger variations occurring on a step by step basis with sharper transitions and higher peak heights in MN activity than is noticeable in the average data. This indicates that the neural system is adapting the stride and adjusting its control continuously. This also shows the adverse effects of working with data that is averaged from multiple steps. Though averaged data shows important information, it does not depict the whole picture where individual variety and adaptation play an important role in locomotion.
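The smoothing effect of averaging can be reproduced with idealized data. In the Python sketch below, every single step has a perfectly sharp on/off transition, but the transition time varies from step to step; the transition indices and the step-shaped profile are hypothetical and serve only to illustrate the point.

```python
def step_profile(transition_index, n_samples=100):
    """Idealized single-step MN activity: off, then fully on after the transition."""
    return [0.0 if i < transition_index else 1.0 for i in range(n_samples)]

def max_jump(profile):
    """Largest change between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))

transitions = [40, 45, 50, 55, 60]   # hypothetical step-to-step timing variation
steps = [step_profile(t) for t in transitions]

# Sample-by-sample average over all steps
average = [sum(column) / len(steps) for column in zip(*steps)]

print("sharpest change in a single step:", max_jump(steps[0]))
print("sharpest change in the average:  ", round(max_jump(average), 3))
```

Each individual step switches fully in one sample, whereas the average rises in small increments spread over the whole range of transition times, so the averaged trace understates how abrupt the transitions are.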

Another product of using the averaged data is potentially incomplete training of the sensory feedback, both in setting thresholds and in setting strengths of local Ia and Ib feedback parameter values. Though sensory feedback could be modulated by thresholds in the animal, the thresholds were not trained in our work because we used a single feedback signal without noise. While training, the reliance on this expected input caused the system to become overly dependent on exact threshold points, and small changes in feedback strength produced significant effects on behavior. Additionally, there is not enough available data on how intra-joint Ia and Ib pathways affect walking to properly train and set these weights off-line. Intra-joint feedback is likely instrumental in changing force production on a step-by-step basis, and training these pathways using average data may never be sufficient for adaptive, animal-like walking.

Puppy's gait was asymmetrical, and one possible explanation for the asymmetry could be differences in ankle motion. It is noted that the left ankle maintains a more flexed position than the right, especially during stance. This difference could be a result of a problem in the robot controller at the low level, turning the MN activations into actuator pressures in an uneven manner. Another explanation is that the controller is such that when a phase delay occurs, it continues to occur based on the overall kinematics and dynamics of the system. This could be determined through more extensive testing of the robot in different initial conditions and determining if the lag always occurs on the same side of the robot.

#### 4.2. Future Work

Future work in controller development will be explored in several areas. First, we will improve our training method in two ways. In the first, we will perform optimization on a physics-based simulation or the walking robot. The neural system could then be trained to provide greater stability and/or matching of animal kinematics. This would enable the system to learn low-level feedback pathways that are able to make the subtle corrections necessary for the simulation to produce repetitive, self-supporting walking that more closely matches that of the animal. The second method would require more animal data, using kinematics and dynamics for a series of steps in the training. These series would have different motor neuron profiles for each step, and the optimizer could adjust the feedback pathways to better match the step-by-step information, and not just the averaged data.

Though the developed controller is able to produce walking with only feedback from muscles, animals take advantage of significantly more sensors while walking. Walking can be made more robust and able to handle more diverse situations such as large perturbations or obstacle avoidance by adding more sensors to the control system. Currently, Puppy is equipped with sensors on the bottom of the feet, which are able to sense ground contact and force in a single direction. Inclusion of these sensors in the walking control system would add redundancy to ground detection and would likely result in more stable behaviors. These could act as ground contact sensors, similar to those used in Klein and Lewis (2012).

We are also in the process of redesigning the front legs to more accurately reflect the anatomy of the dog (Fischer and Blickhan, 2006). Upon completion of the front legs, we will be able to apply the same training process to produce forward walking in the front legs, and then begin to explore processes which affect inter-leg coordination similar to the work performed in simulation in Hunt et al. (2015a). By working with the physical robot, we will be able to more accurately observe the roles that mechanical interactions play in inter-leg coordination.

This robot and other such biorobots controlled by synthetic nervous systems offer advantages for further researching neural control of locomotion and movement. With our robot, we will be able to test more detailed neurological models of locomotion by replicating experiments which explore how the elimination of different sensory signals can cause specific effects in locomotion. For example, we can adjust the relative strengths of inter-leg pathways similar to those performed in Talpalar et al. (2013), and observe if similar hopping motions result. Additionally, we can perform experiments which attempt to mimic diseases and their effect on the nervous system. We can then perform experiments in the robot, observe the effects on locomotion, and use the results to inform better models of the disease. We can additionally perform a variety of interventions to overcome deficits caused by the disease without risk to an animal.

# 5. CONCLUSION

This manuscript presents a robot controlled by a synthetic nervous system built from the known connectivity of mammalian locomotor systems. We demonstrate that the neural controller effectively adapts the robot's stepping on a step-by-step basis and maintains rhythmic walking. This research platform, consisting of the robot, its hardware control system, and its synthetic nervous system, will serve as a useful launching point for studying more complex behaviors as well as the role of different sensory signals in locomotion. The computational method for setting parameter values in a synthetic nervous system based on desired behavior is also presented. This method is significantly faster and more reliable than manual tuning, and has been effective for both a rat simulation and the Puppy robot described here. We believe that the method presented here will prove useful to other researchers attempting to explore the use of neural controllers for other simulated models and robotic systems.

#### AUTHOR CONTRIBUTIONS

AH: Developed the control system layout and training methods. Performed data collection and analysis. Drafted the manuscript. NS: Performed detailed CPG analysis and developed the software algorithms for performing training. Provided significant revising of the manuscript. RQ: Provided robot support and control development guidance. Provided revisions for important intellectual content.

#### FUNDING

This work was supported by DARPA M3 Grant DI-MISC-81612A and by the NASA Office of the Chief Technologist, Grant Number NNX12AN24H.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00018/full#supplementary-material

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Hunt, Szczecinski and Quinn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Neurocomputational Model of Goal-Directed Navigation in Insect-Inspired Artificial Agents

Dennis Goldschmidt<sup>1, 2\*</sup>, Poramate Manoonpong<sup>3</sup> and Sakyasingha Dasgupta<sup>4, 5</sup>

<sup>1</sup> Bernstein Center for Computational Neuroscience, Third Institute of Physics – Biophysics, Georg-August University, Göttingen, Germany, <sup>2</sup> Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal, <sup>3</sup> Embodied AI and Neurorobotics Lab, Centre of BioRobotics, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark, <sup>4</sup> IBM Research, Tokyo, Japan, <sup>5</sup> Riken Brain Science Institute, Saitama, Japan

Despite their small size, insect brains are able to produce robust and efficient navigation in complex environments. Specifically in social insects, such as ants and bees, these navigational capabilities are guided by orientation directing vectors generated by a process called path integration. During this process, they integrate compass and odometric cues to estimate their current location as a vector, called the home vector for guiding them back home on a straight path. They further acquire and retrieve path integration-based vector memories globally to the nest or based on visual landmarks. Although existing computational models reproduced similar behaviors, a neurocomputational model of vector navigation including the acquisition of vector representations has not been described before. Here we present a model of neural mechanisms in a modular closed-loop control—enabling vector navigation in artificial agents. The model consists of a path integration mechanism, reward-modulated global learning, random search, and action selection. The path integration mechanism integrates compass and odometric cues to compute a vectorial representation of the agent's current location as neural activity patterns in circular arrays. A reward-modulated learning rule enables the acquisition of vector memories by associating the local food reward with the path integration state. A motor output is computed based on the combination of vector memories and random exploration. In simulation, we show that the neural mechanisms enable robust homing and localization, even in the presence of external sensory noise. The proposed learning rules lead to goal-directed navigation and route formation performed under realistic conditions. Consequently, we provide a novel approach for vector learning and navigation in a simulated, situated agent linking behavioral observations to their possible underlying neural substrates.

Keywords: path integration, artificial intelligence, insect navigation, neural networks, reward-based learning

#### Edited by:

Mehdi Khamassi, Université Pierre et Marie Curie, France

#### Reviewed by:

Nicolas Cuperlier, Université de Cergy-Pontoise, France; Andrew Philippides, University of Sussex, UK

#### \*Correspondence:

Dennis Goldschmidt, dennis.goldschmidt@neuro.fchampalimaud.org

Received: 13 December 2016; Accepted: 24 March 2017; Published: 12 April 2017

#### Citation:

Goldschmidt D, Manoonpong P and Dasgupta S (2017) A Neurocomputational Model of Goal-Directed Navigation in Insect-Inspired Artificial Agents. Front. Neurorobot. 11:20. doi: 10.3389/fnbot.2017.00020

# 1. INTRODUCTION

Social insects, including ants and bees, have evolved remarkable behavioral capabilities for navigating in complex dynamic environments, which enable them to survive by finding vital locations (e.g., food sources). For example, desert ants are able to forage and find small, sparsely distributed food items in a featureless environment, and form stereotyped and efficient routes between their nest and reliable food sources (Collett, 2012; Mangan and Webb, 2012; Collett and Cardé, 2014; Cheng et al., 2014). These navigational behaviors not only rely on sensory information, mainly from visual cues, but also on internal memories acquired through learning mechanisms (Collett et al., 2013). Such learned memories have been shown to be based on orientation directing vectors, which are generated by a process called path integration (PI) (Wehner, 2003).

#### 1.1. Vector Navigation in Social Insects

In PI, animals integrate angular and linear ego-motion cues over time to produce an estimate of their current location with respect to their starting point. This vector representation is called the home vector (HV) and is used by social insects to return home on a straight path. Many animals have been shown to apply PI, including vertebrate (Etienne and Jeffery, 2004) and invertebrate species (Srinivasan, 2015). While PI has mainly been observed in homing behavior, it can also serve as a scaffold for spatial learning of food sources (Collett et al., 1999, 2013). Indeed, experiments have shown that desert ants are capable of forming such memories by using their path integrator (Schmid-Hempel, 1984; Collett et al., 1999). Such memory is interpreted as a so-called global vector (GV), because the vector origin is fixed to the nest (Collett et al., 1998). If the ant is forced to take a detour during a foraging trip, the deviation from the GV is compensated by comparing the GV with the current PI state (Collett et al., 1999). Another example of vector memory is the waggle dance of honeybees (De Marco and Menzel, 2005; Menzel et al., 2005), in which the distance and direction to a goal are encoded by the duration and direction of the dance, respectively. After returning from a successful foraging run, insects re-apply this vector information in subsequent foraging runs (Capaldi et al., 2000; Wolf et al., 2012; Fernandes et al., 2015).
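In its simplest geometric form, PI amounts to dead reckoning: each movement step is added to a running position estimate, and the direction home is the inverse of that estimate. The Python sketch below illustrates only this geometric idea under idealized, noise-free assumptions; the function names and the Cartesian bookkeeping are hypothetical and are not the neural mechanism proposed in this paper.

```python
import math

def integrate_path(headings_rad, speeds, dt=1.0):
    """Accumulate heading and speed samples into a position estimate (x, y)
    relative to the starting point (the nest)."""
    x = y = 0.0
    for phi, v in zip(headings_rad, speeds):
        x += v * math.cos(phi) * dt
        y += v * math.sin(phi) * dt
    return x, y

def homing_command(position):
    """Distance to the nest and the direction that points straight back to it."""
    x, y = position
    return math.hypot(x, y), math.atan2(-y, -x)

# Hypothetical outbound path: three steps east, then four steps north
headings = [0.0] * 3 + [math.pi / 2] * 4
speeds = [1.0] * 7

position = integrate_path(headings, speeds)
distance, direction = homing_command(position)
print(f"distance home: {distance:.2f}, heading home: {math.degrees(direction):.1f} deg")
```

Although the outbound path has two legs, the agent can return along a single straight segment, which is the behavior the home vector supports.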

Although PI plays a key role in navigating through environments where visual cues, such as landmarks, are abundant, it also influences navigational behaviors in cluttered environments (Bühlmann et al., 2011). If an ant follows a learned GV repeatedly, it learns the heading directions at local landmarks along the path (Collett and Collett, 2009). These heading directions are view-based from the visual panorama surrounding the ant (Graham and Cheng, 2009; Narendra et al., 2013), and vector-based with additional information about the path segment length (Collett and Collett, 2009, 2015). The latter vector memories are also termed local vectors, because their retrieval is linked to local landmarks instead of global location with respect to the nest (Collett et al., 1998). Besides spatial learning of locations and routes, searching patterns of desert ants have also been shown to be influenced by PI (Bolek and Wolf, 2015; Pfeffer et al., 2015).

#### 1.2. Neural Substrates of Social Insect Navigation

Neural substrates of social insect navigation have yet to be completely identified, but previous findings of neural representations of compass cues and visual sceneries may provide essential information about how PI and vector learning are achieved in neural systems (Duer et al., 2015; Plath and Barron, 2015; Seelig and Jayaraman, 2015; Weir and Dickinson, 2015). In particular, neurons in the central complex, a protocerebral neuropil in the insect brain, have been shown to be involved in visually guided navigation.

The main sensory cue for PI in social insects is derived from the linear polarization of scattered sunlight (Homberg et al., 2011; Lebhardt et al., 2012; Evangelista et al., 2014). Specialized photoreceptors in the outer dorsal part of the insect eye detect certain orientations of linear polarization, which depend on the azimuthal position of the sun. A distinct neural pathway processes polarization-derived signals leading to neurons in the central complex, which encode azimuthal directions of the sun (Heinze and Homberg, 2007). In a recent study, Seelig and Jayaraman (2015) placed a fruit fly tethered on a track ball setup in a virtual environment and measured the activity of neurons in the central complex. They demonstrated that certain neurons in the ellipsoid body, which is a toroidal subset in the central complex, encode for the animal's body orientation based on visual landmarks and angular self-motion. When both visual and self-motion cues are absent, this representation is maintained through persistent activity, which is a potential neural substrate for short-term memory in insects (Dubnau and Chiang, 2013). A similar neural code of orientations has been found in the rat limbic system (Taube et al., 1990). These so-called head direction (HD) cells are derived from motor and vestibular sensory information by integrating head movements through space. Thus, neural substrates of allothetic compass cues have been found in both invertebrate and vertebrate species. These cues provide input signals for a potential PI mechanism based on the accumulation of azimuthal directions of the moving animal as previously proposed by Kubie and Fenton (2009).

#### 1.3. Computational Models of Vector-Guided Navigation

Because spatial navigation is a central task of biological as well as artificial agents, many studies have focused on computational modeling of such behavioral capabilities (see Madl et al., 2015 for review). Computational modeling has been successful in exploring the link between neural structures and their behavioral function, including learning (Bienenstock et al., 1982; Oja, 1982), perception (Salinas and Abbott, 1995; Olshausen and Field, 1997), and motor control (Todorov and Jordan, 2002). It allows for hypotheses about the underlying mechanisms to be defined precisely and their generated behavior can be examined and validated qualitatively and quantitatively with respect to experimental data.

Most models of PI favor a particular coordinate system (Cartesian or polar) and reference frame (geo- or egocentric) to perform PI based on theoretical and biological arguments (Vickerstaff and Cheung, 2010). While some models (Müller and Wehner, 1988; Hartmann and Wehner, 1995) include behavioral data from navigating animals in order to argue for their proposed PI method, others (Wittmann and Schwegler, 1995; Haferlach et al., 2007; Kim and Lee, 2011) have applied neural network models to investigate possible memory mechanisms for PI. Despite the wide variety of models, only a few of these models have been implemented on embodied artificial agents (Schmolke et al., 2002; Haferlach et al., 2007) and in foraging tasks similar to the ones faced by animals in terms of distance and tortuosity of paths (Lambrinos et al., 1997, 2000). Furthermore, while some vertebrate-inspired models (Gaussier et al., 2000; Jauffret et al., 2015) offer underlying spatial learning mechanisms based on place and view cells, many insect-inspired models have not linked PI and navigational capabilities to spatial learning and memory. A notable exception is a recent model based on the Drosophila brain, which shows impressive results in generating adaptive behaviors in an autonomous agent, including exploration, visual landmark learning, and homing (Arena et al., 2014). However, the model has not been explicitly shown to be scalable for long-distance central-place foraging as observed in social insects.

Kubie and Fenton (2009) proposed a PI model based on the summation of path segments with HD accumulator cells, which are individually tuned to different HDs and hypothesized to encode how far the animal traveled in this direction. These summated path vectors are then stored in a fixed memory structure called the shortcut matrix, which is used for navigating toward goals. Although this model is based on HD cells and is therefore presented as a model of mammalian navigation, recent findings in Drosophila melanogaster (Seelig and Jayaraman, 2015) demonstrate that similar HD accumulator cells can also be hypothesized for insect navigation. Similar HD accumulator models have been applied for chemo-visual robotic navigation (Mathews et al., 2009) and PI-based homing behavior (Kim and Lee, 2011).
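The accumulator idea can be made concrete with a small Python sketch. It is a simplified illustration and not the implementation of Kubie and Fenton (2009): the number of cells, the rectified cosine tuning, and the population-vector readout are all assumptions introduced here, and the shortcut matrix is not modeled.

```python
import math

N_CELLS = 8  # hypothetical number of direction-tuned accumulator cells
PREFERRED = [2 * math.pi * i / N_CELLS for i in range(N_CELLS)]

def accumulate(headings_rad, speeds, dt=1.0):
    """Each cell sums the distance traveled, weighted by how well the current
    heading matches its preferred direction (rectified cosine tuning)."""
    cells = [0.0] * N_CELLS
    for phi, v in zip(headings_rad, speeds):
        for i, pref in enumerate(PREFERRED):
            cells[i] += v * dt * max(0.0, math.cos(phi - pref))
    return cells

def decode(cells):
    """Population-vector readout of the accumulated path vector.
    The factor 4 / N_CELLS compensates for the overlap of rectified cosine
    tuning curves when an even number (>= 4) of cells is evenly spaced."""
    scale = 4.0 / N_CELLS
    x = scale * sum(a * math.cos(p) for a, p in zip(cells, PREFERRED))
    y = scale * sum(a * math.sin(p) for a, p in zip(cells, PREFERRED))
    return math.hypot(x, y), math.atan2(y, x)

# Hypothetical path: three steps east, then four steps north
headings = [0.0] * 3 + [math.pi / 2] * 4
speeds = [1.0] * 7

cells = accumulate(headings, speeds)
distance, direction = decode(cells)
print(f"path vector: length {distance:.2f}, angle {math.degrees(direction):.1f} deg")
```

The cell activities never store coordinates explicitly, yet the summed path vector can be recovered from the population as a whole.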

Cruse and Wehner (2011) presented a decentralized memory model of insect vector navigation to demonstrate that the observed navigational capabilities do not require a map-like memory representation. Their model is a cybernetic network structure, which mainly consists of a PI system, multiple memory banks and internal motivational states that control the steering angle of a simulated point agent. The PI system provides the position of the agent given by Euclidean coordinates, which are stored as discrete vector memories when the agent finds a food location. To our knowledge, this model is the first and only modeling approach which accounts for behavioral aspects of insect vector navigation. However, although they introduce a learning rule for so-called quality values of stored vectors in a more recent version of the model (Hoinville et al., 2012), their model does not account for how the navigation vectors are represented and learned in a neural implementation.

#### 1.4. Our Approach

Inspired by these findings, in this paper, we present a novel model framework for PI and adaptive vector navigation as observed in social insects. The framework is applied as closed-loop control to an artificial agent and consists of four functional subparts: (1) a neural PI mechanism, (2) a reward-modulated learning rule for vector memories, (3) random search, and (4) an adaptive action selection mechanism. Here, the artificial agent primarily enables us to provide the necessary physical embodiment (Webb, 1995) in order to test the efficacy of our adaptive navigation mechanism, without a detailed reverse engineering of the insect brain.

Based on population-coded heading directions in circular arrays, we apply PI by accumulating speed-modulated HD signals through a self-recurrent loop. The final home vector representation is computed by local excitation-lateral inhibition connections, which project accumulated heading directions onto the array of output neurons. The activity of these neurons encodes the vector angle as the position of maximum firing in the array, and the vector length as the amplitude of the maximum firing rate in the array. The self-localization ability of PI allows social insects to learn spatial representations for navigation (Collett et al., 1999). We design a reward-modulated associative learning rule (Smith et al., 2008; Cassenaer and Laurent, 2012; Hige et al., 2015) to learn vector representations based on PI. This vector, called the global vector, connects the nest to a rewarding food location. Vectors are learned by associating the PI state and a reward received at the food location given a context-dependent state. This association induces weight changes in plastic synapses connecting the context-dependent unit to a circular array of neurons, which represents the vector. The context-dependent unit activates the vector representation in the array, and therefore represents a motivational state for goal-directed foraging. Using the vector learning rule, the agent is able to learn rewarding locations and demonstrate goal-directed navigation. Because of the vector addition of global and inverted home vector in the action selection mechanism, it can compensate for unexpected detours from the original trajectory, such as obstacles (Collett et al., 1999, 2001).
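The array code described above, with the angle read from the position of the maximum and the length from its amplitude, together with the addition of the global and inverted home vector, can be sketched in Python as follows. This is an illustrative assumption rather than the model's actual equations: the array size, the signed cosine activity profile (chosen so that arrays add like vectors), and the example vectors are all hypothetical.

```python
import math

N = 36  # hypothetical number of neurons per circular array (10 degree spacing)
PREFERRED = [2 * math.pi * i / N for i in range(N)]

def encode(length, angle):
    """Cosine-shaped activity pattern: peak position = angle, peak height = length.
    Signed activities are allowed here purely to keep the arithmetic linear."""
    return [length * math.cos(angle - p) for p in PREFERRED]

def decode(activity):
    """Angle = preferred direction of the most active neuron, length = its activity."""
    peak = max(range(N), key=lambda i: activity[i])
    return activity[peak], PREFERRED[peak]

global_vector = encode(10.0, math.radians(90))  # nest to feeder (learned memory)
home_vector = encode(5.0, math.radians(0))      # nest to agent (current PI state)

# Global vector plus inverted home vector: points from the agent to the feeder
to_feeder = [g - h for g, h in zip(global_vector, home_vector)]
length, angle = decode(to_feeder)
print(f"steer toward {math.degrees(angle):.0f} deg, about {length:.1f} units away")
```

Because the readout is limited to the preferred directions of the array, the decoded angle is accurate only to within half the spacing between neurons; if the agent is displaced by a detour, the home vector array changes and the same subtraction yields an updated course to the feeder.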

Taken together, our model is a novel framework for generating and examining social insect navigation based on PI and vector representations. It is based on plausible neural mechanisms, which are related to neurobiological findings in the insect central complex. Therefore, we provide a computational approach for linking behavioral observations to their possible underlying neural substrates. In the next section, we will describe the proposed model for reward-modulated vector learning and navigation. The results section will provide detailed descriptions of our experimental setups and simulation results. Finally, conclusions and implications of our model with respect to behavioral and neurobiological studies are discussed in Section 4.

# 2. MATERIALS AND METHODS

In this paper we propose an insect-inspired model of vector-guided navigation in artificial agents using modular closed-loop control. The model (see **Figure 1A**) consists of four parts: (1) a neural PI mechanism, (2) plastic neural circuits for reward-based learning of vector memories, (3) random search, and (4) action selection. The neural mechanisms in our model receive multimodal sensory inputs from exteroceptive and proprioceptive sensors to produce a directional signal based on a vector (see **Figure 1B**). This vector is represented by the activity of circular arrays, where the position of the maximum indicates its direction and the amplitude at this position indicates its length. We evaluate our model in simulation using a two-dimensional point agent as well as a hexapod walking robot (see Supplementary Material for details).

FIGURE 1 | Schematic diagram of the modular closed-loop control for vector navigation. (A) The model consists of a neural path integration (PI) mechanism (1), reward-modulated vector learning (2), random search (3), and action selection (4). Vector information for guiding navigation is computed and represented in the activity of circular arrays. The home vector (HV) array is the output of the PI mechanism and is applied for homing behavior and as a scaffold for global vector (GV) learning. These three vector representations and random search are integrated through an adaptive action selection mechanism, which produces the steering command to the CPG-based locomotion control. (B) Spatial representation of the different vectors used for navigation. The HV is computed by PI and gives an estimate for the current location of the agent. In general, GVs connect the nest to a rewarding location. Using vector addition, the agent is able to compute, how to orient from its current location toward the feeder.

#### 2.1. Path Integration (PI) Mechanism for Home Vector (HV) Representation

The PI mechanism (**Figure 2**) is a multilayered neural network consisting of circular arrays, where the final layer's activity pattern represents the HV. Neural activities of the circular arrays represent population-coded compass information and rate-coded linear displacements. Incoming signals are sustained through leaky neural integrator circuits, and they compute the HV by local excitatory-lateral inhibitory interactions.

#### **A) Sensory inputs**

The PI mechanism receives angular and linear cues as sensory inputs. As in social insects, angular cues are derived from allothetic compass cues. We employ a compass sensor which measures the angle φ of the agent's orientation. In insects, this information is derived from the combination of sun- and skylight compass information (Wehner, 2003). In desert ants, it has been found that linear cues are derived from the strides taken by the animal during the journey (Wittlinger et al., 2006, 2007). For our model, we assume that such odometry is translated into an estimate of the animal's walking speed. For the embodied agent employed here (i.e., a hexapod robot), the walking speed is computed by accumulating steps and averaging over a certain time window. These step counting signals are derived from the motor signals. The input signals for the angular component φ and the linear component s have value ranges of

$$
\phi \in [0, 2\pi),
\tag{1}
$$

$$s \in [0, 1]. \tag{2}$$

FIGURE 2 | Multilayered neural network of the proposed path integration (PI) mechanism. (A) Sensory inputs from a compass sensor (φ) and odometer (s) are provided to the mechanism. (B) Neurons in the head direction (HD) layer encode the sensory input from a compass sensor using a cosine response function. Each neuron encodes a particular preferred direction, and together the neurons cover the full range of 2π. Note that the figure depicts only six neurons for simplicity. (C) An odometric sensory signal (i.e., walking speed) is used to modulate the HD signals. (D) The memory layer accumulates the signals by self-recurrent connections. (E) Cosine weight kernels decode the accumulated directions to compute the output activity representing the home vector (HV). (F) The difference between the HV angle and current heading angle is used to compute the homing signal (see Equation 11).

#### **B) Head direction layer**

The first layer of the neural network is composed of HD cells with activation functions

$$x_i^{\mathrm{HD}}(\phi(t)) = \cos(\phi(t) - \phi_i), \tag{3}$$

$$\phi_i = \frac{2\pi i}{N}, \ i \in [0, N-1], \tag{4}$$

where the compass signal φ(t) is encoded by a cosine response function with N preferred directions φ<sub>i</sub> ∈ [0, 2π). The resolution is determined by Δφ = 2π/N, and the coarse encoding of variables, here angles, by cosine responses allows for high accuracy and optimized information transfer (Eurich and Schwegler, 1997). Coarse coding has been shown to be present in different types of sensory processing in the insect brain, including olfactory (Friedrich and Stopfer, 2001) and visual processing (Wystrach et al., 2014). Furthermore, it has been shown that polarization-sensitive neurons in the anterior optic tubercle of locusts exhibit broad and sinusoidal tuning curves of 90–120° (Heinze et al., 2009; Heinze and Homberg, 2009; el Jundi and Homberg, 2012). Head-direction cells in the central complex of Drosophila melanogaster were shown to have activity bump widths of 80–90° (Seelig and Jayaraman, 2015). However, these measurements are based on calcium imaging data, which only approximate the neuron's firing rate.
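As a minimal sketch of Equations 3 and 4, the following Python snippet computes the cosine-tuned activities of the HD layer. The function name `hd_layer` and the default of N = 18 neurons (the array size used in the results) are illustrative choices, not the authors' implementation.

```python
import math

def hd_layer(phi, n=18):
    """Cosine-tuned head-direction activities (Equations 3 and 4)."""
    preferred = [2.0 * math.pi * i / n for i in range(n)]
    return [math.cos(phi - phi_i) for phi_i in preferred]

# Heading aligned with the preferred direction of neuron 3 (60 degrees for N = 18).
heading = 2.0 * math.pi * 3 / 18
activities = hd_layer(heading)
peak = max(range(len(activities)), key=lambda i: activities[i])
print(peak, round(activities[peak], 3))
```

The neuron whose preferred direction matches the heading responds most strongly, and its neighbors respond in proportion to the cosine of their angular offset, which is the coarse coding described above.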

#### **C) Odometric modulation of head direction signals**

The second layer acts as a gating mechanism (G), which modulates the neural activity using the odometry signal s ∈ [0, 1]. Its activity therefore encodes the distance traveled by the agent. The gating layer units decrease the HD activities by a constant bias of 1, so that the maximum activity is equal to zero. A positive speed increases the signal linearly. The gating activity is defined as follows:

$$x_i^{G}(t) = f\left(\sum_{j=0}^{N-1} \delta_{ij} x_j^{\mathrm{HD}}(t) - 1 + s\right), \tag{5}$$

$$f(x) = \max(0, x), \tag{6}$$

where δij is the Kronecker delta, i.e., first layer neurons j and second layer neurons i are connected one-to-one. Forward speed signals have been found in the central complex of walking cockroaches (Martin et al., 2015).
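The gating of Equations 5 and 6 can be sketched as follows. The helper names `rectify` and `gating_layer`, the heading of 0 rad, and the speed values are assumptions made for illustration only.

```python
import math

def rectify(x):
    """Equation 6: f(x) = max(0, x)."""
    return max(0.0, x)

def gating_layer(x_hd, s):
    """Equation 5 with one-to-one connections (the Kronecker delta)."""
    return [rectify(x - 1.0 + s) for x in x_hd]

n = 18
x_hd = [math.cos(0.0 - 2.0 * math.pi * i / n) for i in range(n)]  # heading 0 rad

at_rest = gating_layer(x_hd, 0.0)   # standing still
walking = gating_layer(x_hd, 0.5)   # half of the maximum speed
active = [i for i, x in enumerate(walking) if x > 1e-9]
print(max(at_rest), active)
```

With s = 0 the bias of 1 silences the whole layer, so nothing is accumulated while the agent stands still. With s = 0.5 only the neurons whose preferred direction lies within 60° of the heading pass a signal.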

#### **D) Memory layer**

The third layer is the so-called memory layer (M), where the speed-modulated HD activations are temporally accumulated through self-excitatory connections:

$$x_i^{M}(t) = f\left(\sum_{j=0}^{N-1} \delta_{ij} x_j^{G}(t) + (1 - \lambda)\, x_i^{M}(t - \Delta t)\right), \tag{7}$$

where λ is a positive constant defined as the integrator leak rate, which indicates the loss of information over time. A leaky integrator has previously been applied by Vickerstaff (2007) to explain systematic errors in homing of desert ants (Müller and Wehner, 1988). If the leak rate is equal to zero, the accumulation of incoming directional signals is unbounded, which is not biologically plausible. Any path integration system based on linear integration therefore bounds the natural foraging range of the animal in order to exhibit accurate path integration (Burak and Fiete, 2009).
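To illustrate the effect of the leak in Equation 7, the sketch below feeds a constant gated input g into a single memory unit. For a constant input, the fixed point of Equation 7 is x = g/λ, which follows from setting x = g + (1 − λ)x. The function names and the input value 0.1 are hypothetical; the leak rate 0.0075 is the value fitted later in Section 3.1.

```python
def memory_step(x_mem, x_gate, leak):
    """Equation 7: leaky accumulation of the gated HD activities."""
    return [max(0.0, g + (1.0 - leak) * m) for g, m in zip(x_gate, x_mem)]

def accumulate(x_gate, leak, steps):
    """Apply Equation 7 repeatedly, starting from an empty memory."""
    x_mem = [0.0] * len(x_gate)
    for _ in range(steps):
        x_mem = memory_step(x_mem, x_gate, leak)
    return x_mem

constant_input = [0.1]  # a single neuron with constant gated input
no_leak = accumulate(constant_input, leak=0.0, steps=100)[0]
leaky = accumulate(constant_input, leak=0.0075, steps=5000)[0]
print(no_leak, leaky)
```

Without a leak the activity grows linearly with the number of steps, whereas with a leak it saturates near g/λ, so long outward paths are represented with a systematically shortened length.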

#### **E) Decoding layer**

The final and fourth layer decodes the activations from the memory layer to produce a vector representation, i.e., the HV, which serves as the output of the mechanism referred to as PI state:

$$x_i^{\mathrm{PI}}(t) = f\left(\sum_{j=0}^{N-1} w_{ij} x_j^{M}(t)\right), \tag{8}$$

$$w_{ij} = \cos(\phi_i - \phi_j) = \cos\left(\frac{2\pi(i-j)}{N}\right), \tag{9}$$

where w<sub>ij</sub> is a cosine kernel, which projects the memory layer activity of the jth neuron onto the ith neuron's preferred orientation. While a cosine synaptic weight kernel is biologically implausible, it is reasonable to assume that an approximate connectivity could arise from forming local-excitation lateral-inhibition connections (e.g., Mexican-hat connectivity). An example of such a connectivity formed by cell proximity could be the ring architecture of head-direction-selective neurons in the ellipsoid body of the central complex (Seelig and Jayaraman, 2015; Wolff et al., 2015). The resulting HV is encoded by the average position of maximum firing in the array (angle θHV) and the sum of all firing rates of the array (length lHV). We calculate the position of maximum firing using the population vector average given by:

$$\theta_{HV}(t) = \arctan\left(\frac{\sum_{i=0}^{N-1} x_i^{\mathrm{PI}}(t) \sin(2\pi i/N)}{\sum_{i=0}^{N-1} x_i^{\mathrm{PI}}(t) \cos(2\pi i/N)}\right), \tag{10}$$

where the denominator is the x coordinate of the population vector average, and the numerator is the y coordinate. See **Figure 3** for example output activities of the decoding layer neurons.
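The decoding step of Equations 8 to 10 can be sketched as below. Two choices here are assumptions rather than part of the text: `atan2` is used in place of arctan so that the quadrant of the angle is resolved, and the result is wrapped to [0, 2π). The function name `decode_home_vector` and the test memory pattern are hypothetical.

```python
import math

def decode_home_vector(x_mem):
    """Return (x_pi, theta_hv, l_hv) from memory-layer activities."""
    n = len(x_mem)
    prefs = [2.0 * math.pi * i / n for i in range(n)]
    # Equations 8 and 9: cosine weight kernel followed by rectification.
    x_pi = [
        max(0.0, sum(math.cos(p_i - p_j) * m for p_j, m in zip(prefs, x_mem)))
        for p_i in prefs
    ]
    # Equation 10; atan2 replaces arctan to resolve the quadrant (assumption).
    y = sum(a * math.sin(p) for a, p in zip(x_pi, prefs))
    x = sum(a * math.cos(p) for a, p in zip(x_pi, prefs))
    theta_hv = math.atan2(y, x) % (2.0 * math.pi)
    l_hv = sum(x_pi)  # length as the summed firing rate, as stated in the text
    return x_pi, theta_hv, l_hv

# Memory pattern after walking straight along the preferred direction of neuron 10.
n = 18
heading = 2.0 * math.pi * 10 / n
x_mem = [100.0 * max(0.0, math.cos(heading - 2.0 * math.pi * i / n)) for i in range(n)]
_, theta_hv, l_hv = decode_home_vector(x_mem)
print(math.degrees(theta_hv), l_hv)
```

For a straight walk the decoded angle coincides with the walking direction, here 200°, which a plain arctan would have reported as 20°.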

#### **F) Homing signal**

To apply the HV for homing behavior, i.e., returning home on a straight path, the vector is inverted by a 180° rotation. The difference between the heading direction φ and the inverted HV direction θHV − π is used for steering the agent toward home. The agent applies homing by sine error compensation, which defines the motor command:

$$m_{HV}(t) = l_{HV}(t)\sin\left(\theta_{HV}(t) - \phi(t) - \pi\right). \tag{11}$$

This leads to right (mHV < 0) and left turns (mHV > 0) for negative and positive differences, respectively, thereby decreasing the net error at each step. The underlying dynamical behavior of this sine error compensation is defined by a stable and an unstable fixed point (see Supplementary Material). This leads to dense searching behavior around a desired position, where the error changes rapidly (Vickerstaff and Cheung, 2010).
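A small sketch of Equation 11 shows the sign convention of the homing signal. It assumes angles measured counterclockwise (0 rad pointing east, π/2 pointing north); the function name and the example headings are hypothetical.

```python
import math

def homing_signal(theta_hv, l_hv, phi):
    """Equation 11: steering command toward the inverted home vector."""
    return l_hv * math.sin(theta_hv - phi - math.pi)

# The HV points east (0 rad), so home lies to the west (pi rad).
facing_home = homing_signal(0.0, 1.0, math.pi)
facing_north = homing_signal(0.0, 1.0, math.pi / 2)      # home is to the left
facing_south = homing_signal(0.0, 1.0, 3 * math.pi / 2)  # home is to the right
print(facing_home, facing_north, facing_south)
```

The signal vanishes when the agent already faces home, is positive (left turn) when home lies counterclockwise of the heading, and is negative (right turn) otherwise.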

#### 2.2. A Reward-Modulated Learning Rule for Acquiring and Retrieving Vector Memories

We propose a heterosynaptic, reward-modulated learning rule (Smith et al., 2008; Cassenaer and Laurent, 2012; Hige et al., 2015) with a canonical form to learn vector memories based on four factors (see **Figure 4**): a context-dependent state, an input-dependent PI state, a modulatory reward signal, and the vector array state. Like the HV, GV memories are computed and represented in circular arrays. The context-dependent state, such as inbound or outbound foraging, activates the vector representation, and thus retrieves the vector memory. The association between the PI-based state and the reward signal modulates the plastic synapses connecting the context unit (presynaptic) with the vector array units (postsynaptic). The associated information is used by the agent on future foraging trips to steer toward the rewarding location. The received reward is an internally generated signal based on food reward due to visiting the feeder.

vector. Note that, as the agent returns to the home position, the output activities are suppressed to zero resulting from the elimination of opposite directions.

The context-dependent unit (see **Figure 4**) is a unit that represents the agent's foraging state, i.e., inward or outward. Here we apply a simple binary unit given by:

$$\sigma(t) = \begin{cases} 1 & \text{if outward trip,} \\ 0 & \text{if inward trip.} \end{cases} \tag{12}$$

The context-dependent unit projects plastic synapses onto a circular array that represents the GV. The GV array has the same number of neurons, and thus the same preferred orientations, as the PI array. In this way, each neuron i ∈ [0, N − 1] has a preferred orientation of 2πi/N. The activity x<sub>i</sub><sup>GV</sup> of the GV array is given by:

$$
x_i^{GV}(t) = w_i^{GV}(t)\,\sigma(t), \tag{13}
$$

where w<sub>i</sub><sup>GV</sup> are the weights of the plastic synapses. For these synapses, we apply a reward-modulated associative learning rule given by:

$$
\Delta w_i^{GV}(t) = \mu^{GV}\, r(t)\, \sigma(t) \left( x_i^{PI}(t) - x_i^{GV}(t) \right), \tag{14}
$$

$$
w_i^{GV}(t+\Delta t) = w_i^{GV}(t) + \Delta w_i^{GV}(t), \tag{15}
$$

where µ<sup>GV</sup> = 2 is the learning rate, and x<sub>i</sub><sup>PI</sup>(t) is the PI activity in the direction φ<sub>i</sub> = 2πi/N. The weights are therefore only changed when the agent forages outbound, because for the inward trip we assume that the agent returns home on a straight path. This is in accordance with behavioral data indicating that ants acquire and retrieve spatial memories based on internal motivational states, given by whether they are on an inward or outward trip (Wehner et al., 2006). The food reward r(t) at the feeder is given by:

$$r(t) = \max(0, 1 - 5d(t))\tag{16}$$

where d(t) is the agent's distance to the feeder, which we computed directly using the positions of the agent and feeder, given that the reward is physically bound to the location of the food. Due to the delta rule-like term x<sub>i</sub><sup>PI</sup>(t) − x<sub>i</sub><sup>GV</sup>(t), the weights w<sub>i</sub><sup>GV</sup> approach the same values as the activities of the PI state at the rewarding location. Thus, the weights represent the static GV to the rewarding location (feeder). After returning home, the agent applies the angle θGV of the GV to navigate toward the feeder using error compensation. The motor signal of the GV:

$$m_{GV}(t) = l_{GV}(t) \sin\left(\theta_{GV}(t) - \phi(t)\right), \tag{17}$$

is applied together with the homing signal mHV and random search mε, where lGV is the length of the GV. We model the random search by the agent as a correlated Gaussian random walk, which has been previously used to study animal foraging (Bovet and Benhamou, 1988). Therefore, mε is drawn from a Gaussian distribution N (mean, S.D.):

$$m_{\varepsilon}(t) \sim \mathcal{N}(0, \varepsilon(t)), \tag{18}$$

with an adaptive exploration rate ε(t) given by:

$$\varepsilon(t) = \sigma(t) \exp\left(-\beta(t)\, v(t)\right), \tag{19}$$

where v(t) is an estimate for the average food reward received over time and β(t) is the inverse temperature parameter. The exploration rate is thus zero for inward trips, because the agent applies path integration to reach its home position on a straight path. We define v by the recursive formula:

$$v(t) = r(t) + \gamma\, v(t - \Delta t), \tag{20}$$

where v(t) is a lowpass filtered signal of the received food reward r(t) with discount factor γ = 0.995. Convergence of goal-directed behavior is achieved for ε below a critical value, which depends on the choice of β. We assume that ε and v are based on a probability distribution with fixed mean. We derive a gradient rule, which leads to minimization of the Kullback-Leibler divergence between the distribution of ε(v) and an optimal exponential distribution (see Supplementary Material for a derivation). The learning rule is given by:

$$
\Delta\beta(t) = \mu_{\beta}\left(\frac{1}{\beta(t)} + \mu_{v}\, v(t)\, \varepsilon(t)\right),
\tag{21}
$$

$$
\beta(t + \Delta t) = \beta(t) + \Delta \beta(t), \tag{22}
$$

where µ<sub>β</sub> = 10<sup>−6</sup> is a global learning rate and µ<sub>v</sub> = 10<sup>2</sup> is a reward-based learning rate. The adaptation of β is characterized by small changes scaling with the square root of time, while the term containing v(t) allows for exploitation of explored food rewards to further decrease ε through β. In ecological terms, such exploitation of sparsely distributed resources is crucial for the survival of an individual as well as the whole colony (Biesmeijer and de Vries, 2001; Wolf et al., 2012; Bolek and Wolf, 2015).
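The adaptive exploration rate of Equations 19 to 22 can be sketched as follows. The initial value `BETA_0` is a hypothetical choice, since the text does not state one, and the function names are illustrative. In the absence of reward (v = 0), Equation 21 reduces to Δβ = µ<sub>β</sub>/β, whose continuous-time solution grows as the square root of the number of updates; the sketch compares the iterated rule against that prediction.

```python
import math

def update_reward_estimate(v, r, gamma=0.995):
    """Equation 20: low-pass filtered food reward."""
    return r + gamma * v

def exploration_rate(sigma, beta, v):
    """Equation 19: exploration rate, zero on inward trips (sigma = 0)."""
    return sigma * math.exp(-beta * v)

def update_beta(beta, v, eps, mu_beta=1e-6, mu_v=1e2):
    """Equations 21 and 22: one update of the inverse temperature."""
    return beta + mu_beta * (1.0 / beta + mu_v * v * eps)

BETA_0 = 0.01  # hypothetical initial value, not given in the text

# Without any reward (v = 0), beta follows roughly sqrt(BETA_0**2 + 2 * mu_beta * n).
beta = BETA_0
steps = 10000
for _ in range(steps):
    beta = update_beta(beta, v=0.0, eps=1.0)
predicted = math.sqrt(BETA_0 ** 2 + 2.0 * 1e-6 * steps)

# A rewarded outward trip: v grows and the exploration rate drops below 1.
v = 0.0
for _ in range(10):
    v = update_reward_estimate(v, r=1.0)
eps_outward = exploration_rate(1, beta, v)
eps_inward = exploration_rate(0, beta, v)
print(beta, predicted, eps_outward, eps_inward)
```

Under these assumptions the agent explores fully (ε = 1) until it has collected reward, after which ε decays with the product of β and the filtered reward v.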

The final motor command Σ in our action selection mechanism is given by the linear combination:

$$
\Sigma(t) = (1 - \varepsilon(t)) \left( \sigma(t)\, m_{GV}(t) + m_{HV}(t) \right) + m_{\varepsilon}(t), \tag{23}
$$

where outward trips are controlled by the balance of random walk and global-vector guided navigation depending on the exploration rate ε, while inward trips are controlled solely by the homing signal mHV. The combination of the two sinusoids is equivalent to a phase vector (phasor) addition resulting in a phasor that connects the current position of the agent with the learned feeder location (see Supplementary Material for a derivation).
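The phasor argument can be checked numerically. Since l sin(θ − φ) is linear in the Cartesian components of the vector (l, θ), the sum of Equations 17 and 11 equals the same sine error term evaluated for the vector GV − HV, which points from the agent's current position to the feeder. The sketch below uses hypothetical vector lengths, angles, and heading, and sets the random search term to zero.

```python
import math

def steering_command(sigma, eps, m_gv, m_hv, m_eps):
    """Equation 23: linear combination of the three steering signals."""
    return (1.0 - eps) * (sigma * m_gv + m_hv) + m_eps

def polar_to_xy(length, angle):
    """Convert a vector from polar to Cartesian coordinates."""
    return length * math.cos(angle), length * math.sin(angle)

# Hypothetical example values: learned GV, current HV and heading.
l_gv, theta_gv = 10.0, math.pi / 2
l_hv, theta_hv = 4.0, math.pi / 6
phi = 1.0

m_gv = l_gv * math.sin(theta_gv - phi)            # Equation 17
m_hv = l_hv * math.sin(theta_hv - phi - math.pi)  # Equation 11

# Vector from the agent's current position to the feeder: GV minus HV.
gx, gy = polar_to_xy(l_gv, theta_gv)
hx, hy = polar_to_xy(l_hv, theta_hv)
l_goal = math.hypot(gx - hx, gy - hy)
theta_goal = math.atan2(gy - hy, gx - hx)
m_goal = l_goal * math.sin(theta_goal - phi)

command = steering_command(sigma=1, eps=0.0, m_gv=m_gv, m_hv=m_hv, m_eps=0.0)
inward = steering_command(sigma=0, eps=0.0, m_gv=m_gv, m_hv=m_hv, m_eps=0.0)
print(command, m_goal, inward)
```

On an outward trip without exploration the command matches the direct steering signal toward the feeder, and on an inward trip it reduces to the homing signal alone.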

# 3. RESULTS

Using the proposed model embedded as a closed-loop control into a simulated agent, we carried out several experiments to validate the performance and efficiency in navigating the agent through complex and noisy environments. We will further demonstrate that the generated behaviors not only resemble insect navigational strategies, but can also predict certain observed behavioral parameters of social insects.

#### 3.1. Path Integration (PI) in Noisy Environments

It has been shown, both theoretically and numerically, that PI is inherently prone to error accumulation (Benhamou et al., 1990; Vickerstaff and Cheung, 2010). Studies have focused on analyzing resulting errors from using certain coordinate systems to perform PI (Benhamou et al., 1990; Cheung and Vickerstaff, 2010; Cheung, 2014). Here we apply a system of geocentric static vectors (fixed preferred orientations) and analyze the effect of noise on the resulting error. How can noise be characterized in PI systems? Both artificial and biological systems operate under noisy conditions. Artificial systems, such as robots employ a multitude of sensors which provide noisy measurements, and generate motor outputs that are similarly noisy. Rounding errors in their control systems can be an additional source of noise. In animals, noise is mainly attributed to random influences on signal processing and transmission in the nervous system, including synaptic release and membrane conductance by ion channels and pumps (see Stein et al., 2005 for review).

In order to validate the accuracy of the PI mechanism, we measure the positional errors of the estimated nest position with respect to the actual position over time. In the following experiments, we averaged positional errors over 1,000 trials with trial duration T = 1,000 s (simulation time step Δt = 0.1 s). In each trial, the agent randomly forages out from the nest and when the trial duration T is reached, the agent switches to the inward state and only applies the path integration mechanism for homing (see **Figure 5A** for example trajectories). After trial duration T, the mean distance of the agent from the nest is 9.3 ± 5.0 m. The radius of the nest the agent has to reach for successful homing is set to 20 cm. **Figure 5B** shows the distribution of positional errors for three different correlated, sensory noise levels (1, 2, and 5%). The distribution of errors follows a two-dimensional Gaussian distribution with mean 0.0 (nest) and width ⟨δr⟩.

In population coding, neural responses are characterized by correlated or uncorrelated noise (Averbeck et al., 2006, see **Figure 5C** for examples). In the uncorrelated case, fluctuations in one neuron are independent from fluctuations in the other neurons. Correlated noise is described by fluctuations which are similarly expressed across the population activity, and therefore leads to a shift of the observed peak activity. Here, we numerically analyze the effects of correlated and uncorrelated noise on the accuracy of the proposed PI mechanism. Correlated noise is here defined as a shift δφ of the peak activity, i.e., fully correlated noise, such that the compass input to the PI mechanism is given by:

$$
\phi_{noisy}(t) = \phi(t) + \delta\phi, \tag{24}
$$

where δφ is drawn from a Gaussian distribution N(0, 2πζ<sub>sens</sub>) with sensory noise level ζ<sub>sens</sub>. Uncorrelated noise, also referred to as neural noise, is defined by adding fluctuations δx<sub>i</sub><sup>HD</sup> to the activities of the HD layer, which are drawn from a Gaussian distribution N(0, ζ<sub>neur</sub>) with neural noise level ζ<sub>neur</sub>.
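The two noise models can be sketched as follows, reading the second argument of N as a standard deviation as in Equation 18. The function names, the fixed random seed, and the wrapping of the noisy angle to [0, 2π) are assumptions made for this illustration.

```python
import math
import random

def noisy_compass(phi, zeta_sens, rng):
    """Equation 24: fully correlated noise shifts the whole activity bump."""
    delta_phi = rng.gauss(0.0, 2.0 * math.pi * zeta_sens)
    return (phi + delta_phi) % (2.0 * math.pi)  # wrap to [0, 2*pi) (assumption)

def noisy_hd_layer(phi, zeta_neur, rng, n=18):
    """Uncorrelated noise: independent fluctuation added to each HD neuron."""
    return [
        math.cos(phi - 2.0 * math.pi * i / n) + rng.gauss(0.0, zeta_neur)
        for i in range(n)
    ]

rng = random.Random(0)  # fixed seed for reproducibility
sd_degrees = math.degrees(2.0 * math.pi * 0.05)  # S.D. for a 5% sensory noise level
shifted = noisy_compass(1.0, 0.05, rng)
jittered = noisy_hd_layer(1.0, 0.02, rng)
print(sd_degrees, shifted, len(jittered))
```

A sensory noise level of 5% corresponds to a standard deviation of 18° in the compass angle, whereas neural noise perturbs each of the N activities independently and therefore disperses the peak rather than shifting it.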

**Figure 5D** shows the effect of different degrees of sensory noise on the performance of PI for a fixed number of 18 neurons per layer averaged over 1,000 trials. For noise levels up to 5% (equal to 18°), the observed mean position error increases only slowly and nonlinearly with values below 0.4 m, demonstrating that our PI mechanism is robust to sensory noise up to these levels.

In **Figure 5E**, we show mean position errors for different levels of uncorrelated noise. Similar to sensory noise, the errors first increase slowly and nonlinearly for noise up to 2%, while for noise larger than 5%, errors increase linearly. In comparison with sensory noise levels, uncorrelated noise leads to larger errors due to a more dispersed peak activity. However, for noise levels up to 2%, mean position errors are well below 0.2 m indicating robustness of our PI mechanism with respect to uncorrelated noise. Given this apparent similar nature of correlated and uncorrelated noise, we only applied sensory, correlated noise for the following experiments of this study.

In **Figure 6**, we varied the number of neurons in the circular arrays of the PI mechanism for three different sensory noise levels (0, 2, and 5%). Note that the errors for 0% noise arise from the accuracy limit given the number of neurons. While the mean position error is significantly higher for 6 and 9 neurons, it achieves a minimal value for 18 neurons. For larger system sizes, the error only changes minimally. This is again mainly due to the coarse coding of heading directions. Interestingly, the ellipsoid body of the insect central complex contains neurons with 16–32 functional arborization columns (called wedges, see Wolff et al., 2015). The numerical results here might point toward an explanation for this number, which efficiently minimizes the error.

Besides errors resulting from random noise, there are also systematic errors observed in navigating animals. Both invertebrate and vertebrate species exhibit systematic errors in homing behavior after running an L-shaped outward journey (see Etienne and Jeffery, 2004 for review). Müller and Wehner (1988) have examined such errors in desert ants by measuring the angular deviation with respect to the angle of the L-shaped course (see **Figure 7**). In order to show that our mechanism is able to reproduce these errors, we fit our model against the desert ant data from Müller and Wehner (1988) using the leak rate λ (Equation 7) of the PI memory layer as control variable. Using a leak rate of λ ≈ 0.0075 resulted in angular errors most consistent with behavioral data. Leaky integration producing systematic errors is an idea that has been previously proposed (Mittelstaedt and Glasauer, 1991; Vickerstaff and Cheung, 2010). Thus, here our mechanism is not only performing accurately in the presence of random noise, but it also reproduces behavioral aspects observed in animals.

In **Table 1**, we compare the accuracy and efficiency with other state-of-the-art PI models. Haferlach et al. (2007) apply fewer neurons than our model, but we achieve a better performance in terms of positional accuracy with larger sensory noise (values taken from **Figure 9**). Note that our model achieves similar accuracy when using six neurons (see **Figure 6**). The model by Kim and Lee (2011) applies 100 neurons per layer leading to a fairly small positional error despite 10% uncorrelated noise (**Figure 6A**, N<sup>1</sup> = 100 neurons). However, both models apply straight paths before homing, which results in smaller path integration errors compared to random foraging as observed in insects. Furthermore, many desert ant species were measured to freely forage average distances of 10–40 m depending on the species (Muser et al., 2005), whereas some individuals travel even up to multiple hundred meters (Buehlmann et al., 2014). Our foraging time has been adjusted for realistic foraging distances, and if we reduce the foraging time in our model, we achieve similarly small positional errors as previous models. Furthermore, behavioral data measured in desert ants (Merkle et al., 2006) revealed that path integration errors are approximately 1–2 m depending on foraging distance. The median values are taken from Figure 3B in Merkle et al. (2006) and reflect the error between the endpoint of an ant's inward run and the correct position of the nest. These larger errors compared to model accuracies are likely due to noise accumulation in sensing, neural processing and motor control, although it is difficult to determine an exact quantification. Nonetheless, ants are able to reliably navigate by falling back to other strategies, such as searching behavior or visual homing.

gray) and homing behavior (dark gray) for different sensory, correlated noise levels: 1, 2, and 5%. The red point marks the starting point at the nest, and the blue point indicates the return, when the agent switches to its inward state. Using only path integration, the agent successfully navigates back to the nest with a home radius (green circle) of 0.2 m. (B) We evaluate the accuracy of the proposed PI mechanism by using the mean positional error averaged over each time step during each trial. Distribution of positional errors for different sensory, correlated noise levels: 1, 2, and 5%. (C) Examples of population-coded HD activities with correlated and uncorrelated noise. Filled dots are activities of individual neurons, while the dashed line is a cosine response function. (D) Mean position errors ⟨δr⟩ (± S.D.) in PI with respect to fully correlated, sensory noise levels averaged over 1,000 trials (fixed number of 18 neurons per layer). (E) Mean position errors ⟨δr⟩ (± S.D.) in PI with respect to uncorrelated, neural noise levels averaged over 1,000 trials (fixed number of 18 neurons per layer).

#### 3.2. Global Vector (GV) Learning and Goal-Directed Navigation

with respect to number of neurons per layer averaged over 1,000 trials for three different sensory noise levels (0, 2, and 5%). In all three cases, the error reaches a minimum plateau between 16 and 32 neurons (colored area), which corresponds to the number of functional columns in the ellipsoid body of the insect central complex (Wolff et al., 2015).

In the previous section, we proposed a reward-modulated associative learning rule for GV learning. In order to test the performance of our insect-inspired model applying this learning rule, and to validate the use of learned vector representation in goal-directed navigation, we carried out several experiments under biologically realistic conditions. We apply the PI mechanism with N = 18 neurons per layer and a sensory noise level of 5%. In the first series of experiments, a single feeder is placed at a certain distance Lfeed and angle θfeed from the nest. The agent is initialized at the nest with a random orientation drawn from a uniform distribution on the interval [0, 2π). In this naïve condition, the agent starts to randomly search in the environment. If the agent is unsuccessful in locating the feeder after a fixed time tforage, it turns inward and performs homing behavior using only the PI mechanism. If, however, the agent finds the feeder, the current PI state is associated with the received reward, and stored in the weights to the GV array. The agent returns home after the accumulated reward surpasses a fixed threshold. Each trial lasts a fixed maximum time of T = (3/2) tforage, before the agent is reset to the nest position. On subsequent foraging trips, the agent applies the learned vector representation and navigates along the GV, because the exploration rate is decreased due to the previous reward. If the agent finds the feeder repeatedly, the learned GV stabilizes and the exploration rate decreases further.
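The association step described above, in which the PI state at the feeder is stored in the GV weights, can be sketched by applying Equations 13 to 16 literally. The function names, the six-neuron PI pattern, and the distances are hypothetical values for illustration.

```python
def reward(d):
    """Equation 16: reward as a function of the distance d to the feeder."""
    return max(0.0, 1.0 - 5.0 * d)

def gv_learning_step(w_gv, x_pi, sigma, r, mu_gv=2.0):
    """Equations 13 to 15: one update of the plastic GV weights."""
    x_gv = [w * sigma for w in w_gv]  # Equation 13
    return [
        w + mu_gv * r * sigma * (xp - xg)  # Equations 14 and 15
        for w, xp, xg in zip(w_gv, x_pi, x_gv)
    ]

x_pi_at_feeder = [0.0, 3.0, 5.0, 3.0, 0.0, 0.0]  # hypothetical PI state
w_gv = [0.0] * len(x_pi_at_feeder)

# Outward trip, agent 5 cm from the feeder: r = 0.75.
for _ in range(50):
    w_gv = gv_learning_step(w_gv, x_pi_at_feeder, sigma=1, r=reward(0.05))

unchanged_inward = gv_learning_step(w_gv, [1.0] * 6, sigma=0, r=1.0)
unchanged_far = gv_learning_step(w_gv, [1.0] * 6, sigma=1, r=reward(0.3))
print(w_gv)
```

In this literal per-step reading, each rewarded outward update scales the remaining difference between weights and PI state by (1 − µ<sup>GV</sup> r), so with µ<sup>GV</sup> = 2 the weights contract toward the PI state for 0 < r < 1. No learning takes place on inward trips or outside the reward zone, which under Equation 16 extends 0.2 m from the feeder.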

**Figure 8** demonstrates such an experiment for a feeder with a distance of Lfeed = 10 m and angle θfeed = 90° from the nest.

TABLE 1 | Comparison of existing path integration (PI) models in terms of accuracy and efficiency.


In **Figure 8A**, we show the trajectories of the agent during five trials. The trial numbers are color-coded (see colorbox). During the first trial, the agent has not visited the feeder yet and returns home after tforage = 2,000 s of random search. During the second trial (see yellow-colored trajectory), the agent finds the feeder and learns the GV representation from the PI state (see **Figure 8B**). Here the red dotted line indicates the correct angle θfeed = 90° to the feeder, while the cyan-colored line is the average angle estimated from the synaptic strengths of the GV array. In doing so, the agent is able to acquire an accurate vector representation (**Figure 8B**) resulting in stable trajectories toward the goal for the final three trials, which is again due to a low exploration rate (**Figure 8C**). The repeated visits to the feeder decrease the exploration rate due to the received reward (red line). In the final two trials, the agent navigates to the feeder on a stable trajectory (i.e., low exploration rate) demonstrating that the learning rule is robust for goal-directed navigation in noisy environments. Note that the reward signal peak is decreased for the final two trials, because the agent does not enter the reward area centrally. Furthermore, switching the context unit to the inbound state is determined by the accumulated amount of reward over time. As such, smaller but broader reward signals give a similar accumulated reward to a bigger and sharper signal.

In **Figure 9**, we simulated 100 learning cycles with different randomly generated environments, each consisting of 100 consecutive trials. The feeders are randomly placed by sampling from a uniform distribution U as follows:

$$r_{feed} = (r_{max} - r_{min})\sqrt{n_1} + r_{min}, \tag{25}$$

$$
\theta_{feed} = 2\pi n_2, \tag{26}
$$

$$n_1, n_2 \sim \mathcal{U}(0, 1), \tag{27}$$

where rfeed is the distance from the nest to a feeder and θfeed is the angle with respect to the x axis. We chose rmin = 1 m and rmax = 40 m to be the bounds in which the feeders can be placed. The density is determined by how many feeders are placed within these bounds. Here, we generated 50 feeders for each environment. In **Figure 9A**, we show the mean exploration rate, and the running averages of mean homing and goal success rates with respect to trials (foraging time tforage = 1,000 s, averaged over 100 cycles). Note that the foraging time has been reduced compared to **Figure 8**, because the random environments contain multiple feeders rather than a single one. This leads to a higher probability of finding a feeder and for the learning algorithm to converge. During the 100 trials, learning converges on average within the first 20 trials, as indicated by a low mean exploration rate. Like in the previous experiment, the agent reaches the feeder in every trial after convergence is achieved. This is indicated by the goal success approaching one. Average homing success is one for every trial, which results from sufficient searching behavior and the given total time T. The convergence of the learning process is dependent on the foraging time, because longer times allow for longer foraging distances, and thus larger search areas. Therefore, we varied the foraging time tforage = 200, 400, 600, 800, and 1,000 s and measured the mean goal success rate after 100 trials averaged over 100 cycles (**Figure 9B**). Note that, in contrast to naturalistic learning in ants, our agent reduces the exploration rate to zero leading to pure exploitation of the learned global vector. Ants live in environments with rather sparse, dynamic food sources, thus their exploitation of learned vector memories is rather flexible. Nevertheless, our results indicate that for longer foraging times, the mean goal success rate approaches one and its variance decreases. However, by measuring the averaged ratio of learned vector and nearest feeder distance, we show that this ratio decreases for larger foraging times (**Figure 9C**). Thus, there is a trade-off with respect to convergence and reward maximization, leading to an optimal foraging time. Desert ants have been shown to increase their foraging times up to a certain value, after which it saturates (Wehner et al., 2004). This adaptation of foraging time might be indicated by the trade-off resulting from our model.
Furthermore, we encourage the reader to see the **Supplementary Video** of path integration and global vector learning performed by a simulated hexapod robot.
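For readers who wish to reproduce the random environments, the feeder placement of Equations 25 to 27 can be sketched as follows. The function name `sample_feeders`, the Cartesian output format, and the fixed seed are assumptions; the bounds and the number of feeders are the values given in the text.

```python
import math
import random

def sample_feeders(n_feeders=50, r_min=1.0, r_max=40.0, rng=None):
    """Place feeders around the nest at the origin (Equations 25 to 27)."""
    rng = rng or random.Random()
    feeders = []
    for _ in range(n_feeders):
        n1, n2 = rng.random(), rng.random()               # Equation 27
        r_feed = (r_max - r_min) * math.sqrt(n1) + r_min  # Equation 25
        theta_feed = 2.0 * math.pi * n2                   # Equation 26
        feeders.append((r_feed * math.cos(theta_feed), r_feed * math.sin(theta_feed)))
    return feeders

feeders = sample_feeders(rng=random.Random(42))  # fixed seed for reproducibility
distances = [math.hypot(x, y) for x, y in feeders]
print(len(feeders), min(distances), max(distances))
```

The square root in Equation 25 shifts samples toward larger radii, which counteracts the clustering near the nest that a uniformly sampled radius would produce.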

# 4. DISCUSSION

Social insects, such as bees and ants, use PI-based vector memories for guiding navigation in complex environments (Collett et al., 1998, 1999; De Marco and Menzel, 2005; Collett and Collett, 2015). Here, we proposed a novel computational model for combining PI and the acquisition of vector memories in a simulated agent. We have shown that a computational model based on population-coded vector representations can generate efficient and insect-like navigational behaviors in artificial agents. These representations are computed and stored using a simple neural network model combined with reward-modulated associative learning rules. Thus, the proposed model is not only accounting for a number of behavioral aspects of insect navigation, but it further provides insights in possible neural mechanisms in relevant insect brain areas, such as the central complex. In the following, we will discuss certain aspects of our model juxtaposing it with neurobiological findings in insects. Furthermore, we provide comparisons to other state-of-the-art models of vector-guided navigation (Kubie and Fenton, 2009; Cruse and Wehner, 2011).

# 4.1. Head-Direction (HD) Cells and Path Integration (PI)

A main property of the PI mechanism of our model is that it receives input from a population of neurons, which encode for allothetic compass cues. Here, we apply a cosine response curve for coarse encoding of orientations. Such a mechanism was previously applied by other models (Haferlach et al., 2007; Kim and Lee, 2011). Neurons in the central complex of locusts contain a population-coded representation of allothetic compass cues based on the skylight polarization pattern (Heinze and Homberg, 2007). Similarly, central complex neurons in the Drosophila brain encode for heading orientations based on idiothetic self-motion and visual landmarks. Seelig and Jayaraman (2015) measured the fluorescent activity of genetically expressed calcium sensors indicating action potentials, while the fly was tethered on an air-suspended track ball system connected to a panoramic LED display. Any rotation of the fly on the ball is detected and fed back by corresponding motions of the visual scene on the display. The activity of 16 columnar neurons, which display the full circular range, generates a single maximum, which moves according to the turns of the fly on the ball. Interestingly, even though the representation is generated by visual stimuli, it can be accurately maintained solely by self-motion cues over the course of several seconds in the dark. A recent study on dung beetles (el Jundi et al., 2015), which navigate completely unaffected by landmarks, has shown that celestial compass cues are encoded in the central complex revealed by electrophysiological recordings. Taken together, it is likely that the central complex of social insects contains a similar neural coding of polarization- and landmark-based compass cues. Not only is the central complex function and anatomy highly conserved across insect species, but behavioral experiments on ants and bees also suggest the central role of using polarization and landmark cues for navigation. 
Our model further predicts allothetic goal-direction cues to be involved in PI mechanisms. Such neural representations have yet to be observed in experiments, ideally by applying the tethered track ball setup described in Seelig and Jayaraman (2015). A recent study has developed such a system for use in desert ants (Dahmen et al., 2017), providing a powerful tool for future investigation of the underlying neuronal mechanisms by combining this technology with electrophysiological recordings.

In our model, we assume that the agent's walking speed is neurally encoded as a linear signal that modulates the amplitude of HD activities by an additive gain. A similar, so-called gater mechanism has been applied in a model by Bernardet et al. (2008). Such linear speed signals have recently been found to be encoded by neurons in the rat's medial entorhinal cortex (Kropff et al., 2015) as well as the cockroach central complex (Martin et al., 2015). This shared encoding mechanism indicates the necessity of linear velocity components for accurate PI (Issa and Zhang, 2012). The temporal accumulation of speed-modulated HD signals in our model is achieved by a self-recurrent connection. Biologically, these recurrent connections can be interpreted as positive feedback within a group of neurons with the same preferred direction. Since our model applies PI as a scaffold for spatial learning, we apply this simplified accumulation mechanism to avoid random drifts observed in more complex attractor networks (Wang, 2001), which were applied in previous PI models (Touretzky et al., 1993; Hartmann and Wehner, 1995). We were also able to test the leaky-integrator hypothesis (Mittelstaedt and Glasauer, 1991) by fitting a single leakage parameter to observed behavioral data from desert ants (Müller and Wehner, 1988). The leakage parameter decreases the self-recurrent connection weight for leaky integration.
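As a rough illustration of the accumulation described above, the following sketch integrates speed-modulated, cosine-tuned HD activity through a self-recurrent weight of (1 − leak). It is a simplified linear version and not the implementation used in this study: the function names, the number of neurons `n`, the time step `dt`, and the purely multiplicative speed factor are assumptions made for illustration, and activities are not rectified.

```python
import numpy as np


def preferred_directions(n):
    """Evenly spaced preferred directions (radians) for n HD neurons."""
    return 2.0 * np.pi * np.arange(n) / n


def integrate_path(headings, speeds, n=16, leak=0.0, dt=1.0):
    """Accumulate speed-modulated HD activity with a leaky self-recurrent weight.

    headings, speeds: sequences of heading angles (rad) and walking speeds.
    leak: hypothetical leakage parameter; 0 gives perfect integration.
    """
    phi = preferred_directions(n)
    memory = np.zeros(n)
    for theta, speed in zip(headings, speeds):
        hd = np.cos(theta - phi)  # cosine response curve of each HD neuron
        # The self-recurrent weight (1 - leak) carries over the previous state;
        # speed scales the HD input (a simplifying assumption of this sketch).
        memory = (1.0 - leak) * memory + speed * hd * dt
    return memory
```

Without leakage, the accumulated activity of each neuron equals the projection of the travelled displacement onto its preferred direction. With a positive `leak`, earlier path segments are progressively discounted.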

An HV representation is computed by using a cosine weight kernel, which was also used in Bernardet et al. (2008). Such connectivity acts on each represented direction by adding the projections from the other directions. This leads to the formation of an activity pattern with a single maximum across the population. The angle of the represented vector is read out by averaging the population vectors, while the distance is encoded by the amplitude of the population activity. We show that such a readout of a population-coded vector is sufficient to generate robust homing behavior in an artificial agent. Furthermore, it allows for accurate localization required for spatial learning of locations.
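The readout described above can be sketched as follows. This is a minimal example under stated assumptions rather than the model's actual code: the function name `decode_vector`, the evenly spaced preferred directions, and the normalization factors are illustrative choices that are exact only for an unrectified cosine code, possibly with a constant baseline.

```python
import numpy as np


def decode_vector(activity):
    """Read out (angle, length) from a population-coded vector.

    activity: 1-D array over n evenly spaced preferred directions.
    """
    n = activity.size
    phi = 2.0 * np.pi * np.arange(n) / n
    # Cosine weight kernel: each direction sums the projections of all others.
    kernel = np.cos(phi[:, None] - phi[None, :])
    projected = kernel @ activity
    # The population vector average of the projected activity gives the angle.
    x = np.sum(projected * np.cos(phi))
    y = np.sum(projected * np.sin(phi))
    angle = np.arctan2(y, x)
    # Amplitude of the single-peaked pattern. For a cosine code of length r,
    # the projected amplitude is (n / 2) * r, hence the second rescaling.
    amplitude = 2.0 * np.hypot(x, y) / n
    length = 2.0 * amplitude / n
    return angle, length, projected
```

Because the cosine kernel sums to zero along each row, a constant baseline in the input activity does not affect the decoded angle or length.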

The extensive numerical analysis of noise affecting the accuracy of our PI mechanism leads to two predictions. First, PI accuracy seems to follow a similar function with respect to the noise levels for both the fully correlated and uncorrelated random fluctuations. While uncorrelated noise could be further filtered depending on the system size N, decorrelation of sensory input noise could be achieved by adding inhibitory feedback as shown in a model by Helias et al. (2014). Second, we varied the number of neurons N per layer for different levels of fully correlated noise, which predicts an accuracy plateau between 16 and 32 neurons where the accuracy will not increase for larger systems. This indicates that such a number of partitions for representing orientation variables is efficient and accurate enough. Interestingly, most prominent neuropils of the central complex exhibit a similar number of functional columns (Wolff et al., 2015). The central complex has been shown to be involved in sky compass processing (Heinze and Homberg, 2007), spatial orientation (Seelig and Jayaraman, 2015), and spatio-visual memory (Neuser et al., 2008; Ofstad et al., 2011). Its columnar and reverberating connectivity further supports the functional role of integrating orientation stimuli. This evidence suggests that the proposed circular arrays representing navigation vectors might be encoded in the central complex. We conclude that further experiments are needed to unravel how exactly PI is performed in the insect brain by closely linking neural activity and circuitry to behavioral function.

# 4.2. Reward-Modulated Vector Acquisition and the Role of Motivational Context

PI provides a possible mechanism for self-localization. As such, it has been shown experimentally that social insects apply this mechanism as a scaffold for spatial learning and memory (Collett et al., 2013). Here we propose a reward-modulated associative learning rule (Smith et al., 2008; Cassenaer and Laurent, 2012; Hige et al., 2015) for acquiring and storing vector representations. The acquisition and expression of such vector memories depend on the context during navigation. For GVs, the context is determined by the foraging state, which we model as a binary unit. Indeed, behavioral studies on desert (Wehner et al., 2006) and wood ants (Fernandes et al., 2015) have shown that expression of spatial memories is controlled by an internal state in a binary fashion. The association of the context with a reward signal, received at the feeder, drives synaptic weight changes corresponding to the difference between the current PI state and the respective weight. As this difference is minimized, the weights converge toward values representing the PI state when the reward was received at the feeder. Thus, like the HV, GVs are population-encoded with the angle determined by the position of the maximum activity and the length determined by the amplitude of the activity. To our knowledge, this is the first model that applies such a neural representation to perform vector-guided navigation. Previous models, such as those of Kubie and Fenton (2009) and Cruse and Wehner (2011), do not provide possible underlying neural implementations of the PI-based stored information used for navigation. The HD accumulator model (Kubie and Fenton, 2009) argued that vector information is stored in so-called shortcut matrices, which are subsequently used for navigating toward goals. Similarly, the Cruse and Wehner model (Cruse and Wehner, 2011) stored HVs as geocentric coordinates in the activity of two neurons. 
Although it has been argued that this representation is biologically plausible, it is unlikely that persistent activity can explain global vector memories which are expressed over several days (Wehner et al., 2004). Furthermore, representing a two-dimensional variable requires at least three neurons, because firing rates are strictly positive. Thus, although existing models offer sufficient mechanisms to generate vector-guided navigation, they neither seem biologically plausible nor explain how such information is dynamically learned during navigation.
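The weight update described above, in which the change is proportional to the difference between the current PI state and the stored weight and is gated by reward and context, can be sketched as follows. This is an illustrative sketch only: the function name, the learning rate `rate`, and the example values are hypothetical and not taken from the model.

```python
import numpy as np


def update_global_vector(weights, pi_state, reward, context, rate=0.5):
    """One reward-modulated update of the weights storing a global vector.

    weights, pi_state: arrays over the population of preferred directions.
    reward: reward signal (nonzero only when received at the feeder).
    context: binary foraging-state unit (0 or 1).
    rate: hypothetical learning rate.
    """
    # The change is the gated difference between PI state and stored weight.
    return weights + rate * reward * context * (pi_state - weights)
```

Repeated rewarded updates at the same location shrink the difference geometrically, so the weights converge toward the PI state at the feeder. With zero reward or an inactive context the weights stay unchanged.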

Our proposed encoding of GVs is validated by recent findings from a behavioral study on wood ants (Fernandes et al., 2015). The authors carried out a series of novel experimental paradigms involving training and testing channels. In the training channel, ants were trained to walk from their nest to a feeder at a certain distance, before they were transferred to the testing channel. There, the authors measured the expression of vector memories by observing the behavior. The authors showed that vector memories are expressed by successful association of direction and distance, therefore such memories might be encoded in a common neural population of the insect brain. The acquisition of vectors was rapid, occurring after 4–5 training trials, which corresponds to the rapid vector learning shown by our model during learning walks (**Figure 8**). However, the study mainly examined the expression of homeward vector memories, which are not included in our model, because here the agent applies PI for homing. Recent work by Fleischmann et al. (2016) investigates landmark learning and memory during naturalistic foraging in the desert ant species Cataglyphis fortis. Like other desert ants, they spend the initial weeks of their lifetime inside the nest, before spending about a week foraging repeatedly for food to bring back to the nest. By placing controlled, prominent landmarks around the nest, the authors could measure the foraging routes of individual, marked ants. They also measured the accuracy of landmark-guided memories by transferring inward running ants right before they entered the nest. Their results show that ants initially forage only within a short distance and duration, but more experienced foragers increase their average foraging range and duration. Furthermore, their paths become straighter and they are more successful in finding food (also shown in another desert ant species; Wehner et al., 2004). Taken together, their results indicate that landmark learning and memory is a gradual process. Our model does not model landmark guidance during foraging, but it provides a simple strategy that could support this gradual learning mechanism. Specifically, it could provide the agent with a directional bias, by which the agent can learn visual routes toward rewarding food sources (Ardin et al., 2016). Finally, possible interactions between path integration and landmark-based memories have recently been shown in behavioral experiments (Wystrach et al., 2015), and as such, a complete neural model of naturalistic foraging behavior remains future work.

Two major higher brain areas in social insects exhibit experience-dependent plasticity due to foraging activity: the mushroom bodies (Yilmaz et al., 2016) and the central complex (Schmitt et al., 2016). The mushroom bodies are paired neuropils known to be involved in olfactory learning and memory (Owald and Waddell, 2015), as well as visual learning in discrimination tasks (Vogt et al., 2014). Studies on the central complex across various insect species have revealed its role in visual object localization (Seelig and Jayaraman, 2013) and visual learning (Liu et al., 2006), motor adaptation (Strauss, 2002), spatio-visual memory (Neuser et al., 2008; Seelig and Jayaraman, 2015; Ofstad et al., 2011), as well as polarization-based compass (Heinze and Homberg, 2007). A common coding principle in the central complex appears to be the topological mapping of stimuli within the full azimuthal circle (Plath and Barron, 2015). Both higher brain neuropils involve the functional diversity of multiple neuropeptides and neurotransmitters (Kahsai et al., 2012). The short neuropeptide F is a likely candidate influencing the foraging state, as it has been shown to regulate feeding behavior and foraging activity after starvation (Kahsai et al., 2010). Based on this evidence, we conclude that the population-coded vector memories described by our model are likely to be found in the central complex. Nonetheless, we do not exclude the possibility of interactions between the central complex and the mushroom bodies involved in spatial learning and navigation, which is supported by recent findings on novelty choice behavior in Drosophila (Solanki et al., 2015).

We proposed a novel computational model for PI and the acquisition and expression of vector memories in artificial agents. Although existing vertebrate and invertebrate models (Kubie and Fenton, 2009; Cruse and Wehner, 2011) have followed a similar approach of implementing vector-guided navigation, here we provide plausible neural implementations of the underlying control and learning mechanisms. Tested on a simulated agent, we show that the proposed model produces navigational behavior in the context of realistic closed-loop body-environment interactions (Webb, 1995; Seth et al., 2005; Pfeifer et al., 2007). In our previous work, we applied this approach to study adaptive locomotion and climbing (Manoonpong et al., 2013; Goldschmidt et al., 2014; Manoonpong et al., 2014), goal-directed behavior (Dasgupta et al., 2014) and memory-guided decision-making (Dasgupta et al., 2013). Although our model does not reproduce the full repertoire of insect navigation, it has been shown to be sufficient for generating robust and efficient vector-guided navigation. Besides behavioral observations, our model also provides predictions about the structure and plasticity of related neural circuits in the insect brain (Haberkern and Jayaraman, 2016). We discussed our findings in the context of neurobiological evidence related to two higher brain areas of insects, the central complex and the mushroom bodies. We therefore conclude that our model offers a novel computational approach for studying vector-guided navigation in social insects, which combines neural mechanisms with their generated behaviors. This can guide future behavioral and neurobiological experiments needed to evaluate our findings.

# AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: DG, SD, and PM. Performed the experiments: DG. Analyzed the data: DG, SD, and PM. Contributed reagents/materials/analysis tools: DG and SD. Wrote the paper: DG, SD, and PM.

# FUNDING

This research was supported by Centre for BioRobotics (CBR) at University of Southern Denmark (SDU, Denmark). DG was supported by the Fundação para a Ciência e Tecnologia (FCT). PM was supported by Bernstein Center for Computational Neuroscience II Göttingen (BCCN grant 01GQ1005A, project D1) and Horizon 2020 Framework Programme (FETPROACT-01-2016—FET Proactive: emerging themes and communities) under grant agreement no. 732266 (Plan4Act). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

# ACKNOWLEDGMENTS

We thank Florentin Wörgötter at the Department of Computational Neuroscience in Göttingen, where most of this work was conducted. DG and SD thank Taro Toyoizumi and his lab members at RIKEN BSI for fruitful discussions. We thank James Humble for comments on the manuscript.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00020/full#supplementary-material

Supplementary Video | Path integration and global vector learning in a simulated hexapod robot.

#### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Goldschmidt, Manoonpong and Dasgupta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Closed-loop Robots Driven by Short-Term Synaptic Plasticity: Emergent Explorative vs. Limit-Cycle Locomotion

#### Laura Martin, Bulcsú Sándor\* and Claudius Gros

Institute for Theoretical Physics, Goethe University Frankfurt, Frankfurt am Main, Germany

We examine the hypothesis that short-term synaptic plasticity (STSP) may generate self-organized motor patterns. We simulated sphere-shaped autonomous robots, within the LPZRobots simulation package, containing three weights moving along orthogonal internal rods. The position of a weight is controlled by a single neuron receiving excitatory input from the sensor, measuring its actual position, and inhibitory inputs from the other two neurons. The inhibitory connections are transiently plastic, following physiologically inspired STSP-rules. We find that a wide palette of motion patterns is generated through the interaction of STSP, robot, and environment (closed-loop configuration), including various forward meandering and circular motions, together with chaotic trajectories. The observed locomotion is robust with respect to additional interactions with obstacles. In the chaotic phase the robot is seemingly engaged in actively exploring its environment. We believe that our results constitute a proof of concept that transient synaptic plasticity, as described by STSP, may potentially be important for the generation of motor commands and for the emergence of complex locomotion patterns, adapting seamlessly also to unexpected environmental feedback. We observe spontaneous and collision-induced mode switchings, finding in addition that locomotion may transiently follow limit cycles which are otherwise unstable. Regular locomotion corresponds to stable limit cycles in the sensorimotor loop, which may be characterized in turn by arbitrary angles of propagation. This degeneracy is, in our analysis, one of the drivers of the chaotic wandering observed for selected parameter settings, which is induced by the smooth diffusion of the angle of propagation.

Keywords: closed-loop robots, short-term synaptic plasticity, limit cycles, sensorimotor loop, self-organized locomotion, compliant robot

## 1. INTRODUCTION

It has been argued (Pfeifer et al., 2007; Aguilar et al., 2016) that "robophysics," defined as the pursuit of the discovery of biologically inspired principles of self generated motion, may constitute a promising road for eventually achieving life-like locomotor abilities. Distinct principles such as predictive information (Ay et al., 2008), surprise minimization (Friston, 2011), chaos control (Steingrube et al., 2010), empowerment (Salge et al., 2014), homeokinesis (Der and Martius, 2012), cheap design (Montúfar et al., 2015), and curiosity (Frank et al., 2014) have been studied in this context. Behavior, resulting from guided self organization (Prokopenko, 2009) or autonomous adaption (Chiel and Beer, 1997), may be generated in addition through suitable synaptic (Der and Martius, 2015; Der, 2016) and intrinsic (Sándor et al., 2015) plasticity rules.

#### Edited by:

Poramate Manoonpong, University of Southern Denmark, Denmark

#### Reviewed by:

Malte Schilling, Bielefeld University, Germany Ralf Der, Max Planck Institute for Mathematics, Germany Georg Martius, Institute of Science and Technology Austria, Austria Keyan Ghazi-Zahedi, Max Planck Institute for Mathematics in the Sciences, Germany

#### \*Correspondence: Bulcsú Sándor

sandor@itp.uni-frankfurt.de

Received: 09 August 2016 Accepted: 03 October 2016 Published: 18 October 2016

#### Citation:

Martin L, Sándor B and Gros C (2016) Closed-loop Robots Driven by Short-Term Synaptic Plasticity: Emergent Explorative vs. Limit-Cycle Locomotion. Front. Neurorobot. 10:12. doi: 10.3389/fnbot.2016.00012

Here we point out that complex dynamics may be generated through a transient plasticity mechanism widely present in the brain. Short-term synaptic plasticity (STSP) (Fioravante and Regehr, 2011; Regehr, 2012) is an activity induced transient modulation of the synaptic efficiency, which may lead either to facilitating or to depressing behavior lasting from a few hundred to a few thousand milliseconds. STSP has been argued, among others, to be relevant or causal for working memory (Barak and Tsodyks, 2014), for the facilitation of time sequences of alternating neural populations (Carrillo-Reid et al., 2015), for motor control in general (Nadim and Manor, 2000), and for the sculpting of rhythmic motor patterns (Jia and Parker, 2016) in particular. Plasticity mechanisms similar to STSP have also been shown to allow for stable gaits (Toutounji and Pasemann, 2014) in neural networks which are distinctively simpler than the ones used conventionally for bio-inspired controllers (Schilling et al., 2013).

In this study we use the LPZRobots physics simulation package (Der and Martius, 2012) for the investigation of the spherical three-axis robot illustrated in **Figure 1**. This robot is driven exclusively by STSP, with locomotion coming to a standstill both in the absence of synaptic plasticity and when the feedback from the environment is cut off, e.g., when the gravitational constant is set to zero. We find a surprisingly large palette of self-organized motion primitives, which includes a chaotic phase. The locomotion observed is flexible, in all modes, readjusting seamlessly to disturbances like the collision of the robot with obstacles.

The capability of STSP to have a large impact on locomotion can be traced back in our analysis to the destabilizing effect short-term synaptic plasticity may have on attracting states of the controlling network, inducing attractor-to-attractor transitions within timescales of the order of a few hundred milliseconds. We corroborate this finding by short-circuiting the sensori-motor loop, viz by taking out the environment. Transitions between distinct limit cycles within the full sensori-motor loop are found in addition in the chaotic mode.

### 2. MATERIALS AND METHODS

### 2.1. Tsodyks-Markram Model with Full Depletion

The way neurotransmitters are released through the synaptic cleft may change transiently upon repeated presynaptic activity (Tsodyks and Markram, 1997), both for excitatory (Wang et al., 2006) and for inhibitory (Gupta et al., 2000) synapses. Physiologically this is due, on the one hand, to an increase of the Ca-concentration u ∈ [1, Umax] within the presynaptic bulge, facilitating the release of the respective neurotransmitter, and, on the other hand, to the decrease of the number ϕ ∈ [0, 1] of available vesicles of neurotransmitters. We use here with

$$\begin{aligned} \dot{u} &= \frac{U(y) - u}{T_u}, & U(y) &= 1 + (U_{\text{max}} - 1)\,y \\ \dot{\varphi} &= \frac{\Phi(u, y) - \varphi}{T_{\varphi}}, & \Phi(u, y) &= 1 - \frac{u y}{U_{\text{max}}} \end{aligned} \tag{1}$$

a modified version of the original Tsodyks-Markram model (Tsodyks and Markram, 1997; Hennig, 2013), in which the Ca-concentration u and the number of vesicles ϕ of a given synapse relax to target values U = U(y) and Φ = Φ(u, y), determined in turn by the level y ∈ [0, 1] of the presynaptic activity. A prolonged maximal presynaptic activity y ≡ 1 would lead with ϕ → 0 to a full depletion of the reservoir of vesicles.

FIGURE 1 | Left: A snapshot of the spherical robot from the LPZRobots simulation environment (Martius et al., 2013). The three weights (red, green, and blue) can move along the respective rods without interference. Right: A sketch of the robot with the three perpendicular rods together with the three weights of mass m. The red vertical dashed lines show the actual position x<sub>i</sub><sup>(a)</sup> and a putative target position x<sub>i</sub><sup>(t)</sup> of the red weight along its rod. A damped spring with spring constant k and damping γ then pulls the weight toward the target position, which is given in turn by the output of a controlling neuron (compare Figure 2).

FIGURE 2 | Left: Sketch of the sensorimotor loop of the three-axis spherical robot illustrated in Figure 1. The three weights i = 1, 2, 3 with masses m are each controlled by a single neuron. The excitatory input w<sub>0</sub>(x<sub>i</sub><sup>(a)</sup> + pR)/(2pR) of neuron i is proportional to the proprio-sensory measurement of the actual position x<sub>i</sub><sup>(a)</sup> ∈ [−R, R] of the i-th mass (p ∈ [0, 1]). The neuron also receives inhibitory inputs −z<sub>0</sub>ϕ<sub>j</sub>u<sub>j</sub>y(x<sub>j</sub>) from the other two neurons (j ≠ i). The output y(x<sub>i</sub>) of the i-th neuron determines via x<sub>i</sub><sup>(t)</sup> = pR[2y(x<sub>i</sub>) − 1] the target position of the i-th mass. Right: A network of (three) neurons having the identical topology as the one of the three-axis spherical robot, but with the feedback of the environment short-cut by identifying the actual position x<sub>i</sub><sup>(a)</sup> with the target position x<sub>i</sub><sup>(t)</sup>.

The dynamics of the full depletion model (1) is determined by the relaxation time constants T<sub>u</sub> and T<sub>ϕ</sub>, and by the maximal level Umax of the Ca concentration. For Umax = 1 a monotone depression is present, whereas Umax > 1 initially generates facilitation by a fast calcium influx, being annulled later on by the depletion of neurotransmitters. Overall, the synaptic efficiency is proportional to uϕ, viz to the number of vesicles and to the release probability (which in turn is assumed to be proportional to u). We use T<sub>u</sub> = 300 ms and T<sub>ϕ</sub> = 600 ms, together with either Umax = 1 or Umax = 4. These values are within the typical range of what is physiologically observed (Gupta et al., 2000; Wang et al., 2006).
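To make the behavior of Equation (1) concrete, the following sketch integrates the full depletion model with a forward-Euler scheme for a constant presynaptic activity y. It is an illustration under stated assumptions, not the code used in the simulations: the function names, the integration scheme, the 1 ms step, and the initial values u = ϕ = 1 (the resting state for y = 0) are choices made here.

```python
def stsp_step(u, phi, y, u_max, dt=1.0, t_u=300.0, t_phi=600.0):
    """One forward-Euler step of Equation (1); all times in milliseconds."""
    target_u = 1.0 + (u_max - 1.0) * y   # U(y)
    target_phi = 1.0 - u * y / u_max     # Phi(u, y)
    u_next = u + dt * (target_u - u) / t_u
    phi_next = phi + dt * (target_phi - phi) / t_phi
    return u_next, phi_next


def efficacy_trace(y, u_max, steps, dt=1.0):
    """Synaptic efficacy u * phi over time for constant presynaptic activity y."""
    u, phi = 1.0, 1.0  # assumed initial state: resting values for y = 0
    trace = [u * phi]
    for _ in range(steps):
        u, phi = stsp_step(u, phi, y, u_max, dt=dt)
        trace.append(u * phi)
    return trace
```

For `y = 1`, calling `efficacy_trace(1.0, 1.0, 20000)` yields a monotonically decaying efficacy, whereas `efficacy_trace(1.0, 4.0, 20000)` first rises above its initial value before depletion drives it toward zero, in line with the depressing and facilitating regimes described above.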

#### 2.2. The Robot

The movement of the robot illustrated in **Figure 1** is induced by the relative gravitational pull of the three weights, together with the rolling friction and angular momentum conservation. The individual neurons i = 1, 2, 3 are modeled as rate-encoding leaky integrators,

$$\dot{x}_i = -\Gamma x_i + \frac{w_0}{2pR} \left( x_i^{(a)} + pR \right) - z_0 \sum_{j \neq i} u_j \varphi_j\, y(x_j),$$

$$y(x_j) = \frac{1}{1 + \exp(-a x_j)},\tag{2}$$

where x<sub>i</sub> and y(x<sub>i</sub>) are the respective membrane potentials and firing rates. Γ is the relaxation rate, R the radius of the robot, p ∈ [0, 1] a rescaling factor, x<sub>i</sub><sup>(a)</sup> ∈ [−R, R] the sensory reading of the actual position of the weight on the rod, w<sub>0</sub> > 0 the weight of the excitatory input and z<sub>0</sub> > 0 the magnitude of the interneural inhibitory connections. We note that the variables of the STSP, u<sub>j</sub> and ϕ<sub>j</sub>, as described by Equation (1), depend only on the presynaptic activity and can hence be attributed altogether to the presynaptic neuron. For the slope of the sigmoidal, a = 0.4 has been selected. The weight of the excitatory input w<sub>0</sub> is not modulated here by short-term synaptic plasticity, corresponding to a direct sensory reading.

We selected with p = 1/2 a reduced range for the target position x<sub>i</sub><sup>(t)</sup>,

$$x_i^{(t)} = pR \left[ 2y(x_i) - 1 \right], \qquad x_i^{(t)} \in [-pR, pR].\tag{3}$$

This choice avoids dynamic overshooting of the weight when accelerated from its actual to the target position. The force accelerating the weight is calculated by the LPZRobots package by simulating a damped spring:

$$m\ddot{x}_i^{(a)} = -k\left(x_i^{(a)} - x_i^{(t)}\right) - \gamma \frac{d\left(x_i^{(a)} - x_i^{(t)}\right)}{dt} + F_i, \qquad x_i^{(a)} \to x_i^{(t)}, \tag{4}$$

where k is the spring constant and γ the damping. Centrifugal and other induced forces, F<sub>i</sub>, act additionally in Equation (4) on the individual weights. The complete setup of the three-neuron network is illustrated in **Figure 2**.
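The following sketch combines Equations (1) to (3) for the short-circuited network of Figure 2 (right), in which the actual position of each weight is identified with its target position. It is a minimal illustration and not the LPZRobots controller: the forward-Euler scheme, the function names, the initial conditions, and the example values of w0 and z0 in the usage note are assumptions. Time constants are expressed in seconds to match the SI units used for the relaxation rate.

```python
import numpy as np

A = 0.4                # slope of the sigmoidal
GAMMA = 20.0           # relaxation rate of the membrane potential (1/s)
P = 0.5                # rescaling factor p
R = 0.25               # R, with 2R = 0.5 the diameter of the robot
T_U, T_PHI = 0.3, 0.6  # STSP time constants (300 ms and 600 ms) in seconds


def rate(x):
    """Sigmoidal firing rate y(x) of Equation (2)."""
    return 1.0 / (1.0 + np.exp(-A * x))


def target_position(x):
    """Target position of each weight, Equation (3)."""
    return P * R * (2.0 * rate(x) - 1.0)


def step(x, u, phi, x_actual, w0, z0, u_max=1.0, dt=0.001):
    """One forward-Euler step of Equations (1) and (2) for the three neurons."""
    y = rate(x)
    release = u * phi * y                        # presynaptic u_j * phi_j * y(x_j)
    inhibition = z0 * (release.sum() - release)  # sum over j != i
    excitation = w0 * (x_actual + P * R) / (2.0 * P * R)
    x_next = x + dt * (-GAMMA * x + excitation - inhibition)
    u_next = u + dt * (1.0 + (u_max - 1.0) * y - u) / T_U
    phi_next = phi + dt * (1.0 - u * y / u_max - phi) / T_PHI
    return x_next, u_next, phi_next


def run_short_circuited(w0, z0, steps=5000, u_max=1.0):
    """Iterate the network with the actual position set to the target position."""
    x = np.array([1.0, 0.0, -1.0])  # arbitrary asymmetric initial potentials
    u = np.ones(3)
    phi = np.ones(3)
    targets = []
    for _ in range(steps):
        x, u, phi = step(x, u, phi, target_position(x), w0, z0, u_max)
        targets.append(target_position(x))
    return np.array(targets), u, phi
```

A call such as `run_short_circuited(w0=100.0, z0=300.0)` returns the time series of the three target positions, which can then be inspected for fixpoints or oscillations. The weights in this call are placeholders and are not meant to reproduce any particular region of the stability diagram.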

#### 2.3. Simulation parameters

The LPZRobots simulation environment (Der and Martius, 2012) is an interactive simulator based on the ODE (Open Dynamics Engine) (Smith, 2005). LPZRobots contains rigid body dynamics in terms of a library of basic primitive objects, such as spheres and cuboids, as well as a variety of joints, sensors and surface materials.

We used roughness = 0.8, slip = 0.01, hardness = 40 and elasticity = 0.5 for the collision and friction properties together with friction = 0.3 (the rolling friction coefficient), gravity = −9.81 (the gravitational constant) and noise = 0 (for the global noise level). All parameters are in SI units. For the stepsize of the physical simulation simstepsize = 0.001 was used (corresponding to a millisecond). With controlinterval = 1 one ensures that the controller, viz Equation (2), is updated as often as the physics of the environment.

The robot itself has a diameter of 2R = 0.5, a mass of M = 1 and a motorpowerfactor = 120. The parameters for the damped oscillator (Equation 4) are m = 1, k = m ∗ motorpowerfactor and γ = 2√(k ∗ m) (critical damping). The relaxation rate for the membrane potential entering Equation (2) has been set to Γ = 20, retaining the bare excitatory and inhibitory weights, w<sub>0</sub> and z<sub>0</sub>, as free simulation parameters.
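As a small worked example of the damped-spring parameters, the sketch below integrates Equation (4) for a single weight with the values given above, a fixed target position, and the induced forces set to zero. The explicit-Euler scheme, the function name, and the chosen target are assumptions for illustration; LPZRobots computes the actual forces within its physics engine.

```python
import math


def simulate_weight(target, steps=2000, dt=0.001, m=1.0, motorpowerfactor=120.0):
    """Explicit-Euler integration of Equation (4) with F_i = 0 and a fixed target."""
    k = m * motorpowerfactor
    gamma = 2.0 * math.sqrt(k * m)  # critical damping
    x, v = 0.0, 0.0                 # weight starts at rest at the rod center
    positions = [x]
    for _ in range(steps):
        accel = (-k * (x - target) - gamma * v) / m
        x, v = x + dt * v, v + dt * accel
        positions.append(x)
    return positions
```

With critical damping, the weight approaches the target without oscillating; for example, `simulate_weight(0.125)` moves the weight from the center to the edge of the reduced target range pR = 0.125 within the simulated two seconds.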

#### 3. RESULTS

#### 3.1. Emergent Limit-Cycle Locomotion

In **Figure 3** we present the stability regions for the various regular movement patterns found, with respective close-ups given in **Figure 4**. The results are for Umax = 1 (depressing short-term synaptic plasticity without Ca dynamics) and for the parameters specified in Section 2.3. They are obtained by adiabatically continuing stable states along a grid until stability is lost. Without STSP only a globally attracting fixpoint corresponding to a motionless robot is present. We note that regular motion arises for a wide range of bare excitatory (w<sub>0</sub>) and inhibitory (z<sub>0</sub>) synaptic weights. z<sub>0</sub> needs however to be larger than w<sub>0</sub>.

observed are indicated by black filled circles (at the tip of the respective arrows).

All motion patterns observed are self-organized. There is no objective function (Gros, 2014), such as a maximal velocity, to be optimized. This implies that the quantitative features of the individual motion patterns change smoothly within their respective stability regions, and that one can identify the observed regular movement patterns as stable limit cycles in the sensorimotor loop (Sándor et al., 2015). Fast switching between motion primitives would be possible by a putative overarching controller, since more than one limit cycle may be stable for given synaptic weights w<sub>0</sub> and z<sub>0</sub>. Interactions between robots or with external obstacles might also lead to the automatic selection of another coexisting mode (see for instance Supplementary Video 1).

It is evident that the body plan of the robot examined here tends to produce meandering motion patterns. T1 and T2 are sun- and star-like movements with small (T1) and large (T2) processing angles (compare **Figure 4**; "T" stands for torus in phase space). There is, in addition, a (nearly pure) circular motion, C1, and three types of forward snake-like meandering motion patterns, S1, S2, and S3. Of these, S3 partly overlaps with itself. These modes are characterized by distinct motion patterns of the three weights, as shown in **Figure 5**, as measured by their positions along their respective rods. The differences between the distinct modes are in part qualitative, in terms of the time sequences in which the three neurons are subsequently active, and in part only quantitative. The difference between T1 and S1 is, in this respect, that the up-times of the two active neurons are symmetric for S1, but not for T1. A spontaneous symmetry breaking can furthermore be observed in the case of T1, S1, S2, and S3, for which two weights always have alternating dynamics, with the third one showing a qualitatively different behavior. In contrast, the time series of the C1 and T2 modes reveal the symmetrical but phase-shifted oscillation of the three weights. Note that the positions of the weights may overshoot the interval [−pR, pR] for the target positions x<sub>i</sub><sup>(t)</sup>, both due to inertia and due to the additional gravitational pull. Motion patterns similar to the ones shown in **Figure 4** have been observed in a self-organized two-wheeled robot in the frozen mode (Der and Martius, 2013).

# 3.2. Chaotic Modes Allowing for Explorative Behavior

The dynamics of the robot takes place in a phase space combining the internal variables, of both body and controller, with the ones of the environment. The stability regions of the individual limit cycles presented in **Figure 3** will therefore be bounded, generically, by a suitable bifurcation, such as a supercritical Hopf bifurcation or a fold bifurcation of limit cycles (Gros, 2015; Sándor et al., 2015). Alternatively, a transition to chaos may occur. On the other hand, it is also possible that chaotic attractors emerge from previously unstable manifolds and that the stability regions of chaotic and stable manifolds overlap.

Close to a chaotic phase long transients may occur, which makes it difficult to study systematically the exact extent of the chaotic region. In **Figure 3** we have, however, indicated a few representative combinations of parameters for which stable chaos is observed both in the limit of long simulation times and for a wide range of stepsizes of the ODE simulator. No regular motion patterns can be observed in the screenshots presented in **Figure 6**. We have also evaluated the long-time behavior of the square of the covered real-space distance,

$$d^2(\tau) = \left\langle \left( \mathbf{x}(t+\tau) - \mathbf{x}(t) \right)^2 \right\rangle_t. \tag{5}$$

We found diffusive transport d ∼ √τ for the chaotic mode and ballistic transport d ∼ τ for the forward meandering modes S1, S2, and S3, both as expected.
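The two scaling laws can be checked numerically on synthetic trajectories. The following is a minimal sketch, not the analysis code used for the robot: the helper names `msd` and `scaling_exponent`, the random-walk and straight-line test trajectories, and the two lag values are assumptions chosen purely for illustration.

```python
import math
import random


def msd(traj, lag):
    """Time-averaged squared displacement d^2(lag) for a list of (x, y) points."""
    n = len(traj) - lag
    total = 0.0
    for t in range(n):
        dx = traj[t + lag][0] - traj[t][0]
        dy = traj[t + lag][1] - traj[t][1]
        total += dx * dx + dy * dy
    return total / n


def scaling_exponent(traj, lag_small, lag_large):
    """Estimate alpha in d ~ tau^alpha from two lags (d is the root of d^2)."""
    d_small = math.sqrt(msd(traj, lag_small))
    d_large = math.sqrt(msd(traj, lag_large))
    return math.log(d_large / d_small) / math.log(lag_large / lag_small)


def random_walk(steps, rng):
    """Unbiased 2D walk with unit steps in random directions (diffusive)."""
    x, y, traj = 0.0, 0.0, [(0.0, 0.0)]
    for _ in range(steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += math.cos(angle)
        y += math.sin(angle)
        traj.append((x, y))
    return traj


def straight_line(steps):
    """Constant-velocity motion along x (ballistic)."""
    return [(float(t), 0.0) for t in range(steps + 1)]


rng = random.Random(0)
alpha_diffusive = scaling_exponent(random_walk(20000, rng), 10, 100)
alpha_ballistic = scaling_exponent(straight_line(20000), 10, 100)
print(f"diffusive alpha ~ {alpha_diffusive:.2f}, ballistic alpha ~ {alpha_ballistic:.2f}")
```

An exponent close to 1/2 indicates diffusive transport and an exponent close to 1 indicates ballistic transport. For real trajectories one would typically fit the exponent over many lags rather than two.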

It has been observed that chaotic locomotion of an embodied system may be considered as a basic explorative behavior, both of the environment and of the own motor pattern (Steingrube et al., 2010; Shim and Husbands, 2012). As a test of this hypothesis we have set our three-rod robot into a restricted playground containing movable objects in the form of blocks, which can be pushed, to a certain extent, over the ground. A screenshot is presented in **Figure 6**. One can observe that the robot stays for a while close to the object, bumping around, and retracing in part a trajectory having a shape similar to the one generated by a C1 limit cycle. This is possible, as the set of parameters (w0, z0) = (210, 400) considered is located close to but outside the C1-stability region. The C1 limit cycle is hence only weakly unstable in the chaotic phase. The active exploration of the environment, occurring here when bumping into obstacles, hence gives the robot access to otherwise unstable locomotion options. The overall behavior may be interpreted alternatively in terms of non-representational sensorimotor knowledge (Buhrmann and Di Paolo, 2014). For a longer simulation see the Supplementary Videos.

motion of the robot is close to the one of the S2 mode, which is here an unstable attractor (compare Figure 4). Left: In open space. Right: In a closed environment allowing for the interaction with movable objects (yellow blocks). The circular sections correspond to unstable C1 limit cycles. A close-up of the dynamics and a longer simulation in the maze can be seen in Supplementary Videos 2 and 3, respectively.

In the movie presented in the Supplementary Material one can observe, furthermore, that the robot is pushing the blocks around in a seemingly "playful" manner (see Supplementary Video 3). This is a remarkable behavior, in our view, considering that the sphere robot has a mere total of three controlling neurons. We note that this complex behavior results from the interplay of the autonomous dynamics, as resulting from the inter-neural short-term synaptic plasticity, with environmental feedback.

# 3.3. Embodiment Shaping the Intrinsic Dynamics

One can consider the controlling 3-neuron network in isolation by identifying the sensory reading $x_i^{(a)}$ for the actual position of the weight along the rod with the respective target position $x_i^{(t)}$, viz by setting $x_i^{(a)} = x_i^{(t)}$ in Equation (2). The resulting network contains a self-excitatory coupling w0 together with all-to-all inhibition with a bare synaptic strength z0. The short-term synaptic plasticity then induces an autonomous activity, as illustrated in **Figure 7**, which is topologically equivalent to the C1 mode. This equivalence becomes even more pronounced when suspending the robot in air, which can be achieved in turn by simply removing gravity from the physics simulation (bottom time-series in **Figure 7**). One can hence consider the C1 mode as the driver for the observed physical motion.

The isolated 3-neuron network has, however, only a single stable limit cycle. Numerically integrating the isolated network for parameter settings (w0, z0) corresponding to the six modes of **Figure 5**, as well as for chaotic states, we always find an identical sequential activation of the three neurons, illustrated in **Figure 7**, with only slight changes in the overall shape. It is hence clear that the other modes T1, T2, S1, S2, and S3, as well as the chaotic behavior, do result from the closed-loop feedback of the environment. The interaction of the environment with the intrinsic dynamics then results in the emergence of alternative types of locomotion.

### 3.4. Stability with Respect to Noise

We present in **Figure 8** an analysis of the stability of the various modes found, with respect to noise in the sensory readings, where the level of the noise is given by the relative standard deviation σ of the sensory readings $x_i^{(a)}$. Comparing with the phase diagram, as presented in **Figure 3**, one notices that the first modes to disappear, T1 and S3, are the ones with small stability regions in the phase diagram. When the noise level is ramped up, the T1 and S3 modes turn, above their corresponding critical noise levels, into C1 and S1 modes, respectively. The other modes, including the chaotic phase, are in contrast very stable with respect to noise.

### 3.5. Autonomous Mode Switching

We present in **Figure 9** the phase diagram obtained when using Umax = 4 for the maximal Ca-level entering Equation (1). Within the range of (w0, z0) scanned we find four out of the six modes observed for Umax = 1 (compare **Figure 3**). The range of inhibitory weights z0 for which stable locomotion is found is, in addition, rescaled down with respect to the Umax = 1 case. Interestingly, we found a chaotic state at (180, 80), which lies just inside the stability region of the C1 mode.

We let the robot evolve within the borders of a simple maze, as shown in **Figure 10** and Supplementary Video 4. Most of the time the robot is in the chaotic state, which is the dominant mode for the parameters used, namely (w0, z0) = (180, 80) and Umax = 4. Intermittently, after colliding with a wall, the robot switches to the coexisting C1 mode. The radius of the stable C1 limit cycle in real-world coordinates is, however, so large for (w0, z0) = (180, 80) that it does not fit into the maze. The robot hence continues exploring. We have obtained similar results when using a Umax = 1 chaotic mode.

on a relatively large scale.

A screenshot of a trajectory in open space is presented in **Figure 11**. One notices that the Umax = 4 and (w0, z0) = (180, 80) chaotic mode wanders around aimlessly in a much smoother manner than the Umax = 1 chaotic mode shown in **Figure 6**. This is the result of topologically different attractor structures, as seen in the phase space of internal variables (see the Supplementary Materials). Different types of chaos are indeed known to exist (Wernecke et al., 2016).

The autonomous mode switching observed for the regular motion primitives can also be seen in Supplementary Video 1. For a detailed discussion of the possible switching scenarios see the Supplementary Materials.

# 3.6. Switching between Degenerate Unstable Limit Cycles

In **Figure 12** we compare, for the two chaotic modes realized for Umax = 1 and for Umax = 4, respectively, the time series for the positions of the weights along the rods. One observes that the movement of the weights is qualitatively similar, on short time scales, to an S2 mode (compare **Figure 5**, see also Supplementary Video 3). It is interesting, in this context, that the S2 mode has two types of degeneracies.

• Continuous. The S2 mode may propagate in any direction. There is hence a continuous manifold of attractors in the combined phase space of controller, body and environment. Outside the actual region of stability this manifold contains either unstable limit cycles or limit cycle relicts (Gros, 2009).

• Discrete. There is a spontaneous symmetry breaking in the S2 mode, with two weights having identical but phase-shifted movement patterns along their respective rods, which are qualitatively different from the trajectory of the third weight (see **Figure 5**).

For the Umax = 4 chaotic mode we did not observe discrete mode switching, in the above sense, which however occurs frequently for the Umax = 1 mode (see **Figure 12**). The chaotic meandering observed for the Umax = 4 chaotic mode, as evident in **Figure 11**, is hence a consequence of a smooth diffusion of the angle of propagation on the manifold of unstable S2 limit cycles (or limit cycle relicts; Linkerhand and Gros, 2013). In the phase space of the neural activity (as shown in Supplementary Figure 5), the trajectory corresponds to a chaotic phase diffusion along a limit cycle (Wernecke et al., 2016). This process is deterministic and not due to numerical errors, as we have checked by systematically reducing the stepsize used for the numerical integration. Noise is absent.

# 4. CONCLUSIONS

We have shown here that a robot controlled by only a very limited number of neurons, three in our case, may show complex behavior which may be interpreted as explorative or playful. This is possible when locomotion results from self-organizing processes in the sensorimotor loop. The driving control dynamics, for which we have considered here short-term synaptic plasticity, then adapts itself seamlessly to the physical requirements. No central controller is needed to detect an external object (Rai et al., 2014), or to switch direction when colliding with it. Stable and unstable limit cycles, together with chaotic attractors, arise in the phase space of internal (control and robot body) variables. These attractors form continua in the space of physical location and overall propagation direction, with the chaotic locomotion transitioning between unstable limit cycles. Transitions may either be between different types of regular locomotion, bounded circular or propagating meandering modes, or between the directions of unstable propagating limit cycles.

occur for the case of Umax = 1, but not for Umax = 4.

We note that the formation of a continuum of attractors is possible whenever internal and external variables can be separated, such that internal variables span an independent subset of the phase space of the dynamical system. Here, the position of the robot (on the ground plane, in the absence of obstacles) acts as an external variable, all the other variables being independent of it. The limit cycles and chaotic attractors, living in the subspace of internal variables, thus exist for all position vectors, generating a continuous degeneracy of locomotion modes. The interactions with other robots and obstacles then result in a transient breakdown of this degeneracy, which is restored instantaneously with the termination of physical contact. Within this context, higher order control mechanisms would correspond to an external-variable-dependent feedback, shaping the attractors either intermittently or slowly (with respect to the internal dynamics), thus leading possibly to the emergence of transiently stable attractors.
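The separation argument can be written compactly. As a sketch, with symbols introduced here for illustration only ($\mathbf{y}$ for the internal variables of controller and body, $\mathbf{r}$ for the position on the ground plane, and $\mathbf{f}$, $\mathbf{g}$ for the corresponding flows), the absence of obstacles means that

$$\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y}), \qquad \dot{\mathbf{r}} = \mathbf{g}(\mathbf{y}).$$

If $(\mathbf{y}(t), \mathbf{r}(t))$ is a solution, then so is $(\mathbf{y}(t), \mathbf{r}(t) + \mathbf{a})$ for any constant shift $\mathbf{a}$, since neither right-hand side depends on $\mathbf{r}$. Every attractor of the internal dynamics therefore comes with a continuous family of translated copies in the full phase space. Contact with an obstacle makes the flow depend on $\mathbf{r}$ and lifts this degeneracy for as long as the contact lasts.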

Our result, that the three-rod robot switches spontaneously between a continuous set of attractors in the chaotic state, can be seen as a realization of chaotic wandering (Tsuda, 2001), which has been argued in turn to occur in the brain in the form of self-organized instabilities (Friston et al., 2012), viz as transient-state dynamics (Gros, 2007). There is furthermore a close relation to the concept of attractor metadynamics (Gros et al., 2014), which denotes the either induced or spontaneous switching between attracting sets.

The robot simulated here is furthermore compliant both on the level of control and actuators, showing a highly flexible response. The actuators are implemented by specifying a target position for a limb, here a moving weight on a rod. The force acting on the weight then results from the interplay between the internal driving, provided by a damped spring (between the actual and the target position), and the physical restoring forces acting on the weights, which in turn depend on the body dynamics determined by the interaction with the ground, obstacles and other robots (Floreano et al., 2014).
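The compliance of such an actuator can be illustrated with a one-dimensional toy model. The sketch below is an assumption-laden illustration rather than the simulator used for the robot: the function name `simulate_weight`, the spring constant `k`, the damping `gamma`, the mass, the stepsize, and the constant external force `f_ext` (standing in for gravity and contact forces) are all hypothetical.

```python
def simulate_weight(target, f_ext=0.0, k=40.0, gamma=6.0, mass=1.0,
                    dt=0.001, steps=20000):
    """Integrate a weight on a rod pulled toward `target` by a damped spring.

    All parameter values are illustrative assumptions, not the paper's.
    `f_ext` stands in for the physical forces (e.g., a gravity component).
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        force = k * (target - x) - gamma * v + f_ext
        v += dt * force / mass   # semi-implicit Euler: update velocity first
        x += dt * v
    return x


x_free = simulate_weight(target=0.5)
x_loaded = simulate_weight(target=0.5, f_ext=-4.0)
print(f"no external force: x = {x_free:.3f}")
print(f"with external force: x = {x_loaded:.3f}")
```

Without an external force the weight settles at the target. With a constant external force it settles at a position shifted by `f_ext / k`, so the actual position that is fed back to the controller differs from the commanded one. This difference is the channel through which body and environment enter the control loop.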

The isolated controlling network (realized in the limit of infinitely strong actuators) can be interpreted in addition as a central pattern generator (Steingrube et al., 2010), having a single intrinsic limit-cycle attractor. The open-loop control incorporates, however, the feedback of the environment through the induced forces. We find here that the resulting embodiment (Cangelosi et al., 2015) does morph the driving dynamics of the central pattern generator not only quantitatively, but also qualitatively, giving rise to a vast array of modes which differ in part topologically from the dynamics of the underlying central pattern generator. We believe that this dynamical systems approach to the locomotion of simple robots has not been fully exploited yet, having many interesting features and applications in store for the field of neurorobotics.

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

The experiments were conceived and designed by CG, BS, and LM, performed mainly by LM with BS adding some data. The data was analyzed by CG, BS, and LM, most of the plots produced by LM. The manuscript was mostly written by CG, with BS adding some paragraphs and revising it with LM.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2016.00012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Martin, Sándor and Gros. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# ReaCog, a Minimal Cognitive Controller Based on Recruitment of Reactive Systems

Malte Schilling<sup>1</sup> \* and Holk Cruse<sup>2</sup>

*<sup>1</sup> Center of Excellence Cognitive Interaction Technology, Bielefeld University, Bielefeld, Germany, <sup>2</sup> Department of Biological Cybernetics and Theoretical Biology, Bielefeld University, Bielefeld, Germany*

It has often been stated that for a neuronal system to become a cognitive one, it has to be large enough. In contrast, we argue that a basic property of a cognitive system, namely the ability to plan ahead, can already be fulfilled by small neuronal systems. As a proof of concept, we propose an artificial neural network, termed reaCog, that, first, is able to deal with a specific domain of behavior (six-legged-walking). Second, we show how a minor expansion of this system enables the system to plan ahead and deploy existing behavioral elements in novel contexts in order to solve current problems. To this end, the system invents new solutions that are not possible for the reactive network. Rather these solutions result from new combinations of given memory elements. This faculty does not rely on a dedicated system being more or less independent of the reactive basis, but results from exploitation of the reactive basis by recruiting the lower-level control structures in a way that motor planning becomes possible as an internal simulation relying on internal representation being grounded in embodied experiences.

#### Edited by:

*Poramate Manoonpong, University of Southern Denmark, Denmark*

#### Reviewed by:

*Yulia Sandamirskaya, University of Zurich, Switzerland Yoonsuck Choe, Texas A&M University, USA Michail Maniadakis, Foundation for Research & Technology – Hellas, Greece*

#### \*Correspondence:

*Malte Schilling mschilli@techfak.uni-bielefeld.de*

Received: *18 September 2016* Accepted: *11 January 2017* Published: *30 January 2017*

#### Citation:

*Schilling M and Cruse H (2017) ReaCog, a Minimal Cognitive Controller Based on Recruitment of Reactive Systems. Front. Neurorobot. 11:3. doi: 10.3389/fnbot.2017.00003*

Keywords: reactive system, cognitive system, internal model, motor planning, internal simulation, neural networks, attention

# INTRODUCTION

Over the last years more and more findings in neuroscience have shown that higher level cognitive capabilities cannot be detached from the functioning of lower level sensorimotor control systems (van Duijn et al., 2006; Barsalou, 2008), which is the core idea of embodied cognition as a field. It is assumed that cognition recruits the underlying sensorimotor systems (Anderson, 2010). Intensively studied examples controlled by such sensorimotor, or reactive, systems are insects. A lot is already known about the structure and properties of their sensorimotor systems (Menzel et al., 2007; Cruse et al., 2009), which allows one to build well performing biologically inspired systems (Pfeifer et al., 2007; Ijspeert, 2014). But it is still unclear if all the crucial properties are understood that are required to form the basis for a cognitive system. Do the known principles allow us to leverage the sensorimotor control systems toward cognition?

A basic problem concerns what, after all, is meant by the term "cognition." Definitions cover various ideas, ranging from Maturana and Varela (1981), "life is cognition" (which would include even bacteria as cognitive systems), to Engel et al. (2013), who note that "cognition is action." Other authors avoid the problem of a short definition, which almost inevitably includes comparatively simple systems, by listing a collection of phenomena to characterize cognitive systems (e.g., Khlentzos and Schalley, 2007; Menzel et al., 2007). The most important faculties generally agreed to characterize a cognitive system are attention, awareness, emotion, learning, specific aspects of memory, language as well as thinking, reasoning, planning ahead, decision making, volition, Theory of Mind or even subjective feelings and consciousness (for another list, proposed by Langley et al., 2009, see Discussion). In this article, we will not enter this discussion but focus on basic properties discussed by several authors as being crucial for a cognitive system, namely the ability to invent new behaviors and the ability to plan ahead, the latter being required to test the feasibility of the new invention.

Lower level behaviors, often termed reactive or automatic, controlled by "reactive systems," require procedural elements ensuring survival and allowing for basic behavioral abilities, e.g., locomotion, feeding, object avoidance. The combination of such controllers may also be suited to guide seemingly more complex behaviors (e.g., navigation). These controllers constitute the procedural memory of the system. Exploiting the loop through the world (Brooks, 1989), even a "hard-wired" memory system allows for adaptation to changing environments, as will be illustrated in the second section (Reactive Walker). In reactive systems many of these procedures (or "action-perception circuits," Pulvermüller and Garagnani, 2014) can be active at the same time, but they may also compete amongst each other for controlling the system (Brooks, 1989). Therefore, a crucial ability for each behaving system, including reactive systems, is the ability to select one among different possible actions. This architecture is inspired by earlier authors such as Arbib (1998), Brooks (1991b), and Minsky (1986).

Reactive systems, by definition, do not belong to the field of cognition. However, many authors (e.g., Newell, 1994; Anderson, 2010; Glenberg and Gallese, 2012) argue that cognition in all known systems is strongly based on and is intimately connected with a functional reactive system. Even more, as proposed by Barsalou (2008) and others, reactive (or behavior-based) systems having internal states (as introduced in the second section, Reactive Walker) plus being embodied are basic requirements for a system to become a cognitive one. As already noted briefly above, there is indeed strong support showing that neuronal elements forming cognitive properties are tightly intertwined with the reactive system itself and that a functional separation is not possible. For example, planning of a movement is interpreted in this view as a mental enactment of the movement (Jeannerod, 2001; Hesslow, 2002). This view is supported by the finding that brain regions that formerly were assumed to be highly specialized, for example the motor area, are also activated during language processing or perception (Feldman and Narayanan, 2004; Buccino et al., 2005; Pulvermüller, 2005; Jeannerod, 2006; Pulvermüller and Garagnani, 2014). More generally, Gallese and Lakoff state that "a key aspect of human cognition is … the adaptation of sensory-motor brain mechanisms to serve new roles in reason and language, while retaining their original function as well." (Gallese and Lakoff, 2005, p. 456). This is supported by behavioral research showing that behavioral and cognitive processes are functionally related insofar as both processes seem to apply the same structuring principles and seem to have access to memory in a structurally similar way (e.g., Jeannerod and Decety, 1995; Cross et al., 2006; Barsalou, 2008; Barsalou et al., 2012).

What distinguishes a reactive system from a cognitive one? A key feature that might be suited for a distinction between reactive, or behavior-based, systems and cognitive systems is that the former are restricted to apply their procedural memory elements (or internal representations, or internal models) only in the context in which these elements have been acquired (Wilson, 2008). For example, a specific movement (e.g., grasping a specific type of prey) is stored as a (congenital or learned) procedural memory. The content of this memory element may also be considered as a model of that movement, which can, in a reactive system, only be triggered by a specific stimulus, the specific prey. In contrast, cognitive systems are able to modify their behaviors and thereby may come up with solutions for a novel task (Glenberg and Gallese, 2012). A novel task is considered here a task in which, in the current context, none of the existing procedural memory elements can be applied to solve the problem, as none of the available procedures are able to deal with the actual situation or to predict the resulting consequences. Therefore, to approach a cognitive level, one has to search for systems that are creative, i.e., able to alter their procedural memory elements or to compose them in a new way allowing the system to handle such novel tasks. This characterization agrees with the statement of Limongelli et al. (1995) "cognition is the ability to relate different unconnected pieces of information in new ways and apply the resulting knowledge in an adaptive manner." Taking a broader view, Anderson (2010), in his massive redeployment hypothesis, states that "neural reuse" is a fundamental principle not only applied in evolutionary time scales but also for solving current problems by a cognitive system. Thus, in this article we will focus on a system that is able to find solutions for novel tasks.

What are the prerequisites to find a solution to a current problem? One way to find new solutions is to apply a search strategy based on simple trial and error. But trial and error is a risky approach and generally quite slow. As an alternative, "internal trial-and-error" could be applied. This means that, in addition to the ability to modify the procedures and their composition, such systems are able to anticipate consequences of new actions, which enables the agent to decide based on these predictions (Hesslow, 2002). These aspects have already been captured by McFarland and Bösser (1993), who indeed define cognition as the faculty to plan ahead. Planning ahead allows the system to verify the feasibility of new solutions before execution. Therefore, planning ahead is the second basic property of our system. The ability to predict requires internal models, or internal representations.

Because our system is characterized here as searching for new solutions by exploiting the already existing memories (or internal models) in a flexible way, i.e., not only in a specific context, but in different contextual situations, an organizational scheme is required that allows for compositionality and modulation of specific parameters. In the third section (Motor Planning) we will provide a simple solution for this problem.

Following the view proposed by Barsalou (2008), Glenberg and Gallese (2012) and others, our approach is to start with a non-trivial reactive system that is then equipped with the ability to plan ahead. To this end, we will consider a system with a complex enough body (i.e., having a considerable number of extra degrees of freedom), but an arguably simple controller, which—in order to comply with biological constraints—is based on elements forming an artificial neural network.

Using a system able to control autonomous behavior and using a complex, non-trivial body, we follow a whole-systems approach. We take the embodiment approach literally insofar as our system is constructed in such a way that it is currently used to control a simulated robot in a dynamical simulation environment, but will be transferred to a physical robot in a next step. Thus, we deal with really executable behaviors rather than with more abstract approaches on a dynamical systems level or systems that operate on a symbolic level. Application of such purely high-level approaches may bear the danger that serious problems occurring at a lower level may be overlooked (Brooks, 1991a; Verschure and Althaus, 2003).

Taken together, we focus on a system that allows for the ability to plan ahead (McFarland and Bösser, 1993) relying on internal representations (Steels, 2003) that are grounded in embodied experiences (Gallese and Lakoff, 2005). In this way, we follow the proposal of Feynman, who stated that we can only understand a system when we are able to create it (in Hawking, 2001; p. 83). We start with a decentralized, reactive neuronal network controller (Dürr et al., 2004) for a complex hexapod robot which is expanded by a holistic body model represented by a "hardwired" recurrent neural network (RNN) and used for inverse kinematics (Schilling et al., 2012). Based on a reactive structure the robot allows for walking in an unpredictable environment.

We will further enable the robot to cope with situations for which the reactive system does not offer a solution. In this case, a "cognitive expansion" shall allow the system to search for a new solution to this problem. The search space is not only characterized by the 18 degrees of freedom (DoF) of the robot, but is expanded by the fact that the controller, being embodied, heavily depends on the "loop through the world," i.e., depends on the unpredictable properties of the environment. Further, the complexity of the situation is increased as behavioral elements to be selected show various time dependencies. To cope with such situations, the system first has to search for a behavioral element normally not used in the current context. The search space is large and not continuous, so gradient descent methods are not applicable. The search for new solutions is based on (i) a somatotopic heuristic, (ii) noise applied to part of the cognitive expansion network, as well as (iii) tests for physical feasibility of the solution proposed, first by internal simulation, second by performing the behavior in reality. For internal simulation, we exploit the property of the body model used here that the same model can be used not only as an inverse model, but also as a predictive model. Therefore, this body model can be used for motor planning, applying an internal simulation to test newly selected behavioral elements.

The results show that the cognitive expansion requires only a small number of neurons coupled by a quite simple connectivity. This simple network shows basic properties required for a cognitive system and can be used as a scaffold for later introduction of further properties. In addition, capabilities like showing attention or emotions might be found as properties emerging from such an architecture, as discussed in Cruse and Schilling (2013).

The article is structured in the following way. The second section (Materials and Methods) is divided into three parts. In section Background and Previously Developed Models. Reactive Walker—the Walknet (Reactive Walker) the simple control system for a hexapod walker is introduced, which is biologically inspired by studies on the walking of insects. In section Motor Planning: from Walknet to reaCog (Motor Planning) the cognitive expansion is presented, including an example that illustrates how the basic reactive system is recruited for planning. This will be followed by a more detailed explanation of the control architecture and the experiment setup (section Cognitive Expansion). Simulation results will be presented, on the one hand, for an example scenario (section Results) explaining our approach. On the other hand, a series of simulations shall demonstrate how the approach deals with disturbed walking. While there is no similar robotic architecture which applies behaviors out of context and realizes recruitment as internal simulation, we will present a brief overview on related work and discuss differences and implications (section Related Work). In the Discussion we will analyze the properties of the complete system, discuss them and briefly turn toward the question as to how aspects of the higher-level phenomena listed above may emerge in our system (Discussion and Conclusions).

### MATERIALS AND METHODS

#### Background and Previously Developed Models. Reactive Walker—The Walknet

##### Biological Model of Insect Walking

The example we choose as a reactive basis, which will be explained briefly in the following, concerns a hexapod (insect-like) walking system (see review Schilling et al., 2013b for details). The task to walk over a non-predictable substrate, possibly cluttered with obstacles of varying size and holes, is by no means a trivial one. The walker has six legs, each equipped with three joints. Therefore, the controller has to deal with 18 degrees of freedom (DoF). As body position in space is defined by only six DoFs (three for position in space, three for orientation), there are 12 DoFs free to be decided upon by the controller, which means that the controller has to make these 12 (respectively 18) decisions in a sensible way at any moment of time while dealing with an unpredictable environment. As a first step, the walker uses only tactile sensors situated in the legs (and possibly the antennae; Schütz and Dürr, 2011), measuring contact with external objects, and proprioceptors measuring position, torques and velocities of joints.

The walking system to be described in the following is based on behavioral (and to some extent neurophysiological) studies on insects, in particular stick insects (Schilling et al., 2013b). At first, we briefly describe the essentials of the earlier version, Walknet, and will then introduce expansions.

Experiments on the walking stick insect have shown that the neuronal system is organized in a decentralized way (Wendler, 1968; Bässler, 1983; Cruse, 1990). Derived from these results, a model has been proposed in which each leg is attributed a separate controller (Dürr et al., 2004; for a review Schilling et al., 2013b). **Figure 1** sketches the approximate anatomical arrangement of the controllers and the numbering of the legs. These single leg controllers are assumed to be situated in the thoracic ganglia (for a review see Bässler and Büschges, 1998). **Figure 2** shows details of the controllers as used in Walknet for the left middle leg (LM\_leg) and the left hind leg (LH\_leg). A single leg controller mainly consists of several movement primitives that reflect the two phases of leg movement. These are the stance movement, during which the leg maintains ground contact and is retracted to propel the body forward while supporting the weight of the body, and the swing movement, during which the leg is lifted off the ground and moved in the direction of walking to touch down at the location where the next stance should begin. The movement primitives controlling stance and swing are realized in the leg controller (**Figure 2**) as several modules, or procedures, each containing artificial neurons forming a local, in general recurrent, neural network (RNN). These modules might receive direct sensory input and provide output signals that can be used for driving motor elements. The two most important procedural elements in our example are the Swing-net, responsible for controlling a swing movement, and the Stance-net, controlling a stance movement (**Figure 2**; for swing see Dürr et al., 2004; Schumm and Cruse, 2006; for stance see Schmitz et al., 2008; Schilling et al., 2012). The end positions used during forward walking are stored in the procedures for the swing and stance movement, i.e., the Swing-net and Stance-net, respectively (in **Figure 2** they are part of the gray rounded boxes called Swing-net and Stance-net). Swing is triggered when the stance end position is reached; stance movement is triggered by ground contact.

Following Maes (1990), the overall activation of a procedural element is controlled by a motivation unit (represented by yellow circles in the Figures) that gates to what extent the corresponding procedural element contributes to the control of the leg. In the network, these units are rate-coded, nonspiking neurons with leaky-integrator, i.e., low-pass, dynamics. They have a piecewise linear activation function (from 0 to 1) and control the strength of the output of the corresponding procedure (in a multiplicative way). Here we deal with a very simple motivation unit network that, initially, consists of just two units, the motivation units for the two procedural elements used in forward walking, Swing-net and Stance-net. Each motivation unit reinforces itself (not shown in **Figure 2**) and at the same time inhibits the other motivation unit, forming a winner-take-all (WTA) net that allows only one behavior to be active at any given time (**Figure 2**). In addition, sensory signals control the behavior selection by influencing the motivation units and thus initiate behavioral transitions. When the leg touches the ground toward the end of a swing movement, the ground contact causes switching to stance movement by activating the motivation unit Stance. Correspondingly, during forward walking, reaching a given posterior position activates the motivation unit Swing. As an extension, we introduced backward walking. In this case, new swing and stance procedures are introduced, including their motivation units (**Figure 3**). The Swing\_toBack behavior stores the target for the swing movement to the back. As for forward walking, a memory element is required that represents the stance end position [for details see Schilling et al. (2013a) and the explanation of the stance movement below].
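The interplay of self-excitation, mutual inhibition and sensory triggering described above can be illustrated with a minimal sketch. It is not the Walknet implementation: the weights, the time constant `tau`, the strength of the ground-contact input and all function names are assumptions chosen only to show how two leaky-integrator units with a piecewise linear activation between 0 and 1 switch from Swing to Stance.

```python
def clip01(x):
    """Piecewise linear activation: identity on [0, 1], saturating outside."""
    return max(0.0, min(1.0, x))


def update(swing, stance, in_swing=0.0, in_stance=0.0,
           w_self=1.0, w_inhib=1.0, tau=5.0):
    """One low-pass (leaky-integrator) step of the two motivation units.

    Each unit excites itself (w_self) and inhibits the other (w_inhib).
    All parameter values are illustrative assumptions.
    """
    net_swing = w_self * swing - w_inhib * stance + in_swing
    net_stance = w_self * stance - w_inhib * swing + in_stance
    swing_new = swing + (clip01(net_swing) - swing) / tau
    stance_new = stance + (clip01(net_stance) - stance) / tau
    return swing_new, stance_new


def run(swing, stance, contact_steps=30, free_steps=30, contact_drive=2.0):
    """Apply a ground-contact signal for a while, then remove it again."""
    for _ in range(contact_steps):
        swing, stance = update(swing, stance, in_stance=contact_drive)
    for _ in range(free_steps):
        swing, stance = update(swing, stance)
    return swing, stance


def gated_output(motivation, procedure_output):
    """Multiplicative gating of a procedure's output by its motivation unit."""
    return [motivation * value for value in procedure_output]


if __name__ == "__main__":
    # Leg starts in swing; ground contact should hand control to stance.
    swing, stance = run(1.0, 0.0)
    print(f"swing={swing:.3f} stance={stance:.3f}")
```

In this toy version the Stance unit remains active after the contact signal has been removed, because self-excitation holds the winning state while the inhibited unit decays toward zero.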

Furthermore, a leg controller must also take into account the interaction with the other legs. Some of these interactions are mediated directly by the body and through the environment, making explicit computations superfluous (see, e.g., the local positive velocity feedback approach, Schmitz et al., 2008). While the physical coupling through the environment is important, it is not sufficient. In addition, the controllers of neighboring legs are coupled via a small number of channels transmitting information concerning the actual state of that leg (e.g., swing, stance) or its position (i.e., values of joint angles). These coordination rules were derived from behavioral experiments on walking sticks (Cruse, 1990). In **Figure 1** the channels are numbered 1–3. These coordination rules influence the length of the stance movement by influencing the transition from stance to swing movement, i.e., they change the value of the posterior extreme position (PEP), at which stance ends. In **Figure 2** only one connection is shown, influence # 1, which suppresses the start of a swing movement of the anterior leg during the swing movement of the posterior leg (for details see Schilling et al., 2013b).

Beyond the motivation units that are directly controlling a procedural element, there are also motivation units (**Figure 3**, yellow circles) that are arranged to form some kind of hierarchical structure. Units which belong to the procedural nets controlling the left middle leg show positive connections to a motivation unit termed Leg\_LM and this is correspondingly the case for all six legs (only two legs are depicted in **Figure 3**). These six "leg units" are in turn connected to a unit termed "walk" in **Figure 3**. This unit serves the function of arousing all units possibly required when the behavior "walk" is activated.

In the case considered here, the motivation unit network, a recurrent neural network, can adopt different stable states, or attractors, forming different overlapping ensembles. For example, all "leg" units and "walk" are activated during backward walking and during forward walking, but only one of the two units termed "forward" and "backward" and only 12 of the 24 end position memories are active in either case. The network is therefore best described as forming a heterarchical structure (for details see Schilling et al., 2013a). Such an "internal state" adopted by the network protects the system from responding to inappropriate sensory input. For instance, as a lower-level example, depending on whether a leg is in swing state or in stance state, a given sensory input can be treated differently: stimulation of a specific sense organ (not depicted in **Figures 2**–**3**, but see Schilling et al., 2013b) leads to a levator reflex when in swing, but not during stance. In other words, the motivation unit network can be considered to act as a top-down attention controller. On higher levels, further internal states could be distinguished, for example walking, standing still or feeding (for a more detailed discussion on how such a heterarchical network can be structured and learned see Cruse and Schilling, 2010).

The heterarchical structure sketched in **Figure 3** comprises a simple realization of neural reuse as proposed in Anderson's massive redeployment hypothesis (Anderson, 2010) as specific procedures are used in different behavioral contexts.

The system as described so far is a slightly expanded version of the earlier Walknet that represents a typical case of an embodied controller (1st order embodiment, cf. Metzinger, 2006, 2014): Kinematic and dynamic simulations as well as tests on robots have shown that this network can control walking at different velocities, producing different insect gaits including the continuous transitions between the so-called wave gait, tetrapod gait and tripod gait, negotiating curves (Kindermann, 2002), climbing over obstacles (Kindermann, 2002; Dürr et al., 2004) and over very large gaps (Bläsing, 2006), and coping with leg loss (Schilling et al., 2007). Thus, Walknet exhibits a free gait controller where the gaits emerge from a strictly decentralized architecture. Application of this decentralized approach allows for a dramatic simplification of the computation by exploiting the loop through the world (including the own body). For example, trajectories of swing movements are not explicitly given, but result from the cooperation between the Swing-net and the "loop through the world," i.e., the sensor readings describing the current position of the leg joints. This structure allows for immediate adaptation of swing trajectories to unpredictable disturbances. Similarly, the spatio-temporal patterns of leg movement ("gaits") are not explicitly specified but result from decentralized local coordination rules and the coupling of the legs via the substrate (see review Schilling et al., 2013b). This network has been tested in dynamic simulation (Schilling et al., 2013a,b) and applied to the robot Hector (Schneider et al., 2011; Paskarbeit et al., 2015). As will be shown in section Motor Planning: from Walknet to reaCog (Motor Planning), this modular structure is a crucial condition to allow recombination of procedural elements as required by a cognitive system.

[…] modulated by a motivation unit (yellow circle, coordination Rule # 1). Further motivation units are introduced (red connections and units) being arranged in a heterarchy; again only a fraction of the network is shown (see also Figure 2).

#### Walknet with a Body Model

The control of the stance movement is a complex task which requires the coordination of multiple legs and joints. While local embodied approaches can deal with quite complex walking scenarios and disturbances (Schmitz et al., 2008), a purely embodied approach relying on the coupling through the body itself and local leg controllers has been shown to be insufficient in other cases (Schilling et al., 2012). For example, stick insects are able to negotiate curves which can be very tight (Dürr, 2005; Dürr and Ebeling, 2005). In curve walking, the different legs produce quite different movements and take over different roles as there is, for example, a differentiation between inner and outer legs. To better cope with such problems, we apply an internal model of the body for the control of the stance movement (Schilling et al., 2012).

Body models are used for three different purposes [for a recent, comprehensive review see Morasso et al. (2015)]. First, inverse models have been applied (e.g., Wolpert and Kawato, 1998) to compute motor commands for given goal positions of an end-effector. The second task concerns the ability to predict the position of the end-effector when motor commands are known but not yet executed (Wolpert and Flanagan, 2001; Webb, 2004). In this case the body model is used as a forward model, for instance to overcome sensory delays. Third, even simple animals such as insects use a high number of sensors, for example to measure joint positions or load. In order to exploit this redundancy (e.g., to improve inexact or even missing sensor data), the different sensory inputs have to be fused, which requires a body model (Makin et al., 2008). Used for visual perception, the body model, mirroring the observed movement, is strongly related to mirror systems as found in animals (Rizzolatti et al., 1996) and in humans (Rizzolatti, 2005), and might be linked to the understanding of others (Loula et al., 2005).

Whereas other approaches usually require an individual model for each task and each behavioral element (Wolpert and Kawato, 1998), we use one simple holistic recurrent neural network that can cope with all three tasks. The body model used copes with the at least 18 degrees of freedom of the insect body (six legs with three degrees of freedom each).

The complexity of the six-legged walker is distributed in the body model into interacting submodels (see **Figure 4**, Schilling and Cruse, 2007). On the lowest level, each leg is represented as a detailed model of all the leg segments and connecting joints [**Figure 4B**, right; for details see (Schilling, 2011; Schilling et al., 2012)]. These leg models are integrated on a higher level in a model of the central body, where each leg is only represented by a vector pointing from the body segment toward the tip of the leg (**Figure 4B**, left; for details see Schilling and Cruse, 2012; Schilling et al., 2013a). As this network is based on the principle of pattern completion, any input vector given to the network (may it correspond to the input required for a forward model, an inverse model, or a sensor fusion model) provides an output that, after relaxation, leads to a coherent body state. This means that in any case the kinematics represent a geometrically correct body position. Next, we will explain how this body model can be integrated into the architecture of Walknet.

**Figure 5** illustrates how the body model is integrated into the network. As depicted in this figure, the internal body model comprises an independent system, which may receive sensory input and/or motor commands. In turn, it provides sensory signals or motor commands to the reactive structure Walknet. The body model can be used for controlling the motor output of the stance behavior in complex walking scenarios. In this case it is part of the reactive controller (in **Figure 5** the switch has to take position 1). Using the body model as an inverse model, movement of the legs during stance can easily be controlled by applying the passive motion paradigm (Mussa-Ivaldi et al., 1988). Like a simulated puppet, the internally simulated body is pulled by its head in the direction of desired body movement (**Figure 5**, sensory input). As a consequence, the stance legs of the puppet follow that movement in an appropriate way and the changes of the simulated joint angles can be used as commands to control the actual joints. Therefore, if such a body model is given that represents the kinematic constraints of the real body, we obtain an easy solution of the inverse kinematic problem, i.e., of the question how the joints of legs standing on the ground have to be moved in concert to propel the body (for details and application for the control of curve walking see Schilling et al., 2012, 2013a).

FIGURE 4 | The body model. (A) illustrates how the body model (black) represents the body of the robot (gray). (B) The Mean of Multiple Computation (MMC) body model for the six-legged walker is divided into two layers. The lower layer contains six networks, each representing one leg (for details see Schilling et al., 2012). The upper layer represents the body and the six legs, which are only represented by bold vectors pointing toward the tip of each leg as shown in (B), left. On this level the leg is described with reference to the respective body segment. Both layers are connected via the shared leg vectors (marked by the double-lined vectors of the left front leg) and are implemented as recurrent neural networks.

[…] action. The body model is now driven by the motor commands predicting the sensory consequences instead of integrating them. For further explanations see text.
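The puppet picture can be made concrete with a purely geometric sketch. It is not the MMC network and it ignores body rotation; it only shows the relation used on the upper body level, namely that when the body is displaced while the stance feet stay fixed on the ground, every body-centered leg vector changes by the opposite displacement. The function name, the coordinate convention and the numbers are assumptions for illustration.

```python
def pull_body(body_pos, stance_feet, delta):
    """Displace the body by `delta` while the stance feet stay on the ground.

    body_pos    -- (x, y, z) of the body segment in world coordinates
    stance_feet -- dict mapping leg name to the foot position in world coordinates
    delta       -- desired body displacement (translation only; rotation is
                   deliberately left out of this sketch)

    Returns the new body position and, for every stance leg, the vector from
    the body segment to the foot tip after the displacement.
    """
    new_body = tuple(b + d for b, d in zip(body_pos, delta))
    leg_vectors = {
        leg: tuple(f - b for f, b in zip(foot, new_body))
        for leg, foot in stance_feet.items()
    }
    return new_body, leg_vectors


if __name__ == "__main__":
    body = (0.0, 0.0, 0.0)
    feet = {"LM": (0.1, 0.2, -0.1), "RM": (0.1, -0.2, -0.1)}
    new_body, legs = pull_body(body, feet, delta=(0.05, 0.0, 0.0))
    # Pulling the body forward makes the stance legs point further backward
    # relative to the body, i.e., the legs are retracted.
    print(new_body, legs)
```

Turning the updated leg vectors into joint angles is the task of the leg-level networks and is not covered by this sketch.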

In the next section we will introduce a fundamental expansion termed "cognitive expansion." The complete network, as we will argue, shows how cognitive properties can emerge from a system heavily relying on reactive structures, which is why we call this network reaCog.

### Motor Planning: from Walknet to reaCog

#### The General Idea

To be able to implement the faculty to plan ahead, the neuronal system has to be equipped with a representation of parts of the environment (Schilling and Cruse, 2008; Marques and Holland, 2009). As it has been argued that, as seen from the brain's point of view, the body is the most important part of the environment (Cruse, 2003), a neural representation of the own body is the first step to take. Later, this body model of course has to be extended to include aspects of the environment such as tools extending the body, objects to be handled, or an environment to interact with, for example obstacles to be climbed over or to be circumvented.

As mentioned, the body model introduced in the previous section can also be used for prediction. Therefore, the body model will be applied to enable the system to plan ahead through internal simulation.

The basic idea that will be detailed in this section is simple. In short, we will apply the following two-step procedure. If a problem occurs, which means that the ongoing behavior cannot be continued when using only the existing reactive controller, the behavior will be interrupted. The system will then try to come up with new behaviors by recombining the existing procedural elements in a new way, i.e., in a way not envisaged in the current context. A procedural element is characterized by a section of the network that can be controlled by a motivation unit (as shown in **Figure 3**, red and yellow circles). The properties of the new combination will then be tested by using the internal body model instead of the real body, the former now exploiting its faculty to serve as a forward model. If the new combination turns out to be successful, it will be applied to control the behavior. If not, the system will search for another new combination.
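The two-step procedure can be summarized as a generic search loop. The following sketch assumes that a forward model and a problem detector are available as plain functions; `simulate`, `is_problem`, the candidate names and the scalar toy "stability margin" are hypothetical stand-ins and not part of reaCog.

```python
def plan_by_internal_simulation(body_state, candidates, simulate, is_problem,
                                n_steps=400):
    """Return the first candidate behavior that passes internal simulation.

    body_state -- current state of the real body (kept fixed while planning)
    candidates -- iterable of behaviors not used in the current context
    simulate   -- forward model: (state, behavior) -> next state
    is_problem -- problem detector applied to the simulated state
    n_steps    -- length of one internal simulation run (assumed value)
    """
    for behavior in candidates:
        state = body_state  # every trial restarts from the real situation
        failed = False
        for _ in range(n_steps):
            state = simulate(state, behavior)
            if is_problem(state):
                failed = True
                break
        if not failed:
            return behavior  # to be applied to the real body next
    return None  # no candidate solved the problem


if __name__ == "__main__":
    # Toy forward model: the state is a scalar stability margin that each
    # behavior changes by a fixed amount per step (purely illustrative).
    effect = {"swing_LH": -0.1, "swing_to_back_LM": 0.05}
    choice = plan_by_internal_simulation(
        0.2,
        ["swing_LH", "swing_to_back_LM"],
        simulate=lambda s, b: s + effect[b],
        is_problem=lambda s: s < 0.0,
        n_steps=10,
    )
    print(choice)
```

How candidates are ordered is left open here; in reaCog this is the task of the cognitive expansion described in the following sections.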

For better illustration, we will use the following example: Imagine that one leg, say the left hind leg, has been moved far to the rear and now receives the signal to start a swing movement, i.e., to lift the leg off the ground. If the two neighboring legs (the left middle leg and the other, right, hind leg) happen to be positioned far to the front, lifting the left hind leg might lead to the body falling to the rear (**Figures 6A,B**).

#### Interruption of Behavior

To avoid tumbling over backwards, the system must be able to detect that it is running into trouble. Therefore, one or several systems are necessary that are able to detect that there is a problem. While there are different biologically plausible solutions (e.g., using load sensors as found in insects), we chose as a simple approach a stability sensor which is activated when lifting a leg would endanger the stability of the body. In the example scenario this detector becomes activated immediately after the motivation unit Swing of the left hind leg becomes activated, i.e., before the animal would fall backwards onto the lifted leg.

If a problem has been detected by any detector, the system must (i) interrupt the ongoing behavior and (ii) be able to change from the state "perform behavior" to the state "simulate behavior." To this end, we have introduced a switch as shown in **Figure 5**. By moving the switch from position 1 to position 2, the output of the leg controller, which is normally (position 1 of the switch) routed to the motor output to influence the body, is instead routed directly to the body model. Thereby the position of the real body is kept fixed, i.e., the ongoing behavior is interrupted, but the internal body model can perform the movements determined by the reactive controller. Hesslow (2002) provides a biological account of this decoupling, which has also been found in insects (Bläsing and Cruse, 2004). As in the case of actively moving the body, the output signals of the body model, in particular the angular values describing the position of the leg joints, are given to the reactive procedures. In this way the loop is closed and the system can internally simulate the behavior by moving the body model instead of the real body. Note that modules of the reactive procedures, such as Swing-net and Stance-net, are still active as is the case in Walknet.

#### Coming Up With a New Solution

This switch given, it appears of course not very interesting to simulate exactly the behavior which has just led to the problem. Instead, it is necessary to test new, currently not available solutions. Therefore, the signal from the problem detectors is not only used to move the switch, but also to start the search for a new solution. To allow for this faculty, reaCog requires a further fundamental expansion.

The main idea is that for internal simulation a new behavioral element has to be selected. This new behavioral element may also be selected from procedures not belonging to the current context. How is this solved by reaCog? In **Figure 7**, the upper, left part (i.e., without SAL net, WTA net, and RTB net) shows a simplified version of the network as presented in **Figure 5**. The expansion depicted at the right side enables the system to find "new solutions" and then to test whether they are suited to solve the problem. This expansion, which we will call "cognitive expansion" or, as will be motivated in section Discussion and Conclusions, "attention system," contains three additional layers: a spreading activation layer (SAL, red circles), a winner-take-all layer (WTA, green circles) and a remember-tested-behavior (RTB, blue circles) layer, each with an identical number of units. In addition, there is a small network termed Global Phases (lower part of **Figure 7**).

At the bottom, Global Phases, the structure is illustrated that organizes the temporal sequence of finding a behavior as a solution to a novel problem. Additional units (gray circles) show temporal properties and are used to organize the switching between stages as explained in the text. Units "count" represent a specific time delay.

FIGURE 7 | ReaCog: Walknet plus cognitive expansion. This figure shows an extension of the Walknet structure presented in Figure 5. The motivation unit structure (yellow, e.g., Swing, Swing\_toFront) is replicated on the right side, termed attention system, in three ways. There is a Spreading-Activation-Layer (SAL, red circles), the WTA layer (green circles), and the remember-tested-behavior (RTB units, blue circles) layer. The problem detector (red and yellow, the latter for the internal model) not only activates the switch, but also the spreading activation layer (SAL; red arrows). The activated spreading activation layer units activate their partner units in the WTA network. The winner of the WTA is activating the corresponding motivation unit (dashed black arrows) and the corresponding motor program will be carried out using internal simulation. Note that the connections within the WTA layer are not completely depicted.

## Cognitive Expansion

In the following we will explain the function of the cognitive expansion as depicted in **Figure 7**. The goal of the cognitive expansion network is to search for a new procedural element that allows for a solution of the current problem. The first step is to look for behavioral elements existing in the memory, which are, however, not activated in the current context. As will be explained, only such procedural elements can be selected that can be activated by a motivation unit. Second, the possible contribution of this additional memory element will be tested by internal simulation.

How is this done? The units of the SAL (**Figure 7**, red circles) receive input from morphologically neighboring problem detectors (in **Figure 7**, one example is depicted by a bold, red circle). Neighboring units are connected by positive weights. In this way, an activation arising from a problem detector is spread over the SAL, roughly corresponding to a circular wave starting at the position of the unit excited by the problem detector. Further, noise is added to the units of the spreading activation layer. The middle layer is a winner-take-all network. The units of the WTA layer (**Figure 7**, green circles) are activated by the corresponding partner units in the SAL. In addition, already active behavioral elements, i.e., their active motivation units, inhibit their counterparts in the WTA layer (**Figure 7**, black solid line with T-shaped end). In this way, currently active behaviors are prevented from being selected for testing in internal simulation. Through the winner-take-all process the units inhibit each other in such a way that only one unit remains active when the network settles. For the third, the right-hand layer, there is again a one-to-one connection to the WTA layer. These RTB units (**Figure 7**, blue circles) store which of the WTA units have already been tested in an earlier internal simulation run.

The different procedural elements of Walknet and their motivation units are anatomically arranged in a way that coarsely reflects the morphological ordering of the legs (**Figure 1**, left). Consequently, the motivation units of neighboring legs as well as the partner units of the Spreading Activation layer (SAL) and of the winner-take-all (WTA) layer are neighboring, too, and thus form some kind of somatotopical map. Thus, the problem detector not only signals the problem, but also carries some information about where the problem occurred. In this way, the search for a new behavior is not purely random, but follows a heuristic (there is some probability that a solution may be found morphologically near the cause of the problem), which may accelerate the search.
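The selection heuristic can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the model: the units are arranged on a one-dimensional chain instead of a somatotopic map, the spreading weight and number of iterations are arbitrary, and the settled winner-take-all competition is replaced by simply picking the most strongly activated eligible unit. Unit indices and function names are hypothetical.

```python
import random


def spread(sal, source, weight=0.5, noise=0.0, rng=None):
    """One iteration of spreading activation along a chain of SAL units."""
    rng = rng or random.Random(0)
    n = len(sal)
    new = []
    for i in range(n):
        left = sal[i - 1] if i > 0 else 0.0
        right = sal[i + 1] if i < n - 1 else 0.0
        drive = 1.0 if i == source else 0.0  # input from the problem detector
        value = sal[i] + weight * (left + right) + drive + noise * rng.random()
        new.append(min(1.0, value))
    return new


def select_candidate(n_units, source, active, tested,
                     iterations=2, noise=0.0, seed=0):
    """Pick the next behavior to test in internal simulation.

    active -- indices of currently active motivation units (inhibit WTA units)
    tested -- indices stored in the RTB layer (already tried before)
    Returns the index of the winning unit, or None if nothing is eligible.
    """
    rng = random.Random(seed)
    sal = [0.0] * n_units
    for _ in range(iterations):
        sal = spread(sal, source, noise=noise, rng=rng)
    eligible = [i for i in range(n_units)
                if i not in active and i not in tested]
    if not eligible:
        return None
    # Shortcut for the settled WTA net: the strongest eligible unit wins.
    return max(eligible, key=lambda i: sal[i])


if __name__ == "__main__":
    tested = set()
    for _ in range(3):
        winner = select_candidate(6, source=2, active={2}, tested=tested)
        print("testing unit", winner)
        tested.add(winner)  # the RTB layer remembers the tested behavior
```

Without noise the search proceeds strictly from the neighborhood of the problem outward; a nonzero `noise` makes the order of equally close candidates variable.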

What is the functional role of these three additional layers forming the cognitive expansion? Assume that in our example (**Figure 5**) the problem detector situated in the left hind leg has been activated (**Figure 7**, bold red arrow, starting at the left). As noted earlier, this signal moves the switch from position 1 to position 2 to route the motor output to the body model instead of the body itself. Thereby the ongoing behavior is interrupted. In addition, this signal activates one (or several) neighboring units of the Spreading Activation layer. **Figure 8** illustrates the sequential activation of the SAL, the WTA layer, and the RTB layer.

The winning WTA unit activates its motivation unit and as a consequence, the corresponding—new—procedural element. After the WTA net has made its decision and has activated the motivation unit of a procedure normally not used in the actual context, simulation using the internal body model will be started to test the contribution of this new procedure. Note that therefore a problem detector is also required inside the internal model which functions in the same way, i.e., it observes static stability of the (internally simulated) body (**Figure 7**, bold yellow arrows).

If during the internal simulation no problem detector becomes active, the procedure appears to be a suitable solution for the given problem. Thus, the solution is found following a search driven by a heuristic including noise (given to the SAL units). As a next step, this solution is tested for being mechanically appropriate. In this case the switch is set back to position 1 and the corresponding behavior will then be applied in reality. By setting back the switch, the real body will provide the sensory input. As the winning WTA unit is still active (thus representing a short-term memory), the newly selected procedure will be executed. If, however, this "new solution" has already proven not to be a solution during the internal simulation (defined by a problem detector of the internal model becoming active), the search for a solution will be continued. To this end, the internal model will be reset to the current state of the body. Then, the SAL net will continue the spreading of its activations and a new behavior will be selected by the WTA net. In this way the procedure will be repeated until a solution is found.

FIGURE 8 | Illustration of the sequential changes of activation of SAL, WTA, and RTB units. When a problem occurs, the problem detector, on the one hand, stops the execution of current behavior (not shown). On the other hand, it induces activity in the spreading activation layer (SAL, red) which indicates where the problem occurred. The activation is spreading vertically in the SAL. Each SAL unit excites its corresponding WTA unit. Importantly, currently active motivation units (yellow) inhibit the WTA units (green units). The WTA units compete among each other producing one winning unit which in turn activates the corresponding motivation unit and behavior. The units in the RTB layer (blue) represent which behavior has been active once during the simulation process and will inhibit a future activation during the WTA selection process.

When the new solution is tested in reality, there are still two possibilities to be considered. If the realization of the proposed solution is successful, behavior continues. However, the solution may also turn out not to be realizable. This might for example happen because the body model does not simulate the physical properties of the body (and the environment) well enough. In this case a (possibly different) problem detector will be activated, starting a new search procedure, with the internal body model being reset to the current real state of the body as given through the sensors.

In the remainder of this section, the structure that controls the temporal sequences sketched above is explained in detail. As indicated in the lower part of **Figure 7**, the complete procedure is controlled by five specific motivation units, Beh, SAL, WTA, SIM, and Test forming the center of the Global Phases network. These units are coupled via mutual inhibition (not depicted in **Figure 7**) and in part by transient, i.e., high-pass like, units (**Figure 7**, gray units and connections in the lower part).

During normal, i.e., reactively controlled, walking the motivation unit "Beh" is active, thereby inhibiting the other four motivation units. If a problem is detected, the problem detector is activated, which in turn inhibits the ongoing behavior (motivation unit "Beh") and activates the "SAL" motivation unit. In addition, the switch is moved to bypass the physical body (the switch might be realized by further mutually coupled motivation units, not shown in **Figure 7**) and the current forward movement of the robot is inhibited for a time that corresponds to the duration of about one step of the leg (i.e., 100 iterations). This allows sufficient time to test movements before forward walking is continued. After a given time required for sensible spreading of activations (e.g., two iterations, triggered by the "Delay" unit shown in gray in **Figure 7**), the SAL motivation unit is inhibited and the WTA motivation unit is activated instead. The relaxation of the WTA net may require a variable number of iterations. A simple solution is to introduce one unit observing the convergence of the WTA network ("Relax"). This unit is activated as soon as the first unit of the WTA layer has reached a given threshold, representing the winning unit.

Only after a winner is detected ("Relax" in **Figure 7**), the "WTA" motivation unit is inhibited and the simulation is started (motivation unit "SIM"). If, after a given time of internal simulation (we use 400 iterations, which equals 4 s or about three to four step cycles), no problem occurred, the motivation unit "Test" will be activated instead to start the real behavior. If during the test of the real behavior the problem occurs again or a new problem is detected (in contrast to the situation during simulation), the behavior is inhibited and the "SAL" motivation unit is activated again. If, however, the behavioral test is successful, too, the motivation unit "Beh" is activated (and the motivation unit "Test" inhibited) to allow continuation of the normal behavior. In contrast, if during simulation a problem is detected, the simulation is interrupted (motivation unit "SIM" is inhibited) and instead the motivation unit "SAL" is excited again to search for a new "idea." The temporal order of activation of the different motivation units of the Global Phases network is controlled by dedicated connections running in parallel to the mutual inhibitory connections (**Figure 7**, gray) of all these units.
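The sequence of phases described in the last two paragraphs can be written down as a small transition table. In the model the sequencing is realized by mutually inhibiting motivation units and transient units; the table below is only a compact restatement of the order of phases, and the event names (`problem`, `delay_elapsed`, `relaxed`, `sim_ok`, `test_ok`) are hypothetical labels introduced for this sketch.

```python
# Phase names follow the five motivation units of the Global Phases network.
TRANSITIONS = {
    ("Beh", "problem"): "SAL",        # problem detector interrupts behavior
    ("SAL", "delay_elapsed"): "WTA",  # activation has spread long enough
    ("WTA", "relaxed"): "SIM",        # a winner crossed the threshold
    ("SIM", "problem"): "SAL",        # simulated solution failed, search on
    ("SIM", "sim_ok"): "Test",        # simulation ran without a problem
    ("Test", "problem"): "SAL",       # real test failed, search again
    ("Test", "test_ok"): "Beh",       # resume normal reactive behavior
}


def next_phase(phase, event):
    """Return the phase that follows `phase` when `event` occurs.

    Events that are irrelevant in the current phase leave it unchanged.
    """
    return TRANSITIONS.get((phase, event), phase)


def run(events, phase="Beh"):
    """Feed a sequence of events and return the visited phases."""
    history = [phase]
    for event in events:
        phase = next_phase(phase, event)
        history.append(phase)
    return history


if __name__ == "__main__":
    # One failed idea followed by a successful one.
    events = ["problem", "delay_elapsed", "relaxed", "problem",
              "delay_elapsed", "relaxed", "sim_ok", "test_ok"]
    print(" -> ".join(run(events)))
```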

Importantly, each internal simulation has to start from the real situation, i.e., the situation that led to the problem. Therefore, the internal body model as well as the control system have to be reset to this state before a new internal simulation is started. This reset is triggered during the "SAL" stage. As the body did not actively move during internal simulation, the current posture and sensor readings can be used to reset the internal body model. It takes the reactive part of the control system only a couple (one or two) iterations to converge to the original state. It turned out that the internal state does not have to be stored explicitly.

The complete procedure controlled by the Global Phases network corresponds to what has been termed "incubation" and "verification" (Helie and Sun, 2010), and is similar to the "note-assess-guide" strategy or "metacognitive loop" as introduced by Anderson et al. (2006). In a mathematical analysis applied, for example, to logic reasoning systems, the latter authors could show that the introduction of such a strategy indeed improves the behavior of the complete system. The complete period during which the body is fixed and the body model is used for internal simulation may correspond to what Redish (2016), referring to Tolman, has termed "vicarious trial and error."

# RESULTS

# Simulation Results for the Example Scenario

In this section, we will show a dynamic simulation of the reaCog system. The example illustrates the faculty of reaCog to find new solutions to a current problem using its capabilities for planning ahead. (In this study no physical robot is used yet; it is represented by a dynamic simulation.) **Figure 6** shows an awkward posture. This configuration can become problematic as the left hind leg is already very far to the back and cannot move further back. Therefore, in this situation the left hind leg has to produce a swing movement. If the left middle leg and the right hind leg are positioned very far to the front, lifting the left hind leg can lead to instability, because the center of mass is placed quite far to the rear, between the hind legs. A sensible solution in our paradigm (**Figure 6**) might be the activation of the Swing\_toBack module of the left middle leg: A backward step of the anterior middle leg might allow this leg to take over the body weight and, as a consequence, afterwards allow lifting of the left hind leg. Thereby, continuation of walking may become possible.

In normal walking the reactive part of the controller does not end up in such a strange posture. Therefore, we had to introduce an external disturbance to make the system tumble. To this end, the placements of the left middle leg and right hind leg are changed in such a way that during swing movement the target position is pushed further to the front (by a third of a step length). Such a strong change might occur in insects when climbing over irregular ground. When there is no foothold, the insects start searching movements to the anterior in order to find a foothold (Dürr and Krause, 2001; Bläsing and Cruse, 2004; Schütz and Dürr, 2011), which can be quite far to the front. This does not pose a problem for the stick insect as stability is strongly supported through the ability to attach the feet to the ground. As the robot cannot use this method, it has to find another solution (for example the one sketched in **Figure 6**).

In the following, with help from **Figures 9**, **10**, we will explain how the system deals with this intervention. **Figure 9** (middle panel) shows a footfall pattern which illustrates the swing movements of the legs over time. A leg which is in swing phase is marked as a black (or red) bar. For the medium velocity chosen, a gait emerges that can be seen in the stepping pattern in the left part of the figure. From a tripod-like starting posture the robot converges toward a fast tetrapod-like gait (at about 500 iterations). The lower part of **Figure 9** shows still images of the dynamic simulation (see **Supplementary Material Videos 1**, **2**), whereas the upper part shows four specific snapshots of the posture of the robot (or of the internal model) in top view, facing to the right. Only legs in stance phase, i.e., legs which support stability, are depicted.

For the same run, **Figure 10** illustrates the position of each leg over time. The position is plotted on the ordinate showing the movement of the leg (green lines, swing movements during forward walking are pointing upwards; stance movements are going into the opposite direction).

The jumps in the position of the legs are due to the switching from the real robot to the internal model required to reset the internal model. Colors are used as in **Figure 9**. For further explanations see text.

FIGURE 9 | Solving the problem illustrated in Figure 6: Footfall patterns. The middle panel shows the footfall pattern of the hexapod over time (black/red bars indicate swing movement of the leg). The upper panel shows some critical configurations of the robot (or, during internal simulation, the configuration of the internal model). The robot is walking from left to right. In three cases, the left hind leg is shown as a dashed line indicating that it is supposed to start a swing movement. The lower panel illustrates the behavior by screen shots taken from the Supplementary Material Videos 1, 2. The robot starts with a tripod-like leg configuration and converges to a fast tetrapod gait. The problem is detected at (#2). The section highlighted red represents an unsuccessful internal simulation [ending in an unstable configuration again as shown in (#3)]. The second internal simulation, highlighted green [starting at (#3)], turns out to be successful and solves the problem [backswing of the left middle leg, depicted by red bars; (#4) shows the new posture before the start of the forward swing movement of the left hind leg]. Highlighted blue is the application of this solution to the robot [starting at (#5), which shows the robot posture at the beginning of the backward swing movement of the left middle leg]. This final test is successful, too, and the robot continues to walk (N indicates center of mass).

As mentioned, we forced the robot into an awkward posture in such a way that the swing movement of the left middle and right hind leg moved very far to the front of their working range, i.e., beyond their normal AEP. Next, the left hind leg marked by a dashed line in **Figure 9** is supposed to start a swing movement. The center of mass would then not be supported anymore by the left middle leg and right hind leg [**Figures 9**, **10** (2), after 580 iterations]. Therefore, the system would tumble backwards.
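The notion that the center of mass is "not supported anymore" can be made concrete with a static stability test. The following is a minimal sketch, assuming that stability is judged by whether the ground projection of the center of mass lies inside the convex hull of the feet in stance; this criterion, the function names, and all coordinates are illustrative assumptions and are not taken from the reaCog problem detector.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_statically_stable(com, stance_feet):
    """True if `com` (x, y) lies inside the support polygon of the stance feet."""
    hull = convex_hull(stance_feet)
    if len(hull) < 3:
        return False  # fewer than three supporting feet: no support area
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], com) >= 0 for i in range(n))


# Hypothetical foot positions in meters (x: forward, y: left), loosely
# mimicking the awkward posture: left middle and right hind leg far to the
# front, left hind leg far to the back, center of mass toward the rear.
com = (-0.08, 0.0)
left_middle, right_hind = (0.10, 0.15), (-0.05, -0.15)
right_front, left_hind = (0.25, -0.12), (-0.30, 0.12)

with_left_hind = [left_middle, right_hind, right_front, left_hind]
without_left_hind = [left_middle, right_hind, right_front]
```

With these assumed coordinates the posture is stable while the left hind leg is on the ground and becomes unstable once it is lifted, which is the situation in which the problem detector would have to intervene.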

As a consequence, the problem detector is activated, which stops the overall movement of the robot and triggers the cognitive expansion which then starts motor planning. In the example shown in **Figures 9**, **10** the robot first selects a stance movement in the left hind leg (due to the somatotopical neighborhood, see **Figure 7**, in SAL layer). This stance movement is then applied in internal simulation.

As a result, an unsuccessful internal simulation can be observed [highlighted in red, (2)–(3)], which is interrupted when the left hind leg should be lifted, because this action would again lead to an unstable configuration of the internal body model [see upper panel, (3)]. Note that during the time highlighted in red (and green, see below) the robot is not moving. Only the internal model is used to provide predictions of the movements.

As a consequence, a second iteration of the cognitive expansion is invoked (this section is highlighted green, as it turns out to be successful): First, activation is further spread in the SAL layer. Second, a behavior is selected in the WTA layer which has not yet been tested. And third, the behavior is applied as internal simulation.
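The three steps just listed amount to a select-and-test loop. The following is a simplified sketch under stated assumptions: the relaxation of the WTA layer is replaced by a plain maximum over fixed activation values, the further spreading of activation between attempts is omitted, and the internal simulation is an arbitrary callable. The function `find_solution` and the behavior names and activation values in the example are hypothetical.

```python
def find_solution(activation, simulate):
    """Try behaviors in order of activation until one passes internal simulation.

    activation: dict mapping behavior name -> activation in the SAL layer
    simulate:   callable(behavior) -> True if the internal simulation succeeds
    Returns (winner or None, list of behaviors tested in order).
    """
    remaining = dict(activation)
    tested = []
    while remaining:
        # Stand-in for WTA relaxation: most active, not yet tested behavior.
        winner = max(remaining, key=remaining.get)
        del remaining[winner]  # never test the same behavior twice
        tested.append(winner)
        if simulate(winner):
            return winner, tested
    return None, tested


# Hypothetical example mirroring the run described in the text.
activation = {
    "stance_left_hind": 0.9,
    "swing_to_back_left_middle": 0.7,
    "swing_to_back_right_hind": 0.4,
}
winner, tested = find_solution(
    activation, simulate=lambda b: b == "swing_to_back_left_middle"
)
```

In this example the first candidate fails in simulation and the second one is returned, so `tested` records both attempts in order.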

For this second internal simulation, the internal body model and control system have to be reset initially. To this end, it turned out to be sufficient to update, first, the internal model with the values from the real robot structures (this is the starting condition required for the internal simulations; see **Figure 10**, at the border of the red and green section, the position of the leg in the internal model jumps back to the original position of the robot leg). Second, as the control system is behavior-based it depends on the sensor state represented by the current position of the robot. This state can be enforced onto the control system so that the system converges back to its behavioral state.

In the simulation run shown, the behavior selected next is a backward swing movement of the left middle leg (depicted in **Figure 10** by a red line for the position of the left middle leg; correspondingly, in **Figure 9** the swing movement backwards is shown as a red bar). As illustrated in the parts highlighted in green, again the forward movement of the body is interrupted for some time. During this time the newly selected behavior is tested by internal simulation. When the system starts to accelerate again, the left middle leg now being placed further to the rear helps to support the robot. When the left hind leg starts to swing, the left middle leg is ready to take over and to support the body (shown in **Figure 9** in the upper panel in the body posture at #4 at around 800 iterations). The internal simulation runs further for a given time (here we used an additional 300 iterations) in order to guarantee that normal walking can be continued.

When the internal simulation was successful the behavior selected (which is still stored in the WTA layer) will be applied on the (simulated) physical system (see #5 and blue area in **Figures 9**, **10**). This part is still regarded as a test of the behavior. This test is necessary because internal simulation and robot can of course lead to slightly different results which over time might accumulate. For example, in **Figure 9** the behavior of the right middle leg differs between internal simulation and testing the behavior on the robot. The right middle leg is very close to its posterior extreme position and on the verge of starting a swing movement. In both cases, the robot is standing still and not supposed to move further forward. But in the case of the robot (highlighted blue), a small passive movement would be sufficient to initiate a swing movement. Nonetheless, as can be seen from the footfall pattern, the application on the robot is also successful and the system converges to a stable gait pattern. This stresses the robustness of the underlying control approach and highlights how important it is that planning and control are tightly intertwined. In the blue area and beyond, **Figure 10** shows the movements of the leg of the real robot. Immediately after the new behavior has been induced, one can observe how the phases of the individual leg controllers are rearranged. For example, the right front leg is forced to make an early swing movement after the right middle leg has finished its swing movement (see Schilling et al., 2013b). But already after a very short time, a single step of the robot, a stable tetrapod-like gait emerges (as can be seen in **Figure 9**).

The example illustrates the faculty of reaCog to activate behavioral elements out of context in order to find a solution to a current problem. As illustrated, the system (reaCog plus robot) manifests an impressive stable behavior, although various deviations from normal walking behavior can be observed during the complete process.

#### Simulation Series on Disturbed Walking

For a more quantitative evaluation of the performance of the reaCog architecture we performed two additional series of simulations to illustrate the contributions of the different parts of the system. On the one hand, there is the underlying reactive and biologically inspired control system (based on Walknet; Schilling et al., 2013a). On the other hand, there is the cognitive expansion, which can take over when the system runs into stability problems in order to reconfigure the posture in a way that allows stable walking to continue.

Following the approach presented above in detail, we again used the repositioning of a leg during swing movement which means that, for a selected swing movement, the target position is shifted to the front. This represents a quite natural example disturbance as the insects are often climbing through twigs that do not provide many footholds. As a consequence, insects perform searching movements that may shift the end position of the swing movements further to the front.

As a first series of simulations, after a randomly chosen point in time (during the first 10 s of walking) one leg is selected randomly for which the next swing movement is shifted to the front (about 5 cm, which equals a third of a complete step length). In this way, different legs are affected in different walking situations. We ran 100 different simulations; therefore each leg was targeted multiple times and at different stages of the 10 s of walking. As a result, when only one leg is targeted the reactive control system proved to be sufficient and the walker never became unstable, independent of which leg was shifted. For all simulations, walking continued for at least 5 more seconds after the disturbance. In most cases, the control system had re-established a stable walking pattern after only one subsequent step. Only for an early change in a front leg did this require two stepping cycles. Stability is accomplished mainly through compensating the leg shift. While the shifting of the target position would prolong the next step for the respective leg, the local coordination influences force the leg into an earlier liftoff in order to compensate. Detailed results are provided as **Supplemental Data 1** in Supplementary Material, which shows a single run as an example for each of the different legs (front, middle, and hind leg). As can also be seen in the data, the walking pattern emerges quite early, in the first or the second step.
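The sampling of disturbances for such a series can be sketched as follows. This is only an illustration under stated assumptions: the leg names, the function names, the dictionary layout, and the use of a seeded generator are hypothetical; the 10 s window, the 5 cm shift, and the 100 runs are taken from the text, and the rate of 100 iterations per second follows from 400 iterations corresponding to 4 s. The hypothetical parameter `n_legs` allows more than one leg to be disturbed per run.

```python
import random

LEGS = (
    "left_front", "left_middle", "left_hind",
    "right_front", "right_middle", "right_hind",
)
ITERATIONS_PER_SECOND = 100  # 400 iterations correspond to 4 s
SHIFT_CM = 5.0               # about a third of a complete step length


def sample_disturbance(rng, n_legs=1, window_s=10.0):
    """Draw one disturbance: onset time and the legs whose swing target is shifted."""
    onset = rng.randrange(int(window_s * ITERATIONS_PER_SECOND))
    legs = tuple(sorted(rng.sample(LEGS, n_legs)))
    return {"onset_iteration": onset, "legs": legs, "shift_cm": SHIFT_CM}


def make_series(n_runs=100, n_legs=1, seed=0):
    """Generate a reproducible series of disturbance descriptions."""
    rng = random.Random(seed)
    return [sample_disturbance(rng, n_legs) for _ in range(n_runs)]


single_leg_series = make_series(n_runs=100, n_legs=1)
```

Each entry would then parameterize one simulation run; fixing the seed keeps a series reproducible across control variants.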

As a more severe disturbance, we performed a series of simulations in which two legs were targeted. Again, after a randomly chosen point in time (during the first 10 s of walking) two legs are selected randomly for which the next swing movement is shifted to the front (about 5 cm, which equals a third of a complete step length). We performed 100 simulation runs in which all kinds of leg combinations occurred multiple times. As already discussed for the example shown above (Section Simulation Results for the Example Scenario), in this case the reactive, biologically inspired control system may run into unstable situations that require the walking behavior to be stopped to prevent the robot from toppling over. In the following we report in how many cases the system struggled with stability and how the cognitive expansion was able to deal with those situations. Overall, there were eight unstable situations, which were caused by a disturbance of a middle leg and the diagonal hind leg (a case as described in detail above, Section Simulation Results for the Example Scenario). For these eight simulation runs the cognitive expansion had to take over and found a solution in all instances. The system always became unstable when the other (non-disturbed) hind leg tried to initiate a swing movement. Interestingly, different solutions were found. On the one hand, a rearrangement of the legs could be observed in which one leg was moved backwards and unloaded the non-disturbed hind leg, which afterwards was able to initiate a swing movement. This was accomplished by moving backwards either the anterior middle leg or the contralateral hind leg. On the other hand, we observed two cases in which slowing down the walking speed of the complete system was already sufficient to solve the problem, as during the slowing down a swing movement could be terminated, which provided enough support for the walker.

These results show that the cognitive expansion is able to find different suitable solutions. Note that the solution disrupts the coordination pattern of all the legs. Only together with the reactive system and the coordination rules is the system able to select a movement which enables stable ongoing walking. In some instances the system discarded solutions which we, on a first guess, would have assessed as possible solutions, but which later ran into conflicts.

# RELATED WORK

In this section, we will compare reaCog as a system with related recent approaches in order to point out differences. While there are many approaches toward cognitive systems and many proposals concerning cognitive architectures, we will concentrate on models that, like reaCog, consider a whole systems approach. First, we will deal with cognitive architectures in general. Second, we will briefly present relevant literature concerning comparable approaches in robotics, because a crucial property of reaCog is that it uses an embodied control structure to run a robot.

# Models of Cognitive Systems

Models of cognitive systems generally address selected aspects of cognition and often focus on specific findings from cognitive experiments (e.g., with respect to memory, attention, spatial imagery; for reviews see Langley et al., 2009; Wintermute, 2012). Duch et al. (2008) introduced a distinction between different cognitive architectures. First, these authors identified symbolic approaches. As an example, the original SOAR (State, Operator, and Result; Laird, 2008) has to be noted, a rule-based system in which knowledge is encoded in production rules that allow information to be stated or new knowledge to be derived through application of the rules. Second, emergent approaches follow a general bottom-up approach and often start from a connectionist representation. As one example, Verschure et al. (2003) introduced the DAC (Distributed Adaptive Control) series of robot architectures (Verschure et al., 2003; Verschure and Althaus, 2003). These authors initiated a sequence of experiments in simulation and in real implementation. Verschure started from a reflex-like system and introduced higher levels of control on top of the existing ones, which modulated the lower levels, were subsequently in charge of longer timespans (also introducing memory into the system), and integrated additional sensory information. The experiments showed that the robots became more adapted to their environment, exploiting visual cues for orienting and navigation, etc. (Verschure et al., 2003). Many other approaches in emergent systems concentrate on perception, for example, the Neurally Organized Mobile Adaptive Device (NOMAD), which is based on Edelman's (1993) Neural Darwinism approach and demonstrates pattern recognition in a mobile robot platform (Krichmar and Snook, 2002).
Recently, this has gained broader support in the area of autonomous mental development (Weng et al., 2001) and has established the field of developmental robotics (Cangelosi and Schlesinger, 2015). The particular focus of such architectures on learning is currently not covered in reaCog. In general, as pointed out by Langley et al. (2009), these kinds of approaches have not yet demonstrated the broad functionality associated with cognitive architectures (and, as mentioned in addition by Duch et al. (2008), many such models are not realized and are often not detailed enough to be implemented as a cognitive system). ReaCog realizes such an emergent system but with a focus on a complex behaving system that, in particular, aims at higher cognitive abilities currently not reached by such emergent systems. The third type concerns hybrid approaches which try to bring together the advantages of the other two paradigms, for example ACT-R (Adaptive Components of Thought-Rational, Anderson, 2003). In our view, the most impressive and comprehensive model of such a cognitive system is the CLARION system (for reviews see Sun et al., 2005; Helie and Sun, 2010), which has been applied to creative problem solving. This system is detailed enough that it can be implemented computationally. Applying the so-called Explicit-Implicit Interaction (EII) theory and being implemented in the CLARION framework, this system can deal with a number of quantitatively and qualitatively known human data, by far more than can be simulated by our approach, as reaCog, in contrast, does not deal with symbolic/verbal information. Apart from this aspect, the basic difference is that the EII/CLARION system comprises a hybrid system as it consists of two modules, the explicit knowledge module and the implicit knowledge module. Whereas the latter contains knowledge that is not "consciously accessible" in principle, the explicit network contains knowledge that may be accessible.
Information may be redundantly stored in both subsystems. Mutual coupling between both modules allows for mutual support when looking for a solution to a problem. In our approach, instead of using representational differences for implicit and explicit knowledge to cope with the different accessibility, we use only one type of representation that, however, can be activated differently, being either in the reactive mode or in the "attended" mode. In our case, the localist information (motivational units) and the distributed information (procedural networks) are not separated into two modules, but form a common, decentralized structure. In this way, the reaCog system realizes the idea of recruitment as the same clusters are used in motor tasks and cognitive tasks. Whereas we need an explicit attention system, as given in the spreading activation and winner-take-all layer, in the CLARION model decisions result from the recurrent network finding an attractor state.

Many models of cognition take, quite in contrast to our approach, as a starting point the anatomy of the human brain. A prominent example is the GNOSIS project (Taylor and Zwaan, 2009). It deals with comparatively fine-grained assumptions on functional properties of brain modules, relying on imaging studies as well as on specific neurophysiological data. While GNOSIS concentrates mainly on perceptual, in particular visual input, the motor aspect is somewhat underrepresented. GNOSIS shows the ability to find new solutions to a problem, including the introduction of intermediate goals. Although an attention system is applied, this is used for controlling perception, not for supporting the search, as is the case in reaCog. Related to this, the search procedure (termed non-linguistic reasoning) in GNOSIS appears to be less open as the corresponding network is tailored to the actual problem to avoid a too large search space. In our approach, using the attention system, the complete memory can be used as substrate for finding a solution.

# Cognitive Robotic Approaches

The approaches introduced in the previous section are not embodied and it appears difficult to envision how they could be embodied (Duch et al., 2008). Following the basic idea of embodied cognition (Brooks, 1989; Barsalou, 2008; Barsalou et al., 2012), embodiment is assumed to be necessary for any cognitive system. Our approach toward a minimal cognitive system is based on this core assumption. Robotic approaches have been proposed as ideal tools for research on cognition as the focus cannot be narrowed down to a singular cognitive phenomenon; instead, a unified system has to be put into the full context of different control processes and in interaction with the environment (Pezzulo et al., 2012).

ReaCog as a system is clearly embodied. The procedures cannot by themselves instantiate the behavior, but require a body. The body is a constitutive part of the computational system, because the sensory feedback from the body is crucially required to activate the procedural memories in the appropriate way. The overall behavior emerges from the interaction between controller, body and environment. In the following, we will review relevant embodied robotic approaches.

Today, many robotic approaches deal with the task of learning behaviors. In particular, behaviors should be adaptive. This means that a learned behavior should be transferable to similar movements and applicable in a broader context. Deep learning approaches have proven quite successful in such tasks (e.g., Lenz et al., 2015), but many require large datasets for learning. Only recently Levine et al. (2015) presented a powerful reinforcement learning approach in this area. In this approach, the robot uses trial-and-error during online learning to explore possible behaviors. This allows the robot to quickly learn control policies for manipulation skills and has been shown to be effective for quite difficult manipulation tasks. When using deep learning methods it is generally difficult to access the learned model. In contrast to reaCog, such internal models are therefore not well suited for recruitment in higher-level tasks and planning ahead. In particular, there is no explicit internal body model which could be recruited. Rather, only implicit models are learned and have to be completely acquired anew for every single behavior.

In the following, two exciting robotic examples tightly related to our approach will be addressed in more detail. The approach by Cully et al. (2015) aims at solving similar tasks as reaCog for a hexapod robot. It also applies as a general mechanism the idea of trial-and-error learning when the robot encounters a novel situation. In their case these new situations are walking up a slope or losing a leg. There are some differences compared to reaCog. Most notably, the testing of novel behaviors is done on the real robot. This is possible as the trial-and-error method does not apply discrete behaviors. Instead, central to the approach by Cully et al. (2015) is the idea of a behavioral parametrization which allows the currently experienced situation to be characterized in a continuous, low-dimensional space. A complete mapping toward optimal behaviors has been constructed in advance offline (Mouret and Clune, 2015). These pre-computed behaviors are exploited when a new situation or problem is encountered. As the behavioral space is continuous, the pre-computed behavior can be adapted to find a new behavior. Further, there is no explicit body model that is shared between different behaviors. Instead, the memory approximates an incomplete body model, as it contains only a limited range of those movements which are geometrically possible. In contrast, reaCog, using its internal body model, can exploit all geometrically possible solutions and is not constrained to search in a continuous space, as illustrated by our example case, where a single leg is selected to perform completely out of context.

While there is only a small number of robotic approaches dealing with explicit internal simulation, most of these use very simple robotic architectures with only a very small number of degrees of freedom [for example see Svensson et al. (2009) or Chersi et al. (2013)]. It should further be mentioned that predictive models are also used to anticipate the visual effects of the robot's movements (e.g., Hoffmann, 2007; Möller and Schenck, 2008). With respect to reaCog the most similar approach has been pursued by Bongard et al. (2006). These authors use a four-legged, eight DoFs robot which, through motor babbling (i.e., randomly selected motor commands), learns the relation between motor output and the sensory consequences. This information is used to distinguish between a limited number of given hypotheses concerning the possible structure of the body. Finding the best fitting solution, one body model is selected. After the body model has been learned, in a second step the robot learns to move. To this end, the body model was used to perform different simulated behaviors and was only used as a forward model. Based on a reward given by an external supervisor and an optimizing algorithm, the best controller (sequence of moving the eight joints) was then realized to run the robot. Continuous learning allows the robot to register changes in the body morphology and to update its body model correspondingly. As the most important difference, Bongard et al. (2006) distinguish between the reactive system and the internal predictive body model. The central idea of their approach is that both are learned in distinct phases one after another. In reaCog the body model is part of the reactive system and required for the control of behavior. This allows different controllers to drive the same body part and to use the same body model for different functions (e.g., using a limb as a leg or as a gripper, Schilling et al., 2013a, **Figure 10**).
In addition, different from our approach, Bongard et al. (2006) do not use artificial neural networks (ANN) for the body model and for the controller, but an explicit representation because application of ANN would make it "difficult to assess the correctness of the model" (Bongard et al., 2006, p. 1119). ReaCog deals with a much more complex structure as it deals with 18 DoFs instead of the only eight DoFs used by Bongard et al. (2006) which makes an explicit representation even more problematic.

Different from their approach, we do not consider how the body model and the basic controllers will be learned, but take both as given (or "innate"). While the notion of innate body representations is controversial (de Vignemont, 2010), there is at least a general consensus that there is some form of innate body model (often referred to as the body schema) reflecting general structural and dynamic properties of the body (Carruthers, 2008), which is shaped and develops further during maturation. This aspect is captured by our body model, which encodes general structural relations of the body in service of motor control, but may adapt to developmental changes. While currently only kinematic properties are applied, dynamic influences can be integrated in the model as has been shown in Schilling (2009).

A further important difference concerns the structure of the memory. Whereas in Bongard's approach one monolithic controller is learned to deal with eight DoFs and to produce one specific behavior, in reaCog the controller consists of modularized procedural memories. This memory architecture allows for selection between different states and therefore between different behaviors.

# DISCUSSION AND CONCLUSION

A network termed reaCog has been proposed that is based on the reactive controller Walknet equipped with decentrally organized behavioral modules, or procedures, all connected to motivation units, and a body model. The motivation units form a network that represents a heterarchical architecture allowing for the realization of various internal states. These states result from parallel activation of elements as well as competitive selection between elements.

The body model can be used as an inverse model for controlling motor output, as a forward model for internal simulation of behavior, and it can be exploited to improve erroneous sensor data ("sensor fusion"). Whereas the reactive part uses the ability of the body model to function as an inverse model, the cognitive expansion exploits the internal body model as a forward model and thereby as a tool for internal simulation of behavior. Internal simulation is used for finding a new solution for a problem detected by a problem detector. To this end, a three-layered network has been introduced that selects a new, currently not used module of the procedural memory, the contribution of which will then be tested through internal simulation. If this simulation turns out to be successful, i.e., shows a solution for the current problem, the corresponding behavior will be executed in reality. Thus, motor planning is possible using an extremely small expansion, a network consisting of essentially six units plus three parallel layers of units connected in a simple way.

In reaCog, there is no explicit, separate planner as used in hybrid systems. Rather, the ability to plan ahead relies on exploiting the reactive basis by operating on it much like a parasite operates on its host, that is, by only controlling the functioning of the reactive part. In other words, the cognitive expansion does not represent a separate planner, but organizes the activity of the reactive part, which is, during planning, not connected with the motor output.

Thus, constitutive elements of reaCog are (1) embodiment, (2) a decentralized organization of various procedures arranged in a heterarchical architecture, (3) a holistic body model allowing for pattern completion that is used in reactive behavior and can be recruited for planning ahead, and (4) a small network, called cognitive expansion, that enables the otherwise reactive system to become a cognitive one in the sense of McFarland and Bösser (1993). We are not aware of any other neuronal approach that covers all these properties. Although the network represents a simple architecture, in the following we will argue that properties often attributed to "higher" brains can be found in reaCog, too, thereby approaching the question concerning the basic neuronal requirements of such higher level phenomena.

Before entering into this discussion, one important aspect missing in the current version of reaCog has to be noted. There is long term memory represented by the procedures in the form of "species memory" (Fuster, 1995). There is short term memory as a new solution is stored until the corresponding behavior is executed. There is however no faculty yet to transform the content of this short term memory into a long term memory. The ability to store such a newly acquired procedure as a long term memory would of course be an advantageous property. To gain this capability, the sensory situation accompanying the occurrence of a "problem" should be able to directly elicit activation of the procedure found to solve the problem.

When discussing the properties of a network like reaCog, a crucial aspect concerns the notion of emergence. The rationale behind searching for emergent properties is the assumption that many "higher level" properties are not based on dedicated neuronal systems specifically responsible for the respective properties. Rather, emergent properties arise from the cooperation of lower-level elements and are characterized by requiring levels of description other than those used to describe the properties of the elements. In the remainder, such emergent properties will, where appropriate, be related to the requirements posed by Langley et al. (2009), supporting the idea that reaCog provides a minimal functional description for some of those requirements.

According to Langley et al. (2009), a cognitive system should show the following properties: (1) Storing motor skills and covering the continuum from fully reactive, closed-loop behavior to (automatic) open-loop behavior; (2) Emergent properties resulting from the cooperation between different independent modules; (3) Long term memory and short term memory; (4) Attention to select sensory input; (5) Decisions on the lower level and "choice" on the higher level; (6) Predictions of possible actions; (7) Problem solving and planning of actions in the world; (8) Recognition and categorization of sensory input; (9) Remembering and episodic memory; (10) Application of symbols and reasoning; (11) To support reasoning, relationships between beliefs have to be realized; (12) Interaction and communication, including representation of verbal symbols; (13) Reflection and explanation (metareasoning); (14) Confronting the interactions between body and mind.

Requirements (1) and (2) are properties of Walknet. Above, we already argued that requirements (3), (6), and (7) can be found in reaCog, too. Below we will argue that also requirements (4) and (5) are fulfilled, but not aspects (8)–(14).

As the cognitive expansion of the reactive network allows the complete system (using psychological terms to describe its function) to "focus" or "concentrate" on, or "attend" to, a specific behavior, we have already earlier termed this expansion "attention system," supporting Langley et al.'s issue (4). Its ability to focus on specific memory elements may correspond to what has sometimes been termed the "spot light" (Baars and Franklin, 2007), referring to the observation that only the content of one memory element enters awareness at a given moment in time. Recall that selection of a specific procedure via the WTA network of the attention system does not mean that the other procedures are suppressed. The cognitive expansion network does not prohibit parallel activation of procedures. This is in line with current developments in the area of cognitive systems research as pointed out by Duch et al. (2008). Inspired by the way brains are organized, these authors propose, first, that cognitive systems in the future should incorporate a mechanism to focus attention, which is realized in reaCog through simple local competition in the WTA structures, and second, that a neural network-like spreading activation mechanism is required in order to broaden search and follow associations, which is provided by the spreading activation layer.

The fifth aspect of Langley et al. (2009) is concerned with action selection on lower levels and "choice" of behavior on a higher level. Action selection is indeed a crucial property of the network. On a lower level, within a given behavioral context—in our case walking—specific procedures compete via local WTA connections. For instance, a leg controller has to decide when to perform swing or stance movements. On an intermediate level, a decision can be made between, for example, forward walking and backward walking. On an even higher level, reaCog, exploiting the cognitive expansion, can select one specific behavioral element to be activated in addition to the currently active units. Therefore, Langley et al.'s requirement (5) is covered, too.
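As an illustration of local winner-take-all (WTA) competition, such as the choice between swing and stance, the following Python sketch uses the classic MAXNET iteration, in which every unit is inhibited by the summed activity of its competitors. This is a swapped-in textbook scheme chosen for brevity; the WTA connections actually used in reaCog may differ, and the activation values are hypothetical.

```python
def maxnet(activations, max_steps=1000):
    """Iterate mutual inhibition until at most one unit remains active."""
    n = len(activations)
    x = [max(0.0, a) for a in activations]
    if n < 2:
        return x
    epsilon = 0.5 / (n - 1)  # inhibition strength, kept below 1 / (n - 1)
    for _ in range(max_steps):
        if sum(1 for v in x if v > 0.0) <= 1:
            break
        total = sum(x)
        # Each unit is reduced by the weighted sum of all other units.
        x = [max(0.0, v - epsilon * (total - v)) for v in x]
    return x

# Hypothetical initial activations of two competing procedures of one leg.
labels = ["swing", "stance"]
final = maxnet([0.8, 0.5])
winner = labels[final.index(max(final))]
print(winner, final)
```

Units with exactly equal initial activation decay together in this scheme, so a small amount of noise on the inputs is one simple way to break ties.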

Thus, reaCog shows action selection not only on the reactive level, but also on the cognitive level, whereby the decisions based on internal simulation (or imagined action, "mental" action, or "probehandeln" according to Freud, 1911) are not determined strictly by the sensorily given situation. Even if an external observer had the ability to monitor the internal states of the agent controlled by reaCog, the behavior of the agent could not be predicted by this observer. This is the case because, due to the noise in the SAL network, there is a stochastic element contributing to the decision. On the other hand, the final decision is not purely random, because the proposals made by the attention system are tested for feasibility via the internal simulation and are to some extent guided by the somatotopic structure of the SAL network. The proposal is further tested by performing the behavior in reality. In this way, this process of finding a new solution may be viewed as being based on a Darwinian procedure, starting with an, in part, stochastic "mutation," followed by a selection testing the proposal for "fitness."

It has been stated that in a cognitive system, in order to address memory elements out of context ("global availability," e.g., Dehaene and Changeux, 2011), these elements have to be represented independently, i.e., not embedded in reactive structures. In reaCog, procedures are not represented independently, but are always represented within their context. Nonetheless, the functioning of the cognitive expansion allows them to be integrated into another context. In other words, in reaCog, the procedures are globally available. Therefore, global availability may not require procedures to be stored independently of any context (or "amodally"). Thus, reaCog represents a concrete architecture showing how global availability might be established in a neural system without requiring independent representation.

There is a group of related terms addressing a fundamental principle of brains. These are the "massive redeployment hypothesis" (Anderson, 2010), the "neural recycling theory" (Dehaene, 2005), the "shared circuits model" of Hurley (2008) and Gallese's "neural exploitation" (Gallese and Lakoff, 2005), summarized by Anderson (2010) by the term "neural reuse." Neural reuse means that a system is able to exploit existing components that do something useful to support a new task, either in the evolutionary time frame or by learning (Anderson, 2010, p. 250). In other words, neural reuse states that existing elements are used for other purposes. ReaCog models neural reuse of two kinds as listed by Anderson. One type, already applied in the current version of reaCog, corresponds to the use of the same procedural elements for both motor control and planning. Here reuse corresponds to the case of having been installed in evolutionary time scales. The second type addressable in reaCog concerns the reuse of procedural elements as a result of learning the integration of a given procedure in a new context as described above, which is, in reaCog, currently only given in the form of short term memory. But the ability to transfer this information into long term memory is a major focus for future work.

Although the structure of reaCog is far away from any morphological similarity to mammalian brains, functionally reaCog shows some similarity and may, therefore, in spite of its simplicity, be considered as a scaffold helpful for the understanding of properties of higher brains. To this end, taking a more abstract view, one might ask whether higher level properties characterized by using psychological terminology might be attributed to reaCog. As noted earlier, in reaCog emergent properties can already be observed at lower levels (e.g., production of different gaits) but they can also be found at higher levels, thereby supporting Langley et al.'s second requirement. Above, we had already used one such higher level term, attention. It has been argued (Cruse and Schilling, 2013, 2015) that further emergent properties such as intentions and emotions might be attributed to a system such as reaCog, too, at least on the functional level. When adding some further procedures, reaCog might even be equipped with basic aspects required for Access Consciousness as well as Reflexive Consciousness (Cruse and Schilling, 2013, 2015).

Taken together, the requirements of Langley et al. (2009) for cognitive systems (1)–(7) are well covered by reaCog. To conclude, we will briefly address the remaining issues (some of which have already been mentioned above): The capability to categorize sensory input [Langley et al.'s issue (8)] is not given in reaCog as we focus mainly on the motor aspects. As mentioned, learning will be the focus of future work and will address episodic memory (9). Other aspects would require further extension: Langley et al.'s issues (10)–(12) refer to the ability to use (verbal) symbols, a property not given in reaCog. However, a way has been sketched how this might be possible (for a first step toward this property see Schilling and Spranger, 2010; Cruse and Schilling, 2013; Schilling and Narayanan, 2013), based on ideas of Steels and Belpaeme (2005) and Narayanan (1999).

Langley et al.'s issue (13) concerns reflection and explanation. This property is not realized in reaCog and may also depend on the ability to apply symbolic knowledge. Issue (14), the property of cognitive systems to "confront the interactions between body and mind," addresses the property of having phenomenal experience, and is not found in reaCog either (for a discussion of this matter see Cruse and Schilling, 2013, 2015). In summary, a number of emergent properties can be observed in reaCog, including Langley et al.'s issues (1)–(7). Issues (8)–(13) require further expansions.

In this article we focus on the situation in which there is a problem that requires an immediate solution and, as a consequence, immediate internal simulation. As in our situation the body model is needed for simulation, it cannot be used for controlling other behaviors at the same time. In other words, the body position has to be kept constant during internal simulation. In the following we briefly mention three cases which do not comply with this situation. In the first case internal simulation is not required. In this simple case the network is equipped with reactive procedures that allow for unspecific, general responses in case a problem detector is activated. A ubiquitous example is given by freezing behavior without active search for a solution, hoping that the problem will disappear on its own. Another example might be a procedure that allows emitting a general alarm signal that activates conspecifics. As a second case, one might think of situations that allow the search for a solution to be postponed. In this case the normal behavior can be continued until a situation arises that allows the internal model to be used without getting into conflict with current behavior. This case would at least require a short term memory to store the problem situation so that this could later be reactivated to start internal simulation, an expansion not yet implemented in the current version of reaCog. As a third, more complex case there might be a network that is able to control any behavior and, at the same time, run an internal simulation. Whereas for the second case a comparatively simple expansion of reaCog would suffice, the third case appears to be much more demanding. It might, for example, require a second internal model plus the corresponding circuit to control both models independently.

The term "cognition" as used here is based on the simple definition proposed by McFarland and Bösser (1993), i.e., the faculty of being able to plan ahead. This faculty is achieved here by using a reactive system plus introduction of a "cognitive expansion." As discussed above, such a system appears to be suited to form a basis on which further emergent properties may be realized, properties that are often listed as being required for a system termed cognitive, such as Langley et al.'s requirements (8)–(14). If this view is correct, these properties need not necessarily be explicitly included in such a definition, but appear to result from a system based on reactive structures plus the critical capability of planning ahead, underlining the power of McFarland and Bösser's (1993) clear-cut definition of cognition.

#### AUTHOR CONTRIBUTIONS

Conceptualization, methodology and writing—MS and HC. Software, investigation and simulation—MS.

### ACKNOWLEDGMENTS

This research/work was supported by the Cluster of Excellence Cognitive Interaction Technology "CITEC" (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG), by the EC project EMICAB, the Wissenschaftskolleg zu Berlin, the ICSI, and the DAAD (PostDoc grant to MS).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00003/full#supplementary-material

Supplemental Data 1 | Showing example runs from additional simulations.

Supplementary Video 1 | AwkwardPosture\_InternalSimulation.mp4. Video showing the example case of testing different behaviors out of context in internal simulation and afterwards applying a successful solution to the robot.

Supplementary Video 2 | AwkwardPosture\_InternalSimulation\_LegPositions.mp4. Video showing the example case of testing different behaviors out of context in internal simulation and afterwards applying a successful solution to the robot. Showing the leg positions over time.

# REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Schilling and Cruse. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Motor-Skill Learning in an Insect Inspired Neuro-Computational Control System

#### Eleonora Arena<sup>1</sup>, Paolo Arena<sup>1,2</sup>, Roland Strauss<sup>3</sup> and Luca Patané<sup>1</sup>\*

<sup>1</sup> Dipartimento di Ingegneria Elettrica, Elettronica, e Informatica, University of Catania, Catania, Italy, <sup>2</sup> National Institute of Biostructures and Biosystems, Rome, Italy, <sup>3</sup> Institut für Zoologie III (Neurobiologie), University of Mainz, Mainz, Germany

In nature, insects show impressive adaptation and learning capabilities. The proposed computational model takes inspiration from specific structures of the insect brain: after proposing key hypotheses on the direct involvement of the mushroom bodies (MBs) and on their neural organization, we developed a new architecture for motor learning to be applied in insect-like walking robots. The proposed model is a nonlinear control system based on spiking neurons. MBs are modeled as a nonlinear recurrent spiking neural network (SNN) with novel characteristics, able to memorize time evolutions of key parameters of the neural motor controller, so that existing motor primitives can be improved. The adopted control scheme enables the structure to efficiently cope with goal-oriented behavioral motor tasks. Here, a six-legged structure, showing a steady-state exponentially stable locomotion pattern, is exposed to the need of learning new motor skills: moving through the environment, the structure is able to modulate motor commands and implements an obstacle climbing procedure. Experimental results on a simulated hexapod robot are reported; they are obtained in a dynamic simulation environment and the robot mimics the structures of Drosophila melanogaster.

#### Edited by:

Poramate Manoonpong, University of Southern Denmark Odense, Denmark

#### Reviewed by:

Sakyasingha Dasgupta, IBM Research-Tokyo, Japan

Björn Brembs, University of Regensburg, Germany

#### \*Correspondence:

Luca Patané lpatane@dieei.unict.it

Received: 01 September 2016 Accepted: 20 February 2017 Published: 08 March 2017

#### Citation:

Arena E, Arena P, Strauss R and Patané L (2017) Motor-Skill Learning in an Insect Inspired Neuro-Computational Control System. Front. Neurorobot. 11:12. doi: 10.3389/fnbot.2017.00012

Keywords: insect brain, insect mushroom bodies, spiking neural controllers, learning, goal-oriented behavior

# 1. INTRODUCTION

Recent results and experiments performed on insects shed light on their highly developed learning and proto-cognitive capabilities, which enable them to adapt extremely well to their natural environment (Menzel and Giurfa, 1996; Liu et al., 1999; Tang and Guo, 2001; Chittka and Niven, 2009). Modeling insect brains is an increasingly important issue for the design of learning and control strategies to be applied to autonomously walking robots. Within the insect brain, the mushroom bodies (MBs) are an important paired neuropil with higher control functions, and they have recently been used to model different behavioral functions (Smith et al., 2008; Arena et al., 2013c). Studies on bees and flies identified the MBs as a relevant area for associative learning and memory in odor conditioning experiments (Menzel and Muller, 1996; Menzel, 2001; Scherer et al., 2003; Liu and Davis, 2006). MBs are also involved in behaviors depending on other sensory modalities, like vision (Liu et al., 1999; Menzel, 2001; Tang and Guo, 2001), in other types of learning such as choice behaviors (Tang and Guo, 2001; Gronenberg and Lopez-Riquelme, 2004; Brembs, 2009) and, as recently introduced, also in the improvement of gap-climbing tasks (Pick and Strauss, 2005; Kienitz, 2010).

MBs receive olfactory input from the antennal lobes via projection neurons. The latter run in the medial antennal lobe tract, provide input to the MB calyces and continue on to the lateral horn (LH). The mediolateral and the lateral antennal lobe tracts emerge from the antennal lobes as well, but bypass the calyces and project directly to the LH. The LH region controls inborn behavior, whereas the MBs are thought to be involved in learnt behavior. Analysing the interaction between the different neural structures we investigated the emergence of interesting neural activities responsible for specific behaviors in insects, including flies, like attention, expectation, delayed-match to sample tasks, and others (Arena et al., 2012a,b, 2013b).

Major dynamical aspects characterizing the locust olfactory system were already outlined in Mazor and Laurent (2005). There, a principal component analysis of the firing rates of a population of projection neurons (PNs) revealed different attractors for different odors. These attractors show two transients and one fixed point, but the transients are most significant for an efficient odor classification. This addressed for the first time the importance of transient dynamics to explain and understand neural coding and information processing in the MBs. Following these results, we hypothesized that transient dynamics play a relevant role in sensory information coding, extending the results obtained in the locust olfactory system to the fruit fly. This hypothesis matches well with the organizational properties of the MBs discussed in Nowotny et al. (2003, 2005). Their model is based on spiking neurons and synaptic plasticity, distributed through different layers. The model is able to show consistent recognition and classification of odors. In the study of Nowotny and colleagues, MBs are assumed to be multi-modal integration centers, combining olfactory and visual inputs. As in our current model, the capabilities are independent of the type and the source of information processed in the MBs.

Wessnitzer and co-authors investigated the interaction between MBs and antennal lobes (ALs) and proposed a computational model for non-elemental learning (Wessnitzer et al., 2012). Different levels of learning and reinforcement mechanisms were considered at the stage of the Kenyon Cells (KCs) to create a coincidence detector and non-elemental learning. Reward mechanisms are commonly considered for the creation of aversive and appetitive olfactory memories (Schwaerzel et al., 2003) and the role of dopamine is relevant in Drosophila (Waddell, 2013). We here extended this scheme to memorize specific parameters involved in the motor-skill learning process. On the basis of fruit fly brain structures and of hypotheses related to information processing and learning mechanisms, we treat the MBs as a structure able to adapt and memorize relevant parameters involved in motor learning. This improves the fly's capabilities when it is trained in repeating a task like climbing over a chasm. Therefore, a simplified computational model of the MB neuropil was developed using a pool of spiking neurons representing the KCs.

The computational model proposed in this paper for motor learning takes the biological characteristics of the MBs into account and, on the basis of the previously introduced hypotheses, arrives at a neuro-computational structure similar to a Liquid State Machine (LSM) proposed by Maass et al. (2002). The information embedded in the dynamical neural lattice is transferred to the lower motor layers by extrinsic MB neurons that have been modeled as read-out maps.

One fundamental difference between the proposed model and the LSM is the presence of local connectivity among the neurons within the liquid layer. This element of our model deserves particular attention: in fact, the structure is configured as a locally connected recurrent neural network which is fairly similar to the Cellular Neural/Nonlinear Network (CNN) structure (Manganaro et al., 1999), a paradigm already used for the generation of complex dynamics and for controlling artificial locomotion (Arena et al., 1999) and perception phenomena (Arena et al., 2009). Another important characteristic of the proposed model is related to the hardware implementation: in fact, there are a number of analog/logic VLSI CNN-based chips available which implement digitally programmable analog computers characterized by high computational speed and analog, parallel computation capability, typically used for high frame rate visual microprocessors (Rodríguez-Vázquez et al., 2008). However, adapting the MB structure modeled in this paper to a CNN architecture through the addition of trainable read-out maps would make it possible to adopt well-assessed reference hardware for the real-time implementation of the proposed approach.

From the modeling perspective, the developed structure links two main ideas: high parallelism in brain processing and Neural Reuse (Anderson, 2010). According to the former, sensory pathways run in parallel and concur to form abstract schemes of the environmental state, useful for motor actions or abstract decisions. The Neural Reuse approach, on the other hand, states that the same neural structure can be concurrently exploited for different tasks. The insect MBs were already addressed as centers where such characteristics could be found, and the control structure introduced here takes a step forward in deriving an efficient computational model directly useful as a robot behavioral controller (Arena and Patané, 2014).

### 2. MOTOR-SKILL LEARNING IN INSECTS

Among the different forms of neural adaptation encountered in animals, motor-skill learning is a fundamental capability needed to survive in dynamically changing environments and also to cope with accidental impairments of an animal's limbs.

Motor-skill learning can be defined as the process of acquiring the precise, coordinated movements needed to fulfill a task. Due to the importance of this capability, sensory-motor conditioning was one of the earliest types of associative learning found in cockroaches and locusts. It has been demonstrated in the ventral nerve cord of insects (Horridge, 1962) and is probably ubiquitous in moving animals (Byrne, 2008; Dayan and Cohen, 2011).

The motor-skill learning system incrementally improves the motor responses by monitoring the resulting performance: this process guides the adaptive changes. By exploiting the involved sensory-motor loops, agents apply operant strategies during motor learning: when a movement is performed, sensory feedback is used to evaluate its accuracy (Brembs and Heisenberg, 2000; Broussard and Kassardjian, 2004).

In insects there are different examples of motor learning processes that adapt motor schemes to specific tasks. For instance, honeybees can adapt their antennal movements to an obstacle after a prolonged presentation of this obstacle. Furthermore, the use of an outside rewarding mechanism dramatically speeds up the learning process (Erber et al., 1997).

Other insect behaviors involving motor learning were reported by Mohl (1993), who investigated the relevance of proprioception during flight in locusts. In an interesting paper on Drosophila motor-skill learning capabilities (Wolf et al., 1992), a series of conditions has been identified for proper motor-skill learning.

First, the fly has a desired target to reach; to fulfill this aim, a number of motor programs are activated in a random sequence. Efference copies of the motor programs are compared with references and if, for a given motor behavior, a meaningful correlation is found, this is applied. Other studies in this direction were performed on bumblebees (Chittka, 1998) and butterflies (Lewis, 1986).
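The selection principle just described (try motor programs in random order and keep the one whose efference copy correlates with a reference signal) can be sketched in Python. This is only a toy under stated assumptions, not the procedure of Wolf et al. (1992): the program names, the `toy_environment` function, the use of sensory feedback as the reference signal, the Pearson correlation as the comparison, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def find_effective_program(programs, environment, rng, steps=200, threshold=0.5):
    """Try programs in random order; return the first one whose efference
    copy is clearly correlated with the feedback it produces."""
    order = list(programs)
    rng.shuffle(order)
    for name in order:
        efference = [rng.choice((-1.0, 1.0)) for _ in range(steps)]
        feedback = [environment(name, command, rng) for command in efference]
        if abs(pearson(efference, feedback)) >= threshold:
            return name
    return None

def toy_environment(name, command, rng):
    """Hypothetical world: only 'program_b' influences the feedback signal."""
    gain = 1.0 if name == "program_b" else 0.0
    return gain * command + rng.gauss(0.0, 0.1)

selected = find_effective_program(
    ["program_a", "program_b", "program_c"], toy_environment, random.Random(0))
print(selected)
```

Programs that have no effect on the feedback produce correlations close to zero, so they are skipped regardless of the order in which they are tried.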

Behavioral studies on insects confirmed that they are able to show sophisticated and adaptive motor-control strategies requiring the joint coordinated activity among the limbs. A particularly suitable experimental setup to inspect motor learning capabilities is the behavioral paradigm of gap crossing, first described by Blasing and Cruse (2004) and Blasing (2006) in relation to stick insects, by Pick and Strauss (2005) for Drosophila, and by Goldschmidt et al. (2014), where the cockroach capabilities were considered. Flies with a body length of typically 2.5 mm (and with their wings clipped to disable flight) can cross gaps of up to 4.3 mm when fully exploiting their biomechanical limits. Direct observation and high-speed video analysis of the gap climbing procedure (see Pick and Strauss, 2005 and videos supplied) showed that flies first visually estimate the gap width via parallax motion generated while approaching the gap. Then, if they consider the gap as being surmountable, they initiate the climbing procedure by combining and successively improving, through several attempts, a number of parameters for climbing. The hind legs are placed as close as possible to the proximal edge; the middle legs are attached to the proximal side wall of the gap and arrange the body horizontally; the front legs stretch out to attach to the opposite gap side. Then the middle legs are detached from the proximal side, swing over and are attached to the distal side surface of the gap. Finally, the hind legs are detached and the fly moves toward the other side. These experiments clearly show that several parameters are modified from their nominal values (for normal walking) and also combined together in several successive phases to maximize the climbing performance.

Later it was shown by Kienitz (2010) that flies improve their climbing abilities when they iteratively climb over gaps of the same width. The short-term improvements after 24 training trials within 1 min were seen in tests 20 min after training; they are missing in plasticity mutants. Rescue of plasticity in the MBs was sufficient to restore the motor-learning capacity. The finding that plasticity in MBs is a prerequisite for motor learning will be taken as our working hypothesis for the development of the proposed computational model. Experiments on gap crossing were also performed with stick insects (Blasing and Cruse, 2004; Blasing, 2006). In these works the authors outlined the role of single leg movements, searching reflexes, and coordination mechanisms as important to fulfill the task. A model of gap crossing behavior was implemented extending a previously developed bioinspired network Walknet (Cruse et al., 1998), to reach simulated results comparable with the biological experiments. Here the gap crossing issue was considered as an extension of normal walking behavior with only limited modifications. In our work we reached a similar conclusion though starting from quite different models. In fact the CPG for normal walking is maintained whereas only a parameter adaptation was introduced to efficiently implement climbing.

The climbing capabilities of other insects like cockroaches were also considered to develop experiments on obstacle climbing and gap crossing using hexapod robots (Goldschmidt et al., 2014). The presence of an actuated joint in the robot body was exploited to improve the capabilities of the system to deal with complex situations including gaps and obstacles (Goldschmidt et al., 2014; Dasgupta et al., 2015). In Pavone et al. (2006), the sprawled posture was a key element for solving the obstacle-climbing issue. In other cases the presence of spoked legs is a simple and efficient solution to improve power efficiency and walking capabilities in the presence of obstacles (Moore et al., 2002). In some cases hybrid legged and wheeled robots try to take advantage of both solutions (Arena et al., 2010).

Whereas these approaches exploit the mechanical structure, other strategies rely primarily on the adaptive capabilities of the control structure. For instance, to solve the antenna motor control problem, Krause et al. (2009) applied an echo-state network to generate the antenna movements in a simulated stick insect robot. The network was able to store specific trajectories and to reproduce them, creating smooth transitions between the different solutions available, depending on the control input provided.

Distributed recurrent neural networks, working as reservoirs, were also used in Dasgupta et al. (2015) to create a forward model needed to estimate the ground contact event in each leg of a walking hexapod robot. The prediction error has been used to improve the robot walking capabilities for different types of terrains.

Our approach belongs to this last type of strategies, since it takes into account primarily the adaptive capabilities of a recurrent spiking network to solve a specific motor learning issue.

In fact, in our work, we considered only obstacle climbing scenarios because our Drosophila-like hexapod robot does not contain body joints (which were exploited in Dasgupta et al., 2015 to also facilitate gap crossing); moreover, it is unfeasible to equip the robot with the adhesive capabilities of fly leg tips. We further assumed that the computational structure involving the MBs in gap climbing tasks is also involved in obstacle climbing. In the proposed example, the external information used to characterize the scenario was reduced to the obstacle height (e.g., acquired through a simple visual processing method), from which the set of parameters needed to fulfill the climbing task is learned.

In particular, the MB intrinsic neurons are here modeled as a spiking network working as a reservoir, able to generate rich, input-driven dynamics that are projected to other neural centers using read-out maps that work as MB extrinsic neurons. An important added value of the learning process is the generalization of the learned data: the network can generate suitable output signals also for input patterns not included in the learning set, by interpolating the memorized functions.

# 3. MODELING MOTOR-SKILL LEARNING

## 3.1. Known and Hypothesized Biological Functions

Tasks related to motor-skill learning need a specialization of motor functions to optimize performance. To this end, a strategy is envisaged that searches for the most suitable system parameters for modulating the leg trajectories. The generation of pseudo-random parameters, constrained only by the insect's body parameters, is the initial step needed to improve the ongoing solution iteratively by trial and error. The search process produces a subset of successful attempts that are used to improve the overall system performance, storing the new set of suitable parameters evaluated on the basis of an internal reward function.

In insects, thoracic ganglia can be in charge of generating these trials (Horridge, 1962), but MBs should mediate the selection process, which consists of a statistical shaping and the final choice of the successful parameters that modulate the basic behaviors (Kienitz, 2010). Such learning processes are the basic ingredients for the implementation of a short-term working memory.

A block scheme of the neuro-control model is shown in **Figure 1**, where the main elements involved in the proposed model of motor-skill learning are depicted. Plasticity and learning are ubiquitous in the model owing to the complexity of the brain functions, but for the aim of the proposed work we focused our attention only on specific parts. Therefore, we considered all the interconnections to be fixed except the synaptic output of the MBs, as will be discussed in detail in Sections 3.2 and 3.3, in relation to the motor system (CPG). Plasticity and learning inside other blocks, including the visual sensory and pre-processing system, are not treated in this work.

The central complex (CX) is an excitatory center responsible for behavior activations on the basis of visual and mechanosensory inputs. The input signals are here processed through a series of substructures: the protocerebral bridge (PB), the fan-shaped body (FB), and the ellipsoid body (EB) (Hanesch et al., 1989; Strauss, 2002). Moreover, the PB is directly involved in motor control; it is also responsible for the stabilization of the walking direction (Triphan et al., 2010). In contrast, the MBs seem to have an inhibitory effect and are fundamental for the adaptive termination of behaviors (Mronz and Strauss, 2001).

MBs present a large complexity at the level of the calyx, due to the different KC types and their interconnections. From the modeling point of view, KC types could be implemented through different non-linear functions (or dynamical systems). No information is available on the dynamics of these neurons, and electrophysiological data are in short supply. On the other hand, powerful neurogenetic tools are available for the fruit fly; they allow precise manipulations of the nervous system in order to address the links among specific neural substrates, their functions, and the specific behaviors they are responsible for.

Learning in Drosophila melanogaster has revealed multiple memory types and phases and recent investigations underlined that not all memory processes occur in MB neurons (Wu et al., 2007; Zhang et al., 2013).

Here we hypothesize that the CX, and in particular the PB, plays a role in motor learning: it performs adaptation of the motor system parameters, shaping the motor behavior while the insect performs a task. The involvement is plausible as the PB seems to control step length for direction (Strauss, 2002; Triphan et al., 2010). This variability is attained in our model (see **Figure 1**) through a random function generator (RFG), which perturbs some relevant leg control parameters and thereby generates perturbed leg trajectories. On the basis of the expected results, the on-going behavior is evaluated and, eventually, MBs receive a reinforcement signal via extrinsic dopaminergic and octopaminergic neurons (Schwaerzel et al., 2003). Memory consolidation occurs overnight. After consolidation, the MBs are assumed to inhibit the perturbation provided by the RFG to allow memory retrieval. The overall control system designed and implemented, as outlined in the following, constitutes a clear example of a bio-inspired, embodied, closed-loop neural controller.

## 3.2. MB Model for Motor Learning: Working Hypotheses

In order to design both a biologically plausible and a computationally feasible model of the MBs, the two following hypotheses were formulated:


The following structural elements can be outlined:


The proposed control scheme has been implemented in a computational model embedded on a robot simulated in a realistic dynamical environment. Referring to **Figure 1**, the robot navigates driven by vision: the heading commands are provided to the locomotion controller through external stimuli. An evaluation procedure assesses the suitability of the performed actions in solving the assigned task. An event detector triggers the evaluation process.

The reinforcement signal is passed to the MBs to evaluate the changes generated by the RFG and is used to update a set of motor control parameters. Successful parameter updates, leading to significant improvements in the climbing behavior, lead to memory formation. An SNN was considered as a plausible model to generate the long-term memory of the best parameters selected during the learning process and to guarantee the interpolation capabilities that are important for generating feasible behaviors in situations similar to those encountered during the learning procedure. Finally, a selector block determines whether a random trial is performed or the information stored in the SNN is used for the motor actions.

Among the different kinds of neural networks used for solving problems like navigation (Tani, 1996), multi-link system control (Cruse, 2002) and classification, a lot of interest has been devoted to reservoir computing, which mainly includes two different approaches: the Echo State Network (ESN) and the Liquid State Machine (LSM) (Jaeger, 2001; Maass et al., 2002). In previous studies, the idea of using non-spiking Recurrent Neural Networks to model the MB memory and learning functions was explored (Arena et al., 2013a).

The core of the newly proposed architecture, inspired by the biology of the MBs, resembles the LSM architecture. It consists of a large collection of neurons, the so-called liquid layer, receiving time-varying inputs from external sources as well as recurrent connections from other nodes in the liquid layer. The recurrent structure of the network turns the time-dependent input into a spatio-temporal pattern in the neurons. These patterns are read out by linear discriminant units. In recent years LSMs have become a reference point in replicating brain functionalities. However, there is no guaranteed way to analyze the role of each single neuron's activity in the overall network dynamics: the control over the process is very weak. This apparent drawback is a consequence of the richness of the dynamics potentially generated within the liquid layer. The side advantage is that the high-dimensional complexity can be concurrently exploited through several projections (the read-out maps) to obtain non-linear mappings useful for performing different tasks at the same time.

The proposed network differs from the structure reported in Arena et al. (2013a) in many aspects: it consists of a lattice of inhibitory and excitatory spiking (instead of nonspiking) neurons with a random connectivity, which is mainly local (instead of non-local). Moreover, solving the motor-learning problem in Arena et al. (2013a) required a much larger network. This can be attributed to the much richer dynamics generated within the SNN (see Section 5.1). Inputs are here provided as currents that, through a sparse connection, reach the hidden lattice (i.e., the liquid layer). Multiple read-out maps, fully connected with the hidden lattice, can be learned considering the error between the network output, collected through an output neuron for each read-out map, and the target signal. The network details are illustrated in the next section.

#### 3.3. Network Structure and Parameters

Following the biological hints, the proposed hypotheses, and suggestions from the classical LSM paradigm, the MB structure involved in motor learning has been modeled as a spiking-based network consisting of three layers: an input layer, a hidden recurrent neural lattice, and an output layer. The input layer behaves like a filter that randomly redirects input stimuli to a reduced number of neurons in the hidden layer (KC lattice). The connectivity percentage used in this work is 15% from the input layer to the KC layer.

The hidden layer is an SNN (i.e., the reservoir network), where each unit is an Izhikevich Class I spiking neuron (Izhikevich, 2000) organized in a square topology with toroidal boundary connections. The regular distribution of the neurons in a square-shaped lattice was selected because, for computational reasons, we considered the simplest structure on which distance metrics can be computed. The following differential equations describe the model:

$$\begin{aligned} \dot{v} &= 0.04v^2 + 5v + 140 - u + I\\ \dot{u} &= 0.02(-0.1v - u) \end{aligned} \tag{1}$$

with the following spike-resetting condition:

$$\text{if } v \geq 0.03, \text{ then } \begin{cases} v \gets -0.055\\ u \gets u + 6 \end{cases} \tag{2}$$

Here v is the membrane potential, I is the synaptic current and u is a recovery variable. Izhikevich neural models are well-known in literature for offering a good compromise between biological plausibility and computational efficiency.
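As a minimal sketch of Equations (1) and (2), the following Python fragment integrates a single neuron with a forward-Euler scheme. It assumes that the threshold and reset values of Equation (2) are expressed in volts, so that they read as 30 and −55 on the millivolt scale of Equation (1). The step `dt = 0.25` ms, the initial state, the input currents and the function name `simulate_neuron` are likewise illustrative assumptions, not values taken from the reported simulations.

```python
import numpy as np

def simulate_neuron(current, t_max=300.0, dt=0.25):
    """Forward-Euler integration of Equations (1)-(2) for a constant input current."""
    a, b = 0.02, -0.1                        # recovery parameters of Equation (1)
    v_peak, v_reset, d = 30.0, -55.0, 6.0    # assumed millivolt reading of Equation (2)
    v = -65.0                                # assumed initial membrane potential
    u = b * v                                # recovery variable started on its nullcline
    n_steps = int(t_max / dt)
    trace = np.empty(n_steps)
    spike_times = []
    for step in range(n_steps):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= v_peak:                      # spike: reset v and increment u
            spike_times.append(step * dt)
            v = v_reset
            u += d
        trace[step] = v
    return trace, spike_times

# Example: a silent neuron (no input) versus a driven one (illustrative current).
_, spikes_rest = simulate_neuron(0.0)
_, spikes_driven = simulate_neuron(40.0)
```

With these assumed settings the neuron stays at rest without input and fires repeatedly when driven, which is the qualitative behavior the lattice units rely on.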

Neurons are connected through synapses: here the spike rate from the pre-synaptic neuron is transformed into a current for the post-synaptic one. The response of the synapses to a pre-synaptic spike is as follows:

$$\varepsilon(t) = \begin{cases} -W\,\dfrac{t}{\tau} \exp\left(\dfrac{t}{\tau}\right), & \text{if } t > 0\\ 0, & \text{if } t < 0 \end{cases} \tag{3}$$

where τ is the time constant, t is the time passed since the last spike arrived at the pre-synapse and W is the synaptic efficiency. This last parameter can be modulated by learning. This synaptic model was also used to connect the lattice neurons to the output neurons.

The fraction of inhibitory neurons in the pool is about 10%. The connections within the lattice are represented by a synaptic weight with a random uniform distribution in the range (0.5–1.5); the input weights are equal to 1. The weights of the read-out map are subject to training. The generation of the interlayer synaptic connectivity depends on a probabilistic function of the distance d<sub>i,j</sub> between the presynaptic (i) and postsynaptic (j) neurons:

$$P_{i,j} = k \ast C_{i,j} \tag{4}$$

where

$$k = \begin{cases} 2 & \text{if } d_{i,j} \le 1\\ 1 & \text{if } 1 < d_{i,j} \le 2\\ 0 & \text{if } d_{i,j} > 2 \end{cases} \tag{5}$$

The parameters C<sub>i,j</sub>, reported in the previous table, have been chosen according to Maass et al. (2002). The distance d<sub>i,j</sub> = 1 is calculated, either for horizontal or vertical adjacent neurons, considering the neurons as distributed on a regular grid with toroidal boundary conditions. It follows from the relations above that the connectivity realized within the lattice is local; this is an important element that facilitates a potential hardware implementation of the control system, where the number of connections is drastically reduced and limited to each neuron's neighborhood.
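The local connectivity rule of Equations (4) and (5) can be sketched as follows for an 8 × 8 toroidal lattice. Several choices here are assumptions made only for illustration: the city-block metric (on this grid it selects the same neighbors as the Euclidean one for the thresholds 1 and 2), the exclusion of self-connections, the clipping of probabilities to 1, and above all the entries of `c_table`, which are hypothetical placeholders rather than the values of the table referred to in the text.

```python
import numpy as np

def torus_distance(n_side):
    """Pairwise grid distance on an n_side x n_side lattice with wrap-around."""
    rows, cols = np.divmod(np.arange(n_side * n_side), n_side)
    dr = np.abs(rows[:, None] - rows[None, :])
    dc = np.abs(cols[:, None] - cols[None, :])
    dr = np.minimum(dr, n_side - dr)      # toroidal wrap along rows
    dc = np.minimum(dc, n_side - dc)      # toroidal wrap along columns
    return dr + dc                        # city-block metric (assumption)

def connection_probability(n_side, is_inhibitory, c_table):
    """P[i, j] = k(d[i, j]) * C[i, j]; i is presynaptic, j is postsynaptic."""
    d = torus_distance(n_side)
    k = np.where(d <= 1, 2, np.where(d <= 2, 1, 0))   # Equation (5)
    np.fill_diagonal(k, 0)                # no self-connections (assumption)
    kind = is_inhibitory.astype(int)
    c = c_table[kind[:, None], kind[None, :]]
    return np.clip(k * c, 0.0, 1.0)       # Equation (4), kept within [0, 1]

rng = np.random.default_rng(0)
n_side = 8
is_inhibitory = rng.random(n_side * n_side) < 0.10    # about 10% inhibitory
# Hypothetical C values, indexed as [pre is inhibitory, post is inhibitory].
c_table = np.array([[0.3, 0.2],
                    [0.4, 0.1]])
p = connection_probability(n_side, is_inhibitory, c_table)
connected = rng.random(p.shape) < p
weights = np.where(connected, rng.uniform(0.5, 1.5, size=p.shape), 0.0)
```

Under these assumptions each neuron can only connect to the 4 units at distance 1 and the 8 units at distance 2, which makes the locality of the lattice explicit.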

The time constant in Equation (3) was randomly chosen among the values τ = 5, 10, 30, and 50 ms. This variability improves the dynamics potentially shown by the network as will be discussed in the following sections. The values of the synaptic time constant have been chosen to obtain significant dynamics in the simulation time window that is limited to 150 ms.

The output layer consists of a series of output neurons, modeled with a linear transfer function and fully connected with the hidden lattice. The output weights are randomly initialized in the interval (−1, 1) and are subject to learning. The integration step used for the reported simulations was fixed to dt = 1.5 ms.

#### 3.4. Learning Mechanism

FIGURE 3 | (A) Trend of the mean square error during 100 learning trials (epochs), (B) Comparison between the expected output and the network approximation at the end of the 100 learning trials.

The time evolution of the target signals that the network needs to memorize is generated by shaping the lattice dynamics using read-out maps. An incremental learning rule based on the least mean squares algorithm is adopted to update the synaptic weights of each read-out map. The learning process, resembling the classical delta rule, depends on the lattice activity and on the error between the current output and the desired target. The updating rule of the synaptic weights is reported here:

$$W_{i,j}(t+\delta t) = W_{i,j}(t) + \eta \ast Z_{i,j}(t) \ast E(t) \tag{6}$$

where η is the learning rate, Z<sub>i,j</sub>(t) is the synaptic output of the neuron (i, j) at time t, and E(t) is the error between the network output neuron and the desired target. Another possibility consists of accumulating the weight variations during the simulation time window and applying the cumulative result at the last simulation step.
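The incremental rule of Equation (6) can be illustrated with a small Python sketch. Since the spiking lattice is not reproduced here, the synaptic outputs Z are replaced by surrogate smooth signals; this surrogate, the sign convention E(t) = target − output, the learning rate `eta = 0.01` and the target shape are all assumptions chosen for the example, while the 64 units, 100 steps and 100 epochs follow the values used in Section 4.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_steps, n_epochs = 64, 100, 100
eta = 0.01                      # illustrative learning rate for this surrogate

# Surrogate synaptic outputs Z[t, n] in [0, 1]: a stand-in for the lattice activity.
t = np.arange(n_steps)
freq = rng.uniform(0.5, 3.0, n_units)
phase = rng.uniform(0.0, 2.0 * np.pi, n_units)
z = 0.5 * (1.0 + np.sin(2.0 * np.pi * freq * t[:, None] / n_steps + phase))

# Target signal: a smooth ramp from 0 to 1 (assumed shape).
target = 0.5 * (1.0 - np.cos(np.pi * t / (n_steps - 1)))

w = rng.uniform(-1.0, 1.0, n_units)          # read-out weights initialized in (-1, 1)
mse_before = np.mean((target - z @ w) ** 2)

for epoch in range(n_epochs):
    for step in range(n_steps):
        error = target[step] - z[step] @ w   # E(t), assumed sign convention
        w += eta * z[step] * error           # Equation (6), applied at every step

mse_after = np.mean((target - z @ w) ** 2)
```

The weights are updated at every simulation step; the cumulative variant mentioned above would instead sum `eta * z[step] * error` over the window and apply it once per epoch.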

# 4. SIMULATION RESULTS

The analysed motor learning process consists of applying a series of perturbations to specific leg control parameters until the assigned task is accomplished. To apply a smooth perturbation, we adopted as target signal a cosinusoidal function whose final value corresponds to the parameter to be applied. In the following simulations we adopted a lattice with 8 × 8 neurons, which is a good compromise to obtain a considerable variety of internal dynamics. The learning process needs a series of iterations (here called epochs) to successfully store the information in the read-out maps. In the following analysis we considered 100 epochs with a learning rate η = 0.5. During each epoch the network is simulated for 100 integration steps. A typical activity of the neural lattice is shown in **Figure 2**. The input given to the network through an external current is related to the information acquired from the environment and, using the learning rule in Equation (6), we can determine the weights of the read-out map in order to follow a target signal as shown in **Figure 3**.
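A cosinusoidal target of the kind described above might look as follows. The half-cosine form, the hold after the ramp, and the names `cosine_profile`, `final_value` and `rise_steps` are assumptions introduced for illustration; the text only states that the function is cosinusoidal and ends at the parameter value to be applied.

```python
import numpy as np

def cosine_profile(final_value, rise_steps, n_steps=100):
    """Half-cosine ramp from 0 to final_value over rise_steps, then held constant."""
    steps = np.arange(n_steps)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.minimum(steps, rise_steps) / rise_steps))
    return final_value * ramp

# Example: a hypothetical parameter value of 0.8 reached after 40 integration steps.
profile = cosine_profile(final_value=0.8, rise_steps=40)
```

The profile starts with zero slope, so the perturbation is introduced smoothly on top of the ongoing joint trajectory.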

The network can interpolate the information acquired during learning, as illustrated in **Figure 4**. During the 400 epochs used for the learning phase, two distinct output signals, corresponding to different input currents (I<sub>in</sub> = 5 and 30 µA), were learned. During the testing phase, besides the two inputs already used in the learning phase, other input currents were also provided, obtaining plausible behaviors that interpolate the dynamics of the two learned target signals.

**Figure 5** reports the synaptic activity (Equation 3), in the form of currents generated by the lattice before learning, weighted by the read-out map and summed over the 100 samples for all the spikes emitted by the neurons to reproduce the two target signals. It can be noticed that even a lattice with a limited number of neurons can produce a large variety of dynamics that can be combined by the output neurons. The differences in the synaptic time constants play a role in increasing the richness of dynamics during the network activity. It is also evident how sensitive the structure is to a change in the input current provided to the lattice, which can generate a drastic change in the temporal evolution of the network dynamics. This allows for a high interpolation capability. The use of spiking networks over nonspiking ones to model nonlinear dynamics is often considered an additional complication. Our case is an example to the contrary. In fact, in Arena et al. (2013a) nonlinear nonspiking recurrent neural networks were used to model MB activity: the nonspiking recurrent configuration suitable for solving the motor-learning problem was fixed to 140 non-locally connected units, whereas the results presented in this paper were obtained with a network of 64 locally connected spiking neurons in the liquid layer.

# 5. MOTOR LEARNING: APPLICATION TO CLIMBING

### 5.1. Learning New Motor Activities in a Stable Locomotion Controller

The insect brain can be considered as a parallel computing architecture where reflexive paths serve the basic needs for survival, whereas learned paths allow the formation of more complex behaviors.

robot motor-skills in a multi-stage task. Starting from Home, an event triggers the request of parameter adaptation for the Step 1 that is tried until a success occurs or a time-out is reached. Within the time-out triggered window, it is possible to evaluate the effectiveness of multiple sets of parameters that persist for about a complete cycle of a leg (i.e., overtime). The success is evaluated by a cumulated reward and, if an improvement is obtained, the parameter evolution is stored in the long-term memory (i.e., read-out maps). The other stages follow the same procedure.

Regarding motor activities in insects, the thoracic ganglia are mainly responsible for the generation of locomotion gaits, and the Central Pattern Generator (CPG) has widely been accepted as being the core unit for locomotion control, although its fine-tuning is usually achieved by sensory information. The approach proposed here considers the task of motor learning as that of finding a suitable way for modifying the basic motor trajectories on the single leg joints so as to improve motor-skills in the light of novel conditions imposed by the environment. Using a control approach, we can realize motor-skill learning through a hierarchical adaptive controller, where, when facing novel conditions, some parameters controlling the leg joint trajectories are modulated. These modulations, shaped by the kinematic constraints, realize novel leg trajectories which are then applied to assess their suitability for the task. Once the former locomotion conditions are restored, these modulations are withdrawn and the baseline stable locomotor activity reemerges. Sets of successful parametric values are retained, so that they can be re-applied whenever similar conditions should be encountered again. The locomotion controller is made up of basically two networks: one is devoted to generating a stable phase displacement among the legs; the other is shaped on the specific kinematic structure of each leg and constituted by several motor neuron structures, as illustrated in **Figure 6**. The basic cell characterizing the CPG architecture is described by the following equations:

$$\begin{cases}
\dot{x}_{1,i} = -x_{1,i} + (1+\mu+\varepsilon)y_{1,i} - s_1y_{2,i} + i_1 \\
\dot{x}_{2,i} = -x_{2,i} + s_2y_{1,i} + (1+\mu-\varepsilon)y_{2,i} + i_2
\end{cases} \tag{7}$$

with y<sub>i</sub> = tanh(x<sub>i</sub>); the parameters µ = 0.23, ε = 0, s<sub>1</sub> = s<sub>2</sub> = 1, and i<sub>1</sub> = i<sub>2</sub> = 0 for each cell generate a stable limit cycle (Arena et al., 2005). µ is chosen so that the dynamics approximate a harmonic oscillation. The CPG network is built by connecting adjacent cells through links expressed by rotational matrices R(φ), as follows:

$$\dot{\boldsymbol{x}}_{i} = f(\boldsymbol{x}_{i}, t) + k \sum_{j \neq i} \left(\boldsymbol{R}(\phi_{i,j})\boldsymbol{x}_{j} - \boldsymbol{x}_{i}\right) \quad \text{with } i, j = 1, \dots, n \tag{8}$$

where the summation involves all the neurons j that are nearest neighbors of the neuron i; n is the total number of cells; f(x<sub>i</sub>, t) represents the reactive dynamics of the i-th uncoupled neuron as reported in Equation (7), and k is the strength of the connections. The sum of terms performs diffusion on adjacent cells and induces phase-locking as a function of rotational matrices (Seo and Slotine, 2007). The presence of local connections is an important added value because it reduces the system complexity in view of a hardware implementation. The bottom layer is designed based on the desired kinematic behavior; it is directly correlated to the morphology of the limb. The network controlling one of the middle legs is sketched in **Figure 6**. The CPG neuron identified with the label R2 is connected, through rotational matrices with different angles, to a network of motor neurons arranged in a directed tree graph that uses the same neuron model as the CPG. The blocks H(•) are Heaviside functions and are used to distinguish, within the limit cycle, between the stance and swing phases: this allows suitable modulation parameters to be associated with each part of the cycle, depending on the morphology of the leg. The signals are finally merged to generate the position control command for the coxa, femur and tibia joints. A detailed discussion of the CPG structure and behaviors is reported in a previous study (Arena E. et al., 2012).

The overall network stability was theoretically proven exploiting tools from partial contraction theory on a network made of nonlinear oscillators with Laplacian couplings. As demonstrated in previous studies, the network for gait control has a diffusive, undirected tree-graph configuration, which guarantees asymptotic phase stability independently of any imposed locomotion pattern (Arena et al., 2011; Arena E. et al., 2012).
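As a numerical illustration of Equations (7) and (8), the sketch below couples two cells through a rotation matrix and checks that they settle on a common oscillation with the imposed phase shift. It reads the self-feedback gain in Equation (7) as 1 + µ ± ε, and it uses a coupling strength `k = 5`, a phase `phi = 2π/3`, the initial conditions and a forward-Euler step that are all assumptions of this example; it is not the hexapod network of **Figure 6** and says nothing about the formal proof.

```python
import numpy as np

def rotation(phi):
    """2-D rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def cell_derivative(x, mu=0.23, eps=0.0, s1=1.0, s2=1.0, i1=0.0, i2=0.0):
    """Uncoupled cell dynamics f(x) of Equation (7), with x = (x1, x2)."""
    y1, y2 = np.tanh(x)
    return np.array([-x[0] + (1.0 + mu + eps) * y1 - s1 * y2 + i1,
                     -x[1] + s2 * y1 + (1.0 + mu - eps) * y2 + i2])

def simulate_pair(phi, k=5.0, t_max=80.0, dt=0.01):
    """Two cells coupled as in Equation (8); returns x1(t) and |x1 - R(phi) x2|."""
    r12, r21 = rotation(phi), rotation(-phi)
    x1 = np.array([0.1, 0.0])             # arbitrary initial conditions
    x2 = np.array([0.0, -0.2])
    n_steps = int(t_max / dt)
    trajectory = np.empty((n_steps, 2))
    mismatch = np.empty(n_steps)
    for step in range(n_steps):
        dx1 = cell_derivative(x1) + k * (r12 @ x2 - x1)
        dx2 = cell_derivative(x2) + k * (r21 @ x1 - x2)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        trajectory[step] = x1
        mismatch[step] = np.linalg.norm(x1 - r12 @ x2)
    return trajectory, mismatch

trajectory, mismatch = simulate_pair(phi=2.0 * np.pi / 3.0)
tail = slice(len(mismatch) // 2, None)            # discard the transient
amplitude = np.abs(trajectory[tail, 0]).max()     # oscillation amplitude of x1
mean_mismatch = mismatch[tail].mean()             # small when the phase is locked
```

A small `mean_mismatch` together with a finite, non-zero `amplitude` indicates that the two cells oscillate with the phase displacement set by R(φ), which is the role the first network of the locomotion controller plays among the legs.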

The stable phase-locked oscillations generated in this way are passed on to the motor neural network for each leg, whose particular structure controls leg motion while maintaining the imposed phase among the legs. The motor-learning controller is added on top of this stable basic locomotor activity; its role is to find suitable modulations of the single-leg motions in order to learn proper trajectories in the presence of specific needs. Basic motor activities are thus perturbed to find new solutions for the leg motions, thereby implementing motor-skill learning.

#### 5.2. Climbing Experiment

Motor-skill learning in the presented multi-limb system is applied to improve the robot capabilities in solving different tasks involving multiple degrees of freedom; here, in fact a fine tuning of parameters is required to modulate the basic cycling behavior in the different legs of the robot.

Among the possible tasks, a step-climbing scenario has been considered in the simulation. In nature, insects are continuously faced with uneven terrains and they adapt their motor responses to accomplish tasks like climbing over surmountable objects. Even flying insects, like D. melanogaster, show exquisite climbing skills, since searching for food in the near-field and courtship are achieved during walking. The aim of motor learning in our experiments is to improve the climbing capabilities of a simulated robot through the modulation in time of a group of parameters used in the leg motor layer. This simulated scenario is a realistic alternative to the gap climbing scenario used in the biological experiments (Blasing and Cruse, 2004; Pick and Strauss, 2005; Kienitz, 2010; Triphan et al., 2010). In fact, due to the adhesive capability of the fly legs (possessing pulvilli and claws), gap climbing is an affordable task for the real insect, whereas it is extremely difficult for a Drosophila-inspired robot that cannot reach the same dexterity as its biological counterpart. In other hexapod robots the presence of an active body joint, inspired by the cockroach, was exploited to improve the system capabilities in gap climbing tasks (Goldschmidt et al., 2014). In our Drosophila-inspired robot, due to the absence of this degree of freedom in the body, we considered obstacle-climbing scenarios, which are a challenging task for legged robots that have to improve their climbing capabilities by learning. For a future direct comparison with biological experiments, the new paradigm lends itself to testing real flies. Moreover, the step climbing scenario can be made more demanding by using slippery surfaces, which would reduce the advantage of the animal over the robot.

Step climbing for a robot is quite a complex task and should involve an optimization method to adapt the joint movements to different surfaces. To simplify the problem, the task was split into different phases shown in **Figure 7**.

The approaching phase is guided by the visual system, which is able to recognize the distance from the obstacle and its height. When the robot's distance from the step is below a threshold, Phase 1 is activated and the parameters of the front legs are adapted using the RFG to modify their movements, in an attempt to find a foot-hold on the step. For the sake of simplicity, only a subset of the parameters available in the adopted CPG was subjected to learning in this phase.

In detail, the bias value for the coxa joint, the gain value for the femur joint, and the bias and gain values for the tibia joint were selected for learning. This phase leads to a stable positioning of the front legs on the step, with the body lifted off. The extent of the angular motion of the leg joints, caused by the modulation of the parameter profiles, is used as an index of the energy spent in this task and to define a reward function. The reward value is then compared with the previously found best value and, if an improvement is obtained, the new sets of functions are stored in the SNN read-out map. For the considered task we have a single lattice with one input (i.e., step height) and a total of ten read-out maps, one for each parameter to be learned for a specific leg joint. The SNN receives as input a normalized value related to the step height, and the lattice dynamics generate a spatiotemporal spiking activity that is transformed into a continuous, non-spiking signal through the output synapses that converge on the output neurons, one for each parameter that is subject to learning.
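The reward bookkeeping described above can be sketched as follows. The exact reward function is not given in the text, so the sum of absolute joint-angle increments used in `energy_index`, the "lower is better" comparison and the class name `BestProfileStore` are assumptions made for illustration only.

```python
import numpy as np

def energy_index(joint_angles):
    """Total absolute angular excursion; joint_angles has shape (n_steps, n_joints)."""
    return float(np.abs(np.diff(joint_angles, axis=0)).sum())

class BestProfileStore:
    """Keeps the parameter profiles with the lowest energy index seen so far."""

    def __init__(self):
        self.best_reward = np.inf
        self.best_profiles = None

    def update(self, reward, profiles):
        """Store the profiles if they improve on the best value; report whether they did."""
        if reward < self.best_reward:
            self.best_reward = reward
            self.best_profiles = profiles.copy()
            return True       # here the read-out maps would be retrained
        return False

# Example with a hypothetical two-joint trajectory sampled at three steps.
angles = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, -1.0]])
store = BestProfileStore()
improved = store.update(energy_index(angles), np.zeros((3, 2)))
```

Only profiles that improve on the stored value would trigger a new learning session of the read-out maps, so the SNN is not retrained after every successful attempt.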

A series of experiments was performed using a step that is insurmountable unless a gait adaptation is introduced: the height of the step is around 0.9 mm, where the simulated Drosophila body length was set to 3.2 mm and the center of mass is located, on average, about 1 mm above the ground during forward walking.

The joint angular positions caused by the parameter adaptation in the anterior legs are shown in **Figure 8A**. The subsequent phase is similar: here the relevant parameters to be adapted are the bias of the femur and tibia joints of the hind legs, which facilitates the climbing of the middle legs. The event considered in this phase to evaluate the success, and the consequent passage to the next phase, is the horizontal position of the center of mass of the robot with respect to the obstacle. The parameter adaptation results for the second phase are depicted in **Figure 8B**. During the third phase the robot elevates the hind legs on the step: this is achieved by modulating the gain of the coxa and the bias of the femur joint for the middle legs, and the gain of the coxa and femur joints of the hind legs (**Figure 8C**). In the actual experiments the function adopted to deliver the randomly generated parameter modulation on the joints is a cosinusoid; however, other functions, like exponentials, quarter sinusoids, or sigmoids, could be used. The function reaches its steady-state value in a given time window that is a portion of a stepping cycle.
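As a quick consistency check, and assuming that each listed parameter is counted once per leg pair, the parameters adapted in the three phases add up to the ten read-out maps mentioned above: four in the first phase (coxa bias, femur gain, tibia bias and gain of the front legs), two in the second (femur and tibia bias of the hind legs), and four in the third (coxa gain and femur bias of the middle legs, coxa and femur gain of the hind legs):

$$N_{\text{maps}} = \underbrace{4}_{\text{Phase 1}} + \underbrace{2}_{\text{Phase 2}} + \underbrace{4}_{\text{Phase 3}} = 10$$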

In the dynamic simulation reported here, we adopted an integration time dt = 0.01 s and a stepping cycle of about 1.5 s; under these conditions, the parameters reach the steady state within 20–60 integration steps. Looking at the learning process, the RFG generates the new parameters to be tested for the first phase. If the trial is successful, the robot is placed back at the starting position to perform a test, which assesses the robustness of this new set of parameters. If the robot succeeds, it can proceed to the second phase; otherwise the parameters are discarded and the first phase is repeated. The trial ends when the robot overcomes the last phase or after a given number of attempts (i.e., 15 events). If this time-out occurs, the parameters just used for the phases are discarded because they are not globally suitable for a complete climbing behavior.
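The trial procedure just described can be summarized in a short Python sketch. The callables `attempt` and `draw_parameters` are hypothetical stand-ins for the simulated robot and the RFG, and counting the 15 attempts across all phases of a trial is an assumption about how the time-out is applied.

```python
import random

def run_trial(attempt, draw_parameters, n_phases=3, max_attempts=15):
    """Return the per-phase parameters of a successful trial, or None on time-out.

    attempt(phase, params) -> bool is a hypothetical callable that runs the
    simulated robot on one phase; draw_parameters(phase) plays the role of the RFG.
    """
    accepted = []
    attempts = 0
    phase = 0
    while phase < n_phases:
        if attempts >= max_attempts:
            return None                   # time-out: all parameters are discarded
        attempts += 1
        params = draw_parameters(phase)
        # A success must be confirmed by a second run from the starting position.
        if attempt(phase, params) and attempt(phase, params):
            accepted.append(params)
            phase += 1                    # proceed to the next phase
    return accepted

# Example with a stub robot that succeeds whenever the drawn value exceeds 0.6.
rng = random.Random(0)
result = run_trial(lambda phase, p: p > 0.6, lambda phase: rng.random())
```

Here `result` is either the list of accepted parameters, one entry per phase, or `None` when the attempt budget is exhausted before the last phase is completed.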

In **Figure 9A** an example of a trial is reported: the robot succeeds in the first attempt to find a suitable set of parameters to complete the first and second phase, whereas for the third phase a series of failures both in learning and in test are obtained until the final success is reached (see Supplementary Video 1 for a typical sequence of trials with successes and failures). The success in the trial can be followed by a learning process in the SNN depending on the overall reward value obtained. In **Figure 9B** the distribution of the cumulative reward in a campaign is shown. For each trial the success condition for each phase can be reached multiple times until the complete climbing behavior is tested successfully or otherwise a timeout occurs. If the obtained cumulative reward (i.e., sum of the rewards for each phase) after the third phase is lower than the previously obtained values, the parameters are learned by the network.

To evaluate the interpolation capabilities of the network, we also performed a series of learning sessions with higher obstacles (i.e., 1.4 mm) and subsequently tested the robot with a step height never provided during learning (i.e., 1.2 mm). The best-adapted parameters obtained for the two learned step heights are reported in **Figure 10**, together with the network response to the new step with an intermediate height. The obtained results were tested with the simulated robot, which succeeded in the climbing behavior as reported in **Figure 11**. This figure depicts the motion of the robot's center of mass (COM) and of the tips of each leg when climbing a 1.2 mm step. The edge of the step is placed 11 mm from the COM home position (along the y axis) (see Supplementary Video 2). Moreover, a series of snapshots outlining the posture of the fly-inspired robot during the climbing task is depicted in **Figure 12**.

To evaluate the generalization capability of the control system, the previously learned system was tested in a different scenario where a stair-like obstacle was introduced. The robot followed the same climbing procedure as described above, repetitively applied for the three stair steps encountered on its path with height 1.3, 1.1, and 0.9 mm, respectively. The detection of each obstacle produces an effect at the motor level on the basis of the parameter adaptation mechanism induced by the SNN. **Figure 13A** shows the trend of the joint position angles for the left-side legs during the whole climbing procedure. The adapted parameters produce changes in the leg movements during the different climbing phases as illustrated in **Figure 13B** where the dynamics of the robot COM and the leg tip positions are reported (see Supplementary Video 3).

The results obtained were achieved relying only on the adaptive capabilities of the legs acquired during the learning phase. The body structure was considered rigid as in the fruit fly case. Including in the robotic structure active body joints (Dasgupta et al., 2015), mimicking the body of other insects like cockroaches, would only improve the robot capabilities. Therefore, the proposed control strategy can be also applied to other different robotic structures to improve their motor capabilities in fulfilling either obstacle climbing tasks or other similar scenarios affordable for the robot under consideration.

# 6. REMARKS AND CONCLUSIONS

In this paper a bio-inspired, embodied, closed-loop neural controller has been designed and implemented in a simulated hexapod robot that is required to improve its motor skills to face unknown environments. Taking inspiration from the insect brain and in particular from the fruit fly, the following hypotheses were formulated: relevant role of MB neuropiles in the motor learning task; direct transfer of the important role of transient dynamics in olfactory learning from the locust to the fly brain and further extension to motor learning; design of a neurocomputational model based on a LSM-like structure for the implementation of obstacle climbing in a simulated hexapod robot.

In detail, a computational model for motor-skill learning was developed and realized in a dynamic simulation environment. Inspired by behavioral experimental campaigns of motor learning in real insects, the computational structure consisted of a randomly connected SNN that generates a multitude of nonlinear responses after the presentation of time-dependent input signals. By linearly combining the outputs of the lattice neurons through a weighted function, a reward-based strategy makes it possible to learn the desired target by tuning the weights of a readout map. Looking at MBs in insects, the idea of a pool of neurons recruited to solve different tasks depending on the specific requested output is close to the concept of Neural Reuse, which is supported by a body of biological evidence. The reported results demonstrate that the system can learn, through a reward-driven mechanism, the time evolution of several independent parameters related to the leg movements, to improve the robot climbing capabilities when exposed to the step-climbing task. The robot was also able to deal with step heights never presented before, exploiting the interpolation abilities of the proposed network.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.


# ACKNOWLEDGMENTS

This work was partially supported by the FIR 2014 Project Self-reconfigurable, flexible, multi-body, low-cost, modular robot systems for exploration of dynamical, hostile or unknown environments, by MIUR Project CLARA (CLoud Platform for lAndslide Risk Assessment), and by EC FP7 EMICAB (Embodied motion intelligence for cognitive autonomous robots).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00012/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Arena, Arena, Strauss and Patané. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Morphological Properties of Mass–Spring Networks for Optimal Locomotion Learning**

*Gabriel Urbain\*, Jonas Degrave, Benonie Carette, Joni Dambre and Francis Wyffels*

*IDLab, Electronics and Information Systems Department, Ghent University – imec, Ghent, Belgium*

Robots have proven very useful in automating industrial processes. Their rigid components and powerful actuators, however, render them unsafe or unfit to work in normal human environments such as schools or hospitals. Robots made of compliant, softer materials may offer a valid alternative. Yet, the dynamics of these compliant robots are much more complicated compared to normal rigid robots of which all components can be accurately controlled. It is often claimed that, by using the concept of morphological computation, the dynamical complexity can become a strength. On the one hand, the use of flexible materials can lead to higher power efficiency and more fluent and robust motions. On the other hand, using embodiment in a closed-loop controller, part of the control task itself can be outsourced to the body dynamics. This can significantly simplify the additional resources required for locomotion control. To this goal, a first step consists in an exploration of the trade-offs between morphology, efficiency of locomotion, and the ability of a mechanical body to serve as a computational resource. In this work, we use a detailed dynamical model of a Mass–Spring–Damper (MSD) network to study these trade-offs. We first investigate the influence of the network size and compliance on locomotion quality and energy efficiency by optimizing an external open-loop controller using evolutionary algorithms. We find that larger networks can lead to more stable gaits and that the system's optimal compliance to maximize the traveled distance is directly linked to the desired frequency of locomotion. In the last set of experiments, the suitability of MSD bodies for being used in a closed loop is also investigated. Since maximally efficient actuator signals are clearly related to the natural body dynamics, in a sense, the body is tailored for the task of contributing to its own control. 
Using the same simulation platform, we therefore study how the network states can be successfully used to create a feedback signal and how its accuracy is linked to the body size.

**Keywords: morphological computation, mass–spring networks, morphological control, physical reservoir computing, soft robotics**

#### *Edited by:*

*Poramate Manoonpong, University of Southern Denmark, Odense, Denmark*

#### *Reviewed by:*

*Kohei Nakajima, Kyoto University, Japan; Keyan Ghazi-Zahedi, Max Planck Institute for Mathematics in the Sciences, Germany; Helmut Hauser, University of Bristol, UK*

#### *\*Correspondence:*

*Gabriel Urbain gabriel.urbain@ugent.be*

*Received: 23 December 2016; Accepted: 06 March 2017; Published: 27 March 2017*

#### *Citation:*

*Urbain G, Degrave J, Carette B, Dambre J and Wyffels F (2017) Morphological Properties of Mass–Spring Networks for Optimal Locomotion Learning. Front. Neurorobot. 11:16. doi: 10.3389/fnbot.2017.00016*

# **1. INTRODUCTION**

Since its very early formulation, control theory has tried to automate increasingly complex systems (Fernández Cara and Zuazua Iriondo, 2003). The first implementations of PID controllers using feedback to regulate non-linear systems only originated in the first part of the twentieth century and were improved considerably, in particular with the progress in aerospace. More recently, with the evolution of the computation power and the advances in machine learning, the focus has evolved toward the control of highly compliant systems with many degrees of freedom. Passive compliant robots indeed possess dynamical properties closer to animal bodies, whose performances can still not be reached, and show a real advantage for solving complex tasks in noisy human environments.

However, the framework for a theory allowing a deep understanding of such control systems, and hence engineering opportunities, is still under construction. It is widely believed that the concept of morphological computation can partly answer this issue, as it enables more fluent and robust motion control while providing adapted embodied controllers that use the body itself as a computational means (Paul, 2006; Pfeifer and Bongard, 2006).

Nonetheless, the concept of morphological computation does not have a clear definition, as discussed in Müller and Hoffmann (2016). In Füchslin et al. (2013), the authors refer to the first International Conference on Morphological Computation in Venice in 2007, where it was defined as "any process that serves a computational purpose, has clearly assignable input and output states, is programmable (i.e., the behavior can be adapted by varying a set of parameters) and has a sort of teleological embedding." This definition is however rather broad, as it also includes every traditional digital computing means. Hereafter, we will restrict our definition to any way of increasing efficiency of computation or control in terms of energy, memory, time, etc. by outsourcing computational tasks to analog physical systems. This interpretation follows the work of Pfeifer and Bongard (2006), where morphological computation refers to the fact that "certain processes are performed by the body that otherwise would have to be performed by the brain," and is in line with the experiments conducted in Hauser et al. (2011). Moreover, it constitutes a fundamental motivation for embodiment, which states that steps toward adaptive intelligence do not only come from the controller complexity but also from the interactions with the body and the environment. Broader analyses of the quantification of morphological computation as well as the trade-offs with informational computation include Polani (2011), Zahedi and Ay (2013), Haeufle et al. (2014), Hoffmann and Müller (2014), and Ghazi-Zahedi et al. (2016).

Illustrative applications of morphological computation and embodiment for locomotion are numerous in biology and robotics. For instance, Dickinson et al. (2000) provide an analysis of how animals succeed in efficient locomotion using their muscles not solely as motors but to provide multiple functions varying from brakes to springs and struts. The *passive walker* in McGeer (1990) constitutes an extreme example of an engineered robot exploiting the same concept. This two-legged physical structure is able to walk down a slope in a very natural way without any actuation. This work was later extended in Collins et al. (2005) to robots with low-power actuators. They show a walking pattern that looks natural and is energy efficient compared to traditional stiff controlled robots. In other fields of robotics, we can also cite the works of Iida and Pfeifer (2006) or Degrave et al. (2015), in which dynamical properties of compliant quadruped robots are used to provide low power consumption, to reduce controller computational complexity, and to observe natural transitions between gaits. Examples that clearly benefit from compliance to improve locomotion can also be found, among others, in Cham et al. (2004), which focuses on hexapod locomotion.

A practical implementation of morphological computation can be inspired by Reservoir Computing (RC). RC denotes a computational framework that enables the approximation of a broad range of dynamical behaviors for which a precise model is not available. RC originates from the domain of recurrent neural networks and is mainly based on the theories of Echo State Networks (ESN) and Liquid State Machines (LSM) as outlined in Lukoševičius and Jaeger (2009). At the time of their introduction, they offered a solution to the training of Recurrent Neural Networks (RNN), which was still considered difficult. They avoided having to train feedback connections and the problems with bifurcations this brings, i.e., the discontinuities in the network outputs observed for some points in the parameter space, by training only the synaptic connections of the readout nodes. The core architecture consists of a randomly connected RNN, the *reservoir*, for which the synaptic weights are sampled from some distribution and then globally rescaled to tune the dynamical regime close to the edge of chaos. RC has also led to different robotics applications, such as learning the inverse kinematics of an iCub robot arm from a neural reservoir in Reinhart and Steil (2009) or the creation of Central Pattern Generators (CPG) to control human movements in Wyffels et al. (2014) and hexapod locomotion in Dasgupta et al. (2015).

As the reservoir network is constituted of randomly connected non-linear entities, many physical dynamical systems presenting sufficiently complex transformations of their inputs provide similar dynamical properties and can be used as reservoirs. For instance, it has been demonstrated in Hauser et al. (2012) that generic types of physical bodies such as Mass–Spring–Damper (MSD) networks are able to approximate any given time-invariant filter with fading memory and generate adaptive periodic patterns autonomously when a feedback loop is added. This extension of RC is generally referred to as Physical Reservoir Computing (PRC). The expensive step of computing the reservoir transformation is now outsourced to a physical system's natural dynamics. This means that the neuron states will not be explicitly updated digitally anymore, but this computation is transferred to the body's dynamical evolution. Only the readout layer needs to be engineered, most often using digital computing.

The main advantage of PRC lies in the parallelism of the computations in the physical reservoir and, in the case of robotic locomotion, in the fact that the transformations computed by the robot body are a natural result of the gait. However, PRC is essentially a supervised machine learning technique. By contrast, robotic control is intrinsically a reinforcement learning problem, in which the optimal desired actuator signals are not known *a priori*. In addition, successful reservoir implementations often require the observation of the reservoir state at many different points. In robotics, this implies that for each observation point a sensor needs to be installed.

Numerous applications of PRC have been demonstrated in the past decade. In robotics, it has been applied to highly compliant robot models, for example MSD networks in Hauser et al. (2011) (simulation only), tensegrity structures in Caluwaerts et al. (2014), or a real soft robotic platform inspired by an octopus arm in Nakajima et al. (2014, 2015). Closed-loop control of a quadruped robot exploiting a spine made of soft material as a reservoir can be found in Zhao et al. (2013). Simulations or implementations of PRC outside robotics include water ripples in Fernando and Sojakka (2003), electro-optical devices in Larger et al. (2012), or pure optical devices in Brunner et al. (2013) and Vandoorne et al. (2014).

This paper presents two main research objectives. First, we design a small scalable simulation setup to provide empirical compliance studies on the locomotion of MSD networks. To our knowledge, such an analysis does not yet exist and should help to evaluate the potential of compliance for locomotion in terms of robustness, efficiency, and stability. To this end, three main experiments are conducted. The first experiment gives an overview about how increasing the number of nodes in a MSD network leads to more stable locomotion. The second experiment provides an analysis on the optimal frequency range for the setup, and the third experiment explores the maximal reachable speeds for different driving powers and underlines the limitations of the design to get high performance. In the second part, we analyze the computational capacity of a MSD body to generate motor control signals and integrate them as a regulation feedback to a forward controller.

# **2. OPEN-LOOP CONTROL**

### **2.1. Materials and Methods**

To run our experiments and analysis, we designed a MSD network simulator directly implementing mechanical equations using *Python* and *Numpy*.<sup>1</sup> These networks, inspired by Hermans et al. (2014) and Caluwaerts et al. (2013), consist of a set of nodes with mass, connected by spring–damper links which are all actuated separately. The simulation can be performed either in 2D or 3D.

<sup>1</sup> https://github.com/Gabs48/SpringMassNetworks

#### 2.1.1. Mass Spring Networks

The MSD morphology is presented in **Figure 1**. Each of the *N* nodes, except those at the end or beginning, is sparsely connected to its closest neighbors by *C* connections. The total number of springs in the network *S* can be easily deduced using geometry:

$$S = \left(N - 1 - \frac{C/2 - 1}{2}\right) \cdot \frac{C}{2} . \tag{1}$$
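As a quick numerical check of Equation (1), the following sketch evaluates the closed form and compares it with an explicit count of the links. The explicit count assumes that the nodes are ordered along a chain and that each node is linked to its *C*/2 nearest neighbors on either side, with *C* even; this layout is an assumption made for illustration, and the function names are hypothetical.

```python
def spring_count(n_nodes: int, c: int) -> float:
    """Number of springs S given by the closed form of Equation (1)."""
    half = c / 2
    return (n_nodes - 1 - (half - 1) / 2) * half


def spring_count_by_enumeration(n_nodes: int, c: int) -> int:
    """Count links explicitly for a chain of nodes (assumed layout).

    Node i is linked to nodes i+1, ..., i+c/2 when they exist, so every
    link is counted exactly once.
    """
    half = c // 2
    return sum(
        1
        for i in range(n_nodes)
        for d in range(1, half + 1)
        if i + d < n_nodes
    )


print(spring_count(20, 6))                 # closed form
print(spring_count_by_enumeration(20, 6))  # explicit count
```

Under this assumption, both functions agree for *N* = 20 and *C* = 6, the configuration used later for the parameter count.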

Each node *i* ∈ {1, ..., *N*} is represented by its mass *m<sub>i</sub>*, whereas the passive parameters for each connection are the spring stiffness *k<sub>j</sub>* and the damper coefficient *d<sub>j</sub>* for *j* ∈ {1, ..., *S*}. In this paper, the notion of compliance will be used. It is defined as the inverse of the stiffness, 1/*k<sub>j</sub>*. If not specified, the default values used in the following experiments are *N* = 20, *C* = 3, *m<sub>i</sub>* = 1 kg, *k<sub>j</sub>* = 100 N/m, and *d<sub>j</sub>* = 10 Ns/m.

In our model, the acceleration, speed, and position of each mass are updated using the force vector **F**<sub>*i*</sub> which combines the gravity force, the spring force, the damping force, and the air friction force:

$$\mathbf{F}_{i} = \mathbf{F}_{i}^{s} + \mathbf{F}_{i}^{d} + \mathbf{F}_{i}^{g} + \mathbf{F}_{i}^{a}, \tag{2}$$

where

*•* **F**<sub>*i*</sub><sup>*s*</sup> is the spring force vector applied on the node *i* and equals the sum of the *j* ∈ {1, ..., *C*} connected non-linear spring forces, for which the equations can be found in Palm (1999):

$$\mathbf{F}_{j}^{s} = -k_{j} \cdot \frac{\mathbf{l}_{j}}{l_{j}} \cdot \left( \left( l_{j} - l_{j,0} \right) + \frac{\alpha}{l_{j,0}^{2}} \cdot \left( l_{j} - l_{j,0} \right)^{3} \right). \tag{3}$$

In this equation, **l**<sub>*j*</sub> represents the spring length vector and *l*<sub>*j*,0</sub> its reference length. The variable *α* is a non-linearity coefficient which will induce a saturation of the spring force for large extension lengths. It also takes inspiration from the work of Hauser et al. (2011) which demonstrates the importance of these non-linearities from a computational consideration.

*•* **F**<sub>*i*</sub><sup>*d*</sup> is the damper force vector applied on the node *i* and equals the sum of the *j* ∈ {1, ..., *C*} connected dampers:

$$\mathbf{F}_{j}^{d} = d_{j} \cdot \frac{\mathbf{v}_{j}}{v_{j}} \cdot \left( v_{j} - v_{j,0} \right), \tag{4}$$

where **v**<sub>*j*</sub> is the vector of extension speed.

*•* **F**<sub>*i*</sub><sup>*g*</sup> is the gravity force vector:

$$\mathbf{F}_{i}^{g} = g \cdot m_{i} \cdot \mathbf{x}_{i}^{y}, \tag{5}$$

where *g* is the gravity constant and equals 9.81 m/s<sup>2</sup>.

*•* **F**<sub>*i*</sub><sup>*a*</sup> represents the drag force induced by air friction. It is assumed proportional to the speed:

$$\mathbf{F}_{i}^{a} = -a \cdot \mathbf{v}_{i}, \tag{6}$$

where *a* is the coefficient of air friction and equals 0.1 Ns/m. It has been included to avoid unrealistic models with very high speed. However, these did not occur in the experiments presented in this paper.

The ground reactions are modeled by setting the vertical velocity to zero and the horizontal friction coefficient to infinite. The masses perfectly stick to the ground as soon as they touch it. This is a hard constraint that can impact the nature and the performance of locomotion. However, it simplifies the study of the body influence by assessing perfect friction conditions in every simulation.
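To make the spring and drag terms concrete, the sketch below evaluates Equations (3) and (6) for a single spring in 2D. It is a minimal illustration, not the simulator's code: the function names are hypothetical, the spring vector is assumed to point from the neighboring node toward node *i*, and the value of *α* used in the example is arbitrary because it is not given in this section.

```python
import math


def spring_force(p_i, p_k, k, l0, alpha):
    """Non-linear spring force on node i from a spring joining i and k (Eq. 3).

    The spring vector is taken from node k to node i (an assumed convention),
    so a stretched spring pulls node i back toward node k.
    """
    lx, ly = p_i[0] - p_k[0], p_i[1] - p_k[1]
    length = math.hypot(lx, ly)
    extension = length - l0
    magnitude = -k * (extension + alpha / l0**2 * extension**3)
    return (magnitude * lx / length, magnitude * ly / length)


def drag_force(v, a=0.1):
    """Air friction proportional to the node velocity (Eq. 6), a in Ns/m."""
    return (-a * v[0], -a * v[1])


# A spring of rest length 1 m stretched to 1.5 m along x, with k = 100 N/m.
print(spring_force((1.5, 0.0), (0.0, 0.0), k=100.0, l0=1.0, alpha=0.0))
print(spring_force((1.5, 0.0), (0.0, 0.0), k=100.0, l0=1.0, alpha=1.0))
print(drag_force((2.0, -1.0)))
```

Setting `alpha=0.0` recovers the linear Hooke term, which makes the contribution of the cubic term easy to isolate.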

#### 2.1.2. Control

To actuate the springs using a control signal, we modulate the reference lengths of the springs *l*<sub>*j*,0</sub>. In the simplest and default case, this will be represented by a simple sinusoidal signal like in Hermans et al. (2014):

$$l_{j}(t) = l_{j,0} \cdot \left(1 + a_{j} \cdot \sin(\omega_{j} \cdot t + \phi_{j})\right). \tag{7}$$

It induces a set of tunable parameters *l*<sub>*j*,0</sub>, *ω<sub>j</sub>*, *ϕ<sub>j</sub>* for each spring in the simulation.
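The sinusoidal actuation of Equation (7) can be written in a few lines. The function name `rest_length` and the numerical values in the example (a 1 Hz drive with a relative amplitude of 0.25) are illustrative assumptions.

```python
import math


def rest_length(t, l0, amplitude, omega, phase):
    """Actuated reference length of one spring at time t (Equation 7)."""
    return l0 * (1.0 + amplitude * math.sin(omega * t + phase))


# One actuation period at 1 Hz, sampled every quarter period.
omega = 2.0 * math.pi * 1.0  # angular speed in rad/s for a 1 Hz drive
for t in (0.0, 0.25, 0.5, 0.75):
    print(t, rest_length(t, l0=1.0, amplitude=0.25, omega=omega, phase=0.0))
```

With these values, the reference length oscillates between 0.75 and 1.25 times its nominal value.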

#### 2.1.3. Physics Solver

The simulation time is discretized using *K* time steps *t<sub>k</sub>*, and equations are solved numerically using the Verlet algorithm as described in Thijssen (2007). The Verlet integrator leads to more accurate trajectories, especially for periodic oscillations where energy is rigorously conserved due to the time reversibility of this operator. For non-periodic trajectories, one can prove that due to symplecticity, the energy does not drift away and errors remain bounded as demonstrated in Yoshida (1990). Although it is more accurate, the Fourth-Order Runge–Kutta integrator requires four force evaluations per update step and is not symplectic. In our implementation, the update equations are slightly changed in order to take the effect of the ground reactions into account.
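The bounded energy error of this family of integrators can be observed on a single undamped mass on a linear spring. The sketch below uses the textbook velocity form of the Verlet scheme, which may differ in detail from the variant used in the simulator, and it does not include the ground-contact modification mentioned above. The function name, step size, and initial conditions are illustrative assumptions.

```python
def velocity_verlet(x, v, accel, dt, n_steps):
    """Integrate a 1D system with the velocity Verlet scheme.

    accel(x) returns the acceleration at position x. Only one new force
    evaluation is needed per step.
    """
    a = accel(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = accel(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Undamped oscillator with k = 100 N/m and m = 1 kg, simulated for 10 s.
k, m = 100.0, 1.0
traj = velocity_verlet(x=0.1, v=0.0, accel=lambda x: -k / m * x,
                       dt=0.001, n_steps=10000)
energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
drift = max(abs(e - energies[0]) for e in energies) / energies[0]
print(f"max relative energy deviation: {drift:.2e}")
```

The total energy oscillates slightly around its initial value instead of growing over time, which is the behavior that motivates the choice of a symplectic scheme for long periodic simulations.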

#### 2.1.4. Loss Function

The goal is to develop a generic approach to obtain robust locomotion in open loop without prior knowledge about the body dynamics. In the case of simulated MSD networks, this implies the optimization of controller and morphology parameters for each specific network. This can be formulated as

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} f(\boldsymbol{\theta}), \tag{8}$$

where the score function *f*(*θ*) and the optimization algorithms are detailed below. Typically, the optimized parameters *θ* of the MSD network are the controller amplitude *a<sub>j</sub>* between 0 and 0.25, its frequency between 0 and 10 Hz, its phase *ϕ<sub>j</sub>* between 0 and 2*π*, and the spring stiffness *k<sub>j</sub>* between 0 and 100 N/m. To synchronize the actuators together and impose the fundamental frequency, the angular speeds *ω<sub>j</sub>* are fixed to the same value. In the case of a MSD with *N* = 20 nodes connected to their six closest neighbors (*C* = 6), this represents a total number of springs *S* = 54 (see equation (1)) and therefore 163 parameters to optimize (an amplitude, a phase, and a stiffness for each spring, plus the shared frequency). Locomotion characterization and evaluation is performed through two performance metrics, the traveled distance *D* and the dissipated power *P*. The latter is computed as


$$P = \sum_{j} k_{j} \cdot \frac{a_{j}^{2} \cdot l_{j,0}^{2} \left(1 + \alpha^{2} a_{j}^{2}\right)}{4\pi}, \tag{9}$$

in which *a<sub>j</sub>* are the relative amplitudes, *α* is the spring non-linearity factor, and *l*<sub>*j*,0</sub> are the reference lengths of the springs.

Using the ratio of distance to power is unsatisfactory, as this could result in robots that consume very little power because they barely locomote. Instead, we will use the following power efficiency score displayed in **Figure 2**:

$$f(\theta) = \tanh\left(\frac{D(\theta)}{D_{\rm ref}}\right) \cdot \tanh\left(\frac{P_{\rm ref}}{P(\theta)}\right), \tag{10}$$

in which *D*<sub>ref</sub> and *P*<sub>ref</sub> are reference values allowing to normalize and homogenize the scores. As it is desirable to operate in the linear regime and avoid saturation of the score, we set them to 3,600 and 100, respectively, following the statistics of the observed distance and power values.
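The effect of Equation (10) compared with a plain distance-to-power ratio can be illustrated as follows. The function name `efficiency_score` and the two example operating points are assumptions chosen for illustration; the reference values are those stated above.

```python
import math

D_REF = 3600.0  # reference distance from the text
P_REF = 100.0   # reference power from the text


def efficiency_score(distance, power, d_ref=D_REF, p_ref=P_REF):
    """Power efficiency score of Equation (10); power must be positive."""
    return math.tanh(distance / d_ref) * math.tanh(p_ref / power)


# A gait at the reference operating point versus a body that barely moves
# while consuming almost nothing.
for distance, power in ((3600.0, 100.0), (1.0, 0.01)):
    ratio = distance / power
    score = efficiency_score(distance, power)
    print(f"D={distance:g}, P={power:g}: ratio={ratio:g}, score={score:.4f}")
```

The second operating point has the larger distance-to-power ratio but a score close to zero, which is the situation the hyperbolic tangent is meant to penalize.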

#### 2.1.5. Optimization

The aim is to develop an optimization approach that can be applied to highly compliant physical robots, without any need for an analytical model of the body dynamics. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as formulated in Hansen (2006) has been selected from a pool of different optimization methods. Indeed, it is well suited to exploring non-convex parameter landscapes with many local optima. In addition, it presents a good convergence speed and requires very few initialization parameters.

**FIGURE 2 | The efficiency score to quantify locomotion quality increases with traveled distance and decreases with dissipated power**. However, taking directly the ratio between both metrics (left) could lead to optima close to the origin, i.e., where the body barely moves. Using a hyperbolic tangent (right) solves this problem for small powers but requires *P*<sub>ref</sub> and *D*<sub>ref</sub> to be selected carefully to avoid a saturation due to the measure itself.


# **2.2. Results**

In this section, we assess the influence of the MSD network size and compliance on the best locomotion speed found, on the power consumption, and on the noise robustness in our specific example. Three different experiments are described in this context. In the first one, we increase the number of mass nodes in the network to determine its influence on locomotion efficiency. The second investigates how optimal compliance is related to the morphology parameters and the locomotion frequency. Finally, we discuss how the optimized gait changes when the driving power is constrained.

#### 2.2.1. Morphology Analysis

The choices made during the design of a system can contribute to more efficient and robust behaviors for solving sophisticated tasks. In the case of the MSD setup, we can intuitively assume that increasing the number of nodes will broaden the space of available trajectories, therefore increasing the number of optima at the expense of a longer learning process. It is interesting to note that such a tuning does not necessarily imply an increase of complexity, in the sense of the definition presented in Lungarella and Sporns (2006).

To verify this assumption, we have optimized open-loop locomotion controllers for networks with an increasing number of nodes and springs. As mentioned before, this optimization consists in tuning the actuators' amplitudes and phases, the spring constants, and the global frequency of locomotion. Other parameters of the MSD network are set to the same value for all bodies, except for the node mass. This is normalized by the number of nodes, such that the total mass of the MSD network (20 kg) remains the same in every simulation and the power levels required for locomotion can be compared.

In order to converge toward stable gaits, we add random acceleration impulses during the simulation. Their value is centered around 10% of the mean absolute acceleration and applied on random nodes 5% of the time. In the CMA-ES algorithm, the number of iterations is tuned specifically for each optimization to ensure convergence, since optimizing small structures will converge faster than larger ones. From each optimization run, the best individual is retained. Each optimization is repeated five times in order to average the results and obtain an estimate of the variability of our observations.
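One possible reading of this noise model is sketched below. Several details are assumptions made for illustration: each node is perturbed independently with a probability of 5% per time step, the impulse magnitude is drawn from a Gaussian centered on 10% of the mean absolute acceleration, and its sign is chosen at random. The function name `noisy_accelerations` is hypothetical.

```python
import random


def noisy_accelerations(accels, rng, rel_level=0.10, probability=0.05):
    """Return a copy of the per-node accelerations with sparse impulses added."""
    mean_abs = sum(abs(a) for a in accels) / len(accels)
    perturbed = list(accels)
    for i in range(len(perturbed)):
        if rng.random() < probability:
            # Gaussian spread and random sign are assumptions of this sketch.
            magnitude = rng.gauss(rel_level * mean_abs,
                                  0.25 * rel_level * mean_abs)
            perturbed[i] += rng.choice((-1.0, 1.0)) * magnitude
    return perturbed


rng = random.Random(0)
accels = [1.0] * 10000
perturbed = noisy_accelerations(accels, rng)
n_hit = sum(1 for a, b in zip(accels, perturbed) if a != b)
print(f"{n_hit} of {len(accels)} nodes received an impulse")
```

A gait that keeps a high score under such perturbations is less likely to owe its score to a fortunate, unstable trajectory.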

**Figure 3** shows the evolution of the averaged best individual score for increasing body size in blue. From left to right, we observe that the scores rapidly decrease for structures of up to five nodes before steadily increasing again. However, the good results in the first part of the curve should be interpreted carefully, taking their robustness to noise into account. To assess this property, we also represented the scores obtained for simulation using the same parameters but without noise on the same figure. We notice that this difference decreases with the number of nodes. This shows that structures with more nodes are more robust to the noise added during the simulation. The evolution of the CMA-ES algorithm represented in **Figure 4** also supports this hypothesis. It shows that the optima of the structures with a small number of nodes are found randomly instead of through convergence of the algorithm, unlike the structures with more nodes. High scores originate from these bodies' reduced stability. This makes them very sensitive to impulse noise as small disturbances can either make them fall over or push them forward. They can therefore rightly be regarded as outliers.

**FIGURE 3 | In this graph, the best individuals for CMA-ES optimizations of different MSD networks are plotted in blue**. Other simulations without noise are then performed on the same individuals with the same parameters in order to identify the outliers due to noise and qualify the stability of locomotion. The low performance of the 3-, 4-, and 5-node MSD structures indicates unstable gaits. For larger structures, the score first increases with the number of nodes but saturates rapidly for networks of more than twenty nodes.

number of optimized parameters is higher as well. When the structure is too simple such as the three nodes (one in the upper left corner), the problem cannot converge and the best results encountered during the exploration are mainly due to the random noise added in simulation.

It is finally interesting to note that the score increases gradually starting from six nodes but quickly saturates. A more detailed analysis in **Figure 5** shows that this is due to better performances in terms of traveled distance, whereas dissipated powers are very similar. However, note that this is achieved at the expense of a longer learning process, as indicated by the number of epochs on the x axis of the graphs in **Figure 4**.

In conclusion, this experiment points out that increasing the number of nodes and springs in the MSD networks leads to an increased robustness to external noise and better speed performances.

#### 2.2.2. Frequency Range Analysis

In this second set of experiments, we try to evaluate the nature of a link between robot compliance, which is defined by 1/*k*, the inverse of spring stiffness, and the optimum efficiency of locomotion.

The resonance frequency of a MSD system with one unique node and spring equals √(*k*/*m*). It ranges from 0.6 to 1.8 Hz for the *m<sub>i</sub>*, *k<sub>j</sub>*, and *d<sub>j</sub>* values that we are using in our setup (as a reminder, *m<sub>i</sub>* varies with the number of nodes). There is therefore a bijective function between compliance and resonance frequency. By extension, we can formulate the hypothesis that the resonance frequency of a MSD structure is directly coupled to its compliance. Since such structures are composed of several masses and springs, we can expect the resonance peak to broaden but still appear at the same frequency.

**FIGURE 5 | By displaying separately the distance and power components in the loss function of the CMA-ES optimization, we can acknowledge that the observed variations for different numbers of nodes are mainly due to the distance**. As expected from the normalization factor, the driving power remains roughly equal for each structure.
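As a rough numerical illustration of the single-node case, the sketch below evaluates the natural frequency for the node masses obtained when a total mass of 20 kg is shared among 5 to 20 nodes. It assumes that √(*k*/*m*) is the angular frequency of an ideal undamped oscillator, so that it is divided by 2*π* to obtain a value in Hz, and it uses the default stiffness of 100 N/m; the function name is hypothetical.

```python
import math

TOTAL_MASS = 20.0  # kg, total mass of the MSD network stated in the text


def natural_frequency_hz(k, m):
    """Natural frequency in Hz of an undamped mass m on a spring of stiffness k."""
    return math.sqrt(k / m) / (2.0 * math.pi)


for n_nodes in (5, 10, 15, 20):
    m = TOTAL_MASS / n_nodes  # node mass after normalization
    print(n_nodes, round(natural_frequency_hz(100.0, m), 2))
```

Under these assumptions, the values lie between roughly 0.8 and 1.6 Hz, inside the range quoted above, and they increase as the node mass decreases.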

With this assumption, the study of correlation between compliance and locomotion efficiency can be reformulated to focus on the link between actuation frequency and efficiency. Previous work such as Buchli et al. (2006) for robotic systems or McMahon and Cheng (1990) for models of mammalian gaits highlighted such a link: self-learning systems with different morphology properties tuned their actuation frequency to the resonance of the structure to reach optimal performance in locomotion.

In this setup, MSD structures with 5, 10, 15, and 20 nodes were optimized several times by fixing their global frequency to values between 0 and 10 Hz. In **Figure 6** (on the left), we have represented the results for different numbers of nodes. Each optimization then corresponds to a point on the graph. For some of those points, however, the optimization process was not able to converge to a gait that is both stable (its pattern does not change in time) and robust (it tolerates external noisy perturbations). In the graph, this failure is particularly visible for structures with few nodes simulated at high frequencies. A first empirical conclusion is that the robustness of MSD networks at high frequencies increases with the number of nodes. This represents an additional advantage of larger systems, alongside those discussed in the previous section. In terms of score, however, there is no significant difference between the topologies, and their optimal bandwidths are very similar. The optimal scores are a little lower only for the 5-node structures, which corroborates the results from the previous experiment. To get a more accurate measure of the bandwidth, it may even be interesting to combine all the results, as the structures possess very similar resonance frequencies. This is presented on the right side of **Figure 6**, where we can observe that the structure is optimal over a 3 dB bandwidth in the range [0.3; 5.2] Hz. The large confidence intervals around 4 and 5 Hz are again explained by the absence of convergence for the structures with a low number of nodes.
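As a rough illustration of how a 3 dB bandwidth can be read from a score-versus-frequency curve, the sketch below returns the frequency interval in which the score stays within 3 dB of its peak. The text does not state which convention was used, so the amplitude convention (a ratio of 10^(−3/20) ≈ 0.708 of the peak) is an assumption, as are the function name and the toy data in the usage note.

```python
def bandwidth_3db(freqs, scores, ratio=10 ** (-3 / 20)):
    """Return (f_low, f_high) of the samples whose score is within 3 dB of the peak.

    Assumptions: scores are positive, the -3 dB threshold is peak * ratio
    (amplitude convention), and the band is a single contiguous interval.
    """
    threshold = max(scores) * ratio
    inside = [f for f, s in zip(freqs, scores) if s >= threshold]
    return min(inside), max(inside)
```

With the toy values `freqs = [1, 2, 3, 4, 5]` and `scores = [0.5, 0.9, 1.0, 0.8, 0.4]`, only the samples at 2, 3, and 4 Hz exceed the threshold, so the function returns `(2, 4)`. Passing `ratio=0.5` would switch to the power convention instead.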

To sum up, this experiment provides guidance on the choice of compliance values in the design of an MSD network for locomotion. The choice of the global compliance that optimizes a robot of a given mass is conditioned by the frequency at which we plan to actuate the robot. Also, structures with more nodes tolerate a broader range of frequencies while remaining stable.

#### 2.2.3. Performance Limits with Constrained Power

So far, we have used a loss function that combines performance with respect to both traveled distance and energy consumption. However, it may be beneficial to analyze them separately in order to understand the limiting factors and to observe what can be the best compromise between them. The following experiment also allows us to qualitatively characterize the gaits of our structures and to observe possible transitions between different modes.

For this purpose, several optimizations have been performed by constraining the power and forcing their saturation to different values. In this way, one can expect to observe what is the maximum distance an individual can reach for a given power. Since we work outside the boundaries of the desirable operating range of the original cost function, we have now increased the reference value *D*<sub>ref</sub> to 1,000 m in order to avoid a saturation effect due to the cost function itself.

**Figure 7** shows the evolution of the optimal speed as a function of a constrained power budget. The best individuals are in the upper left corner. As might be expected from the conclusions of the previous section, the 3-Hz frequency gives the best results. Concerning the shape of the curve, we can see that the maximum speed increases almost linearly until 15,000 W and starts saturating beyond that.

This saturation highlights the limits of our model. It helps to understand which factors such as the spring saturation, the ground friction, the air drag, or the geometry play a larger role in performance compared to the driving power. It also situates the previous experiments in the non-saturating range, which helps to appreciate their significance better.

Finally, for very low power, an energy increase does not seem to add any improvement and even the opposite happens for frequencies 1 and 4 Hz.

A visual observation of the locomotion is useful to give more insight into the possible gait transitions on this curve. For this purpose, we have produced a series of video renditions of individual simulations, provided in the Supplementary Material. A qualitative analysis of those videos shows that the most common gaits consist of displacing the whole structure along a wave movement (each node touches the ground a little after the previous one) or locomoting in two steps (the body touches the ground two times per period with a phase difference of 180°). Concerning the high-power saturation, a video was made for each point of the 3-Hz curve. It shows that the most energy-consuming individuals present spring extensions close to their saturation, which causes a loss of stability of the locomotion. In the same way, videos were produced in the low-power domain for the points on the 4-Hz curve. For the lowest power, a good two-step alternation of contacts between the body and the ground is observed, whereas the phase shifts between the different contacts with the ground are much less synchronous for the following individuals. The same results have been established each of the 5 times the experiment has been conducted. Progressively with increasing power, a two-step approach with robot–ground contacts phase-shifted by 180° comes up again.

In short, we can stress the role of the body design in locomotion through two principal observations: first, a saturation of the spring leading to a degraded operation in high power; second, a qualitative influence of the optimal gait on the performances for a given morphology and power consumption.


associated with a different efficiency.

# **3. CLOSED-LOOP CONTROL**

#### **3.1. Materials and Methods**

The closed-loop control of the MSD network is performed through physical reservoir computing. In this setup, our goal is to reproduce the control signals at a time step *t<sub>k</sub>* using the physical states of the network at times *t*<sub>*k*−1−*n*</sub>, …, *t*<sub>*k*−1</sub> only. This is performed by training the weights of a linear combination (the readout).

#### 3.1.1. Setup

The closed-loop system is composed of different elements shown in **Figure 8**:

*•* The MSD structure that can be perceived as a physical reservoir because of its dynamics and high complexity. For each time step *t<sub>k</sub>*, the system's current state is evaluated using the acceleration vectors **a**[*k*], **a**[*k* − 1], and **a**[*k* − 2], which comprise both X and Y components of all the nodes. The choice of acceleration, instead of, e.g., speed, is based on the work of Caluwaerts et al. (2013). Trials using integrated quantities such as position or speed instead have also been evaluated but added a drifting error during training. Also, based on the same work, we choose a buffer size of 3 time steps. In our experiments, smaller values led to deteriorated results but larger ones did not show any significant improvements.

*•* A sensor filter, whose principal role is to model the physical limitations in acceleration sensing. It is composed of an amplitude threshold followed by a low-pass filter. The cutoff frequency at 6 Hz has been chosen very low to eliminate possible oscillations due to our numerical integration method while keeping the locomotion fundamental frequency and its first-order harmonics. At the output of the filter, a vector **x**[*k*] is sent to the next element.

**FIGURE 8 | The principal components in the closed-loop learning pipeline consist in a readout layer whose weight matrix is trained at each time step and a signal mixer that gradually integrates the feedback in the actuation signal**.

*•* A readout layer, which computes the actuation signals for the next time step based on the current and previous states of the MSD:

$$\mathbf{y}[k+1] = \mathbf{W}_{\text{out}}^T \cdot \mathbf{x}[k]. \tag{11}$$

To learn the weights of the output matrix **W**<sub>out</sub>, we use the FORCE learning method as in Sussillo and Abbott (2009), whose equations are the following:

$$\mathbf{e}[k+1] = f_{\text{sigmoid}}\left(\mathbf{W}_{\text{out}}^T[k] \cdot \mathbf{x}[k+1] \right) - \mathbf{y}_{\text{target}}[k+1] \tag{12}$$

$$\mathbf{P}[k+1] = \mathbf{P}[k] - \frac{\mathbf{P}[k] \cdot \mathbf{x}[k+1] \cdot \mathbf{x}^T[k+1] \cdot \mathbf{P}[k]}{1 + \mathbf{x}[k+1]^T \cdot \mathbf{P}[k] \cdot \mathbf{x}[k+1]} \tag{13}$$

$$\mathbf{W}_{\text{out}}[k+1] = \mathbf{W}_{\text{out}}[k] - \mathbf{P}[k+1] \cdot \mathbf{x}[k] \cdot \mathbf{e}[k+1]^T \tag{14}$$

$$\mathbf{y}_{\text{training}}[k+1] = f_{\text{sigmoid}}\left(\mathbf{W}_{\text{out}}^T[k+1] \cdot \mathbf{x}[k+1] \right), \tag{15}$$

where the estimate of the inverse of the correlation matrix **P** is initialized to **I**/*α*. The sigmoid function added ahead of the readout adds non-linearity in the control signal by saturating for too high values.

*•* A signal mixer to avoid an abrupt transition from open-loop to closed-loop control. Its role is to gradually incorporate the readout output contribution into the target signal. It is defined by three parameters: the open-loop time *t*<sub>ol</sub>, during which the MSD network is run in open-loop mode only; the training time *t*<sub>train</sub>, during which the contribution of the closed-loop signal increases linearly; and the percentage *β* of feedback in the full control signal that is reached before switching to closed-loop mode only.
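The readout update of Equations (12) to (15) can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class and method names are hypothetical, `tanh` stands in for the unspecified sigmoid, and the same state vector is used in the error, the **P** update, and the weight update, as in standard recursive least squares.

```python
import math


class ForceReadout:
    """Online readout trained with a recursive-least-squares (FORCE-style) rule."""

    def __init__(self, n_inputs, n_outputs, alpha=0.01):
        self.n_in = n_inputs
        self.n_out = n_outputs
        # W_out starts at zero; P is initialized to I / alpha.
        self.w = [[0.0] * n_outputs for _ in range(n_inputs)]
        self.p = [[(1.0 / alpha) if i == j else 0.0 for j in range(n_inputs)]
                  for i in range(n_inputs)]

    def predict(self, x):
        """Readout output: squashing function applied to W_out^T x (tanh is an assumption)."""
        return [math.tanh(sum(self.w[i][j] * x[i] for i in range(self.n_in)))
                for j in range(self.n_out)]

    def _p_times(self, x):
        return [sum(self.p[i][j] * x[j] for j in range(self.n_in))
                for i in range(self.n_in)]

    def update(self, x, y_target):
        """One training step; returns the error computed before the weight update."""
        y = self.predict(x)
        err = [y[j] - y_target[j] for j in range(self.n_out)]
        # P <- P - (P x)(P x)^T / (1 + x^T P x); P stays symmetric.
        px = self._p_times(x)
        denom = 1.0 + sum(x[i] * px[i] for i in range(self.n_in))
        for i in range(self.n_in):
            for j in range(self.n_in):
                self.p[i][j] -= px[i] * px[j] / denom
        # W_out <- W_out - (P_new x) e^T
        px_new = self._p_times(x)
        for i in range(self.n_in):
            for j in range(self.n_out):
                self.w[i][j] -= px_new[i] * err[j]
        return err
```

In use, `update(x, y_target)` would be called once per time step with the filtered state vector and the open-loop target, and `predict(x)` would supply the feedback signal. A small `alpha` makes the first corrections large, which is why it acts as a regularization parameter.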

#### 3.1.2. Parameter Tuning

The *α* parameter of the FORCE learning algorithm plays the role of a regularization variable in the process of learning the **W**<sub>out</sub> matrix. It must be selected in order to avoid an overfitting that would reduce robustness to undesired forces on the MSD structure but also to ensure a trained signal sufficiently close to the target. This is a major issue since a signal **y**<sub>trained</sub> with too much noise can easily cause a divergence in the locomotion limit cycle. Tests on signal noise robustness as presented in **Figure 9** allowed us to estimate a value of *α* = 0.01 as a good compromise.

The open-loop training and running times can be estimated by analyzing the convergence error of the FORCE algorithm (see **Figure 10**) and are fixed to 12 s of open-loop learning followed by 38 s where the feedback signal is gradually added to the target signal to reach a value of *β* = 95% before closing the loop. Stopping the training before the actuation signal reaches 100% of feedback avoids convergence to a steady state as discussed in Caluwaerts et al. (2013).
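The resulting mixing schedule can be written down compactly. The sketch below assumes that the mixer forms a convex combination of the target and readout signals and that the feedback fraction jumps from *β* to 100% when the loop is closed; the function names are hypothetical, and the default values are the 12 s, 38 s, and 95% quoted above.

```python
def feedback_fraction(t, t_ol=12.0, t_train=38.0, beta=0.95):
    """Fraction of the control signal taken from the readout at time t (seconds)."""
    if t < t_ol:
        return 0.0                            # open loop only
    if t < t_ol + t_train:
        return beta * (t - t_ol) / t_train    # linear ramp from 0 up to beta
    return 1.0                                # loop closed: readout only


def mix(t, y_target, y_readout, **schedule):
    """Blend target and readout signals channel by channel (assumed convex mix)."""
    c = feedback_fraction(t, **schedule)
    return [(1.0 - c) * a + c * b for a, b in zip(y_target, y_readout)]
```

Halfway through the ramp, at `t = 31.0`, the readout contributes 47.5% of the actuation signal; from `t = 50.0` onward the target no longer enters the control signal.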

#### **3.2. Results**

In order to determine the contribution of the system size in the process of learning its own locomotion gaits, we simulated MSD networks with different numbers of nodes and evaluated the distances traveled over the last 10 s in closed loop. The same simulation was carried out in open loop to provide a reference. The results of these simulations are presented in **Figure 11**. At first sight, it appears that the learning algorithm with its configuration can achieve performances of the same order of magnitude in open and closed loops for the structures between three and twenty-six nodes analyzed in this simulation. However, it is worth noting that MSD networks with fewer than six nodes already provided non-significant results in open loop.

graph, we can deduce that 12 s of simulation is sufficient to consider the convergence of the readout weights.

**FIGURE 11 | In this picture, we plot the traveled distances for the last 10 s of simulation in open loop in blue and closed loop in red**. There is no crucial difference between the two curves, which seems to indicate that the performances in closed loop are close to those in open loop for all structures.

Alternatively, the study of limit cycles gives an indication of the stability of closed-loop control. In **Figure 12**, we represented the temporal evolution of the internal states **x**<sub>*k*</sub> in a two-dimensional space obtained by PCA. Larger structures lead to smoother limit cycles in closed loop. The limit cycles even diverge from their basin of attraction for very small MSD networks. A simple interpretation is that more nodes lead to more cycles in the physical reservoir, which provides more robust trajectories in the principal components reference. This hypothesis is corroborated by analyzing the quality of the generated actuation signals. This can be quantified by plotting the Normalized Root Mean Square Error, as shown in **Figure 13**, which decreases with the number of nodes.
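For reference, a Normalized Root Mean Square Error between a target actuation signal and the generated one can be computed as sketched below. The text does not state which normalization was used, so dividing by the range of the target signal is an assumption, and the function name is illustrative.

```python
import math


def nrmse(target, generated):
    """Root mean square error divided by the range of the target signal.

    Assumption: normalization by (max - min) of the target; other conventions
    (mean or standard deviation of the target) are equally common.
    """
    n = len(target)
    rmse = math.sqrt(sum((t - g) ** 2 for t, g in zip(target, generated)) / n)
    return rmse / (max(target) - min(target))
```

A value of 0 means that the generated signal reproduces the target exactly, and the normalization makes values comparable across actuators with different amplitudes.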

In conclusion, the morphology of MSD bodies has the capability to compute at each time step the next value on the parametric trajectories found in open-loop optimization with sufficient accuracy for the locomotion task. The computation and memory that were previously embedded in an external controller can be fully distributed in the structure and the readout layer. The size and number of sensor measurements on the structure have a positive effect on the accuracy and stability of the feedback signal.

# **4. DISCUSSION**

In this article, we have tried to study systematically the influence of high-level design choices on the performance of MSD systems. Because of their analytical simplicity and their modularity, those body structures seem indeed adapted to conduct studies on the morphological contribution in the process of locomotion control. This research was divided into two main parts. On the one hand, an open-loop study focused on the benefits of body size to efficiency and stability. A similar analysis was also performed on locomotion frequency and helped to draw conclusions about how compliance can be chosen to increase optimal performance. On the other hand, we aimed at demonstrating the key role of morphology to generate control signals in a completely closed operation mode.

The different trials undertaken in open loop indicated the importance of the structure size to ensure optimal performance in terms of distance traveled and gait stability. Concerning compliance, its relation to the fundamental frequency of locomotion was used to demonstrate a link with the efficiency and to provide a specific suggestion in the design of optimal MSD systems. It has been noted that the frequency response of the different MSD networks shows a bell shape, displaying a degraded score for too high or too low frequencies and that the stability at high frequencies is better for larger structures. Finally, the behavior at different power values has highlighted the limits of the design in reaching high speeds, and a qualitative study has shown the effect of the gait evolution in this phenomenon.

In closed loop, the ability of MSD structures to generate their control signals on the basis of a single, fully connected layer of neurons has been attested. An increase in the size or the number of sensor signals induced a positive influence with regard to the limit cycle stability and the accuracy of the signals generated by the algorithm.

In future work, the main improvement should focus on increasing noise robustness and adaptability on different terrains and facing various obstacles. In this way, the goal is to provide a simple and generic locomotion primitive for complex structures, which learns how to perform actuator synchronization by harvesting the mechanical feedback while taking higher level control inputs such as the locomotion frequency. On the other hand, it would be interesting to generalize our conclusions to both real robots and biologically inspired dynamical models such as quadrupeds and bipeds.

# **AUTHOR CONTRIBUTIONS**

The experiments were conceived by GU, BC, FW, JDegrave, and JDambre and designed by GU and BC. The data were analyzed by GU with help of FW, JDegrave, and JDambre. The manuscript was mostly written by GU, with comments and corrections from FW and JDambre.

# **FUNDING**

The research leading to these results has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1).

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fnbot.2017.00016/full#supplementary-material.

**VIDEO S1 | This video presents several simulation renditions**. The different locomotion processes displayed are learned through optimization in open-loop control. It aims at providing the reader a qualitative understanding of the different gait types obtained when constraining the dissipated power at different levels. A discussion on that matter is given in Section 2.2.3 of the related article.

# **REFERENCES**


Sussillo, D., and Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. *Neuron* 63, 544–557. doi:10.1016/j.neuron.2009.07.018

Thijssen, J. (2007). *Computational Physics*. Cambridge University Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Urbain, Degrave, Carette, Dambre and Wyffels. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Modular Neural Mechanisms for Gait Phase Tracking, Prediction, and Selection in Personalizable Knee-Ankle-Foot-Orthoses

Jan-Matthias Braun<sup>1,2</sup>\*, Florentin Wörgötter<sup>1,2</sup> and Poramate Manoonpong<sup>2,3,4</sup>

<sup>1</sup> Computational Neuroscience Group, 3. Physics Institute, Georg-August-University, Göttingen, Germany, <sup>2</sup> Bernstein Focus Neurotechnology, Georg-August-University, Göttingen, Germany, <sup>3</sup> CBR Embodied AI & Neurorobotics Lab, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark, <sup>4</sup> Bio-inspired Robotics & Neural Engineering Lab, School of Information Science & Technology, Vidyasirimedhi Institute of Science & Technology, Rayong, Thailand

Orthoses for the lower limbs support patients to perform movements that they could not perform on their own. In traditional devices, generic gait models for a limited set of supported movements restrict the patient's mobility and device acceptance. To overcome such limitations, we propose a modular neural control approach with user feedback for personalizable Knee-Ankle-Foot-Orthoses (KAFO). The modular controller consists of two main neural components: neural orthosis control for gait phase tracking and neural internal models for gait prediction and selection. A user interface providing online feedback allows the user to shape the control output that adjusts the knee damping parameter of a KAFO. The accuracy and robustness of the control approach were investigated in different conditions including walking on flat ground and descending stairs as well as stair climbing. We show that the controller accurately tracks and predicts the user's movements and generates corresponding gaits. Furthermore, based on the modular control architecture, the controller can be extended to support various distinguishable gaits depending on differences in sensory feedback.

Keywords: artificial neural network, neural orthosis control, adaptation, gait classification, level walking, stair climbing, internal model, model invalidation

# 1. INTRODUCTION

Bipedal gait is inherently unstable (Winter, 1995; Milton et al., 2009) and therefore requires constant balancing and support from the lower limbs. This inherent instability can be amplified by changes in nerve, muscle, or bone status. Consequences can range from limited mobility to a complete inability of effective locomotion. Many different supporting techniques have been developed to regain mobility, e.g., crutches, splints, and prostheses, depending on the physical condition. Here, we focus on a knee-ankle-foot-orthosis (KAFO), a device which is attached to the lower limbs and provides mechanical support to its users.

Selecting and fitting such a supportive device for a patient are performed by professional staff. Based on a given patient's condition, the professional staff determines a device providing the support needed to enable or improve the patient's locomotion. This choice has to take into account the patient's remaining abilities, balancing the patient's need for support against the danger of excessive support which might prevent use of the patient's remaining abilities. In this light, it is advantageous to maximize the patients' own contribution to train their remaining abilities pursuing the aim to preserve or regain the patients' mobility. This increase in mobility has, for example, been observed in terms of the range of kinematic parameters and walking speed (Irby et al., 2005, 2007).

#### Edited by:

Florian Röhrbein, Technische Universität München, Germany

#### Reviewed by:

Huanqing Wang, Carleton University, Canada; Xin Luo, Chongqing Institute of Green and Intelligent Technology (CAS), China; Miriam Zacksenhouse, Technion-Israel Institute of Technology, Israel

> \*Correspondence: Jan-Matthias Braun janbraun@gwdg.de

Received: 23 December 2016 Accepted: 14 June 2018

Published: 25 July 2018

#### Citation:

Braun J-M, Wörgötter F and Manoonpong P (2018) Modular Neural Mechanisms for Gait Phase Tracking, Prediction, and Selection in Personalizable Knee-Ankle-Foot-Orthoses. Front. Neurorobot. 12:37. doi: 10.3389/fnbot.2018.00037

From the patient's perspective, other factors contribute to the assessment of the chosen device and thus to the actual use or even abandonment of a device. For example, comfort in daily activities, the ability to quickly and easily don/doff the device, and cosmetic properties, i.e., how the device alters the self-perception or the perception of others (Bernhardt et al., 2006; Robinson et al., 2010; McKee and Rivard, 2011), have a high impact on patient satisfaction. Kaufman et al. (1996) present several studies where abandonment rates "from 60 % to nearly 100 %" were observed. In Phillips and Zhao (1993), of 60 users of lower extremity braces, 35 abandoned their devices. In the list of top reasons, the authors cite whether the user's "personal opinion [was] considered in the selection process." Interestingly, it was not important if there were "alternatives to choose from." A follow-up survey on 250 veterans after 22 months of rehabilitation programs showed that only 16 out of 73 contactable patients were still using their braces. In other words, 78 % had abandoned their devices. In another example, in 31 % of 35 replies, patients expressed that they did not use their brace anymore while "60 % continued to use their wheelchair as their main means of displacement" (Mikelberg and Reid, 1981). Although the studies on device abandonment are from the 1980s and 1990s, they make clear the importance of the patient's opinion concerning the prescribed brace.

Another problem is side effects of orthosis use, which can arise from device limitations or mistrust toward its reliability. Gailey et al. (2008) give an overview on gait deviations of prosthesis users, including a tendency to favor the intact limb, generating additional stress on the less impaired parts of the body which may induce secondary conditions. There are, for example, (1) degenerative changes (trunk asymmetry, osteoarthritis, and scoliosis), (2) pain (lower back, hip, and/or knee joints), and (3) general deconditioning. These changes can, for example, be observed as asymmetric and slower gaits. Compensatory gait patterns like "increased upper-body lateral sway, ankle plantar flexion of the contralateral foot (vaulting) hip elevation during swing phase (hip hike) or leg circumduction" are listed especially for orthoses with fully extended knees in Yakimovich et al. (2009). Mills et al. (2010) come to the conclusion that "there is a large amount of variability with regard to how patients respond to orthoses." These studies suggest that the patient's gait and device perception can be improved by individual fitting. Because device-induced gait changes surface on long timescales (Irby et al., 2005, 2007), and Robinson et al. (2010) even speak of a "lifetime of adjustments," an approach for individual fitting has to facilitate continuous adaptation.

In consequence, to achieve patient acceptance in addition to an optimal medical outcome, one has to consider the user's impression of the device and its fitting as well as the specialist's opinion.

When looking at controller techniques, finite-state-machine-based controllers (FSMs) account for a large fraction of state-of-the-art approaches. In an FSM, the supported gaits are represented by a number of states, which define the control output. Given specific conditions, transitions between these states occur. To achieve adjustment in such an approach, parameters defining the control output or transition conditions may be changed. The number of states and transitions is typically left unchanged, with exceptions like Zlatnik et al. (2002), where a rule database is used. As the complexity and variety of the supported behaviors translate directly to the complexity of the graph, a higher number of states and transitions is needed. The process of designing or extending gait support has to ensure that the controller will not get stuck and that transitions support all possible changes in patient behavior.
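The structure of such an FSM controller can be made concrete with a deliberately small sketch. Everything in it is hypothetical and chosen only for illustration: the two states, the normalized foot-load input, the threshold values, and the damping outputs do not come from any of the cited controllers.

```python
# Hypothetical control output (normalized damping) defined by each state.
DAMPING = {"stance": 0.9, "swing": 0.1}

# Hypothetical transition conditions evaluated on a normalized foot-load reading.
TRANSITIONS = {
    "stance": [("swing", lambda load: load < 0.2)],
    "swing": [("stance", lambda load: load > 0.6)],
}


def step(state, load):
    """Return the next state: the first transition whose condition holds, else stay."""
    for target, condition in TRANSITIONS[state]:
        if condition(load):
            return target
    return state


def run(loads, state="stance"):
    """Feed a sequence of sensor readings and collect (state, damping) pairs."""
    outputs = []
    for load in loads:
        state = step(state, load)
        outputs.append((state, DAMPING[state]))
    return outputs
```

Adjusting such a controller to a patient amounts to changing the numbers in `DAMPING` or the thresholds in `TRANSITIONS`, whereas supporting a new movement requires new entries in both tables, which is where the growth in complexity discussed above comes from.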

Approaches to provide complex behaviors, i.e., more gaits, have to deal with increasing complexity. For example, a controller consisting of three FSMs for walking, standing, and sitting movements is presented in Varol et al. (2010), where additional transitions switch between these individual FSMs. In Sup et al. (2011), a similar approach handles slopes of varying degrees, using three FSMs for level ground as well as 5° and 10° inclinations. These approaches try to reduce FSM complexity by a divide-and-conquer approach, but only tackle sub-problems, e.g., either level and slope walking or level walking, standing, and sitting. A similar approach is gait (FSM) switching based on a Gaussian mixture model (Varol et al., 2008). The switching is based on a history of means and standard deviations in the input channels treated with dimension reduction; inputs are sampled at 1,000 Hz. The behavior was optimized in terms of increasing the history length, resulting in a switching delay of 430 ms in the testing condition.

Other approaches try to provide adaptive control. Speed estimation (Herr et al., 2002) and slope estimation in standing (Lawson et al., 2011) try to achieve control which adapts to gait and environment by the adjustment of control parameters at runtime. These approaches are limited by the flexibility designed into the underlying state machines.

These controllers select between gaits chosen at design time and are partially able to adapt to environmental changes or walking speed. The included FSM controllers represent predefined gait models, which only allow the behavior to be fitted to the patient through parameters selected at design time. More supported movements bring higher complexity in terms of an increase in states and possible transitions, which allows more parameters to be selected; still, the designed gait model may not cover every individual gait. This problem of fitting the controller to the patient gets worse in the case of orthoses, where the patients' conditions are more variable. The variability in patients' conditions results in large variability in remaining abilities and, thus, in large variability in individual need for support. These varying conditions are often met with very individual avoidance or compensation strategies, resulting in unique gaits which can conflict with the general gait model.

In consequence, we identified five important problems, which dominate the success of a device: (i) Individualization according to the patients' neurological status and remaining motor function. As a complication, orthosis patients can show very differentiated medical conditions which have to be compatible with the devices' support. (ii) Typically, devices have a specialized design supporting a reduced set of movements, which limits the patients' mobility. (iii) The target group of orthotic devices is typically limited as a consequence of (i) and (ii). (iv) An asymmetric gait due to patients' favoring of healthy limbs, leading to gait deviations and secondary conditions. (v) Device acceptance is strongly subjective and depends on users' opinions toward the device and their role in the selection process.

These problems have so far not been approached with a common concept. Here, we assume that they can be addressed best on the controller side with a shift of focus toward extensive patient fitting and behavior adaptation. Thus, we propose a personalized and patient-centered approach, which individualizes via training with patient data. With a user interface, patients directly influence the control output, giving them direct feedback on a possible tuning. The patients' gaits are tracked in terms of gait dynamics, i.e., joint-sensor dynamics. This approach of relying on the sensor dynamics makes the controller independent of the actual mechanical structure of the device as well as, for example, the moments or joint angles the patients can apply and maintain. As controller training leads to a high affinity to the gait dynamics of the trained gait, a modular structure is presented. In this modular structure, the number of supported gaits is limited by the system's ability to differentiate the gaits by their dynamics. These dynamics are determined by the chosen sensors. As the design is not geared toward a specific set of sensors, the modular controller is not explicitly limited to specific movements. Thus, given suitable data which signifies the new gait, the extension with a new gait is a simple, formalized process. With this approach, we want to overcome design limitations and extend the target group. Furthermore, training of individual gaits with direct patient feedback in the tuning process may lead to a better fitting and understanding of the controller behavior, hopefully leading to more symmetric gait and better device acceptance.

Taken together, in this study we present gait-dependent damping modulation based on gait phase tracking. Gait phase tracking is based on observed gait samples and therefore implements personalized gait support for single gaits. The damping modulation is implemented as a one-dimensional mapping from the gait phase to the desired damping, which can be adapted via a simple user interface. Together, an implementation of (a) gait phase tracking paired with (b) suitable damping modulation constitutes a supported gait. The second contribution lies in the selection of a suitable gait from a set of supported gaits, allowing the controller's behavior to adapt to gait changes. The most suitable gait is selected based on the gait dynamics, which are predicted by internal models. Thus, for the second contribution, three components now constitute a supported gait: (a) gait phase tracking, (b) suitable damping modulation, and (c) a model to predict the gait's dynamics for gait identification. Gait selection on three such defined gaits, for walking on flat ground, stair climbing, and descending stairs, has been tested on a healthy subject with an orthosis prototype provided by Otto Bock. This prototype applies damping to knee flexion, providing support to the user's body. Based on the tests with this prototype, we provide performance data on the method's ability to linearly track the gait phase, as well as its ability to quickly and reliably select a suitable single-gait controller. As a long-term goal, we aim to implement a fully adaptive controller that incorporates the patient's user feedback. The feedback mechanism will not only enable the user to influence the device's behavior, but also provide the means to control changes made by an adaptive controller.
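To make the idea of a one-dimensional mapping from gait phase to damping tangible, the sketch below interpolates linearly and periodically between a few user-adjustable knots. It is only an illustration under assumptions: the knot representation, the normalization of phase and damping to the unit interval, and the function name are hypothetical and do not describe the shaping module presented later.

```python
def damping_from_phase(phase, knots):
    """Piecewise-linear, periodic map from gait phase to desired damping.

    `knots` is a list of (phase, damping) pairs with distinct phases in [0, 1);
    both quantities are assumed to be normalized to the unit interval.
    """
    pts = sorted(knots)
    # Close the cycle by repeating the first knot one full period later.
    pts = pts + [(pts[0][0] + 1.0, pts[0][1])]
    phase = phase % 1.0
    if phase < pts[0][0]:
        phase += 1.0
    for (p0, d0), (p1, d1) in zip(pts, pts[1:]):
        if p0 <= phase <= p1:
            return d0 + (d1 - d0) * (phase - p0) / (p1 - p0)
    return pts[-1][1]
```

With the made-up knots `[(0.0, 0.8), (0.5, 0.2)]`, the damping falls linearly from 0.8 to 0.2 over the first half of the gait cycle and rises back over the second half. A user interface could then tune the behavior simply by moving a knot or changing its damping value.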

# 2. MATERIALS AND METHODS

In the introduction, we outlined the five general problems we see with current control schemes and our approach to them, namely individualization, fitting, and behavior adaptation based on patient gait data with the inclusion of user feedback. Here, we outline the implementation based on the concrete control scheme (**Figure 1**) for gait phase tracking, prediction, and selection.

We present the hardware platform together with the sensors capturing its configuration in section 2.1. In section 2.2, we introduce the single gait controller as the core neural control module. It consists of gait phase tracking (the timing module) and the shaping module, which transforms the gait phase into the control output. The single gait controller relies on the user interface to gather user feedback (section 2.3), which is only considered here insofar as it provides the control output as a function of the gait phase. Based on single gait control, section 2.4 presents predicting gait models and the gait-selection module, which selects a single-gait controller in accordance with the current motion. As a fundamental basis for the analyses, we describe in section 2.5 how the segmentation of continuous recordings into steps has been performed. Finally, the experiments underlying this manuscript are presented in section 2.6.

## 2.1. Hardware

The hardware platform used during controller development and for tests with healthy walkers is based on the Otto Bock C-Leg® hydraulic damper attached to the knee joint of a knee-ankle-foot orthosis. The damper allows the design of a semi-active orthosis, as it actively manipulates the damping of knee flexion with a motor-controlled valve. The interface allows the valve to be positioned in 100 configurations, from effectively free motion (open valve) through high damping to blocked motion (closed valve).

While the overall design as a leg splint with the knee damper system restricts the target group, we aim to have the controller as universally applicable as possible. Therefore, we implemented and tested with two hardware models (**Figure 2**). One hardware model has a compliant ankle joint (**Figures 2A,B**). It uses a carbon fiber bar of high stiffness beneath the knee, which is directly attached to the foot, thus fixing the ankle. The other one has an orthopedic ankle joint (**Figures 2C,D**). The orthopedic ankle joint allows either free motion or a constrained range of motion, achieved by blocking it or inserting a spring (comparable to the Otto Bock double action joints). The hardware structure of both models is similar to Otto Bock's C-Brace® system, which is itself equipped with a C-Leg® hydraulic damper. As each device was fitted for a different healthy user, the data would not have been directly comparable. For this reason, we only include data from the device with the orthopedic ankle joint in this study, while pointing out that the controller itself is independent of the actual hardware structure.

FIGURE 2 | … different hardware layouts shows how versatile the presented controller is.

A data acquisition interface allows sampling of the embedded sensors at 100 Hz. As sensors, we installed angle sensors at the thigh and the knee joint, and force sensing resistors (FSRs) in the soles between orthosis frame and shoe. The latter are very sensitive and show a binary switch characteristic due to their measuring range of ≈ 0.1–100.0 N. We therefore embedded them in a silicone layer to reduce noise from interactions between the orthosis frame and the shoe. We positioned and fixed all sensors on the device to keep the procedure of device application as simple as possible. Additionally, the sensor calibration that achieves full-range input signals for the artificial neural networks does not have to be recalculated when re-equipping the device.

## 2.2. Neural Control for Gait Tracking

The application of damping to knee flexion is a one-dimensional control problem, where the controller determines a valve position regulating the desired damping. We assume that the required damping can be determined from the gait phase, i.e., the configuration of the leg as represented by a suitable set of sensors. In practical situations, the controller has to cope with huge variances in space and time. Here, the controller is designed as a feed-forward controller to achieve an immediate response to the sensory inputs $\vec{s}(t)$ in two steps. The timing unit estimates the phase of the gait ϕ(t) from the sensory reading $\vec{s}(t)$, while the shaping unit determines the damping c(t) = c(ϕ(t)) given the phase (**Figure 1**). Separating these two steps allows the gait tracking and the desired controller behavior to be modified independently.

For the time discrete control system we write at time step t:

$$\text{timing unit}: \vec{s}^t \mapsto \varphi^t, \quad \varphi^t \in [0, 1) \tag{1}$$

$$\text{shaping unit}: \varphi^t \mapsto c^t(\varphi^t), \quad \text{with } c(0) \equiv c(1)\,. \tag{2}$$

We chose to implement c with a radial-basis-function network as universal function approximator, as detailed in section 2.3.

The gait progress ϕ is modeled as a cyclic, angular variable (**Figure 3**), thus capturing the periodicity feature of walking.

The timing unit is implemented using a multi-layer perceptron network with sigmoidal activation function (Nissen, 2003) with four neurons in one hidden layer and two output neurons, representing ϕ as circular motion in the plane. Thus, the output function is similar to the periodic sensory inputs, which improves learning and accuracy.

$$\hat{\varphi}^t: \vec{s}^t \mapsto \begin{pmatrix} x^t_{\varphi} \\ y^t_{\varphi} \end{pmatrix} = \begin{pmatrix} \cos \left( 2\pi \varphi^t \right) \\ \sin \left( 2\pi \varphi^t \right) \end{pmatrix}.$$

In case of noisy sensors, a low-pass filter can be applied on top of the output function $\hat{\varphi}$. With reliable sensors, this step is typically not needed, but as it only introduces a small delay, it can be applied in any case.

The gait phase can then be obtained using the transformation

$$\varphi^t = \begin{cases} \frac{1}{4} & \text{for } x^t_{\varphi} = 0 \land y^t_{\varphi} \ge 0\\ \frac{1}{2\pi} \tan^{-1}\left(y^t_{\varphi}/x^t_{\varphi}\right) & \text{for } x^t_{\varphi} \ne 0\\ \frac{3}{4} & \text{for } x^t_{\varphi} = 0 \land y^t_{\varphi} < 0 \end{cases}, \quad \varphi \in [0, 1).$$
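The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `phase_from_output` is hypothetical, and the sketch uses the two-argument arctangent (`math.atan2`) followed by a wrap into [0, 1), which is an assumption made here so that all four quadrants and the case x = 0 are covered by a single expression.

```python
import math


def phase_from_output(x: float, y: float) -> float:
    """Map a network output (x, y) ~ (cos 2*pi*phi, sin 2*pi*phi) to phi in [0, 1)."""
    # atan2 resolves the quadrant and handles x == 0 without a case split.
    angle = math.atan2(y, x)                  # angle in (-pi, pi]
    phi = (angle / (2.0 * math.pi)) % 1.0     # wrap negative angles into [0, 1)
    # Guard against floating-point rounding of tiny negative angles up to 1.0.
    return 0.0 if phi >= 1.0 else phi
```

Because only the direction of (x, y) enters the result, the output does not need to lie exactly on the unit circle; a slightly shrunken or stretched network output yields the same phase.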

To facilitate network training, sensor calibration is used to map all values to the range [−1, 1]. The thigh angle is mapped from vertical (0) to 90° flexion (1). The knee angle is mapped from straight (−1) to 90° flexion (1). For the force sensors, thresholds are chosen such that ground contact maps to ≤ 0 and a free foot to 1.

Training data is then segmented into steps (section 2.5) using the ground contact signal. The step duration $l_j$ is determined as the number of samples in the jth step. Then, the desired gait phase $\varphi_i$ and network output $\vec{o}_i$ for each sample i are given as

$$
\varphi\_i = \frac{i}{l\_j},
\tag{3}
$$

$$
\vec{o}\_i = \begin{pmatrix} o\_1 \\ o\_2 \end{pmatrix} = \begin{pmatrix} -\sin(2\pi\varphi\_i) \\ -\cos(2\pi\varphi\_i) \end{pmatrix}. \tag{4}
$$
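As an illustration of Equations (3) and (4), the following sketch generates the desired phase and the two-dimensional target for every sample of a segmented recording. The function name `training_targets` is hypothetical, and the sketch assumes that the sample index i runs from 0 to l_j − 1 within each step, so that the phase stays in [0, 1).

```python
import math


def training_targets(step_lengths):
    """Return a list of (phi, (o1, o2)) pairs, one per sample, following Eqs. (3) and (4)."""
    samples = []
    for l_j in step_lengths:            # l_j: number of samples in step j
        for i in range(l_j):            # assumption: i = 0 .. l_j - 1
            phi = i / l_j               # Eq. (3): linear phase within the step
            target = (-math.sin(2.0 * math.pi * phi),   # Eq. (4), first output
                      -math.cos(2.0 * math.pi * phi))   # Eq. (4), second output
            samples.append((phi, target))
    return samples
```

Steps of different duration thus receive the same phase range, which is what makes the learned phase independent of walking speed.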

We chose the sensors with the aim to capture motion in terms of sensor dynamics, instead of relying on defined events. The timing module frees the sensory inputs from time dependencies and thus provides a device, gait, and patient independent description of gait progress. The second part, the shaping module, generates the control output. It is augmented by a user interface for direct user-feedback.

## 2.3. User Defined Output Modulation

The damping function is tailored to the need of the individual user by incorporating user feedback in the shaping of the damping function c(ϕ).

The user interface (**Figure 4**) provides the samples to fit the Radial-Basis-Function network, which provides universal function approximation (Park and Sandberg, 1991; Buhmann, 2003). The sliders represent the applied damping c(ϕ) by values on a grid of supporting points, with the lowest position corresponding to no damping and the highest position to maximum damping. The Radial-Basis-Function network is updated immediately, so that users can immediately experience the effect of their changes. The width of the Gaussian kernels determines the amount of smoothing of the user input applied during network training.

We used a network of n = 10 equidistant nodes in the interval [0, 1). The Gaussian transfer function had a half-width of σ = q n 2 .
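To make the mapping from slider positions to a smooth, periodic damping function concrete, here is a simplified sketch. It is not the trained network described above: it uses a normalized Gaussian RBF expansion whose weights are set directly to the slider values, it measures distances on the circle so that c(0) = c(1), and the kernel width `SIGMA` is an assumed value chosen for illustration only. All names (`damping`, `circular_distance`, `CENTERS`, `SIGMA`) are hypothetical.

```python
import math

N_NODES = 10                                        # n = 10 equidistant nodes in [0, 1)
CENTERS = [k / N_NODES for k in range(N_NODES)]
SIGMA = 0.1                                         # assumed kernel width, not taken from the paper


def circular_distance(a: float, b: float) -> float:
    """Distance between two phases on the unit circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def damping(phi: float, sliders, sigma: float = SIGMA) -> float:
    """Smooth damping c(phi) as a normalized Gaussian-weighted mix of slider values."""
    weights = [math.exp(-circular_distance(phi, c) ** 2 / (2.0 * sigma ** 2))
               for c in CENTERS]
    total = sum(weights)                            # > 0: a center is always within 0.05
    return sum(w * s for w, s in zip(weights, sliders)) / total
```

A larger `sigma` blends neighboring sliders more strongly, which mirrors the smoothing role of the kernel width described in the text; a fitted (non-normalized) network would additionally solve for weights that reproduce the slider values at the nodes.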

## 2.4. Gait Recognition

The neural orthosis controller described in section 2.2 is gait specific: its timing and shaping units were designed to estimate the phase of a specific gait and generate the damping appropriate for that gait. To support different gaits, we propose to train different controllers for different gaits, and to activate the proper controller based on model-based gait recognition. As long as the gaits can be differentiated with the available sensory information (**Figure 1**), the number of supported gaits is not directly limited. The controller can be extended by providing single-gait controller modules together with internal models to identify the corresponding gait (Braun et al., 2014).

Gait recognition is based on the prediction of sensory input: each gait is associated with a predictor P that predicts the sensory input $\vec{s}^{t+1}$ for the next time step based on a subset of the sensory history $H_N = \{\vec{s}^{t}, \vec{s}^{t-1}, \ldots, \vec{s}^{t-N+1}\}$, where N is the history length. A comparison of the prediction of the next time step's sensory reading $\vec{p}^{t+1}$ to the actual sensory reading $\vec{s}^{t+1}$ in the next time step defines a prediction error. The prediction error allows the decision unit to determine the best fitting gait model by choosing the model with the smallest error within predefined acceptable bounds of error.

To estimate reasonable history lengths N, we assume a minimal step duration $T_{step}$ of

$$T\_{step} \gtrsim 1 \text{ s},$$

for complete steps with the orthosis. Further, we assume a stance to swing duration ratio of ≈ 60 : 40 and that gait changes can occur at any time<sup>1</sup>. When the gait changes, the history contains two gaits and will naturally lead to diverging predictions of models trained on a single gait. While this prediction error is critical to achieve a fast invalidation of the old gait, we want the history to contain only one gait swiftly afterwards. To estimate a reasonable scale of the new gait's duration in the transition step, we take a fraction of 50 % of the swing phase, the shorter of the two gait phases. This translates to around 20 % of the step length. Given the hardware-specific sampling frequency of 100 Hz, we bound the maximum number of samples in the history $N_{\text{History}}$ by

$$N\_{\text{History}} \lessapprox 20 \text{ samples} = \frac{1}{5} \text{ s.}$$

When the history length is chosen larger, a gait change could stay longer in the history than the above requested 20 % of the step length.
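The bound follows directly from the stated assumptions. Written out, with $f_s$ as a symbol introduced here for the sampling frequency (it does not appear in the original text):

$$N_{\text{History}} \approx \underbrace{0.5}_{\text{half the swing phase}} \cdot \underbrace{0.4}_{\text{swing fraction}} \cdot T_{step} \cdot f_s = 0.2 \cdot 1\,\text{s} \cdot 100\,\text{Hz} = 20 \text{ samples}.$$

For longer steps the same 20 samples cover an even smaller fraction of the step, so the minimal step duration is the limiting case.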

The history length is a trade-off between the prediction's accuracy and the supported frequency of gait switches. The prediction accuracy should increase with a longer history, which can cover more details, leading to better discrimination. In contrast, the frequency of supported gait switches will decrease, because data from different gaits in the history will lead to lower quality predictions while conflicting gaits stay in the history. The choice of $T_H = \frac{1}{5}$ s allows several gait switches per step with quite accurate results, as shown below.

### 2.4.1. Predicting Gait Models

The predicting gait models were implemented like the timing unit of the feed-forward gait controller above. Based on a history of sensor data, a single channel was predicted by the predicting model. To this end, a multi-layer perceptron network with 3 neurons in the hidden layer and one output neuron was trained on recorded gait samples to predict the sensory inputs using a history. The history was implemented as a delay line (**Figure 5**).

First experiments showed that the backpropagation learning algorithm tended to exploit the last sensor reading as a good prediction of the next time step. Ignoring most of the history, this was equivalent to the approximation with a constant value, predicting the next time step's state almost only on the preceding one as the error of the approximation was in many cases of the order of magnitude of the signal change, considering step to step fluctuations and sensor noise. This prediction on only one time step was independent of the trained mode. Thus, this approach had no predictive power which related to the actual gait's dynamics.

As a consequence, only a subset of the history is actually used for prediction. In particular, for the predicted channel, the current sensory reading is omitted and only older values are used. Effectively, we have coarsened the history to a grid with a width of $10\,\Delta t$ (**Figure 5**), to exclude simple models. Besides solving the problem of forwarding the last reading, this sparse selection greatly reduces the computational complexity of the models.

Therefore, to predict channel i, $p_i^{t+1}$, we use the sensory readings of all other channels for t, t − 9, and t − 19, but only the readings for t − 9 and t − 19 of the predicted channel (**Figure 5**).
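The sparse selection from the delay line can be sketched as follows. The names `predictor_input` and `HISTORY_LAGS` are hypothetical, and the ordering of the resulting feature vector (newest lag first, channels in sensor order) is an assumption for illustration; the text only specifies which readings are used.

```python
HISTORY_LAGS = (0, 9, 19)   # readings at t, t - 9, and t - 19


def predictor_input(history, predicted_channel):
    """Build the input vector for the predictor of one channel.

    history: list of sensor vectors, oldest first; history[-1] is the reading at time t.
    predicted_channel: index of the channel whose next value is to be predicted.
    """
    if len(history) <= max(HISTORY_LAGS):
        raise ValueError("history must hold at least 20 samples")
    features = []
    for lag in HISTORY_LAGS:
        sample = history[-1 - lag]
        for channel, value in enumerate(sample):
            # Omit the current reading of the predicted channel so the model
            # cannot simply forward the last value as its prediction.
            if channel == predicted_channel and lag == 0:
                continue
            features.append(value)
    return features
```

With three sensor channels, for example, the predictor receives 8 inputs instead of the 60 values held in a full 20-sample history.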

### 2.4.2. Prediction-Based Gait Selection

The part of the controller that selects the current gait based on the models' predictions will be called the decision unit, in accordance with previous naming conventions. The selection process chooses the gait model which minimizes the prediction error $\vec{e}^{\,t}_j = \left| \vec{p}^{\,t}_j - \vec{s}^{\,t}_2 \right|$ for the gait j with the sensory inputs $\vec{s}^{\,t}_2$. The subscript 2 indicates that the ground contact signal has not been used for the calculation of the prediction error. The absolute value was chosen to increase sensitivity to the amplitude of the prediction error, preventing the low pass filter (below) from averaging out fluctuations.

<sup>1</sup> If we consider, for example, **Figures 13, 14**, they show a transition from descending stairs to flat walking. We see that the heel strike is followed by a period in which the dynamics seem to mostly follow the stair regime. This is followed by a period where differences to both gaits occur. Here, it is difficult to determine the gait until after the maximum knee flexion, when the gait dynamics converge to those of the flat walking regime. As the flat walking gait is only reached during the extension of the knee, we assume that the new gait is reached during the swing phase. Still, the transition changes the whole step's dynamics.

These prediction errors often occur in relatively short intervals of the step. Thus, we apply post-processing in the form of a low pass filter to prolong the time span over which the final fitness measure is usable.

$$
\tilde{e}\_{i,j}^t = (1 - \beta)\tilde{e}\_{i,j}^{t-1} + \beta e\_{i,j}^t,\\
\beta = 0.9, \quad i \in \{\text{knee,hip}\}.
$$

Model- (j) and channel- (i) specific thresholds $\theta_{i,j}$ suppress prediction noise in the low pass filtered errors $\tilde{e}^t_{i,j}$. These thresholds are chosen for each gait j individually based on the error signal $\tilde{e}^t_{i,j}$ for matching gait samples. Remaining prediction errors are counted if they are greater than this threshold $\theta_{i,j}$ (**Figure 6**) for each predicted channel $i \in \{\text{knee}, \text{hip}\}$.

$$f_i^t = \alpha \cdot \begin{cases} f_i^{t-1}, & \text{if } \tilde{e}_i^t < \theta_i\\ \min\left(f_i^{t-1} + 1, 2\right), & \text{else} \end{cases}, \quad \alpha \in \{\mathbb{R} \mid 0 < \alpha < 1\}.$$

This count $f_i^t$ is limited to the range [0, 2] and decays with factor α = 0.99. The factor α and the maximum value 2 are chosen such that the value is significant on timescales of steps.

These $f_i^t$ measure the unfitness of the model's predictions per channel and are merged with a gait specific weight $\gamma_j$ to reflect the importance of the individual channels,

$$f^t\_j = \gamma\_j f^t\_{thigh} + \left(1 - \gamma\_j\right) f^t\_{knee}.$$

Finally, all gaits with $f_j^t > 1.1$ are discarded and the gait with the lowest $f_j^t$, i.e., smallest unfitness, is selected from the remaining gaits. Its feed-forward controller operates the current time step.
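The selection pipeline described in this section (low pass filtering, thresholded counting with decay, channel weighting, and rejection) can be sketched as below. All function names are hypothetical. The sketch caps the count with `min`, an assumption chosen to match the stated range [0, 2], and returns `None` when every gait is rejected, which stands in for the "unknown gait" case.

```python
ALPHA = 0.99        # decay factor of the error count
BETA = 0.9          # low pass filter coefficient
COUNT_MAX = 2.0     # upper limit of the error count
REJECT_ABOVE = 1.1  # gaits with larger unfitness are discarded


def low_pass(previous: float, error: float, beta: float = BETA) -> float:
    """One step of the exponential low pass filter on a channel's prediction error."""
    return (1.0 - beta) * previous + beta * error


def update_count(previous_count: float, filtered_error: float, threshold: float,
                 alpha: float = ALPHA) -> float:
    """Decay the count; increment it (capped) when the filtered error reaches the threshold."""
    if filtered_error < threshold:
        return alpha * previous_count
    return alpha * min(previous_count + 1.0, COUNT_MAX)


def unfitness(f_thigh: float, f_knee: float, gamma: float) -> float:
    """Merge the per-channel counts with the gait-specific weight gamma."""
    return gamma * f_thigh + (1.0 - gamma) * f_knee


def select_gait(unfitness_by_gait, reject_above: float = REJECT_ABOVE):
    """Return the gait with the smallest acceptable unfitness, or None if all are rejected."""
    candidates = {g: f for g, f in unfitness_by_gait.items() if f <= reject_above}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

For example, `select_gait({"flat": 0.2, "stairs_up": 1.5, "stairs_down": 0.7})` discards `"stairs_up"` and picks `"flat"`; the gait labels are illustrative.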

### 2.4.3. Training of Predicting Models and Selection

Training of predicting models is analogous to the feed-forward controller's timing unit and can use the same recordings. The recordings should reflect the variance in the user's gait and should not be too regular. Then, the perceptron is trained using a backpropagation algorithm.

To improve the performance of the internal models, each model scales the sensory inputs such that typical signals lie in the range (−1, 1). Of course, in addition to optimal working conditions, such a scaling will help to differentiate huge changes in amplitude which might be connected to different gaits. In a converse argument, it supports bad predictions for gaits with too low or too high amplitudes in comparison to a model's training data set.

## 2.5. Step Segmentation

Segmentation of gait data by step boundaries is needed to create training data as well as for the analysis of the tracking unit's and gait recognition performance. As is typical in the literature, the heel-strike marks the beginning and end of a step (**Figure 3**), which we determine from the edges of the pressure onset at the heel FSR. Due to the high sensitivity of the sensors and the interaction with the orthosis frame and foot, only onsets can be detected, and we have to apply filters to compensate for varying amplitudes and fluctuations.

The sensory data is assumed to be in the range [−1, 1] with 1 no pressure and −1 high pressure. To improve robustness, we use a hysteresis to detect state changes, changing to ground contact when the sensor goes below 0, and to free heel when > 0.8. Heel-strike detection is implemented with a finite state machine as:

	- a. If ground contact and the current sample is above threshold for free heel, then change the state to free heel.
	- b. If free heel and the current sample is below threshold for ground contact, then change the state to ground contact.

Each transition to ground contact (b) marks a heel-strike. The list of these events describes the heel-strikes in the given recording.
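The two-state machine with hysteresis can be written compactly. In this sketch the function name `detect_heel_strikes` is hypothetical, and the initial state (ground contact) is an assumption, since the text does not specify how the state machine is initialized.

```python
GROUND_THRESHOLD = 0.0   # below this value: ground contact
FREE_THRESHOLD = 0.8     # above this value: free heel


def detect_heel_strikes(heel_signal, start_on_ground=True):
    """Return the sample indices at which the state changes from free heel to ground contact.

    heel_signal: calibrated heel FSR samples in [-1, 1], with 1 = no pressure, -1 = high pressure.
    """
    on_ground = start_on_ground          # assumed initial state
    strikes = []
    for index, value in enumerate(heel_signal):
        if on_ground and value > FREE_THRESHOLD:
            on_ground = False            # rule (a): ground contact -> free heel
        elif not on_ground and value < GROUND_THRESHOLD:
            on_ground = True             # rule (b): free heel -> ground contact
            strikes.append(index)        # this transition is the heel-strike
    return strikes
```

Values between the two thresholds never change the state, which is what suppresses repeated detections from a fluctuating sensor signal.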

## 2.6. Experiments

### 2.6.1. Gait Phase Tracking

For single gait support, the following statements hold. (i) We make no assumptions about when and what kind of support the user needs. (ii) The damping function c is smooth (due to the representation as an RBF function). And (iii) the applied damping at knee flexion is a direct function of the gait phase and thus of the sensory input, $c^t = c^t(\varphi^t(\vec{s}^t))$. Thus, the applied damping c only changes when the gait phase ϕ changes and, in consequence, the controller's ability to apply the desired damping at any gait phase solely depends on the properties of the gait phase ϕ. Thus, the quality of the gait phase ϕ determines the quality of the control output.

Ideally, the gait phase ϕ produces a linear mapping for constant motion velocity, as it guarantees the same detail of control for all phases of gait, i.e., control accuracy does not depend on the gait phase. Thus, we investigate the linearity of the gait progress representation and the timing of the heel strike after training. We compare to the ideal gait phase ϕ′ according to section 2.5, which can only be derived after the heel-strike and therefore has to be computed offline. To evaluate steps of different duration, we resample and interpolate each step to 200 samples, leading to an ideal slope of $\Delta\varphi' = \frac{1}{200}$. The deviation of the controller's gait phase from this ideal will be investigated in terms of linearity, monotonicity, and smoothness.
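A possible offline evaluation of one step is sketched below: the recorded phase is resampled to a fixed number of samples by linear interpolation and compared with the ideal phase via the coefficient of determination of a linear regression. The function names are hypothetical, and the sketch assumes the step does not contain a phase wrap-around (interpolating across a jump from just below 1 to 0 would distort the result).

```python
def resample(values, n_out=200):
    """Linearly interpolate a sequence (at least 2 samples) to n_out equally spaced samples."""
    n_in = len(values)
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)     # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)
```

A tracked phase that progresses linearly within the step gives a value close to 1 regardless of the original step duration, whereas phase shifts and plateaus lower it.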

Furthermore, the timing of the gait phase has to match the timing of the step to provide a useful representation for users, such that the tuning of the control output with the user interface (**Figure 4**) can be done intuitively.

### 2.6.2. Gait Selection

Because the controller provides body support at the knee level, it assists in the stance phase, while in the swing phase, all single-gait controllers provide free knee swinging. We therefore argue that the most important aspect for secure and seamless operation is timely gait switching to prepare for the heel-strike. Thus, to evaluate the accuracy of gait recognition, we check that the controller not only classifies the step correctly but also achieves a correct result prior to heel strike. To quantify correctness and timing, we analyze a walking sequence where a healthy user annotates the intended gait, for example flat walking, stair climbing, and descending stairs. The inclusion of descending stairs requires that we deactivate the damping unit for safety reasons.

Then, we analyze step by step and measure the time ahead of the heel-strike that the decision unit recognized the step's final gait. **Figure 7** shows how the user's annotations are compared to the controllers' classification: For each step, we compare the controllers' classification against the last valid user annotation. If both match, we measure the duration the correct classification was known and set this duration in relation to the duration of the swing phase to allow the comparison independent of the actual step length. We call this fraction the range of certainty. A range of certainty of zero means that the correct gait was not known prior to heel-strike. For a range of certainty of one, the controller was certain of the used gait for the whole swing phase.
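The range of certainty for a single step could be computed as in the following sketch. The function name is hypothetical, and the sketch interprets "the duration the correct classification was known" as the uninterrupted run of correct classifications immediately preceding the heel-strike, which is an assumption.

```python
def range_of_certainty(swing_classifications, annotated_gait):
    """Fraction of the swing phase, counted back from heel-strike, with the correct gait selected.

    swing_classifications: per-sample gait labels from toe-off (first) to heel-strike (last).
    annotated_gait: the user's annotation for this step.
    """
    if not swing_classifications:
        raise ValueError("swing phase must contain at least one sample")
    certain = 0
    for label in reversed(swing_classifications):
        if label != annotated_gait:
            break                        # certainty ends at the last wrong classification
        certain += 1
    return certain / len(swing_classifications)
```

A value of 0 means the correct gait was not selected at heel-strike, and a value of 1 means it was selected throughout the swing phase, matching the two limiting cases described above.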

Thus, the range of certainty allows us to assess whether the controller is able to apply the correct gait model during swing phase, where all single-gait models will provide free knee motion. We then analyze the average success rate and range of certainty for all tested gaits, to determine if the presented controller in combination with the sampling frequency of the data acquisition board can react to gait changes. Then, we quantify the controller's ability to differentiate the tested gaits against each other with the selected set of sensors. We conclude with the investigation of gait changes for steps showing conflicts between the user's annotation and the controller's classification.

To assess the orthosis controller's accuracy, we take a reaction time into account. At a sampling rate of 100 Hz and step lengths in the experiment between 1.3 and 1.8 s, a range of certainty of 3 % guarantees that the orthosis controller's classification is in time for heel-strike.
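A rough plausibility check of the 3 % figure, under the assumption (taken from section 2.4, not stated at this point in the text) that the swing phase makes up about 40 % of the step: for the shortest step of 1.3 s,

$$0.03 \cdot 0.4 \cdot 1.3\,\text{s} \approx 15.6\,\text{ms} \;>\; 10\,\text{ms} = \frac{1}{100\,\text{Hz}},$$

so a range of certainty of 3 % spans at least one full sampling interval before heel-strike, and more for longer steps.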

# 3. RESULTS

## 3.1. Gait Phase Tracking

The experiments conducted aim to show that a trained gait model is able to track gait progress better than a model trained for other gaits. Control quality depends on the smoothness and monotonicity of the tracked gait phase ϕ, which we quantify in terms of linearity and the distribution of increments, i.e., discrete slopes. Furthermore, the accuracy of the tracked heel-strike is used as a measure for correct timing.

In a first run, the single gait controllers were trained on runs with 49 steps on even ground and 59 steps on stairs. In a second run, we recorded the gait phase ϕ of these two controllers for later comparison to the ideal gait phase ϕ′. We analyzed 30 steps on even ground and 38 steps climbing stairs of a healthy subject wearing the orthosis. Steps at gait changes were manually removed, due to issues discussed later.

FIGURE 8 | … for walking on flat ground and stair climbing on their native and the opposite terrain. The native models (A: $r^2$ = 0.88, D: $r^2$ = 0.72) produce smoother gait phase output in comparison to the unfitting models (B: $r^2$ = 0.33, C: $r^2$ = 0.06). The latter exhibit phase shifts and strong deviations from the ideal gait phase ϕ′ indicated by the dashed line. The coefficient of determination ($r^2$) supports the notion that the native models generally follow the ideal linear relation. The lower number of steps in (D) increases the influence of the step segmentation, degrading $r^2$.

In **Figure 8**, the controller-derived gait phases $\varphi_{\text{Flat}}$ and $\varphi_{\text{Stair}}$ are plotted against the ideal, offline computed gait phase ϕ′. In the case where the model matches the user's gait (**Figures 8A,D**), the ideal gait phase is approximated well. In the mixed cases (**Figures 8B,C**), where the model does not match the gait, the controller's heel strike has a phase shift against the real event. In addition, the flat ground model on stairs (**Figure 8B**) shows 4 steps with almost constant intervals between the steps. The stair climbing model on flat ground (**Figure 8C**) fails to reproduce the gait phase completely; it only oscillates between 0.2 and 0.8. The $r^2$ values in **Table 1** support that the native model is close to linear and significantly better than the non-native model.

The timing of the heel-strike is accurate only for the trained gait, as the phase shift in **Figure 8** and **Table 1** shows. Considering the sampling frequency of 100 Hz and an average step duration of 150–200 samples, the gait phase should progress by $\frac{360°}{200}$–$\frac{360°}{150}$ = 1.8°–2.4° per sample. This value matches the average precision shown in **Table 1**, which is determined by averaging the phase shift indicated in **Figure 8**.

The distribution of increments Δϕ in **Figure 9** supports these observations. To assess the variation of increments around $\Delta\varphi' = \frac{1}{200}$, we considered the interval $\left[\frac{1}{2}\Delta\varphi', 2\Delta\varphi'\right]$. Using this interval, we allow a variation of up to a factor of two in each direction, but do not count extreme or negative increments, as the standard deviation would. For level walking (**Figure 9A**), the fitting model has 69 % of increments in this interval, while the model for stair climbing only has a fraction of 31 % inside this interval. In the case of steps on stairs (**Figure 9B**), 65 % of the increments are inside for the fitting model and only 40 % for the flat walking model. The histogram for the native models (in red) has its maximum around the optimal slope with lower standard deviation (**Table 1**). For the mixed cases (in blue), the optimal slope is not in the center of the distributions, which instead have a maximum around zero and larger standard deviations. Furthermore, we see the presence of significant negative changes for the stair climbing model on flat ground and an increase in larger values in the case of the flat ground model on stairs, i.e., less monotonicity and smoothness.

TABLE 1 | Linearity of the graphs in Figure 8 according to the $r^2$ values for a linear regression. Timing accuracy of heel strike based on the heel-strikes' phase shifts as shown in Figure 8. The standard deviation and skewness relate to Figure 9. Values for flat ground are based on 30 steps; values for stair climbing are based on 38 steps for the flat model and 31 steps for the stair climbing model.

FIGURE 9 | … (B) Increments while stair climbing.

The ability to apply a damping pattern to steps of varying length is shown in **Figure 10**. As the abstract gait progress ϕ removes any time dependency from the input, changes in step duration and length are transparently handled. The red bars in **Figure 10** indicate unit lengths: the steps to the right are twice as fast as the ones to the left.

## 3.2. Gait Selection

In this section, we test the hypothesis that a set of feed-forward single-gait controllers can be combined into a multi-gait controller that enables the correct feed-forward controller to support a wide range of motions. Therefore, to evaluate the accuracy of gait recognition, we have to show that the gait recognition provides a correct result and that this result is available in time for the controller to react to gait changes.

The experiments include walking on flat ground, stair climbing, and descending stairs performed by a healthy subject.


Prior to use, the gait models were trained with 146 steps for walking on flat ground, 35 steps for stair climbing, and 32 steps for descending stairs. The difference in training set sizes is due to every stair run including steps on flat ground and to the gaits on stairs being comparatively exhausting. Three independent recordings with 215 steps were used in the evaluation. These include gait transitions between 81 steps on flat ground, 64 steps mixing flat ground and stair climbing, and 70 steps mixing flat ground and descending stairs. As the staircase used in the experiment comprises sequences of 5 stairs, each of the mixed runs includes the high number of 36 transition steps.

The development of gait certainty over the swing phase (**Figure 11**) shows that the gait for 83 % of steps was known at toe-off. The fraction of correct classifications then increases to above 94 % at heel-strike. This high accuracy reflects the fact that most steps stem from step sequences with the same gait. Furthermore, it indicates that many gait changes occur during swing phase.

The final classification accuracy, with ranges of certainty of at least 3 %, is plotted in **Figure 12** as a confusion matrix between the user annotations in the rows and the controllers' classification in the columns. Note the additional column unknown gait in the controllers' classifications, which counts cases where the prediction errors are unacceptably high for all gait models. In these cases, the application of a fallback controller allows safe operation, for example, knee locking on ground contact, although it is most likely less comfortable. In general, the confusion matrix shows high classification rates of 87 % for descending stairs, 95 % for walking on flat ground, and 100 % for stair climbing. Furthermore, we see a number of steps where the gait recognition was unable to differentiate between, or even mixed up, walking on flat ground and descending stairs. The wrongly classified steps are one transition step each for descending stairs and walking on flat ground.

The dynamics of knee and thigh angle (**Figures 13, 14**) show the transition step between descending stairs and walking on flat ground. It is easy to see that these steps are similar to neither gait in 2D, whether plotting the angles over time or the thigh angle against the knee angle. For the predicting models, which work on a higher dimensional history, the dissimilarity is even more drastic. As a consequence, prediction errors are high for all models for this kind of gait transition step. It is to be expected, though, that many of these transitions fall into swing-phase transitions, where highly varying dynamics are possible and the actual control output is not that important for a device that provides mechanical support.

# 4. DISCUSSION

The presented neural mechanisms set out as an adaptive orthosis controller, empowering users to control device behavior.

## 4.1. Gait Phase Representation

We implemented a neural single-gait controller to individualize gait support in terms of (1) the patient's gait dynamics with learning from observation and (2) direct user feedback with an interface for tuning, placing the patient in the loop. The gait phase abstracts gait dynamics and thereby removes dependencies on remaining abilities, except the ability to initiate motion. Furthermore, the gait phase removes the time domain from the sensory inputs. Thus, it transparently supports gaits of different speeds and step lengths (**Figure 10**) as well as standing; it provides immediate reactions to regular and critical events like stumbling. Variability in the training set enables use in varying environments such that a level walking controller supports even ground as well as slopes of several degrees (up to ±15° were tested but not presented here).

FIGURE 13 | Example for mismatch between user-label and gait detection at the transition from stair descent to flat walking. The transition step clearly deviates from earlier and following steps in that it shows mixed characteristics (Figure 14).

The presented user feedback is a minimal implementation, which allows an arbitrary damping function c to be defined in sufficient detail and adapted by the user at run-time. It lets users explore the controller behavior experimentally and thereby develop an intuition of how changes to c modify the controller's behavior. Furthermore, it simplifies the mapping from the gait phase to a valve position. Calibration and transformation are not necessary, as the user implicitly deals with these nonlinear operations. From the users' perspective, the user interface allows them to define the level of support required. More importantly, their opinion is directly included in the controller's behavior. This inclusion of the patient's opinion addresses one of the top reasons for device abandonment (see Phillips and Zhao, 1993, and references therein).
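The idea of a user-adjustable damping function c over the gait phase can be illustrated with a minimal sketch. It assumes, purely for illustration, a gait phase normalized to [0, 1), damping values clipped to [0, 1], evenly spaced support points, and linear interpolation between them; the class and method names are hypothetical and do not describe the actual implementation.

```python
class DampingCurve:
    """Damping function c(phase) given by values on evenly spaced support points."""

    def __init__(self, n_points=10, initial=0.0):
        # Support point i sits at phase i / n_points of the gait cycle (assumption).
        self.values = [float(initial)] * n_points

    def adjust(self, index, delta):
        """User feedback: raise or lower one support value, clipped to [0, 1]."""
        self.values[index] = min(1.0, max(0.0, self.values[index] + delta))

    def __call__(self, phase):
        """Evaluate c at any gait phase by periodic linear interpolation."""
        n = len(self.values)
        position = (phase % 1.0) * n      # fractional index into the grid
        left = int(position) % n
        right = (left + 1) % n            # wraps from the last point to the first
        fraction = position - int(position)
        return (1.0 - fraction) * self.values[left] + fraction * self.values[right]


curve = DampingCurve(n_points=4)
curve.adjust(1, 0.8)      # the user increases damping around phase 0.25
print(curve(0.125))       # halfway between support points 0 and 1
```

Because the curve is indexed by gait phase rather than time, the same support values apply to slow and fast steps alike.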

Quantitative measurements verify instant reaction to motion, and high accuracy in timing and tracking of the patient's gait. We validated experimentally that the timing unit determines the heel-strike with high accuracy, on the order of the sampling interval. Furthermore, testing under the assumption that the recorded steps were ideally and steadily progressing, the timing unit was shown to approximate a linear progression of gait phase for trained gaits. Our generic approach of function approximation as representation for the control output provides intuitive tuning of the control output.
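The comparison against an ideally and steadily progressing step can be sketched as follows. The sketch assumes a phase normalized to [0, 1) that restarts at each heel-strike, and heel-strikes given as sample indices; both function names are hypothetical. At the 100 Hz sampling rate used in this work, one sample corresponds to 10 ms.

```python
import math


def ideal_phase(heel_strikes, n_samples):
    """Linear 0..1 phase ramp between consecutive heel-strike sample indices."""
    phase = [None] * n_samples            # undefined outside complete steps
    for start, end in zip(heel_strikes[:-1], heel_strikes[1:]):
        for i in range(start, end):
            phase[i] = (i - start) / (end - start)
    return phase


def phase_rmse(estimated, reference):
    """RMS deviation on samples with a reference, respecting wrap-around at 1."""
    squared = []
    for est, ref in zip(estimated, reference):
        if ref is None:
            continue
        diff = (est - ref + 0.5) % 1.0 - 0.5   # shortest distance on the phase circle
        squared.append(diff * diff)
    return math.sqrt(sum(squared) / len(squared))


reference = ideal_phase([0, 4, 8], n_samples=10)
estimate = [0.0 if r is None else r + 0.1 for r in reference]
print(phase_rmse(estimate, reference))
```

The wrap-around handling matters near heel-strike, where an estimate of 0.95 against a reference of 0.0 is only 0.05 away, not 0.95.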

While the accuracy of gait phase tracking shows that gait models for quite different gaits, like flat ground walking and stair climbing, can be learned, it also makes clear that training leads to specialization of the feed-forward controller. To support movement in different environments, the controller has to be extended with controllers for multiple gaits in such a manner that free motion is possible.

### 4.2. Gait Selection

Specialization to one gait in the single-gait controller is overcome by a gait selection process based on predicting models. To support a gait, the controller therefore needs (1) a timing module with training samples, (2) the desired output shaping module, and (3) a predicting model which can be trained with the same samples as (1). This modular control approach overcomes design problems which typically restrict supported motion and the patient target group.

Based on the internal models' prediction errors, the gait selection swiftly chooses the single-gait controller with the best fitting dynamics. Eighty-four percent of the steps in our experiments were already correctly identified at heel-off, most of them as part of a sequence of the same gait. However, these ≈ 84 % of steps include at least 50 % of the 72 transition steps. The ≈ 13 % of steps that are identified between heel-off and heel-strike indicate that gait recognition has to run continuously. **Figures 13, 14** indicate that the gait dynamics are not bound to switch at any specific point, which shows the flexibility and precision of the presented approach. For example, the initial step after standing phases is typically handled by the stair climbing module, which supports only vertical lift-off.

A fall-back controller, based on the ground contact sensing FSR, enables safe operation in cases when gait dynamics are not matched by a model. The requirement for a fall-back controller is especially associated with transition steps, which often are singular events. The use of a history enables swift detection of changes in the motion. But at the same time, a gait change in the history will reduce the precision of predictions. Therefore, the history length not only determines the accuracy of gait prediction, but it also determines the frequency of changes that can be tracked.
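Selection by prediction error with a fall-back can be sketched in a few lines. This is an illustration under stated assumptions, not the controller itself: the predictors are toy stand-ins for the trained predicting models, the error is a plain Euclidean distance on a single sample, and the threshold `max_error` is a hypothetical parameter.

```python
import math


def select_gait(history, observed, models, max_error):
    """Return (selected name, per-model errors); 'fallback' if no model fits.

    history   -- list of recent sensor samples, each a list of channel values
    observed  -- the sensor sample that actually followed the history
    models    -- dict mapping gait name to a predictor: history -> next sample
    max_error -- hypothetical threshold above which no model counts as valid
    """
    errors = {
        name: math.dist(predict(history), observed)
        for name, predict in models.items()
    }
    best = min(errors, key=errors.get)
    if errors[best] > max_error:
        return "fallback", errors
    return best, errors


# Toy predictors standing in for trained models (assumptions for illustration).
def extrapolate(history):
    return [2 * last - prev for prev, last in zip(history[-2], history[-1])]


def hold(history):
    return list(history[-1])


models = {"gait_a": extrapolate, "gait_b": hold}
history = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
print(select_gait(history, [2.0, 2.0, 2.0], models, max_error=0.5)[0])
```

A longer history gives each predictor more context, but a gait change inside the history degrades every prediction for as long as it remains in the window, which is the trade-off described above.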

### 4.3. Advantages and Limitations

The greatest advantages of the presented approach lie in (1) its flexibility, as only the equipped sensors determine which gaits can be differentiated, (2) its implicit stumbling support, due to ground contact directly shifting the gait phase toward stance phase, (3) device independence, and (4) independence of remaining abilities, as long as circular motion can be initiated.

A difficulty in the evaluation of the presented approach lies in the handling of transition steps. As gait transitions can seemingly happen at any time, training data will not cover them in all possible variations. This singular nature of transition steps was not captured in the user annotations. Nonetheless, the results show that the controller is able to choose a gait with similar dynamics in many cases. In these cases, the user's annotation describes an intention, but does not necessarily provide the best match to the gait dynamics. In other words, the annotations are only valid for steps without transitions, for which excellent recognition rates were obtained even with a 100 Hz sampling rate and 3 channels. For transition steps, a broad selection of training data will make it possible to address many transitions. For all other cases, the fallback controller has to provide safe operation, i.e., guarantee support in stance phase, which can be achieved with the FSRs.

A general problem is the question of the number of supported movements. While three gaits were sufficient to control all motions during experiment sessions, it is still unclear how many gaits need to be supported for comfortable operation in everyday life. At the same time, support for more gaits might fill gaps in gait transitions as more independent motions are supported.

The presented control approach integrates the user into the tuning process and allows individual movements to be modeled directly. We believe that this approach improves the handling of gait deviations and device acceptance. Still, the presented experiments were conducted with a healthy subject. Thus, patient tests have to be undertaken to understand the interaction and consequences for real patients.

For patient tests, the user interface should be simplified. Instead of defining the damping function via a set of function values over a grid of support points, more suitable parameters should be chosen. A promising idea would be to focus on the start and end points of the support periods. Considering these together with the amplitude and the slope should provide an interface which is easy to understand, but even easier to handle.
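One possible reading of such a reduced parameter set is a trapezoidal profile. The following sketch is only an illustration: the normalized phase in [0, 1], the symmetric ramps, and the function name `support_profile` are assumptions, and a support period that wraps around the end of the gait cycle is not handled.

```python
def support_profile(phase, start, end, amplitude, ramp):
    """Trapezoidal damping profile over normalized gait phase in [0, 1].

    start, end -- phase at which the support period begins and ends
    amplitude  -- plateau value of the damping output
    ramp       -- phase span of the rising and falling edges (sets the slope);
                  must be positive
    """
    rising = (phase - start) / ramp     # 0 at start, 1 once the ramp is complete
    falling = (end - phase) / ramp      # 1 before the ramp down, 0 at end
    return amplitude * min(1.0, max(0.0, min(rising, falling)))


# Support between 20 % and 60 % of the gait cycle, plateau 0.8, ramps of 10 %.
for phase in (0.0, 0.25, 0.4, 0.55, 0.7):
    print(phase, support_profile(phase, start=0.2, end=0.6, amplitude=0.8, ramp=0.1))
```

Four numbers replace a full grid of support values, at the cost of restricting the damping function to a single support period with straight edges.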

### 4.4. Gait Phase Tracking in the Literature

Li et al. (2014) aim for a similar result by gait phase tracking on the contralateral leg. Inference of the controller's internal gait phase is based on the assumption of a constant phase shift to the ipsilateral leg. Besides practical issues with the instrumentation of the contralateral leg, which directly impact comfort of use and visual appearance, it is important to note that a constant phase shift can only be assumed in non-critical situations. Especially when stumbling or external forces disturb this phase shift, the contralateral leg does not reflect the device's state. The presented approach always faithfully reflects the ipsilateral state, while confining device application to the ipsilateral leg.

The first prosthetic devices to reduce their wearers' energetic cost of walking were presented in Malcolm et al. (2013) and Mooney et al. (2014). The effect was highest when device activation was triggered at ≈ 43 % of the gait cycle. This result indicates that control based on gait progress is an interesting approach to pursue.

### 4.5. Multi-Gait Support in the Literature

Besides the prediction error presented here for invalidating gait models, many other approaches have been proposed (Meyer, 1997; Mazzaro et al., 2005; Ding, 2008; Varol et al., 2008, 2010). They are based on, for example, Gaussian mixture models or hidden Markov models. Unfortunately, all of these studies discuss different selections of gaits and gait variations. For this reason, an actual performance comparison is difficult and would most likely be possible for image sequence based approaches (Mazzaro et al., 2005; Meyer, 1997), which are unsuitable for prosthetic devices due to their outside view of the walker. Furthermore, this study worked with a healthy walker. Still, average success rates between 83 and 94 % are comparable to vision based model invalidation approaches (Meyer, 1997; Mazzaro et al., 2005).

Besides image sequence based approaches, the literature mostly covers active prostheses. Due to space and weight limitations, active prostheses are of higher practical relevance than active orthoses, and therefore more present in the literature. Here, we will not cover technical differences, but focus on the controller.

Lawson et al. (2013) present a prosthesis controller for stair ascent and descent. The FSM architecture prevents the easy inclusion of other gaits, and the missing support of level walking omits the region of high model overlap in this study. Sup et al. (2011) present a hierarchy of FSMs, where one outer FSM with a slope estimator selects from three slope-specific FSMs for 0, 5, and 10°, respectively. While the fixation on slopes is incompatible with the gaits of this study, the addition of parameter estimation would provide beneficial input to the presented controller. A history based Gaussian mixture model differentiates standing and walking in Varol et al. (2008), selecting gait-specific FSM controllers on the fly. That study is based on seven signals sampled at 1,000 Hz. An offline analysis was performed to reduce the dimensionality of the input for the Gaussian mixture models. In another step, the history length was increased until the method provided a 100 % success rate. History frames of 50, 100, 200, or 400 samples were tried, and finally a window size of 100 samples was selected with an overall delay of 430 ms. Later, this approach was extended to include sitting motion (Varol et al., 2010). The selection of sensors was described as task specific. In that study, the optimal delay was 500 ms.

All in all, the cross section of literature shows unique and often incomparable gait selections and approaches. A similar approach with instantaneous selection was used in Varol et al. (2008, 2010) to differentiate standing, walking, and sitting motion. In contrast, the dynamics based gait tracking in our approach renders the recognition of standing superfluous. This focus on the device configuration simplifies data processing and needs neither explicit models of the device or gait nor expensive preprocessing. The presented work is based on only 3 sensors sampled at 100 Hz. Further improvements can be implemented with estimators of environmental parameters, sensors which provide differentiating inputs, or higher sampling frequencies. Especially with higher sampling frequencies, extensive optimization of the history and controller parameters, like amplification gains and weights, could lead to significant improvements.

In comparison to biologically inspired modeling of modular motor learning and control, our control mechanism implements a partial function of the internal models for motor control proposed by Wolpert and Kawato (1998). The internal models are classified into three types: inverse internal model (the system calculates a motor command from a desired trajectory/state information), forward internal model (the system predicts sensory consequences from efference copies), and integrated internal model (the system integrates both inverse and forward models). In our case here, our shaping module acts as an inverse internal model that translates a user desired damping curve (i.e., desired trajectory) into a proper valve control command (i.e., motor command).

### 4.6. Outlook

Further optimization is possible with the many parameters in prediction and gait selection. Here, other machine learning techniques can also be employed (e.g., self-organizing learning of an adaptive resonance model; Grossberg, 1987). The application of additional sensors, like torque sensors in the joints or IMUs, can improve differentiation.

Patient tests can show if the desired aims can be reached with the presented approach in real-world scenarios. Therefore, they are very important for future research.

The most interesting aspect is that our approach provides the building blocks for a completely self-learning controller. We demonstrated generalization of gait patterns, and adaptation to changes in gait and in the environment as observed via gait changes. The user interface allows a user to adapt the support to individual needs. Still, at the stage presented here, the controller is not fully adaptive to a user in that it neither automatically updates (1) gait patterns or (2) damping output (lifetime of adjustments), nor learns new gaits on its own. Nonetheless, the modular structure allows these advanced aims to be pursued. Additionally, other procedures (e.g., reinforcement learning or imitation learning) can be employed for offline training, where the subject provides the reward (good or bad) according to a given profile, or for fitting to the pattern of damping in human walking (Nakanishi et al., 2004).

Observation based training can be implemented at runtime, constructing and improving gait models continuously. A simple approach is to continuously add new samples to the training set and update the weights of the multi-layer perceptron networks. The classification of recorded steps can be used to create new models when new observations contradict existing models. In this way, bootstrapping of the controller can consist of a mostly generic model for walking on flat ground and an appropriate fallback controller. Then, during everyday usage, the controller adapts to the patient and vice versa, while the patient can always influence the control output. Suggestions for automatic tuning could be generated and tested in accordance with the patient, based on gait quality assessment in the controller. In this way, patients would be empowered to fit their own orthosis, hopefully improving trust in and the general opinion of the device.
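The pattern of updating a predictor one observation at a time can be sketched as follows. For brevity, the sketch swaps in a linear predictor with a plain gradient step as a stand-in for the multi-layer perceptron networks; the class name, the learning rate, and the synthetic sine signal are assumptions for illustration only.

```python
import math


class OnlinePredictor:
    """Linear next-sample predictor updated one observation at a time.

    A deliberately simple stand-in for a multi-layer perceptron; only the
    incremental update pattern is the point here.
    """

    def __init__(self, n_inputs, learning_rate=0.05):
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate

    def predict(self, history):
        return sum(w * x for w, x in zip(self.weights, history))

    def update(self, history, observed):
        """One gradient step on the squared prediction error; returns |error|."""
        error = self.predict(history) - observed
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, history)
        ]
        return abs(error)


# Synthetic periodic "gait" signal; each update uses the two preceding samples.
signal = [math.sin(t) for t in range(1002)]
model = OnlinePredictor(n_inputs=2)
errors = [model.update(signal[t:t + 2], signal[t + 2]) for t in range(1000)]
print(sum(errors[:50]) / 50, sum(errors[-50:]) / 50)
```

In a multi-gait setting, a persistently high error of all existing predictors on newly recorded steps would be the cue to start a new model rather than to keep updating an existing one.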

#### ETHICS STATEMENT

The experiment was performed in accordance with the ethical standards laid down by the 1964 Declaration of Helsinki. We followed the relevant guidelines of the German Psychological Society according to which this experiment, given the conditions explained above, does not need explicit approval by an Ethics Committee (Document: 28.09.2004 DPG: Revision der auf die Forschung bezogenen ethischen Richtlinien).


### AUTHOR CONTRIBUTIONS

J-MB, FW, and PM designed the research. J-MB developed the neural mechanisms. J-MB and PM carried out the experiments. J-MB and PM analyzed data. All authors wrote and reviewed the manuscript.

## FUNDING

This research was supported by the BMBF-funded BFNT & BCCN Göttingen with grant numbers 01GQ0810 (project 3A) and 01GQ1005A (project D1), respectively.

### ACKNOWLEDGMENTS

We thank our collaborators from Otto Bock, who provided us with the prototype and helped us with many insightful discussions.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Braun, Wörgötter and Manoonpong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Biomechanical Reconstruction Using the Tacit Learning System: Intuitive Control of Prosthetic Hand Rotation

Shintaro Oyama<sup>1</sup> \*, Shingo Shimoda<sup>2</sup> , Fady S. K. Alnajjar <sup>2</sup> , Katsuyuki Iwatsuki <sup>1</sup> , Minoru Hoshiyama<sup>3</sup> , Hirotaka Tanaka<sup>4</sup> and Hitoshi Hirata<sup>1</sup>

<sup>1</sup> Department of Hand Surgery, Nagoya University School of Medicine, Nagoya, Japan, <sup>2</sup> Brain Science Institute-TOYOTA Collaboration Center, RIKEN, Nagoya, Japan, <sup>3</sup> Brain and Mind Research Center, Nagoya University School of Medicine, Nagoya, Japan, <sup>4</sup> Department of Rehabilitation, Chubu Rosai Hospital, Nagoya, Japan

Background: For mechanically reconstructing human biomechanical function, intuitive proportional control and robustness to unexpected situations are required. In particular, creating a functional hand prosthesis is a typical challenge in the reconstruction of lost biomechanical function. Nevertheless, currently available control algorithms are still in the development phase. The most advanced algorithms for controlling multifunctional prostheses are machine learning and pattern recognition of myoelectric signals. Despite the increase in computational speed, these methods still require the user's conscious attention and remain subject to classification errors. "Tacit Learning System" is a simple but novel adaptive control strategy that can self-adapt its posture to environment changes. We introduced the strategy into the prosthesis rotation control to reduce compensatory motion, and evaluated the system and its effects on the user.

Methods: We conducted a non-randomized study involving eight prosthesis users to perform a bar relocation task with/without Tacit Learning System support. Hand piece and body motions were recorded continuously with goniometers, videos, and a motion-capture system.

Findings: Reduction in the participants' upper extremity rotatory compensation motion was monitored during the relocation task in all participants. The estimated profile of total body energy consumption improved in five out of six participants.

Interpretation: Our system rapidly accomplished nearly natural motion without unexpected errors. The Tacit Learning System not only adapts human motions but also enhances the human ability to adapt to the system quickly, while the system amplifies compensation generated by the residual limb. The concept can be extended to various situations for reconstructing lost functions that can be compensated.

Keywords: myoelectric prosthesis, artificial intelligence, biomechanical function reconstruction, motor control, magnetoencephalography, interactive musculoskeletal modeling analysis, muscle, sensory synergy

#### Edited by:

Christian Tetzlaff, Max Planck Institute for Dynamics and Self-Organization, Germany

#### Reviewed by:

Jing Jin, East China University of Science and Technology, China Jan-Matthias Braun, University of Göttingen, Germany

> \*Correspondence: Shintaro Oyama oyama.shintaro@c.mbox.nagoya-u.ac.jp

Received: 16 August 2016 Accepted: 16 November 2016 Published: 29 November 2016

#### Citation:

Oyama S, Shimoda S, Alnajjar FSK, Iwatsuki K, Hoshiyama M, Tanaka H and Hirata H (2016) Biomechanical Reconstruction Using the Tacit Learning System: Intuitive Control of Prosthetic Hand Rotation. Front. Neurorobot. 10:19. doi: 10.3389/fnbot.2016.00019

# INTRODUCTION

When we lose a function of our body (e.g., reaching out, walking, trunk control, breathing, or watching), we not only lose functional output but also sensory feedback. Every biomechanical movement is a result of computations in the central nervous system (CNS) and, at the same time, consecutive sensory feedback prediction and modification of motor behavior goes on in the cerebellum, allowing us to accomplish natural motion and construct changes in response to the external environment (Brooks et al., 2015). Therefore, reconstruction of lost biomechanical function should not only include fine motor control but also dense sensory feedback that communicates with the CNS precisely, bi-directionally, and with high frequency. However, even the most advanced neuromotor reconstruction technology has not accomplished this communication, and lacks appropriate feedback for natural function. Furthermore, construction of a practical and ergonomic mechanical system that adapts to environmental changes within seconds is difficult due to the lack of flexibility in current artificial machine learning.

One typical challenge of reconstructing lost function is the functional hand prosthesis. These are widely used in reconstruction for forearm amputees and congenital forearm deficient individuals to restore their ability to reach and grasp. Among these, body-powered and myoelectric prostheses are widely used for motor control. In the past, body-powered prostheses were advantageous in cost, intuitiveness, and sensory feedback, but not in function. Thus, a great effort was required to accomplish more function and natural movement in myoelectric prostheses (Ciancio et al., 2016).

Developments in technology over the past few decades have improved control of multiple functions, with a primary focus on minimizing user burden and increasing prosthesis function. Nevertheless, increasing the number of myoelectric input channels resulted in non-physiological muscle activation that required exhaustive training (Schulz et al., 2005). Targeted muscle reinnervation (Kuiken et al., 2007) may be one solution, but is too invasive and less beneficial for transradial amputees, who represent the largest proportion of individuals with upper extremity deficiency (Hahne et al., 2012). The development of pattern recognition and machine learning techniques for electromyography (EMG) signals increased the number of degrees of freedom (DOFs) while keeping the number of utilized electrodes low. However, this technique has the critical limitation of low adaptability to environmental changes (Ciancio et al., 2016).

Meanwhile, a large number of studies have used the brain's plasticity to quickly adapt and reorganize cross-modal sensory integration for sensory feedback reconstruction. Since most of the work focuses on tactile feedback for adjusting grip force, it is still a challenge to reconstruct natural sensory feedback and mimic natural control. Recently, a few studies have reported increased sensory information density by neural implants (Ciancio et al., 2016); however, neurophysiological studies have indicated that position in space is estimated by integrating information from multiple sensory inputs rather than direct input. Moreover, as this integrated feedback is noise-robust, useful and cost-effective, adding appropriate sensory integration may result in better reconstruction (Alnajjar et al., 2015).

In our natural motion learning, we use two different modes, i.e., explicit and tacit learning. The former occurs with learner's awareness, while the latter takes place subliminally. When we perform a motor skill, there is a variety in the status of our neuromotor situation, which is subliminal and highly coordinated to express low dimensional motion. The key to a natural control strategy is management of this inherent redundancy in the musculoskeletal system mediated by a high number of DOFs with low dimensional outputs (Metzger et al., 2012).

Recently, several studies have shown that muscle synergy acts as a neural strategy that the CNS has adopted to simplify the control of our redundant musculoskeletal system. Additionally, the importance of integrating environmental inputs into suitable low-dimensional signals before sending them to the CNS for simplified control has been documented (Alnajjar et al., 2015). Yet the neural dynamics inside the CNS have not been investigated in detail. Shimoda introduced a biological self-regulatory adaptive control strategy called "Tacit Learning System" (TLS) for posture control with self-sufficiency. This system is designed for unsupervised acquisition of skills or creation of new behavioral structures for adapting to environmental changes. Signal accumulation is a key factor for "Tacit Learning" in the adaptation process, and primitive behaviors composed of several reflex actions are gradually tuned into suitable behaviors for the environment (Shimoda and Kimura, 2010; Shimoda et al., 2012, 2013).

Shimoda and his team have succeeded in controlling 36 DOFs in a humanoid bipedal locomotive robot using this TLS and demonstrated a wide adaptation capability to a redundant motor-skeletal system along with robustness to environmental changes compared to conventional machine learning algorithms (Shimoda et al., 2012). We thus hypothesized that introducing TLS into the biomechanical structure as a subsystem will integrate it with muscle synergy to control implicit motion with adaptation to environmental changes, allowing the user to concentrate on explicit tasks like grasping with a myoelectric hand prosthesis. Clinically, the compensatory strategy for the lost rotation function of the wrist involves using the proximal residual limb to achieve the necessary motion, which results in an increased burden on users that limits prosthesis usage (Metzger et al., 2012). This rotation function during reaching is an example of implicit motion. Consequently, we performed experiments to evaluate the efficacy of TLS in a prosthetic hand model, by appointing the system to regulate wrist rotation so as to minimize redundant compensatory motion, as a biomimetic regulatory system, while performing reaching tasks.

# MATERIALS AND METHODS

In this study, a non-randomized experiment was conducted to evaluate efficacy of the TLS and its effects on the central nervous system during the prosthesis control tasks.

# Prosthesis Efficacy Evaluation in Bar Relocation Tasks

Seven men and one woman participated after giving informed consent. All participants were below-elbow amputees and experienced users of the conventional one-degree-of-freedom (hand open and close) myoelectric hand prosthesis. **Table 1** shows the participants' demographic data.

Each participant's prosthesis handpiece was exchanged with the TLS handpiece and their remaining arm sockets were used in the trials. The open/close signal detector on the handpiece was connected to sensors in the socket to allow the participants to control hand motions as usual. Since we could not find adequate forearm rotation tasks for hand prostheses in the past literature, we placed three plastic bars (3 cm diameter and 10 cm length, with the central 3 cm part covered with Velcro tape to improve grasp) horizontally on a table. Participants sat in front of the table, reached out to hold the bars, placed them vertically and then back horizontally, three times (**Figure 1**). This exercise counted as one trial. Initialization of the system was performed as a pre-trial. Participants were instructed to stay still for 5 s with their shoulders at 0° flexion, rotation, and abduction, along with elbows at 0° flexion. Subsequently, they were instructed to repeat the trials until the rotational support of the hand prosthesis showed no further improvement. Twenty trials were done in approximately 10 min, and this was sufficient for every participant to achieve convergence of the parameters. Angles of the shoulder joint derived from three goniometers placed on the participant's shoulder (flexion, rotation, and abduction) were monitored and fed back to the TLS. Participants' movements were recorded with a computer vision based human body motion capture tracking system (Section Tacit Learning Handpiece and Data Preprocessing). First person view (FPV) video (Video 1, 2) was recorded by a camera (HERO, GoPro, Inc., CA, USA) attached to the prosthesis socket. We conducted descriptive type questionnaires to determine participant satisfaction and how effective the participants felt the system was.

# Tacit Learning Handpiece and Data Preprocessing

The system consisted of three goniometer sensors to measure the angles of shoulder flexion (θ<sub>1</sub>), horizontal flexion (θ<sub>2</sub>), and rotation (θ<sub>3</sub>), and a handpiece with two actuators (rotation and grip) (**Figure 2**). One actuator was for handpiece wrist rotation. The rotation angle θ<sub>r</sub> was the desired angle of prosthesis wrist rotation, controlled by a low-level controller embedded in the hardware. The other actuator was for grip, with on-off control provided by surface EMG sensing, as is common in commercial prostheses. When the shoulder angles exceeded the pre-defined threshold value θ<sub>t</sub> (the value found at unnatural postures), the system tuned the control gain by accumulating extremity joint angles. The control and adaptation laws were defined as follows:

$$
\theta_r = k\Theta - \theta_r \tag{1}
$$

$$k = \int q \, dt \tag{2}$$

$$q = \begin{cases} \Theta, & |\Theta| \ge \theta_t \\ 0, & |\Theta| < \theta_t \end{cases} \tag{3}$$

$$
\Theta = k_1 \theta_1 + k_2 \theta_2 + k_3 \theta_3 \tag{4}
$$

When the linear combination of residual upper limb joint angles Θ in Expression (4) exceeded the set threshold angle θ<sub>t</sub> in Expression (3), the primary reflex modulated the rotatory assistance angle θ<sub>r</sub> depending on Θ through Expressions (1) and (2). Expression (1) was a speed control component of rotation. A previous mathematical study suggests that biological arm kinematics are optimized with respect to total energy expenditure (Berret et al., 2008), which is positively correlated with the total joint angle Θ. Thus, we defined the control law of the system as minimization of Θ. In this experiment, we set θ<sub>t</sub> = 1, k<sub>1</sub> = 0.1, k<sub>2</sub> = 0.1, and k<sub>3</sub> = 0.5 as the initial values.
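A discrete-time reading of Expressions (1) to (4) may help to follow the adaptation. The sketch below rests on several assumptions that are not stated in the text: Expression (1) is read as a rate law, dθ<sub>r</sub>/dt = kΘ − θ<sub>r</sub>, in line with its description as a speed control component; the integral in Expression (2) is approximated by forward Euler steps of a hypothetical step size `dt`; and the function name `tls_step` is illustrative. The default threshold and weights are the initial values given above, and their units follow the source.

```python
def tls_step(theta_r, k, shoulder_angles, dt, theta_t=1.0, gains=(0.1, 0.1, 0.5)):
    """Advance the wrist-rotation command and the control gain by one time step.

    theta_r         -- current rotatory assistance angle
    k               -- current control gain (accumulated signal)
    shoulder_angles -- (theta_1, theta_2, theta_3) from the three goniometers
    dt              -- integration step (hypothetical; not given in the text)
    """
    # Expression (4): weighted sum of the residual-limb joint angles.
    big_theta = sum(g * a for g, a in zip(gains, shoulder_angles))
    # Expression (3): only postures at or beyond the threshold are accumulated.
    q = big_theta if abs(big_theta) >= theta_t else 0.0
    # Expression (2): k is the time integral of q (forward Euler step).
    k = k + q * dt
    # Expression (1), read as a rate law (assumption): d(theta_r)/dt = k*Theta - theta_r.
    theta_r = theta_r + (k * big_theta - theta_r) * dt
    return theta_r, k


theta_r, k = 0.0, 0.0
for _ in range(100):
    # A sustained compensatory shoulder posture keeps Theta above the threshold.
    theta_r, k = tls_step(theta_r, k, (10.0, 10.0, 10.0), dt=0.01)
print(theta_r, k)
```

In this reading, the gain k grows only while the compensatory posture persists, so the wrist command builds up gradually and stops adapting once Θ falls back below the threshold.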

#### Motion Capture System

Kinematic patterns of the participants' movements were captured with a motion capture system (Workstation 5.2.4, VICON). Twenty-four markers (spheres covered with reflective tape) were attached to various parts of the participant's body and prosthesis prior to the experiment. The motion capture system consisted of six cameras, which tracked and reconstructed the motion of each of the recorded markers in 3D space.

### Data Analysis

We focused on the tacit learning rotational control of the prosthesis in this study.

Hence, we computed system energy consumption by using the Software for Interactive Musculoskeletal Modeling (SIMM, MusculoGraphics, Inc., Santa Rosa, California, USA). It is a graphical software system for developing and analyzing models of musculoskeletal structures, and performs inverse dynamics calculations from motion capture data (Delp and Loan, 1995; Neptune et al., 2008). It creates a musculoskeletal model consisting of representations of bones, muscles, and ligaments by calculating the joint moments. In this study, we used a standardized musculoskeletal model calculated from the participants' body weight, height, and sex. Pre-trial system energy in all participants was normalized to one.

FIGURE 1 | A schematic figure of the trial. After moving three bars vertically, the participants were instructed to place these three bars back to where they were horizontally.

# RESULTS

All participants successfully completed their assigned tasks. The online videos (Video 1, 2, 3) show participant 3 working on his tasks. "After learning" refers to the state after 20 trials following the first one. Adaptation advanced in both wearer and prosthesis within a short time, as shown in the videos (Video 1: without TLS assistance; Video 2: after twenty trials). After 20 trials, the shoulder rotation angle (θ<sub>3</sub>) had decreased in all participants, as shown in **Figure 3**. Total system energy estimated by SIMM decreased in five out of six participants (**Figure 4**). Energy estimation was not possible in participants 7 and 8 due to failure of the motion-capture marker. **Figure 5** shows changes in the actual estimated system energy data during trials in participant 1. The graphs show system energy before and after TLS learning. The compensation rotation angle of the shoulder [Θ in Section Tacit Learning Handpiece and Data Preprocessing, Expression (4)] in participant 1 decreased after TLS learning, as shown in **Figure 6**. Seven out of eight participants were comfortable with TLS assistance. No participant required special training before the trials. After TLS learning of the bar rotation tasks, participant 8 volunteered to open two types of drawers and turn the oven indicator (Video 4). **Figure 7** shows the rotation angle of the prosthesis wrist during the tasks. Although the TLS had not experienced any of these tasks before, it provided good assistance and showed generalized performance for rotational support despite changes in arm posture. This participant was also satisfied with the intuitiveness of TLS support as determined by the descriptive type questionnaires.

# DISCUSSION

A good hand prosthesis should reconstruct the original dexterity of human hands. While far from complete, in this endeavor we replicated one of the most complex biomechanical structures. Improvements in EMG signal analysis (Tenore et al., 2009), Targeted Muscular Reinnervation (TMR) including sensory feedback (Kuiken et al., 2007; Ohnishi et al., 2007; Li et al., 2010), brain interfaces (Yanagisawa et al., 2011), peripheral nerve interfaces (Navarro et al., 2005), and new training systems (Pilarski et al., 2011) have been introduced, but these methods require a certain period of special training or invasive surgery. Furthermore, none of these methods satisfies the conflicting demands for intuitiveness, multi-functionality, and cost. Moreover, because present control methods lack the flexibility to adapt to environmental changes, including the complex nature of the bio-signals, repeated calibration is often required by patients and physiotherapists (Ciancio et al., 2016).

The present work focuses on reconstructing each joint's movement, but not muscle synergy. Reconstruction of muscle synergy does not involve isolating and reorganizing biokinematic outputs from residual function, but expanding muscle synergy in a biological way. In other words, an optimization algorithm should be introduced for mimicking human-like motion and finding the natural output from the residual limb to cope with. Shimoda described that human kinematic output is unpredictable for machines with a model-based strategy that does not represent certain posture situations (i.e., forearm rotations, elbow flexion/extension) due to intrinsic fragility (Shimoda and Kimura, 2008). To cope with all motions, it is necessary to model all possible posture changes and device control actions in every model. To solve these issues, various types of biomimetic and self-organizing learning methods, including artificial neural networks, have been proposed, but the capability of current learning methods to adapt to unknown situations is not sufficient in terms of learning speed and level of generalization.

The "Tacit Learning System" introduced by Shimoda has two main advantages for controlling the prosthesis compared to other control methods: learning speed and a simple, inexpensive system with intrinsic robustness (Shimoda et al., 2012, 2013). Furthermore, as Alnajjar described, this controller has a role in reduction of sensory stimulus dimension. This is called "sensory synergy" in contrast to muscle synergy. They defined "sensory synergy" as "a group of weighted sensory inputs whose function is to provide the quality of the resulting motion as feedback to the CNS through a single synergy recruitment signal in order to facilitate the generation of the next command, thus accelerating the search time for the optimal muscle synergy." In particular, in TLS, the controller modulates sensory synergies contributed by acquired sensory signals and inferred artificial sensory synergies into motor commands. Consequently, activated motor commands of the prosthesis enable intuitive motor control by the wearer and simultaneous confirmation with visual feedback. In short, the output of sensory synergy is used as an input to both the CNS and the TLS, and control signals for the prosthesis device are created through motor synergy that combines signals from the CNS and the prosthesis device (Alnajjar et al., 2015).

Our results from the bar relocation experiment convinced us that this system has high affinity toward the CNS. It was easy to add to the conventional system, required no special training, reduced the users' burden, and is low-cost. The level of satisfaction was high.

Recently, we reported a case report from a magnetoencephalography study on the effect of the TLS system on CNS. This report showed that the coherence value among sensorimotor-related cortices in the dominant hemisphere increased only while watching a video of oneself using the prosthesis with TLS support and vice versa. This result is no more than a showcase, but we are preparing for a future clinical study evaluating the effect of the "Tacit Learning System" prosthesis on CNS based on the evidence of this basic study.

A limitation of this study is that we tested the system on a limited set of tasks; it is therefore still in the prototype phase. We tried several learning motions and determined that various motions could advance the learning in a similar way to that shown in the results. This robustness is supported by the motion-generalization experiment with the drawer-opening task. In cases where less extreme motions were used in the training sessions, the learning speed was slow, and it took many trials to learn the appropriate behaviors. For this study, we chose a simple relocation task to control the learning environment for all participants and to compare the differences in the learning process. The users did not try the system in real-life tasks such as cooking or housework. However, the results of the additional tasks performed by participant 8 suggest robustness of the system in different situations. Short battery life is also a concern. The system continuously senses upper limb motions and tries to adjust the prosthesis position at all times, so battery drain is three to five times greater than in conventional systems. Adjusting the threshold may be a solution: a higher threshold for TLS support may increase battery life but may reduce support, a trade-off that needs to be weighed according to the user's lifestyle.

In summary, we introduced a novel "Tacit Learning System," a self-regulatory strategy in a myoelectric prosthesis, to control wrist rotation and confirmed its efficacy in conventional type myoelectric prosthesis users. We infer that TLS showed the ability to recover the lost function by adjusting compensatory overreaction generated by residual function. Theoretically, it can be used for recovering functions in other situations such as lower limb amputation, palsy in association with functional electric stimulation, or even in ventilation failure if residual function is present.

#### HUMAN AND ANIMAL RIGHTS AND INFORMED CONSENT

This study was approved by the Ethics Committee of the Nagoya University, School of Medicine (Approval number 2012-0145). All participants gave informed consent before enrolling in the study, and all procedures were performed in accordance with ethical standards of the responsible committee on human experimentation and with the Helsinki Declaration of 1975, revised in 2000 and 2008.

#### AUTHOR CONTRIBUTIONS

Conceived and designed the experiment: SO, SS, KI, and HH. Performed and analyzed the data for experiment: SO, SS, FA, and KI. "Tacit Learning System" design, settings, and construction: SS and FA. Participant registrations: KI and HT. Wrote the paper: SO, SS, KI, MH, and HH.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2016.00019/full#supplementary-material

#### REFERENCES

prostheses. Expert Rev. Med. Devices 4, 43–53. doi: 10.1586/17434440.4.1.43


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Oyama, Shimoda, Alnajjar, Iwatsuki, Hoshiyama, Tanaka and Hirata. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Obstacle Avoidance and Target Acquisition for Robot Navigation Using a Mixed Signal Analog/Digital Neuromorphic Processing System

Moritz B. Milde<sup>1</sup>, Hermann Blum<sup>1†</sup>, Alexander Dietmüller<sup>1†</sup>, Dora Sumislawska<sup>1</sup>, Jörg Conradt<sup>2</sup>, Giacomo Indiveri<sup>1</sup> and Yulia Sandamirskaya<sup>1\*</sup>

<sup>1</sup> Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland, <sup>2</sup> Neuroscientific System Theory, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany

Neuromorphic hardware emulates the dynamics of biological neural networks in electronic circuits, offering an alternative to the von Neumann computing architecture that is low-power, inherently parallel, and event-driven. This hardware makes it possible to implement neural-network-based robotic controllers in an energy-efficient way with low latency, but requires solving the problem of device variability, which is characteristic of analog electronic circuits. In this work, we interfaced a mixed-signal analog-digital neuromorphic processor ROLLS to a neuromorphic dynamic vision sensor (DVS) mounted on a robotic vehicle and developed an autonomous neuromorphic agent that is able to perform neurally inspired obstacle avoidance and target acquisition. We developed a neural network architecture that can cope with device variability and verified its robustness in different environmental situations, e.g., moving obstacles, moving target, clutter, and poor light conditions. We demonstrate how this network, combined with the properties of the DVS, allows the robot to avoid obstacles using simple biologically inspired dynamics. We also show how a Dynamic Neural Field for target acquisition can be implemented in spiking neuromorphic hardware. This work demonstrates an implementation of working obstacle avoidance and target acquisition using mixed signal analog/digital neuromorphic hardware.

Keywords: neuromorphic controller, obstacle avoidance, target acquisition, neurorobotics, dynamic vision sensor, dynamic neural fields

# 1. INTRODUCTION

Collision avoidance is one of the most basic tasks in mobile robotics that ensures safety of the robotic platform, as well as the objects and users around it. Biological neural processing systems, including relatively small ones such as those of insects, are impressive in their ability to avoid obstacles robustly at high speeds in complex dynamical environments. Relatively simple neuronal architectures have already been proposed to implement robust obstacle avoidance (e.g., Blanchard et al., 2000; Iida, 2001; Rind and Santer, 2004), while probably the most simple conceptual formulation of a neuronal controller for obstacle avoidance is the famous Braitenberg vehicle (Braitenberg, 1986). When such neuronal control architectures are implemented on a conventional computer, analog sensor signals are converted and stored in digital variables. A large number of numerical computations are performed then, which are required to model the involved neuronal dynamics in software.

#### Edited by:

Christian Tetzlaff, Max Planck Institute for Dynamics and Self Organization (MPG), Germany

#### Reviewed by:

Tomas Kulvicius, University of Göttingen, Germany Bernd Porr, University of Glasgow, United Kingdom

#### \*Correspondence:

Yulia Sandamirskaya ysandamirskaya@ini.uzh.ch

† These authors have contributed equally to this work.

Received: 29 November 2016 Accepted: 22 May 2017 Published: 11 July 2017

#### Citation:

Milde MB, Blum H, Dietmüller A, Sumislawska D, Conradt J, Indiveri G and Sandamirskaya Y (2017) Obstacle Avoidance and Target Acquisition for Robot Navigation Using a Mixed Signal Analog/Digital Neuromorphic Processing System. Front. Neurorobot. 11:28. doi: 10.3389/fnbot.2017.00028

Neuromorphic hardware offers a physical computational substrate for directly emulating such neuronal architectures in real time (Indiveri et al., 2009; Furber et al., 2012; Benjamin et al., 2014; Chicca et al., 2014), enabling low latency and massively parallel, event-based computation. Neuromorphic electronic circuits can implement the dynamics of neurons and synapses using digital (Furber et al., 2012) or analog (Benjamin et al., 2014; Qiao et al., 2015) designs and allow for arbitrary connectivity between artificial neurons. Analog implementations of artificial neural networks are particularly promising because of their potentially smaller size and lower power consumption compared to digital systems (for a review see Indiveri et al., 2011; Hasler and Marr, 2013). But these features come at the price of precision and reliability. Indeed, with analog designs, device mismatch effects (i.e., variation in the properties of artificial neurons across the device) have to be taken into account when developing robust functional architectures (Neftci et al., 2011).

A promising strategy for taking these issues into account is to implement the mechanisms used in biological neural networks, which face the same problem of using an unreliable computing substrate that consists of noisy neurons and synapses driven by stochastic biological and diffusion processes. These biological mechanisms include adaptation and learning, but also using population coding (Ermentrout, 1998; Pouget et al., 2000; Averbeck et al., 2006) and recurrent connections (Wilson and Cowan, 1973; Douglas et al., 1995) to stabilize behaviorally relevant decisions and states against neuronal and sensory noise. In this work, we show that by using the population-coding strategy in a mixed signal analog/digital neuromorphic hardware, it is possible to cope with the variability of its analog circuits and to produce reliably the desired behavior on a robot.

We present a first proof of concept implementation of such a neuromorphic approach to robot navigation. Specifically, we demonstrate a reactive vision-based obstacle avoidance strategy using a neurally-inspired event-based Dynamic Vision Sensor (DVS) (Lichtsteiner et al., 2006) and a Reconfigurable On-Line Learning (ROLLS) neuromorphic processor (Qiao et al., 2015). The proposed architecture is event-driven and uses the neural populations on the ROLLS device to determine the steering direction and speed of the robot based on the events produced by the DVS. In the development phase, we use a miniature computer Parallella<sup>1</sup> solely to manage the traffic of events (spikes) between the neuromorphic devices, and to store and visualize data from the experiments. The Parallella board can be removed from the behavioral loop in target applications, leading to a purely neuromorphic implementation. In this paper, we demonstrate the robustness and limits of our system in a number of experiments with the small robotic vehicle "Pushbot<sup>2</sup> " in a robotic arena, as well as in an unstructured office environment.

Several neuromorphic controllers for robots have been developed in recent years: a SpiNNaker system (Furber et al., 2012) was used to learn sensory-motor associations with robots (Conradt et al., 2015; Stewart et al., 2016); a neural-array integrated circuit was used to plan routes in a known environment (Koziol et al., 2014); and three populations of analog low-power subthreshold VLSI integrate-and-fire neurons were employed to control a robotic arm (Perez-Peña et al., 2013). Our system follows similar lines and realizes a reactive robot navigation controller that uses a mixed signal analog/digital approach and exploits the features of the ROLLS neuromorphic processor.

In this work we follow a dynamical systems (attractor dynamics) approach to robot navigation (Bicho et al., 2000), which formalizes one of the famous Braitenberg vehicles (Braitenberg, 1986). The neuronal architecture in our work is realized using a number of neuronal populations on the neuromorphic device ROLLS. The dynamical properties of neuronal populations and their interconnectivity make it possible to process a large number of sensory signals in parallel, detecting the most salient signals and stabilizing these detection decisions in order to robustly generate closed-loop behavior in real-world unstructured and noisy environments (Sandamirskaya, 2013; Indiveri and Liu, 2015). Here, we demonstrate the feasibility of deploying a neuromorphic processor for closed-loop reactive control. We found several limitations of the simple Braitenberg-vehicle approach and suggest extensions of the simple architecture that solve these problems, leading to robust obstacle avoidance and target acquisition in our robotic setup.

# 2. MATERIALS AND METHODS

The experimental setup used in this work consists of the Pushbot robotic vehicle with an embedded DVS camera (eDVS) and the ROLLS neuromorphic processor. A miniature computing board Parallella is used to direct the flow of events between the robot and the ROLLS. **Figure 1A** shows the components of our hardware setup, while **Figure 1B** shows the information flow between different hardware components.

The Pushbot communicates with the Parallella board via a wireless interface for receiving motor commands and for sending address-events produced by the DVS. Using a dedicated WiFi network, we achieve a communication latency below 10 ms, which was sufficient to demonstrate the functionality of our system at the speeds possible with the Pushbot.

The ROLLS device is interfaced to the Parallella board using an embedded FPGA, which is used to configure the neural network connectivity on the chip and to direct stimulating events to neurons and synapses in real time. The Parallella board runs a simple program that manages the stream of events between the neuromorphic processor and the robot.

### 2.1. The ROLLS Neuromorphic Processor

The ROLLS is a mixed signal analog/digital neuromorphic chip (Qiao et al., 2015) that comprises 256 spiking silicon neurons, implemented using analog electronic circuits which can express biologically plausible neural dynamics. The neurons can be configured to be fully connected with three sets of synaptic connections: an array of 256 × 256 non-plastic ("programmable") synapses, an array of 256 × 256 plastic ("learnable") synapses that realize a variant of the Spike-Timing-Dependent Plasticity (STDP) rule (Mitra et al., 2009), and 4 additional ("virtual") synapses that can be used to receive external inputs. In this work, only the programmable synapses were used for setting up the neuronal control architecture, as no online learning was employed for the navigation task.

<sup>1</sup>https://www.parallella.org

<sup>2</sup>http://inilabs.com/products/pushbot

**Figure 2** shows a block diagram of the ROLLS device, in which 256 spiking neurons, implemented using analog electronic circuits (Indiveri et al., 2006), are shown as triangles on the right, and 256 × 256 non-plastic ("programmable") synapses, which can be used to create a neuronal architecture on the ROLLS, as well as 256 "virtual" synapses used to stimulate neurons externally, are shown as white squares. Digital Address Event Representation (AER) circuitry makes it possible to stimulate neurons and synapses on the chip, as well as to read out spike events off chip; a temperature-compensated digital bias generator controls the parameters of the analog electronic neurons and synapses, such as the refractory period or membrane time constant.

The programmable synapses share a set of biases that determine their weight values, their activation threshold, and time constants. These three parameters determine the synaptic strength and dynamics of the respective connection between two neurons. A structural limitation of the hardware is that each synapse can only assume one of eight possible weight values (four excitatory and four inhibitory values). This means that in a neuronal architecture, several different populations might have to share weights, which limits the complexity of the architecture. ROLLS consumes ∼4 mW of power in the typical experiments run here. The ROLLS parameters (biases) used in this work are listed in the Appendix (Supplementary Material, Appendix A).

### 2.2. The DVS Camera

The Dynamic Vision Sensor (DVS) is an event-based camera, inspired by the mammalian retina (Lichtsteiner et al., 2006; Liu and Delbruck, 2010). **Figure 3** shows a typical output of the DVS camera accumulated over 0.5 s (right) from the Pushbot robot driving in the office (left).

Each pixel of the DVS is sensitive to a relative temporal contrast change. If such a change is detected, the pixel sends out an event at the time at which the change was detected (asynchronous real-time operation). Each event e is a vector: e = (x, y, ts, p), where x and y define the pixel location in the retinal reference frame, ts is the time stamp, and p is the polarity of the event. The event polarity encodes whether the luminance of the pixel increased (an "on" event) or decreased (an "off" event). All pixels share a common transmission bus, which uses the Address Event Representation (AER) protocol to transmit the address-events off chip.
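The event tuple described above can be sketched as a small data structure. This is only an illustration: the class name `DVSEvent`, the helper `count_by_polarity`, the microsecond time unit, and the ±1 polarity encoding are assumptions, not part of the eDVS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DVSEvent:
    """One address-event e = (x, y, ts, p); field encodings are assumed."""
    x: int      # pixel column in the retinal reference frame
    y: int      # pixel row in the retinal reference frame
    ts: float   # time stamp (unit assumed here: microseconds)
    p: int      # polarity: +1 for an "on" event, -1 for an "off" event (assumed)


def count_by_polarity(events):
    """Return the number of "on" and "off" events in a sequence of events."""
    on = sum(1 for e in events if e.p > 0)
    off = sum(1 for e in events if e.p < 0)
    return on, off


# Three hypothetical events: two luminance increases and one decrease.
events = [
    DVSEvent(x=10, y=20, ts=100.0, p=+1),
    DVSEvent(x=11, y=20, ts=130.0, p=-1),
    DVSEvent(x=10, y=21, ts=150.0, p=+1),
]
on_count, off_count = count_by_polarity(events)
```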

The AER representation and asynchronous nature of communication makes this sensor low power, low latency, and low-bandwidth, as the amount of data transmitted is very small (typically, a very small subset of pixels produce events). Indeed, if there is no change in the visual scene, no information is transmitted off the camera. If a change is detected, it is communicated instantaneously, taking only a few microseconds to transfer the data off-chip.

For the obstacle avoidance scenario, important properties of the DVS are its low data rate, high dynamic range, and low sensitivity to lighting conditions (Lichtsteiner et al., 2006). The challenges are the noise inherent in the sensor, its inability to detect homogeneous surfaces, and its relatively small spatial resolution (128 × 128 pixels), as well as a limited field of view (60°). New versions of the DVS are currently available, which would further improve the performance of the system. Moreover, more sophisticated object-detection algorithms for the DVS are currently being developed (Moeys et al., 2016).

The embedded version of the DVS (eDVS) camera (Müller and Conradt, 2011) used in this work uses an ARM Cortex microcontroller to initialize the DVS, capture events, send them to the wireless network, and receive and process commands for motor control of the Pushbot.

### 2.3. Neuromorphic Robot

The robot used in this work is the mobile autonomous platform Pushbot, which consists of a 10 × 10 cm chassis with two motors driving two independent tracks for propulsion (left and right). The predominant component on the small robot is an eDVS (Section 2.2), which acquires and provides sensory information and controls actuator output, including the robot's motors, through its embedded microcontroller. The sensor's integrated 9 DOF IMU reports changes of velocity and orientation. The robot actuators include a buzzer, two parallel, horizontal forward laser pointers and an LED on top, which all can show arbitrary activation patterns. The Pushbot is powered by 4 AA-batteries, which ensure ∼2 h operation time.

The robot communicates through WLAN at up to 12 Mbps, which allows remote reading of sensory data (including events from the eDVS) and setting velocities with a latency < 10 ms.

AER: Address-Event Representation, used to communicate spikes (an address-event consists of the index of the spike-emitting neuron).

The Pushbot robot is too small to carry the current experimental hardware setup. In principle, however, it is possible to place the ROLLS chip directly on a robot, removing the WiFi latency.

### 2.4. Spiking Neural Network Architecture

The core of the system presented here is a simple neural network architecture that is realized in the ROLLS device and allows the robot to avoid obstacles and approach a simple target. The "connectionist" scheme of the obstacle avoidance part of the architecture is shown in **Figure 4A**, while the scheme of the target acquisition architecture is shown in **Figure 4B**.

For obstacle avoidance, we configured two neuronal populations of 16 neurons each to represent a sensed obstacle to the right ("obstacle right," or OR) and to the left ("obstacle left," or OL) of the robot's heading direction. Each neuron in the OL and OR populations receives a spike for each DVS pixel that produces an event in the left (right) part of the sensor, respectively (we used the lower half of the sensor for obstacle avoidance). The spiking neurons in the two obstacle populations sum up the camera events according to their neuronal integrate-and-fire dynamics [equations can be found in Appendix B (Supplementary Material)]. If enough events arrive from the same neighborhood, the respective neuron will fire; otherwise it will ignore events that are caused by sensor noise. Thus, the obstacle-representing neuronal populations achieve basic filtering of the DVS events. The output spikes of the neuronal populations signal the detection of an object in the respective half of the field of view.
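The routing of camera events to the two obstacle populations can be illustrated as follows. This is a hedged sketch, not the authors' event-routing code: the function name `route_event`, the convention that rows 64 to 127 form the lower half of the frame, and the assignment of four adjacent columns to each of the 16 neurons are all assumptions made for illustration.

```python
SENSOR_SIZE = 128        # DVS resolution is 128 x 128 pixels
NEURONS_PER_POP = 16     # OL and OR have 16 neurons each


def route_event(x, y):
    """Map a DVS pixel address to (population, neuron index), or None.

    Assumptions (not stated in the text): row 0 is the top of the frame,
    so rows 64..127 are the lower half; each neuron receives events from
    a block of 4 adjacent columns within its half of the sensor.
    """
    if not (0 <= x < SENSOR_SIZE and 0 <= y < SENSOR_SIZE):
        raise ValueError("pixel address outside the sensor frame")
    half = SENSOR_SIZE // 2
    if y < half:
        # Upper half of the frame is not used for obstacle avoidance.
        return None
    cols_per_neuron = half // NEURONS_PER_POP  # 64 columns / 16 neurons = 4
    if x < half:
        return ("OL", x // cols_per_neuron)
    return ("OR", (x - half) // cols_per_neuron)
```

The noise filtering itself happens in the integrate-and-fire dynamics on the chip; the sketch only covers the address mapping that precedes it.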

Each of the obstacle-detecting neuronal populations is connected to a motor population "drive left, DL" or "drive right, DR" (with 16 neurons per population). Consequently, if an obstacle is detected on the right, the drive left population is stimulated, and vice versa. The drive populations inhibit each other, implementing winner-take-all dynamics. Thus, a decision about the direction of an obstacle-avoiding movement is taken and stabilized at this stage by the dynamics of neuronal populations on the chip.

The drive populations, in their turn, inhibit both obstacle detecting populations, since during a turning movement of the robot, many more events are generated by the DVS, compared to those generated during translational motion. This inhibition compensates for this expected increase in the input rate, similar to the motor re-afferent signals in biological neural systems (Dean et al., 2009). This modification of the simple Braitenberg vehicle principle is required to enable robust and fast behavior.

The speed of the robot is controlled by a neuronal population, "speed, sp," which receives input from a constantly firing "exc," excitatory population. The latter group of neurons has strong recurrent connections and continually fires when triggered by a transient activity pulse. In an obstacle-free environment, the speed population sets a constant speed for the robot. The obstacle detecting populations OL and OR inhibit the speed population, making the robot slow down if obstacles are present. The decreasing speed ensures a collision-free avoidance maneuver.

These six populations comprise only 96 neurons, and represent all that is needed to implement the obstacle avoidance dynamics in this architecture (**Figure 4A**).
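The population-level wiring described in this section can be summarized in a small sign table. The sketch below is illustrative only: it assumes 16 neurons in every population (consistent with the total of 96), all-to-all connections between connected populations, and a rows-are-presynaptic convention. None of these details, nor the names used, are taken from the actual ROLLS configuration.

```python
POPULATIONS = ["OL", "OR", "DL", "DR", "sp", "exc"]
SIZE = 16  # neurons per population (6 x 16 = 96)

# (source, target) -> +1 excitatory, -1 inhibitory, as described in the text.
CONNECTIONS = {
    ("OR", "DL"): +1, ("OL", "DR"): +1,   # obstacle on one side drives the other way
    ("DL", "DR"): -1, ("DR", "DL"): -1,   # mutual inhibition (winner-take-all)
    ("DL", "OL"): -1, ("DL", "OR"): -1,   # drive populations inhibit obstacle detectors
    ("DR", "OL"): -1, ("DR", "OR"): -1,
    ("exc", "exc"): +1, ("exc", "sp"): +1,  # self-sustained activity sets the speed
    ("OL", "sp"): -1, ("OR", "sp"): -1,     # obstacles slow the robot down
}


def neuron_range(name):
    """Indices of the neurons that belong to the named population."""
    start = POPULATIONS.index(name) * SIZE
    return range(start, start + SIZE)


def build_sign_matrix():
    """Return a 96 x 96 matrix m[pre][post] of connection signs (0 = none)."""
    n = len(POPULATIONS) * SIZE
    m = [[0] * n for _ in range(n)]
    for (src, dst), sign in CONNECTIONS.items():
        for i in neuron_range(src):
            for j in neuron_range(dst):
                m[i][j] = sign
    return m


matrix = build_sign_matrix()
```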

The control signals sent to the robot are, first, the angular velocity, $v_a$, that is proportional to the difference in the number of spikes per neuron emitted between the two drive populations (Equation 1), and, second, the forward velocity, $v_f$, calculated based on the number of spikes per neuron emitted by the speed population (Equation 2):

$$v_a = c_{turn} \left( \frac{N_{DL}^{spike}}{N_{DL}^{n}} - \frac{N_{DR}^{spike}}{N_{DR}^{n}} \right), \tag{1}$$

$$v_f = c_{speed} \frac{N_{sp}^{spike}}{N_{sp}^{n}}, \tag{2}$$

where $N_{XX}^{spike}$ is the number of spikes obtained from the respective population [drive left (DL), drive right (DR), and speed (sp)] in a fixed time window (we used 500 ms, and 50 ms in an improved version); $N_{XX}^{n}$ is the number of neurons in the respective population; and $c_{turn}$ and $c_{speed}$ are the turn and speed factors (user-defined constants), respectively.
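Equations (1) and (2) translate directly into code. The following minimal sketch assumes that spike counts per population have already been accumulated over one time window; the function names and the default population size of 16 are illustrative choices, not part of the original implementation.

```python
def angular_velocity(spikes_dl, spikes_dr, c_turn, n_dl=16, n_dr=16):
    """Equation (1): difference of spikes per neuron between DL and DR."""
    return c_turn * (spikes_dl / n_dl - spikes_dr / n_dr)


def forward_velocity(spikes_sp, c_speed, n_sp=16):
    """Equation (2): spikes per neuron of the speed population."""
    return c_speed * spikes_sp / n_sp


# Hypothetical spike counts collected in one time window.
v_a = angular_velocity(spikes_dl=32, spikes_dr=16, c_turn=0.5)
v_f = forward_velocity(spikes_sp=48, c_speed=1.0)
```

With equal activity in both drive populations the angular velocity is zero, so the robot drives straight at the speed set by the speed population.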

Thus, we used neural population dynamics to represent angular and translational velocities of the robot and used the firing rate of the respective populations of neurons as the control variable.

#### 2.4.1. Dynamic Neural Field for Target Representation

To represent targets of the navigation dynamics, we use a Dynamic Neural Field (DNF) architecture as defined in Bicho et al. (2000). DNFs are population-based models of the dynamics of large homogeneous neuronal populations, which have been successfully used in modeling elementary cognitive functions in humans (Schöner and Spencer, 2015), as well as in implementing cognitive representations for robots (Erlhagen and Bicho, 2006; Bicho et al., 2011; Sandamirskaya et al., 2013). DNFs can be easily realized in neuromorphic hardware by configuring winner-take-all (WTA) connectivity in a neural population (Sandamirskaya, 2013). Each neuron in a soft WTA network has a positive recurrent connection to itself and to its 2–4 nearest neighbors, implementing the lateral excitation of the DNF interaction kernel. Furthermore, all neurons have inhibitory connections to the rest of the WTA network, implementing the global inhibition of a DNF. These inhibitory connections can be either direct, as used here, or be relayed through an inhibitory population, which is a more biologically plausible structure.
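The soft WTA connectivity described above (local excitation, direct global inhibition) can be sketched as a sign matrix. The function name `soft_wta_signs`, the `radius` parameter, and the choice not to wrap connections around the ends of the population are assumptions for illustration; actual weights on the chip are set by shared biases rather than by per-synapse values.

```python
def soft_wta_signs(n, radius=2):
    """Return an n x n matrix of connection signs for a soft WTA network.

    w[i][j] = +1 if neuron j is neuron i itself or one of its nearest
    neighbors within `radius` (lateral excitation), otherwise -1 (direct
    global inhibition). radius=1 gives 2 neighbors, radius=2 gives 4.
    Connections do not wrap around the population boundaries (assumed).
    """
    w = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w[i][j] = +1
    return w


# Small example: 8 neurons, each exciting itself and its 2 nearest neighbors.
w = soft_wta_signs(8, radius=1)
```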

In our architecture, we select 128 neurons on the ROLLS chip to represent visually perceived targets. Each neuron in this population receives events from the upper half of each column of the 128 × 128 sensor frame from the eDVS and integrates these events according to its neuronal dynamics: only events that consistently are emitted from the same column lead to firing of the neuron. The nearby neurons support each other's activation, while inhibiting further neurons in the WTA population (**Figure 4B**).

This connectivity stabilizes localized blobs of most salient sensory events, filtering out sensor noise and objects that are too large (inhibition starts to play role within object representation) or too small (not enough lateral excitation is engaged). Thus, the WTA connectivity stabilizes the target representation. The target in our experiments was a blinking LED of the second robot, which was detected in the DNF realized on the ROLLS. While this target could be easily detected since the blinking LED produces many events, more sophisticated vision algorithms are being developed to pursue an arbitrary target (Moeys et al., 2016).

The target population was divided into three regions: neurons of the DNF that receive inputs from the left of the midline of the DVS frame drive the "drive left" population, whereas neurons that receive input from the right half of the DVS frame drive the "drive right" population. We did not connect the central 16 neurons of the target DNF to the drive populations, to ensure smoother target pursuit when the target is in the center of the DVS frame (**Figure 4B**).
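The three-region split of the target population can be written as a simple index mapping. In this sketch the function name `target_drive`, the assumption that neuron index equals DVS column index counted from the left, and the symmetric placement of the 16 unconnected neurons around the midline are illustrative choices rather than details given in the text.

```python
def target_drive(neuron_index, n_target=128, n_center=16):
    """Return the drive population excited by a target-DNF neuron.

    Neurons left of the central band project to "DL", neurons right of it
    to "DR", and the central `n_center` neurons are left unconnected (None).
    """
    if not 0 <= neuron_index < n_target:
        raise ValueError("neuron index outside the target population")
    lo = (n_target - n_center) // 2   # first unconnected neuron (56)
    hi = lo + n_center                # first neuron of the right region (72)
    if neuron_index < lo:
        return "DL"
    if neuron_index >= hi:
        return "DR"
    return None
```

Under these assumptions, 56 neurons on each side project to a drive population and neurons 56 to 71 form the unconnected central band.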

#### 2.4.2. Combining Obstacle Avoidance and Target Acquisition

The two neuronal populations that ultimately determine the robot's steering direction (DR and DL) sum up contributions from the obstacle-representing populations and the target-representing WTA population (**Figure 4**). The obstacle contribution is made effectively stronger than the target contribution by setting the ROLLS biases accordingly. Thus, in the presence of an obstacle in the robot's field of view, an obstacle avoidance maneuver is preferred.

**Figure 5** shows the connectivity matrix used to configure the non-plastic connections on the ROLLS chip to realize both obstacle avoidance and target acquisition. This plot shows the weights of non-plastic synapses on the ROLLS chip (blue being the negative weights and red the positive weights; the same color code is used for the different weights as in **Figure 4**), which connect groups of neurons (different populations, labeled on the right side of the figure) among each other. Within-group connections are marked with black square frames on the diagonal of the connectivity matrix. Violet and orange arrows show inputs and outputs of the architecture, respectively.

This connectivity matrix is sent to the ROLLS device to configure the neuronal architecture on the chip, i.e., to "program" the device.
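To give an idea of what such a matrix looks like, the following sketch fills a 256 × 256 array block by block. The placement of the populations on the chip, the weight values (+1/−1), the mutual inhibition between the drive populations, and the inhibition of the speed population are all assumptions made for illustration; the actual layout is the one shown in Figure 5.

```python
import numpy as np

N_NEURONS = 256  # number of silicon neurons on the ROLLS chip

# Hypothetical placement of the 16-neuron populations on the chip.
POPS = {
    "obstacle_left": range(0, 16),
    "obstacle_right": range(16, 32),
    "drive_left": range(32, 48),
    "drive_right": range(48, 64),
    "speed": range(64, 80),
}

def connect(W, pre, post, weight):
    """Set all-to-all connections from population `pre` to population `post`."""
    W[np.ix_(list(POPS[pre]), list(POPS[post]))] = weight

# Rows index presynaptic neurons, columns postsynaptic neurons.
W = np.zeros((N_NEURONS, N_NEURONS), dtype=int)
connect(W, "obstacle_left", "drive_right", +1)   # obstacle on the left -> turn right
connect(W, "obstacle_right", "drive_left", +1)   # obstacle on the right -> turn left
connect(W, "drive_left", "drive_right", -1)      # assumed mutual inhibition
connect(W, "drive_right", "drive_left", -1)
connect(W, "obstacle_left", "speed", -1)         # assumed: obstacles slow the robot
connect(W, "obstacle_right", "speed", -1)
```

Each call fills one rectangular block, which corresponds to one colored patch between two labeled populations in a plot of the matrix.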

# 3. DEMONSTRATIONS

We verified the performance of our system in a number of demonstrations, reported next. Overall, over 100 runs were performed with different parameter settings. In the following, we provide an overview of the experiments and describe a few of them in greater detail to give an intuition for how the neural architecture works. For most experiments, we let the robot drive in a robotic arena with a white background and salient obstacles. We used a tape with a contrastive texture to make the walls of the arena visible to the robot. In four runs, we let the robot drive freely in the office for several minutes.

# 3.1. Probing the Obstacle Avoidance: A Single Static Obstacle

In the first set of experiments, we let the robot drive straight toward a single object (a colored block 2.5 cm wide and 10 cm high) and measured the distance from the object at which the robot crossed a virtual line perpendicular to the robot's initial heading direction, on which the object is located (e.g., see the distance between the robot and the "cup" object at the last position of the robot in **Figure 6**). We varied the speed factor of the architecture from 0.1 (∼0.07 m/s) to 3.0 (∼1 m/s) and verified the effectiveness of the obstacle avoidance maneuver. Furthermore, we increased the turning factor from 0.5 to 1.0 to improve performance at high speeds and tested the color dependence of the obstacle perception with the DVS. **Table 1** shows the results of these measurements. Each trial was repeated 3 times and the mean over the trials was calculated.

The table shows the following characteristics of the architecture at the chosen parametrization. First, the performance drops at very low speeds (speed factor 0.1), especially for red and yellow objects, because too few DVS events are generated to drive the neuronal populations on ROLLS. Second, there is a trade-off between this effect and the expected decay in performance (in terms of the decreasing distance to the obstacle) with increasing speed. Thus, at a turn factor of 0.5, the best performance is achieved for the blue object at speed factor 0.5 and for the red object at speed factor 1. The distance to the obstacle can be further increased by increasing the turn factor. Thus, at turn factor 1 and speed factor 1, the best performance (i.e., the largest distance to the obstacle) can be achieved for both the blue and red objects. The yellow object provides too little contrast to be reliably perceived by the DVS in our set-up.

**Figure 6** demonstrates how the neuronal architecture on the ROLLS chip realizes obstacle avoidance with the Pushbot. On the left, an overlay of video frames (recording the top view of the arena) shows the robot's trajectory when avoiding a single obstacle (here, a cup) in one of the runs. Numbers (1–3) mark important moments in time during the turning movement. On the right, summed activity of the neuronal populations on the ROLLS device is shown over time. The same moments in time are marked with numbers as in the left figure. In this case, the obstacle-detecting populations already had a clear "winner": the left population forms an activity bump that increases over time and drives the "drive right" population, inducing a right turn of the robot. The bottom plot shows the commands that are sent to the robot (speed and angular velocity): the robot slows down in front of the obstacle and turns to the right.

We performed several further trials, varying the lighting conditions (normal, dark, very dark) and the parameters of the architecture. Since the architecture uses the difference in spiking activity induced by sensory events from the two halves of the visual space, avoiding a single obstacle works robustly, although the camera might miss objects with a low contrast (e.g., the yellow block in our white arena). More advanced noise filtering would improve performance. While a more extended version of the performed tests will be reported elsewhere, **Figure 7** shows results of some of the successful and unsuccessful runs.

## 3.2. Avoiding a Pair of Obstacles

We repeated the controlled obstacle avoidance experiment with two and three blocks in different positions. Each configuration was tested twenty times without crashes at speed 0.35 m/s (speed factor 0.5).

**Figure 8** shows an exemplary run that explains how the robot avoids a pair of obstacles. This example is important, since in the attractor dynamics approach to navigation, distance between the two objects determines a decision to move around or between the objects.

Snapshots from the overhead camera are shown on the left of **Figure 8**. Output of the DVS, accumulated in 500 ms time windows around the time when the snapshots were taken<sup>3</sup>, is shown in the second column, and the spiking activity of neuronal populations recorded from the ROLLS chip is shown in the two right-most columns. Activity is shown for the obstacle-representing left (red) and right (blue) neuronal populations (third column), and for the left (red) and right (blue) drive populations and the speed population (gray, fourth column). Each of these populations has 16 neurons; dots represent their spikes<sup>4</sup>.
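Accumulating events into such "frames" amounts to counting events per pixel within a time window. The sketch below is one way to do this; the 128 × 128 sensor resolution, the polarity encoding (1 for on, 0 for off), and the function name are assumptions made for the example, while the 500 ms window centered on the snapshot time follows the text.

```python
import numpy as np

def accumulate_events(t, x, y, pol, t_center, window=0.5, shape=(128, 128)):
    """Per-pixel counts of on and off events in a window centered on t_center.

    t is in seconds; x and y are integer pixel coordinates; pol is 1 for on
    events and 0 for off events (an assumed encoding).
    """
    in_window = np.abs(t - t_center) <= window / 2
    on_sel = in_window & (pol == 1)
    off_sel = in_window & (pol == 0)
    on = np.zeros(shape, dtype=int)
    off = np.zeros(shape, dtype=int)
    np.add.at(on, (y[on_sel], x[on_sel]), 1)     # handles repeated pixels correctly
    np.add.at(off, (y[off_sel], x[off_sel]), 1)
    return on, off

# Five made-up events; three of them fall into the window around t = 1.0 s.
t = np.array([0.0, 0.9, 1.0, 1.1, 1.6])
x = np.array([1, 2, 3, 4, 5])
y = np.array([0, 0, 0, 0, 0])
pol = np.array([1, 0, 1, 0, 1])
on, off = accumulate_events(t, x, y, pol, t_center=1.0)
```

`np.add.at` is used instead of fancy-index assignment so that several events at the same pixel are all counted.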

At the moment depicted in the top row of **Figure 8**, the robot senses an obstacle on the right, although the DVS output is rather weak. Note that the neuronal population filters out sensory noise of the DVS and only detects events that cluster in time and in space. The robot turns left, driven by the activated drive left population, and now the obstacle on the right becomes visible, providing a strong signal to the right obstacle population and, consequently, to the drive left population (second and third row). Eventually, the obstacle on the right dominates and the robot drives past both obstacles on the left side (fourth row).

Thus, with the chosen parametrization of the neuronal network architecture, the robot tends to go around a pair of objects, avoiding the space between them. This behavior could be changed by making the connections between the obstacle-representing populations and the drive populations stronger. However, for a robot equipped with a DVS, going around the pair is safer, since for homogeneous objects the DVS can only sense the edges, where a temporal contrast change can be induced by the robot's motion. The robot might thus miss the central part of an object, which is why avoiding pairs of close objects altogether is preferable. Adaptive connectivity that depends on the robot speed is also feasible.

<sup>3</sup>We dropped 80% of DVS events randomly in our architecture; moreover, we only used 5% of all remaining events for plotting.

<sup>4</sup>Only 5% of the ROLLS spikes (every 20th spike) are shown in all our plots.

FIGURE 6 | An example of an obstacle avoidance maneuver. Left: Overlay of video frames showing the trajectory of the robot. Right: Activity of the neuronal populations on the chip (Top: the left and right obstacle detecting populations; Middle: the left and right drive populations), and the motor commands sent to the robot (Bottom plot).

TABLE 1 | Collision avoidance at different speeds: distance to the obstacle when crossing the obstacle-line (mean over 3 trials ± standard deviation in [cm]) at different speed- and turn-factors and for different colors of the obstacle.

\* signifies trials in which a collision happened.

# 3.3. Avoiding a Moving Obstacle

In this experiment, the robot drove straight in the arena while we moved an obstacle (a coffee mug) into its path. We repeated this experiment six times with varying speed factors (0.1–2) of the robot. The robot was able to avoid collisions in all tested cases. In fact, avoiding a moving obstacle is more robust than avoiding a static obstacle, because the moving obstacle produces more DVS events than a static one at the same robot speed.

**Figure 9** shows how the robot avoids a moving obstacle. The same arrangement of plots was used as in **Figure 8**, described in Section 3.2. Here, the robot was moving with cspeed = 0.5 (0.35 m/s), and the cup was moved at ∼0.20 m/s.

# 3.4. Cluttered Environment

In the following set of experiments, we randomly placed obstacles (8–12 wooden pieces) in the arena and let the robot drive around at an average speed (0.35 m/s). We analyzed the performance of the architecture, suggesting a number of modifications to cope with its limitations.

**Figure 10** demonstrates the behavior of the obstacle avoidance system in a cluttered environment. In particular, we let the robot drive in an arena in which 8 obstacles were randomly distributed. The robot successfully avoids obstacles in its way with two exceptions: the robot touches the blue obstacle in the center of the arena, which entered the field of view too late for a maneuver, and also collides with the yellow object, which did not provide enough contrast to produce the required number of DVS events. These collisions point to two limitations of the current setup, which, first, uses a single camera with a narrow field of view and, second, drops 80% of events to improve the signal-to-noise ratio (the latter degrades performance for objects with low contrast against the background). Using a more sophisticated noise filter would improve the visibility of faint obstacles. Note that we used rather small objects in these trials (blocks of 2 × 5 cm), which posed a challenge for the event-based detection, especially taking into account our very simplistic noise-reduction strategy.
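The random event dropping mentioned above can be sketched as independent thinning of the event stream. The function name and the use of NumPy's random generator are assumptions made for the example; only the 80% drop rate (a keep fraction of 0.2) is taken from the text.

```python
import numpy as np

def drop_events(events, keep_fraction=0.2, rng=None):
    """Keep each event independently with probability keep_fraction."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(len(events)) < keep_fraction
    return events[keep]

events = np.arange(100_000)                       # stand-in for a stream of DVS events
kept = drop_events(events, rng=np.random.default_rng(0))
print(len(kept) / len(events))                    # close to 0.2
```

Thinning of this kind scales the event rate of every source by the same factor. Isolated noise events therefore become too sparse to drive a population, while dense, spatially clustered events from a high-contrast edge still do. The same scaling is what makes low-contrast objects, such as the yellow block, fall below the detection level.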

To improve behavior in a cluttered environment, we modified the architecture, adding two more populations on the ROLLS chip, which receive input from the inertial measurement unit of the Pushbot and which suppress the obstacle populations when the robot is turning. Moreover, we replaced the homogeneous connections between the obstacle and the drive populations with graded connections that are stronger for obstacles detected in the center than for those detected in the periphery of the DVS field of view. This allows the robot to make shorter avoidance maneuvers and avoid obstacles in a denser configuration at a higher speed. **Figure 11** shows a successful run with the modified architecture. Here, we also changed the sampling mechanism used to calculate the robot commands, replacing a fixed time window with a running average. This allowed us to avoid obstacles in the cluttered environment without collisions at speeds as high as 0.5 m/s.
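One possible form of such graded connections is a weight profile that falls off linearly from the center of the field of view toward its edges. The linear shape, the number of columns, and the weight values below are assumptions made for illustration; the text only states that central obstacles are weighted more strongly than peripheral ones.

```python
import numpy as np

def graded_weights(n_columns=32, w_periphery=0.2, w_center=1.0):
    """Connection strength per input column, strongest at the center.

    Positions run from -1 (left edge) to +1 (right edge) of the field of
    view; the weight decreases linearly with distance from the center.
    """
    pos = np.linspace(-1.0, 1.0, n_columns)
    return w_periphery + (w_center - w_periphery) * (1.0 - np.abs(pos))

w = graded_weights()
print(w.round(2))
```

With a profile of this kind, an obstacle straight ahead produces a strong and early turning drive, whereas an obstacle at the edge of the field of view, which the robot would pass anyway, causes only a small correction.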

# 3.5. Variability of Behavior

Since behavior of our robot is controlled by activity of neuronal populations, implemented in analog neuromorphic hardware, the behavior of the robot has some variability, even when exactly the same parameters of the architecture and the same hardware biases are used. Despite this variability, the robot's goal—avoiding obstacles—remains fulfilled. Such variability of behavior can be used as a drive for exploration, which may be exploited in learning scenarios in more complex architectures, built on top of our elementary obstacle avoidance system.

**Figure 12** demonstrates the variability of the behavior of our neuronal controller. In the figure, we show three trials in which the robot avoids a two-block configuration, starting from exactly the same position and with the same configuration of the neuronal controller (speed factor 0.5, turn factor 0.5). Mismatch in the neuronal populations implemented in analog neuromorphic hardware, variability of the DVS output, and its dependence on the robot's movements lead to strong differences in trajectories. In particular, in the case shown in **Figure 12**, the trajectories may bifurcate and the robot might avoid the two obstacles on the right or on the left side.

# 3.6. Obstacle-Avoidance in a Real-World Environment

Finally, we also tried our architecture outside of the arena. The robot was placed on the floor in the office and drove around, avoiding both furniture and people. The high amount of background activity compared to the arena did not diminish the effectiveness of the architecture: in four 0.5–1.5-min long trials, the robot crashed only once, after it maneuvered itself into a dark corner under a table where the DVS could not provide sufficient information to recognize obstacles.

**Figure 13** shows an example of the Pushbot robot driving in the office environment. On the left, three snapshots from the video camera recording the driving robot are shown (full videos can be seen in the Supplementary Material). The snapshots show the robot navigating the office environment with its task being to avoid collisions. The middle column of plots shows pairs of eDVS events, accumulated over 500 ms around the moment in time in the corresponding snapshot on the left, and respective histograms of events from the center region, used for obstacle avoidance. Events above the mid-line of the eDVS field-of-view are shown with transparency to emphasize that they were not used for obstacle avoidance: only events from the region of the eDVS field-of-view between the two vertical lines in **Figure 13** were used.

FIGURE 8 | Avoiding a pair of obstacles. First column: Snapshots of four moments in time during avoidance of a cup, moved into the robot's trajectory. Second column: DVS "frames", i.e., events accumulated over a 0.5 s time window. Green dots are off events, blue dots are on events. Events in the upper part of the frame were not considered for the obstacle avoidance. Third column: Activity of the obstacle representing populations in 0.6–1.5 s before the camera snapshot in the first column was taken (red: left population (nOL); blue: right population (nOR); each population has 16 neurons). Fourth column: Activity of the drive left (red), drive right (blue), and speed population on the ROLLS chip in the same time as on the plots in column 3.

Histograms below the eDVS plots show the events from this region of the field of view, summed over the eDVS columns. These events drive the obstacle left (red part of the histogram) and obstacle right (blue part of the histogram) neuronal populations on the ROLLS chip.
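A column histogram of this kind can be computed as follows. The sensor resolution (128 × 128), the image convention that the row index grows downward, and the positions of the two vertical lines (`x_lo`, `x_hi`) are assumptions made for the example.

```python
import numpy as np

def obstacle_histogram(x, y, width=128, height=128, x_lo=32, x_hi=96):
    """Per-column event counts in the region used for obstacle avoidance.

    Only events below the mid-line (y >= height // 2, assuming y grows
    downward) and between the hypothetical vertical lines x_lo and x_hi
    are counted. Returns the histogram and its left and right sums.
    """
    roi = (y >= height // 2) & (x >= x_lo) & (x < x_hi)
    hist = np.bincount(x[roi], minlength=width)
    mid = width // 2
    return hist, hist[:mid].sum(), hist[mid:].sum()

# Five made-up events; two lie outside the region of interest.
x = np.array([40, 40, 70, 10, 90])
y = np.array([100, 20, 100, 100, 100])
hist, left_sum, right_sum = obstacle_histogram(x, y)
print(left_sum, right_sum)
```

The left and right sums correspond to the two colored parts of the histogram, i.e., to the input that the obstacle left and obstacle right populations receive.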

The right column shows the activity of the neuronal populations on the ROLLS chip over time, as in the previous figures. Black vertical lines mark the moments in time that correspond to the three snapshots in the left column. These plots show that, although the left and right obstacle populations are often activated concurrently, only one of the drive populations (either left or right) is active at any moment, leading to a clear decision to turn in either direction in the presence of perceived obstacles. The speed plot shows that the movement of the robot is not very smooth: it often slows down and accelerates based on the sensed presence of obstacles. This behavior is improved in the modified architecture, briefly described in Section 3.4.

When driving around the office, the robot faced very different lighting conditions, as can be seen already in the three snapshots presented here. This variation in lighting conditions did not affect obstacle avoidance in most cases, since the DVS is sensitive to the relative change of each pixel's intensity, which varies less than the absolute intensity when the amount of ambient light changes. However, in an extreme case, shown in the lower snapshot in **Figure 13**, the robot collided with the metal foot of the chair. This was the only collision recorded.

# 3.7. Target Acquisition

In addition to obstacle avoidance, we also tested target acquisition in ten experiments, using a second robot with a blinking LED as the target. The robot successfully turned and drove toward the target every time (at speed and turn factors = 0.5). In 8 out of 10 experiments, the target was recognized as an obstacle when approached and was avoided; in two experiments, the robot failed to recognize the target as an obstacle after approaching it.

Obviously, the simple visual preprocessing that we used did not allow us to distinguish the target from obstacles (other than through their position in the upper or lower part of the field-of-view of the DVS). Moreover, we would need an object detection algorithm to detect the target and segregate it from the background. This vision processing is outside the scope of our work, but there is a multitude of studies going in this direction (Moeys et al., 2016) using modern deep/convolutional neural networks learning techniques.

Right: Summed activation of neurons in populations on the ROLLS chip over the time of the experiment. Obstacle and turn (left and right) population are shown, as well as the commands sent to the robot (angular velocity and speed).

**Figure 14** shows target acquisition for a static target and demonstrates that the robot can approach the target object. At a short distance, the obstacle component takes over and the robot turns away after approaching the target. The figure shows the overlaid snapshots from the overhead camera, showing how the robot turns toward the second robot, standing on the left side of the image. When getting close to the second robot (∼10 cm), the robot perceives the target as an obstacle, which has a stronger contribution to its movement dynamics, and the robot turns away. On the left, the spiking activity of the target representation on the ROLLS chip is shown (raster plot where each dot represents a spike<sup>5</sup>). We can see that the robot perceives its target consistently on the left. After the eighth second, the obstacle contribution on the right becomes dominant and the robot turns left strongly.

**Figure 15** shows how the robot can chase a moving target. We have controlled the second Pushbot remotely and have turned its LED on (at 200 Hz, 75% on-time). The LED provided a rather strong (though spatially very small) input to the DVS of the second, autonomously navigating robot. This input was integrated by our target WTA (DNF) population, which, however, also received a large amount of input from the background (in the upper part of the field of view the robot could see behind the arena's walls). Input from the localized LED was stronger and more concise than more distributed input from the background and such localized input was enhanced by the DNF's (WTA's) lateral connections. Consequently, the respective location in the target WTA formed a "winner" (localized activity bump in the DNF terminology) and inhibited the interfering inputs from other locations.
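The selection of a strong localized input over weaker competing input can be illustrated with a rate-based caricature of a WTA/DNF population with local excitation and global inhibition. This is not the spiking network on the chip: the sigmoid rate function, the Gaussian kernel, and all parameter values below are assumptions chosen only to show the mechanism.

```python
import numpy as np

def simulate_wta(stimulus, c_exc=1.0, sigma=2.0, c_inh=0.6, h=-2.0,
                 beta=4.0, dt=0.05, steps=1000):
    """Euler integration of du/dt = -u + h + stimulus + kernel @ f(u).

    The kernel combines local Gaussian excitation with global inhibition;
    f is a sigmoid. Returns the final firing rates f(u).
    """
    n = len(stimulus)
    idx = np.arange(n)
    dist = idx[:, None] - idx[None, :]
    kernel = c_exc * np.exp(-dist**2 / (2 * sigma**2)) - c_inh
    u = np.full(n, h, dtype=float)
    for _ in range(steps):
        rate = 1.0 / (1.0 + np.exp(-beta * u))
        u += dt * (-u + h + stimulus + kernel @ rate)
    return 1.0 / (1.0 + np.exp(-beta * u))

def bump(n, center, amplitude, width=2.0):
    """Gaussian input profile centered on neuron `center`."""
    x = np.arange(n)
    return amplitude * np.exp(-(x - center)**2 / (2 * width**2))

n = 64
# Uniform background, a strong "LED" input at neuron 20, a weaker distractor at 50.
stimulus = 1.0 + bump(n, 20, 5.0) + bump(n, 50, 2.0)
rates = simulate_wta(stimulus)
rates_no_lateral = simulate_wta(stimulus, c_exc=0.0, c_inh=0.0)
```

Without lateral connections, the distractor location is active on its own; with them, the bump that forms first at the stronger input inhibits all other locations, so that only one localized "winner" remains.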

In the figure, four snapshots of the video recording the two robots are shown (top row). The leading robot was covered with white paper to reduce interference from the obstacle avoidance dynamics as the robots get close to each other (the space in the arena and the small size of the blinking LED forced us to put the robots rather close to each other, so that the target robot could be occasionally perceived as an obstacle).

In the second row in **Figure 15**, the DVS events summed over 500 ms are shown, around the same time points as the snapshots. Only the upper part of the field-of-view was considered for target acquisition. This part is very noisy, since the robot "sees" outside the arena and perceives objects in the background, which made target acquisition very challenging. Still, the blinking LED provided the strongest input, and in most cases the target DNF was able to select its input as the target and suppress the competing inputs from the background (see the activity of neurons in the target DNF in the bottom plot).

This last plot shows the spiking activity of 215 neurons of the ROLLS chip, used to drive the robot (we do not show the constantly firing nexc population here). We can see that the target DNF (WTA) successfully selects the correct target in most cases, losing it from sight only twice, when the robot receives particularly many DVS events from the background during turning. The lower part of this raster plot shows the activity of the obstacle populations, the drive populations, and the speed population; thus, the dynamics of the whole architecture can be seen here.

# 4. DISCUSSION

This paper presents a neuronal architecture for reactive obstacle avoidance and target acquisition, implemented using a mixed-signal analog/digital neuromorphic processor (Qiao et al., 2015) and a silicon retina camera DVS as the only source of information about the environment. We have demonstrated that the robot, controlled by interconnected populations of artificial spiking neurons, is capable of avoiding multiple objects (including moving objects) at an average movement speed (up to 0.35 m/s with our proof of concept setup). We have also demonstrated that the system works in a real-world office environment, where background clutter poses a challenge for the DVS on a moving vehicle, creating many distracting events. We have also demonstrated that the target acquisition architecture copes well with this challenge, which was relevant even in the robotic arena. The distributed DNF representation of the target, supported by lateral interactions of the WTA neuronal population, enabled robust detection and reliable selection of the target against the background.

The reactive approach to obstacle avoidance that we adopt in this work has a long history of success, starting with the neurally inspired turtle robot more than half a century ago, as reviewed by Holland (1997). Later, Valentino Braitenberg analyzed a number of hypothetical vehicles, or creatures, that use reactive control to produce complex behaviors (Braitenberg, 1986). His controllers were realized as simple "nervous systems" that directly linked the sensors to the motors of the vehicle. Using similar sensorimotor, or behavioral, modules as building blocks, Rodney Brooks developed a behavior-based controller paradigm for roving vehicles, known as "subsumption architecture" (Brooks, 1991). Although this framework did not scale well for complex tasks and is not ideally suited for online learning methods, this type of controller is at the heart of highly successful real-world robotic systems such as autonomous vacuum cleaners, and has been adopted, to some extent, in a wide range of impressive controllers for autonomous robots (e.g., Khansari-Zadeh and Billard, 2012).

<sup>5</sup>Remember that only 5% (every 20th) of all spikes from the ROLLS processor are shown.

FIGURE 12 | Variability of the robot's behavior. Left: Overlay of video camera frames recording the robot, avoiding a pair of obstacles; top view. Three different trials are recorded and overlaid here (trajectories are shown with green lines 1–3). Right: Velocity commands, received by the robot from the neuronal architecture (angular velocity and speed) for the three trials (from top to bottom).

The dynamical systems approach to robot navigation (Schöner et al., 1995) is an attempt to mathematically formalize reactive control for autonomous robots using differential equations that specify attractors and repellors for behavioral variables that control the robot's heading direction and speed (Bicho et al., 2000). In this framework, obstacle avoidance has been integrated with target acquisition, and successful navigation in an unknown environment has been demonstrated both for vehicles and robotic arms (Reimann et al., 2011). This approach is similar to another successful reactive approach to obstacle avoidance: the potential field approach (e.g., Haddad et al., 1998), in which the target creates a global minimum in a potential that drives the robot, whereas obstacles create elevations in this potential. However, the use of Cartesian space in the potential field approach, instead of the robot-centered velocity space, makes it prone to getting trapped in local minima.
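The idea of attractors and repellors for the heading direction can be made concrete with a commonly used textbook form of such dynamics. The sketch below is a generic illustration, not the formulation of any specific cited paper: the sine-shaped attractor, the Gaussian-windowed repellor, and all parameter values are assumptions, and the dependence of the repellor strength on obstacle distance is omitted.

```python
import numpy as np

def heading_rate(phi, psi_target, psi_obstacles, lam_tar=1.0, lam_obs=3.0, sigma=0.4):
    """Rate of change of the heading direction phi (radians).

    The target direction psi_target is an attractor; each obstacle
    direction is a repellor whose angular range is limited by sigma.
    """
    attract = -lam_tar * np.sin(phi - psi_target)
    d = phi - np.asarray(psi_obstacles, dtype=float)
    repel = np.sum(lam_obs * d * np.exp(-d**2 / (2 * sigma**2)))
    return attract + repel

# Euler integration: target straight ahead (0 rad), one obstacle at 0.5 rad.
phi, dt = 0.4, 0.01
for _ in range(2000):
    phi += dt * heading_rate(phi, 0.0, [0.5])
print(phi)  # settles on the side of the target that faces away from the obstacle
```

In this form the behavioral variable is the robot-centered heading rather than a position in Cartesian space, which is the distinction drawn above with respect to the potential field approach.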

In mixed-signal analog/digital neuromorphic hardware, the neuronal dynamics is taken care of by the physics of analog electronic circuits, so that no digital computational resources are spent on simulating it. Thus, neuromorphic implementation of simple biologically inspired obstacle-avoidance architectures can lead to low-latency (on the order of microseconds) and power-efficient (on the order of milliwatts) solutions, analogous to the ones used by insects. In contrast, more conventional obstacle-avoidance systems require a substantial amount of computing resources to process and store sensory data, detect obstacles, and compute motor commands. Neuromorphic implementation of such low-level processing will make it possible to use analog sensory signals directly, avoiding their digital representation and storage, while at the same time making it possible to build complex neural-network-based computing architectures that could be used for solving cognitive tasks, such as task planning, map building, or object recognition.

We consider the work presented here a first feasibility study, which still has a number of limitations that we will address in our future work. The main limitation is the variability of neuronal behavior caused by parameter drift on the analog hardware: the parameters of the hardware neural network change the network properties as the experimental setup conditions (temperature, humidity, etc.) change. This is a serious limitation of the hardware used, which makes it challenging to implement complex architectures that have to balance contributions of different behavioral modules (e.g., controlling turning and forward velocities, or obstacle avoidance and target acquisition). We are currently working on algorithms and methods for automatically re-tuning these parameters in a principled fashion with optimization and machine learning techniques. In addition, we are designing new versions of the neuromorphic hardware with on-board stabilization of the chip parameters and more resources for simplifying the fine-tuning process of the architectures. However, the approach employed here, i.e., the use of populations of artificial neurons in place of single nodes in the architecture, allowed us to generate behavior with state-of-the-art analog neuromorphic hardware.

Apart from the hardware limitations, our simple architecture currently allows robust obstacle avoidance at moderate speeds (∼0.35 m/s). Since the robot slows down when an obstacle is detected, its movement appears to be "jerky." Although the smoothness of the robot movement could be improved by tuning the coupling strength between the obstacle and drive populations, the best solution would involve improving the visual pre-processing stages. In our setup, the DVS detects local contrast changes and produces a different number of events depending on the objects in the environment, but this number is also modulated by the robot's translational and rotational movements. Currently we ignore about 80% of all DVS events, both to remove noise and to reduce bandwidth. This very basic strategy improves the signal-to-noise ratio, because the architecture enhances the spatially and temporally coherent inputs and suppresses the effect of random inputs. However, we plan to study a more principled approach to pre-processing and noise reduction, and to investigate other biologically inspired architectures for obstacle avoidance, for example inspired by the fly's EMD (Elementary Motion Detector) (Hassenstein and Reichardt, 1956) or the locust's LGMD (Lobula Giant Movement Detector, a looming detector) (Gabbiani et al., 2002; Rind and Santer, 2004). We are currently working on neuromorphic implementation of these algorithms (Milde et al., 2016; Salt et al., 2017).


Moreover, the 500 ms time window that we used to create plots of DVS events and average spiking activity was also used in our controller for counting spikes when calculating the motor commands sent to the robot. In our preliminary experiments on optimizing the controller, we have reduced this time window to 50 ms and, more importantly, replaced it with a sliding-window calculation of the average firing rate of the drive and speed neuronal populations. A more principled solution to this problem would be the development of a more direct hardware interface between the spiking neuromorphic processor and the robot's motors, so that spikes can control the motor rotation directly, as suggested by Perez-Peña et al. (2013).
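A sliding-window estimate of a population's firing rate can be sketched with a queue of recent spike times. The class name and interface are hypothetical; only the 50 ms window length is taken from the text.

```python
from collections import deque

class SlidingRate:
    """Average firing rate over the most recent `window` seconds."""

    def __init__(self, window=0.05):
        self.window = window
        self.spikes = deque()

    def add(self, t):
        """Register a spike at time t (seconds, non-decreasing)."""
        self.spikes.append(t)

    def rate(self, now):
        """Spikes per second in the half-open interval (now - window, now]."""
        while self.spikes and self.spikes[0] <= now - self.window:
            self.spikes.popleft()
        return len(self.spikes) / self.window

estimator = SlidingRate(window=0.05)
for t in [0.005, 0.015, 0.025, 0.035, 0.045, 0.055, 0.065, 0.075, 0.085, 0.095]:
    estimator.add(t)
current_rate = estimator.rate(now=0.1)   # five spikes remain in the last 50 ms
```

Unlike a fixed window, which yields a new command only once per window, this estimate can be queried at any time, so the motor command follows changes in the population activity without waiting for a window boundary.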

Our target acquisition network can also be further improved: the main strategy will be to introduce target representations in a reference frame that moves with the robot but has a fixed orientation. Such a representation will allow the robot to turn back to a target that has been lost from sight due to an obstacle avoidance maneuver. Furthermore, increasing the strength of lateral interactions in the WTA (DNF) population will stabilize the target representation, allowing it to form a "working memory," which will support target acquisition behavior in cluttered environments. To keep the system reactive and allow it to follow the visible target, we will introduce control of the strength of lateral interactions, increasing their strength when the target is being lost from view and decreasing it when the target is visible. Detecting the target based on its features perceived with a DVS is a separate topic of ongoing research both in our lab and worldwide (e.g., Lagorce et al., 2015).

Despite this list of necessary improvements, our neuromorphic architecture is an important stepping stone toward robotic controllers realized directly in neurally inspired hardware, being the first architecture for closed-loop robot navigation that uses an analog neuromorphic processor and minimal preprocessing of visual input, obtained with a silicon retina DVS. Such neuromorphic controllers may become an energy-efficient, fast, and adaptive alternative to the conventional digital computers and microcontrollers used today to control both low-level and cognitive behaviors of robots. While neural network implementations using the conventional computing architecture are typically time- and energy-consuming, implementation of neuronal architectures using analog neuromorphic hardware approaches the efficiency of biological neural networks. Building neuronal models for higher cognitive function using, for instance, the framework of Dynamic Neural Fields (Sandamirskaya, 2013) or the Neuro-Engineering Framework (Eliasmith, 2005) will make it possible to add more complex behaviors to the robot's repertoire, e.g., finding a particular object, grasping and transporting it, as well as map formation and goal-directed navigation, which is the goal of our current research efforts.

# AUTHOR CONTRIBUTIONS

MM: conceptualization of the model, analysis of the results, writing up. HB and AD: implementation of combined obstacle avoidance and target acquisition, experiments, results analysis, writing up; DS: implementation of first version of obstacle avoidance, parameter tuning on the chip, state of the art analysis; JC: support with robotic hardware and middleware, analysis of the results, writing up; GI: support with neuromorphic hardware, and state of the art and result analysis, writing up; YS: conceptualization of the model, development of the architecture, experiment design, analysis of the results, embedding in the literature, discussion of the results, writing, and overall supervision of the project.

# FUNDING

Supported by EU H2020-MSCA-IF-2015 grant 707373 ECogNet, University of Zurich grant Forschungskredit, FK-16-106, and EU ERC-2010-StG 20091028 grant 257219 NeuroP, as well as INIForum and Samsung Global Research Project.

# ACKNOWLEDGMENTS

We would like to thank Aleksandar Kodzhabashev and Julien Martel for their help with the software code used in this work. This work was started at the Capo Caccia 2016 Workshop for Neuromorphic Engineering.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00028/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Milde, Blum, Dietmüller, Sumislawska, Conradt, Indiveri and Sandamirskaya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.