# HUMAN-IN-THE-LOOP ROBOT CONTROL AND LEARNING

EDITED BY : Luka Peternel, Jan Babič, Erhan Oztop, Tetsunari Inamura and Dingguo Zhang PUBLISHED IN : Frontiers in Robotics and AI and Frontiers in Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-312-8 DOI 10.3389/978-2-88963-312-8

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# HUMAN-IN-THE-LOOP ROBOT CONTROL AND LEARNING

Topic Editors:

Luka Peternel, Delft University of Technology, Netherlands Jan Babič, Jožef Stefan Institute (IJS), Slovenia Erhan Oztop, Özyeğin University, Turkey Tetsunari Inamura, National Institute of Informatics, Japan Dingguo Zhang, University of Bath, United Kingdom

In the past years there has been considerable effort to move robots from industrial environments to our daily lives where they can collaborate and interact with humans to improve our life quality. One of the key challenges in this direction is to make a suitable robot control system that can adapt to humans and interactively learn from humans to facilitate the efficient and safe co-existence of the two. The applications of such robotic systems include: service robotics and physical human-robot collaboration, assistive and rehabilitation robotics, semi-autonomous cars, etc. To achieve the goal of integrating robotic systems into these applications, several important research directions must be explored.

One such direction is the study of skill transfer, where a human operator's skilled executions are used to obtain an autonomous controller. Another important direction is shared control, where a robotic controller and humans control the same body, tool, mechanism, car, etc. Shared control, in turn invokes very rich research questions such as co-adaptation between the human and the robot, where the two agents can benefit from each other's skills or must adapt to each other's behavior to achieve effective cooperative task executions.

The aim of this Research Topic is to help bridge the gap between the state-of-the-art and above-mentioned goals through novel multidisciplinary approaches in human-in-the-loop robot control and learning.

Citation: Peternel, L., Babič, J., Oztop, E., Inamura, T., Zhang, D., eds. (2020). Human-in-the-Loop Robot Control and Learning. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-312-8

# Table of Contents

*Transcutaneous Electrical Nerve Stimulation*

Mengnan Li, Dingguo Zhang, Yao Chen, Xinyu Chai, Longwen He, Ying Chen, Jinyao Guo and Xiaohong Sui

Muhammad Afiq Dzulkifli, Nur Azah Hamzaid, Glen M. Davis and Nazirah Hasnan

*Rapid Decoding of Hand Gestures in Electrocorticography Using Recurrent Neural Networks*

Gang Pan, Jia-Jun Li, Yu Qi, Hang Yu, Jun-Ming Zhu, Xiao-Xiang Zheng, Yue-Ming Wang and Shao-Min Zhang

Yaqi Chu, Xingang Zhao, Yijun Zou, Weiliang Xu, Jianda Han and Yiwen Zhao

Francesco Scotto di Luzio, Davide Simonetti, Francesca Cordella, Sandra Miccinilli, Silvia Sterzi, Francesco Draicchio and Loredana Zollo

*An Adaptive and Hybrid End-Point/Joint Impedance Controller for Lower Limb Exoskeletons*

Serena Maggioni, Nils Reinert, Lars Lünenburger and Alejandro Melendez-Calderon

*Human-In-The-Loop Control and Task Learning for Pneumatically Actuated Muscle Based Robots*

Tatsuya Teramae, Koji Ishihara, Jan Babič, Jun Morimoto and Erhan Oztop

*Muscle Synergy Alteration of Human During Walking With Lower Limb Exoskeleton*

Zhan Li, Huxian Liu, Ziguang Yin and Kejia Chen

# A User Study on Personalized Stiffness Control and Task Specificity in Physical Human–Robot Interaction

*Sugeeth Gopinathan<sup>1,2</sup>\*, Sonja K. Ötting<sup>1,3</sup> and Jochen J. Steil<sup>2</sup>*

*<sup>1</sup>CoR-Lab, Bielefeld University, Bielefeld, Germany, <sup>2</sup>Institut für Robotik und Prozessinformatik, Technische Universität Braunschweig, Braunschweig, Germany, <sup>3</sup>Work and Organizational Psychology, Department of Psychology, Bielefeld University, Bielefeld, Germany*

An ideal physical human–robot interaction (pHRI) should offer users robotic systems that are easy to handle, intuitive to use, ergonomic, and adaptive to human habits and preferences. However, the variance in user behavior is often high and rather unpredictable, which hinders the development of such systems. This article introduces a Personalized Adaptive Stiffness controller for pHRI that is calibrated to the user's force profile and validates its performance in an extensive user study with 49 participants on two different tasks. The user study compares the new scheme to conventional fixed stiffness or gravity compensation controllers on the 7-DOF KUKA LWR IVb by employing two typical joint-manipulation tasks. The results clearly point out the importance of considering task-specific and human-specific parameters when designing control modes for pHRI. The analysis shows that for simpler tasks a standard fixed controller may perform sufficiently well and that the respective task dependency strongly prevails over individual differences. In the more complex task, quantitative and qualitative results reveal differences between the respective control modes, where the Personalized Adaptive Stiffness controller excels in terms of both performance gain and user preference. Further analysis shows that human and task parameters can be combined and quantified by considering the manipulability of a simplified human arm model. The analysis of the users' interaction force profiles confirms this finding.

Keywords: assistance systems, personalized controllers, adaptive stiffness mode, physical human–robot interaction (pHRI), manipulability in HRI

## 1. INTRODUCTION

As opposed to conventional industrial robotics where the robots are programmed to accomplish a fixed and repetitive task, interactive scenarios demand flexible robotic systems where the robot assists the human worker by collaborating with them, increasingly often through physical human–robot interaction (pHRI). Lightweight robots are replacing the traditional industrial robots in such tasks due to their obvious advantages: they are less dangerous, and the added compliance allows the users to work in close proximity and thus collaborate with the robot. This collaboration is a major step forward in achieving flexibility in industrial tasks, because the implicit technical knowledge that the human workers possess about the task can be incorporated directly by collaboration, without added effort of modeling or programming.

#### *Edited by:*

*Luka Peternel, Fondazione Istituto Italiano di Tecnologia, Italy*

#### *Reviewed by:*

*Tadej Petric, Jožef Stefan Institute, Slovenia Leonel Rozo, Fondazione Istituto Italiano di Tecnologia, Italy*

*\*Correspondence: Sugeeth Gopinathan sgopinathan@techfak.uni-bielefeld.de*

#### *Specialty section:*

*This article was submitted to Humanoid Robotics, a section of the journal Frontiers in Robotics and AI*

*Received: 30 June 2017 Accepted: 24 October 2017 Published: 24 November 2017*

#### *Citation:*

*Gopinathan S, Ötting SK and Steil JJ (2017) A User Study on Personalized Stiffness Control and Task Specificity in Physical Human–Robot Interaction. Front. Robot. AI 4:58. doi: 10.3389/frobt.2017.00058*

Although it is widely assumed that pHRI will improve flexibility and productivity by taking advantage of the human's cognitive and perceptual skills, it is unclear how this interaction may be made more ergonomic and pleasant for the user in detail. To this end, a small number of novel, commercially available platforms allow the robot controller to be adapted to make the human–robot interaction smoother. The online adaptation of impedance characteristics is possible, and such manipulators behave like a spring-damper system that reacts to external forces (Buchli et al., 2011). However, substantial variation in human interaction forces coupled with unpredictable human behavior makes it difficult to design a suitable pHRI system. Another factor that substantially affects pHRI is the task itself. Unique task characteristics, such as geometry, difficulty level, and required precision, have a sizable effect on how a human worker interacts with the robot during task completion. Each task is unique, and each individual approaches a task with a unique strategy, which might differ substantially among users. This variance in interaction is strongly connected to the users' physical limitations as well as to their personal preferences. Hence, not only user interaction forces but also physical characteristics of the users, such as differences in height, body proportions, left or right handedness, the distance the user keeps from the robot, or varying cognitive skills, can introduce substantial variance. This demands personalization so that the robots can accommodate user-specific dynamics.

In summary, task-specific characteristics and human parameters play an important role in user interaction and the resulting variance in user behavior. They therefore should be investigated further. Most of the current literature, with a few exceptions such as Medina et al. (2011) and Rozo et al. (2015), ignores these aspects and focuses entirely on adapting robot controllers to the user interaction forces. Rozo et al. (2015) use a Gaussian mixture model to learn cooperative robot skills in the context of human–robot object transportation. This method allows the robot to automatically encode the human demonstrations and their relation to the task parameters. Medina et al. (2011) proposed a method for gaining knowledge as well as acquiring semantic labels for interaction experience on joint manipulation without supervision, aiming at improving the robot's joint-manipulation skills. Various schemes based on variable admittance or impedance control have been proposed to improve the interaction quality, where the user interaction is mapped into robot stiffness, hence trying to reduce the effort in pHRI. Dimeas and Aspragathos (2014) implemented a variable admittance controller that is based on a Fuzzy inference system and an adaptation algorithm to vary the admittance parameters. Here, the Fuzzy inference system relies on the measured velocity and the human force and proposes suitable controller gains. In Lecours et al. (2012), a variable admittance control is discussed to improve intuitiveness in interaction by adjusting the admittance parameters based on the acceleration and velocity of the end effector. The parameters are then tuned online by certain heuristics. In Khan et al. (2015), a muscle circumference sensor is used to estimate the human interactive force, and a Radial Basis Function Neural Network is used to predict the desired human motion. Li et al. (2015) use game theory and policy iteration to analyze the pHRI and subsequently try to estimate the control objective of the user. This prediction is thereafter used to adapt the robot's objective to the user's objective to coordinate the interaction. Ranatunga et al. (2015) try to account for the variability in human dynamics and propose a controller that can incorporate human intent, nominal task models, as well as variations in the robot dynamics. The proposed scheme consists of an outer-loop model tuned using an inverse control technique and an inner loop that uses a neuroadaptive controller to linearize the robot dynamics.

These often rather complicated adaptation schemes have been neither evaluated nor tested with naive users, that is, with non-experts who have no prior knowledge about the robots and their programming. In addition, the implicit assumption that such adaptations are beneficial for task performance or user satisfaction has not yet been validated on any reasonable tasks. Also, the importance of task-specific parameters and the variance they introduce in human–robot interaction has not been discussed. We hypothesize that determining these highly variable human characteristics and task parameters and analyzing their effects on the smoothness and efficiency of the pHRI is a crucial factor toward practical applications of pHRI and deserves more attention. Despite these clear indications, apparently no commercially available and practically used control scheme embodies such adaptivity or personalization, and experimental experience is limited.

In Gopinathan et al. (2017), a user study was conducted, and a novel personalized adaptation control was discussed. The personalized adaptive control mode is parametrized based on the interaction force limits of each individual user. Hence, each user will have a unique interaction experience based on their corresponding limitations. The Personalized Adaptive Stiffness control mode is evaluated with non-expert users, comparing its performance and interaction quality to the standard constant stiffness or gravity compensation modes that are widely used for pHRI. In this article, we elaborate on the results of the user study and investigate additional characteristics that may have significant effects on pHRI. While in principle it would be desirable to evaluate this approach comparatively against all the methods discussed earlier and to add more human factors, this is clearly beyond the scope of a single user study. The current contribution starts at the even more basic question of whether adaptive schemes can actually perform better for non-expert users than simple fixed standard control techniques that are widely applied in practice. The evaluation of data from the study will shed more light on the significance of considering task specificity and the importance of human-specific parameters when designing control strategies for pHRI.

In Section 2, the robotic system, the control scheme, and the interaction control modes are described in detail. Section 3 describes the study design, the tasks users performed in the experiment, the questionnaire users had to answer during the experiment, and the dependent variables that are considered while designing the evaluation regime. Section 4 tabulates the detailed results of the conducted experiments and provides a statistical comparison of the results from the experiments. In Section 5, task specificity is discussed along with the results of force analysis and manipulability analysis. Finally, in Section 6, we discuss the lessons learned and how future research could unfold.

## 2. THE ROBOTIC SYSTEM

The robotic system is designed to emulate common industrial applications (e.g., welding or gluing) where the robotic arm is used as a tool and the user moves it kinesthetically by physically touching the robot's end effector. The control modes are implemented within the *Compliant Control Architecture* (Nordmann et al., 2012), and the program flow of the experiment is implemented using a *Domain-Specific Language* (Nordmann and Wrede, 2015). This section describes the robotic system used and elaborates on the implementation of the interaction control schemes. **Figure 1** shows the experiment setup.

### 2.1. Compliant Robot Platform

The platform for our user study consists of a KUKA Light Weight Robot (LWR IV) (Bischoff et al., 2010) equipped with a BarrettHand (BH8) (Townsend, 2000). The LWR IV is a redundant robot with seven joints equipped with torque sensors in each joint. The LWR IV is an actively compliant robot and has an impedance based control scheme (Albu-Schäffer et al., 2007). The BarrettHand that is attached to the LWR IV is a multi-fingered programmable grasper, equipped with fingertip torque sensors and tactile sensors at the palm of the grasper. This grasper is used in the experiment to achieve certain interaction tasks as explained in Section 3. In addition, a detachable rod is attached to the BarrettHand for accomplishing the tasks in the experiment.

### 2.2. Interaction Control Scheme

The compliant platform allows the users to move the end effector kinesthetically within the robot's workspace. The user interaction at the end effector produces a Cartesian displacement Δ*x* from the current end-effector position *x*. The new desired Cartesian equilibrium is then as follows:

$$
x^* = x + \Delta x. \tag{1}
$$

The analytical controller, named the CBF controller and proposed in Emmerich et al. (2013), is based on Grupen and Huber (2005) and is used here for converting the user input in task space *x*\* into joint space *q*\*. A redundancy resolution *q<sub>c</sub>* is selected to get the best inverse kinematic solution that satisfies the desired task criteria. The controller generates nullspace motion to maintain the preferred redundancy resolution configuration *q<sub>c</sub>*, while achieving the Cartesian target displacement as its primary task, as follows:

$$\begin{aligned} \Delta q &= J^\dagger(q)\, \Delta x' + (I - J^\dagger J)\, \Delta q_c \\ \Delta q_c &= q - q_c, \quad \Delta x' = (1 - \alpha)\, \Delta x \\ q^* &= q + \Delta q. \end{aligned} \tag{2}$$

Here, *J*<sup>†</sup> denotes the Moore–Penrose pseudoinverse of the task Jacobian. This implementation allows the user to interact seamlessly with the robot and move the end effector. A smoothing component was used to prevent the robot arm from drifting away after the interaction; the smoothing factor *α* scales the Cartesian displacement and was chosen to be 0.5. Both Δ*x*′ and *q<sub>c</sub>* are fed simultaneously into the hierarchical controller. The hierarchical controller prioritizes the tasks, treating the smoothed displacement as the primary task and the redundancy resolution as the secondary task. The controller then sends Δ*q* to the robot, which corresponds to the user-given Cartesian displacement Δ*x*. **Figure 2** shows the control scheme architecture. During the experiment, the built-in Joint Impedance mode of the KUKA LWR is used. The stiffness and damping values are chosen to suit the different control modes under consideration, and the values for the stiffness were selected from a pre-study conducted with 8 participants. The joint stiffness mode was selected for the tasks to allow the users more freedom in the interaction and to give them the possibility to reconfigure the robot if necessary.
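The joint update of equation (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the CBF controller implementation: the function name `joint_step` and its argument names are hypothetical, and the sign of Δ*q<sub>c</sub>* is copied from equation (2) as printed.

```python
import numpy as np

def joint_step(J, dx, q, q_c, alpha=0.5):
    """Sketch of equation (2): map a Cartesian displacement to a joint target.

    J     : task Jacobian, shape (m, n)
    dx    : user-given Cartesian displacement, shape (m,)
    q     : current joint configuration, shape (n,)
    q_c   : preferred redundancy-resolution configuration, shape (n,)
    alpha : smoothing factor (0.5 in the text)
    """
    J = np.asarray(J, dtype=float)
    q = np.asarray(q, dtype=float)
    dx_smoothed = (1.0 - alpha) * np.asarray(dx, dtype=float)  # dx' = (1 - alpha) dx
    J_pinv = np.linalg.pinv(J)                                 # Moore-Penrose pseudoinverse
    null_projector = np.eye(J.shape[1]) - J_pinv @ J           # (I - J^+ J)
    dq_c = q - np.asarray(q_c, dtype=float)                    # as written in equation (2)
    dq = J_pinv @ dx_smoothed + null_projector @ dq_c          # primary + secondary task
    return q + dq, dq                                          # q* and the joint increment
```

Because *J*(*I* − *J*<sup>†</sup>*J*) = 0, the secondary term does not disturb the primary task to first order: for a Jacobian with full row rank, `J @ dq` equals the smoothed displacement Δ*x*′.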

### 2.3. Interaction Control Modes

In this study, four control modes are compared (see **Table 1**). The implementation of the controllers is based on the architecture described in Section 2.2. The damping is kept constant during the interaction for all control modes, whereas the stiffness values are varied in each mode to attain the desired interaction strategy. The stiffness values of the high and medium stiffness modes were set to constant values based on the results from the pre-study.

Table 1 | Overview of four control modes that are compared in the study.


The current position of the robot is continuously tracked by the control loop and forms the reference for the Joint Impedance Controller. The robot can be moved freely by the user but will hold its position even when no external force is applied. As described in Steil et al. (2014), the native gravity compensation mode is reimplemented using the control scheme specified above. Hence, switching of control modes in the LWR IV controller during the experiment was avoided.

In the *assisted Gravity Compensation mode*, forces applied by the users are not resisted by the robot. In this mode, the robot is compliant and the user can move the robot through physical interaction. The *High Stiffness mode* offers higher resistance to the user when interaction occurs. This might not be ergonomically good for the user, since throughout the interaction a high force at the end effector needs to be applied. In *Medium Stiffness mode*, the robot offers a slight resistance to user interaction.

The fourth mode is the *Personalized Adaptive Stiffness mode*. This is a personalized mode where a linear heuristic is used to adapt the stiffness of the robot online. A similar approach was used in Dimeas and Aspragathos (2014), where a heuristic was used to vary the impedance parameters based on the change in velocity of the robot. In our case, we keep the damping at a constant value and vary the stiffness based on the instantaneous interaction force. The stiffness varies linearly with the applied force. The individual *f<sub>max</sub>* and *f<sub>min</sub>* are calculated for each user during the initial warm-up phase of the experiment (see Section 3.2.1) and are used to set the limits of the control mode. The stiffness varies between a maximum and a minimum value, *k<sub>max</sub>* and *k<sub>min</sub>*, as follows:

$$k_{\rm var} = \left(\frac{k_{\rm max} - k_{\rm min}}{f_{\rm min} - f_{\rm max}}\right) f_{\rm resultant} + k_{\rm max}. \tag{3}$$

From experimental trials conducted in the pre-study, *k<sub>max</sub>* is set to 1,000 Nm/rad and *k<sub>min</sub>* is set to 10 Nm/rad. The instantaneous resultant force applied at the end effector is measured as *f<sub>resultant</sub>*. Based on equation (3), a stiffness *k<sub>var</sub>* is calculated. This stiffness is then filtered using a second-order low-pass filter and forwarded to the controller. This control mode adapts to the forces that the user applies and is personalized to work within the user's force range. The integration of the adaptation into the interaction control scheme is shown in **Figure 2**.
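The stiffness adaptation of equation (3) can be sketched as follows. The clipping to [*k<sub>min</sub>*, *k<sub>max</sub>*] and the filter are assumptions made for illustration: as written, equation (3) reaches *k<sub>min</sub>* at a force of *f<sub>max</sub>* − *f<sub>min</sub>*, and the text does not specify the second-order low-pass filter, so the sketch stands in two cascaded first-order stages with a hypothetical coefficient `beta`. All function and class names are hypothetical.

```python
def adaptive_stiffness(f_resultant, f_min, f_max, k_min=10.0, k_max=1000.0):
    """Equation (3): stiffness as a linear function of the resultant force."""
    slope = (k_max - k_min) / (f_min - f_max)  # negative when f_max > f_min
    return slope * f_resultant + k_max

def clip_stiffness(k_var, k_min=10.0, k_max=1000.0):
    """Assumption: keep the commanded stiffness inside [k_min, k_max]."""
    return max(k_min, min(k_max, k_var))

class TwoStageLowPass:
    """Assumed stand-in for the second-order low-pass filter:
    two cascaded first-order exponential smoothing stages."""

    def __init__(self, beta, initial):
        self.beta = beta      # smoothing coefficient in (0, 1], hypothetical
        self.stage1 = initial
        self.stage2 = initial

    def update(self, value):
        self.stage1 += self.beta * (value - self.stage1)
        self.stage2 += self.beta * (self.stage1 - self.stage2)
        return self.stage2

# Usage with made-up force limits (5 N and 25 N) for one user.
low_pass = TwoStageLowPass(beta=0.2, initial=1000.0)
for force in (0.0, 5.0, 12.0, 20.0):
    k_raw = adaptive_stiffness(force, f_min=5.0, f_max=25.0)
    k_cmd = low_pass.update(clip_stiffness(k_raw))
```

With these example limits, no applied force yields *k<sub>max</sub>*, and the commanded stiffness drops as the user pushes harder, so the robot yields more readily to stronger interaction.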

## 3. STUDY DESIGN

To compare the four interaction control modes, we designed the user study as a within-subjects study, where each participant experiences all four control modes. This design was chosen because it is economical and eliminates possible influences from individual-related confounding variables (Field, 2013). The interaction control modes were activated in random order to prevent the occurrence of sequencing effects.

### 3.1. Ethics Statement

Before starting the user study, we consulted Bielefeld University's ethics committee, which approved the study as being ethically innocuous. In addition, the study setup was inspected and approved by the official safety officer. Each of the participants was given a short briefing prior to the experiment containing information about the study process and the data that would be assessed. The subjects also had the possibility to ask questions before the experiment and were assured that they could quit participating at any point in time and that in this case the incomplete data would be deleted and not enter the analysis. All participants gave their oral informed consent in accordance with the Declaration of Helsinki. For data protection reasons, no written statements were obtained, so that no personal data had to be stored. This was in agreement with usual practice in such studies and in accordance with the guidelines of Bielefeld University's ethics committee. After finishing the experiment, the participants were debriefed and given additional information regarding the study.

### 3.2. Study Setup

This section describes the experiment phases and procedures that employ a two-stage model similar to Wrede et al. (2013). The first phase is a *warm-up* where the user interacts with the robot and an individual force profile is recorded for subsequent calibration of the *Personalized Adaptive Stiffness mode*. In the second phase, the user is asked to complete certain tasks, e.g., moving a tool point attached to the robot along a predefined trajectory, using different control modes. In addition, each participant has to fill in a questionnaire. The flow of the study is shown in **Figure 3**.

#### 3.2.1. Warm-up Phase

In this first phase, the user plays a pick-and-place game and interacts with the robot by physically moving its end effector. Five objects are randomly placed in the workspace, and the user moves the BarrettHand above an object and presses the palm onto it. This action is sensed by the palm sensor, the fingers of the BarrettHand close, and the object is grasped. Then the user is asked to move the robot end effector to the target location marked in the robot workspace, place the object on the target location, and press it downwards, causing the BarrettHand fingers to open and release the object. After finishing this task, the user proceeds to the next object and repeats the game. The robot's stiffness is set to a medium value in this phase. This allows us to record the normal working force limits of the user. **Figure 1** shows one of the participants interacting with the robot in this phase.

Besides providing an opportunity for the users to get used to the robot, this warm-up phase serves a secondary purpose: while the user participates in the pick-and-place game, a force observer program continuously monitors the forces applied by the user at the end effector. During each interaction, the maximum and minimum forces are stored and finally averaged. The underlying assumption is that each user has different physical capabilities (some users may be stronger than others) and hence the force applied by each user will vary. If we calibrate a Personalized Adaptive Stiffness controller to work between these force limits, each user gets their own personalized adaptive controller that respects their physical capabilities. Hence, this phase yields, for each user, the maximum interaction force *f<sub>max</sub>* and the minimum interaction force *f<sub>min</sub>*.
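The calibration described above can be sketched as a short routine. This is a minimal sketch under stated assumptions: one "interaction" is taken to be the handling of one object, the input is a list of resultant force magnitudes per interaction, and the function name `calibrate_force_limits` is hypothetical.

```python
from statistics import mean

def calibrate_force_limits(interactions):
    """Average the per-interaction force extremes recorded in the warm-up phase.

    interactions: list of sequences, each holding the resultant force
                  magnitudes (in N) sampled during one interaction.
    Returns (f_min, f_max) for one user.
    """
    recorded = [list(forces) for forces in interactions if len(forces) > 0]
    if not recorded:
        raise ValueError("no force samples recorded")
    f_min = mean(min(forces) for forces in recorded)  # averaged minima
    f_max = mean(max(forces) for forces in recorded)  # averaged maxima
    return f_min, f_max
```

For example, two interactions with samples `[2.0, 10.0, 6.0]` and `[4.0, 20.0, 8.0]` would give limits of 3 N and 15 N, which then parametrize equation (3) for that user.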

#### 3.2.2. Task Phase

In the second phase, the users perform two tasks of varying complexity with different control modes. The tasks are designed to emulate common industrial tasks like welding or gluing, where the user has to move a tool along a predetermined trajectory to complete each task. From experimental pre-trials, it was determined that these tasks should be neither too easy nor too complex. The users have to move the tip of the tool/end effector along a predetermined trajectory (e.g., a spiral) from a start to an end position to complete the task. The users have to perform each task with all four control modes. The control modes are activated in random order to eliminate possible sequencing effects, and the users have no information on the control mode they are using in each trial.

#### *3.2.2.1. Drawing Task*

The two tasks vary in difficulty. In the first task, the user is asked to draw a predefined figure on the flat surface of a table. During this task, the user has to attend to task accuracy while maintaining contact with the flat surface. As experienced in the pre-study, moving the robot tool around curves leads to errors. It is also not easy to maintain contact with the surface while following a non-straight contour. To standardize the experiment, a spiral image is placed on the workspace, and the user has to follow this spiral trajectory, starting from the outside of the curve and ending at the center point. In **Figure 1**, a study participant can be seen following a spiral with the tool attached to the end effector.

#### *3.2.2.2. Contour-Following Task*

The second task is simpler than the first task. It is easy but not trivial as it involves moving the tool in 3D space. An adapted version of the wire-loop game is constructed in the robot workspace. The user has to move the tool along the edge from one end to the other to finish the task. This task resembles gluing or welding along the edge of a workpiece. In **Figure 1**, a study participant can be seen moving the tool along the edge of the adapted wire-loop game.

#### 3.2.3. Questionnaire

The data for the qualitative analysis were collected by means of a questionnaire. Before the first task, the participants answered several questions on control variables (e.g., previous experience with robots). After each task, the participants rated how they perceived the interaction with the robotic arm. After the completion of all tasks, the participants answered additional questions on demographic variables.

The questionnaire was adapted to the task and the robotic arm used in the experiment. The items concerning the interaction quality asked for the rating of how easy it was to use the robotic arm (ease of use), how controllable (control) and reliable (reliability) the robot was, how enjoyable the interaction was (enjoyment), and how satisfied the participants were with the robot (user satisfaction). The items used for this were selected items from the sub-scales perceived ease of use, perceived enjoyment, and perception of external control from the Technology Acceptance Model (Venkatesh, 2000), supplemented with items from the sub-scales reliability and system satisfaction from the Integrated Model of user satisfaction and technology acceptance (Wixom and Todd, 2005). Sample items are as follows: "*Interacting with the system did not require a lot of my mental effort*" for ease of use, "*I had control over using the system*" for control, "*The operation of the robot was dependable*" for reliability, "*I found using the system to be enjoyable*" for enjoyment, and "*All things considered, I was very satisfied with the interaction with the robot*" for user satisfaction. The participants rated their agreement with the presented statements on a 5-point answer scale (5 = I agree/1 = I do not agree).

### 3.3. Dependent Variables

This section describes the criteria used to compare the interaction control modes based on their performance.

#### 3.3.1. Variables for Quantitative Analysis

The following variables are used to analyze the performance of the users.

*Time of completion*: the time required to move the end effector from the starting point to the target point.

*Procrustes analysis*: a rigid shape analysis that uses isotropic scaling, translation, and rotation to find the best fit between two or more landmarked shapes (Ross, 2004). Procrustes analysis quantifies the similarity between the task trajectory generated by the user and the target trajectory. This criterion reflects the quality and the effectiveness of each control mode. The goodness-of-fit criterion used in this analysis is the sum of squared errors. It returns a measure of dissimilarity *d*; the similarity measure is calculated as *s* = (1 − *d*).
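As an illustration of how *s* = (1 − *d*) can be obtained, the following is a minimal sketch for the planar case only, not the implementation used in the study. It assumes that both paths are sampled at the same number of corresponding landmarks and that reflections are not allowed; the function name `planar_procrustes_similarity` is hypothetical. Points are encoded as complex numbers, so the optimal rotation and isotropic scaling reduce to a single complex factor.

```python
import math


def planar_procrustes_similarity(reference, trajectory):
    """Return s = 1 - d for two planar paths given as lists of (x, y) tuples.

    Assumption: both paths have the same number of points in the same order.
    """
    if len(reference) != len(trajectory) or len(reference) < 2:
        raise ValueError("need two paths with the same number (>= 2) of points")

    def standardize(points):
        # Encode (x, y) as x + iy, remove translation, then remove scale.
        z = [complex(x, y) for x, y in points]
        mean = sum(z) / len(z)
        z = [p - mean for p in z]
        norm = math.sqrt(sum(abs(p) ** 2 for p in z))
        if norm == 0.0:
            raise ValueError("degenerate path: all points coincide")
        return [p / norm for p in z]

    z = standardize(reference)
    w = standardize(trajectory)
    # After standardization, the best rotation and scaling leave a residual
    # sum of squared errors d = 1 - |<z, w>|^2.
    inner = sum(a.conjugate() * b for a, b in zip(z, w))
    d = 1.0 - abs(inner) ** 2
    return 1.0 - d
```

A path that is only a translated, rotated, and uniformly scaled copy of the target yields *s* close to 1, while deviations in shape lower the value toward 0.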

*Smoothness*: a movement is considered smooth when it happens without interruptions. Smoothness is generally used to determine the controllability of a system (Balasubramanian et al., 2015). Hence, a trajectory with maximum smoothness will result in maximal movement efficiency (Burdet et al., 2013). A smooth interaction also ensures a reduced interaction effort on the user side, hence improving the human–robot interface (Olsen and Goodrich, 2012). One of the most commonly used smoothness measures is the number of peaks (NP). The peaks are identified as the maxima of the velocity profile of a given trajectory, see equation (4). This turns smoothness into a measurable quantity (Montes et al., 2014). The total number of peaks in each dimension X, Y, and Z is calculated from the recorded data and the sum of the peaks in X and Y is counted.

$$NP \stackrel{\triangle}{=} \# \left\{ \left( \frac{dX}{dt} \right)_{\text{maximum}} \right\}$$

$$\left( \frac{dX}{dt} \right)_{\text{maximum}} \stackrel{\triangle}{=} \left\{ \frac{dX}{dt} \; ; \; \frac{d^2X}{dt^2} = 0 \ \text{and} \ \frac{d^3X}{dt^3} < 0 \right\}. \tag{4}$$
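On sampled data, the conditions of equation (4) can be approximated by counting strict local maxima of the finite-difference velocity. The sketch below is one possible discretization under the assumption of a constant sampling interval `dt`; the function names are hypothetical and the filtering step is omitted.

```python
def count_velocity_peaks(samples, dt):
    """Count strict local maxima of the finite-difference velocity of one axis."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    velocity = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    peaks = 0
    for prev, cur, nxt in zip(velocity, velocity[1:], velocity[2:]):
        # Discrete stand-in for d2X/dt2 = 0 with d3X/dt3 < 0:
        # the velocity rises into this sample and falls after it.
        if prev < cur and cur > nxt:
            peaks += 1
    return peaks


def number_of_peaks(x, y, dt):
    """NP as the sum of the per-axis peak counts in X and Y."""
    return count_velocity_peaks(x, dt) + count_velocity_peaks(y, dt)
```

Plateaus in the velocity are not counted here because the comparison is strict; whether they should be is a design choice that the text does not specify.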

Another method of quantifying smoothness is to represent it as a function of jerk, which is the time derivative of acceleration (Hogan, 1984), see equation (5). The jerk cost is a scalar that can be used for judging the smoothness of the trajectory (Shadmehr and Wise, 2005). The jerk cost of the individual axes is calculated for each trajectory, and the sum is then taken as the total jerk cost for each user-generated trajectory.

$$\dddot{X}_i = \frac{d^3X_i}{dt^3}, \quad j = \sum_{i=1}^{n} \left\|\dddot{X}_i\right\|_2, \quad X_i = \begin{pmatrix} x_i\\ y_i\\ z_i \end{pmatrix}. \tag{5}$$
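A possible discrete version of equation (5) replaces the third time derivative by a third forward difference. The sketch below assumes uniformly sampled 3D positions with sampling interval `dt`; the function name `jerk_cost` is hypothetical. Because finite differences amplify measurement noise, it presumes data that have already been low-pass filtered.

```python
import math


def jerk_cost(points, dt):
    """Sum of the Euclidean norms of finite-difference jerk vectors.

    points is a list of (x, y, z) samples taken at a constant interval dt.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    total = 0.0
    for p0, p1, p2, p3 in zip(points, points[1:], points[2:], points[3:]):
        # Third forward difference per axis approximates d3X/dt3.
        jerk = [(d - 3.0 * c + 3.0 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        total += math.sqrt(sum(j * j for j in jerk))
    return total
```

A motion with constant acceleration gives a jerk cost of zero, whereas abrupt changes in acceleration increase it.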

*Arc length*: the total length traversed while moving along the given trajectory. It is related to the accuracy in task completion: a larger arc length means that the user deviated more from the intended path. The arc length is calculated according to equation (6).

$$\mathcal{S} = \sum_{i=1}^{n} \sqrt{\Delta x_i^2 + \Delta y_i^2 + \Delta z_i^2}\,. \tag{6}$$
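Equation (6) amounts to summing the straight-line distances between consecutive samples. A minimal sketch, assuming the trajectory is available as a list of (x, y, z) tuples and using the hypothetical function name `arc_length`:

```python
import math


def arc_length(points):
    """Sum of straight-line distances between consecutive 3D samples."""
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))
```

For example, the three samples (0, 0, 0), (3, 4, 0), and (3, 4, 12) give segment lengths of 5 and 12.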

#### 3.3.2. Variables for Qualitative Analysis

To analyze the interaction quality, the participants rated their perception of the interaction quality after each task. Each criterion of interaction quality (ease of use, reliability, external control, enjoyment, and user satisfaction) is briefly described below. Perceived ease of use is one of the main determinants of system use. It is the degree to which a user believes that using a system will be free of effort (Davis, 1989). Reliability and control are system characteristics, influencing how users experience the use of the system. Reliability refers to the degree to which a user believes he or she can depend on the system's operations to be reliable and predictable (Wixom and Todd, 2005). The perceived control is the degree to which a user believes that he or she has control over using the system (Venkatesh and Bala, 2008). Enjoyment and user satisfaction capture affective perceptions of using the system in question. Enjoyment is the extent to which "the activity of using a specific system is perceived to be enjoyable in its own right, aside from any performance consequences resulting from system use" (Venkatesh et al., 2003, p. 351). User satisfaction represents the degree of favorableness the user shows with respect to the system (Wixom and Todd, 2005).

### 4. EXPERIMENT RESULTS

In this section, the criteria for performance and interaction quality are analyzed to evaluate the four control modes for both tasks. A second-order low-pass filter was used to eliminate noise in the data. To analyze whether there are differences between the controllers, we applied repeated measures analysis of variance (ANOVA). This procedure is recommended for comparing the mean values of experimental conditions when the same participants experience all conditions (in this case, the four control modes). The results in this section are reported with the full test statistics (e.g., F(3) = 7.19, p < 0.001). Here, a p-value below the level of significance (0.05) indicates a significant difference between the compared groups. Subsequently, pairwise comparisons (*post hoc* test: Bonferroni) determine which groups differ significantly from each other (level of significance: 0.05). Repeated measures ANOVA has several requirements, the most important being sphericity. If a violation of sphericity is detected, the use of Greenhouse–Geisser corrected tests is recommended (Field, 2013). The corresponding results are reported and interpreted in the same way as described above. **Figure 4** shows the performance of participant 23 while using the different control modes to complete both tasks.
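To make the procedure more concrete, the sketch below computes the F statistic of a one-way repeated measures ANOVA and applies a Bonferroni adjustment to a list of pairwise p-values. It is a simplified illustration, not the statistical software used in the study: the function names are hypothetical, the sphericity check and the Greenhouse–Geisser correction are omitted, and the p-value of the F test is not computed.

```python
def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA.

    scores[i][j] is the value of participant i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions, e.g., the four control modes
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    # Variability between participants is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj
    if ss_error <= 0:
        raise ValueError("no residual variance; F is undefined")
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four control modes there are six pairwise comparisons, so an unadjusted p-value of 0.2 or more is reported as 1.000 after this kind of adjustment.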

### 4.1. Participants

N = 49 users participated in the experiment, of whom 74.5% were male (M age = 31.67, SD age = 10.46) and 78.7% were right-handed. The data from two participants were removed because of inconsistencies in the data, primarily caused by not following the given instructions. Most participants were working full-time (44.7%); 31.9% were students, 10.6% were working part-time, and 4.3% were not working. The educational level was high, with 53.2% having a university degree and 25.5% having a higher vocational education. The participants were recruited through snowball sampling, following an initial advertisement. The user study titled "*Human–Robot Interaction User Study*" has been approved by the Ethics Commission of Bielefeld University.

### 4.2. Hypothesis

Based on the characteristics of the four control modes described earlier, we had the following hypotheses on the outcomes of this comparison: H1: The gravity compensation mode will be faster but less accurate than medium stiffness or high stiffness. H2: The high stiffness mode will be slower but more accurate than medium stiffness or gravity compensation. H3: The medium stiffness mode will be in between gravity compensation and high stiffness mode in terms of time and accuracy. H4: The adaptive stiffness mode outperforms the other modes in terms of time and accuracy.

### 4.3. Drawing Task

#### 4.3.1. Quantitative Analysis

Repeated measures ANOVA showed significant differences between the controllers for all performance criteria. The detailed ANOVA test statistics can be found in **Table 2**. The means and standard deviations of each criterion for the four control modes for both tasks are shown in **Figure 5**. The *post hoc* pairwise comparisons showed the following results: For time, there is a significant difference between Adaptive Stiffness and High Stiffness (*p* < 0.001), while Adaptive Stiffness did not differ significantly from Gravity Compensation (*p* = 1.000) and Medium Stiffness (*p* = 1.000), even though the mean time for Gravity Compensation is slightly lower than that of Adaptive Stiffness and that of Medium Stiffness. This indicates similar performance using Adaptive Stiffness or Gravity Compensation modes.

For procrustes, there is a significant difference between Adaptive Stiffness and Medium Stiffness (*p* = 0.040) as well as between Adaptive Stiffness and Gravity Compensation (*p* = 0.009). High Stiffness and Adaptive Stiffness do not differ significantly (*p* = 1.000), even though the value for High Stiffness is slightly better than that of Adaptive Stiffness. This indicates similar performance using Adaptive Stiffness or High Stiffness modes, both showing better performance than the other modes.

Figure 5 | Error graphs showing means and SDs of each criterion for the four control modes for the contour-following task (C.F.T.) and the drawing task (D.T.). The top figure shows the results of the qualitative analysis, and the bottom figures show the results of the quantitative analysis.

For the number of peaks, there is a significant difference between Adaptive Stiffness and High Stiffness (*p* = 0.007), Adaptive Stiffness and Medium Stiffness (*p* = 0.008), and Adaptive Stiffness and Gravity Compensation (*p* < 0.001). The mean number of peaks for Adaptive Stiffness is lower than that of the other three control modes. This confirms the superior performance of the Adaptive Stiffness mode compared with the other three modes.

For jerk cost, there is a significant difference between the performance of Adaptive Stiffness and Medium Stiffness (*p* = 0.008). The data hint at a difference between Adaptive Stiffness and Gravity Compensation (*p* = 0.131). The performance of High Stiffness and Adaptive Stiffness does not differ significantly (*p* = 1.000), even though the mean jerk for Adaptive Stiffness is better than that of High Stiffness.

For arc length, Gravity Compensation differs significantly from all other controllers (*p* < 0.001). Adaptive Stiffness is similar to Medium (*p* = 0.410) and High Stiffness (*p* = 1.000).

#### 4.3.2. Qualitative Analysis

Repeated measures ANOVA showed significant differences between the controllers for all criteria of interaction quality. For ease of use, Adaptive Stiffness differs significantly from Gravity Compensation (*p* < 0.001) and marginally significantly from High Stiffness (*p* = 0.086). Adaptive Stiffness and Medium Stiffness do not differ significantly; these two modes have the best mean ratings for ease of use.

For enjoyment, Gravity Compensation differs marginally significantly from Medium Stiffness (*p* = 0.082), with a lower mean value for Gravity Compensation. The other controllers do not differ significantly from each other.

For reliability, Gravity Compensation differs significantly from Adaptive Stiffness (*p* = 0.005), High (*p* = 0.014), and Medium Stiffness (*p* = 0.009). The results for control are similar: Gravity Compensation differs significantly from Adaptive Stiffness (*p* = 0.001), High (*p* = 0.011), and Medium Stiffness (*p* = 0.012). For both, Gravity Compensation has the lowest mean rating, while Adaptive Stiffness and the other controllers do not differ.

For user satisfaction, Adaptive Stiffness differs significantly from Gravity Compensation (*p* = 0.001) and High Stiffness (*p* = 0.017). Gravity Compensation differs significantly from Medium Stiffness (*p* = 0.021). The means show that Adaptive Stiffness and Medium Stiffness have the best mean ratings, followed by High Stiffness and Gravity Compensation.

### 4.4. Contour-Following Task

#### 4.4.1. Quantitative Analysis

Repeated measures ANOVA showed significant differences between the controllers for two of the four analyzed performance criteria (time and number of peaks). For time, Adaptive Stiffness is significantly different from High Stiffness (*p* = 0.003). Although the remaining differences are not significant, the mean time for Adaptive Stiffness is lower than that of all other modes. The results show that with the Adaptive Stiffness mode the users are slightly faster.

For number of peaks, there is a significant difference between the performance of Adaptive Stiffness and Gravity Compensation (*p* < 0.001). The differences between Adaptive Stiffness and High Stiffness (*p* = 0.07) and between Adaptive Stiffness and Medium Stiffness (*p* = 0.087) are marginally significant. The mean number of peaks for Adaptive Stiffness is lower than that of the other three control modes.

Even though there are no significant differences for procrustes and jerk cost, there are some points worth mentioning: For procrustes, the mean accuracy of High Stiffness is better than that of the other modes. The results are interesting since in a simple task the performance in accuracy is not much different. In fact, as expected, High Stiffness is slightly better. This is another hint at the need for task-dependent control modes, since the procrustes results for the drawing task show a vast difference. For jerk cost, there is a difference between the performance of the controllers, with Medium Stiffness being slightly better than the other modes.

#### 4.4.2. Qualitative Analysis

Repeated measures ANOVA showed significant differences between the controllers for all criteria of interaction quality. For ease of use, Adaptive Stiffness is similar to Gravity Compensation and High Stiffness. Medium Stiffness differs significantly from High Stiffness (*p* = 0.004), Adaptive Stiffness (*p* = 0.025), and Gravity Compensation (*p* = 0.001). Here, Medium Stiffness clearly excels over the other controllers. The analysis of perceived enjoyment shows a significant difference between the controllers. Medium Stiffness is slightly better than Gravity Compensation and High Stiffness.

For reliability, there is a significant difference between the controllers. Medium Stiffness is slightly better than High Stiffness and Gravity Compensation. External control also showed a significant difference between the controllers. Medium Stiffness differs significantly from Gravity Compensation, and High Stiffness is slightly better than Adaptive Stiffness.

For user satisfaction, Adaptive Stiffness differs significantly from Gravity Compensation (*p* = 0.001) and High Stiffness (*p* = 0.017). Gravity Compensation differs significantly from Medium Stiffness (*p* = 0.021). Here, High and Medium Stiffness have the best ratings, closely followed by Adaptive Stiffness and with the lowest mean ratings for Gravity Compensation.

### 4.5. Statistical Comparison of Results

We conducted factorial repeated measures ANOVAs to find differences between the tasks, the controllers, and their interaction. For this analysis, the factors task (contour-following/drawing) and controller (Gravity Compensation/Adaptive Stiffness/Medium Stiffness/High Stiffness) and their interaction term are included as independent variables. A statistical interaction occurs when the effect of one independent variable on the dependent variable changes depending on the level of another independent variable. A main effect is the effect of one of the independent variables on the dependent variable, ignoring the effects of all other independent variables.

The results of the analysis show whether the criteria of performance and interaction quality differ significantly (a) between the tasks, when the controllers are not considered, (b) between the controllers, when the tasks are not considered, and (c) between the controllers, dependent on the task that is fulfilled. Here, (a) displays the difference in difficulty between the tasks, (b) confirms the results from Sections 4.3 and 4.4, and (c) shows whether the controllers might be able to compensate for effects of task difficulty. The full ANOVA test statistics and the differences of the means (*MDiff* = *Mcontour* − *Mdrawing*) are displayed in **Table 3**.

We did not run this analysis for the criterion time of completion, because the time of completion is highly task specific and its analysis will not give any information about differences in performance caused by task difficulty.

#### 4.5.1. Analysis of Quantitative Performance

The results for data of procrustes analysis show significant main effects for tasks (*Mcontour* = 0.94; *Mdrawing* = 0.58) and for controllers. In addition, there is a significant interaction effect. Here, the difference between the tasks is smaller when the Adaptive Stiffness or High Stiffness controllers are used, compared with Gravity Compensation and Medium Stiffness. For number of peaks, there is only a significant main effect for controllers, but neither a main effect for task nor an interaction effect. For jerk cost, there is a main effect for task (*Mcontour* = 1.35; *Mdrawing* = 1.23), but no main effect for controllers. There is a marginally significant interaction effect. The difference between the tasks is the smallest with Medium Stiffness and the largest with Adaptive Stiffness.

#### 4.5.2. Analysis of Qualitative Performance

For ease of use, there is a significant main effect for task (*Mcontour* = 4.38; *Mdrawing* = 4.16) and for controllers. The interaction effect is significant as well. The difference between the tasks is smaller when the Adaptive Stiffness controller is used, compared with the other controllers. For enjoyment and reliability, there is a main effect for task (*enjoyment*: *Mcontour* = 4.14; *Mdrawing* = 3.89; *reliability*: *Mcontour* = 4.29; *Mdrawing* = 4.03) and for controllers, but there is no significant interaction effect. For control, there is a main effect for task (*Mcontour* = 4.55; *Mdrawing* = 4.35) and for controllers. There is a marginally significant interaction effect. The difference between the tasks is the smallest when the Adaptive Stiffness controller is used, compared with the other controllers. For user satisfaction, there is a main effect for task (*Mcontour* = 4.38; *Mdrawing* = 4.17) and for controllers as well as a significant interaction effect. The difference between the tasks is smaller and in the opposite direction when the Adaptive Stiffness controller is used, compared with the other controllers.

### 5. TASK SPECIFICITY

To learn about the effects of task parameters on the task execution and the individual interaction, the forces that users exerted on the end effector are analyzed in this section. In addition to the forces, the manipulability and human-specific parameters such as arm lengths are analyzed for the drawing task. For the latter part, four users with different body proportions are selected, and their data are analyzed to observe the effects of user-specific parameters on task execution. **Figure 6** shows the human model used for the analysis. For this particular task, the human arm is modeled as a 3-DOF articulated arm with two links. In the human interaction model shown in **Figure 6**, *h* is the height of the user's shoulder, *d* is the distance to the task, and *l*<sub>1</sub> and *l*<sub>2</sub> are the arm parameters. This simplified human arm model is used for further analysis.

The distance to the task is known from the experiment setup; the other human parameters were measured manually. **Table 4** shows the arm parameters of the selected users: *user*1 was the shortest, *user*4 was the tallest, and *user*2 and *user*3 had medium body proportions.

### 5.1. Force Analysis

The forces of one of the study participants while performing the drawing task are shown in **Figure 7**. The green sections in the plot correspond to regions of increasing interaction force, and the red sections correspond to regions of decreasing interaction force. A clear pattern is visible: each peak in the force plot corresponds to a particular section of the task. This strongly suggests a correlation between the task characteristics and the variation of the user interaction forces. Further inspection of the data showed that the observed pattern is apparent for each user who performed the drawing task.



### 5.2. Manipulability

The concept of manipulability was proposed by Yoshikawa (1985) as a quantitative measure of the ability in positioning and orienting of robotic arms. It is useful for conducting a task space analysis of robotic manipulators in terms of their ability to generate the velocity, acceleration, and the exerted forces (Chiacchio, 2000). This information can be used to determine the best configuration for task execution and also for designing experimental setups, which are suited for certain tasks (Vahrenkamp et al., 2012). Petrič et al. (2016) studied the manipulability related to human arm and proposed a method that allows the user to perform tasks in arms configurations which are otherwise unsuitable due to lack of manipulability.

The manipulability is given by the following equation:

$$\mathcal{W} = \sqrt{\det\left(\mathbf{J}_t \mathbf{J}_t^T\right)},\tag{7}$$

where **J**<sub>*t*</sub> is the translational Jacobian.
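The measure in equation (7) can be evaluated for any translational Jacobian given as a list of rows. The sketch below is only an illustration under a strong simplifying assumption: instead of the 3-DOF arm model used in the study, it uses a hypothetical planar two-link arm with link lengths `l1`, `l2` and joint angles `q1`, `q2`, for which the manipulability is known to reduce to *l*<sub>1</sub>*l*<sub>2</sub>|sin *q*<sub>2</sub>|.

```python
import math


def det(matrix):
    """Determinant by Laplace expansion; adequate for the small matrices here."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0.0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total


def manipulability(jacobian):
    """Yoshikawa measure sqrt(det(J J^T)) for an m x n Jacobian (list of rows)."""
    jjt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in jacobian]
           for r1 in jacobian]
    return math.sqrt(max(det(jjt), 0.0))  # clamp tiny negative round-off


def planar_two_link_jacobian(l1, l2, q1, q2):
    """Translational Jacobian of a hypothetical planar two-link arm."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]
```

In this simplified model the measure drops to zero when the arm is fully stretched (*q*<sub>2</sub> = 0), which corresponds to a singular configuration.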

Based on the discussed human model, the variation of manipulability for the drawing task for each human parameter is calculated. **Figure 8** shows the variation of manipulability when each parameter changes. The maximum and minimum manipulability for the task is calculated for each parameter variation and is plotted. It is noticeable that the manipulability increases initially as the parameters vary and suddenly drops after a particular threshold. This points to a possible singularity and hints at the fact that for a particular task there exists a single configuration of the human model that gives optimal performance, or more realistically: for each user, there exists a particular task configuration where the manipulability is maximized.

The manipulability variation for the different users while performing the drawing task was calculated. **Figure 9** shows the results from the analysis of the considered users. There is a clear pattern in the manipulability variation for *user*1, *user*2, and *user*3: for these users, the pattern of manipulability variation along the task is similar and clearly repetitive, while for *user*4 the manipulability variation differs from that of the other users. Another noticeable result is the value of the manipulability and its relation to task accuracy. From **Table 4**, it can be seen that the accuracy of the task, represented as the mean procrustes of each user over four task repetitions, is strongly related to the manipulability. Thus, the human parameters, distance to the task, and height to the task are important factors to be considered when designing tasks involving human–robot interaction. The scalar manipulability measure we explored here does not give the full picture, as our aim was to introduce the useful concept of human-arm manipulability and discuss the importance of considering the human parameters. Hence, considering more extensive facets of manipulability, such as the manipulability ellipsoids discussed in Rozo et al. (2017), is likely to improve existing systems, as it will make it possible to develop control strategies that take into account intricate task characteristics like directional changes that are otherwise hard to model.

### 5.3. Transmission Ratio

The concept of velocity and force transmission ratio is mentioned in Faroni et al. (2016), where the maximization of manipulability in a certain direction was discussed.

For an n-DOF manipulator and m-dimensional task space, Cartesian velocity is given by the following equation:

$$
\dot{\mathbf{x}} = J\dot{\mathbf{q}},\tag{8}
$$

where *ẋ* ∈ ℝ<sup>*m*</sup> is the task velocity, *q̇* ∈ ℝ<sup>*n*</sup> is the joint velocity vector, and *J* is the *m* × *n* Jacobian matrix. The force transmission ratio *α* and velocity transmission ratio *β* can be represented as follows:

$$\alpha = \left\| J^{\dagger} \frac{\dot{\mathbf{x}}}{\|\dot{\mathbf{x}}\|} \right\|, \quad \beta = \frac{1}{\alpha}. \tag{9}$$
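Equation (9) can be evaluated numerically once a Jacobian and a task direction are given. The following sketch is restricted, as an assumption made here for brevity, to a two-dimensional task space with a full-row-rank Jacobian, so that the pseudoinverse can be written as *J*<sup>T</sup>(*J J*<sup>T</sup>)<sup>−1</sup> with a closed-form 2 × 2 inverse; the function name `transmission_ratios` is hypothetical.

```python
import math


def transmission_ratios(jacobian, direction):
    """Return (alpha, beta) for a 2 x n Jacobian and a planar task direction."""
    if len(jacobian) != 2 or len(direction) != 2:
        raise ValueError("this sketch only handles a two-dimensional task space")
    norm = math.hypot(direction[0], direction[1])
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    u = (direction[0] / norm, direction[1] / norm)
    r0, r1 = jacobian
    # Entries of the symmetric 2 x 2 matrix J J^T.
    a = sum(v * v for v in r0)
    b = sum(v * w for v, w in zip(r0, r1))
    d = sum(w * w for w in r1)
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is (near) singular in the task space")
    # y = (J J^T)^{-1} u, then q = J^T y is the minimum-norm solution J^+ u.
    y0 = (d * u[0] - b * u[1]) / det
    y1 = (-b * u[0] + a * u[1]) / det
    q = [r0[i] * y0 + r1[i] * y1 for i in range(len(r0))]
    alpha = math.sqrt(sum(v * v for v in q))
    return alpha, 1.0 / alpha
```

For the diagonal Jacobian [[2, 0], [0, 0.5]], the direction (1, 0) gives *α* = 0.5 and *β* = 2, while the direction (0, 1) gives *α* = 2 and *β* = 0.5, illustrating how the two ratios trade off along different task directions.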

These quantities can be used to maximize the manipulability of a robot along a desired direction (Faroni et al., 2016). Thus, by analyzing these ratios, we can observe the change in direction of the task and its effect on the interaction forces. A higher force transmission ratio results in larger applied forces and a lower error transmission rate. The same effect results from a low velocity transmission ratio due to kineto-static duality. Knowing this information beforehand facilitates the design of kinesthetic teaching and other interaction modes with the human workspace in mind, favoring configurations that permit maximum precision. This also sets benchmarks for training users in industry to accomplish interaction tasks efficiently.

Table 4 | Variation of the arm parameters of four selected users, the predicted manipulability and task accuracy.

Figure 7 | The plot shows one of the participants performing the drawing task. The tracked path, the force variation along the task, and the stiffness variation along the task trajectory are shown in the plot.

Using the simplified human arm model discussed earlier, the transmission ratio for the human arm while executing the task is calculated. **Figure 10** shows the correlation between the force transmission ratio and the interaction forces for the four users while performing the same task using the same control modes under the same conditions. It is clearly noticeable that the transmission factor and the interaction forces are strongly correlated.

### 6. DISCUSSION

From the results discussed in Section 4.3 and by comparing the mean values from **Figure 5**, the users clearly complete the drawing task faster with the (assisted) Gravity Compensation mode, the downside being poor results in terms of both quantitative performance and interaction quality. Meanwhile, the High Stiffness mode is accurate but slower, and its interaction quality is poor. These results verify the hypotheses H1 and H2 stated in Section 4.2. The Personalized Adaptive Stiffness mode shows no significant difference in time of completion compared with the Gravity Compensation mode, and at the same time the smoothness of the Adaptive Stiffness mode is even superior to that of High Stiffness, with a lower number of peaks. The procrustes analysis shows no significant difference between Adaptive Stiffness and High Stiffness. Together, these two results verify the hypothesis H4.

Looking at the criteria for interaction quality, see Section 4.3.2, the Adaptive Stiffness control is clearly preferred over the Gravity Compensation mode concerning ease of use, reliability, control, and overall user satisfaction. Compared with the High Stiffness mode, the Adaptive Stiffness mode is preferred in terms of ease of use, experience of control, and overall user satisfaction.

**Figure 11** shows that the Adaptive Stiffness mode ranks high in every comparison criterion we have used for the drawing task. It has a net rating of 9/10, that is, it ranks top in 9 of the 10 compared criteria. Medium Stiffness with 6/10 is the second best mode, and Gravity Compensation comes last, although it is commonly used in practice. Hence, the online adaptation of stiffness that is personalized for each user achieves the best outcome in terms of interaction quality and performance, although our adaptation scheme is rather simple and directly proportional to the measured force. Given that the level of accuracy of maximum stiffness is almost reached, we hypothesize that a more advanced adaptation scheme may not achieve much better performance. However, it could possibly reduce effort for the user and could be investigated in future research.

Interestingly, the analysis of the contour-following task in Section 4.4.1 and Section 4.4.2 shows that the users prefer the Medium Stiffness mode for completing the contour-following task. It has high user ratings in all the interaction quality criteria. The accuracy of all the modes is similar for this task, and the time of completion for Medium Stiffness and Gravity Compensation is not significantly different, see **Figure 5**.

From **Figure 11**, we can see that the Medium Stiffness mode has the best ranks in the criteria of interaction quality, with an overall rank of 6/10. We can conclude that the more complex the task, the higher the need for adaptation of the robot parameters. For the simple task, a medium stiffness mode is sufficient and results in good interaction quality. This strong difference in the results between the two tasks indicates that task specificity is highly relevant when designing interaction strategies for pHRI.

The results from Section 4.5 show that the Adaptive Stiffness controller is in most cases able to compensate (at least to some degree) for the differences in performance and interaction quality between tasks of different difficulty. The results are clearer for the criteria of interaction quality than for the performance criteria. Hence, the Personalized Adaptive Stiffness mode is still performing better. The performance could be augmented by combining both these factors, i.e., having personalization and inclusion of task-specific parameters.

The variation of interaction forces of one participant while performing the drawing task is shown in **Figure 7**. Visual inspection shows that the observed pattern extends to each user who performed the drawing task and that this pattern is task dependent. This variation of force is a clear task-specific parameter, and this information could be used constructively to improve the user interaction by incorporating it while designing the task. From the results discussed in Section 5.3, we can infer that this correlation is a result not only of the task specificity but also of the user kinematics. The manipulability measure discussed in Section 5.2 and the transmission ratio results discussed in Section 5.3 clearly point out the effects of task dependency and, in addition, strongly indicate that estimation and inclusion of human-specific parameters are also important for better task design. By including these parameters, systems can be designed in such a way that the users never run into singularities of their arm configurations, and at the same time the task can be pre-optimized from an ergonomic perspective.

In addition, from the presented results, we can hypothesize that by using the kinematics of the human arm and in turn calculating its manipulability over a given task, it is possible to quantify and predict the performance of a user for a given task and task configuration. Hence, considering the human manipulability will help improve pHRI further, since it is possible to adapt the task configuration or the robot parameters to compensate for the changes in human manipulability. If we optimize the human manipulability online, this will lead to an adaptation scheme that maximizes user performance and user comfort. Such an adaptation can be used in parallel with a personalized adaptation mode, which by itself adapts only to the varying user forces. This combination can be used quite conveniently by the users to overcome difficulties arising from task configuration and physical constraints, since it adapts to both task and physical characteristics.

### 7. CONCLUSION

The analysis of the data collected from 49 users from the user study clearly supports the hypothesis that Personalized Adaptive control takes pHRI to the next level, if the task is sufficiently complex. Although the personalization scheme tested here is relatively simple and calibrated only for the force limits of the users, the experiments clearly show that the Personalized Adaptive control was suited for collaborative task execution and

### REFERENCES


will result in good performance. In addition, a medium stiffness mode will give satisfactory results for a simple task and complex adaptations may not guarantee better results in such scenarios. The inferences drawn from the second experiment along with the inference drawn form the analysis of task specificity support that consideration of more human factors could not only further improve the system as a whole but also enhance the user's experience and satisfaction.

Further results show that deploying a human model coupled with task parameters may result in efficient physical human–robot interaction. The human manipulability that we discussed combines both the task characteristics and the human kinematics in a meaningful way and gives us a relative performance measure, which can be used for improving HRI. While we acknowledge that these results need further investigation, the observed strong correlations suggest promising research ideas for our future work. In particular, we would like to perform more comprehensive user studies with both expert and inexperienced users. Furthermore, we will investigate the idea of incorporating these results in industrial HRI scenarios, where the humans' ease and comfort is used to reconfigure the task and the robot configuration.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the "Ethics Committee, Bielefeld University". All subjects gave oral informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the "Ethics Committee, Bielefeld University". The title of the permitted study is "Human–Robot Interaction User Study".

### AUTHOR CONTRIBUTIONS

SG contributed to programming of the robot and the experiment setup, conducting the user study, and analysis of the data. SÖ contributed to the study design involving human users and analysis of the data. JS contributed to the experiment design, development of the conceptual idea, data analysis and evaluation. All authors contributed to writing the paper.

### ACKNOWLEDGMENTS

This work is done as a part of "*Human Centered Cyber-Physical Systems in Industry 4.0*" project in *NRW Fortschrittskolleg: Gestaltung von flexiblen Arbeitswelten—Menschenzentrierte Nutzung von Cyber-Physical Systems in Industrie 4.0*.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, LR, and handling Editor declared their shared affiliation.

*Copyright © 2017 Gopinathan, Ötting and Steil. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **A Hybrid Framework for Understanding and Predicting Human Reaching Motions**

*Ozgur S. Oguz\*, Zhehua Zhou and Dirk Wollherr*

*Department of Electrical and Computer Engineering (EI), Technical University of Munich (TUM), Munich, Germany*

Robots collaborating naturally with a human partner in a confined workspace need to understand and predict human motions. For understanding, a model-based approach is required as the human motor control system relies on the biomechanical properties to control and execute actions. The model-based control models explain human motions descriptively, which in turn enables predicting and analyzing human movement behaviors. In motor control, reaching motions are framed as an optimization problem. However, different optimality criteria predict disparate motion behavior. Therefore, the inverse problem—finding the optimality criterion from a given arm motion trajectory—is not unique. This paper implements an inverse optimal control (IOC) approach to determine the combination of cost functions that governs a motion execution. The results indicate that reaching motions depend on a trade-off between kinematics and dynamics related cost functions. However, the computational efficiency is not sufficient for online prediction to be utilized for HRI. In order to predict human reaching motions with high efficiency and accuracy, we combine the IOC approach with a probabilistic movement primitives formulation. This hybrid model allows an online-capable prediction while taking into account motor variability and the interpersonal differences. The proposed framework affords a descriptive and a generative model of human reaching motions which can be effectively utilized online for human-in-the-loop robot control and task execution.

**Keywords: inverse optimal control, human motion modeling, reaching motion prediction, human-in-the-loop control, human–robot collaboration, probabilistic movement primitives**

### **1. INTRODUCTION**

As robots become more present in our social lives, the necessity for interaction and collaboration between humans and robots is becoming more apparent. Although there are several major facets of providing robots with such capability, e.g., motion planning or decision-making, the human aspect has to be prioritized and integrated into robot interaction skills. The requirements for such a human-in-the-loop formulation are twofold: describe (*understand*) how human motions are controlled and generate (*predict*) human-like motions. A descriptive model helps us understand how the biomechanical properties are used by the central nervous system (CNS) for controlling the human body to execute a vast collection of motor behaviors. Such an understanding is useful for a multitude of problems, e.g., motor performance evaluation for detecting disabilities due to neural disorders by comparing control models of patients and healthy subjects (Manto et al., 2012); sports performance evaluation by analyzing the identified control models of athletes (Yarrow et al., 2009); detection of deviations of personal motion behaviors w.r.t. the previously identified motor control models, e.g., due to exhaustion (Shadmehr et al., 2010). Specifically, for human–robot interaction (HRI), the robot can plan its motions in a way that allows the human partner to rely more on energy-efficient control models. In addition, person-specific control models enable the robot to detect the underlying cause of behavioral anomalies for providing better assistance and safety.

#### *Edited by:*

*Erhan Oztop, Özyegin University, Turkey*

#### *Reviewed by:*

*Melih Kandemir, Özyegin University, Turkey Takamitsu Matsubara, Nara Institute of Science and Technology (NAIST), Japan*

> *\*Correspondence: Ozgur S. Oguz o.oguz@tum.de*

#### *Specialty section:*

*This article was submitted to Humanoid Robotics, a section of the journal Frontiers in Robotics and AI*

*Received: 14 June 2017 Accepted: 05 March 2018 Published: 27 March 2018*

#### *Citation:*

*Oguz OS, Zhou Z and Wollherr D (2018) A Hybrid Framework for Understanding and Predicting Human Reaching Motions. Front. Robot. AI 5:27. doi: 10.3389/frobt.2018.00027*

A generative model allows estimating human-like motion trajectories. In this work, the focus is on using such models to *predict* human motions, rather than transferring them to robots to generate human-like movement behaviors. For close dyadic collaboration, where the partners share a workspace with the possibility of overlapping motions, they should be able to predict each other's intent and the motion required to support this intention. Considering how swiftly two humans work together in a confined workspace, the challenges for a human–robot team become obvious: the robot has to take into account the human partner's intention and movement in order to control its own motion for achieving effective cooperative task execution. In essence, early prediction of the human motion allows an immediate initiation of the replanning process and an early adaptation of the robot motion (Dinh et al., 2015; Gabler et al., 2017; Oguz et al., 2017). Therefore, the ability to understand and predict human motions effectively is the key to achieving swift close human–robot collaboration.

The focus in this work is twofold. First, descriptive models of human reaching motions are investigated and experimentally evaluated. Second, a hybrid framework is proposed, which combines those descriptive models with a data-driven probabilistic approach and realizes online-capable human motion prediction (**Figure 1**). Such a framework not only enables effective robot control for human-in-the-loop scenarios; its models can also be used directly for controlling the robot.

Currently, there is no commonly accepted model that explains how the human CNS controls human motions and the latent biomechanical properties of the human motion are not fully understood. Knowing the underlying principles of human motion execution is essential for reproducing human-like motion behaviors accurately in a given setting. However, not every single person exhibits the same motion patterns. These differences might be due to their learning experiences and physiological differences (Rosenbaum, 2009). Moreover, even the motion behaviors of the same person show variations due to motor noise (Todorov and Jordan, 2002). Considering all those intricacies, finding motion behavior models, even for simple reaching tasks, poses challenging research questions.

While the observations of the behavioral aspect of human motions suggest an appealing modeling problem, the human body as a biomechanical system introduces challenges in terms of formulating methods for finding those models. Motor control redundancy and the non-linear characteristic of the human arm as a dynamical system are the most important problems to tackle. A common feature of motor control is that the task requirements can be met by infinitely many diverse movements. Thus, stating only the boundary conditions of the motion for given dynamics leads to an ill-defined problem. The ambiguity caused by this problem can be resolved if an optimality principle is applied. Accordingly, the basis of many scientific theories on human motor control is formed by optimality principles (Engelbrecht, 2001). A large number of models of open-loop motor control exist and each model claims to describe human motion, but several models are incompatible with others (Todorov, 2004). The characteristics of the human arm movements and the human as an organism define the starting point for the derivation of a cost function. Many cost functions have been proposed to model human reaching motions; however, all of those methods are only verified for specific settings, mostly in 2D (Flash and Hogan, 1985; Uno et al., 1989; Harris and Wolpert, 1998). Hence, their generalization capability to a wider range of human reaching motion behavior in 3D space is unclear. Moreover, as some recent studies suggest, humans might be optimizing two classes of cost functions, one for kinematics and the other for dynamics (Berret et al., 2011; Albrecht et al., 2012). However, finding the contribution of such multiple cost functions is also not trivial, as it is a non-linear optimization problem.

Building on the results of prior research studies and their insights, we hypothesize that humans utilize multiple models, rather than a single one, to control their motions. Since kinematics is essential for producing smooth motions, and the human arm is a dynamical system, it is reasonable to consider kinematics and dynamics related costs in combination. Hence, we identify possible costs from the literature to account for both aspects. In order to find the contribution of each model to the realization of human motion behaviors, we frame such an inverse optimal control (IOC) problem as a bi-level optimization formulation. However, this formulation treats human motion generation as a deterministic problem. In essence, it is only suitable for modeling average behavior over a group of humans. In order to account for both intra- and interpersonal motion variability, we propose a hybrid framework that extends the IOC formulation with a data-driven probabilistic method. Specifically, by utilizing probabilistic movement primitives (ProMPs), our framework allows for integrating person-specific variations into the IOC-based average motion behavior models during online interaction. Therefore, we can learn a distribution of motion behavior per person and roll out predictive trajectories from this distribution online, while at the same time updating the multiple model representation to describe the person-specific cost optimization behavior.

We conducted a comprehensive experiment in 3D (**Figure 2**) that covers significantly more cases than prior studies (Albrecht et al., 2011). This extended experiment provides us with critical insights on the interplay between the parameters of the reaching tasks and the contribution of kinematics and dynamics related models. We identify a trade-off between those models with respect to the initial and final joint angle configurations. With the proposed hybrid framework, we are able to determine personal preferences as well as the motor variability per person. It also enables accurate and computationally efficient online prediction of human motion behaviors, which can be integrated into any human–robot collaboration scenario.

In this work, we focus on building descriptive as well as generative models for human motion behavior. By utilizing such models, we aim for efficient and accurate prediction of human motions during human–robot collaboration to realize a natural interaction between partners. To that end, the main contributions of this paper are:


### **2. RELATED WORK**

Many experimental studies have revealed that arm motions exhibit invariant parameters which do not significantly change with movement speed, load, or direction (Soechting and Lacquaniti, 1981; Lacquaniti and Soechting, 1982; Papaxanthis et al., 2003). For motor control, these parameters are utilized to describe point-to-point reaching motions (Soechting and Flanders, 1991). It is assumed that the CNS follows some specific principles when planning the motions (Engelbrecht, 2001). Therefore, optimal control theory becomes the central mathematical formulation to model, describe, and understand motor control by the CNS (Bertsekas et al., 1995; Todorov, 2004), as it emphasizes the optimality of biological movements by minimizing some performance criteria. In the literature, several optimal control models have been proposed to describe point-to-point arm movements, e.g., the minimum hand jerk (Flash and Hogan, 1985), the minimum torque change (Uno et al., 1989), and the minimum variance (Harris and Wolpert, 1998). These models have proven effective in representing the experimental data. However, they are only verified within specific settings and exhibit, in some cases, dissimilar patterns. Hence, the exact variables optimized in the brain still remain unclear. Later studies suggest that, instead of a single cost function, the CNS might actually consider a weighted combination of costs during the optimization (Cruse and Brüwer, 1987; Rosenbaum et al., 1995; Desmurget et al., 1998; Wolpert and Kawato, 1998; Gielen, 2009). It has already been verified that a trade-off between the objective (task-related) and the subjective (subject-related) cost functions exists in the CNS (Liu and Todorov, 2007); however, there is still no clear explanation of how the subjective costs are combined in reaching motions. In Berret et al. (2011), this cost combination hypothesis was tested in point-to-bar reaching motions on a vertical 2D plane. An inverse optimal control framework, which was initially proposed in Mombaur et al. (2010) for locomotion planning, was applied to identify the contribution of different cost functions. Though their results support the idea of combined cost functions, an in-depth analysis of how this combination is formed in 3D reaching motions, and of whether there is a relationship between the degree of contribution and the reaching task parameters, is still missing.

Inverse reinforcement learning (IRL), sometimes used synonymously with inverse optimal control (IOC), is another line of formulation for finding control models or optimal policies given some demonstrations or observations. However, most state-of-the-art methods operate on features rather than raw states, without relying on the dynamical system as a hard constraint on the optimization problem. In essence, the best combination of features, which are extracted while an agent interacts with the environment, is solved for minimizing a pre-defined cost function (Ziebart et al., 2008; Ratliff et al., 2009; Theodorou et al., 2010; Levine et al., 2011; Mainprice et al., 2016). A recent approach by Finn et al. (2016) extends such an IRL formulation by using neural networks to parameterize the cost function, which removes the requirement of defining informative features. Essentially, this approach learns non-linear cost functions from user demonstrations at the same time as learning a policy to perform the task. This formulation can be applied to complex, non-linear cost function representations and high-dimensional problems. However, it is still not directly comparable to solving optimal control problems where the dynamical system is a constraint at each time step, and hence the resulting behaviors are not guaranteed to be generated by the underlying model.

In contrast to creating an optimal control model, another approach to predict human motions is to use data-driven methods. These methods focus more on finding a representation from a given data set (Mainprice and Berenson, 2013; Koppula and Saxena, 2016). Statistical approaches require training data to discover patterns for different arm motions. In that sense, a rigorous and time-consuming data collection process is unavoidable. Other data-driven approaches which do not rely on statistical formulations, e.g., dynamic movement primitives (DMPs) (Ijspeert et al., 2013), require only a minimal set of training data. In an earlier work, we combined optimal control models with DMPs to predict human reaching motion behaviors while avoiding obstacles (Oguz et al., 2016). In that regard, Interaction Primitives (IPs) were developed based on DMPs as a compact representation of a dyadic activity to predict and plan interaction behaviors (Amor et al., 2014). IPs are learned as a distribution over DMP parameters by training on the trajectories of two interacting partners. These IPs encode reciprocal dependencies of dyad movements during the execution of a specific task. The robot then mimics one partner by using the learned model to interact with a human in a similar task. In essence, the learned distributions are conditioned on an observed partial trajectory to predict a human partner's movement for the rest of the task, which in turn enables the robot to correlate its own motion w.r.t. the human to achieve a successful cooperation. Recently, Environment-adaptive Interaction Primitives (EaIPs) were proposed by extending the IPs with the integration of environmental parameters of the task (Cui et al., 2016). Hence, EaIPs enable inferring movement behavior by conditioning not only on the partner trajectory but also on the task and environment related features. However, these are purely data-driven approaches, and thus they can neither elicit the underlying principles of human interaction movement control nor provide any means to analyze the optimality of observed movements. In addition, our proposed hybrid framework is flexible enough to integrate such interaction primitives as the data-driven part of the formulation to predict human motions, which can further be integrated into a trajectory optimizer for robot motion planning in HRI scenarios (Oguz et al., 2017).

Finally, human motor control by the CNS is recognized as a stochastic system (Todorov and Jordan, 2002); thus, the variance of the motion should be considered in the trajectory prediction. In Paraschos et al. (2013), a probabilistic movement primitives (ProMPs) approach was proposed with the ability to encode the variance in a general probabilistic framework for representing and learning movement primitives (Schaal et al., 2005). ProMPs have been successfully implemented in human–robot interaction (Wang et al., 2013) and human–robot collaboration (Maeda et al., 2016a,b) scenarios for controlling the robot motion. For close cooperation between the robot and the human, a precise prediction of the human behavior is essential (Mainprice and Berenson, 2013). However, when predicting human motions with ProMPs, the integration of the kinematics and dynamics of the human arm is not straightforward. Our work combines an optimal control model with ProMPs in order to make use of the advantages of both methods.
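The conditioning step that makes primitive-based prediction possible can be sketched in a few lines. The example below assumes a one-dimensional trajectory $y(t) = \phi(t)^\top w$ with Gaussian weights $w \sim \mathcal{N}(\mu, \Sigma)$ and updates this distribution with a single observed point by standard Gaussian conditioning. The basis functions, the prior, the noise level, and all function names are hypothetical choices for illustration and are not taken from the framework proposed in this paper.

```python
import math

def basis(t, n_basis=5, width=0.02):
    """Normalized Gaussian basis functions evaluated at phase t in [0, 1]."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-((t - c) ** 2) / (2.0 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def variance_at(phi, sigma):
    """Predictive variance phi^T Sigma phi of the trajectory at one phase."""
    n = len(phi)
    return sum(phi[i] * sigma[i][j] * phi[j] for i in range(n) for j in range(n))

def condition(mu, sigma, phi, y_obs, noise_var=1e-6):
    """Condition N(mu, sigma) on one noisy observation y_obs = phi^T w (sigma symmetric)."""
    n = len(mu)
    sigma_phi = [sum(sigma[i][j] * phi[j] for j in range(n)) for i in range(n)]
    s = noise_var + sum(phi[i] * sigma_phi[i] for i in range(n))
    gain = [v / s for v in sigma_phi]
    residual = y_obs - sum(phi[i] * mu[i] for i in range(n))
    mu_new = [mu[i] + gain[i] * residual for i in range(n)]
    sigma_new = [[sigma[i][j] - gain[i] * sigma_phi[j] for j in range(n)]
                 for i in range(n)]
    return mu_new, sigma_new

# Hypothetical prior: zero mean, independent weights with variance 0.1.
n = 5
mu_prior = [0.0] * n
sigma_prior = [[0.1 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Observe the trajectory value 0.2 at phase 0.3 and update the distribution.
phi = basis(0.3)
mu_post, sigma_post = condition(mu_prior, sigma_prior, phi, y_obs=0.2)
mean_at_obs = sum(p * m for p, m in zip(phi, mu_post))
var_prior = variance_at(phi, sigma_prior)
var_post = variance_at(phi, sigma_post)
```

After the update, the predicted mean passes close to the observed point and the predictive variance at that phase shrinks, which is the mechanism by which a partially observed movement constrains the prediction of its remainder.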

### **3. OPTIMALITY CRITERIA FOR HUMAN REACHING MOTIONS**

In this section, we explain the formulation of finding the optimality criteria for human reaching motions in 3D. Many of the influential studies in neuroscience have relied on the hypothesis that the human as a biological entity should minimize a quantitative measure (Engelbrecht, 2001). Based on this, the reaching motion can be formulated as an optimal control problem (OCP), where a given cost function is optimized and used to describe the motion characteristic. Later studies on motor control, learning, and adaptation suggest that instead of a single cost function, a composite of different performance criteria can better explain human behaviors (Berret et al., 2011). In order to identify how these cost functions are combined, an inverse optimal control framework is presented in this section. Through this framework, we attempt to reveal the underlying principles of human reaching behavior while utilizing those models also for predicting human motions.

### **3.1. Model of the Musculoskeletal System**

To formulate the reaching motions as an OCP, a representation of the arm dynamics is required, which serves as a constraint during the optimization. A widely used approximation of the arm model in 3D is to consider it as a chain of articulated rigid bodies. By ignoring the hand movements, a common model of the musculoskeletal system for the arm consists of four degrees of freedom (DoFs) (Van der Helm, 1994a,b), where the shoulder joint has three rotations (roll, yaw, and pitch) and the elbow joint has one rotation (pitch). In our experiments, the recorded 3D reaching motions barely use the roll rotation of the shoulder joint; thus, it is neglected in our model. This simplification can greatly increase the computational efficiency of the OCP while still preserving sufficient accuracy of the results. From the classical Lagrangian formalism (Murray et al., 1994), the dynamics of the 3-DoF arm model can be expressed as

$$
\tau = M(q)\ddot{q} + C(q, \dot{q})\dot{q} + G(q), \tag{1}
$$

where the variable $q = (q_1, q_2, q_3)^\top$ denotes the joint angles and $\tau = (\tau_1, \tau_2, \tau_3)^\top$ represents the torques. The time derivatives of $q$, i.e., $\dot{q}$ and $\ddot{q}$, are the joint angle velocities and joint angle accelerations, respectively. $M$, $C$, and $G$ are the inertia matrix, the Coriolis/centripetal terms, and the gravitational vector, respectively. The viscous friction and the elastic properties of the tissues are neglected as they are difficult to estimate. A visualization of the arm model is presented in **Figures 2B,C**. The upper arm length and the forearm length, as well as the mass, inertia, and distance to the center of mass, are defined as described in Lemay and Crago (1996) and Valero-Cuevas et al. (2009). When the arm is in the fully stretched-out position, $q_1$, $q_2$, and $q_3$ are all zero.
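As a small illustration of how equation (1) is evaluated, the sketch below computes the joint torques from given $M$, $C$, and $G$ terms. It uses a single point-mass pendulum as a stand-in, since the terms of the 3-DoF arm depend on segment parameters that are not reproduced here; the pendulum, its parameters, and the function names are assumptions for illustration.

```python
import math

def inverse_dynamics(M, C, G, dq, ddq):
    """Evaluate tau = M*ddq + C*dq + G for nested-list matrices M, C and vector G."""
    n = len(G)
    return [
        sum(M[i][j] * ddq[j] + C[i][j] * dq[j] for j in range(n)) + G[i]
        for i in range(n)
    ]

def pendulum_terms(q, mass=1.0, length=0.5, gravity=9.81):
    """M, C, G of a point-mass pendulum; q is measured from the downward vertical."""
    M = [[mass * length ** 2]]
    C = [[0.0]]  # a single joint has no Coriolis/centripetal coupling
    G = [mass * gravity * length * math.sin(q[0])]
    return M, C, G

# Torque needed to hold the pendulum horizontally, and to accelerate it from there.
q = [math.pi / 2]
M, C, G = pendulum_terms(q)
tau_hold = inverse_dynamics(M, C, G, dq=[0.0], ddq=[0.0])
tau_accel = inverse_dynamics(M, C, G, dq=[0.0], ddq=[2.0])
```

In an OCP this relation is not evaluated once but imposed as a constraint at every point of the trajectory, which is why the reduced number of degrees of freedom matters for computation time.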

### **3.2. Inverse Optimal Control as a Bi-Level Optimization Problem**

The purpose of IOC is to identify the formulation of the OCP, specifically the cost function it optimizes, which best reproduces the observations. A numerical method for solving an IOC problem is to reformulate it as a bi-level optimization problem (Berret et al., 2011). This method relies on the assumption that the optimal cost function is a composite of several plausible basic cost functions. The contribution of each basic cost function is defined through a weight vector, and this weight vector is identified by using the bi-level optimization framework presented in equation (2)

$$
\begin{aligned}
\text{Upper level program:} \quad & \min_{\alpha} \Phi(x_{\alpha}^{*}, x^{obs}), \quad \text{with} \quad \sum_{i=1}^{N} \alpha_{i} = 1, \\
& \Updownarrow \\
\text{Lower level program:} \quad & \min_{x, u} J(x, u \mid \alpha) := \sum_{i=1}^{N} \alpha_{i} J_{i}, \\
& \text{s.t.} \quad g(x, u) \le 0, \quad h(x, u) = 0.
\end{aligned} \tag{2}
$$

#### 3.2.1. Lower Level Program

The lower level program of the bi-level optimization is a direct OCP (Bertsekas et al., 1995) given by

$$
\min_{x, u} J(x, u \mid \alpha) := \sum_{i=1}^{N} \alpha_{i} J_{i}, \quad \text{s.t.} \quad g(x, u) \le 0, \quad h(x, u) = 0. \tag{3}
$$

The goal of the OCP is to find the optimal trajectory which minimizes a given cost function $J$. Here, $J$ is assumed to be a linear combination of $N$ basic cost functions $J_i$ ($i = 1, \ldots, N$), which are weighted by the weight vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)$. The variables $x$ and $u$ are the vectors of system states and control signals, respectively. With the arm model explained above, the system states in this work are given as $x^\top = (q^\top, \dot{q}^\top, \ddot{q}^\top)$. Since the joint torques $\tau$ are smoothly generated by muscle contractions (Uno et al., 1989), the control signals are defined as the time derivative of the torques, $u = \dot{\tau}$. Thus, the OCP of reaching motions can be stated mathematically as: *find the admissible system state trajectory $x_{\alpha}^{*}(t)$ and control signal trajectory $u_{\alpha}^{*}(t)$ in time $T$, which minimize the cost function $J$ with respect to a given weight vector $\alpha$, while satisfying the system dynamics and the task constraints.* For reaching motions, the task constraints contain two parts: the initial condition $x(0) = x_s$ and the final condition $x(T) = x_e$ as the boundary constraints, and the limits on the joint angles $q_{\min} \le q(t) \le q_{\max}$ as the inequality constraint. The constraints on the joint angle velocities and joint angle accelerations are set to a large range, since the preliminary analysis showed that they are rarely active.

One classical method to solve an OCP is to first transform it into a constrained non-linear programming (NLP) problem and then solve it with structure-exploiting numerical NLP solution methods. In our work, we utilize the multiple shooting method (Diehl, 2011) with the ACADO toolkit (Houska et al., 2011) to solve the OCPs.

#### 3.2.2. Selection of Basic Cost Functions

#### **TABLE 1** | Cost functions proposed in literature.

*Variables x, y, z are the hand positions in Cartesian space. M denotes the inertia matrix. Corresponding references for the proposed criteria are given as: minimum hand jerk (Flash and Hogan, 1985), minimum joint angle acceleration (Ben-Itzhak and Karniel, 2008), minimum joint angle jerk (Wada et al., 2001), minimum torque change (Uno et al., 1989), minimum torque (Nelson, 1983), minimum absolute work (Nishii and Murakami, 2002; Berret et al., 2008), and minimum geodesic (Biess et al., 2007).*

The core part of the IOC framework is to select a set of reasonable basic cost functions. For arm movements, several cost functions were proposed in the past. These cost functions can be categorized into subjective and objective cost functions. Subjective cost functions refer to the decision of a subject, such as the minimum hand jerk (Flash and Hogan, 1985), while objective cost functions are task related. Since the integration of objective cost functions into the OCP is difficult, only subjective cost functions are considered in this work. In the literature, various subjective cost functions have proven useful in explaining human reaching motions (see **Table 1**). Generally, these cost functions can be grouped into two classes: (a) *kinematic cost functions*, typical examples of which are the minimum hand jerk (Flash and Hogan, 1985), the minimum joint angle acceleration (Ben-Itzhak and Karniel, 2008), and the minimum joint angle jerk (Wada et al., 2001); and (b) *dynamic cost functions*, which include the minimum torque change (Uno et al., 1989), the minimum torque (Nelson, 1983), and the minimum absolute work (Nishii and Murakami, 2002; Berret et al., 2008) (also referred to as minimum energy throughout this work). Finally, the minimum *geodesic* criterion (Biess et al., 2007) combines kinematic and dynamic cost functions: it yields the shortest path in configuration space while taking the kinetic energy into consideration. An example of the optimal end-effector trajectories solved from OCPs with respect to different basic cost functions is given in **Figure 3**. Among these proposed cost functions, we select five as the basic cost functions for IOC: the minimum hand jerk $J_{HJ}$, the minimum joint angle jerk $J_{JJ}$, the minimum torque change $J_{TC}$, the minimum energy $J_{Enr}$, and the minimum geodesic $J_{Geo}$. The minimum joint angle acceleration is omitted since its solution is very similar to that of the minimum joint angle jerk, which makes it difficult to distinguish between these two cost functions. In addition, the minimum torque criterion is also omitted because in our preliminary tests we found that it has the largest error in describing the reaching motions. Thus, the combined cost function $J$ for the direct OCP is defined as

$$
J = \alpha_1 J_{HJ} + \alpha_2 J_{JJ} + \alpha_3 J_{TC} + \alpha_4 J_{Geo} + \alpha_5 J_{Enr}. \tag{4}
$$
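To give a feeling for one of the basic cost functions, the following sketch evaluates a hand jerk cost for a straight rest-to-rest movement. It uses the well-known closed-form minimum-jerk profile of Flash and Hogan (1985) and assumes that the cost is the time integral of the squared Cartesian hand jerk without any constant factor; that scaling, the example positions, and the function names are assumptions for illustration.

```python
def min_jerk_position(start, goal, T, t):
    """Hand position on the straight minimum-jerk path from start to goal at time t."""
    s = t / T
    blend = 10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5
    return tuple(a + (b - a) * blend for a, b in zip(start, goal))

def min_jerk_jerk(start, goal, T, t):
    """Third time derivative of the minimum-jerk position profile."""
    s = t / T
    shape = (60 - 360 * s + 360 * s ** 2) / T ** 3
    return tuple((b - a) * shape for a, b in zip(start, goal))

def hand_jerk_cost(start, goal, T, n_steps=1000):
    """Midpoint-rule approximation of the integral of squared hand jerk over [0, T]."""
    dt = T / n_steps
    total = 0.0
    for k in range(n_steps):
        jerk = min_jerk_jerk(start, goal, T, (k + 0.5) * dt)
        total += sum(j * j for j in jerk) * dt
    return total

# Hypothetical reach of 0.5 m in 1 s.
start, goal, T = (0.0, 0.0, 0.0), (0.3, 0.0, 0.4), 1.0
midpoint = min_jerk_position(start, goal, T, T / 2)
cost = hand_jerk_cost(start, goal, T)
```

For this profile the integral has the analytic value $720\,D^2/T^5$, where $D$ is the reach distance, so the cost grows steeply as the movement becomes faster. The other basic costs carry different units and magnitudes, which motivates the scaling discussed next.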

Another important issue in combining the basic cost functions is that, due to their different units, the ranges of the objective values of the different cost functions usually differ considerably, so they cannot be compared directly on equal terms in equation (4). To overcome this problem, we introduce a scalar factor vector $S$, whose purpose is to balance the objective values of the selected basic cost functions to the same range. Thus, equation (4) is transformed into

$$
J = \sum_{i} S_{i} \alpha_{i} J_{i}, \quad i \in \{HJ, JJ, TC, Geo, Enr\}. \tag{5}
$$

To obtain the scalar factor vector for a given reaching task, five optimal trajectories $x_i^{*}$, one for each basic cost function, are first computed by solving the corresponding OCPs. Based on the results, the range of the objective value of each basic cost function can be defined through its minimal and maximal values as $\mathit{Range}_i = [J_{i,\min}, J_{i,\max}]$. Since all selected basic cost functions are integral cost terms that always produce non-negative values during the optimization, the minimal values are zero for all cost functions, $J_{i,\min} = 0$. The scalar factor vector can then be generated directly by comparing the maximal values $J_{i,\max}$. In our experimental data, we found that the minimum joint angle jerk $J_{JJ}$ tends to have the largest maximal objective value. Therefore, we set the scalar factor of the minimum joint angle jerk to 1, and the ratios between the other basic cost functions and the minimum joint angle jerk are chosen as the corresponding scalar factors

$$
S_i = \frac{J_{i,\max}}{J_{JJ,\max}}. \tag{6}
$$

Note that the scalar factor vector changes whenever the initial condition $x_s$ or the final condition $x_e$ changes. Thus, before running the IOC for each given observation, the scalar factor vector needs to be determined in order to ensure the accuracy of the result.
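The bookkeeping of equations (5) and (6) can be sketched as follows. The maximal objective values, the weights, and the cost values are made-up numbers chosen only to show the computation, and the function names are hypothetical.

```python
def scaling_factors(j_max, reference="JJ"):
    """Equation (6): ratio of each maximal objective value to that of the reference cost."""
    return {name: value / j_max[reference] for name, value in j_max.items()}

def combined_cost(alpha, scale, costs):
    """Equation (5): weighted and scaled sum of the basic cost values."""
    return sum(scale[name] * alpha[name] * costs[name] for name in costs)

# Hypothetical maximal objective values for one reaching task.
j_max = {"HJ": 40.0, "JJ": 200.0, "TC": 100.0, "Geo": 2.0, "Enr": 10.0}
S = scaling_factors(j_max)

# Hypothetical weights (summing to 1) and basic cost values of one candidate trajectory.
alpha = {"HJ": 0.2, "JJ": 0.2, "TC": 0.2, "Geo": 0.2, "Enr": 0.2}
costs = {"HJ": 10.0, "JJ": 50.0, "TC": 20.0, "Geo": 1.0, "Enr": 4.0}
J = combined_cost(alpha, S, costs)
```

Because the maximal values depend on the boundary conditions, `scaling_factors` would have to be recomputed for every new start or end configuration, as noted above.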

#### 3.2.3. Upper Level Program

The purpose of the upper level program is to find the optimal weight vector *α*\* that minimizes the distance error between the optimal trajectory *x*<sub>*α*</sub><sup>∗</sup> obtained from the lower level program and the observation *x*<sup>obs</sup>. This optimization problem can be represented as

$$\min\_{\boldsymbol{\alpha}} \Phi \left( \boldsymbol{x}\_{\alpha}^{\*}, \boldsymbol{x}^{\text{obs}} \right), \quad \text{with} \quad \sum\_{i=1}^{N} \alpha\_{i} = 1,\tag{7}$$

where Φ is a metric which measures the distance error.

Selecting a good metric Φ is crucial in the bi-level optimization framework since it strongly affects the resulting optimal weight vector. The recorded observations are usually position trajectories in Cartesian space, represented by *x*, *y*, *z* coordinates. These observations cannot be compared directly by Φ for two reasons. First, the system states *x* are defined as joint angles. Second, the position trajectories usually contain uncertainties, which come from (1) the error from the torso movement and (2) the difference between the subject's actual arm length and the arm length of the defined musculoskeletal system. No consistent results can be derived if a direct comparison to the position trajectories is implemented in Φ.

To address this problem, we transform the recorded position trajectories to the *relative position trajectories in arm model coordinate system* through the following steps:

1. Record the Cartesian position trajectories of the shoulder joint *t*<sub>s</sub> = (*t*<sub>s,x</sub>, *t*<sub>s,y</sub>, *t*<sub>s,z</sub>), the elbow joint *t*<sub>e</sub> = (*t*<sub>e,x</sub>, *t*<sub>e,y</sub>, *t*<sub>e,z</sub>), and the wrist joint *t*<sub>w</sub> = (*t*<sub>w,x</sub>, *t*<sub>w,y</sub>, *t*<sub>w,z</sub>).



The relative end-effector trajectory *t*<sup>obs</sup> eliminates the error caused by different arm lengths and the torso movements, and can therefore be compared to the solution calculated from the lower level program.

Based on the feature compared in Φ, two different types of distance metric can be formulated. One is *the joint angle metric*, where the observed joint angle trajectory *q*<sup>obs</sup> is compared to the optimal system state trajectory *x*<sub>*α*</sub><sup>∗</sup>, which also contains the joint angle trajectory *q*<sub>*α*</sub><sup>∗</sup>. The other is *the end-effector trajectory metric*, where the optimal end-effector trajectory *t*<sub>*α*</sub><sup>∗</sup> is first computed from the optimal joint angle trajectory *q*<sub>*α*</sub><sup>∗</sup> by using the same forward kinematics function *δ*, and the distance error is then calculated between the relative end-effector observation *t*<sup>obs</sup> and *t*<sub>*α*</sub><sup>∗</sup>.

In our preliminary tests, we found that the end-effector trajectory metric performs better than the joint angle metric. A possible reason is that the three joint angles have different degrees of influence on the reaching motions (Nguyen and Dingwell, 2012). However, it is not straightforward to determine the contribution of the different joint angles, and attempting to do so could introduce further uncertainties and errors. A similar problem occurs when combining the joint angle metric and the end-effector metric, since they have different units and it is difficult to balance them into the same range. Therefore, in our work, the distance metric of the upper level program only considers the end-effector trajectories, which can be treated as comparing two 3-dimensional signals. The dynamic time warping (DTW) algorithm (Vintsyuk, 1968) is implemented to calculate the distance error. In time series analysis, DTW is used for measuring the similarity between two temporal sequences that may vary in speed. The sequences are first warped in the time dimension and then compared to each other. With this, equation (7) can be stated as

$$\min\_{\boldsymbol{\alpha}} \Phi \left( \boldsymbol{x}\_{\alpha}^{\*}, \boldsymbol{x}^{\text{obs}} \right) := \min\_{\boldsymbol{\alpha}} D \left( \boldsymbol{t}\_{\alpha}^{\*}, \boldsymbol{t}^{\text{obs}} \right), \tag{8}$$

where *D* denotes the DTW calculation.
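As an illustration of the distance calculation *D* in equation (8), the following is a minimal sketch of the classic DTW recursion applied to two 3-dimensional trajectories. The function name `dtw_distance`, the Euclidean local cost, and the synthetic trajectories are assumptions made for this example; the step pattern and any windowing used in the original implementation are not specified in the text.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between trajectories a (n, d) and b (m, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between every pair of samples.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return float(acc[n, m])


# Synthetic straight-line end-effector path in 3D (placeholder data).
s = np.linspace(0.0, 1.0, 30)[:, None]
t_fast = s * np.array([0.3, 0.2, 0.4])
t_slow = np.repeat(t_fast, 2, axis=0)            # same path, half the speed
t_offset = t_fast + np.array([0.05, 0.0, 0.0])   # a different path

d_same_path = dtw_distance(t_fast, t_slow)
d_other_path = dtw_distance(t_fast, t_offset)
```

In this sketch the slowed-down copy of a trajectory has zero distance to the original, because the warping absorbs the speed difference, whereas a spatially shifted trajectory yields a positive distance.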

To solve equation (8), common gradient-based methods and stochastic optimization algorithms are not applicable for two reasons. First, the metric Φ is non-differentiable with respect to the weight vector *α*. Second, a direct OCP must be solved before each calculation of Φ, so one evaluation usually takes a few minutes. In particular, stochastic optimization algorithms (e.g., particle swarm optimization (Eberhart and Kennedy, 1995)) are not suitable here, since they require more samples, which would result in an infeasible computation time. Hence, the upper level program is optimized by a robust derivative-free optimization (DFO) method. Here, we use the method called CONDOR (Berghen and Bersini, 2005), for COnstrained, Nonlinear, Direct, parallel optimization, which is a parallel extension of Powell's method (Powell, 2004) based on the trust region algorithm (Sorensen, 1982). Through a local approximation of Φ, it can find the optimal solution more efficiently than the common pattern search and stochastic optimization techniques. To reduce the computation time, the initial value of *α* should be set properly before the optimization. Since only four of the five elements of *α* are actually independent, and the OCPs corresponding to the costs *J*(*α*) and *J*(*λα*), *λ* > 0, are identical, a practical strategy is to fix one element to one and then adjust the remaining components with respect to it (Mombaur et al., 2010). As all the basic cost functions are scaled into the same range, the values of the other components can be restricted to stay in [0, 1]. During the optimization process, if any element is found to be larger than one, the optimization should be restarted with this element set to one. In our experimental data, setting the weight of the joint angle jerk to one gives the best results in most cases. After around 100 iterations, the algorithm converges to a local minimum.
Note that due to the high nonlinearity of the problem formulation, the global minimum is not available in the bi-level optimization (Albrecht et al., 2012). In order to get more accurate results while keeping a reasonable computation time, we set the initial value of *α* to (0.5, 1, 0.5, 0.5, 0.5) and solve the problem three times with different initial search radii (Powell, 2004) of 0.15, 0.3, and 0.45, so that most of the range is covered within three IOC calculations. The run that results in the minimum distance error is taken as the final optimal weight vector *α*\*, which is normalized for later analysis.
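The final selection and normalization step can be sketched as follows. This is only an illustration under stated assumptions: the helper names `normalize_weights` and `pick_best_run` are hypothetical, and the three weight vectors and distance errors are placeholder numbers, not results from the experiments. The sketch also shows why one element of *α* can be fixed to one: scaling *α* by any *λ* > 0 leaves the normalized vector unchanged.

```python
import numpy as np


def normalize_weights(alpha):
    """Scale a non-negative weight vector so that its elements sum to one."""
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    if np.any(alpha < 0.0) or total <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")
    return alpha / total


def pick_best_run(runs):
    """runs: list of (alpha, distance_error); return the normalized best alpha."""
    best_alpha, best_error = min(runs, key=lambda run: run[1])
    return normalize_weights(best_alpha), best_error


# Placeholder outcomes of three runs (initial search radii 0.15, 0.3, 0.45).
# The second element (joint angle jerk) is fixed to one in every run.
runs = [
    (np.array([0.6, 1.0, 0.4, 0.5, 0.5]), 0.042),
    (np.array([0.5, 1.0, 0.25, 0.25, 0.5]), 0.031),
    (np.array([0.9, 1.0, 0.1, 0.7, 0.3]), 0.057),
]
alpha_star, error = pick_best_run(runs)
```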

### **3.3. Representation of the Reaching Motions**

From the IOC formulation, we acquire a weighted combination of cost functions, which specifies the contribution of each model to the realization of a reaching motion. For each specific motion behavior, one composite model needs to be found. However, because of the computation time, only a limited number of different composite models can be obtained. To utilize the composite model in general cases, a mapping from the motion parameters to the contribution of cost functions is required. According to the results of the initial experiment we conducted, which is detailed in Section 5.1.3, a correlation between the initial and the final joint angles (*q*<sub>s</sub>, *q*<sub>e</sub>) and the optimal weight vector *α*\* is identified. Here, we use the Gaussian Process Regression (GPR) model (Rasmussen and Williams, 2005) to represent the mapping as

$$
\boldsymbol{\alpha}^\* = \text{GPR}\left(\boldsymbol{q}\_s, \boldsymbol{q}\_e\right), \tag{9}
$$

where *GPR* denotes the GPR model. The optimal weight vector returned by the GPR model is a distribution with mean and variance. Note that the GPR model can be replaced by other similar stochastic models, but we find that the GPR model is more suitable in our case since it requires less data. This GPR model provides a connection between the IOC formulation and the ProMPs in our hybrid online prediction framework.
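To make the mapping in equation (9) concrete, the following minimal sketch implements standard zero-mean GP regression with a squared-exponential kernel, taking the stacked joint angles [*q*<sub>s</sub>, *q*<sub>e</sub>] as input and the five weights as outputs. The kernel choice, the hyperparameters, the helper names, and the randomly generated training data are all assumptions for illustration; the text does not specify how the GPR model was configured.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gpr_predict(X_train, Y_train, X_query, noise_var=1e-4):
    """Posterior mean (per output) and variance (shared) of a zero-mean GP."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train)
    L = np.linalg.cholesky(K)
    coef = np.linalg.solve(L.T, np.linalg.solve(L, Y_train))  # K^{-1} Y
    mean = K_s @ coef
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(X_query, X_query).diagonal() - np.sum(v ** 2, axis=0)
    return mean, var


# Placeholder training set: 6 tasks, inputs [q_s (3), q_e (3)] in radians,
# targets are weight vectors of length 5 that sum to one.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.5, 1.5, size=(6, 6))
Y_train = rng.dirichlet(np.ones(5), size=6)

mean_train, var_train = gpr_predict(X_train, Y_train, X_train)
mean_far, var_far = gpr_predict(X_train, Y_train, np.full((1, 6), 10.0))
```

Near the training tasks the predicted mean reproduces the stored weight vectors with small variance, while far from them the variance approaches the prior value, which is one way to see why only the mean is reliable when few IOC results are available.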

### **4. HYBRID ONLINE PREDICTION FRAMEWORK**

In the literature, many prediction methods for human motion have been proposed. Among them, two classes of methods are widely used: (1) *model-based methods*, where a motion model is created based on minimizing a criterion, such as the minimum hand jerk model (Flash and Hogan, 1985), the minimum joint angle jerk model (Wada et al., 2001), and the minimum variance model (Harris and Wolpert, 1998), and the solution to the model is then considered as the prediction; (2) *data-driven methods*, where a set of data (observations) should be available before building a generative model for predicting human motions. The characteristics of the motion can be learned from the data, and the prediction is then generated by reproducing these characteristics, in some cases with variation. Gaussian Mixture Models (GMMs) (McLachlan and Basford, 1988; Calinon et al., 2010), dynamic movement primitives (Ijspeert et al., 2013), and probabilistic movement primitives (Paraschos et al., 2013) are typical data-driven methods. In this section, we propose a hybrid online prediction framework for reaching motions by combining a model-based method and a data-driven method. Instead of using a motion model with a single cost function, a composite model is obtained by the IOC framework. In order to deal with the motor variability of the reaching motion (Todorov and Jordan, 2002), this composite model is combined with the ProMPs. ProMPs are selected because of both their capability of learning a model from a very small number of observations (in our experiments 5–10 samples seem to be enough) and their computational efficiency in rolling out predictive trajectories online. In particular, it is known that GMMs tend to perform poorly in high-dimensional spaces when few data points are available (Calinon, 2016). In the rest of this section, first a brief explanation of the ProMPs is presented, then a comparison between the predictions of the composite model and the ProMPs is discussed. Finally, the hybrid prediction framework is explained in detail.

### **4.1. Probabilistic Movement Primitives**

The ProMPs framework is a probabilistic formulation of movement primitives. It is able to capture the variance information of trajectories and represent the behavior in stochastic systems. Given a discrete trajectory *X* = {*x*<sub>*t*</sub>}, *t* = 0, …, *T*, defined by states *x*<sub>*t*</sub> over time *T*, a weight vector *ω* is used to represent the trajectory as

$$\boldsymbol{y}\_t = \begin{bmatrix} x\_t, \dot{x}\_t \end{bmatrix}^\top = \boldsymbol{\Phi}\_t^\top \boldsymbol{\omega} + \boldsymbol{\epsilon}\_y, \tag{10}$$

where **Φ**<sub>*t*</sub> = [*ϕ*<sub>*t*</sub>, *ϕ̇*<sub>*t*</sub>] denotes the *n* × 2 dimensional time-dependent basis matrix for the states *x*<sub>*t*</sub> and the velocities *ẋ*<sub>*t*</sub>, *n* is the number of basis functions, and *ϵ*<sub>*y*</sub> ∼ *N*(0, **Σ**<sub>*y*</sub>) is zero-mean independent and identically distributed Gaussian noise. The mean of the trajectory can be obtained by weighting **Φ**<sub>*t*</sub> with *ω*. The probability of observing a trajectory *X* with a given *ω* is represented by a linear basis function model as

$$p(\mathbf{X}|\boldsymbol{\omega}) = \prod\_{t} \mathcal{N}\left(\mathbf{y}\_{t}|\boldsymbol{\Phi}\_{t}^{\top}\boldsymbol{\omega}, \boldsymbol{\Sigma}\_{\mathbf{y}}\right). \tag{11}$$

In order to capture the variance, a Gaussian distribution *p*(*ω*; *θ*) = *N*(*ω* | *µ*<sub>*ω*</sub>, **Σ**<sub>*ω*</sub>) over the weight vector *ω* is introduced with parameters *θ* = {*µ*<sub>*ω*</sub>, **Σ**<sub>*ω*</sub>}. Then the distribution of *y*<sub>*t*</sub> at time *t* is given by

$$p(\boldsymbol{y}\_{t};\boldsymbol{\theta}) = \int \mathcal{N}\left(\boldsymbol{y}\_{t}|\boldsymbol{\Phi}\_{t}^{\top}\boldsymbol{\omega}, \boldsymbol{\Sigma}\_{\boldsymbol{y}}\right) \mathcal{N}\left(\boldsymbol{\omega}|\boldsymbol{\mu}\_{\boldsymbol{\omega}}, \boldsymbol{\Sigma}\_{\boldsymbol{\omega}}\right) d\boldsymbol{\omega}$$

$$= \mathcal{N}\left(\boldsymbol{y}\_{t}|\boldsymbol{\Phi}\_{t}^{\top}\boldsymbol{\mu}\_{\boldsymbol{\omega}}, \boldsymbol{\Phi}\_{t}^{\top}\boldsymbol{\Sigma}\_{\boldsymbol{\omega}}\boldsymbol{\Phi}\_{t} + \boldsymbol{\Sigma}\_{\boldsymbol{y}}\right). \tag{12}$$

With equation (12), the mean and the variance of the states for any time point *t* can be derived. If a set of observations is available, the parameters *θ* can be learned by using the maximum likelihood estimation (Lazaric and Ghavamzadeh, 2010). In reaching motions, the distribution *p*(*ω*; *θ*) can be considered as a representation of the motor variability. For more details of the ProMPs please refer to Paraschos et al. (2013).
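A minimal numerical sketch of equations (10) to (12) is given below. To keep it short, it makes several simplifying assumptions that are not taken from the text: a single position coordinate without the velocity row, normalized Gaussian basis functions, weights obtained per demonstration by ridge-regularized least squares, and *θ* estimated as the sample mean and covariance of those weights. The demonstrations are synthetic minimum-jerk-like profiles with a random amplitude.

```python
import numpy as np


def gaussian_basis(n_steps, n_basis=15, width=0.1):
    """Normalized Gaussian basis functions; row t holds the basis values at step t."""
    z = np.linspace(0.0, 1.0, n_steps)[:, None]
    centers = np.linspace(0.0, 1.0, n_basis)[None, :]
    basis = np.exp(-0.5 * ((z - centers) / width) ** 2)
    return basis / basis.sum(axis=1, keepdims=True)


def fit_promp(demos, n_basis=15, ridge=1e-6):
    """Estimate mu_w and Sigma_w from demonstrations of shape (n_demos, n_steps)."""
    Phi = gaussian_basis(demos.shape[1], n_basis)
    A = Phi.T @ Phi + ridge * np.eye(n_basis)
    W = np.linalg.solve(A, Phi.T @ demos.T).T        # one weight vector per demo
    mu_w = W.mean(axis=0)
    Sigma_w = np.cov(W, rowvar=False) + 1e-9 * np.eye(n_basis)
    return mu_w, Sigma_w, Phi


def predict(mu_w, Sigma_w, Phi, noise_var=1e-4):
    """Mean and variance at every time step, following equation (12)."""
    mean = Phi @ mu_w
    var = np.einsum("tn,nm,tm->t", Phi, Sigma_w, Phi) + noise_var
    return mean, var


# Synthetic demonstrations (placeholder data): smooth 0 -> 1 reaching profile
# whose amplitude varies slightly from trial to trial.
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 100)
profile = 10 * s**3 - 15 * s**4 + 6 * s**5
amplitudes = 1.0 + 0.05 * rng.standard_normal(20)
demos = amplitudes[:, None] * profile[None, :]

mu_w, Sigma_w, Phi = fit_promp(demos)
mean, var = predict(mu_w, Sigma_w, Phi)
```

Because all synthetic demonstrations start at the same point and differ only in amplitude, the predicted variance is close to the noise floor at the start and grows toward the end of the movement.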

**TABLE 2** | Different perspectives of the composite model prediction and the ProMPs prediction.


### **4.2. Comparison Between the Composite Model Prediction and the ProMPs Prediction**

Both the composite model formulation and the ProMPs framework have clear advantages and drawbacks, but they are also complementary. By combining them into a hybrid prediction framework, the advantages of both methods can be exploited at the same time (**Table 2**).

The composite model represents the underlying principles of reaching motion control. Several motion models have been proven to be accurate in describing the movements, such as the minimum hand jerk model on some tasks, and the minimum torque change model on others, in 2D reaching motions. The composite model we propose inherits those capabilities and extends them to 3D reaching motions. It helps us explain how humans execute and control their reaching motions, whereas extracting such information from data-driven methods is not trivial. However, the biggest obstacle in implementing the composite model prediction in the online case is the computation time. Before rolling out the optimal trajectory, an OCP needs to be solved, which usually takes several minutes, even when state-of-the-art solvers are used (Diehl, 2011). In real-world settings, however, reaching motions take no longer than a few seconds, so data-driven methods are more suitable in the online case, as they are computationally more efficient.

Another important reason for using the ProMPs as the data-driven method in the hybrid prediction framework is that it allows describing the motor variability given sample demonstrations (Paraschos et al., 2013). As explained in Todorov and Jordan (2002), human motor control is a stochastic system with signal-dependent noise (Harris and Wolpert, 1998), thus reaching motions are expected to show variance. Since it is not straightforward to consider the variance within an IOC problem, we formulate our composite model as a deterministic OCP. On the other hand, as the ProMPs formulation employs a probabilistic function to represent the motion, the obtained model is not a single trajectory but a distribution of trajectories. Hence, while the composite model describes an optimal average behavior as an initial guess, the ProMPs enables capturing the multiplicative noise due to motor control. However, to understand the control model underlying such noise, the model-based IOC computation and a follow-up GPR update are still required.

### **4.3. Prediction Framework**

The idea of the hybrid prediction framework is, for a given reaching task, to use the composite model to generate the initial training data for the ProMPs. Then, in the online phase, the ProMPs can roll out predicted trajectories with high efficiency while also learning the variance by using each motion observation as new training data. After several observations, the parameters of the ProMPs converge (the details are explained in Section 5.2.2), and the mean of the converged trajectory distribution is then calculated to update the composite model. An overview of the framework is given in **Figure 4**, and the details of this hybrid model are explained next.

**FIGURE 4** | Overview of the prediction framework (upper right in **Figure 1**). *q*<sub>s</sub> and *q*<sub>e</sub> are the initial and final joint angle configurations. *α*\* is the estimated optimal weight vector and *t*<sub>*α*</sub><sup>∗</sup> is the corresponding optimal solution from the OCP. *t*<sub>m</sub> denotes the mean of the converged trajectory distribution extracted from the ProMPs, and *α*<sub>n</sub> is the newly obtained optimal weight vector, which is used to update the GPR model.

#### 4.3.1. Initialization With the Composite Model

Usually for a given reaching task, the starting position and the target position are known. Through inverse kinematics, the initial joint angle configuration *q*<sub>s</sub> and the final joint angle configuration *q*<sub>e</sub> can be approximated. By using the GPR model trained on the IOC results, a distribution of the estimated optimal weight vector is available. However, due to the limited amount of data for training the GPR model, the variance cannot be learned accurately. Thus, only the mean value of the distribution, *α*\*, is used here. After solving the OCP with respect to *α*\*, the optimal joint angle trajectory *q*<sub>*α*</sub><sup>∗</sup> and its corresponding end-effector trajectory *t*<sub>*α*</sub><sup>∗</sup> are obtained. *t*<sub>*α*</sub><sup>∗</sup> is considered as the training data for the ProMPs. As the OCP gives a deterministic solution, no variance information can be derived. Hence, the ProMPs is initialized by learning the parameters from the optimal trajectory *t*<sub>*α*</sub><sup>∗</sup>, while setting the variance to a large value.

#### 4.3.2. Predicting While Learning

During online prediction, a trajectory along with the variance for each time point is generated by the ProMPs. This variance information is useful for human–robot interaction scenarios where the robot should also consider the uncertainties of human behaviors. The observations recorded during the prediction are utilized to update the ProMPs to get a more accurate representation of the variance. After each movement, the observation is added to the data storage which contains all the previous observations. Subsequently, the ProMPs update their parameters from the new data storage. With the incorporation of each motion observation, parameters of the ProMPs as well as the variance information converge.
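The initialization and the predicting-while-learning loop can be sketched as follows for a single trajectory coordinate. Everything named here is hypothetical: the class `OnlineProMP`, the isotropic initial covariance `init_var`, the choice to keep the OCP trajectory as the first stored sample, and the use of the change in the mean weights as a convergence indicator are assumptions of this example, not details given in the text.

```python
import numpy as np


def gaussian_basis(n_steps, n_basis=10, width=0.1):
    """Normalized Gaussian basis functions, one row per time step."""
    z = np.linspace(0.0, 1.0, n_steps)[:, None]
    centers = np.linspace(0.0, 1.0, n_basis)[None, :]
    basis = np.exp(-0.5 * ((z - centers) / width) ** 2)
    return basis / basis.sum(axis=1, keepdims=True)


class OnlineProMP:
    def __init__(self, initial_trajectory, n_basis=10, init_var=1.0, ridge=1e-6):
        self.Phi = gaussian_basis(len(initial_trajectory), n_basis)
        self.ridge = ridge
        # Start from the deterministic OCP solution. No variance is known yet,
        # so a deliberately large isotropic covariance is used (assumption).
        self.weights = [self._project(initial_trajectory)]
        self.mu_w = self.weights[0]
        self.Sigma_w = init_var * np.eye(n_basis)

    def _project(self, trajectory):
        """Least-squares basis weights of one trajectory."""
        A = self.Phi.T @ self.Phi + self.ridge * np.eye(self.Phi.shape[1])
        return np.linalg.solve(A, self.Phi.T @ np.asarray(trajectory, dtype=float))

    def add_observation(self, trajectory):
        """Store a new movement, re-estimate the parameters from the whole
        storage, and return how much the mean weights changed."""
        old_mu = self.mu_w
        self.weights.append(self._project(trajectory))
        W = np.vstack(self.weights)
        self.mu_w = W.mean(axis=0)
        self.Sigma_w = np.cov(W, rowvar=False)
        return float(np.linalg.norm(self.mu_w - old_mu))

    def predict(self, noise_var=1e-4):
        mean = self.Phi @ self.mu_w
        var = np.einsum("tn,nm,tm->t", self.Phi, self.Sigma_w, self.Phi) + noise_var
        return mean, var


# Placeholder data: the "OCP solution" and eight noisy repetitions of it.
rng = np.random.default_rng(2)
s = np.linspace(0.0, 1.0, 100)
ocp_trajectory = 10 * s**3 - 15 * s**4 + 6 * s**5

model = OnlineProMP(ocp_trajectory)
_, var_initial = model.predict()
changes = []
for _ in range(8):
    observation = (1.0 + 0.02 * rng.standard_normal()) * ocp_trajectory
    changes.append(model.add_observation(observation))
mean_final, var_final = model.predict()
```

In this toy setting the predicted variance drops from the large initial value to a level that reflects the spread of the stored observations, and the list `changes` can be monitored to decide when the parameters have settled.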

#### 4.3.3. GPR Model Updating

Once the ProMPs system becomes stable, the mean of the converged trajectory distribution *t*<sub>m</sub> can be extracted. This trajectory can be considered as the average behavior of the recorded subject for this reaching task. Then in a separate updating process, *t*<sub>m</sub> is used by the IOC framework to get the corresponding optimal weight vector. The new optimal weight vector *α*<sub>n</sub> is used to update the GPR model. Therefore, with more information returned from the real recordings, the GPR model also becomes more accurate in describing the mapping from the initial and final joint angles to the optimal weight vector.

### **4.4. Motor Variability and Interpersonal Variance**

The motor variability is essential in describing human behaviors (Todorov and Jordan, 2002), as it can be considered as the uncertainties of human motions (e.g., the noise in motor command). It represents the fact that for a given reaching task, even the same subject is expected to execute the motion in slightly different trajectories. This phenomenon has been reported in sensorimotor control by demonstrating such variability on observed experimental data for a multitude of tasks, e.g., locomotion (Winter, 1989), writing (Bernstein, 1967), pointing (Tseng et al., 2002), reaching (Haggard et al., 1995), and grasping (Cole and Abbs, 1986). Usually for simple tasks, this difference is not large and can be modeled as a probabilistic distribution (Knill and Pouget, 2004; Koppula and Saxena, 2013). However, such probabilistic models cannot explain the underlying cause of observing such motor variability, which is known to be due to additive and multiplicative noise in the motor control and is treated as the intra-subject variance in this work. Apart from the motor variability, there are also motion behavior differences between subjects (Vu et al., 2016a), which we call interpersonal variance in this work. The existence of such a disparity can be verified through the contribution of basic cost functions, as shown in the next section. The interpersonal variance suggests that humans plan their motions in a personal way, which reflects the dissimilarity of the control structure due to learning and adaptation effects, along with biomechanical differences. Thus, the updated GPR model from the hybrid prediction framework is actually a person-specific model.

### **5. EXPERIMENTS AND RESULTS**

In this section, two experiments and their corresponding results are presented. One is designed for the IOC framework with the purpose of understanding the characteristics of human reaching motions, and the other is used to test the performance of the hybrid online prediction framework.

### **5.1. Experiment for the IOC Framework**

To cover the reaching motions in a relatively large range, we designed an experiment for point-to-point reaching tasks consisting of 12 starting postures and 9 target regions. The recorded trajectories were analyzed through the IOC framework. Based on the obtained optimal weight vectors, we find that the contribution of the basic cost functions is related to the initial and final joint angle configurations. In addition, the composite cost function is shown to have a smaller error in describing the reaching motions than the single cost models in almost all tasks. This result encourages us to use the composite model in the prediction rather than a model with a single cost function. In the rest of this subsection, the details of the experimental setup are presented first, and the results from the IOC framework are then discussed.

#### 5.1.1. Experimental Setup and Data Collection

A visualization of the experimental setup is presented in **Figure 2A**. Participants were required to sit before a board which was placed vertical to the ground surface. Nine target areas and one reference point were marked on the board as square regions with the side length equal to 5 cm. The distances between the target areas and the reference point are shown in **Figure 5B**. Before the experiment, the sitting height of the participant was adjusted by setting a straight line between the reference point and the center of the shoulder joint vertical to the board surface. Then the distance between the center of the shoulder joint and the board surface was selected as 80% of the arm length. These distances were chosen to ensure that the participants can reach all nine targets easily without moving their torso.

Since we want to cover a large range of reaching motions, every participant was asked to reach the nine targets from 12 different starting arm postures. According to the joint angle limits we defined in the arm model, these starting postures were chosen from the combination of three different *q*<sub>1</sub>, two different *q*<sub>2</sub>, and two different *q*<sub>3</sub> (3 × 2 × 2 = 12) configurations (see **Table 3**). As shown in **Figure 5A**, the pitch rotation of the shoulder joint *q*<sub>1</sub> is selected as three configurations: up, middle, and down, respectively. The yaw rotation of the shoulder joint *q*<sub>2</sub> and the rotation of the elbow joint *q*<sub>3</sub> are chosen from the stretched to the side configurations and a configuration in the middle of the joint angle limits. With nine targets for each starting posture, 108 (12 starting postures × 9 targets) cases of the reaching motions were considered in the experiment.

Before the recording, the arm posture was determined by measuring all three joint angles to ensure that all participants shared the same starting joint angle configuration. The participants were given the following instructions. First, in order to discard the decision-making process of target selection, the participants had to reach the nine targets in a fixed order, from target one to target nine. Second, the participants had to place their arm strictly in the previously set starting posture before executing the follow-up reaching task. A set of special reference tools was prepared and put beside the participants. These tools consist of two bars whose end points indicate the positions of the elbow and wrist joints for the given starting posture. The reference tools were placed in appropriate positions so that they did not block any potential motion trajectory during the reaching motion. Third, in order to eliminate the effect of locating targets during the movement, the participants had to look at the targets rather than the reference tool before the execution of the reaching tasks. Fourth, the participants were told to avoid using the roll rotation of the shoulder joint, which is ignored in our arm model. In addition, all participants were trained before the experiments to get familiar with the setup and the task. If any unintended motion was detected during the recording, the corresponding tasks were executed again. Between starting postures, enough rest time was provided to avoid fatigue. To reduce the noise, every target in every starting posture was reached two times, thus a total of 216 (108 cases × 2 times) trajectories were recorded for one participant.

**FIGURE 5** | Experimental setup. **(A)** Visualization of 12 starting joint angle configurations. P1 to P4 are the postures with *q*<sub>1</sub> in the middle (no rotation), while P5 to P8 are the postures with *q*<sub>1</sub> in the up region and P9 to P12 with *q*<sub>1</sub> in the down region. **(B)** Target areas on the board surface. RP denotes the reference point. Observations are the actual positions where the 108 averaged trajectories terminate on the board surface.

*P1 to P12 are the 12 predefined starting postures. q1,S, q2,S, and q3,S are the three starting joint angles with respect to the stretched out posture. The values are computed by using all 15 subjects' data.*

The data were collected from fifteen subjects (11 males, age: 27 *±* 4; weight: 67 *±* 9 kg, height: 172 *±* 5 cm) who all gave written informed consent for their participation. All the participants were right-handed with normal vision ability. None of them received any information about the purpose of the experiment. The study was approved by the ethics committee of the Technical University of Munich School of Medicine. The reaching motions were recorded by the multicamera motion capture system Qualisys at a frequency of 250 Hz. With the built-in filter function, the smooth position trajectories of the shoulder, elbow, and wrist joints can be directly obtained from the tracking system and used for the IOC calculations.

#### 5.1.2. Average Motion Behavior

In our IOC framework, we are interested in the control structures for the human reaching motion behavior in a general sense, rather than the individual differences. We also intend to provide a base model to be extended for person-specific motion behaviors during prediction. Hence, we compute the average trajectories from all 15 subjects, and the IOC problems are solved for these trajectories. Besides, the averaging process also saves a lot of computation time. Since the IOC calculation for one trajectory roughly takes 4 h, the analysis on all 1,620 (15 subjects *×* 108 cases) trajectories would require an immense amount of time. **Table 3** gives the mean values and the SDs of 12 starting joint angles calculated from all subjects' data. The SDs indicate that for the same starting posture, all subjects started their reaching motions with a relatively small joint angle difference, which enables the feasibility of averaging the trajectories. If not mentioned explicitly, all the IOC results presented in the following part are based on the averaged trajectories.
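The counts and the computational budget quoted in this subsection follow directly from the experimental design, as the short calculation below illustrates. The posture labels and the assumption that the IOC problems are solved one after another (no parallel runs) are illustrative choices, not statements from the text.

```python
from itertools import product

# Levels of the three starting joint angles (labels are illustrative).
q1_levels = ["up", "middle", "down"]
q2_levels = ["stretched", "middle"]
q3_levels = ["stretched", "middle"]

postures = list(product(q1_levels, q2_levels, q3_levels))
n_targets = 9
n_subjects = 15
repetitions = 2
hours_per_ioc = 4  # rough duration of one IOC calculation, as stated above

n_cases = len(postures) * n_targets                    # 12 x 9
n_recorded_per_subject = n_cases * repetitions         # 108 x 2
n_subject_trajectories = n_subjects * n_cases          # 15 x 108

# Sequential computation time in days (assumes no parallel IOC runs).
days_all_subjects = n_subject_trajectories * hours_per_ioc / 24
days_averaged = n_cases * hours_per_ioc / 24
```

Under these assumptions, analyzing every subject individually would take about 270 days of sequential computation, compared with 18 days for the 108 averaged trajectories.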

#### 5.1.3. Results for the IOC Framework

After the IOC calculations, we obtained one optimal weight vector for each reaching task. The contributions of the basic cost functions in the 108 different cases are analyzed next.

#### *5.1.3.1. Performance in Describing the Reaching Motions*

To verify the performance of the composite model, the optimal trajectory solved with it is compared to the optimal trajectories computed for each single basic cost function. The distance error between each optimal trajectory and the average motion behavior is measured separately through the DTW-based comparison. The results show that, for almost all cases, the composite model performs better in describing the reaching motions. Even though the distance metric we used in the upper level program of the IOC framework only considers the end-effector trajectory, the composite model still has smaller errors in the joint angle trajectories. **Figure 6** presents the distance error averaged over all 108 cases. The p-test results indicate that there are significant decreases in the distance error when comparing the composite model to each of the five basic cost functions (p<sub>*i*</sub> < 0.0001, *i* = 1, …, 5). In the joint angle trajectories, we still observe significant decreases (p < 0.0001), except for the minimum joint angle jerk cost function (p = 0.1813). The reason is that, in 3D reaching motions, the observed joint angle trajectories are bell-shaped, which is quite close to the results derived from the minimum joint angle jerk cost function, especially when the reaching motion approaches the joint angle limits (e.g., reaching target one). After we removed the cases of reaching target one from the comparison, there is a significant decrease (p < 0.05), now for all the cost functions, in the distance error in describing the joint angle trajectories with the composite model. Furthermore, it should be noted that optimizing only dynamics related cost functions leads to inconsistent arm trajectories in terms of joint and Cartesian displacements (a single case is shown in **Figure 3**). By contrast, even though maximizing smoothness in joint space (angle jerk, i.e., kinematic cost) was efficient in fitting the angular and Cartesian displacements, it is reported by Vu et al. (2016b) that it fails to describe the movement in torque space accurately. It appears that the composite optimality criterion comprising different biomechanical properties is the only model that can explain both kinematic and dynamic aspects of the reaching behaviors.

#### *5.1.3.2. Influence of the Initial and Final Conditions*

In order to get a deeper understanding of the human reaching motions, we analyze the possible factors that influence the contribution of the basic cost functions. We conduct an *N*-way independent analysis of variance (ANOVA) on our results with four factors: the three starting joint angles *q*<sub>1,s</sub>, *q*<sub>2,s</sub>, *q*<sub>3,s</sub> and the target index *T*. As ANOVA checks the importance of one or more factors by comparing the response variable means at different factor levels, the results can be used to identify the factors that have a statistically significant influence on the examined variable. In **Table 4**, we list the corresponding results from our ANOVA when selecting the response variable as the contribution of each of the five basic cost functions as well as the sum of the dynamics related cost functions (the minimum torque change + the minimum energy), respectively.
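To illustrate what the F-statistic compares, the sketch below computes a one-way ANOVA for a single factor. This is a deliberate simplification of the *N*-way analysis used here, the function name `one_way_anova_f` is hypothetical, and the three groups are placeholder numbers standing for the contribution of one cost function at the three levels of one starting joint angle.

```python
import numpy as np


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder contributions of one cost function at three factor levels.
level_up = [0.1, 0.2, 0.3]
level_middle = [0.2, 0.3, 0.4]
level_down = [0.5, 0.6, 0.7]

f_stat, df_b, df_w = one_way_anova_f([level_up, level_middle, level_down])
```

For these placeholder values the between-group mean square is 0.13 and the within-group mean square is 0.01, giving F(2, 6) = 13. The corresponding p-value would be obtained from the upper tail of the F-distribution with these degrees of freedom.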

**FIGURE 6** | **(A)** Distance error measured by comparing the end-effector trajectories. **(B,C)** Distance error measured by comparing the joint angle trajectories with and without considering target one.


*RV denotes the response variable, selected as the contribution of each basic cost function (HJ: hand jerk, JJ: joint angle jerk, TC: torque change, Geo: geodesic, Enr: energy) and the dynamics related cost functions (Dyn: dynamics, which is the sum of the minimum torque change and the minimum energy). Four variables are considered as the factors, which are the three starting joint angles q1,S, q2,S, q3,S and the target index T. RV:factor indicates the ANOVA result of the influence of the factor on the corresponding response variable (e.g., HJ:q1,S means the influence of q1,S on the contribution of the minimum hand jerk cost function). Sum.Sq. and Mean.Sq. are the sum of squares due to each source and the mean squares for each source, respectively. F is the F-statistic, which is the ratio of the mean squares. p is the p-value, which represents the probability that the F-statistic can take a value larger than a computed test-statistic value. Other ANOVA results (e.g., the degrees of freedom) are not listed here.*

From the ANOVA, we conclude that the starting joint angles of the two shoulder rotations influence the contributions of the cost functions: *q*<sub>1,S</sub> influences the contribution of the hand jerk (F(2,58) = 19.5487, p < 0.0001), the joint angle jerk (F(2,58) = 10.7701, p < 0.001), the torque change (F(2,58) = 12.7500, p < 0.0001), the energy (F(2,58) = 7.7557, p < 0.001), and the dynamics (F(2,58) = 19.3833, p < 0.0001), while *q*<sub>2,S</sub> influences the hand jerk (F(1,58) = 7.6063, p < 0.01), the energy (F(1,58) = 8.5667, p < 0.01), and the dynamics (F(1,58) = 13.3516, p < 0.001). The target position affects only the dynamics (F(8,58) = 2.4267, p < 0.05). Finally, the starting joint angle of the elbow rotation *q*<sub>3,S</sub> has no influence on the contribution of the basic cost functions (all p > 0.05).
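To make the quantities reported in **Table 4** concrete, the following minimal sketch computes the sums of squares, mean squares, and F-statistic for a single factor. It is a one-way simplification of the *N*-way ANOVA used here, not the analysis pipeline itself; the function name `one_way_anova` and the sample values are hypothetical and purely illustrative.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups (one list of responses per factor level).

    Returns (ss_between, ss_within, ms_between, ms_within, f_stat).
    Simplifying assumption: a single factor, unlike the N-way analysis in the text.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Sum of squares due to the factor (between levels) and residual (within levels).
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    # Degrees of freedom for the factor and for the residual.
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # Mean squares and their ratio, the F-statistic.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ss_between, ss_within, ms_between, ms_within, ms_between / ms_within


# Hypothetical contribution values at three levels of one starting joint angle.
levels = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
ss_b, ss_w, ms_b, ms_w, f_stat = one_way_anova(levels)
print(f"Sum.Sq.={ss_b:.2f}, Mean.Sq.={ms_b:.2f}, F={f_stat:.2f}")
```

Obtaining the p-value additionally requires the survival function of the F-distribution with the two degrees of freedom, which a statistics library (for example `scipy.stats.f.sf`) can provide.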

To identify how the target position, which can be expressed by the three final joint angles *q*<sub>1,E</sub>, *q*<sub>2,E</sub>, *q*<sub>3,E</sub>, affects the contribution of the dynamics, an individual analysis is conducted on the trajectories of each subject with one starting posture (fully stretched out posture P1) and six targets (top row: T1, T4, and T7; bottom row: T3, T6, and T9). Thus, 90 (15 subjects × 6 trajectories) IOC calculations are performed. A p-test is then used to determine whether there is a significant difference between different final joint angles. The results suggest that only *q*<sub>1,E</sub> influences the contribution of the dynamics related cost, which indicates that only the height of the targets matters. This can be verified in **Figure 7**, where we compare the contributions of the dynamics related cost between the two sets of targets (top vs. bottom row). These results also reveal the interpersonal variance: the changes differ for each subject, and this difference is sometimes considerably large.

#### *5.1.3.3. Transition Between Different Reaching Tasks*

The previous results identify three factors related to the contribution of the basic cost functions: the two starting joint angles of the shoulder joint, *q*<sub>1,S</sub> and *q*<sub>2,S</sub>, and the change of the pitch rotation of the shoulder joint, *q*<sub>1,Change</sub> = *q*<sub>1,E</sub> − *q*<sub>1,S</sub>. To identify how exactly these factors affect the contribution, two 3D scatter plots are given in **Figures 8A,B**. Taking musculoskeletal loading as the criterion for the comfort of the reaching motions (Kee and Lee, 2012; Zenk et al., 2012), the fully stretched down posture can be treated as the most comfortable posture. The more rotation required to execute a reaching task from the fully stretched down posture, the more uncomfortable the motion is. For comfortable reaching motions (lower-left region of the figures), the dynamics related cost function contributes less and the kinematics related cost function contributes more, whereas the opposite trend is observed for uncomfortable reaching tasks (upper-right region of the figures). Based on this, we propose a *discomfort metric* that combines the three factors with their corresponding joint angle limits as

$$\mathit{Dis} = \left(\frac{90 - q_{1,S}}{180}\right) + \beta_1 \frac{q_{2,S}}{180} + \beta_2 \left(\frac{q_{1,\text{Change}}}{180}\right),\tag{13}$$

where *Dis* denotes the discomfort value, calculated as a linear combination of the three factors with the weights *β*<sub>1</sub> and *β*<sub>2</sub>. For a given pair of weights (*β*<sub>1</sub>, *β*<sub>2</sub>), a set of discomfort values *Dis*<sub>*i*</sub> (*i* = 1…108) can be derived for all 108 reaching tasks. Each discomfort value has a corresponding contribution value of the dynamics related cost function *C*<sub>*i*</sub> (*i* = 1…108); hence, a simple linear least-squares regression model can be fitted to the data set (*Dis*<sub>*i*</sub>, *C*<sub>*i*</sub>) (*i* = 1…108) as

$$y = \theta_1 + \theta_2 x.\tag{14}$$

By changing the weights, different linear regression models *y*<sub>*β*<sub>1</sub>,*β*<sub>2</sub></sub> are obtained. The coefficient of determination *R*<sup>2</sup> (Ross, 2014) for each model is given by

$$R_{\beta_1, \beta_2}^2 = 1 - \frac{\sum_{i=1}^{108} \left(C_i - y_{i, \beta_1, \beta_2}\right)^2}{\sum_{i=1}^{108} \left(C_i - \bar{C}\right)^2},\tag{15}$$

where *C*<sub>*i*</sub> is the actual contribution value, *y*<sub>*i*,*β*<sub>1</sub>,*β*<sub>2</sub></sub> is the contribution value calculated from the linear regression model *y*<sub>*β*<sub>1</sub>,*β*<sub>2</sub></sub>, and *C̄* is the mean value of *C*. *R*<sup>2</sup> measures how well a model represents the data and falls between 0 and 1; the higher the value of *R*<sup>2</sup>, the better the model predicts the data. Therefore, the optimal pair of weights is derived by maximizing *R*<sup>2</sup>:

$$(\beta_1^*, \beta_2^*) = \arg\max_{\beta_1, \beta_2} R^2_{\beta_1, \beta_2}. \tag{16}$$

the dynamics to the target three from to the target one).

Solving Equation (16) with respect to the contribution of the dynamics yields the optimal weights *β*<sub>1</sub><sup>∗</sup> = 0.8150 and *β*<sub>2</sub><sup>∗</sup> = −0.4477. The discomfort values derived with these optimal weights can also explain the contribution of the kinematics related cost function. The corresponding results are presented in **Figures 8C,D**.
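The procedure in Equations (13)–(16) can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the text does not state how Equation (16) was solved, so the exhaustive grid search over candidate weights is an assumption, as are the function names and the synthetic data (which are generated from known weights so that the search can be checked).

```python
from itertools import product


def discomfort(q1_s, q2_s, q1_change, beta1, beta2):
    """Discomfort value of Eq. (13); all angles in degrees."""
    return (90.0 - q1_s) / 180.0 + beta1 * q2_s / 180.0 + beta2 * q1_change / 180.0


def r_squared(x, y):
    """Fit y = theta1 + theta2 * x by least squares (Eq. 14) and return R^2 (Eq. 15)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0.0 or ss_tot == 0.0:
        return 0.0  # degenerate data: no linear trend can be fitted
    theta2 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    theta1 = mean_y - theta2 * mean_x
    ss_res = sum((yi - (theta1 + theta2 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / ss_tot


def best_weights(tasks, contributions, grid):
    """Grid search (an assumed solver) for the weights that maximize R^2 (Eq. 16)."""
    best = None
    for beta1, beta2 in product(grid, repeat=2):
        dis = [discomfort(q1, q2, dq, beta1, beta2) for q1, q2, dq in tasks]
        r2 = r_squared(dis, contributions)
        if best is None or r2 > best[0]:
            best = (r2, beta1, beta2)
    return best


# Synthetic tasks (q1_S, q2_S, q1_Change) and contributions built from known weights.
tasks = [(q1, q2, dq)
         for q1 in (0.0, 45.0, 90.0)
         for q2 in (0.0, 45.0)
         for dq in (10.0, 30.0, 60.0)]
contributions = [0.2 + 0.5 * discomfort(q1, q2, dq, 0.5, -0.5) for q1, q2, dq in tasks]

r2, beta1, beta2 = best_weights(tasks, contributions, [-1.0, -0.5, 0.0, 0.5, 1.0])
print(f"beta1={beta1}, beta2={beta2}, R^2={r2:.4f}")
```

A finer grid or a continuous optimizer could replace the coarse grid; the grid is used here only because it makes the maximization in Equation (16) explicit.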

Since human motor control is considered a stochastic system and we do not know exactly how these factors are combined (e.g., linearly or non-linearly), the discomfort metric presented here is a proof of concept of the transition between different reaching tasks. Because the variance is not described, the results contain noise, but the trade-off between the dynamics and the kinematics is still observable. This finding supports the idea of using a GPR model to describe the mapping from the initial and final joint angle configurations to the optimal weight vector.

### **5.2. Experiment for Hybrid Prediction Framework**

In this subsection, an experiment designed to test the performance of the proposed hybrid online prediction framework is presented. The experiment is based on a simple pick-and-place task with one picking position and four targets. The accuracy of the ProMPs predictions as well as the updating process is analyzed here.

### 5.2.1. Experimental Setup and Data Collection

As shown in **Figure 9**, the experiment is designed as a pick-and-place task with LEGO bricks. The picking position is fixed during the experiment, and four placing regions with different heights are selected as targets. Each region consists of four possible positions, arranged as the four corners of a square, for placing the bricks. The experiment includes 16 pick-and-place movements (4 targets × 4 times) per subject. Every subject is required to repeat the whole experiment ten times; thus, 160 trajectories in total, 40 for each target, are recorded for one subject. We collected the data from five subjects and performed the analysis on those 800 trajectories. We neglect the hand and finger movements and only predict the position of the wrist joint.

### 5.2.2. Results of the Hybrid Prediction Framework

Here, we present the corresponding results from the prediction experiment. First, the prediction accuracy of ProMPs is tested by looking into the distance error between the prediction and the observation. Then, the updating process for the GPR model is analyzed both to provide the evidence on the interpersonal variance, and also to demonstrate the ability of our hybrid prediction framework in describing this variance.

#### *5.2.2.1. Performance of the Predictions by ProMPs*

We conduct an offline analysis to investigate the performance of the ProMPs-based predictions in more depth. After initialization, the ProMPs are used to generate predictions for the observations. For each observation, we use the first 30% of the observed points to roll out the prediction, and the distance error between the prediction and the observation is measured through DTW. After each prediction, the observation is used to update the ProMPs in order to learn the variance as well. The updated ProMPs are then used for the next observation, and this updating process continues until the last observation.
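The distance measure in this evaluation loop can be illustrated with a short sketch. It shows a textbook dynamic time warping (DTW) recursion with a Euclidean point-to-point cost and a helper that extracts the first 30% of an observation; the helper names, the step pattern, and the absence of any normalization are assumptions, since the text does not specify these details. The ProMPs roll-out itself is not reproduced here.

```python
import math


def observed_prefix(trajectory, fraction=0.3):
    """Return the leading part of a trajectory used to condition the prediction."""
    n_points = max(1, int(len(trajectory) * fraction))
    return trajectory[:n_points]


def dtw_distance(prediction, observation):
    """Accumulated DTW cost between two sequences of Cartesian points."""
    n, m = len(prediction), len(observation)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning the first i predicted
    # points with the first j observed points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(prediction[i - 1], observation[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat observed point
                                    cost[i][j - 1],      # repeat predicted point
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical wrist trajectories (metres): the observation lingers at the start.
predicted = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.1)]
observed = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.1)]

print(len(observed_prefix(list(range(10)))))  # number of points in the 30% prefix
print(dtw_distance(predicted, observed))
```

Because DTW aligns the two sequences in time before accumulating the point distances, a prediction that follows the same path at a different speed is not penalized for the timing difference.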

The distance errors for each subject and each target are presented in **Figures 10A–D**. The distance error is calculated between the prediction and the observation. Note that this comparison is performed in Cartesian space, while during the initialization of the ProMPs, the trajectory generated from the composite model is a relative end-effector trajectory in the arm model coordinate system (see Section 3.2.3). Since the relative end-effector trajectory ignores the shoulder translations and the torso movements, which are not avoidable in real reaching motions, and the model's arm length usually differs from the actual arm length of the subject,

**FIGURE 8** | Transition between different cases. **(A,B)** Contribution of the dynamics (left) and the kinematics (right) related costs with respect to three factors. *q*<sub>1,S</sub> and *q*<sub>2,S</sub> are the two starting joint angles of the shoulder rotations, and *q*<sub>1,Change</sub> is the change between the final and the initial angle of the pitch rotation of the shoulder joint. The colors indicate the contribution ratio of the corresponding cost. **(C,D)** Relationship between the proposed discomfort metric and the contribution of the dynamics and the kinematics related costs. Red lines are the linear regression models created from the discomfort values obtained with the optimal weights (*β*<sub>1</sub><sup>∗</sup>, *β*<sub>2</sub><sup>∗</sup>). A least-squares ellipse fit is also shown to illustrate the trend with variance.

the first prediction has a large error. However, this initial error diminishes with later updates, and after several updates (around 5), the distance error stabilizes at a small value (around 2–4 cm for the trajectory distance error averaged over the data points). In the end, as shown in **Figure 10F**, the predictions get closer to the observations for each subject.

During the prediction process, the variance is also learned by updating the ProMPs. We initialized the variance to a large value and observed that after several updates the ProMPs converge to a stable distribution. **Figure 10E** shows the Kullback–Leibler (KL) divergence between the updated ProMPs distribution and the previous one for target one. The results indicate that the distribution converges after around 10 iterations for each subject. An example of the learned distribution, which is defined by the mean values and the corresponding variances for each point in all dimensions, is presented in **Figures 10G–I**. Hence, the motor variability is captured by a person-specific distribution in the ProMPs. Subsequently, the mean trajectory from the distribution is treated as the average behavior of that specific subject for the corresponding reaching task.
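The convergence check described above can be illustrated with the closed-form KL divergence between two Gaussians. The sketch below makes two simplifying assumptions that are not taken from the text: the distributions are treated as having diagonal covariance (independent dimensions), and the divergence is evaluated from the updated distribution to the previous one. The function name and the numbers are hypothetical.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance.

    Each argument is a sequence with one entry per dimension; variances must be > 0.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Closed-form univariate term, summed over independent dimensions.
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


# Hypothetical distributions after two consecutive updates (P: updated, Q: previous).
previous_mean, previous_var = [0.30, 0.10, 0.50], [0.040, 0.040, 0.040]
updated_mean, updated_var = [0.31, 0.10, 0.49], [0.035, 0.038, 0.036]

divergence = kl_diag_gaussians(updated_mean, updated_var, previous_mean, previous_var)
print(f"KL divergence between consecutive updates: {divergence:.5f}")
```

A value close to zero indicates that an update barely changed the distribution, which is the sense in which the distribution is said to converge.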

#### *5.2.2.2. Updating the GPR Model*

Due to the limited amount of available training data, the mapping represented by the GPR model is not accurate enough. Moreover, because of the interpersonal variance, the error between the estimated weight vector and the actual one can be large in some cases. Thus, we need to update the GPR model through a separate updating process. To do this, we first extract the mean trajectory from the converged ProMPs learned from 40 observations, and then apply the same IOC calculation to this trajectory to get a new weight vector. This new weight vector is used to update the GPR model. Note that since we also want to model the interpersonal variance, the GPR model is updated separately with respect to each subject's behavior. A comparison of the distance error between the observation and the optimal trajectories solved with the previous weight vector and the new weight vector is presented in **Figures 11A–D**. As we only want to look into the distance error caused by the weight vector, the trajectories compared here are the relative end-effector trajectories, which have less error because they ignore the shoulder translations and the torso movements. The results indicate that the error diminishes after the update. After

**FIGURE 9** | Experiment for the hybrid online prediction framework. S is the starting position and T1 to T4 are the four target regions. Each region consists of four possible placing positions as four corners of a square for the LEGO bricks.

several updates on the GPR model, the interpersonal variance can be represented in each person-specific GPR model. We also observe that even for the same tasks the new weight vectors vary between different subjects (**Figures 11E–H**). This supports the existence of the interpersonal variance while emphasizing the importance of this updating process in our framework.

### **6. DISCUSSION**

Facilitating efficient and safe co-existence of humans and robots is a multifaceted challenge. In this paper, we focus on developing a human motion modeling and prediction framework that can be effectively used for robot control during dyadic interaction. One of the key insights of this work is that the interpersonal difference is not negligible regarding the contribution of cost functions. Even though motor variability was acknowledged in previous studies and some stochastic optimal control formulations were suggested as models for the motor control functionality of the CNS, the interpersonal variance has not been studied in such detail. The research presented in this work is a first step toward combining model-based and probabilistic data-driven approaches in order to look into this topic, especially from the perspective of how this can be used for human-in-the-loop robot control. In essence, the hybrid framework enables personalized modeling and prediction of human motion behaviors, which can be integrated into robot control to provide personalized, safe, and efficient assistance to the human partner. However, many aspects still need further investigation, both for human motion modeling and for its effective integration into robot control.

### **6.1. On the Human-in-the-Loop Robot Control and HRI**

As robots have become ubiquitous in our daily lives, the goal is to provide safe yet natural interaction between human–robot dyads. To this end, novel robot control architectures which take into account human motion behavior are required. As robots are expected to adapt their motion behaviors with respect to their human counterparts, understanding how humans control and execute their motions is critical. The outcome of human motion modeling is twofold: on the one hand, the models learned can be used to predict human motions during interaction so that the robot can take proactive actions. On the other hand, such models enable building robot control architectures for realizing human-like motions to provide natural interaction. The proposed hybrid framework focuses on the former, and it also lays out the underlying control mechanism for human motor control while demonstrating the trade-off between kinematic and dynamic properties used for arm reaching control. Even though there were recent studies on transferring such optimal control formulations learned from human motion data to robot control (e.g., locomotion (Mombaur et al., 2010), reaching motion (Albrecht et al., 2011)), our findings would enhance such methods by building adaptive control methods to achieve a similar trade-off as human motor control seems to utilize.

The model-based optimal control formulation can further be utilized in other HRI settings, e.g., in physical HRI, where the robot provides the required assistance to the human partner in order to reduce the effort spent by the human, which can be detected from the increase in the contribution of the dynamics related costs. In addition, the trade-off analysis can be extended to understand how the reciprocal influence of the partners' movements affects the cost distribution, which in turn helps us construct suitable control and motion planning strategies for the robot to provide optimal assistance constrained on similar cost distributions.

As humans collaborate with each other naturally and safely in close proximity, we hypothesized that one crucial requirement for dyads is to be able to estimate the collaborating partner's motions. In that regard, it is also essential for a robot to predict the motion of human partners. This prediction needs to be efficient (online-capable) in order to choose actions proactively, and to (re-)plan the motion in a way to realize a collision-free trajectory while still achieving the task. The proposed hybrid framework enables such an efficient prediction as well as an update on the cost combination per person. The ProMP-based human motion prediction component of this work has already been integrated into a stochastic trajectory optimization framework (Oguz et al., 2017). The efficiency of our motion prediction enables the robot to re-optimize its motion frequently at short intervals while considering the predicted human motion distribution as a dynamic obstacle to avoid. Hence, any changes in the expected movement can still be taken into account to achieve a responsive and safe interaction. Furthermore, since our hybrid architecture also updates personal motion models during interaction, the effect of robot movement on human partner's motion can still be captured, which is expected to increase the accuracy of predictions during the course of the interaction.

In that regard, Interaction Primitives (IPs) (Amor et al., 2014) and its extension Environment-adaptive IPs (EaIPs) (Cui et al., 2016) also provide a data-driven approach to predict a human partner's movement and then to plan the robot motion accordingly. As ProMP formulation already builds on the idea of learning a distribution over some demonstrated trajectories, it can also be extended to account for the coupling between two agents by

**FIGURE 10** | Results of predicting with the ProMPs. **(A–D)** Distance error between the observations and the ProMPs predictions of five subjects for target one to target four. The errors converge after several updates. **(E)** KL divergence between the updated distribution and the previous one for target one. After 10 iterations the value is quite small, which indicates that the distribution converges. **(F)** The ProMPs predictions and the observations in the last iteration of subject one for all four targets. **(G–I)** The ProMPs predictions in the last observation of subject one for target one. Each plot presents the mean and the variance of the *x*, *y*, and *z* positions in Cartesian space, respectively.

learning a distribution over two persons' trajectories executed during a joint interaction task. Similarly, learning a joint distribution including the environment-related features would be a feasible improvement. The learned human motion models can still be fed to the IOC formulation to extract the optimal cost distributions that best describe those interactive movement behaviors. The reciprocal influence of partners on their individual cost utilization poses an interesting research question that can be analyzed from the IOC perspective. Our modular hybrid framework also allows integration of any movement representation that can effectively predict human movement behaviors. In that regard, the IOC formulation can easily be integrated with (Ea)IPs to model, understand, and predict human interaction behaviors.

Finally, one critical issue has to be noted. Since those approaches rely only on data-driven formulations, there is no guarantee of safe and effective motion generation for the robot, especially in close-proximity interaction scenarios. However, our approach has the potential to utilize the underlying cost function distributions learned from human movement behaviors for robot motion generation, which can then be combined with a learning approach to achieve a generalized safe policy. In that regard, we can combine reachability analysis (Akametalu et al., 2014) with our model-based optimal control formulation to ensure safety when the robot is planning its interaction movement. In essence, reachability analysis eliminates the states that lead to an unsafe situation, and the learning process is performed within the safe region (Fisac et al., 2017). This analysis and the required computations are based on the dynamical model of the system and may not be feasible with purely data-driven approaches, such as IPs.

### **6.2. Limitations**

The IOC framework enables the identification of combination of basic cost functions in 3D reaching tasks. The results suggest a trade-off between the dynamics and kinematics related cost functions. With a proper definition of the system model and

**FIGURE 11** | Results of the GPR model updating process. **(A–D)** The distance error between the optimal trajectories solved with respect to the initial weight vector and the updated weight vector for target one to target four, respectively. **(E–H)** The contribution of basic cost functions calculated from the mean trajectories of five subjects for target one to target four, respectively.

a set of reasonable cost functions, the IOC framework can be generalized to other problems, e.g., locomotion planning (Mombaur et al., 2010) and car driving (Kraus et al., 2010). However, the IOC framework has several limitations. The first is the complexity of finding the global minimum. Even though we tried to cover an extensive search range of the weight vector, the result is arguably still an approximation of the global minimum. Due to the complex non-linear formulation of the IOC framework, no efficient method has yet been proposed for addressing this problem. Second, the lack of a description of the variance weakens the accuracy of the motion behavior model. Since the IOC framework results in a deterministic solution, it cannot consider the interpersonal variance and the motor variability during the optimization. When we represent the trade-off between kinematics and dynamics related costs regarding the reaching tasks, the variance makes it hard to identify a clear relationship. Therefore, the discomfort metric we proposed is a proof of concept, and a deeper investigation is required to uncover how exactly the motion parameters affect the contribution of the basic cost functions.

In the proposed hybrid prediction framework, we combine a model-based prediction method with a data-driven method. A GPR model is used to represent the mapping from the initial and final conditions to the optimal weight vector. However, due to the limited amount of data, the GPR model is not sufficient for representing the variance in motion behavior. It is also found to be effective only when the reaching motions are within the descriptive range of the training data. For prediction purposes, we use the trajectory obtained from the composite model to initialize the ProMPs. We include this initialization phase rather than directly using the ProMPs because the subsequent updates on the composite models are much faster than solving the IOC problem from scratch for each person (e.g., 100 upper level optimization iterations take around 4 h, whereas 15 iterations take around half an hour). It also allows the prediction to be made immediately without extra data collection. Note that because the arm model ignores the shoulder translation and the torso movements, which are not avoidable in real reaching motions, the current initialization process still has some errors. If a full upper body model were considered in the IOC framework, this error could be minimized. However, this would immensely increase the computational load; hence, this extension may not be feasible.

### **7. CONCLUSION**

In this work, we investigate the underlying principles of human reaching motions and propose a hybrid framework to utilize our findings in motion prediction. To uncover the criteria of reaching motion control, we implement an inverse optimal control framework to identify the contribution of the basic cost functions that best represent human behavior. The IOC results indicate a trade-off between the dynamics and kinematics related cost functions depending on the reaching tasks. Then, to apply the composite cost function for predicting human motions, we combine the model-based optimal control formulation with the data-driven probabilistic movement primitives method. With this hybrid prediction framework, we learn the motor variability as well as the interpersonal variance at the same time. The demonstrated high accuracy and efficiency of this hybrid framework encourage its use in HRI settings. For human-in-the-loop robot control, a high-level planner for the robot can exploit such a hybrid model to choose its next task, plan a collision-free motion trajectory, and as a result achieve safe, efficient, and natural dyadic interaction with the human partner.

### **ETHICS STATEMENT**

This study was carried out in accordance with the recommendations of Ethics Committee of Technical University of Munich (TUM) School of Medicine with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee of TUM School of Medicine.

### **AUTHOR CONTRIBUTIONS**

OO formulated and initiated the research. ZZ finalized the research and conducted the experiments. OO, ZZ, and DW wrote the paper.

### **FUNDING**

This research has been partially supported by SIEMENS AG.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, MK, and handling Editor declared their shared affiliation.

*Copyright © 2018 Oguz, Zhou and Wollherr. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A Study of the Effects of Electrode Number and Decoding Algorithm on Online EEG-Based BCI Behavioral Performance

Jianjun Meng<sup>1</sup>\*, Bradley J. Edelman<sup>2</sup>, Jaron Olsoe<sup>2</sup>, Gabriel Jacobs<sup>2</sup>, Shuying Zhang<sup>2</sup>, Angeliki Beyko<sup>3</sup> and Bin He<sup>1</sup>

*<sup>1</sup> Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, United States, <sup>2</sup> Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN, United States, <sup>3</sup> Institute for Engineering in Medicine, University of Minnesota, Minneapolis, MN, United States*

#### Edited by:

*Tetsunari Inamura, National Institute of Informatics, Japan*

#### Reviewed by:

*Mahnaz Arvaneh, University of Sheffield, United Kingdom Jing Jin, East China University of Science and Technology, China*

\*Correspondence:

*Jianjun Meng jianjunm@andrew.cmu.edu*

#### Specialty section:

*This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience*

Received: *01 December 2017* Accepted: *22 March 2018* Published: *06 April 2018*

#### Citation:

*Meng J, Edelman BJ, Olsoe J, Jacobs G, Zhang S, Beyko A and He B (2018) A Study of the Effects of Electrode Number and Decoding Algorithm on Online EEG-Based BCI Behavioral Performance. Front. Neurosci. 12:227. doi: 10.3389/fnins.2018.00227*

Motor imagery-based brain-computer interface (BCI) using electroencephalography (EEG) has demonstrated promising applications by directly decoding users' movement related mental intention. The selection of control signals, e.g., the channel configuration and decoding algorithm, plays a vital role in the online performance and progression of BCI control. While several offline analyses report the effect of these factors on BCI accuracy for a single session (performance increases asymptotically with the number of channels, saturates, and then decreases), no online study, to the best of our knowledge, has yet been performed to compare them for a single session or across training. The purpose of the current study is to assess, in a group of forty-five subjects, the effect of channel number and decoding method on the progression of BCI performance across multiple training sessions and the corresponding neurophysiological changes. The 45 subjects were divided into three groups using Laplacian Filtering (LAP/S) with nine channels, Common Spatial Pattern (CSP/L) with 40 channels, and CSP (CSP/S) with nine channels for online decoding. In the first training session, subjects using CSP/L showed no significant difference from those using CSP/S but a higher average BCI performance than those using LAP/S. Although the average performance with the LAP/S method was initially lower, LAP/S displayed improvement over the first three sessions, whereas the other two groups did not. Additionally, analysis of the recorded EEG during BCI control indicates that LAP/S produces control signals that are more strongly correlated with the target location, and a higher R-square value was shown at the fifth session.
In the present study, we found that subjects' average online BCI performance using a large EEG montage does not show significantly better performance after the first session than a smaller montage comprised of a common subset of these electrodes. The LAP/S method with a small EEG montage allowed the subjects to improve their skills across sessions, but no improvement was shown for the CSP method.

Keywords: BCI, EEG, electrode number, CSP, channel configuration

### INTRODUCTION

Brain-computer interface (BCI) has attracted considerable attention during the past few decades and aims to construct a direct interface between the human brain and peripheral devices (He et al., 2013). Various signal sources, such as endogenous motor imagery-based sensorimotor rhythms (Wolpaw and McFarland, 2004) and slow cortical potentials (Birbaumer et al., 2003), and exogenous P300 (Jin et al., 2010, 2011) and steady-state visual evoked potentials (Chen et al., 2015), could be used to build the BCI system. Motor imagery-based BCI using electroencephalography (EEG) has shown promise in the control of virtual objects, such as computer cursors and virtual helicopters (Birbaumer et al., 2003; Wolpaw and McFarland, 2004; Royer et al., 2010), and physical devices, such as wheelchairs, quadcopters and robotic arms (Pfurtscheller et al., 2003; Carlson and Millan, 2013; LaFleur et al., 2013; Meng et al., 2016). Human brains display characteristic spatial modulation of sensorimotor rhythms when a user performs certain types of motor imagination, e.g., imagining single or bilateral hand movements, movement of the feet, etc. (Wolpaw et al., 2002; Pfurtscheller et al., 2006). This modulation of sensorimotor rhythms can be captured by computer algorithms and translated into various control commands for output devices. Recently, the study of non-invasive EEG-based BCI has sparked intensified interest, spreading to the inclusion of additional complementary neurotechnologies such as EEG source imaging (Edelman et al., 2016), vivid imagination strategy (Qiu et al., 2017), hybrid modality (Kaiser et al., 2014), non-invasive neuromodulation (Baxter et al., 2016), and functional magnetic resonance imaging (Zich et al., 2015) to enhance the usability and performance of these systems. 
Additional studies have utilized EEG-based BCIs to explore neurophysiological foundations of BCI performance such as predicting user performance (Hammer et al., 2012), measuring brain plasticity through training (Pichiorri et al., 2011), the relationship between attempted movement and motor imagery (Blokland et al., 2015), as well as various clinical applications such as how BCI training may aid in stroke rehabilitation (Ramos-Murguialday et al., 2013; Pichiorri et al., 2015) and in the use of lower limb exoskeletons (King et al., 2015; Donati et al., 2016) and robotic arms for reach and grasp (Meng et al., 2016). These studies aim to further improve the performance of non-invasive BCI from various aspects in order to bring this technology into everyday life.

There are several factors, e.g., the chosen frequency bands, the channel configuration, and the associated decoding methods, which are known to affect the performance of a BCI system (Blankertz et al., 2008b; Arvaneh et al., 2011; Ang et al., 2012; Meng et al., 2013). The selection of channel configuration, often entangled with frequency/spectral optimization, and the associated computer algorithms have been studied in various offline analyses (Lal et al., 2004; Blankertz et al., 2008b; Arvaneh et al., 2011; Meng et al., 2013). Although there is no consensus within the field regarding the effect of large numbers of EEG channels (electrodes) on BCI performance, offline analyses have suggested that the classification accuracy of motor imagery/BCI tasks can increase as more channels are added (Lal et al., 2004; Sannelli et al., 2010; Arvaneh et al., 2011; Meng et al., 2013; Shan et al., 2015; Qiu et al., 2016), but begins to decrease after a certain number due to the redundant and/or irrelevant information introduced into the classifier. Usually, the optimal number of channels depends on the algorithm used, the subject and the application. One state-of-the-art signal processing method termed common spatial patterns (CSP) has gained significant attention due to its efficiency in extracting useful motor imagery-related information from multiple channels (Blankertz et al., 2008b; Arvaneh et al., 2011; Meng et al., 2013). Several offline studies have shown that CSP works optimally when using about 20–50 electrodes; however, the optimal number varied among subjects and applications (Lal et al., 2004; Sannelli et al., 2010; Arvaneh et al., 2011; Meng et al., 2013; Shan et al., 2015; Qiu et al., 2016). Advanced signal processing techniques utilizing large numbers of EEG channels, such as CSP, may therefore additionally be able to boost BCI performance.
Despite few online applications (Guger et al., 2000; Blankertz et al., 2008a), CSP has primarily been used to perform offline analysis of prerecorded EEG and has been compared to only one or two competing methods at a time (Guger et al., 2000; Blankertz et al., 2008a,b; Sannelli et al., 2010; Arvaneh et al., 2011; Lotte and Guan, 2011; Ang et al., 2012; Meng et al., 2013; Samek et al., 2014; Shan et al., 2015; Qiu et al., 2016). Furthermore, these studies often utilize data from one recording session and do not consider the learning process of subjects that may arise under longitudinal training paradigms. On the other hand, the LAP method uses a fixed configuration and a relatively small number of channels distributed around the sensorimotor area. There are multiple studies demonstrating online performance across several training sessions via the LAP method (Wolpaw and McFarland, 2004; Royer et al., 2010; Baxter et al., 2016; Meng et al., 2016), however, most of these studies do not investigate or report the learning process of subjects in part due to insufficient number of subjects and low statistical power. Meanwhile, many recent studies using BCI for stroke recovery, recovery of lower limb injury or robotic arm control highlight the requirement of online BCI learning in multiple/long-term sessions (Ramos-Murguialday et al., 2013; King et al., 2015; Pichiorri et al., 2015; Donati et al., 2016; Meng et al., 2016). Furthermore, the BCI learning of subjects in long-term sessions continuously occurs and interacts with the adaptation of the BCI system. Therefore, it is necessary to not only evaluate various channel configurations in an online fashion, but also to investigate how the use of these approaches affect the learning process involved in BCI control.

It is clear from many offline studies that the modulation of sensorimotor rhythms can be better captured by advanced signal processing algorithms (Guger et al., 2000; Lal et al., 2004; Blankertz et al., 2008a,b; Sannelli et al., 2010; Arvaneh et al., 2011; Lotte and Guan, 2011; Ang et al., 2012; Meng et al., 2013; Samek et al., 2014; Shan et al., 2015; Qiu et al., 2016) and may facilitate the learning process of BCI control. This facilitation might manifest in the form of higher decoding accuracies and overall BCI performance, which would possibly encourage the participants to maintain engagement throughout the training. However, to the best of our knowledge, it has yet to be tested whether or not the inclusion of large numbers of electrodes and advanced signal processing algorithms lead to more efficient BCI training and increased performance in an online setting. In the present study, we attempt to answer this question by investigating BCI learning over multiple sessions in terms of the performance of BCI control, a behavioral representation of the users' ability to modulate his/her sensorimotor rhythms.

In this study, we used two particular channel configurations, composed of vastly different numbers of electrodes in order to explore the learning performance of subjects. For each channel configuration, decoding algorithms were utilized that have been shown in literature to optimize the information collected among the included sensors; Laplacian Filtering (LAP/S) for the small channel configuration (Hjorth, 1975; McFarland et al., 1997), CSP/L for the large channel configuration, and CSP/S for the small channel configuration (Ramoser et al., 2000; Blankertz et al., 2008b). Forty-five participants were recruited to participate in this study. To address the question of how the channel configuration and decoding algorithm affects initial BCI learning, the subjects were randomly distributed into separate experimental groups utilizing different channel configurations and decoding algorithms. Each subject participated in multiple sessions of BCI experiments using a 1-dimensional (1D) cursor control paradigm. The group averaged performance across sessions was compared among the three groups.

### METHODS

### Experimental Setup

#### Subjects and Data Acquisition

Forty-five BCI naïve subjects were recruited for this study. The 45 subjects (22 females; mean age, 23.7 ± 7.7; range 18–54; seven left-handed, one ambidextrous) were randomly assigned to one of three groups **(Group one: G1, Group two: G2, and Group three: G3)**. For each subject, five sessions of 1D horizontal motor imagery BCI control (**Figure 1A**) were performed via either the small channel configuration (**Figure 1B**) or multichannel configuration (**Figure 1C**) and the associated online decoding algorithms. The average interval between any two of the five consecutive sessions for all subjects was 7.24 ± 8.85 days and the minimum interval for each subject was 1 day. The small channel configuration (**G1**, 15 subjects) employed the AR spectrum algorithm, which extracted the amplitude of the alpha-beta rhythm (8–26 Hz) from channels C3 and C4. Before the spectrum was extracted, the signals from those two channels were spatially filtered by the LAP/S filter (McFarland et al., 1997). In contrast, the CSP algorithm was used to adjust the weight coefficients applied to the multichannel configuration (**G2**, 15 subjects). Correspondingly, band-pass filtering (8–26 Hz) was applied to the multiple channels before the CSP algorithm was applied. As a control, the remaining 15 BCI naïve subjects **(G3)** were instructed to perform five sessions of the same BCI task via the small channel configuration and CSP decoding algorithm (CSP/S, same band-pass filtering). All participants were informed about the experimental protocol and gave written informed consent before participating in the study; the study protocol was approved by the Institutional Review Board (IRB) of the University of Minnesota.

64-channel EEG was acquired using a Neuroscan EEG acquisition system. The reference was located on the vertex and the ground electrode was on the forehead. During EEG recordings, subjects were seated in a comfortable chair with their hands on the armrests and faced a computer monitor at a distance of one meter. All electrode impedances were maintained below 5 kΩ. The EEG signals were recorded at a sampling rate of 1,000 Hz and band-pass filtered from 0.5 to 200 Hz by a Neuroscan Synamps RT amplifier (Neuroscan Inc, Charlotte, NC). A notch filter of 60 Hz was applied to the raw EEG signals.

### Study Design

The 45 subjects, all naïve to BCI, were randomly assigned to one of the three groups. Each subject was instructed to imagine movement of their left hand or right hand to control the left or right cursor movement, respectively. Subjects were instructed to perform kinesthetic motor imagination in the first person perspective (Neuper et al., 2005). There were 10 runs per session, each composed of 25 trials, with left and right targets presented in block randomized order. Thus, the numbers of left and right targets were roughly equal in each session. In each session, no feedback was given during the first run of 25 trials, which was used as the training data for CSP/L or CSP/S. For LAP/S, no feedback was given during the first trial in each run, as the data from this time period was used as training data (buffer initiation). After the first training run or trial in the respective method, feedback was provided to subjects. For each run, the trial started with a black screen for 3 s during which the subjects were instructed to relax and try their best to eliminate body movement. A yellow bar appeared after second 3 on either the left or right side of the screen and was maintained for 3 s, followed by the appearance of a pink cursor at second 6 which was allowed to move based on the subject's brain rhythms (as shown in **Figure 1A**). Subjects were given a maximum of 6 s in each trial to hit the correct target; thus, each trial could result in a hit, miss, or abort. After a 1 s post-feedback period, a new trial began following the same procedure. The movement of the cursor was presented by BCI2000 (Schalk et al., 2004).

All of the subjects were blinded to which group they belonged to. The first group G1 included 15 subjects using the small channel configuration and LAP/S filtering method for online decoding during the first three sessions (G1:LAP/S); the last two sessions were switched to the multichannel configuration and CSP/L for online decoding (G1:LAP/S->CSP/L). For the second group, 15 subjects participated in three sessions of BCI control via the multichannel configuration and CSP/L as the online decoding method (G2:CSP/L) and then switched to the small channel configuration and LAP/S method for their last two sessions (G2:CSP/L->LAP/S, as shown in **Figure 1D**). The 15 subjects in the third group performed the same BCI task via the small channel configuration using CSP as the decoding method throughout the five sessions (G3:CSP/S). Participants' ability to operate a BCI is expected to generalize from one decoding method to another if the methods rely on the same underlying neurophysiological mechanism. Changing the decoding method can act as a perturbation to a particular subject's control strategy; however, it could also potentially help make such a control strategy more robust in the long run. The switch-over design among the first two groups was aimed to test how changing the decoding method would affect a subject's control strategy by means of behavioral performance. The experimental schedule for each subject was balanced between each subject's earliest next available day and our lab's availability; **Figure 2A** displays the scheme for recruiting and scheduling subjects. The statistics of inter-session interval for all subjects are displayed in **Figure 2B**.

### Multichannel (CSP/L) Online Decoding

Forty electrodes were selected and are marked green in **Figure 1C**. Channels at the periphery of the cap were ignored during online processing to avoid large, common artifacts caused by facial muscle movements. Those 40 channels were bandpass filtered by an 8–26 Hz Butterworth filter and then spatially filtered by three pairs of the most distinct CSP filters. The power in a sliding time window of 2 s in each of spatially and spectrally filtered EEG signals were used to calculate the movement of the cursor in each time step. CSP aims to maximize one class covariance while minimizing the other class covariance. The equivalent optimization problem aims to maximize one specific class covariance with the normalization constraint of two class covariance. The solution is given by the generalized eigenvalue decomposition (Ramoser et al., 2000; Blankertz et al., 2008b). Linear discriminant analysis (LDA) was used as the classifier. In the online BCI experiments in the current study, the normalized value of LDA output was mapped to the cursor velocity in either the positive (right) or negative (left) directions. For the first run of each session, the cursor did not move but the subject was required to do the same motor imagination as subsequent runs in order to get the training data to train the CSP and LDA classifier. After the first run, the spatial filters and LDA classifier were retrained based on the data of the previous two runs, 50 trials, (for the second run, only the data of first run were used). A batch mode (Qin et al., 2007; Meng et al., 2014) was applied to update the classifier online after each run.
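The CSP optimization described above is commonly written in the following form. This is a sketch in standard CSP notation, which is an assumption here because the text does not define symbols: $\Sigma_{1}$ and $\Sigma_{2}$ denote the class covariance matrices of the band-pass filtered EEG for the two imagery classes, and $\mathbf{w}$ denotes one spatial filter.

$$
\max_{\mathbf{w}} \; \mathbf{w}^{\top} \Sigma_{1} \mathbf{w}
\quad \text{subject to} \quad
\mathbf{w}^{\top} \left( \Sigma_{1} + \Sigma_{2} \right) \mathbf{w} = 1
\quad \Longrightarrow \quad
\Sigma_{1} \mathbf{w} = \lambda \left( \Sigma_{1} + \Sigma_{2} \right) \mathbf{w}
$$

Under this formulation each generalized eigenvalue $\lambda$ lies between 0 and 1 and equals the share of the filtered signal variance attributable to the first class. Taking three pairs of filters, as in the text, would then correspond to the eigenvectors with the three largest and the three smallest eigenvalues.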

### Small Channel (LAP/S) Online Decoding

The signals of channels C3 and C4 were spatially filtered by the LAP filter (**Figure 1B**) before they were used to calculate the power spectrum. Spectral power of the alpha-beta rhythm estimated by the Autoregressive (AR) method for channels C3 and C4 was input to a linear classifier and linearly mapped to the 1D left/right cursor velocity (LaFleur et al., 2013; Meng et al., 2016). For the first run of each session, the cursor did not move for the first trial but the subject was instructed to do the same motor imagination in order to calibrate a normalizer; after the first trial, the cursor began to move (feedback). The normalizer took the output of the linear classifier as input and transformed it into a zero mean and unit variance control signal for velocity-based cursor control. The output of the linear classifier was updated online by the normalizer (Schalk et al., 2004; Baxter et al., 2016).
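The normalizer step can be sketched as follows. This is a minimal illustration under stated assumptions, not the BCI2000 implementation: the class name `RunningNormalizer`, the buffer length, and the example values are hypothetical.

```python
from collections import deque
from statistics import fmean, pstdev


class RunningNormalizer:
    """Rescale classifier outputs to zero mean and unit variance.

    Hypothetical helper: the statistics are estimated from a buffer of
    recent classifier outputs (the buffer length is an assumption).
    """

    def __init__(self, buffer_size=1000):
        self.buffer = deque(maxlen=buffer_size)

    def update(self, value):
        """Store one raw classifier output in the buffer."""
        self.buffer.append(value)

    def normalize(self, value):
        """Return the control signal for one raw classifier output."""
        if len(self.buffer) < 2:
            return 0.0  # no cursor movement until the buffer is initialised
        mean = fmean(self.buffer)
        std = pstdev(self.buffer)
        return 0.0 if std == 0 else (value - mean) / std


normalizer = RunningNormalizer()
for raw in [2.0, 4.0, 6.0, 8.0]:  # made-up outputs from a calibration trial
    normalizer.update(raw)
velocity = normalizer.normalize(8.0)  # positive value, i.e. rightward movement
```

Returning zero until the buffer holds data mirrors the description above, in which the cursor does not move during the first trial while the normalizer is calibrated.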

### CSP/S Online Decoding

The signals from the small channel configuration (**Figure 1B**) were used for this paradigm and the same procedures of CSP and LDA were used to extract three pairs of features and classify the movement of the cursor. Similarly, the spatial filters and LDA classifier were updated online in a batch mode after each run.

### Group R-Squared Evaluation in Sensor Space

We used the coefficient of determination, R-squared (r<sup>2</sup>) value, to measure how strongly the means of the two distributions (left and right hand imagination) differ relative to the band power variance. In an offline analysis, trials contaminated by artifacts were removed to alleviate the obscurity of cortical activity caused by any electrical sources unrelated to the task, e.g., swallowing or head movement. Trials were rejected if the activity satisfied one of the following criteria: first, the power spectrum of the trial during the feedback period deviated from the baseline by ±35 dB in the 7–35 Hz frequency band (Delorme and Makeig, 2004); second, the feedback duration was < 2 s. All of the trials including hit, miss and abort trials were used for the calculation of R-squared value. Artifactual trials were removed as described above in order to utilize only clean EEG signals, which could accurately capture the intention of the subjects. This procedure left an average of 200 ± 37 trials remaining for each subject and each session. A large Laplacian filter was applied to all of the recorded data (Hjorth, 1975; McFarland, 2015). Since a broad frequency band of 8–26 Hz was applied to all of the methods online, it is desirable to see the changes of R-squared values across sessions in the same frequency band for offline analysis. R-squared values were first calculated in each electrode in the frequency band of 8–26 Hz from all of the non-rejected trials for each subject and each session. Then the R-squared values were averaged over the subjects in each session.
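As an illustration of the R-squared measure, the sketch below computes it for one electrode as the squared correlation between per-trial band power and the class label. The text does not spell out the formula, so this common formulation, the function name `r_squared`, and the example values are assumptions.

```python
def r_squared(left_power, right_power):
    """Squared correlation between band power and class label.

    left_power / right_power: per-trial band power (e.g., 8-26 Hz) at one
    electrode for left- and right-hand imagery trials (hypothetical inputs).
    """
    values = list(left_power) + list(right_power)
    labels = [0.0] * len(left_power) + [1.0] * len(right_power)
    n = len(values)
    mean_v = sum(values) / n
    mean_l = sum(labels) / n
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_v == 0 or var_l == 0:
        return 0.0  # undefined correlation is treated as no discriminability
    return cov ** 2 / (var_v * var_l)


# Made-up band power values: identical distributions give 0,
# perfectly separated classes give 1.
no_difference = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
full_separation = r_squared([1.0, 1.0], [3.0, 3.0])
```

The value grows as the difference between class means becomes large relative to the overall variance of the band power, which matches the interpretation given above.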

### RESULTS

### Online BCI Performance Results

First, the online BCI performance for each method was averaged over groups of subjects for each session. The group averaged percent valid correct (PVC) for all of subjects in different groups across the five sessions is shown in **Figure 3**. The PVC is defined as the ratio of the correct target hit trials vs. all of the valid outcomes. Thus, invalid outcomes corresponding to those trials when neither a correct nor an incorrect target was hit (abort) were excluded in the calculation of PVC. The first group is indicated by the green line, the second group the red line, and the third group the blue line. Because of the switch-over employed in sessions four and five for the first two groups, the line color remains the same for original groups, however, the markers switch. Thus, a green line with green circles represents the first group for the first three sessions, whereas a green line with red stars represents the same group in the final two switch-over sessions. The same visualization method is applied to the second group (refer to **Figure 1D** for details). The light-gray highlighted region indicates when the switch-over sessions began.
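The PVC definition can be illustrated with a short sketch. The function name `percent_valid_correct`, the outcome labels, and the example run are assumptions made for illustration only.

```python
def percent_valid_correct(outcomes):
    """Percent valid correct: hits divided by valid (hit or miss) trials.

    outcomes: list of "hit", "miss", or "abort" strings (hypothetical
    labels). Aborted trials are excluded from the denominator.
    """
    hits = outcomes.count("hit")
    misses = outcomes.count("miss")
    valid = hits + misses
    if valid == 0:
        return float("nan")  # PVC is undefined when every trial was aborted
    return 100.0 * hits / valid


# Made-up run of 25 trials: 15 hits, 5 misses, 5 aborts.
run = ["hit"] * 15 + ["miss"] * 5 + ["abort"] * 5
pvc = percent_valid_correct(run)  # 15 / (15 + 5) = 75%
```

Because aborts are dropped, a run with many timeouts can still yield a high PVC, which is worth keeping in mind when comparing PVC with plain accuracy over all trials.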

A linear mixed effect model (lme) was employed to evaluate the statistical significance of group performance over time (across sessions) with post-hoc Tukey's tests used to correct for multiple comparisons. There was a significant difference (p = 0.03) in PVC between G1 (LAP/S, PVC ± S.E.M was 61.8 ± 4.6%) and G2 (CSP/L, 76.0 ± 2.6%) and no significant difference between G1:LAP/S and G3 (CSP/S, 72.7 ± 3.7%), or G2:CSP/L and G3:CSP/S in the first session. There were no significant differences among any group pairings in the following sessions. In the fifth session, we found a significant difference in PVC between G2 (LAP/S, 83.8 ± 4.7%) and G1 (CSP/L, 73.8 ± 3.7%), and between G2 (LAP/S, 83.8 ± 4.7%) and G3 (CSP/S, 73.3 ± 3.7%) before correction for multiple comparisons; however, these effects did not survive the correction. The group averaged PVC using the CSP decoding method and multichannel configuration (CSP/L, marked with red stars) showed significantly higher initial performance than the LAP/S decoding method using the small channel configuration at the first session. However, the performance of this group varied across the first three sessions without a clear trend and the variation (red star) continued after
switching over to the LAP/S method as well. A similar high group average PVC by the small channel configuration (CSP/S, blue line) was shown at the first session, but no significant difference was shown compared to LAP/S decoding method. There was small variation in the performance for group three using the CSP/S method across the five sessions. The performance via the LAP/S decoding method using the small channel configuration (green circle) displayed a significant improvement, which is further discussed in section Offline BCI Performance Analysis, from the first to third sessions despite starting from a lower initial group averaged PVC. The change of BCI performance between session 4 and session 5 after switching over also shows greater improvement in group G2:CSP/L->LAP/S compared to G1:LAP/S->CSP/L (see analysis results in section Offline BCI Performance Analysis as well). In the following subsections, whether the randomly assigned subjects in three groups had equal natural abilities to control a BCI and whether the three methods produced different longitudinal effects of BCI control across the first three sessions are evaluated.

### Evaluation of BCI Performance Progress for Different Methods

In order to see the effects of the three different methods (here denoted as method A-LAP/S; method B-CSP/L and method C-CSP/S) and time (different sessions), the first three sessions and the last three sessions were analyzed separately. The performance of the first session (baseline) was subtracted from the second and third session to get the change of performance from baseline for each subject after training. Similarly, the performance of the third session was subtracted from the fourth and fifth to obtain the change of performance from the time point right before switching over methods for each subject. Although the offline cross validation analysis in section Offline BCI Performance Analysis showed that there was no significant difference in discriminative abilities between groups before the first online session (the offline cross validation in section Offline BCI Performance Analysis was used to assess subjects' discriminative abilities between groups), the difference in subject ability could be further compensated and evaluated by comparing their change of performance session by session. Thus, by baseline correcting the subjects' performance, the effect of initial group differences can be minimized. The change in performance (dependent variable, DV) was measured for three different subgroups that underwent treatments A, B, and C, respectively, at two different time points (repeated measures). A mixed repeated measures ANOVA was used to determine whether the three different methods produced different BCI performance over time. Prior to switching over methods, the statistics for the main effect of method is F(2, 42) = 3.23, p = 0.049, η<sup>2</sup> = 0.1 (generalized Eta-Squared measure of effect size); main effect of time is F(1, 42) = 1.12, p = 0.30, η<sup>2</sup> < 0.01; interaction effect of method and time is F(2, 42) = 0.29, p = 0.75, η<sup>2</sup> < 0.01. The statistics indicate there is a significant difference in the main effect of method.
Post-hoc linear mixed effect models (lme) and Tukey's Test were performed between each method for multiple comparisons of means. The results are summarized in **Figure 3B**. There is a marginally significant difference (p = 0.05) in change of BCI performance from session 1 to session 3 between method G1:LAP/S–G2:CSP/L (PVC ± S.E.M, 11.6 ± 5.0%); no significant difference (p = 0.81) between method G3:CSP/S–G2:CSP/L (3.0 ± 5.0%); no significant difference (p = 0.20) between method G3:CSP/S–G1:LAP/S (−8.6 ± 5.0%). There are no significant differences in the change of BCI performance from session 1 to session 2 (i.e., Session 2-1) among all of the three methods. After switching over methods, a similar mixed repeated measures ANOVA was used to assess the three different methods. No significant difference in main effects was shown; the statistics for the main effect of method is F(2, 42) = 0.98, p = 0.39, η<sup>2</sup> = 0.04; main effect of time is F(1, 42) = 0.17, p = 0.68, η<sup>2</sup> < 0.01. There was a significant interaction effect between method and time, F(2, 42) = 5.35, p = 0.009, η<sup>2</sup> = 0.03. Since a significant interaction was found, we did further post-hoc analysis on the change of BCI performance between session 4 and session 3, and between session 5 and session 4, to find the reason for such significant interaction. The results are shown in **Figure 4**. A significant difference between G2:CSP/L->LAP/S and G1:LAP/S->CSP/L was shown for the change of BCI performance between session 4 and session 5 (see **Figure 4B**).
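The baseline correction used in this analysis amounts to a simple subtraction per subject. The sketch below illustrates it; the function name `change_from_baseline` and the PVC values are made up for illustration and are not data from the study.

```python
def change_from_baseline(session_scores, baseline_index=0):
    """Subtract the baseline session from every later session.

    session_scores: one subject's PVC per session, in session order.
    baseline_index: position of the baseline session in that list.
    """
    baseline = session_scores[baseline_index]
    return [score - baseline for score in session_scores[baseline_index + 1:]]


# Hypothetical PVC values (%) for one subject over five sessions.
pvc_by_session = [60.0, 65.0, 72.0, 70.0, 75.0]

# Before the switch-over: sessions 2 and 3 relative to session 1.
before_switch = change_from_baseline(pvc_by_session[:3])

# After the switch-over: sessions 4 and 5 relative to session 3.
after_switch = change_from_baseline(pvc_by_session, baseline_index=2)
```

Each subject thus contributes two change scores per analysis, which are the repeated measures entered into the mixed ANOVA described above.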

A linear mixed effect model was applied to test whether the carry-over effect was significant via the sequence of G1 (LAPS, session 3) (CSPL, session 4) and the sequence of G2 (CSPL, session3) (LAPS, session 4) in order to exclude the possibility of biasing the results for session 4 and session 5 due to the switch over design (Lawson, 2014). The result showed that there is no significant carry-over effect (p = 0.67) between the two sequences.

### Offline BCI Performance Analysis

Although the subjects were recruited and randomly assigned to different groups, it is worth investigating the genuine ability of each group to produce discriminable brain patterns and ensuring that this ability was roughly equal. Then we can exclude the possibility that the difference of online performance between different groups, especially at the first training session, is caused by subjects' natural BCI ability. The offline analysis was performed for all of the subjects on their first session and subsequent four sessions of BCI data. Since CSP has been used as a benchmark method for numerous offline analyses (Blankertz et al., 2008b; Tangermann et al., 2012), CSP/L with the large channel configuration and a time segment of 2 s right after the cursor appears (3 s after the target cue was presented) was used as the method to assess the discriminability of subjects in all three groups. A 5 × 5 fold cross-validation (CV) was used to estimate the offline performance accuracy for each subject and each session. The group average CV accuracy in each group and session is shown in **Figure 5A**. Particularly, a one-way analysis of variance (ANOVA) was used to test whether there were statistically significant differences between the means of three groups at the first session. The average CV results and SEM for each method were G1:LAP/S (PVC ± S.E.M) 70.0 ± 3.6%, G2:CSP/L 73.0 ± 2.7% and G3:CSP/S 75.0 ± 3.7%. The statistical results of F(2,42) = 0.46, p = 0.63 indicate that there were no statistically significant differences between the means of the three groups' offline performance at the first session. Thus, we can conclude that the genuine ability to produce discriminable brain patterns in different groups was roughly equal.
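The splitting scheme behind a 5 × 5 fold cross-validation can be sketched as below, reading it as five repetitions of 5-fold CV, which is an assumption because the text does not define the term. The function name `repeated_kfold`, the seed, and the trial count are hypothetical, and the feature extraction and classifier training for each split are omitted.

```python
import random


def repeated_kfold(n_trials, n_folds=5, n_repeats=5, seed=0):
    """Yield (train, test) trial indices for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_trials))
        rng.shuffle(order)  # a new random partition for every repetition
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


# Hypothetical session with 200 artifact-free trials.
splits = list(repeated_kfold(n_trials=200))
n_splits = len(splits)  # 5 repetitions x 5 folds
```

In each split the spatial filters and classifier would be fitted on the training indices only and scored on the held-out fold; the CV accuracy is then the mean over all 25 splits.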

### Group Average R-Squared Value in Sensor Space

The R-squared value shows the difference of means of EEG mu and beta power between collections of left and right hand motor imagination relative to their band power variance. Thus, it could provide information about electrophysiological changes across training sessions. The group averaged R-squared value over all of the participants for each decoding method was calculated and is visualized in **Figure 6**. In each row, the group average R-squared value with respect to each method is displayed, and in each column the group averaged results with respect to each session is shown. Note that, for the decoding method of G1:LAP/S and G2:CSP/L, the subjects were switched-over into the other processing and decoding scheme at the fourth session. The color scale for each method is globally normalized to indicate the R-squared values relative to each method. For all of the three rows, focal distributions around channels C3 and C4 show stronger R-squared values, which implies sensorimotor areas are actively modulated by all of the three methods. Specifically, a mixed repeated measures ANOVA was used to assess whether there is a difference of R-squared values between the different methods across sessions in channels C3 and C4. Similar to the online performance analysis, the first three sessions and the last two sessions were analyzed separately. There is a significant main effect of method [F(2, 42) = 5.47, p = 0.008] and interaction effect [F(2, 42) = 3.25, p = 0.049] for the R-squared values at channel C3 in the ANOVA accounting for the three groups and final two sessions. The post-hoc linear mixed effect models (lme) and Tukey's Test were performed between each method for multiple comparisons of means. The results are summarized in **Figure 7**. There is a significant difference in R-squared values at channel C3 between methods G2:CSP/L->LAP/S and G1:LAP/S->CSP/L (p = 0.0015), between methods G2:CSP/L->LAP/S and G3:CSP/S (p = 0.0007) at the fifth session.
This indicates that EEG mu and beta waves are more consistently modulated for the LAP/S method at the fifth training session on the group level.

### Typical Spatial Patterns Derived by the CSP/L Method

FIGURE 4 | (A) Statistical significance test of change in performance between session 3 and session 4 for all three methods. (B) Statistical significance test of change in performance between session 4 and session 5 for all three methods.

Considering the results of different online BCI performance shown in the above analysis, feature extraction and classification were analyzed in the current section to probe the EEG for what factors might affect the progress of online BCI performance. Since similar linear classifier and adaptation schemes were
used for all of the LAP/S, CSP/L, and CSP/S groups, there is little chance that the classifiers cause the different online BCI performance. We therefore focus on the feature extraction method which is more likely to result in performance differences. An important reason for incorporating multiple channels is the expectation of deriving more task-related information through advanced signal processing algorithms, like spatial filtering. The scalp topographies of three pairs of the highest ranked spatial patterns calculated by CSP/L for each subject and each run during each session were explored with the above question in mind. A typical example for a particular subject is shown in **Figure 8**. In **Figure 8A**, spatial patterns derived by CSP/L for one run are shown; the first and second spatial pattern in the first row together with the second and third to last spatial patterns in the second row display neurophysiological consistency with the expected location of event-related (de)synchronization (ERD/ERS) which is a typical signature of motor imagery tasks. Contrasting with these four patterns, the third spatial pattern and last spatial pattern might capture non-sensorimotor related activities of the EEG, such as frontal and occipital activities which happen to be extracted by the algorithm. The temporal dynamics of ERD/ERS filtered by CSP/L, corresponding to the spatial patterns in **Figure 8A**, are shown in **Figure 8C**; in general, the temporal signals for right-hand imagination (red line) show larger amplitude oscillations than signals for left-hand imagination (blue line) in the first row of **Figure 8C** and vice versa in the second row. Similar spatial patterns for another run are shown in **Figure 8B**; a similar combination that fit with prior neurophysiological knowledge about sensorimotor related activities and non-sensorimotor related activities were observed. 
The temporal dynamics of ERD/ERS for another two left-hand and right-hand imagery trials are shown in the **Figure 8D** with their spatial patterns of CSP/L corresponds to **Figure 8B**. In general, the temporal dynamics of those sensorimotor related activities and non-sensorimotor related activities show similar ERD/ERS at the interested frequency band, but the spatial distributions might vary run by run. An obvious difference between the multichannel or small channel configuration via CSP/L or CSP/S decoding and the small channel configuration via LAP/S decoding was that the weighting coefficient for each electrode (spatial filtering) via CSP/L or CSP/S was automatically derived and adapted by the data. On the contrary, the weight coefficients for the electrodes were fixed throughout the whole sessions of training via LAP/S decoding. The variation of the

FIGURE 6 | Topography of group averaged R-squared values for the three online decoding methods across five sessions. Each row shows the group results for one method, and each column shows the results for one session. The color scale for each method and session is globally normalized. The maximum R-squared value for each condition is marked with a black star, and its value is shown directly above each topography. The group averaged R-squared values for the LAP/S method were largest in the focal area surrounding channels C3 and C4 at the fifth session.

FIGURE 7 | Statistical analysis of group averaged R-squared values in channels C3 and C4 for the three online decoding methods across five sessions. Group mean ± S.E.M. (standard error of the mean) R-squared values in channels C3 and C4 are shown in the left and right panels, respectively, for the three different methods. Statistically significant differences in R-squared values were found between G2:CSP/L->LAP/S and G1:LAP/S->CSP/L, and between G2:CSP/L->LAP/S and G3:CSP/S, in channel C3 at the fifth session.

coefficient for each electrode obtained by the CSP method, for a particular subject and averaged over subjects, is depicted in **Figure 9** across runs within a single session of online BCI control. The spatial filter for each subject was normalized in order to make the average across subjects unbiased. There were clear variations in each coefficient for each subject from run to run within a session. The change in the coefficients optimized by the CSP method changed the spatial pattern on a run-by-run basis, which might disturb subjects' consistent modulation of sensorimotor rhythms; an example of this can be seen in **Figure 8**.
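The contrast between run-dependent and fixed electrode weights can be illustrated with a minimal sketch. The text does not state which normalization was used, so unit L2 norm plus a sign convention is assumed here; the function names and all weight values are hypothetical and are not taken from the study.

```python
import numpy as np

def normalize_filter(w):
    """Scale a spatial filter to unit L2 norm and fix its sign.

    Assumption: unit-norm scaling with the largest-magnitude weight made
    positive. Eigenvector-based filters are only defined up to sign and scale.
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    if w[np.argmax(np.abs(w))] < 0:
        w = -w
    return w

def run_to_run_variability(filters):
    """Per-electrode standard deviation of normalized weights across runs.

    filters: array of shape (n_runs, n_channels), one spatial filter per run.
    """
    normalized = np.array([normalize_filter(w) for w in filters])
    return normalized.std(axis=0)

# Hypothetical weights for 3 runs x 5 electrodes (illustrative values only).
csp_like = np.array([
    [0.9, -0.2, 0.1, 0.3, 0.1],
    [0.2, 0.8, -0.5, 0.1, 0.2],
    [-0.6, 0.1, 0.7, -0.2, 0.3],
])
# A fixed montage: center electrode minus the mean of four neighbors, every run.
laplacian_like = np.tile([1.0, -0.25, -0.25, -0.25, -0.25], (3, 1))

csp_variability = run_to_run_variability(csp_like)
laplacian_variability = run_to_run_variability(laplacian_like)
print(csp_variability, laplacian_variability)
```

With a fixed montage the per-electrode variability is zero by construction, whereas data-driven weights that are re-estimated after each run generally give a nonzero spread.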

### DISCUSSION

### BCI Performance Progress With Respect to Decoding Method and Number of Channels

The number of channels used in offline analyses has been widely considered to be a critical factor that affects the performance of separating and classifying different motor imagery tasks (Lal et al., 2004; Blankertz et al., 2008b; Sannelli et al., 2010; Arvaneh et al., 2011; Ang et al., 2012; Meng et al., 2013; Shan et al., 2015; Qiu et al., 2016). However, in the current online study, changing the number of channels from 40 electrodes (multichannel CSP/L configuration) to nine electrodes (CSP/S), while maintaining the same decoding method, resulted in no significant difference in group averaged PVC across multiple sessions (refer to **Figure 3**). Another observation was that there was no trend of improvement in performance across the training sessions for either the CSP/L or the CSP/S decoding method. However, there was a significant improvement in performance from the first session to the third session for G1:LAP/S compared to G2:CSP/L, although G1:LAP/S showed a relatively lower initial group averaged performance at the first session. Note that G2:CSP/L showed an overall higher average performance over the three sessions, but there was no significant difference between G1 and G2 from session 2 to session 3. For the last two sessions, after the methods were switched, we found that G2:CSP/L->LAP/S showed a significantly larger improvement in BCI performance between session 4 and session 5 than G1:LAP/S->CSP/L; the CSP/S group, which did not switch methods, plateaued at the first session and varied only slightly across the five sessions. No significant differences were found among the three methods at the end of training session five.

The topographies of the group averaged R-squared values of all three methods in sensor space (**Figure 6**) clearly show that modulation of brain rhythms near the left C3 and right C4 channels was induced by all methods across the multiple training sessions. This might imply that subjects successfully learned to modulate broad band alpha and beta rhythms from the bilateral motor cortical areas; however, the efficiency of modulation differed at the end of the last session (session 5). G2:CSP/L->LAP/S had a significantly higher group averaged R-squared value than the other two groups in channel C3 at session 5. In the topographies of group averaged R-squared values for LAP/S, the regions of noticeable modulation initially covered a large portion of cortical areas and showed smaller R-squared values around C3 and C4. In later sessions, especially the fifth session, larger R-squared values were induced in focal regions around the bilateral motor cortical areas. This observation is in line with the findings of fMRI studies that have compared the cortical activation maps of people deemed to be skilled and unskilled at motor imagination (Guillot et al., 2008). This research, involving both motor execution and motor imagery, revealed that cortical activation is more distributed in the unskilled motor imagery group than in the skilled motor imagery group. In the current study, subjects using LAP/S in the first session were considered to be less skilled than in the later sessions. This could partly explain why the group averaged R-squared sensor space topography at the first session was more spread out and weaker than the corresponding topographies at the later sessions. However, this trend did not appear in the two CSP decoding groups, using either the multichannel configuration or the small channel configuration. In contrast, both CSP decoding groups showed plateaued group performance at the first training session. 
This is likely the case because the larger scalp coverage and optimizing properties of the CSP method allow for a more distributed brain modulation to control the BCI, increasing the ease of control at the beginning of training (**Figure 5**).
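This section does not define how the R-squared values were computed. A common convention in sensorimotor-rhythm BCI work is the squared correlation between a per-trial feature (for example, band power at one channel) and the task label; the sketch below assumes that convention, and the function name and input values are hypothetical.

```python
import numpy as np

def r_squared(features_left, features_right):
    """Squared Pearson correlation between a feature and a binary task label.

    Assumption: R-squared is the squared point-biserial correlation, with
    left-hand trials labeled 0 and right-hand trials labeled 1.
    """
    left = np.asarray(features_left, dtype=float)
    right = np.asarray(features_right, dtype=float)
    values = np.concatenate([left, right])
    labels = np.concatenate([np.zeros(len(left)), np.ones(len(right))])
    r = np.corrcoef(values, labels)[0, 1]
    return float(r ** 2)

# Illustrative band-power values only (not data from the study).
separable = r_squared([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
overlapping = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(separable, overlapping)
```

Under this convention, a value near 1 at a channel means the feature almost fully separates the two imagery tasks there, and a value near 0 means it carries no task information.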

This was not too surprising when considering the group averaged PVC across multiple channels. The performance of these two groups started at a high level right away but fluctuated throughout the sessions, and no trend of improvement in performance was observed throughout the training process (**Figure 3**). This might imply that subjects could not consolidate their skills of consistently modulating the alpha and beta rhythms in the electrodes covering a focal region of the sensorimotor areas. The typical spatial patterns derived by CSP/L (**Figure 8**) and the variation of the resulting weighting coefficients for each electrode (**Figure 9**), corresponding to the variation of spatial patterns from run to run, could help to explain this. In the CSP algorithm, the spatial patterns are derived through a data-driven approach, and there is no guarantee that each spatial pattern is electrophysiologically relevant to motor imagery modulation, which means that non-sensorimotor related modulation could also contribute to the discrimination of the different motor imagery tasks. Because various non-sensorimotor related patterns are detected on a run-by-run or session-by-session basis, and given the non-stationary behavior of the EEG signal itself, dramatic fluctuations of the weighting coefficient for each electrode are a reasonable outcome, even though they are detrimental to inducing stable patterns of modulation in the subjects. Thus, due to the dynamic montage weights when using the CSP algorithm, subjects might not fully consolidate the skills necessary for controlling a BCI; although the CSP method might provide optimal session-specific performance, it could also lead to inconsistent modulation of subjects' brain rhythms. Carefully removing those non-sensorimotor related filters might help to alleviate this problem, and this needs to be examined by additional experiments in future investigations.

### Indications From Switch-Over Sessions

Participants' ability to operate a BCI is expected to generalize from one decoding method to another if both methods rely on the same underlying neurophysiological mechanism. A better decoding method could potentially help the user be more robust to perturbations; the switch-over design for the first two groups was intended to test this. The group averaged PVC of the first two groups was similar in the fourth session, but the two groups showed different improvements in BCI performance in the fifth session. For the participants of G1:LAP/S->CSP/L, an increase in average PVC after switching was followed by a decrease in the subsequent session; for the participants in G2:CSP/L->LAP/S, the average PVC showed a smaller improvement after switching and a large improvement at the fifth session, although none of these changes was significant. This difference in trends is supported by the mixed effect ANOVA test results; the results (section Online BCI Performance Results) on the change in BCI performance for the last two sessions, compared to the third session before switching methods, showed a significant effect of the interaction between method and time. Further post-hoc analysis showed a significant difference in the change of BCI performance between G1:LAP/S->CSP/L and G2:CSP/L->LAP/S at session 5 compared to session 4. Although there was a significant difference in BCI performance between LAP/S and CSP/S at the fifth session before correcting for multiple comparisons, it was not significant after the correction. The analysis of the first three sessions supports the idea that LAP/S allows group one to improve on average during the first three sessions; it seems that LAP/S might also allow group two, which starts out with CSP/L and switches over to LAP/S for the last two sessions, to increase average BCI performance as well, but more data are needed to support this. 
On the other hand, no clear trend is shown for either CSP/L (group two before switching over and group one after switching over) or CSP/S. Considering these analyses, the trend, and the R-squared topography, the collective results indicate that LAP/S might help the user be more robust to perturbations. Nevertheless, the decoding method is not the only thing that affects online BCI performance over the course of multiple sessions; other factors, such as motivation and mental status, might also play a large role (Nijber and Kübler, 2010; Ahn and Jun, 2015). Subjects might lose motivation quickly after multiple sessions of the same paradigm, causing performance variation, and this must be considered in long-term BCI training (Kübler et al., 2004).
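As noted above, the LAP/S versus CSP/S difference at the fifth session was significant before correction for multiple comparisons but not after. The sketch below shows how this can happen. The correction procedure is not named in this section, so a Bonferroni correction is assumed, and the p-values are hypothetical and not the study's results.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and significance decisions.

    Assumption: Bonferroni is used purely for illustration; the study may
    have applied a different correction.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant

# Hypothetical uncorrected p-values for three pairwise group comparisons.
raw_p = [0.03, 0.20, 0.50]
adjusted_p, significant = bonferroni(raw_p)
print(adjusted_p, significant)
```

Here the first comparison is below 0.05 before correction but rises to about 0.09 once it is multiplied by the number of comparisons.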

### Limitations of the Study and Future Work

There are many more channel configurations and associated decoding algorithms than LAP/S, CSP/L and CSP/S available in the non-invasive EEG based BCI literature (Blankertz et al., 2006; Lotte et al., 2007). We cannot compare all of them through online experimental validation, given limited time and resources. However, the two configurations and decoding methods selected in this study are commonly used and cover both ends of the spectrum in terms of channel numbers. The small channel configuration includes nine electrodes; the peripheral electrodes surrounding channels C3 and C4 are used to filter out noise and enhance the signal-to-noise ratio of those two channels. A similar trend of improvement in BCI performance using only channels C3 and C4 was also observed in previous work (Cassady et al., 2014). For the multichannel configuration, previous offline analyses indicate that performance increases when going from a few channels to more channels, saturates, and then decreases beyond an optimal number of channels (Lal et al., 2004; Sannelli et al., 2010; Arvaneh et al., 2011; Meng et al., 2013; Shan et al., 2015; Qiu et al., 2016). In this study, we chose 40 electrodes for the large channel configuration because most offline analyses show that performance begins to saturate or decrease when more than 30–50 electrodes are used (Arvaneh et al., 2011; Shan et al., 2015; Qiu et al., 2016). There are different kinds of algorithms for extracting information from the signals of multiple channels. CSP is one state-of-the-art decoding method that aims to maximize the difference of class covariance; the weight coefficients for multiple channels are automatically optimized through generalized eigenvalue decomposition. There are also different ways to update the classifier for the CSP method. 
In this study, we chose to update the CSP weight coefficients and classifier after each run using data collected from a buffering pool of 50 trials, similar to the approach applied in a previous study (Yao et al., 2014). Since the learning process was the major research question in this study, the method of updating the CSP weight coefficients and classifier might not have been a major factor affecting the learning process, because the same approach was applied to all subjects across all sessions. Other methods adjust the weight coefficients according to different evaluation criteria. Those methods could induce similar problems of frequent weight variations due to the incorporation of non-sensorimotor related modulation. Thus, we speculate that improvement in motor imagery BCI performance could be better induced and consolidated by stable and electrophysiologically meaningful patterns focusing on sensorimotor areas, regardless of the number of channels.
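The generalized eigenvalue formulation of CSP mentioned above can be sketched as follows. This is a minimal textbook-style version written with NumPy only, not the implementation used in the study; the function names, the trace normalization of trial covariances, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def class_covariance(trials):
    """Average trace-normalized spatial covariance of one class.

    trials: array of shape (n_trials, n_channels, n_samples).
    """
    covs = []
    for x in trials:
        c = x @ x.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)

def csp_filters(trials_a, trials_b, n_pairs=3):
    """Return 2 * n_pairs CSP filters (as rows) and their eigenvalues.

    Solves C_a w = lambda (C_a + C_b) w by whitening the composite
    covariance and then diagonalizing class A in the whitened space.
    """
    ca = class_covariance(trials_a)
    cb = class_covariance(trials_b)
    evals, evecs = np.linalg.eigh(ca + cb)
    whitening = np.diag(evals ** -0.5) @ evecs.T
    lam, rotation = np.linalg.eigh(whitening @ ca @ whitening.T)
    filters = rotation.T @ whitening  # rows sorted by ascending eigenvalue
    n = len(lam)
    picks = np.r_[np.arange(n_pairs), np.arange(n - n_pairs, n)]
    return filters[picks], lam[picks]

# Synthetic example: class A is stronger on channel 0, class B on channel 1.
rng = np.random.default_rng(0)
trials_a = rng.standard_normal((20, 6, 200))
trials_a[:, 0] *= 3.0
trials_b = rng.standard_normal((20, 6, 200))
trials_b[:, 1] *= 3.0
w, lam = csp_filters(trials_a, trials_b, n_pairs=1)
print(w.shape, lam)
```

Filters with eigenvalues near 1 maximize variance for class A relative to class B, and those near 0 do the opposite. In an online setting such as the one described here, the filters would be recomputed after each run from the trial pool, which is why the electrode weights can change from run to run.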

The current design has only three sessions for G1:LAP/S and G2:CSP/L before switching over and two sessions for G1:LAP/S->CSP/L and G2:CSP/L->LAP/S after switching. Since we did not collect longer-term longitudinal data, it is difficult to conclude whether LAP/S will ultimately outperform CSP/L, even though LAP/S provides more room for learning at the initial session. It would be desirable to have more sessions both before and after switching over to better highlight performance trends, if there are any. In addition, the inter-session intervals varied among the subjects due to practical limitations, even though attempts were made to schedule the next session as soon as possible. For a human study in a large group (45 subjects) with multiple sessions each, it is practically impossible to optimize all parameters, such as the number of sessions and the inter-session intervals. Subjects' motivation for the study and their schedule conflicts should be kept in mind when planning longer studies. Nevertheless, the present results provide data on important issues that may impact BCI performance. It would also be interesting to see when the subjects' performance would plateau for the LAP/S method at the group level, since a significant improvement in BCI performance was found across the first three sessions; this will be the subject of our future investigation.

### CONCLUSION

In this study, two channel configurations representing a small number and a large number of channels, along with LAP/S decoding and a multichannel decoding algorithm (CSP), were utilized and compared to study the online learning process during multiple sessions. Throughout the multiple learning sessions, we found that the CSP decoding method for online BCI control, based on either multiple channels or a small number of channels, showed no difference in online BCI performance between the two configurations but showed higher group average performance at the initial sessions than the LAP/S method. Nevertheless, the high performance plateaued at the early training phase and may partially be caused by non-sensorimotor related modulation. Thus, no improvement during multiple sessions was observed. In contrast, the LAP/S decoding method, for online BCI control via a small number of channels, started at a lower group average accuracy, but a trend of improvement and a stable pattern of modulation over multiple sessions were observed. The results of the switch-over study imply that LAP/S might help subjects to be resistant to perturbations to a certain degree. Altogether, these results imply that devising a successful decoding algorithm for online application specifically requires consideration of subjects' engagement and learning progress in longitudinal BCI control. With respect to long term BCI use, it appears necessary to exclude non-sensorimotor related modulation, which could facilitate higher performance in single individual sessions but might not be beneficial for skill consolidation in longitudinal sessions.

### AUTHOR CONTRIBUTIONS

JM and BH conceived the project, designed the experiments, interpreted the results and wrote the paper. JM, JO, GJ, SZ, and AB performed the experiments and analyzed the data. BE contributed to data analysis, interpretation of results and the writing of the manuscript.

### ACKNOWLEDGMENTS

The authors would like to thank the reviewers for their constructive comments, and Christopher Cline, Bryan Baxter, James Stieger, Taylour Hanson, Elias Boroda, and Taylor Streitz for useful discussions and technical assistance. This work was supported in part by NIH grants AT009263, EB021027, and NS096761, and by NSF grant CBET-1264782.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Meng, Edelman, Olsoe, Jacobs, Zhang, Beyko and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Discrimination and Recognition of Phantom Finger Sensation Through Transcutaneous Electrical Nerve Stimulation

Mengnan Li <sup>1</sup> , Dingguo Zhang<sup>2</sup> , Yao Chen<sup>1</sup> , Xinyu Chai <sup>1</sup> , Longwen He<sup>3</sup> , Ying Chen<sup>1</sup> , Jinyao Guo<sup>1</sup> and Xiaohong Sui <sup>1</sup> \*

<sup>1</sup> School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China, <sup>2</sup> School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China, <sup>3</sup> Shanghai Health 51 Net Technology Co., Ltd, Shanghai, China

Edited by:

Ioan Opris, University of Miami, United States

#### Reviewed by:

Mohit Shivdasani, University of New South Wales, Australia Felix Scholkmann, UniversitätsSpital Zürich, Switzerland

> \*Correspondence: Xiaohong Sui suixhong@sjtu.edu.cn

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 19 December 2017 Accepted: 10 April 2018 Published: 30 April 2018

#### Citation:

Li M, Zhang D, Chen Y, Chai X, He L, Chen Y, Guo J and Sui X (2018) Discrimination and Recognition of Phantom Finger Sensation Through Transcutaneous Electrical Nerve Stimulation. Front. Neurosci. 12:283. doi: 10.3389/fnins.2018.00283

Tactile sensory feedback would make a significant contribution to state-of-the-art prosthetic hands for achieving dexterous manipulation of objects. Phantom finger sensation, also called referred sensation of lost fingers, can be noninvasively evoked by transcutaneous electrical nerve stimulation (TENS) of the phantom finger territories (PFTs) near the stump for upper-limb amputees. As such, intuitive sensations pertaining to lost fingers could be non-invasively generated. However, the encoding of stimulation parameters into tactile sensations that can be intuitively interpreted by the users remains a significant challenge. Further, how discriminative such artificial tactile sensation evoked by TENS of the PFTs is remains unknown. In this study, we systematically characterized the tactile discrimination across different phantom fingers on the stump skin by TENS among six subjects. Charge-balanced and biphasic stimulating current pulses were adopted. The pulse amplitude (PA), the pulse frequency (PF) and the pulse width (PW) were modulated to evaluate the detection threshold, perceived touch intensity, and the just-noticeable difference (JND) of the phantom finger sensation. In particular, the recognition of phantom fingers under simultaneous stimulation was assessed. The psychophysical experiments revealed that subjects could discern fine variations of stimuli with comfortable sensation of phantom fingers including D1 (phantom thumb), D2 (phantom index finger), D3 (phantom middle finger), and D5 (phantom pinky finger). With respect to PA, PF, and PW modulations, the detection thresholds across the four phantom fingers were obtained by the method of constant stimuli based on a two-alternative forced-choice (2AFC) paradigm. For each modulation, the perceived intensity, which was indexed by skin indentations on the contralateral intact finger pulp, increased gradually with increasing stimulus strength within the lower-intensity range. In particular, the curve of the indentation depth vs. PF almost reached a plateau for PF above 200 Hz. Moreover, the performance of phantom finger recognition deteriorated with an increasing number of phantom fingers under simultaneous TENS. For one, two and four stimulating channels, the corresponding recognition rates of an individual PFT were 85.83, 67.67, and 46.44%, respectively. The results of the present work would provide direct guidelines regarding the optimization of stimulating strategies to deliver artificial tactile sensation by TENS for clinical applications.

Keywords: sensory feedback, TENS, just-noticeable difference, upper-limb prosthesis, phantom finger discrimination

### INTRODUCTION

Amputation inevitably causes great harm to both the physical and mental health of upper-limb amputees (Kejlaa, 1992). Prosthetic hands, especially myoelectric prostheses, can help amputees regain significant function, which leads to more independence and a higher quality of daily life. Typically, the myoelectric signal is recorded near the residual limb to estimate the user's intention, usually under an open-loop control strategy in which no meaningful information about the manipulation is transmitted back to the user. However, bidirectional communication between the amputee and the prosthetic hand is necessary for dexterous movement execution (Rothwell et al., 1982). Currently, prosthesis users mainly rely on visual feedback to gain information on the operational status of the prosthesis, which leads to a significant mental burden. Sensory feedback is critical for establishing body ownership, which can help an amputee feel that the prosthesis is a part of his or her body rather than an alien tool, and its incorporation into prosthetic hands would promote better device compliance from the user (Biddiss and Chau, 2007; Marasco et al., 2011; Saal and Bensmaia, 2015). In fact, the significance of sensory feedback has been recognized since the 1950s (Clippinger et al., 1974), and it has attracted great interest in recent years (Jiang et al., 2012; Antfolk et al., 2013b; Delhaye et al., 2016; Svensson et al., 2017).

The sense of touch originating from normal hands carries complicated and comprehensive information, such as the shape, temperature, size and texture of objects. Manipulation of objects by prosthetic hands can be slow, stiff and non-intuitive without tactile feedback (Delhaye et al., 2016). In addition, the absence of tactile sensation from the original hands contributes to emotional disorders, including anxiety and depression, in upper-limb amputees (Saradjian et al., 2008; Østlie et al., 2011). Consequently, tactile sensation is also key to maintaining emotional balance (Hertenstein et al., 2009) and mental health (Bexton et al., 1954; Gilmartin et al., 2013) after amputation.

Tactile sensory feedback for prosthetic hands could be delivered via either invasive or noninvasive methods (Saal and Bensmaia, 2015; Delhaye et al., 2016; Svensson et al., 2017). Invasive methods included implantable devices at the central and peripheral neural pathways through cortical microstimulation (Chen et al., 2014; Flesher et al., 2016), spinal-cord stimulation (Schouenborg, 2008), peripheral nerve stimulation (Ortiz-Catalan et al., 2014; Tan et al., 2014) and target sensory reinnervation (TSR) (Kuiken et al., 2007). With these invasive methods, sensations of lost fingers or palms were partly restored for some amputees (Tan et al., 2014; Graczyk et al., 2016). However, there are still some big challenges to achieve clinical viability due to various issues such as the risk of infection in surgery, biological rejection, chronic validation or electrode replacements, etc. (Lipschutz, 2017; Svensson et al., 2017).

On the other hand, non-invasive approaches have been explored using mechanical or electrical stimulation, which resulted in the corresponding mechanotactile (Ehrsson et al., 2008), vibrotactile (Antfolk et al., 2012), electrotactile (Clemente et al., 2016; D'Anna et al., 2017) or combined feedback schemes (Clemente and Cipriani, 2014). Previous studies showed that mechanical or electrical stimulation of the skin of the residual limb evoked a phantom illusion in amputees (Mulvey et al., 2009; Antfolk et al., 2013a; D'Anna et al., 2017), which was described as referred sensation of the lost hand after amputation (Ramachandran and Hirstein, 1998; Louis and York, 2006). Because it is easy to integrate and program, transcutaneous electrical nerve stimulation (TENS) was employed to elicit phantom finger sensations, meaning referred sensation of lost fingers (Chai et al., 2015; D'Anna et al., 2017). TENS of the median and ulnar nerves by surface electrodes was reported to produce hand sensations in normal subjects (Forst et al., 2015), and referred sensations of phantom fingers or palms in amputated subjects (D'Anna et al., 2017) for a short period. These referred sensations were mostly paresthesia-like (D'Anna et al., 2017), and the positions of the referred sensation were influenced by the electrode location and arm position (Forst et al., 2015). In addition, the local skin sensation under the stimulating electrode could strongly influence the recognition of phantom fingers (D'Anna et al., 2017). Chai et al. (2015) characterized the sensory modalities induced by TENS of the phantom finger maps or territories (PFTs) on the skin of the residual stump, and indicated the long-term stability of these PFTs. Chen et al. (2017) further observed the phantom finger sensation evoked by TENS in the somatosensory cortex using the magnetoencephalography (MEG) functional neuroimaging technique. 
Therefore, TENS of the phantom finger territories (PFTs) is a promising approach that has the advantage of a somatotopic sensation scheme and avoids the need for surgery. However, the critical question of how discriminative the artificial tactile sensation under TENS of the PFTs is remains unanswered.

Tactile discrimination of phantom finger sensation is closely associated with the stimulating parameters applied to the PFTs. In the present study, we carried out classical psychophysical experiments to systematically characterize the perceptual properties by varying pulse amplitude (PA), pulse frequency (PF), and pulse width (PW). To determine the effective parameter range, we measured the detection thresholds and the upper limits at which uncomfortable sensations would be elicited. Then, within the available parameter ranges, we further assessed the perceived intensities indexed by the indentation depth on the contralateral intact finger pulps. The just-noticeable difference (JND), also called the difference threshold, and Weber fractions were evaluated to estimate the subjects' capability to distinguish among different stimuli. Finally, the phantom finger recognition was characterized under simultaneous stimulation.
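The two quantities named above can be illustrated with a short sketch. It assumes that the detection threshold is read off at 75% correct in a 2AFC task (midway between chance and perfect performance) by linear interpolation, and that the Weber fraction is the JND divided by the reference stimulus. The criterion, function names, and numbers are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def threshold_from_2afc(levels, prop_correct, criterion=0.75):
    """Stimulus level at which proportion correct first reaches the criterion.

    levels: increasing stimulus levels (e.g. pulse amplitude in mA).
    prop_correct: proportion of correct 2AFC responses at each level.
    Assumption: linear interpolation between the two bracketing levels.
    """
    levels = np.asarray(levels, dtype=float)
    p = np.asarray(prop_correct, dtype=float)
    reached = np.nonzero(p >= criterion)[0]
    if reached.size == 0:
        raise ValueError("criterion is never reached in the tested range")
    i = reached[0]
    if i == 0:
        return float(levels[0])
    frac = (criterion - p[i - 1]) / (p[i] - p[i - 1])
    return float(levels[i - 1] + frac * (levels[i] - levels[i - 1]))

def weber_fraction(jnd, reference):
    """Weber fraction: just-noticeable difference relative to the reference."""
    return jnd / reference

# Hypothetical data: four amplitude levels and their proportions correct.
threshold = threshold_from_2afc([1.0, 2.0, 3.0, 4.0], [0.5, 0.6, 0.9, 1.0])
fraction = weber_fraction(jnd=0.2, reference=2.0)
print(threshold, fraction)
```

A smaller Weber fraction means that a subject can tell apart stimuli that differ by a smaller proportion of the reference value.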

### MATERIALS AND METHODS

### Subjects

Ten volunteers were randomly recruited. Prior to the psychophysical experiments, an interview was conducted to establish each volunteer's medical history, phantom limb sensations, and whether they experienced phantom limb pain currently or in the past. Participation in the psychophysical experiments required a unilateral forearm amputation, good psychological health, and PFTs near the stump. Six adult forearm amputees (subjects 1–6, three males and three females, average age ± SD: 50 ± 13, years after amputation: 16.7 ± 11.5) met these criteria and were included. The other four volunteers were excluded because they had no phantom finger sensation. One of them had congenital forearm deficiency (Subject 7), one was a forearm amputee (Subject 8), and the other two had shoulder-level amputations (Subjects 9 and 10). All ten volunteers were right-handed before amputation, and their general information is presented in **Table 1**.

### Identification of PFTs

For all six subjects, phantom finger sensations were evoked when certain skin regions near the residual stump were touched by a stylus pen 4 mm in diameter. These regions were confirmed as PFTs. Subjects 1–3 and 6 possessed five independent PFTs, which were designated in the experiments as phantom digits D1–D5. Subject 5 had four independent PFTs, without phantom sensation of the ring finger (D4). Subject 4 also had five PFTs, but the territories D2 and D3 could not be clearly discriminated. For consistency when comparing tactile discrimination among the six subjects, four PFTs labeled D1, D2, D3, and D5 were investigated under TENS to produce phantom finger sensations corresponding to the lost thumb, index, middle and pinky fingers, respectively.

The detailed procedure for locating PFTs was as follows: (1) The subject sat comfortably in a wooden chair with his/her amputated stump placed naturally on the table, and the stump skin was cleaned with alcohol wipes. The subject's eyes were covered with an eyeshade. (2) A stylus was used to touch the volar side of the residual stump skin, and the subject was required to quickly report whether or not specific phantom finger sensations were produced. The stylus was then moved to the next point until the whole volar side was covered. Finally, the sites corresponding to the same phantom finger were connected to form a PFT outline. The most sensitive point (MSP) in each PFT, at which the sensation was referred to a finger pulp, was also clearly identified and marked. This whole process was repeated twice for each subject to validate the PFTs. Each process took approximately 35 min, and breaks of 1–2 min were given at random times to allow the subjects to relax.

Sometimes during the identification of the PFTs, a gentle touch on the stump skin by the stylus pen produced only a local sensation of the stump skin, and the phantom finger sensations were evoked only with a much stronger press. The regions referred to the phantom finger pulp, back, sides, and root were all included within a PFT. Although skin sites referred to the phantom palm and opisthenar were also reported, these were not included in the PFTs. Two typical PFTs are shown in **Figure 1B**; the MSP is denoted by the sign "×" and was taken as the TENS target location for producing the most obvious phantom finger sensation.

### Experimental Devices

The current stimulator (STG 4004 stimulator, MultiChannel Systems MCS GmbH, Germany) can generate four-channel independent stimulating current pulses, which are cathodic-first, biphasic and charge-balanced (**Figure 1D**). The PA can be finely modulated from −16 to 16 mA with a resolution of 0.2 µA, and the stimulator has a maximum output compliance voltage of 120 V. The PW ranges from 20 µs to an unlimited duration, with a minimum interval of 20 µs. Since the pulse period can be lengthened gradually from 40 µs to more than tens of hours, the corresponding PF ranges from almost zero to 25 kHz. All the stimulating parameters can be readily programmed by the control software compatible with the stimulator hardware.
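The relationship between pulse period and pulse frequency quoted above, and the charge delivered per phase of a biphasic pulse, can be checked with a few lines of arithmetic. The helper names below are hypothetical, the example amplitude, width and frequency are illustrative only, and the timing check assumes no gap between the two phases.

```python
def pulse_frequency_hz(period_us):
    """Pulse frequency in Hz for a pulse period given in microseconds."""
    return 1e6 / period_us

def charge_per_phase_nc(amplitude_ma, width_us):
    """Charge per phase in nanocoulombs (1 mA x 1 us = 1 nC)."""
    return abs(amplitude_ma) * width_us

def biphasic_pulse_fits(width_us, frequency_hz):
    """True if two phases of width_us fit inside one pulse period.

    Assumption: the two phases follow each other without an inter-phase gap.
    """
    period_us = 1e6 / frequency_hz
    return 2 * width_us <= period_us

# The minimum period of 40 us corresponds to the stated 25 kHz upper limit.
max_frequency = pulse_frequency_hz(40)
# Illustrative setting: 2 mA, 200 us per phase, 200 Hz.
charge = charge_per_phase_nc(2.0, 200)
fits = biphasic_pulse_fits(200, 200)
print(max_frequency, charge, fits)
```

For a charge-balanced pulse, the anodic phase delivers the same charge with opposite sign, so the net charge per pulse is zero.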

To quantitatively characterize the perceived intensity of phantom finger sensation under TENS, a compact punching machine (**Figure 1A**) was designed to apply indentation to the contralateral intact finger pulp. The indentation depth was modulated by moving the indenter, a plastic rod with a circular cross section mounted on a moving stage. This stage was driven by a step motor through a ball screw pair. A laptop computer was used to program the exact indentation depth, and the step precision was ±20 µm. The test configuration and the layout of the apparatus parts are schematically illustrated in **Figure 1A**.

To impose electrical stimuli on the MSP in a PFT, the flexible electrode array (Customized from Shanghai Benevolence Electronic Technology Co. Ltd., Shanghai, China) was utilized. All the electrodes were coated with a thin layer of conductive hydrogel adhesive. Two adjacent circular electrodes were defined as the stimulating and reference electrodes, respectively. Each electrode was 7 mm in diameter, and the center-to-center distance was 12 mm. The psychophysical experiments were carried out in the laboratory at 26 °C.

### Experimental Setup

To characterize the phantom finger sensation through TENS, a set of four experiments (**Figure 2**) was carried out: detection threshold determination, perceived intensity quantification, electrical stimulus discrimination, and phantom finger recognition. Each experiment was divided into corresponding experimental sessions. Each session included four stimulating blocks corresponding to the D1, D2, D3, and D5 regions. For each block, tens to hundreds of stimulating trials were implemented. In total, there were approximately 2,200 trials for each subject. Considering the necessary breaks between trials, blocks, and sessions, the whole experimental process took about 10 h. Thus, every subject was required to participate in these experiments twice in 2 or 3 days to maintain a relatively constant mental state.

<sup>a</sup>Strength of phantom limb pain was graded with a visual analog scale (VAS) between 0 and 10.

#### Detection Thresholds

The detection thresholds under TENS in each PFT were tested in terms of PA, PW, and PF modulations. The procedures were double-blinded for both the experimenter and the subjects. The PA, PF, and PW were set to the predetermined typical values of 1.5 mA, 50 Hz, and 200 µs, respectively. Obvious and comfortable phantom finger sensations were elicited in all six subjects with these typical stimulating parameters. Before the detection thresholds were determined, the upper stimulus limits leading to an uncomfortable sensation were obtained by the method of minimal change.

(C) and phantom finger recognition (D). The former three experiments were carried out under PA, PF, and PW modulations with four blocks corresponding to D1, D2, D3, and D5. There existed three recognition levels for phantom fingers. The stimulating trials were ordered pseudo-randomly within each block. Short breaks between trials, blocks, and sessions were about 2 s, 30 s, and 5 min, respectively.

Urban (1910) pointed out that a random stimulating order should be applied when determining detection thresholds with the classical method of constant stimuli. It was also reported that the stimulus intensities must span sub- and supra-threshold values. For these reasons, it was necessary to first determine a rough threshold range including both sub- and supra-threshold values in stage 1 (here by the method of limits). On that basis, the test stimuli could be narrowed down to determine the detection thresholds with the method of constant stimuli. The procedure therefore comprised two stages: rough confirmation of the threshold ranges and fine determination of the detection thresholds.

In stage 1, the rough thresholds of PA, PW, and PF were measured using the method of minimal change, which provided a solid basis for selecting the testing values for the fine determination of detection thresholds in stage 2. During stage 1, the stimulating pulse trains lasted 3 s. With PF at 50 Hz and PW at 200 µs, PA was increased from a starting value of 0.4 mA in steps of 0.1 mA until the subject reported that the stimuli were perceived. Similarly, for rough determination of PW, PA and PF were set to 1.5 mA and 50 Hz, respectively, and PW started from 20 µs and was increased by 20 µs at each step. For rough determination of PF, PF was increased from 1 Hz in steps of 1 Hz with PA and PW set to 1.5 mA and 200 µs, respectively. Across the four PFTs of the six subjects, the rough thresholds of PA ranged from 0.6 to 1.5 mA, those of PW from 60 to 120 µs, and those of PF from 1 to 17 Hz.

On the basis of the rough threshold ranges and the output precision of the stimulator, the testing values in stage 2 (listed in **Figure 2**) were chosen as 0.5, 0.75, 1, 1.25, 1.5, and 1.75 mA for PA; 20, 40, 80, 120, and 160 µs for PW; and 1.5625, 3.125, 6.25, 12.5, and 25 Hz for PF across the six subjects, where the PF testing values decreased from 25 Hz as 25/2<sup>n</sup> Hz (n = 0, 1, ..., 4).
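The halving rule for the PF testing values can be checked with a short sketch. The function name `pf_test_values` and its arguments are illustrative only and do not come from the study.

```python
def pf_test_values(top_hz=25.0, count=5):
    """Return PF testing values top_hz / 2**n (Hz) for n = 0..count-1, ascending."""
    return sorted(top_hz / 2 ** n for n in range(count))


print(pf_test_values())  # [1.5625, 3.125, 6.25, 12.5, 25.0]
```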

In stage 2, detection thresholds were finely determined with the method of constant stimuli using a two-alternative forced-choice (2AFC) paradigm (**Figure 1C**), in which the subject reported which of the two intervals contained the stimulus (**Figure 2A**). During this task, the subject was instructed to focus on two gray areas on the computer screen. Two 2-s-long stimulating intervals (Interval I first and then Interval II) were presented with a 1-s break in between. Each 2-s-long interval was initiated by a centered cross in the gray area. The 1-s-long current stimulus was delivered in the second half of either Interval I or Interval II, chosen at random. There was no current stimulus within the first 1-s period, which helped the subject concentrate on the moment when the phantom finger sensation was generated. Immediately after the disappearance of the right cross, the subject was required to report which interval contained the stimulus.

Four stimulating blocks were presented, corresponding to the four PFTs. Within each block, each trial was repeated 7 times for PA, PW, and PF modulations, respectively, and the stimulus order within the two intervals of a trial was pseudo-randomized. The responses to every trial in each block were then fitted by a sigmoid function. Within each trial during stage 2, the expected probability of a correct judgment was 50% if the subject did not detect the stimuli at all, and it would rise to 100% if the subject readily detected the stimuli. Therefore, the detection thresholds were defined as the values of PA, PW, and PF at which each subject correctly identified 75% of the stimuli (**Figure 3**). The same criterion was also employed for intracortical sensory feedback (Flesher et al., 2016), and the probability of reaching this rate by chance was about 13.7% in our experiments.
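As a minimal sketch of how a 75% threshold can be read from a sigmoid fit, the example below assumes a logistic function scaled between 50% (guessing) and 100%, fitted by a simple least-squares grid search. The text does not specify the sigmoid form or the fitting method, so both are assumptions, and the response proportions are hypothetical.

```python
import math


def psychometric(x, midpoint, slope):
    """Assumed 2AFC psychometric function rising from 0.5 (guessing) to 1.0."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(x - midpoint) / slope))


def fit_threshold(levels, prop_correct):
    """Least-squares grid search; returns (midpoint, slope).

    With this parameterization the 75%-correct point equals the midpoint.
    """
    lo, hi = min(levels), max(levels)
    best = None
    for i in range(201):                 # candidate midpoints across the tested range
        midpoint = lo + (hi - lo) * i / 200
        for j in range(1, 101):          # candidate slopes: 1% to 100% of the range
            slope = (hi - lo) * j / 100
            err = sum((psychometric(x, midpoint, slope) - p) ** 2
                      for x, p in zip(levels, prop_correct))
            if best is None or err < best[0]:
                best = (err, midpoint, slope)
    return best[1], best[2]


# Hypothetical data: PA testing values (mA) and proportion correct out of 7 repeats.
pa_levels = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
prop_correct = [3 / 7, 4 / 7, 5 / 7, 6 / 7, 7 / 7, 7 / 7]
threshold, slope = fit_threshold(pa_levels, prop_correct)
print(f"75% detection threshold ~ {threshold:.2f} mA")
```

A maximum-likelihood fit would be a common alternative to least squares; the grid search is used here only to keep the sketch dependency-free.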

#### Perceived Intensity Quantification

The PA, PW, and PF are the three common stimulus parameters that can be independently manipulated to introduce sensory feedback. In previous work (Chai et al., 2017), multiple sensory modalities were produced by varying these three parameters. Here, we investigated the effects of these three parameters on the perceived intensities. Charge-balanced, cathodic-first stimulating current pulses were adopted in our psychophysical experiments, so variations in either PA or PW also changed the charge per phase. The indentation depth as a function of the charge per phase was therefore also explored.
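The charge per phase is the product of PA and PW. The short sketch below shows the unit handling; the helper name `charge_per_phase_uc` is hypothetical.

```python
def charge_per_phase_uc(pa_ma, pw_us):
    """Charge per phase in microcoulombs: mA * us gives nC, and nC / 1000 gives uC."""
    return pa_ma * pw_us / 1000.0


print(charge_per_phase_uc(1.5, 200))  # 0.3 (typical PA and PW)
print(charge_per_phase_uc(1.0, 200))  # 0.2
```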

During the perceived intensity quantification, the finger being mechanically pressed on the healthy hand matched the phantom digit being tested. For example, when TENS was applied to D1, the contralateral thumb was mechanically pressed. The mechanical apparatus was kept stable on the table. The ball screw converted the rotational displacement of the step motor into linear displacement of the stage. The indenter protruded from the stage and exerted pressure on the finger pulp. There was enough space to place any finger between the indenter and the baseplate of the punching machine. The subject placed the finger on the baseplate, axially below the indenter, in a relaxed state. At point zero, there was no gap between the finger and the baseplate. The subject might need to adjust the hand posture to make sure that the finger pulp was relaxed, without introducing prestress in the finger. In this way, the subject could readily judge the pressure intensity.

The perceived intensity of phantom finger sensation during TENS was quantitatively estimated by comparison with the indentation depth in the contralateral intact finger pulp. Every trial consisted of a 3-s-long constant current pulse train followed by a mechanical indentation. Immediately after a 3-s-long pulse train was applied to a PFT, mechanical pressure was exerted on the contralateral intact finger pulp through the indenter controlled by the punching machine shown in **Figure 1A**. The indentation depth was finely modulated until the perceived intensity matched that of the electrical stimulation, and the indentation depth was then recorded. The stronger the phantom finger sensation, the deeper the indentation in the healthy counterpart finger. Consequently, the indentation depth was considered to be closely related to the perceived intensity of phantom finger sensation. The perceived intensities, indexed by the indentation depths, were quantified as functions of PA, PF, and PW. Taking the detection thresholds into account, the testing values during perceived intensity quantification are listed in **Figure 2B**, with the non-modulated parameters held at their typical values. The stimulating trials within each block were ordered randomly for every stimulating parameter.

The modulation procedure for the indentation depth was as follows. The position at which the subject first detected the pressure was set as the zero position. The depth was then increased from 0 in steps of 0.2 mm until the subject indicated that the mechanical intensity was stronger than that of the electrical stimulation, and was then reduced in steps of 0.04 mm until another reversal.
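The coarse-then-fine adjustment can be sketched as a small simulation. It assumes an idealized subject who reports "stronger" whenever the depth exceeds a hypothetical matching depth `target_mm`, and it assumes that the depth at the second reversal is the recorded value; neither detail is stated in the text.

```python
def match_depth(target_mm, coarse_mm=0.2, fine_mm=0.04):
    """Simulate the two-phase depth adjustment for an idealized subject.

    target_mm is a hypothetical depth at which the pressure feels equal to the
    electrical stimulus; the subject reports "stronger" only above it.
    """
    steps = 0
    # Coarse ascending phase: 0.2 mm steps until the pressure feels stronger.
    while steps * coarse_mm <= target_mm:
        steps += 1
    depth = steps * coarse_mm
    # Fine descending phase: 0.04 mm steps until the report reverses again.
    back = 0
    while depth - back * fine_mm > target_mm:
        back += 1
    return round(depth - back * fine_mm, 2)


print(match_depth(1.5))  # 1.48 for this idealized subject
```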

#### Electrical Stimulus Discrimination

The capability of a subject to discriminate differences between stimuli is very important for artificial sensory feedback. The JNDs, also called difference thresholds, were adopted to characterize the capability to discriminate PA, PW, and PF based on the 2AFC paradigm. As in the determination of detection thresholds, two intervals appeared within each trial. Two 1-s-long current pulse trains, called the reference and test stimuli, respectively, were applied within the second halves of these two intervals, as shown in **Figure 2C**. The participant was asked to report the interval in which the stronger sensation occurred. Within one trial, the two stimulating pulse trains constituted a reference/test stimulus pair and differed in only one parameter among PA, PW, and PF, with the other two fixed at the typical values.

For PA discrimination, PW and PF were held at 200 µs and 50 Hz, respectively. The reference PAs were 1 mA and 2 mA. The test PAs were set to 50, 75, 90, 110, 125, and 150% of the corresponding reference values.

For PW discrimination, PA and PF were held at 1.5 mA and 50 Hz, respectively. The reference PWs were chosen as 80, 200, and 400 µs. Considering the precision step of 20 µs, the test PWs for the 80 µs reference were 20, 40, 60, 100, 120, and 140 µs. For the other two reference PWs, 50, 75, 90, 110, 125, and 150% of the reference values were selected as test stimuli.

thresholds of PA were 0.99 ± 0.39 mA, 0.78 ± 0.28 mA, 0.89 ± 0.28 mA, 1.26 ± 0.57 mA (A). The mean detection thresholds in PF were 2.23 ± 0.75 Hz, 2.3 ± 0.62 Hz, 2.17 ± 0.71 Hz, 3.13 ± 0.81 Hz (B). The mean detection thresholds in PW were 114.3 ± 48.75 µs, 98.3 ± 29.30 µs, 109.67 ± 36.61 µs, 131 ± 50.30 µs (C).

For PF discrimination, PA and PW were held at 1.5 mA and 200 µs, respectively. The reference PFs were defined as 50, 100, 200, and 400 Hz, and the test PFs were approximately 50, 75, 90, 110, 125, and 150% of the reference counterparts. Since the PF was set by modulating the pulse period (1/PF), the nearest achievable frequencies to these reference percentages were used. For example, for the 50 Hz reference, the test values were 25, 40, 45.5, 55.6, 62.5, and 76.9 Hz (**Figure 2C**).
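The relation between pulse period and frequency can be illustrated as follows. The millisecond periods below are not given in the text; they are inferred here from the reported test frequencies for the 50 Hz reference and should be read as an assumption.

```python
def frequency_hz(period_ms):
    """Pulse frequency in Hz for a pulse period given in milliseconds."""
    return 1000.0 / period_ms


# Assumed periods (ms), back-calculated from the reported test frequencies.
periods_ms = [40, 25, 22, 18, 16, 13]
print([round(frequency_hz(p), 1) for p in periods_ms])
# [25.0, 40.0, 45.5, 55.6, 62.5, 76.9]
```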

During the electrical stimulus discrimination, each trial was repeated 7 times within one block. Both the order of the reference/test stimulus pairs and the stimulus order within each pair were pseudo-randomized and double-blinded for both the experimenter and the subjects in each block.

#### Phantom Finger Recognition

The phantom finger recognition experiment was carried out at three levels with typical stimuli, i.e., PA of 1.5 mA, PW of 200 µs, and PF of 50 Hz. The participant was required to indicate which phantom finger or fingers were perceived. **Figure 2D** shows the stimulating combinations for phantom finger recognition. For Level 1, only one phantom finger was under TENS, with D1, D2, D3, and D5 as the possible stimulating sites. For Levels 2 and 3, at most two or four PFTs, respectively, were under simultaneous electrical stimulation to test the subjects' ability to recognize an individual PFT, giving 10 and 15 possible PFT grouping combinations, respectively. The chance levels were therefore 25, 10, and 6.7% for Levels 1, 2, and 3, respectively. Each trial was repeated five times, and the stimuli were applied randomly in each block and double-blinded for both the experimenter and the subjects. For Levels 2 and 3, only a short stimulation of less than 3 min was applied to familiarize the subjects with the experiments. No special training was provided for multi-digit identification.
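The combination counts and chance levels can be reproduced by enumerating all non-empty groupings of the four PFTs up to the maximum number stimulated simultaneously. This is only a counting sketch; the helper name `stimulation_sets` is hypothetical.

```python
from itertools import combinations

PFTS = ["D1", "D2", "D3", "D5"]


def stimulation_sets(max_simultaneous):
    """All non-empty PFT groupings with at most max_simultaneous sites."""
    return [c for k in range(1, max_simultaneous + 1)
            for c in combinations(PFTS, k)]


for level, max_sites in [(1, 1), (2, 2), (3, 4)]:
    n = len(stimulation_sets(max_sites))
    # 5 repeats per combination and 6 subjects, as stated in the text.
    print(f"Level {level}: {n} combinations, chance {100 / n:.1f}%, "
          f"{n * 5 * 6} trials in total")
```

The resulting totals of 120, 300, and 450 trials agree with the denominators of the recognition ratios reported in the Results.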

### RESULTS

### Detection Thresholds

The rough upper limits inducing an uncomfortable sensation were about 3 mA, 400 Hz, and 600 µs for PA, PF, and PW, respectively, and the detection thresholds were much lower than these upper limits. **Figure 3** shows the detection thresholds across the six subjects. The PA detection thresholds (with PF and PW at typical values) across D1, D2, D3, and D5 were 0.99 ± 0.39 mA, 0.78 ± 0.28 mA, 0.89 ± 0.28 mA, and 1.26 ± 0.57 mA, respectively. The PF detection thresholds (with PA and PW at typical values) were 2.23 ± 0.75 Hz, 2.3 ± 0.62 Hz, 2.17 ± 0.71 Hz, and 3.13 ± 0.81 Hz, respectively. The PW detection thresholds (with PA and PF at typical values) were 114.3 ± 48.75 µs, 98.3 ± 29.30 µs, 109.67 ± 36.61 µs, and 131 ± 50.30 µs, respectively. Since PW was fixed at 200 µs for PA modulation and PA at 1.5 mA for PW modulation, the thresholds in terms of charge per phase were calculated as 0.178–0.252 µC for PA and 0.147–0.195 µC for PW, adapted from **Figures 3A**, **4C**. The averaged charge threshold for PA was 0.215 µC, which was moderately greater than the 0.171 µC for PW. One-way ANOVA indicated no significant difference in detection thresholds among the four PFTs (P > 0.05).

### Perceived Intensity Quantification

During TENS of the PFTs, the subjects experienced a wide range of perceived intensities, indexed by the indentation depth in the contralateral intact finger pulps. **Figure 4** shows that the indentation depth increased with increasing electrical stimulus. The curves of indentation depth vs. stimulus were broadly consistent with Stevens' power function for perceived intensity (Stevens, 1957). For weaker stimuli, the slopes of the curves were much steeper than for stronger stimuli. For PA and PW modulations, the depth increased gradually with increasing stimulus (**Figures 4A,C**). By comparison, the depth increased much more slowly for PF above 200 Hz (**Figure 4B**). Moreover, the subjects described the sensation at low frequencies below 10 Hz as "clearly but very slightly," corresponding to a very low indentation depth. To examine the relationship between the indentation depth and the charge per phase, **Figures 4A,C** were replotted in **Figure 4D**. The perceived intensity showed a linear correlation with increasing charge per phase. In particular, for charges from 0.2 to 0.5 µC in **Figure 4D**, the trends associated with PA and PW modulations matched well.

In **Figure 4**, the plots of the indentation depth vs. PA did not reach zero. The reason was that the lowest amplitude for PA modulation in this experiment was 0.9 mA, which produced an obvious perception in the perceived intensity quantification experiments. There was therefore no zero indentation depth for PA modulation. For PW modulation, by contrast, the indentation depth reached zero because no perception was produced at 20 and 40 µs; perception appeared at a PW of 80 µs, as listed in **Figure 2**. Moreover, at 3.125 Hz, some gentle perception was still induced by TENS of the PFTs, and thus the indentation depth did not reach zero either.

In terms of the operational definition about detection thresholds, the subjects still had a probability of less than 75% to perceive the subthreshold stimulation. Different from this definition, the subjects definitely knew that there would be a stimulus applied to the PFTs during the experiments of perceived intensity quantification. Consequently, the subjects could perceive the electrical stimulation under small stimulus intensities. This could be the main reason why there was some difference between the lowest values in **Figure 4** and the detection thresholds in **Figure 3**.

The plateau in the plots in **Figure 4B** indicated that the perceived intensity would not change much beyond a high frequency such as 100 or 200 Hz. In practice, the perceived intensity still increased somewhat at high frequencies. However, the discrimination deteriorated correspondingly, as also seen in **Figure 5C**, where the Weber fraction increased gradually beyond 200 Hz. As a result, a typical sigmoid curve appeared for 50 Hz in the JND experiment, whereas the discrimination data for 400 Hz did not fit a sigmoid very well, as shown in **Figure 5A**.

### Electrical Stimulus Discrimination

During this experiment, the subjects were required to judge whether the test or the reference stimulus was stronger in every trial. Participants' responses were converted into a probability value based on their accuracy in identifying the interval with the stronger stimulus. A sigmoid was fitted, and the lower and upper limits on this probability function were defined as the 25% and 75% probabilities of correctly identifying the stronger stimulus. For a given reference stimulus, the JND was obtained by averaging DL<sub>u</sub> and DL<sub>l</sub> as in Equation (1).

$$\text{JND} = \frac{DL_{u} + DL_{l}}{2} \tag{1}$$

FIGURE 4 | Perceived intensity quantification indexed by the indentation depth in PA, PF and PW modulations across four phantom fingers among six subjects. The solid lines and shaded regions denoted the mean and standard deviation values. (A) Indentation depth vs. PA; (B) Indentation depth vs. PF; (C) Indentation depth vs. PW; (D) Indentation depth vs. Charge per phase. Black for PA modulation with constant PW of 200 µs, and Red for PW modulation with constant PA of 1.5 mA. The vertical dotted line at 0.3 µC indicated the same indentation depth with common parameters (1.5 mA × 200 µs = 0.3 µC) under PA and PW modulations.

As shown in **Figure 5A**, DL<sub>u</sub> and DL<sub>l</sub> denoted the differences between the reference stimulus and the upper limit (L<sub>u</sub>) and lower limit (L<sub>l</sub>) of the discriminated test stimuli, respectively. **Figure 5A** shows two curves illustrating how the JND was defined for PF modulation. To investigate the stimulus discrimination of detectable and comfortable PA, PF, and PW stimuli, the Weber fraction (Ekman, 1959) was computed as shown in Equation (2).

$$\text{Weber fraction} = \frac{\text{JND}}{\text{reference stimulus}} \tag{2}$$
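The JND and Weber fraction definitions amount to a few lines of arithmetic. The sketch below uses hypothetical 25% and 75% points for a 50 Hz reference purely for illustration; the function names are not from the study.

```python
def jnd(dl_upper, dl_lower):
    """Equation (1): mean of the upper and lower difference limens."""
    return (dl_upper + dl_lower) / 2.0


def weber_fraction(jnd_value, reference):
    """Equation (2): JND divided by the reference stimulus."""
    return jnd_value / reference


# Hypothetical limits for a 50 Hz reference: 75% point at 58 Hz, 25% point at 43 Hz.
reference_hz = 50.0
dl_u = 58.0 - reference_hz   # upper limit minus reference
dl_l = reference_hz - 43.0   # reference minus lower limit
value = jnd(dl_u, dl_l)
print(value, weber_fraction(value, reference_hz))  # 7.5 0.15
```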

In this experiment, the average Weber fractions ranged from 0.11 to 0.18 for PA modulation, 0.14 to 0.32 for PF, and 0.1 to 0.265 for PW. The relationships between the Weber fraction and the stimuli are plotted in **Figure 5** across the D1, D2, D3, and D5 PFTs for six subjects. For PA and PW modulations, the Weber fractions were usually lower than 0.2 and decreased with increasing stimulus. For PF modulation, the Weber fractions were a little larger and increased slightly within the available frequency range. According to Weber's law, the Weber fraction is approximately constant (Kandel et al., 2012), but this rule does not hold at the low and high intensities of a given stimulus range (Gescheider, 1997). In this experiment, both 1 mA in PA and 100 µs in PW were considered low intensities, and 400 Hz in PF was considered a high frequency. For low intensities of PA and PW, it was sometimes very hard for some subjects to judge whether the test or the reference stimulus was stronger.

By ignoring the low intensities of 1 mA and 100 µs and the high frequency of 400 Hz, the proposed "optimal ranges" of the stimuli, which elicited a clearly discriminable sensation without uncomfortable feelings such as pain, were 1.2–2.8 mA in PA, 10–350 Hz in PF, and 150–600 µs in PW. The corresponding Weber fractions were then taken as 0.1 in PA, 0.2 in PF, and 0.1 in PW.

### Phantom Finger Recognition

The recognition performance of different PFTs was assessed in terms of three levels with typical values of PA, PF, and PW.

FIGURE 5 | Electrical stimulus discrimination in PA, PF, and PW modulations across four PFTs involving six subjects. The solid lines and shaded regions indicated the mean and standard deviations, respectively. (A) Two examples of obtaining the just-noticeable difference (JND) in PF (D2 in Subject 1). The reference PFs were 50 Hz (left) and 400 Hz (right). A sigmoid curve was fitted, and the lower and upper limits on the probability function were defined as the 25% and 75% probabilities of correctly identifying the stronger stimulus. The JND was then calculated by averaging DL<sub>l</sub> and DL<sub>u</sub>. (B) The Weber fractions vs. PA; (C) The Weber fractions vs. PF; (D) The Weber fractions vs. PW.

most were under simultaneous TENS; (C) Level 3 (chance level: 6.7%): four PFTs at most were under simultaneous TENS.

The more PFTs that could be under simultaneous stimulation, the poorer the recognition performance for the individual PFT. For Levels 1–3, the correct recognition ratios for individual PFTs were 85.83% (103/120) (chance level: 25%), 67.67% (203/300) (chance level: 10%), and 46.44% (209/450) (chance level: 6.7%), respectively. For Level 1 (**Figure 6A**), the leading incorrect judgment was due to the influence of sensation from the adjacent phantom fingers (16/120). In Level 2 (**Figure 6B**), the misjudgments were classified into three types. The first type was the incomplete judgment (41/300): only one phantom finger was correctly identified with two PFTs under simultaneous TENS, e.g., D1 and D2 under TENS were identified as only the phantom index finger. The second was the excessive judgment (12/300): sensations of two phantom fingers were reported with only one PFT under TENS, e.g., phantom thumb and index fingers were reported with D1 under TENS. The third was mixed, with both incomplete and excessive judgments (43/300): one of two PFTs under simultaneous TENS was identified correctly but the other was misjudged as another PFT, e.g., D1 and D3 under TENS were reported as phantom thumb and index fingers. For Level 3 (**Figure 6C**), when TENS was applied to four PFTs at most, there were more misjudgments, which were also classified as incomplete judgment (142/450), excessive judgment (33/450), and mixed misjudgment with both incomplete and excessive judgments (62/450). There were very few reports in which none of the phantom fingers was identified correctly under stimulation of more than one PFT (5/510).

### DISCUSSION

The normal human hand is highly dexterous, with 27 degrees of freedom. Hand muscles are innervated by thousands of afferent nerve fibers, which convey different (sometimes overlapping) information about objects under manipulation (Abraira and Ginty, 2013; Saal and Bensmaia, 2014). For prosthetic hands, restoring tactile feedback requires multiple stimulating channels to convey adequate information that supports appropriate tactile discrimination together with detection and interpretation of those stimuli. Kandel et al. (2012) and Saal and Bensmaia (2015) also noted that stimulating location and perceived intensity were critical attributes for encoding the tactile information for a specific channel and for pattern coding of joint activity across several channels. In addition, the existence of referred sensations near the stump for the phantom limb (PL) (Hunter et al., 2008), phantom hand (PH) (Anani and Körner, 1979), and phantom finger (PF) (Björkman et al., 2016) provided a good pathway to realize artificial tactile feedback. Consequently, our present work characterized the discriminability of the perceived intensity and of phantom fingers under TENS with PA, PF, and PW modulations. Four experiments were carried out: detection thresholds, perceived intensity quantification, electrical stimulus discrimination, and phantom finger recognition.

The purpose of our detection threshold experiment was to determine the range of parameters that did not cause uncomfortable sensations. We chose the method of constant stimuli in a 2AFC paradigm (Kandel et al., 2012), which could reduce the impact of a subject's errors of habituation and anticipation compared with the method of minimal change. An important premise was that the subject knew there was definitely a stimulus in one of the two intervals within a trial, and he/she was required to choose one. The detection threshold charge in our study was about 0.2 µC (1 mA × 0.2 ms = 0.2 µC), lower than the 0.6 µC or so reported for TENS of the median or ulnar nerves deep beneath the skin (D'Anna et al., 2017). Under TENS of PFTs, there was no strong local skin sensation or muscle movement of the kind that otherwise occurs with TENS of the median or ulnar nerve. By adopting extraneural Cuff or FINE (Flat Interface Nerve Electrode) electrodes, the charge threshold was as small as about 0.1 µC for artificial tactile sensation (Graczyk et al., 2016). Additionally, the maximum charges injected into the median and ulnar nerves were 8 and 24 nC using intraneural TIME (Transversal Intrafascicular Multichannel Electrode) electrodes (Raspopovic et al., 2014), and it was also reported that the injected charge threshold ranged from 4.25 to 17.5 nC with LIFE (Longitudinal Intrafascicular Electrode) electrodes (Dhillon and Horch, 2005). The detection thresholds in our study were markedly higher than those under invasive circumstances, which indicated that more invasive approaches require less charge to excite the sensory afferents. Since the attention of the subject was engaged in detecting whether a stimulus was present, stimuli at the detection thresholds under this operational definition might not be detected in other tasks or in daily life, possibly because of the attentional mechanism for selecting sensory inputs (Hsiao et al., 1993).
Consequently, it was difficult for subjects to describe the perceived intensity near the detection threshold. This was in accordance with the typical response of stimuli near the detection threshold (Flesher et al., 2016). Therefore, the default values of the PA, PW and PF were set a little higher than the corresponding detection thresholds to make sure that the subjects had perceptible and comfortable sensations during experiments of electrical stimulus discrimination and phantom finger recognition.

During TENS of the PFTs, the elicited artificial sensations conveyed more information than just magnitude, including sensory modalities such as "pressure," "vibration," and "tingling," and a variable sensation area. However, for perceived intensity quantification under TENS of PFTs, the subject was instructed to ignore changes in sensory modality or area and to focus only on the perceived intensity, which was indexed by the mechanical indentation depth on the contralateral healthy finger pulp. Participants described the elicited sensations as "natural sensation, but they were still different from the sensations under mechanical stimuli." They described that "the sensation of electrical stimuli is deeper and sharper than feeling under mechanical pressure." Especially for PF modulation, they felt a little confused when matching the intensity of a sharp sting elicited by electrical stimuli with PF above 400 Hz to the mechanical counterparts.

Within the tested stimulus range, the perceived intensities increased linearly with increasing PA, and the trends were similar to those for PW modulation in **Figure 4**. For PF modulation, on the other hand, the intensities increased linearly only for frequencies from 0 to 200 Hz and remained almost stable for higher frequencies. This was probably because the charge per phase changed under PA and PW modulation to activate sensory afferents, while the firing rate of fibers changed under PF modulation (Graczyk et al., 2016). For the PW and lower PF modulations, similar findings existed for peripheral nerve stimulation using FINE or spiral Cuff electrodes: the perceived intensities increased linearly with both increasing PW and PF (Graczyk et al., 2016), where the frequencies were from 25 to 166 Hz. Theoretically, the subjective experience of the perceived intensity is expressed by a power function (Stevens, 1957). For some somatosensory experiences, the power function can have a unity exponent, which gives a linear relationship (Kandel et al., 2012).

For determination of JNDs, the Weber fraction (Ekman, 1959) was adopted to represent the subjects' abilities to discriminate stimuli. The subjects were required to focus on the difference in perceived intensity between the two stimuli in each trial while ignoring other changes such as modality or area. The smaller the Weber fraction, the better the stimulus discriminability. For PF modulation, the corresponding Weber fractions were larger than those for PA and PW modulations. Graczyk et al. also reported that Weber fractions for PW modulation were much lower than those for PF modulation (Graczyk et al., 2016). The JND for PF was 16.5 ± 1.6 Hz at a 50-Hz reference, with a Weber fraction of 0.33. The JND for PW was 6.7 ± 1.0 µs, yielding a Weber fraction of 0.05, which was significantly lower than the Weber fractions of PF.

The performance in phantom finger recognition without dedicated additional training showed that the main misjudgments were associated with the adjacent PFTs, which could be due to crosstalk from the spreading electric field during TENS of a specific PFT. Much smaller electrodes could be adopted to minimize this kind of misjudgment. Incomplete judgments occurred under TENS of more than one PFT. This kind of misjudgment might be due to the masking effect, in which the perception of one phantom finger is also influenced by sensation from other PFTs (Gescheider et al., 1970). In addition, the deteriorated phantom finger recognition could also have resulted from the fact that uniform stimulating current parameters were adopted for the tested PFTs among the six subjects, who had different detection thresholds. The artificial tactile sensation functioned as a process of perception, which included "organization, identification and interpretation of sensory information in order to present and to understand the input information, or the environment" (Schacter, 2012). Although there were some incorrect judgments in phantom finger recognition, the ability to discriminate different phantom fingers was empirical and would be improved through training as part of a learning process (Delhaye et al., 2016; Chai et al., 2017). The recognition of simultaneous stimulation was close to others' work in intracortical sensory feedback, which was 85% for one channel and 53% for two channels. This recognition performance could be improved by recruiting more and smaller subsets of fibers individually through high electrode density and by optimizing stimulating parameters and sites.

In the past several years, implanted Cuff (Ortiz-Catalan et al., 2014; Tan et al., 2014; Graczyk et al., 2016), USEA (Utah Slanted Electrode Array) (Warwick et al., 2003; Ledbetter et al., 2013), LIFE (Dhillon et al., 2004), and TIME (Boretius et al., 2010; Raspopovic et al., 2014) electrodes have been adopted to help produce natural sensation of lost fingers or palms, which made it feasible to accomplish closed-loop motor control of objects in a lab environment (Ortiz-Catalan et al., 2014; Tan et al., 2014; Graczyk et al., 2016). On the other hand, TENS of PFTs by surface electrodes also produced sensation of individual fingers comparable to that of the invasive sensory feedback scheme. However, due to the relatively large surface electrode size and the limited PFT space, usually only one stimulating electrode was located on the MSP within a PFT, and it was hard to stably discriminate different areas within one phantom finger. For invasive methods, in contrast, sensation of some localized areas of a phantom finger could be stably discriminated (Raspopovic et al., 2014; Tan et al., 2014; Graczyk et al., 2016), presumably because an implantable microelectrode can supply a more localized stimulation of sensory neurons. In addition, as the number of electrodes under simultaneous stimulation increased, recognition of different phantom fingers deteriorated in our study: the correct ratio decreased from 85.83% (one-channel stimulation) to 67.67% for two channels and 46.44% for four channels. Although the correct ratios were lower for two- and four-channel stimulation, they were far above the corresponding chance levels of 10 and 6.7%, respectively. In our opinion, the incomplete or partial misjudgment of phantom fingers would partly affect the sensation of object details, but during real-world closed-loop control of prosthetic hands, there are timing differences in activation among different electrodes (Raspopovic et al., 2014). More sophisticated encoding approaches that introduce this kind of timing difference could therefore be adopted to improve phantom finger recognition for clinical applications. It was reported that roughly 65% of trans-radial amputees have some form of phantom hand sensation (D'Anna et al., 2017). For these amputees, TENS of PFTs would be more appropriate, as it offers stable selectivity of individual fingers. For those with high-level amputation and without PFTs, the invasive sensory feedback scheme would be more suitable.

Tactile sensory feedback is undoubtedly essential for engagement in manipulation and for the feeling of body ownership of the prosthesis. For now, confusion about the meaning of the resulting artificial sensation and the high cognitive load are still the key issues for sensory feedback, which requires a more intuitive and highly discriminative neural interface (Farina and Amsüss, 2016; Svensson et al., 2017). Others' studies using fMRI revealed that the phantom finger sensations elicited by mechanical stimulation of the residual stump mapped well to the corresponding normal fingers in the primary somatosensory cortex (Björkman et al., 2012). Moreover, our previous work using MEG neuroimaging also revealed responses related to the phantom finger sensation under TENS in the somatosensory cortex (Chen et al., 2017). For those reasons, the PFTs under TENS should be intuitive to recognize and understand for part of the upper-limb prosthetic users. The present work provides guidelines for selecting strategies of artificial tactile feedback in prosthetic hands with less cognitive load for potential clinical applications.

### CONCLUSION

The discrimination ability of phantom finger sensations elicited by TENS of the PFTs was characterized. We focused on perceived intensity quantification, electrical stimulus discrimination, and phantom finger recognition based on psychophysical experiments. The participants could discern small changes of stimuli in PA, PF, and PW modulations. Although a larger number of PFTs under simultaneous stimulation would convey richer tactile information, the recognition performance would deteriorate. Our present studies shed light on the optimization of the stimulating strategy for the clinical application of intelligent upper-limb prosthetics in the near future. In our future work, we will investigate the somatosensory cortical responses objectively by MEG, and further elucidate the neural basis of the discrimination and recognition characteristics.

### ETHICS STATEMENT

All the psychophysical experiments were carried out in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Human and Animal Experiments at School of Biomedical Engineering, Shanghai Jiao Tong University (No. 2016012). All subjects were informed of the whole experimental procedure, and signed the informed consent form before the experiments.

### AUTHOR CONTRIBUTIONS

XS contributed to the design of the overall psychophysical experiments and data analyses. ML conducted the psychophysical experiments and data analyses. ML wrote the first draft and XS also contributed to the whole manuscript revision. DZ, LH, and YC contributed to subject recruitment and the experiment. XC, YC, and JG were involved in the establishment of the experimental setup. All authors were active in the editing and revising processes of the manuscript. All authors read and approved the final manuscript.

### FUNDING

This research is supported by the National Natural Science Foundation of China (81671801, 61761166006, 31471081, 61773256, 61472247, 51475292), the Innovation Studio Fund from School of Biomedical Engineering at Shanghai Jiao Tong University, and the Medical-Engineering Cross Project of Shanghai Jiao Tong University (YG2017MS53).

### ACKNOWLEDGMENTS

The authors would like to thank all the volunteers for participating in the experiments, and Mr. Chenxue Li and Haifeng Zhang for help in subject recruitment. They also give warm thanks to their colleagues for advice on improving the English writing style.




**Conflict of Interest Statement:** LH was employed by company Shanghai Health 51 Net Technology Co., Ltd.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Li, Zhang, Chen, Chai, He, Chen, Guo and Sui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A User Study on Robot Skill Learning Without a Cost Function: Optimization of Dynamic Movement Primitives via Naive User Feedback

Anna-Lisa Vollmer<sup>1</sup>\* and Nikolas J. Hemion<sup>2</sup>

<sup>1</sup> Applied Informatics Group, Cluster of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany, <sup>2</sup> AI Lab, SoftBank Robotics Europe, Paris, France

Enabling users to teach their robots new tasks at home is a major challenge for research in personal robotics. This work presents a user study in which participants were asked to teach the robot Pepper a game of skill. The robot was equipped with a state-of-the-art skill learning method, based on dynamic movement primitives (DMPs). The only feedback participants could give was a discrete rating after each of Pepper's movement executions ("very good," "good," "average," "not so good," "not good at all"). We compare the learning performance of the robot when applying user-provided feedback with a version of the learning in which an objectively determined cost, obtained via a hand-coded cost function and an external tracking system, is applied. Our findings suggest that (a) an intuitive graphical user interface for providing discrete feedback can be used for robot learning of complex movement skills when using DMP-based optimization, making the tedious definition of a cost function obsolete; and (b) inexperienced users with no knowledge about the learning algorithm naturally tend to apply a working rating strategy, leading to learning performance similar to that obtained with the objectively determined cost. We discuss insights about difficulties when learning from user-provided feedback, and make suggestions for how learning continuous movement skills from non-expert humans could be improved.

Keywords: programming by demonstration, imitation learning, CMA-ES, human-robot interaction, DMP, human factors, optimization, skill learning

#### Edited by:

Luka Peternel, Fondazione Istituto Italiano di Technologia, Italy

#### Reviewed by:

Bojan Nemec, Jožef Stefan Institute (IJS), Slovenia Andrej Gams, Jožef Stefan Institute (IJS), Slovenia

#### \*Correspondence:

Anna-Lisa Vollmer avollmer@techfak.uni-bielefeld.de

#### Specialty section:

This article was submitted to Humanoid Robotics, a section of the journal Frontiers in Robotics and AI

Received: 15 March 2018 Accepted: 06 June 2018 Published: 02 July 2018

#### Citation:

Vollmer A-L and Hemion NJ (2018) A User Study on Robot Skill Learning Without a Cost Function: Optimization of Dynamic Movement Primitives via Naive User Feedback. Front. Robot. AI 5:77. doi: 10.3389/frobt.2018.00077

## 1. INTRODUCTION

Robots are currently making their entrance in our everyday lives. To be able to teach them novel tasks, learning mechanisms need to be intuitively usable by everyone. In the approach of Programming by Demonstration (Billard et al., 2008), users show their robot how a task is done (for example via kinesthetic teaching), and the robot then reproduces the demonstrated movement. However, not all tasks can be easily demonstrated to a robot this way. For example, some tasks can only be solved with very precise movements, which are difficult for the user to demonstrate successfully. Instead, it is often more feasible to let the robot self-improve from an imperfect demonstration. Most research on robot learning aims primarily at optimizing the final task performance of the robot, while disregarding the usability of the system by non-expert users. In particular, Programming by Demonstration studies and, even more so, the optimization are primarily tested in laboratory environments and rarely evaluated with human users, let alone with non-experts. The typical workflow for creating an optimization system encompasses the definition of a suitable cost function, which the system can evaluate to improve its performance. Finding a cost function that will ensure the desired outcome of the robot learning is far from trivial. In fact, it is often difficult even for domain experts to define a cost function that does not lead to unexpected behaviors by the robot. For a system to be usable by non-experts, it is unrealistic to expect the user to design a cost function in order to teach their robot a new skill. To make things worse, many cost functions require an external sensory setup (in addition to the robot's on-board sensors) to measure relevant features precisely enough for the computation of the cost function. Again, this is feasible in a laboratory environment, but not realistic for use at home by non-experts.

The general research topic of this work is thus to investigate whether it is possible to employ a state-of-the-art optimization system in a user-centered setup: one that is intuitively usable by non-experts and could easily be operated outside the laboratory (for example, it does not require expensive or difficult-to-calibrate equipment). In particular, we concentrate on robot learning of complex movement skills with a human teacher. As a method, we chose optimization of Dynamic Movement Primitives (DMPs) (see section 2), a widely used method from the Programming by Demonstration literature.

It is commonly assumed that the feedback humans provide is a noisy and unreliable reward signal (e.g., Knox and Stone, 2012; Weng et al., 2013; Daniel et al., 2015): humans are assumed not to provide an optimal teaching signal, and therefore additional care should be taken when using the human-provided signal in a robot learning system. In contrast, here we deliberately chose to use an unaltered optimization system, without any modifications to the learning algorithm for "dealing with" the human-provided teaching signal or specific adaptations toward the human. In doing so, we aim at demonstrating, as a baseline, the performance of an unaltered, state-of-the-art Programming by Demonstration setup trained using human feedback alone. The only modification in our system is to replace the sensory-based cost evaluation with an intuitive-to-use graphical user interface, allowing the user to provide discrete-valued feedback to the robot after each movement execution.

### 1.1. Related Work

The field of Interactive Machine Learning (IML) aims to give the human an active role in the machine learning process (Fails and Olsen, 2003). It is a rather vast field including the human in an interactive loop with the machine learner, ranging from web applications to dialog systems, but also robots: the learner shows its output (e.g., performance, predictions) and the human provides input (e.g., feedback, corrections, examples, demonstrations, ratings). In robotics, IML combines research on machine learning (section 1.1.1) and human-robot interaction (section 1.1.2).

#### 1.1.1. Machine Learning With Human Teachers

Regarding machine learning research, there is a large body of literature on incorporating human-provided reward signals into reinforcement learning algorithms. The majority of approaches focus on the case where the action space of the robot is discrete (e.g., Abbeel and Ng, 2004; Thomaz and Breazeal, 2008; Chernova and Veloso, 2009; Taylor et al., 2011; Cakmak and Lopes, 2012; Griffith et al., 2013; Cederborg et al., 2015), which means that the robot already has to know the "steps" (or "basic actions") required to solve a task in advance. Related work in this area includes that of Thomaz et al., who investigated user input to a reinforcement learning agent that learns a sequential task in a virtual environment (Thomaz et al., 2006). They then altered the learning mechanism according to the results of their Human-Robot Interaction (HRI) studies. Senft et al. also recently presented a study with a virtual reinforcement learning agent learning sequential tasks with user rewards (Senft et al., 2017).

Here, in contrast, we are interested in the case of a continuous action space, which would allow a human user to teach their robot entirely new actions (which could in principle then also be used as new "basic actions" in reinforcement learning methods such as the ones just mentioned). There is some existing work on robot learning from user feedback where the robot's action space is continuous. Knox and Stone proposed the "TAMER" framework, aimed at learning a model of the human-provided reward, explicitly taking effects such as time-delayed responses into account (Knox and Stone, 2009). TAMER has mostly been used for learning in the case of discrete state and action spaces (Knox and Stone, 2012; Knox et al., 2012a,b), but has recently also been applied to traditional reinforcement learning benchmark tasks involving continuous spaces (e.g., Vien and Ertel, 2012). Similarly, Daniel et al. (2015) use Gaussian process regression and Bayesian optimization in combination with relative entropy policy search to estimate a reward function from user-provided feedback. In contrast to these works, we do not estimate a reward function but directly treat the user responses as the teaching signal to the learning algorithm, to evaluate whether an unaltered optimization algorithm in conjunction with DMPs can operate on user-provided discrete scores, noisy or not.

Instead of requesting a score or reward value directly from the user, it has been suggested to employ preference-based learning (Christiano et al., 2017; Sadigh et al., 2017): the user is repeatedly presented with two alternative performances by the robot or agent, and is asked to select one over the other. Sadigh et al. used such an approach to let users teach a simulated 2-dimensional autonomous car to drive in a way deemed reasonable by the user (Sadigh et al., 2017). Their system learned a reward function from the human-provided reward. However, the function estimation relied on a set of predefined features to succeed in learning from relatively little data. Like the design of a cost function, the design of suitable feature representations for the cost function estimation can itself be challenging, and certainly is for non-experts. Christiano et al. successively presented pairs of short video clips showing the performance of virtual agents (simulated robots in one task, and agents playing Atari games in another task) to human participants, who then selected the performance that they preferred (Christiano et al., 2017). Using this feedback alone, the virtual agents were able to learn complex behaviors. Christiano et al. also learn a model of the user-provided responses. Interestingly, they were able to reduce the total amount of time humans had to interact with the learning system (watch videos, provide feedback) to only about 1 h. However, their work is based on deep reinforcement learning methodology and thus requires the agent to train for hundreds of hours in total, which poses a severe difficulty for application in real robots, both in terms of the time necessary for training and due to other factors such as physical wear. In contrast, we present a system that does not rely on the definition of suitable feature representations, and can learn successful movement skills from non-expert users in as little as 20 min in total.

#### 1.1.2. Human-Robot Interaction With Machine Learners

When developing machine learning algorithms, we cannot imagine or model in theory what everyday, non-expert users will do with the system. For example, studies in imitation learning or Programming by Demonstration have shown that people will show completely different movement trajectories depending on where the robot learner is looking at the time of demonstration (Vollmer et al., 2014). Thus, if we develop systems without considering human factors and without testing them in HRI studies with everyday people, then our systems in the end might not be usable at all. Here, we briefly review studies of human-robot-learning scenarios with real naive human users. Some related HRI studies test machine learning algorithms with human users and examine how naive users naturally teach robots and how the robot's behavior impacts human teaching strategies (see Vollmer and Schillingmann, 2017, for a review). In the area of concept learning, for example, Cakmak and Thomaz (2010) and Khan et al. (2011) studied how humans teach a novel concept to a robot. In a task with simple concept classes where the optimal teaching strategy is known, Cakmak and Thomaz (2010) found that human teachers' strategies did not match the optimal strategy. In a follow-up study, they tried to manipulate the human teacher to employ the optimal teaching strategy. Khan et al. (2011) provided a theoretical account for the most common teaching strategy they observed by analyzing its impact on the machine learner.

Natural human teaching behavior of movement skills is very complex, highly adaptive and multimodal. Previous HRI studies have investigated the naive demonstration of continuous robot movement skills, either focusing on the usability of kinesthetic teaching (Weiss et al., 2009), or not applying machine learning algorithms but studying the influence of designed robot behavior, for example incorporating findings from adult-infant interactions (Vollmer et al., 2009, 2010, 2014).

Weiss et al. (2009) have shown that naive users are able to teach a robot new skills via kinesthetic teaching. Here, we do not focus on the demonstration part of the skill learning problem, but the users' feedback replaces the cost function for task performance optimization.

### 1.2. Contribution and Outline

In this work, we investigate whether a completely unmodified version of a state-of-the-art skill learning algorithm can cope with naive, natural user feedback. We deliberately restricted our system to components of low complexity (one of the most standard movement representations in the robotics literature, a very simple optimization algorithm, a simplistic user interface), in order to create a baseline against which more advanced methods could be compared.

We present a first study with non-expert participants who teach a full-size humanoid robot a complex movement skill. Importantly, the movement involves continuous motor commands and cannot be solved using a discrete set of actions.

We use Dynamic Movement Primitives (DMPs), which are "the most widely used time-dependent policy representation in robotics (Ijspeert et al., 2003; Schaal et al., 2005)" (Deisenroth et al., 2013, p. 9) combined with Covariance Matrix Adaptation Evolution Strategy (CMA-ES, Hansen, 2006) for optimization. Stulp and Sigaud (2013) have shown that the backbone of CMA-ES, "(µ<sub>W</sub>, λ)-ES, one of the most basic evolution strategies, is able to outperform state-of-the-art policy improvement algorithms such as PI<sup>2</sup> and PoWER with policy representations typically considered in the robotics community."

The task to be learned is the ball-in-cup game as described by Kober and Peters (2009a). Usually, these state-of-the-art learning mechanisms are tested in the lab in simulation or with carefully designed cost functions and external tracking devices. Imagine robots in private households that should learn novel policies from their owners. In this case, the use of external tracking devices is not feasible, as it comes with many important requirements (e.g., completely stable setup and lighting conditions for color-based tracking with external cameras). We chose the ball-in-cup game for our experiment because it has been studied in a number of previous works (Miyamoto et al., 1996; Arisumi et al., 2005; Kober and Peters, 2009b; Nemec et al., 2010, 2011; Nemec and Ude, 2011) and we can therefore assume that it is possible to solve the task using DMP-based optimization. Still, achieving a successful optimization is not at all trivial: a carefully set up sensory system is required to track the ball and the cup during the movement, as well as a robustly implemented cost function (covering all contingencies, see section 2.2). We therefore believe the task to be a suitable representative for the study of robot learning of complex movements from naive users, which would otherwise require substantial design effort by an expert.

Policy search algorithms with designed cost functions usually operate on absolute distances obtained via a dedicated sensory system. However, participants in our study are naive in the sense that they are not told a cost function and it is difficult for humans to provide absolute distances (i.e., the cost) as feedback to the robot. Therefore, we provided participants with a simple user interface with which they give discrete feedback for each robot movement on a scale from one to five.

The central question we aim to answer is: can human users without technical expertise and without manual or specific instructions teach a robot equipped with a simple, standard learning algorithm a novel skill in their homes (i.e., without any external sensor system)? For the evaluation, we focus on system performance and the user's teaching behavior. We report important difficulties of making learning in this setup work with an external camera setup (section 2.2) and with human users (section 4.1).

## 2. MATERIALS AND METHODS

### 2.1. System

#### 2.1.1. Robot

Pepper is a 1.2 m tall humanoid robot developed and sold by SoftBank Robotics. Pepper's design is intended to make the interaction with human beings as natural and intuitive as possible. It is equipped with a tablet as input device. Pepper is running NAOqi OS. Pepper is currently welcoming, informing and amusing customers in more than 140 SoftBank Mobile stores in Japan and it is the first humanoid robot that can now be found in Japanese homes.

In our study, Pepper used only its right arm to perform the movements. The left arm and the body were not moving. For the described studies, any collision avoidance of the robot has been disabled. Joint stiffness is set to 70%.

#### 2.1.2. Setup

The setup is shown in **Figure 1**. Two cameras recorded the movement at 30 Hz, one from above and another one from the side. This allowed for tracking of the ball and cup during the movements. All events, including touch events on the tablet of the robot, were logged.

#### 2.1.3. Ball and Cup

The bilboquet (or ball and cup) game is a traditional children's toy, consisting of a cup and a ball, which is attached to the cup with a string, and which the player tries to catch with the cup. Kober et al. have demonstrated that the bilboquet movement can be learned by a robot arm using DMP-based optimization (Kober and Peters, 2009a), and we have demonstrated that Pepper is capable of mastering the game<sup>1</sup>. In this study, the bilboquet toy was chosen such that the size of the cup and ball resulted in a level of difficulty suitable for our purposes (in terms of time needed to achieve a successful optimization) and feasibility regarding the trade-off between accuracy (i.e., stiffness value) and mitigating hardware failure (i.e., overheating). Usually, such a movement optimization provides a more positive user experience when learning progress can be recognized. Thus, the initialization and exploration parameters together should yield an optimization from movements somewhere rather far from the cup toward movements near the cup. With a small cup, if the optimization moves rather quickly to positions near the cup, the "fine-tuning" of the movement to robustly land the ball in the cup takes disproportionately long. This is partially due to the variance introduced by hardware. Therefore, we chose the cup size to result in an agreeable user experience, on the one hand by minimizing the time spent on "fine-tuning" of the movement near the cup at the end of the optimization process, and on the other hand by minimizing the teaching time until the skill has been successfully learned.

#### 2.1.4. Learning Algorithm

<sup>1</sup>https://youtu.be/jkaRO8J\_1XI

FIGURE 1 | Experimental setup from above. In the studies with optimization via the external camera setup (section 2.2), where the experimenter only returned the ball to its home position, the seat for the participant remained empty.

We implement the robot's movement using dynamic movement primitives (DMPs) (Ijspeert et al., 2013). We define the DMP as coupled dynamical systems:

$$\frac{1}{\tau}\ddot{\mathbf{y}}_t = \alpha_g\left(\beta(\mathbf{y}_g - \mathbf{y}_t) - \dot{\mathbf{y}}_t\right) + v_t(\mathbf{y}_g - \mathbf{y}_0) \cdot \mathbf{h}_\theta(x_t) \tag{1}$$

$$\frac{1}{\tau}\dot{v}_t = -\alpha_v v_t \left(1 - \frac{v_t}{K}\right) \tag{2}$$

The "transformation system," defined in Equation (1), is essentially a simple linear spring-damper system, perturbed by a non-linear forcing term h<sub>θ</sub>. Without any perturbation, the transformation system produces a smooth movement from any position y<sub>t</sub> toward the goal position y<sub>g</sub> (both positions defined in the robot's joint space). The forcing term h<sub>θ</sub> is a function approximator, parametrized by the vector θ. It takes as input a linear system x<sub>t</sub>, which starts with value 0 and transitions to 1 with constant velocity (see Stulp, 2014). The introduction of the forcing term allows us to model any arbitrarily shaped movement with a DMP.

As suggested by Kulvicius et al. (2012), a "gating system" (defined in Equation 2) is used to ensure that the contribution of the forcing term h<sub>θ</sub> to the movement disappears after convergence. It is modeled after a sigmoid function, with starting state 1 and attractor state 0, where the slope and inflection point of the sigmoid function are determined by the parameters α<sub>v</sub> and K (for details, see Stulp, 2014). This way, stable convergence of the system can be guaranteed even for strong perturbations, as we know that the transformation system without any perturbation by the forcing term is stable, and the multiplication of the forcing term with the gating variable v<sub>t</sub> blends out the perturbation once the gating system has converged.
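To make the interplay of Equations (1) and (2) concrete, the following is a minimal single-joint sketch, not the authors' implementation. It integrates both systems with a simple Euler scheme and reads the equations literally (the right-hand sides are multiplied by τ). All numerical values (`alpha`, `beta`, `alpha_v`, `K`, `duration`, `dt`), the Gaussian basis layout, and the function names are assumptions chosen for illustration.

```python
import math


def forcing_term(x, theta, centers, width):
    """h_theta(x): normalized Gaussian-weighted average of per-model offsets."""
    psi = [math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers]
    total = sum(psi)
    return sum(p * th for p, th in zip(psi, theta)) / (total + 1e-12)


def rollout_dmp(theta, y0=0.0, y_g=1.0, tau=1.0, alpha=20.0, beta=5.0,
                alpha_v=10.0, K=1.05, duration=3.0, dt=0.001):
    """Integrate Equations (1) and (2) for one joint (illustrative values)."""
    n_models = len(theta)
    centers = [i / (n_models - 1) for i in range(n_models)]
    width = 1.0 / n_models
    y, yd, v = y0, 0.0, 1.0  # position, velocity, gating variable
    trajectory = []
    for step in range(int(round(duration / dt))):
        # Phase variable: moves from 0 to 1 at constant velocity.
        x = min(step * dt / duration, 1.0)
        h = forcing_term(x, theta, centers, width)
        # Equation (1): spring-damper plus gated forcing term.
        ydd = tau * (alpha * (beta * (y_g - y) - yd) + v * (y_g - y0) * h)
        # Equation (2): sigmoid-shaped gating, decaying from 1 toward 0.
        vd = tau * (-alpha_v * v * (1.0 - v / K))
        yd += ydd * dt
        y += yd * dt
        v += vd * dt
        trajectory.append(y)
    return trajectory
```

With `theta` set to zeros the roll-out is the plain spring-damper movement toward `y_g`; a non-zero `theta` reshapes the trajectory early on, but because the gating variable decays to 0 the movement still ends at the goal.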

For learning the ball-in-a-cup skill on Pepper, we adopt Stulp and Sigaud's method of optimizing the parameter vector θ using simple black-box optimization (Stulp, 2014). More specifically, we use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES, Hansen, 2006) for optimization, and locally weighted regression (Atkeson et al., 1997) for the function approximator h<sub>θ</sub>. The parameter space is 150-dimensional, as we use 5 degrees-of-freedom (DoF) in the robot arm and 30 local models per DoF. Following the Programming by Demonstration paradigm, we initialize the local models via kinesthetic teaching, first recording a trajectory and subsequently determining the model parameters via regression on the trajectory data points. After this initialization, we keep all but one parameter of each local model fixed: in the CMA-ES-based optimization, we only optimize the offset of the local models, which proves to allow for a change in the shape of the trajectory that is sufficient for learning.
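The initialization of the local models by regression can be sketched as follows. This is a hedged illustration of locally weighted regression in general, not the authors' code: each local model is a line (slope and offset) fitted by Gaussian-weighted least squares around its center. The function names, the basis width, and the form of the `(xs, targets)` input are assumptions.

```python
import math


def fit_local_models(xs, targets, n_models=30):
    """Fit one weighted linear model (slope, offset) per Gaussian basis."""
    centers = [i / (n_models - 1) for i in range(n_models)]
    width = 1.0 / n_models  # hypothetical basis width
    models = []
    for c in centers:
        w = [math.exp(-0.5 * ((x - c) / width) ** 2) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        mt = sum(wi * t for wi, t in zip(w, targets)) / sw
        var = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        cov = sum(wi * (x - mx) * (t - mt)
                  for wi, x, t in zip(w, xs, targets))
        slope = cov / var if var > 0.0 else 0.0
        offset = mt - slope * mx
        models.append((slope, offset))
    return centers, width, models


def predict(x, centers, width, models):
    """Blend the local lines with normalized Gaussian weights."""
    psi = [math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers]
    total = sum(psi)
    return sum(p * (a * x + b) for p, (a, b) in zip(psi, models)) / total
```

In a setup like the one described above, only the 30 offsets per DoF would then be handed to the optimizer (5 × 30 = 150 parameters), while the slopes and basis functions stay fixed.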

CMA-ES functions similarly to gradient descent. In each update step, after the cost of every roll-out in a batch has been obtained via the defined objective function, a new mean value for the distribution is computed by ranking the samples according to their cost and using reward-weighted averaging. New roll-outs are then sampled from a multivariate normal distribution in ℝ<sup>n</sup>, here with n = 150. There are several open parameters, which we manually optimized. We aimed at allowing convergence to a successful movement within a reasonable amount of time. The parameters include the initial trajectory given to the system as a starting point, the number of basis functions the DMP uses to represent the movement, the initial covariance for exploration and the decay factor by which the covariance is multiplied after each update, the batch size as the number of samples (i.e., roll-outs) before each update, the stiffness of the joints of the robot, and the number of batches (i.e., updates) for one session in the described studies. The initial trajectory was recorded via kinesthetic teaching to the robot. We chose a trajectory with too much momentum, such that the ball traveled over the cup. All parameters and their values are listed in **Table 1**.
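The update described above (sample a batch, rank by cost, reward-weighted averaging, then shrink the exploration covariance) can be sketched as below. This is a deliberately simplified evolution strategy with an isotropic covariance and a fixed decay, not full CMA-ES and not the authors' implementation. The log-rank weighting scheme, the defaults, and all names are assumptions; the actual values used in the study are those of Table 1, which are not reproduced here.

```python
import math
import random


def weighted_es(cost_fn, mean, variance=0.25, decay=0.9, batch_size=10,
                n_updates=20, n_elite=5, seed=0):
    """Rank-based weighted-averaging ES with decaying isotropic covariance."""
    rng = random.Random(seed)
    # Log-rank weights: the best sample in a batch receives the largest weight.
    raw = [math.log(n_elite + 0.5) - math.log(rank + 1)
           for rank in range(n_elite)]
    total = sum(raw)
    weights = [r / total for r in raw]
    mean = list(mean)
    for _ in range(n_updates):
        std = math.sqrt(variance)
        # One batch of roll-outs sampled around the current mean.
        samples = [[m + std * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(batch_size)]
        # Each sample is evaluated exactly once; lowest cost comes first.
        ranked = sorted(samples, key=cost_fn)
        elite = ranked[:n_elite]
        mean = [sum(w * s[d] for w, s in zip(weights, elite))
                for d in range(len(mean))]
        variance *= decay  # covariance is multiplied by the decay factor
    return mean
```

Because only the ranking of the samples within a batch matters, `cost_fn` could in principle be a camera-based distance or a number derived from a user's rating.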

### 2.2. Optimization—External Camera Setup

In order to optimize the movement with external cameras and to create a baseline corresponding to a state-of-the-art skill learning system, a carefully designed cost function is defined that determines the cost as the distance between the ball and the cup at the height of the cup when the ball is traveling downward, similar to the one described in Kober and Peters (2009a). As with any sensory system designed for an automated measurement of a cost or error, significant care has to be taken to ensure robust and accurate performance, as even a slightly unreliable sensory system can prohibit the skill learning. In this case, particular care had to be taken, for example, in choosing camera models with high-enough frame rates, to ensure that the fast traveling ball could be accurately tracked in the camera image. During a roll-out, the ball typically passes the height of the cup and then descends again (this depends on the chosen initialization; here, it does). From a webcam recording the side of the movement, we determine the exact frame when the descending ball passes the vertical position of the cup. In the corresponding frame from the top view camera at this moment, we measure the distance between the center of the ball and the center of the cup in pixels (see **Figure 2**).
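The cost heuristic just described can be sketched as follows, assuming the ball and cup have already been tracked in both views. The data layout is hypothetical: `side_heights` holds the ball's height per frame in the side view (larger means higher), `top_ball_xy` holds the ball center per frame in the top view, and both streams are assumed to be frame-synchronized. The additional rules for occlusions and rim bounces are not covered.

```python
import math


def camera_cost(side_heights, top_ball_xy, top_cup_xy, cup_height):
    """Pixel distance between ball and cup when the ball descends past the cup.

    Returns None if the ball never descends through the cup height.
    """
    for i in range(1, len(side_heights)):
        # Descending crossing: above the cup in the previous frame,
        # at or below the cup height in the current frame.
        if side_heights[i - 1] > cup_height >= side_heights[i]:
            bx, by = top_ball_xy[i]
            cx, cy = top_cup_xy
            return math.hypot(bx - cx, by - cy)
    return None
```

Returning `None` for roll-outs in which the ball never reaches the cup height makes it explicit that such low-momentum cases would need a separate rule.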

We showed a cyan screen on the robot's tablet right before the movement began which could be detected automatically in the videos of both the side and top camera, to segment the video streams. The experimenter repositioned the ball in the home position after each roll-out.

Apart from the usual issues of color-based tracking, such as overall lighting conditions, the above heuristic for cost determination needed several additional rules to cover exceptions (for instance, dealing with the ball being occluded in the side view when it lands in the cup or passes behind the robot's arm). More severely, in this particular task the ball occasionally hits the rim of the cup and bounces off. In this case, the camera setup detects the frame in which the ball passes beside the cup after having bounced off the rim, and thus assigns too high a cost to the movement. Although we were aware of this, we refrained from taking further measures to cover this particularity of the task, as we found that the camera-based optimization would still succeed. In a version of the game with a smaller cup, however, this proves to be more problematic for the optimization and needs to be taken into account.

TABLE 1 | Overview of the open parameters of the system which influence learning.

For initial trajectories that do not reach the height of the cup, additional rules would need to be implemented to handle such low-momentum roll-outs.

### 2.3. Optimization—Naive Users

In the following, we describe the HRI study conducted with non-expert users, who are naive to the learning algorithm and have little to no experience with robots. The study was approved by the local ethics committee, and informed consent was obtained from all participants prior to the experiment.

#### 2.3.1. Participants

Participants were recruited through flyers/ads around the campus of Bielefeld University, at children's daycare centers, and at gyms. Twenty-six persons took part in the experiment. Participants were age- and gender-balanced (14 f, 12 m, age: M = 39.32, SD = 15.14 with a range from 19 to 70 years).

#### 2.3.2. Experimental Setup

The experiment took place in a laboratory at Bielefeld University. The participant sat in front of Pepper, and the experimenter sat to the left of the participant (see **Figure 1**). As in the other condition, two cameras recorded the movement, one from above and one from the side, such that a ground-truth cost could be determined. However, the camera input was not used for learning, and participants were not told that, or how, the cost would be determined from the camera images.

#### 2.3.3. Course of the Experiment

Each participant was first instructed (in German) by the experimenter. The instructions constitute a very important part of the described experiment because everything that is communicated to participants about the robot and how it learns might influence the participants' expectations and, in turn, their actions (i.e., ratings). Therefore, the instructions are described in full detail. They included the following information: The research conducted is about robot learning. The current study tests the learning of the robot Pepper and whether humans are able to teach it a task, specifically a game of skill called ball in cup. The goal of the game is that Pepper gets the ball into the cup with a movement. During the task, Pepper will be blindfolded. The cup is in Pepper's hand, and in the home position the ball hangs still from the cup. The participant was instructed that he/she could rate each movement via a rating GUI, which was displayed on the robot's tablet (see **Figure 3**). The experimenter showed and explained the GUI. The participant can enter up to 5 stars for a given roll-out (as in **Figure 1**). The stars correspond to the ratings of common 5-point Likert scales: 1: not good at all, 2: not so good, 3: average, 4: good, 5: very good. A rating is confirmed via the green check mark button on the right. Another button, the replay button on the left, allowed the participant to see a movement again, if needed. When the rating was confirmed, it was transformed into a cost as cost = 6 − rating to invert the scale, and was associated with the last shown movement for the CMA-ES minimization. A ready prompt screen was then shown to allow the ball to be repositioned at rest in the home position. After another confirming button touch on this screen, the robot directly showed the next roll-out.
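The inversion of the rating scale can be written out in a few lines. The sketch below only illustrates the stated formula cost = 6 − rating; the function name and the input validation are assumptions added for illustration.

```python
def rating_to_cost(rating):
    """Invert a 1-5 star rating into a cost for minimization (cost = 6 - rating)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return 6 - rating


# One star (worst) maps to the highest cost, five stars (best) to the lowest.
costs = [rating_to_cost(r) for r in (1, 3, 5)]
```

With this mapping, the cost takes only the five discrete values 1 to 5, in contrast to the continuous pixel distance of the camera setup.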

As stated above, the camera setup remained the same in this study; however, the videos were only saved and used afterwards to compute the ground truth. In this study, the cameras were not part of cost computation or learning. Participants were also informed that the cameras were recording the movements. We told them that we would use the recordings to later follow up on what exactly the robot did. We informed participants that each participant gives a fixed number of ratings, at the end of which the tablet shows that the study has ended. At this point, participants were encouraged to ask any questions they had, and informed consent was obtained from all participants prior to the experiment.

We neither told participants any internals of the learning algorithm nor mentioned any rating scheme. We also did not perform any movement ourselves, to avoid priming them about correct task performance.

Then, Pepper introduced itself with its autonomous life behavior (gestures during speech and using face detection to follow the participant with its gaze). Pepper said that it wanted to learn the game blindfolded but did not yet know exactly how it went. It further explained that in the following it would try multiple times and that the participant had to help it by telling it how good each try was. After the experimenter had blindfolded Pepper, the robot showed the movement of the initialization (see section 2.1.4).

After rating the 82 trials (the initialization + 80 generated roll-outs + the final optimized movement), each participant filled out a questionnaire on the usability of the system and the participant's experience when teaching Pepper. A short interview was then conducted that targeted participants' teaching strategies and the meaning of their feedback.

### 3. EXPERIMENTAL RESULTS

### 3.1. System Performance

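The system performance in the two studies is shown in **Figure 4**. To compare the system performance across the studies, we defined five different measures of success on the objective cost only:

- Final.hit: whether the final roll-out is a hit or a miss (binary).
- Final.dist: the distance of the final mean.
- Batch.dist: the mean distance in the roll-outs of the final batch.
- #hits: the total number of hits.
- First.hit: the number of roll-outs until the first hit.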


Based on these success measures, we perform statistical tests with the aim to determine what is more successful in optimizing this task, the camera setup or the naive users.

The tests did not reveal any significant differences in performance between the two. Descriptive statistics can be found in **Table 2**. We conducted a chi-square test for the binary hit-or-miss variable of the final roll-out (Final.hit), which did not yield significant results, χ<sup>2</sup>(1, 41) = 1.5, p = 0.221. We conducted four independent-samples t-tests for the rest of the measures. For the distance of the final mean (Final.dist), results are not significant, t(35.66) = −1.527, p = 0.136. For the mean distance in roll-outs of the final batch (Batch.dist), results are not significant, t(39) = −0.594, p = 0.556. For the total number of hits (#hits), results are not significant, t(39) = 0.66, p = 0.513. For the number of roll-outs until the first hit (First.hit), the analysis was not significant either, t(31) = −0.212, p = 0.834.

When looking at the HRI study only, we identify three main cases of learning performance: (a) successful convergence, with sub-cases (a.i) early convergence, N = 12 and (a.ii) late convergence, N = 5; (b) premature convergence, N = 6; and (c) unsuccessful convergence, N = 3 (see **Figure 5**). In the camera-optimized sessions, too, two out of 15 sessions showed unsuccessful convergence, which hints at important difficulties in both setups.

### 3.2. User Teaching Behavior

To investigate the teaching behavior of the non-expert users, we are particularly interested in the strategies that are successful or unsuccessful for learning.

#### 3.2.1. Questionnaire and Interview

We first report the questionnaire and interview answers relating to the strategies of the participants in our study. This will give us a general idea about their (self-reported) teaching behavior before we analyze the actual scores. The strategies participants report in questionnaires and interviews can be categorized into five approaches.

#### **3.2.1.1. Distance from ball to cup**

The majority of participants (N = 15) reported using scores to rate the distance from the ball to the cup. Interestingly, all of these participants belong to sessions we identified as (a) successful convergence. This suggests that this strategy leads to success.

#### **3.2.1.2. Momentum**

A few participants (N = 2) reported rating the momentum of a movement. Of course, at the beginning of the sessions, the momentum correlates with the distance between ball and cup: a movement with less momentum moves the ball closer to the cup. One of the participants who reported this strategy successfully trained the robot; for the other participant, the exploration converged prematurely.

#### **3.2.1.3. Comparative ratings**

A few others (N = 4) reported giving ratings that compared each movement to the previous one: if the movement was better than before, the rating was better, and vice versa. Interestingly, sessions of participants with this teaching strategy all fall into the premature convergence category (b) described in section 3.1.

TABLE 2 | Descriptive statistics.



#### **3.2.1.4. Spontaneous ratings**

Two participants (N = 2) claimed to rate the movements spontaneously, without any clear strategy. For one of them, exploration converged late but successfully (a); for the other, the session was unsuccessful (c).

#### **3.2.1.5. Individual strategies**

The remaining participants reported individual strategies (N = 3). For instance, one participant in this category always gave the same score (one star), intending to let the robot know that it should try something completely different. The other two strategies were not reported clearly. The described strategy, as well as another in this category, was not successful (c). One of the participants used a strategy that led to premature convergence (b).

#### 3.2.2. Correlation With Ground Truth

Based on the self-reported user strategies, we expect the successful sessions to also reflect the 'Distance from ball to cup' strategy in the actual scores participants gave. We test this by calculating the correlation between the participant scores and the ground truth of the robot movements. In the HRI case in general, participants obtained an average correlation coefficient of M = 0.72, SD = 0.20. The strategy of rating according to the distance between the ball and the cup should yield a high correlation value, and thus we expect successful sessions to obtain a higher correlation coefficient than sessions with premature convergence, which in turn obtain a higher correlation coefficient than sessions with unsuccessful convergence (i.e., success category a > b > c). Because of small sample sizes, we conduct a Kruskal-Wallis H test. There was a statistically significant difference in correlation coefficients between the three success categories, χ<sup>2</sup>(2) = 8.751, p = 0.013 < 0.05. An inspection of the mean ranks for the groups suggests that the successful sessions (a) had the highest correlation (mean rank = 16.24, M = 0.75, SD = 0.20), with the unsuccessful group (c) the lowest (mean rank = 2.67, M = 0.58, SD = 0.29), and prematurely converged sessions in between (mean rank = 11.17, M = 0.045, SD = 0.25). Pairwise post-hoc comparisons show a significant difference between the successful (a) and unsuccessful (c) sessions only (p = 0.014 < 0.05, significance value adjusted by Bonferroni correction for multiple tests). Thus the results confirm our hypothesis.
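The two computations used in this analysis can be sketched in plain Python. The data below are made up purely for illustration and are not taken from the study. Pairing the rating-derived cost (6 − rating) with the camera-measured distance is an assumption about the sign convention, and the H statistic is shown without tie correction for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction; values assumed distinct)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * s - 3.0 * (n + 1)


# Hypothetical session: star ratings and ground-truth distances in pixels.
ratings = [2, 3, 5, 4, 1]
distances = [80, 60, 5, 30, 120]
user_costs = [6 - r for r in ratings]
r = pearson(user_costs, distances)  # high when ratings track the distance

# Hypothetical per-session correlation coefficients for three groups.
h = kruskal_wallis_h([[0.81, 0.77, 0.90], [0.66, 0.70, 0.61], [0.35, 0.52, 0.44]])
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups − 1) degrees of freedom, which is why the test result above is reported as a χ<sup>2</sup> value with 2 degrees of freedom.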

#### 3.2.3. Score Data

Prototypical plots for the three success categories are shown in **Figure 6**. They corroborate and illustrate the teaching strategies we found.

Looking at individual plots of scores, we can draw a number of additional qualitative observations:

#### **3.2.3.1. Hits always receive 5 stars**

We observe that, for all participants, a hit (i.e., the ball lands in the cup) always receives a rating of 5 stars. Though some participants reserve the 5-star rating for hits only, in general, misses could also receive a rating of 5.

#### **3.2.3.2. Rating on a global scale**

One strategy we observe is to give ratings on a global scale, resulting in scores similar to the ground truth, but discrete.

#### **3.2.3.3. Rating on a local scale**

Some people who rate according to the distance between ball and cup take advantage of the full range of possible scores during the whole session and adjust their ratings according to the performance.

#### **3.2.3.4. Giving the same score multiple times**

Some participants gave the same score multiple times in one batch. This could be due to perceptual difficulties: participants often complained during the study that all movements looked the same. This behavior could also be part of a specific strategy, for example one emphasizing the incorrect nature of the current kind of movement in order to get the robot to change its behavior completely (increase the exploration magnitude), or a strategy that focuses on something other than the distance.

### 4. DISCUSSION

The results of this work can be summarized with two main findings.


DMPs are an established method for open-loop state-less optimization of robot skills and have been utilized for robot learning of diverse tasks, such as (constrained) reaching tasks (Guenter et al., 2007; Kormushev et al., 2010; Ude et al., 2010), the ball-in-the-cup game (Kober and Peters, 2009b), pick-and-place and pouring tasks (Pastor et al., 2009; Tamosiunaite et al., 2011), pancake flipping (Kormushev et al., 2010), planar biped walking (Schaal et al., 2003; Nakanishi et al., 2004), tennis swings to a fixed end-point (Ijspeert et al., 2002), T-ball batting or hitting a ball with a table tennis racket (Peters and Schaal, 2006; Calinon et al., 2010; Kober et al., 2011), pool strokes (Pastor et al., 2011), feeding a doll (Calinon et al., 2010), bi-manual manipulation of objects using chopsticks (Pastor et al., 2011), dart throwing (Kober et al., 2011), Tetherball (Daniel et al., 2012), and one-armed drumming (Ude et al., 2010).

While we have so far only tested the learning in one task (the ball-in-the-cup game), our results suggest that optimization in all of these tasks, which usually entails the difficult design of a cost function and sensory system, could be achieved with a simple, generic user interface even in home settings by non-expert users. Through their task knowledge, users are able to impart the goal of the task, which is not implicitly pre-programmed into the robot beforehand, without explicitly formulating or representing a cost function. Further studies involving other tasks will be needed to fully confirm this.

The discrete feedback users provide seems to work as well as the camera setup. Even without modifications, the system is able to solve the task, which could attest to (a) the robustness of this simple base-line system toward unreliable human feedback and (b) the ability of humans to adapt to the specifics of an unfamiliar learning system.

We would like to point out that the camera setup was only able to achieve the reported learning performance because of (a) the hardware used (i.e., cameras with a specific frame rate) and (b) the careful implementation of the cost function. As such, naive human teaching was not tested against a naive reward function but against a highly tuned one. As outlined in section 2.2, the design of a suitable cost function is rarely straightforward, and in practice requires significant adjustments to achieve the necessary precision. We believe that with a few instructions to users, system performance in this case can even be improved, and failed sessions can be prevented. We could imagine the naive users performing even better than a cost function in some cases. For instance, toward the end of the optimization, the ball frequently hits the rim of the cup, especially when a smaller cup is used. Because the ball moves very fast, this event is difficult to track for a vision system even with a high frame rate, as it often occurs between frames. Crucially, when the ball bounces off the rim, it often travels far away from the cup and is thus assigned a high cost value by the hand-coded cost function. In contrast, humans can easily perceive this particular event, especially because it is marked by a characteristic sound, and tend to rate it with a high score. Similarly, if the robot performs similarly bad roll-outs for some time, with the ball always at a similar distance from the cup, and in the next roll-out the ball lands at the same distance but on the other side of the cup, the user might give a high rating to indicate the correct direction, whereas the camera setup will measure the same distance.

### 4.1. Usability of/Difficulties With the Current System

The optimal teaching strategy is not known for the system in this task, but it seems that most naive users are able to successfully train the robot. However, we have observed some difficulties users had with the current system.

The DMP representation does not seem to be necessarily intuitive for humans. During the optimization, it appears more difficult to get out of some regions of the parameter space than others. This is not apparent in the action space. Additionally, nine participants reported to have first given scores spontaneously and later developed a strategy, hinting at difficulties at the beginning of the sessions, because they did not have any idea how to judge the first movements as they did not know how much worse the movements could get and they did not know the magnitude of differences between movements. Apart from these initial difficulties, four participants reported to be inconsistent in their ratings at the beginning or to have started out with a rating too high. This means that there is a phase of familiarization with the system and enhanced performance can be expected for repeated teaching.

Due to the nature of CMA-ES and the way new samples are drawn from a normal distribution in the parameter space, robot performances from one batch did not differ wildly but appeared rather similar. This was confusing to some participants, as they were expecting the robot to try out a range of different movements to achieve the task. In contrast, the CMA-ES optimization resulted in rather subtle changes to the movement. As a result, some participants rated all movements from one batch with exactly the same score. This is of course critical for the CMA-ES optimization, as it gives absolutely no information about the gradient direction. This issue could also be mitigated through repeated teaching interactions and familiarity with the system.
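Why identical ratings are uninformative can be seen in a one-dimensional toy example. The sketch below uses plain reward-weighted averaging of sample offsets, which is a simplification and not the ranking scheme of CMA-ES itself; the function names and the numbers are illustrative assumptions.

```python
def batch_is_informative(ratings):
    """A batch can only be ranked if at least two ratings differ."""
    return len(set(ratings)) > 1


def weighted_mean_shift(offsets, ratings):
    """Reward-weighted average of the sample offsets from the current mean
    (1-D for simplicity); weights are proportional to the ratings."""
    total = sum(ratings)
    return sum(r * o for r, o in zip(ratings, offsets)) / total


offsets = [-2.0, -1.0, 1.0, 2.0]  # hypothetical samples around the mean

# Identical ratings: the update is the plain sample average, here zero.
flat_shift = weighted_mean_shift(offsets, [3, 3, 3, 3])

# Differentiated ratings: the mean moves toward the better-rated samples.
directed_shift = weighted_mean_shift(offsets, [1, 2, 4, 5])
```

With identical ratings the update reduces to the unweighted average of the samples, so any movement of the mean reflects sampling noise rather than the user's preference.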

Furthermore, with the use of CMA-ES, ratings have no direct impact. Participants expected the ratings to have a direct effect on the subsequent roll-out. This led some participants to explore, testing the effect of a specific rating or a specific sequence of ratings on the following roll-out. Participants reacted with surprise to the fact that, after a hit, the robot again performed unsuccessful movements. The mean of the distribution in the parameter space could actually be moved directly to a hit movement if the user had the possibility to communicate this.

The cases of premature convergence could also be prevented by using, instead of CMA-ES, an optimization algorithm with adaptive exploration, like PI<sup>2</sup> CMA (Stulp and Oudeyer, 2012). Furthermore, participants were in general content with the possibility of providing feedback to the robot using a discrete scale. However, several participants commented that they would have preferred to also be able to provide verbal feedback of some form ("try with more momentum," "try more to the left"). This supports findings by Thomaz et al. (2006) that human teachers would like to provide "guidance" signals to the learner that, in contrast to only giving feedback on the previous action, give instructions for the subsequent action. How to incorporate such feedback in the learning is the subject of future work.

### 4.2. Outlook

We considered a learning algorithm without any modification or adaptation toward the human. In the following, we suggest future alterations to the system that we hypothesize to be beneficial for either system performance or usability and which can be measured systematically against the base-line.


### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Ethics Committee, Bielefeld University. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee, Bielefeld University.

### AUTHOR CONTRIBUTIONS

AV and NH contributed equally to this work.

### FUNDING

This work was supported by the Cluster of Excellence Cognitive Interaction Technology CITEC (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2018.00077/full#supplementary-material

Supplementary Video 1 | Summary of the User Study on Robot Skill Learning Without a Cost Function. This video explains the idea behind the study and illustrates the progress of learning in interaction with one user in 80 rated trials.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Vollmer and Hemion. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Teaching a Robot Bimanual Hand-Clapping Games via Wrist-Worn IMUs

#### Naomi T. Fitter<sup>1,2</sup>\* and Katherine J. Kuchenbecker<sup>2,3</sup>

<sup>1</sup> Interaction Lab, Department of Computer Science, University of Southern California, Los Angeles, CA, United States, <sup>2</sup> Haptics Group, Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA, United States, <sup>3</sup> Haptic Intelligence Department, Max Planck Institute for Intelligent Systems, Stuttgart, Germany

#### Edited by:

Luka Peternel, Fondazione Istituto Italiano di Technologia, Italy

#### Reviewed by:

Kensuke Harada, Osaka University, Japan

Antonio Chella, Università degli Studi di Palermo, Italy

#### \*Correspondence:

Naomi T. Fitter fitternt@gmail.com

#### Specialty section:

This article was submitted to Humanoid Robotics, a section of the journal Frontiers in Robotics and AI

Received: 04 April 2018 Accepted: 27 June 2018 Published: 17 July 2018

#### Citation:

Fitter NT and Kuchenbecker KJ (2018) Teaching a Robot Bimanual Hand-Clapping Games via Wrist-Worn IMUs. Front. Robot. AI 5:85. doi: 10.3389/frobt.2018.00085

Colleagues often shake hands in greeting, friends connect through high fives, and children around the world rejoice in hand-clapping games. As robots become more common in everyday human life, they will have the opportunity to join in these social-physical interactions, but few current robots are intended to touch people in friendly ways. This article describes how we enabled a Baxter Research Robot to both teach and learn bimanual hand-clapping games with a human partner. Our system monitors the user's motions via a pair of inertial measurement units (IMUs) worn on the wrists. We recorded a labeled library of 10 common hand-clapping movements from 10 participants; this dataset was used to train an SVM classifier to automatically identify hand-clapping motions from previously unseen participants with a test-set classification accuracy of 97.0%. Baxter uses these sensors and this classifier to quickly identify the motions of its human gameplay partner, so that it can join in hand-clapping games. This system was evaluated by N = 24 naïve users in an experiment that involved learning sequences of eight motions from Baxter, teaching Baxter eight-motion game patterns, and completing a free interaction period. The motion classification accuracy in this less structured setting was 85.9%, primarily due to unexpected variations in motion timing. The quantitative task performance results and qualitative participant survey responses showed that learning games from Baxter was significantly easier than teaching games to Baxter, and that the teaching role caused users to consider more teamwork aspects of the gameplay. Over the course of the experiment, people felt more understood by Baxter and became more willing to follow the example of the robot. Users felt uniformly safe interacting with Baxter, and they expressed positive opinions of Baxter and reported fun interacting with the robot. Taken together, the results indicate that this robot achieved credible social-physical interaction with humans and that its ability to both lead and follow systematically changed the human partner's experience.

Keywords: physical human-robot interaction, social robotics, motion classification, human-robot teaming, hand-clapping games

## INTRODUCTION

As robot use expands from independent operation in factories to cooperative responsibilities in environments like hospitals and schools, social skills become an increasingly important factor for robot developers to consider. Socially capable robots are known to be able to deliver better interaction experiences in everyday human-populated environments (Fong et al., 2003). Although direct physical contact between humans and robots introduces new safety requirements, mastering such interactions could increase a robot's ability to help people (Ikemoto et al., 2012) and promote the acceptance of robots by the general population.

Human children frequently engage in hand-clapping games (patterns of hand-to-hand contacts carried out by two people) for entertainment, to learn about others, and to make friends (Brodsky and Sulkin, 2011). Accordingly, as an initial foray into equipping robots with social-physical human-robot interaction (spHRI) skills, we chose to investigate human-robot hand-to-hand contact during playful hand-clapping games like "Pat-a-cake" and "Slide." We prepared to run this study by connecting our past work on classifying human hand-clapping motions recorded via inertial sensors (Fitter and Kuchenbecker, 2016c) with our previously developed methods for making a robot clap hands in human-like ways (Fitter and Kuchenbecker, 2016b). The result of this union is sensor-mediated human-robot interaction (HRI) during which each participant (the human and the robot) physically mimics the movements of the other one at different times during the game.

After section Related Work presents related work, section Hand Motion Classification describes how we developed a capable system for repeatedly classifying human hand-clapping motions. Section Hand-Clapping Study Methods details our exploration and evaluation of a skilled Baxter robot that claps hands with people in various game modes. Sections Results and Discussion outline the results of this user study and discuss the findings and their implications for HRI.

### RELATED WORK

Our work sits at the intersection of social robotics and physical HRI (pHRI). The field of social robotics studies robots in social scenarios, usually without physical contact between the robot and the interacting humans (Fong et al., 2003). Within this field, the subtopic of socially assistive robotics leverages unique robot strengths in areas such as education and healthcare (Feil-Seifer and Mataric, 2005). In contrast, pHRI focuses more on interaction safety issues rather than social design (De Santis et al., 2008). pHRI might also be used to help a robot stay safe while navigating an unknown environment (Iwata and Sugano, 2005). Only a handful of pHRI investigations consider the social aspects of robotic contact. One previous study of how a human feels when touched by a robot in a medical setting found that people preferred procedural medical touch to compassionate pats from a robot (Chen et al., 2011). Experiments at this social-physical intersection, such as our work and the following related topics, elucidate how people perceive social-physical robots and how researchers can appropriately apply spHRI to aid people.

We are energized by prior research that combines social robotics and pHRI because touch is an essential pathway for human connection and emotion (Sonneveld and Schifferstein, 2008). In particular, physical interaction with the hands greatly aids human understanding and serves as a channel for complex sensation and expression (Klemmer et al., 2006). A few instances of spHRI appear in previous literature. The Haptic Creature Project, for example, explores an expressively actuated cat-sized furry robotic companion that responds to physical contact from humans (Yohanan and MacLean, 2008). Haptic feedback has also been leveraged to explore the subjective and objective results of physical human-robot collaboration in tasks such as joint target acquisition and object manipulation (Reed and Peshkin, 2008; Feth et al., 2011). In our spHRI work, the robot has a humanoid form and directly touches the human, rather than interacting through an external object.

Our research on bimanual hand-clapping robots additionally draws on the area of social motor coordination (also known as joint action). This topic is being actively explored not only in the HRI community, but also in research on human-human interaction (Schmidt et al., 2011). For example, one investigation proposes a video game that uses electrodermal activity-sensing controllers to detect hand-to-hand contacts between players for more enjoyable social gameplay (Baba et al., 2007). Similar research efforts by Kim et al. (2014) outline the design and testing of an electrodermal activity-sensing wrist-worn watch designed to increase intimacy in a workplace environment. In the HRI space, our initial inspiration for a jointly-acting hand-clapping robot was the popular PR2 demo entitled "Please do not touch the robot," during which people can high five, fist bump, and hug the Willow Garage PR2 robot (Romano and Kuchenbecker, 2011).

Our social-physical Baxter robot is designed to use inertial measurement units (IMUs) to understand the hand motions of its human partner. Previous research has shown that motion classification using IMUs and other inertial sensing systems can be more efficient and accurate than processing of visual input. Past studies of body-mounted sensors for action recognition include motion prediction for full-body ambulatory behaviors from five IMUs (Altun and Barshan, 2010; Altun et al., 2010) and motion and gesture recognition from a complex system of IMUs and accelerometers (Chavarriaga et al., 2013). Almost all such work hinges on machine learning principles introduced by early work in this field (Jain et al., 2000). More recently, researchers used a commercial IMU suit and a neural network for each robot joint to enable a human to teleoperate the full body of a Nao humanoid robot (Stanton et al., 2012). These related pieces of research all demonstrate that machine learning from IMU data can facilitate reliable near-real-time interpretation of human movement without the occlusion and lighting problems that often affect visual data.

Past work on playful spHRI also shaped our approach. Investigations of robot play activities like hugging (Kanda et al., 2004) and performing magic (Nuñez et al., 2014) inform our interaction design and analysis strategies. A study of the physical play activities people exhibit with a small humanoid robot further parallels our work and similarly performs activity recognition using IMU data (Cooney et al., 2010). Previous work on dancing robots additionally blends touch with social interaction, allowing a human dance partner to guide a robotic dancer (Kosuge et al., 2003). This play research influenced how we processed data, designed motion, and selected scenarios to investigate.

### HAND MOTION CLASSIFICATION

We previously demonstrated that a machine-learning pipeline trained on data from hand-worn IMUs can reliably classify hand-clapping motions (Fitter and Kuchenbecker, 2016c). In this past work, the two IMUs were attached to the backs of the human participant's hands using skin-safe adhesive. This attachment method did not always succeed in the presence of hair or sweat, it did not let the participant comfortably contact the robot with the backs of their hands, and it did not allow for easy removal of the sensors during breaks in the experiment.

Before building on our hand motion classification work, we needed a more robust and convenient way to attach the IMUs to participants' hands. Once developed, the new attachment method needed to be validated to confirm that the new form factor enabled accurate hand motion classification. This section describes how we achieved these two tasks and compares this updated approach to our previous work.

### Motion Classification Methods

In anticipation of intensive human-robot interactive gameplay scenarios, we chose to record participant motion via the same nine-axis Sparkfun MPU9150 IMU breakout boards used in prior work (Fitter and Kuchenbecker, 2016c). These sensors were affixed to each participant's wrists with Velcro straps that looped through custom 3D-printed housings, as shown in **Figure 1**. In addition to increasing the consistency and comfort of the sensor attachment, this scheme facilitated detaching and reattaching the sensors as needed during the experiment. While our sensors communicate via a lightweight cable, future iterations of this sensor system could be designed to use wireless communication.

With the sensors in this configuration, we aimed to classify hand motions using an updated version of the best method from our past work, which used training and testing data to create a linear support vector machine (SVM) that classifies individual hand-clapping motions based on particular features of the recorded data (Fitter and Kuchenbecker, 2016c). We slightly modified the set of target motions being learned to increase the diversity of hand-clapping games that could be constructed from them. This new set of motions requires wrist and hand movements that are largely similar to those studied in our prior work. However, relocating the IMU from the hand to the wrist prevents the system from observing the motion of the wrist joints and therefore reduces the expressivity of the captured data. It was thus possible that the wrist-worn sensors would necessitate a different type of data analysis.

FIGURE 1 | A plastic housing and integrated strap securely attach each inertial measurement unit to the user's wrist. The individual whose hands are shown in this image provided written consent for this image to be published.

### Hand-Clapping Game Selection

This investigation of motion classification accuracy from wrist-worn IMUs involved 10 motion primitives. Nine of the motions were the same as primitives studied in our previous work (Fitter and Kuchenbecker, 2016c), and one motion was new. Our previous investigations discovered that many participants were not able to snap their fingers, and also that people tended to pause at specific parts of various hand-clapping games. Accordingly, our updated experiment traded the previously used "right snap" motion for a stationary "stay" motion. **Figure 2** shows the set of primitives used in this investigation: back five (B), clap (C), double (D), down five (DF), front five (F), lap pat (LP), left five (L), right five (R), stay (S), and up five (UF).

To investigate the overall performance of prospective classifiers, we needed to select several hand-clapping games that use sequences of our chosen motion primitives and offer a range of classification challenge levels. This data collection considered the following six hand-clapping games, half of which are different from the patterns used in our previous work:


In each of these hand-clapping games, pairs of people typically repeat the listed motions over and over along with a verbal chant. For the purposes of this investigation, a single person outfitted with sensors instead pantomimed the motions alone, in the style of someone who is teaching their partner a new hand-clapping game. This approach allowed us to first focus on classifying motions and later add layers of complexity to the interaction.

### Human Hand-Clapping Behavior

We conducted an experiment to collect a rich dataset for automatic classification of hand-clapping motions. Ten participants enrolled in our data collection, gave informed consent, and successfully completed the experiment. The University of Pennsylvania Institutional Review Board (IRB) approved all experimental procedures under protocol 822527. No formal demographic survey was administered in this data collection, but experimenter notes show that the participant population was composed entirely of technically trained students who all possessed normal motor function in their arms and hands. Each participant came to the lab for a single session that lasted about 30 minutes. The participant's wrists were outfitted with IMUs as shown in **Figure 1**. The raw x, y, and z-axis accelerometer, gyroscope, and magnetometer readings from both wrists were read by an Arduino Teensy and sent to our data processing program via a USB connection at 200 Hz.

We recorded two datasets from each participant: (1) a training set that contained selected pairs of motions repeated 10 or more times and (2) a test set with each of the six hand-clapping games repeated three or more times in sequence. Training data were used for model training and cross validation, while testing data were reserved for a separate round of model evaluation. The training set was designed to include all 17 pairs of sequential motions that appear in the chosen hand-clapping games. Some of these pairs consist of the same motion repeated over and over, while the rest show transitions between two different hand-clapping motions.

### Motion Classification Results

We sought to discover whether our system could classify all of the recorded hand-clapping motions using sensor data recorded from the wrist-worn IMUs. To classify each hand-clapping motion, we parsed full IMU recordings into individual hand-clapping motion data segments by applying a first-order Butterworth high-pass filter with a cutoff frequency of 25 Hz to the root-mean-square (RMS) of the x- and z-axis accelerations from both IMUs together. Local maxima finding on the resulting signal proved effective for identifying the center of each hand-clapping motion, assuming consistent participant clapping tempo and correct execution of hand-clapping motions.
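The parsing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the per-sample RMS over the four x- and z-axis channels, the minimum peak spacing, and the minimum peak height are all hypothetical choices. The filter coefficients follow the standard bilinear-transform design of a first-order Butterworth high-pass filter.

```python
import math


def rms_xz(ax_left, az_left, ax_right, az_right):
    """Per-sample RMS of the x- and z-axis accelerations of both IMUs.

    Assumption: "RMS ... from both IMUs together" is read here as the
    root-mean-square over the four channel values at each time step.
    """
    return [
        math.sqrt((a * a + b * b + c * c + d * d) / 4.0)
        for a, b, c, d in zip(ax_left, az_left, ax_right, az_right)
    ]


def highpass_first_order(x, cutoff_hz, fs_hz):
    """First-order Butterworth high-pass filter (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b0 = 1.0 / (1.0 + k)          # feed-forward gain (b1 = -b0)
    a1 = (k - 1.0) / (k + 1.0)    # feedback coefficient
    y = []
    prev_x = prev_y = 0.0
    for sample in x:
        out = b0 * (sample - prev_x) - a1 * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def local_maxima(signal, min_spacing, min_height=0.0):
    """Indices of local maxima, kept greedily from tallest to shortest
    so that no two kept peaks are closer than min_spacing samples."""
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        and signal[i] >= min_height
    ]
    candidates.sort(key=lambda i: signal[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_spacing for j in kept):
            kept.append(i)
    return sorted(kept)
```

With the 200 Hz sampling rate and 25 Hz cutoff reported above, a call such as `local_maxima(highpass_first_order(rms, 25.0, 200.0), min_spacing=80)` would return one candidate center index per motion; the spacing value of 80 samples is a placeholder that would need tuning to the clapping tempo.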

We applied the linear SVM technique that was found to most accurately classify hand motions in our previous work (Fitter and Kuchenbecker, 2016c). From each motion recording, we extracted a feature set composed of basic statistical measures (maximum, minimum, mean, variance, skewness, and kurtosis) from each x-, y-, and z-axis channel of the accelerometer and gyroscope, the RMS acceleration for each hand, and high- and low-pass filtered data from each of these channels (cutoff frequency of 25 Hz). As in prior work, we did not use the magnetometer because its readings were found to be unreliable in the indoor setting of the data collection. We also added a new set of Boolean features that indicate whether the measured acceleration range along each axis was greater than a threshold of 0.8 g. This new set of features was designed to detect changes in hand orientation that could help distinguish a clap from a lap pat, two motions that were systematically confused in our previous work. A leave-one-subject-out cross-validation (LOSOCV) technique during model training let us compute a generalizable training-set classification accuracy. We also computed the test-set classification accuracy using the trained models. All calculations were performed in Python with the scikit-learn library using the default settings.
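To make the feature set and the cross-validation scheme concrete, the following sketch computes the six statistical measures and the Boolean range feature for a single channel, and generates leave-one-subject-out splits. It is a simplified illustration, not the study's implementation: the function names are hypothetical, population (not sample) variance and non-excess kurtosis are assumed, and the classifier itself (a linear SVM from scikit-learn in the article) is deliberately left out.

```python
import math


def channel_features(samples, range_threshold_g=None):
    """Return [max, min, mean, variance, skewness, kurtosis] for one
    channel, plus a 0/1 range flag when range_threshold_g is given.

    Assumptions: population variance and non-excess (Pearson) kurtosis.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    std = math.sqrt(var)
    if std > 0.0:
        skew = sum(((s - mean) / std) ** 3 for s in samples) / n
        kurt = sum(((s - mean) / std) ** 4 for s in samples) / n
    else:
        skew = kurt = 0.0  # constant channel: moments are undefined
    feats = [max(samples), min(samples), mean, var, skew, kurt]
    if range_threshold_g is not None:
        wide = (max(samples) - min(samples)) > range_threshold_g
        feats.append(1.0 if wide else 0.0)
    return feats


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) so that each
    subject's recordings are used for validation exactly once."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

For an accelerometer channel expressed in g, `channel_features(samples, range_threshold_g=0.8)` appends the Boolean range feature described above. Each split from `leave_one_subject_out` would then be used to fit the classifier on the training indices and score it on the held-out participant.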

We examined the confusion matrices for this model's performance on the parsed training feature set and the parsed test feature set, as seen in **Figures 3** and **4**, respectively. The 97.3% overall training-set accuracy stems from high values along the diagonal of the training confusion matrix, indicating excellent performance. Similarly, the 97.0% overall test-set classification accuracy stems from the strong diagonal of the test confusion matrix. Note that the 10 motions are not exactly evenly represented in either the training or testing set, so the two overall accuracy values differ slightly from the averages of the diagonal entries in the two confusion matrices.
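The difference between overall accuracy and the average of the diagonal entries can be seen in a small worked example. The confusion matrix below is purely hypothetical (two classes, rows as true labels, raw counts) and is not taken from the study's data; the function names are likewise illustrative.

```python
def overall_accuracy(cm):
    """Fraction of all samples classified correctly (rows = true class)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_per_class_accuracy(cm):
    """Average of the row-normalized diagonal entries."""
    return sum(cm[i][i] / sum(cm[i]) for i in range(len(cm))) / len(cm)


# Hypothetical, deliberately unbalanced counts: 100 samples of the first
# class and only 10 samples of the second class.
example_cm = [
    [90, 10],
    [0, 10],
]
```

Here `overall_accuracy(example_cm)` is 100/110, or about 0.909, while `mean_per_class_accuracy(example_cm)` is (0.9 + 1.0)/2 = 0.95. The two measures coincide only when every class contributes the same number of samples.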

The overall classification accuracies indicate that the linear SVM classification strategy that worked best in our previous work also performs very well on data gathered from wrist-worn IMUs. The negligible difference between training and testing accuracies further shows that this technique generalizes well to hand-clapping motions performed as part of a longer sequence. Thus, this is the classifier we employed to enable our robot to understand motions pantomimed by a human partner.

### HAND-CLAPPING STUDY METHODS

We conducted a study to explore how people perceive different leadership and game generation experiences during bimanual hand-clapping interactions with a robot. The University of Pennsylvania IRB approved all experimental procedures under protocol 825490. Motivated by the desire to understand how our IMU machine learning pipeline can fit into meaningful spHRI applications, we were especially curious to discover what roles people prefer to play in these types of interactions, how structured or open-ended the interactions should be, and how users respond to inevitably varied machine learning performance.

### Hardware Systems

This study centered on two MPU9150 9-DOF IMU sensors strapped to the wrists of a human user. The same 12 channels of IMU data discussed previously (x, y, and z-axis accelerometer and gyroscope readings from each hand) were transmitted from an Arduino Teensy to our data processing program via a USB connection at 200 Hz. The robotic agent for this investigation was a Rethink Robotics Baxter Research Robot, a sturdy human-sized platform that can exert human-level forces on the user's hands and can bear hand contacts without breaking or falling over. Our Baxter robot was equipped with two non-articulated custom hands, as shown in **Figure 5**. These custom hands are 3D-printed and covered with flexible silicone rubber, as presented in our previous work (Fitter and Kuchenbecker, 2016b). A small rolling table was placed between Baxter and the participant to both provide a lap-like surface against which Baxter could tap for the lap pat (LP) motion and to keep the user at a constant distance away from the robot.

To equip Baxter with knowledge of how to perform each hand-clapping motion in the bimanual clapping games, we physically moved Baxter's arms to preparatory poses and action poses for each motion, aiming to imitate the poses of a person's arms during these actions. Our control strategy used the Baxter software development kit's raw position controller and trajectory planning using cubic interpolation between successive key poses to allow Baxter to move smoothly and fairly quickly while playing games with a person.
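As a rough illustration of cubic interpolation between two key poses, the sketch below blends joint angles with a cubic polynomial in normalized time. It makes assumptions that the article does not state: zero joint velocity at both key poses, a fixed segment duration, and hypothetical function names and joint values. No Baxter SDK calls are shown; the interpolated joint angles would simply be the setpoints passed to a position controller.

```python
def cubic_blend(s):
    """Cubic blend 3s^2 - 2s^3: rises from 0 to 1 with zero slope at
    both ends (assumed boundary condition: zero start and end velocity)."""
    return 3.0 * s * s - 2.0 * s ** 3


def interpolate_pose(start, goal, t, duration):
    """Joint angles at time t on a cubic segment from start to goal.

    start, goal: lists of joint angles (radians) for the two key poses.
    t is clamped to [0, duration].
    """
    s = min(max(t / duration, 0.0), 1.0)
    w = cubic_blend(s)
    return [q0 + (q1 - q0) * w for q0, q1 in zip(start, goal)]
```

Calling `interpolate_pose(prep_pose, action_pose, t, duration)` at each control step would move every joint smoothly from the preparatory pose to the action pose; chaining segments between successive key poses yields a full motion.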

### Experiment Setup

Twenty-four participants (14 male and 10 female) enrolled in our study and gave informed consent. Participants were aged from 18 to 38 years (M = 24.4 years, SD = 5.2 years) and were mostly technical students (18 technically trained students, 2 non-technical students, 2 technically trained research assistants, 1 technically trained engineer, and 1 non-technical homemaker). Sixteen of the robot users originated from the United States, three from China, two from India, two from South Korea, and one from Belgium. All participants had full function in their arms and hands. Twenty-two participants were right-handed, and two were left-handed. We did not exclude left-handed participants because the experiment activities have balanced right and left hand roles, and also because some left-handed users were included in the dataset used to create the classifier. To help situate our results, we requested information about each user's applicable experience using robots. Participant experience with robotics ranged from 0 to 94 (M = 65.25, SD = 23.11) out of 100, with 100 being highest, and the group's experience with Baxter spanned the full range from 0 to 100 (M = 35.79, SD = 30.97).

FIGURE 5 | The experiment setup for the bimanual human-robot hand-clapping study. The individual shown in this image provided written consent for this image to be published.

Each participant came to the lab for a single 60-minute session. The user stood facing Baxter throughout the experiment (as illustrated in **Figure 5**) and played various bimanual hand-clapping games with the robot, making hand-to-hand contact with Baxter throughout, as two people would when playing hand-clapping games. At the beginning of the session, the experimenter read a script to relay relevant background information on Baxter, described the experiment interaction, and asked the user to complete an opening survey about their perceptions of Baxter. Next, the participant was led through two sample interactions, one in which Baxter taught the user a simple game (C-R-C-L), and one in which the user taught the same game to Baxter.

In the main experiment, the user played hand-clapping games with Baxter in four blocks that each contained three interaction trials of a particular game. Over these three repetitions, either Baxter or the user would repeatedly teach the same motion sequence in order to give their partner a chance to practice it and improve. The block conditions varied in leadership assignment and game spontaneity, but every taught or learned game was eight motions long. After each block, the user completed a survey about their perception of the interactions within that set of three repetitions. After the four blocks, the user entered a free-play mode during which they could teach Baxter additional games and/or learn more games from Baxter. Finally, the participant completed a closing survey followed by a brief demographic survey.

### Data Processing Pipeline

The machine learning pipeline for human-led trials waited for the user to demonstrate an entire hand-clapping game and then parsed and classified each demonstrated hand-clapping motion from the full game recording. To help the pipeline identify meaningful portions of IMU data, we divided the experiment into discrete gameplay interactions that were fairly structured. At the beginning of a human-led trial, the experimenter asked the human user to be very still. When ready, the user would demonstrate the hand-clapping game to Baxter at the tempo of an ambient metronome that was set to 75 BPM. We relied on the participant pantomiming game motions at close to the metronome's tempo to give the motion parser a good guess of the inter-motion time interval. After the demonstration was complete, the user would return to being still and the experimenter would press a key on the Baxter workstation to relay the information that the demonstration was over.

At this point, the processing algorithm would have all the data from the human hand-clapping game demonstration. Thresholding on the gyroscope signal helped to determine precisely when the game demonstration started and stopped, which we took to be the transitions from stillness to general hand motion and from general hand motion back to stillness. Within the portion of data identified to be the hand-clapping game demonstration, we could again use the first-order Butterworth high-pass filtered RMS of the x- and z-axis accelerations from both IMUs together to parse the motion recordings. Finding the local maxima of the resulting signal, combined with the knowledge of the stimulus spacing from the ambient metronome tempo, appeared to be a good tool for identifying the center of each hand-clapping motion when we tested this experiment with pilot participants. As in section Hand Motion Classification, the midpoints between local maxima were assumed to be the motion starting points.
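The start/stop detection and the boundary placement can be sketched as follows. This is an illustrative simplification under stated assumptions: the gyroscope signal is taken to be a single non-negative magnitude per sample, the threshold value is hypothetical, and the function names are not from the study. At the 75 BPM metronome tempo and the 200 Hz sampling rate, successive motions are expected about 0.8 s, or 160 samples, apart.

```python
def active_window(gyro_magnitude, threshold):
    """First and last sample indices at which the gyroscope magnitude
    exceeds threshold, or None if the user never moved.

    Assumption: plain thresholding with no smoothing or debouncing.
    """
    above = [i for i, g in enumerate(gyro_magnitude) if g > threshold]
    if not above:
        return None
    return above[0], above[-1]


def expected_spacing_samples(bpm, fs_hz):
    """Expected number of samples between motions paced by a metronome."""
    return round(60.0 / bpm * fs_hz)


def motion_boundaries(peaks):
    """Midpoints between successive motion centers, used as the assumed
    starting points of the second and later motions."""
    return [(a + b) // 2 for a, b in zip(peaks, peaks[1:])]
```

For example, motion centers found at samples 100, 260, and 420 give boundaries at samples 180 and 340, and `expected_spacing_samples(75, 200)` gives the 160-sample spacing that can serve as a prior when searching for local maxima.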

Once the motion data was parsed, each section of data believed to represent a single hand-clapping motion was ready to undergo the feature extraction and classification processes outlined in section Hand Motion Classification. After the extraction of the features mentioned previously, the hand-clapping motion was classified using the linear SVM model trained in section Hand Motion Classification. Classified sequences of motions were reciprocated by the Baxter robot after the data processing step, for the final result of clapping gameplay with the user.

### Conditions

To begin understanding natural-feeling human-robot hand-clapping gameplay interactions, we needed to create opportunities for both Baxter and the user to lead complex interactions. We also aimed to strike a balance between well-controlled data collection and spontaneous natural play. Accordingly, we designed the experiment interactions to vary leadership assignment and spontaneity across trials. All other aspects of Baxter's behavior were kept as consistent as possible from trial to trial.

#### Leadership Conditions

In each block of hand-clapping game interactions, either Baxter or the human user was assigned to lead the game. When Baxter was the leader, it demonstrated eight hand-clapping motions while displaying a yellow neutral face, and then it smiled, changed to displaying a purple face, and repeated the same eight motions, this time making physical contact with the hands of the user. Within a block, this process was repeated three times with the same hand-clapping game to promote human mastery of that particular game. The facial expressions used in the study were adapted from the Baxter Open-Source Face Database (Fitter and Kuchenbecker, 2016a) and appear in **Figure 6**.

When the participant was leading, they demonstrated a sequence of eight hand-clapping game motions to a metronome beat, paused briefly while Baxter "thought" about the motions, and then played the game with Baxter, making physical contact with the robot. Again, within a block, this process was repeated three times with the same hand-clapping game to promote robot mastery of that particular game. Baxter again showed the yellow neutral face during the demonstration and the purple happy face when it was time for interactive play.

#### Spontaneity Conditions

When people play hand-clapping games with one another, the interaction often begins with the swapping of known hand-clapping game activities and then gradually becomes more complex or inventive. To promote this same type of natural development over the course of this experiment, we introduced a second "spontaneity" condition variable.

In the non-spontaneous interactions, the game leader (Baxter or the human participant) was instructed to teach a specific game to the other party. For Baxter, this instruction was delivered in code, and for the human user, it was delivered via verbal instructions from the experimenter. Two specific games were used for the non-spontaneous interactions: (1) Game A: LP-C-R-C-L-C-B-F and (2) Game B: D-F-D-B-D-D-DF-UF. If Baxter taught the user Game A, the user would teach Baxter Game B, and vice versa. The games were randomly assigned and balanced across users to prevent a confound between the conditions and the game motion sequence itself.

When the person was leading non-spontaneous gameplay, Baxter did not use the data processing pipeline to attempt to identify and reciprocate the human motion pattern. Instead, Baxter performed pre-set routines with two canned mistakes in the first repetition, one canned mistake in the second repetition, and none in the final repetition. The mistakes were consistent for each non-spontaneous game and were designed based on common machine learning classifier errors. This behavior ensured that even if our IMU system did not work well in this new application, we would be able to understand how a consistently improving robot would be received by human users. Additionally, the human wrist IMU data was recorded during these trials, which allowed us to include the would-be accuracy of the data processing pipeline's classification of these patterns in our overall machine learning results.

FIGURE 6 | The two Baxter facial expressions used in this bimanual clapping study.

During spontaneous gameplay, games were still required to be eight motions long and had to begin with either a clap or lap pat as those two bimanual movements provide a distinct beginning signal in the recorded data. Otherwise, Baxter and the participant were free to choose their own sequence of hand-clapping motions from the set given in section Hand Motion Classification, minus "stay," which was omitted because pilot participants had difficulty maintaining the rhythm when the sequence included this pausing move. To generate a random new game, Baxter employed a random number generator and a transition matrix of typical hand-clapping game motion transitions to create its own pattern. In human spontaneous lead cases, the user was free to create a game that followed the few guidelines mentioned above. Across the three interactions in a spontaneous play block, the robot and person were expected to repeat the same game to foster mastery by the team.
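A game generator of the kind described above can be sketched as a random walk over a motion transition matrix. The article does not report the actual transition probabilities, so the uniform weights used here are a placeholder assumption, and all names are hypothetical. The sketch enforces the stated constraints: eight motions per game, a clap or lap pat as the first motion, and no "stay" motion.

```python
import random

# Motion primitives available during spontaneous play ("stay" omitted).
MOTIONS = ["B", "C", "D", "DF", "F", "LP", "L", "R", "UF"]
START_MOTIONS = ["C", "LP"]  # games must begin with a clap or lap pat


def uniform_transitions(motions):
    """Placeholder transition weights: every transition equally likely.
    A table of typical hand-clapping transitions would replace this."""
    return {a: {b: 1.0 for b in motions} for a in motions}


def generate_game(transitions, rng, length=8):
    """Sample a game by walking the transition table from a start motion."""
    game = [rng.choice(START_MOTIONS)]
    while len(game) < length:
        row = transitions[game[-1]]
        nxt = rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]
        game.append(nxt)
    return game
```

A call such as `generate_game(uniform_transitions(MOTIONS), random.Random(0))` returns one eight-motion sequence; passing a seeded `random.Random` instance keeps the generated game reproducible across the three repetitions of a block.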

#### Overall Block Flow

To maintain an organic interplay throughout the experiment and allow the user to master the robotic system in the limited time available, we used the same block order for all participants. We present both the disadvantages and the advantages of this experiment structure in the discussion of this article. The order of the interaction blocks was always as follows:


This order gradually increased the autonomy of each partner while giving the human user time to become familiar with the system before leading an interaction. The transfer of leadership back and forth mimicked the natural tendency of people to take turns teaching their own clapping games when exchanging oral cultural traditions.

#### Data Collection

Our software recorded the IMU data from the human user and the sequences of motions performed by Baxter. We also asked participants to complete four surveys: (1) a robot evaluation after hearing introductory information about Baxter, (2) an interaction block survey after each trio of hand-clapping game repetitions, (3) a concluding survey after the final free-play interaction, and (4) a basic demographic survey after the concluding survey. The block perception survey used questions from the pleasure-arousal-dominance (PAD) emotional state model (also used by Ammi et al., 2015), the National Aeronautics and Space Administration (NASA) task load index (TLX) (Hart and Staveland, 1988), and an enjoyability survey used by Heerink et al. (2008), plus a safety rating question, as displayed in **Table 1**. Later in this article, we bundle the PAD and safety questions together under the acronym "PADS." Questionnaires (1) and (3) were adapted from the unified theory of acceptance and use of technology (UTAUT) and other metrics employed by Weiss et al. (2008) and Heerink et al. (2009); the questions are shown in the plot titles of **Figure 7**. The block survey and concluding survey also included the following free-response questions to help elicit experiential information from users:


The experiment was additionally videotaped for later analysis of user and robot behavior.

#### Hypotheses

This experiment sought to test the four main hypotheses detailed below:





similar motion classification accuracy in a more realistic and demanding interaction scenario.

These hypotheses helped guide the design of the experiment blocks and the interactions described previously in this section.

### RESULTS

All 24 users who enrolled in the study successfully completed the experiment. Twenty-three of them were willing to physically contact the robot to play hand-clapping games. The remaining person was bothered by the sound of Baxter's motors and only occasionally clapped hands with the robot; this individual's data were not excluded from the analysis because they still took part in the entire experiment.

This section focuses on statistical analyses of the questionnaire responses using paired t-tests and repeated measures analysis of variance (rANOVA). The t-tests enable us to discover whether the experiment changed user opinions of Baxter. The rANOVAs (using the R "aov" function and an α = 0.05 significance level) tell us how different hand-clapping game experiences affected block survey responses on the PADS, enjoyment, and TLX questionnaire scales. We also consider overall user comments and the success of the hand-clapping game motion classifier.

### Before/After Survey Results

We gathered matched sets of robot perception survey responses before and after the experiment. The overall user responses appear in **Figure 7**. Paired t-tests reveal that the answers to two questions significantly changed. Namely, after the experiment participants reported feeling more understood by the robot (REC2: p = 0.023, M<sub>before</sub> = 35.54, M<sub>after</sub> = 52.33) and also more willing to follow the example of the robot (ATT2: p = 0.031, M<sub>before</sub> = 65.29, M<sub>after</sub> = 78.04). Additionally, user ratings on the overall reciprocity-focused questions were higher after the experiment than before (REC1 + REC2: p = 0.010, M<sub>before</sub> = 45.35, M<sub>after</sub> = 60.52).

### Block Survey Results

The within-subjects factor for our rANOVA was game block condition, giving a design space of four blocks with a cross of two leadership conditions and two cooperation conditions. We had initially designed the block differences as a 2 by 2 space, but after running the experiment, we realized that ordering played a role in the users' perceptions and that experiences in the paired conditions were sometimes quite different. Accordingly, we concluded that the most appropriate analysis tool was a one-way rANOVA comparing the four different block conditions as distinct levels of the factor. When an effect was significant for a particular outcome measure, post-hoc multiple comparison tests using the R "multcomp" library revealed which pairs of conditions had statistically significant differences. We also calculated the effect size using eta squared.
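For readers unfamiliar with the one-way rANOVA, the sketch below shows how the F statistic, its degrees of freedom, and eta squared follow from the sums of squares. It is an illustration in Python rather than the R "aov" analysis used in the study, the function names are hypothetical, and it assumes that eta squared is the ratio of the condition sum of squares to the total sum of squares (the article does not specify the variant). With 24 participants and 4 block conditions, the degrees of freedom are (3, 69), matching the F(3, 69) values reported in the following sections.

```python
def ranova_degrees_of_freedom(n_subjects, n_conditions):
    """Degrees of freedom (condition, error) for a one-way rANOVA."""
    return n_conditions - 1, (n_subjects - 1) * (n_conditions - 1)


def one_way_ranova(data):
    """One-way repeated-measures ANOVA on data[subject][condition].

    Returns the F statistic, its degrees of freedom, and eta squared
    (assumed here to be SS_condition / SS_total).
    """
    n = len(data)       # subjects
    k = len(data[0])    # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # residual after removing subjects

    df_cond, df_error = ranova_degrees_of_freedom(n, k)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return {"F": f_stat, "df": (df_cond, df_error),
            "eta_squared": ss_cond / ss_total}
```

As a hand-checkable example with made-up ratings, `one_way_ranova([[1, 2, 3], [2, 4, 6], [3, 3, 3]])` gives a condition sum of squares of 6, a subject sum of squares of 6, and a total of 16, so F(2, 4) = 3.0 and eta squared = 0.375.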

The rANOVA results for the block survey are summarized in **Table 2**, and breakdowns of interaction block effects on different question groupings appear throughout the following paragraphs.

#### PADS Results

We were curious to know how each block condition affected user ratings of safety and affective characteristics of the robot behavior, so we performed a one-way rANOVA for each of the PADS survey questions. There were several statistically significant trends in these block survey question responses, as outlined in **Table 2** and **Figure 8**.

Block modes significantly affected user ratings of robot pleasantness [F(3, 69) = 3.88, p = 0.022, η <sup>2</sup> = 0.058] and dominance [F(3, 69) = 5.94, p = 0.004, η <sup>2</sup> = 0.105]. Post-hoc multiple comparison tests revealed that Block 4 (human-led spontaneous) was rated as less pleasant than Block 3 (robot-led spontaneous). Block 2 (human-led non-spontaneous) made Baxter appear less dominant than Block 3, while Block 1 (robot-led non-spontaneous) made Baxter appear more dominant than Blocks 2 and 4. No significant differences were found for safety or energeticness, and safety ratings were uniformly high (M = 79.71, SD = 21.59).

#### Enjoyment Results

We also wanted to know how game block experiences influenced user ratings of enjoyment and engagement, so we performed a one-way rANOVA for each of the related block survey questions. There were no statistically significant trends in these responses, as shown in **Table 2** and **Figure 9**. Enjoyment (M = 74.25, SD = 19.83) and engagement (M = 78.59, SD = 16.75) were both uniformly rather high.

#### TLX Results

Lastly, we looked to identify how game block experiences influenced user ratings of various task-load metrics. We performed a one-way rANOVA for each of the TLX-inspired block survey questions. There was one statistically significant trend in the responses, as depicted in **Table 2** and **Figure 10**.

pre-experiment responses and the lower box plot represents post-experiment impressions. Filled-in box plots indicate significant differences. The question coding abbreviations stand for attitude toward technology (ATECH), cultural context (CC), effort expectancy (EE), forms of grouping (GR), performance expectancy (PE), reciprocity (REC), self-efficacy from UTAUT model (SE), and attachment (ATT).

Block modes had statistically significant effects on user ratings of robot performance [F(3, 69) = 18.95, p < 0.001, η <sup>2</sup> = 0.332]. The difference in the ratings of block interaction calmness was also close to significant [F(3, 69) = 2.90, p = 0.057, η <sup>2</sup> = 0.045]. A post-hoc multiple comparison test revealed that robot performance appeared to be better in both robot-led blocks (Blocks 1 and 3, M = 80.65, SD = 17.04) than in both human-led blocks (Blocks 2 and 4, M = 52.71, SD = 23.72). No significant differences were found for human performance, rushedness, or calmness.

TABLE 2 | p-values for the one-way rANOVA run to determine the effects of the block conditions.


Gray shading indicates a statistically significant effect.

### Participant Demographic Results


Differences in participant feedback can stem from either study conditions or characteristics of the users themselves. To investigate differences due to participant demographics, we performed a further set of rANOVA tests with survey timing or block condition as a fixed factor and participant gender and region of origin as covariates.

Gender had a significant effect on several user ratings. Women thought people would be more impressed by their ownership of Baxter than men did [F(1, 23) = 4.60, p = 0.038, η <sup>2</sup> = 0.084]. Female participants also liked the presence of the robot more [F(1, 23) = 7.69, p = 0.008, η <sup>2</sup> = 0.146] and thought they could do activities with the robot more [F(1, 23) = 7.13, p = 0.011, η <sup>2</sup> = 0.134] than male users. Women were additionally more willing to follow the example of the robot [F(1, 23) = 19.75, p < 0.001, η <sup>2</sup> = 0.279]. Female robot users also found the robot more pleasant [F(1, 23) = 10.14, p = 0.002, η <sup>2</sup> = 0.095], found the interaction more enjoyable [F(1, 23) = 11.00, p = 0.001, η <sup>2</sup> = 0.104], felt more engaged during the study [F(1, 23) = 8.75, p = 0.004, η <sup>2</sup> = 0.085], and felt more rushed during the interactions [F(1, 23) = 11.15, p = 0.001, η <sup>2</sup> = 0.108].

Since Eastern and Western cultures tend to have different views of robots and other technologies (Lee et al., 2012), we were also interested in comparing participant responses across origin lines. Robot users from Eastern cultures thought others would be more impressed by their possession of Baxter than those from Western cultures [F(1, 23) = 5.68, p = 0.021, η <sup>2</sup> = 0.104]. Individuals from Eastern cultures also found the robot more dominant than Western participants [F(1, 23) = 6.81, p = 0.011, η <sup>2</sup> = 0.0626].

### User Comments

While analyzing user comments on each interaction block survey, we noticed the emergence of the following themes: motion comments (MC), temporal comments (TC), human performance comments (HPC), robot performance comments (RPC), teamwork performance comments (TPC), positive general comments (PGC), haptic commentary (HC), social performance comments (SPC), cue suggestions (CS), comparisons to previous experience (CPE), and additional clarifications about how users were reading survey questions (AC). Example comments from each topic code appear in **Table 3**. This division of comments seemed interesting, especially because the frequency of comments in each topic area shifted from block to block, as pictured in **Figure 11**. Some participants wrote multi-part comments that fit into several categories, as included in the frequency counts.

Overall, the human-led Block 2 and Block 4 experiences yielded more comments on the performance of the robot and the human-robot team than other parts of the experiment. Robot-led Blocks 1 and 3 led to an emphasis on motion and temporal commentary, as well as cue suggestions, perhaps because users were not as occupied with thinking about their own motions and demonstration success. Some comment frequency progressions may have occurred due to trial ordering effects; for example, the motion commentary may have decreased over the course of the experiment because users became accustomed to Baxter's movements. Other comments seem related to who was leading a trial, returning whenever a leadership condition occurs. The game spontaneity condition did not greatly affect user comments. Furthermore, the breakdown of comments in the canned "perfect robot improvement" performance of Block 2 is quite similar to that of Block 4, during which Baxter often still made mistakes in the final hand-clapping interaction.

### Free Play Results

In the free-play interaction following Block 4, all but two users identified a favorite interaction mode that they wanted to play again. The participants who chose not to engage in additional free play were not afraid of the robot; they simply were not interested in additional interactions at that time. One of them was the user who refrained from contacting the robot during the main blocks due to the robot's sound, and the other stated that they were more pedagogically curious about the robot than interested in the social aspects of play with it. All other participants played at least one more game repetition with Baxter during the free-play segment (2.2 game repetitions on average, with a range of 0 to 5 repetitions across the participant pool).

Participants varied in the types of additional interactions they wanted to perform with Baxter. Seven users both learned from and taught Baxter during the free-play time. Eleven users chose to only teach Baxter, while four opted to only learn from Baxter.

### Classifier Results

Another goal of this bimanual hand-clapping study was to evaluate the performance of the motion classifier described in section Hand Motion Classification. Data recording errors occurred during the first four sessions of this experiment, so our classifier evaluation omits these participants. In the data recordings of the remaining 20 users, the following preprocessing steps were applied before evaluating the accuracy of Baxter's real-time motion labeling in the bimanual gameplay:




Generally, we were monitoring for the correct sequence of motions in the recordings, regardless of what occurred between consecutive moves.

After these data processing steps, we were able to compare the data processing pipeline's linear SVM classifications with the actual identity of each hand-clapping motion demonstrated by the human user (taken from the specified game sequence or the demonstrated sequence visible in the video). The overall accuracy of this classification was 85.9%, and the breakdown of correct and incorrect motion labels appears in **Figure 12**. Although high, this accuracy is to be taken with the caveat that even when our analysis interpreted 100% classification accuracy for a particular game, the user may have seen extra moves before or after their intended game, extra "stay" motions, duplicate motions, or missing final motions in Baxter's reciprocal motion pattern. Participants reacted to these behaviors and classification errors in a variety of ways, from adjusting their behavior to match Baxter's errors to questioning Baxter's sobriety. Errors that caused Baxter to perform worse in the consecutive game repetitions making up one study block were most frustrating to users.
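As an illustrative aside, the label comparison described above can be sketched in a few lines of Python. This is a minimal sketch rather than the study's actual analysis code: the function name `score_labels` and the example sequences are hypothetical, and the data are invented purely for illustration (the motion labels "UF", "DF", and "stay" are borrowed from the text, but the sequences are not study recordings).

```python
from collections import Counter


def score_labels(true_labels, predicted_labels):
    """Return (accuracy, confusion) for two equal-length label sequences.

    `confusion` maps (true_label, predicted_label) pairs to counts, which is
    the kind of correct/incorrect breakdown a confusion matrix displays.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    confusion = Counter(zip(true_labels, predicted_labels))
    # Correct classifications are the pairs whose two labels agree.
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(true_labels), confusion


# Hypothetical example data, NOT taken from the study recordings.
true_seq = ["UF", "DF", "UF", "stay", "DF"]
pred_seq = ["UF", "DF", "DF", "stay", "DF"]
accuracy, confusion = score_labels(true_seq, pred_seq)
```

Note that a score of this kind only reflects the aligned motion sequence; as stated above, it would not capture extra, duplicated, or missing moves in the robot's reciprocated pattern.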

### DISCUSSION

The experimental results enable us to test our hypotheses and plan how to move forward with this spHRI research.

### Hypothesis Testing

The **H1** prediction that users would enjoy teaching games to Baxter as much as learning games from Baxter was partially supported. There was no statistically significant difference in user ratings of Block 1 vs. Block 2 interactions on the robot pleasantness scale, but participants rated robot behavior in Block 3 (robot lead, game spontaneous) as more pleasant than Block 4 (human lead, game spontaneous). Despite this pleasantness difference, users most frequently chose to continue teaching the robot during the free-play time, rather than continuing to learn from Baxter. Interaction enjoyment ratings, on the other hand, did not differ significantly across any of these conditions. This finding might indicate that teaching to and learning from a robot that improves consistently (Blocks 1 and 2) are equally fun and pleasant activities, while a robot that displays different types of learning patterns is interesting but less pleasant. Another intuitive difference in robot dominance ratings appeared in the robot lead vs. human lead trial comparison; participants rated Baxter as less dominant when the robot was following their game lead, except in the comparison of Blocks 3 and 4, which did not yield a significant difference. Robot performance also received higher ratings for robot-led trials compared to human-led trials.

There was less evidence to support **H2**'s predicted preference for spontaneous hand-clapping activities. Overall, no block survey response difference emerged from the comparison of scripted and unscripted game experiences. When Baxter taught games to the user, the person never knew whether Baxter's motion sequence was pre-set, so it makes sense that the human perception of these game activities was fairly uniform. We thought that users might enjoy creating their own clapping game in the fourth experiment block, but experimenter notes show that some people were eager to undertake this task while others were quite intimidated by having to compose their own pattern. Participants who liked being able to teach Baxter commented that "it was fun to watch the robot trying to move in the way [they] created and taught," "making up [their] own motion and seeing [Baxter] learn it made the experience more exciting," and "it was more fun leading than learning from the robot." Less enthusiastic users noted that they "had trouble teaching Baxter," felt "anxiety from [...] memorizing the pattern of clapping," and wondered "whether or not [they] had shown Baxter the moves clearly enough." These two viewpoints may have contributed to the lack of overall differences between Block 2 and Block 4 ratings.

Our hypothesis **H3** was correct. Users rated their perception of Baxter differently on the pre- and post-experiment surveys. Participants felt more understood by the robot after the experiment, and they also became more willing to follow Baxter's example. The overall feelings of reciprocity between participants and Baxter grew during the experiment as well, indicating that the robot successfully achieved at least a rudimentary form of social-physical interaction.

The final hypothesis **H4** predicted that our machine learning pipeline would perform well and help Baxter to understand human motion demonstrations throughout human-led interactions. We especially hoped that the classifier would work well in Block 4, during which Baxter had no information about the motion sequence that the human user would demonstrate. The classifier was able to label human hand-clapping moves with 85.9% accuracy. This recognition rate is lower than the 97.0% achieved on the testing set, and it has some additional caveats. Mainly, the data processing pipeline's motion parsing technique required users to demonstrate games at a specific constant tempo with no errors or hesitations. We acknowledge the need to improve classifier robustness and have additional new users test the system to confirm the redesign's success. Fortunately, the IMU data recorded throughout this study gives us a new prospective training set for improving our classifier's robustness to pauses and variable demonstration tempo in future bimanual clapping interactions. We hope to determine the maximum human motion recognition accuracy that can be achieved using IMUs in a natural setting.

### Major Strengths and Limitations

This study represents the most complex and natural-feeling HRI that we have investigated, and we were pleased with the promising and informative results. All participants successfully completed the study, and although one user never contacted Baxter through an entire cycle of hand-clapping motions, this individual's interaction displeasure arose from the timing of the noises Baxter produced, rather than concerns about the safety of the robot. This person wrote that "there seemed to be some feedback missing (for example, a sound to accompany the hands clapping), which damaged any sense [of] rhythm that might have driven the pace of the game." Additionally, all but two of the users identified a free-play interaction that they wanted to try and engaged in that activity with Baxter during at least one additional round of hand-clapping gameplay.

This study interaction led to improved user opinions of the robot and several reports of fun interacting with Baxter. Notable positive comments included that one user "was surprised and impressed at how fast and fluid[ly] the robot was able to move" and another "liked how [Baxter] appears to get excited to play" when switching from the yellow neutral face to the purple happy face. The safety ratings of Baxter were also uniformly high, despite Baxter's occasional motion interpretation errors. Other strengths of this work are findings on the ability to influence how people think about working with Baxter via different leading and following roles. Users thought a lot about teamwork with Baxter during human-led trials, sharing more comments about Baxter's performance, their own performance, and the hand-clapping teamwork. Experiences varied from easy ("I really liked how easily he learned my game") to medium ("I may not have been the best teacher, but Baxter still learned a lot by round 3") and even challenging/adverse situations ("the first time we were perfect, and that was super exciting. But once we did well, the mistakes in the next round were that much more devastating"). Nevertheless, users seemed to want to succeed in teaching Baxter, and some empathetic users even adjusted their motion sequence to fit Baxter's errors during the post-demonstration interactive play. In the broader social robotics picture, this experiment also provoked a number of complex emotional responses from people. Especially in the Block 4 interactions, users expressed joy at successes, and they also exhibited occasional cheeky responses to Baxter's errors. One non-technical user even talked to the robot, asking "Are you drunk, Baxter?" when the robot did a poor job reciprocating the demonstrated motion pattern.

The study design also had some shortcomings. Although the user behavior in this experiment was more naturally situated than in our previous spHRI work, the interaction could still be more natural; we required quite a bit of structured behavior from users to help Baxter interpret their motions. This requirement was especially problematic for users who were not adept at keeping a constant tempo. The chosen motion parsing and classification strategy further leads to a delay between when the user demonstrates each motion and when each move is classified. The system transparency could also be better. An additional robot thinking face while Baxter processes the participant motion data, for example, would help users understand the robot's state. Participants often recommended sound effects and experiment flow changes in the block surveys. Some wanted "a beat like the metronome from the teaching part" throughout their entire clapping experience with Baxter or a "clearer indication of [when] learning and playing phases start and stop, perhaps via audio" to help them focus their visual efforts on tracking Baxter's movement. Several users also requested a brief pause during robot-led conditions between Baxter's demonstration and the interactive human-robot play, perhaps inspired by the time the robot took to "think" about the demonstrated movements during the human-led trials. Furthermore, a few of the hand-clapping motions, especially DF and UF, were awkward for tall users. Our future research would benefit from automatically adjusting clap contact location based on user height.

Other drawbacks arose from the setting and the user population of the study. The experiment participant pool was fairly small and consisted mostly of young technical students. Within this group, we found that female users had a more positive impression of the robot than male users; this difference could be related to the fact that most of our non-technical participants were also female. The study also took place in a lab setting that is different from future natural environments where humans and robots might interact. To ensure broader generalizability, we would need to run the experiment on a more diverse population in a less controlled everyday environment. The within-subjects design of the experiment may have exaggerated differences between conditions due to demand characteristics (Brown et al., 2011). We also must consider the fixed block ordering of the experiment when interpreting results and note the possible ordering effects on any condition differences. For example, participants might be more interested in the first block due to novelty effects and less engaged in the final block when the interaction has become more familiar. Users might also compare each subsequent block relative to the previous experience, which is the same for each person in this study design. Hence, ratings might be better balanced in an experimental design with a varied trial ordering. A final challenge arising from the largely technical, robotics-savvy population of the experiment was that some people assumed that Baxter was using a vision algorithm to classify their motions. This belief is not inherently problematic, but it may have influenced the way people moved when demonstrating motions to Baxter, thus affecting Baxter's motion classification accuracy and attempted game pattern reciprocation.
One user stated their belief in how the classifier worked explicitly, noting that there were "some mistakes during the training process, but [that] the accuracy was pretty good (considering [the algorithm] must differentiate between different hand poses quickly with the other hands somewhere in the background)."

### Key Contributions and Future Work

Next research steps would involve trying to improve the robustness of Baxter's motion classification ability. The machine learning pipeline could be updated using the study data recordings of how people move and behave when in front of an actual robot. There may also be opportunities to improve user demonstration performance by offering advice on how to move during motion demonstrations, training additional bigrams, encouraging games that involve only bigrams of motion encapsulated in our original training and test datasets, and/or giving users a way to provide feedback to Baxter to enable reinforcement learning. Other improvement steps include adding more social feedback and auditory cues to the experiment, as suggested in user comments.

Overall, we are energized by signs of user fun and increasingly social opinions of Baxter over the course of the study. This work may be applicable to future HRI efforts on manipulating what users think about during interactions, considering how to get a person's attention, and designing future spHRI with appropriate cueing. The hand-clapping interaction itself may be a good way to help people learn how robots move and to break the ice when forming human-robot teams. Other future research directions from this bimanual clapping work include trying the sensing system on populations who are undergoing physical therapy for motor rehabilitation. Our findings, especially those on the classifier accuracy and social user responses to bimanual hand-clapping with a robot, can guide future spHRI research.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the University of Pennsylvania IRB under protocols 822527 and 825490. The protocols were approved by the University of Pennsylvania IRB. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

### AUTHOR CONTRIBUTIONS

NF was responsible for experiment preparation, data acquisition, data processing, and publication writing. KK advised throughout the experiment preparation, data acquisition, and data processing, and also supplied revisions for each publication draft.

### FUNDING

This research was funded by the US National Science Foundation (Grant Numbers DGE-0822 and 0966142).

### ACKNOWLEDGMENTS

We thank Professor Kostas Daniilidis for allowing us to use his Baxter Research Robot in our work. Thanks also go out to Yi-Lin E. Huang and Jamie P. Mayer for their help with preliminary steps of this research and to Elyse D. Z. Chase for her design suggestions.


**Conflict of Interest Statement:** NF and KK have received research grants from the US National Science Foundation. KK has also received funding from the National Institutes of Health, Intuitive Surgical, Inc., IERION, Inc., Rolls Royce, Inc., the Wallace H. Coulter Foundation, the Defense Advanced Research Projects Agency, the Army Research Laboratory, Willow Garage, and the Pennsylvania Department of Health. NF now works at the University of Southern California. KK now works at the Max Planck Institute for Intelligent Systems. KK has also served as the Chief Scientist for Tactai, Inc. and VerroTouch Medical, Inc.

Copyright © 2018 Fitter and Kuchenbecker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Movement-Based Control for Upper-Limb Prosthetics: Is the Regression Technique the Key to a Robust and Accurate Control?

Mathilde Legrand\*, Manelle Merad, Etienne de Montalivet, Agnès Roby-Brami and Nathanaël Jarrassé

Sorbonne Université, CNRS, INSERM, Institut des Systèmes Intelligents et de Robotique (ISIR), Paris, France

Due to the limitations of myoelectric control (such as dependence on muscular fatigue and on electrode shift, difficulty in decoding complex patterns or in dealing with simultaneous movements), there is a renewal of interest in the movement-based control approaches for prosthetics. The latter use residual limb movements rather than muscular activity as command inputs, in order to develop more natural and intuitive control techniques. Among those, several research works rely on the interjoint coordinations that naturally exist in human upper limb movements. These relationships are modeled to control the distal joints (e.g., elbow) based on the motions of proximal ones (e.g., shoulder). The regression techniques used to model the coordinations are various [Artificial Neural Networks, Principal Components Analysis (PCA), etc.] and yet, analysis of their performance and impact on the prosthesis control is missing in the literature. Is one technique really more efficient than the others to model interjoint coordinations? To answer this question, we conducted an experimental campaign to compare the performance of three common regression techniques in the control of the elbow joint on a transhumeral prosthesis. Ten non-disabled subjects performed a reaching task, while wearing an elbow prosthesis which was driven by several interjoint coordination models obtained through different regression techniques. The models of the shoulder-elbow kinematic relationship were built from the recordings of fifteen different non-disabled subjects that performed a similar reaching task with their healthy arm. Among Radial Basis Function Networks (RBFN), Locally Weighted Regression (LWR), and PCA, RBFN was found to be the most robust, based on the analysis of several criteria including the quality of generated movements but also the compensatory strategies exhibited by users. Yet, RBFN does not significantly outperform LWR and PCA.
The regression technique seems not to be the most significant factor for improvement of interjoint coordinations-based control. By characterizing the impact of the modeling techniques through closed-loop experiments with human users instead of purely offline simulations, this work could also help in improving movement-based control approaches and in bringing them closer to real use by patients.

Keywords: upper-limb prosthetics, movement-based control, shoulder-elbow coordinations, regression algorithms, motor strategy

#### Edited by:

Dingguo Zhang, Shanghai Jiao Tong University, China

#### Reviewed by:

Claudia Casellato, University of Pavia, Italy Yinlai Jiang, University of Electro-Communications, Japan

> \*Correspondence: Mathilde Legrand legrand@isir.upmc.fr

Received: 15 January 2018 Accepted: 25 June 2018 Published: 26 July 2018

#### Citation:

Legrand M, Merad M, de Montalivet E, Roby-Brami A and Jarrassé N (2018) Movement-Based Control for Upper-Limb Prosthetics: Is the Regression Technique the Key to a Robust and Accurate Control? Front. Neurorobot. 12:41. doi: 10.3389/fnbot.2018.00041

## 1. INTRODUCTION

Advances in mechatronics and robotics over the last years have led to the production of more biomimetic active prostheses with more and more degrees of freedom (DoFs). Upper limb amputees can thus be offered complex active mechatronic devices like polydigital hands or whole arm prostheses like the Luke Arm by Deka (Resnik et al., 2013) or the modular arm by the Applied Physics Laboratory of Johns Hopkins (Johannes et al., 2011), among other examples. However, while the hardware has improved, there remains a lack of natural, easy and intuitive control of these artificial limbs with numerous active DoFs (Engdahl et al., 2012; Cordella et al., 2016). Conventional myoelectric control commands these multiple DoFs with only one or two muscles, which leads to complex and sequential control schemes. Indeed, depending on the amputation level, there can be the hand, the wrist and the elbow to control at the same time, each with at least two distinct actions to pilot. To improve myoelectric control in such a case, solutions like pattern recognition have been developed for more than 20 years (Saridis and Gootee, 1982; Park and Lee, 1998; Chu et al., 2006). Myoelectric control via time-invariant muscle synergies is also explored to allow continuous and simultaneous control of multiple DoFs (Lunardini et al., 2016). Yet, with all the limitations of EMG signal measurement and decoding [electrode shift, sensitivity to perturbations like sweat or skin impedance, etc. (Castellini et al., 2014), leading to a robustness issue], there is a renewal of interest in movements, which humans are more likely to control than individual muscle contractions (see works by Kaliki et al., 2008, 2013; Popovic and Popovic, 2001, 2002; Alshammary and Bennett, 2016 for instance). It is actually easier to master a sequence of movements than a sequence of contractions/cocontractions, which is highly unnatural.
We indeed receive much more sensory feedback about our own movements (vision but also proprioception or touch) than about our muscular activity. Movement-based control approaches aim to create a more intuitive and natural control by using the motion of the residual limb to predict the movement of the prosthesis. Indeed, it has been shown that one way the Central Nervous System (CNS) deals with the redundancy of the human body is to control synchronously several muscles or joints, by grouping them into "synergies" (which exist at the muscle and at the joint levels). For example, for a given space and task type, there exist some synergies synchronizing shoulder and elbow movements (Soechting and Lacquaniti, 1981; Lacquaniti and Soechting, 1982; Lacquaniti et al., 1982; Cirstea et al., 2003). These synergies can be modeled to then determine elbow motions from shoulder motions (Popovic and Popovic, 2001; Kaliki et al., 2008; Farokhzadi et al., 2016). Exploiting synergies could especially be useful in prosthetics control since regression methods could be used to predict motion of a distal prosthetic joint from motion of residual proximal joints.

Of course, it is important to recall that different tasks and motion spaces are associated with different synergies. It therefore seems difficult to use movement-based control to predict every motion, as each of them requires a different model; some voluntary control would always be needed. Nonetheless, for some given generic movements from the Activities of Daily Living, there could be a functional gain for patients if, for fast motions like reaching, part of the prosthesis joints were synchronously and automatically controlled, avoiding a fatiguing and slow sequential decomposition of joint actions. In this work, we thus focused on reaching tasks, for which people do not naturally concentrate on the intermediate joint control, making this motion well suited to movement-based control. For now, our approach is hybrid: movement-based control does not totally replace myoelectric control but substitutes for it only at the elbow, even if synergy-based control could be used for the wrist (Montagnani et al., 2015). The long-term goal would be to control both elbow and wrist with joint synergies; we chose to focus first on the elbow. Joint synergies, however, cannot control the hand, as it is not part of a synergistic scheme with more proximal joints.

Some studies have already been conducted on movement-based control for elbow-shoulder motion. Merad et al. (2016a,b), for instance, used Radial Basis Function Networks (RBFN), one of the simplest Artificial Neural Networks (ANNs), to estimate flexion/extension elbow angular velocity from shoulder Euler angular velocities, measured with embedded Inertial Measurement Units. In a wider context, Kaliki et al. (2013) developed an inferential control scheme to command elbow flexion/extension, forearm pronation/supination and opening/closing of the hand at the same time. They combined three ANNs and proportional control that took shoulder rotational and/or translational movements as inputs (recorded with a magnetic tracking system) and predicted the outputs cited above. In addition to the work of Kaliki et al. and Merad et al., several other studies on shoulder-elbow coordinations have been published (Popovic and Popovic, 2002; Iftime et al., 2005; Mijovic et al., 2008; Farokhzadi et al., 2016). At this time, two points can be raised:


This study addresses these two issues. Starting from the fact that none of the cited studies has used a linear regression technique, we first wondered whether it was really unsuitable (even though it has been shown that shoulder-elbow synergies can be approximated by a linear relationship; Micera et al., 2005). Then, we wanted to compare the prediction ability of several models to objectively and reliably determine the best modeling tool for the control of a prosthetic elbow. We here focused on three relatively simple methods: RBFN, the simplest ANN, which was shown to correctly model shoulder-elbow synergies (Iftime et al., 2005); Principal Components Analysis (PCA), to test the prediction ability of a linear regression technique; and Locally Weighted Regression (LWR), whose complexity is between PCA and RBFN. We conducted a preliminary experimental session, with fifteen healthy subjects who performed reaching movements, to gather training data and build the three generic interjoint coordination models. Once the models were implemented in the prosthesis, a second experimental session was conducted with ten other healthy subjects who performed the same tasks as in the preliminary session but with the prosthesis substituting for their natural arm. The prosthesis was controlled through the mobilization of the subjects' shoulder as the control input. To determine the best regression method for prosthetic control, six metrics that characterize the task achievement, the joint motions and the body compensations were assessed.

In this paper, we thus focus on the elbow-shoulder synergies to automate a prosthetic elbow during reaching tasks. Real tests, in a "closed loop" situation, were conducted to compare the three elbow-shoulder coordination models obtained with RBFN, PCA, and LWR respectively. During these tests, the participants could directly react to the system behavior, which is closer to a real-life scenario and gives more weight to the reflection on the models' robustness than simulation on fixed offline data.

## 2. MATERIALS AND METHODS

### 2.1. Preliminary Session: Training Data Acquisitions

To build and train the coordination models, data of motions from healthy subjects are required. These training data were collected from fifteen healthy subjects (different from those who participated to the second session) who performed pointing movements with their natural arm. Kinematics was recorded with motion capture (**Figure 1**). Ten subjects used their right arm, ten their left arm (five subjects participated twice). This work was carried out in accordance with the recommendations of the Université Paris Descartes ethic committee CERES. Subjects provided written informed consent to participate in the study, in accordance with the Declaration of Helsinki. Two Inertial Measurement Units (x-IMUs from x-io technologies©), a Codamotion (a camera-based motion capture system from Charnwoods Dynamics, Leicestershire, UK) and a Nintendo WiiTM balance board were used to record the movements. IMUs, one located on the latero-posterior part of the arm, the other on the trunk, at the sternum level, recorded the arm orientation in the trunk coordinate system, represented by quaternion values and then transformed into ZYX Euler angles. Codamotion markers were placed on the hand, forearm, arm, shoulders and hips to record elbow flexion/extension angle as well as other kinematic parameters for further analysis. The balance board was used to measure the variations of the weight repartition at the feet level when performing the task. Subjects had to reach nine targets at two different distances (18 targets in total), whose height and position were adapted to subjects' morphology (the length of the subject's arm minus 10 cm defined the first distance, the second one was 15 cm closer. Targets 1, 2, and 3 were at the hip level, targets 7, 8, and 9 were at the shoulder level, targets 4, 5, and 6 were in-between see **Figure 2**). Each target was reached three times with pause between each movement. 
No specific instructions were given to the participants, to let them move naturally. Only the initial position was imposed: subjects were asked to start with the humerus along the body and the elbow flexed at 90°. Shoulder ZYX Euler angular velocities, computed in the trunk frame, and elbow flexion/extension angular velocity (obtained from IMUs and Codamotion markers respectively) were collected and used to train the three models offline with a MATLAB (MathWorks Inc.) script. As the aim is to predict elbow motions from shoulder ones, the inputs of the models were the shoulder data (ZYX Euler angular velocities in the trunk frame) and the output was the elbow data (flexion/extension angular velocity, see **Figure 3**). We chose to use joint velocities to avoid any dependence on the initial position. Shoulder Euler angles were selected as input data since they are commonly used in shoulder-elbow coordination modeling (Lacquaniti et al., 1982; Popovic and Popovic, 2001; Wu et al., 2002; Kaliki et al., 2008). The kinematic data were filtered (low-pass filter with a cut-off frequency of 5 Hz) and segmented. The start and end of the movements were automatically determined with a MATLAB script, using a threshold on the hand velocity profile (30% of the maximum velocity ± an offset adapted to each subject). Only the go phases (from the initial position to the target) were used for training the models.

FIGURE 1 | Experimental set-up for training data recordings: participants performed natural reaching movements toward 18 targets (9 × 2 distances). x-IMUs are placed over the arm and the trunk; Coda markers on the arm, shoulder, and trunk. Written informed consent for publication of images was obtained from the participants.
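The threshold-based segmentation described above can be sketched as follows. This is only an illustration, not the authors' MATLAB script: the function name `segment_movement`, the list-based signal layout, and the way the per-subject offset is added to the threshold are assumptions.

```python
def segment_movement(hand_speed, ratio=0.30, offset=0.0):
    """Return (start, end) sample indices of one movement.

    The movement is taken as the span over which the hand speed exceeds
    ratio * max(speed) + offset. The offset stands for the per-subject
    adjustment mentioned in the text (hypothetical parameter).
    """
    threshold = ratio * max(hand_speed) + offset
    above = [i for i, v in enumerate(hand_speed) if v > threshold]
    if not above:
        raise ValueError("hand speed never exceeds the threshold")
    return above[0], above[-1]


# Synthetic bell-shaped speed profile (arbitrary units) with rest at both ends.
speed = [0.0, 0.0, 0.1, 0.4, 0.8, 1.0, 0.8, 0.4, 0.1, 0.0, 0.0]
start, end = segment_movement(speed)
print(start, end)  # 3 7
```

The low-pass filtering at 5 Hz would be applied before this step; it is left out here to keep the sketch short.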

### 2.2. Models

Let f be the function that approximates the relationship between the selected input and output sets. For PCA, used for regression as in Vallery and Buss (2006), we have, for a given input vector **x** (the three shoulder Euler angular velocities for one time sample in our case):

$$f(\mathbf{x}) = \Gamma_2 \Gamma_1^+ \mathbf{x} \tag{1}$$

with $\Gamma$ the matrix of principal components of the training data, and $\Gamma_1$ and $\Gamma_2$ the corresponding sub-matrices. The first two Principal Components were kept since they were enough to account for 98% of the total variance. We thus have $\Gamma_1 \in \mathbb{R}^{3 \times 2}$ and $\Gamma_2 \in \mathbb{R}^{1 \times 2}$. $\Gamma_1^+ = (\Gamma_1^T \Gamma_1)^{-1} \Gamma_1^T$ is the left pseudo-inverse of $\Gamma_1$.
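As an illustration of Equation (1), the sketch below builds the principal components from stacked input/output samples and derives the linear map from shoulder velocities to elbow velocity. It is written under assumptions that are not stated in the text: the components are obtained from an SVD of the stacked data without mean-centering, and the name `fit_pca_regression` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca_regression(X, y, n_components=2):
    """X: (n_samples, 3) shoulder velocities, y: (n_samples,) elbow velocity.

    Returns the (1, 3) matrix Gamma_2 @ pinv(Gamma_1) of Equation (1).
    """
    data = np.column_stack([X, y])             # (n_samples, 4) joint samples
    # Principal directions are the right singular vectors of the data matrix
    # (assumption: no mean-centering, velocities are taken around rest).
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    gamma = vt[:n_components].T                # (4, k) retained components
    gamma1, gamma2 = gamma[:3], gamma[3:]      # input rows, output row
    return gamma2 @ np.linalg.pinv(gamma1)     # left pseudo-inverse of Gamma_1


# Synthetic data lying exactly in a two-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 0.5, 2.0],
                   [0.0, 1.0, -1.0, 0.5]])
samples = latent @ mixing
X, y = samples[:, :3], samples[:, 3]

M = fit_pca_regression(X, y)
prediction = (X @ M.T).ravel()
print(np.allclose(prediction, y))
```

Because the synthetic samples span only two dimensions, two components reproduce the output exactly; with real recordings the prediction is an approximation.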

For LWR, the output is defined as:

$$f(\mathbf{x}) = \sum_{e=1}^{E} \phi(\mathbf{x}, \theta_e) \cdot \mathbf{a}_e^{T} \mathbf{x}, \tag{2}$$

with E the number of local linear models, $\phi$ the weighting functions of these models (here Gaussian functions), $\theta_e$ the parameters that account for their localization, and $\mathbf{a}_e$ the parameters of the linear models (Stulp and Sigaud, 2015). The number of local linear models, which minimized the residual error (between real and predicted output), was set to 2 after cross-validation.
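Equation (2) can be evaluated as in the sketch below. The Gaussian weights are normalized so that they sum to one, which is a common choice for LWR but an assumption here; the centers, widths and slopes are hypothetical values, not the trained parameters of the study.

```python
import math


def gaussian(x, center, width):
    """Unnormalized isotropic Gaussian activation."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-0.5 * d2 / width ** 2)


def lwr_predict(x, centers, widths, slopes):
    """Evaluate sum_e phi(x, theta_e) * a_e^T x with normalized weights."""
    activations = [gaussian(x, c, w) for c, w in zip(centers, widths)]
    total = sum(activations)
    output = 0.0
    for activation, a in zip(activations, slopes):
        phi = activation / total                      # normalized weight
        output += phi * sum(ai * xi for ai, xi in zip(a, x))
    return output


# Two local models (E = 2) with hypothetical parameters.
centers = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
widths = [1.0, 1.0]
slopes = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(lwr_predict((0.0, 0.2, 0.1), centers, widths, slopes))
```

At a point equidistant from both centers the two local models contribute equally, so the output is the average of the two linear predictions.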

For RBFN, we have:

$$f(\mathbf{x}) = \sum_{e=1}^{E} w_e \cdot \phi(\mathbf{x}, \theta_e), \tag{3}$$

with the radial basis functions $\phi$ set as Gaussian functions and $w_e$ the weight of each function, determined with the linear least-squares method (Stulp and Sigaud, 2015). The number of basis functions E that minimized the residual error was set to 5 after cross-validation.
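A minimal sketch of Equation (3) follows, with the weights obtained by linear least squares. The placement of the centers, the shared width, and all function names are assumptions made for the example; the text does not specify how the basis functions were positioned.

```python
import numpy as np


def rbf_features(X, centers, width):
    """Gaussian activations, shape (n_samples, E)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / width ** 2)


def fit_rbfn(X, y, centers, width):
    """Solve for the weights w_e by linear least squares."""
    Phi = rbf_features(X, centers, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w


def predict_rbfn(X, centers, width, w):
    return rbf_features(X, centers, width) @ w


# Synthetic example with E = 5 basis functions and known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
centers = rng.normal(size=(5, 3))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = rbf_features(X, centers, 1.0) @ w_true

w = fit_rbfn(X, y, centers, 1.0)
print(np.allclose(predict_rbfn(X, centers, 1.0, w), y))
```

Once the centers and widths are fixed, the model is linear in its weights, which is why a single least-squares solve is enough.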

### 2.3. Experimental Session: Testing the Models in Closed Loop Situation

Ten different healthy subjects, who did not contribute to the collection of training data, participated in the second experimental session. They were equipped with a prosthetic elbow prototype with one active DoF (flexion/extension of the elbow). The prototype was attached laterally to an elbow orthosis worn on the subject's arm, installed such that the prosthesis rotation axis was aligned with the natural elbow flexion/extension axis of the participant. The elbow orthosis blocked any motion of the natural elbow (it was fixed at 90° during the whole experiment). Five subjects wore the prototype on the right side, five on the left. The control models were trained on the data of the preliminary experimental session from the right and left arm groups, respectively.

#### 2.3.1. Prosthetic Elbow Prototype

The prosthetic elbow is a 1-DoF (flexion/extension) prototype whose functional characteristics are based on those of commercially available active elbow prostheses (10 N·m of nominal torque, 80°/s of nominal speed). The angular velocity is controlled by a DC motor driver (Ion motor control, Ltd) via an optical encoder placed on the motor rear shaft (resolution of 2,048 ppr and gear ratio of 1:1,000). The prototype is controlled by a Raspberry Pi, which commands the DC motor driver. It reads data from two x-IMUs (x-io Technologies, Ltd.) placed on the subject's arm and trunk, at the same locations as in the preliminary experimental session. The IMUs gave quaternion values representing the arm orientation, from which the ZYX Euler angular velocities of the shoulder in the trunk frame ($\dot{\psi}$, $\dot{\theta}$, $\dot{\phi}$) were computed. The IMUs were reset at the beginning of each experimental session, and their position remained unchanged during the whole experiment. They were the only devices used for control. The Codamotion and balance board were used for analysis purposes only; specific Coda markers were placed on the arm, the shoulders and the hip (see **Figure 4**). A prosthetic hand, blocked in an open posture (forming a U-shape in the horizontal plane), was placed at the extremity of the prosthetic limb. The subjects reached the targets by placing this hand around them.
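The conversion from IMU quaternions to shoulder Euler angular velocities can be sketched as below. Several points are assumptions rather than details from the text: quaternions are unit quaternions in (w, x, y, z) order, both IMUs report orientation in a common reference frame, the rates are obtained by finite differences, and angle wrap-around is ignored. All function names are hypothetical.

```python
import math


def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def arm_in_trunk(q_trunk, q_arm):
    """Arm orientation expressed in the trunk frame."""
    return quat_mul(quat_conj(q_trunk), q_arm)


def euler_zyx(q):
    """ZYX Euler angles (psi, theta, phi) in radians from a unit quaternion."""
    w, x, y, z = q
    psi = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    theta = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    phi = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    return psi, theta, phi


def euler_rates(q_prev, q_curr, dt):
    """Finite-difference Euler angular velocities between two samples."""
    prev, curr = euler_zyx(q_prev), euler_zyx(q_curr)
    return tuple((c - p) / dt for p, c in zip(prev, curr))


# Arm rotated by 30 degrees about the trunk z axis, trunk at identity.
half = math.radians(15.0)
q_arm = (math.cos(half), 0.0, 0.0, math.sin(half))
q_trunk = (1.0, 0.0, 0.0, 0.0)
print(euler_zyx(arm_in_trunk(q_trunk, q_arm)))
```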

#### 2.3.2. Experimental Set-Up

Participants were asked to use the prosthesis to reach the same eighteen targets as in the preliminary experimental session. We did not ask them to reach new targets because this study focused on the robustness of the interjoint coordination models (obtained through RBFN, PCA, and LWR) to inter-subject variability. We were interested in the prosthesis response to different motor behaviors and kinematics. The elbow angular velocity $\dot{\beta}$ was estimated by the different regression models from $\dot{\psi}$, $\dot{\theta}$, $\dot{\phi}$, the shoulder Euler angular velocities, computed in the trunk coordinate system and obtained from the IMUs. The initial position, to which the participants had to come back after every movement, was defined with the prosthetic elbow at 90 degrees and the subject's humerus at zero degrees, along the body. The task was limited to the go phase (from the initial position to the target); the return of the prosthesis (from the target to the initial position) was automatic. The end of the movement was defined by the end of the prosthesis motion toward the target (the elbow velocity was set to zero when the shoulder angular velocities dropped below a chosen threshold). Subjects were asked not to correct the final reached position with visual feedback, even if the prediction was bad. Each target was reached 3 times, each time with a different model. The order of the models used for control was randomized before the experimental session, and subjects were not aware of this order. Models were implemented on the Raspberry Pi that controls the prosthesis. $\dot{\psi}$, $\dot{\theta}$, $\dot{\phi}$, obtained from the x-IMUs, were sent as model inputs. The total experimental session (placement of the markers and the prosthesis, reaching tasks, and removal of the markers and the prosthesis) lasted approximately 2 h.
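One control step of this scheme could look like the sketch below. It is a hedged illustration only: the text does not say how the threshold is applied to the three shoulder rates, so comparing the largest absolute rate with the threshold is an assumption, and `elbow_command`, the threshold value and the stand-in linear model are hypothetical.

```python
def elbow_command(shoulder_rates, model, threshold):
    """Return the commanded elbow velocity for one control step.

    The command is forced to zero when every shoulder Euler angular
    velocity is below the threshold in absolute value (assumed rule).
    """
    if max(abs(rate) for rate in shoulder_rates) < threshold:
        return 0.0
    return model(shoulder_rates)


def linear_model(rates):
    """Stand-in for a trained regression model (hypothetical coefficients)."""
    return 0.5 * rates[0] - 0.2 * rates[1] + 0.1 * rates[2]


print(elbow_command((0.01, -0.02, 0.0), linear_model, threshold=0.05))  # 0.0
print(elbow_command((1.0, 0.0, 0.0), linear_model, threshold=0.05))     # 0.5
```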

#### 2.3.3. Performance Quantification

Evaluating whether a movement was correctly performed is a complex task. Indeed, despite some characteristics shared among subjects in reaching motions, there is significant inter-subject variability that prevents the use of traditional error values. **Figure 5** illustrates the inter-subject variability of $\dot{\beta}$ for the ten subjects who performed reaching motions with their right arm in the preliminary experimental session (without the prosthesis). On the box-plot of the maximum of $|\dot{\beta}|$ (**Figure 6**), we can see that the range of variation is large and that some outliers are even identified, although all the motions were correct. Considering an average healthy $\dot{\beta}$ and computing an error with respect to it for a given motion is thus not relevant. Moreover, the targets can be correctly reached with the help of compensatory movements (such as trunk flexion or rotation) that have to be avoided. Musculoskeletal pain and overuse injuries are actually a well-known problem for the upper-limb amputee population (Kontson et al., 2017; Postema, 2017). An error value of $\dot{\beta}$ concentrates only on functional performance and does not take this point into account. For these reasons, we developed sixteen features to evaluate the performance of the models used for prosthetic control. They were defined to measure the achievement of the task, the natural (or unnatural) aspect of the arm movements, and the extent of the body compensations. Six of the most relevant metrics are presented here, since the others lead to the same conclusions (see **Appendix** for the exhaustive list):


FIGURE 5 | Illustration of inter-subject variability in elbow flexion/extension angular velocity. Example of time evolution of $\dot{\beta}$ (target 1) for the ten healthy subjects who performed the preliminary session with their right arm.

• The curvature of the trajectory, c, that illustrates the deviation of the hand from a straight line trajectory toward the target. It is defined as

$$c = \frac{\max(||\overrightarrow{P(t)H(t)}||)}{||\overrightarrow{P(t_{final})P(t_0)}||} \tag{4}$$

with P the end-effector position at each time step and H the orthogonal projection of P on the straight line $(P(t_0)P(t_{final}))$. It measures the natural aspect of the movement;

• The smoothness s of the elbow angular velocity ($\dot{\beta}$), measured by its spectral arc length (Balasubramanian et al., 2012). During the experiment, we observed that, for some models, the extension of the elbow (and so the arm movement) was jerky, which was very unpleasant for the user. It is thus important to quantify the smoothness of the movement to select a model that predicts a natural (i.e., smooth and fluid) motion;


$$a = \frac{F_{ips}^{t_f} - F_{ips}^{t_0}}{F_{ips}^{mean} + F_{contra}^{mean}} \tag{5}$$

(with $F_{ips}^{t_f}$ and $F_{ips}^{t_0}$ the force on the ipsilateral foot at the end and the beginning of the movement respectively, and $F_{ips}^{mean}$ and $F_{contra}^{mean}$ the mean of the force on the ipsilateral and contralateral foot, respectively). It is given as a percentage of the total force applied on both feet. It measures how much the subject moves their center of mass, and thus moves their trunk laterally, from the start to the end of the reaching movement. It is a direct measure of the body compensations.
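The two closed-form metrics above, the curvature of Equation (4) and the force ratio of Equation (5), can be computed as in this sketch. The data layouts (a list of 3D points for the trajectory, lists of force samples for each foot) and the function names are assumptions.

```python
import math


def curvature(path):
    """Equation (4): largest deviation from the start-end line / line length."""
    p0, pf = path[0], path[-1]
    line = [b - a for a, b in zip(p0, pf)]
    length = math.sqrt(sum(c * c for c in line))
    unit = [c / length for c in line]
    max_dev = 0.0
    for p in path:
        rel = [b - a for a, b in zip(p0, p)]
        along = sum(r * u for r, u in zip(rel, unit))
        # Vector from the orthogonal projection H back to the point P.
        perp = [r - along * u for r, u in zip(rel, unit)]
        max_dev = max(max_dev, math.sqrt(sum(c * c for c in perp)))
    return max_dev / length


def force_ratio(f_ips, f_contra):
    """Equation (5): change of ipsilateral force over the mean total force."""
    mean_ips = sum(f_ips) / len(f_ips)
    mean_contra = sum(f_contra) / len(f_contra)
    return (f_ips[-1] - f_ips[0]) / (mean_ips + mean_contra)


path = [(0.0, 0.0, 0.0), (0.5, 0.1, 0.0), (1.0, 0.0, 0.0)]
print(curvature(path))
print(force_ratio([300.0, 350.0, 400.0], [400.0, 350.0, 300.0]))
```

In this toy example the hand deviates by 0.1 from a unit-length straight line, and 100 N out of a mean total of 700 N shifts to the ipsilateral foot.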

Values of these metrics for prosthetic motions were compared with values for motions performed without the prosthesis (motions performed during the preliminary session, later called "natural" motions), except for δ that is zero for natural motions (the target was always perfectly reached without the prosthesis).
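For the smoothness metric s, a spectral arc length computation in the spirit of Balasubramanian et al. (2012) is sketched below. The cut-off frequency of 20 Hz, the zero-padding level, and the normalization of the spectrum by its maximum are assumptions of this sketch; the settings used in the study are not given in the text.

```python
import numpy as np


def spectral_arc_length(signal, fs, fc=20.0, pad_level=4):
    """Negative arc length of the normalized magnitude spectrum up to fc (Hz).

    Values closer to zero indicate a smoother movement.
    """
    n_fft = int(2 ** (np.ceil(np.log2(len(signal))) + pad_level))
    freqs = np.arange(n_fft) * fs / n_fft
    magnitude = np.abs(np.fft.fft(signal, n_fft))
    magnitude = magnitude / magnitude.max()
    keep = freqs <= fc
    f_norm = freqs[keep] / fc                 # frequency axis scaled to [0, 1]
    m = magnitude[keep]
    return -np.sum(np.sqrt(np.diff(f_norm) ** 2 + np.diff(m) ** 2))


# Bell-shaped velocity profile, and the same profile with an 8 Hz ripple.
fs = 200.0
t = np.linspace(0.0, 1.0, 201)
smooth = np.exp(-((t - 0.5) / 0.1) ** 2)
jerky = smooth * (1.0 + 0.3 * np.sin(2.0 * np.pi * 8.0 * t))

print(spectral_arc_length(smooth, fs), spectral_arc_length(jerky, fs))
```

The rippled profile adds a bump to the spectrum, which lengthens the arc and makes the value more negative, matching the convention that more negative means less smooth.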

### 3. COMPARISON OF THE MODELS

### 3.1. Results

δ and $\Delta_t$ were averaged over subjects and targets to have one global error value per model. The curvature, c, and the spectral arc length of $\dot{\beta}$, s, were first averaged over subjects, to have one value per model and target, and then over targets to simplify the analysis. The final extension of the elbow, $\beta_{final}$, and the amplitude ratio of the ipsilateral force, a, were only averaged over subjects (an average over targets would not be meaningful since the two metrics directly depend on target location). Statistical analysis (Wilcoxon test for differences between models and Friedman ANOVA for target location dependency), performed with Statistica®, was conducted for every metric except $\beta_{final}$ because of the lack of data for some targets. The final position error, δ (see **Figure 7**), is bigger for motions induced by the PCA controller than for motions induced by the RBFN or LWR controllers (+10 and +15 mm respectively, p < 0.05). δ of motions controlled by LWR is the smallest (53 mm) and its standard deviation is smaller than that of δ for RBFN- or PCA-controlled movements.

In **Figure 8**, we note that there is a natural delay between shoulder and elbow motions, which is most of the time positive (the elbow moves after the shoulder). The sign of $\Delta_t$ has no evident correlation with either the target location or the subjects. We can still see that, compared to the natural $\Delta_t$, the most reactive model is PCA, with 7 ms of delay. RBFN is a bit slower, with 8 ms. Both stay in the natural baseline. LWR shows a different behavior since, on average, the elbow starts moving before the shoulder ($\Delta_t$ is −60 ms). Very small shoulder angular velocities are enough to cause elbow motion. $\Delta_t$ of LWR is thus significantly different from that of RBFN and PCA, but also from the natural $\Delta_t$ (p < 0.05).

In **Figure 9**, we first see that c depends on the target reached (p < 0.05). This is an expected result, as the curve described by the end-effector varies according to the height and the lateral position of the targets. For most of the targets, movements estimated by the PCA and LWR controllers have a larger curvature than those estimated by the RBFN controller or than natural motions. This is confirmed by the mean of c, whose values for PCA and LWR are significantly different from the value for RBFN-controlled motions (p < 0.05) and from that of natural motions (p < 0.05). Reaching motions performed with PCA and LWR control thus have a less natural trajectory than those performed with RBFN control, even though they still stay in the range of natural motions.

Concerning the smoothness s, the more negative the value, the less smooth the motion. s does not depend on the target location for PCA-controlled, LWR-controlled and natural motions but depends on location for RBFN-controlled motions (p < 0.05). In **Figure 10**, we can quickly notice that motions made with LWR control are always less smooth than those of all other modes of control (RBFN, PCA and natural). s values of LWR are indeed significantly different from natural ones (p < 0.05 for 14 targets out of 18). s values of RBFN are significantly different for 10 targets out of 18 but are still lower than s values of LWR, and the mean value of s for RBFN is in the natural baseline (i.e., lower than mean + standard deviation of smoothness for natural movements). PCA provoked significantly less smooth movements for only 3 targets out of 18, and the mean value of s for PCA is very close to that of natural motions (−3.218 and −3.213, respectively).

FIGURE 7 | Distance δ between the end-effector of the prosthetic hand and the target to reach. Values are averaged over subjects and targets. There is no value for natural reaching motions without prosthesis as the task was always perfectly achieved in the preliminary session. \*indicates a statistically significant difference (p < 0.05).

**Figure 11** first shows that the elbow is too extended in the final posture with all regression models for the three higher targets of distance 1 and the six higher targets of distance 2. The range of $\beta_{final}$ is smaller for motions with the prosthesis than for natural motions. The natural variations of $\beta_{final}$ are not fully reproduced with the prosthesis, maybe because reaching higher and/or closer targets involves slightly different joint synergies, as explained in the introduction. $\beta_{final}$ especially discriminates PCA control, since its estimation by this technique is higher than the normal extension and than that predicted by RBFN and LWR. This higher extension can explain the bigger δ of the movements with PCA control, observed in **Figure 7**.

Finally, **Figure 12** shows that there are important body compensations with the prosthesis, whose amplitude depends on the lateral location of the target. These compensations may be mainly due to the weight distribution of the prosthesis, which is different from that of a natural arm, to the discomfort of the orthosis, and/or to the shift of the prosthetic forearm relative to the humeral axis. The body motions caused by the three regression models are significantly different from natural body motions (p < 0.05), but there is no significant difference between models.

### 3.2. Discussion

With these six metrics, the robustness (capacity to control the prosthesis in closed loop situations) of the models considered, the delay of their response and their generalization to new subjects can be analyzed. It can be seen that:


which is a major limitation since the motion appears unnatural and hardly usable to perform some tasks (like carrying delicate objects). Moreover, the elbow starts to move with very small shoulder angular velocities, which makes the prosthesis control neither very comfortable nor robust;

• Control obtained through movement estimation by RBFN is less smooth than the one through estimation by PCA but it still remains within the natural baseline. The trajectory of these movements is close to natural movements (see c) and the predicted extensions are globally correct, except for the same targets as for LWR control. The delay between shoulder and elbow motions is close to the natural one.

RBFN thus seems to be the most suitable algorithm for movement-based control of a prosthetic elbow among the three models considered in this study. Nonetheless, it cannot be concluded that RBFN really outperforms PCA and LWR and predicts a totally natural and accurate elbow motion. In particular, $\beta_{final}$ is overpredicted for the highest targets, δ is still not close to zero (60 mm), and the body compensations are not smaller than with PCA and LWR control. Moreover, each metric has a large standard deviation, be it for natural motions or for those estimated by PCA, LWR or RBFN control. Indeed, beyond the common characteristics of reaching, each subject has their own joint coordination. This raises the following questions: can we expect to find a model of joint coordination that will perform perfectly for all subjects? To what extent can the optimization of the regression algorithm used to build the interjoint coordination model contribute to improving the control of the prosthesis? The results of this study show that, even if a global RBFN model (i.e., trained with data from several healthy subjects) has a good overall performance, the elbow extension is not correct enough to satisfy the accuracy required for the use of a prosthesis. Additional control schemes are needed.

The experiment performed to test the real-time response of the regression models also has some limitations, especially because the subjects wore the prosthesis as a supernumerary arm. Indeed, the artificial arm is not aligned with the shoulder, as it would be for amputated patients wearing a prosthesis, which can disturb the participant and might modify the natural shoulder/prosthetic elbow coordination. The weight of the prosthesis and its weight distribution (different from the natural one, due to the motors and electronic parts) can destabilize the participants and partly explain the significant difference between "without-prosthesis" and "with-prosthesis" values of a. The real arm of the subjects (blocked in the orthosis) also hid the targets and reduced visibility for some movements, especially when reaching the highest targets, which is one of the explanations for the bad predictions of elbow extension for these targets.

### 4. CONCLUSION AND FUTURE WORKS

This paper presents the experimental comparison of three regression models for movement-based control of a prosthetic elbow. This was performed through real-time tests, with humans performing a reaching task with an arm prosthesis instead of their natural arm. Real-time tests are a significant contribution to the study of movement-based control since very few have been done so far (Bennett, 2016; Merad et al., 2016b). Yet it is very important to include the prosthesis user in the loop, since the user reacts during the movement of the prosthesis and creates perturbations that cannot be studied in simulations or even in virtual reality environments. The three models were deliberately chosen among the simplest techniques in order to evaluate how well they can perform, instead of immediately using more complex models (e.g., Multi-Layer Perceptron or multi-layer ANNs). We focused on reaching movements because, due to their high speed and the absence of concentration on intermediate joint control, they are well suited to movement-based control. Elbow flexion/extension was estimated from shoulder Euler angular velocities, computed in the trunk frame. The prediction ability was quantified by six metrics (chosen as the most representative among sixteen), which accounted for task achievement, joint motion and body compensations. RBFN showed better performance than PCA and LWR. It predicted smooth enough movements, with a natural-like trajectory and correct timing, but it neither reduces the body compensations nor always leads to a correct final elbow angle. An approximate model of interjoint coordination can also be obtained by PCA, but it does not seem to perform well enough to control a prosthesis, which requires very good predictions to satisfy the users. LWR predictions corresponded to the desired elbow extension angles, but the lack of smoothness of the output and the overly sensitive response remain serious drawbacks. Nevertheless, even if some performance differences exist between the models considered, none of them significantly outperforms the others. The regression technique used to model joint synergies may not be a key factor to improve prosthetic movement-based control.

This paper also highlights interesting elements to justify the use or the exclusion of some models for elbow/shoulder movement-based control. A sensible continuation of this study would first be to expand the comparison to more complex (multi-layer) ANNs, to evaluate whether they are worthwhile or whether the RBFN's ability is good enough to control a prosthesis. Moreover, the experiment conducted in this study can be improved. As said above, wearing the prosthesis as a supernumerary arm is not natural and raises some problems. Motions of healthy subjects and amputees are also different (Merad et al., 2018). It is known that upper limb amputees generally exhibit particular movement strategies and numerous body compensation strategies (for example, an overuse of the trunk; Metzger et al., 2012), because performing a task with a natural arm or with an artificial one remains a very different sensorimotor experience. The inter-subject variability for amputees may also be higher than for non-amputees (different amputations, stump morphology, healing, etc.). Therefore, the next experimental tests should be performed in the near future with final end users.

Finally, given that the results of this study illustrate the rather small influence of the regression technique and interjoint model on the control performance, we believe that new research directions should be explored. First, the individualization of the models could improve the prediction performance by tackling the issue of inter-subject variability. Future studies aim to build and improve the model directly on the user, taking into account their own coordination, during the first uses of the prosthesis with movement-based control. This is different from building the model with data from the remaining arm, which is a solution we do not consider, as several studies have shown that joint coordination differs between the dominant and non-dominant arms (Bagesteiro, 2003; Sainburg et al., 2011; Schaffer and Sainburg, 2017). Second, a "shared control" paradigm would let the user instantaneously correct the movement when the prediction is wrong or unsuitable. This would also allow for voluntary control of smaller, more precise or slower movements.

### AUTHOR CONTRIBUTIONS

ML, MM, EdM, NJ, and AR-B conceived and designed the experiment. ML, MM, and EdM performed the participant registration and the experiment. ML, MM, and NJ analyzed the data. ML and NJ wrote the paper.

### FUNDING

This project was supported by the Labex SMART (ANR-11-LABX-65) and by Sorbonne Université (Project PROCOSY).

### REFERENCES

machine interfaces: Going beyond traditional surface electromyography. Front. Neurorobot. 8:22. doi: 10.3389/fnbot.2014.00022

prostheses: a feasibility study," in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (Milano).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Legrand, Merad, de Montalivet, Roby-Brami and Jarrassé. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## APPENDIX

### Complementary Features for Performance Quantification of the Regression Models

#### **Task realization**

• Distance end-effector ratio characterizes the trajectory of the end-effector (the index finger). It is defined as the total distance traveled by the end-effector divided by the distance between the initial and terminal points of the trajectory. It measures whether the hand shakes during the movement or whether there are bumps due to the regression. If so, extra distance is traveled. The distance end-effector ratio will also be higher if the elbow is too extended for the target;


#### **Articulation features**

• Smoothness of Euler angular velocities. To know whether the difference in smoothness for $\dot{\beta}$ comes from the models or from the inputs (possibility of sub-movements), we also looked at the smoothness of $\dot{\psi}$, $\dot{\theta}$, $\dot{\phi}$;


proposed by Bockemühl et al. (Bockemühl et al., 2010), measures the similarity of synergies between two movements.

#### **Body compensation**


# Neural Network-Based Muscle Torque Estimation Using Mechanomyography During Electrically-Evoked Knee Extension and Standing in Spinal Cord Injury

Muhammad Afiq Dzulkifli<sup>1</sup>, Nur Azah Hamzaid<sup>1</sup>\*, Glen M. Davis<sup>1,2</sup> and Nazirah Hasnan<sup>3</sup>

*<sup>1</sup> Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia, <sup>2</sup> Discipline of Exercise and Sports Sciences, Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia, <sup>3</sup> Department of Rehabilitation Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia*

This study sought to design and deploy a torque monitoring system using an artificial neural network (ANN) with mechanomyography (MMG) for situations where muscle torque cannot be independently quantified. The MMG signals from the quadriceps were used to derive knee torque during prolonged functional electrical stimulation (FES)-assisted isometric knee extensions and during standing in spinal cord injured (SCI) individuals. Three individuals with motor-complete SCI performed FES-evoked isometric quadriceps contractions on a Biodex dynamometer at a 30° knee angle and at a fixed stimulation current, until the torque had declined to a minimum required for ANN model development. Two ANN models were developed based on different inputs: the root mean square (RMS) of the MMG, and RMS combined with zero crossings (RMS-ZC), both derived from the MMG. The performance of the ANN was evaluated by comparing the model-predicted torque against the actual torque derived from the dynamometer. MMG data from 5 other individuals with SCI who performed FES-evoked standing to fatigue-failure were used to validate the RMS and RMS-ZC ANN models. RMS and RMS-ZC of the MMG obtained from the FES standing experiments were then provided as inputs to the developed ANN models to calculate the predicted torque during FES-evoked standing. The average correlation between the knee extension-predicted torque and the actual torque outputs was 0.87 ± 0.11 for RMS and 0.84 ± 0.13 for RMS-ZC. The average accuracy was 79 ± 14% for RMS and 86 ± 11% for RMS-ZC. The two models revealed significant trends in torque decrease, both suggesting a critical point around 50% torque drop where significant changes were observed in RMS and RMS-ZC patterns. Based on these findings, both RMS and RMS-ZC ANN models performed similarly well in predicting FES-evoked knee extension torques in this population. However, interference was observed in the RMS-ZC values around the time of knee buckling. The developed ANN models could be used to estimate muscle torque in real-time, thereby providing safer automated FES control of standing in persons with motor-complete SCI.

Keywords: functional electrical stimulation, mechanomyography, neural network, spinal cord injuries, torque estimation

#### Edited by:

*Dingguo Zhang, Shanghai Jiao Tong University, China*

Reviewed by: *Zhan Li, University of Electronic Science and Technology of China, China Fan Gao, University of Kentucky, United States*

\*Correspondence: *Nur Azah Hamzaid azah.hamzaid@um.edu.my*

Received: *16 March 2018* Accepted: *18 July 2018* Published: *10 August 2018*

#### Citation:

*Dzulkifli MA, Hamzaid NA, Davis GM and Hasnan N (2018) Neural Network-Based Muscle Torque Estimation Using Mechanomyography During Electrically-Evoked Knee Extension and Standing in Spinal Cord Injury. Front. Neurorobot. 12:50. doi: 10.3389/fnbot.2018.00050*

### INTRODUCTION

Individuals with spinal cord injury (SCI) often require rehabilitation strategies and assistive technologies to facilitate their daily tasks. Functional electrical stimulation (FES) enables these individuals with neuromuscular disability to execute functional activities such as walking, cycling, and standing up, as well as improving their blood flow and sensory awareness (Petrofsky, 2004). FES activates the nerves using small electrical currents, recruiting muscles to produce non-physiologically evoked contractions and retraining atrophied muscles, thereby partially or fully regaining lost functions (Hamid and Hayek, 2008). Electrical stimulation can be applied through the skin surface or via intramuscular electrodes to evoke contractions of the non-innervated muscles (Ferrarin and Pedotti, 2000). The intensity and temporal characteristics of the stimulation must be regulated to prevent rapid-onset muscle fatigue that leads to failure to perform the desired movement.

When an able-bodied individual performs exercise, over time the muscles become fatigued due to repetitive muscle activity, and thus are not able to reach a set level of maximum voluntary contraction (MVC) force to maintain the current task (Barry and Enoka, 2007). In an engineering context, muscle fatigue is defined as the change in the muscle's physiological performance before it is finally unable to produce any more force (Barry and Enoka, 2007). This can be used as the basis for determining the muscle fatigue threshold, whereby a certain percentage of the MVC during an experiment can be used to determine that the muscle has become fatigued. Another parameter that can be used to quantify muscle fatigue is a change of joint angle (Barry et al., 1985; Guo et al., 2008).

Apart from torque and angle measurements, muscle activity and performance can be measured physically with the mechanomyogram (MMG). MMG records mechanical changes of the muscle during its contraction (Weir et al., 2000). Unlike electromyography, MMG is free of power line interference and has a high signal-to-noise ratio (Islam et al., 2013). MMG also provides information such as the force the muscle can produce, its stiffness and its fluid pressure (Barry et al., 1985). To monitor muscle fatigue during specific activities such as walking, standing and reaching, MMG sensors are placed on the skin surface over the muscle, where they measure the mechanical activity of the contracting muscle by detecting the muscular sound (Islam et al., 2013). The amplitude of the MMG is related to the force produced by the muscle, whereby even a small change of force is reflected in the MMG amplitude (Beck, 2010).

MMG has been used as a development tool to find the abnormalities from the designated baseline. MMG is useful in the detection of muscle fatigue during sustained voluntary contraction (Jensen et al., 1994). Even though MMG has been commonly used to quantify muscle fatigue during isometric contractions, the usability of MMG for postural control after fatigue made it significant in various fields such as occupational therapy and ergonomics (Beck, 2010).

Researchers have not been able to measure muscle performance during activities such as standing because there is no adequate tool to directly quantify knee and hip extensor torques in stance. With the use of MMG, the muscle activity can be quantified over time and thus its performance assessed. Therefore, the first aim of this study was to design an artificial neural network (ANN) that could predict the torque exerted around the knee joint by the quadriceps muscle by taking inputs from certain MMG parameters, namely the root mean square (RMS) and zero crossings (ZC). The models were designed to predict the muscle torque during FES isometric knee extension. Second, we sought to apply the ANN models to multiple sessions of FES standing challenges. This was done to determine the accuracy and reliability of the ANN models based on RMS and RMS-ZC inputs to predict the knee torque produced by the quadriceps in FES isometric knee extension and standing. Finally, this study aimed to compare the performance of the ANN models to determine the input(s) that best predicted performance in isometric knee extension and standing. In other words, the ANN's accuracy in predicting knee torque produced by the quadriceps was tested during FES isometric knee extension, and the developed model was then deployed in an FES standing activity. It was hypothesized that the knee extension torque could be modeled through MMG-derived RMS and ZC, which would enable the prediction of torque in activities where torque cannot be physically measured, such as upright stance.

### METHODOLOGY

The study was performed in three phases. The first phase was data collection, in which the SCI participants performed electrical stimulation-evoked isometric knee extensions to obtain their muscle MMG signal parameters and torques. The second phase comprised signal processing of the MMG data acquired in the first phase, to prepare the signal as input for the ANN model, and ANN model development. In the third phase the ANN models were deployed in FES-evoked standing performed by the SCI participants. In this study, 3 subjects with SCI took part in the ANN design and 5 subjects with SCI took part in the standing protocol. Test data from Subject 1, Subject 2 and Subject 3 were used to train and test the ANN in seated evoked contraction, while data from all 5 subjects were used to test the ANN model's torque estimates during evoked contraction in standing. The study was approved by the University of Malaya Medical Centre Medical Research Ethics Committee [Ethics Number: 1003.14 (1)].

### Phase 1: Knee Extension Training Data Collection

This experiment was conducted to obtain the mechanical signal and torque during isometric FES contractions of the quadriceps muscle in three SCI individuals. The torque data were recorded with a dynamometer (System 4; Biodex Medical System, Shirley, NY, USA) and the MMG data were recorded using an MMG sensor (Sonostics BPS-II VMG transducer, sensitivity 30 V/g). The subjects were asked to repeat the same isometric knee extension protocol in two sessions separated by 48 h. The experiment was conducted at the Department of Rehabilitation Medicine, University Malaya Medical Centre.

The data obtained from the experiment were then used as the foundation to design a neural network system in the MATLAB toolbox to predict torques. In Phase 3, the neural networks were tested with the MMG data obtained during the FES standing contraction, for which no torque data were available. The next phase of the experiment involved training and validating the system.

#### Equipment and Materials

The ANN model was validated by comparison with isometric knee torque data obtained from the commercially available dynamometer (System 4; Biodex Medical System, Shirley, NY, USA). The test protocol set on the dynamometer was isometric knee extension with 900 s of recovery between trials. Three trials were conducted for each of the left and right legs. The isometric contraction angle was set at 30° from the straight leg position. The subjects for this experiment were three individuals with SCI (International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) grades A and B) who were trained FES users and non-sensate due to the sensory deficit of their injury. The subjects were briefed about the research protocol before providing their informed consent to participate.

#### FES Evoked Muscle Contractions and Knee Torque Measurement

The subjects were familiar with the FES activity and therefore no familiarization session was needed prior to data collection. The FES stimulation of square-wave pulses was provided at 30 Hz and 200 µs pulse durations with a stimulation current amplitude of 100 mA. The stimulation was provided by a commercial neurostimulator (RehaStimTM, Hasomed GmbH, Magdeburg, Germany). Electrodes used in this experiment were 9 × 15 cm<sup>2</sup> self-adhesive electrodes.

### Data Collection Procedure

The subjects were seated on the dynamometer seat and seatbelts were strapped around them to prevent movement from muscles other than the quadriceps from interfering with the MMG reading. The knee attachment was applied to the right leg to measure the torque exerted around the right knee. The subject's ankle was strapped to a cushion of the knee attachment to hold the leg at a 30° knee angle. Since the armature prevented the leg from moving, the torque signal obtained from the dynamometer originated fully from the subject's muscle and was not affected by gravity. The maximum and minimum flexion and extension were set on the Biodex. The Biodex recorded knee torque at a sampling rate of 500 Hz.

The FES electrodes were placed at both ends of quadriceps muscles but not on the tendon area which was around 5 cm near the position of the patella and around 8 cm distal to the groin area (Levin et al., 2000). **Figure 1** illustrates the setup for FES induced isometric knee torque measurement. The subject was seated on the Biodex seat such that the lateral femoral condyle was parallel to the dynamometer axle. This body position and the lever arm of the dynamometer were consistent throughout the study.

FIGURE 1 | FES electrodes and MMG sensor placement on the quadriceps muscle.

Once the settings were configured, the dynamometer guided the knee attachment to 30° knee flexion. The MMG recording was initiated first, after which the dynamometer torque recording and FES stimulation were started simultaneously. The recording of the dynamometer, the MMG, and the stimulation were stopped once the torque reading fell well below 50% of the maximum torque, and the recovery period began thereafter. Once the third trial had ended, the same procedure was repeated for the left knee with the same settings for the dynamometer and neurostimulator as well as the same recovery period. The subject then repeated the same procedure after 48 h. To ensure high day-to-day reproducibility of the protocol, the same researchers and physiotherapists were involved in the experiment for all subjects.

### MMG Acquisition and Processing

Muscle mechanical signals were recorded with the MMG sensor placed directly on the muscle belly and held in place with double-sided tape (3M 157 Center St. Paul, MN, USA). Acqknowledge v4.3 data acquisition and analysis software (MP150, BIOPAC Systems, Santa Barbara, CA, Inc) was used to collect the data at a sampling frequency of 1 kHz. The signal was then filtered with a bandpass filter (20 Hz lower cutoff frequency and 200 Hz higher cutoff frequency). The MMG amplitude is a recognized way to relate MMG to net torque, as a decrease in net torque is correlated with a decrease in MMG amplitude (Gobbo et al., 2006).

The dataset processed from the MMG signal could be in the time or the frequency domain. In the time domain, the amplitude was identified as voltage values and the amplitude was used to calculate RMS. The MMG RMS is reported as a variable in describing motor unit recruitment during a contraction process (Orizio et al., 2003).

The RMS was the magnitude of the measurement obtained by the MMG, and the data were in the time domain. Both parameters (RMS and torque) were then scaled to values in the range of 0–1 to simplify the preprocessing step for the ANN. The MMG RMS values were computed in MATLAB over 1 s epochs. Normalization of the MMG and torque data, as well as the design of the ANN, was done using the MATLAB version R2015a (2015) toolbox.

RMS was correlated to load as increasing MVC increased the RMS value of the MMG (Akataki et al., 2003). RMS value represents the motor activation (Weir et al., 2000). RMS has also been reported to be an important parameter to monitor muscle fatigue due to its association with the force of contraction of the muscle (Barry et al., 1985). The equation for the RMS processing was defined as:

$$RMS = \sqrt{\frac{1}{N} \sum_{k=1}^{N} x_k^2}, \text{ for } k = 1, \dots, N \tag{1}$$

where x<sub>k</sub> is the raw signal from each segment and N is the number of samples.
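As an illustration of Equation (1), the sketch below computes the RMS over consecutive, non-overlapping epochs. It is a minimal Python example rather than the MATLAB processing used in the study; the function name `rms_per_epoch` and the default epoch length of 1,000 samples (1 s at 1 kHz) are assumptions made for the example, and an incomplete trailing epoch is simply discarded.

```python
import math


def rms_per_epoch(signal, epoch_len=1000):
    """Return the RMS of each full, non-overlapping epoch of `signal`.

    `epoch_len` is N in Equation (1); 1000 samples corresponds to 1 s
    at a 1 kHz sampling rate (an assumption for this sketch).
    """
    values = []
    # Step through the signal one full epoch at a time; a trailing
    # partial epoch is dropped.
    for start in range(0, len(signal) - epoch_len + 1, epoch_len):
        segment = signal[start:start + epoch_len]
        mean_square = sum(x * x for x in segment) / epoch_len
        values.append(math.sqrt(mean_square))
    return values


# Synthetic signal: 1 s at amplitude 2.0 followed by 1 s at amplitude 1.0.
demo = [2.0, -2.0] * 500 + [1.0, -1.0] * 500
print(rms_per_epoch(demo))
```

A falling sequence of epoch values, as in this synthetic signal, is the kind of RMS decline that the study relates to decreasing torque.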

In isometric contractions, an increase of MMG amplitude was observed when force production was low, at around 10–40% of the MVC. At high levels of muscle torque, around 50–80% of MVC, there was no change in MMG amplitude (Perry et al., 2016). The same observation was reported by another research group (Rodriguez-Falces and Place, 2013).

A lower level of muscle torque resulted in a decrease in MMG amplitude (Orizio et al., 2003), consistent with the linear relationship reported between muscle contraction and the RMS amplitude of the MMG (Oster and Jaffe, 1980). The correlation between the amplitude of the MMG signal and motor unit activation was reported during voluntary contraction as well as FES contraction (Beck, 2010).

The mean frequency shows the frequency feature of the MMG (Cescon et al., 2004). Zero crossing (ZC) was used due to the fact that unlike mean frequency, ZC does not require the use of Fast Fourier Transform (FFT) to obtain and the calculation used to obtain ZC is a simple one (Hägg, 1991). ZC has been defined as the number of times that the MMG signal passed through the horizontal amplitude axis (Zecca et al., 2002). The Equation (2) for ZC was as follows:

$$\begin{aligned} \text{ZC} &= \sum_{k=1}^{N-1} \text{sgn}\left(-x_{k}x_{k+1}\right), \text{ for } k = 1, \ldots, N-1\\ \text{sgn}\left(x\right) &= \begin{cases} 1 & \text{if } x > 0\\ 0 & \text{otherwise} \end{cases} \end{aligned} \tag{2}$$

where x<sub>k</sub> is the raw signal of the segment and N is the number of samples.
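The zero-crossing count of Equation (2) can be sketched in a few lines. This is an illustrative Python version, not the study's MATLAB code; the function name `zero_crossings` is hypothetical, and the sum is taken over the N − 1 consecutive sample pairs of a segment. Following the definition of sgn, a pair is counted only when the product of neighbouring samples is strictly negative, so a sample that is exactly zero does not register a crossing.

```python
def zero_crossings(segment):
    """Count sign changes between consecutive samples of `segment`."""
    count = 0
    for k in range(len(segment) - 1):
        # sgn(-x_k * x_{k+1}) is 1 only when the two samples have
        # opposite signs, i.e. the signal crossed the zero axis.
        if -segment[k] * segment[k + 1] > 0:
            count += 1
    return count


print(zero_crossings([0.4, -0.2, 0.3, 0.5, -0.1]))
```

Because it needs only one multiplication and one comparison per sample, this count is cheaper than an FFT-based mean frequency, which is the reason given above for choosing ZC.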

Both MMG RMS and ZC were computed over segments of N = 1,000 samples, while the torque data from the Biodex were averaged to obtain the mean torque for every 500 torque samples. Given the 1 kHz MMG and 500 Hz torque sampling rates, this yielded one reading of torque, MMG RMS and ZC for every second of the stimulated contraction, which synchronized the three series.
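The per-second alignment and the 0–1 scaling described in this section can be sketched as follows. The helper names `block_means` and `min_max_scale` are hypothetical, and min-max scaling is an assumption: the text states only that values were scaled to the range 0–1, not which formula was used.

```python
def block_means(samples, block_len):
    """Average consecutive, non-overlapping blocks of `block_len` samples."""
    return [
        sum(samples[i:i + block_len]) / block_len
        for i in range(0, len(samples) - block_len + 1, block_len)
    ]


def min_max_scale(values):
    """Scale values linearly to the range 0-1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Synthetic torque sampled at 500 Hz: three seconds at 40, 30 and 20 N·m.
torque_500hz = [40.0] * 500 + [30.0] * 500 + [20.0] * 500
torque_per_second = block_means(torque_500hz, block_len=500)
print(min_max_scale(torque_per_second))
```

With the torque reduced to one value per second, each value lines up with the RMS and ZC computed from the matching 1,000-sample MMG segment.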

### Phase 2: Neural Network Development

#### Training Data Processing and Neural Network Development

The neural network system was designed in MATLAB 2015 using the Neural Network fitting toolbox. The ANN system takes MMG inputs to predict the onset of muscle fatigue, with an output of normalized torque ranging from 0 to 1. Two types of ANN models were developed based on the two types of data sets used to train the model: the first used normalized MMG RMS only and the second used normalized MMG RMS together with normalized MMG ZC, i.e., RMS-ZC. RMS and RMS-ZC were used as the inputs for the neural network training and the normalized torque was used as the target data for the desired output of the network. The ANN was trained by feeding the RMS and RMS-ZC signals along with the desired signal data, i.e., the torque output from the Biodex, to the models. The samples obtained from the first session from the 3 subjects were used as training samples. Based on an a priori correlation test considering a 0.91 correlation of probability, an alpha error probability of 0.05, a beta error probability of 0.2, and an effect size of 0.84, at least 6–8 samples were statistically required; this study therefore employed 18 training samples and 12 testing samples of various sizes to test the accuracy of the neural network system. The testing samples were obtained from the second session of the experiment for two of the subjects. The samples were arranged in matrix rows. A feed-forward network with sigmoid hidden neurons and linear output neurons was used for the development of the ANN. The sigmoid transfer function was chosen because it introduces non-linearity to the network's calculations and has a simple derivative (Calcagno et al., 2010). The type of ANN model developed was the multi-layer perceptron, which contains multiple layers of computational units that are interconnected in a feed-forward manner. The three layers used were the input layer, the hidden layer and the output layer.
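To make the architecture concrete, the sketch below implements the forward pass of a three-layer perceptron with sigmoid hidden neurons and a single linear output. It is a plain Python illustration, not the MATLAB toolbox network used in the study. The logistic form of the sigmoid is an assumption, since the exact sigmoid variant is not stated, and the weights come from a hypothetical `random_network` helper, so the printed values are untrained placeholders rather than torque predictions.

```python
import math
import random


def sigmoid(a):
    """Logistic sigmoid; assumed form of the hidden-layer transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid hidden layer followed by one linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_network(n_inputs, n_hidden=10, seed=0):
    """Untrained placeholder weights, only to make the sketch runnable."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


# The RMS model takes one input; the RMS-ZC model takes two.
rms_model = random_network(n_inputs=1)
rms_zc_model = random_network(n_inputs=2)
print(forward([0.8], *rms_model))
print(forward([0.8, 0.3], *rms_zc_model))
```

The only structural difference between the two models is the width of the input layer; the hidden and output layers have the same shape in both.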
Training the ANN model involved comparing the output values of the system with the correct values; the error between the output and the correct answer was computed in an error function (Calcagno et al., 2010). The weights on every connection were then adjusted to reduce the value of the error function. The data were divided into 70% training samples, 15% validation samples and 15% testing samples. These were the default settings for the ANN. The number of hidden neurons was set at 10, chosen because it gave the best results on the training data (r > 0.8). The network was trained with the Levenberg-Marquardt algorithm (Levenberg, 1944):

$$
\omega = \omega + \Delta \omega \tag{3}
$$

$$\Delta \omega = \left[\boldsymbol{J}^T \boldsymbol{J} + \mu\boldsymbol{I}\right]^{-1} \boldsymbol{J}^T \boldsymbol{e} \tag{4}$$

$$\boldsymbol{e} = R - z \tag{5}$$

where ω was the weight vector, Δω was the change in the weight vector, J was the Jacobian matrix that included the first derivatives of the network errors with respect to the weights, µ was a scale parameter, I was the identity matrix, R was the vector of measured torque, z was the vector of predicted torque, and e was the vector of network errors. After training, the network was deployed with the MATLAB compiler and Builder tools to generate a MATLAB function. The training and testing data sets for ANN building can be found at these repositories: figshare | figshare.
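A single Levenberg-Marquardt update in the form of Equations (3)–(5) can be sketched for a model with two weights, where the 2 × 2 system can be solved in closed form. This is an illustrative Python example, not the MATLAB training routine. It assumes that J holds the derivatives of the predicted output z with respect to the weights, estimated by finite differences, so that the step reduces the error e = R − z with the sign shown in Equation (4). The names `lm_step` and `line`, and the toy straight-line model, are hypothetical.

```python
def lm_step(weights, xs, targets, model, mu, h=1e-6):
    """Return the weights after one Levenberg-Marquardt update.

    `model(weights, x)` predicts z for input x; only two weights are
    supported so that the linear system can be solved by hand.
    """
    preds = [model(weights, x) for x in xs]
    errors = [r - z for r, z in zip(targets, preds)]  # e = R - z

    # Finite-difference Jacobian of the prediction w.r.t. each weight.
    jac = []
    for x, z in zip(xs, preds):
        row = []
        for j in range(2):
            bumped = list(weights)
            bumped[j] += h
            row.append((model(bumped, x) - z) / h)
        jac.append(row)

    # A = J^T J + mu * I and g = J^T e.
    a00 = sum(r[0] * r[0] for r in jac) + mu
    a01 = sum(r[0] * r[1] for r in jac)
    a11 = sum(r[1] * r[1] for r in jac) + mu
    g0 = sum(r[0] * e for r, e in zip(jac, errors))
    g1 = sum(r[1] * e for r, e in zip(jac, errors))

    # Solve A * delta = g with the closed-form 2x2 inverse.
    det = a00 * a11 - a01 * a01
    d0 = (a11 * g0 - a01 * g1) / det
    d1 = (a00 * g1 - a01 * g0) / det
    return [weights[0] + d0, weights[1] + d1]


def line(w, x):
    """Toy model standing in for the network: z = w0 * x + w1."""
    return w[0] * x + w[1]


xs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]  # generated from z = 2x + 1
print(lm_step([0.0, 0.0], xs, targets, line, mu=1e-3))
```

A small µ makes the step behave like Gauss-Newton, while a large µ shrinks it toward a short gradient-descent step; practical implementations adjust µ between iterations depending on whether the error decreased.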

#### Neural Network Accuracy Test

To quantify the performance of the two ANN models, the correlation between the predicted and actual torque outputs and the accuracy of the models were determined. To this end, the network was tested with all the normalized RMS and RMS-ZC data from the second session of the 3 subjects. The output torque was then compared with the actual torque obtained from the Biodex using the "fitlm" function in MATLAB to obtain the correlation (r). A critical point of 50% torque drop was chosen to test the accuracy of the ANN model: the time for the actual torque in each test data sample to reach a 50% drop was compared with the time for the predicted torque (RMS and RMS-ZC) to reach a 50% drop, to determine how reliably the models detected a specific torque value. The accuracy was obtained from Equation (6). The results from the neural network test with the isometric knee extension are presented in **Table 1**.

$$\text{Accuracy} = \left(1 - \frac{\left| \text{predicted torque time} - \text{actual torque time} \right|}{\text{actual torque time}}\right) \times 100\% \tag{6}$$
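The timing comparison behind Equation (6) can be sketched as below. This Python example is illustrative only: the function names are hypothetical, one sample per second is assumed, and the 50% drop is taken relative to the peak of each torque series, which is an interpretation of "50% torque drop" rather than a detail stated in the text.

```python
def time_to_drop(torque, fraction=0.5, dt=1.0):
    """Time of the first sample, at or after the peak, that is at or
    below `fraction` of the peak value; None if that never happens."""
    peak_index = max(range(len(torque)), key=lambda i: torque[i])
    threshold = fraction * torque[peak_index]
    for i in range(peak_index, len(torque)):
        if torque[i] <= threshold:
            return i * dt
    return None


def timing_accuracy(predicted_time, actual_time):
    """Equation (6): percentage agreement of the two drop times."""
    return (1 - abs(predicted_time - actual_time) / actual_time) * 100


# Synthetic normalized torque series, one value per second.
actual = [1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
predicted = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
t_actual = time_to_drop(actual)
t_predicted = time_to_drop(predicted)
print(t_actual, t_predicted, timing_accuracy(t_predicted, t_actual))
```

Note that the measure penalizes early and late predictions equally, and that it becomes negative when the predicted time differs from the actual time by more than the actual time itself.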

### Phase 3: Testing the Neural Network Model in FES Standing

#### Standing Protocol

A standing protocol was executed to validate the effectiveness of the ANN model in predicting the onset of muscle fatigue by predicting muscle torque during FES-evoked standing in SCI subjects. Five individuals with sensory complete SCI (ISNCSCI A and B) participated in this study phase. This protocol had been developed to measure the effects of different stimulation frequencies during prolonged FES standing (Ibitoye, 2016, unpublished) and had been approved by the University of Malaya Medical Centre Medical Research Ethics Committee (MECID.NO: 20164-2366). All 5 subjects had been familiarized with the FES training and were able to undergo the stimulation as intended in the protocol. The FES stimulator used in the standing experiment was a commercially available neurostimulator (RehaStimTM, Hasomed GmbH, Magdeburg, Germany). The stimulation was delivered to the targeted muscle through 9 × 15 cm<sup>2</sup> surface adhesive electrodes (RehaStimTM, Hasomed GmbH, Magdeburg, Germany). This protocol was adapted from the procedure reported by Braz and colleagues (Braz et al., 2015). A harness (Biodex Offset Unweighing System) was used to support the subject's body and prevent the subject from swaying and tumbling. Handle bars were available at the subject's sides for upper body balancing: although the torque generated by FES was sufficient to maintain the balance of the lower limbs, the SCI subject had to hold on to the handle bars to stabilize the trunk because of a lack of voluntary abdominal and chest strength. To ensure that the harness did not unload the subject's weight, the researchers checked that both of the subject's feet were flat on the ground with the heels not lifted. The muscle mechanical signal during the standing protocol was recorded with the same MMG accelerometer used in the knee extension experiments.

Data acquisition and signal processing were done digitally through Acqknowledge v4.3 software (MP150, BIOPAC Systems, Santa Barbara, CA, Inc). FES standing was achieved by continuous stimulation of both the left and right quadriceps and gluteal muscles. The quadriceps muscles were stimulated to stabilize the knee in extension and the glutei were stimulated for hip extension and upright posture stabilization. The subject was stimulated at the quadriceps (80 mA) and glutei (64 mA) with a 200 µs pulse width. The frequency of the stimulation was 35 Hz in one trial and 20 Hz in the other trial. During the stimulation, changes in the knee bend were observed and verified using a goniometer. The goniometer was used to identify the end-point of the experiment: the stimulation and MMG recording were stopped when the knee reached 30° flexion. The subject was given a 30 min recovery period between the two trials. The MMG signals obtained from the standing protocol were processed in the same way as in the isometric knee extension. **Figure 2** illustrates (a) the setup for the experiment and (b) the moment where the subject was approaching the fatigue point, which was the 30° knee bend.

The filtered MMG signal data were then processed to obtain the normalized RMS and ZC. The time taken for the RMS to drop to 70, 50, and 30% of the maximum RMS was taken for t-test comparison with the time taken for the knee bend to reach 30°. This was to determine if the RMS alone was sensitive enough to the changes in torque to maintain the knee angle above the 30° mark. The RMS and RMS-ZC data sets were then used as inputs for the respective ANN models to obtain the predicted torque.

A point where the gradient of the predicted output changed was selected as the critical point in both sets of predicted torque, to determine whether the two models consistently predicted the critical point at a similar time and predicted torque value. The time taken to reach the critical point was normalized to the range of 0–100% of stimulation time for all subjects because the overall experiment time differed for each trial, and the torque values at the critical point from each standing subject were used in a t-test to determine their significance. To determine the effectiveness of the ANN in predicting muscle torque and to compare the two types of input, a few hypotheses were established to test whether the behavior of the ANN in the standing protocol was similar to that in isometric knee extension.

TABLE 1 | Average correlation (R) and accuracy test for the two ANN models to predict torque during FES isometric knee extension.

TABLE 2 | *T*-test significance values for time to reach 30, 50, and 70% of MMG RMS drop compared to the time to 30° knee buckle.

The hypotheses were that (i) the initial predicted torque would be higher than the final predicted torque, (ii) the predicted torque output pattern would decrease throughout the stimulation, and (iii) the pattern of RMS and ZC before and after the 50% torque drop point would not be the same. To test the hypotheses, t-tests were used to obtain the P values for the following pairs: the initial and final predicted torque; the gradients of MMG RMS and MMG ZC before and after the point where the ANN models predicted a 50% torque drop, where a noticeable change in the gradients was expected once the predicted torque from each model had dropped by 50% from the maximum; and the gradient of the predicted torque. The statistical analysis was done using PSPP (1.0.1, GNU operating system, 2017). The results of the t-test for consistency between both models are presented in **Table 3**, while the hypothesis testing results are summarized in **Table 4**.
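The gradients compared before and after the predicted 50% torque drop can be estimated, for instance, as least-squares slopes of the two parts of a series. The sketch below is one possible way to do this in Python and is not taken from the study, which does not state how the gradients were computed; the function names and the synthetic RMS series are assumptions.

```python
def slope(values, dt=1.0):
    """Least-squares slope of `values` sampled every `dt` seconds."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are needed for a slope")
    times = [i * dt for i in range(n)]
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def slopes_around(values, split_index, dt=1.0):
    """Slopes of the series before and from `split_index` onward."""
    return slope(values[:split_index], dt), slope(values[split_index:], dt)


# Synthetic normalized RMS: steady decline followed by a plateau.
rms = [1.0, 0.8, 0.6, 0.4, 0.2, 0.2, 0.2, 0.2]
before, after = slopes_around(rms, split_index=4)
print(before, after)
```

Collecting such before and after slopes for every subject and trial would give the paired values that a t-test compares.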

### RESULTS

### Testing the ANN Model With Isometric FES Contraction to Predict Torque

The MMG data were processed into MMG RMS and MMG ZC and then normalized. The final MMG dataset is presented in **Figure 3**, while **Figure 4** illustrates the predicted output torque produced by the neural network model and the actual output torque measured by the dynamometer during the data collection part of the research, where Model 1 is the ANN model that uses RMS as input and Model 2 uses RMS-ZC as input. **Figure 3** shows that RMS gradually decreased from the maximum as the stimulation continued, and that ZC showed a dramatic increase in the frequency of muscle contraction after a certain period toward the end of the session. The gradient of the RMS decrease differed between the start and the end of the contraction.

#### Actual Torque and Predicted Torque From Isometric Contraction Testing

The accuracy of the ANN model in predicting torque was first tested on isometric knee extension prior to the standing experiment. The correlation and accuracy of the ANN model in predicting the torque for both subject 1 and subject 2 are presented in **Table 1**, which shows the mean accuracy and correlation for the two types of inputs.

### Testing the ANN Model in FES Standing Protocol to Predict Torque

A series of two-tailed t-tests was performed to determine whether the time taken for MMG RMS to drop to a certain level was significantly different from the time taken for the knee angle to reach 30° at the end of stimulation. The results from the t-tests are presented in **Table 2**.

**Figure 5** shows the predicted torque, which was the output from the ANN model, where model 1 was based on RMS as input and model 2 was from the RMS-ZC input. Both torque series mostly satisfied the set hypotheses where (i) the initial predicted torque was higher than the final predicted torque, (ii) the predicted torque output pattern descended throughout the stimulation in most cases, and (iii) the gradient of RMS and ZC before and after the 50% torque drop point were different.

The results from t-test statistical analysis of the standing protocol based on the said hypotheses are shown in **Tables 3**, **4**.

### DISCUSSION

This study sought to investigate the practicality of using ANN models to predict the knee extension torque during isometric contraction and standing stance using RMS and RMS-ZC as inputs to the ANN. The testing on isometric knee extension revealed that the ANN model used to predict muscle torque from the MMG signal of the quadriceps muscle was reliable. The RMS-ZC input ANN model showed a higher accuracy than the RMS input ANN model, which suggests that in isometric knee extension, RMS-ZC was more suitable than RMS alone as input to the ANN model. This also suggests that an ANN is a feasible strategy to predict torque without the need for a dynamometer. However, when the frequency of the stimulation is increased, the initial frequency of the MMG would also increase. This can be seen in **Figure 3**, whereby when the muscle is fatigued there is a rise in the initial frequency.

TABLE 3 | Summary of the *t*-test done for time to reach a critical point (RMS and RMS-ZC) and the predicted torque at a critical point (RMS and RMS-ZC).

TABLE 4 | Summary of t-test statistical analysis for standing protocol from devised hypotheses.

The effect of pulse width on the MMG or fatigue was not studied in this research, however other literature suggested that the pulse width has no significant effect on the muscle fatigue but it affects the maximum muscle force production (Jailani and Tokhi, 2012).

The higher accuracy from the RMS-ZC input was due to an increase of the ZC value past ∼50% of maximum knee torque. This occurred because SCI muscle is more fatigable than able-bodied muscle, especially during low-frequency FES (Mahoney et al., 2007), which could be explained by the transformation of slow twitch muscle fibers to fast twitch muscle fibers (Bickel et al., 2004). The transformation explains the ZC graph, where an increase in the number of contractions accompanies a decrease in the torque recorded by the dynamometer.

From **Table 3**, the t-test result of P = 0.93 indicated no significant difference between the time taken for the predicted knee torque output pattern to reach the point where there are significant changes to the pattern of the actual knee torque obtained from the Biodex dynamometer. The value of the predicted torque at the critical time from both models was not significantly different from the value of the torque obtained from the dynamometer, with a P value of 0.33. This indicated that in general both models performed with a consistent level of prediction.

Individually, for the first hypothesis in the standing protocol, which stated that the initial predicted torque was significantly different from the final predicted torque, the outcomes of both the RMS input and RMS-ZC input ANN models were significantly different (P < 0.01). The difference was due to the rapid muscle fatigue, which led to a decrease of RMS and an increased frequency of muscle contraction, based on the findings in isometric knee extension (Barry et al., 1985).

The second hypothesis stated that at the point where the ANN predicted 50% quadriceps torque or lower, there would be a significant change in the pattern of RMS, where the RMS decreased at a steeper slope and ended up plateauing (gradient near 0). However, the t-test on the gradient of ZC before and after the predicted 50% torque drop showed no significant difference for either the RMS input or the RMS-ZC input model (P<sub>RMS</sub> = 0.18, P<sub>RMS-ZC</sub> = 0.66). Unlike the isometric knee extension protocol, the standing protocol did not stabilize the legs, which allowed the legs to move, and this movement possibly caused the changes in the ZC value.

The third hypothesis was that the gradient of predicted torque for both ANN models would decrease throughout the experiment. The RMS input showed a slightly more significant difference than the RMS-ZC input. Although **Table 3** shows that both models were generally equally consistent in predicting the torque, the RMS input showed better reliability in predicting muscle fatigue than the RMS-ZC input because RMS was less disturbed by leg movement. However, the ZC input was able to provide a frequency-domain view of the muscle contraction, as an increased number of contractions indicated the recruitment of fast twitch muscle fibers, which have less endurance to fatigue than slow twitch fibers (Karlsson et al., 1981). Additionally, as shown in **Table 3** there was a significant difference between the time taken for the MMG RMS to drop to a selected level and the time for the muscle to become fatigued and unable to maintain quiet standing. This assumption enabled the ANN to be more useful in predicting the torque at higher accuracy. With both ZC and RMS, a better model can be developed that combines the temporal and spectral domains of the muscle signal.

At the end of the evoked standing session, the irregular torque predicted by the models, as illustrated in **Figure 5**, could be due to the effect of gravity during standing. The biomechanics of standing is illustrated in **Figure 6**. We hypothesized that the amplified torque due to gravity and the increased distance (d) between the knee joint and the ground reaction force affected the MMG responses. A study with a similar protocol and SCI subjects supported this hypothesis: when the knee started to buckle, the MMG amplitude started to increase. The graph from that experiment is shown in **Figure 7** (Mohd Rasid, 2017). However, a biomechanical study with a setup that includes a ground reaction force plate and a 3D camera system is required to confirm this.

This research was limited in that the ANN model to predict the torque was analyzed only during quiet standing and isometric knee extension. Future studies should include a wider range of movement patterns such as the sit-to-stand movement, which is another movement in which knee torque cannot be measured. Different types of inputs such as PTP and ARV in the time domain and MP in the frequency domain could be investigated, as well as different types of machine learning models such as the support vector machine (SVM). This research also focused on a specific set of parameters for the FES. To our knowledge, there has not been any investigation of an ANN model that is trained to predict torque in an FES standing experiment using MMG. Hence, this study has demonstrated that an ANN model is feasible for predicting torque during isometric knee extension and FES standing. We hope that this study will be used as the basis for the development of a real-time ANN model to predict torque and thus may contribute to the improvement of automated FES control during rehabilitation in SCI.

### AUTHOR CONTRIBUTIONS

MD wrote the manuscript supervised by NAH and with critical feedback from GD and NH. The experiment procedure was conceived by GD, NH and NAH. Isometric FES contraction was done by MD and the standing protocol was done by MD, NAH, and NH. The signal processing and analysis were done by MD and NAH.

### FUNDING

This project was supported by the Ministry of Higher Education, Malaysia through HIR Grant no. UM.C/HIR/MOHE/ENG/39 and the University of Malaya research grant UMRG Grant no. RP035A-15HTM.

### ACKNOWLEDGMENTS

We thank the SCI volunteers who participated in this study, the lab assistants and the physiotherapists; Mr. Muhammad Nur Hakim Nadzri, Mr. Hazim Fadzil and Mr. Syuaib for their assistance in carrying out the experiment protocol.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dzulkifli, Hamzaid, Davis and Hasnan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Rapid Decoding of Hand Gestures in Electrocorticography Using Recurrent Neural Networks

Gang Pan<sup>1,2</sup>, Jia-Jun Li<sup>2</sup>, Yu Qi<sup>2</sup>\*, Hang Yu<sup>2</sup>, Jun-Ming Zhu<sup>3</sup>, Xiao-Xiang Zheng<sup>4</sup>, Yue-Ming Wang<sup>4</sup> and Shao-Min Zhang<sup>4</sup>\*

*<sup>1</sup> State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China, <sup>2</sup> College of Computer Science and Technology, Zhejiang University, Hangzhou, China, <sup>3</sup> Department of Neurosurgery, The Second Affiliated Hospital of Zhejiang University, Hangzhou, China, <sup>4</sup> Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China*

Brain-computer interface (BCI) is a direct communication pathway between the brain and external devices, and BCI-based prosthetic devices are promising to provide new rehabilitation options for people with motor disabilities. Electrocorticography (ECoG) signals contain rich information correlated with motor activities, and have great potential in hand gesture decoding. However, most existing decoders use long time windows, and thus ignore the temporal dynamics within the period. In this study, we propose to use recurrent neural networks (RNNs) to exploit the temporal information in ECoG signals for robust hand gesture decoding. With the RNN's ability to model high nonlinearity, our method can effectively capture the temporal information in ECoG time series for robust gesture recognition. In the experiments, we decode three hand gestures using ECoG signals of two participants, and achieve an accuracy of 90%. Specifically, we investigate the possibility of recognizing the gestures in a time interval as short as possible after motion onsets. Our method rapidly recognizes gestures within 0.5 s after motion onsets with an accuracy of about 80%. Experimental results also indicate that the temporal dynamics are especially informative for effective and rapid decoding of hand gestures.

#### Edited by:

*Dingguo Zhang, Shanghai Jiao Tong University, China*

#### Reviewed by:

*Andrey Eliseyev, CEA LETI, France*

*Xie Tao, Shanghai Jiao Tong University, China*

#### \*Correspondence:

*Shao-Min Zhang shaomin@zju.edu.cn*

*Yu Qi qiyu@zju.edu.cn*

#### Specialty section:

*This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience*

Received: *20 January 2018* Accepted: *20 July 2018* Published: *27 August 2018*

#### Citation:

*Pan G, Li J-J, Qi Y, Yu H, Zhu J-M, Zheng X-X, Wang Y-M and Zhang S-M (2018) Rapid Decoding of Hand Gestures in Electrocorticography Using Recurrent Neural Networks. Front. Neurosci. 12:555. doi: 10.3389/fnins.2018.00555*

Keywords: brain-computer interface, electrocorticography, neural prosthetic control, neural decoding, motor rehabilitation

### 1. INTRODUCTION

Brain-computer interface (BCI) is a direct communication pathway between the brain and external devices (Wolpaw et al., 2002). BCI systems do not depend on peripheral nerves and muscles, and thus have great potential to provide new rehabilitation options to patients with motor disabilities (Daly and Wolpaw, 2008), toward the big vision of cyborg intelligence (Wu et al., 2013, 2016; Yu et al., 2016). Electrocorticography (ECoG)-based BCI systems, i.e., the semi-invasive BCIs, have better long-term stability than invasive BCIs (Pilcher and Rusyniak, 1993), although the neural spikes recorded by the latter (Qian et al., 2018; Xing et al., 2018) have high temporal resolution. ECoG signals also contain richer information than traditional non-invasive signals such as EEG (Blankertz et al., 2004; Sun et al., 2016). ECoG-based BCIs have thus been considered an ideal option for applications such as neural prosthesis control (Leuthardt et al., 2004; Schalk et al., 2008).

A key problem in BCI-based neural prosthesis control is decoding movement intentions from brain signals. Hand gestures convey rich information in communication, and hand gesture decoding has attracted a lot of attention recently. Most existing hand gesture decoding approaches fall into two categories: finger movement regression and hand gesture classification. Some typical studies on hand gesture decoding are summarized in **Table 1**. Finger movement regression approaches aim to predict the flexion trajectories of individual fingers (Kubánek et al., 2009; Miller et al., 2012, 2014; Xie et al., 2018). However, the flexion trajectories in those studies were generated by the movement of a single finger. Very few studies have tried to decode the flexion trajectories of fingers when multiple fingers move simultaneously (Acharya et al., 2010). According to several finger movement decoding studies, the sites of useful ECoG signals are located separately in space for different fingers (Miller et al., 2012, 2014). When multiple fingers move simultaneously, although the mixed signals of multiple finger movements can be recorded by ECoG electrodes, the temporal overlap and spatially sparse sampling make decoding difficult.

Instead of predicting the flexion trajectories of fingers, hand gesture classification directly treats hand posture decoding as a classification problem, which is more straightforward as a practical solution for prosthesis control. Yanagisawa et al. (2011) proposed a real-time decoding system to classify three hand gestures with a linear classifier. Chestek et al. (2013) proposed to use a naive Bayes decoder to effectively classify five hand postures from ECoG signals. These approaches showed the strength of ECoG signals in hand gesture classification; however, they extracted features using statistics over a long time window, and thus ignored the dynamics in time. Since performing a gesture is a process, temporal information in ECoG signals contains potential information for decoding. To capture the temporal information, Bleichner et al. (2016) and Branco et al. (2017) proposed a temporal template matching method to decode four gestures from ECoG signals, and Li et al. (2017) proposed an SVM-based short-term window approach to further explore the information in time. With short-term time windows, the temporal patterns of different gestures can be characterized, which provides useful information to improve the accuracy of gesture decoding. However, the sequential relationship among windows was not explicitly modeled for accurate decoding. How to further exploit the underlying temporal patterns and structures in ECoG signals to improve gesture decoding remains an open problem.

In this study, we propose an RNN-based decoder to accurately recognize hand gestures in ECoG signals. To capture the underlying temporal information in ECoG signals, we propose to use gated RNN models, i.e., long short-term memory (LSTM) models, to learn the temporal patterns of different gestures. The LSTM model can sequentially update the gates in memory cells to determine which features in the preceding windows should be considered for gesture decoding. To benefit temporal pattern learning, our method selects the most temporally informative features as input to the LSTM decoder. Specifically, we evaluate the features in different channels and frequencies by their decoding performance in temporal pattern representation, and select the optimal features using a greedy strategy. Experimental results of two subjects show that our method outperforms other methods, with an accuracy of 90% in recognizing three gestures. Moreover, we investigate the possibility of recognizing the gestures in a time interval as short as possible after motion onsets. The motion intents can be rapidly recognized within 0.5 s after motion onset. Our method achieves high motion recognition performance with quick response, and is promising for online BCI control of prosthetic and robotic devices.

### 2. METHODS

The framework of our method is shown in **Figure 1**. In our approach, the ECoG signals are first divided into sequential short-time segments, and power spectrum features are extracted from each segment. Then we select the most informative signal channels along with the frequency bands using a greedy strategy, to compose compact features for decoding. Finally, the features of the segments are sequentially put into an RNN-based decoder for gesture recognition.

*<sup>a</sup>The movement onset time is regarded as time 0.*

*<sup>b</sup>American Sign Language finger spelling alphabet D, F, V and Y, respectively.*

*<sup>c</sup>CC is the abbreviation of correlation coefficients.*

\* *LH, Left hemisphere; RH, Right hemisphere.*

### 2.1. Experimental Paradigm and Data Collection

#### 2.1.1. Subjects

The participants in this study were patients with intractable epilepsy, who had temporary intracranial electrode arrays implanted for surgical purposes. The configuration and location of the electrodes were determined by clinical requirements. The clinical electrodes were platinum electrodes with a diameter of 4 mm (2.3 mm exposed), spaced at 10 mm, and were generally implanted only for a period ranging from several days up to 2 weeks. **Table 2** and **Figure 2** present the information and implantation details of each participant. During the task, the participants temporarily stopped taking their epilepsy medication under the supervision of doctors. All participants went through the routine clinical examination of motor, sensory, language, and other functions through cortical stimulation mapping (CSM), which helped to further localize the electrodes functionally. In addition, combined with preoperative MRI examination, computed tomography (CT) scans were used to further confirm the location of the electrodes after the implantation surgery, and none of the hand motor areas were in seizure onset zones for either participant. All procedures followed the guidelines of, and were approved by, the Second Affiliated Hospital of Zhejiang University, China. Participants gave written informed consent after detailed explanation of the potential risks of the research experiment.

FIGURE 3 | Behavior task paradigm. A trial was initiated by a red cross displayed at the center of the screen along with a verbal cue of "ready". After a short delay, the red cross disappeared and a gesture cue appeared on the screen, and the participant performed the given gesture and held it until the red dot appeared.

#### 2.1.2. Experimental Paradigms

In the experiment, the participants were asked to perform three kinds of hand gestures ("scissors," "rock," and "paper") guided by the cues presented on the screen. As shown in **Figure 3**, a trial began with a verbal cue of "ready," while a cross sign was displayed at the center of the screen. The cross sign indicated that the participants should relax the task hand and be prepared. During the relax stage, the participants were asked to relax their task hands and flex the fingers slightly with their palms facing up. The relax stage lasted for a random duration of 2–2.5 s. After the relax stage, the cross sign was replaced by a picture of a randomly selected gesture, and the task stage began. In the task stage, the participants were asked to perform the given gesture instantly, and hold the gesture until a red circle (stop cue) appeared. The task stage lasted for a random duration of 2–3 s. When the stop cue showed, the participants were to release the gesture and relax the task hand. At the end of each trial, verbal feedback of "correct" or "wrong" was given by the experimenter to tell the subjects whether it was an eligible trial or not.

During the experiment, if participants failed to hold the gestures until the stop cue, or forgot to release the gestures, the trial was considered to be invalid. The failed trials were then removed from the dataset. Each session contained three blocks, and each block was composed of 50 trials. For both participants P1 and P2, a total of five sessions were involved in the experiment. The participants would have a short break between the blocks. In practice, the number of trials and the duration of each break depended on the medical condition and the willingness of the participants. Experiments were carried out to evaluate the behavioral compliance of the participants by analyzing the finger trajectories after movement onsets. As shown in **Figure 4**, the finger movement trajectories are consistent within the same gestures with small variance (denoted by the thickness of the line). We further analyze the trajectories by clustering after t-distributed stochastic neighbor embedding (t-SNE). As shown in **Figure 4**, the gestures are discriminative for both participants. The results verify the compliance of the behavior task.

#### 2.1.3. Data Acquisition

The ECoG signals were collected at the Second Affiliated Hospital of Zhejiang University. The NeuroPort system (128 channels, Blackrock Microsystems, Salt Lake City, UT) was used to record clinical ECoG signals from subdural electrode grids. The recorded signals were stored continuously during the whole task at a sampling rate of 2 kHz and low-pass filtered with a cutoff frequency of 500 Hz. The hand movement data were collected by a 5DT data glove with 14 sensors (5DT Inc., USA), each of which recorded finger flexion values simultaneously. To mark the onset time of each movement, we defined the onset as the moment when the first derivative of the flexion values exceeded a specific threshold for five consecutive samples. In order to synchronize the neural signals and the motor data, we marked the timestamps of each cue in the ECoG signal recordings using the event channel of the NeuroPort system.
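The onset rule described above can be illustrated with a minimal sketch. The function name `detect_onset`, the use of a single flexion trace, the one-sided comparison, and the synthetic data are all assumptions made for illustration; the source does not specify the threshold value or how the 14 glove sensors were combined.

```python
import numpy as np

def detect_onset(flexion, threshold, n_consecutive=5):
    """Return the index where movement onset is detected, or None.

    Onset is taken as the start of the first run of `n_consecutive`
    first-derivative values that all exceed `threshold`.
    """
    # First derivative approximated by the sample-to-sample difference.
    velocity = np.diff(np.asarray(flexion, dtype=float))
    run = 0
    for idx, v in enumerate(velocity):
        run = run + 1 if v > threshold else 0
        if run == n_consecutive:
            return idx - n_consecutive + 1
    return None

# Synthetic glove trace (hypothetical): flat, then a steady ramp.
trace = np.concatenate([np.zeros(20), np.arange(1, 31) * 0.1])
onset = detect_onset(trace, threshold=0.05)
```

Requiring several consecutive supra-threshold values, rather than one, makes the detection less sensitive to isolated sensor noise.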

#### 2.2. Segmentation and Feature Extraction

After data acquisition, both ECoG signals and movement signals are continuous. According to the event timestamps recorded synchronously with the signals, the valid trials could be located and preserved for gesture decoding. Each trial contains three timestamps of events: gesture cue start, hand motion onset (indicated by the glove signals), and gesture cue stop.

For each trial, the ECoG signals between "hand motion onset" and "gesture cue stop" are adopted for gesture decoding. The raw ECoG signals are first processed by a common average reference spatial filter for noise removal. For each channel, we calculate the average value of the data over the whole session, and then subtract the average from the raw signals. After filtering, a sliding window is adopted to divide the signals in each trial into small temporal segments. In accordance with previous work (Li et al., 2017), we use a window with a length of 300 ms and a stride of 100 ms. With the temporal segments, the dynamics during the movement stage can be preserved for further decoding.
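As a rough sketch of the segmentation step, the following code subtracts the per-channel mean and cuts a trial into 300 ms windows with a 100 ms stride at the stated 2 kHz sampling rate. The function names and the random test data are hypothetical, and the mean is taken over the given array rather than a whole recording session.

```python
import numpy as np

def remove_channel_mean(signal):
    """Subtract each channel's mean; signal has shape (channels, samples)."""
    return signal - signal.mean(axis=1, keepdims=True)

def sliding_segments(signal, fs=2000, window_ms=300, stride_ms=100):
    """Split a (channels, samples) array into overlapping segments.

    Returns an array of shape (n_segments, channels, window_samples).
    """
    win = int(round(fs * window_ms / 1000))
    step = int(round(fs * stride_ms / 1000))
    starts = range(0, signal.shape[1] - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

# 1.2 s of fake data for 4 channels at 2 kHz.
fake_trial = np.random.default_rng(0).standard_normal((4, 2400))
centered = remove_channel_mean(fake_trial)
segments = sliding_segments(centered)
```

With these settings, 1.2 s of signal yields ten segments of 600 samples each, which matches the ten temporal segments used later in the experiments.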

Then, the power spectral density (PSD) is estimated for each temporal segment. The PSD is calculated using Welch's algorithm (Welch, 1967). Since the range of the power in different frequency bands could be different, normalization is required. In our method, we adopt the ECoG signals in the relax stage to provide the baseline for normalization. For each channel, we first calculate the average PSD of all the data segments obtained in the relax period:

$$\bar{R}_{c,f} = \frac{1}{N_{\text{relax}}} \sum_{i=1}^{N_{\text{relax}}} R_{c,f}(i), \tag{1}$$

where $R_{c,f}(i)$ is the PSD of channel $c$ and frequency $f$ in the relax segment $i$, and $N_{\text{relax}}$ is the total number of segments in the relax stages. Then the PSD of the task signals can be normalized by dividing by the respective value in $\bar{R}$:

$$S_{c,f}(i) \leftarrow \frac{S_{c,f}(i)}{\bar{R}_{c,f}}, \quad i = 1, 2, \dots, N_{\text{task}}, \tag{2}$$

where $S_{c,f}(i)$ is the PSD of channel $c$ and frequency $f$ in the task segment $i$, and $N_{\text{task}}$ is the total number of task segments.
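The baseline normalization of Equations (1) and (2) can be sketched as follows. The array layout (segments, channels, frequencies) and the function name `normalize_by_relax` are assumptions for illustration, and the tiny arrays are made-up numbers, not data from the study.

```python
import numpy as np

def normalize_by_relax(task_psd, relax_psd):
    """Divide task PSD by the mean relax-stage PSD.

    task_psd:  (n_task_segments, channels, freqs)
    relax_psd: (n_relax_segments, channels, freqs)
    """
    baseline = relax_psd.mean(axis=0)   # mean over relax segments, Equation (1)
    return task_psd / baseline          # Equation (2), broadcast over task segments

relax = np.array([[[1.0, 2.0]], [[3.0, 6.0]]])   # 2 segments, 1 channel, 2 freqs
task = np.array([[[4.0, 8.0]]])                  # 1 segment
normalized = normalize_by_relax(task, relax)
```

After this step, a value of 1 means the task power equals the resting baseline for that channel and frequency, so bands with very different absolute power become comparable.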

After normalization, we aggregate the PSD values in frequency bands. According to previous studies (Li et al., 2017), a total of five frequency bands are used: a low-frequency band (4–12 Hz), beta frequency band (12–40 Hz), low gamma frequency band (40–70 Hz), high gamma frequency band (70–135 Hz) and a high frequency band (135–200 Hz). For each frequency band, we calculated the average PSD for each channel:

$$\bar{S}_{c,t,F} = \frac{1}{|F|} \sum_{f \in F} S_{c,t}(f), \tag{3}$$

where $\bar{S}_{c,t,F}$ is the average PSD of the $t$-th segment in band $F$ for channel $c$, $S_{c,t}(f)$ is the normalized PSD of channel $c$ at frequency $f$ in segment $t$, and $|F|$ is the total number of frequencies in band $F$.
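A minimal sketch of the band aggregation in Equation (3), using the five bands listed above. Treating each band as a half-open interval (so that a boundary frequency is counted once) and the 1 Hz frequency grid are assumptions; the source does not state how boundary bins were handled.

```python
import numpy as np

# Band edges in Hz, as listed in the text.
BANDS_HZ = [(4, 12), (12, 40), (40, 70), (70, 135), (135, 200)]

def band_average(psd, freqs, bands=BANDS_HZ):
    """Average PSD values within each frequency band.

    psd:   (..., n_freqs) normalized PSD values
    freqs: (n_freqs,) frequency of each PSD bin in Hz
    Returns an array of shape (..., n_bands).
    """
    freqs = np.asarray(freqs)
    out = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)   # half-open interval (assumption)
        out.append(psd[..., mask].mean(axis=-1))
    return np.stack(out, axis=-1)

freqs = np.arange(0, 201)                # hypothetical 1 Hz bins from 0 to 200 Hz
psd = np.ones((10, 3, freqs.size))       # 10 segments, 3 channels
psd[..., 70:135] = 2.0                   # raise the high gamma bins
features = band_average(psd, freqs)
```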

Finally, we put the features extracted from the small temporal segments of a trial into a matrix with t rows and n columns as an input sequence, where t is the number of windows and n is the number of features. Each input sequence thus contains t time steps, with n features at each time step. This arrangement allows us to feed the features into the RNN-based model in a recurrent way, which better characterizes temporal information by preserving the sequential information in short-term windows.
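The difference between the sequential input and a flattened vector can be illustrated with a short sketch. The `(channel, band)` pairs below are hypothetical placeholders, not the features reported in the study, and `build_sequence` is an illustrative helper name.

```python
import numpy as np

def build_sequence(band_features, selected):
    """Assemble the t x n input matrix for one trial.

    band_features: (t, channels, bands) band-averaged PSD per window
    selected:      list of (channel, band) index pairs
    """
    chans, bands = zip(*selected)
    # One row per time window, one column per selected feature.
    return band_features[:, list(chans), list(bands)]

rng = np.random.default_rng(0)
band_features = rng.random((10, 32, 5))            # 10 windows, 32 channels, 5 bands
selected = [(3, 3), (3, 4), (7, 3), (12, 4), (20, 3), (21, 1)]  # hypothetical picks
sequence = build_sequence(band_features, selected)  # sequential input, shape (10, 6)
flat = sequence.reshape(-1)                         # vector input, shape (60,)
```

The matrix form keeps the window order explicit for a recurrent model, whereas the flattened form is what a vector-based classifier would receive.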

#### 2.3. Gesture Recognition

Since the electrode placement was determined by surgical requirements, most channels are unrelated to hand motor activities. The unrelated signals can introduce noise into gesture decoding and cause unnecessary computational costs. Therefore, an effective feature selection strategy is applied to choose the most informative features for effective and efficient gesture recognition.

#### 2.3.1. Feature Selection

In feature selection, we adopt a greedy strategy-based method to select the most informative channels along with the frequency bands. The greedy strategy works in an iterative manner. First, we choose the feature with the highest decoding performance using an SVM classifier, and put it into the selected set. Then, at each step, we choose the one candidate feature that improves accuracy the most when combined with the selected features, and add it to the selected set. Since the candidate feature is evaluated together with the selected features, redundant features are not likely to be selected. The iteration stops when the requested number of features is reached or there is no improvement in decoding performance after adding the newly selected feature. The greedy feature selection strategy is presented in Algorithm 1.
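A minimal sketch of forward greedy selection is given below. It takes an arbitrary scoring function in place of the SVM accuracy used in the study, and the feature names, gains, and penalty in `toy_score` are invented for illustration. This variant also stops before adding a feature that brings no improvement, which differs slightly from adding it and then stopping.

```python
def greedy_select(candidates, score, max_features=None):
    """Forward greedy selection.

    candidates: iterable of feature identifiers
    score:      callable mapping a list of features to an accuracy-like number
    """
    remaining = list(candidates)
    selected = []
    best = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every candidate together with the already selected features.
        pick = max(remaining, key=lambda f: score(selected + [f]))
        pick_score = score(selected + [pick])
        if pick_score <= best:
            break  # no improvement from the best remaining candidate
        selected.append(pick)
        remaining.remove(pick)
        best = pick_score
    return selected, best

# Hypothetical stand-in for cross-validated decoding accuracy.
GAIN = {"ch3_highgamma": 0.60, "ch7_highgamma": 0.20, "ch3_beta": 0.05, "ch9_low": 0.0}

def toy_score(subset):
    # Sum of individual gains, with a made-up penalty for sets larger than two.
    return sum(GAIN[f] for f in subset) - 0.1 * max(0, len(subset) - 2)

selected, best = greedy_select(GAIN, toy_score)
```

Because each candidate is scored jointly with the current set, a feature that is individually strong but adds nothing new will not be picked.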

#### 2.3.2. Recurrent Neural Network-Based Gesture Recognition

After feature selection, the feature representation of a task trial can be denoted as $\{x_1, x_2, \dots, x_t\}$, where $x_i$ is the feature vector of the $i$-th temporal segment. The feature representation carries rich information in both spectral and temporal dynamics for gesture recognition. Since most classifiers require inputs in the form of vectors, the decoders based on such classifiers need to concatenate the temporal features into a vector. This procedure loses the temporal structure of the data, and thus leads to inaccurate decoding.

#### **Algorithm 1** Greedy Feature Selection

**Input:** Feature matrix F containing N samples of feature vectors $\{f_i\}_{i=1}^{N}$

**Output:** Selected feature list l

```
Step 0: Initialization
    l ← [arg max_i P(f_i)]            put the feature with the best accuracy into list l
    B ← 0                             best accuracy
    LB ← P(f_l)                       local best accuracy
    delete the selected f_i from the candidates

Step 1: Greedy feature selection
    while LB > B do
        B ← LB
        l ← l + [arg max_i P(<f_i, f_l>)]
        LB ← P(f_l)
        delete the selected f_i from the candidates
    return l
```

Here P(·) denotes the decoding accuracy obtained with the given features, and f_l denotes the features currently in list l.

The RNN-based method overcomes this problem by inputting data in a recurrent way. As shown in **Figure 5**, the feature vectors are sequentially put into the model, and the temporal information could be well preserved by the temporal connections. In our method, the LSTM model is adopted (Hochreiter and Schmidhuber, 1997):

$$\begin{aligned}
i(t) &= \sigma(W_i x(t) + U_i h(t-1) + b_i), \\
f(t) &= \sigma(W_f x(t) + U_f h(t-1) + b_f), \\
o(t) &= \sigma(W_o x(t) + U_o h(t-1) + b_o), \\
c(t) &= i(t) \odot \tanh(W_c x(t) + U_c h(t-1) + b_c) + f(t) \odot c(t-1), \\
h(t) &= o(t) \odot \tanh(c(t))
\end{aligned} \tag{4}$$

where x(t) is the feature vector at the t-th time window, σ(·) is the sigmoid function, c(t) is the memory cell, h(t) denotes the hidden layer units, and i(t), f(t), and o(t) are the input gate, forget gate, and output gate, respectively. The recognition result is output from the model after the last time window. The memory cell can remember useful information through time, and the gates control how many time windows should be used for the current gesture recognition task. Therefore, in the LSTM model, temporal information can be well preserved for accurate gesture decoding.
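To make the recurrence in Equation (4) concrete, the following sketch runs the forward pass over one input sequence with plain NumPy. The random weight initialization, the helper names, and the input data are assumptions for illustration only; the actual model was trained with Keras, and the classification layer placed on top of the final hidden state is not specified here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the four gate/cell blocks (illustrative only)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifoc":
        p["W" + g] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p

def lstm_forward(xs, p, n_hidden):
    """Apply Equation (4) to a sequence xs of shape (t, n); return the final h."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in xs:
        i = sigmoid(p["Wi"] @ x + p["Ui"] @ h + p["bi"])   # input gate
        f = sigmoid(p["Wf"] @ x + p["Uf"] @ h + p["bf"])   # forget gate
        o = sigmoid(p["Wo"] @ x + p["Uo"] @ h + p["bo"])   # output gate
        c = i * np.tanh(p["Wc"] @ x + p["Uc"] @ h + p["bc"]) + f * c
        h = o * np.tanh(c)
    return h

params = init_params(n_in=6, n_hidden=32)
xs = np.random.default_rng(1).random((10, 6))   # 10 time steps, 6 features
h_final = lstm_forward(xs, params, n_hidden=32)
```

The same weights are reused at every time step, so the number of parameters does not grow with the number of windows.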

### 3. RESULTS

In this section, experiments are carried out to evaluate the gesture decoding performance of our method. Firstly, we examine and analyze the decoding performance of the features selected by different kinds of strategies. Secondly, we test the RNN model with different settings to select the optimal parameters for gesture decoding. After that, the RNN-based decoder is compared with four other competitors to demonstrate the advantages of our method. Finally, we investigate the decoding performance in a time interval as short as possible after motion onsets for rapid gesture recognition. The RNN model is implemented with Keras on the top of TensorFlow.

In the experiment, we rejected the trials with movement artifacts or electrode failures by visual inspection. After removing invalid trials, the dataset includes 243 samples for P1 and 394 samples for P2. In our study, there are a total of three classes of gestures: "rock," "scissors," and "paper."

### 3.1. Feature Analysis

In this section, we analyze the features extracted from the ECoG signals. Firstly, we evaluate the feature selection strategy and assess its influence on the gesture recognition performance. Then, experiments are designed to find the appropriate number of features to be applied in gesture recognition. After that, the channels and frequency bands selected are presented and analyzed.

The performance of the greedy-based feature selection is evaluated in comparison with other methods. Firstly, we evaluate the gesture recognition performance using all the channels and frequency bands by the SVM classifier to serve as the baseline in the experiment. Then, an optimal-based feature selection strategy, which independently selects the top N features with the best decoding performance, is implemented and compared. The settings of competitors in this experiment are as follows:


In this experiment, the signals are divided into temporal segments using a 300 ms sliding window with a stride of 100 ms, and a total of 10 temporal segments following the movement onsets are used. The performance is presented as the average accuracy of 3-fold cross-validation. In the gesture classification evaluation, we apply 10-fold cross-validation; for each fold, we randomly select 20% of the training dataset as the validation dataset to select the hyper-parameters.

As shown in **Figure 6**, we compare the feature selection strategies by gesture recognition accuracy. Results show that the baseline method using all the features obtains high performance. With the feature selection strategies, performance close to the baseline can be achieved using only a small set of features. This is because the useless channels could introduce noise into classification. Besides, the large number of features (both P1 and P2 have 32 signal channels; the total feature number is the product of the number of channels, the number of frequencies, and the number of temporal segments) leads to high computational costs.

We also compare the performance using a different number of features. Compared with the optimal-based strategy, the greedy strategy achieves better performance on both of the participants. In the greedy strategy, since the candidate feature is evaluated together with the selected features, redundant features are not likely to be selected, and thus more informative feature sets could be obtained.

the points that the greedy algorithm stops, and the performance converges after the points.

Here, we present the statistical analysis of the channels and the frequency bands selected by our method. The feature distribution of frequency bands is shown in **Figure 7**, which shows that the most useful bands are 70–135 Hz and 135–200 Hz. The results indicate that high frequency bands in ECoG are highly correlated to hand motions, which is in agreement with previous studies (Bleichner et al., 2016; Branco et al., 2017).

For the number of features, we only used the first six features selected by the greedy algorithm. This is because, although using more features can still improve performance as in **Figure 6**, the electrodes selected later do not bring much improvement. Besides, since the dataset is small, a slight improvement may be caused by overfitting rather than useful information. The channels and their corresponding frequency bands are shown in **Table 3**. The corresponding electrodes for the features are illustrated in **Figure 2**. Most of the selected electrodes are close to the central sulcus and within the sensorimotor region, which is in accordance with existing studies (Li et al., 2017).

### 3.2. Performance of Gesture Recognition

In this section, we evaluate the decoding performance of our method. First, in order to maximize the performance of the classifiers, experiments are carried out on the validation dataset to select the optimal model settings. Second, we compare our method with other decoders to demonstrate the effectiveness of temporal information, and the ability of the RNN in ECoG time series decoding.

### 3.2.1. Model Selection

Experiments are carried out to select the optimal setting for the LSTM RNN model. For the LSTM model, one important setting is how many hidden units are used. Models with a small set of hidden units may not be useful to encode the information, while models with large sets of hidden units are prone to overfitting.

TABLE 3 | The channels and frequency bands selected by the greedy-based strategy.

the number of the selections in each frequency band.

In this experiment, we vary the number of hidden units from 8 to 128 to test the performance of the LSTM model. We use the top six features selected by the greedy strategy, and the settings of temporal segments are the same as in section 3.1. As shown in **Figure 8**, the LSTM model with 32 hidden units achieved the best performance (90.56% for P1 and 88.18% for P2) for both participants on the validation dataset. Therefore, we use 32 hidden units for gesture recognition in our decoder. In model training, we use the Adam optimization algorithm; the learning rate was set to 0.001 with a decay rate of 0.0005 for each epoch. Early stopping was applied by selecting the epoch with the best performance on the validation set.

### 3.2.2. Comparison With Other Methods

In this experiment, a comparison is carried out between our method and other decoders. We first compare our method with decoders using long time windows to evaluate the effectiveness of temporal information. Then our decoder is compared with other classifiers to demonstrate the strength of RNN models in sequential modeling. For the competitors, we carefully select typical segment-based ECoG/EEG classification approaches from the existing studies, including linear and nonlinear methods. For the linear method, we choose the widely used logistic regression method as in Subasi and Erçelebi (2005). For the nonlinear method, we choose the classical SVM classifier with RBF kernel as in Li et al. (2017). We also compare the segment-based approaches with the method using long time windows to show the effectiveness of temporal information. In order to demonstrate the effectiveness of the recurrent structure, we compare our method with an MLP-based approach as in Chatterjee and Bandyopadhyay (2016), to evaluate the advantage of weight sharing in RNN models. In this experiment, the signals are divided into temporal segments using a 300 ms sliding window with a stride of 100 ms. A total of 10 temporal segments are used. Thus, each input sequence contains 10 time steps, with 6 features at each time step, for our RNN model.

In this experiment, we evaluate our method in comparison with other methods using a permutation test. In each permutation trial, we randomly select 10% of the data for test, and run a total of 500 trials. We also examine the significance of the results using paired t-test.
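The repeated random hold-out evaluation described above (10% of the data held out, 500 repetitions) can be sketched as follows. The majority-class `evaluate` callback and the synthetic labels are placeholders standing in for a real decoder and dataset; only the sample count of 243 is taken from the text.

```python
import random
from collections import Counter

def repeated_holdout(labels, evaluate, n_trials=500, test_fraction=0.1, seed=0):
    """Repeat a random train/test split and collect one accuracy per repetition."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = max(1, round(len(labels) * test_fraction))
    accuracies = []
    for _ in range(n_trials):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        accuracies.append(evaluate(labels, train_idx, test_idx))
    return accuracies

def majority_baseline(labels, train_idx, test_idx):
    """Placeholder decoder: always predicts the most common training label."""
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == most_common for i in test_idx) / len(test_idx)

labels = [i % 3 for i in range(243)]     # synthetic three-class labels
accuracies = repeated_holdout(labels, majority_baseline)
mean_accuracy = sum(accuracies) / len(accuracies)
```

Running all decoders on the same sequence of splits (for example, by sharing the seed) gives matched accuracy pairs, which is what a paired t-test requires.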

The implementation and settings of the competitors in this experiment are as follows:


FIGURE 8 | Performance of different numbers of hidden units in the RNN. The LSTM model with 32 hidden units achieved the best performance (90.56% for P1 and 88.18% for P2) for both participants on the validation dataset. The black dashed line represents the standard deviation.


• **LR-Segments:** a logistic regression based decoder from previous work (Subasi and Erçelebi, 2005). The segment settings are the same as the RNN method. The features in sequence are reshaped into a single vector to input to the LR classifier.

The results are shown in **Table 4**. Overall, the RNN-based decoder obtains the highest accuracies for both participants. For participant P1 the gesture recognition accuracy is 89.34%, and for participant P2 it is 90.83%. Among the competitors, SVM-Global gives the worst performance. This is reasonable since it calculates the features using the whole time window and ignores the information in time. The SVM-Segments method improves the accuracy by 7.48 and 8.72% for P1 and P2, respectively, by using the temporal segments. The results demonstrate the importance of considering temporal information in ECoG decoding. The significance of the results is evaluated using a paired t-test. Results show that our method significantly outperforms the other approaches at a significance level of 0.01 (see **Table 5**).

### 3.3. Rapid Recognition

TABLE 5 | *P*-value of paired *t*-test in comparison with other methods.

Quick recognition is an important issue in BCI-based prosthetic control. In this section, we investigate the possibility of recognizing the gestures in a time interval as short as possible after motion onsets. In the experiments, we vary the time interval from 100 ms to 1,200 ms after motion onsets. For each time interval, the ECoG signals are divided using a 300 ms sliding window with a stride of (t − w)/9 ms, where t is the time interval and w = 300 ms is the length of the sliding window. If the time interval is <300 ms, we use a w = t/2 ms sliding window with a stride of w/9 ms. A total of 10 temporal segments are used. We evaluate the performance using a permutation test. In each permutation trial, we randomly select 10% of the data for test, and run a total of 500 trials.
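The window and stride rule above can be checked with a short worked sketch. The function names are hypothetical; the formulas follow the text, with ten segments so that the stride denominator is 9.

```python
def window_and_stride(interval_ms, base_window_ms=300.0, n_segments=10):
    """Return (window, stride) in ms for a given decoding interval."""
    if interval_ms >= base_window_ms:
        window = base_window_ms
        stride = (interval_ms - window) / (n_segments - 1)
    else:
        # Short intervals: half-interval window, stride of window / 9.
        window = interval_ms / 2
        stride = window / (n_segments - 1)
    return window, stride

def segment_bounds(interval_ms, n_segments=10):
    """Start and end time (ms after onset) of each of the segments."""
    window, stride = window_and_stride(interval_ms, n_segments=n_segments)
    return [(k * stride, k * stride + window) for k in range(n_segments)]

full = window_and_stride(1200)        # 300 ms window, 100 ms stride
half_second = segment_bounds(500)     # last segment ends at 500 ms
shortest = segment_bounds(100)        # 50 ms windows, last one ends at 100 ms
```

In both branches the ten windows exactly span the chosen interval: the first starts at onset and the last ends at t. At exactly t = 300 ms the first formula gives a stride of zero, so the ten segments coincide.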

The results are shown in **Figure 9**. As the time interval becomes longer, better gesture decoding performance is obtained. The results of this experiment also demonstrate the possibility of rapid recognition. As shown in **Figure 9**, recognition accuracies of over 75% are obtained at the 0.3 s interval for both participants. With a 0.5 s time interval, the gesture recognition accuracy is over 80%. The results also indicate that the temporal dynamic is especially informative for quick decoding within short time intervals. The significance of the results is evaluated using a paired t-test, and our method outperforms both SVM-Global and SVM-Segments significantly with p < 0.01. The details of the t-test results are shown in **Supplementary Table 1**.

### 4. DISCUSSIONS

In this study, we have shown that ECoG signals provide useful information for effective hand gesture classification, and demonstrated the importance and effectiveness of temporal information in gesture decoding. Compared with the existing approaches, our method further explores the temporal information in ECoG signals to achieve more accurate hand gesture decoding. Bleichner et al. (2016) and Branco et al. (2017) proposed to use temporal template matching of the local motor potential (LMP) for each channel for gesture decoding. Compared with their approaches, our method considered temporal information in different frequency bands, and modeled patterns and underlying relationships using the RNN decoder. Li et al. (2017) proposed to model temporal information in the ECoG signals using short-term time windows and an SVM classifier. In their approach, the features in the temporal sequence were reshaped into a vector for classification, which broke the temporal structure of the features. Unlike their method, our RNN-based decoder takes the features as input in a recurrent way, which better characterizes temporal information by preserving the sequential information in short-term windows. Elango et al. (2017) proposed to use RNN-based models to classify individual finger movements. Unlike their approach, which manually selected the ECoG channels and frequencies from empirical observations, our method selected the optimal channels and frequencies with a greedy strategy to provide the most useful temporal information for gesture decoding. Overall, our method further exploited the temporal information of ECoG signals in both the feature selection stage and the gesture decoding stage, and recognized three hand gestures with a high accuracy of 90%. In addition, our results provided evidence for the possibility of rapid recognition.
As shown in **Table 1**, most existing methods require long detection delays (from 1.2 to 2.6 s) to achieve high performance, which leads to poor user experience in real-time prosthesis control. In our system, quick response can be achieved within 0.5 s with an accuracy of 80%, which is promising for online applications.

FIGURE 10 | Temporal dynamic of different gestures. (A,B) are the feature values of 3 gestures averaged over all the samples for participants P1 and P2, respectively. Each subfigure illustrates the averaged feature values of the six selected features over ten time windows.

Although our model achieved good results on ECoG signals, the details of the temporal information still need discussion. The temporal dynamic of different gestures is illustrated in **Figure 10**. The color represents the feature values of the six features at different times. The features are ordered by the selection order as in **Table 3**. The horizontal axis denotes the time windows, where 0 is the movement starting point. As described in section 3, the window length is 300 ms with a stride of 100 ms. The features show varying patterns in time. Most of the features show higher values in the first several time windows, and the values decrease with time. One exception is the fourth feature for P2, which has small values shortly after movement onset. This is reasonable because the feature covers a low frequency band (12–40 Hz). The feature might have been chosen as a result of overfitting. We also evaluate the importance of each feature for different gestures. In **Figure 11**, we present the mutual information between each feature and the gesture labels. For P1, the most informative features are the 1st and the 2nd (the corresponding electrodes are 3 and 11, respectively); for P2, the most informative features are the 1st, 2nd, and 5th (the corresponding electrodes are 13 and 20, respectively). The most informative electrodes are close to the central sulcus. For P2, although the 5th feature is informative, its selection priority is not high. This might be because the feature is correlated with the features selected earlier and is therefore not preferred by the greedy algorithm. In addition, the feature selection results show that most of the selected electrodes are distributed along both sides of the postcentral gyrus in the two participants, which is in accordance with existing studies (Pistohl et al., 2012; Wang et al., 2012; Chestek et al., 2013). The results suggest that the activation of the postcentral gyrus plays an influential role in hand movement. This phenomenon is probably due to the motor control copy or the force-related feedback.
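The mutual information between a single feature and the gesture labels, as discussed above, can be estimated in several ways; the text does not state which estimator was used. The sketch below assumes a simple equal-width binning of the feature followed by the plug-in estimate from empirical frequencies. The function name `mutual_information` and the default of 8 bins are illustrative choices only.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels, n_bins=8):
    """Plug-in estimate of I(feature; label) in bits.

    The continuous feature is discretised into n_bins equal-width bins
    (an assumption of this sketch, not a detail taken from the paper).
    """
    lo, hi = min(feature_values), max(feature_values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: use a dummy width
    bins = [min(int((v - lo) / width), n_bins - 1) for v in feature_values]

    n = len(labels)
    joint = Counter(zip(bins, labels))
    count_x = Counter(bins)
    count_y = Counter(labels)

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

A feature that is correlated with an earlier selected one can still score high here, which is consistent with the remark that an informative feature may receive a low priority in a greedy selection.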

### 5. CONCLUSION

In this study, we proposed an RNN-based method to exploit the temporal information in ECoG signals for rapid and robust gesture recognition. Compared with the existing approaches using linear methods or SVM classifiers, the RNN model better preserved the structure of the feature sequence and was capable of learning nonlinear relationships. Our system recognized three hand gestures with a high accuracy of 90%, and quick response was achieved within 0.5 s with an accuracy of 80%. The results showed that ECoG signals provide useful information for effective hand gesture classification, and demonstrated the possibility of rapid recognition. The results provided further evidence for the feasibility of robust and practical ECoG-based control of prosthetic devices.

### 6. ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the institutional review board ethical guidelines of the Second Affiliated Hospital of Zhejiang University with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Medical Ethical Committee of the Second Affiliated Hospital of Zhejiang University, China.


### AUTHOR CONTRIBUTIONS

GP and S-MZ conceived and designed the experiment. S-MZ and J-MZ collected and preprocessed the clinical data. J-JL and HY performed the data analysis. GP, YQ, X-XZ, and Y-MW provided advice on the analysis and interpretation of the final results. YQ, GP and J-JL wrote the paper.

### FUNDING

This work was partly supported by the grants from National Key Research and Development Program of China (2017YFB1002503, 2017YFC1308501), Zhejiang Provincial Natural Science Foundation of China (LR15F020001, LZ17F030001), and National Natural Science Foundation of China (No. 61673340, No. 31627802).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2018.00555/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer XT and handling Editor declared their shared affiliation.

Copyright © 2018 Pan, Li, Qi, Yu, Zhu, Zheng, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Muscle Synergy Analysis of a Hand-Grasp Dataset: A Limited Subset of Motor Modules May Underlie a Large Variety of Grasps

Alessandro Scano<sup>1</sup>\*, Andrea Chiavenna<sup>1</sup>, Lorenzo Molinari Tosatti<sup>1</sup>, Henning Müller<sup>2</sup> and Manfredo Atzori<sup>2</sup>

<sup>1</sup> Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing (STIIMA), Italian National Research Council (CNR), Milan, Italy, <sup>2</sup> Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland

Background: Kinematic and muscle patterns underlying hand grasps have been widely investigated in the literature. However, the identification of a reduced set of motor modules, generalizing across subjects and grasps, may be valuable for increasing the knowledge of hand motor control and for providing methods to be exploited in prosthesis control and hand rehabilitation.

#### Edited by:

Dingguo Zhang, Shanghai Jiao Tong University, China

#### Reviewed by:

Shingo Shimoda, RIKEN Center for Brain Science (CBS), Japan Kazutaka Takahashi, University of Chicago, United States

> \*Correspondence: Alessandro Scano alessandro.scano@stiima.cnr.it

Received: 02 May 2018 Accepted: 27 August 2018 Published: 25 September 2018

#### Citation:

Scano A, Chiavenna A, Molinari Tosatti L, Müller H and Atzori M (2018) Muscle Synergy Analysis of a Hand-Grasp Dataset: A Limited Subset of Motor Modules May Underlie a Large Variety of Grasps. Front. Neurorobot. 12:57. doi: 10.3389/fnbot.2018.00057

Methods: Motor muscle synergies were extracted from a publicly available database including 28 subjects, executing 20 hand grasps selected for daily-life activities. The spatial synergies and temporal components were analyzed with a clustering algorithm to characterize the patterns underlying hand-grasps.

Results: Motor synergies were successfully extracted on all 28 subjects. Clustering orders ranging from 2 to 50 were tested. A subset of ten clusters, each one represented by a spatial motor module, approximates the original dataset with a mean maximum error of 5% on reconstructed modules; however, each spatial synergy might be employed with different timing and recruited at different grasp stages. Two temporal activation patterns are often recognized, corresponding to the grasp/hold phase, and to the pre-shaping and release phase.

Conclusions: This paper presents one of the largest analyses of muscle synergies of hand grasps currently available. The results of 28 subjects performing 20 different grasps suggest that a limited number of time-dependent motor modules (shared among subjects), correctly elicited by a control activation signal, may underlie the execution of a large variety of hand grasps. However, spatial synergies are not strongly related to specific motor functions but may be recruited at different stages, depending on subject and grasp. This result can lead to applications in rehabilitation and assistive robotics.

Keywords: muscle synergies, centroids, synergies clustering, hand grasps, spatial synergies, temporal components, NinaPro Database

## INTRODUCTION

The use of the hands is one of the most crucial capabilities for daily activities. The loss of a hand can substantially reduce a person's quality of life, since it strongly affects the physical ability to perform activities of daily living (ADL), and it represents a relevant social problem: ∼41,000 people in the USA were living with a major upper limb loss in 2005. The number of amputees is expected to double by 2050 (Atkins et al., 1996; Ziegler-Graham et al., 2008).

Hand grasps are mainly composed of two stages: the reach-to-object and the grasp itself. The first stage is divided into two sub-phases: the transport of the hand by the arm, whose motion law is characterized by a bell-shaped velocity profile (Fan et al., 2006), and the hand pre-shaping, required for adapting the hand to the object to grasp, which occurs after ∼60–70% of the reaching phase (Hu et al., 2005). The grasp phase is determined by several parameters, including the force closure (force needed to close the hand around the object and to achieve a stable grasp), grasp stability (the ability to resist external forces), and grasp security (resistance to slippery objects, which depends on the configuration of the grasp; Cutkosky, 1989; Cipriani et al., 2008). A third phase is reported in some articles (Liarokapis et al., 2013) and represents the release of the object; a fourth phase can also be considered, involving the return of the arm and hand to the rest position.

Hand grasps have been investigated mainly in the domain of finger joint kinematics and past studies have developed qualitative taxonomies to describe and cluster different types of grasps (Cutkosky, 1989). The main distinction among grasps was between power grasps and precision grasps but many other features can be taken into account for grasp characterization, such as the limb configuration for the task execution or the geometry of the object to grasp.

Considering the complexity of hand control, involving a remarkable number of degrees of freedom and redundancy, both at the muscle and skeletal levels, many studies in the literature applied feature extraction methods to identify a subset of the original data for an accurate description of hand functioning, even if reduced in dimensionality.

A recent study (Jarrassé et al., 2014) investigated a set of hand grasps by considering a 15-degree-of-freedom (dof) Cyberglove. The study used a Principal Component Analysis (PCA)-based technique for the extraction of kinematic motor synergies and showed that no more than 4 PCs are needed to explain ∼95% of the total variation. The first and second PCs accounted for about 90% of data variation, leading the author to suggest that these two components might be enough to control (or even mechanically design) an upper-limb prosthesis, even if pattern refinement can be achieved by adding further PCs. In Patel et al. (2017), kinematic synergies were extracted by using a PCA-based algorithm. While the first PC accounts for more than half of the total variation, the rest is distributed across many PCs, indicating that a quite large set of motor modules is needed to reconstruct the original kinematics. Seven synergies were extracted in Thakur et al. (2008) for the explanation of >90% of the total variance of a set of hand-grasps and hand motions. A comprehensive study on hand grasps by Santello et al. (1998) suggests that the modules that underlie the control of the hand are basically two. However, the study also remarks that the remaining variation, accounted for by further synergies, is not due to noise but to motor control modules needed for fine tuning.

The fact that a limited number of modules may account for a large variety of grasps is thus commonly deduced from the literature. A recent study by Prevete et al. (2018) investigated the hypothesis of sparsity applied to kinematic synergies during hand grasps. According to this study, sparsity might be found both at the spatial synergy level (indicating that spatial modules may incorporate only some joints or muscles) and in the coordination of the synergies, in which only a reduced number of overlapping modules contribute to the execution of an action. A combination of the two conditions, called double sparsity hypothesis, can happen as well. This concept fits well with previous research on dimensionality reduction, with the addition that sparsity could partially explain the different number of synergies extracted in different studies (together with varying study designs).

Although kinematic patterns are exploited more often for hand analysis, some studies have investigated the dimensionality reduction problem from the point of view of muscle synergies. The muscle synergy approach is based on decomposition algorithms that identify groups of co-activating muscles (synergies) that are coordinated by time-varying activation commands. The extracted patterns may be influenced by several factors regarding sEMG, including fatigue, sweating, changes in electrode or arm positioning (Farina et al., 2014), clinical parameters of the subjects (e.g., level of the amputation, phantom limb sensation intensity; Atzori et al., 2016), the BMI (Atzori et al., 2014b), other anatomical characteristics of the subjects (Farina et al., 2002) or training in using myoelectric prostheses (Cipriani et al., 2011). Few studies have so far addressed these factors and their effect on the resulting muscle synergies. Considering upper limb synergies, Ortega-Auriol et al. (2018) observed that synergy structure was conserved with fatigue, but interestingly synergy activation coefficients decreased on average by 24.5% with fatigue development. In Tagliabue et al. (2015), two-digit grasping is analyzed. A reduced number of modules (2–3) is needed to explain the largest part of the variation for each grasp, and a correlation between muscle and kinematic primitives is suggested, justifying synergy-based analysis in both domains. Considering two arrays of sEMG electrodes, positioned distally and proximally on the forearm, Castellini and van der Smagt (2013) found that the combination of 3 muscle synergies could account for a set of 5 hand grasps, on both sets of electrodes. The "main synergy" represents a "global, indistinct" co-activation pattern, while the other two synergies account for dorsal and ventral patterns, respectively.

Overduin et al. (2008) used the time-varying muscle synergy model to analyze a set of 25 grasps of two monkeys and found that three synergies could explain 71% of the total sEMG variation for proximal muscles, 83% for the wrist and extrinsic hand muscles and 81% among intrinsic muscles. The first of the three synergies was linked to the muscles involved in the reach phase, operated by proximal muscles and distal flexors; the second was characterized by bimodal activation of distal muscles; and the third, more related to the transport of the object, featured proximal muscles and distal extensors.

The main challenge of using muscle synergies to analyze hand grasps is the impossibility of tracking all the muscles involved in the grasps, as hand muscles are hard to record due to their small size, which can easily produce crosstalk, and due to the encumbrance of probes/wires on the palm of the hand, which can prevent physiological grasp execution. Nevertheless, the reduction of the dimensionality is still a crucial process for the comprehension of the patterns underlying hand use and grasps. In fact, motor modules are considered to be the basis of motor control organization at the neural level (Schmidt, 1975; d'Avella et al., 2006). Furthermore, once recognized, the basic modules might be employed as references for the study of motor control, to evaluate pathological conditions and to control prosthetic devices. Dexterous, naturally controlled surface electromyography (sEMG) prostheses would better allow amputees to meet personal needs such as eating or using tools. Prosthetics companies and scientific research are advancing toward this, but dexterous, naturally controlled prosthetic hands are not yet available, either on the market or in scientific research, mainly due to control problems (Atzori and Muller, 2015) related to robustness. Clinical parameters of the amputation were demonstrated to affect control capabilities (Atzori et al., 2016). In order to foster the improvement of control systems for sEMG hand prostheses, a publicly available dataset for robotic hand prosthesis control (the Ninapro database<sup>1</sup>) was released in 2014 (Atzori et al., 2014a), and extended with several additional datasets afterwards (Krasoulis et al., 2017; Pizzolato et al., 2017). Currently, the database includes over 120 subjects (including 11 trans-radial amputees), repeating as naturally as possible up to 53 hand movements with several acquisition setups ranging in price from a few hundred to several thousand dollars. 
The aim of Ninapro is to foster the improvement of the field by allowing the development and test of advanced machine learning methods. However, the path to natural control of dexterous prosthetic hands can also be paved by the simplification of the problem, for instance via the identification of a set of motor primitives sufficient to control a comprehensive set of hand grasps.

The application of muscle and postural hand synergies to myoelectric hand prostheses development and low level control was recently suggested in literature and tested in specific settings, while high level control strategies are still not extensively explored. The application of postural hand synergies to hand prostheses development is particularly evident in the development of the PISA/IIT Softhand, a robotic hand actuated by a single motor (Catalano et al., 2014). The application of postural hand synergies to low level control approaches can be defined as controlling a dexterous robotic hand with few (usually 4) independent input signals that modulate some of the first synergies (usually the first one-two) in the robotic hand, leading the robotic hand to reproduce several hand grasps (Matrone et al., 2010, 2012; Segil and Weir, 2013).

In the literature, there are several open points regarding hand grasp synergies that can be investigated in more detail. Some of the more refined studies, providing state-of-the-art methods, involve a large variety of grasps but a limited number of subjects, or map a reduced number of grasps compared to the ones that are needed for daily life activities, so that the results lack generalization. Furthermore, a limited number of studies focus on muscle patterns rather than on hand kinematics. Lastly, most studies focused especially on the spatial organization of motor modules, while the temporal components were less analyzed.

Following the previous considerations, the aim of this study is threefold. First, to provide a set of benchmark muscle hand synergies extracted from the publicly available NinaPro database, which includes a considerable number of subjects repeating a comprehensive number of hand grasps; second, to evaluate the effects of the reduction of dimensionality of the dataset on the accuracy in reconstructing the original dataset of synergies; third, to characterize the spatial and temporal features of the subjects included in the dataset.

### MATERIALS AND METHODS

### Acquisition Set-Up

The flow-chart of the study is portrayed in **Figure 1**. The acquisition setup included 12 surface EMG (sEMG) electrodes and a data glove. The sEMG electrodes were from a double differential Delsys Trigno wireless system, measuring the myoelectric signals at 2 kHz with a baseline noise below 750 nV RMS. The Trigno integrated a 3-axis accelerometer sampled at 148 Hz. Electrode positioning was performed with the aim of combining precise anatomical positioning (DeLuca, 1997) and a dense sampling approach (e.g., Fukuda et al., 2003). Eight electrodes were equally spaced around the forearm at the height of the radio-humeral joint. Four electrodes were placed on the main activity spots, respectively, of the flexor and the extensor digitorum superficialis, the biceps and the triceps brachii, which were identified by palpation by trained researchers (**Figure 2**). The data glove (CyberGlove II, CyberGlove Systems LLC<sup>2</sup>) measured hand kinematics using 22 sensors. Considering that the primary objective of this study was to characterize the hand grasps rather than the dynamics of the reaching phase at proximal level, the choice of the NinaPro database is reasonable, since it includes recordings from extrinsic hand muscles.

### Participants

The data used in this experiment were from the publicly available NinaPro database, which currently includes 7 datasets of sEMG and kinematic data from over 120 subjects (including 11 transradial amputees), performing (or imagining performing) up to 53 different hand movements (Atzori et al., 2016). The data used for this study were from the second dataset (DB2), which includes 40 intact subjects. A 28-subject subset of the original dataset was used for this study. The subjects include 19 males and 9 females; 24 right-handed and 4 left-handed; average age 29.64 years with a standard deviation of 3.1 years (data summarized in **Table 1**). Twelve subjects were excluded from the analysis because the proper extraction of synergies was prevented by noise on the sEMG channels. The decomposition algorithm applied to extract synergies would have been influenced, even in case of removal of the affected channels from the analysis.

<sup>1</sup>Ninapro database: http://ninapro.hevs.ch

FIGURE 1 | Study flowchart. \*Twelve subjects were excluded from analysis because noise was found on at least one of the sEMG channels in some grasps. The decomposition algorithm applied to extract synergies would be influenced, even in case of removal of the affected channels from the analysis. Consequently, 12 subjects were discarded.

### Experimental Protocol for Acquisition

This section briefly describes the acquisition protocol. For more details about the protocol, please refer to Atzori et al. (2014a).

TABLE 1 | Summary of the demographic data of the involved subjects.


During the experiment, the subjects were asked to sit at a desk with the arms relaxed on the table and to repeat a set of movements with their right hand as naturally as possible. The entire experiment included 49 movements plus rest, divided into three exercises and extracted from the ADL literature, thus including movements from categories such as personal needs, eating or use of tools (Smurr et al., 2008). In this work, we consider only the set of hand grasps, i.e., the first 20 movements of the second exercise (**Figure 3**). The subjects were asked to repeat with their right hand the movements shown in short films on the screen of a laptop, and to concentrate on mimicking the movements rather than on exerting high forces. Each movement was repeated 6 times, with each repetition lasting 5 s and separated from the other movements by 3 s of rest. The experiment was approved by the Ethics Commission of the Canton Valais (Switzerland). Before data acquisition, the subjects were given a thorough written and oral explanation of the experiment and were asked to sign an informed consent form.

### Data Analysis: Synergies Extraction

The data analysis was performed entirely in Matlab 2014a with custom-developed software. First, kinematic recordings ("restimulus" signal of the NinaPro database) were used to separate movement phases. Data from the 12 sEMG channels were bidirectionally high-pass filtered at 50 Hz (Butterworth filter, 7th order) to remove motion artifacts, rectified, Hilbert-transformed (Matlab hilbert), and low-pass filtered with mono-directional filtering at a cut-off frequency of 10 Hz (Butterworth filter, 7th order) to remove noise. sEMG data from each subject and each trial were pooled in single aggregated matrices and synergies were extracted using the non-negative matrix factorization (NMF) algorithm (Cheung et al., 2005; Tresch et al., 2006). The NMF decomposes the sEMG matrix into the product of two matrices, the first one representing time-invariant, neurally coded synergies (w<sub>i</sub>), and the second one representing time-variant activation commands for each synergy (c<sub>i</sub>), as in Equation (1):

$$\mathrm{sEMG}(t) = \sum_{i=1}^{N} c_i(t)\, w_i \tag{1}$$

where, for each of the recorded muscles, sEMG(t) represents the sEMG data at time t and N is the total number of extracted synergies.
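To make the factorization in Equation (1) concrete, the following is a minimal pure-Python sketch of NMF using the classical multiplicative update rules. It is not the authors' Matlab implementation: the exact NMF variant, the initialization and the stopping rule are assumptions here, and the helper names (`matmul`, `transpose`, `nmf`, `vaf`) are hypothetical. The data matrix `v` has one row per muscle and one column per time sample; `w` holds the spatial synergies as columns and `c` holds the activation commands as rows. The variance accounted for (VAF) is computed in its uncentered form, which is also an assumption.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, n_synergies, n_iter=500, seed=0, eps=1e-9):
    """Factorize v (muscles x time) as w (muscles x N) times c (N x time)."""
    rng = random.Random(seed)
    m, t = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_synergies)] for _ in range(m)]
    c = [[rng.random() + eps for _ in range(t)] for _ in range(n_synergies)]
    for _ in range(n_iter):
        # Update activations: c <- c * (w^T v) / (w^T w c)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), c)
        c = [[c[i][j] * num[i][j] / (den[i][j] + eps) for j in range(t)]
             for i in range(n_synergies)]
        # Update synergies: w <- w * (v c^T) / (w c c^T)
        ct = transpose(c)
        num, den = matmul(v, ct), matmul(w, matmul(c, ct))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_synergies)]
             for i in range(m)]
    return w, c


def vaf(v, w, c):
    """Uncentered variance accounted for: 1 - SSE / sum of squares of v."""
    rec = matmul(w, c)
    sse = sum((x - y) ** 2 for rv, rr in zip(v, rec) for x, y in zip(rv, rr))
    sst = sum(x ** 2 for rv in v for x in rv)
    return 1.0 - sse / sst
```

Because the updates are multiplicative and start from positive random values, both factors stay non-negative, which is the property that allows the columns of `w` to be read as muscle weightings.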

The order of the factorization r was increased from 1 to 50 (to limit the dimensionality for synthesis). For each r, the NMF algorithm was applied 1,000 times in order to avoid local minima. The repetition accounting for the highest variance of the signal was chosen as the representative of order r. The number of synergies was chosen as the minimum r explaining at least 90% of the variance of the signal (Clark et al., 2010). Further synergies were added only if each one increased the explained variation by at least 5%.
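The selection rule for the number of synergies can be sketched as below. The function `select_order` is a hypothetical helper, and the sketch assumes that the 5% criterion refers to an absolute gain of 0.05 in explained variance between consecutive orders, which the text does not state explicitly. The input list is expected to hold, for each order, the best value over the repeated NMF runs.

```python
def select_order(vaf_by_order, threshold=0.90, min_gain=0.05):
    """Pick the number of synergies from a list of explained-variance values.

    vaf_by_order[k] is the best explained variance obtained with k + 1
    synergies (best over the repeated runs of the factorization).
    """
    # Smallest order reaching the threshold.
    first = next((k for k, v in enumerate(vaf_by_order) if v >= threshold), None)
    if first is None:
        raise ValueError("no tested order reaches the variance threshold")
    r = first
    # Keep adding synergies while each one brings a sufficient gain.
    while (r + 1 < len(vaf_by_order)
           and vaf_by_order[r + 1] - vaf_by_order[r] >= min_gain):
        r += 1
    return r + 1


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(select_order([0.70, 0.85, 0.92, 0.95, 0.96]))
```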

### Synergy Clustering

In the literature of motor synergies, standard analysis methods may include the definition of clusters to group synergies according to their spatial composition. The set of extracted synergies can be clustered to obtain a limited number of spatial patterns, each one represented by a centroid (mean spatial synergy).

In this work, the extracted synergies were included into a single cluster analysis. Grouping all the modules could lead to complex matching between each spatial component and the corresponding motor function (Scano et al., 2017). In fact, it was reported in Roh et al. (2013) that synergies related to the same motor function may split into two or more clusters. As a consequence, the correspondence between the phases of the grasps and the motor synergy recruitment is not always clearly identifiable. In fact, in the majority of the cases, the synergy prevailing in terms of magnitude of the temporal components is the one characterizing the moment of the grasp hold. However, a relevant number of subjects may show patterns more complex to identify.

However, performing the clustering procedure on the whole dataset made it possible to provide a comprehensive overview of all the modules involved in hand grasping tasks. Furthermore, a comprehensive mapping of hand grasps is proposed by considering the whole dataset for analysis.

The cluster analysis was conducted using the k-means clustering algorithm. The algorithm was applied to an aggregated matrix containing the whole dataset of muscle synergies extracted from all subjects. Each clustering order, ranging from 1 (minimum) to 50 (maximum), was tested by repeating the algorithm 200 times and selecting the best solution for each order according to the metrics described in the following section.
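A minimal sketch of the repeated k-means procedure is given below. It is written in plain Python rather than with the Matlab routines used in the study, the function names are hypothetical, and the criterion for keeping the best repetition (lowest mean Euclidean distance between each synergy and its assigned centroid) is an assumption, since the selection metrics are only described in the following section.

```python
import math
import random


def kmeans(points, k, n_iter=100, rng=None):
    """Lloyd's algorithm; centroids start from k randomly drawn points."""
    if rng is None:
        rng = random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def mean_distance(points, centroids, labels):
    """Mean Euclidean distance between each point and its centroid."""
    return sum(math.dist(p, centroids[lab])
               for p, lab in zip(points, labels)) / len(points)


def best_kmeans(points, k, n_repeats=200, seed=0):
    """Repeat k-means and keep the run with the lowest mean distance."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_repeats):
        centroids, labels = kmeans(points, k, rng=rng)
        score = mean_distance(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best
```

Here each element of `points` would be one spatial synergy (a vector of 12 muscle weights), and the returned centroids play the role of the mean spatial synergies.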

### Selection of the Number of Clusters

The selection of the appropriate number of clusters (mean spatial synergies, each one represented by a cluster centroid) was made by pondering the following metrics (Bora et al., 2014):


When N = 1, the clustering procedure classifies a population within a single group: thus, the cluster solution 1 is (implicitly) the mean of a population, and corresponds to the lowest level of precision in approximating a population with a clustering procedure. Following the previous considerations, the Normalized Euclidean Distance (NED) was computed by considering the cluster solution 1 as the source of maximum clustering error, which was set to 1. Thus, the NED for each clustering order i was computed as:

$$NED(i) = \frac{MED(i)}{MED(1)}\tag{2}$$

3) The slope of the Normalized Euclidean Distance (NED') is the derivative of NED. NED' indicates how the precision of the cluster analysis increases when increasing the order of the clustering.

Each of the previous three metrics can be considered for the choice of the clustering order, by imposing a threshold on the reconstruction accuracy.

Whatever metric is selected, the choice is driven by the principle of using a parsimonious number of clusters for synthesis power (the lowest possible number of clusters, given a reasonable descriptive precision). The threshold selected by the experimenters in this work was 5%. Consequently, the number of clusters was selected as the minimum number needed to have the NED < 0.05.
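The normalization of Equation (2) and the 5% threshold rule can be sketched as follows. The definition of MED is not given in this section, so the sketch assumes it is a per-order distance measure (for instance the mean Euclidean distance between synergies and their centroids) supplied by the caller. The function names are hypothetical, and the slope is approximated by a backward finite difference.

```python
def normalized_distances(med_by_order):
    """NED(i) = MED(i) / MED(1); med_by_order[0] corresponds to order 1."""
    return [m / med_by_order[0] for m in med_by_order]


def ned_slope(ned):
    """Backward finite difference used as a discrete stand-in for NED'."""
    return [ned[i] - ned[i - 1] for i in range(1, len(ned))]


def select_clusters(med_by_order, threshold=0.05):
    """Smallest clustering order with NED below the threshold, or None."""
    ned = normalized_distances(med_by_order)
    for order, value in enumerate(ned, start=1):
        if value < threshold:
            return order
    return None


if __name__ == "__main__":
    # Illustrative MED values only, not data from the study.
    med = [2.0, 1.0, 0.5, 0.2, 0.08, 0.05]
    print(normalized_distances(med))
    print(select_clusters(med))
```

Raising or lowering `threshold` corresponds to the tolerance that the experimenter is willing to accept when trading descriptive precision against the number of centroids.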

The hypothesis that justifies the use of cluster analysis is that the dataset can be represented with a chosen number of cluster centroids depending on the maximum error that the experimenter is willing to accept. Depending on the application, the tolerance can be increased or reduced, describing the original dataset of motor modules with a specific level of precision (and a choice of dimensionality).

### Spatial and Temporal Components Analysis

The characterization of the obtained mean spatial synergies was further specified by considering all the pairwise dot products between their compositions. Each temporal component, initially associated with its respective spatial synergy, was matched to its relative centroid after cluster identification. Then, all the temporal components were averaged to extract a mean temporal component for each cluster, representing a mean activation of the spatial synergy in time. Finally, the characterization of temporal components was concluded by considering the correlations between the mean temporal components.
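The three steps of this analysis can be sketched as below. The function names are hypothetical, and the normalization of each centroid to unit length before taking the dot product is an assumption of this example (it bounds the similarity of non-negative synergies between 0 and 1).

```python
import numpy as np

def spatial_similarity(centroids):
    """Pairwise dot products between centroids, each scaled to unit norm (assumption)."""
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return unit @ unit.T

def mean_temporal_components(temporal, labels, n_clusters):
    """Average the temporal components of all synergies assigned to each cluster."""
    labels = np.asarray(labels)
    return np.array([temporal[labels == j].mean(axis=0) for j in range(n_clusters)])

def temporal_correlation(mean_temporal):
    """Pearson correlation matrix between the mean temporal components."""
    return np.corrcoef(mean_temporal)

# Toy example: two centroids over three muscles, four temporal profiles.
centroids = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
temporal = np.array([[0.0, 1.0, 0.0], [0.0, 3.0, 0.0],
                     [1.0, 0.0, 1.0], [3.0, 0.0, 3.0]])
labels = [0, 0, 1, 1]

sim = spatial_similarity(centroids)
mean_t = mean_temporal_components(temporal, labels, n_clusters=2)
corr = temporal_correlation(mean_t)
```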

### Summary of Outcome Measures and Statistics

Given the aims of the study (see Introduction): "First, to provide a set of benchmark muscle hand synergies extracted from publicly available data<sup>1</sup> including a considerable number of subjects that perform a comprehensive number of hand grasps; second, to evaluate the effects of the reduction of dimensionality of the dataset on the accurateness in reconstructing the original dataset of synergies; third, to characterize the spatial and temporal features of the sample of subjects included in the dataset," the following outcome measures were defined:

Outcome 1: Definition of the complete dataset of extracted muscle synergies of healthy subjects in freely executed grasps; methods and statistics: NMF algorithm for factorization; 90% of the VAF + minimum slope 0.05 for each further extracted synergy.

Outcome 2: Definition of cluster centroids for muscle synergies in freely executed grasps; methods and statistics: k-means clustering; lowest normalized Euclidean distance to define the number of centroids.

Outcome 3a: Characterization of the spatial composition of the centroids; methods and statistics: dot products between pairwise centroids to assess their difference in composition.

Outcome 3b: Characterization of the temporal features of the centroids; methods and statistics: Pearson correlations between temporal components.

## RESULTS

### Extracted Synergies

The extracted synergy dataset is summarized in **Figure 4** by portraying the mean spatial synergy compositions and cumulated temporal component profiles. Synergies were grouped within grasps, and matched according to the similarity of their temporal components, computed with Pearson's correlation coefficient. For compactness of the representation, only the first two synergies of each extracted dataset are portrayed (although three modules were extracted in some grasps).

### K-Means Cluster Order Selection

The whole dataset of spatial synergies, which is composed of 966 extracted modules, was clustered according to the k-means algorithm, with a clustering order ranging from 1 to 50. **Figure 5** shows the graphs with the metrics used for the choice of a reasonable number of clusters as synthetic representation of the spatial synergies of the dataset. Increasing the order of the clustering leads to a monotonic decrease of the NED. Thresholding the NED (at 0.05, as explained in the methods), only 10 clusters are needed to approximate the original dataset. It can also be observed that a further increase of the order of the clustering provides only slightly increased precision in describing the dataset.

FIGURE 4 | The whole dataset of synergies extracted for each grasp is synthetically reported, coupled with the corresponding cumulated temporal components. For each grasp (numbered 1–20 as in the order shown in Figure 2), the mean spatial synergies are reported. The mean spatial synergies are computed by averaging the spatial synergies grouped by matching each subject's spatial synergies according to the Pearson's Correlation coefficient computed on the temporal components. Only the first two modules are reported for each grasp (module 1, reported in blue, exploited during the grasp phase, and module 2, depicted in green, used mainly in the pre-shaping and release phases). Mean spatial synergies are also coupled with the cumulated mean temporal components that modulate in time the mean spatial synergies, plotted as percentage of the normalized duration of each movement.

maximum normalized Euclidean error was reasonably set at 0.05\*SSm, corresponding to a 10-cluster solution.

### Clustering on Spatial Synergies and Temporal Components Analysis

The results of the clustering procedure are shown in **Figure 6**. The 10 identified centroids (mean spatial synergies) are portrayed (composition coefficients), along with the number of synergies of the original dataset that are assigned to each centroid, expressed as percentage of the original dataset. It can be seen that the extracted synergies are quite uniformly distributed on the centroids, each one representing between 7 and 15% of the original dataset of motor modules. **Figure 7** depicts a polar and histogram-based representation of the extracted mean spatial synergies, along with the associated mean temporal components. The temporal components are shown for each of the spatial modules referring to each of the centroids, along with their mean. Analysis of temporal components shows that some centroids are found mostly in the central phase of the grasp (e.g., centroid 3 and centroid 6), while others mainly in the pre-shaping and release phases (e.g., centroid 1 and centroid 2). Following these results, in order to characterize the summarized groups of motor modules, the similarity of mean spatial synergies and temporal components was assessed as well. **Figure 8** shows the similarity, expressed as the dot product, among all the pairwise mean spatial synergies. It can be seen that the mean spatial centroids have a pairwise dot product ranging from 0.65 to 1, indicating that some muscle groups are shared between several patterns. Similarly, **Figure 9** shows the correlation matrix between each temporal component, expressing the temporal relation that links each spatial synergy to the others. In this case, results show high variability, and indicate that some mean temporal components are very closely related to others (e.g., temporal components 5 and 6), while others are very different (e.g., temporal components 1 and 6). These results are critically analyzed in the following paragraphs.

## DISCUSSION

### On the Extracted Synergies

An interesting result of this study is that, with the method used for synergy extraction, a number of modules ranging from 1 to 3 is sufficient for reconstructing the majority of the original sEMG in each grasp. As a consequence, a limited number of patterns is needed to achieve a grasp, which is a relevant result considering the availability of high redundancy at the muscle and kinematic level. This is seen in some patterns that are often repeated and especially in the co-activating group composed of f1-f8-finger flexors that are very often grouped together, especially in the hold phase. In most of the cases, two activation patterns are recognizable: a strong co-activation, often (but not always) corresponding to the grasp/hold phase, and two minor co-activating patterns in the pre-shaping and release phases that are often grouped in a single synergy. This result is particularly interesting considering that only two electrodes were not positioned on the forearm (respectively, biceps and triceps) and comparing the results with the results obtained by d'Avella et al. (2006) and Liarokapis et al. (2013). In these studies, the biceps is activated during the reaching phase, confirming that it is indeed an active reaching component, being active in the pre-shaping and release phase. This result suggests that the pre-shaping and release synergies may represent hand opening, before (pre-shaping) and after grasping. The number of phases seems to be in accordance with those proposed by Liarokapis et al. (2013) and seems to reproduce, for the hand, part of the results obtained in previous studies in terms of time-varying muscular synergies for shoulder and arm. Furthermore, it should be remarked that the movements considered in this study were not performed against gravity, considerably reducing the involvement of shoulder muscles.

While not extensively discussed in this paper, the remarkable repeatability of the temporal components might be a further motor-control feature aimed at simplifying muscle coordination, as a strategy exploited by the CNS to perform hand grasps. These results are in accordance with previous findings in the literature, which showed that, with respect to the original dimensionality of the control, the number of modules underlying grasps is probably strongly reduced (Santello et al., 1998; Overduin et al., 2008; Jarrassé et al., 2014).

### Cluster Analysis and Control of Precision

On a comprehensive dataset of 20 grasp types, performed by 28 healthy subjects, 10 spatial motor modules, properly elicited in time, are enough to describe the whole dataset with good accuracy, generalizing through subjects. Such results are coherent with the notion that the central nervous system may embed a modular structure that relies on a limited number of predefined co-activation patterns to produce motor outcome at the hand level. These findings are in accordance with previous results that demonstrated that a small subset of synergies can generalize across tasks and suggest that they represent basic building blocks underlying natural human hand motions (Thakur et al., 2008).

The main spatial synergies were not directly linkable to specific grasp types or motor functions, suggesting that the spatial modules can be employed for the execution of different grasp types. Furthermore, each spatial module can be elicited at different stages. Together with previous findings, these results suggest that grasp types and muscle synergies may not be univocally related: some muscle patterns may be used for different grasp types or, vice versa, the same grasp might be controlled with slightly different muscle synergies depending on the subject.

polar plot (A) and with histograms (B). Temporal components are depicted in (C), and mean temporal components are shown in light gray.

These results also reflect some intrinsic features of human grasping related to the control of proximal forearm and hand muscles. This study suggests that a large variety of hand grasp types can be performed with a limited number of patterns. However, it should be considered that the proposed protocol was meant for applications related to the control of prostheses for trans-radial amputees, measuring the activity of proximal muscles. Consistently, previous studies sharing proximal muscle based protocols showed that a few basic patterns are responsible for a variety of grasp types (Castellini and van der Smagt, 2013). On the contrary, with a fine recording of the muscles of the hand, more differentiated patterns may be observed, even though, due to the difficulty of recording muscle activity directly on the hand, the motor primitives related to the hand are usually computed and analyzed in the kinematic domain (Jarrassé et al., 2014; Prevete et al., 2018).

### Temporal Components

The analysis of temporal components underlines that spatial patterns may be recruited at different stages of a grasp, with variability related both to the subject who executes the grasp and the type of grasp. This result is confirmed by the high correlation of the temporal components of many clusters. However, mean temporal components suggest that some patterns are more often used during the grasp phase with a monophasic, bell-shaped activation profile, while other patterns are biphasic and usually activate when the hand opens, so in the approaching/pre-shaping phase and in the release phase rather than in the middle of the grasp. Such findings can be taken into account for several applications related to high level robotic hand and prosthesis control, as described in section Impact of the Muscle Synergy Dataset.

### Impact of the Muscle Synergy Dataset

A limited number of motor modules (e.g., 10), properly elicited in time, can approximate the entire dataset for all subjects with high accuracy (5% error with respect to approximating the dataset with its mean, in the case of 10 motor modules). Ideally, each movement considered in the experiment can potentially be reproduced as a combination of spatial synergies, thus providing prostheses with higher dexterity (a higher number of movements that can be controlled) starting from a set of a few robustly controlled modules. Hand muscle synergies may be applied to high level control approaches, consisting of training subjects to reproduce and modulate the sEMG patterns that correspond to the muscular hand synergies (or combinations of them) and applying pattern recognition algorithms to recognize the results. This strategy may be an alternative to the control systems currently described in the literature. As noted in the Introduction, robotic hands that reproduce hand movements by modulating the main postural hand synergies have already been presented in the literature (Matrone et al., 2010, 2012; Segil and Weir, 2013). However, high level control systems have not been extensively studied. Developing high level control systems based on time dependent muscle-hand synergies and training subjects to perform them may link the subjects' intentions with the movement of a robotic hand naturally, by exploiting the same synergies. Such a result may lead to natural myoelectric control of robotic hands, a challenge not yet achieved in the literature. If replicated on hand amputees, this result can potentially have applications in rehabilitation and assistive robotics in order to improve the control of dexterous prosthetic hands, by joining robotics and neuroscience findings (Santello et al., 2016).

Usually, in machine learning the training data (used to train a model) and the test data (used to test it) are taken from the same distribution. However, this is not always easy, in particular when using deep learning approaches that require large amounts of data for training. To overcome distribution mismatches, transfer learning and domain adaptation approaches have been used in several domains, including computer vision (Saenko et al., 2010; Tommasi et al., 2010), and natural language processing (Ben-David et al., 2010; Daumé et al., 2010). In myoelectric control, several studies explored the use of previous models from different subjects to reduce the amount of required training data (Farina et al., 2002; Tommasi et al., 2013; Patricia et al., 2014), but the performance increase was not confirmed after proper model optimization (Gregori et al., 2017). The fact that the motor modules are common to the subjects can provide physiological foundations to include within the prosthesis a subject-independent motor memory. Prosthesis control could then be produced as "plug and play," improve control robustness for a specific subject through successive calibration, and improve its adaptability to other subjects too. In this context, properly choosing the motor modules and the movements to be reproduced (in order to maximize dexterity, robustness and correspondence to ADLs) is potentially interesting to improve the rehabilitation capabilities of hand prostheses. However, how exactly extracted synergies are mapped into motor functions remains an open question in the field: previous studies employing clustering procedures or synergy combination theories (Prevete et al., 2018) showed that the mapping between "physical space" of the end effector and the extracted muscle synergies may be due to different exploitation mechanisms.

In this study, it is proposed that a linear combination of centroids, properly activated by their temporal components, can be enough to reconstruct the physical space of the end effector in a large variety of grasp types with high accuracy. However, the authors are aware that the noticeable reduction of the original dataset implies that the original sEMGs are reconstructed with a pre-determined level of precision. The proper tradeoff between accuracy and synthesis needs to be tested in future work where the reduced dataset is integrated into a real control system.

Despite the potential provided by the muscle synergy analysis, several limitations and issues related to the method should be considered. Recent studies reported that pre-processing, including filtering and normalization techniques, might lead to different results and interpretation of the data (Shuman et al., 2017; Kieliba et al., 2018). While it is commonly accepted to normalize the duration of the tasks to a common phase axis, as was done in this study, uniform guidelines for EMG pre-processing for synergy extraction are still missing in the literature. Consequently, pre-processing could be a source of data misinterpretation. Furthermore, the onset of fatigue was not inspected in this study, while it was demonstrated in the literature that fatigue may influence the recruitment of synergies, even if their spatial composition is preserved (Ortega-Auriol et al., 2018). As described in the section Introduction, several factors may have an effect on the sEMG signal and make synergies difficult to generalize. Those may include fatigue, although the acquisition protocol was carefully designed to induce low fatigue on subjects, even in the case of patients (Atzori et al., 2014a), and future developments should also consider these variables for a complete assessment.

Lastly, the model of human grasps described in this paper can potentially provide insights for calibrated interventions of rehabilitation robotics. Several implications can be found considering neurological or orthopedic rehabilitation of the hand (Bissolotti et al., 2016; Vanoglio et al., 2017). In recent studies, the exploitation of devices for hand rehabilitation has been shown to lead to promising therapeutic results that can be further enhanced by training muscle synergy-oriented exercises, based on a detailed knowledge of motor synergies (Scano et al., 2018).

### CONCLUSION

In this paper, muscle synergies were extracted from the recordings of a publicly available dataset. The extracted synergies were clustered from a cohort of 28 subjects executing a variety of hand grasps. The synergies are often characterized by two temporal activation patterns: a strong co-activation corresponding to the grasp/hold phase, and two minor coactivating patterns related to hand opening (visible in the preshaping and release phase). The conclusions of this article suggest that a limited number of time-dependent motor modules, correctly elicited by a control activation signal, may underlie the execution of a large variety of hand grasps. However, spatial synergies are not strongly related to specific motor functions but have a sparse recruiting timing.

### AUTHOR CONTRIBUTIONS

AS designed the experiment, wrote the software for synergy extraction and clustering, elaborated the data, and wrote the paper. AC participated in data analysis and interpretation, and wrote the paper. LM participated in data analysis and wrote the paper. HM acquired the NinaPro database and wrote the paper. MA acquired the NinaPro database, participated in the design of the study and in data interpretation, and wrote the paper.

### FUNDING

This work was funded by the National Research Council of Italy, within the research project: muscle Synergies Mapping of Upper Limb Reaching and Hand Grasps aimed at Human-Robot Interaction description in Rehabilitation and Industrial Applications (SyRIA).

### REFERENCES

synergies during natural motor behaviors. J. Neurosci. 25, 6419–6434. doi: 10.1523/JNEUROSCI.4904-04.2005


Poststroke patients in reaching movements. Front. Bioeng. Biotechnol. 5:62. doi: 10.3389/fbioe.2017.00062


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Scano, Chiavenna, Molinari Tosatti, Müller and Atzori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Decoding Scheme for Incomplete Motor Imagery EEG With Deep Belief Network

Yaqi Chu<sup>1,2,3</sup>, Xingang Zhao<sup>1,2\*</sup>, Yijun Zou<sup>1,2,3</sup>, Weiliang Xu<sup>1,4</sup>, Jianda Han<sup>1,2</sup> and Yiwen Zhao<sup>1,2</sup>

*<sup>1</sup> State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China, <sup>2</sup> Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China, <sup>3</sup> University of Chinese Academy of Sciences, Beijing, China, <sup>4</sup> Department of Mechanical Engineering, University of Auckland, Auckland, New Zealand*

High accuracy decoding of the electroencephalogram (EEG) signal is still a major challenge in the design of an effective motor imagery-based brain-computer interface (BCI), especially when the signal contains various extreme artifacts and outliers arising from data loss. The conventional way to handle such cases is to reject the entire severely contaminated EEG segments, which has the drawback that the BCI produces no decoding results during that period. In this study, a novel decoding scheme based on the combination of Lomb-Scargle periodogram (LSP) and deep belief network (DBN) was proposed to recognize the incomplete motor imagery EEG. In particular, instead of discarding the entire segment, two forms of data removal were adopted to eliminate the EEG portions with extreme artifacts and data loss. The LSP was utilized to steadily extract the power spectral density (PSD) features from the incomplete EEG constructed from the remaining portions. A DBN structure based on the restricted Boltzmann machine (RBM) was exploited and optimized to perform the classification task. Various comparative experiments were conducted and evaluated on simulated signals and real incomplete motor imagery EEG, including the comparison of three PSD extraction methods (fast Fourier transform, Welch and LSP) and two classifiers (DBN and support vector machine, SVM). The results demonstrate that the LSP can estimate relatively robust PSD features and that the proposed scheme can significantly improve the decoding performance for the incomplete motor imagery EEG. This scheme can provide an alternative decoding solution for motor imagery EEG contaminated by extreme artifacts and data loss. It can help to promote stability and smoothness and to maintain consecutive outputs without interruption for a BCI system that is suitable for online and long-term application.

Keywords: brain-computer interface, decoding scheme, incomplete motor imagery EEG, power spectral density, deep belief network

#### Edited by:

*Tetsunari Inamura, National Institute of Informatics, Japan*

#### Reviewed by:

*Jianjun Meng, Carnegie Mellon University, United States Xiaogang Chen, Institute of Biomedical Engineering (CAMS), China*

> \*Correspondence: *Xingang Zhao zhaoxingang@sia.cn*

#### Specialty section:

*This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience*

Received: *08 January 2018* Accepted: *10 September 2018* Published: *28 September 2018*

#### Citation:

*Chu Y, Zhao X, Zou Y, Xu W, Han J and Zhao Y (2018) A Decoding Scheme for Incomplete Motor Imagery EEG With Deep Belief Network. Front. Neurosci. 12:680. doi: 10.3389/fnins.2018.00680*

### INTRODUCTION

The emergent brain-computer interface (BCI) technology allows individuals with severe neuromuscular related locomotive disabilities to directly use their brain to operate or communicate with external peripherals and environments (Daly and Wolpaw, 2008; McFarland and Wolpaw, 2011). Namely, the BCI system provides an alternative interface bridge which can bypass the conventional motor neural pathways and map brain intentions to relative control commands (Ortiz-Rosario and Adeli, 2013). Brain activity can be characterized by various signal modalities, such as invasive ElectroCorticoGraphy (ECoG) (Miller et al., 2010; Hiremath et al., 2015), non-invasive electroencephalogram (EEG) (Lazarou et al., 2018), the functional Magnetic Resonance Imaging (fMRI) (Cohen et al., 2014), and the functional Near-Infrared Spectroscopy (fNIRS) (Naseer and Hong, 2015). Due to its manageability, easy capture, high time resolution and relative cost effectiveness, the EEG signal has been widely adopted for substantial BCI applications, such as remote quadcopter control (Lin and Jiang, 2015), motion rehabilitation (Xu et al., 2011; Zhao et al., 2016), biometric authentication (Palaniappan, 2008), and emotions prediction (Padilla-Buritica et al., 2016). Currently, the electrophysiological brain patterns used in EEG-based BCI systems are mainly Steady-State Visual Evoked Potentials (SSVEPs) (Chen et al., 2015; Zhang et al., 2015; Zhao et al., 2016; Nakanishi et al., 2018), P300 (Cavrini et al., 2016), sensorimotor rhythms (SMRs) (Yuan and He, 2014; He et al., 2015), and motion-related cortical potential (MRCP, one kind of a slow cortical potential) (Karimi et al., 2017). Compared to other patterns, the SMRs-based BCI is more flexible and suitable for practical applications due to the spontaneous EEG signals, which are generated by individuals voluntarily without any external stimuli.

The SMRs are derived from the motor imagery EEG, which is evoked by mentally imagining the movements of limbs without actual actions (Yuan and He, 2014). The underlying neurophysiological phenomena are event-related synchronization (ERS) and event-related desynchronization (ERD) in the SMRs, which are induced simultaneously by an exogenous event. The variability of ERS/ERD intensity or power in particular frequency bands can be utilized to distinguish the different motor imagery EEG signals (Pfurtscheller et al., 2006; Koo et al., 2015). Some remarkable SMRs-based BCI systems for motor imagery classification have been created and applied in wheelchair control (Li et al., 2013), object control in 2D (Ma et al., 2017) or 3D space (LaFleur et al., 2013), and robotic arm control (Xu et al., 2011; Meng et al., 2016). However, there are still various challenges in the establishment of efficient SMRs-based BCI systems, such as fewer recognizable motor types or states, a lower recognition rate, and a longer training time (Yuan and He, 2014; He et al., 2015). In addition, due to the volume conduction effect of scalp and skull, the EEG is a non-stationary and non-linear dynamic signal with a low signal-to-noise ratio that is easily disturbed or submerged by complex background artifacts, which makes it challenging to accurately decode various motor imagery tasks (Blankertz et al., 2011). Consequently, the crucial issue that needs to be solved is how to improve the decoding performance of the SMRs-based BCI in the presence of various artifacts.

The artifacts affecting the quality of motor imagery EEG mainly comprise electrooculography (EOG), electromyography (EMG) and electrical line interference. Traditionally, a variety of filters are available to alleviate or even eliminate electrical line interference and some high-frequency noise, such as EMG (above 35 Hz). In past research, many approaches have been proposed to reduce EOG, such as filter-based methods (Shoker et al., 2005), independent component analysis (ICA) (Lindsen and Bhattacharya, 2010) and discrete wavelet transform (DWT) (Peng et al., 2013). However, these methods can cause the loss of some useful EEG components, and manual parameter tuning is needed to obtain their optimal performance. Moreover, they generally fail when the EEG contains extreme noise. In addition, the EEG signals could be accidentally overwritten or lost because of hardware or system malfunctions during recording periods. In these cases, good decoding performance for SMRs-based BCI systems can hardly be achieved. One intuitive, last-resort solution to avoid such extreme artifacts and data loss is to reject the entire severely disturbed EEG segments. This raises several drawbacks, including the absence of decoding results during certain periods, an additional EEG rejection process and increased BCI training time. Furthermore, from a practical perspective, consecutive and smooth recognition by SMRs-based BCI systems is necessary for online and long-term application. This requires that the BCI system can continuously decode brain signals without any interruption. If entire EEG segments are discarded due to extreme artifacts or data loss, the BCI system cannot obtain the decoding results during the corresponding time slice. Hence, it is very important to decode incomplete motor imagery EEG for SMRs-based BCI systems under conditions of extreme artifacts and data loss.

Currently, only a few studies have addressed the decoding of incomplete EEG signals. Zhang et al. applied a Bayesian tensor factorization based method to find the underlying low-rank EEG tensor from incomplete EEG signals and to improve the decoding accuracy and robustness after the removal of artifacts and outliers (Zhang et al., 2016). Cui et al. used a fully Bayesian CP factorization method for incomplete tensors to analyze and classify incomplete EEG signals with different data missing ratios (Cui et al., 2016). However, such decoding methods for incomplete EEG need complicated matrix and tensor computations, which are not efficient for an online BCI application. Moreover, the classification accuracies obtained by these methods need further improvement.

In this paper, to improve the decoding performance for incomplete motor imagery EEG and to satisfy the need for smooth operation of the BCI system, a novel decoding scheme composed of Lomb-Scargle periodogram (LSP) for feature extraction and deep belief network (DBN) for classification was proposed. Instead of rejecting the entire EEG segment, the portions affected by extreme artifacts or data loss were directly removed and the remaining portions were used to construct the incomplete motor imagery EEG signals in this study. Generally, the most robust and representative feature for the contents of different motor imageries is spectral power in particular bands of ERS/ERD (Pfurtscheller et al., 2006). The conventional fast Fourier transform (FFT) or Welch periodogram can be used to estimate the spectral power features for the intact motor imagery EEG. Nevertheless, these spectral analysis methods do not work well for non-uniformly sampled signals (Stoica et al., 2009), such as incomplete motor imagery EEG signals. The LSP method can handle signals that have been sampled non-uniformly or have missing data points (Stoica et al., 2009; Stankovic et al., 2014) and is suitable for processing incomplete signals. Hence, the LSP method was adopted to extract major spectral power features from the incomplete motor imagery EEG signals in this study. A DBN structure based on the restricted Boltzmann machines (RBM) was exploited and optimized to learn different motor imagery EEG classes. The proposed scheme may offer the following advantages: (a) It can provide comparable decoding performance for the incomplete motor imagery EEG with different proportions of data removal; (b) The extracted spectral power features are more robust for the representation of the incomplete motor imagery EEG; (c) It is applicable to consecutive and smooth operation without any disruption for the online BCI system.
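To make the role of the LSP concrete, the sketch below applies the classical Lomb-Scargle formula to a synthetic signal from which a contiguous block of samples has been removed. It is an illustration under stated assumptions rather than the implementation used in this study: the function name `lomb_scargle`, the 250 Hz sampling rate, the 10 Hz test rhythm, the noise level and the position of the removed block are all chosen only for the example.

```python
import numpy as np

def lomb_scargle(t, y, freqs_hz):
    """Classical Lomb-Scargle periodogram for samples y taken at arbitrary times t."""
    y = y - y.mean()
    power = np.empty(len(freqs_hz))
    for i, f in enumerate(freqs_hz):
        w = 2.0 * np.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal on the sample times.
        tau = np.arctan2(np.sin(2 * w * t).sum(), np.cos(2 * w * t).sum()) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Synthetic 2 s "EEG" at an assumed 250 Hz: a 10 Hz rhythm plus noise.
rng = np.random.default_rng(0)
t_full = np.arange(500) / 250.0
y_full = np.sin(2 * np.pi * 10.0 * t_full) + 0.5 * rng.standard_normal(t_full.size)

# Remove a contiguous block (0.5 s to 0.9 s), imitating an artifact or data loss.
keep = (t_full < 0.5) | (t_full >= 0.9)
t, y = t_full[keep], y_full[keep]

freqs = np.arange(4.0, 40.5, 0.5)       # analysis band in Hz (assumed)
psd = lomb_scargle(t, y, freqs)
peak_hz = freqs[psd.argmax()]
```

Because the periodogram is evaluated directly on the retained time stamps, no interpolation or zero filling of the removed block is required, and the spectral peak of the simulated rhythm is still located near 10 Hz.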

The remainder of this paper is organized as follows. The overall framework of the decoding scheme for incomplete motor imagery EEG is introduced in section Overall Decoding Scheme Framework. Section EEG Processing Pipeline then describes the EEG signal processing pipeline in detail, including the preprocessing of artifacts and data loss, spectral feature extraction, and DBN classifier construction. The motor imagery experiments and datasets are presented in section Motor Imagery Experimental Paradigm and Datasets. Experimental comparison results and discussions are given in section Experimental Results and Discussions. Finally, section Conclusions and Future Works gives the conclusions and ideas for future work.

### OVERALL DECODING SCHEME FRAMEWORK

The objective of our study is to improve the recognition accuracy and stability for different motor imagery tasks when the EEG signals are incomplete. The schematic diagram of the overall decoding system is illustrated in **Figure 1**. It combines three procedures: preprocessing of the raw EEG, spectral power feature extraction, and motor imagery recognition. The raw EEG signals were captured by non-invasive wet electrodes arranged on the scalp while individuals performed various motor imagery tasks, such as imagining limb movements. The preprocessing procedure was devoted to constructing the incomplete motor imagery EEG datasets and covered band-pass filtering, sliding window segmentation, and the removal of data loss or noise.

The deep belief network was composed of three layers of pre-trained stacking RBMs along with an output layer of softmax regression. The spectral power features within specific frequency bands extracted through Lomb-Scargle periodogram were normalized to pre-train each layer of the RBMs and fine-tune the weights of the DBN. Stochastic binary units were utilized in the pre-training stage to initialize the deep neural network. Deterministic real-valued probabilities were also implemented to adjust the connection weights of each layer by error backpropagation algorithm. After a fine-tuning stage, the trained DBN was employed to decode the corresponding classes of motor imagery from incomplete EEG, such as movement intention of left hand, right hand, or foot. The structure of each layer in the DBN was optimized and determined by various group experiments. Moreover, simulated and extensive experiments for multi-subjects, different feature extraction methods (FFT or Welch) and classifiers (supervised Support Vector Machines, SVMs) were conducted to verify the viability and effectiveness of the proposed decoding scheme for incomplete motor imagery EEG signals.

### EEG PROCESSING PIPELINE

### Preprocessing

To exclude the unwanted components of the EEG segments of interest, the preprocessing procedure was designed to transform the intact EEG with complex artifacts or data loss into incomplete EEG segments. The preprocessing pipeline consists of three sub-parts: (a) signal filtering, (b) sliding window segmentation, and (c) removal of artifacts or data loss. More explicitly, the signal filtering alleviates the background noise arising from experimental, instrumental, electrical, or physiological sources. The sliding windows segment the expected motor imagery fragments from the continuous EEG signals. Within the motor imagery EEG segments, the portions with extreme artifacts or data loss were directly discarded, and the remaining portions were used to form the incomplete signals.

#### Signal Filtering

Because EEG signals contain useful information below 100 Hz, noise components above this frequency can be excluded directly with low-pass filters. For motor imagery EEG, ERS/ERD appears most clearly in the mu (8–12 Hz) and beta (18–26 Hz) rhythm bands (Pfurtscheller et al., 2006). In other words, the 8–30 Hz band carries the most discriminative information associated with different motor imagery tasks. In this study, a fifth-order Butterworth band-pass filter with a gain of 1.5 and cutoff frequencies of [8, 35] Hz was applied to attenuate specific noise components while amplifying the frequency band of interest for motor imagery classification. After filtering, a large part of the noise is removed, such as EMG (high-frequency noise above 35 Hz), the low-frequency component of EOG (below 8 Hz), and power line interference (50 or 60 Hz). In addition, the baseline drift caused by head or limb motion is alleviated, which reduces its impact on the raw EEG signals.
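The stated filter specification can be illustrated by evaluating a band-pass Butterworth magnitude response. The following is a minimal sketch, assuming the analog prototype with the standard low-pass-to-band-pass frequency mapping and reading "fifth-order" as the order of that prototype; the function name `butter_bandpass_gain` is hypothetical, and the digital filter actually used in the study may differ in detail.

```python
import math

def butter_bandpass_gain(f_hz, low_hz=8.0, high_hz=35.0, order=5, gain=1.5):
    """Magnitude response of an analog Butterworth band-pass prototype at f_hz > 0."""
    bandwidth = high_hz - low_hz
    center_sq = low_hz * high_hz          # squared geometric centre frequency
    # Low-pass-to-band-pass frequency mapping of the normalized prototype.
    omega = (f_hz ** 2 - center_sq) / (bandwidth * f_hz)
    return gain / math.sqrt(1.0 + omega ** (2 * order))

# Pass-band gain is close to 1.5; the cutoffs sit at gain / sqrt(2).
for f in (4, 8, 20, 35, 50):
    print(f"{f:>3} Hz -> gain {butter_bandpass_gain(f):.3f}")
```

Under these assumptions, components inside the 8–35 Hz band are amplified by nearly the full gain, while the 50 Hz line component and the sub-8 Hz EOG range are strongly attenuated.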

#### Sliding Windows Segmentation

For a continuously recorded EEG signal, only the motor imagery segments are of interest. The band-pass filtered, continuous EEG signals were therefore segmented by a time window corresponding to one trial of the motor imagery task. Moreover, within a trial the subject needs to imagine limb movements repeatedly for a certain time to generate stable and effective brain activity. In existing motor imagery EEG studies, the features are extracted either from the whole EEG trial or from a number of overlapping or non-overlapping time segments of the trial (Asensio-Cubero et al., 2011, 2013; Aydemir, 2016). To improve the temporal resolution of EEG and obtain better classifier performance, a sliding window is commonly adopted to split the targeted motor imagery trial into overlapping segments, which can be used for multiple classifications combined by a voting strategy (Herman et al., 2008; Shahid and Prasad, 2011; Choi, 2012). In this study, instead of using the whole EEG trial, each 4 s EEG trial was divided into 16 segments of 1 s length using a 1 s sliding window with a 0.2 s step size (80% overlap).
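The segmentation described above can be sketched as follows. This is a minimal illustration assuming a 1 kHz sampling rate and a (samples × channels) array layout; the function name `sliding_segments` and the zero-filled placeholder trial are hypothetical.

```python
import numpy as np

def sliding_segments(trial, fs=1000, win_s=1.0, step_s=0.2):
    """Split a (samples x channels) trial into overlapping windows."""
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    starts = range(0, trial.shape[0] - win + 1, step)
    return np.stack([trial[s:s + win] for s in starts])

trial = np.zeros((4000, 64))          # placeholder 4 s trial, 64 channels at 1 kHz
segments = sliding_segments(trial)
print(segments.shape)                 # (16, 1000, 64)
```

The window start times run from 0 s to 3 s in 0.2 s steps, which gives (4 − 1)/0.2 + 1 = 16 segments per trial.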

#### Artifacts or Data Loss Removal

Even after filtering, some artifacts may still exist in the EEG segments. Furthermore, the residual components stemming from artifacts may overlap the frequency band that is informative for motor imagery EEG. For instance, the EOG artifacts resulting from eye blinks usually occupy the frequency band of 0–10 Hz. The higher-frequency components of the EOG that overlap with the ERS/ERD bands cannot be readily excluded by band-pass filters. In addition, filters are in general ineffective for signals with data loss. Instead of rejecting entire motor imagery EEG segments, an additional preprocessing step was proposed to address artifacts and data loss. When an EEG segment was contaminated by extreme artifacts, the entire EEG segment was divided into data chunks of different widths. The width, which is the number of data points in each data chunk, was generated according to a normal distribution with a mean of 10 and a standard deviation of 2. Data chunk removal was then applied to directly discard the data chunks containing severe artifacts. For data loss within the EEG segment, data point removal was employed to eliminate acquisition outliers. For both forms of data removal, the EEG portions contaminated by extreme artifacts or data loss within an EEG segment were directly discarded in a proportion from 10% to 80% in this study. For example, in the case of 10% data chunk removal, 10% of the data chunks in a 1 s EEG segment were randomly discarded. In the case of 10% data point removal, 10% of the data points (100 points in this study) in a 1 s EEG segment (1,000 points) were randomly discarded. Subsequently, the remaining EEG data chunks or data points were combined to construct the incomplete motor imagery EEG segments.
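The two removal forms can be sketched as follows. This is a minimal illustration for a single channel, assuming that the time stamps of the surviving samples are kept (they are needed later for non-uniform spectral estimation) and that chunk widths are rounded to integers of at least 1; the function names and the 10 Hz placeholder signal are hypothetical.

```python
import numpy as np

def remove_points(t, x, ratio, rng):
    """Randomly drop `ratio` of the samples; return the surviving times and values."""
    n_drop = int(round(ratio * len(x)))
    drop = rng.choice(len(x), size=n_drop, replace=False)
    keep = np.ones(len(x), dtype=bool)
    keep[drop] = False
    return t[keep], x[keep]

def remove_chunks(t, x, ratio, rng, mean_width=10, std_width=2):
    """Split into chunks of normally distributed width and drop `ratio` of them."""
    edges = [0]
    while edges[-1] < len(x):
        width = max(1, int(round(rng.normal(mean_width, std_width))))
        edges.append(min(len(x), edges[-1] + width))
    n_chunks = len(edges) - 1
    drop = rng.choice(n_chunks, size=int(round(ratio * n_chunks)), replace=False)
    keep = np.ones(len(x), dtype=bool)
    for c in drop:
        keep[edges[c]:edges[c + 1]] = False
    return t[keep], x[keep]

rng = np.random.default_rng(0)
t = np.arange(1000) / 1000.0            # 1 s segment sampled at 1 kHz
x = np.sin(2 * np.pi * 10 * t)          # placeholder single-channel signal
t_pt, x_pt = remove_points(t, x, 0.10, rng)
t_ch, x_ch = remove_chunks(t, x, 0.10, rng)
print(len(x_pt), len(x_ch))
```

Point removal leaves isolated gaps scattered over the segment, whereas chunk removal leaves gaps of roughly 10 consecutive samples.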

### Feature Extraction Based on Lomb-Scargle Periodogram

A crucial step in a BCI system is feature extraction, which is used to find the task-related information and the most discriminative representations of the brain activity for subsequent classification. The quality of the extracted features strongly affects the performance of the subsequent recognition process. For motor imagery EEG signals, we concentrated on spectral analysis within certain frequency bands. The non-parametric fast Fourier transform (FFT) and Welch periodogram methods have been confirmed to effectively estimate spectral power features, such as the power spectral density (PSD), for intact motor imagery EEG (Herman et al., 2008; Djemal et al., 2016). However, because incomplete motor imagery EEG signals are non-uniformly sampled sequences, these methods may not extract stable spectral features. In our research, the Lomb-Scargle periodogram was adopted to estimate the spectral power features of incomplete motor imagery EEG segments. An incomplete EEG segment is denoted by X ∈ R<sup>C×N</sup>, where C is the number of channels and N is the number of signal points. For each channel, the signal series is denoted by eeg(t<sub>i</sub>), where i = 1, 2, ..., N.

#### Lomb-Scargle Periodogram

For the signal series eeg(t<sub>i</sub>), the spectral power at frequency ω<sub>f</sub> is estimated by solving the following least-squares fitting problem:

$$\min\_{\substack{\alpha \ge 0\\ \phi \in [0, 2\pi]}} \sum\_{i=1}^{N} \left[ \text{eeg}(t\_i) - \alpha \cos(\omega\_f t\_i + \phi) \right]^2. \tag{1}$$

For simplicity, the amplitude α and phase φ at frequency ω<sub>f</sub> were replaced by the parameters

$$a = \alpha \cos(\phi) \text{ and } b = -\alpha \sin(\phi). \tag{2}$$

The fitting problem can then be reformulated in terms of a and b:

$$\min\_{a,b} \sum\_{i=1}^{N} \left[ \text{eeg}(t\_i) - a \cos(\omega\_f t\_i) - b \sin(\omega\_f t\_i) \right]^2. \tag{3}$$

The optimal parameters minimizing Equation (3) can be obtained by solving

$$
\begin{bmatrix}
\hat{a} \\
\hat{b}
\end{bmatrix} = \mathcal{R}^{-1} r
\tag{4}
$$

where

$$\mathcal{R} = \sum\_{i=1}^{N} \begin{bmatrix} \cos(\omega\_f t\_i) \\ \sin(\omega\_f t\_i) \end{bmatrix} \begin{bmatrix} \cos(\omega\_f t\_i) & \sin(\omega\_f t\_i) \end{bmatrix} \tag{5}$$

and

$$r = \sum\_{i=1}^{N} \begin{bmatrix} \cos(\omega\_f t\_i) \\ \sin(\omega\_f t\_i) \end{bmatrix} \text{eeg}(t\_i). \tag{6}$$

The power at the specific frequency ω<sub>f</sub> corresponding to the optimal parameters â and b̂ is given as follows:

$$\begin{split} &\frac{1}{N} \sum\_{i=1}^{N} \left( [\hat{a} \,\hat{b}] \begin{bmatrix} \cos(\omega\_{f} t\_{i}) \\ \sin(\omega\_{f} t\_{i}) \end{bmatrix} \right)^{2} \\ &= \frac{1}{N} [\hat{a} \,\hat{b}] \mathcal{R} \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} \\ &= \frac{1}{N} r^{T} \mathcal{R}^{-1} r. \end{split} \tag{7}$$

Accordingly, the power of each channel signal at any frequency ω can be obtained by

$$P(\omega) = \frac{1}{N} r(\omega)^T \mathcal{R}(\omega)^{-1} r(\omega). \tag{8}$$
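Equations (4) to (8) translate almost directly into code. The following is a minimal sketch for one channel, assuming time stamps in seconds and frequencies in Hz (so that ω = 2πf); the function name `lsp_power` and the 10 Hz test tone with 50% point removal are hypothetical choices for illustration.

```python
import numpy as np

def lsp_power(t, x, freqs_hz):
    """Least-squares periodogram P(omega) = r^T R^{-1} r / N for irregular samples."""
    power = np.empty(len(freqs_hz))
    for k, f in enumerate(freqs_hz):
        basis = np.stack([np.cos(2 * np.pi * f * t),
                          np.sin(2 * np.pi * f * t)])   # shape (2, N)
        R = basis @ basis.T                              # Equation (5)
        r = basis @ x                                    # Equation (6)
        power[k] = r @ np.linalg.solve(R, r) / len(x)    # Equation (8)
    return power

rng = np.random.default_rng(1)
t_full = np.arange(1000) / 1000.0
keep = np.sort(rng.choice(1000, size=500, replace=False))  # 50% point removal
t = t_full[keep]
x = np.cos(2 * np.pi * 10 * t + 0.3)                        # unit-amplitude 10 Hz test tone
freqs = np.arange(8.0, 29.0, 1.0)
P = lsp_power(t, x, freqs)
print(freqs[np.argmax(P)])
```

Because the fit uses the actual sample times, no interpolation or zero filling of the removed points is needed. At the true frequency the sinusoid is fitted exactly, so the estimated power equals the mean squared value of the surviving samples (about 0.5 for a unit-amplitude tone).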

This estimation step was repeated for all channels of the incomplete motor imagery EEG segments to extract the corresponding spectral features. Previous research has demonstrated that the significant power oscillations in response to various motor imagery tasks are mostly located in the 8–30 Hz band (Pfurtscheller et al., 2006; Shahid and Prasad, 2011). In this article, the band of interest was divided into four sub-bands with a bandwidth of 5 Hz each: alpha (8–13 Hz), sigma (13–18 Hz), low beta (18–23 Hz), and high beta (23–28 Hz). For each channel, the PSD feature of each sub-band was computed by averaging the powers within its frequency range. All PSD features of an EEG segment were then concatenated in channel order into a feature vector:

$$V = [p\_{11}, p\_{12}, p\_{13}, p\_{14}, p\_{21}, p\_{22}, p\_{23}, p\_{24}, \dots, p\_{C1}, p\_{C2}, p\_{C3}, p\_{C4}] \tag{9}$$

where C is the number of channels.
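The construction of the feature vector in Equation (9) can be sketched as follows. This is a minimal illustration assuming a (channels × frequencies) power array and half-open band limits [low, high), which the text does not specify; the function name `band_power_features` and the toy spectra are hypothetical.

```python
import numpy as np

BANDS_HZ = [(8, 13), (13, 18), (18, 23), (23, 28)]  # alpha, sigma, low beta, high beta

def band_power_features(power, freqs_hz, bands=BANDS_HZ):
    """Average a (channels x frequencies) power array within each band, channel-major."""
    feats = []
    for channel_power in power:
        for lo, hi in bands:
            in_band = (freqs_hz >= lo) & (freqs_hz < hi)   # half-open bands: an assumption
            feats.append(channel_power[in_band].mean())
    return np.array(feats)

freqs = np.arange(8.0, 28.0, 1.0)                   # 8, 9, ..., 27 Hz
power = np.outer([1.0, 2.0], freqs)                 # toy spectra for C = 2 channels
V = band_power_features(power, freqs)
print(V)
```

For C channels the resulting vector has 4 × C entries, ordered as p<sub>11</sub>, ..., p<sub>14</sub>, p<sub>21</sub>, ..., p<sub>C4</sub>.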

#### Feature Normalization

Generally, the original features can be fed directly into a neural network or an SVM classifier to recognize which motor imagery class the current EEG signal belongs to. However, variations in the spectral features across channels or motor imagery trials may affect the performance of the classifiers. To remove differences in feature scale and accelerate the convergence of the learning algorithm, a min-max normalization step was applied to the feature vector set V. As given in Equation (10), the minimum was subtracted from each raw feature and the result was divided by the difference between the maximum and the minimum, so that all values were scaled to between 0 and 1.

$$F(m)\_{norm} = \frac{V(m) - v\_{min}(m)}{v\_{max}(m) - v\_{min}(m)} \tag{10}$$

where v<sub>max</sub>(m) = max{V(m)}, v<sub>min</sub>(m) = min{V(m)}, and m ∈ R<sup>4×C</sup>.
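Equation (10) can be sketched as follows. This is a minimal illustration assuming that the minimum and maximum are taken per feature across all samples, which is one reading of the definition above; the function name `min_max_normalize` and the guard for constant features are additions for illustration.

```python
import numpy as np

def min_max_normalize(V):
    """Scale each feature column of a (samples x features) array to [0, 1]."""
    v_min = V.min(axis=0)
    v_max = V.max(axis=0)
    span = np.where(v_max > v_min, v_max - v_min, 1.0)  # guard against constant features
    return (V - v_min) / span

V = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
F = min_max_normalize(V)
print(F)
```

In an online setting, one would typically compute `v_min` and `v_max` on the training data only and reuse them for new segments.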

### Deep Belief Network Based on Restricted Boltzmann Machines

Considering the advantages of high-speed and parallel computation, a neural network classifier is more suitable and efficient for the online BCI application and the trained parameters can be directly used to distinguish new EEG signals. Currently, a variety of deep learning architectures based on neural networks have been constructed and applied in motor imagery EEG classification (Yang et al., 2015; Kumar et al., 2016; Tabar and Halici, 2016). In this study, we adopted a deep belief network (DBN) structure to obtain more robust and ultimately more notable representation for the incomplete motor imagery EEG. The DBN structure can be formed by multiple layers of stacked restricted Boltzmann machines (RBMs) or auto-encoders.

#### Restricted Boltzmann Machine (RBM)

Each RBM is composed of a visible layer, a hidden layer, and the connection weights between the two layers, and is trained greedily in an unsupervised mode (Hinton et al., 2006; Tang et al., 2015). The basic structure of the RBM is presented in **Figure 2**. The neurons used in the RBM are stochastic binary units. The visible layer receives the input data and has undirected connections with the neurons of the hidden layer, while neurons within the same layer are not connected. The hidden layer is responsible for reconstructing the input data as closely as possible by repeatedly tuning the connection weights and biases. For motor imagery EEG, each visible neuron represents a spectral feature that is assumed to follow a Gaussian distribution. The energy function of a joint configuration of the two layers is defined as

$$E(\mathbf{v}, \mathbf{h}) = -\sum\_{i=1}^{m} b\_i v\_i - \sum\_{j=1}^{n} a\_j h\_j - \sum\_{i=1}^{m} \sum\_{j=1}^{n} v\_i h\_j w\_{ij} \tag{11}$$

where v<sub>i</sub> and h<sub>j</sub> are the binary states of visible neuron i and hidden neuron j, respectively, b<sub>i</sub> and a<sub>j</sub> are the corresponding biases, and w<sub>ij</sub> is the connection weight between them. Based on the Boltzmann distribution and the energy function, the joint probability of a pair of visible and hidden vectors is given by

$$p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})} \tag{12}$$

where Z = Σ<sub>v,h</sub> e<sup>−E(v,h)</sup> denotes the partition function or normalization term.

Because the hidden neurons are conditionally independent (there are no connections between them), given the visible vector **v**, the conditional probability of neuron h<sub>j</sub> being 1 can be obtained as follows:

$$p(h\_j = 1 | \mathbf{v}) = \sigma(a\_j + \sum\_i v\_i w\_{ij}) \tag{13}$$

Similarly, given the hidden vector **h**, the conditional probability of the visible neuron v<sub>i</sub> being 1 can be determined by

$$p(v\_i = 1 | \mathbf{h}) = \sigma(b\_i + \sum\_j h\_j w\_{ij}) \tag{14}$$

where σ(•) denotes the logistic sigmoid function.
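The relation between the energy function, the joint probability, and the sigmoid conditional can be checked numerically on a toy RBM. The following is a minimal sketch, assuming binary units and an interaction term in Equation (11) of the form v<sub>i</sub>h<sub>j</sub>w<sub>ij</sub>; the parameter values are hypothetical, and the brute-force enumeration of the partition function is only feasible for such tiny models.

```python
import itertools
import math

def energy(v, h, a, b, w):
    """Equation (11): b are visible biases, a hidden biases, w[i][j] the weights."""
    return (-sum(b[i] * v[i] for i in range(len(v)))
            - sum(a[j] * h[j] for j in range(len(h)))
            - sum(v[i] * h[j] * w[i][j] for i in range(len(v)) for j in range(len(h))))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters for a toy RBM with 2 visible and 2 hidden units.
b, a = [0.1, -0.2], [0.3, 0.0]
w = [[0.5, -0.4], [0.2, 0.7]]
states = list(itertools.product([0, 1], repeat=2))

# Equation (12): normalize exp(-E) over all joint configurations.
Z = sum(math.exp(-energy(v, h, a, b, w)) for v in states for h in states)
joint = {(v, h): math.exp(-energy(v, h, a, b, w)) / Z for v in states for h in states}

v = (1, 0)
p_v = sum(joint[(v, h)] for h in states)
p_h0_given_v = sum(joint[(v, h)] for h in states if h[0] == 1) / p_v
closed_form = sigmoid(a[0] + sum(v[i] * w[i][0] for i in range(2)))  # Equation (13)
print(p_h0_given_v, closed_form)
```

The conditional obtained by marginalizing the joint distribution agrees with the closed form of Equation (13), which is what makes sampling the hidden layer inexpensive in practice.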

Given the training dataset S = {s<sup>1</sup>, s<sup>2</sup>, ..., s<sup>n<sub>s</sub></sup>}, where n<sub>s</sub> is the number of training samples, the parameters of the RBM, including the connection weights **w** and the biases **a** and **b**, are trained to fit the training samples by maximizing a log-likelihood function.

$$L\_{S} = \sum\_{i=1}^{n\_{s}} \log p(\mathbf{v}, \mathbf{h}) \tag{15}$$

Based on gradient ascent and contrastive divergence methods (Hinton et al., 2006), the derivative of the log-likelihood with respect to weights **w** can be formulized by

$$\frac{\partial \log p(\mathbf{v}, \mathbf{h})}{\partial w\_{ij}} = -\mathbf{E}\_{data} \left[ \frac{\partial E(\mathbf{v}, \mathbf{h})}{\partial w\_{ij}} \right] + \mathbf{E}\_{model} \left[ \frac{\partial E(\mathbf{v}, \mathbf{h})}{\partial w\_{ij}} \right] \tag{16}$$

where **E**<sub>data</sub>[•] and **E**<sub>model</sub>[•] are the expectations under the distribution of the training dataset and of the model, respectively. Further, the gradient can be rewritten as

$$\frac{\partial \log p(\mathbf{v}, \mathbf{h})}{\partial w\_{ij}} = \mathbf{E}\_{data} \left[ v\_i h\_j \right] - \mathbf{E}\_{model} \left[ v\_i h\_j \right] \tag{17}$$

The contrastive divergence method can be used to approximately estimate the expectation **E**<sub>data</sub>[v<sub>i</sub>h<sub>j</sub>]. The Gibbs sampling method can be adopted to calculate the expectation **E**<sub>model</sub>[v<sub>i</sub>h<sub>j</sub>]. Hence, the learning rule of the connection weights can be obtained by

$$
\Delta w\_{ij} = \eta(\mathbf{E}\_{data} \left[ v\_i h\_j \right] - \mathbf{E}\_{model} \left[ v\_i h\_j \right]) \tag{18}
$$

Similarly, the updating rules of the biases are respectively

$$
\Delta b\_i = \varepsilon(\mathbf{E}\_{data} \left[ v\_i \right] - \mathbf{E}\_{model} \left[ v\_i \right]) \tag{19}
$$

and

$$
\Delta a\_j = \varepsilon (\mathbf{E}\_{data} \begin{bmatrix} h\_j \end{bmatrix} - \mathbf{E}\_{model} \begin{bmatrix} h\_j \end{bmatrix}) \tag{20}
$$

where η and ε denote the learning rates. According to these parameter updating rules, each RBM is trained to reconstruct the input data in an unsupervised way.
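The update rules in Equations (18) to (20) can be sketched with a single step of contrastive divergence (CD-1). This is a minimal illustration, not the training code of the study: it assumes inputs scaled to [0, 1] that are treated as visible probabilities, approximates the model expectations with one Gibbs step, and uses a hypothetical hidden layer of 100 units. The default learning rates 0.5 and 0.25 are the values reported for the DBN in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, w, a, b, rng, eta=0.5, eps=0.25):
    """One CD-1 step on a batch v0 (samples x visible); returns updated (w, a, b)."""
    ph0 = sigmoid(a + v0 @ w)                         # Equation (13)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # stochastic binary hidden states
    pv1 = sigmoid(b + h0 @ w.T)                       # Equation (14), used as reconstruction
    ph1 = sigmoid(a + pv1 @ w)
    n = v0.shape[0]
    w = w + eta * (v0.T @ ph0 - pv1.T @ ph1) / n      # Equation (18)
    b = b + eps * (v0 - pv1).mean(axis=0)             # Equation (19)
    a = a + eps * (ph0 - ph1).mean(axis=0)            # Equation (20)
    return w, a, b

rng = np.random.default_rng(0)
v0 = rng.random((32, 256))                            # placeholder normalized features
w = 0.01 * rng.standard_normal((256, 100))            # 100 hidden units: hypothetical
a, b = np.zeros(100), np.zeros(256)
w, a, b = cd1_update(v0, w, a, b, rng)
print(w.shape, a.shape, b.shape)
```

The first term of each update is computed from the data batch, and the second term from its one-step reconstruction, which stands in for the model expectation.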

#### Deep Belief Network

Three layers of RBMs were stacked with a softmax regression layer to construct the deep belief network in this study, as shown in **Figure 1**. The raw input data were fed to the bottom RBM, and the output of the hidden layer of each lower RBM was delivered to the visible layer of the next higher RBM. Unlike logistic regression, softmax regression handles multiclass recognition problems by estimating the probability of each class and assigning a sample to the class with the maximum probability (Salakhutdinov and Hinton, 2012). The DBN procedure consisted primarily of a pre-training stage and a fine-tuning stage. The pre-training stage was conducted on each RBM layer to obtain the initial parameters of the DBN. In the fine-tuning stage, the softmax regression layer was added to obtain the prediction error, which was used to optimize the parameters by the backpropagation algorithm. Additionally, constraint terms, including weight decay and a sparsity constraint, were incorporated into the cost function of the softmax regression to avoid overfitting (Cho, 2013; Plis et al., 2014; Jiang et al., 2016). In our research, the weight decay was set to 0.05 and the sparsity constraint was set to 0.1. The learning rates for the connection weights and biases were set to 0.5 and 0.25, respectively. All these parameters were determined and optimized by a grid search procedure with 5-fold cross-validation.
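The forward pass of such a network after training can be sketched as follows. This is a minimal illustration assuming deterministic sigmoid activations in the three hidden layers and a three-class softmax output; the hidden layer sizes (128, 64, 32) and the random weights are hypothetical placeholders, since in practice the weights would come from RBM pre-training and fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)             # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dbn_forward(x, layers, w_out, b_out):
    """Propagate features through stacked sigmoid layers, then softmax regression."""
    h = x
    for w, a in layers:
        h = sigmoid(h @ w + a)
    return softmax(h @ w_out + b_out)

rng = np.random.default_rng(0)
sizes = [256, 128, 64, 32]                            # hidden sizes are hypothetical
layers = [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
w_out, b_out = 0.01 * rng.standard_normal((32, 3)), np.zeros(3)
probs = dbn_forward(rng.random((5, 256)), layers, w_out, b_out)
print(probs.argmax(axis=1))
```

Each row of `probs` sums to 1, and the predicted motor imagery class is the index of the largest entry.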

### MOTOR IMAGERY EXPERIMENTAL PARADIGM AND DATASETS

In our study, nine right-handed volunteers (all males, mean age 26.5 years, ranging from 25 to 28 years, numbered S01-S09) with thin hair participated in the motor imagery experiments. All subjects were healthy, without any history of neurological, psychiatric, or cognitive disorders. None of them had any prior experience with BCI experiments related to motor imagery. The details of the motor imagery experimental procedures were explained to all participants, and written informed consent was obtained from all subjects before the experiment. The experimental protocol was reviewed and approved by the local ethics committee of the University of Chinese Academy of Sciences.

In an electromagnetically shielded environment, the participants were seated in a comfortable chair with armrests and watched an LCD screen from a distance of about 1 m while wearing an EEG recording cap. Three kinds of motor imagery tasks were performed: imagining left hand, right hand, and foot movements. Before the experiment, the instructor explained the meaning of kinesthetic imagery of limb movements to the participants. Additionally, all participants practiced motor imagery to become familiar with the kinesthetic sensation. Each participant carried out an experimental block consisting of 10 sessions, which lasted ∼1.5 h. All sessions were executed under the same conditions, and a rest period of several minutes was given between two consecutive sessions. The experimental paradigm of each session is shown in **Figure 3**. For all sessions, the first 2 s was an idle state with a black screen. Subsequently, a green fixation cross appeared at the center of the screen for 1 s to indicate the beginning of a trial. Immediately afterwards, a red arrow pointing to the left, right, or down appeared for 5 s in addition to the fixation cross. During this period, the subjects were instructed to perform the motor imagery task indicated by the direction of the arrow, such as imagining repeated finger flexion and extension with the left or right hand at approximately 1 Hz. Meanwhile, the subjects were asked to concentrate on imagining the kinesthetic experience of the limb movements as much as possible. In addition, to minimize artifacts, the participants were asked to limit their head movements and to try not to blink or swallow during the motor imagery period. During the intertrial interval, the arrow cue and the fixation cross disappeared, leaving a black screen for 2 s, and the subject was instructed to remain in an idle state instead of performing motor imagery. To avoid adaptation of the brain activity to a given motor imagery task, each of the 3 cues was presented 10 times in a random sequence in each session. Hence, there were 30 trials per session and a total of 300 motor imagery trials per subject in an experiment.

During the motor imagery tasks, EEG signals were collected through a grid cap with 64 Ag/AgCl passive electrodes provided by Plexon Inc., USA. The electrodes, with a separation distance of roughly 3 cm, were arranged on the cap according to the international 10–20 positioning system. Conductive gel was injected into each electrode for higher conductivity and better attachment. The left mastoid electrode was used as the reference channel and the right mastoid electrode served as the ground. The original EEG data were recorded at a sampling rate of 1 kHz by the OmniPlex Neural Data Acquisition System (Plexon Inc., USA), which included analog pre-amplification, analog-to-digital conversion, and a low-pass filter with a cutoff frequency of ∼200–300 Hz. An additional notch filter at 50 Hz was applied to eliminate power line artifacts. Finally, the recorded motor imagery EEG signals of each subject were saved as an array of times × channels × trials with a size of 5,000 × 64 × 300.

To obtain the dominant motor imagery EEG, a 4 s segment from 0.5 s to 4.5 s after the cue was cut out from each trial. As described in section EEG Processing Pipeline, the data were further band-pass filtered and segmented by a sliding window. Hence, the motor imagery dataset of each subject was represented by a three-dimensional array of size 1,000 × 64 × 4,800, where 1,000 was the length of the time window (1 s), 64 was the number of channels, and 4,800 was the number of motor imagery segments covering the three classes. For each channel signal, 4 spectral power features were estimated by the Lomb-Scargle periodogram method. The whole feature dataset was therefore of size 4,800 × 256 for each subject, where 256 was the number of features (4 × 64 channels). The datasets were randomly divided into 75% training data (3,600 × 256) and 25% testing data (1,200 × 256).
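The dataset dimensions quoted above follow from the experimental parameters. The arithmetic can be checked with a short sketch; the variable names are chosen for illustration only.

```python
fs_hz = 1000
trials = 10 * 3 * 10                 # sessions x cues x repetitions per session
trial_len = 4 * fs_hz                # 4 s analysis window per trial
win, step = 1 * fs_hz, fs_hz // 5    # 1 s window, 0.2 s step

segments_per_trial = (trial_len - win) // step + 1
n_segments = trials * segments_per_trial
n_features = 4 * 64                  # four sub-bands x 64 channels
n_train = int(0.75 * n_segments)
n_test = n_segments - n_train
print(segments_per_trial, n_segments, n_features, n_train, n_test)
```

This reproduces the 16 segments per trial, 4,800 segments per subject, 256 features per segment, and the 3,600/1,200 split between training and testing data.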

### EXPERIMENTAL RESULTS AND DISCUSSIONS

### Simulation Comparison With Different Spectral Estimation Methods

To evaluate the effectiveness of the Lomb-Scargle method for incomplete signals, the simulated signal was devised by mixing two sinusoidal signals with a dominant frequency of 4 Hz and 8 Hz, respectively. The amplitude ratio between 4 Hz and 8 Hz sinusoidal signal was set to 0.75. For the simulated signal, data points with a certain proportion were randomly removed to construct incomplete or irregular signals. In addition, for comparison with Lomb-Scargle periodogram, traditional Welch and FFT periodogram methods were also applied to estimate spectral power for different incomplete signals.
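A simulation of this kind can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: a 1 s signal sampled at 1 kHz, the 4 Hz component having 0.75 times the amplitude of the 8 Hz component, and a least-squares sinusoid fit evaluated on an integer frequency grid from 1 to 15 Hz. The helper name `ls_fit_power` is hypothetical.

```python
import numpy as np

def ls_fit_power(t, x, f_hz):
    """Mean squared value of the best-fitting sinusoid at f_hz (least-squares fit)."""
    design = np.column_stack([np.cos(2 * np.pi * f_hz * t),
                              np.sin(2 * np.pi * f_hz * t)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return np.mean((design @ coef) ** 2)

rng = np.random.default_rng(42)
t_full = np.arange(1000) / 1000.0                       # 1 s at 1 kHz: an assumption
x_full = 0.75 * np.sin(2 * np.pi * 4 * t_full) + np.sin(2 * np.pi * 8 * t_full)
freqs = np.arange(1, 16)

peaks = {}
for removal in (0.0, 0.5, 0.8):
    n_keep = int(round((1 - removal) * len(t_full)))
    keep = np.sort(rng.choice(len(t_full), size=n_keep, replace=False))
    power = np.array([ls_fit_power(t_full[keep], x_full[keep], f) for f in freqs])
    # Record the two frequencies with the largest estimated power.
    peaks[removal] = sorted(freqs[np.argsort(power)[-2:]].tolist())
    print(removal, peaks[removal])
```

Because the fit is evaluated at the retained sample times, the two dominant frequencies remain identifiable as the removal ratio grows, although the leakage into neighbouring frequencies increases as fewer points survive.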

The estimated spectral powers for the intact signal and for the incomplete signals with various degrees of missing data are given in **Figure 4**. For the simulated signal, the data points were removed in proportions from 10 to 80% with a step of 10%. The powers were normalized to the same scale by dividing by the proportion of remaining data. From **Figure 4**, we can see that the spectral components at the dominant frequencies of 4 and 8 Hz become less and less prominent as the proportion of removed data increases, for all three estimation methods. In particular, the spectral powers were clearly degraded once more than 30% of the data had been removed. However, the spectral powers estimated by the Lomb-Scargle periodogram were more prominent than those estimated by the Welch or FFT method for the various incomplete signals (the p-value from the paired t-test was < 0.05). Indeed, the components at 4 Hz and 8 Hz were still well recovered for the incomplete signal even with 80% of the data removed. This demonstrates that, compared to traditional spectral analysis methods such as FFT and Welch, the LSP method can estimate more stable spectral features from various incomplete or irregular signals. This is consistent with the observation that the LSP is particularly suited to estimating rhythm components in non-uniformly sampled signals (Stoica et al., 2009).

### Incomplete Motor Imagery EEG: Point Removal Form and Chunk Removal Form

To systematically validate the discriminative ability of the PSD features extracted by the LSP method for incomplete EEG, two forms of removal were adopted to randomly remove portions of the intact motor imagery segments and construct incomplete signals. For the condition of data loss, data point removal was applied to eliminate the EEG outliers caused by high contact impedance between the electrodes and the scalp. **Figure 5** presents the recognition performance for the intact EEG and for the incomplete EEG with different proportions of data point removal for the nine subjects, obtained by the DBN classifier with three feature extraction methods (FFT, Welch, and Lomb-Scargle). For simplicity, the three methods are denoted as FFT+DBN, Welch+DBN, and Lomb-Scargle+DBN, respectively. Overall, the recognition accuracy gradually decreased as the proportion of data point removal increased for all three methods in **Figure 5**. For the intact motor imagery EEG, the average accuracies (±standard deviation) across the nine subjects were 72.27% (±1.33%) for FFT+DBN, 73.26% (±1.44%) for Welch+DBN, and 74.77% (±0.43%) for Lomb-Scargle+DBN. There was no significant difference (p > 0.078, paired t-test) between the average accuracy of Lomb-Scargle+DBN and those of the other methods for the intact EEG across all subjects. It can be inferred that, compared to the FFT and Welch methods, the LSP method does not provide PSD features of significantly higher quality for the intact motor imagery EEG. In particular, for the intact EEG of subject 1 (S01), the accuracy of Welch+DBN was higher than that of Lomb-Scargle+DBN. Considering computational complexity and efficiency, Lomb-Scargle+DBN is therefore not the preferred choice for intact motor imagery EEG classification. However, the accuracy variation of Lomb-Scargle+DBN was clearly smaller than those of FFT+DBN and Welch+DBN for the incomplete EEG with different point removal ratios. More specifically, for the incomplete EEG with point removal in the range from 10 to 80%, the mean difference of accuracy across the nine subjects was 13.38% (±2.67%) for FFT+DBN, 13.08% (±3.07%) for Welch+DBN, and 7.45% (±1.18%) for Lomb-Scargle+DBN. This demonstrates that the classification performance of Lomb-Scargle+DBN was significantly better than that of FFT+DBN (p = 0.012 < 0.05, paired Student's t-test) and Welch+DBN (p = 0.008 < 0.01, paired Student's t-test) for the incomplete motor imagery EEG. In other words, the spectral power features extracted by the Lomb-Scargle periodogram can significantly improve the classification accuracy of the DBN for various degrees of incomplete EEG. An acceptable classification accuracy (above 65%) was achieved by the Lomb-Scargle+DBN method even when 80% of the points were eliminated, while the accuracies of FFT+DBN and Welch+DBN were ∼60% or even lower. Interestingly, **Figure 5** shows that the accuracies for the incomplete EEG declined sharply once more than 30% of the data points had been removed. In the case of subject 1 (S01 EEG datasets) in particular, the accuracy obtained by FFT+DBN or Welch+DBN dropped from roughly 70 to 53% for the incomplete EEG between 30 and 80% data point removal. This finding implies that the quality of the spectral power features deteriorated markedly for the FFT and Welch periodogram methods, which is in accordance with the preceding simulation comparison.

FIGURE 4 | The comparison results of spectral power estimations for the complete signal and incomplete signal with different proportional removal (from 10 to 80% with a step of 10%). Three estimation methods were used: Lomb-Scargle, Welch and FFT periodogram.

DBN with FFT, Welch and Lomb-Scargle feature extraction, respectively.

Similarly, to eliminate the effects of extreme artifacts, data chunk removal was adopted to remove the EEG portions contaminated by severe electrophysiological artifacts or complex background noise. The corresponding classification results for the intact EEG and for the incomplete EEG with various ratios of data chunk removal are presented in **Figure 6**. Compared to data point removal, the accuracies for the incomplete EEG decreased dramatically and significantly across the different degrees of data chunk removal (p = 0.022 < 0.05, paired Student's t-test). In particular, the average accuracies for the incomplete EEG with 80% data chunk removal were 51.03% (±2.23%), 51.47% (±1.60%), and 64.17% (±0.63%), significantly lower than the 58.13% (±2.52%), 59.15% (±2.87%), and 66.44% (±1.13%) obtained with 80% data point removal for FFT+DBN, Welch+DBN, and Lomb-Scargle+DBN, respectively. More generally, the mean difference of accuracy for the incomplete EEG with chunk removal in the range from 10 to 80% across the nine subjects was 20.51% (±2.39%), 19.68% (±2.21%), and 9.30% (±1.17%) for FFT+DBN, Welch+DBN, and Lomb-Scargle+DBN, respectively. The statistical analysis indicated that the proposed Lomb-Scargle+DBN method for the incomplete EEG was consistently and significantly superior to the other two methods (p = 0.007 < 0.01 for FFT+DBN versus Lomb-Scargle+DBN, p = 0.007 < 0.01 for Welch+DBN versus Lomb-Scargle+DBN, paired Student's t-test). Moreover, the accuracies for the incomplete EEG varied considerably more under data chunk removal than under data point removal (p < 0.05, paired t-test). This can be attributed to the fact that, in addition to the extreme artifacts, the informative signals corresponding to the motor imagery tasks within the same contaminated segments were also eliminated by chunk removal. Consequently, for the incomplete EEG with data chunk removal, the extracted spectral powers of the mu/beta rhythms related to the motor imagery tasks were inferior to those for the incomplete EEG with data point removal.

TABLE 1 | Statistical classification performance for the incomplete EEG with point and chunk removal.

*The maximum means among the compared methods are highlighted in bold.*

In addition, the overall recognition performance for the incomplete EEG across various degrees of point and chunk removal is provided in **Table 1**. The results (mean ± standard deviation) were obtained by averaging the accuracies for the incomplete EEG over the different ratios of point and chunk removal in the range from 10 to 80%. The classification results of Lomb-Scargle+DBN were significantly higher than those of FFT+DBN and Welch+DBN for the incomplete EEG with both point and chunk removal. The performance gains of Lomb-Scargle+DBN over FFT+DBN were 5.48% and 6.60% for the incomplete EEG with point and chunk removal, respectively, and the p-values of this comparison (paired Student's t-test) were all < 0.001. Likewise, the gains of Lomb-Scargle+DBN over Welch+DBN were 4.67% and 6.44% for the incomplete EEG with point and chunk removal, respectively, with p-values (paired Student's t-test) also < 0.001. Furthermore, in terms of standard deviation, the Lomb-Scargle+DBN method (2.68% for point removal, 3.58% for chunk removal) showed markedly lower variability than FFT+DBN (5.08% for point removal, 7.70% for chunk removal) and Welch+DBN (4.93% for point removal, 7.49% for chunk removal). Therefore, the Lomb-Scargle+DBN method can significantly and steadily improve the recognition performance for the different types of incomplete motor imagery EEG.
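The paired comparisons reported above rest on the paired Student's t statistic, which can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `paired_t_statistic` and the per-subject accuracies are made up for the example and are not the study's data. Turning the statistic into a p-value additionally requires the t distribution (for instance via `scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up per-subject accuracies (%) for nine subjects, NOT the study's data.
lomb_scargle = [72.1, 70.4, 69.8, 73.0, 71.5, 68.9, 70.7, 72.6, 69.5]
fft = [66.0, 65.2, 63.9, 68.1, 66.4, 62.7, 65.5, 67.0, 64.1]
t_value, dof = paired_t_statistic(lomb_scargle, fft)
print(f"t = {t_value:.2f}, dof = {dof}")
```

Because each subject contributes one accuracy per method, the test operates on the per-subject differences, which removes between-subject variability from the comparison.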

### Comparison of DBN With Various Structures

It should be noted that the structure of the DBN adopted in the incomplete EEG experiments was selected by an optimization procedure. As previously mentioned, the DBN was constructed from three hidden layers of pretrained RBMs and an output layer of softmax regression. In this study, 256-dimensional feature vectors were fed to the input layer of the DBN, so the dimension of the input layer was 256. Three units were used in the softmax output layer, corresponding to the three motor imagery tasks. To obtain the optimal parameters, various numbers of units were tried for the three hidden layers. More specifically, the number of units in one hidden layer was varied over a range while the numbers of units in the remaining two hidden layers were kept unchanged. This one-layer-at-a-time procedure was used because selecting the optimal parameters of the DBN is a combinatorial process, and the procedure yields comparable solutions rapidly. To evaluate the sensitivity of the hidden layers to changes in the number of units, 5-fold cross-validation was applied to the classification of motor imagery EEG. For each subject, the intact EEG and the incomplete EEG with various ratios of data removal were divided into 5 sections, of which 4 were used for training and the remaining one for testing. The average performance was obtained by repeating this procedure 5 times. Additionally, all evaluations were conducted on the features extracted by the Lomb-Scargle periodogram.

TABLE 2 | Comparison of classification accuracies based on different numbers of units in the first hidden layer for the nine subjects.

*The maximum mean of the comparative experiments is highlighted in bold.*

TABLE 3 | Comparison of classification accuracies based on different numbers of units in the second hidden layer for the nine subjects.

*The maximum mean of the comparative experiments is highlighted in bold.*

For the first hidden layer, the number of units was varied over [15 30 45 60 75 90] while the other two hidden layers were kept constant at 50 and 35 units, respectively. The corresponding comparison of classification performance for the DBN with different numbers of units in the first hidden layer is presented in **Table 2**. The results showed that the maximum mean accuracy of 71% was obtained with 60 units in the first hidden layer. The decoding accuracies with 60 units were markedly better than those with the other numbers of units in the first hidden layer (p < 0.05, paired Student's t-test). Similarly, **Table 3** gives the performance when the second hidden layer was varied over [10 20 30 40 50 60] units, with the other two hidden layers fixed at 60 and 35 units respectively. The accuracies with 50 units in the second hidden layer (about 72%) were significantly higher than those with the other numbers of units (p < 0.05, paired Student's t-test). **Table 4** presents the results when the third hidden layer took its number of units from [25 30 35 50 70 85], with the other two hidden layers fixed at 60 and 50 units respectively. The performance with 35 units in the third hidden layer was significantly different from that with the other numbers of units (p < 0.01, paired Student's t-test). The process of adjusting the parameters of the DBN was tedious and tricky. Nevertheless, the classification accuracy for the motor imagery tasks changed by less than 10% across the different numbers of units in the three hidden layers, which suggests that the DBN classifier was robust to variations of the network structure. In brief, the structure of the DBN used in this experiment was 256 × 60 × 50 × 35 × 3.
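The one-layer-at-a-time selection described in this section can be sketched as a simple coordinate search. The code below is a minimal illustration under stated assumptions: `coordinate_search` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for the 5-fold cross-validated DBN accuracy, which is not reproduced here. The candidate lists are the ones quoted in the text.

```python
def coordinate_search(candidates, start, score):
    """Tune one hidden layer at a time, keeping the other layers fixed.

    candidates: one list of unit counts per hidden layer.
    start: initial unit counts, one per hidden layer.
    score: callable mapping a tuple of unit counts to a mean CV accuracy.
    """
    best = list(start)
    for layer, options in enumerate(candidates):
        scored = []
        for units in options:
            trial = best.copy()
            trial[layer] = units
            scored.append((score(tuple(trial)), units))
        best[layer] = max(scored)[1]  # keep the best-scoring unit count
    return tuple(best)


def toy_score(structure):
    """Hypothetical stand-in for the cross-validated DBN accuracy (%)."""
    target = (60, 50, 35)
    return 72.0 - 0.05 * sum(abs(s - t) for s, t in zip(structure, target))


candidates = [
    [15, 30, 45, 60, 75, 90],  # first hidden layer
    [10, 20, 30, 40, 50, 60],  # second hidden layer
    [25, 30, 35, 50, 70, 85],  # third hidden layer
]
print(coordinate_search(candidates, start=(15, 50, 35), score=toy_score))
```

With six candidates per layer, this search needs 6 + 6 + 6 = 18 evaluations instead of the 6 × 6 × 6 = 216 required by an exhaustive grid, at the cost of possibly missing interactions between layers.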

### Comparison Between DBN and SVM

TABLE 4 | Comparison of classification accuracies based on different numbers of units in the third hidden layer for the nine subjects.

*The maximum mean of the comparative experiments is highlighted in bold.*

In this series of experiments, the DBN and SVM were compared with respect to the recognition accuracy for the incomplete EEG in the cases of point removal and chunk removal. As previously described, the Lomb-Scargle periodogram can extract effective and robust spectral features from various incomplete EEG signals to promote the classification performance. Hence, the DBN and SVM classifiers were run on the same feature datasets extracted by the Lomb-Scargle method. For the three motor imagery tasks, three binary SVMs with a Radial Basis Function (RBF) kernel were built, and the final accuracy was obtained by a majority voting strategy. The relevant parameters of the binary SVMs, namely the regularization parameter C and the kernel width σ of the RBF, were optimized by grid search (Quitadamo et al., 2017) in a range of [−5 5]. In addition, 5-fold cross-validation was applied to avoid overfitting for both classifiers.
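The majority voting step can be illustrated with a short sketch. It assumes a one-vs-one arrangement (one binary classifier per pair of the three classes), which the text does not state explicitly; the names `majority_vote`, `make_pairwise` and `centres` are hypothetical, and the nearest-centre rule merely stands in for a trained binary RBF-SVM.

```python
from collections import Counter
from itertools import combinations


def majority_vote(sample, binary_classifiers):
    """Combine pairwise binary decisions into one multi-class label.

    binary_classifiers maps a class pair (a, b) to a callable that
    returns either a or b for the given sample.
    """
    votes = Counter(clf(sample) for clf in binary_classifiers.values())
    top_count = max(votes.values())
    winners = sorted(label for label, n in votes.items() if n == top_count)
    return winners[0]  # ties are broken by label order (an arbitrary choice)


centres = {0: 0.0, 1: 1.0, 2: 2.0}  # hypothetical 1-D class centres


def make_pairwise(a, b):
    # Stand-in for a trained binary RBF-SVM: pick the nearer class centre.
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


classifiers = {pair: make_pairwise(*pair) for pair in combinations(centres, 2)}
print(majority_vote(1.9, classifiers))
```

With three classes, each class takes part in two of the three pairwise decisions, so a class that wins both of its contests wins the vote outright; a three-way tie falls back to the tie-breaking rule.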

**Figures 7**, **8** present the comparison between DBN and SVM for the intact EEG and the incomplete EEG in the cases of point removal and chunk removal (ratios from 10 to 80% with a step of 10%), respectively. For the intact motor imagery EEG, there was no significant difference in performance between DBN and SVM across the nine subjects (p = 0.062 > 0.05, paired Student's t-test), with mean accuracies of 74.77% (±0.44%) and 73.74% (±0.78%) respectively. As **Figure 7** shows, the overall performance of the DBN for the incomplete EEG with different ratios of point removal was better than that of the SVM. In particular, for subjects 5, 8, and 9 (S05, S08, and S09 EEG datasets), the accuracies of the DBN for the incomplete EEG after 30% data point removal were clearly improved, with an average increment of 2.64%. However, for the incomplete EEG with different ratios of data chunk removal, the accuracy improvement of the DBN over the SVM was not significant. For some subjects, such as subjects 2, 3, 4, and 9, the SVM outperformed the DBN to some degree for the incomplete EEG with chunk removal (see **Figure 8**).

For further clarification, the average accuracies (± standard deviation) of the DBN and SVM across the incomplete EEG with various ratios of data removal (from 10 to 80% with a step of 10%) are presented in **Table 5** for both point removal and chunk removal. As shown, for the incomplete EEG with point removal, the average classification performance of the DBN (70.72 ± 2.65%) was higher than that of the SVM (69.89 ± 3.08%) across the nine subjects. In this case, the p-value of the Student's t-test between DBN and SVM was 0.021 < 0.05. Moreover, the DBN led to lower variability than the SVM, with mean standard deviations of 2.65% and 3.08% respectively. These results indicate that the DBN was superior to the SVM for incomplete EEG classification with point removal. In the case of chunk removal, by contrast, the accuracy gain of the DBN (68.86 ± 3.58%) over the SVM (68.74 ± 3.53%) was smaller than in the case of point removal, and there was no statistical difference between DBN and SVM (p = 0.79 > 0.05, paired Student's t-test). A possible reason is that, compared with point removal, the features extracted from the incomplete EEG with chunk removal were relatively poor and weakened the performance of both the DBN and the SVM. However, it is likely that the DBN can perform better than the SVM for motor imagery classification of the incomplete EEG when the parameters are finely tuned and extra layers are added.

### CONCLUSIONS AND FUTURE WORKS

In this study, a decoding scheme based on the combination of LSP and DBN was proposed to recognize incomplete motor imagery EEG segments. To construct incomplete EEG segments, point removal and chunk removal were used to randomly and proportionally eliminate the uninteresting EEG points or portions, respectively. Point removal was mainly used to eliminate outliers within the EEG segments due to data loss, and chunk removal was used to eliminate portions within the EEG segments due to extreme artifacts. The LSP method was carried out to extract robust spectral power features of the mu/beta rhythms related to motor imagery tasks from the incomplete EEG. A DBN consisting of three stacked restricted Boltzmann machines (RBMs) and a softmax regression layer was devised to perform motor imagery classification. Since this was a preliminary study, chunk and point removal were performed in a random manner. For real applications, however, a more specific search process will be needed to determine which chunks or points should be removed.
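The two removal forms can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names `remove_points` and `remove_chunk` are hypothetical, the segment length of 500 samples is arbitrary, and chunk removal is modelled as a single contiguous block, which may differ from the procedure actually used in the study.

```python
import random


def remove_points(n_samples, ratio, rng):
    """Return sorted indices kept after deleting isolated random samples."""
    n_remove = round(n_samples * ratio)
    removed = set(rng.sample(range(n_samples), n_remove))
    return [i for i in range(n_samples) if i not in removed]


def remove_chunk(n_samples, ratio, rng):
    """Return sorted indices kept after deleting one contiguous block."""
    n_remove = round(n_samples * ratio)
    start = rng.randrange(n_samples - n_remove + 1)
    return [i for i in range(n_samples) if not start <= i < start + n_remove]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
kept_points = remove_points(500, 0.3, rng)
kept_chunk = remove_chunk(500, 0.3, rng)
print(len(kept_points), len(kept_chunk))
```

Both functions keep the same number of samples for a given ratio. The difference is that point removal leaves gaps spread over the whole segment, whereas chunk removal leaves one long gap, so any task-related activity inside that block is lost entirely.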


To validate the effectiveness of the proposed decoding scheme for the incomplete EEG, various comparative experiments were conducted and evaluated on simulated signals and real motor imagery EEG, including comparisons of different spectral power estimation methods (FFT, Welch and Lomb-Scargle) and different classifiers (DBN and SVM). In the simulation comparison of the three spectral estimation methods, the results show that the Lomb-Scargle method can extract more stable and more pronounced spectral power from incomplete or irregular signals. Furthermore, the PSD features extracted by the three estimation methods were classified using a DBN, and the classification accuracy of Lomb-Scargle+DBN did not decline as dramatically as that of FFT+DBN and Welch+DBN for the incomplete motor imagery EEG with an increasing proportion of point removal or chunk removal (from 10% to 80% with a step of 10%). These results suggest that Lomb-Scargle+DBN can significantly and steadily improve the recognition performance for the incomplete motor imagery EEG. The p-values of the comparisons between Lomb-Scargle+DBN and FFT+DBN or Welch+DBN were below 0.05 for the incomplete EEG in the cases of both point removal and chunk removal. After three groups of experimental tests and comparisons, the structure of the DBN was set to 256 × 60 × 50 × 35 × 3 to improve its learning performance. The extended comparison between DBN and SVM indicated that the DBN was superior to the SVM for the incomplete EEG with point removal. Moreover, for the classification of the intact motor imagery EEG, there was no significant difference in average accuracy (p > 0.078, paired t-test) between Lomb-Scargle+DBN and the other methods (FFT+DBN and Welch+DBN). Considering its computational complexity and efficiency, Lomb-Scargle+DBN is therefore not preferable for intact motor imagery EEG classification, whereas the proposed decoding scheme is well suited to improving the classification performance for the incomplete motor imagery EEG. This means that, instead of rejecting the entire segment, a motor imagery EEG segment with data loss or extreme artifacts can still be used to generate comparable classification results once the affected portions are eliminated.
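To illustrate why the Lomb-Scargle periodogram copes with missing samples, the sketch below evaluates the classical (unnormalised) Lomb-Scargle power directly from unevenly spaced time stamps. It is a minimal standard-library illustration, not the implementation used in this study; the function name `lomb_scargle_power`, the 100 Hz sampling rate, the 2 s duration and the 10 Hz test sinusoid (a stand-in for a mu-band rhythm) are all assumptions.

```python
import math
import random


def lomb_scargle_power(times, values, freq_hz):
    """Classical Lomb-Scargle power at one frequency for uneven samples."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    omega = 2.0 * math.pi * freq_hz
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_part = (sum(y * c for y, c in zip(centred, cos_terms)) ** 2
                / sum(c * c for c in cos_terms))
    sin_part = (sum(y * s for y, s in zip(centred, sin_terms)) ** 2
                / sum(s * s for s in sin_terms))
    return 0.5 * (cos_part + sin_part)


rng = random.Random(1)
all_times = [k / 100.0 for k in range(200)]   # 2 s at 100 Hz (hypothetical)
kept = sorted(rng.sample(all_times, 120))     # 40% of the points removed
signal = [math.sin(2.0 * math.pi * 10.0 * t) for t in kept]
powers = {f: lomb_scargle_power(kept, signal, f) for f in range(1, 31)}
peak = max(powers, key=powers.get)
print(peak)
```

Because the power is computed from the actual sample times, no interpolation or zero-filling of the removed points is needed, which is the property the proposed scheme relies on.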

By decoding incomplete EEG, the proposed scheme helps to improve the stability and smoothness of a BCI system and to maintain continuous outputs. This matters especially for online BCI systems, in which the intentions of subjects are decoded continuously from the EEG signals without interruption. In future work, online tests based on motor imagery EEG will be carried out to evaluate the validity of the proposed decoding scheme for incomplete signals. Additionally, because the Lomb-Scargle periodogram is particularly suited to estimating rhythmic components in non-uniformly sampled signals (Stoica et al., 2009), it may be applicable to other EEG modalities that rely on spectral analysis. For example, the proposed method could be applied to decode incomplete SSVEP EEG. For the structure of the DBN, more dedicated procedures can be implemented to further boost the decoding performance, such as adding RBM layers and using search algorithms to optimize the hyper-parameters of the DBN. Additionally, the optimal frequency bands associated with the relevant motor imagery tasks can be further investigated to promote the overall performance of the proposed method. Because the segmentation used a sliding window with 80% overlap, the 16 samples from the same EEG trial were correlated. This factor may influence the performance of the proposed method for incomplete EEG classification. In future work, similar to the study of Asensio-Cubero et al. (2011), a comparative study should be conducted by applying the proposed method with three different segmentation strategies: (1) no segmentation, applying the proposed method directly to the whole EEG trial; (2) uniform segmentation without overlap; and (3) segmentation with different overlaps (sliding window method). In this study, the BCI system based on motor imagery EEG works synchronously, and an asynchronous BCI system needs to be investigated in future work. In conclusion, the introduced decoding scheme provides an effective solution for incomplete motor imagery EEG in BCI systems.

### AUTHOR CONTRIBUTIONS

YC, XZ, YijZ, WX, and JH conceived the concept and designed the decoding scheme for this research. YC and YZ carried out the comparative experiments, including the acquisition and analysis of data for the work. YC, XZ, and YijZ interpreted the experimental results. YC drafted the manuscript. XZ, WX, JH, and YiwZ revised the manuscript.

### FUNDING

This work was supported by the National Natural Science Foundation of China under Grants 61503374 and 61573340, in part by the Frontier Science research project of the Chinese Academy of Sciences (Grant No. QYZDY-SSW-JSC005), and by the Liaoning Provincial Doctoral Starting Foundation of China under Grant 201501032.

### ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of the State Key Laboratory of Robotics in providing the acquisition devices. The authors would like to thank Huibin Du et al. for participating in the experiment. We also appreciate the assistance of Guowei Wu in setting up the experimental conditions and thank Qichuan Ding for his help with the proofreading and corrections.

### REFERENCES

decomposition. Psychophysiology 47, 955–960. doi: 10.1111/j.1469-8986.2010.00995.x

interface. J. Neurosci. Methods 244, 8–15. doi: 10.1016/j.jneumeth.2014.03.012

Zhao, X., Chu, Y., Han, J., and Zhang, Z. (2016). SSVEP-based brain-computer interface controlled functional electrical stimulation system for upper extremity rehabilitation. IEEE Trans. Syst. Man Cybern. Syst. 46, 947–956. doi: 10.1109/TSMC.2016.2523762

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Chu, Zhao, Zou, Xu, Han and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Advanced Adaptive Control of Lower Limb Rehabilitation Robot

Yihao Du†, Hao Wang, Shi Qiu†, Wenxuan Yao, Ping Xie\* and Xiaoling Chen

*Key Lab of Measurement Technology and Instrumentation of Hebei Province, Institute of Electric Engineering, Yanshan University, Qinhuangdao, China*

Rehabilitation robots play an important role in the rehabilitation field, and effective human-robot interaction promotes their development. Although many studies on human-robot interaction have been carried out, the flexibility and stability of the control system remain limited. Therefore, we proposed an advanced adaptive control method for a lower limb rehabilitation robot. The method uses a dual closed loop control strategy based on surface electromyography (sEMG) and plantar pressure to improve the robustness of adaptive control for rehabilitation robots. First, in the outer loop, an advanced variable impedance controller based on the sEMG and plantar pressure was designed to correct the robot's reference trajectory. Then, in the inner loop, a sliding mode iterative learning controller (SMILC) based on a variable boundary saturation function was designed to track the reference trajectory. The experimental results showed that, within the designed dual closed loop control strategy, the variable impedance controller can effectively reduce trajectory tracking errors and adaptively modify the reference trajectory in synchrony with the motion intention of patients, while the sliding mode iterative learning controller can effectively reduce chattering in sliding mode control and accurately track the rehabilitation robot's reference trajectory. This study can improve the human-robot interaction performance of the rehabilitation robot system and expand its application in the rehabilitation field.

Keywords: lower limb rehabilitation robot, motion analysis, dual closed loop control, advanced variable impedance control, sliding mode iterative learning control

### INTRODUCTION

Recently, rehabilitation robots have shown great advantages and attracted growing attention in the rehabilitation field, since they can assist patients in rehabilitation training and effectively alleviate the workload of the therapist (Lo et al., 2010). According to the training mode used in the rehabilitation process, rehabilitation robots are currently divided into two main types: passive training and active training. The former has been widely applied in clinical practice and has brought some benefits for patients, but it lacks the active participation of patients and may lead to unreasonable and insufficient recovery. The latter can provide appropriate assistance according to the patient's active motion intention and state, which contributes to the recovery of motor nerves and accelerates the rebuilding of motor function. Evidence-based medicine also shows that active rehabilitation training has better recovery effects on patients (Costandi, 2014). In the active rehabilitation training process, the control strategy can be adjusted adaptively according to the motion state of the patient. Many studies have addressed this, as follows.

#### Edited by:

*Dingguo Zhang, Shanghai Jiao Tong University, China*

#### Reviewed by:

*Wei Meng, Wuhan University of Technology, China*

*Xingang Zhao, Shenyang Institute of Automation, Chinese Academy of Sciences, China*

\*Correspondence:

*Ping Xie pingx@ysu.edu.cn*

*†These authors have contributed equally to this work*

#### Specialty section:

*This article was submitted to Biomedical Robotics, a section of the journal Frontiers in Robotics and AI*

Received: *16 March 2018* Accepted: *14 September 2018* Published: *08 October 2018*

#### Citation:

*Du Y, Wang H, Qiu S, Yao W, Xie P and Chen X (2018) An Advanced Adaptive Control of Lower Limb Rehabilitation Robot. Front. Robot. AI 5:116. doi: 10.3389/frobt.2018.00116*

To realize active control of a rehabilitation robot, effective motion intention recognition and motion state analysis are very important. Surface electromyography (sEMG), a signal that reflects the muscle status (Wu et al., 2010), has been used in motion intention recognition (Amsüss et al., 2014; He et al., 2015) and in interaction control of human-robot systems (Meng et al., 2014). Human motion intention recognition methods are mainly divided into discrete action classification and continuous motion analysis (Kawase et al., 2014; Hou et al., 2016). Discrete action classification can be used in the rehabilitation robot control system for early rehabilitation training, but its level of human-robot interaction is low, whereas continuous motion analysis can adjust the robot's degree of assistance in real time according to the patient's motion intention and motion ability. For example, the skeletal muscle model has been used to predict multi-joint angles, but it is not suitable for interaction control of the human-robot system because the model has many unknown parameters and low accuracy (Buchanan et al., 2004; Meng et al., 2015). Some studies have simplified the musculoskeletal model: a joint-angle model was established by introducing muscle activity and time-domain features (Koo and Mak, 2005), and a k-order dynamic model was designed using the LS-SVR method to predict the joint angle (Tang et al., 2016). Establishing a regression model between sEMG and joint angles significantly improves the prediction accuracy, but the modeling takes a long time, which may cause muscle fatigue in the patient. Relevant studies have shown that the prediction errors may increase significantly under muscle fatigue, which makes it difficult to guarantee the safety of interaction control in the human-robot system (Li Z. et al., 2015). In addition, some studies have applied the sEMG signal to predict the muscle strength of the limb in order to realize active control of the rehabilitation robot (Duschau-Wicke et al., 2010), but the prediction accuracy of muscle strength still needs to be improved. Therefore, it is necessary to comprehensively consider the sEMG, joint angle and human-robot interaction force to realize an accurate motion state analysis.

Furthermore, many studies concentrate on how to design the adaptive control strategy of the rehabilitation robot in the active training process. Impedance control, a commonly used intelligent control method, has been introduced into rehabilitation robot control (Jezernik et al., 2004; Xie et al., 2016); it can improve the interaction performance of the human-robot system by adjusting the assistance level according to the patient's motion intention and motion state. However, traditional constant-coefficient impedance control is limited because the parameters of the human-robot system are preset and cannot be adjusted in real time according to changes in the patient's motion state. Therefore, a variable impedance control method was proposed to adjust the gait training speed within a virtual channel according to the plantar pressure (Kiguchi and Hayashi, 2012), but the virtual channel varies from person to person. Because sEMG signals can describe the motion state and reflect changes of human damping and stiffness in the human-robot system (Rahman et al., 2014), they have been introduced into the impedance control model in some studies. For example, muscle activity information has been used in the rehabilitation robot control system to adjust the control speed (Rahman et al., 2013), but the performance of the control method still needs to be improved when the patient's motion intention and the human-robot interaction force vary. Therefore, the adaptive ability of rehabilitation robot control can be improved if the impedance parameters of the control system are adjusted by considering the sEMG, joint angle and human-robot interaction force together.

In this paper, we proposed an advanced adaptive control method for a lower limb rehabilitation robot, designed as a dual closed loop control strategy based on the sEMG and plantar pressure. First, we carried out motion analysis of the human lower limbs with the least squares extreme learning machine (LS-ELM) algorithm to obtain the desired trajectory of the patient. Then, the designed variable impedance control was used to adaptively correct the desired trajectory according to the patient's active motion intention and to obtain the reference trajectory of the rehabilitation robot. Finally, the designed SMILC was used to track the reference trajectory and realize the adaptive control of the rehabilitation robot, which can enhance the compliance and robustness of the lower limb rehabilitation robot control system during training. This study can effectively improve the performance of human-robot interaction and the robustness of control in the rehabilitation robot system.

#### HUMAN-ROBOT SYSTEM MODELING

To verify the adaptive control method of the lower limb rehabilitation robot proposed in this study, we first established a human-robot system model as the control object for further study.

In this study, we chose a lower limb rehabilitation robot with one degree of freedom as the object, which completes the horizontal extension and flexion movement through the rod and the pedal. **Figure 1** shows the model and simplified diagram of the lower limb rehabilitation robot. To reduce the modeling complexity, the rehabilitation robot and the human lower limb are considered as a single unit and simplified as a two-link series mechanism.

The Cartesian coordinate system is established with the hip joint as the origin, as shown in **Figure 1B**. The coordinate of the robot's end point B is calculated through kinematics:

$$X = \begin{bmatrix} L_1 \cos q_1 + L_2 \cos(q_1 + q_2) \\ L_1 \sin q_1 + L_2 \sin(q_1 + q_2) \end{bmatrix} \tag{1}$$

where L<sub>i</sub> is the length and q<sub>i</sub> is the deflection angle of the i-th bar. The deflection angles of the joints can be solved through inverse kinematics:

$$q = \begin{bmatrix} \arcsin\left(\frac{-L_2 s_2}{\sqrt{x_B^2 + y_B^2}}\right) + \arctan\left(\frac{y_B}{x_B}\right) \\ \pi - \arccos\left(\frac{L_1^2 + L_2^2 - x_B^2 - y_B^2}{2 L_1 L_2}\right) \end{bmatrix} \tag{2}$$

where (x<sub>B</sub>, y<sub>B</sub>) are the coordinates of end point B and s<sub>2</sub> = sin q<sub>2</sub>.
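The two-link kinematics can be checked numerically with a forward and inverse round trip. The sketch below is an illustration under stated assumptions: the link lengths are hypothetical, the function names are made up, and the inverse step uses the common `atan2` formulation rather than the arcsin/arctan expression of Equation (2). It returns only the solution with q<sub>2</sub> ≥ 0, which need not be the branch used by the authors.

```python
import math


def forward_kinematics(q1, q2, l1, l2):
    """End-point position of a planar two-link chain, as in Equation (1)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y, l1, l2):
    """One joint-angle solution (q2 >= 0) that reaches the point (x, y)."""
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    cos_q2 = max(-1.0, min(1.0, cos_q2))  # guard against rounding error
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


l1, l2 = 0.45, 0.40  # hypothetical thigh and shank lengths in metres
x, y = forward_kinematics(0.3, 0.8, l1, l2)
print(inverse_kinematics(x, y, l1, l2))
```

The round trip recovers the joint angles that generated the end point, up to floating-point error. The `atan2` form is used here because it stays well defined in all four quadrants.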

Considering the influence of human movement on the human-robot system, the mapping torque of the human active power in robot space is used as part of the drive torque of the human-robot system, and the dynamic model of the human-robot system can be described as:

$$
\tau_r + \tau_{hr}^h = M(q)\ddot{q} + H(q, \dot{q}) + G(q) \tag{3}
$$

where **q** = [q<sub>1</sub> q<sub>2</sub>]<sup>T</sup> is the vector of the hip joint and knee joint angles, **M**(**q**) is the positive definite inertia matrix of the human-robot system, **H**(**q**, **q̇**) is the matrix of Coriolis and centrifugal forces, **G**(**q**) is the gravity matrix, τ<sup>h</sup><sub>hr</sub> is the equivalent torque of the human active moment in robot space, and τ<sub>r</sub> is the driving torque provided by the robot.

The human active force and gravity are both considered in the process of human-robot system modeling, which can improve the accuracy of the human-robot system modeling and interaction performance for the human-robot system. In this paper, a dual closed loop control strategy based on the sEMG signals and plantar pressure was proposed to realize the adaptive control of the human-robot system.

### ADAPTIVE CONTROL OF HUMAN-ROBOT SYSTEM

#### Control Strategy of Human-Robot System

To improve the performance of human-robot interaction and the compliance control of the rehabilitation robot, a dual closed loop control strategy based on sEMG signals and the human-robot interaction force (plantar pressure) is designed for the human-robot system. It consists of variable impedance control in the outer loop and position control in the inner loop. The variable impedance control model based on sEMG and the plantar pressure is designed to obtain the reference trajectory that reflects the patient's motion intention and motion ability by correcting the patient's desired trajectory. Then, the sliding mode iterative learning control algorithm based on a variable boundary saturation function is designed to track the reference trajectory, which achieves steady trajectory tracking and improves the robustness of the control system, as shown in **Figure 2**.

### Desired Trajectory Generation Based on Human Motion Intention

To obtain the desired trajectory of rehabilitation robot synchronizing with the human motion intention, we established a nonlinear motion analysis model between sEMG and the joint angle. In order to ensure the real-time performance, the desired trajectory was generated by using the least squares extreme learning machine (LS-ELM) algorithm (Li Q. L. et al., 2015), as shown in **Figure 3**.

The waveform length (WL) is extracted as the sEMG feature:

$$WL = \sum_{i=1}^{N-1} |\xi_{i+1} - \xi_i| \tag{4}$$

where ξ<sub>i</sub> is the pretreated sEMG signal and N is the number of sampling points over a period. The signals were filtered with a 1 Hz low-pass Butterworth filter and then normalized.
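As a minimal sketch of Equation (4), the WL feature is the sum of absolute differences between consecutive samples in a window. The function names, the window and step sizes, and the sample values below are made up for illustration and are not taken from the experiment.

```python
def waveform_length(samples):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def sliding_waveform_length(signal, window, step):
    """WL of each window of `window` samples, advanced by `step` samples."""
    return [
        waveform_length(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, step)
    ]


emg = [0.0, 0.2, -0.1, 0.4, 0.1, -0.3, 0.0, 0.5]  # made-up preprocessed samples
print(sliding_waveform_length(emg, window=4, step=2))
```

Computing the feature over a sliding window gives one WL value per window, which is the kind of sequence that can then be fed to a regression model as an input channel.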

Take the lower limb hip joint angle as an example. The inputs of the LS-ELM network are the sEMG features x<sub>j</sub> of the tibialis anterior muscle and vastus rectus muscle, and the output is the hip joint angle θ<sub>h</sub>:

$$\begin{cases} \theta_h = \left[\theta_1, \dots, \theta_i, \dots, \theta_n\right] \\ x_j = \left[x_{j,1}, \dots, x_{j,i}, \dots, x_{j,n}\right], j = 1, \dots, k \end{cases} \tag{5}$$

where n is the number of training samples and k is the number of input channels.

The hidden layer excitation function is the sigmoid function:

$$G(z) = \frac{1}{1 + e^{-z}}\tag{6}$$

FIGURE 3 | Principle of the desired trajectory generation of the human-robot system.

The desired output model is:

$$\theta_h = \sum_{i=1}^{L} \beta_i G_i(a_i \cdot x + b_i) \tag{7}$$

where L is the number of hidden layer nodes, x is the vector of input sEMG features, a<sub>i</sub> = [α<sub>i1</sub>, α<sub>i2</sub>, · · · , α<sub>in</sub>]<sup>T</sup> is the weight between the i-th hidden layer node and the input nodes, b<sub>i</sub> is the threshold of the i-th hidden layer node, and β<sub>i</sub> = [β<sub>i1</sub>, β<sub>i2</sub>, · · · , β<sub>iL</sub>]<sup>T</sup> is the connection weight between the output layer node and the i-th hidden layer node. Following existing methods (Huynh et al., 2008; Xie et al., 2016; Du et al., 2017; Li et al., 2017), Equation (7) can be rewritten as:

$$
\theta_h = G(x \cdot \alpha) \cdot \beta \tag{8}
$$

According to the Moore-Penrose generalized inverse matrix theory, x · α = θ<sub>h</sub>θ<sub>h</sub><sup>+</sup>G<sup>−1</sup>(θ<sub>h</sub>β<sup>+</sup>). Setting Z = θ<sub>h</sub><sup>+</sup>G<sup>−1</sup>(θ<sub>h</sub>β<sup>+</sup>), we obtain:

$$
x \cdot \alpha = \theta_h Z \tag{9}
$$

According to the least squares principle, when **Z** is randomly generated, the input weight α, offset **b** and output weight β are obtained.

We conducted lower limb motion analysis by using the LS-ELM (Least squares extreme learning machine) algorithm, and obtained the desired trajectory of rehabilitation robot synchronizing with the patient's motion intention. The desired trajectory was then used in the variable impedance control of human-robot system to generate the reference trajectory.
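To make the structure of Equations (6) and (7) concrete, the sketch below trains a small extreme learning machine in plain Python. It is the standard ELM formulation (random input weights and thresholds, output weights solved by regularized least squares), not the LS-ELM variant described above, in which **Z** is generated randomly. The constant bias column, the ridge term, the network size, and the synthetic feature/angle data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Equation (6)."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hidden(X, W, b):
    """Hidden outputs G(a_i . x + b_i) per sample, plus a constant bias column (assumption)."""
    return [
        [sigmoid(sum(w * xi for w, xi in zip(W[i], x)) + b[i]) for i in range(len(W))] + [1.0]
        for x in X
    ]


def elm_fit(X, y, n_hidden=10, ridge=1e-3, seed=0):
    """Random hidden layer; output weights from (H^T H + ridge I) beta = H^T y."""
    rng = random.Random(seed)
    k = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    m = n_hidden + 1
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(m)]
    return W, b, solve(A, rhs)


def elm_predict(X, model):
    W, b, beta = model
    return [sum(bi * hi for bi, hi in zip(beta, h)) for h in hidden(X, W, b)]


# Synthetic stand-in for two normalized sEMG features and a joint angle in degrees.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [20.0 + 15.0 * x1 - 10.0 * x2 for x1, x2 in X]
model = elm_fit(X, y)
pred = elm_predict(X, model)
mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
print(f"training MSE: {mse:.4f}")
```

Because only the output weights are solved for, training reduces to one linear solve, which is the property that makes ELM-type models attractive for real-time trajectory generation.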

### Adaptive Compliance Control of the Human-Robot System

To realize human-robot interaction and compliance control of the rehabilitation robot control system, we proposed an advanced adaptive control method for the lower limb rehabilitation robot. The method has a dual closed loop structure, with variable impedance control in the outer loop and position control based on sliding mode iterative learning control (SMILC) in the inner loop, as shown in **Figure 4**. In the outer loop, the variable impedance controller was designed with impedance coefficients corrected in real time by the lower limb sEMG activity and muscle contribution rate, so that the reference trajectory of the robot is adjusted adaptively according to human stiffness and damping. In other words, the desired rehabilitation robot trajectory was corrected by the lower limb sEMG and the human-robot interaction force, yielding a reference trajectory synchronized with the patient's motion intention and ability. In the inner loop, a sliding mode iterative learning control algorithm based on a variable boundary saturation function was designed for the position controller to track the reference trajectory. This design could effectively reduce the sliding mode chattering and improve the robustness of the control system.

#### The Variable Impedance Control

The impedance control is a second order model that can denote the ideal dynamic relationship between the robot terminal position and human-robot interaction force. In other words, the desired trajectory of rehabilitation robot is adjusted according to the changes of the plantar pressure, and the reference trajectory is generated according to patients' motion ability. The specific model is designed as follows:

$$\begin{aligned} \mathbf{F}\_d - \mathbf{F}\_{\text{int}} &= \mathbf{M}\_d(\ddot{\mathbf{x}}\_d - \ddot{\mathbf{x}}\_r) + \mathbf{B}\_d(\dot{\mathbf{x}}\_d - \dot{\mathbf{x}}\_r) + \mathbf{K}\_d(\mathbf{x}\_d - \mathbf{x}\_r) \\ \boldsymbol{\tau}\_{hr}^h &= \mathbf{J}^T \mathbf{F}\_{\text{int}} \end{aligned} \tag{11}$$

where **M**<sub>d</sub>, **B**<sub>d</sub>, and **K**<sub>d</sub> are the inertia, damping, and stiffness matrices, respectively; **x**<sub>d</sub> and **x**<sub>r</sub> are the desired and reference trajectories of the terminal position of the rehabilitation robot, respectively; **J** is the Jacobian matrix; **F**<sub>d</sub> is the ideal static balance force of the human-robot system; **F**<sub>int</sub> is the human-robot interaction force; and τ<sup>h</sup><sub>hr</sub> is the corresponding joint-space interaction torque.

Since the lower limb active force of patient was small, the effect of acceleration was neglected, and by only considering the damping and stiffness coefficients, we could get that:

$$\mathbf{F}\_d - \mathbf{F}\_{\text{int}} = \mathbf{B}\_d(\dot{\mathbf{x}}\_d - \dot{\mathbf{x}}\_r) + \mathbf{K}\_d(\mathbf{x}\_d - \mathbf{x}\_r) \tag{12}$$

Applying the Laplace transform, with s as the Laplace variable, yields Equation (13):

$$\mathbf{x}\_{\mathbf{e}} = \frac{F\_d - F\_{\text{int}}}{\mathbf{B}\_d \cdot \mathbf{s} + \mathbf{K}\_d} \tag{13}$$

where **x**<sub>e</sub> = **x**<sub>d</sub> − **x**<sub>r</sub> is the correction applied to the desired trajectory of the rehabilitation robot. The reference trajectory **x**<sub>r</sub> = **x**<sub>d</sub> − **x**<sub>e</sub> was then mapped to joint space by inverse kinematics, giving **q**<sup>r</sup>.
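Equation (13) is a first-order filter from the force error to the trajectory correction, which can be sketched in discrete time as follows. This is a minimal one-dimensional illustration using a backward-Euler step; the discretization, the sampling time, the constant interaction force, and the constant desired position are assumptions. The values B = 14, K = 220, and F<sub>d</sub> = 10 N are the constant-impedance settings reported in the Results.

```python
def trajectory_correction(f_int, f_d, B, K, dt):
    """Integrate B * dx_e/dt + K * x_e = F_d - F_int with a backward-Euler step."""
    x_e, out = 0.0, []
    for f in f_int:
        x_e = (B * x_e + dt * (f_d - f)) / (B + K * dt)
        out.append(x_e)
    return out


def reference_trajectory(x_d, x_e):
    """x_r = x_d - x_e, sample by sample."""
    return [d - e for d, e in zip(x_d, x_e)]


dt = 0.01                      # assumed sampling time in seconds
f_int = [8.0] * 1000           # assumed constant interaction force below F_d
x_e = trajectory_correction(f_int, f_d=10.0, B=14.0, K=220.0, dt=dt)
x_d = [0.30] * 1000            # assumed constant desired position
x_r = reference_trajectory(x_d, x_e)
print(x_e[-1], x_r[-1])
```

With a constant force error the correction settles at (F<sub>d</sub> − F<sub>int</sub>)/K, so a stiffer setting keeps the reference trajectory closer to the desired one.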

In rehabilitation training, the damping and stiffness of the lower limb changes with human active movement, showing that the change of muscle activity makes the traditional impedance control model unable to meet the requirement of the active compliance control of human-robot system. Therefore, muscle activity was introduced to establish the nonlinear mapping function and adjust the impedance parameters according to human motion (Lloyd and Besier, 2003), making the reference trajectory of rehabilitation robot more in line with the patient's movement ability.

The muscle activity is expressed as:

$$a\_j = \frac{e^{A\_j u\_j(t)} - 1}{e^{A\_j} - 1} \tag{14}$$

where u<sub>j</sub>(t) is the sEMG signal of the j-th muscle after preprocessing and normalization, and A<sub>j</sub> is the nonlinear coefficient of the model between sEMG and muscle activity, which lies in the range −3 ∼ 0.

Lower limb activity η is defined as:

$$\eta = \sum\_{j=1}^{N} \omega\_j \cdot a\_j \tag{15}$$

$$\omega\_j = \frac{RMS\_i(j)}{\sum\_{j=1}^{N}RMS\_i(j)}\tag{16}$$

where ω<sub>j</sub> is the contribution rate of the j-th muscle, and RMS<sub>i</sub>(j) is the root mean square of the sEMG signals.
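The mapping from sEMG to muscle activity and then to lower limb activity in Equations (14) to (16) can be sketched as below. The function names and the three-muscle sample values are assumptions; A<sub>j</sub> must be strictly negative here, since A<sub>j</sub> = 0 would make the denominator of Equation (14) vanish.

```python
import math


def muscle_activation(u, A):
    """Equation (14): u is normalized sEMG in [0, 1], A is in the range (-3, 0)."""
    return (math.exp(A * u) - 1.0) / (math.exp(A) - 1.0)


def contribution_rates(rms):
    """Equation (16): each muscle's RMS divided by the sum over all muscles."""
    total = sum(rms)
    return [r / total for r in rms]


def limb_activity(u, A, rms):
    """Equation (15): contribution-weighted sum of the muscle activations."""
    weights = contribution_rates(rms)
    return sum(w * muscle_activation(uj, Aj) for w, uj, Aj in zip(weights, u, A))


# Made-up values for three muscles (illustration only).
u = [0.2, 0.6, 0.9]
A = [-2.0, -2.0, -1.0]
rms = [0.1, 0.3, 0.6]
eta = limb_activity(u, A, rms)
print(eta)
```

Because every activation lies in [0, 1] and the contribution rates sum to one, η also stays in [0, 1].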

The damping and stiffness coefficients of the impedance equation can be adjusted:

$$B\_d = \text{sig}(\lambda\_B \cdot \eta) \cdot B\_0 \tag{17}$$

$$K\_d = \text{sig}(\lambda\_K \cdot \eta) \cdot K\_0 \tag{18}$$

where λ<sub>B</sub> and λ<sub>K</sub> are the damping and stiffness gain coefficients, respectively; B<sub>0</sub> and K<sub>0</sub> are the initial impedance coefficients; B<sub>d</sub> and K<sub>d</sub> are the modified impedance coefficients; and sig(∗) is the sigmoid function, which limits B<sub>d</sub> and K<sub>d</sub> to the ranges B<sub>0</sub>/2 ∼ B<sub>0</sub> and K<sub>0</sub>/2 ∼ K<sub>0</sub>.
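A minimal sketch of Equations (17) and (18) follows, assuming that sig is the logistic function of Equation (6) and that η is non-negative. The default values B<sub>0</sub> = 20, K<sub>0</sub> = 270, λ<sub>B</sub> = 5, and λ<sub>K</sub> = 10 are those reported in the Results; the function name is an assumption.

```python
import math


def sig(z):
    """Logistic sigmoid; equals 0.5 at z = 0 and approaches 1 for large z."""
    return 1.0 / (1.0 + math.exp(-z))


def variable_impedance(eta, B0=20.0, K0=270.0, lam_B=5.0, lam_K=10.0):
    """Equations (17) and (18): scale the initial coefficients by sig(lambda * eta)."""
    return sig(lam_B * eta) * B0, sig(lam_K * eta) * K0


for eta in (0.0, 0.2, 1.0):
    B_d, K_d = variable_impedance(eta)
    print(eta, round(B_d, 2), round(K_d, 2))
```

Since sig(0) = 0.5, zero activity yields half of the initial coefficients, and higher activity moves both coefficients toward B<sub>0</sub> and K<sub>0</sub>, which is where the stated ranges come from.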

The variable impedance control model, established based on human lower limb sEMG, can adaptively adjust the impedance parameters according to changes in lower limb activity, and correct the desired trajectory of the rehabilitation robot to generate a reference trajectory that agrees better with the patient's motion ability. Adaptive control of the human-robot system is then performed by tracking this reference trajectory.

#### The Position Control Based on SMILC

Involuntary tremor of the lower limb and periodic interference caused by repetitive training may introduce unknown uncertainties into the human-robot system model, which affect the accuracy and stability of reference trajectory tracking by the rehabilitation robot. Therefore, a sliding mode iterative learning control algorithm based on a variable boundary saturation function is proposed for position control. This algorithm combines iterative learning control with sliding mode variable structure control to suppress periodic and non-periodic disturbances, and replaces the sign function in the control algorithm with a variable boundary saturation function to improve the speed and robustness of the control system, as shown in **Figure 5**.

Considering factors such as modeling errors and parameter variation of the human-robot system, the dynamic model is corrected as:

$$\mathbf{M}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{N}(\mathbf{q}, \dot{\mathbf{q}}) = \mathbf{u} + \boldsymbol{\tau}\_{hr}^h + \boldsymbol{\tau}\_d \tag{19}$$

where **N**(**q**, **q̇**) = **H**(**q**, **q̇**) + **G**(**q**), **u** is the robot control torque, and τ<sub>d</sub> is the repetitive and non-repetitive disturbance caused by rehabilitation robot vibration and human tremor.

The overall control law of the k-th iteration is:

$$
\mathbf{u}(k) = \mathbf{u}(k-1) + \Delta\mathbf{u}(k)\tag{20}
$$

where Δ**u**(k) is the sliding mode controller output in the k-th iteration, **u**(k − 1) is the control variable of the (k−1)-th iteration, and **u**(k) is stored in memory as the input for the next iteration.

The error and error rate of the control system in the k-th iteration are set as:

$$\mathbf{e} = \begin{bmatrix} q\_1^r - q\_1 & q\_2^r - q\_2 \end{bmatrix}^T = \begin{bmatrix} e\_1 & e\_2 \end{bmatrix}^T \tag{21}$$

$$
\dot{e}(t) = \frac{e(t) - e(t-1)}{\Delta t} \tag{22}
$$

where Δt is the time interval between two sampling points, and the sliding mode function is designed as:

$$s = Ce + \dot{e} = \begin{bmatrix} c\_1 e\_1 + \dot{e}\_1 \\ c\_2 e\_2 + \dot{e}\_2 \end{bmatrix} \tag{23}$$

where c<sup>1</sup> and c<sup>2</sup> are the sliding mode coefficients.
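For a two-joint system, Equations (21) to (23) amount to a few element-wise operations per sample, as the following sketch shows. The function name and the joint values are assumptions; c<sub>1</sub> = c<sub>2</sub> = 50 is the setting reported in the Results.

```python
def sliding_variable(q_ref, q, e_prev, dt, c=(50.0, 50.0)):
    """Return the error (21), its finite-difference rate (22), and s = C e + e_dot (23)."""
    e = [r - m for r, m in zip(q_ref, q)]
    e_dot = [(ei - pi) / dt for ei, pi in zip(e, e_prev)]
    s = [ci * ei + di for ci, ei, di in zip(c, e, e_dot)]
    return e, e_dot, s


# Made-up joint angles in radians (illustration only).
e, e_dot, s = sliding_variable(
    q_ref=[1.0, 0.5], q=[0.9, 0.5], e_prev=[0.2, 0.0], dt=0.01
)
print(e, e_dot, s)
```

A shrinking error produces a negative error rate, so s can be negative even while the error itself is still positive, as it is for the first joint in this example.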

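Differentiating Equation (23) and substituting the dynamic model (19) gives:

$$
\dot{s} = \begin{bmatrix} c\_1 \dot{e}\_1 + \ddot{e}\_1 \\ c\_2 \dot{e}\_2 + \ddot{e}\_2 \end{bmatrix} = \begin{bmatrix} c\_1 \dot{e}\_1 \\ c\_2 \dot{e}\_2 \end{bmatrix} + \begin{bmatrix} \ddot{q}\_1^r \\ \ddot{q}\_2^r \end{bmatrix} - \mathbf{M}^{-1}\left(\mathbf{u} + \boldsymbol{\tau}\_{hr}^h + \boldsymbol{\tau}\_d - \mathbf{N}\right) \tag{24}
$$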

To reduce the chattering in sliding mode control, saturation function based on nonlinear feedback is used to replace the function based on linear feedback in the boundary layer, which can enable the system state to reach the sliding surface in limited time and improve the system robustness. Therefore, we defined the exponential approach law with the saturation function based on the nonlinear feedback:

$$\dot{s} = -\varepsilon \text{sat}(s) - ks = \begin{bmatrix} -\varepsilon\_1 \text{sat}(s\_1) - ks\_1 \\ -\varepsilon\_2 \text{sat}(s\_2) - ks\_2 \end{bmatrix} \tag{25}$$

where ε<sup>1</sup> and ε<sup>2</sup> are strictly positive real numbers.

$$\text{sat}(s) = \begin{cases} \text{sgn}(s) & |s| > \phi \\ \left(\frac{s}{\Phi(s)}\right)^{\alpha} & |s| \le \phi \end{cases} \tag{26}$$

where Φ is the boundary layer thickness, Φ > 0, and 0 < α = p/q < 1, with p and q positive odd numbers. We combined formula (24) with (25) and designed the control law:

$$\mathbf{u} = \mathbf{M}\left( \begin{bmatrix} c\_1 \dot{e}\_1 \\ c\_2 \dot{e}\_2 \end{bmatrix} + \begin{bmatrix} \ddot{q}\_1^r \\ \ddot{q}\_2^r \end{bmatrix} + \varepsilon\,\text{sat}(s) + ks \right) + \mathbf{N} - \boldsymbol{\tau}\_{hr}^h - \boldsymbol{\tau}\_d \tag{27}$$
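The behavior of the approach law (25) with the saturation function (26) can be explored with a scalar simulation such as the sketch below. It uses ε = 0.5, k = 10, p = 1, and q = 3 as reported in the Results; the constant boundary thickness Φ = 0.05, the forward-Euler step, and the initial value are assumptions, and the rule that varies the boundary is not modeled.

```python
import math


def sat(s, phi, p=1, q=3):
    """Equation (26) with alpha = p / q, where p and q are positive odd integers."""
    if abs(s) > phi:
        return math.copysign(1.0, s)
    # For odd p and q, (s / phi) ** alpha is an odd function of s.
    return math.copysign(abs(s / phi) ** (p / q), s)


def simulate_reaching(s0, eps=0.5, k=10.0, phi=0.05, dt=1e-3, steps=2000):
    """Forward-Euler integration of s_dot = -eps * sat(s) - k * s."""
    s, history = s0, [s0]
    for _ in range(steps):
        s += dt * (-eps * sat(s, phi) - k * s)
        history.append(s)
    return history


history = simulate_reaching(s0=1.0)
print(history[0], history[-1])
```

Inside the boundary layer the fractional power gives a steeper feedback near s = 0 than a linear saturation would, which is the motivation given above for using nonlinear feedback.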

Replacing τ<sub>d</sub> with its estimate τ<sub>dc</sub>, whose upper and lower bounds are τ<sub>U</sub> and τ<sub>L</sub>, and substituting into formula (25), we obtain:

$$\dot{s} = -\varepsilon\,\text{sat}(s) - ks - (\bar{\boldsymbol{\tau}}\_d - \bar{\boldsymbol{\tau}}\_{dc}) \tag{28}$$

where τ̄<sub>d</sub> = **M**<sup>−1</sup>τ<sub>d</sub> and τ̄<sub>dc</sub> = **M**<sup>−1</sup>τ<sub>dc</sub>. The Lyapunov function was chosen as:

$$V = \frac{1}{2}\mathbf{s}^2\tag{30}$$

For the sliding mode control system to be stable, lim<sub>t→0</sub> **sṡ** < 0 must hold, that is:

$$\bar{\boldsymbol{\tau}}\_{dc} = \begin{cases} \bar{\boldsymbol{\tau}}\_{L}, & s > 0 \\ \bar{\boldsymbol{\tau}}\_{U}, & s < 0 \end{cases} \tag{31}$$

Setting τ̄<sub>m</sub> = (τ̄<sub>U</sub> − τ̄<sub>L</sub>)/2 and τ̄<sub>p</sub> = (τ̄<sub>U</sub> + τ̄<sub>L</sub>)/2, the sliding mode control law is:

$$\begin{aligned} \mathbf{u}(k) = {} & \mathbf{u}(k-1) + \mathbf{M}\left(\begin{bmatrix} c\_1 \dot{e}\_1\\ c\_2 \dot{e}\_2 \end{bmatrix} + \begin{bmatrix} \ddot{q}\_1^r\\ \ddot{q}\_2^r \end{bmatrix} + \varepsilon\,\text{sat}(s) + ks\right) \\ & + \mathbf{N} - \boldsymbol{\tau}\_{hr}^h - \mathbf{M}\left(\bar{\boldsymbol{\tau}}\_p - \bar{\boldsymbol{\tau}}\_m \text{sgn}(s)\right) \end{aligned} \tag{32}$$
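As a short check of why the switching choice in Equation (31) satisfies the stability condition, the Lyapunov function (30) can be differentiated along Equation (28). The sketch below is written per joint for scalar s, and it assumes k > 0 and that the scaled disturbance stays within its bounds, τ̄<sub>L</sub> ≤ τ̄<sub>d</sub> ≤ τ̄<sub>U</sub>; these assumptions are implied by the setup rather than stated in this form.

$$\dot{V} = s\dot{s} = -\varepsilon\, s\,\text{sat}(s) - k s^2 - s\left(\bar{\tau}\_d - \bar{\tau}\_{dc}\right)$$

The first two terms are non-positive because s · sat(s) ≥ 0 and ε, k > 0. For the third term, s > 0 gives τ̄<sub>dc</sub> = τ̄<sub>L</sub>, so τ̄<sub>d</sub> − τ̄<sub>dc</sub> ≥ 0, while s < 0 gives τ̄<sub>dc</sub> = τ̄<sub>U</sub>, so τ̄<sub>d</sub> − τ̄<sub>dc</sub> ≤ 0. In both cases the product with −s is non-positive, hence:

$$\dot{V} \le -k s^2 < 0 \quad \text{for } s \ne 0$$

The two cases of Equation (31) can be written compactly as

$$\bar{\tau}\_{dc} = \bar{\tau}\_p - \bar{\tau}\_m\,\text{sgn}(s)$$

since τ̄<sub>p</sub> − τ̄<sub>m</sub> = τ̄<sub>L</sub> and τ̄<sub>p</sub> + τ̄<sub>m</sub> = τ̄<sub>U</sub>. Multiplied by **M**, this is the last term of Equation (32).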

The sliding mode iterative learning control algorithm, based on the variable boundary saturation function, was used to track the reference trajectory of the rehabilitation robot. By sensing the human-robot interaction force and suppressing periodic and non-periodic disturbances, the controller can quickly track the reference trajectory and improve the robustness of the control system, realizing adaptive compliance control of the human-robot system.

### RESULTS AND DISCUSSION

#### Subjects

Seven healthy subjects (aged 25 ± 2 years old) without any previous history of neural or physiological disorders participated in this experiment. Before the experiments, each subject provided informed consent and was informed of the experimental requirements. The experiment was approved by the ethical review board of Yanshan University. To avoid the influence of fatigue, all subjects were in a good state of mind and had not recently undertaken strenuous lower limb exercise.

#### Experimental Protocol

To verify the effectiveness of the proposed method, horizontal extension and flexion of the lower limb was chosen as the experimental paradigm. Seven healthy subjects (S1∼S7, five males and two females, 25 ± 2 years old) were selected for the analysis, rather than patients, to avoid accidental secondary injuries to patients. The extension period was set to 5 s, and the Trigno™ Wireless EMG system (Delsys, USA) was used to synchronously capture the sEMG signals and joint angles of the subject's right leg, as shown in **Figure 6**. We chose the Vastus Rectus Muscle (VR), Vastus Lateralis Muscle (VL), Vastus Medialis Muscle (VM), Semitendinosus Muscle (SM), Biceps Muscle (BM), and Tibialis Anterior Muscle (TA) as data collection points. The proposed method was then applied to analyze the adaptive compliance control of the human-robot system for all subjects.

#### Experimental Results

#### The Prediction of Joint Angles Based on sEMG

The joint angle in the lower limb extension motion of the 7 subjects was predicted from the sEMG signals with the LS-ELM algorithm to realize continuous motion analysis. The sEMG signal and the predicted joint angle of subject S2 in one training process are shown in **Figure 7**. The sEMG signals of the VR and TA showed obvious periodicity, and the predicted joint angle was consistent with the actual joint angle. **Table 1** shows the results of the predicted joint angles of the seven subjects, including the training time, testing time, and analysis errors. The average training time over the seven subjects is 6.9 ms, the average time for motion recognition is 2.9 ms, and the RMSEs of the hip and knee joint angles are 7.55° and 7.26°, respectively, which meets the real-time and accuracy requirements of desired trajectory generation.
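For reference, the RMSE used to score the predicted joint angles can be computed as in the following sketch. The function name and the sample angles are assumptions; the source reports only the resulting values.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equally long angle sequences."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up joint angles in degrees (illustration only).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
print(rmse(predicted, measured))
```

The result is in the same unit as the angles, which is why the reported errors are given in degrees.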

#### The Adjustment of Lower Limb Activity and Impedance Coefficients

The curves of the lower limb activity and impedance coefficients were computed according to formulas (15), (17), and (18), respectively, as shown in **Figure 8**. The curve of lower limb activity indicated the motion state of the subject, and the trends of the impedance coefficients B<sub>d</sub> and K<sub>d</sub> were similar to that of the lower limb activity. For example, the lower limb activity decreased during 1.8∼4 s, and the impedance coefficients B<sub>d</sub> and K<sub>d</sub> decreased as well. Therefore, the impedance coefficients can be adjusted according to human motion activity and then used to correct the desired trajectory. In this paper, the initial impedance coefficients were set as B<sub>0</sub> = 20 and K<sub>0</sub> = 270, the gain coefficients were λ<sub>B</sub> = 5 and λ<sub>K</sub> = 10, and the impedance coefficients B<sub>d</sub> and K<sub>d</sub> were set in (10, 20) and (134, 235) respectively.

#### The Correction of Desired Trajectory Based on Impedance Controller

In this simulation experiment, the plantar pressure was set as **F**<sub>int</sub> = 9 · sin(2πf · t) + 13, where f = 1.26, and the static balance force is 10 N. As shown in **Figure 9**, the plantar pressure is less than the static balance force over 0∼2 s and greater than the static balance force over 2∼5 s. To verify the validity of the reference trajectory corrected by the variable impedance controller and compare it with the constant impedance controller, the constant impedance coefficients were set as K = 220 and B = 14, as shown in **Figure 10**. From 0.8 to 2.5 s, the subject's lower limb is in the transition state from extension to flexion, the plantar pressure is less than the static balance force, and the reference trajectory lies below the desired trajectory. From 3.5 to 4.5 s, the subject's lower limb is in the transition state from flexion to extension, the plantar pressure is greater than the static balance force, and the reference trajectory lies above the desired trajectory. Combining **Figures 8**, **10**, we can see that from 1.5 to 2.5 s the lower limb activity is significantly enhanced, the stiffness coefficient exceeds 220, the damping coefficient exceeds 14, and the reference trajectory modified by the variable impedance controller is closer to the desired trajectory than that of the constant-coefficient impedance control. In other words, subjects are encouraged to perform a flexion movement. From 2.5 to 5 s, the subject's lower limb activity decreased and the stiffness and damping coefficients became smaller. The deviation of the trajectory correction increased, which demonstrates the compliance of the rehabilitation robot system and provides rehabilitation assistance that matches the subject's motion ability.

#### The Reference Trajectory Tracking of Rehabilitation Robot

To verify the effectiveness of the sliding mode iterative learning control based on the variable boundary saturation function, we designed a controller to realize the terminal trajectory tracking of the lower limb rehabilitation robot and compared it with the PD iterative learning control algorithm (PDILC). In this paper, the SMILC algorithm parameters are set as c<sub>1</sub> = c<sub>2</sub> = 50, τ̄<sub>U</sub> = [2 2]<sup>T</sup>, τ̄<sub>L</sub> = [−2 −2]<sup>T</sup>, p = 1, q = 3, ε = [0.5 0.5]<sup>T</sup>, and k = 10, and the number of iterations is i = 15; the PDILC algorithm parameters are set as **k**<sub>p</sub> = [50 0; 0 50]<sup>T</sup> and **k**<sub>d</sub> = [50 0; 0 50]<sup>T</sup>, and the number of iterations is also 15. The tracking trajectories obtained by the SMILC and PDILC algorithms are shown in **Figure 11**, and the tracking errors of the algorithms are shown in **Figure 12**.

As shown in **Figure 11**, as the plantar pressure and impedance coefficients change, the controller can adaptively correct the desired trajectory to obtain the reference trajectory, and both the SMILC and PDILC algorithms can achieve stable terminal trajectory tracking of the lower limb rehabilitation robot. However, the tracking error of the SMILC algorithm is kept within ±0.013 m and its convergence time is 0.33 s, while the tracking error of the PDILC algorithm is ±0.025 m and its convergence time is 0.52 s, as shown in **Figure 12**. This indicates that the SMILC algorithm proposed in this paper can track the terminal trajectory in less time and with smaller errors. Three abnormal jitters can be seen in the trajectory tracking process, which are related to the lower limb transition state from flexion to extension.

#### The Statistical Analysis of the Trajectory Tracking Error

To further validate the feasibility and effectiveness of SMILC, we performed a statistical analysis of the trajectory tracking errors of PDILC and SMILC. The statistical results of the tracking errors are shown in **Figure 13**. **Figure 13A** shows the statistics of the SMILC tracking error obtained from the lower limb training of the 7 subjects with the rehabilitation robot. Each subject's tracking trajectory was repeated 10 times with SMILC. As can be seen, the tracking errors of all 7 subjects [F(6, 3) = 1.49, p = 0.191] fluctuate around zero and show no significant difference from each other, which means that, based on the proposed SMILC algorithm, terminal trajectory tracking can be realized with little error for different subjects. In **Figure 13B**, taking subject S2 as an example, the mean and variance of the absolute value of the tracking errors were calculated separately for PDILC and SMILC. As can be seen, there is a significant difference between PDILC and SMILC [F(1, 18) = 13.71, p < 0.001], which is marked by "∗" in **Figure 13B**, and the mean and variance of the absolute tracking errors for PDILC are clearly larger than those for SMILC, which means that more stable trajectory tracking is realized with SMILC.
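The reported F statistics suggest a one-way analysis of variance, although the test is not named in the text. Under that assumption, the statistic and its degrees of freedom can be computed as in the sketch below; the function name and the small data set are invented for illustration and are not the study's measurements.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented example data: three groups of three observations each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

With two controllers and 10 repetitions each, this formulation gives degrees of freedom (1, 18), which matches the F(1, 18) reported for the comparison between PDILC and SMILC.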

### CONCLUSION

In this paper, we proposed an advanced adaptive control method with a dual closed loop control strategy based on sEMG and plantar pressure. The variable impedance controller was designed to obtain the reference trajectory of the rehabilitation robot, bringing the reference trajectory closer to the desired trajectory of patients. The sliding mode iterative learning control was designed with the variable boundary saturation function to track the terminal trajectory of the rehabilitation robot. The results showed that the proposed control strategy could adjust the reference trajectory according to the motion intention of the subject and realize trajectory tracking more effectively. The advanced adaptive control method can improve the performance of the human-robot interaction and the robustness of the control system for the lower limb rehabilitation robot. In addition, the proposed strategy could also be applied to upper limb rehabilitation robots and other devices. Our future work will focus on applying the proposed adaptive control method to rehabilitation robots for patients.

### AUTHOR CONTRIBUTIONS

YD proposed the ideas of the paper and wrote the manuscript. HW conducted the experiments and analyzed the subjects' experimental data. SQ and WY provided the analysis of the results. PX and XC provided suggestions on the paper.

### FUNDING

This work was supported by the National Natural Science Foundation of China (grant number 61673336), the Natural Science Foundation of Hebei, China (grant number F2015203372), and the College Science and Technology Research Project of Hebei, China (grant number QN2016094).

### ACKNOWLEDGMENTS

The authors are grateful to the editors and all the reviewers for their comments and suggestions on the paper.

### REFERENCES

robot-assisted lower limb rehabilitation. Mechatronics 31, 132–145. doi: 10.1016/j.mechatronics.2015.04.005

Xie, P., Qiu, S., Li, X., Du, Y., Wu, X., and Guo, Z. (2016). "Adaptive trajectory planning of lower limb rehabilitation robot based on EMG and human-robot interaction," in IEEE International Conference on Information and Automation (ICIA) (Ningbo: IEEE), 1273–1277.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Du, Wang, Qiu, Yao, Xie and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Bio-Cooperative Approach for the Human-in-the-Loop Control of an End-Effector Rehabilitation Robot

Francesco Scotto di Luzio<sup>1</sup>\*, Davide Simonetti<sup>1</sup>, Francesca Cordella<sup>1</sup>, Sandra Miccinilli<sup>2</sup>, Silvia Sterzi<sup>2</sup>, Francesco Draicchio<sup>3</sup> and Loredana Zollo<sup>1</sup>

<sup>1</sup> Research Unit of Biomedical Robotics and Biomicrosystems, Università Campus Bio-Medico di Roma, Rome, Italy, <sup>2</sup> Unit of Physical and Rehabilitation Medicine, Università Campus Bio-Medico di Roma, Rome, Italy, <sup>3</sup> INAIL, Department of Occupational & Environmental Medicine, Monte Porzio Catone, Rome, Italy

The design of patient-tailored rehabilitative protocols represents one of the crucial factors that influence motor recovery mechanisms, such as neuroplasticity. This approach, including the patient in the control loop and characterized by a control strategy adaptable to the user's requirements, is expected to significantly improve functional recovery in robot-aided rehabilitation. In this paper, a novel 3D bio-cooperative robotic platform is developed. A new arm-weight support system is included into an operational robotic platform for 3D upper limb robot-aided rehabilitation. The robotic platform is capable of adapting therapy characteristics to specific patient needs, thanks to biomechanical and physiological measurements, and thus closing the subject in the control loop. The level of arm-weight support and the level of the assistance provided by the end-effector robot are varied on the basis of muscular fatigue and biomechanical indicators. An assistance-as-needed approach is applied to provide the appropriate amount of assistance. The proposed platform has been experimentally validated on 10 healthy subjects; they performed 3D point-to-point tasks in two different conditions, i.e., with and without assistance-as-needed. The results have demonstrated the capability of the proposed system to properly adapt to real needs of the patients. Moreover, the provided assistance was shown to reduce the muscular fatigue without negatively influencing motion execution.

#### Edited by:

Dingguo Zhang, Shanghai Jiao Tong University, China

#### Reviewed by:

Rong Song, Sun Yat-sen University, China Surjo R. Soekadar, Universitätsklinikum Tübingen, Germany

#### \*Correspondence:

Francesco Scotto di Luzio f.scottodiluzio@unicampus.it

Received: 30 March 2018 Accepted: 20 September 2018 Published: 11 October 2018

#### Citation:

Scotto di Luzio F, Simonetti D, Cordella F, Miccinilli S, Sterzi S, Draicchio F and Zollo L (2018) Bio-Cooperative Approach for the Human-in-the-Loop Control of an End-Effector Rehabilitation Robot. Front. Neurorobot. 12:67. doi: 10.3389/fnbot.2018.00067

Keywords: upper limb robot-aided rehabilitation, arm-gravity support, human-in-the-loop, biocooperative control, muscle activation

## 1. INTRODUCTION

Stroke survivors are often left with severe impairments and huge limitations in arm motor abilities that may compromise many common activities.

In such a context, robot-aided neuro-rehabilitation has been globally acknowledged as an effective therapeutic approach for motor recovery after stroke, especially for the upper extremities. Rehabilitation robots are used to improve the therapy outcome and to measure the improvements with objective indicators.

While, in the past, emphasis has been put mostly on planar exercises (Kwakkel et al., 2008), recently the importance of performing activities in the 3D space has been pointed out (Klamroth-Marganska et al., 2014). Thanks to rehabilitation exercises in the 3D space, the impaired subjects can regain functional abilities to perform activities of daily living (ADL).

In the field of rehabilitation robotics, bio-cooperative systems represent a novel generation of robotic platforms that promote a mutual human-robot interaction based on multimodal interfaces (Simonetti et al., 2017). Data coming from biomechanical, physiological, and psychological measurements, as well as data related to the user's intention and environmental factors, may contribute to providing continuous feedback on the patient's global condition (Riener and Munih, 2010), and therefore to realizing a personalized therapy. Providing the correct level of assistance, tuned to the patient's needs and performance, is paramount to encourage the subject's voluntary participation, promote neural plasticity, increase the potential for recovery of motor coordination, and realize a more effective training (Pehlivan et al., 2016). A human-in-the-loop approach is a promising strategy for reaching this goal, as it includes the human being in the robot control loop. This tight interaction between humans and robots relies on adapting the robot behavior to the subject's needs, through continuous monitoring of the patient's state and the active inclusion of the patient in the robot control loop by means of different types of feedback (e.g., visual, audio, and haptic).

As a result, robotic assistance can be dynamically changed on the basis of the subject's needs measured by multisensory monitoring systems (Mihelj et al., 2007). This approach is called "assistance-as-needed." In Riener et al. (2009), biomechanical and psychophysiological measurements are used for including the human in the loop; in Guerrero et al. (2010), psychophysiological feedback is used to develop a human-centered method that customizes the therapy to the patient's requirements and state, without affecting stress level and health. In Rodriguez-Guerrero et al. (2017), psychophysiological measurements are used for improving the challenge/skill ratio experienced by the user during the interaction with a multimodal interface in a cooperative scenario. Position error is used in Krebs et al. (2003) to measure motion accuracy and adjust the level of robot assistance accordingly.

Robot-aided rehabilitation systems often adopt surface electromyographic (sEMG) signals. This type of data represents the simplest and most intuitive way to trigger the support provided by the robot. EMG-based robot adaptation is adopted if the subject is able to contract the muscles but is not able to perform a complete movement (Simonetti et al., 2017). In this case, sEMG signals can be used to trigger the movement performed by the robot, to control robot movements through muscle contraction, or to vary the level of assistance provided by the robot, as in Song et al. (2013). Other online approaches vary the level of assistance based on the obtained performance (Marchal-Crespo and Reinkensmeyer, 2009) or the application scenario (Zollo et al., 2001; Formica et al., 2005).

One of the main drawbacks of these systems is that the gravity effect due to the weight of the upper limb is often not considered.

Supporting the weight of the patient's arm is a key point in post-stroke rehabilitation, since it limits the unhealthy effects of abnormal muscular patterns (Johnson, 2006; Prange et al., 2015). In Amirabdollahian et al. (2007) it was demonstrated that a gravity compensation strategy based on sling suspension led to an improvement of arm function of stroke patients after 9 weeks of training. Therefore, the sole application of gravity compensation might be a valuable strategy to foster functional improvement in post stroke subjects.

Exoskeleton robots can provide compensation of the arm weight and apply forces to several segments of the arm to help the subject perform the desired task (Lauretti et al., 2018). The main drawbacks of these systems are the reduced adaptability to the different anthropometry of each subject, the passive gravity compensation, the significant amount of time needed to set up the device for a particular patient, and the complexity of the control algorithms (Maciejasz et al., 2014).

End-effector-based devices can overcome the limitations of exoskeleton robots related to anthropometry adaptability, facility in setting-up and control algorithm complexity.

The main drawback of these systems is that the provided arm support depends only on spatial limb configuration, since gravity torque is highly coupled with limb dynamics. Therefore, subjects' voluntary participation and their muscular activation patterns might be affected.

Ideally, arm gravity compensation should guarantee the required assistance without altering users' physiological muscular activation patterns and their voluntary participation.

This paper aims at proposing a novel bio-cooperative platform for robot-aided 3D upper-limb rehabilitation. It is composed of an end-effector robot and an arm-weight support able to overcome the limitations pointed out in the literature. The patient is included in the control loop by continuously monitoring his/her state, extracting objective biomechanical and electromyographic indicators and, consequently, adapting the level of assistance provided by the robotic platform. In the last few years, researchers developed innovative methods to detect the level of muscular fatigue of the subject via sEMG signals (González-Izal et al., 2012). In particular, in Dimitrov et al. (2006), a simple and efficient algorithm to extract fatigue level during dynamic contractions is presented. Muscular fatigue represents an important parameter to assess patient state and adapt the level of support provided by a robotic platform in order to ensure the correct level of assistance. Therefore, user performance and muscular fatigue are taken into account to fit the level of assistance on the patient specific characteristics guaranteeing a patient-tailored therapy together with an assistance-as-needed approach.

During 3D rehabilitation with an end-effector robot, the user can assume incorrect postures during the execution of the task if he/she cannot autonomously support the arm weight, as in the case of impaired people. The introduction of the arm-weight support aims to address this issue by sustaining the patient's limb according to his/her muscular fatigue level. The complete platform, composed of the robotic arm and the arm-weight support, is designed for a two-fold purpose: to properly adapt the level of assistance to the patient's specific needs (through the end-effector robot), and to assess the patient's muscular fatigue online and avoid incorrect postures (through the arm-weight support).

A preliminary evaluation of the effects of the proposed platform on healthy subjects is performed in order to (i) give a complete picture of the subject's state and ensure his/her complete integration inside the control loop, and (ii) demonstrate that the proposed platform does not negatively affect motor execution and muscular activation patterns. Therefore, muscular activity of the anti-gravity muscles and biomechanical indicators were extracted from 10 healthy subjects during the execution of state-of-the-art 3D point-to-point movements in two different conditions, i.e., with and without assistance provided by the end-effector robot and by the arm-gravity support. The execution of the task without assistance (i.e., in a condition where the healthy subject is not "constrained" by the assistance) represents the best ground truth for evaluating possible effects of the platform on the subject's motor execution and muscular activation patterns.

A comparative analysis between the two different conditions was performed by means of biomechanical and electromyographic indicators to evaluate effects on movement kinematics and muscular activation patterns. The same indicators were also used to develop a bio-cooperative control strategy in order to adjust robotic assistance on the basis of the patient's state. Furthermore, the kinematics of the arm movement is preserved in all arm-weight support conditions while, as suggested by previous studies (Prange et al., 2009), other weight compensation strategies may affect the muscular activation patterns of the upper-limb muscles used for 3D arm reaching movements.

The paper is organized as follows. In section 2 the bio-cooperative robotic platform, the experimental setup and the protocol are presented. Experimental results are illustrated and discussed in sections 3 and 4, respectively. Finally, conclusions and future work are reported in section 5.

### 2. MATERIALS AND METHODS

The components of the proposed bio-cooperative system for robot-aided 3D upper limb rehabilitation are described in the following.

### 2.1. An Overview of the Proposed Robotic Platform

The proposed robotic platform is composed of a 7-DoF anthropomorphic robot arm (i.e., the Kuka Light Weight Robot 4+; Bischoff et al., 2010), a purposely developed motorized arm-weight support system and a multimodal interface. It includes an adaptive interaction control for the on-line evaluation of patient performance. The level of assistance is modified by adaptively and dynamically adjusting stiffness and arm-gravity support.

The overall system, presented in **Figure 1**, is devised as an end-effector machine that, interacting with the patient at the end-effector, offers assistance during point-to-point movements both in 2D and 3D space, as well as in activities of daily living (ADLs). Moreover, an additional mechatronic arm-weight support system has been developed. It provides an adaptive level of support by compensating the gravity force acting on the arm depending on both the subject's performance and the arm configuration in space. In **Figure 1** the arm-weight support is shown together with the whole platform, which records the hand Cartesian position and provides the elbow Cartesian position to be tracked during the execution of the task. The pulleys are used only for the arm-gravity support. The structure around the robot arm makes the system modular and the arm-weight support easily usable with other systems for upper-limb rehabilitation.

FIGURE 1 | Legend (partial): 6, Maxon EC-max 40 motor; 7, encoder; 8, ergonomic backing for the arm.

The overall robotic platform is based on an adaptive strategy that personalizes the therapy by including the human in the loop and assists the patient as needed during the rehabilitation treatment. To further promote patient motivation and engagement, the selected task is reproduced and updated according to the patient's behavior in a virtual reality (VR) environment developed in Matlab. The VR is composed of a virtual limb that moves along selected 3D directions (as described in section 2.3) in order to reach the assigned targets, based on the robot end-effector (i.e., subject hand) position.

During the exercise execution, the subject's wrist is attached to the robot arm end-effector, which provides the subject with assistance-as-needed during the execution of a predefined trajectory. The encoders at the joints and the robot forward kinematics provide the 3D hand trajectory. The robotic platform is composed of two independent modules (i.e., end-effector robot and arm-weight support) that communicate through USB and UDP protocols (**Figure 2**).

The proposed platform is able to provide the correct level of assistance thanks to the close interaction between end-effector robot and arm-weight support, as shown in **Figure 2**. More in detail, the correct level of assistance is assured through:

• the arm-weight support, by increasing or decreasing the weight of the arm felt by the subject. The level of the arm-weight support is evaluated through the level of muscular fatigue, measured by sEMG, as described in section 2.2.1;

• the robotic arm, by helping the subject to complete the required task. This level of assistance depends on biomechanical indicators, as described in section 2.2.3.

The multimodal interface is characterized by the following sources of information, suitably merged together to provide a picture of the patient's condition: (i) robot sensors for determining the hand pose, (ii) a magneto-inertial unit (M-IMU) for reconstructing the user's upper-extremity joint motion, and (iii) electromyographic (EMG) electrodes for recording muscular activity and selecting the correct amount of arm-gravity compensation. The M-IMU is positioned on the subject's upper arm, while EMG signals are recorded from the upper trapezius (UT, shoulder elevator), the posterior deltoid (PD, shoulder extensor), the lateral deltoid (LA, shoulder abductor), the anterior deltoid (AD, shoulder flexor), the pectoralis major (PM, arm adductor), the biceps brachii (BB, elbow flexor) and the lateral triceps (LT, elbow extensor). These muscles are chosen because they are surface muscles and their activation describes most of the upper-limb activity for a desired task. Electrodes for each muscle are placed according to SENIAM guidelines (Hermens et al., 1999).

The level of assistance (Kp) provided by the robotic arm and the time (t) given to the subject for executing the task are computed in the end-effector robot block shown in **Figure 2**. On the other hand, the amount of support to be provided to the subject elbow is computed in the arm-weight support block (**Figure 2**).

The patient's biomechanical data (i.e., the orientation of the hand provided by the robot and the upper-limb acceleration acquired through the M-IMU) and the muscular signals recorded by means of the sEMG sensors are used for (i) reconstructing the kinematics of the subject's upper limb by means of the Augmented Inverse Kinematics (AIK) (Papaleo et al., 2015), (ii) computing performance indicators, and (iii) evaluating the level of muscular fatigue. The obtained data are then used to update the robot control parameters (i.e., robot stiffness and execution time) and the amount of arm support (computed on the basis of the muscular fatigue), thereby shaping the level of assistance and the task complexity in the 3D workspace. Moreover, the elbow Cartesian position provided by the AIK is used in the control of the arm-weight support to track the subject's limb during task execution without interfering with its motion.

### 2.2. Closed-Loop Control of the Bio-Cooperative Robotic Platform

#### 2.2.1. Evaluation of the Patient's Status

The subjects are constantly monitored during the execution of the task and their status is evaluated through the multimodal interface described in section 2.1. In particular, sEMG, M-IMU, and robot position/force data are acquired to constantly describe the subject's state and to guarantee a strong and safe human-robot interaction.

sEMG signals are used to compute Dimitrov's Spectral Fatigue Index (DI), defined as

$$DI = \frac{\int_{f_1}^{f_2} f^{-1} \, PS(f) \, df}{\int_{f_1}^{f_2} f^{5} \, PS(f) \, df} \tag{1}$$

where PS(f) is the signal power spectrum and f<sub>1</sub> and f<sub>2</sub> are the lowest and the highest frequency of the bandwidth. The DI index is computed only during the contraction phase of each muscle. The DI index has been chosen since the literature shows that it is an effective indicator of muscular fatigue and increases with the muscular fatigue (Dimitrov et al., 2006; González-Izal et al., 2012). This parameter, normalized with respect to its maximum value, is estimated for each muscle and then weighted as follows

$$C_m = \frac{1}{4}\left(\frac{1}{4}DI_{BB} + \frac{1}{4}DI_{LT} + \frac{3}{4}DI_{AD} + \frac{3}{4}DI_{LA} + \frac{1}{2}DI_{PD} + DI_{PM} + \frac{1}{2}DI_{UT}\right) \tag{2}$$

Weights were selected through a "trial and error" approach, depending on the contribution of each muscle to the chosen 3D movement. The C<sub>m</sub> parameter continuously varies in the range [0, 1]; a threshold strategy is used to evaluate the fatigue level and correspondingly adapt the arm-gravity support level (L<sub>s</sub>) as

$$L_s = \begin{cases} 0 & \text{if } C_m < 0.20, \\ 1 & \text{if } 0.20 \le C_m < 0.40, \\ 2 & \text{if } 0.40 \le C_m < 0.60, \\ 3 & \text{if } 0.60 \le C_m < 0.80, \\ 4 & \text{if } 0.80 \le C_m < 1. \end{cases} \tag{3}$$

The resulting L<sub>s</sub> values correspond to the following values of K (Equation 8): 0, 0.25, 0.50, 0.75, 1.
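The chain from power spectrum to support gain (Equations 1 to 3 and the mapping to K) can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the trapezoidal integration of a sampled spectrum, and the choice to map C<sub>m</sub> = 1 to the highest level (a boundary that Equation 3 leaves open) are assumptions made here.

```python
from typing import Dict, Sequence

# Muscle weights inside the parentheses of Equation (2); they sum to 4.
WEIGHTS: Dict[str, float] = {
    "BB": 0.25, "LT": 0.25, "AD": 0.75, "LA": 0.75,
    "PD": 0.5, "PM": 1.0, "UT": 0.5,
}


def _trapz(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal integration of samples y over abscissae x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def dimitrov_index(freqs: Sequence[float], ps: Sequence[float], f1: float, f2: float) -> float:
    """Equation (1): spectral moment of order -1 over the moment of order 5 on [f1, f2].

    Assumes f1 > 0 and at least two spectrum samples inside the band.
    """
    band = [(f, p) for f, p in zip(freqs, ps) if f1 <= f <= f2]
    fs = [f for f, _ in band]
    num = _trapz(fs, [p / f for f, p in band])
    den = _trapz(fs, [p * f ** 5 for f, p in band])
    return num / den


def fatigue_score(di_norm: Dict[str, float]) -> float:
    """Equation (2): weighted combination of per-muscle DI values already normalized to [0, 1]."""
    return 0.25 * sum(w * di_norm[m] for m, w in WEIGHTS.items())


def support_level(c_m: float) -> int:
    """Equation (3): count how many thresholds C_m has reached (0 to 4).

    Assumption: C_m == 1 is assigned level 4, which Equation (3) does not specify.
    """
    return sum(c_m >= t for t in (0.20, 0.40, 0.60, 0.80))


def support_gain(level: int) -> float:
    """Map L_s in {0, ..., 4} to K in {0, 0.25, 0.50, 0.75, 1}."""
    return level / 4
```

Comparing against explicit thresholds, rather than dividing C<sub>m</sub> by 0.20 and truncating, avoids floating-point rounding placing a value such as 0.60 in the lower bin.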

M-IMU and position/force data are acquired at 100 Hz and used to reconstruct the subject's arm movement and evaluate biomechanical indicators in order to adapt robot stiffness, as described in section 2.2.3. In more detail, the biomechanical indicators used to describe the subject's limb movements are (Papaleo et al., 2013):

• Aiming angle (α): the angle between the desired direction $\vec{tg}_{dir}$ and the actual direction of the movement from the starting point up to the peak-speed point, $\vec{m}_{dir}$

$$\alpha = \arccos\left(\frac{\vec{tg}_{dir} \cdot \vec{m}_{dir}}{\|\vec{tg}_{dir}\| \, \|\vec{m}_{dir}\|}\right) \tag{4}$$

• Mean Arrest Period Ratio (MAPR): the ratio between the number of samples (t<sub>perc</sub>) in which the joint velocity exceeds 10% of the peak velocity and the whole task duration (t<sub>tot</sub>)

$$MAPR = \frac{t_{perc}}{t_{tot}} \tag{5}$$

• Inter-joint coordination (q<sub>corri,j</sub>): a coordination index between two upper-limb joint angles q<sub>i</sub> and q<sub>j</sub>

$$q_{corri,j} = \frac{\mathcal{R}(q_i, q_j)}{\sqrt{\mathcal{R}_{q_i}(q_i) \, \mathcal{R}_{q_j}(q_j)}}, \tag{6}$$

where ℛ(q<sub>i</sub>, q<sub>j</sub>) is the covariance matrix and ℛ<sub>qi</sub>(q<sub>i</sub>) and ℛ<sub>qj</sub>(q<sub>j</sub>) are the autocovariance matrices.
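The three indicators above can be illustrated with a short Python sketch. It is only one possible reading of Equations (4) to (6): the function names are hypothetical, the speed profile is assumed to be uniformly sampled so that sample counts stand in for durations, and the joint angles are treated as scalar time series, in which case Equation (6) reduces to a normalized covariance.

```python
import math
from typing import Sequence


def aiming_angle(tg_dir: Sequence[float], m_dir: Sequence[float]) -> float:
    """Equation (4): angle in radians between the target and the movement directions."""
    dot = sum(a * b for a, b in zip(tg_dir, m_dir))
    norm = math.sqrt(sum(a * a for a in tg_dir)) * math.sqrt(sum(b * b for b in m_dir))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mapr(speed: Sequence[float], fraction: float = 0.10) -> float:
    """Equation (5): share of samples whose speed exceeds `fraction` of the peak speed."""
    peak = max(speed)
    t_perc = sum(1 for v in speed if v > fraction * peak)
    return t_perc / len(speed)


def joint_correlation(q_i: Sequence[float], q_j: Sequence[float]) -> float:
    """Equation (6) for scalar series: covariance over the root of the product of variances."""
    n = len(q_i)
    mean_i = sum(q_i) / n
    mean_j = sum(q_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(q_i, q_j))
    var_i = sum((a - mean_i) ** 2 for a in q_i)
    var_j = sum((b - mean_j) ** 2 for b in q_j)
    return cov / math.sqrt(var_i * var_j)
```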


#### 2.2.2. Control of the Arm-Weight Support

In the proposed robotic platform, as shown in **Figure 2**, the arm-weight support sustains the subject's limb based on his/her muscular fatigue. To this purpose, a proportional-derivative (PD) torque control with gravity compensation has been developed in C++ (by using Microsoft Visual Studio Community 2017®). The appropriate torque, to be supplied to the subject for supporting the arm in the correct position, is defined at each iteration as

$$
\tau(q) = \tau_{PD}(q) + \tau_{g}(q) \tag{7}
$$

where τ<sub>PD</sub>(q) is the PD output torque and τ<sub>g</sub>(q) is the necessary gravitational torque. The τ<sub>g</sub>(q) is computed as

$$\tau_{g}(q) = K \tau_{max} \cos(q_d - q) = K \tau_{max} \cos(e) \tag{8}$$

where K is a constant in the range [0, 1], determined according to the patient's muscular fatigue (as detailed in section 2.2.1), τ<sub>max</sub> is the maximum torque needed to sustain the subject's arm, measured through the motor at the beginning of the task, q<sub>d</sub> and q are the desired and actual crankshaft positions, respectively, and e is the crankshaft position error (q<sub>d</sub> − q). The desired position for the motor (q<sub>d</sub>) is based on the elbow position and is computed as

$$q_d(t) = \frac{g_{ratio} \, \delta_{cable}(t)}{\pi \, \sigma \, d} \tag{9}$$

where g<sub>ratio</sub> is the gear ratio of the motor, σ is the encoder dimensionless resolution and d is the diameter of the driven pulley linked to the motor. In our case, g<sub>ratio</sub> = 74, σ = 5 × 10<sup>−4</sup> and d = 0.14 m. Let us define the difference between the new cable length and the reference position as

$$
\delta_{cable}(t) = \vec{e}(t) - \vec{p} \tag{10}
$$

where $\vec{e}(t)$ is the 3D elbow position provided by the AIK and $\vec{p}$ is the 3D pulley position in the robot frame. The AIK algorithm is applied to the hand position provided by the robot sensors and to the M-IMU data in order to solve the human arm redundancy and compute the upper-limb joint angles. In particular, the reconstructed elbow position makes it possible to decide whether the cable needs to be reeled in or unrolled according to the patient's limb configuration. In brief, the elbow joint Cartesian coordinates are reconstructed as

$$
\vec{e} = \begin{bmatrix} l_{u} \sin q_1 \cos q_2 \\ -l_{u} \cos q_1 \cos q_2 \\ -l_{u} \sin q_2 \end{bmatrix} \tag{11}
$$

Frontiers in Neurorobotics | www.frontiersin.org

where l<sub>u</sub> is the upper-arm length, and q<sub>1</sub> and q<sub>2</sub> are the reconstructed shoulder flexion-extension and intra-extra rotation angles. The M-IMU positioned on the subject's upper arm allows the y component of the elbow position to be determined as

$$e_{y} = \frac{-\ddot{a}_{y} \, l_{u}}{g} = -l_{u} \cos q_1 \cos q_2 \tag{12}$$

where g is the gravity acceleration and ä<sub>y</sub> is the acceleration component along the y-axis read by the M-IMU sensor.
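Equations (7) to (12) can be read together as a single control step. The Python sketch below is illustrative only and is not the authors' C++ controller: the function names are hypothetical, the PD law is assumed to have the textbook form k<sub>p</sub>e + k<sub>d</sub>ė with gains that the text does not report, the cable term of Equation (10) is assumed to enter Equation (9) as the Euclidean norm of the vector difference, and g = 9.81 m/s² is a conventional value.

```python
import math
from typing import Sequence, Tuple

G_RATIO = 74.0   # gear ratio of the motor (from the text)
SIGMA = 5e-4     # encoder dimensionless resolution (from the text)
D_PULLEY = 0.14  # driven pulley diameter in metres (from the text)


def elbow_position(l_u: float, q1: float, q2: float) -> Tuple[float, float, float]:
    """Equation (11): elbow Cartesian coordinates from two shoulder angles."""
    return (l_u * math.sin(q1) * math.cos(q2),
            -l_u * math.cos(q1) * math.cos(q2),
            -l_u * math.sin(q2))


def elbow_y_from_accel(a_y: float, l_u: float, g: float = 9.81) -> float:
    """Equation (12): y component of the elbow position from the M-IMU acceleration."""
    return -a_y * l_u / g


def cable_delta(elbow: Sequence[float], pulley: Sequence[float]) -> float:
    """Equation (10), reduced to a scalar length (assumption: Euclidean norm)."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(elbow, pulley)))


def desired_position(delta_cable: float) -> float:
    """Equation (9): desired crankshaft position for a given cable term."""
    return G_RATIO * delta_cable / (math.pi * SIGMA * D_PULLEY)


def gravity_torque(k: float, tau_max: float, q_d: float, q: float) -> float:
    """Equation (8): gravity term scaled by the fatigue-dependent gain K."""
    return k * tau_max * math.cos(q_d - q)


def support_torque(k: float, tau_max: float, q_d: float, q: float,
                   e_dot: float, kp: float, kd: float) -> float:
    """Equation (7): PD term plus gravity term (kp and kd are hypothetical gains)."""
    e = q_d - q
    return kp * e + kd * e_dot + gravity_torque(k, tau_max, q_d, q)
```

With zero tracking error and K = 0.5, the sketch returns half of τ<sub>max</sub>, which is the behavior expected from Equation (8).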

#### 2.2.3. Control of the End-Effector Robot

As described in section 2.1, the subject's wrist is attached to the robot arm end-effector, which provides the user with assistance-as-needed during the execution of a predefined task. The end-effector robot performs a minimum-jerk trajectory with different task durations t (i.e., 5, 7.5, 10 s), defined as follows

$$s = \|p_f - p_i\| \left[10\left(\frac{t_j}{t}\right)^3 - 15\left(\frac{t_j}{t}\right)^4 + 6\left(\frac{t_j}{t}\right)^5\right] \tag{13}$$

where p<sub>i</sub> is the initial position, p<sub>f</sub> is the final position, t<sub>j</sub> is the current time value and t is the task duration tuned according to Equations (21), (22), and (23). The robot is controlled with an impedance control with a variable stiffness K<sub>r</sub> in order to provide three levels of assistance, which correspond to three values of stiffness K<sub>r</sub> (i.e., 0.1, 300, 1,000 N/m), and it is able to change task duration (Papaleo et al., 2013), according to

$$
\vec{\tau}_{cmd} = J^T(\vec{FT}_c) + \vec{f}_{dynamics} \tag{14}
$$

where $\vec{\tau}_{cmd}$ is the vector of the command torque, J<sup>T</sup> is the transposed Jacobian matrix, $\vec{FT}_c$ is the vector of Cartesian forces along the axes x, y, z and torques about the axes z, y, x (i.e., $\vec{FT}_c = [F_{c,x} \; F_{c,y} \; F_{c,z} \; T_{c,z} \; T_{c,y} \; T_{c,x}]$), while $\vec{f}_{dynamics}$ is the dynamic model of the robotic arm. $\vec{FT}_c$ is computed as

$$F_{c,x} = \begin{cases} -k(x - x_{m,j}) - d\dot{x}, & x < x_{m,j} \\ 0, & x_{m,j} \le x < x_f \text{ and } x \ge x_{prev} \\ -k(x - x_{prev}) - d\dot{x}, & x_{m,j} \le x < x_f \text{ and } x < x_{prev} \\ -k(x - x_f) - d\dot{x}, & x > x_f \end{cases} \tag{15}$$

$$F_{c,y} = \begin{cases} -k(y - y_{m,j}) - d\dot{y}, & y < y_{m,j} \\ 0, & y_{m,j} \le y < y_f \text{ and } y \ge y_{prev} \\ -k(y - y_{prev}) - d\dot{y}, & y_{m,j} \le y < y_f \text{ and } y < y_{prev} \\ -k(y - y_f) - d\dot{y}, & y > y_f \end{cases} \tag{16}$$

$$F_{c,z} = \begin{cases} -k(z - z_{m,j}) - d\dot{z}, & z < z_{m,j} \\ 0, & z_{m,j} \le z < z_f \text{ and } z \ge z_{prev} \\ -k(z - z_{prev}) - d\dot{z}, & z_{m,j} \le z < z_f \text{ and } z < z_{prev} \\ -k(z - z_f) - d\dot{z}, & z > z_f \end{cases} \tag{17}$$

$$T_{c,z} = -k_0(\varphi - \varphi_m) - d\dot{\varphi} \tag{18}$$

$$T_{c,y} = -k_0(\theta - \theta_m) - d\dot{\theta} \tag{19}$$

$$T_{c,x} = -k_0(\psi - \psi_m) - d\dot{\psi} \tag{20}$$

where k is the robot stiffness (K<sub>r</sub>), x<sub>m,j</sub>, y<sub>m,j</sub>, and z<sub>m,j</sub> are the desired positions, computed as reported in Equation (13), x<sub>prev</sub>, y<sub>prev</sub> and z<sub>prev</sub> are the previous positions at time t<sub>j</sub>, k<sub>0</sub> is the Cartesian stiffness for the orientation, d is the controller Cartesian damping (constant), and ϕ, θ and ψ are the RPY (Roll-Pitch-Yaw) angles representing the orientation of the end-effector.
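The minimum-jerk profile of Equation (13) and the piecewise force law of Equations (15) to (17) can be sketched for a single Cartesian axis as follows. This is a hedged Python illustration rather than the controller running on the robot: the function names are hypothetical, the time ratio is clamped to [0, 1] as a safety assumption, and the case x = x<sub>f</sub>, which the equations leave unspecified, is assumed to produce no force.

```python
import math
from typing import Sequence


def min_jerk_displacement(p_i: Sequence[float], p_f: Sequence[float],
                          t_j: float, t: float) -> float:
    """Equation (13): distance covered along the segment p_i -> p_f at time t_j."""
    r = min(max(t_j / t, 0.0), 1.0)  # assumption: clamp the normalized time to [0, 1]
    dist = math.sqrt(sum((f - i) ** 2 for i, f in zip(p_i, p_f)))
    return dist * (10 * r ** 3 - 15 * r ** 4 + 6 * r ** 5)


def axis_force(x: float, x_dot: float, x_mj: float, x_prev: float,
               x_f: float, k: float, d: float) -> float:
    """Equations (15)-(17) for one axis, with motion assumed towards increasing x."""
    if x < x_mj:
        # The hand lags behind the minimum-jerk reference: pull it towards the reference.
        return -k * (x - x_mj) - d * x_dot
    if x > x_f:
        # The hand overshot the target: pull it back to the target.
        return -k * (x - x_f) - d * x_dot
    if x < x_f and x < x_prev:
        # The hand is ahead of the reference but moving backwards: hold the previous position.
        return -k * (x - x_prev) - d * x_dot
    # The hand is ahead of the reference and advancing (or exactly on target): no force.
    return 0.0
```

The structure makes the assist-as-needed behavior visible: the robot exerts force only when the hand falls behind the reference, moves backwards, or overshoots the target.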

The robot stiffness K<sub>r</sub> and the task duration t are modified according to a threshold strategy based on two parameters, C<sub>kr</sub> and C<sub>t</sub>, evaluated on the basis of the previously described biomechanical indicators as

$$C_{k_r} = \frac{1}{2}\alpha + \frac{1}{8}q_{corr1,4} + \frac{1}{8}q_{corr2,4} + \frac{1}{8}UMF + \frac{1}{8}UPF \tag{21}$$

$$C_t = \frac{1}{2}MAPR + \frac{1}{8}q_{corr1,4} + \frac{1}{8}q_{corr2,4} + \frac{1}{8}UMF + \frac{1}{8}UPF \tag{22}$$

The correct level of assistance provided by the robot is estimated as

$$L_i = \begin{cases} 1, & \text{if } 0 \le C_i < 0.5 \\ 2, & \text{if } 0.5 \le C_i < 0.75 \\ 3, & \text{if } 0.75 \le C_i < 1 \end{cases} \tag{23}$$

where i = k<sub>r</sub>, t. The values of L<sub>kr</sub> and L<sub>t</sub> are used to select the corresponding robot stiffness (0.1, 300, 1,000 N/m) and task duration (5, 7.5, 10 s).
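The threshold strategy of Equations (21) to (23) can be sketched as a small lookup. The Python code below is an illustration under stated assumptions, not the authors' implementation: all indicators are assumed to be normalized to [0, 1] before weighting, UMF and UPF are treated as opaque inputs, level 1 is paired with the first listed stiffness and duration simply because of the listing order in the text, and a value of exactly 1 is assigned to level 3.

```python
from typing import Tuple

# Assumption: levels 1, 2, 3 follow the order in which the values are listed in the text.
STIFFNESS_N_PER_M = {1: 0.1, 2: 300.0, 3: 1000.0}
DURATION_S = {1: 5.0, 2: 7.5, 3: 10.0}


def weighted_score(main: float, q_corr_14: float, q_corr_24: float,
                   umf: float, upf: float) -> float:
    """Equations (21)/(22): `main` is alpha for C_kr and MAPR for C_t."""
    return 0.5 * main + 0.125 * (q_corr_14 + q_corr_24 + umf + upf)


def assistance_level(c: float) -> int:
    """Equation (23): map a score in [0, 1] to a level in {1, 2, 3}."""
    if c < 0.5:
        return 1
    if c < 0.75:
        return 2
    return 3


def select_parameters(c_kr: float, c_t: float) -> Tuple[float, float]:
    """Return (robot stiffness in N/m, task duration in s) for the two scores."""
    return (STIFFNESS_N_PER_M[assistance_level(c_kr)],
            DURATION_S[assistance_level(c_t)])
```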

### 2.3. Experimental Setup and Protocol

The proposed robotic platform, shown in **Figure 3**, is composed of the anthropomorphic robotic arm and the actuated arm-weight support. The robotic arm is the Kuka Light Weight Robot 4+. It is characterized by 7 active Degrees of Freedom (DoFs) and embeds position and torque sensors at the joints. The communication between the robot and a remote PC is guaranteed by the Fast Research Interface (FRI) Library. The arm-weight support actuation system is composed of an EC-max 40 brushless Maxon motor, a Maxon GP 42-C planetary gearhead (74:1), a Maxon HEDL-5540 encoder and a Maxon EPOS2 50/5 control unit. An aluminum pulley (diameter d = 0.14 m), around which the steel rope is wound, is rigidly attached to the motor shaft. Finally, an ergonomic brace for arm support allows the fitting to be adjusted to the patient's requirements.

The subject's upper-limb kinematics is reconstructed by means of an Xsens MTw M-IMU sensor. The M-IMU and robot sensor data are acquired at 100 Hz and sent to the AIK algorithm via UDP communication.

sEMG data are collected at 1 kHz, digitized and then filtered by using a sixth-order Butterworth bandpass filter with cutoff frequencies of 30 and 450 Hz and a second-order Butterworth notch filter (50 Hz) to remove power-line noise. The filtered sEMG signal is normalized with respect to the Maximum Voluntary Contraction (MVC).

FIGURE 3 | The proposed 3D bio-cooperative robotic platform. (A) Detail of M-IMU and sEMG sensors used with arm-weight support; (B) Arm-weight support with the whole platform: subject interacts with robotic arm and arm-gravity support.

Ten right-handed healthy subjects (mean age: 27.9 ± 2.0) were recruited to participate in this study. All the subjects were able to lift their right arm against gravity, and presented no musculoskeletal or neurological disorders. They provided written informed consent prior to participating in this study. Each subject was seated on a chair in front of a screen projecting the virtual reality, as shown in **Figure 3**. The sensors embedded in the robot arm reconstruct the subject's hand position, which is used to move the hand avatar reproduced in the virtual reality. The virtual reality reproduces the task to be performed and gives the user continuous feedback on his/her motion performance (in terms of error between the avatar position and the target).

The proposed bio-cooperative system for 3D upper limb rehabilitation allows the tasks to be performed in two different conditions: (1) without assistance provided by the end-effector robot and by the arm-gravity support and (2) with assistance-as-needed. In condition (2), the level of assistance is tuned on the subject's muscular fatigue and on the biomechanical indicators computed during the trials executed without assistance.

The subjects were asked to perform two consecutive sessions in the two conditions. Condition 1 was always executed before condition 2 in order to evaluate all the indicators introduced in section 2.2.1 and correspondingly adapt the behavior of the robot arm and the arm-weight support. Before each rehabilitation session, an evaluation session is envisaged. When the approach is tested on patients with severe upper-limb disabilities who are not able to perform the evaluation session without assistance, the computed parameters will indicate that the maximum level of assistance should be provided.

Each session was composed of two phases of 56 point-to-point movements. Each movement consisted of reaching a target on the screen and then returning to the starting point. Targets were placed in 8 different positions, spaced π/4 rad apart from the North to the North-West direction. The transition from one target to another is performed either when the maximum value of the execution time t (established by Equation 22) is reached or when the position error between the target and the end-effector is less than a predefined threshold.
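The target layout and the transition rule described above can be sketched as follows. This Python fragment is only an illustration: the clockwise ordering from North, the planar (screen) representation, the unit radius, and the names `target_directions` and `should_advance` are all assumptions, and the error threshold is a hypothetical parameter that the text does not specify.

```python
import math
from typing import List, Sequence, Tuple


def target_directions(n: int = 8) -> List[Tuple[float, float]]:
    """Unit vectors for n targets spaced 2*pi/n apart, starting at North (+y), clockwise."""
    step = 2 * math.pi / n  # pi/4 rad for the 8 targets of the protocol
    return [(math.cos(math.pi / 2 - i * step), math.sin(math.pi / 2 - i * step))
            for i in range(n)]


def should_advance(elapsed: float, t_max: float, hand: Sequence[float],
                   target: Sequence[float], tol: float) -> bool:
    """Move to the next target on timeout or when the position error is below `tol`."""
    error = math.sqrt(sum((h - g) ** 2 for h, g in zip(hand, target)))
    return elapsed >= t_max or error < tol
```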

During the whole task execution, data from M-IMU, robot sensors and sEMG activities of 7 shoulder and upper-arm muscles were collected.

### 2.4. Statistical Analysis

A statistical analysis based on the Wilcoxon paired-sample test was performed for the comparative analysis between the two considered operative conditions (i.e., with and without assistance-as-needed), after verifying that the data did not follow a Gaussian distribution. In particular, the statistical analysis was performed for comparing (i) the time taken by the subjects to accomplish the task, (ii) the biomechanical indicators, and (iii) the muscular fatigue in the two conditions. The significance level was set at p < 0.05.
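For a sample as small as the one used here, the paired Wilcoxon signed-rank test can be computed exactly by enumerating all sign patterns. The Python sketch below illustrates the test itself and is not the analysis code used in the study: the function name is hypothetical, zero differences are dropped, ties receive average ranks, and the two-sided p-value is taken from the exact permutation distribution.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, p) for the two-sided paired Wilcoxon signed-rank test (exact, small n)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are discarded
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give each the average rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for idx in range(i, j + 1):
            ranks[order[idx]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every assignment of signs to the ranks is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** n
```

With ten paired observations the enumeration covers 2<sup>10</sup> = 1,024 sign patterns, so the exact computation is inexpensive.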

### 3. EXPERIMENTAL RESULTS

Each of the ten healthy subjects involved in this study performed the assigned task in the two different conditions previously described.

The time needed by the subjects to accomplish the task is reported in the box plots in **Figure 4** for both conditions 1 and 2. On average, the subjects performed the assigned task without assistance-as-needed in (283 ± 28) s and with assistance-as-needed in (290 ± 40) s. It was verified that the use of the support does not significantly alter the execution time of the assigned task (Wilcoxon test, p = 0.08).

Robot sensors provided position and force data for customizing the exercises on the basis of the subject's motor performance. As expected, the computation of the biomechanical indicators for the involved healthy subjects did not show a significant change between the first and the second condition, since they were able to perform the task without any assistance. This is confirmed by the values of the robot stiffness K<sub>r</sub> and task duration t. The corresponding levels of assistance in terms of robot stiffness (i.e., L<sub>kr</sub>) and time to accomplish the task (i.e., L<sub>t</sub>) are shown in **Figure 5**. Indeed, it was demonstrated that, with the proposed system, the biomechanical indicators do not show significant variations due to the introduction of the support (p = 0.28 with Wilcoxon test for all the biomechanical indicators evaluated with and without arm-weight support). In **Figures 6**, **7**, the mean EMG activity and its standard deviation, computed on 10 subjects, are reported in both operative conditions, i.e., with and without assistance-as-needed. The 7 sEMG values range between [0, 1] since each of them is normalized with respect to the corresponding MVC. Note that there are apparently no appreciable changes in the BB and LT signals, but this is due to normalization with respect to their MVC: the muscles were activated, but their variations are not perceivable.

The EMG signals are used to estimate the level of physical fatigue of the subjects. The corresponding level of arm-weight support is shown in **Figure 8** without and with assistance-as-needed. These results show that muscular fatigue increased during the execution of the task without assistance-as-needed for all the subjects, as confirmed by the statistically significant difference in fatigue between the two conditions (p = 0.03 with Wilcoxon test). The level of support to be applied in Condition 2 is selected on the basis of the fatigue level evaluated during Condition 1, as reported in section 2.2.1. In this way, the support level can be adapted to the patient's fatigue while following the subject's arm movements, as reported in section 2.2.2.

Results for the desired crankshaft position (q<sub>d</sub>) and the desired torque (τ<sub>d</sub>) are shown in **Figure 9** for a sample subject. From the task execution in Condition 1, the level of assistance to be provided by the arm support for this subject was estimated to be 50% of the τ<sub>d</sub> necessary to completely support the subject's arm (i.e., τ<sub>max</sub> = 35 mNm, as evaluated at the beginning of the experimental session).

### 4. DISCUSSIONS

The movements of the robot end-effector (i.e., subject hand) and the elbow position, reconstructed by the AIK algorithm, demonstrate that the proposed approach allows 3D tasks to be executed without interfering with the natural motion pattern and therefore without negatively affecting the motion execution. This is demonstrated by the results of the statistical analysis performed on the biomechanical indicators: their values do not change in a statistically significant manner between the two operative conditions (p = 0.45). The reason is that the subjects did not present musculoskeletal or neurological disorders and therefore were able to perform the assigned task without any assistance. For the same reason, the mean values of task duration confirmed that there is no statistically significant difference between the time obtained without assistance-as-needed and with assistance-as-needed: p = 0.34 was obtained with the Wilcoxon test.

The use of the arm-weight support reduces muscular activity, as is evident from **Figures 6**, **7** and confirmed by the Wilcoxon test applied to each muscle (p = 0.03). The subjects reported perceiving reduced muscular fatigue after the introduction of the arm-weight support. This finding could have a significant impact on neuro-rehabilitation. In fact, reduced muscular fatigue could lead to an increase in therapy session duration and a decrease in wrong arm configurations that may result from compensating for the fatigue of some muscles.

As shown in **Figure 9**, the control algorithm for the arm-weight support follows the subject's arm movements and produces a desired torque τ<sub>d</sub>, with a profile similar to q<sub>d</sub>, that is able both to compensate the gravity component of the arm and to move his/her limb in 3D space. The proposed strategy, unlike the state of the art, takes into account the relationship between the gravity torque of the limb, its dynamics and its dependence on the postures and positions of moving limbs. With respect to other platforms in the literature, this platform offers the main advantage of providing an adaptable level of both robotic assistance and arm-weight support, thanks to the online computed biomechanical indicators and muscular fatigue, with an expected significant impact on the personalization and optimization of the treatment. Future studies will be conducted to rigorously assess the pros and cons of the proposed platform on patient treatment.

FIGURE 6 | Mean sEMG activity (normalized) and standard deviation during the execution of task without assistance-as-needed.

The support level applied to the subject's arm by the arm-weight support was varied in accordance with the fatigue level estimated for each subject on the basis of Equation (3).

From these results, it is clear that the proposed bio-cooperative robotic platform is based on a closed-loop control that includes the subject, with the aim of executing 3D point-to-point movements while adapting to the state of the subject from both the biomechanical and the muscular fatigue points of view.

### 5. CONCLUSIONS

In this paper, a novel 3D bio-cooperative robotic platform based on subject status was presented. The aim of the research was to define and implement a robot-aided neuro-rehabilitation strategy that includes the patient in the control loop by providing him/her with the correct amount of assistance on the basis of biomechanical performance and muscular fatigue indicators. In particular, the interaction between the subject and the proposed platform was constantly monitored to extract biomechanical and muscular indicators and consequently modify the level of assistance and the difficulty of the exercise, in order to demonstrate that the proposed platform does not negatively affect motor execution of the task and muscular activation patterns. The platform was tested on 10 healthy subjects performing 3D point-to-point movements with and without assistance-as-needed. The obtained results demonstrated that the proposed system reduces the muscular fatigue without negatively influencing correct motor patterns.

Future work will be devoted to extending the study to a higher number of tasks and to testing the proposed robotic platform on post-stroke patients, with an ad hoc experimental protocol, in order to establish the effects on patients with motor disabilities.

### ETHICS STATEMENT

The experimental protocol was approved by the local Ethical committee (Comitato Etico Università Campus Biomedico di Roma, reference number: 01/17 PAR ComEt CBM), by the Italian Ministry of Health (Registro-classif. DGDMF/I.5.i.m.2/2016/1096) and complied with the Declaration of Helsinki. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

### AUTHOR CONTRIBUTIONS

FS analyzed the literature, designed the paper, designed and developed the proposed robotic platform, analyzed the experimental data and wrote the manuscript. DS analyzed the literature, organized the experimental sessions, designed the mechanical component of the proposed platform, acquired the data and contributed to the manuscript writing. FC analyzed the literature, contributed to the design of the control algorithm of the proposed platform, analyzed the experimental data and contributed to the manuscript writing. SM and FD contributed to the design of the experiments and discussed the results. SS discussed the results and supervised the study. LZ contributed to the design of the proposed platform, discussed the results, wrote the paper and supervised the study. All the authors read and approved the final version of the manuscript.

### FUNDING

This work was supported partly by the Italian Institute for Labour Accidents (INAIL) with the RehabRobo@work (CUP: C82F17000040001), PCR 1/2 (CUP: E57B16000160005) and PPR AS 1/3 (CUP: E57B16000160005) projects and partly by the European Project H2020/AIDE: Adaptive Multimodal Interfaces to Assist Disabled People in Daily Activities (CUP:J42I15000030006).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Scotto di Luzio, Simonetti, Cordella, Miccinilli, Sterzi, Draicchio and Zollo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Adaptive and Hybrid End-Point/Joint Impedance Controller for Lower Limb Exoskeletons

#### Serena Maggioni<sup>1,2</sup>, Nils Reinert<sup>2</sup>, Lars Lünenburger<sup>2</sup> and Alejandro Melendez-Calderon<sup>1,3,4</sup>\*

<sup>1</sup> Department of Health Science and Technology, ETH Zürich, Zurich, Switzerland, <sup>2</sup> Hocoma AG, Volketswil, Switzerland, <sup>3</sup> Cereneo Advanced Rehabilitation Institute (CARINg), Vitznau, Switzerland, <sup>4</sup> Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, United States

#### Edited by:

Dingguo Zhang, Shanghai Jiao Tong University, China

#### Reviewed by:

Ping Xie, Yanshan University, China; Naser Mehrabi, University of Washington, United States; Kai Gui, Shanghai Jiao Tong University, China

#### \*Correspondence:

Alejandro Melendez-Calderon alejandro.melendez@cereneo.foundation

#### Specialty section:

This article was submitted to Biomedical Robotics, a section of the journal Frontiers in Robotics and AI

Received: 29 March 2018; Accepted: 20 August 2018; Published: 22 October 2018

#### Citation:

Maggioni S, Reinert N, Lünenburger L and Melendez-Calderon A (2018) An Adaptive and Hybrid End-Point/Joint Impedance Controller for Lower Limb Exoskeletons. Front. Robot. AI 5:104. doi: 10.3389/frobt.2018.00104

Assist-as-needed (AAN) algorithms for the control of lower extremity rehabilitation robots can promote active participation of patients during training while adapting to their individual performances and impairments. The implementation of such controllers requires the adaptation of a control parameter (often the robot impedance) based on a performance (or error) metric. The choice of how an adaptive impedance controller is formulated implies different challenges and possibilities for controlling the patient's leg movement. In this paper, we analyze the characteristics and limitations of controllers defined in two commonly used formulations: joint and end-point space, exploring especially the implementation of an AAN algorithm. We then propose, as a proof-of-concept, an AAN impedance controller that combines the strengths of working in both spaces: a hybrid joint/end-point impedance controller. This approach makes it possible to adapt the end-point stiffness in magnitude and direction in order to provide support that targets the kinematic deviations of the end-point with the appropriate force vector. This controller was implemented on a two-link rehabilitation robot for gait training, the Lokomat®Pro V5 (Hocoma AG, Switzerland), and tested on 5 able-bodied subjects and 1 subject with Spinal Cord Injury. Our experiments show that the hybrid controller is a feasible approach for exoskeleton devices and that it could exploit the benefits of the end-point controller in shaping a desired end-point stiffness and those of the joint controller to promote the correct angular changes in the trajectories of the joints. The adaptation algorithm is able to adapt the end-point stiffness based on the subject's performance in different gait phases, i.e., the robot can render a higher stiffness selectively in the direction and gait phases where the subjects perform with larger kinematic errors. The proposed approach can potentially be generalized to other robotic applications for rehabilitation or assistive purposes.

Keywords: assist-as-needed, impedance, gait trainer, exoskeleton, stiffness, rehabilitation, Lokomat, adaptive control

## INTRODUCTION

Exoskeletons for gait rehabilitation or walking assistance in subjects with neurological injuries have flourished in the last decades (Esquenazi et al., 2017). These devices seek to control the leg segments of the user and try to restore a gait pattern that is both physiological (i.e., following kinematic characteristics observed in non-impaired individuals) and safe. An effective rehabilitation device should not only control the movements of a patient's legs, but should also challenge the patient and promote his active participation (Lotze et al., 2003; Hogan et al., 2006). One way to achieve the latter is through adaptation of the robotic support based on the user's capabilities (Cai et al., 2006; Marchal-Crespo and Reinkensmeyer, 2009). This concept is known as Assist-As-Needed (AAN) (Emken et al., 2005).

The simplest and most common method of modifying the level of robotic support is with impedance controllers, where impedance is defined as any dynamic operator that outputs a force (or a torque) from a kinematic input (e.g., displacement, velocity) (Hogan, 1985). In most available exoskeletons, the impedance parameters must be manually adapted by therapists based on their experience [e.g., Lokomat® (Hocoma AG, Switzerland) (Colombo et al., 2000), LOPES II (University of Twente, The Netherlands) (Meuleman et al., 2016), HAL (Cyberdyne Inc., Japan) (Nilsson et al., 2014)]. New controllers that automatically adapt the impedance of the joints based on the user's performance have been proposed (Emken et al., 2008; Koopman et al., 2013; Maggioni et al., 2015), but have not been extensively implemented due to safety requirements. The use of adaptive control algorithms increases the compliance of the exoskeleton to the user's movements. Too much compliance (i.e., a low mechanical impedance) can lead to unsafe conditions because support may not be provided against users' errors, which may lead to tripping and injuries. The challenge in implementing adaptive controllers in lower limb exoskeletons is to find an appropriate tradeoff between compliance (i.e., freedom of movement) and safety.

The choice of how an adaptive impedance controller is formulated inevitably determines how complex it is to address any potential hazard situation arising from reduced impedance. Here, we analyze the characteristics and limitations of controllers defined in two commonly used formulations: joint and endpoint space, exploring especially the implementation of AAN controllers in these two spaces. A comparative analysis of these two approaches has been reported for industrial manipulators (Smith et al., 2014, 2015) but, to the best of our knowledge, such comparison has not been extensively examined within the context of rehabilitation robotics and even less in lower limb applications.

After analyzing the properties of these two approaches for the control of lower limb exoskeletons, we propose an AAN impedance controller that combines the strengths of working in both spaces: a hybrid joint/end-point impedance controller. This controller makes it possible to adapt the end-point stiffness in magnitude and direction, to provide support that targets end-point deviations with the appropriate force vector. This controller was implemented and tested on a two-link rehabilitation robot for gait training with actuated hip and knee joints, the Lokomat® Pro V5 (Hocoma AG, Switzerland). We present the proof-of-concept for this hybrid controller based on simulations and tests conducted with five able-bodied subjects and one subject with walking impairment due to a complete spinal cord injury. The proposed approach can potentially be generalized to other robotic applications for rehabilitation or assistive purposes.

### JOINT VS. END-POINT SPACE FORMULATIONS

### Background Concepts

In this paper we analyze the implications of using joint or end-point space formulation for the control of lower limb exoskeletons. We model these systems as two-segment exoskeletons with a shank and a thigh segment. In the swing phase, the system can be modeled as a two-segment pendulum: the upper segment is fixed to the hip center of rotation (CoR) and the end-point corresponds to the ankle position (Kuo and Donelan, 2010). In the stance phase, the model is an inverse two-segment pendulum: after heel contact, the foot can only be moved backwards by the treadmill, hence the end-point of the kinematic chain is the hip CoR and not the ankle joint (**Figure 1**).

To analyze the impedance properties of the joint and end-point control approaches for the control of lower limb exoskeletons, we present the impact that these two formulations have on the end-point stiffness. The stiffness can be visualized as an ellipse, whose major axis indicates the direction of maximum stiffness (Mussa-Ivaldi et al., 1985; Shadmehr, 1993). The stiffness ellipse captures the geometrical features of the force field around a reference position of the end-point. In the force field representation, we can visualize the direction and magnitude of the restoring forces for displacements around the reference trajectory. For further details on the calculation of stiffness ellipses and force field, see the **Appendix 1**.

### Impedance Control Based on Joint Space Formulation

In most exoskeleton devices, the actuators control the flexion and extension of the robotic joints, which roughly align to the human joints. Therefore, it is common to implement impedance controllers that compute the actuators' torques in order to follow reference trajectories defined in joint space (e.g., hip and knee angles). Furthermore, instrumented gait analysis increased our familiarity with angular kinematics and kinetics of the human joints.

A joint controller can be applied both in the stance and the swing phase of gait, because the actual joint trajectory $\mathbf{q}_{act}$ and the reference trajectory $\mathbf{q}_{ref}$ are defined continuously during the whole gait cycle and do not depend on the kinematic configuration (e.g., open chain in swing phase or closed chain in double-support phase). A joint space formulation avoids problems that might arise from inverse kinematics/dynamics calculations, especially in kinematic configurations (specific combinations of hip/knee angles) where the Jacobian matrix is singular.

**Abbreviations:** AAN, Assist-as-needed; PD, Proportional Derivative control; CNS, Central Nervous System; CoR, Center of Rotation; ROM, Range of Motion; SCI, Spinal Cord Injury.

For a two-link exoskeleton robot, the joint reference trajectory can be expressed as $\mathbf{q}_{ref} = [q_{hip}, q_{knee}]$, while $\mathbf{q}_{act}$ refers to the measured angles while the subject is walking. The torques $\boldsymbol{\tau}_q$ to control the robotic actuators are provided by a motion controller with stiffness $\mathbf{K}_q = [K_{hip}, 0; 0, K_{knee}]$ and damping $\mathbf{B}_q = [B_{hip}, 0; 0, B_{knee}]$ (Equation 1).

$$\boldsymbol{\tau}_q = \mathbf{K}_q \left( \mathbf{q}_{ref} - \mathbf{q}_{act} \right) + \mathbf{B}_q \left( \dot{\mathbf{q}}_{ref} - \dot{\mathbf{q}}_{act} \right) \tag{1}$$

Generally, in addition to the control torques $\boldsymbol{\tau}_q$, robotic exoskeletons have a separate component $\boldsymbol{\tau}_{comp}$, which compensates for the inherent robot dynamics such as gravity, friction or inertia (e.g., Riener et al., 2005; Vallery et al., 2009).
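As a minimal sketch of Equation 1, the control law with diagonal stiffness and damping reduces to one proportional-derivative term per joint. The function name, units and numerical values below are illustrative assumptions and are not taken from the Lokomat implementation; the compensation torque is left out.

```python
def joint_impedance_torque(q_ref, q_act, dq_ref, dq_act, k, b):
    """Equation 1 with diagonal K_q and B_q: one PD term per joint.

    All arguments are [hip, knee] sequences. k and b hold the diagonal
    entries of the stiffness and damping matrices.
    """
    return [
        k_j * (qr - qa) + b_j * (dqr - dqa)
        for k_j, b_j, qr, qa, dqr, dqa in zip(k, b, q_ref, q_act, dq_ref, dq_act)
    ]


# Hypothetical values: angles in rad, velocities in rad/s,
# stiffness in Nm/rad, damping in Nm*s/rad.
tau_q = joint_impedance_torque(
    q_ref=[0.30, 0.90], q_act=[0.25, 0.80],
    dq_ref=[0.0, 0.0], dq_act=[0.1, -0.2],
    k=[100.0, 80.0], b=[5.0, 4.0],
)
```

Because the matrices are diagonal, the hip torque depends only on the hip error and the knee torque only on the knee error, regardless of where the foot is.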

#### Selection of Joint Reference Trajectories

Joint reference trajectories **q**ref can be taken from literature (e.g., Winter, 1991; Perry, 1992; Stoquart et al., 2008), or from recordings of able-bodied subjects walking "freely" (i.e., in "transparent mode," where only τcomp, but not τq, is applied) in the same device to be controlled (Colombo et al., 2000). When determining **q**ref , attention must be paid to avoid unwanted contact between the end-point (e.g., the heel or the tip of the foot) and the ground. For example, a small angular deviation at the knee joint may result in a considerable change in foot clearance (Winter, 1992).

One challenge in joint space formulation comes with the high inter-subject variability in angular patterns, which makes it difficult to define joint reference trajectories that fit all subjects. In some exoskeletons, **q**ref can be changed manually by the user within some limits (Riener et al., 2010; Meuleman et al., 2016). However, it is difficult to predict whether the subject will have adequate foot clearance and step length, since these also depend on the length of the thigh and shank segments.

Another challenge comes in applications where the users are required to perform a task following visual feedback, e.g., to follow a reference trajectory displayed on the screen. Simultaneous feedback from two or more joint space variables (e.g., hip and knee) is usually quite complex to process (Maggioni et al., 2015).

#### Impact of Joint Space Formulation on End-Point Stiffness

Potential hazards during walking can come from unwanted interactions between the foot and the floor (or treadmill). Therefore, we examined the forces at the ankle level that may result in such unwanted interactions. These forces were generated by a controller defined in joint space, given foot displacements of different amplitudes and directions throughout the swing phase. We obtained the resulting end-point forces (force field) using the Jacobian matrix of the two-link robot (see **Appendix 1.2**).

In **Figure 2,** we show the force field for different points during the pre-swing and swing phase. In this case, hip and knee stiffness are constant throughout the gait cycle, but the resulting endpoint stiffness varies depending on the angular configuration of the joints. The magnitude and direction of joint torques and endpoint forces applied by a joint controller on a real trajectory are presented in **Figure 3**. Two main requirements for functional walking are adequate foot clearance and foot placement at the end of swing (Gage, 1991; Baker, 2013). Therefore, we examined these two phases in detail. As the reader can appreciate, the restoring forces around the foot are not always directed toward the reference trajectory (note that the reference trajectory is defined in joint space, but it is transformed to end-point space for visualization purposes). Consider the situation where a subject is not able to sufficiently lift the foot from the ground at the beginning of swing phase: as we can see in **Figures 2A**, **3B**, the joint controller is able to provide forces that are directed toward an adequate foot clearance position. On the other hand, if the subject is not able to perform a sufficiently long step (e.g., due to insufficient hip flexion or reduced knee extension at the end of swing), or if his foot is lagging behind the reference position, the actual position of his ankle can fall in an area where the forces rendered by the controller direct the foot toward the ground, instead of lifting it to guarantee a sufficient step length. It is interesting to compare how the same controller acts in the two different spaces; we can obtain insights that are not possible by studying the joint torques and end-point forces in isolation.
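The configuration dependence of the end-point stiffness can be illustrated numerically. The sketch below assumes the standard small-displacement relation $\mathbf{K}_q = \mathbf{J}^T \mathbf{K}_x \mathbf{J}$ (so $\mathbf{K}_x = \mathbf{J}^{-T} \mathbf{K}_q \mathbf{J}^{-1}$ away from singularities) and a planar two-link convention chosen for illustration: hip CoR at the origin, hip flexion measured from the downward vertical, knee flexion positive. Segment lengths, stiffness values and angles are hypothetical; the paper's own derivation is in **Appendix 1**.

```python
import math


def jacobian(q_hip, q_knee, l_thigh, l_shank):
    """Jacobian of the ankle position for the assumed two-link convention."""
    shank = q_hip - q_knee  # absolute shank angle from the downward vertical
    return [
        [l_thigh * math.cos(q_hip) + l_shank * math.cos(shank), -l_shank * math.cos(shank)],
        [l_thigh * math.sin(q_hip) + l_shank * math.sin(shank), -l_shank * math.sin(shank)],
    ]


def inv2(m):
    """Inverse of a 2x2 matrix (fails at singular configurations)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def endpoint_stiffness(q_hip, q_knee, k_hip, k_knee, l_thigh=0.45, l_shank=0.45):
    """End-point stiffness K_x = J^-T K_q J^-1 for a diagonal joint stiffness."""
    j_inv = inv2(jacobian(q_hip, q_knee, l_thigh, l_shank))
    k_q = [[k_hip, 0.0], [0.0, k_knee]]
    return matmul(matmul(transpose(j_inv), k_q), j_inv)


# Same joint stiffness, two hypothetical swing configurations.
k_early = endpoint_stiffness(math.radians(10), math.radians(60), 100.0, 100.0)
k_late = endpoint_stiffness(math.radians(30), math.radians(15), 100.0, 100.0)
```

Comparing `k_early` and `k_late` shows that identical joint stiffness values produce different end-point stiffness matrices, which is why the ellipse changes shape and orientation along the trajectory.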

FIGURE 2 | … always point toward the reference position. Two critical points are magnified. (A) Point of maximum foot clearance: the vectors show that enough support is guaranteed if the ankle is below the reference trajectory. The ellipse in black represents the end-point stiffness resulting from the joint stiffness. (B) At the end of the swing phase, if the subject is late with respect to the reference point, they can experience forces directed downwards instead of forward.

### Impedance Control Based on End-Point Space Formulation

An alternative option to a joint space formulation is an end-point space formulation (sometimes referred to as task space formulation), in which the reference trajectory is defined according to an anatomical landmark around an end-point. In walking, the definition of end-point depends on the kinematic configuration, e.g., lateral malleolus or foot metatarsal during swing phase; or trochanter during stance phase, as the foot is already placed on the ground. Thus, formulating the problem in end-point space for lower limb exoskeletons may require two different control approaches: one for stance and another one for swing. While the implementation of this approach may be cumbersome in practice, a controller during swing that relies on an end-point space formulation may provide additional benefits compared to a joint space approach. In this paper, we are interested in studying the control of the end-point impedance only in the swing phase of gait.

In an end-point space formulation, the torque applied to the exoskeleton actuators is derived from an end-point force $\mathbf{F}_x$ (Equation 2). This force depends on a set of stiffness, $\mathbf{K}_x = [K_{xx}, K_{xy}; K_{yx}, K_{yy}]$, and damping, $\mathbf{B}_x = [B_{xx}, B_{xy}; B_{yx}, B_{yy}]$, parameters and on the kinematic error between a measured end-point trajectory, $\mathbf{x}_{act} = [x_{act}, y_{act}]$, and a reference trajectory, $\mathbf{x}_{ref} = [x_{ref}, y_{ref}]$. Note that $\mathbf{x}_{ref}$ and $\mathbf{x}_{act}$ can be calculated in real-time by using forward kinematic equations that depend on the joint angles $\mathbf{q}_{ref}$ and $\mathbf{q}_{act}$ and on the known limb segment lengths of the user (see **Appendix 1.1**). The accuracy of this calculation, however, depends on the correct measurement of the segments' lengths and on the alignment between the robotic joints and the human joints.

$$\mathbf{F}_{\mathbf{x}} = \mathbf{K}_{\mathbf{x}} \left( \mathbf{x}_{ref} - \mathbf{x}_{act} \right) + \mathbf{B}_{\mathbf{x}} \left( \dot{\mathbf{x}}_{ref} - \dot{\mathbf{x}}_{act} \right) \tag{2}$$

Using the Jacobian matrix $\mathbf{J}(\mathbf{q}_{act})$, we obtain through inverse dynamics the torque that the joint actuators need to render the force $\mathbf{F}_x$:

$$\boldsymbol{\tau}_{\mathbf{x}} = \mathbf{J} \left( \mathbf{q}_{act} \right)^{T} \mathbf{F}_{\mathbf{x}} \tag{3}$$
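Equations 2 and 3 can be sketched together as a small routine that computes the end-point force and maps it to joint torques. The kinematic convention (hip CoR at the origin, hip flexion from the downward vertical, knee flexion positive), the default segment lengths and all gains are assumptions made for this illustration; the end-point velocities are obtained here as $\mathbf{J}\dot{\mathbf{q}}$, which is one possible choice.

```python
import math


def forward_kinematics(q, l_thigh, l_shank):
    """Ankle position relative to the hip for the assumed convention."""
    q_hip, q_knee = q
    shank = q_hip - q_knee
    return (l_thigh * math.sin(q_hip) + l_shank * math.sin(shank),
            -l_thigh * math.cos(q_hip) - l_shank * math.cos(shank))


def jacobian(q, l_thigh, l_shank):
    q_hip, q_knee = q
    shank = q_hip - q_knee
    return [
        [l_thigh * math.cos(q_hip) + l_shank * math.cos(shank), -l_shank * math.cos(shank)],
        [l_thigh * math.sin(q_hip) + l_shank * math.sin(shank), -l_shank * math.sin(shank)],
    ]


def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def endpoint_impedance(q_ref, q_act, dq_ref, dq_act, k_x, b_x, l_thigh=0.45, l_shank=0.45):
    """Return (end-point force, joint torque) following Equations 2 and 3."""
    x_ref = forward_kinematics(q_ref, l_thigh, l_shank)
    x_act = forward_kinematics(q_act, l_thigh, l_shank)
    j_act = jacobian(q_act, l_thigh, l_shank)
    v_ref = mat_vec(jacobian(q_ref, l_thigh, l_shank), dq_ref)
    v_act = mat_vec(j_act, dq_act)
    err = [x_ref[0] - x_act[0], x_ref[1] - x_act[1]]
    derr = [v_ref[0] - v_act[0], v_ref[1] - v_act[1]]
    k_e, b_e = mat_vec(k_x, err), mat_vec(b_x, derr)
    force = [k_e[0] + b_e[0], k_e[1] + b_e[1]]  # Equation 2
    j_t = [[j_act[0][0], j_act[1][0]], [j_act[0][1], j_act[1][1]]]
    torque = mat_vec(j_t, force)  # Equation 3
    return force, torque


# Hypothetical case: the foot lags behind the reference, stiffer vertical axis.
force, torque = endpoint_impedance(
    q_ref=(math.radians(25), math.radians(40)),
    q_act=(math.radians(20), math.radians(35)),
    dq_ref=(0.0, 0.0), dq_act=(0.0, 0.0),
    k_x=[[800.0, 0.0], [0.0, 1500.0]], b_x=[[20.0, 0.0], [0.0, 20.0]],
)
```

The off-diagonal entries of `k_x` are left at zero here; setting them rotates the stiffness ellipse, which is how the direction of maximum stiffness can be chosen.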

#### Selection of End-Point Reference Trajectories

In contrast to joint reference trajectories, end-point trajectories are not widely available in the literature. One could take joint reference trajectories and apply forward kinematics, or obtain such trajectories experimentally. Another approach is to take a few features that ensure that the position of the foot guarantees a safe interaction with the environment, e.g., foot clearance and step length. These features can be easily visualized and adapted in end-point space. The manual adaptation of **x**ref is more intuitive for therapists if they reason in end-point space (Emken et al., 2008) and focus on specific gait subtasks (Meuleman et al., 2016), rather than setting hip and knee angular reference trajectories simultaneously.

The subject can be provided with visual feedback regarding the position of the foot and asked to control its trajectory, much as is required in real environments (e.g., by lifting a foot over an obstacle). In the gait trainer literature, similar approaches have been followed when the focus was on ankle height to guarantee appropriate foot clearance in stiff-knee gait (Koopman et al., 2013). Additionally, visual feedback containing information about the end-point is much easier for subjects to process (Banala et al., 2009; Koopman et al., 2013; Krishnan et al., 2013), whereas it is extremely difficult to adapt behavior based on feedback about hip and knee movements (Maggioni et al., 2015).

### Impact of End-Point Space Formulation on End-Point Stiffness

Similar to section Impact of Joint Space Formulation on End-Point Stiffness, we now examine the forces acting at the level of the foot when end-point control is used. By design (Equation 2), at each point of the swing phase, the restoring force for every deviation in Cartesian space is directed toward the reference end-point position (**Figure 4**), which is the point that could have potentially critical collisions with the environment (e.g., stumbling). The axes of the stiffness ellipse can be modified in magnitude and direction as desired. For example, a higher stiffness in the direction of gravity can be designed. However, singularities exist at which the end-point controller cannot generate the corresponding joint torques (i.e., when the knee is completely extended at the end of swing).
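As a short worked check of this singularity, consider a planar two-link model of the leg. The convention used here is an assumption for illustration (hip CoR at the origin, $x$ forward, $y$ up, hip flexion $q_{hip}$ measured from the downward vertical, knee flexion $q_{knee}$ positive, thigh and shank lengths $l_t$ and $l_s$); the kinematics actually used in the paper are those of **Appendix 1.1**.

$$
\begin{aligned}
x &= l_t \sin q_{hip} + l_s \sin\left(q_{hip} - q_{knee}\right), \\
y &= -l_t \cos q_{hip} - l_s \cos\left(q_{hip} - q_{knee}\right), \\
\det \mathbf{J} &= l_t\, l_s \sin q_{knee}.
\end{aligned}
$$

Under this assumption the determinant vanishes at $q_{knee} = 0$, i.e., with the knee fully extended. In that configuration a force directed along the straight leg is mapped by $\mathbf{J}^{T}$ to zero hip and knee torque, so the end-point controller cannot act on deviations in that direction.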

Now consider the end-point forces generated when an end-point controller is used with a real trajectory. The force field set as shown in **Figure 4** leads, in the case of the real trajectory presented in **Figure 5B**, to forces directed toward the reference trajectory in end-point space. **Figure 5A** shows the same forces transformed to torques (Equation 3). As visible in the graph, the joint torques in this case do not always point toward the joint reference trajectory, especially at initial swing, the phase that is crucial for determining a safe foot clearance through an appropriate knee flexion. When the foot is lagging behind the reference trajectory in end-point space, the end-point controller tries to push the foot forward by increasing the hip flexion, while not acting on the knee. This is evident in **Figure 5A** where, at the point of maximum knee flexion, the torques have an almost null component acting on the knee. This problem might cause insufficient foot clearance and potential undesired foot contact with the treadmill.

## Assist-As-Needed Controllers

#### General Formulation

"Assist-As-Needed" (AAN) refers to a control strategy based on assisting the patient/user only as much as needed to successfully perform a predefined task (Emken et al., 2005). One way to modulate the assistance provided by the robotic device is to modify the mechanical impedance rendered by the exoskeleton. A common AAN algorithm for an impedance controller typically updates a normalized impedance parameter P (P∈ R| 0 ≤ P ≤ 1), e.g., stiffness or damping, at every gait step s:

$$P_{s+1} = \gamma\, P_s + g\, f\left(\mathbf{e}_s\right) \tag{4}$$

A forgetting factor, $\gamma$ ($\gamma \in \mathbb{R} \mid 0 < \gamma < 1$), limits the excessive reliance on the robotic assistance provided by the motion controller (the "slacking" effect; Marchal-Crespo and Reinkensmeyer, 2009). A gain $g$ ($g \in \mathbb{R} \mid g > 0$) adjusts the control parameter according to an error function $f(\mathbf{e}_s)$ ($f: \mathbf{e}_s \to [0, 1]$), where $\mathbf{e}_s$ can be, for example, the kinematic deviation between the reference and actual trajectory of an exoskeleton. The function $f$ may account for physiological kinematic variability (e.g., by defining a "deadband" around the reference trajectory; Banala et al., 2007; Emken et al., 2007). Note that the domains of the parameters $P$, $\gamma$ and $f(\mathbf{e}_s)$ can be different, depending on the behavior one would like to achieve with the AAN algorithm (Marchal-Crespo and Reinkensmeyer, 2009); however, for the examples discussed further in this paper we have selected the ones above.

FIGURE 4 | The desired force field in task-space is shown at some selected points along the end-point trajectory. The force field always points toward the reference position. Two critical areas are magnified. (A) Point of maximum foot clearance: the circle in black represents the desired end-point stiffness. The arrows show that regardless of the deviation from the reference point, the restoring force results always in a force directed to the reference point. (B) At the end of the swing phase, the desired characteristics of the force field are the same as in (A).
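The step-by-step behavior of Equation 4 can be sketched as follows. The linear deadband error function, the clipping of $P$ to $[0, 1]$, and the values of $\gamma$, $g$ and the deviations are assumptions chosen for illustration, not the settings used in the paper.

```python
def deadband_error(deviation, deadband, scale):
    """Map a kinematic deviation to [0, 1]; zero inside the deadband (assumed shape)."""
    excess = max(0.0, abs(deviation) - deadband)
    return min(1.0, excess / scale)


def aan_update(p, error, gamma=0.9, g=0.2):
    """One gait step of Equation 4, clipped to the normalized range [0, 1]."""
    return min(1.0, max(0.0, gamma * p + g * error))


p = 1.0  # start with full (normalized) impedance
history = []
# Hypothetical deviations in rad: ten accurate steps, then ten steps with a large error.
for deviation in [0.0] * 10 + [0.08] * 10:
    p = aan_update(p, deadband_error(deviation, deadband=0.03, scale=0.05))
    history.append(p)
```

While the error stays inside the deadband, the forgetting factor lowers the impedance geometrically; once the error grows, the gain term raises it again. Without clipping, a constant error would drive $P$ toward the fixed point $g f(\mathbf{e}_s) / (1 - \gamma)$.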

#### Joint Space Formulation of an AAN Controller

There are several examples of controllers that adapt the robotic joint impedance of an exoskeleton to the subject's ability to walk (for reviews see Marchal-Crespo and Reinkensmeyer, 2009; Hussain et al., 2011; Cao et al., 2014). For example, to create a patient-cooperative strategy for the Lokomat, hip and knee impedances were adapted according to the patient's effort (as estimated by the robot force sensors) (Riener et al., 2005). Based on a similar estimation of the subject's active contribution, Hussain et al. adapted the joint impedance of a pneumatically actuated exoskeleton robot (Hussain et al., 2013). However, both works were based on forces exerted by a limited group of able-bodied subjects, which could heavily compromise their applicability in patients exhibiting clonus or spasticity. In the Lokomat, this dependence on the interaction forces was overcome by implementing an approach called "Path control," which allows freedom of movement around predefined joint trajectories, while a virtual tunnel of adjustable width guarantees safety (Duschau-Wicke et al., 2010).

In Maggioni et al. (2015), we presented an AAN algorithm that automatically adapts the Lokomat actuators' impedance based on the ability of the subject to follow a reference gait trajectory. In this work, the algorithm described by Equation (4) was applied. The control parameters **P** were the stiffness **K** and the damping **B** of the hip and knee in an impedance joint controller. The estimator of the subject's performance relied on the kinematic deviation between the actual trajectory and the reference. The gait cycle was divided into 30 windows. For each window $w$ and for each step $s$ the joint impedance was defined by one set of parameters, $\mathbf{K}_{s,w}$ and $\mathbf{B}_{s,w}$, which was adapted according to the weighted kinematic error performed in each window and every step.

$$\mathbf{K}_{s+1,w} = \gamma_1 \mathbf{K}_{s,w} + g_1 f_1 \left[ \mathbf{e}_s \right]_{w} \tag{5}$$

$$\mathbf{B}_{s+1,w} = \gamma_2 \mathbf{B}_{s,w} + g_2 f_2 \left[ \dot{\mathbf{e}}_s \right]_{w} \tag{6}$$

A set of gains $\gamma_1$, $\gamma_2$, $g_1$, $g_2$ was defined in order to have the impedance decrease slowly in the presence of physiological deviations and to react fast enough in case of large errors. The error weighting function $f[\mathbf{e}_s]_w$ consisted of a hyperbolic tangent function of the kinematic error $\mathbf{e}_s$ defined for each window $w$, which allowed physiological deviations from the reference trajectories of the hip and knee joint, while ensuring safety. This means that for each time point of the gait cycle, the subject's hip and knee were allowed to deviate from the reference trajectory within the deadbands defined for each joint, independently from each other and irrespective of the position of the end-point. Suitable deadbands in joint-space can be defined based on normal ranges for hip and knee joint angles (e.g., taking normative data from Winter, 1991; Perry, 1992; Stoquart et al., 2008 or from able-bodied people walking in the device). To study how these angular boundaries result in end-point space, we applied forward kinematics (see **Appendix 1.1**) to render the resulting boundaries around the end-point (i.e., at the ankle), as illustrated in **Figure 6**.
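A minimal sketch of the windowed update in Equation 5 is given below for a single joint with normalized stiffness. The exact weighting function is not specified in this section, so the form used here (a hyperbolic tangent of the error in excess of the deadband, with a hypothetical `slope`) is an assumption, as are the gains, the clipping at 1 and the error values.

```python
import math

N_WINDOWS = 30  # the gait cycle is divided into 30 windows


def tanh_weight(error, deadband, slope):
    """Assumed weighting: zero inside the deadband, saturating toward 1 outside."""
    return math.tanh(slope * max(0.0, abs(error) - deadband))


def update_stiffness(k, errors, deadbands, gamma1=0.95, g1=0.1, slope=20.0):
    """Equation 5 applied independently to every window w after one step s."""
    return [
        min(1.0, gamma1 * k_w + g1 * tanh_weight(e_w, d_w, slope))
        for k_w, e_w, d_w in zip(k, errors, deadbands)
    ]


k = [0.5] * N_WINDOWS          # normalized stiffness per window
errors = [0.0] * N_WINDOWS
errors[20] = 0.15              # hypothetical large error (rad) in one swing window
deadbands = [0.05] * N_WINDOWS
for _ in range(20):            # twenty gait steps with the same error pattern
    k = update_stiffness(k, errors, deadbands)
```

After these steps the stiffness has risen only in the window where the error exceeded its deadband and has decayed everywhere else, which is the selective, phase-dependent behavior the windowing is meant to provide.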

Due to the non-linearity of the kinematic transformation and its dependency on the joint configuration, the shape of the boundaries resulting at the end-point is hardly predictable from what can be seen in joint space. During the pushoff phase and at the beginning of swing, the boundaries are extremely narrow along the direction of the foot motion. This results in a very strict timing requirement for the subject walking in the robot (i.e., the subject must closely follow the desired ankle position at any time). Even small deviations along the directions of motion can result in a high error, which causes the algorithm to increase the impedance in this specific gait phase. However, in the direction perpendicular to the reference trajectory, higher deviations are allowed, and they could potentially result in insufficient foot clearance. Conversely, during mid-swing, the resulting shape of the joint space deadbands is less conservative along the direction of the trajectory, allowing increased leading or lagging of the foot with respect to a reference position. At the end of swing, again the shape of the boundaries in end-point space changes: here the boundaries allow the subject to perform longer or shorter steps than desired.
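The mapping of joint-space deadbands to end-point space can be sketched by passing the corners of the hip/knee deadband rectangle through forward kinematics. The kinematic convention (hip CoR at the origin, hip flexion from the downward vertical, knee flexion positive), segment lengths and band widths are illustrative assumptions; the paper's own equations are in **Appendix 1.1**.

```python
import math


def ankle_position(q_hip, q_knee, l_thigh=0.45, l_shank=0.45):
    """Ankle position relative to the hip for the assumed two-link convention."""
    shank = q_hip - q_knee
    return (l_thigh * math.sin(q_hip) + l_shank * math.sin(shank),
            -l_thigh * math.cos(q_hip) - l_shank * math.cos(shank))


def deadband_corners(q_hip_ref, q_knee_ref, hip_band, knee_band):
    """Map the four corners of the joint-space deadband rectangle to end-point space."""
    return [
        ankle_position(q_hip_ref + s_h * hip_band, q_knee_ref + s_k * knee_band)
        for s_h, s_k in [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    ]


# Hypothetical window in early swing with symmetric 5 degree deadbands.
corners = deadband_corners(
    math.radians(20), math.radians(60),
    hip_band=math.radians(5), knee_band=math.radians(5),
)
```

Evaluating the corners at different reference configurations shows how the same angular tolerance produces end-point regions of different size and orientation. The true edges between corners are arcs, so the four points only approximate the region.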

#### End-Point Space Formulation of an AAN Controller

In this type of controller, the parameters **P** adapted based on Equation (4) are the end-point stiffness and damping ($\mathbf{K}_x$ and $\mathbf{B}_x$). In the literature, there are several examples of end-point impedance adaptation implemented in exoskeleton and end-effector devices. Among the latter, Emken et al. adapted the end-point impedance of a robot guiding the ankle of the subject (ARTHuR) based on the position and velocity error between the reference and actual ankle trajectories (Emken et al., 2008). Hussein et al. implemented an algorithm for adapting the width of a deadband for velocity deviations in the footplate-based Gait Trainer GT-I (Reha-Stim, Germany): based on the error between actual and desired end-effector velocity, the deadband width was either increased to allow more freedom or decreased to provide more guidance to the subject (Hussein et al., 2009). Other works instead, despite using exoskeleton devices, developed an algorithm that adapted the end-effector impedance or force field and calculated the required joint torques based on end-point information. For example, Koopman et al. developed an adaptive vertical force acting on the ankle to support foot clearance (LOPES, Koopman et al., 2013); Banala et al. designed a force field acting on the ankle to guide the end-point along a virtual tunnel (ALEX, Banala et al., 2007).

Having control over the task space impedance allows the implementation of AAN controllers that provide optimal assistance to the end-point. Indeed, the task space force field can be shaped in order to support the foot only in the directions that are needed. Furthermore, designing the deadbands in end-point space allows requirements such as minimum foot clearance or minimum step length to be set directly.

FIGURE 6 | In the joint controller, hip and knee deadbands are defined independently from each other, as shown in (A) (hip angle) and (B) (knee angle). The reference trajectory (red) is taken from Colombo et al. (2000). The deadbands (black lines) are calculated from the standard deviation of the trajectories of 10 able-bodied subjects walking in the Lokomat with impedance set to 5% of the maximum, which allows freedom of movement. In the AAN algorithm, deviations occurring within the deadbands lead to a null error. The gait cycle is divided into 30 windows (gray lines show the windows' limits). In (C), the resulting reference trajectory (red) and the corresponding deadbands in end-point space are shown. For each window along the swing phase (only 15 are shown for clarity of representation), the gray rhomboid shows the area including all the possible combinations of hip and knee angles within the deadbands shown in (A,B).

TABLE 1 | Summary of the performances of joint and end-point formulations for the control of a two-link exoskeleton.

### Summary of Working in Different Spaces

The two controllers show very different features when applied to a two-link exoskeleton, and it is not possible to prefer one over the other independently of the application. **Table 1** summarizes the strengths and weaknesses of the two control formulations. The symbols "+" and "–" indicate whether the formulation can adequately address the specific features listed. These aspects have also been addressed in Smith et al. (2015), where the performance of joint and end-point controllers is compared in an industrial manipulator.

### HYBRID JOINT/END-POINT SPACE CONTROLLER WITH ASSIST-AS-NEEDED

In section Joint vs. End-Point Space Formulations, we highlighted strengths and weaknesses of the two formulations: joint and end-point space. Here, we propose an adaptive controller that is formulated in both spaces ("hybrid" formulation) and aims to combine the strengths of both approaches. An end-point space component aims at adapting the end-point stiffness in both magnitude and direction to provide a guided foot placement; while a joint space component aims at providing appropriate temporal coordination between hip and knee angles, especially when the kinematic configuration of the exoskeleton is close to a singularity. This hybrid approach also gives the possibility of defining deadbands more intuitively (based on foot position), which gives more control over the interactions with the environment.

The torques applied during the swing phase of gait, $\boldsymbol{\tau}_{\text{swing}}$, are the sum of torques generated by a PD controller based on the end-point position and end-point velocity error (Equation 8), torques generated by a D controller based on the angular velocity error in joint space (Equation 9), and compensation, as illustrated in **Figure 7**:

$$\boldsymbol{\tau}_{\text{swing}} = \boldsymbol{\tau}_{\text{xPD}} + \boldsymbol{\tau}_{qD} + \boldsymbol{\tau}_{\text{comp}} \tag{7}$$

$$\boldsymbol{\tau}_{\text{xPD}} = \mathbf{J}[\mathbf{q}_{\rm act}]^{\rm T} \mathbf{K}_{\mathbf{x}} \left(\mathbf{x}_{\rm ref} - \mathbf{x}_{\rm act}\right) + \mathbf{J}[\mathbf{q}_{\rm act}]^{\rm T} \mathbf{B}_{\mathbf{x}} \left(\dot{\mathbf{x}}_{\rm ref} - \dot{\mathbf{x}}_{\rm act}\right) \tag{8}$$

$$\boldsymbol{\tau}_{qD} = \mathbf{B}_{\mathbf{q}} \left(\dot{\mathbf{q}}_{\rm ref} - \dot{\mathbf{q}}_{\rm act}\right) \tag{9}$$
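As an illustration of Equations (7)–(9), the following minimal sketch computes the swing torques for a generic planar two-link chain. The kinematic convention (angles measured from the x axis, relative knee angle), the link lengths `l1` and `l2`, and all function names are assumptions made for this example; they are not taken from the Lokomat implementation.

```python
import math

def forward_kinematics(q, l1, l2):
    """End-point (x, y) of a planar two-link chain (assumed geometry)."""
    q1, q2 = q
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

def jacobian(q, l1, l2):
    """Jacobian J[q] of forward_kinematics with respect to (q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def mat_vec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def mat_t_vec(m, v):
    """Product of the transpose of a 2x2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[1][0] * v[1],
            m[0][1] * v[0] + m[1][1] * v[1]]

def swing_torque(q_act, dq_act, x_ref, dx_ref, dq_ref,
                 K_x, B_x, B_q, tau_comp, l1, l2):
    """Joint torques during swing: tau_xPD + tau_qD + tau_comp (Eq. 7)."""
    J = jacobian(q_act, l1, l2)
    x_act = forward_kinematics(q_act, l1, l2)
    dx_act = mat_vec(J, dq_act)
    e_x = [x_ref[i] - x_act[i] for i in range(2)]
    de_x = [dx_ref[i] - dx_act[i] for i in range(2)]
    de_q = [dq_ref[i] - dq_act[i] for i in range(2)]
    # Eq. 8: end-point PD force mapped to joint torques through J^T
    spring = mat_vec(K_x, e_x)
    damper = mat_vec(B_x, de_x)
    force = [spring[i] + damper[i] for i in range(2)]
    tau_xpd = mat_t_vec(J, force)
    # Eq. 9: joint-space damping on the angular velocity error
    tau_qd = mat_vec(B_q, de_q)
    return [tau_xpd[i] + tau_qd[i] + tau_comp[i] for i in range(2)]
```

In this sketch, the joint-space term contributes only damping, so the end-point term alone determines the direction and magnitude of the restoring force at the foot.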

The end-point controller is designed to control the magnitude and direction of the forces required in task-space. Since the reference trajectories in joint space are derived from trajectories in task space, one can express the controller terms as:

$$\boldsymbol{\tau}_{\text{xPD}} + \boldsymbol{\tau}_{qD} = \mathbf{K}_{\rm tot} \left(\mathbf{q}_{\rm ref} - \mathbf{q}_{\rm act}\right) + \mathbf{B}_{\rm tot} \left(\dot{\mathbf{q}}_{\rm ref} - \dot{\mathbf{q}}_{\rm act}\right) \tag{10}$$

$$\mathbf{K}_{\rm tot} = \mathbf{J}[\mathbf{q}_{\rm act}]^{\rm T} \mathbf{K}_{\mathbf{x}} \mathbf{J}[\mathbf{q}_{\rm act}] \tag{11}$$

$$\mathbf{B}_{\rm tot} = \mathbf{J}[\mathbf{q}_{\rm act}]^{\rm T} \mathbf{B}_{\mathbf{x}} \mathbf{J}[\mathbf{q}_{\rm act}] + \mathbf{B}_{\mathbf{q}} \tag{12}$$

Note that the stiffness and damping matrices must fulfill the necessary conditions for stability defined in **Appendix 2**.
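The step from Equations (8) and (9) to Equation (10) can be sketched as follows. Assuming small tracking errors, so that the forward kinematics can be linearized around $\mathbf{q}_{\rm act}$ and the difference between the Jacobian at the reference and at the actual configuration can be neglected (assumptions introduced here for illustration), the end-point errors map to joint errors through the Jacobian:

$$\mathbf{x}_{\rm ref} - \mathbf{x}_{\rm act} \approx \mathbf{J}[\mathbf{q}_{\rm act}] \left(\mathbf{q}_{\rm ref} - \mathbf{q}_{\rm act}\right), \qquad \dot{\mathbf{x}}_{\rm ref} - \dot{\mathbf{x}}_{\rm act} \approx \mathbf{J}[\mathbf{q}_{\rm act}] \left(\dot{\mathbf{q}}_{\rm ref} - \dot{\mathbf{q}}_{\rm act}\right)$$

Substituting these into Equation (8) and adding Equation (9) gives

$$\boldsymbol{\tau}_{\text{xPD}} + \boldsymbol{\tau}_{qD} \approx \mathbf{J}^{\rm T} \mathbf{K}_{\mathbf{x}} \mathbf{J} \left(\mathbf{q}_{\rm ref} - \mathbf{q}_{\rm act}\right) + \left(\mathbf{J}^{\rm T} \mathbf{B}_{\mathbf{x}} \mathbf{J} + \mathbf{B}_{\mathbf{q}}\right) \left(\dot{\mathbf{q}}_{\rm ref} - \dot{\mathbf{q}}_{\rm act}\right)$$

with $\mathbf{J} = \mathbf{J}[\mathbf{q}_{\rm act}]$, which identifies the bracketed terms with $\mathbf{K}_{\rm tot}$ and $\mathbf{B}_{\rm tot}$ in Equations (11) and (12).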

#### AAN Algorithm

The actual stiffness and damping in end-point space, $\mathbf{K}_{\mathbf{x}}$ [N/m] and $\mathbf{B}_{\mathbf{x}}$ [Ns/m], are obtained from normalized stiffness and damping matrices, $\overline{\mathbf{K}}_{\mathbf{x}}$ and $\overline{\mathbf{B}}_{\mathbf{x}}$, which are scaled according to the specific characteristics of the robot. The normalized joint damping term $\overline{\mathbf{B}}_{\mathbf{q}}$ can be adapted according to Equation (6) in section Joint Space Formulation of an AAN Controller. The term $\overline{\mathbf{B}}_{\mathbf{x}}$ can be adapted either with a similar algorithm or coupled to the end-point stiffness.

For the term $\overline{\mathbf{K}}_{\mathbf{x}}$, we would like an AAN algorithm that adapts both the magnitude and direction of the equivalent stiffness ellipse based on the kinematic errors made throughout the swing phase.

To achieve this, the swing phase is divided into equally sized windows. For each window $w$ and for each step $s$, we adapt the stiffness based on the weighted error at the previous step, both in magnitude and in direction, as:

$$\overline{\mathbf{K}}_{\mathbf{x}_{s+1,w}} = \gamma_x \overline{\mathbf{K}}_{\mathbf{x}_{s,w}} + f_{K_x}\left[\mathbf{e}_{\mathbf{x}_{s,w}}\right] \mathbf{R}\left[\alpha_{s,w}\right] \mathbf{G}_K \mathbf{R}\left[\alpha_{s,w}\right]^{\rm T} \tag{13}$$

$$\mathbf{e}_{\mathbf{x}_{s,w}} = \begin{bmatrix} x_{\text{ref}_{s,w}} - x_{\text{act}_{s,w}} \\ y_{\text{ref}_{s,w}} - y_{\text{act}_{s,w}} \end{bmatrix} \tag{14}$$

$$\alpha_{s,w} = \arctan\left(\mathbf{e}_{\mathbf{x}_{s,w}}\right) \tag{15}$$

$$\mathbf{R}\left[\alpha_{s,w}\right] = \begin{bmatrix} \cos\alpha_{s,w} & -\sin\alpha_{s,w} \\ \sin\alpha_{s,w} & \cos\alpha_{s,w} \end{bmatrix} \tag{16}$$

The first term, $\gamma_x \overline{\mathbf{K}}_{\mathbf{x}_{s,w}}$, shrinks the stiffness ellipse in all directions through a constant forgetting factor, $\gamma_x = 0.9$. The second term increases the stiffness in the direction of the kinematic error. The magnitude of this change is controlled by a gain matrix $\mathbf{G}_K$ = [0.1 0; 0 0.01], which can be seen as a predefined ellipse with axes of fixed length. This ellipse $\mathbf{G}_K$ is (i) rotated along the direction of the error, (ii) scaled according to the magnitude of the weighted error $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$, and (iii) added to the stiffness ellipse $\gamma_x \overline{\mathbf{K}}_{\mathbf{x}_{s,w}}$. The error function $f_{K_x}$ ($f_{K_x}: \mathbf{e}_{\mathbf{x}_{s,w}} \rightarrow [0, 1]$) is defined for each window $w$ with different shape characteristics (**Figure 8**). The error functions $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ can be defined with deadbands designed in end-point space. In this way, it is possible to identify requirements for the foot trajectory that ensure a safe interaction between the foot and the treadmill, for example, minimum foot clearance (Begg et al., 2007) and minimum step length (Sekiya et al., 1996). One way of defining the error weighting functions $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ is by using Asymmetric Generalized Gaussian functions (AGGF) (Elguebaly and Bouguila, 2013), which can be designed to have a different variance depending on the gait cycle window. The AGGF allows the width of tolerated kinematic deviations to be defined in all directions independently. An example is presented in **Figure 8**. By design, $\overline{\mathbf{K}}_{\mathbf{x}_{s,w}}$ and $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ are bounded above by 1; therefore, even in the presence of large errors, the eigenvalues of the stiffness matrix never increase above their initial values. The change in the stiffness matrix between consecutive time steps can be bounded by the necessary stability conditions defined in **Appendix 2**.
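The update rule of Equation (13) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the weighted error is passed in as a precomputed number `weight` in [0, 1], and the error angle is computed with `atan2` of the vertical and horizontal error components, which is one plausible reading of Equation (15).

```python
import math

GAMMA_X = 0.9                      # forgetting factor given in the text
G_K = [[0.1, 0.0], [0.0, 0.01]]    # gain matrix given in the text

def rotation(alpha):
    """2x2 rotation matrix R[alpha] (Eq. 16)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def update_stiffness(K_bar, error, weight):
    """One step of Eq. 13 for a single window.

    K_bar  : normalized 2x2 stiffness of the previous step
    error  : (e_x, e_y) end-point error in this window
    weight : weighted error f in [0, 1] (assumed to be computed elsewhere)
    """
    # Assumption: alpha is the angle between the error vector and the x axis
    alpha = math.atan2(error[1], error[0])
    R = rotation(alpha)
    rotated_gain = mat_mul(mat_mul(R, G_K), transpose(R))
    return [[GAMMA_X * K_bar[i][j] + weight * rotated_gain[i][j]
             for j in range(2)] for i in range(2)]
```

Under these assumptions, a constant unit-weight error along x drives the matrix toward the fixed point $\mathbf{G}_K / (1 - \gamma_x)$ = [1 0; 0 0.1]: starting from the identity, the stiffness along the error stays at 1 while the orthogonal stiffness decays to 0.1, in line with the bound of 1 on the normalized stiffness.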

Due to the non-linear and adaptive nature of the controller (and of the human) and to the variable impedance profile, deriving analytical necessary and sufficient conditions for stability is a daunting task. However, we believe that the necessary (although not sufficient) conditions defined in **Appendix 2**, in combination with a series of safety measures to prevent undesired robot behaviors, can guarantee the safety of the user. First and foremost, we made sure that the controller was stable with constant stiffness and damping values throughout the task space. Second, software mechanisms were in place to constrain the stiffness and damping values to the necessary boundaries defined in **Appendix 2**. Third, the damping was tied to the stiffness to guarantee a critically damped (or overdamped) system throughout the different kinematic configurations. Fourth, the rate of change of the stiffness and damping parameters was constrained. Finally, the safety hardware and software mechanisms of the Lokomat prevented the robot from reaching singular configurations and shut down the motors whenever an excessive force or an excessive deviation from the reference trajectory was detected. Before the tests in humans, the controller was tested in real-life simulations on a test bench, as described in section Simulation Results.

#### SIMULATION RESULTS

Before testing the AAN hybrid joint/end-point controller in human subjects, we performed simulations of the expected behavior using Matlab (v2013b, Mathworks).

We started from the simple case of a point along the reference trajectory and simulated different types of kinematic error. We wanted to test whether the AAN algorithm in the hybrid controller ensures an adaptation of the stiffness matrix to the direction and magnitude of the error. We simulated two cases: (i) error of unitary magnitude and constant direction (angle α between the error vector and the x axis equals 0) and (ii) error of unitary magnitude but variable direction (with α varying randomly at each step in the interval [0, π /2]). The resulting stiffness ellipses are described in terms of size, shape, and orientation (Mussa-Ivaldi et al., 1985), whereby size indicates the length of the major axis of the ellipse, shape the ratio between the major and minor axis of the ellipse, and orientation the angle between the major axis and the x axis.
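The three ellipse descriptors can be computed directly from a symmetric 2x2 stiffness matrix. The sketch below is an illustration with hypothetical function names; it takes the axis lengths to be the eigenvalues of the matrix, following the convention used later in the paper for the stiffness magnitude (maximum eigenvalue as the length of the major axis).

```python
import math

def ellipse_descriptors(K):
    """Size, shape and orientation of the ellipse of a symmetric 2x2 matrix.

    size        : largest eigenvalue (length of the major axis)
    shape       : ratio between largest and smallest eigenvalue
    orientation : angle between the major axis and the x axis [rad]
    """
    a, b, d = K[0][0], K[0][1], K[1][1]
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    # A degenerate ellipse (zero minor axis) has an unbounded shape ratio
    shape = lam_max / lam_min if lam_min > 0.0 else math.inf
    # Principal-axis angle of a symmetric matrix: tan(2*theta) = 2b / (a - d)
    orientation = 0.5 * math.atan2(2.0 * b, a - d)
    return lam_max, shape, orientation
```

For example, the matrix [1 0; 0 0.1] has size 1, shape 10, and orientation 0, i.e., an ellipse elongated along the x axis.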

In the first simulation (**Figure 9**, first line), the size along the error direction (length of the ellipse major axis) does not decrease, since the error function $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ gives a constant unitary result (Equation 13). The orientation of the ellipse's major axis aligns with the error direction, inducing a force field with maximal restoring forces along the direction of the error and very low forces in every other direction, which guarantees a compliant behavior of the controller against disturbances in directions other than that of the error. In the second simulation (**Figure 9**, second line), the ellipse orientation follows the error direction and so does the corresponding force field. The shape of the ellipse depends on how variable the direction of the error was in the previous steps (Equation 13).

(Vallery et al., 2009). Detailed information on the low-level control architecture can be found in Riener et al. (2005) and Vallery et al. (2009).

swing phase an AGGF is defined (for clarity of representation, only half of the windows are displayed). Kinematic errors falling within the borders of the respective AGGF result in a null weighted error. Otherwise, the weighted error saturates to 1.

In a second phase, we used a robotic test bench to simulate neurological impairments such as spasticity. The test bench uses a bio-inspired model of a human leg implemented on the leg orthosis of a robotic gait trainer (the Lokomat, in this case). In this setup, one leg orthosis is controlled to simulate a human leg (simulated human leg), while the second orthosis (test orthosis) is controlled by the hybrid end-point/joint controller with AAN. The two orthoses are then rigidly connected using two aluminum bars, simulating a physical attachment of the robot to the user's leg. A spastic-like behavior was implemented on the simulated human leg by adding a velocity-dependent torque at the level of the knee joint, which was applied when the knee angular velocity exceeded a certain threshold. A detailed description of the test bench and of the impairment simulation can be found in Maggioni et al. (2016). The physical connection between the two orthoses allowed the hybrid controller implemented on the test orthosis to control the simulated human leg by shaping the stiffness ellipses to the simulated impairment. As expected, the test orthosis with the hybrid joint/end-point controller adapted the end-point stiffness to counteract the deviations of the simulated human leg caused by the spastic-like simulated impairment (**Figure 10**).

### EXPERIMENTAL RESULTS

The adaptive hybrid joint/end-point controller and the adaptive joint controller were tested with five able-bodied subjects (1 female, age = 27 ± 4.7 years) and one subject with a chronic motor complete Spinal Cord Injury (male, age = 37 years, ASIA B, level of injury = T4, WISCI II = 0/20). The Kantonale Ethikkommission Zürich and Swissmedic approved the study. The aim of this test was first to determine the feasibility and safety of the novel hybrid controller, and subsequently to compare the performance of the adaptive hybrid controller with that of the existing joint adaptive controller (Maggioni et al., 2015). In particular, we hypothesized (i) that this novel controller adapts the magnitude of the stiffness to the subject's ability to follow the reference trajectory and, at the same time, (ii) that the orientation of the stiffness ellipses aligns to end-point deviations. We decided not to test the pure end-point controller on human subjects, due to safety concerns that emerged during preliminary tests with a dummy. As foreseen in section Impact of End-Point Space Formulation on End-Point Stiffness, the end-point controller alone was not able to guarantee sufficient foot clearance and avoid potential undesired foot contact with the treadmill.

FIGURE 9 | First line: simulation of an error with constant magnitude and direction (black vector) around a reference point (in red). The initial configuration of the stiffness ellipse is a circle, which adapts step by step to the error. The central force field visible at step 1 consequently changes its characteristics. At step 50, the force field is directed mainly along the direction of the error. This implies that the stiffness is high only in directions parallel to the error. Second line: simulation of an error with constant magnitude and variable direction. The error angle varies randomly between 0 and 90°. The error of the current step is shown in bold black, while the previous vectors are shown in gray. The stiffness ellipse adapts its orientation based on the error direction. The force field represented by the blue vectors adapts accordingly.

### Methods

Subjects were instructed to follow a given foot trajectory in time and space, which was projected on a screen positioned in front of the Lokomat. The actual and reference ankle trajectories were displayed in different colors and two dots indicated the reference and actual position at every time point. After being set up in the Lokomat, the subjects were allowed to familiarize themselves with walking in the device with the standard impedance controller (impedance was set at the maximum available value). The visual feedback was constantly presented to the subject. In this familiarization phase, the Lokomat gait pattern was adjusted to the subject's gait pattern by tuning the ROM and the offset of the hip and knee angular trajectories. These settings were then kept constant during the subsequent experiment. Once comfortable and accustomed to walking inside the robot, the subject was presented with a familiarization round with the novel AAN hybrid controller as described in section Hybrid Joint/End-Point Space Controller With Assist-as-Needed. The subject was instructed to follow the reference trajectory as closely as possible while the adaptation algorithm adapted the impedance based on the kinematic error of the ankle trajectory. After the familiarization phase, the AAN control was active on the leg under test for 50 steps, while the impedance of the other leg was kept at the maximum available value. To ensure a safe foot clearance during swing, the stiffness in the vertical direction was made 5 times higher than the stiffness in the horizontal direction. Insufficient foot clearance is not a problem at high impedance, but it might become apparent when the adaptation algorithm reduces the impedance below a certain level, especially in patients with walking impairments. The implemented stiffness $\tilde{\mathbf{K}}_{\mathbf{x}}$ and damping $\tilde{\mathbf{B}}_{\mathbf{x}}$ in the Lokomat were:

Each data point represents the mean value over the swing phase of one step. In (A), the adaptive stiffness of the hybrid controller (i.e., the maximum eigenvalue of the ellipse) is displayed. In (B), the adaptive stiffness of the joint controller (i.e., the mean of the normalized hip and knee stiffness) is shown. The data of the patient are visualized in red.

$$\mathbf{K}_{\mathbf{x}} = \mathbf{M}_{K} \overline{\mathbf{K}}_{\mathbf{x}} \tag{17}$$

$$\mathbf{M}_{K} = \begin{bmatrix} 1500 & 0 \\ 0 & 7500 \end{bmatrix} \frac{N}{m} \tag{18}$$

$$\tilde{\mathbf{K}}_{\mathbf{x}} = \frac{\mathbf{K}_{\mathbf{x}} + \mathbf{K}_{\mathbf{x}}^{\rm T}}{2} \tag{19}$$

$$\tilde{\mathbf{B}}_{\mathbf{x}} = \mathbf{M}_{B} \overline{\mathbf{B}}_{\mathbf{x}} \tag{20}$$

$$\mathbf{M}_{B} = \begin{bmatrix} 40 & 0 \\ 0 & 40 \end{bmatrix} \frac{Ns}{m} \tag{21}$$

The transformation in Equation (19) guarantees that the stiffness matrix is symmetric. In addition, to guarantee that the stiffness matrix remains positive definite after this transformation, the following constraint was implemented:

$$\left(-\sqrt{K_{11}K_{22}} + \rho\right) < K_{ij} < \left(\sqrt{K_{11}K_{22}} - \rho\right) \tag{22}$$

for $i \neq j$; $i, j = 1, 2$, where

$$\rho = 0.1 \sqrt{K_{11} K_{22}} \tag{23}$$
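The scaling, symmetrization, and positive-definiteness constraint of Equations (17)–(23) can be sketched as follows. The function names are hypothetical, and applying the off-diagonal limit by clamping after the symmetrization is an assumption of this example. The vertical entry of the scaling matrix is set to 7500 N/m, i.e., 5 times the horizontal 1500 N/m, as stated in the text.

```python
import math

M_K = [[1500.0, 0.0], [0.0, 7500.0]]   # N/m, vertical = 5 x horizontal
M_B = [[40.0, 0.0], [0.0, 40.0]]       # Ns/m
RHO_FACTOR = 0.1                       # rho = 0.1 * sqrt(K11 * K22), Eq. 23

def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def implemented_stiffness(K_bar):
    """Scale (Eq. 17), symmetrize (Eq. 19) and constrain (Eq. 22) the stiffness.

    Assumes positive diagonal entries in K_bar.
    """
    K = mat_mul(M_K, K_bar)
    k11, k22 = K[0][0], K[1][1]
    off = 0.5 * (K[0][1] + K[1][0])    # off-diagonal of (K + K^T) / 2
    root = math.sqrt(k11 * k22)
    bound = root - RHO_FACTOR * root
    # Clamping to the bound keeps |K12| < sqrt(K11*K22), hence det > 0
    off = max(-bound, min(bound, off))
    return [[k11, off], [off, k22]]

def implemented_damping(B_bar):
    """Scale the normalized damping (Eq. 20)."""
    return mat_mul(M_B, B_bar)
```

With positive diagonal entries, limiting the off-diagonal term to 90% of $\sqrt{K_{11}K_{22}}$ keeps the determinant strictly positive, which together with symmetry is sufficient for a 2x2 matrix to be positive definite.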

The performance of the AAN hybrid controller was then compared with that of the AAN joint controller (see section Joint Space Formulation of an AAN Controller and Maggioni et al., 2015). For this comparison, subjects were tested in a separate session (scheduled within 4 weeks), while performing the same task using the AAN joint controller.

In the AAN hybrid controller, the magnitude of the end-point stiffness was calculated as the maximum eigenvalue of the stiffness matrix (i.e., the length of the major axis of the stiffness ellipse), averaged over all the windows during the swing phase of each step. The major axis of the stiffness ellipse indicates the direction where the end-point stiffness is maximal. To obtain a measure of the alignment between the direction of maximum stiffness and the position error at the ankle, we calculated the angle between the major axis of the stiffness ellipse and the vector of the end-point error. Only the swing phase of the gait is considered, since the hybrid controller is active only during swing. The weighted kinematic error $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ equals zero when the actual deviation is within the defined deadbands. In this case, the adaptation algorithm (Equation 13) decreases the size of the stiffness ellipse but does not change its orientation. Therefore, we only calculated the alignment in those windows where the weighted error $f_{K_x}[\mathbf{e}_{\mathbf{x}_{s,w}}]$ was greater than 0.1. The data of the last 5 steps of the adaptive task were used for the analysis of the final stiffness alignment determined by the algorithm. An average value over all subjects was calculated.
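The alignment measure described above can be sketched as follows. The function names are hypothetical, and treating the major axis as an undirected line (so that the angle lies between 0 and π/2) is an assumption of this example; the 0.1 threshold on the weighted error is taken from the text.

```python
import math

def major_axis_angle(K):
    """Angle between the major axis of a symmetric 2x2 matrix and the x axis."""
    a, b, d = K[0][0], K[0][1], K[1][1]
    return 0.5 * math.atan2(2.0 * b, a - d)

def alignment_angle(K, error, weight, threshold=0.1):
    """Angle in [0, pi/2] between the major stiffness axis and the error.

    Returns None for windows whose weighted error does not exceed the
    threshold, since those windows are excluded from the analysis.
    """
    if weight <= threshold:
        return None
    error_angle = math.atan2(error[1], error[0])
    # The major axis has no sign, so angles are compared modulo pi
    diff = (error_angle - major_axis_angle(K)) % math.pi
    return min(diff, math.pi - diff)
```

A value of 0 means that the direction of maximum stiffness coincides with the error direction, while π/2 means that the stiffest direction is orthogonal to the error.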

In the joint controller, the magnitude of the stiffness was calculated as the mean of the hip and knee joint stiffness during the swing phase. We then obtained the equivalent end-point stiffness resulting from the joint stiffness matrix (Equation A.10 in **Appendix 1**). The angle between the major axis of the resulting stiffness ellipse and the direction of the error in end-point space was calculated to estimate the alignment of the force field perceived at the ankle with the error.

### Results

All subjects were able to perform the experiment with the adaptive hybrid controller; the subject with SCI required a fixed body weight support equal to 70% of his body weight to use the adaptive hybrid controller.

The overall end-point stiffness decreased over time and converged to a specific value for each subject. The patient reached, as expected, a higher final value than the able-bodied subjects did.

Results (**Figure 11**) confirmed that the stiffness ellipses start from an initial size and shape (ratio major/minor axis = 5) and, based on Equation (13), subsequently adapt in shape, orientation and size to the errors at the ankle (**Figure 12**). During adaptation, the size of the stiffness ellipses adapts gradually to the kinematic error occurring in that gait phase. At every step, the orientation of the stiffness ellipses tends to align to the direction of the error in that gait window (Equation 13, second term).

In contrast, **Figure 13** shows the results for the joint controller, whereby hip and knee joint stiffness adapt separately (section Joint Space Formulation of an AAN Controller) and no coupling terms are present. Hence, the size, shape and orientation of the resulting end-point stiffness depend not only on the actual joint stiffness but also on the configuration of the leg segments (therefore, on the gait phase). It is clear that there is little or no correspondence between the errors performed in task space and the resulting end-point stiffness.
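The dependence of the end-point stiffness on the leg configuration can be illustrated with the standard congruence transformation $\mathbf{K}_{\mathbf{x}} = \mathbf{J}^{-\rm T} \mathbf{K}_{\mathbf{q}} \mathbf{J}^{-1}$. This sketch assumes that relation (which neglects configuration-dependent geometric terms and may differ in detail from Equation A.10), a generic planar two-link geometry, and hypothetical function names.

```python
import math

def jacobian(q, l1, l2):
    """Jacobian of a planar two-link chain (assumed geometry)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def inverse(m):
    """Inverse of a 2x2 matrix; fails close to a kinematic singularity."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-9:
        raise ValueError("Jacobian is (near) singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def endpoint_stiffness(K_q, q, l1, l2):
    """Equivalent end-point stiffness J^-T K_q J^-1 of a joint stiffness K_q."""
    J_inv = inverse(jacobian(q, l1, l2))
    return mat_mul(mat_mul(transpose(J_inv), K_q), J_inv)
```

For unit link lengths and a diagonal joint stiffness [2 0; 0 1] at q = (0, π/2), this gives [1 1; 1 3]: the end-point matrix has off-diagonal terms even though the joint stiffness has none, and its orientation is set by the configuration rather than by the task-space error.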

The alignment between the major axes of the ellipses and the error in the respective time window in the last 5 steps is greatest (i.e., the angle is smallest) in the ideal hybrid controller (for $K_{xx} = K_{yy}$) (**Figure 14**). The joint controller showed the worst performance in terms of alignment.

If we examine one of the critical points of the late swing phase, i.e., right before heel strike, in further detail (**Figure 12**), it becomes apparent that, especially in the subject with SCI, the hybrid controller generates an end-point stiffness ellipse rotated in the direction of the error (i.e., the stiffness is higher in the direction along which the error occurred). The adaptive joint controller (**Figure 13**) instead shows a very small stiffness value in that direction, but a high stiffness in a direction that does not apparently require any support.

## DISCUSSION

The aim of our work was to develop an AAN controller for a lower limb exoskeleton which could optimally adapt the support based on the patient's ability to follow a reference trajectory.

FIGURE 13 | Resulting end-point stiffness ellipses caused by the joint controller in the subject with SCI. The resulting end-point stiffness is calculated from the hip and knee stiffness during the last step (50th) of the adaptation. The ellipses appear in the figure as lines since the minor axis is close to a null length. The kinematic error between the reference trajectory (red) and the actual trajectory (blue) of the ankle joint during swing phase is shown by the black vectors.

To achieve this, we examined and discussed the features and disadvantages of joint and end-point space formulation to control exoskeleton robots for the lower limbs. Then, we developed a proof-of-concept novel controller that combines the benefits of joint and end-point formulations: an adaptive hybrid joint/end-point space controller. We presented the results of a software simulation and, finally, the results of the tests on able-bodied subjects and one subject with SCI.

vertical stiffness of the hybrid control was set equal to the horizontal stiffness.

When developing a controller for gait exoskeletons, the choice of the formulation (joint or end-point) strongly influences the apparent stiffness and damping rendered by the robot, and it has an impact on how the reference trajectories (and the safety features around them) are designed. While gait trajectories defined in joint space are closer to the hardware structure of the exoskeleton and similar to what gait analysis provides, the trajectory of the foot during gait is a precise end-point control task (Winter, 1992). The human achieves certain trajectories in task space thanks to the fact that the internal models take care of the proper muscle activations that guarantee the correct joint movements (Shadmehr and Mussa-Ivaldi, 1994). In both healthy and pathological conditions, different joint kinematic solutions are adopted to control the position and orientation of the end-point and, in particular, to achieve a safe trajectory of the foot during the swing phase (Winter, 1992). It has thus been hypothesized that the control of the foot trajectory during swing is a major focus of our central nervous system (CNS) during human locomotion (Ivanenko et al., 2003), as also supported by animal studies (Georgopoulos and Grillner, 1989). In contrast to trajectories defined in joint space, end-point trajectories of able-bodied subjects during swing show very little variability during walking on firm level ground and on the treadmill (Winter, 1992; Ivanenko et al., 2002; Awai and Curt, 2014). During stance phase, instead, the main task is not control of the foot trajectory, but rather the support and balance of the body weight. These functional tasks are accomplished by the control of hip, knee and ankle angles in a so-called "support synergy" (Winter, 1995). The different tasks performed during swing and stance phase, and the different models used for these two phases, support the use of the end-point formulation only during the swing phase of gait.

Depending on the formulation used in the controller, the resulting stiffness properties of the exoskeleton can vary significantly. This results in different magnitudes and directions of supportive torques. Considering the strengths and weaknesses of the joint and end-point formulation for impedance controllers, we proposed a hybrid joint/end-point controller in order to exploit the benefits of the end-point controller in shaping a desired end-point stiffness, while using an additional joint component to guarantee the correct angular trajectories of the joints. In previous research, the concept of a hybrid controller was introduced for an industrial manipulator that was programmed to follow a given end-point trajectory in the presence of external disturbances (both at the end-effector and at the joint level) (Smith et al., 2015). The torques calculated by the end-point controller were complemented with the torques obtained from a joint impedance controller only at those joints that were affected by large disturbance forces. This approach was proven more effective than either end-point or joint control alone to reduce the tracking error in the presence of perturbations at the end-effector and at the joint level.

The control over the end-point stiffness also opens new possibilities when developing a controller with assist-as-needed characteristics. The AAN implemented in end-point space can be directly programmed to adapt the magnitude and the direction of the stiffness based on the error of the subject in task space. Our experiments showed that the controller was capable of adapting the end-point stiffness based on the deviation of the subject from the foot reference trajectory. As expected, the application with a subject with SCI resulted in a higher final end-point stiffness than the able-bodied subjects. When comparing the alignment of the end-point stiffness ellipses generated by the different controllers, we saw that in the hybrid joint/end-point controller the stiffness was better aligned with the error direction. In this way, the controller directs the restoring forces in the direction where they are needed, thus providing a more "specific" support.

Emken et al. (2008) employed a similar approach on the end-effector robotic gait trainer ARTHuR. The end-point stiffness of the robot was adapted with an AAN algorithm that separately adapted horizontal and vertical stiffness. Our approach differs in that the end-point stiffness ellipses align to the direction where the maximum stiffness is required (i.e., the direction of the error). Interestingly, this behavior is close to the way humans adapt their stiffness in response to external disturbances: as shown in Burdet et al. (2001), the CNS can voluntarily control the magnitude, shape and orientation of the end-point stiffness in the upper limb. Moreover, several studies have found that the control of the foot trajectory is the major focus of our CNS during locomotion, both in the unimpaired (Winter, 1992; Ivanenko et al., 2002) and in the impaired spinal cord (Ivanenko et al., 2003; Awai and Curt, 2014). Therefore, a controller for robotic exoskeletons that shapes the end-point position and the end-point stiffness can be considered a "bio-inspired" solution for the control of robotic devices for human interaction.

A further advantage of the adaptive end-point controller is that the error metric for the algorithm can be defined in task space. This allows us to consider explicitly the interaction between the foot and the environment and the spatio-temporal features of the foot trajectory. As Winter (1992) showed, foot clearance is sensitive to very small angular deviations in any of the joints of the lower limb kinematic chain. This means that, in order to guarantee a safe minimum toe clearance, one would have to design very restrictive deadbands in joint space, which would have a negative impact on freedom of movement for physiological deviations. In contrast, deadbands in task space can be designed to be restrictive only in the directions that are needed for safety, determining how much deviation can be tolerated in the vertical direction (crucial for avoiding stumbling) and in the horizontal direction, which corresponds to leading or lagging with respect to the reference trajectory. Moreover, with the end-point controller, it is possible to present the subjects with visual feedback on the errors in end-point space, which is much easier to process than feedback on joint position (Banala et al., 2009; Koopman et al., 2013; Krishnan et al., 2013), and use the same representation within the error metric of the adaptation algorithm. When used in the experiments with subjects, the adaptive hybrid joint/end-point controller required the use of an additional term to support against gravity: the vertical stiffness was set 5 times higher than the horizontal stiffness. Alternatively, an additional feed-forward term for compensating the effect of gravity could be added. Furthermore, lighter robots (e.g., LOPES; Veneman et al., 2007) would reduce the role of gravity and inertia of the system and thus the need to counteract them. 
The accuracy of the end-point position can be increased by adding a position sensor which measures directly the **x** coordinates of the ankle, instead of estimating them from the joint angles. While we derived necessary conditions for stability based on the approach proposed by Kronander and Billard (2016) and we took several precautions to guarantee safety, the stability of the AAN hybrid controller is something that requires further investigation.

The single subject with SCI with whom we tested the adaptive hybrid end-point/joint controller did not show abnormal muscle activation synergies. Extra care should be taken when using the hybrid control with patients who present abnormal synergies or other strong compensatory movements. There might be cases where, despite an almost physiological end-point trajectory, hip and knee angles remain anomalous (Awai and Curt, 2014). In such cases the hybrid control should be extended by a term that counteracts joint position deviations, as in the approach proposed by Smith et al. (2015), where a joint impedance term was added only when large disturbances at the joint level were detected. Before drawing any conclusions on the benefits of this novel controller in treating subjects with gait disabilities, more tests are needed to study how the controller would react to different impairments such as spasticity.

As a future step, the application of our adaptive hybrid joint/end-point controller concept to other rehabilitation robots, e.g., upper limb exoskeletons [such as the ARMin (Nef et al., 2007), Armeo <sup>R</sup> Power (Hocoma AG) or ALEx (Pirondini et al., 2014)] would be of great interest, because a vast body of literature has investigated how humans adapt their upper-limb stiffness based on the task and on external disturbances (Shadmehr, 1993; Burdet et al., 2001) and it would be instructive to use an adaptive controller similar to the one presented in this work to test its interaction with the human arm.

## CONCLUSION

The adaptive controller presented in this paper implements our ideas of a safe controller combining an end-point impedance controller with a joint damping controller into a "hybrid" joint/end-point controller. The controller was tested successfully with able-bodied human subjects and one subject with spinal cord injury. With this approach, it was possible to implement an adaptive controller that shapes the end-point stiffness according to the direction and the magnitude of the error performed at the ankle. In contrast to other applications, the hybrid controller adapts the end-point stiffness to selectively counteract certain errors while leaving the robot compliant in other directions. The adaptive controller proposed in this paper is a patient-cooperative, bio-inspired solution for more human-oriented rehabilitation robots, which fulfills the requirement of "adaptability" identified by many studies in the field of rehabilitation robotics (Iosa et al., 2016) and may be used on other devices, including upper extremity rehabilitation robots.

## AUTHOR CONTRIBUTIONS

SM contributed to the development of the controller described in this work and to the study design, performed the experiments, analyzed the data and wrote the manuscript draft. NR developed the controller described, performed the experiments, analyzed the data and revised the manuscript. LL provided significant advice during the whole study and revised the manuscript. AM-C conceived the study and the controller, supported significantly in the development of the controller, interpretation of the data, and contributed to the manuscript draft. All authors contributed to manuscript revision, read and approved the submitted version.

### FUNDING

SM and LL were supported by the Industrial Academic Training Network Moving Beyond, funded through the European Community's Seventh Framework Programme FP7/2012 (under grant agreement No. 316639). AM-C was supported by the cereneo - Zentrum für Interdisziplinäre Forschung (cefir) Gemeinnützige Stiftung during the writing of this manuscript.

### REFERENCES


### ACKNOWLEDGMENTS

The authors thank Dr. Marc Bolliger with the Spinal Cord Injury Center of the University Hospital Balgrist and Prof. Robert Riener for the support received during the study; Dr. Ellen Jaspers, Dr. Jaime E. Duarte, Dr. Nicolas Gerig, and Marcel Menner for providing valuable suggestions during the writing process; the subjects who participated in the experiments; and Christopher Jarrett for proofreading the manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2018.00104/full#supplementary-material


**Conflict of Interest Statement:** SM, LL, and AM-C at the time of the study worked at Hocoma AG in the R&D Department. SM, NR, and LL currently work at Hocoma AG.

Copyright © 2018 Maggioni, Reinert, Lünenburger and Melendez-Calderon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Human-In-The-Loop Control and Task Learning for Pneumatically Actuated Muscle Based Robots

Tatsuya Teramae<sup>1</sup>, Koji Ishihara<sup>1</sup>, Jan Babič<sup>2</sup>, Jun Morimoto<sup>1</sup> and Erhan Oztop<sup>1,3</sup>\*

<sup>1</sup> Department of Brain Robot Interface, ATR, CNS, Kyoto, Japan, <sup>2</sup> Laboratory for Neuromechanics and Biorobotics, Department for Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia, <sup>3</sup> Computer Science Department, Ozyegin University, Istanbul, Turkey

Pneumatically actuated muscles (PAMs) provide a low cost, lightweight, and high power-to-weight ratio solution for many robotic applications. In addition, the antagonist pair configuration used in robotic arms makes them open to biologically inspired control approaches. In spite of these advantages, they have not been widely adopted in human-in-the-loop control and learning applications. In this study, we propose a biologically inspired multimodal human-in-the-loop control system for driving a one degree-of-freedom robot, and realize the task of hammering a nail into a wood block under human control. We analyze the human sensorimotor learning in this system through a set of experiments, and show that effective autonomous hammering skill can be readily obtained through the developed human-robot interface. The results indicate that a human-in-the-loop learning setup with an anthropomorphically valid multi-modal human-robot interface leads to fast learning, and thus can be used to effectively derive autonomous robot skills for ballistic motor tasks that require modulation of impedance.

#### Edited by:

Jorg Conradt, Technische Universität München, Germany

#### Reviewed by:

Antonio J. del-Ama, National Hospital for Paraplegics, Spain Guoyuan Li, NTNU Ålesund, Norway Zhijun Zhang, South China University of Technology, China

\*Correspondence:

Erhan Oztop erhan.oztop@ozyegin.edu.tr

Received: 21 April 2018 Accepted: 16 October 2018 Published: 06 November 2018

#### Citation:

Teramae T, Ishihara K, Babič J, Morimoto J and Oztop E (2018) Human-In-The-Loop Control and Task Learning for Pneumatically Actuated Muscle Based Robots. Front. Neurorobot. 12:71. doi: 10.3389/fnbot.2018.00071

Keywords: human in the loop control, pneumatically actuated muscle, biologically inspired multimodal control, human motor learning, electromyography

### INTRODUCTION

Human-in-the-loop control systems provide an effective way of obtaining robot skills that can eliminate the need for time-consuming controller design (Peternel et al., 2016). Robot self-learning (i.e., reinforcement learning) is another powerful approach for obtaining robot skills, but it usually requires long training unless initialized by a human demonstration (which can be provided easily by human-in-the-loop systems). Conventional controller design is especially problematic for robots with Pneumatically Actuated Muscles (PAMs) due to their intrinsic high non-linearity. Therefore, obtaining controllers by using human-in-the-loop control seems to be a good choice to overcome the difficulties faced in PAM modeling and control. However, how the human in the loop would adapt and learn to control PAM based robots has not been investigated earlier. With this study, to our knowledge, we make the first attempt toward obtaining a non-trivial skill for a PAM based robot through human-in-the-loop robot control. The motto we adopt in human-in-the-loop robot control is "let us utilize human brain to do the learning and optimization for control." Note that we make a distinction between human-in-the-loop control and kinesthetic teaching based studies (Hersch et al., 2008; Kronander and Billard, 2014; Tykal et al., 2016), as in the former the human is the learning controller generating motor commands in real-time, as opposed to being an active scaffold or a guide to the robot. After skilled operation is achieved by the human, autonomous controller synthesis boils down to mimicking human behavior with the help of a computer as a function of state and/or time and sometimes context. To ensure a smooth integration of the human into the control loop, the interface between the robot and the human operator is critical.
The interface often necessitates anthropomorphic human-robot mapping with intuitive mechanisms to engage the sensorimotor system (as opposed to the cognitive system) of the human operator. Such an interface makes it possible for the human to learn to control the robot and do useful tasks with it as a tool in short timescales. In recent years, there has been a growing interest in human-in-the-loop robotic systems for robot skill synthesis (e.g., Walker et al., 2010; Babic et al., 2011; Ajoudani et al., 2012; Moore and Oztop, 2012; Peternel et al., 2014). However, with a few exceptions [e.g., Ajoudani et al. (2012) who used human muscular activity from antagonistic pairs for end-point impedance estimation in teleoperation, and Walker et al. (2010) who proposed a system utilizing a hand grip force sensor to modulate the impedance of the robot during the teleoperation], the majority of the existing studies target position control based tasks. In Peternel et al. (2014), the authors showed that the human sensorimotor system could drive a robot using multimodal control. In that work, in addition to the usual position based teleoperation, hand flexion was measured by muscle electromyography (EMG) and used to set the compliance property of the robot in real-time. Although the interface was intuitive, the human operator had to perform an additional task of squeezing a sponge ball to create muscle contraction to deliver the required EMG signals to regulate the stiffness of the robot. A more direct control system can be envisioned for those robots that have an antagonistically organized muscle actuation system akin to biological systems. Such robot architectures can be built by using so-called artificial muscles, e.g., Pneumatically Actuated Muscles (PAMs).
In such a case, the human muscle activities can be measured in real-time and channeled to the corresponding artificial muscles of the robot in an anthropomorphically valid way (i.e., biceps to "robot biceps;" triceps to "robot triceps"). However, driving a robot with control signals based purely on muscle activities is not trivial if not impossible due to factors such as noise in acquisition, motion artifacts, and the differences in the muscle organization of the robot and the human.

With this mindset, we propose a multimodal approach to control a Pneumatically Actuated Muscle (PAM) based robot where EMG signals and the elbow angle of the human arm are anthropomorphically mapped to the robot, creating an intuitive control scheme. The proposed approach is realized on a simple single joint robot, and autonomous behavior of hammering a nail into a wood block is synthesized through human sensorimotor learning. Subsequently, a set of experiments is conducted for analyzing human adaptation to the developed human-in-the-loop control setup. The results indicate that such a system can be adopted to effectively derive autonomous controllers for ballistic motor tasks (Brooks, 1983). In addition, to show the usefulness of our approach to design controllers for a non-linear robot system that is difficult to model, we compared the autonomous controller acquired through our human-in-the-loop system and the controller derived by a model-based optimal control method.

### METHODS

One of the factors driving this study is to investigate how human-in-the-loop robot learning can be naturally generalized to tasks that go beyond position control. In particular, we aim at generating autonomous skills based on force based policies. To realize this as a proof of concept we start from a simple one joint two degrees-of-freedom Pneumatically Actuated Muscle (PAM) based robot that has an antagonistic actuation design allowing the stiffness of the robot to be controlled through coactivation. The general framework realizes an anthropomorphic mapping for a human to control the robot in real-time by using arm movements and muscle electromyography (EMG) signals from the arm so that position and stiffness control can be achieved simultaneously. Once this is achieved, various tasks where the robot must change its stiffness for successful execution can be given to the control of human operators for shared control (Dragan and Srinivasa, 2013; Amirshirzad et al., 2016) or autonomous skill synthesis (Babic et al., 2011; Moore and Oztop, 2012; Peternel et al., 2014) purposes. The framework is illustrated in **Figure 3** for the special case of the nail hammering task. How the EMG signals and the human movements are converted to PAM pressures is left to the designer. In a classical setting, it may include a torque-to-pressure feedforward model as part of the human-robot interface; we instead favor a more direct approach that offloads this mapping to the human sensorimotor system, to be learned as part of task execution.

### Hardware Setup

The one joint robot is composed of an antagonistically organized Festo MAS-40 pneumatic artificial muscle (PAM) pair (see **Figure 1**) (Noda et al., 2013; Teramae et al., 2013). Each PAM is connected to a rotational disk/pulley system by string tendons housing an arm of 35 cm. Pressurizing the PAMs creates opposing torques on the disk; it is therefore possible to control both the motion and stiffness of the arm through pressure control. The hardware includes load cells between the tendon and muscle-ends that can be used for control. A feed-forward model representing the relation between air pressure and the resulting muscle/torque can be learned or derived (Ching-Ping and Hannaford, 1996) to control the muscles and the robotic system to which they belong. Due to the highly non-linear relations between system parameters, such systems are considered difficult to control. In the current study, as the human was placed in the control loop, we eliminated the torque-pressure modeling and left it to the human operator to learn as a part of task execution. As described below, the human was given a simple interface to directly control the pressures in the PAMs to achieve the task at hand.

FIGURE 1 | One joint robot is composed of antagonistically organized Festo MAS-40 pneumatic artificial muscle (PAM) pair. Each PAM is connected to rotational disk/pulley system by string tendons housing arm of 35 cm. Pressurizing PAMs creates opposing torques on disk, therefore it is possible to control both the motion and stiffness of arm through pressure control. Hardware includes load cells between tendon and muscle-ends that can be used for control.

A digital goniometer (Goniometer SG150, Biometrics Ltd.) was used to measure the human elbow angle, and surface EMG was used to measure muscle activities (see **Figure 2**). The EMG signals were used in real-time to generate desired pressure values (**u**) for the PAM of the robot at 250 Hz. The desired pressure values were realized by a proportional valve controller (provided by NORGREN). The EMG electrodes were attached to the skin over the triceps muscles. EMG signals were passed through rectification and low pass filtering.
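The rectification and low-pass filtering step can be sketched as follows. The text does not give the filter type or cutoff, so the first-order low-pass, the 3 Hz cutoff, and the use of the 250 Hz command rate as the EMG sample rate are all assumptions; `emg_envelope` is a hypothetical helper name.

```python
import math


def emg_envelope(raw_emg, fs=250.0, cutoff_hz=3.0):
    """Full-wave rectify the EMG samples, then smooth with a first-order low-pass.

    fs        : sample rate in Hz (assumed equal to the 250 Hz command rate)
    cutoff_hz : low-pass cutoff in Hz (assumed; not stated in the text)
    """
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC filter
    envelope, state = [], 0.0
    for sample in raw_emg:
        state += alpha * (abs(sample) - state)  # rectify, then filter
        envelope.append(state)
    return envelope
```

The resulting envelope is a non-negative, slowly varying activity level that could serve as the muscle activity input of the interface described in the next section.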

### Human-Robot Interface

A generic interface to output the desired pressure values to the PAMs can be given with $\mathbf{u} = \mathbf{W}[1\ \varphi\ e]^{\top}$, where $\mathbf{u}$ is the vector of desired pressures for the PAMs, $\varphi$ is the elbow angle of the human, and $e$ indicates the muscle activity level. The constant 1 enables a pressure bias to be given to the PAMs. In short, $\mathbf{W}$ is a linear coefficient matrix that maps EMG and joint movement data of the human directly to (desired) PAM pressures and is composed of bias terms ($B_U$, $B_L$), a positional factor ($K^{\varphi}$) and an EMG factor ($K^{e}$). A non-linear mapping could have been used; but, as we would like to rely on the human ability to learn to generate appropriate control signals, the simplest possible mapping, i.e., a linear one, was deemed appropriate.

To allow the ballistic explosive movements that are necessary for hammering, we designed the **W** matrix by taking inspiration from biology: we created a reciprocal inhibition mechanism between the human arm and the robot. To be concrete, the human triceps EMG signal was channeled to the upper PAM (akin to biceps) as an inhibitory signal. The neural control of movement in the human follows a similar design: when the triceps are activated for arm extension, an inhibition signal is sent to the biceps to reduce the effective stiffness of the arm, which enables high velocity movements (Ching-Ping and Hannaford, 1996). Since the hammering task relied on extension of the arm for impact, we did not use EMGs from the biceps in this task for experimental convenience. The lower PAM, on the other hand, was controlled by the human arm angle measured via a goniometer. Overall, the feedforward interface explained above was specified with

$$\mathbf{W} = \begin{bmatrix} B_U & 0 & K^{e} \\ B_L & K^{\varphi} & 0 \end{bmatrix}. \tag{1}$$

The parameters that linearly map the goniometer-read angles to the lower PAM pressure were obtained for each participant through a simple calibration procedure to cover the allowed range of pressure. The parameters for mapping the EMG signals to the upper PAM were obtained in a similar fashion. These parameters were kept fixed throughout the nailing experiments reported in this article. In sum, after the calibration we ended up with a weight matrix for each participant that maps human actions to desired pressures. Concretely, each participant was asked to conduct hammering movements as depicted in **Figure 4**. We measured the elbow joint angle and triceps EMG during the movements. From the measured data, the maximum ($\varphi_{\text{max}}$) and minimum ($\varphi_{\text{min}}$) joint angles and the maximum triceps EMG amplitude ($e_{\text{max}}$) were identified for each participant. These variables were utilized to derive the interface parameters in Eq. (1) so that the minimum and maximum joint angles were mapped to the maximum ($P_{\text{max}} = 0.8$ MPa) and minimum ($P_{\text{min}} = 0$ MPa) desired pressure for the lower PAM, respectively, as depicted in **Figure 5A**:

$$\begin{aligned} K^{\varphi} &= -\frac{P\_{\text{max}}}{\varphi\_{\text{max}} - \varphi\_{\text{min}}}, \\ B\_L &= \left( 1 + \frac{\varphi\_{\text{min}}}{\varphi\_{\text{max}} - \varphi\_{\text{min}}} \right) P\_{\text{max}}. \end{aligned} \tag{2}$$

Similarly, the maximum EMG amplitude of each participant during the real hammering movement was mapped to the maximum (Pmax = 0.8 [MPa]) desired pressure of upper PAM as depicted in **Figure 5B**:

$$\begin{aligned} K^{e} &= -\frac{P_{\text{max}}}{e_{\text{max}}}, \\ B_U &= P_{\text{max}}. \end{aligned} \tag{3}$$

It is worth underlining that the goal of the human movement-to-robot control input mapping is not to make the robot imitate the human exactly; the critical requirement is to obtain an intuitive control by having users see a consistent near real-time response from the robot.
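The calibration in Eqs. (1)–(3) can be made concrete with a short sketch. The function names are hypothetical, and the final clipping of the command to the range from 0 to 0.8 MPa is an added assumption that the text does not mention. Evaluated as printed, Eq. (2) maps the minimum elbow angle to the maximum lower-PAM pressure, and Eq. (3) gives the upper PAM its maximum pressure at zero EMG and zero pressure at the calibrated maximum EMG amplitude, in line with the inhibitory role of the triceps signal.

```python
P_MAX = 0.8  # MPa, maximum desired pressure stated in the text


def calibrate_interface(phi_min, phi_max, e_max, p_max=P_MAX):
    """Build the 2x3 weight matrix W of Eq. (1) from calibration data."""
    span = phi_max - phi_min
    k_phi = -p_max / span                  # Eq. (2), positional factor
    b_l = (1.0 + phi_min / span) * p_max   # Eq. (2), lower-PAM bias
    k_e = -p_max / e_max                   # Eq. (3), EMG factor
    b_u = p_max                            # Eq. (3), upper-PAM bias
    return [[b_u, 0.0, k_e],
            [b_l, k_phi, 0.0]]


def desired_pressures(W, phi, e, p_min=0.0, p_max=P_MAX):
    """Return [upper, lower] desired pressures u = W [1, phi, e]^T.

    Clipping to [p_min, p_max] is an assumption made for this sketch.
    """
    features = (1.0, phi, e)
    raw = [sum(w * f for w, f in zip(row, features)) for row in W]
    return [min(max(p, p_min), p_max) for p in raw]
```

For example, with hypothetical calibration values `phi_min = 0.5`, `phi_max = 2.0` and `e_max = 1.0`, a relaxed arm at the minimum angle commands 0.8 MPa on both PAMs, and a fully extended arm with maximal triceps activity commands 0 MPa on both.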

## EXPERIMENTS

### Experimental Design

For the hammering task, a hard plastic piece was attached to the robot tip to serve as the hammer head. Compressed wood was used as the material into which the nail was to be driven. **Figure 3** illustrates the hammering setup schematically. The wood block was placed vertically and had a thickness of 9 cm. We used a nail of 5 cm length and 0.23 cm thickness. The hammering task was initialized by inserting the nail into the wood by ∼0.4 cm and placing the nail under the center of the plastic end-effector attachment that served as the hammer head. The experimenter determined task termination, which occurred when the nail was completely driven into the wood.

The experiments were designed as a series of sessions in which several trials of human-in-the-loop robot control for driving the nail into the wood were run. Each trial consisted of 15 s of robot teleoperation in which the participants executed hammering movements in real-time via the robot. Participants were shown that their arm movement was imitated by the robot, and that a muscle contraction caused movement of the robot even when their arm was still. Furthermore, participants were given the freedom to hammer the nail as they liked, so the frequency of the strikes (hammering motion) and the amplitude of the robot motion varied from participant to participant. Each session was deemed to be complete when the nail was completely driven into the wood. Then the nail was reset to its default position (care was taken to place the nail in a fresh location on the wood block). As a measure of performance, we took the number of trials, i.e., the number of 15 s blocks that it took the participant to drive the nail into the wood. We allowed a maximum of 5 trials for each session. The experimental data showed that this was sufficient for driving the nail into the wood even for novice participants.

FIGURE 2 | Surface EMG was used to measure muscle activities and digital goniometer (Goniometer SG150, Biometrics Ltd.) was used to measure human elbow angle. Interface program we developed used these signals in real-time to generate desired pressure values for PAM of robot at 250 Hz. EMG electrodes were attached to skin over triceps muscle for hammering task.

FIGURE 3 | Hammering setup. Hammering task was initialized by inserting nail into wood by ∼0.4 cm and placing nail under center of plastic end-effector attachment that served as hammer head.

FIGURE 4 | Calibration phase for human-robot interface: We measured minimum and maximum angle and maximum EMG signals during actual hammering task with real hammer and fit the parameters of (1) based on measured data. We obtained informed consent for the publication of this figure from the participant.

To summarize, in the experiments, each participant went through 4 sessions. Each session took a maximum of 5 trials, where each trial was a fixed 15 s robot teleoperation. The number of strikes that a trial contained was up to the participant. Likewise, the number of trials that a session included was dependent on how successfully the participant could hammer the nail, thus varied among participants and sessions.

### Skill Transfer With Direct Imitation (Policy Copying)

Once a participant learns to drive the nail into the wood, his/her task execution data can be used to construct an autonomous controller. One of the good performing participants was selected for autonomous skill generation. Furthermore, we selected the desired pressure sequences for the lower and the upper PAM control that generated the highest impact among the hammering movements of the selected participant. Since the velocity is proportional to the impact force, we estimated the impact force from the tip velocity of the robot. The human generated pressure trajectories were segmented by taking the moment of upper PAM pressure rise as the start, and by taking the moment of collision with the nail as the end. For autonomous execution, the obtained pressure trajectories were then reproduced on the robot in a cyclic manner during an execution session (e.g., 15 s).
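The policy-copying procedure above (pick the strongest strike, cut it out of the recording, replay it cyclically) can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the recordings are taken to be equally sampled lists, the collision sample indices are assumed to be known, and the "pressure rise" is detected as an upward crossing of a caller-chosen level; all function names are hypothetical.

```python
def find_rise_before(p_upper, end_idx, level):
    """Index of the last upward crossing of `level` before end_idx, or None."""
    for i in range(end_idx - 1, 0, -1):
        if p_upper[i - 1] < level <= p_upper[i]:
            return i
    return None


def strongest_segment(p_upper, p_lower, tip_speed, collision_indices, level):
    """Return the (upper, lower) pressure samples of the fastest strike.

    Impact strength is approximated by the tip speed at the collision sample.
    """
    best = None
    for end in collision_indices:
        start = find_rise_before(p_upper, end, level)
        if start is None:
            continue  # strike began before the recording, skip it
        impact = abs(tip_speed[end])
        if best is None or impact > best[0]:
            best = (impact, start, end)
    if best is None:
        raise ValueError("no complete strike found in the recording")
    _, start, end = best
    return list(zip(p_upper[start:end + 1], p_lower[start:end + 1]))


def cyclic_replay(segment, n_steps):
    """Repeat the segment end to end to produce n_steps pressure commands."""
    return [segment[k % len(segment)] for k in range(n_steps)]
```

At the 250 Hz command rate mentioned earlier, a 15 s autonomous execution would correspond to `cyclic_replay(segment, 3750)`.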

### Optimal Control Solution

To compare our model-free human-in-the-loop approach with a model-based controller, we design a policy based on an optimal control method as explained below.

Let $\mathbf{U}_1 \equiv \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{N-1}\}$ be a sequence of control variables $\mathbf{u} \in \mathbb{R}^{2}$ and let $\mathbf{x} \in \mathbb{R}^{4}$ denote the state variables. The optimal state and control trajectories are derived by solving an optimal control problem under non-linear system dynamics:

$$\begin{aligned} &\min_{\mathbf{U}_1} J\left(\mathbf{x}_1, \mathbf{U}_1\right), \\ &\text{s.t. } \mathbf{x}_{l+1} = \mathbf{f}\left(\mathbf{x}_l, \mathbf{u}_l\right), \end{aligned} \tag{4}$$

where the objective function, i.e., the total cost $J(\mathbf{x}_1, \mathbf{U}_1)$, is defined as being composed of the terminal cost function $l_f(\mathbf{x})$ alone:

$$J\left(\mathbf{x}_1, \mathbf{U}_1\right) = l_f\left(\mathbf{x}_N\right). \tag{5}$$

The state and control variables consisted of $\mathbf{x} = [\theta, \dot{\theta}, P_U, P_L]^{\top}$ and $\mathbf{u} = [\tau^{u}, \tau^{l}]^{\top}$, respectively. $P_U$ and $P_L$ are the air pressures of the upper and lower PAMs. In this case, we considered the terminal cost function

$$l_f = w_1 \left( \theta(T) - \theta_{\text{ref}}(T) \right)^2 + w_2 \left( \dot{\theta}(T) - \dot{\theta}_{\text{ref}}(T) \right)^2, \tag{6}$$

where $\theta_{\text{ref}}(T)$ and $\dot{\theta}_{\text{ref}}(T)$ are the target terminal joint angle and target terminal angular velocity obtained from the strongest hammering trajectory of the selected participant. The weights $w_1$ and $w_2$ were optimized by an inverse optimal control (IOC) framework with the learned hammering data of one participant (see **Appendix**).
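As a small worked illustration of the terminal cost in Eq. (6), the sketch below evaluates it for a candidate terminal state. The default weights are the values reported later in the Results (72.45 and 0.033); the function name and the example angles and velocities are hypothetical, and the units are whatever the angle and velocity are measured in, which the text does not state.

```python
def terminal_cost(theta_T, theta_dot_T, theta_ref_T, theta_dot_ref_T,
                  w1=72.45, w2=0.033):
    """Weighted squared error of terminal angle and angular velocity, Eq. (6)."""
    angle_error = theta_T - theta_ref_T
    velocity_error = theta_dot_T - theta_dot_ref_T
    return w1 * angle_error ** 2 + w2 * velocity_error ** 2
```

Because the objective contains only this terminal term, the optimizer is free to choose any intermediate trajectory as long as the state at the moment of impact matches the reference angle and velocity.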

To solve the optimal control problem, we derived a dynamics model of the 1-DoF robot,

$$I\ddot{\theta} + h\left(\dot{\theta}\right) + g\left(\theta\right) = \tau^{u} + \tau^{l}, \tag{7}$$

where the inertial parameter is represented as $I$. The term $h(\dot{\theta})$ stands for the friction model:

$$h\left(\dot{\theta}\right) = D\dot{\theta} + \Gamma_1 \tanh\left(\Gamma_2 \dot{\theta}\right), \tag{8}$$

which is composed of viscous and static friction models. $D$ is the parameter of the viscous friction, $\Gamma_1$ and $\Gamma_2$ are the static friction parameters, and $g(\theta)$ represents the gravity term. $\tau^{u}$ and $\tau^{l}$ are the torques generated by the upper and lower PAMs, respectively. The torque was calculated with a model of a PAM actuator as in Teramae et al. (2013). We converted the continuous time robot dynamics of Equation (7) to a discrete time model to formulate the optimal control problem described in Equation (4). We applied an optimal control method, namely iterative Linear Quadratic Gaussian (iLQG) (Todorov and Li, 2005), to obtain the control inputs for executing the nailing task with the robot.
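The conversion of Eq. (7) to discrete time can be sketched for the mechanical part of the state as follows. The text does not say which discretization was used, so the semi-implicit Euler step here is an assumption, as are the function names and every parameter value passed in. The PAM pressure dynamics and the torque model of Teramae et al. (2013) are omitted, and the gravity term is supplied by the caller as a function.

```python
import math


def friction(theta_dot, D, gamma1, gamma2):
    """Friction model of Eq. (8): viscous term plus a smooth tanh term."""
    return D * theta_dot + gamma1 * math.tanh(gamma2 * theta_dot)


def step(theta, theta_dot, tau_u, tau_l, dt, inertia, D, gamma1, gamma2, gravity):
    """Advance (theta, theta_dot) by one time step dt using Eq. (7).

    gravity is a callable returning g(theta); tau_u and tau_l are signed torques.
    """
    theta_ddot = (tau_u + tau_l
                  - friction(theta_dot, D, gamma1, gamma2)
                  - gravity(theta)) / inertia
    theta_dot_next = theta_dot + dt * theta_ddot
    theta_next = theta + dt * theta_dot_next  # semi-implicit Euler (assumed)
    return theta_next, theta_dot_next
```

A trajectory optimizer such as iLQG would call a step function of this kind repeatedly to roll out candidate control sequences and would linearize it around the current trajectory at each iteration.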

### RESULTS

### Human Control Adaptation and Learning

Six participants took part in the "hammering with robot" experiments. All the participants showed clear learning effects. After the first session most participants were able to generate occasional high impact strikes; however, it took more time for the hammering behavior to stabilize into a regular pattern. As presented in **Figure 6**, the hammering performances of the participants improved, i.e., they could drive the nail with fewer strikes as they became more experienced with the system. A t-test comparing the first and last session performances showed that there was a significant improvement in the performance of the participants from the first session to the last (p < 0.01), indicating significant human learning.

### Autonomous Hammering With Direct Imitation (Policy Copying)

We selected the strongest hammering data from a high-performing participant. In this case, strongest hammering means hammering with the fastest swing down speed, since the impact force is proportional to the swing down speed. We allowed 15 s of autonomous execution. **Figure 1** shows sample frames from an autonomous hammering session with direct imitation. The obtained controller could drive the nail with only 3 strikes (**Figure 7A**). Direct imitation of the other participants could also achieve the nailing (**Table 1**). As a stress test, we switched to a larger nail of 6.5 cm length and 0.34 cm thickness, and applied the autonomous controller obtained with the original nail (0.23 cm thick and 5 cm long) to the larger nail. The robot could also completely drive this nail, albeit now with 5 strikes.

### Comparison With the Policy Derived by an Optimal Control Method

To optimize the trajectory and pressure input by using the optimal control method, we set the terminal angle and angular velocity based on the selected high impact hammering trajectory. We derived the weights of the objective function by IOC: we extracted 6 strikes from the final session data of the high performing participant to form the learning data for IOC. As a result, the weights $w_1 = 72.45$ and $w_2 = 0.033$ were obtained. The optimal input and trajectory to be used in execution were then obtained by the optimal control method with the obtained objective function. We allowed 15 s × 5 trials of autonomous execution. **Figure 7B** shows some sample frames from an autonomous hammering session that employed the trajectories obtained by the optimal control method. The obtained controller could not completely drive the nail within 5 trials (i.e., 40 strikes). These results clearly show the advantage of using our human-in-the-loop approach to derive controllers for non-linear robot systems that are difficult to identify.

### DISCUSSION

One of the bottlenecks for the introduction of multipurpose robots to human life is the necessity of programming them. It is not feasible to preprogram them for all possible task scenarios. Many methods such as visual demonstration (Pillai et al., 2015), haptic guidance (Power et al., 2015), motor primitive (Peter and Schaal, 2008), and optimization control based (Zhang et al., 2015) methods have been proposed for acquiring robot skills. However, most methods are geared toward systems in which position and force can be reliably controlled. For such systems, conventional methods may deliver suitable solutions for skilled robot behaviors. However, for those systems where position and force control is problematic, as in PAMs, it is not effective to use model-based optimization and/or skill transfer methods based on kinematics and force. Some studies do address the precise control of position and force in PAMs (Ching-Ping and Hannaford, 1996; Ugurlu et al., 2015); these nevertheless have some drawbacks due to the need for complex calibration.

The teaching by demonstration framework is an effective way to rapidly synthesize skills on a robot when the interface and modality of control are natural for the demonstrator. There are several variants as to how teaching is done, from visual demonstration (Dillmann, 2004) to kinesthetic guidance (Calinon et al., 2001; Kushida et al., 2001). In the latter case, the actions are already realized on the robot so no complex processing is needed to reproduce them on the robot. In the former case, even if special tracking sensors are used, significant effort may be needed to map the demonstrated movement into robot actions (Ude et al., 2010). These methods, however, may not always be suitable when the targeted task involves non-negligible dynamics and/or fast actions are required. Of course, it is possible, and thus often the case, that these methods are used to generate initial robot policies that are subject to optimization or improvement via reinforcement learning (Kober et al., 2012). In what we call robot skill synthesis through human-in-the-loop control and learning, we aim to engage the human sensorimotor system to do the learning and optimization. Therefore, we seek interfaces and adaptive mechanisms for the robot to speed up human learning and minimize the mental and physical effort of the human. In particular, exploiting anthropomorphic similarity of the robot and human (Moore and Oztop, 2012; Oztop et al., 2015), simultaneous human-robot learning (Peternel and Babic, 2013; Mohammad and Oztop, 2015), control mixing and intention understanding (Dragan and Srinivasa, 2013; Amirshirzad et al., 2016) seem to be promising directions to pursue for highly effective human-in-the-loop control systems. As a final note, PAM based robots can be suitable for exploiting human sensorimotor learning effectively, as there are parallels between the human skeleto-motor system and robots that employ PAMs in antagonistic setups. Therefore, it seems reasonable to target more complex tasks on higher degrees of freedom robots with PAMs.

TABLE 1 | Number of strikes required to accomplish the hammering task by autonomous hammering with direct imitation from 6 participants.

## CONCLUSION

In this study, we proposed and realized a biologically valid multimodal human-in-the-loop system on an antagonistically designed pneumatically actuated one link, two artificial muscled robot. We focused on the ballistic movement of hammering a nail into a wood block, and ran experiments to assess the learning progress of humans in using the robot for driving a nail into a wood block. The rapid human adaptation and learning observed suggest that the developed system engages human sensorimotor learning and does not incur much burden for the cognitive system. In addition to the human experiments, we used one of the high performing participant's skilled execution of the task to synthesize an autonomous controller. The experiments with the controller showed that a significantly larger nail (0.34 cm thick, 6.5 cm long) compared with the original one (0.23 cm thick, 5 cm long) used in the skill transfer can be handled with a fixed set of parameters over the conditions. Overall, the current study suggests that adoption of human-in-the-loop approaches for PAM based robots is a fruitful research direction, in which easy and intuitive human learning facilitates effective skill transfer for tasks that require continuous modulation of impedance.

### AUTHOR CONTRIBUTIONS

TT worked mainly on the experiments and wrote the related sections. KI worked on the experiments and wrote the related sections. JB helped to improve the quality of the paper. JM supported the experimental protocol and improved the quality of the paper. EO worked on the experiments and wrote the paper.

### FUNDING

This research is supported by ImPACT of CSTI. This research is also supported by the Commissioned Research of NICT, AMED under Grant Number JP17dm0107034, Research and Development of Advanced Medical Devices and Systems to Achieve the future of Medicine from AMED, JSPS KAKENHI JP26820090, JP16H06565, JSPS Grant-in-Aid for JSPS Fellows 15J10675, NEDO, Tateishi Science and Technology Foundation. This research was made possible by a grant to EO and JM from Japan Trust (International research cooperation program) for inviting EO to ATR, Japan. Further support was obtained from EC FP7 Converge project (contract No. 321700) and EU Horizon 2020 research and innovation programme under grant agreement No. 687662–SPEXOR.

### ACKNOWLEDGMENTS

We thank Tomoyuki Noda and Nao Nakano for hardware maintenance and for helping with our experiment.

### REFERENCES

muscle actuators via a stable force feedback controller," in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Hamburg), 1633–1639.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Teramae, Ishihara, Babič, Morimoto and Oztop. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### APPENDIX

### Optimization of Objective Function With IOC

To determine the weights of the objective function for the optimal control method, we used an inverse optimal control (IOC) method. IOC estimates weights under which the optimal states match the demonstrated behaviors. We adopted a probabilistic local IOC approach (Levine and Koltun, 2012), in which the probability of the actions is approximated locally around the expert's demonstrations (Park and Levine, 2013). In the local IOC approach, given example trajectories D = {**X**<sub>1</sub>, **X**<sub>2</sub>, ···}, the expert's behaviors are represented with a probabilistic model:

$$p\left(\mathcal{D} \mid l\right) = \prod_i p\left(\mathbf{X}_i \mid l\left(\mathbf{w}\right)\right). \tag{A1}$$

After applying the Laplace approximation to the model, the weights **w** are learned by maximizing the likelihood. We used the six hammering behaviors of the selected well-performing participant to find the parameters w<sub>1</sub> and w<sub>2</sub> of the objective function in Eq. (6).
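As a brief restatement of the maximum-likelihood step, taking the logarithm of Eq. (A1) turns the product over demonstrations into a sum. The symbol $\mathbf{w}^{*}$ for the estimated weights is introduced here only for illustration, and each term is understood to be evaluated with its Laplace approximation:

$$\mathbf{w}^{*} = \arg\max_{\mathbf{w}} \log p\left(\mathcal{D} \mid l\right) = \arg\max_{\mathbf{w}} \sum_i \log p\left(\mathbf{X}_i \mid l\left(\mathbf{w}\right)\right).$$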

# Muscle Synergy Alteration of Human During Walking With Lower Limb Exoskeleton

#### Zhan Li\*, Huxian Liu, Ziguang Yin and Kejia Chen

School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China

Muscle synergy reflects inherent coordination patterns of muscle groups as the human body performs required movements. It is still unclear whether the original muscle synergy of subjects is altered when they wear exoskeletons during normal walking. This paper reports experimental results and presents an analysis of muscle synergy from 17 able-bodied subjects who performed normal walking tasks with and without lower-limb exoskeletons. The electromyography (EMG) signals of the tibialis anterior (TA), soleus (SOL), lateral gastrocnemius (GAS), vastus medialis oblique (VMO), vastus lateralis oblique (VLO), biceps femoris (BICE), semitendinosus (SEMI), and rectus femoris (RECT) muscles were recorded to extract the muscle synergy. The quantitative results show that, when the subjects wore exoskeletons to walk normally, their mean muscle synergy changed from when they walked without exoskeletons. Statistically significant differences in sub-patterns of the muscle synergies were found between walking with and without exoskeletons.

Keywords: muscle, synergy, walking, exoskeleton, human

#### Edited by:

Dingguo Zhang, Shanghai Jiao Tong University, China

#### Reviewed by:

Fan Gao, University of Kentucky, United States; Haopeng Zhang, University of Louisville, United States; Jihong Zhu, UMR5506 Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), France; Fady Alnajjar, United Arab Emirates University, United Arab Emirates

> \*Correspondence: Zhan Li zhan.li@uestc.edu.cn

#### Specialty section:

This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience

Received: 10 January 2018 Accepted: 24 December 2018 Published: 29 January 2019

#### Citation:

Li Z, Liu H, Yin Z and Chen K (2019) Muscle Synergy Alteration of Human During Walking With Lower Limb Exoskeleton. Front. Neurosci. 12:1050. doi: 10.3389/fnins.2018.01050

### 1. INTRODUCTION

Human body motion essentially results from the combined movements of multiple joints. Joints are actuated by associated muscle groups, which are synergistically driven by neural signals from the central nervous system (CNS). Muscle groups possess high redundancy, which gives the joints potential flexibility, but they still follow a limited number of coordination patterns to accomplish various motor tasks. Such inherent coordination patterns of muscles (i.e., muscle synergy) can be perceived as natural and optimal at the CNS level. In the past decades, researchers have mainly focused on analyzing the muscle synergy of people performing motor learning and locomotion tasks without wearable assistive robots. In a pioneering work, d'Avella et al. pointed out that a set of muscle synergies basically constructs motor behaviors and that they are highly related to kinematics (d'Avella et al., 2003; Tresch et al., 2006). Chvatal et al. analyzed common muscle synergies for control of center of mass (CoM) for stepping and non-stepping postural responses, revealing that for some similar motor tasks the subject may share common muscle synergies (Chvatal et al., 2011). Zwaan et al. applied muscle synergies to investigate selective motor control in cerebral palsy in gait, supporting the sensitive nature of EMG to represent an aberrant motor control (Zwaan et al., 2012). Fautrelle et al. investigated the latencies of muscular activities and the way they are correlated between certain muscles to stress the muscular synergies involved in movement; they suggested that the CNS reprograms a new synergy after the target jumps in order to correct the ongoing reaching movement (Fautrelle et al., 2010). Wojtara et al. proposed a synergy-based stability index for maintaining lateral balance, considering the temporary muscle synergies in postural reflex and automatic response (Wojtara et al., 2014). Wang et al. analyzed muscle synergies in elderly people facing a step made with obstacles and revealed a decreased ability to use multiple-mode synergies following a predictable perturbation (Wang et al., 2015). Li et al. analyzed muscle synergy in the crus to examine its correlation with plantar/dorsiflexion of the ankle joint (Li et al., 2015).

The exoskeleton system is one kind of rehabilitation robot that enables the human knee joint to do daily movement training (Gui et al., 2017), for example by acting as an active orthosis that corrects the abnormal gait of injured pilots. To assess wearing/training effects on subjects who use rehabilitation robots for daily movement (Zhang et al., 2017a), measurement and evaluation of their muscle activities is important in addition to analysis of kinematics/kinetics. Moreover, introducing muscle coordination information into exoskeletons that assist normal walking may be beneficial to human-in-the-loop optimization of energy flows (Zhang et al., 2017b). For instance, Alibeji et al. integrated muscle synergy into the control of a hybrid walking neuroprosthesis (Alibeji et al., 2015). Lower leg muscle activity can be estimated from distal bio-signals around the ankles (Isezaki et al., 2017). Upper limb exoskeletons have taken the muscle synergy effect into account in their design process (Burns et al., 2017), and muscle recruitment and coordination information has been utilized to optimize the control of ankle exoskeletons (Steele et al., 2017). However, there is still a lack of research investigating and evaluating muscle synergy in subjects who perform normal walking while wearing lower-limb exoskeletons. It is important to observe how their muscle synergies alter when such wearable robots are used to assist walking. Such muscle co-contraction alteration is worth investigating to assess potential side effects of exoskeletons on muscles, especially for subjects with long-term use of exoskeletons, whose muscle synergies might be gradually transformed due to plasticity. Thus, analysis of muscle synergies with lower-limb exoskeletons may be important and beneficial to the design of novel exoskeleton systems that achieve more natural muscle co-contraction patterns for locomotion and in daily life.

This paper aims to investigate such potential alteration effects of muscle synergies in able-bodied subjects who wear lower limb exoskeleton systems while performing normal walking tasks, continuing our preliminary work on muscle synergy analysis for quiet standing in healthy subjects (Li et al., 2016). Specifically, this work investigates how muscle synergy patterns are affected when lower-limb exoskeleton systems assist normal dynamic walking. To the best of our knowledge, there is little work focusing specifically on this topic. We present the details of muscle synergy alteration by contrasting the co-contraction sub-patterns of muscle groups in able-bodied people walking with and without lower-limb exoskeletons. EMG signals of eight muscles in the lower extremities of both legs of 17 healthy subjects were acquired and processed during the subjects' normal walking with and without lower-limb exoskeletons, and the muscle synergy on a single leg was extracted to present the muscle coordination patterns in different reduced dimensions. The statistical results on the muscle synergy of the 17 subjects show that the average muscle synergy changed when the subjects wore exoskeletons during normal walking, and they indicate how significant the muscle synergy alteration between walking with and without exoskeletons is.

### 2. MATERIALS AND METHODS

In this section, EMG signals of eight muscles of 17 subjects are acquired and analyzed to examine muscle synergies during walking in case of wearing exoskeletons and without exoskeletons.

### 2.1. Experiment Setup

Seventeen able-bodied subjects (16 male and 1 female, 22.88 ± 1.32 years old, 173.65 ± 5.22 cm height, and 54.59 ± 5.21 kg weight) participated in this study after giving their consent. The experiments were exempted from IRB approval and followed the institutional guidelines of the University of Electronic Science and Technology of China, and all the experiment operations were in accordance with the Declaration of Helsinki. None of the subjects had ever suffered neuromuscular disorders in their lower limbs. They were all instructed to use the lower limb exoskeletons to perform normal walking tasks. The lower limb exoskeleton system used in the experiment was developed by the University of Electronic Science and Technology of China. The system has four active degrees of freedom (flexion/extension) in the hip and knee joints, and its ankle joints have two passive degrees of freedom (dorsi- and plantar flexion). The subjects were required to use crutches to maintain balance during locomotion for safety.

Surface EMG signals were acquired by a commercial EMG acquisition system (TeleMyo DTS System, Noraxon Ltd., Scottsdale, Arizona, United States). The placement of the EMG acquisition pods/electrodes on the anterior and posterior sides of the lower limbs is shown in **Figure 1**. Eight muscles around the knee, ankle, and hip joints were selected to be tested: the tibialis anterior (TA), soleus (SOL), lateral gastrocnemius (GAS), vastus medialis oblique (VMO), vastus lateralis oblique (VLO), biceps femoris (BICE), semitendinosus (SEMI), and rectus femoris (RECT) muscles in the lower limbs. Eight channels of bipolar differential amplifier were carefully placed on these muscles on each leg according to both the anatomy and joint flexion/rotation experience. The EMG electrodes of each channel were positioned at the muscle belly along the muscle fiber direction, with the reference electrode orthogonal to the midline of the active electrodes, according to the recommendation of Noraxon. The skin underneath the electrodes was cleaned to reduce the resistance between the skin and the electrodes. The EMG signals were amplified and sampled at 1,500 Hz. The acquired raw EMG signals were rectified and low-pass filtered with a 4th-order Butterworth filter with a 15 Hz cutoff frequency.
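The rectification and low-pass filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' processing code: the function name `emg_envelope` is hypothetical, the use of SciPy is our choice, and the forward-backward (zero-phase) filtering is an assumption, since the text does not state whether the filter was applied causally. Only the sampling rate, filter order, and cutoff come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 1500.0    # sampling rate stated in the text
CUTOFF_HZ = 15.0  # low-pass cutoff stated in the text
ORDER = 4         # Butterworth order stated in the text


def emg_envelope(raw_emg, fs=FS_HZ, cutoff=CUTOFF_HZ, order=ORDER):
    """Full-wave rectify raw EMG, then low-pass filter along the last axis."""
    rectified = np.abs(np.asarray(raw_emg, dtype=float))
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    # Assumption: forward-backward filtering to avoid phase lag.
    # This doubles the effective filter order compared with one causal pass.
    return sosfiltfilt(sos, rectified, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(3000) / FS_HZ
    # Synthetic stand-in for 8 channels of raw EMG (not real data):
    # noise whose amplitude is slowly modulated at 1 Hz.
    raw = rng.standard_normal((8, t.size)) * (1.0 + np.sin(2 * np.pi * t))
    envelope = emg_envelope(raw)
    print(envelope.shape)
```

Each row of the returned array would correspond to one muscle channel, which is the layout assumed for the matrix U in the synergy extraction.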

FIGURE 1 | EMG electrode locations on the lower limb of one able-bodied subject.

### 2.2. Experimental Protocol

In the experiment, all 17 subjects were individually instructed to perform two types of normal walking tasks: in the first test session each subject walked without wearing the exoskeleton, and in the second test session each subject wore the exoskeleton to walk. These two sessions were independent and separate. In the first session (normal walking), every subject was told to walk 10 m at a rate of 1 step per second. They then stopped for a short time and repeated the walk with the same rhythm. All the subjects repeated this normal walking trial 4 times. After they completed the first session, they rested for a while and then put on the exoskeletons. In the second session (walking with exoskeletons), each subject was told to walk 5 m without speed restriction, and they repeated the walking task with the exoskeleton 4 times. **Figure 2** shows one subject wearing the exoskeleton and walking in the experiment.

### 2.3. Muscle Synergy Extraction

After the EMG signals of all eight channels on the 17 subjects were acquired and filtered, we extracted the muscle synergies in their right legs with the following procedure. First, we construct the following multiple-channel EMG signal matrix U for each individual:

$$U = \begin{bmatrix} U_{\text{TA}} \\ U_{\text{SOL}} \\ U_{\text{GAS}} \\ U_{\text{VMO}} \\ U_{\text{VLO}} \\ U_{\text{BICE}} \\ U_{\text{SEMI}} \\ U_{\text{RECT}} \end{bmatrix} = \begin{bmatrix} U_{\text{TA}}(1) & U_{\text{TA}}(2) & \cdots & U_{\text{TA}}(N) \\ U_{\text{SOL}}(1) & U_{\text{SOL}}(2) & \cdots & U_{\text{SOL}}(N) \\ U_{\text{GAS}}(1) & U_{\text{GAS}}(2) & \cdots & U_{\text{GAS}}(N) \\ U_{\text{VMO}}(1) & U_{\text{VMO}}(2) & \cdots & U_{\text{VMO}}(N) \\ U_{\text{VLO}}(1) & U_{\text{VLO}}(2) & \cdots & U_{\text{VLO}}(N) \\ U_{\text{BICE}}(1) & U_{\text{BICE}}(2) & \cdots & U_{\text{BICE}}(N) \\ U_{\text{SEMI}}(1) & U_{\text{SEMI}}(2) & \cdots & U_{\text{SEMI}}(N) \\ U_{\text{RECT}}(1) & U_{\text{RECT}}(2) & \cdots & U_{\text{RECT}}(N) \end{bmatrix} \tag{1}$$

where U<sub>j</sub> (j ∈ {TA, SOL, GAS, VMO, VLO, BICE, SEMI, RECT}) denotes the EMG time sequence of each muscle in the right leg, with N samples in total.

FIGURE 2 | One subject is wearing the lower-limb exoskeleton system and doing a normal walking task. (Consent was obtained from the individual for the publication of this image).

The non-negative matrix factorization (NMF) method is applied (Tresch et al., 2006) to decompose U ∈ R<sup>8×N</sup> as

$$U = WH$$

where W ∈ R<sup>8×k</sup> denotes the muscle synergy ratio matrix and H ∈ R<sup>k×N</sup> denotes the extracted synergy intensity matrix (neural commands). The decomposition updates the entries h<sub>kl</sub> and w<sub>jk</sub> of H and W with the following iterative algorithm

$$h_{kl} \leftarrow h_{kl} \frac{[W^T U]_{kl}}{[W^T W H]_{kl}}$$

$$w_{jk} \leftarrow w_{jk} \frac{[U H^T]_{jk}}{[W H H^T]_{jk}}$$

The algorithm is performed by calling the "nnmf" function built into MATLAB R2016a, which minimizes the cost function (residual error) ‖U − WH‖<sub>F</sub>, where ‖·‖<sub>F</sub> denotes the Frobenius norm. The iterative method starts with random initial values for W and H. Each column of the synergy matrix W represents a muscle co-contraction pattern for the chosen reduced dimension k, i.e., the vector formed by the entries w<sub>1k</sub>, w<sub>2k</sub>, ···, w<sub>8k</sub> denotes Synergy k. For example, in the case of k = 3, there are three synergies in total: the vector formed by the entries w<sub>11</sub>, w<sub>21</sub>, ···, w<sub>81</sub> represents Synergy 1, the vector formed by the entries w<sub>12</sub>, w<sub>22</sub>, ···, w<sub>82</sub> represents Synergy 2, and the vector formed by the entries w<sub>13</sub>, w<sub>23</sub>, ···, w<sub>83</sub> represents Synergy 3.
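To make the multiplicative updates above concrete, the following NumPy sketch applies them to a synthetic non-negative matrix. It is an illustration only and does not reproduce MATLAB's "nnmf" routine: the function name `nmf_multiplicative`, the iteration count, the small constant `eps` that guards against division by zero, and the fixed random seed are all assumptions made for this example.

```python
import numpy as np


def nmf_multiplicative(U, k, n_iter=500, eps=1e-12, seed=0):
    """Factorize non-negative U (muscles x samples) as W @ H.

    W has shape (muscles, k) and H has shape (k, samples).
    Returns W, H and the Frobenius residual after every iteration.
    """
    U = np.asarray(U, dtype=float)
    rng = np.random.default_rng(seed)
    n_muscles, n_samples = U.shape
    # Random non-negative initial values, as described in the text.
    W = rng.random((n_muscles, k)) + eps
    H = rng.random((k, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # h_kl <- h_kl * [W^T U]_kl / [W^T W H]_kl
        H *= (W.T @ U) / (W.T @ W @ H + eps)
        # w_jk <- w_jk * [U H^T]_jk / [W H H^T]_jk
        W *= (U @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(U - W @ H, "fro"))
    return W, H, np.array(errors)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for an 8-muscle EMG envelope matrix (not real data).
    U_demo = rng.random((8, 3)) @ rng.random((3, 1000))
    W_demo, H_demo, err = nmf_multiplicative(U_demo, k=3)
    print(W_demo.shape, H_demo.shape)
    print("residual decreased:", err[-1] < err[0])
```

In this sketch, column `W_demo[:, 0]` plays the role of Synergy 1, `W_demo[:, 1]` of Synergy 2, and so on. Because the initial values are random, the order and scale of the columns can differ between runs.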

### 3. RESULTS AND DISCUSSIONS

In this section, muscle synergies of the 17 able-bodied subjects were extracted from the acquired EMG signals by NMF with the dimension k reduced to 3, 4, and 6, respectively. The muscle synergies of the subjects wearing lower-limb exoskeletons are compared with those of the subjects walking without exoskeletons. Analysis of variance (ANOVA) was used to evaluate the statistical significance of the difference between muscle synergies with an exoskeleton (i.e., W<sub>with</sub>) and those without an exoskeleton (i.e., W<sub>without</sub>). The p-value matrices were calculated. If p ≤ 0.05 holds for corresponding entries of W<sub>with</sub> and W<sub>without</sub>, the muscle synergy alteration is considered statistically significant.
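One way such a p-value matrix could be computed is sketched below with a one-way ANOVA per synergy entry. This is a hedged illustration rather than the authors' analysis script: the function name `synergy_pvalues`, the array layout (subjects × muscles × synergies), and the use of `scipy.stats.f_oneway` are assumptions. The sketch also assumes that the synergy columns have already been put into corresponding order across subjects, a step the text does not detail.

```python
import numpy as np
from scipy.stats import f_oneway


def synergy_pvalues(w_with, w_without):
    """One-way ANOVA for every (muscle, synergy) entry.

    w_with, w_without: arrays of shape (n_subjects, n_muscles, k) holding
    each subject's synergy matrix for the two walking conditions.
    Returns an (n_muscles, k) array of p-values.
    """
    w_with = np.asarray(w_with, dtype=float)
    w_without = np.asarray(w_without, dtype=float)
    _, n_muscles, k = w_with.shape
    p = np.empty((n_muscles, k))
    for j in range(n_muscles):
        for s in range(k):
            p[j, s] = f_oneway(w_with[:, j, s], w_without[:, j, s]).pvalue
    return p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for 17 subjects, 8 muscles, k = 3 (not real data).
    demo_without = rng.random((17, 8, 3))
    demo_with = rng.random((17, 8, 3))
    p_matrix = synergy_pvalues(demo_with, demo_without)
    significant = p_matrix <= 0.05  # threshold used in the text
    print(p_matrix.shape, int(significant.sum()))
```

With only two groups per entry, the one-way ANOVA is equivalent to a two-sample t-test with equal variances, so either test would give the same p-values in this setting.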

### 3.1. Muscle Synergy With Extraction Dimension k = 3

In this case, the reduced dimension in NMF is k = 3 for muscle synergy extraction from multiple-channel EMG signals, i.e., there are three synergy patterns: Synergy 1, Synergy 2, and Synergy 3. **Figure 3** compares the average muscle synergies of the 17 subjects during their normal walking tasks with and without lower limb exoskeletons. To further show the statistical significance of the muscle synergy alteration effect, the p-values are shown in **Table 1**. From **Figure 3**, we can observe that the average muscle synergy patterns of the 17 subjects when they wear lower-limb exoskeletons during normal walking are altered from those when they walk without exoskeletons. As seen from Synergy 1 in **Figure 3A**, when the subjects wear an exoskeleton for walking, their TA muscles play a dominant role with little co-contraction from other muscles. For comparison, when the subjects walk without an exoskeleton, their TA muscles still make the main contribution, but different co-contraction patterns appear. As shown in **Table 1**, the differences between Synergy 1 with and without an exoskeleton are concentrated in the TA and VMO muscles, whose contraction patterns are statistically significantly different, since their corresponding p-values are both <0.05. For Synergy 2, shown in **Figure 3B**, the co-contraction patterns are quite different as well. Whether the subjects walk with or without exoskeletons, the SOL and GAS muscles are always the main contributors. However, the p-values for the TA, SOL, BICE, and SEMI muscles are <0.05, which may indicate that the co-contraction patterns of these muscles are significantly altered. In Synergy 3 (**Figure 3C**), the BICE and SEMI muscles are the main contributors. The p-values for the TA, SOL, VMO, VLO, SEMI, and RECT muscles are <0.05, which indicates that wearing exoskeletons might change muscle co-contraction patterns.

### 3.2. Muscle Synergy With Extraction Dimension k = 4

In this part of the results, muscle synergies were extracted by NMF with the reduced dimension k = 4, i.e., Synergies 1–4 are produced. **Figure 4** shows the average muscle synergy of the 17 subjects performing normal walking with and without lower-limb exoskeletons. **Table 2** shows the statistical significance results for muscle synergy with and without an exoskeleton. From **Figure 4** we can observe that, when the subjects wear exoskeletons to walk, the TA muscle is still the main contributing muscle and other muscles' co-contractions are almost non-existent for Synergy 1, BICE and SEMI muscles are the main contributing muscles for Synergy 2, the GAS muscle can be seen as the main contributor for Synergy 3, and BICE and RECT muscles play the dominant roles for Synergy 4. For comparison, **Figure 4** also presents the average muscle synergy of the 17 subjects who do the same normal walking tasks without wearing lower-limb exoskeletons. From the synergy results without exoskeletons in **Figure 4** we can see that, for Synergy 1, the TA muscle is the main contributor with co-contractions from VLO and RECT muscles; for Synergy 2, BICE and SEMI muscles are the main contributors; for Synergy 3, SOL and GAS are the main contributors to the movement; for Synergy 4, VMO becomes the main contributor. As reflected by the p-values in **Table 2**, we can conclude that the TA, VLO, and RECT muscles' synergies are changed in Synergy 1, the TA, VLO, BICE, SEMI, and RECT muscles' synergies are changed in Synergy 2, the SOL and GAS muscles' synergies are changed in Synergy 3, and the SOL, GAS, VMO, BICE, SEMI, and RECT muscles' synergies are changed in Synergy 4. With extraction dimension k = 4, muscle synergy alteration seems to occur more frequently.

### 3.3. Muscle Synergy With Extraction Dimension k = 6

NMF was applied with reduced dimension k = 6 for muscle synergy extraction in this subsection. **Figure 5** shows the average muscle synergy of the 17 subjects performing their normal walking tasks with exoskeletons and, for comparison, the average muscle synergy pattern of the same subjects performing the same tasks without lower limb exoskeletons. **Table 3** shows the p-values, which represent the statistical significance results for muscle synergy with and without an exoskeleton. From **Figure 5**, we can observe that, when the subjects wear the exoskeleton for walking, the synergies seem altered compared with those for walking without exoskeletons. When the subjects walk with exoskeletons, for Synergy 1, the TA muscle still keeps the role of the dominant contributor to the movement with fewer co-contractions from other muscles, and a similar phenomenon also appears when the reduced dimension is k = 3 or k = 4; for Synergy 2, the SOL muscle is the main contributor; for Synergy 3, the GAS muscle acts as the main contributor more distinctly; for Synergy 4, the BICE muscle is still the main contributor; for Synergy 5, SEMI is still the main contributor; for Synergy 6, RECT seems to be the main contributor instead of the VLO muscle. From the statistical significance results in **Table 3**, the muscle synergy alteration effect also appears in all 6 synergy patterns. In Synergy 1, the SOL and VLO muscles' synergies are significantly different; in Synergy 2, the SOL, GAS, VLO, and BICE muscles' synergies are significantly different; in Synergy 3, the GAS, BICE, SEMI, and RECT muscles' synergies are significantly different; in Synergy 4, only the VMO muscle's synergy is significantly different; in Synergy 5, only the RECT muscle's synergy is significantly different; and in Synergy 6, the TA, VMO, and RECT muscles' synergies are significantly different.

TABLE 2 | The p-values between muscle synergy W<sub>with</sub> ∈ R<sup>4×8</sup> and W<sub>without</sub> ∈ R<sup>4×8</sup>; p ≤ 0.05 indicates a statistically significant difference.

The bold value is < 0.05.

FIGURE 5 | Average muscle synergy of the 17 subjects who walked with and without lower-limb exoskeletons. NMF was used to extract the muscle synergy with reduced dimension k = 6. (A) Synergy 1, (B) Synergy 2, (C) Synergy 3, (D) Synergy 4, (E) Synergy 5, and (F) Synergy 6.

TABLE 3 | The p-values between muscle synergy W<sub>with</sub> ∈ R<sup>6×8</sup> and W<sub>without</sub> ∈ R<sup>6×8</sup>; p ≤ 0.05 indicates a statistically significant difference.

The bold value is < 0.05.

### 3.4. Discussion

From the aforementioned muscle synergy results with and without exoskeletons in the different extraction dimensions k = 3, 4, and 6, we find that, when the subjects wore an exoskeleton for normal walking, the corresponding muscle co-contraction patterns could be altered. The statistical results further demonstrate that such alteration effects may concentrate on some muscles. As seen from the p-values in **Tables 1**–**3**, the two groups of synergies of the eight muscles show statistically significant differences (i.e., p ≤ 0.05) to different extents, and in every sub-pattern of these muscle synergies at least one muscle's contribution is significantly different. When the extraction dimension is chosen as k = 3, the TA muscle's synergies with and without exoskeletons show a significant difference, as the p-values in all three synergy patterns are <0.05. The SOL, VMO, and SEMI muscles' synergies with and without exoskeletons show significant differences as well. Only the GAS muscle's contribution appears unaffected by wearing an exoskeleton while walking. When the extraction dimension is set as k = 4, all eight muscles' synergies with and without an exoskeleton present significant differences, with p ≤ 0.05 appearing twice or more in **Table 2**. When we extract muscle synergy with dimension k = 6, each of the eight muscles' synergies with and without an exoskeleton can show a significant difference. According to our previous work (Li et al., 2015), some sub-patterns of muscle synergy have high correlations with joint movement (e.g., flexion or extension). This muscle synergy alteration indicates that human joint torque may be changed due to the involvement of exoskeleton joint torque.
Thus, accurate measurement of the participation of assistive robots (e.g., robot torque) and of spontaneously generated human motion (e.g., human torque), together with a clear distinction between them, can provide more insight into the cause of such significant differences.

From another point of view, utilizing lower-limb exoskeletons may change the original patterns of muscle co-contraction in subjects during their normal walking activities. This may not be beneficial to the muscle exercise of subjects who use exoskeletons frequently, since the natural and comfortable muscle synergy can be broken. In order to improve the co-contraction situation when the subjects wear exoskeletons to walk, it is necessary to design a muscle-contraction-primitive controller for exoskeletons instead of purely providing motion compensation by actuators. Users often report that they feel uncomfortable and unnatural when they wear exoskeletons for walking. Based on the observed muscle synergy results, the reason may lie in the fact that the original natural muscle synergies are altered to artificial ones when subjects use the exoskeletons, and the natural muscle coordination patterns may be forcibly changed while the subjects adapt themselves to the exoskeletons. In order to make subjects' muscle synergies with assistive exoskeletons more similar to those without exoskeletons, the following generalized procedure can help to improve the design of exoskeletons toward more natural motion assistance. First, the aforementioned statistical significance results show which specific muscle's contribution to movement is changed; second, by utilizing correlations between muscle synergy patterns and human joint torques in different degrees of freedom, the design can be improved so that the corresponding degree of freedom of the exoskeleton joint is actuated; next, the level of actuation is adjusted according to the exoskeleton dynamics with feedback from kinematics and EMG.
Some works use EMG signals to control the exoskeleton by treating EMG as an interpretation of human intentions (Kinnaird and Ferris, 2009; Kiguchi and Hayashi, 2012; Lenzi et al., 2012). In this case, the subjects' motion intention explicitly drives the contraction of one or more muscle groups to change EMG signals instead of subconsciously invoking inherent muscle coordination patterns. Including synergistic information may help wearable exoskeleton devices produce more natural motion (Hassan et al., 2018; Liu et al., 2018).

In the walking tasks not assisted by exoskeletons, the subjects performed their movement without crutches, as they could keep balance naturally as in their daily walking. When the subjects wore exoskeletons to move, crutches were used to maintain balance for safety reasons, with the hip and knee motion assisted by the exoskeletons. The actuation of the human-exoskeleton hybrid system is composed of human muscle groups and robot motors. It is still a challenge to measure separately the torque from subjects and the torque from exoskeletons, and to determine how these torques are distributed and combined to cooperatively achieve optimized motion in walking. This work shows that alteration effects appear at the level of muscle synergy, rather than at the actuation level, when able-bodied subjects wear exoskeletons to walk. Future development of advanced technology that measures the torques of the ankle, knee, and hip joints synchronously with the EMG signals of their associated muscles may promote physiological interpretations of the reduced dimension number k for muscle synergy pattern extraction, following our previous work (Li et al., 2015). When torque measurement of multiple joints in the lower extremities is lacking, using EMG signals to analyze muscle synergy might be a feasible way to investigate how the subjects' muscles adapt to wearable robots.

### 4. CONCLUSIONS

This paper aims to investigate potential alteration effects of muscle synergies of able-bodied subjects after wearing lower limb exoskeleton systems when performing normal walking tasks. EMG signals from eight muscles in the lower extremities of one leg of 17 healthy subjects were processed to extract the muscle synergies of these subjects during normal walking with and without exoskeletons. According to the muscle synergy results of the 17 subjects, the patterns of average muscle synergy change clearly after the subjects wear exoskeletons. Statistical analysis further shows significant differences among sub-patterns in muscle synergies with and without exoskeletons, indicating that such alteration phenomena evidently exist.

### AUTHOR CONTRIBUTIONS

ZL conceived of the study and designed the experiments. ZY designed the system. HL and KC conducted the experiment. HL and analyzed the data. All the authors drafted the manuscript.

### FUNDING

This work is supported by the National Natural Science Foundation of China (NSFC) under grant No. 61603078 and National Key R&D Program of China under grant No. 2017YFB1302300.

### ACKNOWLEDGMENTS

We would like to thank the volunteer subjects for their participation in the experiments.

### REFERENCES

Zhang, D., Ren, Y., Gui, K., Jia, J., and Xu, W. (2017a). Cooperative control for a hybrid rehabilitation system combining functional electrical stimulation and robotic exoskeleton. Front. Neurosci. 11:725. doi: 10.3389/fnins.2017.00725

Zhang, J., Fiers, P., Witte, K. A., Jackson, R. W., Poggensee, K. L., Atkeson, C. G., et al. (2017b). Human-in-the-loop optimization of exoskeleton assistance during walking. Science 356, 1280–1284. doi: 10.1126/science.aal5054

Zwaan, E., Becher, J. G., and Harlaar, J. (2012). Synergy of EMG patterns in gait as an objective measure of muscle selectivity in children with spastic cerebral palsy. Gait Posture 35, 111–115. doi: 10.1016/j.gaitpost.2011.08.019

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Li, Liu, Yin and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.