MAPPING HUMAN SENSORY-MOTOR SKILLS FOR MANIPULATION ONTO THE DESIGN AND CONTROL OF ROBOTS

EDITED BY: Matteo Bianchi and Gionata Salvietti

PUBLISHED IN: Frontiers in Neurorobotics and Frontiers in Robotics and AI

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-795-3 DOI 10.3389/978-2-88945-795-3

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# MAPPING HUMAN SENSORY-MOTOR SKILLS FOR MANIPULATION ONTO THE DESIGN AND CONTROL OF ROBOTS

Topic Editors: Matteo Bianchi, Università di Pisa, Italy; Gionata Salvietti, Università degli Studi di Siena, Italy

Image: maxuser/Shutterstock.com

Humans are endowed with extraordinary sensory-motor capabilities that enable successful interaction with, and exploration of, the environment, as is the case in human manipulation. Understanding and modeling these capabilities is an important topic not only for neuroscience but also for robotics, in a mutual inspiration that both informs the design and control of artificial systems and increases knowledge on the biological side. Within this context, synergies -- i.e., goal-directed actions that constrain multiple DoFs of the human body and can be defined at the kinematic, muscular, or neural level -- have gained increasing attention as a general simplification approach to shape the development of simple and effective artificial devices.

The execution of such purposeful sensory-motor primitives on the biological side leverages the interplay of sensory-motor control at the central and peripheral levels, and the interaction of the human body with the external world. This interaction is particularly important considering the new concept of robotic soft manipulation, i.e., soft, adaptable yet robust robotic hands that can deform with the external environment to multiply their grasping and manipulation capabilities. In this regard, a preeminent role is reserved for touch, since skin is our primary organ for shaping our knowledge of the external world and, hence, for modifying it, in interaction with the efferent parts.

This Research Topic reports results on the mutual inspiration between neuroscience and robotics, and on how neuroscientific findings on human manipulation can be translated into engineering guidelines for simplified systems that take full advantage of the interaction with, and hence the exploitation of, environmental constraints for task accomplishment and knowledge acquisition.

Citation: Bianchi, M., Salvietti, G., eds. (2019). Mapping Human Sensory-Motor Skills for Manipulation onto the Design and Control of Robots. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-795-3

# Table of Contents

*Editorial: Mapping Human Sensory-Motor Skills for Manipulation Onto the Design and Control of Robots*

Matteo Bianchi and Gionata Salvietti

### SECTION 1

### INVESTIGATION OF HUMAN SENSORY-MOTOR BEHAVIOR

*Unvealing the Principal Modes of Human Upper Limb Movements Through Functional Analysis*

Giuseppe Averta, Cosimo Della Santina, Edoardo Battaglia, Federica Felici, Matteo Bianchi and Antonio Bicchi


### SECTION 2

### BIO-AWARE ROBOTICS AND MAN-MACHINE INTERFACES


# Editorial: Mapping Human Sensory-Motor Skills for Manipulation Onto the Design and Control of Robots

Matteo Bianchi<sup>1,2\*</sup> and Gionata Salvietti<sup>3,4</sup>

<sup>1</sup> Research Center "Enrico Piaggio", University of Pisa, Pisa, Italy, <sup>2</sup> Department of Information Engineering, University of Pisa, Pisa, Italy, <sup>3</sup> Department of Information Engineering and Mathematics, University of Siena, Siena, Italy, <sup>4</sup> Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy

Keywords: human hands, robotic hands, human-robot interfaces, human arm, human touch, prosthetics

**Editorial on the Research Topic**

#### **Mapping Human Sensory-Motor Skills for Manipulation Onto the Design and Control of Robots**

The extraordinary human sensory-motor capabilities arise from the interaction with the external world and the interplay of different elements, which are controlled within a space whose dimensionality is lower than the available number of dimensions, as suggested by the concept of synergies (see, e.g., Turvey, 2007; Latash, 2008; Santello et al., 2013). This general simplification approach has then been successfully used in robotics to inform the development of simple yet effective artificial devices (see, e.g., Santello et al., 2016). Mutual inspiration between robotics and neuroscience could hence be the key to advancing both disciplines: through a bio-aware approach to the design of mechatronic systems, on one side, and the deployment of technical tools for novel neuroscientific experiments, on the other. The manuscripts presented in this e-book shed light on the organization of the human sensory-motor architecture, presenting instruments and mechatronic systems that can be successfully applied to neuroscientific investigation. At the same time, we report on robotic translations of neuroscientific outcomes.

Edited by:

Ganesh R. Naik, Western Sydney University, Australia

#### Reviewed by:

Fernando Perez-Peña, University of Cádiz, Spain; Nicolas Cuperlier, Université de Cergy-Pontoise, France

#### \*Correspondence:

Matteo Bianchi matteo.bianchi@centropiaggio.unipi.it

> Received: 08 November 2018 Accepted: 03 January 2019 Published: 22 January 2019

#### Citation:

Bianchi M and Salvietti G (2019) Editorial: Mapping Human Sensory-Motor Skills for Manipulation Onto the Design and Control of Robots. Front. Neurorobot. 13:1. doi: 10.3389/fnbot.2019.00001

### INVESTIGATION OF HUMAN SENSORY-MOTOR BEHAVIOR

In Averta et al., functional principal component analysis (fPCA) was applied, for the first time, to human upper limb actions, to unveil the principal motor control schemes of the arm joints. Results show that a combination of a few principal time-dependent functions can explain most of the trajectory variability in daily living activities. These findings can be applied to planning for robotic manipulators and to characterizing human upper limb kinematics in physiological and pathological conditions. Pathological conditions affect not only the motor components but also the subjects' somatosensation, whose assessment has received limited attention compared to motor abilities. In Ballardini et al., a low-cost, bimanual mechatronic system is presented, which acts as a tactile stimulator and recorder. Results from tests with healthy subjects and post-stroke individuals show that the system can be a viable solution for characterizing tactile perceptual abilities at different body locations. The correct quantification of the performance of the human somatosensory system can also provide useful inspiration for successful human-robot interaction through haptic feedback. However, there are cases where the hands, which can be regarded as the organ of touch (Bicchi et al., 2011), are not accessible and alternatives for haptic feedback delivery have to be investigated. Relying on the finding that humans can integrate normal force feedback at the toe into the sensorimotor loop, Hagengruber et al. analyze human discrimination capabilities for spatial forces with different amplitudes and directions of application, at the bare front side of the toe. This provides a perceptual workspace that can be employed to design robotic devices for sensory substitution. Human afferences are not limited to touch; they encompass multiple sensing channels, such as vision. Classic psychophysics characterizes sensory performance in terms of Weber's law and the Just Noticeable Difference. However, the assumptions underlying these approaches can be violated in natural action-perception tasks, as is the case in vision-guided grasping. Since perception and action are not synchronized in telerobotic grasping, telerobotic systems can be an ideal platform to study the underlying causes of a violation of Weber's law. Afgin et al. propose a telerobotic system with time delays to investigate the perceptual basis of grasp control. White et al. study the modulation of the grip force during the interaction with soft and rigid virtual objects, when the stiffness is varied continuously across trials. Results suggest a sudden transition phase between two feedforward controllers, which is triggered at a given stiffness level.

### BIO-AWARE ROBOTICS AND MAN-MACHINE INTERFACES

In Salvietti, the principal solutions for the design of robotic hands that implement the inter-joint coupling associated with the concept of hand synergies are reviewed. Synergistic inspiration has also been combined with soft robotics for a novel generation of deformable, robust, and functional artificial hands (Catalano et al., 2014). These end-effectors have attracted the attention of prosthesis designers, since they guarantee simplified control and a natural interaction with the environment. In this regard, promising results have been obtained with the SoftHand-Pro, SHP (Godfrey et al., 2017), an anthropomorphic, adaptable myo-prosthetic robotic hand with 19 DoFs actuated by only one motor [controlled with two surface electromyographic (sEMG) electrodes]. To improve the SHP capabilities for fine grasp force control, Fu and Santello propose a hybrid-gain myoelectric controller that switches the control gain based on the hand sensorimotor state. Haptic feedback was also delivered at the upper arm (Casini et al., 2015). Results show that the hybrid control architecture improves task completion speed and fine control, leading to performance qualitatively similar to that of native human hands. The intrinsic capability of humans to vary the stiffness of their musculoskeletal system is another key feature that allows complex motor behavior (Della Santina et al., 2017). Recently, mechanical structures with variable intrinsic stiffness have been proposed (Vanderborght et al., 2013) for energy-efficient action completion, as could be the case for prostheses for cyclical drumming tasks. To achieve this goal, in Stillfried et al. able-bodied drummers were asked to play simple regular drum beats. Results show that a series-elastic connection element between the forearm and the drumstick appears to lower the muscular effort of drumming, while a stiff connection seems to minimize the mental load and has a positive effect on the performance of drumming novices. In Zeng et al., an augmented reality (AR) guiding assistance method is presented, which enhances visual feedback to the user for a combined electroencephalography-based Brain Machine Interface (BMI) and eye tracking control of a robotic arm. Experimental results show that such a hybrid Gaze-BMI controller with the inclusion of AR information increases performance efficiency and reduces the cognitive load. In Fathaliyan et al., human gaze behavior and gaze–object interactions in 3D during a complex bimanual task are investigated. The goal is to extract salient features that can be fed to machine learning algorithms for human action recognition, with promising applications to assistive robots and robotic co-workers.

### AUTHOR CONTRIBUTIONS

Both authors acted as guest editors for the related research topic. In the editorial, they have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This research has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 688857 (SoftPro) and under Grant Agreement No. 645599 (Soma).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bianchi and Salvietti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Unvealing the Principal Modes of Human Upper Limb Movements through Functional Analysis**

*Giuseppe Averta<sup>1,2\*</sup>, Cosimo Della Santina<sup>1</sup>, Edoardo Battaglia<sup>1</sup>, Federica Felici<sup>2</sup>, Matteo Bianchi<sup>1,3</sup> and Antonio Bicchi<sup>1,2</sup>*

*<sup>1</sup>Centro E. Piaggio, University of Pisa, Pisa, Italy, <sup>2</sup>ADVR, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy, <sup>3</sup>Department of Information Engineering, University of Pisa, Pisa, Italy*

The rich variety of human upper limb movements requires an extraordinary coordination of different joints according to specific spatio-temporal patterns. However, unveiling these motor schemes is a challenging task. Principal components have often been used for analogous purposes, but such an approach relies on the hypothesis that upper limb poses are temporally uncorrelated. To overcome this limitation, in this work we leverage functional principal component analysis (fPCA). We carried out experiments with 7 subjects performing a set of the most significant human actions, selected considering state-of-the-art grasp taxonomies and the human kinematic workspace. fPCA results show that human upper limb trajectories can be reconstructed by a linear combination of a few principal time-dependent functions, with the first component alone explaining around 60-70% of the observed behaviors. This suggests that in daily living activities humans reduce the complexity of movement by modulating their motions through a reduced set of principal patterns. Finally, we discuss how this approach could be profitably applied in robotics and bioengineering, opening fascinating perspectives to advance the state of the art of artificial systems, as was the case for hand synergies.

#### *Edited by:*

*Cecilia Laschi, Sant'Anna School of Advanced Studies, Italy*

#### *Reviewed by:*

*Dongming Gan, Khalifa University, United Arab Emirates; Sunil L. Kukreja, National University of Singapore, Singapore*

> *\*Correspondence: Giuseppe Averta g.averta3@gmail.com*

#### *Specialty section:*

*This article was submitted to Bionics and Biomimetics, a section of the journal Frontiers in Robotics and AI*

> *Received: 05 May 2017 Accepted: 18 July 2017 Published: 09 August 2017*

#### *Citation:*

*Averta G, Della Santina C, Battaglia E, Felici F, Bianchi M and Bicchi A (2017) Unvealing the Principal Modes of Human Upper Limb Movements through Functional Analysis. Front. Robot. AI 4:37. doi: 10.3389/frobt.2017.00037*

**Keywords: upper limb kinematics, motor control, daily living activities, functional analysis, human-inspired robotics**

## **1. INTRODUCTION**

Human hands represent an extraordinary tool to explore and interact with the external environment. Not surprisingly, a lot of studies have been devoted to model how the nervous system can cope with the complexity of hand sensory-motor architecture (Mason et al., 2001; Todorov and Ghahramani, 2004; Zatsiorsky and Latash, 2004; Thakur et al., 2008; Gabiccini et al., 2013; Santello, 2014). These studies have led to the definition of the so-called synergies, broadly intended as covariation patterns that can be represented at different levels (Santello et al., 2016). More specifically, at the level of motor units, neural activation shows a synergistic control in the time and/or frequency domain (Santello, 2014). At the muscle level, different works explored patterns of muscle activity whose timing and/or amplitude modulation enables the generation of different movements, see d'Avella and Lacquaniti (2013) for a review. Synergies have also been identified and defined as covariation patterns of joint angles, e.g., hand postural synergies (Santello et al., 1998, 2013; Mason et al., 2001), or covariation patterns among digit forces [for a review see Zatsiorsky and Latash (2004)]. However, to correctly understand human manipulation, in addition to hand analysis, the role of whole upper limb movements should be also taken into account. Indeed, the whole upper limb motions are devoted to guide and optimize position and orientation of the hand w.r.t. external targets.

For these reasons, in addition to the many works devoted to analyzing hand behavior, it is also possible to find studies modeling the human upper limb motor workspace, either from a kinematic point of view or from a muscular or neural point of view. In Heidari et al. (2016), the authors studied the kinematics of upper limb movements during selected tasks in order to compare stroke patients and normal subjects. In Butler et al. (2010), the authors developed a quantitative method to assess upper limb motor deficits in children with cerebral palsy using three-dimensional motion analysis during the reach and grasp cycle. Other papers studied muscular synergies in upper limb activities, as in d'Avella and Tresch (2002), where the authors introduced a model based on combinations of time-varying muscle synergies, and in d'Avella et al. (2006), where the authors recorded electromyographic activity from shoulder and arm muscles during point-to-point movements. As with hand synergies, whose robotic applications are reviewed in Santello et al. (2016), synergies have also been applied to movement generation for virtual arms (Fu et al., 2013) as well as to the myocontrol of a multi-DoF planar robotic arm using muscle synergies (Lunardini et al., 2015). However, none of the previous studies considered the dynamic aspects of human upper limb motion, i.e., that different temporal evolutions and shapes of upper limb joint trajectories would result in different final hand poses.

Typical approaches based on principal component analysis are not suitable in this case because of the underlying hypothesis that upper limb poses are temporally uncorrelated. For this reason, we propose to use, for the first time, functional principal component analysis (fPCA) to study upper limb motions. fPCA is a statistical method for investigating the dominant modes of variation of functional data in time and has been widely used in one-dimensional or multi-dimensional time series analysis in chemistry, weather phenomena, and medicine (Aguilera et al., 1999; Gokulakrishnan et al., 2006; Dai et al., 2013). The reader interested in functional data analysis can refer to Ramsay and Silverman (2002), Ramsay (2006), and Ramsay et al. (2009). In human movement studies, this method has been used to explore the presence of variations in repetitions of a specific task, e.g., in Ryan et al. (2006), an analysis of knee joint kinematics in the vertical jump was performed. In Coffey et al. (2011), fPCA was used to analyze a bio-mechanical dataset examining the mechanisms of chronic Achilles tendon injury and the functional effects of orthoses by comparing injured and healthy subjects. In Dai et al. (2013), fPCA was used in conjunction with PCA for the analysis of grasping motion. In this work, a new analysis of upper limb movements by fPCA is proposed to provide a description of the kinematic trajectories as a combination of functional principal components (fPCs).
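To make the idea of describing trajectories as combinations of functional principal components more concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that every trial of a single joint angle has already been resampled on a common, uniform time grid, and it approximates fPCA by an eigen-decomposition of the sampled curves; a full functional analysis would typically first expand the curves on a smooth basis. The function names `discretized_fpca` and `reconstruct` are hypothetical.

```python
import numpy as np


def discretized_fpca(curves, n_components=3):
    """Approximate fPCA on curves sampled on a shared uniform time grid.

    curves: array of shape (n_trials, n_samples); each row is one whole
            trajectory, so the observation unit is the time function itself
            rather than a single pose.
    Returns the mean curve, the sampled principal functions, the per-trial
    scores, and the fraction of variance explained by each component.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal modes of variation across the time axis.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = vt[:n_components]
    scores = centered @ components.T
    return mean_curve, components, scores, explained[:n_components]


def reconstruct(mean_curve, components, scores):
    """Rebuild trajectories as mean curve plus a weighted sum of modes."""
    return mean_curve + scores @ components
```

Under these assumptions, truncating to the first few components gives a low-dimensional description of each trial (its scores), and the explained-variance ratios indicate how many modes are needed.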

The choice of fPCA as the main data analysis tool is motivated by the fact that it captures important features of the signal, such as shape and time dependence, which cannot be taken into account by simpler dimensionality reduction techniques (e.g., principal component analysis). To achieve this goal, we propose an experimental setup for studying upper limb movements, based on a Motion Capture (MoCap) system (*Phase Space*® ). Using this tool, we carried out a series of experiments with human subjects considering a comprehensive dataset of activities of daily living (ADLs) and grasping/manipulation actions. These actions were selected relying on the study of grasping taxonomies (Cutkosky, 1989; Feix et al., 2016), and considerations on the human upper limb movement workspace (Lenarcic and Umek, 1994; Abdel-Malek et al., 2004; Perry et al., 2007). Our analysis has led to a reduction of the complexity of upper limb trajectories by describing them as linear combinations of a few principal functions (or modes). Implications for robotics are also discussed.

### **2. EXPERIMENTAL PROTOCOL AND SETUP**

### **2.1. A Set of Daily Living Tasks**

In order to develop a comprehensive study of human upper limb movements, one of the key features for the generation of a valid dataset is the definition of a set of meaningful actions (Santello et al., 1998; Mason et al., 2001; Todorov and Ghahramani, 2004; Vinjamuri et al., 2010). For this reason, we selected a set of movements driven by the study of grasping taxonomies (Cutkosky, 1989; Feix et al., 2016), and the analysis of the human upper limb movement workspace (Lenarcic and Umek, 1994; Abdel-Malek et al., 2004; Perry et al., 2007). This selection resulted in a set of 30 different actions, divided into intransitive, transitive, and tool-mediated actions to avoid bias due to the affordances of the objects used for the grasp investigation. Indeed, as Cubelli et al. (2000) suggested starting from apraxia investigation and Handjaras et al. (2015) confirmed with cortical imaging, different movements are generated by different cortical activations, because they require different motor schemas, depending on the type of interaction with the environment. These movements can be classified into three classes, according to the presence or absence of an object and, if an object is present, to how it is approached: the intransitive class, which collects actions that do not need the use of an object; the transitive class, which collects actions that involve the use of an object; and the tool-mediated class, which collects actions where an object is used to interact with another one. Each task was executed three times with the dominant hand, with the subject seated on a chair and the objects placed on a table in front of the subject at a fixed distance. At the end of the task, the subject returned to the starting point. The complete list of actions is reported in **Table 1**.

### **2.2. An Experimental Setup for Data Acquisition**

We focused on kinematic recordings, which were achieved using a commercial system for 3D motion tracking with active markers (*Phase Space*® ). Ten stereo-cameras working at 480 Hz tracked 3D position of markers, which were fastened to supports rigidly attached to upper limb links. In this manner, 20 markers were accommodated on the upper limb so that the distance between elements of each support was fixed. Supports were suitably designed for these experiments and printed in ABS (see **Figure 1A**). The acquisition was implemented through a custom application developed in C++, employing Boost libraries (Schäling, 2011) to enable the synchronization between Phase Space data and other sensing modalities, such as force/torque sensors and EEG, and the Phase Space OWL library to get the optical tracking system data.

#### **TABLE 1** | Protocol actions.


**FIGURE 1** | [...] setup. The subjects were comfortably seated in front of the table. In the starting position, the subject's hand was located at the right side of the table. Two cameras are included to record the scene.

Seven adult right-handed subjects (5 males and 2 females, aged between 20 and 30) performed the experiment. Each task was repeated three times in order to increase the robustness of the collected data. The experimenter gave the starting signal to the subjects. In the instructions, the experimenter emphasized that the whole movement should be performed in a natural fashion. The object order was randomized for every subject. Each subject performed the whole experiment in a single day. None of the subjects knew the purpose of the study, and none had a history of neuromuscular disorders. Each participant signed an informed consent to participate in the experiment, and the experimental protocol was approved by the Institutional Review Board of the University of Pisa, in accordance with the Declaration of Helsinki. The complete experimental setup is reported in **Figures 1B,C**. Moreover, we used two cameras (Logitech HD 1080p) to record the scene of the experiments in order to visually compare the real and the reconstructed movement.

### **3. MOTION IDENTIFICATION**

### **3.1. Modeling of Upper Limb Kinematics**

An accurate description of the human upper limb is challenging due to the high complexity of its kinematic structure, e.g., the location and direction of the joint axes, which are usually time varying. To explore the system complexity, the interested reader can refer to Maurel and Thalmann (2000) and Holzbaur et al. (2005). In this work, we sought a trade-off between complexity and accuracy, taking inspiration from Benati et al. (1980). This keeps the computational time acceptable while still explaining the physical behavior well. Taking inspiration from Gabiccini et al. (2013), we adopted a model with 7 degrees of freedom (DoFs) and 3 links of invariable shape. Joint angles are defined as *q*<sub>1</sub>, ..., *q*<sub>7</sub>: *q*<sub>1</sub> is associated with the shoulder abduction–adduction; *q*<sub>2</sub> is associated with the shoulder flexion–extension; *q*<sub>3</sub> is associated with the shoulder external–internal rotation; *q*<sub>4</sub> is associated with the elbow flexion–extension; *q*<sub>5</sub> is associated with the elbow pronation–supination; *q*<sub>6</sub> is associated with the wrist abduction–adduction; *q*<sub>7</sub> is associated with the wrist flexion–extension. In **Figure 2A**, a scheme of the model is reported.

### **3.2. Model Parameters**

In order to describe the forward kinematics of the arm, 5 different reference systems were defined: *S*<sub>ref</sub>, centered in *O*<sub>ref</sub>, fixed to the epigastrium; *S*<sub>S</sub>, centered in *O*<sub>S</sub>, the Center of Rotation (CoR) of the shoulder joints, fixed to the arm; *S*<sub>E</sub>, centered in *O*<sub>E</sub>, the CoR of the elbow joints, fixed to the forearm; *S*<sub>W</sub>, centered in *O*<sub>W</sub>, the CoR of the wrist joints, fixed to the hand; *S*<sub>H</sub>, centered in *O*<sub>H</sub>, fixed to the hand. The rigid transform between *S*<sub>ref</sub> and *S*<sub>S</sub> is *T*<sub>O<sub>ref</sub>O<sub>S</sub></sub>; the rigid transform between *S*<sub>S</sub> and *S*<sub>E</sub> is *T*<sub>O<sub>S</sub>O<sub>E</sub></sub>; the rigid transform between *S*<sub>E</sub> and *S*<sub>W</sub> is *T*<sub>O<sub>E</sub>O<sub>W</sub></sub>; the rigid transform between *S*<sub>W</sub> and *S*<sub>H</sub> is *T*<sub>O<sub>W</sub>O<sub>H</sub></sub>. The defined reference systems are shown in red in **Figure 2B**. Green arrows indicate the rigid transforms from one reference system to the next one in the kinematic chain. To parameterize the *j*-th segment, we use the *product of exponentials* (POE) formula (Brockett, 1984):

$$g_{O_{\text{ref}}O_j}(\theta) = \left[\prod_{k=1}^{j} e^{\hat{\xi}_k \theta_k}\right] g_{O_{\text{ref}}O_j}(0),$$

where *ξ̂*<sub>k</sub> are the twists of the joints defining the kinematic chain, *θ* = [*θ*<sub>1</sub>, ..., *θ*<sub>k</sub>, ..., *θ*<sub>j</sub>]<sup>T</sup> are the exponential coordinates of the second kind for a local representation of SE(3) (the Special Euclidean group, i.e., 4 × 4 rototranslation matrices) for the *j*-th link, and *g*<sub>O<sub>ref</sub>O<sub>j</sub></sub>(0) is the initial configuration. For further details, the interested reader can refer to Gabiccini et al. (2013).
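As an illustration of how the product of exponentials formula can be evaluated numerically, the following sketch composes twist exponentials for revolute joints and applies them to an initial configuration. It is an assumption-laden example rather than the model used in this work: twists are stored as (v, ω) with a unit rotation axis, the helper names are hypothetical, and the chain in the usage note is a toy planar arm, not the 7-DoF upper limb model.

```python
import numpy as np


def skew(w):
    """Map a 3-vector to its skew-symmetric (cross-product) matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def revolute_twist(axis, point):
    """Twist (v, w) of a revolute joint with the given axis through `point`."""
    w = np.asarray(axis, dtype=float)
    w = w / np.linalg.norm(w)
    v = -np.cross(w, np.asarray(point, dtype=float))
    return np.concatenate([v, w])


def twist_exponential(xi, theta):
    """Closed-form exp(xi_hat * theta) in SE(3) for a unit-axis revolute twist."""
    v, w = xi[:3], xi[3:]
    w_hat = skew(w)
    # Rodrigues' formula for the rotation part.
    rot = np.eye(3) + np.sin(theta) * w_hat + (1.0 - np.cos(theta)) * (w_hat @ w_hat)
    trans = (np.eye(3) - rot) @ np.cross(w, v) + np.outer(w, w) @ v * theta
    g = np.eye(4)
    g[:3, :3] = rot
    g[:3, 3] = trans
    return g


def poe_forward_kinematics(twists, thetas, g_initial):
    """Evaluate [prod_k exp(xi_hat_k * theta_k)] @ g(0)."""
    g = np.eye(4)
    for xi, theta in zip(twists, thetas):
        g = g @ twist_exponential(xi, theta)
    return g @ g_initial
```

For example, with two parallel joints about the z axis placed at the origin and at (1, 0, 0), and an initial end frame at (2, 0, 0), the joint values (π/2, −π/2) place the end frame at (1, 1, 0), as expected for a planar two-link arm with unit link lengths.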

### **3.3. Markers Modeling**

Link movements were tracked by fastening active optical markers to the upper limb links. Marker positioning is inspired by Biryukova et al. (2000). In order to improve tracking performance, a redundant configuration of markers was used: 4 markers fixed to the chest, 6 markers fixed to the lateral arm, 6 markers fixed to the dorsal forearm, and 4 markers fixed to the hand dorsum. A picture showing the marker distribution is reported in **Figure 1A**. The position of each marker can be calculated as a rigid transform w.r.t. the center of the corresponding support. The kinematics of each support can be described as a rigid transform from the link reference system to the support reference system, as depicted in **Figure 3**.

**FIGURE 2** | [...] reference system centered in *O*<sub>W</sub>, the CoR of the wrist joints, fixed to the hand; *S*<sub>H</sub> refers to the reference system centered in *O*<sub>H</sub>, fixed to the hand. The label *T*<sub>O<sub>ref</sub>O<sub>S</sub></sub> refers to the rigid transform between *S*<sub>ref</sub> and *S*<sub>S</sub>; *T*<sub>O<sub>S</sub>O<sub>E</sub></sub> refers to the rigid transform between *S*<sub>S</sub> and *S*<sub>E</sub>; *T*<sub>O<sub>E</sub>O<sub>W</sub></sub> refers to the rigid transform between *S*<sub>E</sub> and *S*<sub>W</sub>; *T*<sub>O<sub>W</sub>O<sub>H</sub></sub> refers to the rigid transform between *S*<sub>W</sub> and *S*<sub>H</sub>.

**FIGURE 3** | [...] figure, we report the hand marker position.

The model is completely parameterized using 14 parameters (different for each subject) collected in a vector *p<sub>G</sub>*: bone lengths (arm and forearm, 2 parameters); rigid transform from the epigastrium to the shoulder CoR (3 parameters); rigid transform from the shoulder CoR to the center of the arm marker support (3 parameters); rigid transform from the elbow CoR to the center of the forearm marker support (3 parameters); and rigid transform from the wrist CoR to the center of the hand marker support (3 parameters). The parameter vector *p<sub>G</sub>* was calibrated for each subject. Given *p<sub>G</sub>*, the upper limb pose is described by 7 joint angles [*q*<sub>1</sub>, …, *q*<sub>7</sub>]<sup>T</sup> collected in a vector *x*.

### **3.4. Model Calibration and Angles Estimation**

As previously mentioned, the parameters of the kinematic model must be adapted for the specific subject that performs the experiments. The optimal parameters were obtained by solving a constrained least-squares minimization problem:

$$(\mathbf{x}^\*, p\_G^\*) = \arg\min\_{\mathbf{x}\_k \in D\_{\mathbf{x}}, p\_G \in D\_{\mathbf{p}}} \frac{1}{2} \sum\_{k=1}^{N\_p} r\_k^T r\_k.$$

The residual function *r<sub>k</sub>* is calculated as *r<sub>k</sub>*(*x<sub>k</sub>*, *p<sub>G</sub>*) := *y<sub>k</sub>* − *f*(*x<sub>k</sub>*, *p<sub>G</sub>*), where *y<sub>k</sub>* is the marker position vector measured with Phase Space; *x<sub>k</sub>* is the vector of estimated joint angles; *p<sub>G</sub>* is the vector of kinematic model parameters; *D<sub>x</sub>* is the range of motion of the upper limb joints; *D<sub>p</sub>* is the variation around a preliminary estimate of the parameters obtained from manual measurements; and *f*(*x<sub>k</sub>*, *p<sub>G</sub>*) is the vector of marker positions estimated using the forward kinematics. The measurement vector *y<sub>k</sub>* and the estimate vector *f*(*x<sub>k</sub>*, *p<sub>G</sub>*) can be obtained by concatenating the measured and estimated marker positions at different time frames. To obtain an effective calibration, the frames selected for the calibration procedure must cover different poses of the kinematic chain. For the experiments reported in this work, *r<sub>k</sub>*, normalized w.r.t. the dimension of *y<sub>k</sub>*, was equal to 15.30 ± 16.25 mm; as an example, **Figure 4** shows the values of *r<sub>k</sub>* in a sample task.

Taking inspiration from Gabiccini et al. (2013), the calibrated model was then used to identify the joint angles using an Extended Kalman Filter (EKF). Indeed, the model can be considered as an uncertain, noisy process in which, at time frame *k*, the joint angle vector *x<sub>k</sub>* is the state of the process, *y<sub>k</sub>* is the marker position vector, *w<sub>k</sub>* and *v<sub>k</sub>* are zero-mean Gaussian process and observation noises with covariances *Q<sub>k</sub>* and *R<sub>k</sub>*, respectively, and *f*(*x<sub>k</sub>*) is the forward kinematics. The system can be described using the following equations:

$$\begin{cases} \mathbf{x}\_k = \mathbf{x}\_{k-1} + \mathbf{w}\_k\\ \mathbf{y}\_k = f(\mathbf{x}\_k) + \mathbf{v}\_k \end{cases} . \tag{1}$$

Given the state at time frame *k*−1, the state at time *k* was obtained using a two-step procedure: *prediction* of the future state, *x̂<sub>k|k−1</sub>* = *x̂<sub>k−1</sub>*; and *update* of the state estimated in the first step, *x̂<sub>k|k</sub>* = *x̂<sub>k|k−1</sub>* + *K<sub>k</sub>r̃<sub>k</sub>*. The correction applied to the state prediction is the product of the residual vector *r̃<sub>k</sub>* = *y<sub>k</sub>* − *f*(*x̂<sub>k|k−1</sub>*) and the Kalman gain *K<sub>k</sub>*. This gain is calculated as the product of the estimated covariance matrix of the predicted state *P<sub>k|k−1</sub>*, the (transposed) Jacobian matrix *H<sub>k</sub>* = ∂*f*(*x*)/∂*x*, and the inverse of the residual covariance matrix.
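The prediction and update steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the two-angle planar forward kinematics `f`, the link lengths, the finite-difference Jacobian, and the covariance values are placeholders chosen for the example, not the 7-DoF model or the tuning used in this work.

```python
import numpy as np

L_ARM, L_FOREARM = 0.30, 0.25   # illustrative link lengths (m)

def f(x):
    """Toy forward kinematics: planar positions of an 'elbow' and a 'wrist' marker."""
    elbow = L_ARM * np.array([np.cos(x[0]), np.sin(x[0])])
    wrist = elbow + L_FOREARM * np.array([np.cos(x[0] + x[1]), np.sin(x[0] + x[1])])
    return np.concatenate([elbow, wrist])

def numerical_jacobian(func, x, eps=1e-6):
    """Forward-difference approximation of H = d func / d x."""
    y0 = func(x)
    H = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        H[:, i] = (func(x + dx) - y0) / eps
    return H

def ekf_step(x_prev, P_prev, y, func, Q, R):
    """One EKF iteration for the random-walk model x_k = x_{k-1} + w_k."""
    # Prediction: the state is carried over, its covariance grows by Q
    x_pred = x_prev.copy()
    P_pred = P_prev + Q
    # Update: correct the prediction with the gain-weighted residual
    H = numerical_jacobian(func, x_pred)
    r = y - func(x_pred)
    S = H @ P_pred @ H.T + R                  # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ r
    P_new = (np.eye(x_prev.size) - K @ H) @ P_pred
    return x_new, P_new

x_true = np.array([0.6, -0.4])                # angles to be identified (rad)
y = f(x_true)                                 # noise-free "marker" measurement
x_hat, P = np.zeros(2), np.eye(2)
Q, R = 1e-2 * np.eye(2), 1e-4 * np.eye(4)
for _ in range(50):                           # same frame repeated, to show convergence
    x_hat, P = ekf_step(x_hat, P, y, f, Q, R)
```

In a real recording each iteration would use the marker vector of a new time frame; the repeated frame here only shows that the estimate settles on the angles that generated the measurement.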

The performance of the estimation tool for time frame k can be evaluated by calculating the mean squared error (MSE) *R<sup>k</sup>* as

$$R\_k = \frac{1}{N\_{markers}} \left\| \mathbf{y}\_k - f(\hat{\mathbf{x}}\_{k|k}) \right\|$$

where *N<sub>markers</sub>* is the number of markers, *y<sub>k</sub>* is the marker position vector, and *f*(*x̂<sub>k|k</sub>*) is the vector of marker positions estimated using the joint angles calculated with the EKF. Typical values of *R<sub>k</sub>* are ≈1 cm, with a mean error for hand and forearm markers of ≈0.5 cm and a mean error for arm markers of ≈1.8 cm. The identified angles were used to reconstruct the movements, which were represented using a virtual model of the upper limb implemented in MATLAB. The reconstructions were visually checked against the real movement recorded with cameras, which makes it possible to verify the human-likeness of the reconstructed movement.

### **4. DATA ANALYSIS**

The goal of this work is the study of functional motor synergies of the upper limb. This is accomplished using functional PCA (fPCA), a statistical method that allows studying the differences in shape between functions. To avoid including in this analysis undesired features due to misalignments in time or in velocity of the samples, we applied the following pre-processing steps: segmentation, to separate the repetitions of each task, and time warping, to synchronize in time all the elements of the dataset.

### **4.1. Segmentation**

For each task, the three repetitions have been segmented using the following procedure (see **Figure 5**):


*q*<sub>3</sub> data (i.e., shoulder flexion–extension) were used for segmentation because they almost always contain three distinct peaks. If the peaks were not detectable, another DoF with detectable peaks was used instead. Note that the segmentation is performed using the same pair of segmentation points for all 7 DoFs.

Considering different subjects and tasks, differences between shapes are evident (see **Figure 6A**). The three repetitions of each task should be replaced by the corresponding mean vector to increase robustness. This replacement can be performed only after signal synchronization, achieved using a time-warping procedure.

### **4.2. Time Warping**

Synchronizing two signals increases their affinity by aligning the start time and the speed of the action. This can be achieved by finding the optimal time-shift and time-stretch of one signal w.r.t. the other. This problem is known in the literature as *dynamic time warping* (DTW) and has been widely explored in sound engineering and pattern recognition (Rabiner et al., 1978; Berndt and Clifford, 1994; Müller, 2007; Salvador and Chan, 2007). In this work, DTW is needed to allow averaging of the three repetitions, to avoid misalignment, and to compare data from different tasks and subjects. For the problem explored in this work, the following assumptions were made: monotonicity, to preserve data integrity, and linear distortion of time. Given two time series, *v*<sub>1</sub> and *v*<sub>2</sub>, the affinity between the two signals is increased by solving the following least-squares minimization problem:

$$(S, T) = \arg\min\_{S>0, T} \left( \| v\_1(t) - v\_2(St - T) \| \right)$$

where *S* is a scaling factor for the velocity of signal *v*<sub>2</sub> and *T* is the time shift applied to *v*<sub>2</sub>. The dataset elements were time-warped w.r.t. a reference time series, selected in the set as the element whose length is the mean value of the lengths of all dataset elements. For each element, *S* and *T* are calculated by performing DTW on the DoF used for segmentation; then all the components are time-warped using the optimal set of parameters. The time-warped vectors have the same number of frames (number of elements). Once the time warping has been performed on all the dataset elements, the three repetitions of each task can be replaced by the corresponding mean vector. A sample output of this procedure is reported in **Figure 6B**. In **Figure 7**, a scheme of the data analysis procedure is reported.
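Under the linear-distortion assumption, the search for *S* and *T* can be sketched as follows. The brute-force grid search, the linear interpolation, the Gaussian test signal, and all function names are assumptions made for illustration; the optimizer actually used for the dataset is not specified here.

```python
import numpy as np

def warp(v2, t, S, T):
    """Sample v2 at the warped times S*t - T (linear interpolation, edge values held)."""
    return np.interp(S * t - T, t, v2)

def fit_linear_warp(v1, v2, t, S_grid, T_grid):
    """Grid search for (S, T) minimizing ||v1(t) - v2(S t - T)||."""
    best_cost, best_S, best_T = np.inf, None, None
    for S in S_grid:
        for T in T_grid:
            cost = np.linalg.norm(v1 - warp(v2, t, S, T))
            if cost < best_cost:
                best_cost, best_S, best_T = cost, S, T
    return best_S, best_T

def bump(t):
    """Synthetic single-peak trajectory used only for this example."""
    return np.exp(-(t - 4.0) ** 2)

t = np.linspace(0.0, 10.0, 501)
v2 = bump(t)                          # signal to be warped
v1 = bump(1.25 * t - 0.5)             # reference: v2 stretched by 1.25 and shifted by 0.5
S_grid = np.linspace(0.5, 2.0, 31)    # S > 0 keeps the time mapping monotonic
T_grid = np.linspace(-2.0, 2.0, 41)
S_opt, T_opt = fit_linear_warp(v1, v2, t, S_grid, T_grid)
v2_warped = warp(v2, t, S_opt, T_opt)
```

Once `S_opt` and `T_opt` are found on the DoF used for segmentation, the same `warp` call could be applied to the remaining DoFs so that all components share one time mapping.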

### **4.3. Principal Component Analysis**

To explain fPCA, it is useful to start from classic principal component analysis (PCA). PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component (PC) has the largest possible variance (that is, it accounts for the largest part of the variability in the data). The other components explain decreasing amounts of variance, with the constraint that each principal component is orthogonal to the previous ones. Hence, the resulting vectors form an orthogonal basis set. Principal components are calculated as eigenvectors of the covariance matrix of the data. The variance explained by each PC is calculated by normalizing the corresponding eigenvalue by the sum of all eigenvalues. Given the first eigenvector *ξ*<sub>1</sub>, the principal component score *f<sub>i1</sub>* = *ξ*<sub>1</sub>′*x<sub>i</sub>* maximizes ∑<sub>*i*</sub> *f<sub>i1</sub>*<sup>2</sup> subject to ‖*ξ*<sub>1</sub>‖ = 1; the second eigenvector *ξ*<sub>2</sub> maximizes ∑<sub>*i*</sub> *f<sub>i2</sub>*<sup>2</sup> subject to ‖*ξ*<sub>2</sub>‖ = 1 and *ξ*<sub>2</sub>′*ξ*<sub>1</sub> = 0, and so on.
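The eigen-decomposition view of PCA can be illustrated with a short sketch. The synthetic data and the function name `pca` are assumptions used only for this example.

```python
import numpy as np

def pca(X):
    """PCA of the rows of X via eigen-decomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                      # center each variable
    C = np.cov(Xc, rowvar=False)                 # covariance matrix of the data
    eigval, eigvec = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]             # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()            # normalized eigenvalues
    scores = Xc @ eigvec                         # f_ik = xi_k' x_i
    return eigvec, explained, scores

# Synthetic observations with very different variances along three axes
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
components, explained, scores = pca(X)
score_cov = np.cov(scores, rowvar=False)         # diagonal: the scores are uncorrelated
```

The columns of `components` are orthonormal, and the off-diagonal entries of `score_cov` vanish up to rounding, which is the "linearly uncorrelated" property stated above.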

### **4.4. Functional Principal Component Analysis**

Functional PCA can be described as a functional extension of PCA. The first functional principal component *ξ*<sub>1</sub>(*t*) is the function for which the principal component score *f<sub>i1</sub>* = ∫ *ξ*<sub>1</sub>(*t*)*x<sub>i</sub>*(*t*) d*t* maximizes ∑<sub>*i*</sub> *f<sub>i1</sub>*<sup>2</sup> subject to ∫ *ξ*<sub>1</sub><sup>2</sup>(*t*) d*t* = ‖*ξ*<sub>1</sub>‖<sup>2</sup> = 1; the second functional principal component *ξ*<sub>2</sub>(*t*) maximizes ∑<sub>*i*</sub> *f<sub>i2</sub>*<sup>2</sup> subject to ‖*ξ*<sub>2</sub>‖ = 1 and ∫ *ξ*<sub>2</sub>(*t*)*ξ*<sub>1</sub>(*t*) d*t* = 0, and so on. In practice, this is done implementing the following steps:


### **4.5. Movement Reconstruction and Performance Analysis**

We applied fPCA to this dataset after the pre-processing phase described in the previous sections. Fifteen fifth-order spline basis elements were used, taking inspiration from the polynomial description in Flash and Hogan (1985). Each basis function is defined by piecewise polynomial functions. The places where the pieces of the spline join are known as knots. Each piece has the following form

$$s\_k(t) = \sum\_{i=1}^{5} a\_{ik}(t - t\_k)^i$$

where *t<sub>k</sub>* is the *k*-th knot. The fPCs can be used to reconstruct a data sample by adding *M* fPCs weighted by coefficients *c<sub>i</sub>*, i.e., *x<sub>rec</sub>* = *x̄* + *c*<sub>1</sub>*ξ*<sub>1</sub> + … + *c<sub>i</sub>ξ<sub>i</sub>* + … + *c<sub>M</sub>ξ<sub>M</sub>*, with *M* ≤ *N*.
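The reconstruction formula can be sketched on sampled curves. Note the simplifying assumption: the fPCs are computed here directly on the time samples through a singular value decomposition, whereas the analysis in this work expands the curves on a spline basis first. The synthetic curves and the names `fpca` and `reconstruct` are illustrative only.

```python
import numpy as np

def fpca(curves, dt):
    """Discretized fPCA; `curves` has shape (n_curves, n_frames)."""
    mean = curves.mean(axis=0)
    _, s, vt = np.linalg.svd(curves - mean, full_matrices=False)
    fpcs = vt / np.sqrt(dt)                  # scaled so that sum(fpc**2) * dt == 1
    explained = s ** 2 / np.sum(s ** 2)      # fraction of variance per fPC
    return mean, fpcs, explained

def reconstruct(x, mean, fpcs, M, dt):
    """x_rec = mean + c_1 xi_1 + ... + c_M xi_M."""
    c = (fpcs[:M] @ (x - mean)) * dt         # c_i ~ integral of xi_i(t) (x(t) - mean(t)) dt
    return mean + c @ fpcs[:M]

# Synthetic "joint trajectories": a common trend plus two variable shape modes
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
a = rng.normal(1.0, 0.5, size=(20, 1))
b = rng.normal(0.0, 0.2, size=(20, 1))
curves = 0.3 * t + a * np.sin(np.pi * t) + b * np.sin(2 * np.pi * t)

mean, fpcs, explained = fpca(curves, dt)
errors = [np.linalg.norm(curves[0] - reconstruct(curves[0], mean, fpcs, M, dt))
          for M in (1, 2)]
```

Because the synthetic curves vary along only two shape modes, two fPCs reproduce each curve exactly, while one fPC leaves a residual; real joint trajectories would show a more gradual decay.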

This analysis shows that the first fPC alone accounts for 60–70% of the variation w.r.t. the mean function, as reported in **Figure 8**, with a mean value across the DoFs of 65.2%, a minimum of 54.4%, and a maximum of 76.9%. Notably, reconstruction with the first fPCs provides good results: the variance explained by the first three fPCs is higher than 84% for all DoFs. In **Figure 9**, we show how the main principal functions can shape the reconstruction of the joint trajectories. An individual basis function does not need to represent a meaningful movement. What is needed is that a combination of basis elements (plus an offset) can reproduce any original joint trajectory in the dataset. The reconstruction performance is shown in **Figure 10A**, in which reconstructions using 1, 2, and 3 fPCs are reported. To quantify the reconstruction performance, an index of reconstruction error can be evaluated as

$$E\_{RMS} = \sqrt{\frac{1}{N\_{DoF}} \sum\_{i=1}^{N\_{DoF}} \left( \sqrt{\frac{1}{N\_{frames}} \sum\_{j=1}^{N\_{frames}} \left( \mathbf{x} - \mathbf{x}\_{rec} \right)^2} \right)^2}$$

where *x* is the real function and *xrec* is the reconstructed function.

**Figure 10B** reports a plot of the normalized error, calculated as *E<sub>RMS</sub>*/max(*E<sub>RMS</sub>*), for different numbers of fPCs. The initial point refers to the case where only the mean function is used for reconstruction, for which *E<sub>RMS</sub>* is 0.6 rad. The reconstruction using one fPC has an *E<sub>RMS</sub>* lower than 0.2 rad; adding further fPCs makes the reconstruction error decay, e.g., using three fPCs the *E<sub>RMS</sub>* is around 0.1 rad. Furthermore, the whole reconstructed movement of the upper limb (considering all DoFs) was displayed using a visualization tool developed in MATLAB, showing a high level of anthropomorphism and realism. We can conclude that the kinematic complexity of upper limb trajectories can be simplified and easily described using the mean function and a few principal functional modes.
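The error index and its normalization can be sketched as follows, reading *E<sub>RMS</sub>* as a root mean square over frames followed by a root mean square over DoFs, an interpretation consistent with the error being reported in rad. The toy trajectories and offsets are assumptions for illustration and are not the values plotted in **Figure 10B**.

```python
import numpy as np

def e_rms(x, x_rec):
    """Reconstruction error; x and x_rec have shape (n_dof, n_frames), in rad."""
    per_dof = np.sqrt(np.mean((x - x_rec) ** 2, axis=1))   # RMS over frames, per DoF
    return np.sqrt(np.mean(per_dof ** 2))                  # RMS over DoFs

t = np.linspace(0.0, 1.0, 200)
x = np.tile(np.sin(np.pi * t), (7, 1))     # 7 DoFs sharing one toy trajectory
x_rec_coarse = x + 0.10                    # reconstruction off by a constant 0.10 rad
x_rec_fine = x + 0.02                      # reconstruction off by a constant 0.02 rad

errors = np.array([e_rms(x, x_rec_coarse), e_rms(x, x_rec_fine)])
normalized = errors / errors.max()         # E_RMS / max(E_RMS)
```

A constant offset gives an error equal to the offset itself, which makes the index easy to sanity-check before applying it to real reconstructions.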

**FIGURE 9** | In **(A)**, we report the mean function (in black) and the same mean function with the contribution of the first principal function, weighted with a coefficient *α* equal to one (with positive sign in red dashed line, with negative sign in red dotted line); in **(B)**, we report the mean function (in black) and the same mean function with the contribution of the second principal function with a coefficient *α* equal to one (same legend as **(A)**); in **(C)**, we report the mean function (in black) and the same mean function with the contribution of the third principal function with a coefficient *α* equal to one (same legend as **(A)**).

**FIGURE 10** | […] the first principal component. The blue line is the reconstructed data using the mean values and the first two principal components. The green line is the reconstructed data using the mean values and the first three principal components. In **(B)** we report the normalized reconstruction error (RMS). The initial point refers to the error when only the mean function is used for reconstruction. The other points refer to the error when one or more fPCs are used for the reconstruction.

### **5. CONCLUSION AND IMPLICATIONS FOR ROBOTICS AND BIOENGINEERING**

In this work, we have shown that the complexity of upper limb movements in activities of daily living can be described using a reduced number of functional principal components. To achieve this goal, we developed an experimental setup, which is based on kinematic recordings but also allows additional sensing modalities to be included. Kinematic data are based on a 7-DoF model and are quantified through a calibration-identification procedure. Collected data were used to characterize upper limb movements through functional analysis. The findings of this work can be used to pave the path toward a more accurate characterization of human upper limb principal modes, opening fascinating scenarios in rehabilitation, e.g., for automatic recognition of physiological and pathological movements (e.g., stroke affected subjects) through machine learning.

At the same time, the results reported here and future investigations could also offer valuable inspiration for the design and control of robotic manipulators. First, recognizing that a few principal modes describe most of the kinematic variability could provide insights for more effective planning and control of robotic manipulators. For the planning phase, input trajectories built as combinations of the main functional components, which explain most of the kinematic variability, could provide a successful initial guess to control the movement of the robot, possibly combined with a feedback correction. This combination of feedforward and feedback components could also be successfully employed with soft robotic manipulators, i.e., robots designed to embody safe and natural behaviors relying on compliant physical structures purposefully used to achieve desirable and sometimes variable impedance characteristics. In these cases, standard methods of robotic control can effectively fight against or even completely cancel the physical dynamics of the system, replacing them with a desired model, which defeats the purpose of introducing physical compliance. To overcome this limitation, Della Santina et al. (2017) proposed an anticipative model of human motor control, which used a feedforward action combined with low-gain feedback, with the goal of obtaining human-like behavior through iterative learning. The results presented in this work could be used to define the feedforward component for the control of soft robots. Second, using human-like primitives for controlling robotic systems could improve the effectiveness and safety of human–robot interaction (HRI). Indeed, several studies identified anthropomorphism as one of the key enabling factors for successful, acceptable, predictable, and safe HRI in many fields, such as human–robot co-working and rehabilitative/assistive robotics (Duffy, 2003; Bartneck et al., 2009; Riek et al., 2009; Dragan and Srinivasa, 2014).
Furthermore, the experimental and analytical framework reported here could be used to identify principal actuation schemes for under-actuated robotic devices. As an example, in Casini et al. (2017), we used the identification procedure and the kinematic model reported in this work to estimate the contribution of wrist joints in the most common poses for grasping. We performed PCA on the estimated wrist joint angles in the pre-grasp poses and found that the flexion–extension DoF plays a dominant role. We used these results to calibrate an under-actuated wrist system, which is also adaptable and allows different under-actuation schemes to be implemented, demonstrating its effectiveness in accomplishing grasping and manipulation tasks. Future work will aim at using functional data to allow a dynamic implementation of the principal kinematic modes of the human upper limb in robotic systems. Finally, the integration of other sensing modalities, such as electro-encephalographic recordings, could be used to study neural correlates of human upper limb motions, thus possibly inspiring the development of effective brain–machine interfaces for assistive robotics.

### **ETHICS STATEMENT**

This study was carried out in accordance with the recommendations of Regione Toscana, D.G.R. no. 158 23/02/2004, "Direttive regionali in materia di autorizzazione e procedure di valutazione degli studi osservazionali," with written informed consent from all subjects in accordance with the Declaration of Helsinki, and in observation of the "Guideline for good clinical practice E6(R1), International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use (ICH)." The protocol was approved by the local Ethical Committee, i.e., "Comitato Etico di Area Vasta Nord-Ovest (CEAVNO)."

### **AUTHOR CONTRIBUTIONS**

GA, CS, MB, and AB designed the study. GA, FF, and MB designed the protocol. GA, EB, and FF designed and developed the experimental setup. GA and FF performed the experiments. GA, CS, and MB performed data analysis. All authors contributed to writing the manuscript.

### **FUNDING**

This work is supported by the European Commission H2020 grants "SOMA" (no. 645599) and SOFTPRO (no. 688857) and by the ERC Advanced Grant no. 291166 "SoftHands."

### **REFERENCES**

*ACM/IEEE International Conference on Human Robot Interaction* (La Jolla, CA: ACM), 245–246.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Averta, Della Santina, Battaglia, Felici, Bianchi and Bicchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Tactile-STAR: A Novel Tactile STimulator And Recorder System for Evaluating and Improving Tactile Perception

Giulia Ballardini<sup>1</sup>, Giorgio Carlini<sup>1</sup>, Psiche Giannoni<sup>1</sup>, Robert A. Scheidt<sup>2,3,4†</sup>, Ilana Nisky<sup>5,6†</sup> and Maura Casadio<sup>1\*†</sup>

<sup>1</sup> Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy, <sup>2</sup> Marquette University and the Medical College of Wisconsin, Milwaukee, WI, United States, <sup>3</sup> Feinberg School of Medicine, Northwestern University, Chicago, IL, United States, <sup>4</sup> Division of Civil, Mechanical and Manufacturing Innovation, National Science Foundation, Alexandria, VA, United States, <sup>5</sup> Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beersheba, Israel, <sup>6</sup> Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beersheba, Israel

Many neurological diseases impair the motor and somatosensory systems. While several different technologies are used in clinical practice to assess and improve motor functions, somatosensation is evaluated subjectively with qualitative clinical scales. Treatment of somatosensory deficits has received limited attention. To bridge the gap between the assessment and training of motor vs. somatosensory abilities, we designed, developed, and tested a novel, low-cost, two-component (bimanual) mechatronic system targeting tactile somatosensation: the Tactile-STAR, a tactile stimulator and recorder. The stimulator is an actuated pantograph structure driven by two servomotors, with an end-effector covered by a rubber material that can apply two different types of skin stimulation: brush and stretch. The stimulator has a modular design, and can be used to test the tactile perception in different parts of the body such as the hand, arm, leg, big toe, etc. The recorder is a passive pantograph that can measure hand motion using two potentiometers. The recorder can serve multiple purposes: participants can move its handle to match the direction and amplitude of the tactile stimulator, or they can use it as a master manipulator to control the tactile stimulator as a slave. Our ultimate goal is to assess and affect tactile acuity and somatosensory deficits. To demonstrate the feasibility of our novel system, we tested the Tactile-STAR with 16 healthy individuals and with three stroke survivors using the skin-brush stimulation. We verified that the system enables the mapping of tactile perception on the hand in both populations. We also tested the extent to which 30 min of training in healthy individuals led to an improvement of tactile perception.
The results provide a first demonstration of the ability of this new system to characterize tactile perception in healthy individuals, as well as a quantification of the magnitude and pattern of tactile impairment in a small cohort of stroke survivors. The finding that short-term training with Tactile-STAR can improve the acuity of tactile perception in healthy individuals suggests that Tactile-STAR may have utility as a therapeutic intervention for somatosensory deficits.

Keywords: tactile stimulation, somatosensory function, skin stretch, skin brush, stroke, neurological disease, haptics

#### Edited by:

Gionata Salvietti, University of Siena, Italy

#### Reviewed by:

Luigi Tamè, Birkbeck, University of London, United Kingdom Jakob Fröhner, Technische Universität München, Germany

#### \*Correspondence:

Maura Casadio maura.casadio@unige.it †These authors have contributed equally to this work.

Received: 28 November 2017 Accepted: 05 March 2018 Published: 06 April 2018

#### Citation:

Ballardini G, Carlini G, Giannoni P, Scheidt RA, Nisky I and Casadio M (2018) Tactile-STAR: A Novel Tactile STimulator And Recorder System for Evaluating and Improving Tactile Perception. Front. Neurorobot. 12:12. doi: 10.3389/fnbot.2018.00012

## INTRODUCTION


Many people with neurological diseases suffer from impairments of the motor and the somatosensory functions. Reliable methods to quantify somatosensory deficits are crucial for better understanding the pathophysiology of the diseases and for enhancing the detection of early symptoms and informing novel neuro-rehabilitative approaches to improve upper-limb functions and quality of life.

Impaired somatosensory function significantly affects the quality of daily living. Somatosensation is critical for autonomy in the environment and purposeful interaction with the external world. An example of a somatosensory ability of a healthy individual is identifying an object using only haptic perception, or stereognosis (Irving, 1968). It entails active haptic exploration (Jones and Lederman, 2006), and incorporates both movement control to manipulate the object with the fingers, and the sensory capacity to provide cues from texture, size, spatial properties, and temperature (Yekutiel et al., 1994). Other examples include perception of stiffness or other mechanical properties (Jones and Hunter, 1993; Leib et al., 2016), and sensing contact and friction forces for manipulation of objects and prevention of their slippage (Kandel et al., 2000; Johansson and Flanagan, 2009).

There are two main somatosensory systems that are vital to daily functions—kinesthetic and tactile. The kinesthetic system provides information about the position and movement of the body and limbs (proprioception) using muscle spindles and joint mechanoreceptors, and force information using the Golgi tendon organs (Winter et al., 2005; Proske and Gandevia, 2009, 2012). The tactile (or cutaneous) system provides information about contact with objects using mechanoreceptors in the skin (Demain et al., 2013). Information from these two systems is integrated in the central nervous system (Gurari and Okamura, 2014; Culbertson et al., 2018) together with predictions from internal representations (Körding and Wolpert, 2004) to create perception of the external world and the body schema, to plan and control movement (Morasso et al., 2015; Farajian et al., unpublished), and acquire skill (Vidoni et al., 2010). In this study we focus on the tactile system.

In neurological assessment, somatosensory functions are most often subjectively assessed by clinicians using qualitative clinical scales (Winward et al., 1999; Scott and Dukelow, 2011). Several approaches are currently used to assess tactile acuity (Craig and Johnson, 2000), including two-point threshold, gap detection (Stevens and Choo, 1996), and grating orientation. The latter is a reliable index of recovery following nerve damage (Van Boven and Johnson, 1994). An example of a quick and low-cost tool to detect thresholds for mechanical stimuli is the von Frey filaments (Von Frey, 1896; Johansson et al., 1980; Woolf, 1983; Lambert et al., 2009). However, all of these approaches evaluate static tactile acuity. By contrast, clinicians often assess somatosensation by touching the skin of the patients to evaluate their ability to detect the extent and the direction of a moving tactile stimulus. Quantifying such dynamic acuity during neurological examination remains difficult because of the limited sensitivity and reproducibility of the clinical tests.

The introduction of robotic technologies into clinical assessment and treatment has advanced the understanding and the treatment of motor functions in many neurological diseases (Prange et al., 2006; Kwakkel et al., 2008; Mehrholz et al., 2012; Norouzi-Gheidari et al., 2012; Basteris et al., 2014; Klamroth-Marganska et al., 2014). In contrast to this vast proliferation of robotic technologies in rehabilitation of motor functions, the somatosensory functions have received less attention. Specifically, robotic technologies have been successfully used to quantify and characterize proprioceptive deficits in the research domain (Carey et al., 1996; Dukelow et al., 2010, 2012; Wilson et al., 2010; Simo et al., 2011; Semrau et al., 2013; Domingo and Lam, 2014; Aman et al., 2015; De Santis et al., 2015; Chisholm et al., 2016; Kuczynski et al., 2016; Maggioni et al., 2016; Marini et al., 2016, 2017), but their use in clinical settings is still limited. One possible factor impeding wider adoption of the proposed technological solutions in the clinic is their high cost. To date, in this domain, the tactile system has been almost neglected.

In comparison to the above-mentioned robotic technologies, tactile stimulation devices are often low cost, small, lightweight, and can be easily integrated into wearable technologies. These qualities make tactile stimulation technology attractive for rehabilitation and clinical assessment, especially in ambulatory conditions. Tactile feedback can be provided by using electrical and mechanical stimulations. Electrotactile stimulation involves passing an electrical current through the skin (Szeto and Saunders, 1982). It has been demonstrated that this type of stimulation has positive effects on motor performance, limb sensation, and the configuration of sensory evoked potentials of the paretic limb in people with chronic stroke (Peurala et al., 2002). Mechanical stimulation can be produced by vibration, pressure, or skin stretch (Demain et al., 2013; Culbertson et al., 2018). Specifically, vibrotactile stimulation is very prominent and simple to administer, and the frequency of the delivered vibration can be modulated to convey information (Sherrick et al., 1990). It has been shown useful, for example, to synthesize and deliver vibrotactile kinesthetic feedback to enhance stabilization and reaching actions performed with the arm and hand in neurotypical people (Krueger et al., 2017) and to improve proprioception (Cuppone et al., 2016). However, some users report continuous vibration to be annoying (Bark et al., 2008). Another limitation of the vibration approach is that the Pacinian corpuscles that detect vibration have large receptive fields, and therefore, the source of the vibration cannot be accurately localized (Bark et al., 2008).

In recent years, significant progress has been made in the development of devices for tactile stimulation that deform the skin by indentation or stretch (Drewing et al., 2005; Lévesque et al., 2005; Luk et al., 2006; Gleeson et al., 2010; Prattichizzo et al., 2012; Quek et al., 2014b, 2015a,b; Memeo and Brayda, 2016; Schorr and Okamura, 2017). There are many different mechanical approaches to applying skin stimulation, including rotation of an end-effector on the skin (Bark et al., 2009; Chinello et al., 2016; Battaglia et al., 2017) or movement of a rigid end-effector against the user's fingerpad (Kuniyasu et al., 2012; Schorr et al., 2013; Quek et al., 2014a, 2015b). Skin stretch is very effective in providing users with rich information; for example, stretch of the skin can augment perception of stiffness (Quek et al., 2014a), force magnitude (Paré et al., 2002), and friction (Provancher and Sylvester, 2009). Importantly, skin stretch can be used to convey directional information (Gleeson et al., 2009), and even replace kinesthetic information in navigation tasks (Guinan et al., 2013; Quek et al., 2014b, 2015a). A skin-stretch device was used to substitute for force in teleoperated palpation, more effectively than the widely used vibration feedback (Schorr et al., 2015), and in a virtual peg-in-hole insertion task (Quek et al., 2015b). This task, in which participants have to insert an elongated peg into a narrow hole, is often used for evaluation of robotic interfaces.

In most of these applications, skin stretch was applied to the fingertip (Pacchierotti et al., 2017). At other body locations with larger surface areas and rougher skin, brush stimulation may be more effective. We define tactile brushing as the application of slight pressure while moving along the surface of the skin. Therefore, in the current work, we designed a device that can apply a stretch or a brush stimulation to different parts of the body, and focused on brush stimulation for our evaluation.

The long-term goal of our study is to develop a low-cost haptic device for assessing and rehabilitating somatosensation in subjects suffering from sensorimotor deficits. This device shall be able to apply skin-brush and skin-stretch stimuli to various parts of the body. Toward this goal, here we aimed at: (1) designing a first prototype of the device: the Tactile-STAR—a tactile stimulator and recorder, (2) validating its utility in the assessment and training of tactile acuity by collecting normative performance and training data in healthy human participants, and (3) demonstrating its ability to detect and quantify somatosensory deficits in a small cohort of stroke survivors.

## MATERIALS AND METHODS

### System Design and Implementation

The Tactile-STAR system is composed of two interconnected devices (**Figure 1**). The first device, the stimulator, is an actuated pantograph structure driven by two servomotors. The end-effector of the stimulator is covered by a cap of rubber material that moves in contact with the skin. Depending on its mechanical configuration, the device can provide different forms of tactile stimulation (see below). The second device, the recorder, is a passive pantograph that measures the motion of its handle (its end effector) using two precision potentiometers. Both systems interface to an Arduino microcontroller system, which also interfaces to a laptop computer that runs a LabVIEW® 2016 "virtual instrument" (National Instruments Inc.) that monitors the state of both systems, controls the state of the stimulator device, and provides user interfaces for the experimenter and the research participant.

### The Pantograph Structures

Both the stimulator and the recorder have identical pantograph structures with four links and two degrees of freedom (Campion et al., 2005); see section 1. "Direct and Inverse Kinematics of the Stimulator and Recorder Devices" (Supplementary Figure S1) of the Supplementary Material for forward and inverse kinematics. The current prototype (**Figure 1**) has a symmetric design such that the left and right links of the device are identical, with lengths of 5.75 cm for the proximal links and 6.75 cm for the distal links. We selected these dimensions to obtain a workspace large enough to stimulate almost half of the lower arm length, which ranges between 24.34 cm for females and 26.99 cm for males (Gordon et al., 1989; **Figure 2B**, B). The mechanical linkage was required to be rigid and lightweight. The rigidity is important because the linkage must maintain its shape and not bend when stimulating the skin. To increase rigidity without adding weight, we designed the links with a T-shaped cross-section (see Supplementary Figure S2). The arm links were connected with a ball-bearing (MinebeaMitsumi Inc.) fixed into one link, and a metal axle rigidly connected to the adjoining link. We fixed a plastic ring on the top of the axle in order to maintain the axle in the correct perpendicular orientation during all movements. By configuring the connection between the two arms in this way, we ensured that: (1) the links were on two different levels to prevent collisions between the arms; (2) the resulting workspace was maximized for the given link dimensions; and (3) there were no unreachable points inside the workspace. All the parts of the pantograph structure were manufactured by a Form 2 stereolithographic printer (FormLabs Inc.), with a resolution of 0.05 mm (see section 2. "Development of the Device Through 3D Printers" in Supplementary Material for more details).
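The kinematic equations themselves are given only in the Supplementary Material, so the following is a minimal sketch of how forward and inverse kinematics could be computed for a symmetric five-bar pantograph with the link lengths stated above. The distance between the two proximal joint axes (`BASE_SEPARATION`), the coordinate frame, and the function names are assumptions made for illustration and are not taken from the device documentation.

```python
import math

PROXIMAL = 5.75          # proximal link length from the text (cm)
DISTAL = 6.75            # distal link length from the text (cm)
BASE_SEPARATION = 4.0    # ASSUMED distance between the two proximal joint axes (cm)


def forward_kinematics(theta1, theta2, d=BASE_SEPARATION):
    """Return the end-effector (x, y) for the two proximal joint angles (rad)."""
    # Elbow positions at the ends of the left and right proximal links.
    e1 = (-d / 2 + PROXIMAL * math.cos(theta1), PROXIMAL * math.sin(theta1))
    e2 = (d / 2 + PROXIMAL * math.cos(theta2), PROXIMAL * math.sin(theta2))
    vx, vy = e2[0] - e1[0], e2[1] - e1[1]
    dist = math.hypot(vx, vy)
    if dist == 0 or dist > 2 * DISTAL:
        raise ValueError("distal links cannot close the loop")
    # The distal links have equal length, so the tip lies on the perpendicular
    # bisector of the elbow-to-elbow segment. We take the solution to the left
    # of the e1 -> e2 direction, which is away from the base when the elbows
    # are spread outward.
    h = math.sqrt(DISTAL ** 2 - (dist / 2) ** 2)
    mx, my = e1[0] + vx / 2, e1[1] + vy / 2
    return (mx - vy / dist * h, my + vx / dist * h)


def inverse_kinematics(x, y, d=BASE_SEPARATION):
    """Return (theta1, theta2) in radians, with both elbows pointing outward."""
    angles = []
    for base_x, sign in ((-d / 2, 1.0), (d / 2, -1.0)):
        r = math.hypot(x - base_x, y)
        if r == 0:
            raise ValueError("target coincides with a joint axis")
        # Law of cosines in the triangle (joint axis, elbow, end-effector).
        cos_alpha = (PROXIMAL ** 2 + r ** 2 - DISTAL ** 2) / (2 * PROXIMAL * r)
        if abs(cos_alpha) > 1:
            raise ValueError("target outside the reachable workspace")
        phi = math.atan2(y, x - base_x)
        angles.append(phi + sign * math.acos(cos_alpha))
    return tuple(angles)
```

A round trip such as `forward_kinematics(*inverse_kinematics(1.0, 9.0))` should return the original point, which is a convenient sanity check when adapting the sketch to the real base geometry.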

#### The Stimulator

The arms of the pantograph structure are connected on one side to two servomotors (Parallax Standard Servo, Parallax Inc.) and on the other side to the end effector (**Figure 2B**, A). Each servomotor has a range of motion of 180°. To prevent slippage between the proximal link and the motor, a linchpin is used to lock the link to the motor. Although the selected servomotor does not normally provide an output signal corresponding to its angle of rotation, it is possible to measure that signal by tapping into the servo's internal potentiometer to derive a voltage that is proportional to the angle of rotation. We read that signal to verify that each commanded position was reached correctly. The end-effector is placed on top of the upper pantograph link distal from the motor and is composed of a base layer with a hollow cylinder. In the cylinder, there is a fillet expansion insert that houses a screw. The head of the screw is the tip of the end effector that is in contact with the skin. To make the sensation more comfortable while increasing the friction, it is covered by a cap of rubber material (IBM ThinkPad TrackPoint Cap). This screw allows regulating the height of the tip of the end effector, thus providing different tactile sensations (**Figure 2C**).

To produce a skin-stretch sensation, another structure, with an aperture within which the end effector moves, must be placed over the stimulator device. The design of this structure is modular, such that it is possible to use different sizes and shapes of the aperture and the end-effector, without changing the entire structure (**Figure 2C**). Therefore, the sensation created by the tactile stimulator can range from light-touch to skin stretch, depending on the shape of the end-effector and on the size of the aperture. The aperture structure placed over the pantograph also serves as a support by sustaining weight placed on it by the user's arm. This structure is rigidly connected to a base-platform, upon which the motors that move the robot arms are fixed. To ensure that the end-effector remains at all times perpendicular to the horizontal plane without bending, the base platform also has a plastic plane that supports the distal, lower link of the pantograph, immediately below the end-effector. To decrease friction during sliding, the lower link's contact point is covered with a 2-mm layer of polytetrafluoroethylene (PTFE). When the device is operated and the end-effector touches the skin, this contact causes friction. Therefore, in each trial, we recorded the reading of the potentiometers, and monitored whether or not the end-effector motion was affected by the friction. During experimental setup, we adjusted the height of the end-effector such that the tactor did not become stuck at any time and reached all desired targets.

### The Recorder

The proximal links of the pantograph structure are connected to two rotational, single-turn potentiometers (Vishay 132, Vishay Intertechnology, Inc.) that have a linear taper, a resistance of 2 kΩ ± 3% and a linearity of ±0.5%. The distal links are connected to a handle as described below (**Figure 1**). The recorder has a baseplate structure designed such that the centers of rotation of the two potentiometers are positioned relative to one another in the same manner as the servomotor centers of rotation on the stimulator device. Thus, the pantograph structure of the recorder is exactly the same as that of the stimulator. The lower distal arm is connected through a brass axle to the handle of the device. The handle itself is composed of two parts: (1) a cylinder (1 cm radius × 10 cm high), which is intended to be held in the participant's hand, and (2) a low-friction disk that supports the hand's weight. The bottom surface of the disk is coated with PTFE to decrease friction as it slides over the top surface of the rigid baseplate. The recorder can serve two purposes: (1) in its passive mode, the user can move the recorder's handle to match the direction and amplitude of the tactile stimulus generated by the stimulator, or (2) in the active mode, the user can move the handle as a master manipulator to control the tactile stimulator as a slave.

The stimulator and recorder are each mechanically connected to a larger rigid ground plane (**Figure 1**). The two devices can be mounted to the ground plane in several different configurations and in this way, we can stimulate either the right or the left hand and use the handle with the opposite hand. The distance between the two devices can be modified according to individual participant anthropometric measurements.

### System Control Architecture (Figure 3)

A circuit board based on the Atmel ATmega328p microcontroller (Atmel Inc.) performs analog-to-digital conversion on four input voltage signals derived from the two potentiometers embedded within each device. An additional analog input is reserved for a force sensor that can optionally be inserted inside the stimulator device to measure the force applied by the end-effector to the skin. The microcontroller sends as outputs an independent control signal for each of the two motors of the stimulator. These two Pulse-Width-Modulation (PWM) signals set reference angular positions for the two motors, which enforce those positions under internal, closed-loop, feedback control. The microcontroller also relays the potentiometer signals from the stimulator and the recorder to a laptop computer, and receives as input from the laptop the desired angular positions of the stimulator joints (see Supplementary Figure S3 for more details on electrical connections). The laptop runs a program that controls the system, provides visual feedback of the task to the research participant, and provides a user interface for the experimenter.
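The signal conversions implied by this architecture can be illustrated with a short sketch: analog-to-digital counts are converted to a joint angle, and a desired servo angle is converted to a pulse width. This is not the authors' firmware (the system was driven through LabVIEW). The 10-bit converter range is a property of the ATmega328p, whereas the potentiometer travel and the pulse-width endpoints below are placeholder assumptions that would have to be replaced by datasheet or calibration values.

```python
ADC_MAX = 1023            # 10-bit analog-to-digital converter on the ATmega328p
SERVO_RANGE_DEG = 180.0   # servo range of motion stated in the text
POT_TRAVEL_DEG = 300.0    # ASSUMED electrical travel of the potentiometer
PULSE_MIN_US = 750.0      # ASSUMED pulse width commanding 0 degrees
PULSE_MAX_US = 2250.0     # ASSUMED pulse width commanding 180 degrees


def adc_to_angle(counts, offset_deg=0.0):
    """Convert raw ADC counts to a joint angle, assuming a linear taper."""
    if not 0 <= counts <= ADC_MAX:
        raise ValueError("ADC reading out of range")
    return counts / ADC_MAX * POT_TRAVEL_DEG - offset_deg


def angle_to_pulse_us(angle_deg):
    """Convert a desired servo angle to a pulse width in microseconds."""
    # Clamp to the mechanical range so an out-of-range request cannot
    # command the servo past its end stops.
    clamped = min(max(angle_deg, 0.0), SERVO_RANGE_DEG)
    span = PULSE_MAX_US - PULSE_MIN_US
    return PULSE_MIN_US + clamped / SERVO_RANGE_DEG * span
```

Under these assumed endpoints, `angle_to_pulse_us(90.0)` evaluates to 1500.0 microseconds, the midpoint of the pulse range.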

The Tactile-STAR system can work in two distinct modes. In the passive mode, the user moves the handle of the recorder to match the direction and amplitude of motions produced by the tactile stimulator. The laptop computes the desired joint angles of the stimulator from the desired end-effector path using the kinematic equations reported in section 1. "Direct and Inverse Kinematics of the Stimulator and Recorder Devices" in Supplementary Material. The joint angles from the stimulator's potentiometers are recorded to verify that the target positions commanded by the laptop and controller are reached correctly. The joint angles of the recorder are measured with its potentiometers to verify that the participant correctly replicates the stimulation. In the Tactile-STAR's active mode, the user can move the handle of the recorder as a master manipulator to teleoperate the tactile stimulator as a slave. In this mode, the joint angles of the recorder device are used to set the desired joint angles for the stimulator. In both modes, scale factors may be programmed between the workspaces of the two devices in order to break the nominal 1:1 correspondence between the recorder's handle and the stimulator's end-effector.
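One simple way to realize such a scale factor is to scale the handle displacement about the workspace center before converting it to stimulator joint angles. The sketch below assumes that the scaling is applied in Cartesian coordinates; the text does not specify where in the pipeline it is applied, and the function name is hypothetical.

```python
def scaled_target(handle_xy, recorder_center, stimulator_center, scale=1.0):
    """Map a recorder handle position to a stimulator end-effector target.

    The displacement of the handle from the recorder workspace center is
    multiplied by `scale` and added to the stimulator workspace center.
    With scale=1.0 the two devices move in 1:1 correspondence.
    """
    dx = handle_xy[0] - recorder_center[0]
    dy = handle_xy[1] - recorder_center[1]
    return (stimulator_center[0] + scale * dx,
            stimulator_center[1] + scale * dy)
```

For example, with `scale=0.5` a 2 cm handle movement would command a 1 cm movement of the end-effector on the skin.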

### Stimuli

The Tactile-STAR stimulator can produce two distinct forms of tactile stimuli: skin-brush and skin-stretch stimulation. For skin-stretch stimulation, the tip of the end-effector is raised between 1.5 and 2.5 mm (**Figure 2C**) above the surface upon which the tested limb (or body part) is resting and moves inside a smaller aperture (elliptical shape: 0.022 m × 0.018 m) with raised margins. For skin-brush stimulation, the aperture is larger (rectangular shape: 0.060 m × 0.040 m) and its margins are at the same level as the surface where the limb is resting, while the tip of the end-effector is slightly raised above it (<1.5 mm; **Figure 2B**).

#### Software

We used a custom LabVIEW 2016 program, along with the LabVIEW Interface for Arduino (LIFA), to control the stimulator and recorder devices, to provide real-time visual feedback to the research participant, and to provide an experimental control interface for the experimenter. The custom LabVIEW program allows the experimenter to define experimental task parameters, including participant anthropometrics. The program also stores position (and optionally force) data to disk for subsequent (offline) analysis.

### Technical Validation

We validated the accuracy and precision of the stimulator's control of end-effector position using an optical motion tracking system. Three infrared cameras (V120 slim, NaturalPoint Inc., OR, United States; software: C++ custom modification of NaturalPoint SDK) recorded the three-dimensional position of an active infrared marker that we fixed to the top of the end-effector. We defined 24 spatial targets that were distributed across four elliptic arcs that spanned the stimulator's entire workspace (**Figure 2A**, right panel; **Figure 4**). We programmed the stimulator to reach each of the targets 10 times, and to stay in the commanded position for 1 s. For each target point, the constant error was less than 0.035 mm (mean ± SD 0.002 ± 0.018 mm), while the variable error was less than 0.005 mm (mean ± SD 0.002 ± 0.001 mm).

We repeated the same calibration procedure for the recorder. We manually positioned the tip of the end-effector on the same target points used for calibrating the stimulator, and verified that we reached the correct positions using the user interface of the recorder device. We then recorded these positions using both the potentiometers of the recorder and the optical system. For each target point, the constant error was less than 0.009 mm (mean ± SD 0.001 ± 0.004 mm), while the variable error was less than 0.009 mm (mean ± SD 0.002 ± 0.001 mm). Thus, the errors obtained with this low-cost prototype were negligible in the experimental settings used for the validation testing described below.
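The text does not state the formulas used for the two error measures. A minimal sketch, assuming the definitions common in motor-control studies (constant error as the distance between the mean reached position and the target, variable error as the dispersion of the reached positions about their own mean), could look as follows; the function name and the planar (x, y) representation are illustrative choices.

```python
import math


def constant_and_variable_error(positions, target):
    """Return (constant_error, variable_error) for repeated reaches.

    positions: list of (x, y) reached positions for one target.
    target:    (x, y) commanded position.
    ASSUMED definitions: constant error is the distance from the mean
    reached position to the target; variable error is the sample standard
    deviation of the distances from the mean reached position.
    """
    n = len(positions)
    if n < 2:
        raise ValueError("at least two repetitions are required")
    mean_x = sum(p[0] for p in positions) / n
    mean_y = sum(p[1] for p in positions) / n
    constant = math.hypot(mean_x - target[0], mean_y - target[1])
    squared = sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2 for p in positions)
    variable = math.sqrt(squared / (n - 1))
    return constant, variable
```

Applied to the 10 repetitions recorded for each of the 24 targets, this would yield one pair of values per target, from which a maximum and a mean ± SD across targets can be summarized.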

### Verification Study Involving Human Participants

All participants provided written informed consent to participate in the study procedures, which were approved by a local institutional ethics committee—Comitato Etico ASL3 Genovese (Italy)—in compliance with the Declaration of Helsinki.

### Verification Study Involving Healthy Participants

We sought to perform a first functional test of the Tactile-STAR system with young participants without somatosensory deficits to verify its ability to characterize and affect tactile perception. Participants were tested before and after 30 min of perceptual training (described below) using the Tactile-STAR device. We tested two main hypotheses: (1) the ability to identify correctly distinct skin-brush stimuli applied to the palm of the hand is not uniformly distributed across the palm; (2) the ability to correctly identify distinct skin-brush stimuli applied to the palm of the hand can improve following a short period (∼30 min) of practice.

#### Participants

Sixteen healthy young right-handed participants (eight females, 24 ± 2 years) participated in a single-session experiment wherein they interacted with the Tactile-STAR for approximately 45 min. All participants were naïve with respect to both the device and the experimental procedures.

### Experimental Set-Up

Participants sat on a chair in front of a table upon which we placed the Tactile-STAR system. The recorder device was centered on the participants' midline, and the stimulator device was placed on their right side (**Figure 1C**). Prior to testing, the stimulator was configured to stimulate the palm of the hand with a low end-effector profile. To prevent fatigue, the right arm was supported against gravity by a fixture placed next to the chair. The stimulator device had the center of its workspace aligned with the center of the right-hand palm. To prevent visual feedback of the stimulator's position and motion, we added an opaque box over the tactile stimulator, thus hiding the mechanical structure from view. We also added a transparent plane on top of the recorder device where we projected visual targets (red dots; 1.5 mm radius; **Figures 1A,B**) that the stimulator could reach during testing. During the experiment, the participants did not use headphones. However, they reported that the background noise was higher than the device noise and that they relied on their somatosensation and not on acoustic feedback for solving the task.

### Protocol

During testing (i.e., phase 2 and phase 4 of the experimental protocol; see below), the Tactile-STAR produced 16 unique tactile skin-brush stimuli of varying amplitudes and directions relative to the center of the stimulator's workspace (and thus, relative to the center of the palm; **Figure 4A**). The stimulator's end-effector, in light contact with the skin, made movements from the center of the workspace outward to targets placed on two concentric ellipses, resulting in center-out brushing stimulation on the participant's palm. The dimensions of the axes of the inner ellipse were half of the respective axes of the outer ellipse (outer ellipse axes: 4 and 5 cm). The major axis was aligned along the proximal–distal direction while the minor axis was aligned along the medio-lateral direction. Eight targets were equally distributed (45° apart) on each ellipse.
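The target layout can be reproduced with a few lines of code. The sketch below makes two assumptions that the text leaves open: the stated 4 and 5 cm are treated as full axis lengths (so the outer semi-axes are 2.0 and 2.5 cm), and the 45° spacing is applied to the parametric angle of the ellipse. The labeling scheme `(ring, direction_index)` is hypothetical.

```python
import math


def ellipse_targets(major_axis_cm=5.0, minor_axis_cm=4.0, n_directions=8):
    """Return {(ring, direction_index): (x, y)} in cm from the workspace center.

    ring 0 is the inner ellipse (half-size axes) and ring 1 the outer one.
    x is taken as medio-lateral and y as proximal-distal (ASSUMED frame).
    """
    targets = {}
    for ring, scale in ((0, 0.5), (1, 1.0)):
        a = scale * minor_axis_cm / 2   # semi-axis along x
        b = scale * major_axis_cm / 2   # semi-axis along y
        for k in range(n_directions):
            angle = 2 * math.pi * k / n_directions
            targets[(ring, k)] = (a * math.cos(angle), b * math.sin(angle))
    return targets
```

Calling `ellipse_targets()` gives 16 targets, eight per ellipse, which can then be passed through the inverse kinematics to obtain commanded joint angles.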

The experimental protocol consisted of four sequential phases (**Figure 4A**):

### **Phase 1: familiarization**

The purpose of this phase was to allow participants to gain familiarity with the spatiotemporal characteristics of skin-brush stimulation. The Tactile-STAR was placed in active mode and participants used the recorder's handle to freely explore the stimulator's end-effector workspace. When the participants moved the handle of the recorder device, the stimulator device produced an end-effector motion that was identical in magnitude and direction to the movement they made. This phase continued for a minimum of 2 min and a maximum of 4 min.

#### **Phase 2: pre-training test**

The purpose of this phase was to assess each participant's ability to discriminate between skin-brush stimuli of different magnitudes and directions (see section "Protocol"), and to use those stimuli to guide the planning and execution of goal-directed reaching movements. To do this, the Tactile-STAR was placed in passive mode and the tactile stimulator presented skin-brush stimulation to the palm of the hand using end-effector trajectories that moved from the central position to one of the target positions at a constant speed of 0.02 m/s. Upon reaching the target, the end-effector held its position as the participants moved the handle of the recorder device with their non-dominant hand until they believed that they had reached the corresponding target. Then, they held this position for a minimum of 0.5 s and declared to the experimenter that they had identified the stimulus. After having done so, they were instructed to return the handle of the recorder to the central position, and the stimulator returned to the start position at the maximal speed of the motors. After a pause of 1.5 s, the next stimulation trial started. Each of the 16 test targets was presented to the participant five times in random order (80 trials total). Participants received no feedback about their performance either during or after training.

### **Phase 3: training**

The purpose of the training phase was to provide participants with extended practice in a stimulus-discrimination and replication task designed to encourage sensorimotor learning of the mapping between the motion of the recorder device's handle and the motion of the stimulator's end-effector. Each trial in the training phase had two parts. First, as in phase 2, participants were presented with tactile skin-brush stimulation as the end effector moved at 0.02 m/s from the central target to each of eight training targets selected from the set of 16 testing targets (**Figure 4A**, training). When the end-effector arrived at the target, that position was held for 1.5 s before returning at maximum speed to the central position. Second, the participant had to replicate with the non-dominant hand the handle motion corresponding to the skin-brush stimulation they had just experienced. To encourage sensorimotor learning in this training phase, the Tactile-STAR was placed in active mode during movement replication such that the participants received tactile feedback corresponding to motions they made during the replication trials; i.e., the stimulator replicated the motion of the recorder. In other words, participants received state feedback in the stimulated hand that corresponded to the position and motion of the recording hand. When the participant believed that they had arrived at the cued target, they declared that fact to the experimenter and then returned the handle to the central "home" position. If they had erred and reached the wrong target, they would hear an audible, non-startling error tone, and the same stimulus was repeated until the participants correctly interpreted it. Inter-trial intervals were nominally 1.5 s.

During training, participants performed three "training sets" that were separated by 3-min pauses to minimize the likelihood that participants might experience fatigue. In each training set, each of the eight training stimuli was presented three times in pseudo-random order, with the constraint that the same stimulus could not be presented more than two times in a row. To evaluate the learning without spatial accuracy biases that can arise due to the inertial anisotropy of the arm and hand (Gordon et al., 1994; Simo et al., 2011), or due to differences in the sensitivity to the stimulation, the same training target pattern was rotated by 45° such that there were eight possible target configurations (one for every two participants). Across the participant group, each of the 16 targets was included in the training set of eight participants.
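A constrained pseudo-random order of this kind can be generated by rejection sampling: shuffle the full trial list and accept the shuffle only if no stimulus appears more than twice in a row. The sketch below illustrates that approach; it is not the authors' LabVIEW implementation, and the function names are hypothetical.

```python
import random


def longest_run(sequence):
    """Return the length of the longest run of identical consecutive items."""
    best = run = 0
    previous = object()  # sentinel that equals no stimulus label
    for item in sequence:
        run = run + 1 if item == previous else 1
        best = max(best, run)
        previous = item
    return best


def constrained_order(stimuli, repetitions=3, max_run=2, rng=None):
    """Shuffle `repetitions` copies of each stimulus until no run exceeds max_run."""
    rng = rng or random.Random()
    trials = [s for s in stimuli for _ in range(repetitions)]
    while True:
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)
```

For one training set, `constrained_order(range(8))` returns 24 trials; passing a seeded `random.Random` makes the order reproducible across sessions.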

#### **Phase 4: post-training test**

The protocol in the post-training test phase was identical to that in the pre-training test phase (i.e., phase 2).

#### Data Analysis

We defined the final hand position as the location of the recorder's handle at the moment the participant declared he/she had arrived at the desired target. We defined the final target as the target with the smallest Euclidean distance from the final hand position. When the participants moved the handle of the recorder device, they were instructed to choose one of the 16 possible targets displayed on the transparent plane on top of the recorder device. Thus, we used the minimal Euclidean distance to identify which one of these 16 targets the participant indicated as corresponding to the perceived stimulus. Our primary outcome measure was the percentage of stimuli correctly perceived and replicated by the user (i.e., percentage of correct responses).
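The classification rule described here amounts to a nearest-neighbor assignment followed by a hit count. A minimal sketch, with hypothetical function names and a dictionary of target labels to (x, y) positions as the assumed data layout, is:

```python
import math


def nearest_target(final_xy, targets):
    """Return the label of the target closest to the final hand position."""
    return min(targets, key=lambda label: math.dist(final_xy, targets[label]))


def percent_correct(responses, targets):
    """Return the percentage of trials whose nearest target was the presented one.

    responses: list of (presented_label, final_xy) pairs, one per trial.
    targets:   dict mapping each target label to its (x, y) position.
    """
    hits = sum(1 for presented, final_xy in responses
               if nearest_target(final_xy, targets) == presented)
    return 100.0 * hits / len(responses)
```

With two targets at (0, 0) and (1, 0), a final position of (0.4, 0) would be assigned to the first target because it lies closer to it.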

We used the Kolmogorov–Smirnov test to assess normality of the data distribution. For all data sets, the null hypothesis that these data come from a standard normal distribution was rejected at the 5% significance level. We expected this result, because the metrics we chose describe the percentage of targets recognized correctly. The percentage (unless well in the middle of the range) is expected to be distributed binomially, and violate the assumption of normality. Therefore, we used non-parametric tests that are based on rank statistics for testing our hypotheses.

Specifically, to test our first hypothesis (i.e., that the ability to correctly identify distinct skin-brush stimuli is not uniformly distributed across the palm), we applied the Friedman test to the percentage of correct responses obtained by each participant for each stimulus during both experimental test phases. To confirm the results obtained with the primary outcome in the test sets, we repeated the same analysis comparing the first and last training sets.

To test our second hypothesis (i.e., that the ability to correctly identify distinct skin-brush stimuli applied to the palm of the hand can improve following a short period of practice), we used the Wilcoxon signed-rank test to compare the percentage of stimuli correctly perceived in the pre- and post-training test phases. We also evaluated the number of attempts participants made before correctly interpreting each stimulus during the training phase.

Then, to identify which aspects of target acquisition were affected by the tactile stimulation, and test whether the potential benefits of training were specific to the trained targets or generalized to untrained targets, we performed follow-up analyses. The purpose of these exploratory investigations was to gain a preliminary understanding of what may be the strengths and weaknesses of our novel stimulation device and training protocol, and therefore, in these follow-up analyses, we did not correct for multiple comparisons. Another reason for this decision was that our follow-up tests were not independent, and the probability of making at least one Type I error would then be less than Bonferroni or Holm–Bonferroni assume. However, we also verified and report whether our results were robust against Holm–Bonferroni corrections.
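For readers unfamiliar with the Holm–Bonferroni procedure mentioned above, the following sketch shows the standard step-down rule: the p-values are sorted in ascending order and the k-th smallest is compared with α/(m − k + 1), stopping at the first non-rejection. The p-values in the usage note are illustrative and are not results from this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if p_values[index] > alpha / (m - rank):
            break  # step-down rule: all larger p-values are also retained
        rejected[index] = True
    return rejected
```

For the illustrative input `[0.01, 0.04, 0.03]`, only the first hypothesis is rejected: 0.01 passes the threshold 0.05/3, but 0.03 exceeds 0.05/2, so the procedure stops there.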

We computed the following additional metrics:

#### **Correct direction (%)**

Percentage of stimuli in which the participants correctly interpreted the direction of the stimulation, independent of the perceived amplitude. We inferred that the direction was identified correctly if the target indicated by the participant was in the same direction as the correct one.

#### **Correct amplitude (%)**

Percentage of stimuli in which the participants correctly interpreted the amplitude of the stimulation, independent from the perception of the direction.

We calculated these metrics for all the targets, and also separately for (a) the trained and untrained targets and (b) the targets of the outer and the inner ellipses.

Finally, we computed:

#### **Nearest targets (%)**

To compute this metric, we considered the answer correct if the participant indicated as perceived stimulus the correct target or one of its three nearest neighbors. This metric would be higher than the percentage of correct answers if the errors were due to insufficient perceptual resolution. Two of the nearest neighbors have the same amplitude as the correct target, and the third has the same direction.
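The three follow-up metrics can be computed together once each target is labeled by its ellipse and its direction. The sketch below assumes a hypothetical labeling `(ring, direction_index)`, with `ring` 0 or 1 and `direction_index` from 0 to 7 in 45° steps, and takes the three nearest neighbors to be the two adjacent directions on the same ellipse plus the same direction on the other ellipse, as described above.

```python
def score_responses(pairs, n_directions=8):
    """Return percentages of correct, direction, amplitude and nearest-target hits.

    pairs: list of (presented, chosen), each a (ring, direction_index) label.
    """
    counts = {"correct": 0, "direction": 0, "amplitude": 0, "nearest": 0}
    for (p_ring, p_dir), (c_ring, c_dir) in pairs:
        same_dir = p_dir == c_dir
        same_ring = p_ring == c_ring
        # Adjacent directions differ by one step, wrapping around the ellipse.
        adjacent = (p_dir - c_dir) % n_directions in (1, n_directions - 1)
        counts["correct"] += same_dir and same_ring
        counts["direction"] += same_dir
        counts["amplitude"] += same_ring
        # With two ellipses, same_dir covers both the correct target and the
        # neighbor on the other ellipse; the remaining two neighbors are the
        # adjacent directions on the same ellipse.
        counts["nearest"] += same_dir or (same_ring and adjacent)
    return {name: 100.0 * value / len(pairs) for name, value in counts.items()}
```

Because every fully correct response also counts toward the other three metrics, each of them is at least as large as the percentage of correct responses.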

To confirm the results obtained in the test phases, we repeated the same analysis for the training block by comparing the first and the last trial set. The threshold of statistical significance was set at p = 0.05.

### Validation Study With Stroke Survivors

We sought to provide a first proof-of-concept demonstration that the Tactile-STAR system is able to detect deficits of tactile perception in participants with neurological diseases. We hypothesized that the device would be able to identify significant stroke-related differences in tactile perception between the two hands, and that these differences would not be observable in healthy controls.

### Participants

Three chronic stroke survivors (two females) participated in the experiment, as did three healthy controls matched for gender and age (±2 years). Each participant was enrolled by a neurologist and a physiotherapist, who performed the clinical evaluation (**Table 1**).

### Experimental Set-Up

The experimental set-up described above was adapted for use by participants with a neurological injury. Since many stroke survivors have difficulty keeping the fingers of their affected hand extended, we added a wire grid (with 1 cm spaces between the bars) to the box over the stimulator. The central part of the grid was open in correspondence with the aperture of the stimulator device so as not to interfere with the end-effector motion (**Figures 1B,C**). An elastic band, adjustable in size and position for each participant, was used to keep the fingers comfortably opened and to hold the wrist on top of the grid (**Figure 1C**). When positioned correctly, the center of the palm corresponded to the center of the stimulator's workspace. The position of the participant with stroke was the same as for the healthy participants when the right hand was tested. When we tested the left hand, the stimulator was positioned under the left hand, and the recorder was in front of the participant.

### Experimental Protocol

We simplified the protocol with respect to the previous task in terms of the number and spatial distribution of the stimuli (**Figure 4B**). Here, we presented eight stimuli that tested two different amplitudes (5 and 2.5 cm) along the four cardinal directions. Since we expected that stroke survivors might have difficulty moving the matching device with the impaired hand when the unimpaired hand was tested with the stimulator device, we asked the participant to indicate verbally the target corresponding to the perceived stimulus. Both hands were tested, and the protocol was identical for the two hands. We did not test training effects in this protocol. The order in which the two hands were tested was the same for the stroke survivor and the related control participant—we first tested the right hand, and then the left.

Before each test, there was a familiarization phase in which the experimenter moved the handle of the matching device controlling the tactile stimulator motion. In this phase, participants familiarized themselves with the perception of tactile stimuli across all of the workspace, and specifically with stimuli having the same amplitude and directions as the ones used in the test phases.

In the two test blocks, each stimulus was presented five times in a random order, with no more than three consecutive repetitions of the same stimulus. When the end-effector reached the target position, the participant had to indicate the perceived stimulus. After the tactile stimulator returned to the central position, if a participant was not able to identify the stimulus, he/she could ask to repeat the stimulation up to three times. The successive stimulation started after a pause of 1.5 s. Participants did not receive any feedback about their performance. The experiment lasted about 30 min. Participants were free to stop the experiment at any time if they were tired or needed a break.

### Data Analysis

We followed a single-subject design, and tested the differences in tactile acuity between the right and the left hand within each participant by using the Wilcoxon signed-rank test. Our primary performance measure was the "percentage of correct responses", and we decomposed this metric into the percentage of responses in which either the direction or the amplitude of the stimuli was identified correctly (see section "Validation on Healthy Participants"). The stimuli were ordered taking into account the symmetry between the two hands (i.e., by mirroring the targets on the left hand so that they corresponded to those on the right hand). The threshold for significance was set at p = 0.05.

# RESULTS

### Validation on Healthy Participants

The tactile sensibility of 16 healthy participants was tested before and after 30 min of training. We tested two main hypotheses. Hypothesis 1 proposed that the ability to correctly identify distinct skin-brush stimuli applied by the device would not be uniformly distributed across the palm of the hand (i.e., that there would be a significant difference in perceiving brushing stimuli moving in different directions and of different extents relative to the center of the palm). Hypothesis 2 proposed that the ability to correctly identify distinct skin-brush stimuli applied to the palm can improve following a short (∼30 min) period of practice. We tested the two hypotheses in the experimental test sets and then we verified that the data from the training set confirmed results obtained in the test sets.

### Test Block Performance

We visualized each participant's ability to discriminate tactile stimuli (Hypothesis 1) by presenting, for each target, a colormap corresponding to the percentage of trials in which the user correctly identified the corresponding stimulus (**Figure 5A**). Colors for intermediate points were obtained via linear interpolation. A Friedman test detected a significant difference in the identification of the stimuli associated with different target locations in both the pre-training (p < 0.001) and the post-training test blocks (p < 0.001). To test Hypothesis 2, we compared stimulus replication accuracy in the post-training test block to performance in the pre-training test block (**Figure 5B**).

Overall, we found a significant improvement for all the targets (p = 0.004), and for the trained (p = 0.004), while for the non-trained targets (p = 0.051) we did not reach the threshold of significance; that is, about 30 min of training led to an improvement for the trained targets, whereas improvement was not significant for the untrained targets. Analysis of individual participant's performance revealed that the significant group


TABLE 1 | Data of the stroke survivors.


Top rows: age, paretic hand (PH; L, left; R, right), the etiology (E) of ictus: ischemic (I)/hemorrhagic (H), the disease duration (DD) in years, and the location of the lesion. Bottom rows: clinical tests scores. FMA, Fugl–Meyer Assessment; MAS, Modified Ashworth Scale (Bohannon and Smith, 1987); NAS, Nottingham Assessment Scale (P, proprioception; S, stereognosis); vibration tested with the tuning fork.

effects were driven by 15 of the 16 participants, who improved their performance in the trained targets. By contrast, the lack of a significant effect for untrained targets was driven by four participants: whereas 12 of 16 subjects improved their performance at the untrained targets pre-to-post training, performance decreased slightly for three participants, and one participant did not change his performance pre-to-post training.

#### Detection of Stimulus Direction

To further understand the effects of short-term training with the Tactile-STAR, we repeated the analysis considering only the ability to correctly identify stimulus direction. Here, we considered as a "correct answer" one that discriminated the direction of a stimulus independently of its amplitude. The Wilcoxon signed-rank test identified a significant improvement in the detection of stimulus direction across all targets (p = 0.015), although this improvement was driven mainly by trials involving the trained targets (p = 0.015) and stimuli corresponding to targets on the outer ellipse (p = 0.017). The improvements for untrained targets and for targets on the inner ellipse did not reach statistical significance when analyzed separately (p > 0.05).

#### Detection of Stimulus Amplitude

We also isolated the ability to correctly identify the amplitude of stimuli by considering as "correct" those responses that replicated stimulus amplitude (i.e., short vs. long) regardless of movement direction. The Wilcoxon signed-rank test identified a significant improvement in the detection of stimulus amplitude across all targets (p = 0.007) and for the trained targets (p = 0.007), but not for the untrained targets (p = 0.087). We also found an improvement for both the larger (p < 0.001) and the shorter stimuli (p = 0.041).

#### Nearest Neighbor Analysis

For this analysis, we considered a given response as "correct" if the participant's response indicated one of the cued target's three nearest targets. Two of the nearest targets have the same amplitude as the cued target, while the third has the same direction. The value of the nearest neighbor parameter was always over 70%, indicating that even if the subject did not identify the correct target exactly, in most cases the error did not exceed one target distance. Participants had the same high level of performance for both trained and untrained stimuli. No training-dependent improvements were observed for this parameter regardless of how we subdivided the stimuli (p ≥ 0.124). Thus, improvements observed with the other indicators were mainly due to improvements in the resolution of stimulus recognition.
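The four performance indicators (correct target, correct direction, correct amplitude, and nearest neighbor) can be summarized in a short scoring sketch. The target encoding as a (direction index, ring) pair, the default of eight directions, and the choice to count an exact match as a nearest-neighbor hit are assumptions made for illustration; they are not taken from the authors' analysis code.

```python
def score_responses(trials, n_directions=8):
    """Percent-correct indicators for a list of (cued, response) pairs.

    Each target is a hypothetical (direction_index, ring) tuple, with
    ring 0 for the inner ellipse (short) and ring 1 for the outer (long).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    hits = {"target": 0, "direction": 0, "amplitude": 0, "nearest": 0}
    for (cued_dir, cued_ring), (resp_dir, resp_ring) in trials:
        same_dir = cued_dir == resp_dir
        same_amp = cued_ring == resp_ring
        # adjacent direction on the circle of n_directions targets
        adjacent = (cued_dir - resp_dir) % n_directions in (1, n_directions - 1)
        hits["target"] += same_dir and same_amp
        hits["direction"] += same_dir
        hits["amplitude"] += same_amp
        # Assumed rule: the exact target, the same direction on the other
        # ring, or an adjacent direction on the same ring all count.
        hits["nearest"] += same_dir or (same_amp and adjacent)
    return {name: 100.0 * count / len(trials) for name, count in hits.items()}
```

Under this encoding, the direction and amplitude indicators ignore the other stimulus attribute, which mirrors the "in isolation" analyses described above.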

The significance obtained for the two main hypotheses was robust against Holm–Bonferroni corrections. In contrast, most of the significant effects in our follow-up analysis in the test and training data sets would not survive these corrections. Therefore, testing more subjects will be necessary to fully understand which aspects of the tactile stimulation influence the performance improvements.
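For reference, the Holm–Bonferroni step-down procedure can be sketched as follows. The function name and the example p-values are hypothetical; the sketch only illustrates how a family of tests is checked against progressively less strict thresholds.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is
    rejected under the Holm-Bonferroni step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # the k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With the hypothetical family `[0.01, 0.04, 0.03, 0.005]`, only the two smallest p-values survive the correction, even though all four are below the uncorrected threshold of 0.05.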

### Training Block Performance

We analyzed the training set data as an independent test of our two hypotheses. First, we considered the percentage of "correct answers" based only on the initial answer given by each subject. Next, we took into account the number of attempts needed to yield a correct response. The performance indicators were computed for cued targets in the first and last training blocks. A Friedman test of Hypothesis 1 detected a significant difference in the identification of the stimuli across the palm in both the first (p < 0.001) and last training blocks (p = 0.006; **Figure 6A**). The Wilcoxon signed-rank test of Hypothesis 2 identified a significant improvement in the percentage of stimuli correctly interpreted on the first attempt between the first and the last training block (p < 0.001; **Figure 6B**). These improvements in the ability to identify stimuli during the training phase support the findings of the test-set analyses.

FIGURE 5 | Tactile acuity of healthy individuals before and after training with the Tactile-STAR. (A) A colormap of the percentage of correct responses for brush stimuli as a function of palm location. 100% corresponds to red, while 0% corresponds to dark blue. The black dots indicate the coordinates of the targets reached by the stimulator device, which started moving from the central target. The colors associated with intermediate coordinates were obtained by linear interpolation from the test points. Between the two colormaps, the illustration of the right hand shows where the stimuli were applied. The central position of the map corresponds to the center of the palm. (B) Bars represent the population average percentage of correct responses for each parameter: correct target (gray), correct direction (green), correct amplitude (blue), and nearest targets (red). Light colors are associated with performance before training; dark colors are associated with performance after training. Error bars indicate the standard error of the mean. \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001.

### Detection of Stimulus Direction and Amplitude in Isolation

As in the test phase, we repeated the analysis considering the ability to correctly identify, in isolation, the direction and amplitude of the stimuli. For stimulus direction, we found a significant improvement in detection accuracy across all targets (p = 0.008). By contrast, for stimulus amplitude we found a significant improvement in detection accuracy only for the larger stimulus amplitudes (p = 0.004), but not for the inner-target stimuli (p = 0.888) or for all targets considered together (p = 0.072).

#### Nearest-Neighbor Analysis

As in the analysis of test-block performance, the value of the nearest-neighbor parameter in the training set was high for every subject in each training block (i.e., over 70% in each block). There was not a statistically significant improvement of this parameter between the first and last blocks of the training phase (p = 0.363).

#### Number of Attempts

On the training data set, we also report the number of attempts required for each stimulus to be identified correctly (**Figure 6C**). In support of Hypothesis 1, a Friedman test found a statistically significant difference in the identification of stimuli across the palm in both the first (p < 0.001) and last training blocks (p = 0.001).

### Validation on Stroke Survivors

The data of stroke survivors provide a first proof-of-concept assessment of Tactile-STAR's ability to identify somatosensory deficits. Specifically, we investigated the ability of each participant to discriminate, in both hands, brush stimuli of two different amplitudes in each of the four cardinal directions. Given the heterogeneity of sensorimotor impairments expressed in stroke survivors, we used a single-subject analysis approach to probe for statistical differences in tactile perception between the two hands on a subject-by-subject basis. We expected to find significant differences between the two hands for each of the stroke survivors, but not for their matched healthy controls (**Figure 7**).

Stroke survivor P1 had a left-hemisphere lesion (left basal ganglia, internal capsule, and parietal lobe), which resulted in sensorimotor impairment on the right side of his/her body. Thus, we expected his/her ability to recognize tactile stimuli to be lower with the right hand than with the left (**Figure 8**). The experimental data confirmed this hypothesis: stimulus detection was worse with the right hand than with the left for all parameters analyzed (p < 0.001). By contrast, when we performed the same analyses with an age- and sex-matched control subject, we found no statistically significant differences in tactile perception between the two hands (p > 0.24 for all indicators).

Stroke survivor P2 had left-sided sensorimotor impairment (with a brain lesion located primarily in the right thalamus). Our experimental data showed that while the less-affected hand had better performance than the more-involved hand in terms of identifying the correct direction (p < 0.001), the participant expressed a bilateral difficulty in correctly identifying stimulus amplitude (p = 1), particularly in the upward direction (**Figure 8**). As expected, this participant's control generally had markedly better performance, and did not show any significant difference between the two hands (p > 0.130 in all cases), although the performance was slightly better for the non-dominant hand.

Stroke survivor P3 had a right fronto-parietal, pre-Rolandic lesion (i.e., left-side impairment). As expected, he/she expressed greater difficulty in interpreting stimuli with the left hand than with the right hand (**Figure 8**), both in terms of overall correct response (p < 0.001) and in the identification of stimulus direction (p < 0.001). By contrast, the ability to discriminate between the two stimulus amplitudes was not significantly different between the two sides of the body, owing to a bilateral difficulty in correctly interpreting the stimulus amplitude (p = 0.617). The control participant for this stroke survivor, instead, showed no significant difference for any of the indicators we evaluated (all p > 0.24).

For each stroke survivor, the values of significance are reported without corrections for multiple comparisons; however, all the effects that were significant were robust to the Holm–Bonferroni corrections.

In summary, the Tactile-STAR device was able to identify specific differences in tactile acuity between the two hands in each of the three stroke survivors that participated in this study. These differences were due mainly to deficits in the ability to recognize the direction of tactile stimuli.

### DISCUSSION

We developed and validated a new mechatronic system, the Tactile-STAR, for testing tactile acuity and treating somatosensory deficits in individuals with neurological diseases. Our preliminary validation testing supports the conclusions that: (1) The Tactile-STAR can characterize tactile perception and somatosensory deficits; and (2) A short bout of training with the Tactile-STAR system can improve the tactile perception of healthy individuals.

We obtained evidence in support of our first hypothesis in tests of Tactile-STAR's skin-brushing stimulation mode. For each of 16 healthy participants, testing yielded a map of tactile perception on the hand. Results indicate that tactile acuity typically is non-uniform across multiple directions and distances from the center of the palm. This perceptual anisotropy may be the result of a non-uniform density of the mechanoreceptors in the palm of the hand (Kandel et al., 2000; Johansson and Flanagan, 2009) or the result of differences in the neural processing of the signals derived from those receptors (Kandel et al., 2000). Longo and Haggard (2011) found anisotropies of tactile size perception on the dorsum, but not on the palm of the hand. However, the task was different: the participants judged which of two tactile distances felt larger, the one aligned with the proximo-distal axis (along the hand) or the one aligned with the medio-lateral axis (across the hand). Future studies are needed to examine the utility of Tactile-STAR to characterize tactile perception with respect to skin-stretch displacement distance and direction, as well as to test the generalizability of target acquisition training under both skin stretch and skin brush modes on untrained movements guided by these tactile feedback signals.

In a small cohort of stroke survivors, we also performed a preliminary validation of the ability of the Tactile-STAR to detect sensory deficits after stroke. The device identified differences in tactile perception between the more- and less-impaired hands in each survivor. Intermanual differences were due mainly to impaired ability to recognize the direction of tactile stimuli in the more involved hand. Consistent with expectation, such differences were not found in control subjects, indicating that the pattern of results observed in the stroke survivors was not a result of handedness.

Taken together, these results demonstrate that the Tactile-STAR system can offer quantitative and reliable measures of tactile acuity in the hand. We propose that the system also may be effective for characterizing tactile acuity in different dermatomes, and for monitoring changes due to aging, disease progression, or therapeutic intervention. In particular, we believe that the device could be used for testing different body parts, such as the feet, where deficits in the ability to detect sensory stimuli could be a sign of early onset of disease.

We obtained evidence in support of our second hypothesis in a test of short-term perceptual training with the Tactile-STAR. A mere 30 min of training with the system improved the ability of participants to recognize (and reproduce with one hand) specific skin-brushing stimuli applied to the other hand: In the post-training tests, healthy participants improved the percentage of stimuli recognized with respect to pre-training tests. Improvements in performance were detected both in terms of direction discrimination and in the accuracy of reproduction of "larger" stimuli directed to the periphery of the palm (i.e., the outer targets). This finding corroborates historical (Ruch et al., 1938) and recent reports (Carey et al., 1993; Yekutiel and Guttman, 1993) that somatosensory training can reduce somatosensory deficits. In the current study, improvements were significant for trained targets but not for untrained targets. We speculate that this specificity of training may have been due to the short duration of training. Indeed, this is an important point to further investigate since this perceptual learning is rooted in the low-level cortex and several studies suggest that it can generalize to different locations, but within a somatotopic framework and with a tactile memory distributed differentially according to the stimulus type (Harris et al., 2001; Harrar et al., 2014). Future studies should examine the ability of extended training with Tactile-STAR to improve detection and reproduction performance for both trained and untrained stimuli of multiple magnitudes. In particular, the experimenters noticed that increasing the number of short training sessions seems to be more beneficial than having fewer, longer sessions, suggesting possible difficulties in attending to stimuli for a long time and the risk of over-stimulation.

Training-dependent improvements in the ability to recognize both trained and untrained tactile stimuli would suggest that the Tactile-STAR could be a promising technology for the rehabilitation of somatosensation. This potential as a therapeutic tool should be verified in future studies by investigating whether the improvement is present and for how long it can last in neurological patients. If proved effective, the Tactile-STAR system could be impactful because somatosensory deficits are frequent outcomes of cerebral lesions (Feigenson et al., 1977). Not only are sensory deficits limiting on their own, but they also strongly limit the possibility of motor function recovery (Van Buskirk and Webster, 1955; Kusoffsky et al., 1982; Smith et al., 1983; Zeman and Yiannikas, 1989). Despite this evidence, training methods, devices, and protocols addressing somatosensory deficits and their rehabilitation are still limited.

The preliminary results presented here suggest that Tactile-STAR can be used to deliver augmented or supplemental feedback of hand position in space to guide goal-directed reaching actions. We are currently evaluating the extent to which training with Tactile-STAR can improve goal-directed actions performed with the more involved arm after a stroke. In this line of research, it is important to verify the efficacy of various information encodings (e.g., hand position error vs. state feedback; cf., Krueger et al., 2017). Future tests will compare the ability of skin-brush and stretch stimulations to enhance both tactile acuity and the performance of goal-directed reaching with the contralateral hand.

### CONCLUSION

We have developed a modular device that can apply controlled tactile stimulations to the palm. With modifications to the stimulator's aperture, the device could be used to test the tactile acuity of different body parts. By investigating the two hypotheses described above for validating the system, the current study helps fill the gap in the literature pertaining to somatosensory assessment and retraining. Our future studies will focus on further developing the device and on advancing our understanding of tactile acuity and its training. The preliminary results described here motivate experiments aimed at both understanding the psychophysics of the sensory processing, and identifying optimal ways to enhance sensory abilities. Developing a mechanistic understanding of tactile somatosensation is important for a variety of applications that involve artificial interfaces designed to enhance sensorimotor control in both impaired and healthy motor systems. Specific examples include: rehabilitation (Krueger et al., 2017); prosthetics (Akhtar et al., 2014; Battaglia et al., 2017); brain–computer interface (Sketch et al., 2015); and sensory substitution and augmentation (Schorr et al., 2013; Quek et al., 2014a). Thus, the findings presented in this work are the first step toward a more ambitious goal of providing sensitive and reliable instruments that are capable of assessing and training tactile perception, and are suitable for enhancing sensory feedback in a variety of applications.

### AUTHOR CONTRIBUTIONS

All the authors contributed to the design of the device. GB, PG, IN, RS, and MC designed the experimental protocols. GB, GC, and MC realized the device. PG selected the subjects and conducted the clinical evaluations. GB and PG collected the data. GB, IN, RS, and MC analyzed the results. All the authors contributed to discussing the results and to writing the manuscript. All authors read and approved the final manuscript.

### FUNDING

This study was supported in part by a Marie Curie Integration Grant (REMAKE, FP7-PEOPLE-2012-CIG-334201), the Italian Multiple Sclerosis Foundation, the National Institute of Neurological Disorders and Stroke, and the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number R15HD093086; the National Science Foundation under an Individual Research and Development plan; the United States-Israel Binational Science Foundation (Grant No. 2016850); by the Israel Science Foundation (Grant No. 823/15); and by the Helmsley Charitable Trust through the Agricultural, Biological and Cognitive Robotics Initiative of Ben-Gurion University of the Negev, Israel. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Israel Science Foundation, the National Science Foundation, the National Institutes of Health, the United States-Israel Binational Science Foundation, or the Helmsley Charitable Trust.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnbot.2018.00012/full#supplementary-material

### REFERENCES

Harris, J. A., Harris, I. M., and Diamond, M. E. (2001). The topography of tactile learning in humans. J. Neurosci. 21, 1056–1061.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ballardini, Carlini, Giannoni, Scheidt, Nisky and Casadio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Human's Capability to Discriminate Spatial Forces at the Big Toe

Annette Hagengruber\*, Hannes Höppner and Jörn Vogel

German Aerospace Center (DLR), Institute of Robotics and Mechatronics, Weßling, Germany

A key factor for reliable object manipulation is the tactile information provided by the skin of our hands. As this sensory information is so essential in our daily life, it should also be provided during teleoperation of robotic devices or in the control of myoelectric prostheses. It is well known that feeding back tactile information to the user can lead to a more natural and intuitive control of robotic devices. However, in some applications it is difficult to use the hands as natural feedback channels, since they may already be overloaded with other tasks or, e.g., in the case of hand prostheses, not accessible at all. Many alternatives for tactile feedback to the human hand have already been investigated. In particular, one approach shows that humans can integrate uni-directional (normal) force feedback at the toe into their sensorimotor-control loop. Extending this work, we investigate the human capability to discriminate spatial forces at the bare front side of the toe. A state-of-the-art haptic feedback device was used to apply forces with three different amplitudes (2 N, 5 N, and 8 N) to subjects' right big toes. During the experiments, different force stimuli were presented, i.e., the direction of the applied force was changed such that tangential components occurred. In total, the four directions up (distal), down (proximal), left (medial), and right (lateral) were tested. The proportion of the tangential force was varied corresponding to a directional change of 5° to 25° with respect to the normal force. Given these force stimuli, the subjects' task was to identify the direction of the force change. We found the amplitude of the force as well as the proportion of tangential forces to have a significant influence on the success rate. Furthermore, the direction right showed a success rate significantly different from all other directions. The stimuli with a force amplitude of 8 N achieved success rates over 89% in all directions. The results of the user study provide evidence that the subjects were able to discriminate spatial forces at their toe within defined force amplitudes and tangential proportions.

Keywords: tactile feedback, haptics, haptic display, teleoperation, prosthesis, human-in-the-loop, sensory substitution

#### Edited by:

Hong Qiao, University of Chinese Academy of Sciences (UCAS), China

#### Reviewed by:

Ilana Nisky, Ben-Gurion University of the Negev, Israel

Arnaud Leleve, Universite de Lyon/INSA Lyon, France

#### \*Correspondence:

Annette Hagengruber annette.hagengruber@dlr.de

Received: 05 October 2017 Accepted: 08 March 2018 Published: 10 April 2018

#### Citation:

Hagengruber A, Höppner H and Vogel J (2018) Human's Capability to Discriminate Spatial Forces at the Big Toe. Front. Neurorobot. 12:13. doi: 10.3389/fnbot.2018.00013

### 1. INTRODUCTION

Tactile perception is essentially involved in manual dexterity. The interaction of sensory input (tactile feedback) and motor output (movement) at our hands allows for a dexterous manipulation of objects. The sensory information produced by the mechanoreceptors in the glabrous skin of our hands allows for manipulation tasks such as precisely lifting an object in a pinch grip, which has been investigated intensively (Johansson and Flanagan, 2008). The importance of tactile feedback can be seen, for example, in experiments with anesthetized digits as done in Johansson et al. (1992) and Nowak et al. (2001). When the tactile grasp information of the digits is suppressed in this way, the regulation of the grip force is impaired, leading to a considerably reduced success of grasping. Although visual feedback remains essential during grasping, the mechanoreceptors in the hands provide information about shape and stiffness, which can override the visual prediction (Johansson and Flanagan, 2008). Haptic feedback is nowadays state-of-the-art in teleoperation and in virtual reality. It can be provided by various haptic feedback devices and can be composed of kinesthetic or tactile sensation. Naturally, the haptic sense consists of both tactile and kinesthetic sensing (Anderson et al., 1999). While kinesthetic sensing provides perception of limb position and movement from the mechanoreceptors in the muscles, tactile sensation provides touch information and is generated by the mechanoreceptors in the skin (Jones, 2000). Designs of both modalities are possible in haptic feedback devices (e.g., Fritschi et al., 2006; Meli et al., 2014).

Modern teleoperated robotic systems are also able to feed back forces from the task scene. An example of this is the DLR HUG (Hulin et al., 2011). Besides the visual feedback, it provides haptic information from the remote robot directly to the hands of the operator. A more complex system is the DLR MIRO (Hagn et al., 2008), a system for minimally invasive surgery, which consists of three robotic arms equipped with different force and torque sensing technologies. Such complex systems may require additional feedback channels to provide more information to the surgeon. Another robotic system is the BairClaw (Hellman et al., 2015), a robotic finger equipped with different haptic sensors, which can provide remote information to the user.

Due to the importance of the grasp information, the lack of feedback is also a major issue in upper-limb prostheses. Consumer studies of amputees using hand prostheses showed that one of the most wanted features is the ability to feel the forces occurring during grasping (Biddiss et al., 2007; Pylatiuk et al., 2007). Plenty of invasive and non-invasive feedback approaches for hand prostheses have been investigated in research. In invasive methods, nerves are often directly stimulated by implanted electrodes (e.g., Dhillon and Horch, 2005; Rossini et al., 2010; Raspopovic et al., 2014). A different invasive approach is Targeted Muscle Reinnervation (TMR), where the nerve endings are reinnervated in available muscles (Kuiken et al., 2009). Non-invasive techniques include methods of electrical and mechanical (force or vibrotactile) stimulation. Examples thereof can be seen in Patterson and Katz (1992), Cipriani et al. (2008), Witteveen et al. (2012), Meek et al. (1989), and Antfolk (2012). Furthermore, neural interfaces for amputees which provide force feedback are discussed in Hellman et al. (2015). However, tactile feedback is not yet available in commercial hand prostheses. While most investigations show a positive influence on the control of the prosthesis, the lack of product availability may have several causes. Firstly, in studies of non-invasive methods, the feedback is often applied to the hairy skin of the body (e.g., forearm and upper arm). However, the mechanoreceptors in the hairy skin provide different sensory information than those in the glabrous skin of the hands (Vallbo et al., 1995; Koeppen and Stanton, 2010). Furthermore, the space on the forearm is usually occupied by the shaft of the prosthesis, which makes it troublesome to access this area for feeding back sensory information.

Some of the aforementioned systems show applications where the usage of our hands as a natural feedback channel is limited or impossible. This is the case in prostheses, where the hand is not available at all, and in teleoperated robotic systems, where the hands and respective feedback channels may already be overloaded with other tasks. New feedback approaches can help to address this problem. One promising approach is to provide the grasp information to the bare front side of the toes. This approach is of special interest, since the neural structure of the skin of the toe shows similarities to that in the hands (Kennedy and Inglis, 2002). All four kinds of mechanoreceptors which are known to exist in the glabrous skin of the hand are available in the glabrous skin of the foot as well. A particularly interesting region of the foot sole is the big toe, where three of the four mechanoreceptors found in the hand are also available. This may lead to superior perception capabilities compared to other skin regions. Looking at the two-point discrimination threshold, it is evident that the value at the toe, with 9–10 mm, is closer to that of the finger (2–3 mm) than to that of other body parts with hairy skin (35 mm) (Panarese et al., 2009).

In Panarese et al. (2009), uni-directional forces representing the grasp force in a teleoperation task were applied to the subjects' toes. Their work showed that the mechanoreceptors at the toe allow for embedding uni-directional force feedback into the sensorimotor system, which improved the control of a robotic hand. Furthermore, it demonstrates the basic concept, i.e., that humans are able to close the loop between (artificial) motor functionality provided at the hand and sensory information given to the toe. However, in contrast to the human hands, the literature lacks psychophysical analyses of spatial force feedback at the toe. The perception in hands and fingers is a broad topic with many interesting findings about the perception and discrimination abilities. Among others, these involve investigations of discrimination of curvature (Gordon and Morison, 1982), vibrotactile frequencies (Franzén and Nordmark, 1975), or gratings (Sinclair and Burton, 1991). Furthermore, for the fingers it is known that the mechanoreceptors allow not only discrimination of uni-directional feedback but also the discrimination of spatial forces. Panarese and Edin (2011) showed that the mechanoreceptors of the skin of the fingertips enable the discrimination of three-dimensional (spatial) forces. A mechanical force of 5 N was applied to the index fingertip of twelve participants. The authors found that a minimal tangential angle of 7.1° could be perceived. Another study, by Wheat et al. (2004), demonstrated the human ability to discriminate tangential forces at the fingers during grasping. In an additional proof of concept, we were able to show that spatial toe force-feedback can be successfully integrated into the sensorimotor control for teleoperating a robotic arm (Hagengruber et al., 2017). However, the literature is lacking comparable fundamental studies of how well humans are able to discriminate spatial toe force-feedback.

Based on these findings, this study aims at analyzing the capability of humans to discriminate spatial forces at the bare front side of the big toe. To this end, we performed a study with 24 healthy subjects. During the experiments, various force stimuli were presented to the subjects' toe using a standard force feedback device. In each trial, the force vector changed its direction with respect to the force acting normal to the toe. Four directions were used: up (distal), down (proximal), left (medial), and right (lateral). The proportion of the tangential force was varied in each stimulus, between 5° and 25° with respect to the normal acting force. Furthermore, catch trials, in which no tangential force component was present, were applied. The tests were performed three times with absolute force amplitudes of 2 N, 5 N, and 8 N. A modified state-of-the-art haptic feedback device was used to realize the stimulation of the toe. The experiments present a pure investigation of perception at the toe and not of the integration of feedback into sensorimotor control. The haptic device provided forces to a specific location of the skin and thereby relied more on tactile than on kinesthetic sensation. It is known that such approaches do not impair the perception of the feedback (Meli et al., 2014; Pacchierotti et al., 2014). Relying on tactile sensation may be important for creating a miniaturized and wearable toe feedback device.

## 2. METHODS

In this section we outline the experimental design, describe the equipment used and the experimental protocol, and present the statistical model for investigating dependencies between the factors and the defined metrics.

### 2.1. General Description and Participant Task

Since object manipulation provides normal as well as tangential force information, the experiment is designed to cover both force types. Therefore, two main tests were implemented: a Tangential Test (TT) and a Normal Test (NT). The Tangential Test is designed to investigate the capability for discriminating spatial forces at the toe, i.e., whether subjects can discriminate certain directions. The Normal Test, in contrast, deals with the participants' perception of changes in the normal force amplitude. In each test a reference force was applied to the glabrous skin of the distal phalanx of the subjects' right big toe. Depending on the test, either the direction of the force (TT) or its amplitude (NT) was changed in each trial with respect to this reference force. Subjects were asked to identify the presented force change at the toe, either in amplitude or direction. In total, the subjects were asked to perform four test cycles (one Normal Test and three Tangential Tests).

The NT was performed with a reference contact force of 5 N. Starting at this reference, the force was either decreased or increased, and returned to the reference after each trial. This test allows conclusions to be drawn about the minimal change in amplitude that can be detected at the toe. The NT was always performed first by the subjects. The TT is designed to investigate whether spatial discrimination at the toe is possible at all. Furthermore, it allows conclusions to be drawn on whether the proportion between tangential and total force amplitude has an influence on this discrimination. The TT was performed at three different Force Levels of 2 N (low), 5 N (medium), and 8 N (high). For each Force Level, the factor Direction (up, down, left, and right) as well as the Tangential Component (5° to 25°) was varied randomly. The participants were asked to enter the perceived force change at their toes via the number pad of a computer keyboard.

To guide the subjects through the test, a graphical user interface (GUI) was implemented. The GUI visualizes the possible answers for selection. For the NT, the options increased, decreased, or same were displayed, whereas for the TT the options up, down, left, right, and same were available. As soon as the subjects had reported their decision on the given stimulus, they could start the next stimulus by pressing the space bar, which allowed them to take as much time as they needed for the experiment. Depending on the decision time, a test cycle (NT, or one TT) lasted between 4 and 9 min. The subjects had to complete a training phase before the main tests to become familiar with the experimental procedure and the amount of applied force. Only during this training did the subjects receive feedback from the experimenter about the correctness of their decisions. The results of the training were not used for the analysis and thus were not recorded.

In order to ensure identical experimental conditions, subjects had to sit (not walk or stand) for 5 min before starting the experiments. A 5 min break was likewise included between tests. During each break a questionnaire had to be filled in, in which the subjects were asked about their mental demand, their self-estimated performance, their frustration level, and their comfort during the test. Each of these metrics had to be rated on a scale of 1–20, with 1 corresponding to very low/very well and 20 to very high/very bad. The NASA TLX questionnaire (Hart and Staveland, 1988) served as a guideline. The average time for the whole test procedure was about 70 min.

### 2.2. Participants

A total of 24 healthy subjects (20 men and 4 women, aged 21–38 years) performed the experimental protocol as described above. No subject had a reported history of neurological disorder or neuromuscular injury affecting the CNS or the muscles. All subjects participated voluntarily and gave written consent to the procedures, which were conducted in partial accordance with the principles of the Helsinki agreement (non-conformity concerns the point B-16 of the 59th World Medical Association Declaration of Helsinki, Seoul, October 2008: no physician supervised the experiments). Approval was received from the works council of the German Aerospace Center, as well as its institutional board for data privacy ASDA; the collection and processing of experimental data were approved by both committees. Before starting the experiments, subjects were briefly instructed about the experimental procedure and the goal of the experiments. The experimental setup was adjusted to each subject such that they felt comfortable and the stimulation could be performed properly.

### 2.3. Experimental Setup

The setup includes the haptic feedback device, an adjustable foot shell with a fixation for the toe, and a graphical user interface that allows the subject to provide feedback about the perceived stimulation at the toe. The interface is realized by a screen and a keyboard on a table with adjustable height. The setup is depicted in **Figure 1**. The stimulation was realized with a modified omega.3 haptic feedback device from Force Dimension (Force dimension, 2013). The device is a delta-based parallel kinematic with active gravity compensation and 3 Degrees of Freedom (DoF). It provides a cylindrical workspace with a diameter of about 160 mm and a length of 110 mm. Furthermore, it allows for a maximum force of 20 N and a stiffness of 14.5 N/mm. To measure the exact interaction forces between toe and device, the DLR Fingertip sensor was mounted on the end-effector of the haptic device. The cylindrical 6 DoF force-torque sensor measures ∅ 30 × 17 mm. It is based on strain-gauge technology and designed for forces of up to 30 N in each direction. The sensor allows for a closed control loop that adjusts the applied forces at the toe using a PID controller. The control software for the haptic device was developed in MATLAB Simulink and executed on a Linux-based real-time computer. The user interface and the test protocol were also implemented in MATLAB and MATLAB Simulink.
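As an illustration of such a closed force loop, the following is a minimal discrete-time PID sketch in Python. It is not the authors' Simulink controller: the gains, the 1 kHz sample time, and the first-order lag used as a stand-in for the device-toe contact are all assumptions chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (hypothetical gains, not from the paper)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint_n: float, duration_s: float = 2.0, dt: float = 0.001,
             tau: float = 0.05) -> float:
    """Drive a toy first-order plant (assumed) towards the force setpoint."""
    pid = PID(kp=2.0, ki=20.0, kd=0.0, dt=dt)
    force = 0.0  # measured contact force in N
    for _ in range(int(duration_s / dt)):
        command = pid.step(setpoint_n, force)
        # Assumed plant: the measured force follows the command with lag tau.
        force += dt / tau * (command - force)
    return force


final_force = simulate(5.0)
print(f"force after 2 s: {final_force:.3f} N")
```

The integral term is what removes the steady-state error in this toy model; on real hardware the gains would have to be tuned against the actual contact dynamics and the output limited to the device's force range.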

For an optimal skin connection, a hemispheric plastic tip with a diameter of 10 mm and a comparably high stiffness (in relation to the stiffness of human skin) was mounted on the force sensor. The modified haptic device allowed the stimulation of spatial forces of up to 10 N at a maximum frequency of 5 Hz. The device is mounted out of view of the subjects, i.e., below the table and 200 mm above the ground, in order to prevent subjects from being influenced by visual feedback of the device's movement. Additionally, the participants were equipped with hearing protection to block any acoustic information originating from the feedback device and to avoid distraction due to surrounding sounds. An adjustable foot shell in front of the feedback device holds the foot in the right position. The ball of the foot rests against a wooden plate. A slight angle occurs at the joint between the first proximal phalanx and the metatarsal bone. The toe is straightened and the glabrous skin of the toe is positioned perpendicular to the haptic device. A stabilizing orthosis made of medical-grade thermoplastic is used to immobilize the toe. It guarantees a rather fixed stiffness of the toe's joints during the tests. Without the orthosis, the applied force could have been compensated by the subjects. A close-up view of the toe and the haptic device can be seen in **Figure 2**. The z-axis acts perpendicular to the skin of the toe, the y-axis acts to the distal, and the x-axis to the lateral side of the toe, respectively.

### 2.4. Force Pattern

The Tangential Test is based on changes of the effective direction of the force. A schematic illustration of the force pattern is given in **Figure 3**. In order to stabilize the contact of the stimulation device with the skin, an offset force of 0.5 N is initially applied as soon as the subject is in position. Once the subject starts the test cycle, the normal force is increased to the respective Force Level of the test (2 N, 5 N, or 8 N). This pure normal force (plantar to the toe, i.e., in z-direction) represents the reference force to which the subject compares the stimulus. The individual trials of the tests were started by pressing the space bar. After a randomly selected waiting time t<sub>w</sub> (1 s ≤ t<sub>w</sub> ≤ 1.5 s), the actual stimulus is applied. In the TT, the total force of the stimulus is kept constant. Consequently, the presence of the tangential force (±x or ±y) results in a decrease of the normal force. The exact values of the tangential and normal parts can be seen in **Table 1**. The Tangential Component was selected from the discrete levels of 0°, 5°, 10°, 15°, 20°, or 25°. To achieve a smooth transition from the reference force to the actual stimulus, the application of the stimulus is blended with a scaled 2 Hz half-sinusoid waveform. This blending function was selected analogously to the work by Panarese et al. (2009). After the directional stimulus is fully reached, it stays constant until the subject has given their decision about the perceived Direction of stimulation. Upon decision, the force is reset to the reference within the same time as the Tangential Component returns to zero. During this process the Normal Component briefly decreases to the offset of 0.5 N and then returns to the reference force. This way, no further haptic information about the previous stimulus is provided to the subject. After resetting, the reference force is constantly applied until the next trial is initiated by the subject pressing the space bar.

FIGURE 3 | Different directional stimuli occur and are preserved until the decision for a direction has been taken. When the stimulus is started, a randomly selected waiting time t<sub>w</sub> is applied. The trial is terminated by resetting the force to the starting point.
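The onset of a single TT stimulus can be sketched as follows. This is only one plausible reading of the description: the "scaled 2 Hz half-sinusoid" is interpreted here as a raised-cosine ramp lasting half a period (250 ms), and the blend is applied to the tilt angle so that the total force magnitude stays constant during the transition. Both choices, and all function names, are assumptions rather than the authors' implementation. The axis convention (z normal to the skin, y distal, x lateral) is taken from the setup description.

```python
import math

BLEND_FREQ_HZ = 2.0                    # blending frequency stated in the text
RAMP_S = 1.0 / (2.0 * BLEND_FREQ_HZ)   # half a period = 0.25 s


def blend(t: float) -> float:
    """Raised-cosine ramp from 0 to 1 over RAMP_S seconds (assumed shape)."""
    if t <= 0.0:
        return 0.0
    if t >= RAMP_S:
        return 1.0
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * BLEND_FREQ_HZ * t))


def stimulus_force(t: float, total_n: float, angle_deg: float, direction: str):
    """Force vector (x, y, z) in N at time t after stimulus onset.

    up/down act along +y/-y (distal/proximal), right/left along +x/-x
    (lateral/medial); z is the component normal to the skin.
    """
    sign_x, sign_y = {"up": (0, 1), "down": (0, -1),
                      "right": (1, 0), "left": (-1, 0)}[direction]
    theta = math.radians(angle_deg) * blend(t)
    tangential = total_n * math.sin(theta)
    normal = total_n * math.cos(theta)
    return (sign_x * tangential, sign_y * tangential, normal)


for t in (0.0, 0.125, 0.25):
    fx, fy, fz = stimulus_force(t, total_n=5.0, angle_deg=15.0, direction="up")
    print(f"t={t:.3f} s  fy={fy:.2f} N  fz={fz:.2f} N")
```

Blending the angle rather than the Cartesian components keeps the force magnitude at the Force Level throughout the ramp; a linear blend of the vectors would let the magnitude dip slightly during the transition.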

Each TT comprises five stimuli in each of the four directions and four catch trials. This sums up to 24 different stimuli, which were randomized and repeated three times each. These 72 stimuli were repeated for each of the three Force Levels (i.e., 216 stimuli in total). In order to detect whether the order of presentation of the three Force Levels makes any difference, we permuted it, resulting in six different orders (3!). With a total of 24 subjects, each possible order was performed by four subjects.
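The trial counts above can be reproduced with a short scheduling sketch. The text does not specify how the randomization was implemented, so shuffling each repetition block separately and assigning Force Level orders by cycling through the permutations are assumptions; all names are hypothetical.

```python
import itertools
import random

ANGLES_DEG = (5, 10, 15, 20, 25)
DIRECTIONS = ("up", "down", "left", "right")
N_CATCH = 4
REPETITIONS = 3
FORCE_LEVELS_N = (2, 5, 8)


def unique_stimuli():
    """The 24 stimuli of one repetition: 20 tilted forces plus 4 catch trials."""
    tilted = [(d, a) for d in DIRECTIONS for a in ANGLES_DEG]
    catch = [("same", 0)] * N_CATCH
    return tilted + catch


def tt_sequence(rng: random.Random):
    """72 trials for one Force Level; each repetition block is shuffled (assumed)."""
    trials = []
    for _ in range(REPETITIONS):
        block = unique_stimuli()
        rng.shuffle(block)
        trials.extend(block)
    return trials


def level_order(subject_index: int):
    """One of the 3! = 6 Force Level orders; cycling over subjects is an assumption."""
    orders = list(itertools.permutations(FORCE_LEVELS_N))
    return orders[subject_index % len(orders)]


rng = random.Random(0)  # fixed seed only to make the example reproducible
session = {level: tt_sequence(rng) for level in level_order(0)}
print({level: len(trials) for level, trials in session.items()})
```

With 24 subjects, cycling through the six orders assigns each order to exactly four subjects, which matches the balanced design described in the text.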

The force pattern of the Normal Test follows the same scheme as the Tangential Test. The initial force of 5 N represents the reference force. Starting at this level, the stimulus consists of either an increased or a decreased normal force. No tangential forces are applied here. As prior tests indicated that an increase in force is easier to detect than a decrease, more decreasing than increasing stimuli were used. Increasing stimuli range from +0.25 N to +2.0 N in steps of 0.25 N, whereas decreasing stimuli range from −0.25 N to −2.75 N in steps of −0.25 N. Together with the stimulus without force change (same), this sums up to a total of 20 different stimuli. Similar to the TT, a reset sequence occurs after each decision and before the reference force is applied again. The different stimuli were repeated three times each, summing up to a total of 60 stimuli for the NT. The randomization was performed within the 20 different trials.
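The stimulus set of the Normal Test can be enumerated in a few lines. Counting the unchanged stimulus (0.0 N, answer same) as the twentieth item is an inference from the 20 bars reported in the results, not an explicit statement of the protocol; the variable names are hypothetical.

```python
STEP_N = 0.25
REFERENCE_N = 5.0
REPETITIONS = 3

increase = [k * STEP_N for k in range(1, 9)]     # +0.25 N ... +2.0 N
decrease = [-k * STEP_N for k in range(1, 12)]   # -0.25 N ... -2.75 N
changes = sorted(decrease + [0.0] + increase)    # 0.0 N = no change ("same")

absolute = [REFERENCE_N + c for c in changes]    # forces actually applied
print(len(changes), len(changes) * REPETITIONS, min(absolute), max(absolute))
```

The applied normal force therefore stays between 2.25 N and 7.0 N, well above the 0.5 N contact offset.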

### 2.5. Data Analysis

For the statistical analysis, a logistic regression model with fixed and random effects based on Equation (1) is used. The fixed effects are given by the vector of parameters β<sup>T</sup> = (β<sub>0</sub>, β<sub>1</sub>, ..., β<sub>m</sub>) and the vector of influential variables **x**. The random effect is given by γ and the error term by ε.

$$\log\left(\frac{P}{1-P}\right) = \mathbf{x}^T \boldsymbol{\beta} + \gamma + \varepsilon \tag{1}$$

Here, the vector of parameters is defined by the factors Level and Direction, whereas the only influential variable is the factor Degree. For the analysis of the NT, the factor Direction (increase vs. decrease) is the fixed parameter and the Force Change acts as the influential variable. The correctness of the answer per trial, y<sub>i</sub> ∈ {0, 1}, is used for the analysis. The direction same was not considered in the deeper analysis: these catch trials were recognized in close to 100% of cases and provided no further information about the directional recognition. The statistical analysis was performed in R. Based on the logistic regression model, the Just Noticeable Difference (JND) averaged over all subjects can be determined. This psychometric value is the difference required in a stimulus such that the subjects are able to notice it in 50% of the trials. This recognition rate is clearly higher than the chance level of 20% (one out of five possible answers).
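To make the link between the fitted model and the 50%-JND concrete, consider the fixed-effects part of the model for a single Level/Direction cell, written as log-odds = b0 + b1 · x with the random effect set to zero. The probability of a correct answer is 0.5 exactly where the log-odds are zero, i.e., at x = −b0 / b1. The sketch below implements this relation in Python; the coefficients are hypothetical and are not values from the study.

```python
import math


def inv_logit(eta: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    """Log-odds corresponding to a probability."""
    return math.log(p / (1.0 - p))


def threshold(intercept: float, slope: float, p: float = 0.5) -> float:
    """Stimulus value at which the predicted probability of a correct answer is p."""
    return (logit(p) - intercept) / slope


# Hypothetical coefficients for one Level/Direction cell (NOT taken from the paper).
b0, b1 = -3.0, 0.25           # log-odds at 0 deg, change in log-odds per degree
jnd_deg = threshold(b0, b1)   # 50% point
print(f"JND = {jnd_deg:.1f} deg, P(JND) = {inv_logit(b0 + b1 * jnd_deg):.2f}")
```

The same function yields other criteria (for example a 75% point) by changing `p`, which is useful when comparing against studies that use a different threshold definition.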

## 3. RESULTS

In the following, the results for the Tangential and the Normal Test and their statistical analysis are presented.

### 3.1. Tangential Test

The collected data include the correctness of the answers, y<sub>i</sub> ∈ {0, 1}, each assigned to a Force Level (low, medium, or high), a Direction (up, right, down, or left), and a Degree of Tangential Component (0°, 5°, 10°, 15°, 20°, or 25°). **Table 2** shows the estimated values of the logistic regression model. The shown results relate to the base used (i.e., level low and direction down). The values in the estimate coefficient column describe the increase of the odds for a correct answer in comparison to the base value. Since the analysis is based on a logit model, it originally provides results on a logarithmic scale. For better readability, the values for the estimate coefficient are converted to a linear scale. Thus, for example, the odds for a correct answer on a trial at level medium in comparison to the base low are increased by a factor of exp(β) = 4.08, where β is the coefficient of level medium. The p-values with the corresponding std. errors in the table again show the influence on the odds. The logit model used is based on the two factors Degree and Level, which showed significant interaction.

Note to **Table 1**: The acting force vector results from both components.

TABLE 2 | Estimated values of the logistic regression model.

Given are all values of the parameter vector β and the influential variable degree of the logistic regression model. Parameters linked with ":" indicate the interaction of these factors.
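The conversion between a logit coefficient and the quoted odds factor can be illustrated with a few lines of Python. Only the factor 4.08 comes from the text; the base probability of 0.30 is a hypothetical value used to show how an odds factor translates into a probability.

```python
import math


def apply_odds_ratio(p_base: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the base odds by odds_ratio."""
    odds = p_base / (1.0 - p_base) * odds_ratio
    return odds / (1.0 + odds)


odds_ratio = 4.08              # factor quoted in the text
beta = math.log(odds_ratio)    # corresponding coefficient on the logit scale
p_base = 0.30                  # hypothetical base probability, for illustration only
print(f"beta = {beta:.3f}, P = {apply_odds_ratio(p_base, odds_ratio):.3f}")
```

Note that an odds factor is not a factor on the probability itself, and that in a model with a Level:Degree interaction the factor of a main effect applies with the other covariates held at their base values.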

The statistical analysis shows that all factors have a significant influence on the directional discrimination. The interaction of the factors Force Level and Degree of Tangential Component is significant as well. This implies that the Tangential Component influences the results dependent on the applied total force and vice versa. Furthermore, it can be seen that the direction right was recognized significantly worse than the other directions. The factor Direction shows no significant interaction with the other factors.

At the lowest level of 2 N a maximum of about 60% of correct answers is achieved. The medium level (5 N) shows maximum results of about 80%, and the highest level (8 N) reaches 90% correct answers and above. The direction same achieved a considerably higher recognition rate of 92% and above in all levels. The regression model used allowed the probability of recognizing a tangential force stimulus to be calculated depending on the given factors. **Figure 4** shows this probability per Level and Direction. The maximal tangential stimulus of 25° leads to a probability close to one in the medium and high levels. However, at the low level only a probability of 0.75 is achieved. This means that the probability of recognizing a tangential stimulus at the high Force Level is much higher than at the low Force Level. Finally, the 50% recognition rate can be determined, which indicates clear differences between the three Force Levels. **Table 3** lists the values for the 50%-JND per Level and Direction. In the level low, more than 20° of Tangential Component were necessary to achieve a recognition rate of 50%. The JND of the medium level is between 11.6° and 13.5° depending on the direction. At the high level, the estimated JND is between 8.4° and 10.1°.

At the Force Level of 2 N only the 25° stimulus is larger than the JND at this level. In this stimulus a tangential force of 0.85 N (and 1.81 N in z) is applied to the toe. The exact values can be taken from **Table 1**. A stimulus corresponding to the JND at this level would have a Tangential Component of 0.72 N, considering the average JND over all directions of 20.95°. The Force Levels 5 N and 8 N result in higher correct perception rates. At the medium level, the 15° stimulus exceeds the JND. The values for the tangential and normal force reach 1.29 N and 4.83 N, respectively, whereas the stimulus corresponding to the average JND would have about 1.06 N and 4.88 N. The amount of correct answers per Tangential Component increases until 25° (up to: 86% up; 86% down; 89% left; 79% right).

At the high Force Level at least 50% correct answers occurred at a Tangential Component of 10° (tangential force: 1.39 N; normal force: 7.88 N). Here, the stimulus corresponding to the JND would consist of 1.25 N tangential force and 7.90 N normal force, considering the average JND over the directions of 9.05°. Again, larger spatial force changes lead to a higher amount of correct answers. A Tangential Component of 25° evokes about 90% correct answers in all directions. **Table A1** presents detailed results about the correct answers of the subjects. The mean values of correct answers over all subjects per Direction and Tangential Component are given.
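The tangential and normal force values quoted in this section follow from decomposing a force of constant magnitude that is tilted by the Tangential Component angle away from the skin normal: tangential = F · sin(θ), normal = F · cos(θ). A short Python sketch, with a hypothetical function name, regenerates these values for all Force Levels and angles used in the test:

```python
import math


def components(total_n: float, angle_deg: float):
    """Split a force of magnitude total_n, tilted by angle_deg from the skin normal."""
    theta = math.radians(angle_deg)
    return total_n * math.sin(theta), total_n * math.cos(theta)


for total_n in (2.0, 5.0, 8.0):
    for angle_deg in (0, 5, 10, 15, 20, 25):
        tangential, normal = components(total_n, angle_deg)
        print(f"{total_n:.0f} N  {angle_deg:2d} deg  "
              f"tangential {tangential:.2f} N  normal {normal:.2f} N")
```

Evaluating the same function at the average JND angles (for example 20.95° at 2 N or 9.05° at 8 N) gives the threshold forces discussed in the text to within rounding.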

The results provide evidence that the minimal angle needed to reliably detect a spatial force depends on the total amount of force applied.

TABLE 3 | Just Noticeable Difference (JND) per Level and Direction.


The JND is the difference that subjects are able to notice in 50% of trials. The values show that the JND changes for the different Force Levels.

The factor Direction showed a significant influence, with the direction right in particular affecting the outcome measures. A further analysis, which is illustrated in **Figure 5**, shows that there is no general confusion between any two directions across all subjects. In two-thirds of the wrong decisions, subjects chose the answer same. The remaining three possible wrong answers are almost equally distributed within each Direction. Each of the directions was chosen wrongly in 8–15% of the cases.

Additionally, we examined the subjects' decision time, i.e., the time required to reach a decision on an individual stimulus. The time recording started with the onset of the stimulus, i.e., the 250 ms needed to apply the full stimulation force are included. The mean decision time over all stimuli was about 1.7 s. Hence, a decision time of more than 12 s appears to be unrealistic and was therefore defined as an outlier. Accordingly, 11 decisions were not considered for the time analysis. These outliers appeared when subjects paused the test session, for example to ask the supervisor of the experiment questions. Regarding the decision time, the lowest Force Level of 2 N showed a significant difference in comparison to 5 N and 8 N. **Figure 6** depicts the average times required for answering in the form of one confusion matrix per Force Level. Mean times between 1.3 and 5.5 s occur. The diagonal represents the required reaction time for the correct answers. The white box illustrates a confusion which did not occur. In contrast to the other Force Levels, the confusion matrix for the lowest Force Level shows the least variation. The decision time required for correct and wrong answers at 2 N is comparable. The other two matrices, especially the one for 8 N, show shorter decision times for the correct answers.

Furthermore, we examined effects of learning or fatigue between the three repetitions of the 24 different stimuli and found no considerable variation across these repetitions. For this analysis, the success rate per repetition was used. An overall analysis of learning or fatigue effects in terms of the subjects' performance, across all subjects and the whole test procedure, cannot be performed because of the different permutations of Force Levels.

### 3.2. Normal Test

The results of the NT show that the amount of Force Change (−2.75 N to +2.0 N), the Direction (increase vs. decrease), as well as the interaction between these factors play a significant role for its discrimination (with increased forces as base: β<sub>Direction,decrease</sub> with std. error 0.45 and p-value < 2e-16; β<sub>Direction,decrease</sub> : β<sub>force</sub> with std. error 0.35 and p-value 5.71e-05). The 20 blue bars in **Figure 7** represent the correct responses for the different stimuli in percent. The bar with the label 0.0 refers to same during the stimulation phase. The positive values from 0.25 N to 2.0 N describe increasing force stimuli and the negative values represent the decreasing forces. The result shows that correct answers increase with an increase in force. The amount of correct answers at +0.25 N is about 3%. A change of +1.0 N in comparison to the reference force yields 65% and a change of +2.0 N yields 96% correct answers. The stimuli with reduced normal force likewise show a higher success rate with larger changes in force. In the range between −0.25 N and −1.0 N, correct perception reaches a maximum of only 13%, at −0.75 N. The decreased force stimuli of −1.25 N and −1.5 N achieve results comparable to the +0.75 N stimulus. Starting with a force change of −1.75 N (65% correct answers), the amount of correct answers increases continuously with larger changes. The stimulus of −2.75 N produced 95% correct responses. **Figure 7** also illustrates the wrong decisions of the subjects during the test. Similar to the TT, in most cases of a wrong decision same was chosen.

direction was selected by the subjects); each bar illustrates the wrong decision for the respective applied direction.

FIGURE 6 | Required time for answering. Illustration of the needed answering time across all subjects per Force Level. Excluded are outliers with times of more than 12 s. The y-axis shows the force which was applied to the subjects' toe and the x-axis encodes the Direction which was chosen by the subjects. The diagonal corresponds to the correct answers. The white box illustrates a confusion which did not occur.

The logistic regression model also allowed a 50%-JND to be estimated. With the reference force of 5 N, the JND of the NT was 0.95 N for increasing force and −1.62 N for decreasing force.

### 3.3. Secondary Results

The questionnaire ratings of mental demand, self-estimated performance, frustration level, and comfort were considered with respect to the test results. Mental demand and frustration were higher in the level with lower acting force, and the performance at the highest Force Level was perceived as the best. However, the subjects felt more comfortable during the application of the lowest Force Level, followed by the medium and the high Force Level.

Additionally, a Touch Test based on von Frey filaments was performed before and after the test cycles of the four main tests. Sensory evaluators were applied to the same effective area as in the main tests. The test allows the identification of the minimal noticeable touch force at the skin. The evaluator size, and consequently the acting force, was increased until the person could notice at least six out of ten trials. The results are of secondary importance for this paper and were hence not mentioned before. Neither the results of the preceding nor of the succeeding test showed any correlation to the Normal or Tangential Tests (r-values of about −0.35 and 0.16; p-values larger than 0.05).

## 4. DISCUSSION AND CONCLUSION

We investigated the ability of 24 subjects to discriminate spatial forces given to their right big toe. We varied the force amplitude, the relation between tangential and normal component, as well as the acting direction while asking subjects for the perceived changes. The experimental protocol was divided into two tests, namely the Tangential and the Normal Test, separating the influences of direction and amplitude. We found that the Degree of Tangential Component, the Force Level, and the Direction have a significant influence on the subjects' success in perceiving the applied direction. Moreover, we found that it matters whether the force amplitude increases or decreases: subjects were significantly better at sensing an increase in force than a decrease.

In principle, these results provide evidence for the basic purpose of our study, meaning that subjects were able to discriminate spatial forces at the toe varying in amplitude and Tangential Component. Although subjects were able to recognize all directions, the direction right was perceived significantly worse than the other directions. The reason for this effect cannot be determined from the results of our study, but it would be of interest to investigate whether this effect can analogously be found on the left big toe.

Our study shows that the directional discrimination threshold at the toe is clearly increased compared to that at the fingertips. Panarese and Edin (2011) identified a minimal tangential discrimination threshold of about 7.1° for the fingertips. The threshold is valid for a total force of 5 N. At the toe, the respective JND is between 11.9° and 12.2° depending on the direction, at the Force Level of 5 N. When applying 8 N to the toe, lower JNDs of about 9° are reached. This is not surprising given the physiological differences between the plantar and the palmar skin (Kennedy and Inglis, 2002). The toe has thicker skin with fewer mechanoreceptors. Additionally, the distribution of these receptors is different in comparison to the fingers. Moreover, the tactile sensation at the toe is primarily used for balancing during walking and standing. It was a new experience for the subjects to recognize forces at their toe and to assign them to certain directions. The results show that there is a difference in terms of perceiving forces in comparison to the fingers. Nevertheless, directional discrimination at the toe is possible, albeit with less accuracy than at the fingertips.

The forces applied to the toe can represent force feedback during grasping. This type of force feedback represents a natural modality (i.e., force information is fed back as force) at a non-natural stimulation site. The natural stimulation modality offers the advantage that no relearning of the provided stimulus is necessary. Nevertheless, it needs to be investigated whether the non-natural stimulation location may reduce the acceptance in possible applications. Methods like direct intraneural electrical stimulation or reinnervation techniques can potentially allow for a more natural feedback. However, these invasive techniques are not yet widely available, and stimulation of the surface of the skin may provide a viable alternative. Here, the glabrous skin offers a better resolution than hairy regions of the body when considering the two-point discrimination threshold. With respect to this threshold, the toe is, besides the skin of the face, the only region of the body that offers values close to those of the hand (Weinstein, 1968). Stimulation of glabrous skin instead of hairy skin seems to be advantageous also from a physiological point of view.

Furthermore, the results indicate that a minimal Tangential Component is needed for reliable (50%-JND) directional discrimination. The medium and the highest Force Level reached recognition rates of more than 50% with Tangential Components of more than 1 N. The lowest level does not exceed this force. However, in a friction-based setup, a high total force is needed in order to apply a large Tangential Component. While higher Force Levels lead to clearer results, the questionnaire showed that the subjects' comfort was significantly reduced. This issue can be compensated for by realizing higher friction between the stimulation device and the skin.

In a previous proof of concept we could show that force feedback to the toe can be integrated into sensorimotor control when teleoperating a robotic arm in a force task (Hagengruber et al., 2017). Blindfolded subjects teleoperated a DLR Light-Weight Robot by external optical tracking of their index finger. The task was to push a toy train along the rails. The only feedback on the performed task was presented as force feedback to the subjects' toe. The stimulation of the toe was comparable to this work, except that the force was applied continuously and represented the forces measured at the robot's end-effector. With this earlier result and the results obtained in this work, we assume that, at least from the physiological point of view, force feedback to the toe can be used for applications in telepresence scenarios or prosthetics. A schematic illustration of such applications can be seen in **Figure 8**. A practicable technical application does not exist yet. However, using the findings of this work, it is possible to determine a mapping function that ensures that the tactile stimuli provided to the toe can actually be perceived by the subject. Provided with force information at the toe, users may be able to improve their control of such an assistive device. People who rely on a prosthetic hand could gain the possibility of a more precise and natural interaction with their environment. Finally, such devices could help in the future to further increase the personal acceptance of assistive technologies by increasing their practicality. Moreover, it would be of interest to see whether similar results can be obtained from stimulation of the other toes, and finally whether subjects are able to discriminate different force vectors at multiple toes simultaneously.

### REFERENCES

### AUTHOR CONTRIBUTIONS

AH developed the device, researched the literature, designed the experiments and statistical model, conducted the experiments, analyzed the data, and wrote the manuscript. HH and JV developed the idea, contributed to the experiment and statistical model design, contributed to the analysis and interpretation of data, and revised the work.

### ACKNOWLEDGMENTS

We would like to thank Maxim Bort and Prof. Küchenhoff of the Consulting Laboratory of the Institute of Statistics (STABLAB) at the Ludwig Maximilians University, Munich for their input to the statistical design. Furthermore we want to thank Thomas Hulin for his valuable comments to this manuscript as well as all our subjects for their participation.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Hagengruber, Höppner and Vogel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

APPENDIX

TABLE A1 | Mean values of correct answers of three trials over all subjects per Direction and Tangential Component; Results for (A) Level Low; (B) Level Medium; (C) Level High.


Hagengruber et al. Discrimination of Spatial Forces at the Big Toe

# Visuomotor Resolution in Telerobotic Grasping with Transmission Delays

*Omri Afgin<sup>1</sup>, Nir Sagi<sup>1</sup>, Ilana Nisky<sup>2,3</sup>, Tzvi Ganel<sup>3,4</sup> and Sigal Berman<sup>1,3</sup>\**

*<sup>1</sup>Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer-Sheva, Israel, <sup>2</sup>Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel, <sup>3</sup>Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel, <sup>4</sup>Department of Psychology, Ben-Gurion University of the Negev, Beer-Sheva, Israel*

#### *Edited by:*

*Matteo Bianchi, University of Pisa, Italy*

#### *Reviewed by:*

*Alessandro Moscatelli, Università degli Studi di Roma Tor Vergata, Italy; Ryad Chellali, Nanjing University of Technology, China; Simone Toma, Arizona State University, United States; Leonard James Smart, Miami University, United States*

> *\*Correspondence: Sigal Berman sigalbe@bgu.ac.il*

#### *Specialty section:*

*This article was submitted to Bionics and Biomimetics, a section of the journal Frontiers in Robotics and AI*

*Received: 30 July 2017 Accepted: 06 October 2017 Published: 25 October 2017*

#### *Citation:*

*Afgin O, Sagi N, Nisky I, Ganel T and Berman S (2017) Visuomotor Resolution in Telerobotic Grasping with Transmission Delays. Front. Robot. AI 4:54. doi: 10.3389/frobt.2017.00054*

Weber's law is among the basic psychophysical laws of human perception. It determines that human sensitivity to change along a physical dimension, the just noticeable difference (JND), is linearly related to stimulus intensity. Conversely, in direct (natural), visually guided grasping, Weber's law is violated and the JND does not depend on stimulus intensity. The current work examines adherence to Weber's law in telerobotic grasping. In direct grasping, perception and action are synchronized during task performance. Conversely, in telerobotic control, there is an inherent spatial and temporal separation between perception and action. The understanding of perception–action association in such conditions may facilitate development of objective measures for telerobotic systems and contribute to improved interface design. Moreover, telerobotic systems offer a unique platform for examining underlying causes for the violation of Weber's law during direct grasping. We examined whether, like direct grasping, telerobotic grasping with transmission delays violates Weber's law. To this end, we examined perceptual assessment, grasp control, and grasp demonstration, using a telerobotic system with time delays in two spatial orientations: alongside and facing the robot. The examination framework was adapted to telerobotics from the framework used for examining Weber's law in direct grasping. The variability of final grip apertures (FGAs) in perceptual assessment increased with object size in adherence with Weber's law. Similarly, the variability of maximal grip apertures in grasp demonstration approached significance in adherence with Weber's law. In grasp control, the variability of maximal grip apertures did not increase with object size, which seems to violate Weber's law. However, unlike in direct grasping, motion trajectories were prolonged and fragmented, and included an atypical waiting period prior to finger closure. 
Therefore, in this condition, maximal grip aperture was an inappropriate indicator of JND. Instead, we calculated the aperture at the end of the opening phase, the initial grip aperture (IGA), and the FGA at the beginning of the waiting period, as more appropriate indicators for the JNDs. The IGAs adhered to Weber's law. The FGAs approached significance in the same direction. This suggests that perception–action association during telerobotic grasping with transmission delays significantly diverges from direct grasping.

Keywords: Weber's law, telerobotics, grasping, motor control, human factors

### INTRODUCTION

Through the years, psychophysical research has uncovered several laws governing human perception–action integration. Among these is Weber's law, which is considered a basic psychophysical principle of human perception (Baird and Noma, 1978). According to Weber's law, the perceptual sensitivity, largely across all sensory modalities, to a change along a physical dimension is linearly related to the intensity of the stimulus, i.e., the just noticeable difference (JND) is larger for stronger stimuli. This finding was supported by numerous experiments. In striking contrast, visually guided, direct (natural) grasping violates Weber's law.

When remotely controlling a robotic manipulator, the human operator is required to integrate cognition, perception, and action while accounting for the spatial separation of the local and remote sites. In such setups, the human operator controls a manipulator that differs in structure and dynamics from the operator's limbs, sensory perceptions are limited and biased, and there are inherent transmission delays. These characteristics make both the design and the operation of telerobotic systems challenging. Grasping is fundamental in most telerobotic tasks. It is especially challenging, as contact must be made between the robotic end-effector and the object to be grasped, which requires high spatiotemporal perception–action integration. It is, thus, important to study adherence to Weber's law in the context of telerobotic grasping, as an indication of the underlying internal processing mechanisms employed by the user. Additionally, the inherent spatial and temporal separation between perception and action in telerobotic systems offers a unique platform for examining underlying causes for the violation of Weber's law in grasping.

It is common in psychophysics to use explicit methods to measure JNDs, such as the method of constant stimuli, i.e., extracting JND values from the psychophysical function. However, for the purpose of measuring JNDs during motor control, it is necessary to use a different method that directly taps into movement trajectories. For this purpose, we and others (Ganel et al., 2008) use the classical method of adjustment. According to this method, the variance of the responses to a stimulus reflects an "area of uncertainty," which is a measure of the JND for that stimulus. The use of the method of adjustment for measuring JNDs has not been limited to grasping or to motor control *per se*. Indeed, this method has been used for many years across different perceptual domains, such as time and auditory perception [for discussion, see Ganel et al. (2014)].
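As a minimal sketch of the method of adjustment described above, the following Python snippet groups the responses of one participant by object size and takes the within-participant standard deviation at each size as the JND estimate. The function name `jnd_by_size` and the aperture values are hypothetical and serve only as an illustration; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean, stdev


def jnd_by_size(trials):
    """Return {object_size: (mean_aperture, sd_aperture)} for one participant.

    `trials` is an iterable of (object_size_mm, aperture_mm) pairs. The
    within-participant SD at each size is used as the JND estimate.
    """
    grouped = defaultdict(list)
    for size, aperture in trials:
        grouped[size].append(aperture)
    return {size: (mean(vals), stdev(vals)) for size, vals in sorted(grouped.items())}


# Made-up apertures (mm) for two object sizes, for illustration only.
trials = [(20, 31.0), (20, 33.0), (20, 32.0), (40, 50.0), (40, 54.0), (40, 52.0)]
result = jnd_by_size(trials)
```

Under Weber's law, the SD would be expected to grow with object size, as it does in this made-up example; an SD that stays flat across sizes would correspond to the violation reported for direct grasping.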

In grasping, the JND is measured as the within-participant variability of the maximal finger aperture during the reach-to-grasp movement and it remains invariant with object size, in violation of Weber's law (Ganel et al., 2008). Several experimentally confirmed hypotheses have been suggested for explaining this phenomenon (Smeets and Brenner, 2008; Jazi et al., 2015; Löwenkamp et al., 2015; Utz et al., 2015; Jazi and Heath, 2016). The different perspectives from which these hypotheses have emerged have not been resolved thus far. However, it is commonly assumed that the immunity of the visuomotor system to Weber's law reflects an absolute processing style during grasping, which is in sharp contrast to the relative processing style of the human perceptual system (Ganel and Goodale, 2003; Ganel et al., 2008; Jazi and Heath, 2016).

Perceptual–motor transparency is a major concern in telerobotic system interfaces as it determines system fidelity and usability (Preusche and Hirzinger, 2007; Nisky et al., 2013). It was analyzed extensively based on the characteristics of the communication channel. A three-layered human-centered measure of transparency was suggested, where the layers include perceptual transparency, local motor transparency, and remote transparency (Nisky et al., 2013). Perceptual transparency is assessed by quantifying perceptual bias and discrimination thresholds in the mechanical properties of the environment. Local (remote) motor transparency is assessed through comparison of human (remote manipulator) motion trajectories while teleoperating the robot, to those that would be executed if the operation was performed directly on the remote environment. Yet, even these measures are of external operation parameters, such as motion trajectories, and cannot ascertain internal processing similarity within the central nervous system (CNS). Indeed, only systems that can elicit such a degree of transparency can be considered truly transparent and facilitate very high fidelity and usability. Similarity of the internal processing in natural (direct) and in telerobotic environments can be assessed only by uncovering the underlying mechanisms determining human perception–action operation during natural, direct motion and during telerobotic control. Such analysis is expected to improve the understanding of human operation, facilitate the development of objective measures for quantifying transparency, and lead to the design of efficient telerobotic interfaces.

### THE UNDERLYING CAUSES OF WEBER'S LAW

For assessing the implications of violation or adherence to Weber's law in telerobotic grasping, it is important to understand the underlying perception and action processing mechanisms. Hypotheses explaining the lack of Weber's law in grasping vary considerably in their postulations regarding these mechanisms. These hypotheses relate to visual or haptic sensory perceptions, motion planning processes, and biomechanical constraints during motion execution. In the following, we detail the major hypotheses.

The violation of Weber's law in grasping may stem from the functional separation of visual information processing. In the neuroscience literature, it is well established that perception and action are mediated by separate neural networks. The two visual systems hypothesis proposed by Goodale and Milner (1992) provides a contemporary example for such an account that details the organization of the primate visual system. According to this proposal, the ventral "perception" pathway provides the rich and detailed visual representation of the world, and the dorsal "action" pathway enables flexible control of actions directed to objects.

**Abbreviations:** CNS, central nervous system; FGA, final grip aperture. The aperture at the end of the movement (mm); IGA, initial grip aperture. The aperture after the end of the opening time (mm); JND, just noticeable difference; MGA, maximum grip aperture. The maximum aperture during the movement (mm); STCPD, The scaled sagittal TCP transport distance; TCP, Robotic Tool center point.

This proposal of a functional separation between visual systems underlying action and perception is supported by converging evidence from neuropsychological patient data and behavioral psychophysics. Behavioral studies provide evidence that, unlike visual perception, which is largely governed by relational and Gestalt representations of objects' size and shape, visually guided action treats objects in a more analytic fashion. In line with this view, the lack of adherence to Weber's law in visually guided, direct grasping results mainly from the functional separation between the visual systems (Ganel et al., 2008). This view is consistent with many other examples of dissociations between perception and action, such as in the case of the size–weight illusion (Flanagan and Beltzner, 2000) or the effect of delayed force information on perception of impedance and grip force adjustment (Leib et al., 2015).

An alternative account for the violation phenomenon is based on the role of haptic feedback, as grasping a physical object involves haptic cues when the fingers touch the target object. Thus, integration of haptic and visual cues may be imperative for absolute specification of object size leading to the violation of Weber's law in grasping. Moreover, the lack of haptic feedback may be the cause of Weber's law appearing in pantomime grasping (Jazi et al., 2015; Jazi and Heath, 2016). Ozana and Ganel (2017) examined adherence to Weber's law in direct grasping of physical objects placed beyond a transparent glass surface. They found that Weber's law was maintained when subjects were instructed to end the motion close to the glass surface, but without touching it, and conversely, that Weber's law was violated when subjects were instructed to touch the surface at the end of the motion. Their findings suggest that even indirect haptic information is sufficient to allow analytic processing during grasp.

Another alternative explanation of the violation of Weber's law in grasping stems from motion planning mechanisms. Most contemporary research of reach-to-grasp motion asserts that it is comprised of two separately controlled, yet coordinated, functional components, the reaching motion bringing the hand toward the object, and the grasp formation shaping the hand according to object features (Jeannerod, 1981; Jeannerod et al., 1995). In contrast, Smeets and Brenner (1999, 2001) claim reach-to-grasp motion should be viewed as a coordination of separate finger motion plans. They suggest that reach-to-grasp motion planning is based on reaching with the finger to a position on the object rather than on the object's size (Smeets and Brenner, 2008), and therefore, finger aperture during reach-to-grasp movements does not reflect the computation of size and is not expected to adhere to Weber's law.

Finally, the violation of Weber's law in grasping may be attributed to biomechanical constraints affecting motion execution rather than to planning or perceptual processing (Löwenkamp et al., 2015; Utz et al., 2015). Ceiling effects caused by the limited human finger span and the human tendency to avoid large and uncomfortable apertures can suppress variation in large finger apertures, precluding the manifestation of Weber's law. We note, however, that recent research has shown that the dissociation between perception and action in terms of their adherence to Weber's law persists even when the possibility of biomechanical constraints is accounted for (Ganel et al., 2017; Heath and Manzone, 2017; Heath et al., 2017; Manzone et al., 2017).

### THE EFFECTS OF TRANSMISSION DELAYS

In natural grasping, reach-to-grasp motion profiles comprise two components, arm motion for moving the hand toward the object, and hand (finger) motion for grip formation (Lacquaniti and Soechting, 1982; Jeannerod, 1984; Marteniuk et al., 1990; Wallace et al., 1990; Santello et al., 2002). Arm motion profiles follow a stereotypical human motion path based on minimum jerk optimization (Flash and Hogan, 1985) and adhere to Fitts' law for various object types and sizes (Crossman and Goodeve, 1983). Fitts' law, among the basic psychophysical laws related to movement control, models the speed-accuracy tradeoff of human motion. It determines that reaching motion time is a logarithmic function of the ratio between the distance and the width of the target (Fitts, 1954). Grip formation has two stages, opening (finger stretching) and closing (closing fingers toward contact with the object). The formation of the finger grip occurs during arm motion (hand transportation), where maximum arm endpoint (wrist) velocity is typically reached in parallel to maximal aperture (Jeannerod, 1984; Marteniuk et al., 1990; Rand et al., 2000).
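The logarithmic relation described by Fitts' law can be illustrated with a short sketch. It uses the original formulation of Fitts (1954), MT = a + b·log2(2D/W), where D is the distance to the target and W is its width. The coefficients `a` and `b` are empirical and task dependent; the values used below are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
import math


def fitts_movement_time(distance, width, a, b):
    """Movement time under Fitts (1954): MT = a + b * log2(2 * distance / width).

    `distance` and `width` must share the same unit; `a` (s) and `b` (s/bit)
    are empirical coefficients that have to be fitted to data.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    index_of_difficulty = math.log2(2.0 * distance / width)  # in bits
    return a + b * index_of_difficulty


# Hypothetical coefficients; a 160-mm reach to a 20-mm target gives 4 bits.
mt = fitts_movement_time(distance=160.0, width=20.0, a=0.1, b=0.15)
```

Doubling the distance or halving the target width adds one bit to the index of difficulty and therefore `b` seconds to the predicted movement time.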

In teleoperation, transmission delays between control movements and feedback from the remote system response are inevitable, especially when the distances between the human operator and the controlled robotic device are long. The effects of such delays on operator performance have been extensively studied (Rohde and Ernst, 2016). It was shown that a modified form of Fitts' law modeling a multiplicative relationship between movement time, an index of difficulty, and transmission delays, provides an accurate predictor of the experimental data (Hoffmann, 1992). Visuomotor delays increase errors in driving (Cunningham et al., 2001), distort drawing and writing (Kalmus et al., 1955; Morikiyo and Matsushima, 1990), and impede motor adaptation (Honda et al., 2012a,b). Moreover, a consistent exposure to delay eventually leads to adaptation (Foulkes and Miall, 2000; de la Malla et al., 2014; Farshchiansadegh et al., 2015; Rohde et al., 2014; Avraham et al., 2017; Leib et al., 2017), and aftereffects are evident upon delay removal (Smith and Bowen, 1980; Botzer and Karniel, 2013; Avraham et al., 2017). Delayed visual feedback also affects weight perception, with participants' reports of an increased mass (Honda et al., 2013) or resistance (Takamuku and Gomi, 2015) in the presence of delay. Similarly, delayed force feedback biases perceived stiffness of elastic objects (Leib et al., 2015, 2016) where the effects of delay on actions with elastic objects are often different from their effects on perception (Nisky et al., 2011; Leib et al., 2016).

For long delays (above about 0.7 s), a change in control strategy was also found, from a more continuous form of control to a move-and-wait strategy. Experimental data in both long and short delays fit the modified Fitts' model predictions with different coefficients (Sheridan and Ferrell, 1963; Ferrell, 1965). It was additionally shown that when participants were asked to track the motion of a visual bar with their hand, they were able to adjust to motion displacement only when there were no transmission delays. When the delays were longer than 0.3 s, participants were unable to adjust to their motion displacements, leading the authors to conclude that for such delays the correlation between visual feedback and motor control commands is disrupted (Held et al., 1966).

The amalgamation of these findings makes the region of 0.3–0.7 s delays, where motion is continuous yet perception–action synchronization is disrupted, particularly interesting for analyzing telerobotic control. In the current work, we examined Weber's law in a telerobotic control scenario with such transmission delays. We sought to determine if indeed participant behavior during telerobotic control with such delays would adhere to Weber's law. We hypothesized that, as in direct conditions, in telerobotic perceptual-based tasks, participant behavior would adhere to Weber's law. Indeed, establishing adherence to Weber's law during perceptual assessment is crucial for establishing testbed validity. We further hypothesized that, when perceptual transformations are required during telerobotic control, e.g., when viewing and action directions are not aligned, participant behavior would also adhere to Weber's law. To this end, we developed a telerobotic environment with transmission delays and telerobotic versions of direct tasks used for examining Weber's law. A different study from our lab examined Weber's law in a surgical robotic setup with negligible transmission delays (Milstein et al., submitted).<sup>1</sup>

### MATERIALS AND METHODS

### Participants

Sixty-three healthy, right-handed participants (age 18–31 years, mean 24.3, 30 males) participated in the experiment. Participants had normal or corrected-to-normal vision with no neurological, sensorimotor, or orthopedic impairments. To avoid fatigue, the participants were divided into six groups (two perceptual assessment groups with 10 participants each, two grasp control groups with 11 participants each, and two grasp demonstration groups with 10 and 11 participants), where each group performed one of the experimental procedures described below. According to the requirements of the Helsinki declaration, the Human Subject Research Committee of Ben-Gurion University of the Negev approved the experimental protocol.

### Apparatus

A unilateral telerobotic system (without force feedback to the user) was constructed based on a Motoman UP6 robotic manipulator (Yaskawa, Japan), a controlled jaw gripper, AVG 55 (Schunk, Germany), and a pair of Phantom Premium devices fitted with finger thimbles (Geomagic, USA). The human finger aperture determined the robotic gripper opening (without scaling) and the center of the human finger aperture determined the tool center point (TCP) position (with a 1:2.2 scaling). To simplify the task, robot motion toward (and away from) the object was possible only along a straight horizontal line (forward and backward). Similarly, lifting and placing the object back on the table were also possible only along the vertical axis.
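The mapping from the operator's fingers to the robot can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the control code of the system: the function name `gripper_command`, the choice of `x` as the forward axis, and the reading of the 1:2.2 scaling as "the TCP moves 2.2 mm for every 1 mm of hand motion" are all assumptions. Only the forward axis is handled, in keeping with the straight-line constraint described above.

```python
import math

TCP_SCALE = 2.2  # assumed direction: 1 mm of hand motion -> 2.2 mm of TCP motion


def gripper_command(thumb, index, start_center_x, tcp_start_x):
    """Map two fingertip positions (x, y, z in mm) to a gripper command.

    Returns (aperture_mm, tcp_x_mm). The aperture is passed on unscaled; the
    forward TCP position follows the scaled displacement of the finger
    midpoint from its position at the start of the trial.
    """
    aperture = math.dist(thumb, index)      # distance between the fingertips
    center_x = (thumb[0] + index[0]) / 2.0  # forward coordinate of the midpoint
    tcp_x = tcp_start_x + TCP_SCALE * (center_x - start_center_x)
    return aperture, tcp_x


# Hypothetical fingertip positions: 50-mm aperture, midpoint moved 10 mm forward.
aperture, tcp_x = gripper_command(
    thumb=(100.0, 0.0, 0.0), index=(100.0, 30.0, 40.0),
    start_center_x=90.0, tcp_start_x=500.0,
)
```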

To support robustness and modularity, the system was developed as a distributed system with each hardware component constituting a separate agent. The control was implemented in a data-driven approach, where communication between the components was established over the internet. The communication apparatus was developed using the data distribution service (RTI, USA). The data transmission rate from the Phantom devices was set to 100 Hz and the control cycle delays of the robot and gripper were 0.6 and 0.3 s, respectively. The delays were determined based on hardware constraints and preliminary examination of system operation. The system's transmission delays are determined by the control cycle delays. For such delays motion is expected to be continuous yet perception–action synchronization is disrupted.

Five cylinders with different diameters ranging from 20 to 40 mm in 5-mm steps (XS, S, M, L, XL) were used in the experiments (**Figure 1D**). A small table was placed in front of the robot inside the robot's work-volume for placing the cylinder to be grasped. A single cylinder was placed on the table for each experimental run.

### Experimental Procedure

The experiment comprised three tasks: perceptual assessment, grasp control, and grasp demonstration. The tasks were constructed as telerobotic versions of the classical (direct) visual perception, visually guided grasping, and pantomimed grasping tasks typically used for assessing Weber's law (Ganel et al., 2008; Smeets and Brenner, 2008; Jazi et al., 2015; Löwenkamp et al., 2015; Utz et al., 2015; Jazi and Heath, 2016). In the grasp control task, the subject is required to grasp the object. In the visual perception task, the subject indicates her perception of the size of the object with finger aperture. Thus, this task includes finger motion, though not a grasping movement. In pantomimed grasping, the subject pantomimes a grasping motion, so the task includes grasping motion, but not toward a physical object.

Each task was conducted in two orientations of the operator with respect to the robot: alongside (**Figure 1A**) and across (**Figure 1B**). These two orientations were selected because they provide different control directions and viewing conditions. When alongside each other the robot and operator have aligned control directions, as would be the case in direct grasping (when the participant grasps the object using his/her own hand), yet unlike direct grasping, the view of the grasp contact point on the object of the remote robotic finger is obscured. When facing each other in the across orientation the movement directions of the operator and robot are mirrored, i.e., different from direct grasping conditions, yet grasp contact points on the object of both fingers are clearly visible. In both orientations, the participants sat outside the robotic work-volume, about 2 m away from the robot base. Two pseudorandom sets of object order were prepared. In each condition (task and orientation combination), the participants were equally divided and performed the experiment according to one of the two sets. In all tasks at the beginning of each trial, participants placed their fingers at the initial position (**Figure 1C**) with their eyes closed, waiting for a computerized audio cue to open their eyes and start moving. Then, they performed the task, paused, and then returned to the initial position. Transitions between the stages were marked by a computerized audio cue. Participants performed the trials for each of the five objects and were allowed to rest at will twice during the experiment. Before starting the experiment, participants practiced the task for a few repetitions until they reported feeling comfortable in performing it.

<sup>1</sup>Milstein, A., Ganel, T., Berman, S., and Nisky, I. (submitted). The effect of gripper scaling on human-centered transparency of grasping in robot-assisted minimally invasive surgery.

FIGURE 1 | Experimental setup. (A) Alongside, (B) across, (C) initial position, (D) tested cylinders. Written and informed consent has been obtained from the depicted individual for the publication of his identifiable image.

In the perceptual assessment task, participants were asked to indicate the cylinder's width by opening the gripper to an equivalent aperture during a 5.1-s time window. In this task, only the gripper opening was controlled by the participants, and the robot manipulator did not move. The choice of such pantomimed reporting of a perceptual assessment is consistent with prior studies of perception–action dissociations in their adherence to Weber's law. It is important for making sure that the perceptual assessment and the grasp control are performed using similar finger motion and, therefore, for attributing any differences in violation or adherence to Weber's law to the underlying neural processing. The adaptation to the telerobotic environment is in that the aperture of the robotic fingers, rather than the aperture of the participant's fingers, is the object size indicator. Participants performed 20 trials for each of the five objects (100 trials overall).

In the grasp control task, participants were requested to teleoperate the robot, and to use it to grasp and lift the object in three consecutive stages, pausing between stages until they received a computerized audio command to continue. The stages were reach and grasp the object (during a 7.2-s time window), raise the object and place it back on the table (during a time window of 3.9 s), and release the object and return to the initial position. Participants performed 20 trials for each of the five objects (100 trials overall).

In the grasp demonstration task, there were two experimental stages. In the first stage, participants practiced remotely controlling the robot with the Phantom interface. They remotely grasped and lifted a cylinder placed on the table using the robotic system (as was performed in the grasp control task). This was done for several minutes until they reported feeling comfortable with the task. This stage was introduced to assure that the participants attain an understanding of the robotic task and appreciate the capabilities of the robotic system. In the second stage, participants were asked to demonstrate reach-to-grasp motion to the robot (in a 4.2-s time window), while their fingers were placed in the Phantom thimbles, just as in the grasp control stage. During the demonstrations, neither the robot nor the gripper moved. The adaptation of the pantomime task to the telerobotic demonstration has three components: acquainting the participants with the capabilities of the robotic system; the use of the Phantom interface; and requesting the participants to demonstrate the task to the robot, which is important for placing their actions in the context of the robotic operation, rather than their own direct operation. Participants performed 15 demonstration trials for each of the five objects (75 trials overall).

### Data Analysis

Motion trajectories were recorded at 100 Hz and were filtered using a standard two-way, low-pass Butterworth filter (*n* = 3) with a 5.54-Hz cutoff (verified against the data). For the assessment task (**Figure 2B**), maximal aperture speed was determined over the entire movement. Movement start was determined as the time at which the aperture speed exceeded and remained above 10% of the maximal aperture speed, for 0.1 s. To ensure inclusion of final motion corrections, movement end was similarly determined as the time at which the aperture speed decreased and remained below 10% of the maximal aperture speed, for 0.5 s. For the grasp control and grasp demonstration tasks (**Figure 2A**), only the reach-to-grasp phase was analyzed. Two maximal speeds were determined, maximal opening speed, during the first part of the movement in which finger aperture increased, and maximal closing speed during the final part of the movement (after reaching maximal aperture) in which finger aperture decreased. Movement start was determined as the time at which the aperture speed exceeded and remained above 10% of the maximal aperture opening speed, for 0.1 s. The end of the aperture opening phase was determined as the time at which the aperture opening speed decreased and remained below 10% of the maximal aperture opening speed, for 0.1 s. The start of the aperture closing phase was determined as the time at which the aperture closing speed increased and remained above 10% of the maximal aperture closing speed, for 0.1 s. Movement end was determined as the time at which the aperture closing speed decreased and remained below 10% of the maximal aperture closing speed, for 0.1 s.
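The threshold rule for the assessment task can be sketched as below. The snippet operates on an already filtered aperture speed signal sampled at 100 Hz and finds the first sample at which the speed stays above (or below) 10% of its maximum for the required hold time. The function names are hypothetical, and starting the search for the movement end at the sample of peak speed is an assumption made for this sketch; the text does not state where the search begins.

```python
FS = 100  # sampling rate in Hz, as stated in the text


def first_sustained(speed, threshold, hold, start=0, above=True):
    """First index i >= start where `hold` consecutive samples lie strictly
    above (or strictly below) `threshold`; None if no such index exists."""
    for i in range(start, len(speed) - hold + 1):
        window = speed[i:i + hold]
        if above and all(s > threshold for s in window):
            return i
        if not above and all(s < threshold for s in window):
            return i
    return None


def movement_bounds(speed, start_hold_s=0.1, end_hold_s=0.5, fraction=0.1):
    """Return (start_index, end_index) for an assessment-task trial."""
    peak = max(speed)
    threshold = fraction * peak
    start = first_sustained(speed, threshold, round(start_hold_s * FS), above=True)
    if start is None:
        return None
    # Assumption: look for the movement end only after the peak speed sample.
    end = first_sustained(speed, threshold, round(end_hold_s * FS),
                          start=speed.index(peak), above=False)
    return start, end


# Synthetic speed profile: 0.2 s rest, 0.3 s of motion, 0.6 s rest.
speed = [0.0] * 20 + [1.0] * 30 + [0.0] * 60
bounds = movement_bounds(speed)
```

For the grasp control and grasp demonstration tasks, the same helper could be applied separately to the opening and closing phases with a 0.1-s hold and the respective maximal opening or closing speed.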

Two measures were defined for all tasks and four additional measures were defined for the grasp control and grasp demonstration tasks. For all tasks, movement time was computed as the time difference between movement start and end. The mean aperture opening speed was computed as the mean speed during the aperture opening time. For the grasp control and grasp demonstration tasks, aperture opening time ratio (OTR) was computed as the time between movement start and the end of the aperture opening divided by movement time. Aperture transport time ratio (TTR) was computed as the time between the end of the aperture opening and the beginning of the aperture closing divided by movement time. The final waiting time (FWT) was computed as the time between the end of both aperture opening and forward TCP motion, and the beginning of aperture closing. When aperture closing started prior to the end of the forward TCP motion, the FWT was set as 0, i.e., the FWT is a non-negative measure. The scaled sagittal TCP transport distance (STCPD) was calculated as the difference between the TCP position at movement start and end, multiplied by the robot movement scaling-factor (which was 1:2.2 in the experimental apparatus).
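Given the event times of a single trial, the timing measures defined above reduce to a few lines of arithmetic. The following is a minimal sketch with hypothetical argument names and made-up event times; "the end of both aperture opening and forward TCP motion" is interpreted here as the later of the two events.

```python
def timing_measures(t_start, t_open_end, t_close_start, t_end, t_tcp_forward_end):
    """Return (movement_time, OTR, TTR, FWT) from event times in seconds."""
    movement_time = t_end - t_start
    otr = (t_open_end - t_start) / movement_time        # opening time ratio
    ttr = (t_close_start - t_open_end) / movement_time  # transport time ratio
    # FWT starts once both the opening and the forward TCP motion have ended,
    # and is clipped at zero when closing starts before that point.
    fwt = max(0.0, t_close_start - max(t_open_end, t_tcp_forward_end))
    return movement_time, otr, ttr, fwt


# Made-up event times (s) for one trial.
mt, otr, ttr, fwt = timing_measures(
    t_start=0.5, t_open_end=1.5, t_close_start=4.5, t_end=5.5, t_tcp_forward_end=3.5
)
```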

For the assessment task, the final grip aperture (FGA) was computed as the aperture at the end of the movement. For the grasp control and grasp demonstration tasks the maximum grip aperture (MGA) was computed over all the aperture motion. For the grasp control task the initial grip aperture (IGA), and the FGA were also calculated. IGA was determined as the aperture after the end of the opening phase, when aperture speed additionally decreased to 3.3% of the global mean maximum aperture opening speed, where the global mean maximum aperture opening speed was computed over all the movements of all the subjects who performed the grasp control task. This value was chosen to ensure the aperture was sampled after the end of the opening phase at a speed that is not related to object size. This is important for verifying that aperture variability is not affected by aperture velocity, which may lead to an indirect dependence on object size (Ganel et al., 2014; Ganel, 2015). FGA for the grasp control task was determined as the aperture at the end of both aperture opening and the forward TCP motion (when aperture closing had not yet started), i.e., the aperture at the beginning of the FWT.
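The IGA rule can be sketched as a simple search that starts at the end of the opening phase and returns the aperture at the first sample whose speed has dropped to the fixed, size-independent threshold. The function name and the sample values are hypothetical, and treating "decreased to 3.3%" as "at or below 3.3%" is an assumption of this sketch.

```python
def initial_grip_aperture(aperture, speed, open_end_idx, global_mean_peak_speed,
                          fraction=0.033):
    """Aperture at the first sample at or after `open_end_idx` whose speed is
    at or below `fraction` of the global mean maximum opening speed."""
    threshold = fraction * global_mean_peak_speed
    for i in range(open_end_idx, len(speed)):
        if speed[i] <= threshold:
            return aperture[i]
    return None  # speed never dropped to the threshold


# Made-up samples: aperture in mm, speed in mm/s.
aperture = [10.0, 30.0, 48.0, 50.0, 50.5]
speed = [5.0, 200.0, 40.0, 6.0, 1.0]
iga = initial_grip_aperture(aperture, speed, open_end_idx=2,
                            global_mean_peak_speed=200.0)
```

Because the threshold is derived from a global mean rather than from the trial's own peak speed, the sampling point does not scale with how fast a particular object size was approached.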

### Statistical Analysis

Failure in the task was defined as failure to complete the task within the designated time window or, additionally in the grasp control task, if the robot collided with the object. Participants were excluded from the analysis if more than 10% of their movements resulted in failure. Data distribution was symmetrical, and therefore, outliers were determined for each remaining participant using the interquartile range of the MGA.
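A possible form of the outlier screening is sketched below. The text states only that the interquartile range of the MGA was used; the 1.5 × IQR fences (Tukey's rule) and the quartile method of Python's `statistics.quantiles` are assumptions made for illustration.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    The multiplier k = 1.5 is an assumed convention, not a value
    reported in the text.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def exceeds_failure_limit(n_failed, n_total, limit=0.10):
    """True when more than 10% of a participant's movements failed."""
    return n_failed / n_total > limit
```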

A mixed model ANOVA analysis was conducted for movement time with task (assessment, control, demonstration) and orientation (alongside, across) as between-subjects independent factors, and movement set as the within-subject independent factor. A similar analysis was conducted separately for each task with orientation as the between-subjects independent factor, and movement set as the within-subject independent factor for all other measures (movement time, aperture OTR, aperture TTR, FWT, and STCPD) except for mean aperture opening speed. Mean aperture opening speed was analyzed for each task with orientation as the between-subjects independent factor, and movement set and object size as within-subject independent factors. A confidence interval was determined for the mean STCPD to facilitate comparison with the physically required distance. A linear trend analysis was conducted for the mean and SD of FGA, MGA, and IGA for each task. The analysis of the mean was conducted to verify that the participants were sensitive to object size. Similar to other experiments for assessing Weber's law in grasping, participants were excluded from the analysis when linear trend analysis of the mean did not show a significant linear trend, i.e., analysis showed they were not sensitive to object size. For the perceptual assessment task, the analysis was based on FGA; for grasp control and grasp demonstration, the analysis was based on MGA. The analysis of the SD was conducted to test for the adherence to Weber's law. The coefficients used for the linear components of the trend analysis were: −2, −1, 0, 1, 2 for object sizes XS, S, M, L, XL, respectively. These are the coefficients commonly used for linear trend analysis for a set of size five.
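The linear contrast behind the trend analysis can be illustrated as follows. This sketch only computes the contrast score for one set of five per-size values (for example, a participant's SD of FGA per object size); it does not reproduce the F tests, and the function names are hypothetical.

```python
COEFFS = (-2, -1, 0, 1, 2)  # object sizes XS, S, M, L, XL


def linear_contrast(values, coeffs=COEFFS):
    """Weighted sum of per-size values using the linear trend coefficients."""
    if len(values) != len(coeffs):
        raise ValueError("one value per object size is required")
    return sum(c * v for c, v in zip(coeffs, values))


def slope_per_size_step(values, coeffs=COEFFS):
    """Least-squares slope per size step for equally spaced sizes.

    Because the coefficients sum to zero, the slope equals the contrast
    divided by the sum of squared coefficients.
    """
    return linear_contrast(values, coeffs) / sum(c * c for c in coeffs)
```

A positive contrast indicates that the measure grows with object size, while a contrast near zero indicates no linear dependence.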

## RESULTS

Two participants in the grasp control task were excluded from the analysis as they had many failures and outliers (one alongside 14%, one across 12%). All participants in the perceptual assessment and the grasp demonstration group succeeded in completing the task. Six additional participants (perceptual assessment: one alongside and one across; grasp control: one across; grasp demonstration: one alongside and two across) were excluded from the analysis as statistical analysis showed they were not sensitive to object size (they did not show a linear relationship for MGA or FGA as a function of object size, with a significance threshold of 0.05) and, therefore, they had failed to comply with the experimental task. For the remaining participants (perceptual assessment: nine alongside, nine across, grasp control: 10 alongside, nine across, grasp demonstration: 10 alongside, eight across), failure and outlier ratio ranged between 0 and 7%, with a mean ratio of 1.6% and they performed all tasks without major retractions (**Figures 3** and **4**).

Statistical tests showed that all measures in each task had similar values for both orientations and both movement sets; therefore, all subsequent analysis of the data from each task was conducted jointly for participants from both orientations and both movement sets. Mean values and SD for each task for movement time, aperture OTR, aperture TTR, FWT, STCPD, and mean aperture opening speed are presented in **Table 1**.

The mean movement time significantly differed between tasks [*F*(2,52) = 197.5, *p* < 0.0001]. Mean movement time for grasp control was longer (3.67 s) than for grasp demonstration (1.45 s) and perceptual assessment (0.77 s). Mean aperture opening speed for grasp control and grasp demonstration was significantly larger for wider cylinders [grasp control *p* < 0.001, *F*(4,1852) = 18.78; grasp demonstration *p* < 0.001, *F*(4,1321) = 74.32].

Taking the length of the gripper's fingers into account (90 mm finger length), the physical distance the robot has to traverse to perform the grasp successfully is 180–270 mm. That is, when the object is grasped at the tip of the gripper's fingers, the required travel distance is 180 mm and when it is grasped near the wrist, the distance is 270 mm. For grasp control the 95% confidence interval of STCPD was 231.9–236.0 mm and for grasp demonstration, it was 271.7–280.6 mm.

Aperture motion in the grasp control task was fragmented and had a clear opening stage, a transport stage in which participants kept their fingers open, and finally a short closing stage. The forward motion of the TCP started with the aperture opening (**Figures 3B,C**) or after the end of the opening phase (**Figure 3A**).

FIGURE 4 | Grasp demonstration: representative motion profiles for tool center point motion (TCP) toward the object (dotted line) and for the grip aperture (full line). (A) Alongside, object L (Subject 1). (B) Across, object XL (Subject 2). (C) Across, object M (Subject 3). (D) Alongside, object L (Subject 4). Gray background marks aperture opening and closing epochs. Dotted background marks TCP motion epochs.

TABLE 1 | Mean values for motion descriptors, SD values in parentheses.


<sup>+</sup>*MT, movement time; MOS, mean aperture opening speed; OTR, aperture opening time ratio; TTR, aperture transport time ratio; FWT, final waiting time; STCPD, the scaled sagittal tool center point transport distance.*

Almost all of the trials (98.93%, 1,855 of the trials) included a significant FWT (mean 1.74 s), at the end of the transport stage, after the end of the aperture opening and TCP forward movement, and before aperture closing. MGAs were found in various time points along the transport stage (**Figure 3**).

In the grasp demonstration task, grip formation had three stages: opening, transport, and closing in only 20% of the movements (263 trials) (**Figures 4A,B**). Other movement trials had either two stages, where the transport occurred simultaneously with the opening (35%, 469 trials) or the closing (8%, 110 trials) of the fingers (**Figure 4C**), or one stage where the transport occurred simultaneously with the finger opening and closing movements (36%, 483 trials) (**Figure 4D**). Many of the movements (52.67%, 710 of the trials) had a distinguishable waiting time, yet waiting time (mean 0.11 s) was significantly smaller than for grasp control [*p* < 0.0001, *F*(1,35) = 211].

During grasp control, for all objects, mean MGAs were larger than mean FGAs which in turn, were larger than mean IGAs [*p* < 0.0001, *F*(2,5592) = 312.8] (**Figure 5**). The mean MGAs for grasp control were smaller than the mean MGAs for grasp demonstration [*p* < 0.001, *F*(1,35) = 15.188], and mean IGAs for grasp control were larger than mean FGAs for perceptual assessment [*p* < 0.0001, *F*(1,35) = 31.34].

For grasp assessment, the SD of FGAs increased linearly with object size [*p* < 0.01, *F*(1,84) = 9.08] (**Figure 6A**). For grasp control, the SD of IGAs increased linearly with object size [*p* < 0.05, *F*(1,89) = 4.26], and the SD of FGAs had an approaching significance linear trend with object size [*p* = 0.08, *F*(1,89) = 3.16], but the SD of MGAs did not change with object size (**Figures 6C,D**).

For grasp demonstration, the SD of MGAs had an approaching significance linear trend with object size [*p*= 0.09, *F*(1,84) = 2.93] (**Figure 6B**).

## DISCUSSION

As in direct conditions (Ganel et al., 2008), the motion of participants in the perceptual assessment task adhered to Weber's law, i.e., the JND, as measured by the SD of FGA, increased linearly with the size of the object. Similar to pantomime-grasping conditions (Jazi et al., 2015; Jazi and Heath, 2016), the motion of participants during the grasp demonstration task adhered to Weber's law, i.e., the JND, as measured by the SD of MGA, approached significance for the linear increase with the size of the object. The SD of MGA in the grasp control task did not linearly increase with the size of the object. Yet, movement fragmentation and the prolonged aperture transport stage in the grasp control task have most likely influenced MGA, and therefore, reduced the validity of the SD of MGA as an indicator for the JND. Accordingly, we argue that the lack of the linear trend for the MGA in the grasp control task cannot be considered a reliable indicator for the violation of Weber's law. In contrast, IGA, the grip aperture after the opening stage, and FGA, the grip aperture after the end of both the opening and forward motion, are less influenced by fatigue and random movement fluctuations, and thus their SDs provide a better indicator for JNDs. Our results indicate that the SD of IGA increased linearly with the size of the object and the SD of FGA approached significance for the linear increase with the size of the object. Therefore, the results of the current study show, for the first time, that, unlike direct grasping, telerobotic grasping with transmission delays adheres to Weber's law. This suggests that telerobotic grasping with transmission delays is mediated by different perception–action associations compared to direct grasping. The evidence for the use of different perception–action mechanisms points at an inherent lack of transparency. Such a lack of transparency may indicate an inefficient visuomotor control during telerobotic operation.

In a related study, by Milstein et al. (see text footnote 1), the authors report that telerobotic control without transmission delays violates Weber's law. These findings indicate that the time delays may be the most probable cause of the disruption to the ability of the operator to effectively utilize dorsal-stream computations when performing grasping tasks. Although a clear grasp success indication was attained from the lifting stage following object grasping, the participants did not receive haptic feedback during trials, as under the system's time delays such feedback was highly unnatural and confusing. Therefore, an additional disruption of natural grasp processing may have been contributed by the lack of complete haptic and visual sensory integration in conveying grasp success. Yet, this was also the case in Milstein et al. (see text footnote 1), where violation of Weber's law was found during telerobotic grasping without transmission delays.

The two additional mechanisms suggested as possible causes for the violation of Weber's law in grasping, namely motion planning and biomechanical constraints, were not supported in the current study. While biomechanics may have influenced MGAs during the aperture transport stage, it clearly did not affect IGAs, or grasp control FGAs, i.e., the preparation of the finger aperture for grasping. Yet, biomechanics cannot be fully ruled out, as MGAs were larger than the IGAs and FGAs. As for motion planning for grasping, movement fragmentation and the similarity of results in both orientations support the assertion that, at least for remote manipulation with time delays, participants plan two motion components, reach and grasp formation, and that the planned grasp formation is based on object size.

In almost all cases, the participants were able to complete the tasks successfully. Most of the participants were clearly sensitive to object size as shown by the linear increase in mean aperture value for MGA, IGA, and FGA. This is also strengthened by the demonstrated dependence of the mean opening speed on the object size, as predicted by Fitts' law (Hoffmann, 1992). These results indicate that the experimental setup and protocol were suitably adapted to the capabilities of the participants.

The participants in the grasp demonstration task controlled the telerobotic system for a few minutes during the first, training stage of the experiment. This stage was constructed to familiarize the users with the capabilities of the system. During the training, they were exposed to the system's transmission time delays. It is well documented that consistent exposure to delay leads to adaptation (Foulkes and Miall, 2000; de la Malla et al., 2014; Farshchiansadegh et al., 2015; Rohde and Ernst, 2016; Avraham et al., 2017; Leib et al., 2017). Demonstrated movement characteristics were slower and larger (larger aperture opening, longer forward TCP motion) than required for successful task completion. Mean movement time was much longer (approximately twice) than movement time in natural reach-to-grasp motion (Jeannerod, 1984; Marteniuk et al., 1990; Wallace et al., 1990), and considerably longer than the movement time attainable by the robot. The spatial motion parameters, i.e., MGA and STCPD, were larger than the values required for performing the task. It seems that participants have mistakenly assigned the temporal control error induced by the transmission delays to system dynamics. Identifying a lag in visual response as an increased system mass, force, or inertia is a well-known phenomenon (Smith, 1972; Vercher and Gauthier, 1992; Sarlegna et al., 2010; Honda et al., 2013; Takamuku and Gomi, 2015; Leib et al., 2017). In addition, compensating for inertial perturbations can be achieved by slowing down, which has been frequently found as a way to compensate for sensory delays, albeit being suboptimal (Rohde and Ernst, 2016). Regardless, the demonstrated movement profiles were easily adapted for successfully programming a robot to perform the task (Davidowitz and Berman, 2016).

The current research highlights the impacts of telerobotic system characteristics, specifically the transmission delay, on the operator's motion. We show differences in adherence to Weber's law between telerobotic operation with transmission delays, and operation without transmission delays, or direct grasping. Such differences may be related to differences in internal perception–action mechanisms within the CNS. Telerobotic operation with transmission delays has been shown to lack transparency and to be more largely dependent on high-level cognitive control mechanisms than direct operation. In such conditions, telerobotic operators are more likely to be subjected to higher cognitive loads, a potential factor which should be accounted for in the design of telerobotic systems.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of "The Human Subject Research Committee of Ben-Gurion University of the Negev" with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by "The Human Subject Research Committee of Ben-Gurion University of the Negev."

### AUTHOR CONTRIBUTIONS

OA, NS, SB, IN, and TG each made substantial contributions to the study design, revised it critically for important intellectual content, and approved the final version of this written work. All authors listed meet the authorship criteria and no one qualified for authorship has been omitted. The first two authors, OA and NS, contributed equally to the manuscript: they collected data, aided in the design of data analysis procedures, performed data analysis, generated the first draft of this manuscript, and contributed to edits and updates to the document. The third author, IN, contributed to edits and updates of the manuscript. The fourth author, TG, aided in the design of data analysis procedures and contributed to edits and updates of the manuscript. The fifth author, SB, provided guidance for refining the data collection method, aided in the design of data analysis procedures, generated the first draft of this manuscript, and contributed to edits and updates of the manuscript.

### ACKNOWLEDGMENTS

The authors thank Prof. Yisrael Parmet for his help with the statistical analysis.

### FUNDING

Research supported by the Helmsley Charitable Trust through the Agricultural, Biological and Cognitive Robotics Center of Ben-Gurion University of the Negev. IN is supported by the Israeli Science Foundation (grant 823/15), and the Binational United States–Israel Science Foundation (grant 2016850).

### REFERENCES

Botzer, L., and Karniel, A. (2013). Feedback and feedforward adaptation to visuomotor delay during reaching and slicing movements. *Eur. J. Neurosci.* 38, 2108–2123. doi:10.1111/ejn.12211

Cunningham, D. W., Chatziastros, A., von der Heyde, M., and Bülthoff, H. H. (2001). Driving in the future: temporal visuomotor adaptation and generalization. *J. Vis.* 1, 3. doi:10.1167/1.2.3

Crossman, E. R. F. W., and Goodeve, P. J. (1983). Feedback control of hand-movement and Fitts' law. *Q. J. Exp. Psychol.* 35, 251–278. doi:10.1080/14640748308402133

Davidowitz, I., and Berman, S. (2016). "Robot motion learning and adaptation," in *Computational Motor Control Workshop (CMCW)*, Jun 19–21; (Beer-Sheva).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Afgin, Sagi, Nisky, Ganel and Berman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Switching in Feedforward Control of Grip Force During Tool-Mediated Interaction With Elastic Force Fields

Olivier White<sup>1,2</sup>, Amir Karniel<sup>3,4</sup>, Charalambos Papaxanthis<sup>1</sup>, Marie Barbiero<sup>1</sup> and Ilana Nisky<sup>3,4</sup>\*

<sup>1</sup>INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences du Sport, Dijon, France, <sup>2</sup>Acquired Brain Injury Rehabilitation Alliance, School of Health Sciences, University of East Anglia, Norwich, United Kingdom, <sup>3</sup>Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel, <sup>4</sup>Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Switched systems are common in artificial control systems. Here, we suggest that the brain adopts a switched feedforward control of grip forces during manipulation of objects. We measured how participants modulated grip force when interacting with soft and rigid virtual objects when stiffness varied continuously between trials. We identified a sudden phase transition between two forms of feedforward control that differed in the timing of the synchronization between the anticipated load force and the applied grip force. The switch occurred several trials after a threshold stiffness level in the range 100–200 N/m. These results suggest that in the control of grip force, the brain acts as a switching control system. This opens new research questions as to the nature of the discrete state variables that drive the switching.

#### Edited by:

Matteo Bianchi, Università degli Studi di Pisa, Italy

#### Reviewed by:

Virginia Ruiz Garate, Fondazione Istituto Italiano di Technologia, Italy Qiushi Fu, University of Central Florida, United States Minas Liarokapis, University of Auckland, New Zealand Alessandro Moscatelli, Università degli Studi di Roma Tor Vergata, Italy

#### \*Correspondence:

Ilana Nisky nisky@bgu.ac.il

Received: 01 December 2017 Accepted: 23 May 2018 Published: 07 June 2018

#### Citation:

White O, Karniel A, Papaxanthis C, Barbiero M and Nisky I (2018) Switching in Feedforward Control of Grip Force During Tool-Mediated Interaction With Elastic Force Fields. Front. Neurorobot. 12:31. doi: 10.3389/fnbot.2018.00031

Keywords: phase transition, grip force, internal model, stiffness, uncertainty

## INTRODUCTION

A driver switches between different gears, air conditioners switch between on and off states and irrigation mechanisms switch between closed and open circuits. In control theory, hybrid systems are systems with continuous and discrete states. The examples outlined above are switched systems, a subclass of hybrid systems, that are defined as continuous time systems with isolated discrete switching events (Liberzon, 2003). The discrete switching often occurs based on a threshold value of another continuous variable, e.g., the velocity in the first example, the temperature of the thermostat in the second and moisture in the last. Such control systems have many benefits, including economy in control effort (Ben-Itzhak and Karniel, 2008; Karniel, 2011; Leib and Karniel, 2012) and the ability to stabilize otherwise unstable systems (Wicks et al., 1998; Liberzon, 2003; Margaliot and Liberzon, 2006; Lin and Antsaklis, 2009).

Switching is also common in human control of movement. For example, human hand and limb movements are intermittent (Craik, 1947; Navas and Stark, 1968; Neilson et al., 1988; Miall et al., 1993; Doeringer and Hogan, 1998; Squeri et al., 2010; Gawthrop et al., 2014), they switch between different types, such as phase and anti-phase cyclic movements (Kelso, 1984; Levy-Tzedek et al., 2010, 2011) and neural activity states, such as bistability of Purkinje cells firing patterns upon sensory input (Gross et al., 2002; Loewenstein et al., 2005; Yartsev et al., 2009). Several models based on switching were proposed to describe control of standing (Bottaro et al., 2005; Asai et al., 2009; Gawthrop et al., 2014), stick balancing (Gawthrop et al., 2013), and hand movements (Ben-Itzhak and Karniel, 2008; Leib and Karniel, 2012). Intermittent control was proposed to be at least as efficient as continuous control (Loram et al., 2011). Here we present evidence suggesting that the feedforward control of grip force during object manipulation is a switched control system, and we mention several candidate variables that correlate with the switching and that are therefore worth exploring in future investigations.

Many studies have used the modulation of grip force with anticipated load force as evidence for prediction in the control of voluntary movement (Johansson and Westling, 1984, 1988). Moving an object held in precision grip requires the anticipation of inertial and gravitational forces that may cause its slippage (Flanagan et al., 1993; Flanagan and Wing, 1995). The anticipatory adjustment of grip force generalizes to less usual forms of load force including those dependent on object position (Descoins et al., 2006; Danion and Sarlegna, 2007; Sarlegna et al., 2010; Leib et al., 2015), velocity (Flanagan et al., 2003; Nowak et al., 2004), modified gravitational forces (Augurelle et al., 2003; White, 2015) and when forces are generated by whole body actions such as walking or jumping (Gysin et al., 2008). These predictive mechanisms also generalize to other forms of grip configurations (Flanagan and Tresilian, 1994). Without exception, when load forces are generated by a direct action of the body on the environment, grip force and load force profiles match closely as usually quantified by close-to-zero lags between their peak values, or close-to-zero lags in peak cross-correlation between them.

These studies present evidence for the anticipation of smoothly varying, often self-generated, forces (soft forces). However, in many natural object manipulation tasks, the central nervous system must also adjust grip forces to deal with impulse-like destabilizing forces induced by the nearly instantaneous contact between an object and a hard surface (stiff forces). Several studies also addressed the control of grip force in impact-like tasks: when participants had to anticipate a sudden increase of weight after dropping a ball in a hand-held receptacle (Johansson and Westling, 1988; Bleyenheuft et al., 2009), when opening a drawer to its mechanical stop (Serrien et al., 1999), when hitting an object against a pendulum (Turrell et al., 1999) or a surface (White et al., 2011, 2012) or in a step-down task (Ebner-Karestinos et al., 2016). A common observation was the occurrence of a maximum of grip force approximately 60 ms after the peak load force that marked the impact. A natural question arose as to whether this delayed grip force peak resulted from a feedback process. Recently, by studying grip force in catch trials, where load forces are not applied, experiments unambiguously demonstrated that this behavior reflects a feedforward process and is not a mere reflex response to a perturbation signal (Bleyenheuft et al., 2009; White et al., 2011). Nonetheless, this feedforward strategy contrasts sharply with the zero-delay coupling observed between grip and load forces when the latter vary smoothly. To sum up, past investigations showed that grip force control in soft and stiff elastic force fields exhibits different feedforward control strategies. This is surprising since the underlying mechanics is described by a single stiffness parameter (k) that varies continuously.

Here, we set out to explore the nature of the transition between these two different feedforward control strategies. We studied grip force adjustment during repeated interactions with virtual objects rendered as elastic force fields. In the repeated interactions, the objects' properties varied from soft objects to rigid surfaces or vice versa, resulting in systematically changing impact forces, either increasing or decreasing. We hypothesized that if participants adopt a continuous control strategy, then when the stiffness increases (or decreases) continuously over trials, the delay between grip force and load force will continuously increase (or decrease) with respect to the impact. Alternatively, if participants adopt a switching control strategy, we expect to find a stiffness level around which there will be a phase transition in the synchronization between the modulation of grip force and the anticipated load force.

## MATERIALS AND METHODS

### Participants

Eighteen right-handed adults (14 females and 4 males, 20–40 years old, mean = 24.3, SD = 10.2 years) participated voluntarily in the experiment. All participants were healthy, without neuromuscular disease and with normal or corrected-to-normal vision. The experimental protocol was carried out in accordance with the Declaration of Helsinki (1964), the procedures were approved by the local ethics committee of Université de Bourgogne and written informed consent was obtained from all participants. All participants were naïve as to the purpose of the experiments and were debriefed after the experimental session.

### Apparatus and Stimuli

Participants sat in front of a virtual haptic environment with their head on a chin rest (**Figure 1A**). A mini40 force-torque sensor (ATI Industrial Automation, NC, USA) was mounted on the handle of a robotic device (Phantom 3.0, Sensable Technologies, RI, USA) to record grip force, which is the normal force applied by the thumb and the index finger on the transducer ($-F_z$), and load force ($\sqrt{F_x^2 + F_y^2}$). The 3D positions and forces of the robotic arm were controlled in closed loop at 1 kHz. Participants looked into two mirrors that were mounted at 90 degrees to each other, such that they viewed one LCD screen with the right eye and one LCD screen with the left eye. This stereo display was calibrated such that the physical location of the robotic arm was consistent with the visual disparity information.
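As a small illustration, grip force and load force can be derived from one three-axis force sample as follows. The sketch assumes the sensor frame described above, with grip force taken as the negated normal component and load force as the magnitude of the two tangential components; the function name is hypothetical.

```python
import math


def grip_and_load_force(fx, fy, fz):
    """Return (grip, load) in newtons from one force sample.

    grip = -Fz (normal force on the transducer);
    load = sqrt(Fx^2 + Fy^2) (tangential force magnitude).
    """
    grip = -fz
    load = math.hypot(fx, fy)
    return grip, load
```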

### Experimental Procedure

Participants grasped the force sensor with a precision grip (thumb on one side and index finger on the other side, **Figure 1B** inset). To initiate a trial, participants moved their right hand, displayed as a gray 0.5-cm sphere, into another gray starting sphere (1 cm diameter), displayed at body midline and at chest height. Then, a green target rectangle (12 cm width, 1 cm height) appeared 15 cm above home position (**Figure 1B**). Participants were instructed to move the cursor straight upward to touch the target and bring it immediately back to home position without stopping at the reversal point. No instructions were provided regarding how they had to adjust grip force. To avoid large trial-to-trial variability in movement kinematics, after each trial, a line was displayed at a height proportional to peak velocity together with lower (45 cm/s) and upper (55 cm/s) bounds displayed as black horizontal segments. The color of the line was red if peak velocity was outside the interval or green in successful trials (**Figure 1A**). Participants adjusted their movement such that peak velocity fell in that interval.

FIGURE 1 | … the target (bright gray) and increases when vertical position approaches the target (dark gray). Left: feedback of achieved peak velocity. The right inset depicts the force sensor attached to the end of the robot handle held in precision grip. (C) Examples of force-position trajectories of five elastic force fields parametrized by five pairs of stiffness levels and force onset position. Force onset occurred between 3 cm and 14.5 cm above home position. The steeper the slope the stiffer the force field. The circle highlights the second order interpolation between a null force field and a linear elastic force field. (D) Structure of "Ascending" and "Descending" blocks, where stiffness increases (red) and decreases (blue), respectively. Catch trials, for which the stiffness is set to 0 N/m, are positioned at the bottom of the trace.

The target was located inside an elastic force field $F$ that was haptically rendered according to

$$F = \begin{cases} 0, & y(t) < y_0 \\ k\,\bigl(y(t) - y_0\bigr), & y(t) \geq y_0 \end{cases}$$

where $y_0$ is the boundary of the object and $k$ the stiffness value. Such a force field emulates a one-sided spring-like object that only resists compression. The more the cursor approached the target, the more effort was required to move it to the target (see gray gradient in **Figure 1B**). The stiffness of the force field and its onset were varied systematically such that force fields with higher levels of stiffness also had their onset further along the movement progression (as depicted in **Figure 1C** for five different force fields). The weakest elastic force field was generated when force onset occurred at 3 cm and linearly ramped up to a maximum force of 4 N along the 12 remaining cm (**Figure 1C**, $k$ = 4/0.12 ≈ 33 N/m). Similarly, the strongest force field was obtained when force onset occurred 0.5 cm below the target's lower surface and ramped up to a maximum force of 14 N (**Figure 1C**, $k$ = 14/0.005 = 2800 N/m). Force onset ($y_0$) and stiffness ($k$) pairs were parameterized independently trial by trial. The transition between zero force outside of the elastic force field and non-zero force was smoothed with a second order polynomial interpolation (**Figure 1C**, circle) to avoid mechanical vibrations and overheating of the robot motors, particularly in stiff trials. Consequently, movements were felt as natural and continuous.
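The one-sided spring law and the two stiffness values quoted above can be checked with a short Python sketch. It uses SI units (metres, newtons) and deliberately omits the second order polynomial smoothing at force onset, whose parameters are not given in the text; the function names are hypothetical.

```python
def elastic_force(y, y0, k):
    """One-sided spring force in newtons.

    y  : current vertical position (m)
    y0 : boundary of the object, i.e., force onset position (m)
    k  : stiffness (N/m)
    Returns 0 below the boundary and k * (y - y0) at or beyond it.
    The smoothing of the onset described in the text is not modeled.
    """
    return k * (y - y0) if y >= y0 else 0.0


def stiffness_from_ramp(max_force, ramp_length):
    """Stiffness of a linear ramp reaching max_force (N) over ramp_length (m)."""
    return max_force / ramp_length


k_weak = stiffness_from_ramp(4.0, 0.12)      # onset at 3 cm, target at 15 cm
k_strong = stiffness_from_ramp(14.0, 0.005)  # onset 0.5 cm below the target
```

With these values, the weakest field gives about 33 N/m and the strongest 2800 N/m, matching the figures in the text.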

The recording session consisted of 10 blocks with 45 movements in each block. In the first five blocks, the stiffness of the elastic force field increased during 41 trials and plateaued for the last four trials (**Figure 1D**, red "Ascending" blocks). Force onsets were linearly spaced between 3 cm and 14.5 cm by steps of exactly 0.2875 cm. In the last five blocks, force field stiffness decreased over trials (**Figure 1D**, blue "Descending" blocks). Ten participants started the experiment with the "Ascending" blocks and eight participants started the experiment with the "Descending" blocks. In every block, six trials (13%) were randomly chosen to be catch trials in which the stiffness and onset of the elastic force field were set to zero, effectively removing the force field (**Figure 1D**, green disks). Their order was counterbalanced between participants. In the remaining 39 trials, the natural dynamics remained intact.
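The structure of one "Ascending" block can be sketched as follows. The sketch generates only the force onset positions and a random set of catch trial indices; how stiffness was paired with each onset between the two reported extremes is not stated in the text and is therefore not modeled. The function names and the seeding are assumptions for illustration.

```python
import random


def ascending_onsets_cm(n_ramp=41, n_plateau=4, first=3.0, last=14.5):
    """Force onset positions (cm) for the 45 trials of an ascending block.

    41 linearly spaced onsets (step 0.2875 cm for the defaults),
    followed by a plateau of four trials at the last onset.
    """
    step = (last - first) / (n_ramp - 1)
    ramp = [first + i * step for i in range(n_ramp)]
    return ramp + [last] * n_plateau


def pick_catch_trials(n_trials=45, n_catch=6, seed=None):
    """Randomly choose the indices of the catch trials within a block."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_trials), n_catch))
```

Six catch trials out of 45 correspond to roughly 13% of a block, leaving 39 trials with the force field rendered.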

### Data Processing and Statistical Analyses

Position and grip forces were recorded at 500 Hz. Grip force rate, velocity and acceleration were obtained using a central-difference algorithm and smoothed with a zero phase-lag autoregressive filter (cutoff 20 Hz). All trials were aligned to movement onset, defined as the time when velocity exceeded 3 cm/s for at least 100 ms. We also recorded the times of occurrence and the values of peak acceleration, peak grip force and peak vertical force field (minmax function in Matlab and visual check). The last of these corresponded to the time of impact. Finally, we extracted the value of grip force rate at expected force onset. We programmed the trial sequences so as to record the stiffness that would have been generated in catch trials. Therefore, we know the theoretical stiffness and the corresponding force profile (which is not rendered in catch trials). This measure provides an estimate of feedforward mechanisms of grip force and allows direct comparison between normal and catch trials. To compare real and catch trials, we grouped trials of the same block in seven mini blocks of five trials each (except the first and last mini blocks with 10 trials each). That way, every mini block had both real and catch trials. Because stiffness spanned multiple orders of magnitude, we sometimes used a logarithmic scale to plot these values.
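The differentiation and onset criterion described above can be illustrated with a short sketch. The authors worked in Matlab; this Python version is not their code, the function names are hypothetical, and the low-pass filtering step is omitted.

```python
FS_HZ = 500  # sampling rate reported in the text

def central_difference(x, fs=FS_HZ):
    """First derivative by central differences (one-sided at the two ends)."""
    n = len(x)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (x[i + 1] - x[i - 1]) * fs / 2.0
    d[0] = (x[1] - x[0]) * fs
    d[-1] = (x[-1] - x[-2]) * fs
    return d

def movement_onset(velocity_cm_s, threshold=3.0, min_duration_s=0.1, fs=FS_HZ):
    """Index of the first sample of a run above threshold lasting min_duration_s."""
    needed = int(round(min_duration_s * fs))  # 50 samples at 500 Hz
    run = 0
    for i, v in enumerate(velocity_cm_s):
        run = run + 1 if v > threshold else 0
        if run >= needed:
            return i - needed + 1
    return None  # criterion never met
```

A brief excursion above 3 cm/s that lasts less than 100 ms resets the counter, so only a sustained movement is marked as onset.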

We verified that starting with five "Ascending" blocks (N = 10) or five "Descending" blocks (N = 8) did not influence any of the above variables (all F(1,17) < 1.2, p > 0.312). We therefore pooled these two groups together. Quantile-quantile plots were used to assess normality of the data. A three-way ANOVA was conducted on the above variables to assess the effects of stiffness (Mini block, 1–7), Block condition ("Ascending" vs. "Descending") and Type of trial ("Real" vs. "Catch"). Paired t-tests of individual participant means or bootstrap procedures were used to investigate differences between conditions on the above variables. Significance level was set to alpha = 0.05. Data processing and statistical analyses were done using Matlab (The Mathworks, Chicago, IL, USA). Linear fits were calculated with the polyfit function. Partial eta-squared values are reported for significant results to provide an indication of effect sizes.

### RESULTS

Participants grasped a force transducer attached to the handle of a haptic device and produced vertical arm movements to touch a virtual target situated 15 cm above home position (**Figure 1A**). The robotic device generated a resistive vertical elastic force field that was parameterized by the stiffness of the field. As trials progressed, the stiffness of the force field either increased or decreased between two extremes, and force onset was shifted further from or closer to movement onset, depending on block condition ("Ascending" and "Descending", respectively, **Figure 1C**). Force fields with the lowest stiffness were similar to the soft elastic force fields that are typically used in other studies (Descoins et al., 2006). In contrast, the force fields with the highest stiffness resembled collisions between the hand-held device and a rigid surface (White et al., 2011, 2012). To measure the feedforward grip force adjustment, we interspersed catch trials, in which visual information was available but no forces were applied. We explored the transition in grip force control between these two extremes.

FIGURE 2 | Averaged traces corresponding to the stiffest (left column) trials and softest (right column) trials across blocks and participants. Top to bottom: vertical position, vertical acceleration, load force, grip force and grip force rate are depicted as a function of time. Black and green lines correspond to real and catch trials, respectively. All traces are aligned with peak of impact (vertical dashed cursor across panels, time 0). Cursors for the grip force traces are positioned at their respective maximum. The lags calculated between peaks of grip and elastic forces illustrate the difference between high-stiffness (lag = 40 ms, SD = 6 ms) and low-stiffness (lag = 4 ms, SD = 6 ms) conditions. Error shade areas correspond to SEM. Traces are not normalized.

### Grip Force Is Different When Interacting With High-Impact and Low-Impact Elastic Force Fields

**Figure 2** illustrates trials in the stiffest condition (left column, solid line) and in the softest condition (right column, solid line) averaged across blocks and participants in a force field trial (black line) and in a catch trial (green line). This figure highlights how the trials differ between the two extreme conditions. In the high-stiffness condition, the vertical position increased until the target was touched at 15 cm and then decreased to return to the home position. Participants achieved mean peak velocities of 49.9 cm/s (SD = 8.3 cm/s), within the prescribed 45–55 cm/s interval. The elastic force field was null until the position of the hand reached the boundary of the field at y = 14.5 cm and then increased up to 13.75 N (middle row, black line). The vertical pushing force increased when the cursor approached the target and decreased on its way back to the starting position. Grip force increased first to counteract the inertial force (**Figure 2**, Acceleration row) induced by accelerating the mass of the device and exhibited a first local peak synchronized with a local peak in the load force. Then, after a small dip, grip force increased again in anticipation of the contact. This is also reflected by positive grip force rates for 200 ms before impact (bottom row). However, in this stiffness condition, peak grip force was clearly delayed by 40 ms (SD = 6 ms) after the peak of the elastic force field.

In the low-stiffness trials (**Figure 2**, right column), the position and acceleration trajectories resembled those for the high-stiffness trials. However, the elastic force field was smoother: it increased over 400 ms up to a 4-N peak. Grip force and grip force rates paralleled the traces observed in the stiff condition with one notable difference: peak grip forces were synchronized with the impact, both in real and catch trials (mean = 4 ms, SD = 6 ms).

### Motor Planning Is Similar Between Real and Catch Trials

It is important to stress that the delay between grip force peak and load force peak observed in high-stiffness trials is not a consequence of a feedback control, but rather a feedforward control that includes a delay. We observed the same behavior in catch trials without the presence of the resistive elastic forces (**Figure 2**, green lines). The green vertical line is positioned at the time when peak elastic force would have occurred given the vertical position. In particular, grip force peaks were delayed by a similar amount relative to impact or expected impact in both real and catch trials.

Due to the absence of resistive forces, different kinematic profiles after t = 0 (the anticipated onset of the perturbation) were induced, and the peak position overshot the target. Moreover, in these trials, participants expected a force ramp but did not feel it. It could be argued that these error signals could have driven a feedback adjustment of grip force rather than reflect a feedforward strategy even in catch trials. We conducted complementary analyses to show that several parameters characterizing motor planning were the same between real and catch trials.

First, we extracted the value of grip force rate, a reliable index of feedforward grip force control (White et al., 2008), at the time of expected force rise. A t-test failed to report a difference between real and catch trials on grip force rate (t<sub>17</sub> = 0.5, p = 0.642).

"Ascending" and "Descending" conditions. Error bars are SEM.

Second, we examined the trial-by-trial variations and verified that grip force rates in catch trials were not statistically different from grip force rates in the real trials that immediately preceded or succeeded them. To do so, we defined two additional variables by subtracting grip force rate in the previous (R<sub>t−1</sub>) or next real trial (R<sub>t+1</sub>) from grip force rate in the catch trial (C<sub>t</sub>) between them (C<sub>t</sub> − R<sub>t−1</sub> and C<sub>t</sub> − R<sub>t+1</sub>). The ANOVA reported no difference for C<sub>t</sub> − R<sub>t−1</sub> (Mini block: F(6,238) = 1.8, p = 0.109; Block condition: F(1,238) = 0.7, p = 0.414) and for C<sub>t</sub> − R<sub>t+1</sub> (Mini block: F(6,238) = 0.5, p = 0.821; Block condition: F(1,238) = 1.0, p = 0.329). **Figure 2** shows that acceleration traces diverge around the perturbation induced by the elastic force field. Acceleration signals also reflect the output of the motor plan. We conducted a last analysis to quantify how acceleration profiles differed between catch and real trials. To do so, we considered mini blocks because they included real and catch trials. We averaged acceleration traces in real trials and in catch trials separately, per participant and per mini block. Then, we ran an independent iterative t-test that compared values of acceleration between both trial types, from trial onset to maximum elastic force (time = 0), in 20-ms bins. This allowed us to extract the exact time point from which both acceleration traces diverged significantly for at least 150 ms (p < 0.05). We identified a divergent point unambiguously on every averaged acceleration profile in mini blocks (all t<sub>17</sub> > 4.1, all p < 0.001, all η<sub>p</sub><sup>2</sup> ≥ 0.56). Acceleration values between catch and real trials exhibited a divergent point some 30 ms after force onset in every stiffness condition. This analysis clearly shows that motor planning, in terms of its consequences measured through acceleration, is not affected by trial type.
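The bin-wise search for a divergence point can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes a paired comparison across the 18 participants (17 degrees of freedom, two-sided critical value 2.110 at alpha = 0.05) and treats eight consecutive 20-ms bins as satisfying the "at least 150 ms" criterion. All names are hypothetical.

```python
import math

def paired_t(a, b):
    """Paired t statistic for two equal-length samples (one value per participant)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0.0:
        # Degenerate case: identical differences for every participant.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def divergence_bin(real_bins, catch_bins, t_crit=2.110, min_bins=8):
    """First bin from which |t| exceeds t_crit for min_bins consecutive bins.

    real_bins[b] and catch_bins[b] each hold one binned acceleration value
    per participant for bin b.
    """
    run = 0
    for b, (r, c) in enumerate(zip(real_bins, catch_bins)):
        run = run + 1 if abs(paired_t(r, c)) > t_crit else 0
        if run >= min_bins:
            return b - min_bins + 1
    return None  # traces never diverge for long enough
```

Multiplying the returned bin index by the 20-ms bin width gives the divergence time relative to the start of the analysis window.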

Based on these analyses, we conclude that in both trial types: (1) motor planning was similar; and (2) grip force was adjusted in a feedforward manner.

### Grip Force Switches Between Different Control Strategies

At the individual trial level, grip force always exhibited a clear peak over time (see **Figure 2**). Interestingly, the distribution of grip force peaks themselves as a function of stiffness of the elastic force field also reached a global extremum, both in "Ascending" and "Descending" blocks (**Figure 3A**, average across participants). This observation also held in individual participants except for participant 14 in the "Descending" block condition (**Figure 3B**). To compare these global grip force extrema as a function of stiffness, we fitted polynomial models to the data averaged across participants (**Figure 3A**, dashed lines) or for each participant (**Figure 3B**, dashed lines). Since inter-participant variability was large, we adopted a bootstrap method to test whether grip force peaks of participant fits occurred at the same stiffness level between "Descending" and "Ascending" block conditions (sample = 18, repetitions = 10,000, SD = 0.18). The 95%-confidence interval (CI) of the difference between both population means was 0.08–0.77, excluding zero (at p = 0.014). Hence, we conclude that the extremum for the "Descending" block condition occurred at lower stiffness values than in the "Ascending" block condition.

Another strategic change in grip force control is illustrated in **Figure 4**. The difference between times of grip force peaks and times of elastic force peaks is depicted as a function of the natural logarithm of stiffness (no significant difference between block conditions, t<sub>17</sub> = 0.04, p = 0.973), for all participants together (**Figure 4A**) or individually (**Figure 4B**). The individual plots in **Figure 4B** reveal that most of the participants had a prominent transition in the dependency of the time difference as a function of the natural logarithm of stiffness around a threshold value. Indeed, the delay seems to increase linearly from negative values (leading latencies) and then plateau at a positive value (lagging time).

To reliably quantify this effect, we first extracted slopes of the linear fits of the lag as a function of ln(stiffness) on individual participant data. Note that there was large inter-participant variability in the quality of the change. Therefore, our analysis focused on determining the statistical significance of the difference in the slopes rather than on the analysis of the average behavior in each stiffness range. We partitioned the data into a low-stiffness and a high-stiffness subset, according to the individual thresholds (averaged between the two block conditions) estimated as the value of ln(stiffness) for which peak grip force occurred (see **Figure 3B**). We then statistically tested whether slopes differed between both stiffness conditions. To do so, we defined a random variable as the absolute difference of slopes between low and high stiffness conditions and bootstrapped that statistic (sample = 18, repetitions = 10,000, SD = 7.14). The CI was calculated by finding the interval containing 95% of the data (2.5 and 97.5 percentiles). As previously, we reasoned that if zero belonged to the 95% CI, then the means could not be deemed different. In contrast, if zero is found outside the CI, then slopes are different at p < 0.05. The results of our analysis show that slopes were different (p < 0.001) between low and high stiffness conditions (CI: 10.5–38.3). The Akaike information criterion (AIC) confirmed that a piecewise linear model describes our experimental data better than a single linear regression, despite the larger number of free parameters in the former. Indeed, the AIC for the piecewise regression is smaller than the AIC for the single regression both in "Ascending" (222.8 < 235.5) and "Descending" conditions (219.8 < 272.41).
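The percentile bootstrap described above can be sketched in a few lines. This is a generic illustration, not the authors' Matlab code: the function names, the fixed seed and the choice of resampling one value per participant are assumptions.

```python
import random

def bootstrap_ci(values, repetitions=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval of the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(repetitions):
        # Resample n values with replacement and store the resampled mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo = means[int((alpha / 2) * repetitions)]
    hi = means[int((1 - alpha / 2) * repetitions) - 1]
    return lo, hi

def differs_from_zero(ci):
    """True when zero lies outside the confidence interval."""
    lo, hi = ci
    return lo > 0 or hi < 0
```

With 18 per-participant values and 10,000 repetitions, `differs_from_zero(bootstrap_ci(values))` implements the decision rule stated in the text: the effect is deemed significant at the 5% level only if zero falls outside the 2.5 and 97.5 percentiles.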

In agreement with previous studies, grip force exhibited a first local peak that coincided with a peak of acceleration that occurred early after movement onset. In contrast to the observation of the change in the lag of synchronization of grip force with load force for the elastic load force, **Figure 5A** shows that the delay of the first, inertial, peak varied around zero on average for all participants and held true without exception on a participant basis (**Figure 5B**). The ANOVA confirmed this observation and failed to report any effect on the lag between these two peaks (all F < 0.2, all p > 0.551). A t-test revealed its value was not different from zero (mean = −2.3 ms; t<sub>34</sub> = 0.3, p = 0.365). This highlights that the switching strategy was specific to the interaction with the elastic load force and the impact that characterized this interaction rather than a general change in lag between grip force and load force.

The system responds by switching the grip force lag to a different control strategy following a monotonic change in the environment. We hypothesize that this switch is triggered by a change in the value of a discrete variable that indicates the need for a qualitative change in the behavior of the system. One immediate candidate for such a variable is the crossing of a stiffness threshold. Indeed, the switching in both increasing and decreasing series occurs around k = 147 N/m. However, crossing a stiffness threshold is not the only possible switching variable. In previous investigations, acceleration was shown to be key information for performing tasks involving, for instance, eye-hand coordination (Binsted and Elliott, 1999; Helsen et al., 2000; White et al., 2012). Interestingly, we found that the stiffness threshold that we identified previously marked the transition between positive and negative accelerations at the time of force onset, that is, whether the hand of the participant was accelerating or decelerating when forces started acting on the hand (**Figure 6**). We identified, for each participant, the switch in the acceleration sign at impact, and plotted the switch in strategy as a function of the switch in the sign of acceleration at impact. The correlation was statistically significant across participants (r = 0.36, p = 0.030). This highlights that a correlation exists at the participant level between these two variables, which, however, does not imply causation.

### DISCUSSION

We set out to understand the phase transition between two distinct grip force control strategies during tool-mediated interaction with elastic force fields. Participants interacted with springs with increasing or decreasing stiffness between two values and controlled grip force according to the expected dynamics in all trials, including in zero-stiffness catch trials. Peak grip force reached a maximum for an average stiffness of 147 N/m. Participants exhibited different qualitative behaviors related to the lag between peak grip force and elastic force as a function of stiffness. For most of them, the lag increased with stiffness up to a certain threshold. Based on these observations, we suggest that the central nervous system acts as a hybrid controller that is characterized by continuous and discrete states and operates a phase transition at a specific stiffness value, potentially triggered by the stiffness value, the sign of the acceleration at the time of the initial contact with the elastic force field, or other candidate switching variables.

### The Brain Modulates the Grip Force Lag to Optimize Stability

Some tasks are quintessentially complex, nonlinear and high dimensional, leading to postural instabilities and task uncertainties, such as when we bring two objects into contact. Our results support a view according to which the central nervous system switches strategy in grip force control in the face of locally unstable tasks. Participants unconsciously modulate the lag between peak grip force and (expected, in catch trials) peak elastic load force. Stiff trials produce larger uncontrollable transitory forces (see Figure 6 in White et al., 2011). Our data are in line with the idea that long latencies are better suited to trials for which instability is largest. Indeed, as we showed previously, long latencies allow the viscoelastic properties of the skin to dissipate more energy than short latencies, for which the hand is stiffer (White et al., 2011). In other words, the latency is proportional to instability. Consistently, it was suggested that increasing the delay in a control loop may, in some cases and for certain values, improve stability (Malakhovski and Mirkin, 2006). Consequently, in stiff trials, grip force is smaller at the time when perturbations are maximal than a few tens of milliseconds later. In addition to the ability to dissipate energy, lowering the forces has two other positive effects. First, the perturbing forces attributable to signal-dependent noise also decrease with lower forces (Jones et al., 2002; Hamilton et al., 2004). Second, excessive co-activation is energetically costly (Foley and Meyer, 1993; Sih and Stuhmiller, 2003). These grip force adjustment differences occur within the range of grip forces that protect the participant from object slippage, as evidenced by the fact that none of the participants ever lost grip of the object.

This latency was not constant. Prior studies observed this latency without attempting to experimentally control it (Johansson and Westling, 1988; Johansson et al., 1992; Serrien et al., 1999; Turrell et al., 1999; Bleyenheuft et al., 2009), and found values consistent with the maximal latency (75 ms) we observed in stiff interactions between a hand-held object and a surface (White et al., 2011). In a recent study, we failed to alter that latency by changing stiffness of a virtual surface and direction of movement (White et al., 2011). This was likely the case because the stiff and soft targets were implemented with 1200 N/m and 240 N/m virtual springs, which were both above the stiffness values encountered here. However, in a different study, we observed modulation of latency during profound gravitational changes induced by parabolic flights that challenged participants by confronting them with fundamental environmental uncertainties (White et al., 2012).

### Switching Is Stiffness-Dependent

Perhaps the most striking observation is that the central nervous system switched between grip force strategies around a certain individual threshold identified through three independent observations. First, it marked the average stiffness at which grip force peaked (**Figure 3B**). Second, the piecewise linear fit had a remarkable point close to this stiffness (**Figure 4A**). Third, hand acceleration at force onset reversed its sign around that threshold. It is also worth reporting that a few hundred milliseconds before impact, in the very same trial, grip force exhibited a local peak that was synchronized with the small yet significant load force maximum due to inertia (**Figures 5A,B**). When comparing **Figures 4A,B** and **5A,B**, it is very clear that participants predictively control grip force very differently when they are confronted with inertia or impacts. Subtler adjustments also hold for low and high stiffness interactions.

Interestingly, while the lag was not statistically different between the two block conditions, there were two well identified global maxima in the peaks of the grip force as a function of stiffness. There is a hysteresis in the stiffness level at which the grip force is maximal: the maximum in the grip force appears at a higher level of stiffness in the ascending series than in the descending series. Such hysteresis behavior does not appear in the lag between grip force and load force. This suggests that during repeated interactions with the elastic force fields, the motor system identifies crossing a stiffness threshold, and switches the feedforward control strategy. Then, once evidence for having crossed the threshold is available and the success of the change in the strategy accumulates, the system reacts with a decrease of the magnitude of the grip force peak. The presence of a hysteresis is a signature of some inertia in the mechanisms that drive the switching.

The switching in grip force control strategy might be coupled with another example of a switch between two alternatives in interaction with elastic force fields: during tool-mediated interaction with elastic objects, the motor system can choose between controlling movement trajectories and controlling interaction forces. Previous studies suggested that stiffness (Chib et al., 2006; Mugge et al., 2009) and stiffness discontinuity crossing (Nisky et al., 2008) lead to different weighting of position and force control in manual interaction. When participants interact with low-stiffness force fields, they control kinematics, and estimate the stiffness of the elastic field based on integration of position information with sensed forces. With increasing stiffness, the reliability of stiffness estimation deteriorates in accordance with Weber's law (Jones and Hunter, 1990). When participants interact with elastic force fields with higher stiffness (Chib et al., 2006; Mugge et al., 2009), or more frequently cross stiffness discontinuity (Nisky et al., 2008), they favor control of interaction forces rather than the control of kinematics. When this transition happens, the central nervous system might start estimating the compliance of the elastic field (the ratio between the displacement and the force that caused it) rather than its stiffness (the ratio between the force and the displacement that caused it), resulting again in reliable estimates. Such a view of different estimation is consistent with our observation that peak grip forces are largest around the transition and are smaller for very high or very low stiffness levels. It is also strikingly consistent with the threshold of around 100–200 N/m in the stiffness-dependent weighting of force and position feedback (Mugge et al., 2009).

If indeed a stiffness threshold is used as a switching variable, to use this information in control of robotic interfaces, it is important to model how the brain estimates stiffness. Various computational models were proposed, including: peak force divided by perceived penetration (Pressman et al., 2007, 2008, 2011), or regression of force over position or position over force data (Nisky et al., 2008, 2010, 2011). Another proposed measure is Extended Rate Hardness, a measure of the perceived hardness of a surface based on rate of force change and penetration velocity (Han and Choi, 2010). Skin deformation accompanying the probing also likely plays a role in perception of stiffness (Quek et al., 2014, 2015; Farajian et al., 2017). Here, we do not attempt to shed more light on this matter, but it is likely that the estimated stiffness is used in the process of the switching.
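To make the difference between these estimators concrete, the sketch below computes three of them from sampled penetration and force data. It is a simplified illustration and not the cited authors' models: the function names are hypothetical, and peak measured penetration is used as a stand-in for perceived penetration, which is an assumption.

```python
def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def stiffness_peak_ratio(penetration_m, force_n):
    """Peak force divided by peak penetration (N/m)."""
    return max(force_n) / max(penetration_m)

def stiffness_force_over_position(penetration_m, force_n):
    """Stiffness as the slope of force regressed on position (N/m)."""
    return slope(penetration_m, force_n)

def stiffness_from_compliance(penetration_m, force_n):
    """Inverse of the slope of position regressed on force (N/m)."""
    return 1.0 / slope(force_n, penetration_m)
```

For an ideal noiseless spring the three estimates coincide. With noisy data the two regressions generally differ, because each attributes the noise to a different variable.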

We investigated the behavioral aspects of the switching and its potential underlying switching variable. The question remains open which neural structures operate this switching. In previous studies, we have shown that the left supplementary motor area (SMA) is a crucial node in the network that processes the internal representation of object dynamics (White et al., 2013) leaving this neural structure as a potential candidate to host the decision variable that controls the phase transition. We also showed that the posterior parietal cortex (PPC) is involved in the perception of stiffness (Leib et al., 2016). Other candidate areas may include the cerebellum and the insula. Several studies reported bistable states of Purkinje cells in the cerebellum that may serve as a switching trigger (Yartsev et al., 2009). Finally, the insula seems to be involved in switching between the executive control and default networks (Sridharan et al., 2008).

Finally, it is worth pointing out that our results are apparently at odds with the finding that participants cannot switch control policies in reaching movements between two opposing viscous force fields (Karniel and Mussa-Ivaldi, 2002). The occurrence of each force field was cued. The authors suggested that competition occurred between two different internal models that could not co-exist in the brain. In our experiment, switching occurred while the nature of the force field varied predictively as well. However, its variations were far more continuous than in the Karniel and Mussa-Ivaldi experiment. Therefore, we suggest that in our paradigm, the brain could rely on a single internal model and re-estimate the upcoming stiffness value based on recent history. As shown, the switched behavior took some trials to really occur after the thresholds were crossed. Instead, in the Karniel and Mussa-Ivaldi study, the differences between force fields were more pronounced, making it difficult to re-estimate the force field based on a single parameter that took very different discrete values. We think that switching or learning is possible if the nature of the force field changes continuously and gently, whatever the complexity of the changes (for instance, we showed recently that participants could immediately adapt grip force in new gravitational phases generated by a centrifuge (Barbiero et al., 2017; White et al., 2018)). In addition, we previously found that the control of grip force may be characterized by different control policies than perception or manipulation; for example, during interaction with elastic force fields, delay causes bias in perception but not in the control of grip forces (Leib et al., 2015). Moreover, and perhaps closer to our current study, the predictive control during lifting of a series of objects with increasing mass (Mawase and Karniel, 2010) was fundamentally different from that during reaching with perturbing force fields with a series of increasing viscosity parameters (Mawase and Karniel, 2012).

### Limitations and Perspectives

We should however also underline two limitations of the present study. First, we failed to explain these results within a fully coherent, average behavior. Instead, we found some idiosyncratic changes in strategy. This new question should be addressed in a follow-up experiment aiming at identifying what caused these switches. Second, our data exhibit large variability, which is inevitable when studying the interaction between mechanical interactions and physiological processes. Future investigations should improve the technical design of these experiments. Our contribution paves the way toward using switched systems theory in modeling human motor control and opens new research questions as to the nature of the discrete state variables that drive the switching between different control strategies.

How can these results be employed in the control of robotic systems? Identifying human control strategies in object manipulation is crucial for developing efficient control algorithms for a variety of human-operated robotic applications ranging from tele-operated surgical robotics to smart prostheses. Modulation between grip force and load force in human grasping allows for securing held objects against slippage without applying excessive forces. This modulation is impossible in the absence of force feedback (Gibo et al., 2014). Therefore, in state-of-the-art tele-operated robot-assisted surgery systems, users apply unnatural grip force control strategies (Gibo et al., 2014), likely leading to suboptimal performance. Adding some form of force feedback about the load force of manipulated objects contributes to natural coordination between grip force and load force. Our current results suggest that the force feedback that is presented to the user should be designed in such a manner that the switching strategy can be employed. If this is impossible due to limited tele-operation control gains, the tele-operated gripper can incorporate local smart switching in grip force control. Similar ideas may be implemented in next-generation prostheses to facilitate natural manipulation of objects. Future studies are needed to develop such human-inspired controllers and test their potential benefits compared to state-of-the-art gripper and prosthesis controllers.

To conclude, we show here evidence that the central nervous system adopts qualitatively different grip force controls to cope with impact-like environments. Our results show that the central nervous system acts as a switching system. Our findings may have very practical implications since human-machine interfaces nowadays involve haptic feedback, but many applications of fine object manipulation are missing haptic feedback, such as robot-assisted surgery and prosthetics.

### AUTHOR CONTRIBUTIONS

OW and AK designed the experiments, collected and analyzed data. OW, AK and IN interpreted the results. MB collected data. OW, AK, CP, MB and IN contributed to writing the manuscript, manuscript revision, read and approved the submitted version.

### FUNDING

This research was supported by the Institut National de la Santé et de la Recherche Médicale, the Conseil Général de Bourgogne and the Fonds européen de développement régional. AK and IN were supported by the Leona M. and Harry B. Helmsley Charitable Trust through the Agricultural, Biological and Cognitive Robotics Initiative of Ben-Gurion University of the Negev, and the Israeli Science Foundation (grant number 823/15).

### REFERENCES


freedom skin deformation feedback. IEEE Transactions on Haptics 8, 209–221. doi: 10.1109/TOH.2015.2398448


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 White, Karniel, Papaxanthis, Barbiero and Nisky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Replicating Human Hand Synergies Onto Robotic Hands: A Review on Software and Hardware Strategies

#### Gionata Salvietti 1,2 \*

<sup>1</sup> Department of Information Engineering and Mathematics, University of Siena, Siena, Italy, <sup>2</sup> Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy

This review reports the principal solutions proposed in the literature to reduce the complexity of the control and design of robotic hands by taking inspiration from the organization of the human brain. Several studies in neuroscience concerning the sensorimotor organization of the human hand have shown that, despite the complexity of the hand, a few parameters can describe most of the variance in the patterns of configurations and movements. In other words, humans exploit a reduced set of parameters, known in the literature as synergies, to control their hands. In robotics, this dimensionality reduction can be achieved by coupling some of the degrees of freedom (DoFs) of the robotic hand, which reduces the number of inputs needed. Such coupling can be obtained at the software level, by exploiting mapping algorithms to reproduce the human hand organization, and at the hardware level, through either rigid or compliant physical couplings between the joints of the robotic hand. This paper reviews the main solutions proposed for both approaches.

#### Edited by:

Ganesh R. Naik, Western Sydney University, Australia

#### Reviewed by:

Ashley Kleinhans, Ford Motor Company, United States Yinlai Jiang, University of Electro-Communications, Japan Alejandro Linares-Barranco, Universidad de Sevilla, Spain Agnes Roby-Brami, Institut National de la Santé et de la Recherche Médicale (INSERM), France

> \*Correspondence: Gionata Salvietti salviettigio@dii.unisi.it

Received: 11 December 2017 Accepted: 16 May 2018 Published: 07 June 2018

#### Citation:

Salvietti G (2018) Replicating Human Hand Synergies Onto Robotic Hands: A Review on Software and Hardware Strategies. Front. Neurorobot. 12:27. doi: 10.3389/fnbot.2018.00027

Keywords: human hand synergies, robotic hand control, mapping strategies, human motor control, hand

### 1. INTRODUCTION

In the last decade, several roboticists have tried to replicate human hand motor control to simplify the design and actuation of robotic hands. The neuroscientific foundation supporting this approach is the demonstration that, despite the intricate nature of the human hand, a reduced number of variables can explain a large part of the variance in patterns of human hand configurations and movements, as pioneered by Bernstein (1966) and later reported by Santello et al. (1998). These variables are usually referred to as postural synergies and can be interpreted as a correlation of DoFs in frequently used patterns (Santello et al., 2016). Several experimental approaches, ranging from recordings of electromyographic and cortical activity to studies of finger movement kinematics, have investigated the neural control of the hand. The results confirm that the simultaneous motion of the fingers follows coordinated patterns that reduce the number of independent DoFs to be controlled. The idea that particular arrangements of muscular activities could compose a base set analogous to the concept of basis in the theory of vector spaces was introduced by Easton (1972). Todorov et al. (2005) proposed an optimal stochastic control based on the same geometrical system of redundancy resolution.

Santello et al. (1998) investigated the postural synergies hypothesis by recording a large data set of grasping poses from subjects who were asked to mime grasps of a set of 57 objects. A Principal Component Analysis (PCA) of these data showed that more than 80% of the variance could be accounted for by the first two principal components, whereas the first three components explained up to 90% of the variance in the data. This suggests that a much lower-dimensional subspace of the hand DoF space can efficiently characterize the recorded data. In other words, instead of controlling each of the 20 DoFs of a human hand individually, only two or three joint couplings, each leading to a coordinated motion of the hand, could be used to achieve many of the grasps used in everyday life. These ideas can be exploited in robotics, as they introduce a novel and principled way to simplify the design and analysis of hands, in contrast with other, sometimes arbitrary and more empirical, design attempts.
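The variance-explained argument above can be illustrated with a minimal sketch. It does not use the data of Santello et al. (1998): the postures are synthetic, generated from two hypothetical synergy directions (`s1`, `s2`) over six joints plus small noise, and all function names are illustrative assumptions. PCA is carried out with a plain power iteration on the covariance matrix, so only the Python standard library is needed.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance of posture rows (one row per grasp, one column per joint)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(mat, rng, iters=1000):
    """Power iteration: dominant eigenvalue/eigenvector of a symmetric PSD matrix."""
    d = len(mat)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v  # Rayleigh quotient, unit vector

def cumulative_variance_explained(cov, k, seed=0):
    """Fraction of total variance captured by the first 1..k principal components."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    total = sum(cov[i][i] for i in range(d))
    running, ratios = 0.0, []
    for _ in range(k):
        lam, v = leading_eigenpair(mat, rng)
        running += lam
        ratios.append(running / total)
        # Deflation: remove the component just found before searching for the next.
        for i in range(d):
            for j in range(d):
                mat[i][j] -= lam * v[i] * v[j]
    return ratios

def synthetic_postures(n_grasps=57, seed=1):
    """Hypothetical data: two latent synergies drive six joints, plus sensor noise."""
    rng = random.Random(seed)
    s1 = [0.5, 0.5, 0.4, 0.4, 0.3, 0.3]      # assumed first synergy direction
    s2 = [0.5, -0.5, 0.4, -0.4, 0.3, -0.3]   # assumed second synergy direction
    rows = []
    for _ in range(n_grasps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.5)
        rows.append([z1 * a + z2 * b + rng.gauss(0.0, 0.05) for a, b in zip(s1, s2)])
    return rows

ratios = cumulative_variance_explained(covariance_matrix(synthetic_postures()), 3)
print([round(r, 3) for r in ratios])
```

Because the synthetic postures are built from two latent variables, the first two components capture almost all of the variance here; real hand data would show a more gradual profile.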

In this work, the main approaches used to design and control robotic hands exploiting the synergy concept are reviewed. First, several robotic hands mechanically designed to resemble human hand synergies are described in section 3. Then, in section 4, the principal mapping algorithms used to synergistically control multi-DoF hands are introduced. Finally, in section 5, a discussion on the current work and future directions is reported.

### 2. HAND SYNERGIES FROM NEUROSCIENCE TO ROBOTICS

Fully actuated robotic hands have been extensively studied, and several tools for modeling and control are available in the literature, as reported by Murray et al. (1994) and Prattichizzo and Trinkle (2016). However, to fully exploit the dexterity of multi-DoF hands with independently actuated joints, it is necessary to design sophisticated control strategies, which often represent the main roadblock to the usability and efficiency of robot hands in real-world scenarios. Among the several attempts to reduce the number of robotic hand control parameters, the one based on synergies is attracting a critical mass of researchers. The main reason behind this diffusion resides in the neuroscientific results reporting that, among the possible choices of basis to describe the hand configuration, most of the hand grasp posture variance is explained by the first two synergies, as reported by Santello et al. (1998).

A direct interpretation of these results would imply that the robotic hand joint configuration vector $q \in \Re^{n_q}$, where $n_q$ is the number of joints in the hand, could be represented as a function of fewer elements, collected in a synergy vector $z \in \Re^{n_z}$ with $n_z \le n_q$. As formalized by Bicchi et al. (2011) and Prattichizzo et al. (2013), indicating with $\dot{q}$ the hand joint velocities, we can define the linear map $\dot{q} = S(z)\dot{z}$, where $S$ is the synergy matrix and $\dot{z}$ represents the synergy velocities. The columns of the synergy matrix $S \in \Re^{n_q \times n_z}$ represent the postural synergies, also called eigengrasps in the literature, e.g., by Ciocarlie and Allen (2009). In other terms, each column represents the joint velocities obtained by acting on a single synergy velocity. This purely kinematic model fails to describe the possible grasps of an object, since it does not consider a possible adaptation of the hand to the shape of the grasped object. A possible solution is to consider the most general case of statically indeterminate grasps (Prattichizzo and Trinkle, 2016), and thus introduce both contact and joint compliance in the analysis. In doing so, we assume that the synergistic displacements $\delta z \in \Re^{n_z}$ do not directly command the joint displacements $\delta q \in \Re^{n_q}$; instead, the synergistic displacement input $\delta z$ commands the joint reference positions $q_{ref}$ as:

$$
\delta q_{ref} = S \delta z, \tag{1}
$$

which are related to the actual joint displacements by the constitutive equation:

$$
\delta q = \delta q_{ref} - C_q \delta \tau, \tag{2}
$$

where $C_q$ models the joint compliance and $\delta\tau$ represents the torques at the joints, as reported by Prattichizzo et al. (2010). When there is no contact with the object, the reference and real joint positions coincide, whereas if contact forces are present, the compliance of the hand forces the real hand to diverge from the reference one. This means that the real hand configuration is synergy driven, but can modify its posture so as to comply with the object shape. Gabiccini et al. (2011) defined this approach as the soft synergy model of hands.
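As a minimal numerical sketch of Equations (1) and (2), the snippet below evaluates the soft synergy model for a hypothetical three-joint finger driven by a single synergy. The synergy matrix, the diagonal compliance matrix and the torque values are illustrative assumptions and are not taken from any of the cited works.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def soft_synergy_displacement(S, C_q, delta_z, delta_tau):
    """Return (delta_q_ref, delta_q) according to Equations (1) and (2)."""
    delta_q_ref = mat_vec(S, delta_z)            # Eq. (1): reference joint motion
    yielding = mat_vec(C_q, delta_tau)           # joint motion absorbed by compliance
    delta_q = [r - y for r, y in zip(delta_q_ref, yielding)]  # Eq. (2)
    return delta_q_ref, delta_q

# Hypothetical values: 3 joints, 1 synergy, diagonal joint compliance in rad/(N m).
S = [[1.0], [0.8], [0.5]]
C_q = [[0.1, 0.0, 0.0],
       [0.0, 0.1, 0.0],
       [0.0, 0.0, 0.2]]
delta_z = [0.5]

# Free motion: no joint torques, so the real hand follows the reference exactly.
free_ref, free_q = soft_synergy_displacement(S, C_q, delta_z, [0.0, 0.0, 0.0])

# In contact: joint torques make the real posture deviate from the reference.
contact_ref, contact_q = soft_synergy_displacement(S, C_q, delta_z, [0.2, 0.0, 0.5])

print(free_q)
print(contact_q)
```

The two calls reproduce the behavior described in the text: with zero torques the reference and real displacements coincide, while contact torques pull the real joints away from the synergy-driven reference in proportion to the joint compliance.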

In the following sections, the main attempts to reproduce, either mechanically or by means of control, the matrix $S$ representing the synergistic joint coupling are reported.

### 3. MECHANICAL IMPLEMENTATION OF POSTURAL SYNERGIES

In section 2, two possible ways to model hand synergies were introduced. The distinction between "rigid" and "soft" synergies also reflects the two main approaches in the literature to mechanically implement the coordination of joint motions in underactuated hands. Brown and Asada (2007) pioneered the idea of using a mechanism to rigidly couple the motion of the joints according to the human synergies. A train of pulleys of different radii was used to simultaneously transmit different motions to each joint. The radii of the pulleys were set according to the scalar weights that compose the columns of the synergy matrix $S$. In other words, by changing the radius of a pulley, it was possible to regulate how much a certain joint is displaced once the motor is activated. Motions corresponding to the first two synergies were superimposed via tendons and idle pulleys, resulting in the prototype illustrated on the left-hand side of **Figure 1**. A similar approach was used by Li et al. (2014) to design a prosthetic hand where twelve DoFs are activated using only two motors. Xu et al. (2014b) proposed a prototype where the postural synergies were mechanically implemented in an underactuated anthropomorphic hand using planetary gears. Rosmarin and Asada (2008) proposed a hybrid actuation system using two DC motors and shape memory alloy (SMA) actuators. The two DC motors drove the entire robotic hand along the directions of the two most significant synergies. The synergies were determined through PCA of a set of robotic hand postures. The higher-order terms were actuated by SMA so as to reduce the actuators' encumbrance.

The "soft" synergies approach described in section 2 is an efficient solution for designing anthropomorphic hands with synergistic motion. Catalano et al. (2014) investigated how to exploit the soft synergy concept through the design of underactuated hands that adapt well to the shapes of the grasped objects. Birglen et al. (2008) reported how underactuation can be achieved effectively with simple differential and elastic elements. Catalano et al. leveraged these design principles to realize a soft synergy model defined by a synergy matrix $S$ and a joint compliance matrix $C_q$ through the definition of a proper transmission matrix and the design of the joint stiffness. The authors defined this solution as adaptive synergies. The resulting prototype, called the Pisa/IIT SoftHand, has 19 DoFs arranged in four fingers and an opposable thumb, see the right side of **Figure 2**. Only one actuator drives all the fingers so as to resemble the first synergy as defined in Santello et al. (1998). Recently, Piazza et al. (2017) exploited the same concept of adaptive synergies to design a prosthetic hand.

Finally, Xu et al. (2014a) proposed a continuum structure for the mechanical implementation of the postural synergies. Using a continuum mechanism, two independent translational inputs were scaled and combined to generate six translational outputs to drive a prosthetic hand prototype.

### 4. SOFTWARE PROCEDURES FOR SYNERGISTIC CONTROL OF MULTI-DOF ROBOTIC HANDS

Software synergies refer to all the techniques proposed in the literature to control a multi-DoF hand with a reduced number of parameters so as to resemble the synergistic actuation of the human hand. The approaches proposed over the last decade can be classified into two main categories: (i) mapping of synergies from humans to robots, and (ii) redefinition of synergies for robotic hands. The main idea of the former is to define a synergy matrix computed through statistical analysis of human grasp poses on objects and to replicate the synergistic motion on the kinematics of the robotic hand using a proper mapping strategy. The work of Ciocarlie and Allen (2009) is one of the first examples of this method. They used a joint-to-joint mapping to replicate the synergy subspaces obtained by Santello et al. onto four different models of hands. Joint-to-joint mapping considers a direct association between joints of the human hand and joints of the robotic hand. Other researchers have investigated this approach. Rosell and Suárez (2014) used a sensorized glove to collect data from the joints of the human hand while an operator was freely moving the fingers, i.e., without executing or simulating grasping or manipulation actions, and then mapped the data joint-to-joint onto the Schunk SAH hand. Kim et al. (2016) proposed an algorithm that uses a tensor composed of data relevant to different individuals and various motions in multiple dimensions to evaluate human hand synergies. The corresponding values for a robot hand were then computed assuming that the coefficients of the synergies of the human hand were identical to those of the robotic hand. It is worth noting that joint-to-joint mapping represents the simplest way to define a correspondence between the joints of human and robotic hands. This mapping is efficient when the robotic hand has an anthropomorphic structure, whereas, when non-anthropomorphic devices are considered, the joint correspondence is usually defined using heuristics that often reduce the reliability of the motion reproduction.

Another method to map human hand synergies is the so-called Cartesian space mapping. Cartesian mapping focuses on the relation between the workspaces of the human and robot hands and usually tries to find a correspondence between the motion of the fingertips of the two hands. Ficuciello et al. (2018) mapped human grasps onto an underactuated robotic hand using fingertip measurements, obtained through an RGB-D camera sensor, and inverse kinematics. Geng et al. (2011) realized a two-stage mapping: first, they extracted the synergies from human grasping data, and then they implemented an optimized mapping from the fingertip positions of the human hand to those of a robot hand. Cartesian mapping presents some advantages with respect to joint-to-joint mapping, since it is not necessary to relate each robotic joint motion to that of a human joint. However, this method fails to correctly reproduce the forces and movements exerted by the robotic hand on a grasped object. Gioioso et al. (2013) presented a method for mapping synergies defined in the task space that tries to overcome the problem of dissimilar kinematics between human and robotic hands. The main idea of the approach is to define two virtual objects, one on the robotic hand and one on a model of the human hand. Each virtual object is defined as the minimum-volume sphere containing a set of reference points defined on the hand, see the bottom part of **Figure 2**. The human hand model can be moved according to a synergistic motion computed using the dataset of Santello et al. (1998). Such motion displaces the reference points on the hand, generating a rigid body motion and a deformation of the virtual object. These transformations of the object are then imposed, possibly scaled, onto the object defined on the robotic hand. An inverse kinematics technique is used to compute robotic joint motions that comply with the virtual object motion and deformation. The authors showed that the virtual object method is more efficient in terms of force mapping and accuracy in reproducing the directions of motion than joint-to-joint and Cartesian mappings. In Gioioso et al. (2012), the method was extended by considering an ellipsoid instead of a sphere as the virtual object.
This improvement made it possible to describe the virtual object deformation using three parameters, the variations of the ellipsoid semi-axes, instead of one, the variation of the sphere radius. Salvietti et al. (2014) used the average homogeneous transformation of the reference points so as to capture a larger set of possible motions of the virtual object. All the techniques related to the object-based mapping of human hand synergies have been collected in a freely available Matlab toolbox called SynGrasp (Malvezzi et al., 2015). **Figure 2** shows a schematic representation of the mapping strategies.
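The virtual sphere idea can be sketched in a few lines under strong simplifying assumptions, which are ours and not those of Gioioso et al. (2013): the sphere is approximated by the centroid of the reference points and the largest distance from it (not the true minimum-volume sphere), rotations are ignored, the translation is scaled by the ratio of the two sphere radii, and the final inverse kinematics step is omitted. All names and coordinates are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def virtual_sphere(points):
    """Approximate enclosing sphere: centroid plus largest distance (an assumption)."""
    center = centroid(points)
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def map_reference_points(human_before, human_after, robot_points):
    """Impose the human virtual-sphere translation and deformation on the robot points."""
    c_h0, r_h0 = virtual_sphere(human_before)
    c_h1, r_h1 = virtual_sphere(human_after)
    c_r, r_r = virtual_sphere(robot_points)
    scale = r_r / r_h0                       # workspace size ratio, robot vs. human
    shift = [scale * (b - a) for a, b in zip(c_h0, c_h1)]
    strain = r_h1 / r_h0                     # isotropic deformation of the sphere
    return [[c + s + strain * (p - c) for p, c, s in zip(point, c_r, shift)]
            for point in robot_points]

# Hypothetical human reference points before and after a synergy-driven closure.
human_before = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
human_after = [[0.5, 0.0, 1.0], [-0.5, 0.0, 1.0], [0.0, 0.5, 1.0], [0.0, -0.5, 1.0]]

# Hypothetical three-fingered robot hand: a different number of reference points is fine.
s3 = math.sqrt(3.0)
robot_points = [[2.0, 0.0, 0.0], [-1.0, s3, 0.0], [-1.0, -s3, 0.0]]

mapped = map_reference_points(human_before, human_after, robot_points)
print(mapped)
```

In this sketch the human sphere halves its radius while moving one unit along z, so the robot reference points move twice as far (the robot sphere is twice as large) and contract toward their center by the same factor. The mapping never requires a correspondence between individual human and robot joints, which is the property the object-based approach relies on for dissimilar kinematics.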

The second main approach to defining software synergies consists in collecting data from grasps obtained directly with the robotic hand and using statistical analysis to extract the primitives for the specific hand. Ficuciello et al. (2014) computed the first two fundamental synergies for the UB Hand IV by applying PCA to a set of 36 grasps of different objects, involving both precision and power grasps. Matrone et al. (2010) collected the sensory data of a prosthetic hand while performing 50 different grasps, and subsequently used a PCA-based algorithm to drive the 16 DoFs of an underactuated prosthetic hand prototype, called CyberHand, with a two-dimensional control input. Wimböck et al. (2011) analyzed a large grasp database collected over years of use of the DLR Hand II. Using PCA, they found that 74% of these grasps, originally defined by the twelve joint variables of the hand, could be represented by two coordinates. As a second step, a synergy impedance controller was derived and implemented to extend the work on passivity-based hand control for the DLR Hand II. Later, Salvietti et al. (2013) combined the object-based mapping with the synergy impedance controller to simplify robotic hand control in the synergy subspace. Bernardino et al. (2013) teleoperated a Shadow Hand and an iCub Hand to grasp 12 different objects and then used the joint data collected from the robotic hands to compute postural synergies using PCA. Finally, Cotugno et al. (2014) used a kinaesthetic teaching approach to collect data from the iCub Hand. The teaching was performed by a human operator guiding the fingers of the robot, with the motors switched off, so as to perform a pick-and-place operation over a set of objects. Singular value decomposition was later performed on the preprocessed joint data in order to obtain the postural primitives of the hand that span the variability of the corresponding grasping demonstrations.

### 5. DISCUSSION AND PERSPECTIVE

In this review, the main mechanical and software solutions that explicitly exploit the concept of human hand synergies have been reported. The main reason for this focus is the direct link between neuroscientific studies and robotics. There are several other works on underactuation, from both a software and a hardware point of view, that have not been treated in this work. Among all, it is worth mentioning recent results on the design of underactuated soft hands that include intrinsic passive compliant elements, see e.g., the hand proposed by Deimel and Brock (2016). In this context, Salvietti et al. (2017) proposed a procedure to compute the stiffness ratio between the passive compliant joints of a robotic hand so as to reproduce the fingertip trajectory obtained through the execution of the first synergy.

Concerning the software synergies, both of the presented approaches have pros and cons. The use of data collected from the human hand makes it possible to exploit human brain control mechanisms that resulted from thousands of years of evolution. However, the adaptation of the data to the kinematics of a robotic hand is prone to errors that may compromise the fine control of the forces exerted on a grasped object. In contrast, synergies defined directly on the robotic hand are highly specialized for the specific hand kinematics, but may depend strongly on the set of grasps chosen by the operator or on the operator's kinaesthetic teaching. This could result in very specialized primitives that may generalize poorly over a wider set of objects.

Although the complexity reduction brought by the synergistic organization of the hand has led to encouraging results in grasping, how to exploit higher-order synergies to perform more complex manipulation tasks is still an open issue. A possible trade-off between the complexity of the control and the level of dexterity of the robotic hand will probably come from a deeper interaction between hand designers and control engineers, so as to embed part of the control directly in the hand structure.

### AUTHOR CONTRIBUTIONS

GS organized and wrote this mini-review. GS has also contributed to the topic of the review with publications that are mentioned in the review.

### FUNDING

We gratefully acknowledge the funding provided by the SOMA project (European Commission, Horizon 2020 Framework Programme, H2020-ICT-645599) and the SoftPro project (European Commission, Horizon 2020 Framework Programme, H2020-ICT-688857).

### ACKNOWLEDGMENTS

The author is grateful to Prof. Domenico Prattichizzo and Prof. Monica Malvezzi for the joint works on synergy mapping and for inspiring this manuscript.




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Salvietti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improving Fine Control of Grasping Force during Hand–Object Interactions for a Soft Synergy-Inspired Myoelectric Prosthetic Hand

*Qiushi Fu<sup>1,2</sup>\* and Marco Santello<sup>1</sup>*

*<sup>1</sup>Neural Control of Movement Laboratory, School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ, United States, <sup>2</sup>Mechanical and Aerospace Engineering, University of Central Florida, Orlando, FL, United States*

The concept of postural synergies of the human hand has been shown to potentially reduce complexity in the neuromuscular control of grasping. By merging this concept with soft robotics approaches, a multi-degree-of-freedom soft-synergy prosthetic hand [SoftHand-Pro (SHP)] was created. The mechanical innovation of the SHP enables adaptive and robust functional grasps with simple and intuitive myoelectric control from only two surface electromyogram (sEMG) channels. However, the current myoelectric controller has very limited capability for fine control of grasp forces. We addressed this challenge by designing a hybrid-gain myoelectric controller that switches control gains based on the sensorimotor state of the SHP. This controller was tested against a conventional single-gain (SG) controller, as well as against the native hand in able-bodied subjects. We used the following tasks to evaluate the performance of grasp force control: (1) picking and placing objects of different sizes, weights, and fragility levels using power or precision grasps and (2) squeezing objects of different stiffness. Sensory feedback of the grasp forces was provided to the user through a non-invasive, mechanotactile haptic feedback device mounted on the upper arm. We demonstrated that the novel hybrid controller enabled superior task completion speed and fine force control over the SG controller in object pick-and-place tasks. We also found that the performance of the hybrid controller qualitatively agrees with the performance of native human hands.

#### *Edited by:*

*Keum-Shik Hong, Pusan National University, South Korea*

#### *Reviewed by:*

*Kyujin Cho, Seoul National University, South Korea Lorenzo Masia, Nanyang Technological University, Singapore*

#### *\*Correspondence:*

*Qiushi Fu qiushi.fu@ucf.edu*

*Received: 31 October 2017 Accepted: 18 December 2017 Published: 10 January 2018*

#### *Citation:*

*Fu Q and Santello M (2018) Improving Fine Control of Grasping Force during Hand–Object Interactions for a Soft Synergy-Inspired Myoelectric Prosthetic Hand. Front. Neurorobot. 11:71. doi: 10.3389/fnbot.2017.00071*

Keywords: neuroprosthetics, hand function assessment, object manipulation, grasping, haptic feedback, force control

### INTRODUCTION

Restoring hand function through prostheses in individuals with upper limb loss is critically important to help them regain independence and improve quality of life. Unfortunately, the current state of commercially available prosthetic hands is still far from achieving human-level dexterity, even in relatively simple object grasping tasks. Limitations in the reliability, function, and robustness of hand prostheses have led to little use or abandonment of advanced terminal devices, as these factors are considered to be most important to amputees (Atkins, 1989; Atkins et al., 1996; Biddiss and Chau, 2007a,b). Human-inspired approaches have been recently proposed to tackle this challenge through novel mechanical design (Godfrey et al., 2013), intuitive control (Ajoudani et al., 2014; Jiang et al., 2014), and sensory feedback (Clemente et al., 2015). Specifically, by observing how the human neuromuscular system solves the sensorimotor complexity of controlling hand movements during grasping tasks, it was found that hand postures used to grasp a large set of common objects can be approximated by a few finger joint coordination patterns, i.e., synergies (Santello et al., 1998, 2002). This implies a synergy control scheme in which controlling a large number of degrees of freedom could be simplified by using a reduced set of neural signals [for review, see Santello et al. (2013, 2016)]. By combining the concept of synergy with soft robotics technologies, a prosthetic hand, the SoftHand-Pro (SHP), was developed to simultaneously maximize simplicity and functionality (Godfrey et al., 2013). Specifically, this hand employs an under-actuated design where the number of synergies, and thus the number of actuators, was reduced to one, i.e., the first principal component observed in human grasping data, which accounts for more than 50% of the variance in grasp posture data (Santello et al., 1998).
Movement from the motor is transmitted to all 19 finger joints of the SHP by means of a single tendon; hence, the SHP follows the movement described by the first synergy for human grasping: flexion and adduction of the metacarpal-phalangeal and inter-phalangeal joints of the fingers, accompanied by flexion and palmar abduction of the thumb. This design is combined with an elastic recoil force, implemented as elastic ligaments in all joints, to help the fingers conform to arbitrary object shapes and bring the fingers back to their starting position. These ligaments also accommodate temporary joint displacements during unexpected perturbations through hyperextension and/or torsion. Such flexibility avoids stress that could damage the hand and the environment, while enabling versatility to grasp a wide variety of objects. The embedded flexibility in the mechanical design also simplifies myoelectric control with surface electromyography (sEMG), as the user does not need to generate a sequence of muscle activations to produce hand postures that match different object shapes. Indeed, only two sEMG channels from a pair of antagonistic muscles are needed to operate the hand efficiently in individuals with upper limb loss (Godfrey et al., 2017). Although the SHP demonstrated human-like motion during reach-to-grasp (Fani et al., 2016), its capability to interact with objects with human-like force remains to be systematically validated. Such human-like force control is important in activities of daily living (ADL), which include, but are not limited to, moving delicate objects, modulating grasp force to object weight, and manipulating compliant objects.

The main objective of the current study was to improve force control of the SHP. The default control gain of the SHP is tuned to enable a fast free-motion response, but the motor current (and, therefore, grasp force) ramps up quickly after the SHP contacts the object. This makes it difficult for the user to modulate the grasp force to the desired level (Gailey et al., 2017). In fact, in a recent study, individuals with upper limb loss using the SHP did not exhibit proper modulation of grasp force when lifting objects with different weights, even with the help of a mechanotactile haptic feedback device (Godfrey et al., 2016). One way to overcome this limitation is to let the user modulate the control gain through co-contraction of the muscles (Ajoudani et al., 2014), such that a high co-contraction level can be mapped to high stiffness of the SHP. However, this approach could increase the complexity of myoelectric control, as the user would have to adjust the co-contraction level while exerting differential activity between the flexor and extensor. Another approach is to use a force-position hybrid control scheme to handle motion and force automatically within the hand based on feedback from force/position sensing (Engeberg et al., 2008; Engeberg and Meek, 2013). However, such a controller relies on accurate measurement of finger force and position in a prosthetic hand with a rigid structure, and therefore, it is not fully compatible with the SHP, a device designed to be mechanically compliant with only synergistic sensing of force and position across all fingers. Therefore, we propose a novel approach that automatically switches control gain based on the grasping context detected from combined information from force, position, and EMG channels. This approach is tested against the conventional SHP controller, as well as human hands, in functional tasks that require fine control of grasp forces.

### MATERIALS AND METHODS

### Subjects

Sixteen subjects enrolled in the study (nine females and seven males, ages 19–34 years). They had normal or corrected-to-normal vision, and no history of musculoskeletal or neurological disorders. All subjects were naive to the experimental purpose of the study and gave informed consent to participate in the experiment. The experimental protocols were approved by the Institutional Review Board at Arizona State University in accordance with the Declaration of Helsinki. Before data collection, subjects signed an informed consent form and completed the Edinburgh Handedness Questionnaire. Fifteen subjects were right-handed, and one subject was ambidextrous. They were randomly assigned to two "controller" groups [i.e., single-gain (SG) and hybrid-gain (HG) controllers, see below].

### Experimental Apparatus

For the present investigation of myoelectric controllers for hand prostheses, we used the SHP, which is a soft robotic hand inspired by human hand synergies (Godfrey et al., 2013). Although we tested only able-bodied subjects, it has been shown that transradial amputees are able to use the SHP effectively in ADL (Godfrey et al., 2017). In addition to the SHP, each subject wore a *C*lenching *U*pper-limb *F*orce *F*eedback device (CUFF) for haptic feedback of the grasping force (Casini et al., 2015). Finally, we built a gravity compensation system to off-load the weight of the SHP and of the harness worn by subjects on their forearm, thus minimizing fatigue (**Figure 1A**). We describe these systems below.

#### SoftHand-Pro

The SHP is the prosthetic version of the Pisa/IIT SoftHand (Catalano et al., 2014). The size and weight of the SHP were designed to approximate a large male hand. The electronic control board was enclosed in the back of the hand. A glove was used to cover the joints and increase contact area and friction. The battery was placed on the user's body and connected to the hand through a cable. For testing with able-bodied subjects, a customized socket interface was used to mount the SHP on their forearms (**Figure 1A**). Importantly, as part of the interface, we used a Quick Disconnect Wrist (Hosmer 61921, Fillauer LLC, TN, USA) to allow task-specific manual adjustment of the supination/pronation angle. This allowed subjects to maintain a neutral supination/pronation angle with their own wrist throughout the experiment. The onboard microcontroller drives the motor with PID position/current control. It also communicates with EMG sensors and external programs. For myoelectric control, we used two sEMG electrodes that are commonly used for myoelectric prostheses (13E200 Myobock electrodes, Otto Bock, Germany). These electrodes are equipped with a logarithmic sensitivity adjustment and high common-mode rejection in the low frequency range (>100 dB at 50 Hz). The output of the electrodes was appropriately filtered and rectified. We placed the electrodes over m. flexor digitorum superficialis (FDS) and m. extensor digitorum communis (EDC) for flexion and extension, respectively. The difference between the sEMG magnitudes measured from the flexor and extensor muscles is used to drive the change of the reference motor position for the SHP (see below). This type of velocity-based proportional control allows users to scale the speed of the finger motion by modulating their EMG activity, as well as to minimize fatigue.

The SHP does not have force sensors; instead, it estimates the overall grasp force using a current sensor. This approach takes advantage of the synergy design, since all fingers are connected by a single cable to one motor. Therefore, the grasp forces of all fingers are transmitted to this cable and are reflected in the current drawn by the motor. The motor total current (*C*) is the sum of a grasp force-dependent component and a motor kinematics-dependent component (*C*K). The latter component can be calibrated with a model that consists of position, velocity, and acceleration terms (Ajoudani et al., 2014; Casini et al., 2015). After proper calibration, the grasp force-dependent current (Residual Current, *C*R) can be estimated as the difference between *C* and *C*K. It has been demonstrated that the relation between the overall grasp force and the residual current is approximately linear (Casini et al., 2015).
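
The residual-current estimate described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the kinematics-dependent current is a linear combination of motor position, velocity, and acceleration plus an offset, and that grasp force is a linear function of *C*R. The class name, function names, and all coefficient values are hypothetical and are not taken from the SHP firmware.

```python
from dataclasses import dataclass


@dataclass
class KinematicCurrentModel:
    """Hypothetical calibrated model of the kinematics-dependent current C_K.

    Assumed form: C_K = offset + k_pos * pos + k_vel * vel + k_acc * acc.
    The coefficients would come from a calibration with no object in the hand.
    """
    offset: float
    k_pos: float
    k_vel: float
    k_acc: float

    def predict(self, pos: float, vel: float, acc: float) -> float:
        return self.offset + self.k_pos * pos + self.k_vel * vel + self.k_acc * acc


def residual_current(total_current: float, model: KinematicCurrentModel,
                     pos: float, vel: float, acc: float) -> float:
    """Residual current C_R = C - C_K."""
    return total_current - model.predict(pos, vel, acc)


def estimate_grasp_force(c_r: float, slope: float, intercept: float = 0.0) -> float:
    """Approximately linear map from C_R to grasp force (slope is an assumption).

    Negative estimates are floored at zero because a grasp force cannot be negative.
    """
    return max(0.0, slope * c_r + intercept)


if __name__ == "__main__":
    model = KinematicCurrentModel(offset=10.0, k_pos=0.5, k_vel=0.2, k_acc=0.01)
    c_r = residual_current(120.0, model, pos=100.0, vel=50.0, acc=0.0)
    print(c_r, estimate_grasp_force(c_r, slope=0.1))
```

In this sketch, any current that the kinematic model cannot explain is attributed to contact with an object, which is why the quality of the free-motion calibration directly limits the quality of the force estimate.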

#### *SG Controller*

The SG controller is mostly identical to the best performing SHP motion controller demonstrated by Fani and colleagues (Fani et al., 2016). A small modification was made to dynamically limit the reference position. Specifically, this EMG-to-motion mapping uses the difference between the sEMG signals from wrist flexor and extensor muscles to drive the SHP. After a signal dead zone of 2% MVC was applied to each channel, the channel differential *E*d was used to drive the change of SHP motor reference position with a predetermined gain *K*m based on preliminary testing and previous studies (**Figure 1B**). Therefore, the sign and the magnitude of the differential *E*d dictate the direction and velocity of the finger movement during free motion, respectively. Furthermore, we used an adaptive motor position limit which prevents the increase of reference position if the motor total current C is close to the max capacity. This prevents the reference position "closing into" the object too much, thus allowing consistent opening motion from objects with any size. Eight subjects were assigned to use the SG controller (SG group).
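
One update step of such a single-gain mapping might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: EMG is assumed to be normalized to MVC (0 to 1), the dead zone is applied as a simple threshold, the reference position is integrated over a fixed time step, and the adaptive limit is modeled as a hard current threshold. The function names, the gain value, the time step, and the current limit are hypothetical; only the 2% dead zone and the 0° to 170° motor range come from the text.

```python
def apply_dead_zone(emg: float, dead_zone: float = 0.02) -> float:
    """Suppress EMG below the dead zone (2% MVC in the text).

    Assumption: activity above the dead zone is passed through unchanged.
    """
    return emg if emg > dead_zone else 0.0


def sg_reference_step(ref_pos: float, emg_flexor: float, emg_extensor: float,
                      total_current: float, *, gain_km: float, dt: float,
                      current_limit: float,
                      pos_min: float = 0.0, pos_max: float = 170.0) -> float:
    """Advance the motor reference position by one control step.

    E_d = flexor - extensor sets the direction (sign) and speed (magnitude).
    Closing is blocked when the total motor current is at or above the
    (hypothetical) current_limit, which stands in for the adaptive position limit.
    """
    e_d = apply_dead_zone(emg_flexor) - apply_dead_zone(emg_extensor)
    delta = gain_km * e_d * dt
    if delta > 0.0 and total_current >= current_limit:
        delta = 0.0  # do not "close into" the object any further
    return min(pos_max, max(pos_min, ref_pos + delta))


if __name__ == "__main__":
    pos = 50.0
    for _ in range(10):  # ten steps of moderate flexor activity
        pos = sg_reference_step(pos, 0.5, 0.0, total_current=100.0,
                                gain_km=200.0, dt=0.01, current_limit=1000.0)
    print(pos)
```

Because opening commands are never blocked, the reference position can always move away from the object, which is the behavior the adaptive limit is meant to guarantee.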

#### *HG Controller*

As mentioned earlier, the main drawback of the SG differential controller is that it cannot adapt to both free motion control and grasp force control equally well, if the reference position changes too quickly. To overcome this problem, we created an HG controller. The overall design of the HG controller is similar to the SG controller. However, the EMG-to-motion gain changes adaptively depending on the state of the SHP (**Figure 1C**). We defined three sensorimotor states of the SHP using the residual current *C*R as well as the EMG differential *E*d. Specifically, Free Motion state is when the grasp force is 0 or very low, i.e., *C*R = 0. Fine Force state is when grasp force is above minimum and the user is trying to control grasp force, i.e., *C*R > 0 and *E*d > 0. The last state, Quick Release, is when grasp force is above minimum and the subject is trying to quickly release the grasped object, i.e., *C*R > 0 and *E*d < 0. We used a large gain *K*m for both Free Motion and Quick Release states, and a small gain *K*f for Fine Force state (see **Table 1**). Eight subjects were assigned to use the HG controller (HG group). We would like to emphasize that the adjustable gain is used to map EMG activity to the reference position of the motor. Unlike previous work (Ajoudani et al., 2014), the control gain for the internal motor control loop remains unchanged, therefore preserving the stability during the passage from one state to another.
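
The state-dependent gain selection can be sketched as a small state classifier followed by a gain lookup. This is a minimal illustration, assuming a threshold `c_r_min` that stands for "0 or very low" residual current and a fixed integration step; the names and the numerical gains are hypothetical (the actual gains are given in **Table 1**). The text does not specify the case *C*R > 0 with *E*d = 0, so it is grouped with Fine Force here by assumption.

```python
from enum import Enum


class HandState(Enum):
    FREE_MOTION = "free_motion"
    FINE_FORCE = "fine_force"
    QUICK_RELEASE = "quick_release"


def classify_state(c_r: float, e_d: float, c_r_min: float = 0.0) -> HandState:
    """Classify the SHP state from residual current C_R and EMG differential E_d."""
    if c_r <= c_r_min:
        return HandState.FREE_MOTION
    if e_d < 0.0:
        return HandState.QUICK_RELEASE
    # E_d > 0 with contact. E_d == 0 is not specified in the text and is
    # grouped here by assumption (the reference does not move in that case).
    return HandState.FINE_FORCE


def select_gain(state: HandState, gain_km: float, gain_kf: float) -> float:
    """Large gain K_m for Free Motion and Quick Release, small gain K_f for Fine Force."""
    return gain_kf if state is HandState.FINE_FORCE else gain_km


def hg_reference_step(ref_pos: float, e_d: float, c_r: float,
                      gain_km: float, gain_kf: float, dt: float,
                      c_r_min: float = 0.0):
    """Advance the reference position with the state-dependent EMG-to-motion gain."""
    state = classify_state(c_r, e_d, c_r_min)
    gain = select_gain(state, gain_km, gain_kf)
    return ref_pos + gain * e_d * dt, state


if __name__ == "__main__":
    print(hg_reference_step(50.0, 0.5, c_r=0.0, gain_km=200.0, gain_kf=20.0, dt=0.01))
    print(hg_reference_step(50.0, 0.5, c_r=5.0, gain_km=200.0, gain_kf=20.0, dt=0.01))
```

Only the outer EMG-to-reference gain is switched in this sketch; the inner motor control loop is untouched, in line with the design choice emphasized above.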

#### Clenching Upper-Limb Force Feedback Device

The force feedback device CUFF used in this study has been demonstrated to enable intuitive modulations of grasp forces and correct softness discrimination (Ajoudani et al., 2014; Casini et al., 2015). Briefly, the CUFF is comprised of two DC motors attached to an elastic belt worn around the upper arm (**Figure 1A**). When the motors spin in opposite directions to tighten or loosen the band on the arm, the pressure around the arm applied by the band would increase or decrease, respectively. This type of mechanotactile cues provides the same modality of somatosensation as the one involved in the hand–object interactions (e.g., grasping), although at a different location. This may have advantages over other types of haptic feedback due to its ability to deliver natural feeling of force/pressure (Li et al., 2017). When the subjects use CUFF and SHP as an integrated system, the SHP estimates the grasp force using residual current *C*R which is then linearly mapped to CUFF motor positions. Therefore, the grasp force can be proportionally delivered as pressure through the CUFF. Specifically, due to differences in the biomechanical characteristics of arms, we calibrated *C*R to CUFF motor mapping for each subject. The automated calibration procedure finds the motor position range when the CUFF motor current reaches high and low threshold. Then the full range of *C*R is linearly mapped to CUFF motor position range.
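
The per-subject linear mapping from *C*R to CUFF motor position can be sketched as a clamped linear interpolation. The function name, the units, and the example ranges are hypothetical; in the study, the CUFF motor position range is obtained from the automated calibration described above.

```python
def map_residual_to_cuff(c_r: float, c_r_range: tuple, cuff_range: tuple) -> float:
    """Linearly map residual current C_R onto the calibrated CUFF motor range.

    c_r_range:  (low, high) full range of C_R (hypothetical units).
    cuff_range: (loose, tight) motor positions found during calibration.
    Values outside the C_R range are clamped to the ends of the CUFF range.
    """
    c_lo, c_hi = c_r_range
    p_lo, p_hi = cuff_range
    fraction = (c_r - c_lo) / (c_hi - c_lo)
    fraction = min(1.0, max(0.0, fraction))
    return p_lo + fraction * (p_hi - p_lo)


if __name__ == "__main__":
    # Hypothetical calibration result for one subject
    print(map_residual_to_cuff(50.0, (0.0, 100.0), (200.0, 600.0)))
```

Clamping is a design choice of this sketch: it keeps the band within the comfortable range found during calibration even if the current estimate briefly exceeds its expected range.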

#### Gravity Compensation System

In most studies of hand prosthetics with able-bodied subjects, the prosthesis is either mounted on the subject's arm or fixed on the table separately from the subject. Both approaches are suboptimal in the investigation of object manipulation. When the prosthetic hand is mounted on the arm, healthy subjects have to overcome significant added weight to move the system. This could lead to muscle fatigue over long periods of testing and thus negatively impact subjects' performance. Additionally, the weight of the hand prosthesis may influence the subject's perception of the object's physical properties, preventing modulation of grasping force in response to object weight. In contrast, if the prosthetic hand is detached from the subjects, the experiments cannot assess hand-arm coordination (e.g., reach to grasp), which is an important component of natural hand-object interactions (Grafton, 2010; Davare et al., 2011). To overcome these drawbacks, we built a gravity compensation system that offsets the gravitational force created by wearing the prosthetic hand (**Figure 1A**). This system is functionally similar to the one developed in Wilson et al. (2017). Specifically, we use a light cable and a series of pulleys to connect the wrist part of the prosthesis to a counter-weight. The counter-weight has the same weight as the entire hand prosthesis (SHP and harness) worn by the subjects. This system helped to prevent fatigue in our study, which required intensive repetition of hand movement over more than 1 h of testing.

### Experimental Protocol

Our study consisted of three sessions: (1) baseline trials: experimental tasks with normal right hand, (2) training trials: training tasks with SHP and CUFF, and (3) SHP trials: experimental tasks with SHP and CUFF. We use the data from normal hand as a benchmark to evaluate the performance of the prosthetic system. The tasks used in our study are described below.

### Training Tasks

We developed a simple two-step training scheme that helps subjects familiarize themselves with myoelectric control of the SHP and haptic feedback from the CUFF.

#### *Motion Control Training*

The objective of this training was to help subjects learn the EMG-to-motion mapping of the SHP. No CUFF feedback was given during this task. Subjects sat comfortably wearing the prosthetic system, with their forearm resting on the table. We adjusted the quick connector at the wrist to have the SHP 90° supinated, such that the palm of the SHP faced upwards. A monitor was placed in front of the subjects, showing continuous visual feedback of the motor position of the SHP, as well as a target motor position. Subjects were required to control the opening and closing of the SHP to match the target motor positions as quickly as possible (0° and 170° are fully open and fully closed, respectively). We defined five levels of target motor position: 30°, 60°, 90°, 120°, and 150°. The target position automatically advanced to the next if the actual motor position stayed within an error margin of ±5° of the target for an accumulated 1 s. There were three trials for this task. Each trial consisted of eight "close and open" actions that always started from 30°, moved to one of the other positions, and then moved back to 30° (**Figure 2A**).
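
The target-advance rule (within the error margin for an accumulated 1 s) can be sketched with a small timer class. This is an illustrative sketch, not the software used in the study: the class name and the sampling interval are hypothetical, and "accumulated" is interpreted as time that adds up across samples without resetting when the position briefly leaves the target window.

```python
class TargetDwellTimer:
    """Accumulate the time spent within a tolerance of the target value."""

    def __init__(self, tolerance: float, required_time: float = 1.0):
        self.tolerance = tolerance          # e.g., 5 deg for motion training
        self.required_time = required_time  # 1 s in the text
        self.accumulated = 0.0

    def update(self, value: float, target: float, dt: float) -> bool:
        """Add dt if the sample is on target; return True once the dwell is met."""
        if abs(value - target) <= self.tolerance:
            self.accumulated += dt
        return self.accumulated >= self.required_time

    def reset(self) -> None:
        """Call when the task advances to the next target."""
        self.accumulated = 0.0


if __name__ == "__main__":
    timer = TargetDwellTimer(tolerance=5.0)
    samples = [88.0, 97.0, 92.0, 91.0, 90.0]  # hypothetical motor positions (deg)
    for position in samples:
        if timer.update(position, target=90.0, dt=0.25):
            print("advance to next target")
            timer.reset()
```

The same rule applies to force control training with a tolerance of 1 N, so the class takes the tolerance as a parameter rather than hard-coding it.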

#### *Force Control Training*

The objective of this training was to help subjects learn the haptic feedback given by the CUFF. We directly measured grasp force with a cylindrical object fixed to the table. The object is split into two grasp surfaces, with a Force/Torque sensor (Nano 25, ATI Industrial Automation, NC, USA) installed in the center (**Figure 2D**). Each grasp surface is a curved surface (3.25 cm radius, 150° arc) with a height of 12 cm. In this task, the quick connector at the wrist was adjusted to neutral position to allow natural power grasp around the cylindrical object, with thumb and fingers of the SHP placed on each grasp surface. Visual feedback of the actual grasp force and target force was shown on the monitor. To get ready for each trial, subjects were instructed to move the SHP into close proximity of the cylindrical object. Upon hearing a "Go" cue, subjects controlled the SHP to grasp the object and were instructed to match the target force as quickly as they could. Three target levels of grasp force were defined and repeated three times in the same order within a trial: 6, 12, and 0 N (**Figures 2B,C**). Similar to motion training, the target force automatically advanced to the next if the actual grasp force stayed within an error margin of ±1 N of the target for an accumulated 1 s. There was a total of five training trials. Most importantly, subjects were told that the pressure applied by the CUFF is proportional to the displayed grasp force, and that they should familiarize themselves with the CUFF feedback.

#### Experimental Tasks

To assess the performance of two myoelectric controllers, we developed the following three Experimental Tasks. They were inspired by commonly used clinical hand function assessment tools (e.g., Southampton Hand Assessment Procedure, Block and Box Test, etc.), with the focus on the ability of fine control of grasp forces during functional use of the prosthetic hand. Note that these tasks were performed with native right hand and the SHP in baseline trials and SHP trials, respectively.

#### *Large Object Pick and Place*

Grasping and transporting objects is one of the most common activities in daily life. Subjects were instructed to pick and place a cylindrical object (**Figure 2D**) with power grasp repetitively. The object was the same as the one used in the CUFF familiarization task, but it was free to move instead of being fixed to the table. Additionally, the weight of the object could be modified by inserting mass into the base of the object. There were two object weights: Medium (420 g) and Heavy (820 g). A soft mat was placed on the table in front of the subjects to prevent damage to the object if dropped. We marked two *target regions* separated by 30 cm, and a 5-cm high metal bar was placed on the mid-line between the two target regions as an obstacle (**Figure 2E**). Subjects had to align their right shoulder with this obstacle. The proximal end of the obstacle was defined as the *start region*, which was 30 cm away from the right shoulder. For SHP trials, we set the wrist at neutral position to enable natural grasp posture (i.e., thumb and fingers of the SHP placed on each grasp surface). Subjects were asked to start a trial with either their normal hand or the SHP in the start region, and the object started in the right target region.

On an auditory "GO" signal, subjects reached to grasp the object, transported it to the other target region, moved their hand back to the start region, and repeated this process successfully as many times as possible within 45 s. A successful transport was recorded if the object was not dropped or "crushed." The crushing of the object was rendered by playing a "glass breaking" sound when the force normal to the grasp surface exceeded a pre-defined *crushing threshold*. There were two types of fragility. The Solid type had a crushing threshold of 80 N; therefore, subjects did not need to be careful about crushing the object. The Fragile type had a crushing threshold defined based on the object weight, such that the threshold is ~3 N above the minimum grip force required to prevent slipping. The coefficient of friction between the glove and the object was estimated to be 0.5. Therefore, the fragile crushing threshold for the Medium and Heavy objects was 6 and 9 N, respectively. We instructed subjects to return the object to the closest target region if it was "crushed"; they could then retry without completely releasing the object. The kinematics of the object was tracked by a motion capture system with a marker placed on the base of the object (Impulse, Phasespace, Inc.).

#### *Small Object Pick and Place*

This task was similar to the Large object pick-and-place task, the only difference being that the object is smaller (**Figure 2D**). Specifically, there were two small grasp surfaces (size: 3.5 cm × 4 cm, 3 cm distance) and subjects were required to use a three-digit precision grasp (tips of thumb, index, and middle finger). For SHP trials, we set the wrist at 45° pronation to allow natural grasp posture. The small object used in this task required a higher precision in reach-to-grasp in order to place the thumb accurately on the grasp surface. Similar to the large object, the object weight can be adjusted by inserting weight into the base of the object. Two object weights were used: Light (220 g) and Medium (420 g). The solid crushing threshold was again 80 N. The fragile crushing thresholds for the Light and Medium objects were 4 and 6 N, respectively. Subjects received same instruction as the large object pick and place regarding task objective. The kinematics of the object was tracked by motion capture system.

#### *Compliant Object Squeeze*

Subjects were instructed to repetitively squeeze a compliant object (**Figure 2D**) with power grasp. The object consisted of two curved grasp surfaces, which were connected by a pair of linear sliders and a spring. Therefore, the object only allowed one-dimensional deformation, with a maximum width of 8.5 cm (determined by a mechanical stop). The compliance of the object was determined by the stiffness of the spring, and two types of compliance were selected: Soft (0.33 N/mm) and Hard (0.54 N/mm). Visual feedback about the deformation of the object was given to the subjects on the monitor by tracking the positions of the grasp surfaces with the motion capture system. To prepare for a trial, subjects had to lightly grasp the object (<0.2 cm deformation) with either their normal right hand or the SHP. For SHP trials, the wrist was set at 60° supination to allow natural grasp posture. On a "Go" signal, subjects were asked to match the target deformation shown on the monitor repetitively. There were two levels of target deformation, 0.8 and 1.8 cm, with an error margin of 0.2 cm; each was presented five times within a trial. These two target levels alternated, and each level had to be maintained for 1 s continuously to automatically proceed to the next one.
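
As a rough guide to the forces involved in this task, the spring force at each target deformation can be estimated with Hooke's law. The sketch below assumes an ideal linear spring with no preload and neglects friction in the sliders; neither assumption is stated in the text, so the numbers are only indicative. Under these assumptions, the two targets correspond to about 2.6 and 5.9 N for the Soft spring and about 4.3 and 9.7 N for the Hard spring.

```python
def spring_force_newton(stiffness_n_per_mm: float, deformation_cm: float) -> float:
    """Hooke's law, F = k * x, with the deformation converted from cm to mm.

    Assumes an ideal linear spring with zero preload (not stated in the text).
    """
    return stiffness_n_per_mm * deformation_cm * 10.0


if __name__ == "__main__":
    stiffness = {"Soft": 0.33, "Hard": 0.54}   # N/mm, from the text
    targets_cm = (0.8, 1.8)                    # target deformations, from the text
    for label, k in stiffness.items():
        forces = [round(spring_force_newton(k, x), 2) for x in targets_cm]
        print(label, forces)
```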

### Experiment Procedure

Both SG and HG groups followed the exact same experimental procedure, and the only difference between the two groups was the myoelectric controller. In experiment preparation, we placed the sEMG electrodes over the muscle bellies of the target muscles (i.e., FDS and EDC). The skin was cleaned with alcohol pads and the electrodes were secured by elastic medical tape. A calibration procedure was implemented by asking subjects to perform maximal voluntary isometric contraction (MVC) of the FDS or EDC. The onboard gains of the electrodes were adjusted such that the maximum output voltage represents the MVC of the corresponding muscle. In the baseline session, subjects performed all experimental tasks with their normal right hand wearing the same glove as the SHP glove, such that the friction conditions were matched. There were four conditions for the Large object pick-and-place task: Heavy-Solid, Heavy-Fragile, Medium-Solid, and Medium-Fragile. Similarly, there were four conditions for the Small object pick-and-place task: Medium-Solid, Medium-Fragile, Light-Solid, and Light-Fragile. Finally, there were two conditions for the Compliant object squeeze task: Soft and Hard. One baseline trial was performed for each of these ten conditions, with a 1 min break given between conditions (see **Table 2** for summary of conditions). The order of these conditions was randomized within each task for each subject. Most importantly, before each condition involving a fragile object, subjects were given 15 s to understand the corresponding crushing threshold. Subjects were instructed to slowly ramp up the grasp force multiple times without lifting the object, until they heard the glass breaking sound. We also told subjects to memorize the fragility in association with the object type (e.g., Large Heavy) for the SHP trials later. For all baseline trials, we also recorded the position of the wrist center with the motion capture system, in addition to object positions, grasp forces, and sEMG.

Following the baseline session, subjects were fully equipped with the prosthetic system and went through the two training tasks. A one-minute break was given after each training task. After the training session, subjects performed all ten experimental conditions again with the SHP, with three consecutive trials per condition (**Table 2**). The order of these conditions was also randomized within each task for each subject. In contrast to the baseline session, subjects were not allowed to explore the crushing threshold for the fragile objects and instead had to rely on their previous experience with the same object. For all SHP trials, we recorded the position of the wrist center of the SHP, object positions, grasp forces, SHP residual current, SHP motor position, CUFF motor current, and sEMG. All tasks were implemented using customized Matlab, C++, and LabView programs.

### Data Processing and Analysis

#### Experimental Variables

### *Training Tasks*

For motion control training, we assessed the performance by computing the averaged time to perform each target action, which was defined as the time between the onsets of two consecutive target positions. Within each trial, there were four types of required change of motor positions (30°, 60°, 90°, and 120°) combined with two actions (open or close). Each specific action (e.g., open 60°) occurred twice, and we used the average time of the two as within trial performance. Note that the two controller groups had the same EMG-to-motion gains in this task, because the SHP is always in Free Motion state. For force control training, we assessed the performance by computing the total time to complete each trial. Additionally, we computed the averaged EMG magnitude as an indicator of motor effort within each trial.

#### Experimental Tasks

For both object pick-and-place tasks, we mainly focus on the following measures. First, we use the number of successful transports completed within 45 s as the gross outcome measure. This is computed from both object marker data and object force sensor data, since successful completion requires no dropping (kinematics) or crushing (force) of the object between two target regions. Second, we assess hand-arm coordination by defining transport speed during successful transport. This is computed as wrist velocity at the time when the object is moving across the obstacle. Third, we assess the force modulation by defining grasp force during successful transport. This is computed as the force normal to the grasp surface at the time when the object is moving across the obstacle. Finally, we evaluate the myoelectric control by defining flexor activation, extensor activation, and co-contraction. These are computed as the average magnitude of the corresponding sEMG signals. Note that for SHP trials, each experimental condition consists of three trials and we take the average for these measures. A representative trial (sub13, Large object, Medium weight, Fragile) is shown in **Figure 3**. For the object squeeze task, we computed the time to complete one trial (i.e., five squeezes), as well as the averaged EMG magnitude for flexor, extensor, and co-contraction. A summary of variables is given in **Table 3**.

*Actual order was randomized.*

FIGURE 3 | Sample experimental recording. (A) Representative 3-dimensional trajectory profile during object pick-and-place task. (B) Representative temporal profile of multiple experimental variables from one object pick-and-place trial.
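
The transport speed and grasp force measures for the pick-and-place tasks are both sampled at the moment the object crosses the obstacle. A minimal sketch of that extraction is given below, assuming time-aligned samples of the object's lateral position, the wrist speed, and the normal grasp force. The function names, the coordinate convention (obstacle at a fixed lateral position), and the sample data are hypothetical; the actual analysis pipeline is not described at this level of detail.

```python
def obstacle_crossing_index(object_x, obstacle_x=0.0):
    """Return the first sample index at which the object is on or past the
    other side of the obstacle, or None if it never crosses."""
    for i in range(1, len(object_x)):
        prev_side = object_x[i - 1] - obstacle_x
        curr_side = object_x[i] - obstacle_x
        if prev_side * curr_side <= 0.0 and prev_side != curr_side:
            return i
    return None


def transport_measures(object_x, wrist_speed, grasp_force, obstacle_x=0.0):
    """Wrist speed and normal grasp force at the obstacle crossing.

    All three sequences are assumed to be time-aligned and equally long.
    Returns None when no crossing is found (e.g., an aborted transport).
    """
    i = obstacle_crossing_index(object_x, obstacle_x)
    if i is None:
        return None
    return {"transport_speed": wrist_speed[i], "grasp_force": grasp_force[i]}


if __name__ == "__main__":
    # Hypothetical samples from one transport (cm, m/s, N)
    x = [15.0, 10.0, 4.0, -3.0, -12.0]
    speed = [0.1, 0.3, 0.5, 0.6, 0.2]
    force = [5.0, 5.2, 5.1, 4.9, 5.0]
    print(transport_measures(x, speed, force))
```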

#### Statistical Analysis

One subject in the SG group was excluded from the data analysis because he was not able to finish training within an acceptable performance range and therefore did not participate in the experimental tasks with the SHP. To compare between the two controllers, we used mixed ANOVA with Group as the between-subject factor and task conditions as within-subject factors. We also used repeated-measures ANOVA to assess benchmark performance with the subjects' normal hand. *Post hoc* comparisons were used with Bonferroni correction when needed.
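
For reference, the Bonferroni correction multiplies each *p*-value by the number of comparisons in the family and caps the result at 1. The sketch below illustrates the arithmetic only; the function name and the example *p*-values are hypothetical, and the number of comparisons used in each *post hoc* analysis is not reported in this section.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: p_adj = min(1, p * m) for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.5]  # hypothetical raw p-values from three comparisons
    for p, p_adj in zip(raw, bonferroni_adjust(raw)):
        print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f}")
```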

### RESULTS

### Motion Control Training

The motion training was designed to familiarize subjects with the myoelectric controller. For both "open" and "close" actions, subjects were able to perform quite well from the beginning and we did not observe improvement over three training trials. With separate three-way mixed ANOVA (Group, Trial, and Target), we found only a significant effect of Target for both "close" and "open" actions (*p* = 0.001 and *p* < 0.001, respectively). This is expected, since both SG and HG groups used the same free motion controller that has been previously shown to be intuitive and efficient (Fani et al., 2016). Furthermore, we performed another three-way mixed ANOVA after averaging across trials (Group, Target, and Action). We found a significant Target × Action interaction (*p* = 0.015, **Figure 4A**). *Post hoc* *t*-tests showed that 30° close took significantly less time than the other three closing actions, whereas 120° open took significantly longer than the other three opening actions (*p* < 0.05). This indicates that subjects were able to take advantage of the proportional control implemented for the SHP to scale the movement speed of the fingers as a function of the distance to be covered. Note that such scaling is an important feature observed in humans when grasping objects of different sizes (Bootsma et al., 1994).

### Force Control Training

Force training was designed to help subjects precisely generate the desired grasp force with the help of visual feedback. Additionally, subjects could associate the haptic feedback from the CUFF with their own actions. Unlike motion control, force control with the SHP was challenging in the beginning for both controller groups. The performance gradually improved over five training trials. Importantly, the SG group performed consistently worse than the HG group (**Figure 4B**). These findings were confirmed by two-way ANOVA (Group and Trial), which showed a significant effect of both Trial (*p* < 0.001) and Group (*p* = 0.048). Furthermore, we examined the average EMG used in force control training with two-way mixed ANOVA (Group and Trial). For both flexor and extensor muscles, we found that the HG group used significantly larger activity than the SG group across training trials (main effect of Group *p* < 0.001 and *p* = 0.01; no effect of Trial). This result suggests that the hybrid controller allows better control of grasping force but requires greater effort/energy. We want to point out that, unlike natural grasping, here the energy is spent in modulating grasp force, but not maintaining grasp force, due to the nature of velocity-based myoelectric control.

### Object Pick-and-Place Tasks: Performance

We first quantified subjects' performance with their native hand. This allows us to establish benchmark behavior for our novel tasks, which is then used to evaluate the SHP controllers. Similar benchmark quantifications will also be used in the following sections regarding different aspects of the object pick-and-place tasks. The overall task performance is assessed by the number of successful transports within 45 s, using three-way mixed ANOVA (Group, Weight, and Fragility) per object size. For both Large and Small object pick-and-place tasks, the two groups performed equally well. Furthermore, only Fragility but not Weight of the objects played a role in the net performance (**Figures 5A,B**). We found that the number of successful transports for fragile objects was significantly lower than for the solid ones (only main effect of Fragility with both Large and Small object *p* < 0.001). This could be because subjects handled the fragile objects with more caution, thus being slower.

With the SHP, both groups performed the tasks much more slowly than with their native hands, and we found that the HG controller outperformed the SG controller when transporting fragile objects (**Figures 5C,D**). Specifically, with the Large object, we found significant Fragility × Group (*p* = 0.003) and Fragility × Weight interactions (*p* = 0.023). *Post hoc* comparisons suggested that the HG group performed significantly better than the SG group in Heavy-Fragile, Medium-Solid, and Medium-Fragile conditions (*p* < 0.05; **Figure 5C**). No difference was found between the two groups in the Heavy-Solid condition. Similarly with the Small object, we found a significant Fragility × Group interaction (*p* = 0.035). Further *t*-tests suggested that the HG group performed significantly better than the SG group in Heavy-Fragile, Medium-Solid, and Medium-Fragile conditions (*p* < 0.05; **Figure 5D**), but not in Heavy-Solid condition. Interestingly, we observed a qualitatively similar pattern of Fragility effect between SHP and native hand in the HG group but not the SG group, despite a significantly smaller number of completions overall.

To further understand the difference between the HG and SG controllers, we examined the hand-arm coordination using the velocity of the wrist center when the object was moving over the obstacle during successfully completed object transport. For the native hands, the velocity was significantly lower for Fragile objects than for Solid objects (**Figures 6A,B**). Three-way mixed ANOVA (Group, Weight, and Fragility) showed only a main effect of Fragility with both Large and Small objects (*p* < 0.001). With the SHP, subjects also moved more slowly with Fragile objects (main effect of Fragility, *p* = 0.003 and *p* = 0.005 for Large and Small objects, respectively). There was also a significant Weight × Group interaction (*p* = 0.033 and *p* = 0.019 for Large and Small objects, respectively). *Post hoc* analyses showed that the HG group moved significantly faster in Heavy-Fragile, Medium-Solid, and Medium-Fragile conditions (*p* < 0.05, **Figures 6C,D**). These results suggest that the superior performance of the HG controller can be partially attributed to the faster arm movement when holding an object.

### Object Pick-and-Place Tasks: Grasp Forces

In addition to performance, we also measured grasp force when the object was moved over the obstacle during successfully completed object transport. For the native hands, we found that subjects scaled grasp force to object weight and fragility in both object size conditions (**Figures 7A,B**). Specifically, subjects used larger grasp force for heavier objects, and smaller grasp force when the object was fragile. These observations were confirmed by three-way mixed ANOVA (Group, Weight, and Fragility). With the Large object, there was a significant main effect of both Weight (*p* = 0.003) and Fragility (*p* < 0.001), but not Group. Similarly with the Small object, we found significant main effect of both Weight (*p* < 0.001) and Fragility (*p* < 0.001), but not Group.

FIGURE 6 | Wrist velocity in pick-and-place tasks. (A) and (B) Native hand wrist velocity for successful transport of Large and Small objects, respectively. (C) and (D) SHP wrist velocity for successful transport of Large and Small objects, respectively.

With the SHP, subjects were able to modulate grasp force in successful transport (**Figures 7C,D**). With the Large object, we found a main effect of Weight (*p* < 0.001), and a significant Fragility × Group interaction (*p* = 0.024). *Post hoc* comparisons showed that the HG group used significantly smaller grasp force than the SG group in the Medium-Solid condition (*p* < 0.05; **Figure 7C**). Similarly with the Small object, we also found a main effect of Weight (*p* = 0.003), and a significant Fragility × Group interaction (*p* < 0.001). *t*-Tests showed that the HG group used significantly smaller grasp force than the SG group in both Medium-Solid and Light-Solid conditions (*p* < 0.05; **Figure 7D**). When compared with the native hand, we found that the HG group showed a qualitatively similar pattern of grasp force modulation, but the SG group did not.

FIGURE 8 | sEMG magnitude in pick-and-place tasks. (A), (B), and (C) sEMG activities of flexor, extensor, and co-contraction in Large object tasks, respectively. (D), (E), and (F) sEMG activities of flexor, extensor, and co-contraction in Small object tasks, respectively.

### Object Pick-and-Place Tasks: EMG

To better understand how subjects use their muscle activities with SG and HG controllers, we also compared the average EMG used in these tasks. Note that we do not use EMG from native hand here as benchmark because (1) velocity-based myoelectric control is different from natural muscle control by nature and (2) the two sEMG channels cannot provide comprehensive measure of the muscle activity from native hand used in these tasks (e.g., missing intrinsic muscles).

With the Large object, we found no difference in the average flexor EMG magnitude between the two controller groups (**Figure 8A**; three-way mixed ANOVA, only main effect of Weight and Fragility, *p* = 0.035 and *p* < 0.001, respectively). Furthermore, we found no difference in the extensor EMG magnitude between the two groups (**Figure 8B**; only main effect of Fragility *p* = 0.002). Finally, we found no difference in the co-contraction of the muscles between the two groups (**Figure 8C**). With the Small object, we found a main effect of Fragility (*p* = 0.009) for the wrist flexor muscle, as well as a Group × Weight interaction (*p* = 0.038). *Post hoc* comparisons showed that subjects in the HG group used less EMG for the light weight than for the medium weight, but the SG group did not show a difference between weights (**Figure 8D**). For the wrist extensor muscle, we found no difference between the two controller groups (**Figure 8E**; only main effect of Fragility *p* = 0.005). Finally, we again found no difference in the co-contraction of the muscles between the two groups (**Figure 8F**). To summarize, subjects used less EMG from both flexor and extensor muscles for fragile objects regardless of group (**Figure 8**). This is expected because fragile objects require much smaller grasp force (**Figures 7C,D**), therefore less EMG was needed to drive the reference motor position. Interestingly, the flexor activity was scaled to object weight in all conditions for the HG group, but not for the SG group.

### Compliant Object Squeeze Task

In addition to the object pick-and-place tasks, subjects performed a compliant object task in which they had to deform a compliant object with either their native hand or the SHP. There were two levels of compliance, set by the stiffness of the spring inside the object (soft and hard, 0.33 and 0.54 N/mm, respectively). There was no difference between the two compliance levels for the native hand (**Figure 9A**). For the SHP, we found that the SG and HG controllers performed similarly in this task, and both were much slower than the native hand (**Figure 9B**). Two-way mixed ANOVA (Group and Compliance) showed a significant Group × Compliance interaction (*p* = 0.032). However, *post hoc* comparison between the two compliance levels did not reveal significant differences.

We also compared the average EMG between the two controller groups with two-way mixed ANOVA (Group and Compliance). For the flexor muscle, we found that subjects used significantly larger activity in the hybrid controller group (**Figure 9C**; only a main effect of Group, *p* = 0.011). For the extensor activity and co-contraction, no significant difference was found between the two groups (**Figures 9D,E**).

## DISCUSSION

The goal of this study was to improve the myoelectric control of grasp forces in functional tasks using a soft synergy-based prosthetic hand. Specifically, we developed and implemented an HG controller and compared its performance to a previously validated conventional SG controller. We demonstrated that the new controller (a) significantly improved subjects' ability to perform fine force control when transporting objects with different shapes, weights and fragility (**Figure 5**) and (b) qualitatively demonstrated natural modulation of grasp force in response to objects' physical properties, such as weight (**Figure 7**). We discuss our results and future work below.

### Quantitative Assessment of Performance of Hand Prosthesis with Functional Tasks

To meet the needs of creating reliable, functional, and robust hand prostheses, it is important to assess their performance in functional tasks that require physical interaction with objects, as well as coordination of arm and hand. This is because the use of a prosthetic hand in ADL often involves dynamic and unstructured environments, in which the effects of gravity and object physical properties need to be properly compensated. There are several clinical assessment tools available, such as the Southampton Hand Assessment Procedure, the Box and Block Test, and the Jebsen Hand Function Test. However, these tests are usually scored based on gross measures, such as task completion time or quality of movement. As such, they do not provide information about how the tests are completed (i.e., movement kinematics, grasp force). Additionally, these tasks typically do not assess subjects' ability to control grasp force, which plays an important role in ADL. Researchers have recently started to incorporate motion capture and force sensors to quantify and standardize the evaluation of the hand–object interactions during use of prosthetic hands (Hebert and Lewicke, 2012; Engeberg and Meek, 2013; Fani et al., 2016; Godfrey et al., 2016; Wilson et al., 2017). Such quantitative assessment can identify potential bottlenecks and issues within the complex integration among hardware, control, and human user input, therefore helping to validate and optimize the prosthetic systems. In the current study, we developed a set of novel functional tasks that aimed to quantify the capability of a prosthetic system to control grasp forces. The advantages of our tasks are threefold. First, our tasks use objects that can be easily adjusted to cover a wide range of different physical properties, such as size, weight, and fragility. This allows us to test the versatility of the function of a prosthetic hand. 
Second, our tasks require repetitive dynamic actions similar to the Box and Block Test, which can be used to assess the reliability and robustness of the prosthesis. Third, our setup is fully equipped with both motion capture and force sensing technologies, thus being able to capture multiple dimensions of the task performance. Furthermore, our experimental design also allows comparison between the prosthetic system and benchmark performance from the native hands. We believe that the ability of a prosthetic system to exhibit humanlike kinematic and kinetic behavior is critical for the acceptance of the terminal device.

### Improved Force Control with Context-Dependent HG Controller

Fine grasp force control is a defining feature of human manual dexterity. When grasping and moving an object, it is well known that the grasp force is regulated according to the object's weight and friction. Specifically, there is a minimum level of grasp force required to prevent object slip, given a weight and friction coefficient combination. The applied grasp force is normally slightly higher than the minimum required, demonstrating a consistent "safety margin" which balances energy efficiency and slip prevention (Johansson and Westling, 1984; Westling and Johansson, 1984). When friction is constant, as in our study, such grasp force control leads to the natural scaling of grasp force to object weight (e.g., larger grasp force on heavier objects). Indeed, in the current study we showed that subjects were able to modulate grasp force in response to object weight with their native hands even when wearing a glove (**Figure 7**). Extensive investigation has revealed that weight-specific grasp force control is achieved by a combination of memory-based feedforward control and sensory feedback-driven corrections. During the initial encounter with a novel object, a feedforward motor command can be generated based on the object's physical properties, which are visually estimated using previous experiences with similar objects (Gordon et al., 1993). If the motor command is erroneously programmed due to inaccurate estimation, the central nervous system (CNS) can use somatosensory feedback to generate corrective responses after contact and/or after lift (Johansson and Westling, 1988; Johansson and Cole, 1992). After repetitive interaction with the same object, an internal representation of the object properties can be formed and used to generate more precise feedforward motor commands in the following interactions (Flanagan et al., 2001). 
Importantly, the "safety margin" for grasp force can also be flexibly adjusted in a feedforward fashion to account for uncertainty in the dynamic environment (Hadjiosif and Smith, 2015), or the fragility of objects (Gorniak et al., 2010). In the current study, subjects used much less grasp force on the fragile objects than on the solid ones (assuming the same object weight) with their native hands. This reduction of the "safety margin" was accompanied by decreased arm movement speed, which is consistent with previous findings (Gorniak et al., 2010).
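The relation between weight, friction, and the slip limit described above can be illustrated with a short sketch. It assumes an idealized grip with two opposing contacts and Coulomb friction, so that the minimum normal force per contact is m·g/(2μ). The function names, the friction coefficient, and the safety margin values are illustrative assumptions and are not taken from this study.

```python
G = 9.81  # gravitational acceleration in m/s^2


def min_grip_force(mass_kg, mu, n_contacts=2):
    """Smallest normal force (N) per contact that prevents slip.

    Assumes Coulomb friction and n_contacts identical contact surfaces
    sharing the load: n_contacts * mu * F >= mass * g.
    """
    if mu <= 0 or n_contacts <= 0:
        raise ValueError("mu and n_contacts must be positive")
    return mass_kg * G / (n_contacts * mu)


def applied_grip_force(mass_kg, mu, safety_margin=0.3):
    """Grip force with a safety margin expressed as a fraction of the
    slip limit (0.3 is a hypothetical value used only for illustration)."""
    return (1.0 + safety_margin) * min_grip_force(mass_kg, mu)


if __name__ == "__main__":
    for mass in (0.2, 0.4):
        print(mass, round(applied_grip_force(mass, mu=0.5), 3))
```

With constant friction, doubling the object mass doubles the computed grip force, which is the weight scaling discussed in the text. Lowering `safety_margin` mimics the more cautious grasp used on fragile objects.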

There are two common ways to enable grasp force control in prosthetic hands. The first approach is fully automated by implementing a force feedback loop using force and position sensors (Engeberg et al., 2008, 2009; Engeberg and Meek, 2013). While the accuracy and reliability of automated force control are very good in single-degree-of-freedom rigid prosthetic hands (e.g., Motion Control Hand), it is challenging to scale this approach up to multi-finger hands and/or hands with embedded compliance (e.g., SHP) due to complex hand–object interactions. Alternatively, the force control can be fully operated by the user with some form of haptic feedback about the grasp force [for review, see Antfolk et al. (2013) and Li et al. (2017)], such as vibrotactile stimulation (Rombokas et al., 2013; Lum et al., 2014), electrotactile stimulation (Wang et al., 1995), mechanotactile stimulation (Ajoudani et al., 2014; Casini et al., 2015), and direct nerve stimulation (Raspopovic et al., 2014). In most cases, the feedback signal carries continuous information about grasp force, allowing subjects to reduce grasp force or perceive object softness. A more recent study also showed that discrete feedback about mechanical events during hand–object interaction is sufficient to allow users to better handle fragile objects (Clemente et al., 2015). However, most of these studies used relatively static and/or constrained tasks, and have not tested the user's ability to integrate haptic feedback in force control during highly dynamic tasks across a range of object sizes and weights. In the current study, we showed that our novel hybrid controller, paired with a soft synergy-based hand and continuous mechanotactile feedback, can achieve this goal. Most importantly, to the best of our knowledge, we are the first to demonstrate human-like grasp force modulation to object weight.

Instead of tuning the haptic feedback, our approach focused on the design of the EMG-to-motor control interface. This is because we acknowledge the importance of enabling users to (1) accurately generate anticipatory motor commands and (2) make fine corrective motor responses after receiving sensory feedback. Both are crucial to human manual dexterity, as reviewed at the beginning of this section. Furthermore, we propose that, in a proportional EMG control scheme as in the SHP, the EMG-to-motor control mapping has to be optimized separately for free motion and grasp force due to the distinct behavior of the motor during motion and force generation. Note that the modulation of the EMG-to-motor mapping can be designed to fully rely on the user, such as the concept of "teleimpedance" where muscle co-contraction is used to change the mapping (Ajoudani et al., 2014). However, this would increase the complexity of the myoelectric interface, leading to a higher attentional demand to simultaneously control multiple variables. It has been shown that a trade-off has to be made when deciding the level of sharing of control between the user and the hardware, and an intermediate level of interaction between the two was favored (Cipriani et al., 2008). Following this idea, context-dependent switching schemes can be found in several recent studies to control the kinematics of prosthetic hands based on sEMG pattern (Amsuess et al., 2016), limb kinematics and/or grasp force (Jiang et al., 2013; Patel et al., 2017), or vision (Markovic et al., 2014, 2015; Ghazaei et al., 2017). It has been argued that such semi-autonomous shared control can help to shield some low-level execution details and decrease cognitive burden while maintaining high-level function (Castellini et al., 2014). 
We agree with this assessment, and furthermore believe that the prosthetic system needs to merge both sensory and motor information to best determine the context of operation, including both sensors in the hardware and the sEMG from user input. Therefore, we chose to improve prosthesis force control by implementing a "context aware" controller that changes the EMG-to-motor mapping based on both the condition of the hand (i.e., free motion or object grasping) and the intent of the user (i.e., open or close). We note that the state switching rules were relatively simple in our study due to limited sensing capability associated with the design goal of enabling intuitive control of the SHP. Nevertheless, the present work provides proof-of-concept evidence that human-like force control can be achieved using the proposed approach.
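The idea of a context-dependent EMG-to-motor mapping can be sketched as a small lookup keyed by hand state and user intent. This is a minimal illustration only: the gain values, the intent threshold, and all names below are hypothetical assumptions, and the sketch is not the controller implemented in this study.

```python
from enum import Enum


class HandState(Enum):
    FREE_MOTION = "free_motion"
    GRASPING = "grasping"


class Intent(Enum):
    CLOSE = "close"
    OPEN = "open"
    REST = "rest"


# Hypothetical gains: a lower gain while closing on an object gives finer
# force resolution, while opening and free motion stay fast.
GAINS = {
    (HandState.FREE_MOTION, Intent.CLOSE): 1.0,
    (HandState.FREE_MOTION, Intent.OPEN): 1.0,
    (HandState.GRASPING, Intent.CLOSE): 0.25,
    (HandState.GRASPING, Intent.OPEN): 1.0,
}


def detect_intent(flexor, extensor, threshold=0.05):
    """Classify intent from normalized flexor/extensor sEMG amplitudes."""
    drive = flexor - extensor
    if abs(drive) < threshold:
        return Intent.REST
    return Intent.CLOSE if drive > 0 else Intent.OPEN


def motor_velocity_command(flexor, extensor, hand_state, threshold=0.05):
    """Velocity command for the reference motor position
    (positive closes the hand, negative opens it)."""
    intent = detect_intent(flexor, extensor, threshold)
    if intent is Intent.REST:
        return 0.0
    return GAINS[(hand_state, intent)] * (flexor - extensor)


if __name__ == "__main__":
    print(motor_velocity_command(0.5, 0.1, HandState.FREE_MOTION))
    print(motor_velocity_command(0.5, 0.1, HandState.GRASPING))
```

In this sketch the same muscle drive produces a smaller closing velocity once the hand is grasping, which is one simple way to trade speed for force resolution. How the hand state is detected from hardware sensors is left out.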

### Effort-Performance Trade-Off in Myoelectric Control of Hand Prosthesis

We want to point out that the superior performance of the hybrid controller came at the cost of increased energy demand (i.e., muscle activation at higher amplitude and for a longer time). However, this is not necessarily undesirable. In fact, our result that subjects scaled grasp force to object weight indicates that the increased energy demand of our controller effectively evokes the CNS's ability to optimize the motor command for energy efficiency. This indirectly leads to lower energy consumption in the prosthetic hand as the grasp force is subsequently optimized, which can extend the usage time thanks to lower battery consumption and reduced tension of the driving tendon.

### Conclusion and Future Work

Our results provide strong support for the functional advantage of a context-dependent myoelectric interface for the control of grasp force during hand–object interactions. Although the controller was only tested with a soft synergy-based prosthetic hand and a mechanotactile feedback system, we believe that it can be extended to other terminal devices and feedback systems, including the next generation of SHP with multiple actuators (Della Santina et al., 2015) and direct nerve stimulation. Future work includes, but is not limited to, finding optimal EMG-to-motor mapping parameters in different sensorimotor states, improving state definitions and transitions, and determining the level of control sharing between user and hardware.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of "Institutional Review Board at Arizona State University" with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the "Institutional Review Board at Arizona State University."

### AUTHOR CONTRIBUTIONS

QF and MS designed the study and wrote the manuscript. QF implemented the controllers of the SoftHand Pro, built the experimental setup, and performed the experiments and data analysis.

### ACKNOWLEDGMENTS

The authors would like to thank Dr. Antonio Bicchi, Dr. Giorgio Grioli, Dr. Manuel Catalano, and Simone Fani for their input on code development and hardware technical support. This team of collaborators acknowledges support by the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 688857 (SoftPro). We would also like to thank Dr. Simone Toma, Dr. James Abbas, and Dr. Ranu Jung for their input on the experimental protocol design, as well as Fangchi Shao and Nicole Robinson for their help with data collection.

### FUNDING

This work was supported by grant W911NF-17-1-0049 from the Defense Advanced Research Projects Agency and the U.S. Army Research Office.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Fu and Santello. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Elastic Elements in a Wrist Prosthesis for Drumming Reduce Muscular Effort, but Increase Imprecision and Perceived Stress

Georg Stillfried<sup>1</sup>\*†, Johannes Stepper<sup>1</sup>, Hannah Neppl<sup>2</sup>, Jörn Vogel<sup>1</sup> and Hannes Höppner<sup>1</sup>†

<sup>1</sup> German Aerospace Center (DLR), Institute of Robotics and Mechatronics, Wessling, Germany, <sup>2</sup> Statistisches Beratungslabor, Ludwig-Maximilians-Universität, Munich, Germany

#### Edited by:

Yongping Pan, National University of Singapore, Singapore

#### Reviewed by:

Ningbo Yu, Nankai University, China

Nicholas Cheng, National University of Singapore, Singapore

Hsiao-ju Cheng contributed to the review of Nicholas Cheng

\*Correspondence: Georg Stillfried, stillfried-drumming@posteo.net

† These authors have contributed equally to this work.

Received: 01 December 2017 Accepted: 19 February 2018 Published: 19 March 2018

#### Citation:

Stillfried G, Stepper J, Neppl H, Vogel J and Höppner H (2018) Elastic Elements in a Wrist Prosthesis for Drumming Reduce Muscular Effort, but Increase Imprecision and Perceived Stress. Front. Neurorobot. 12:9. doi: 10.3389/fnbot.2018.00009

Recently, progress has been made in the development of mechanical joints with variable intrinsic stiffness, opening up the search for application areas of such variable-stiffness joints. By varying the stiffness of its joints, the resonant frequency of a system can be tuned to perform cyclical tasks most energy-efficiently, making the variable-stiffness joint a candidate element for an advanced prosthetic device specifically designed for the cyclical task of drumming. A prerequisite for a successful variable-stiffness drumming prosthesis is the ability of human drummers to profitably employ different stiffness levels for playing different beats. In this pilot study, 29 able-bodied subjects (20 drumming novices and 9 experts) wear a cuff on the forearm, to which a drumstick is connected using changeable adapters, consisting of several leaf springs with different stiffness and one maximally stiff connection element. The subjects are asked to play simple regular drum beats at different frequencies, one of which is the resonant frequency of the adapter-drumstick system. The subject's performance of each drumming task is rated in terms of accuracy and precision, and the effort is measured using questionnaires for the perceived stress as well as electromyography (EMG) for the muscular activity. The experiments show that using springs instead of the stiff connection leads to lower muscular activity, indicating that humans are able to use the energy-storing capabilities of the springs, or that muscular activity is reduced due to the lower mass of the springs. However, the perceived stress is increased and the novices' performance lowered, possibly due to a higher cerebral load for controlling the elastic system. 
The hypothesis that "matching the resonant frequency of the spring-drumstick system to the desired frequency leads to better performance and lower effort" is not confirmed. Possible explanations are discussed. In conclusion, a series-elastic element appears to lower the muscular effort of drumming, while a stiff connection appears to minimize the mental load and has a positive effect on the performance of drumming novices.

Keywords: variable-stiffness actuators, series-elastic actuators, wrist prosthetics, drumming, cyclical tasks, energy storage

### 1. INTRODUCTION

Series-elastic actuators (SEAs) as well as variable-stiffness actuators (VSAs) have been recently introduced into robotics (Vanderborght et al., 2013) to address current problems of robotic arms with torque sensing and actively controlled compliance (Albu-Schäffer and Hirzinger, 2002; Albu-Schäffer et al., 2007).

Compared to actively compliant actuators, VSAs and SEAs yield several advantages. They can (a) reduce impacts of collisions for motors and gears (Grebenstein et al., 2011), (b) increase dynamic capabilities by allowing to frequently store elastic energy, and (c) embody desired behavior (Visser et al., 2011).

SEAs include an elastic element to decouple motor and link-side positions. VSAs are additionally able to tune this elasticity by using a second motor. This intrinsic elasticity inherently dominates the orientation of favorable compliant directions for multi-joint robotic arms and determines their resonance modes. While this inherent behavior and the more complex dynamics of SEAs and VSAs increase the complexity of controlling arbitrary behavior, they allow reducing the control effort (Visser et al., 2011) and energy consumption of dominating tasks if these are accounted for in the design process of the robotic system.

Currently, we are searching for tasks that can exploit the full potential of SEAs and VSAs. Using joint elasticity to store potential energy appears particularly promising in the reversal points of cyclical tasks. They can be performed energy-efficiently at different speeds if the system performing the task contains VSAs, because the resonance frequency of the system can be tuned by changing the stiffness values of its joints. An example for a cyclical task involving the upper limbs is drumming. It has already been proven that the drum roll frequency of a robotic drummer can be controlled by varying the robot's passive stiffness (Hajian et al., 1997; Hajian, 1997). In the same study, the authors ". . . present evidence that drummers vary the stiffness of their hands to control the bounce frequency. . . ". They showed that healthy subjects increase grip force and—since grip stiffness and grip force go hand in hand (Höppner et al., 2011)—naturally grip stiffness as well, to increase the drumming speed during double-stroke drum rolls.

Studying human impedance is an active field of research that frequently inspires robotics. Studies in human wrist stiffness show that wrist stiffness is increased in the presence of unstable loads and, similar to grip stiffness, increases linearly with the applied load (De Serres and Milner, 1991). Additionally, active control, namely the stretch reflex, considerably assists the wrist in a fast return of the limb after displacement (Sinkjær and Hayashi, 1989). Moreover, it is essential to note that wrist stiffness and damping increase as finger force increases, e.g., applied in a tripod-grasp (Kuchenbecker et al., 2003). In general, the human hand can be considered as a VSA rather than a SEA and is able to decouple stiffness from its linear increase with force using cocontraction of antagonistic pairs of muscles (Höppner et al., 2017).

FIGURE 1 | Jason Barnes— The drummer is wearing a prosthesis with actively controlled drumsticks on the stump. The rebound of the drumstick is tuned by using a DC motor in a variable-impedance shared-control framework. Picture by Lwp Kommunikáció (2014). License: CC-BY 2.0.

There are cases in which drummers lose their wrist due to accident or illness, and cases in which persons with congenitally absent parts of the arm would like to play the drums. In these cases, the drumstick has to be attached to the remaining part of the arm (except if they use only the intact arm, or compensate using their legs). A prosthetic drumstick holder is commercially available, including an elastic element, but without variable stiffness (TRS Inc., 2017). Increasing the level of technological sophistication, researchers from Georgia Tech in Atlanta equipped drummer Jason Barnes' right arm, which has suffered a transradial amputation, with a myo-electrically controlled robotic prosthesis (see **Figure 1**). The prosthesis was equipped with a drumstick and, by using a DC motor in a variable-impedance shared-control framework, the rebound of the drumstick after initial impact was controlled (Bretan et al., 2016). However, no variable mechanical elasticity was implemented.

If variable mechanical elasticity were implemented, it could be tuned by either a myo-electrical or a velocity-proportional control. Myo-electrical control of position and stiffness of a VSA was investigated by Hocaoglu and Patoglu (2012), although that study did not investigate energetic interactions with the environment. Ajoudani et al. (2012) and Godfrey et al. (2013) used electromyography (EMG) to tune an actively controlled compliance, also called apparent stiffness in biomechanics (Latash and Zatsiorsky, 1993), and were able to show general advantages of compliance during object handling and manipulation. However, benefits of mechanical elasticity as energy storage in reversal points during cyclic movements were not the focus of these studies.

Fujii et al. (2009) investigated the drumming performance of unimpaired non-drummers, ordinary drummers and the world's fastest drummer. They found a maximal single-stroke drumming frequency for non-drummers and ordinary drummers alike of 6–7 Hz, while the world's fastest drummer was able to play beats of 10 Hz. We use the values of non-drummers and ordinary drummers in our study to limit the range of drumming frequencies that subjects are supposed to play.

**Abbreviations:** DC, Direct current, static part of a signal; DLR, Deutsches Zentrum für Luft- und Raumfahrt, German Aerospace Center; EMG, electromyography; LMU, Ludwig-Maximilians-Universität; RMS, square root of mean squared; SD, standard deviation; SEA, series-elastic actuator; VSA, variable-stiffness actuator.

Based on the double motivation of searching for an application of VSAs and trying to improve prosthetic drumming experience, we want to investigate in a user study whether introducing inherent elasticity into a wrist prosthesis might be beneficial for drumming and want to answer the question: Can drummers take advantage of variable stiffness in a prosthetic wrist?

It seems clear that stiffness is increased in healthy drummers to increase drumming speed (Hajian et al., 1997; Hajian, 1997). But it remains unclear whether this coupling between stiffness and bounce frequency is the result of an increase in grip strength or whether stiffness per se is the primarily optimized parameter, for example to enhance exploitation of energy storing capabilities. Moreover, it is known that the motor control of humans is able to achieve and stabilize coordinated cyclic movements even in the presence of strong dynamic nonlinearities (Lakatos et al., 2014). But can we exploit benefits of artificial elastic joints to reduce the required energy and to increase comfort? Compared to a rigid attachment, an elastic attachment might not only reduce the required amount of energy but would be able to absorb the impact of the drum on the drumstick.

Furthermore, it is of interest whether a prosthetic wrist with a fixed elasticity is sufficient, or whether changing the stiffness during drumming is beneficial. Fixed elasticity is implemented using SEAs and variable stiffness using VSAs. This distinction has serious commercial implications, since VSAs require the implementation of an additional drive.

Hence, the main purpose of this study is to investigate which of the following prosthetic wrist types enables best drumming performance and provides the most comfortable playing experience: a stiff connection, a spring with a fixed stiffness, or an elastic adapter with variable stiffness.

To simplify the preparation of the experiment, instead of a joint with continuously variable stiffness, discrete elastic elements with different stiffness values are used.

Using a velocity-proportional control scheme, the discretely variable stiffness is chosen so that the design frequency, which is an approximation of the resonant frequency of the system, matches the desired playing frequency. We call this control scheme "diagonal-type variable stiffness," because it corresponds to the diagonal of the combination matrix of desired frequency and variable stiffness (**Figure 2**).
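The "diagonal-type" selection can be sketched as a nearest-match lookup between the desired playing frequency and the design frequencies of the available springs. The four spring frequencies are those reported in Section 2.1. The adapter names, the helper functions, and the idealized undamped relation f = sqrt(k/J)/(2π) between rotational stiffness k and moment of inertia J are assumptions added for illustration.

```python
import math

# Design frequencies of the four leaf springs (Hz), as reported in
# Section 2.1; the adapter names are hypothetical labels.
DESIGN_FREQUENCIES_HZ = {
    "spring_1": 3.7,
    "spring_2": 4.4,
    "spring_3": 5.0,
    "spring_4": 6.3,
}


def select_adapter(desired_hz, design=DESIGN_FREQUENCIES_HZ):
    """Pick the adapter whose design frequency is closest to the beat."""
    return min(design, key=lambda name: abs(design[name] - desired_hz))


def resonant_frequency(stiffness_nm_per_rad, inertia_kg_m2):
    """Resonant frequency (Hz) of an idealized undamped spring-inertia system."""
    return math.sqrt(stiffness_nm_per_rad / inertia_kg_m2) / (2.0 * math.pi)


def required_stiffness(desired_hz, inertia_kg_m2):
    """Stiffness that would place the idealized resonance at desired_hz."""
    return inertia_kg_m2 * (2.0 * math.pi * desired_hz) ** 2


if __name__ == "__main__":
    for beat in (3.7, 4.4, 5.0, 6.3):
        print(beat, select_adapter(beat))
```

A continuously variable joint would use something like `required_stiffness` directly, whereas the discrete springs of this study correspond to the lookup in `select_adapter`.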

Our hypotheses are (a) that subjects play best with diagonal-type variable stiffness, and (b) that experts are better at benefiting from the variable stiffness, since they are familiar with the general task of drumming, which probably frees their mental capacities for adapting to the new device.

### 2. MATERIALS AND METHODS

This study involved 20 unimpaired drumming novices and 9 unimpaired drumming experts who wore a cuff on the forearm, to which a drumstick was attached using stiff and elastic adapters.


FIGURE 2 | Trials— Combination matrix of the factors "desired frequency" and "adapter type." Each dot represents one trial, i.e., the subjects are asked to play all combinations. Note that the desired frequencies are chosen so that they correspond to the design frequencies of the four springs. The variable-stiffness strategy of this study, where the desired frequency is matched by the design frequency, corresponds to the highlighted diagonal of the matrix.

Hence, the subjects' intact wrists were not used but replaced with an experimental prosthetic attachment, simulating the situation of a missing hand. The subjects were asked to play all combinations of adapter types and desired frequencies shown in the combination matrix in **Figure 2**. The desired frequencies ranged from typical beats of popular music (3–4 Hz) to the maximum single stroke frequency that an average drumming novice can reach according to Fujii et al. (2009) (6–7 Hz). Measurements of muscular activity and a questionnaire were used to assess each subject's stress level, while the inaccuracy and imprecision of the beat were used to judge the quality of the drumming.

### 2.1. Experiment Setup

The experiment setup is shown in **Figure 3**. The experiment took place in a space secluded by curtains from the rest of the room in order to avoid distraction of the subject.

The subject wore a cuff constraining movements of the wrist (extension/flexion and radial/ulnar deviation). The cuff was coupled to the drumstick via a changeable adapter. The set of changeable adapters consisted of four leaf springs with different stiffness values (see exemplary spring in **Figure 4**) and one maximally stiff connection element (see **Figure 5**). The adapter was approximately aligned in parallel to the plane of the radial and ulnar deviation of the wrist.

Between spring and cuff a force-torque sensor ATI Mini45 SI-290-10 was placed (measurement range: F<sub>x/y</sub> = ±290 N, F<sub>z</sub> = ±580 N, T<sub>x/y/z</sub> = ±10 Nm; resolution: ΔF<sub>x/y/z</sub> = ±1/8 N, ΔT<sub>x/y</sub> = ±1/376 Nm, ΔT<sub>z</sub> = ±1/752 Nm). This sensor was used for determining the design frequency of each adapter element, which corresponds to the resonant frequency of the whole cuff including force-torque sensor, spring and drumstick. For this, the cuff was fixed in a bench vise, and the drumstick was jolted into free oscillation. From the force-torque sensor data, the frequencies of the oscillation were found to be at 3.7, 4.4, 5, and 6.3 Hz for the springs and at 27 Hz for the stiff connection. The force sensor remained between spring and cuff during the experiments to avoid changing the design frequencies. The springs were approximately 70–90 g lighter than the stiff connection, leading to a reduction of the moment of inertia around the elbow flexion axis of about 5–6%.
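One simple way to extract an oscillation frequency from a free-oscillation recording is to time the upward zero crossings of the mean-removed signal. The text does not state which method was used, so the zero-crossing approach, the function name, and the synthetic test signal below are assumptions made for illustration.

```python
import math


def oscillation_frequency(samples, sample_rate_hz):
    """Estimate the dominant oscillation frequency (Hz) of a decaying
    oscillation from the timing of its upward zero crossings."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove constant sensor offset
    crossings = []
    for i in range(1, len(centered)):
        a, b = centered[i - 1], centered[i]
        if a < 0 <= b:
            # Linear interpolation of the crossing time between samples.
            frac = -a / (b - a)
            crossings.append((i - 1 + frac) / sample_rate_hz)
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


if __name__ == "__main__":
    fs = 1000.0  # Hz, matching the sensor read-out rate given in the text
    # Synthetic decaying 4.4 Hz oscillation with a constant offset.
    signal = [2.0 + math.exp(-0.5 * i / fs)
              * math.sin(2 * math.pi * 4.4 * i / fs + 0.3)
              for i in range(3000)]
    print(round(oscillation_frequency(signal, fs), 2))
```

A spectral estimate would serve equally well. The zero-crossing version is shown because it needs no external libraries.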

The drum strokes were recorded by a JR3 90M31A3 force-torque sensor fixed on a table (measurement range: F<sub>x/y</sub> = ±200 N, F<sub>z</sub> = ±400 N, T<sub>x/y/z</sub> = ±20 Nm; resolution: ΔF<sub>x/y</sub> = ±0.050 N, ΔF<sub>z</sub> = ±0.10 N, ΔT<sub>x/y/z</sub> = ±0.005 Nm). The sensor was covered with rubber mats for damping the noise and making the drumming more comfortable (see **Figure 3**). Note that this damping influenced the peak forces, but not the time of impact, the latter being relevant for evaluating drumming imprecision and inaccuracy. The height of the table was adjusted to the subject's height.

FIGURE 4 | Drumming cuff with a soft spring— The force-torque sensor as well as the markers for optical tracking of the drumstick can be seen. The cuff is designed such that it prevents the wrist from extension/flexion and radial/ulnar deviation.

FIGURE 5 | Drumming cuff with the stiff connection.

The force-torque sensor data was low-pass filtered using a Butterworth filter with 100 Hz cutoff frequency. The time between strokes was determined using the peak detection function of Matlab on the filtered vertical force data, with the parameters minimum peak height set to 0.5 times the 95th percentile of the filtered data set and minimum peak distance set to 0.5 times the desired time between strokes. Times between strokes that exceeded 1.5 times the median time between strokes were counted as missed strokes or pauses and discarded.
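The stroke detection rules above can be sketched in plain Python. The sketch assumes the force signal has already been low-pass filtered, and it approximates rather than reproduces Matlab's peak detection: the percentile uses linear interpolation, and peaks closer than the minimum distance are resolved by keeping the tallest first. All function names are hypothetical.

```python
from statistics import median


def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def find_strokes(force, sample_rate_hz, desired_hz):
    """Return stroke times (s) from a filtered vertical force signal."""
    min_height = 0.5 * percentile(force, 95)
    min_distance = 0.5 / desired_hz * sample_rate_hz  # in samples
    candidates = [i for i in range(1, len(force) - 1)
                  if force[i] >= min_height
                  and force[i - 1] < force[i] >= force[i + 1]]
    kept = []
    # Visit the tallest peaks first and drop any that lie too close
    # to a peak that has already been accepted.
    for i in sorted(candidates, key=lambda idx: force[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return [i / sample_rate_hz for i in sorted(kept)]


def valid_intervals(stroke_times):
    """Inter-stroke times, discarding those above 1.5x the median
    (treated as missed strokes or pauses)."""
    intervals = [b - a for a, b in zip(stroke_times, stroke_times[1:])]
    if not intervals:
        return []
    limit = 1.5 * median(intervals)
    return [d for d in intervals if d <= limit]
```

As a usage example, a 4 Hz beat with one missed stroke yields a single interval of about 0.5 s among intervals of about 0.25 s, and `valid_intervals` drops the long one before any timing statistics are computed.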

For possible later reference, the positions of cuff and drumstick were continuously monitored through optical tracking.

The desired frequency was given by the beat of a metronome via headphones, which was also recorded for later reference.

We used EMG electrodes for measuring the muscular activity of subjects. The surface electrodes of the Delsys Trigno Wireless System have an internal amplification of 1 kV/V and provide an analog signal at 4 kHz with a constant delay of 48 ms. These electrodes comply with the requirements put forth by the Medical Device Directive 93/42/EEC, and we complied with their intended use.

We measured the muscular activity of 8 muscles involved in shoulder and elbow movements: biceps brachii (elbow and shoulder flexion, shoulder abduction), pectoralis major (humerus adduction), deltoideus posterior (shoulder extension), deltoideus medius (shoulder abduction), deltoideus anterior (shoulder flexion), anconeus (elbow extension), triceps brachii long head (elbow and shoulder extension), and triceps brachii lateral head (elbow extension). The mean of the EMG values over all trials of each electrode was subtracted subject-wise to remove any constant DC offset. Furthermore, after subtracting the DC offset, the EMG values of each electrode were normalized subject-wise by dividing by the RMS value over all trials. This eliminates differences in electrodes due to location-specific tissue resistance and gives the variations between trials of the signal of each electrode the same weights.
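The subject-wise normalization of one electrode can be illustrated as follows. This is a minimal sketch under the assumption that the raw EMG of one electrode and one subject is available as a list of trials; the function names `normalize_emg` and `trial_rms` are chosen for this example and do not come from the original analysis code.

```python
import math


def normalize_emg(trials):
    """Normalize the EMG of ONE electrode of ONE subject.

    trials: list of trials, each a list of raw samples.
    The DC offset (mean over all trials) is subtracted, then every sample
    is divided by the RMS of the offset-free signal over all trials.
    """
    all_samples = [x for trial in trials for x in trial]
    dc_offset = sum(all_samples) / len(all_samples)
    rms_all = math.sqrt(
        sum((x - dc_offset) ** 2 for x in all_samples) / len(all_samples))
    return [[(x - dc_offset) / rms_all for x in trial] for trial in trials]


def trial_rms(normalized_trial):
    """RMS of one normalized trial; 1.0 marks the subject's overall level."""
    return math.sqrt(sum(x * x for x in normalized_trial) / len(normalized_trial))


# Hypothetical two-trial recording of a single electrode.
normalized = normalize_emg([[1.0, 3.0], [4.0, 8.0]])
print([round(trial_rms(t), 3) for t in normalized])
```

After this step the pooled signal of every electrode has zero mean and unit RMS, so a trial value above 1 indicates higher-than-average activity of that muscle for that subject.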

Additionally, the measurement setup consisted of a host computer running Linux, a real-time target computer running VxWorks and a Windows computer. The real-time computer ran the software (developed using Matlab/Simulink) to read out the force-torque and EMG sensors at 1 kHz. The marker positions were recorded by the Windows computer and transferred to the Linux host using the DLR communication protocol aRDnet (Bäuml and Hirzinger, 2008).

### 2.2. Study Design and Experiment Session Protocol

A total of 29 healthy subjects, 25 male and 4 female, all right-handed and initially fully naive to the experiment, performed the experimental protocol as described below. Nine of the 29 subjects had at least 1 year of drumming experience and were therefore considered experts; the other 20 were counted as novices. All subjects participated voluntarily and gave written informed consent to the procedures, which were conducted in partial accordance with the principles of the Declaration of Helsinki (the nonconformity concerns point B-16 of the 59th World Medical Association Declaration of Helsinki, Seoul, October 2008: no physician supervised the experiments). Approval was received from the works council of the German Aerospace Center, as well as its institutional board for data privacy ASDA; the collection and processing of experimental data were approved by both committees. All subjects used the right hand in all experiments, as dictated by the design of the cuff. Subjects stood upright in all experimental conditions.

The experiment session lasted between 20 and 25 min per subject. At the beginning of each experiment session, the participant was instructed about the experiment using a standardized presentation.

The subjects were asked to play all 20 trials consisting of the combinations of adapter types and desired frequencies shown in the combination matrix in **Figure 2**. During the trials, subjects were observed and asked to keep the orientation of the leaf spring so that the drumming motion is in the direction of its minimal stiffness. In order to prevent effects of learning or fatigue, the combinations were given to them in a block-randomized order: the adapter types were randomized, and within each adapter type, the desired frequencies were randomized.
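A block-randomized trial order of this kind could be generated as sketched below. The adapter labels and the four desired frequencies are assumptions for illustration (the actual combinations are defined in Figure 2); only the structure, five adapter blocks with four frequencies each, follows the description above.

```python
import random

# Assumed labels: four springs (named by design frequency) and the stiff link.
ADAPTERS = ["spring_3.7Hz", "spring_4.4Hz", "spring_5.0Hz", "spring_6.3Hz", "stiff"]
# Assumed desired frequencies, taken equal to the spring design frequencies.
DESIRED_FREQUENCIES_HZ = [3.7, 4.4, 5.0, 6.3]


def block_randomized_order(adapters, frequencies, rng):
    """Shuffle the adapter blocks, then shuffle frequencies inside each block."""
    order = []
    for adapter in rng.sample(adapters, len(adapters)):
        for frequency in rng.sample(frequencies, len(frequencies)):
            order.append((adapter, frequency))
    return order


# One hypothetical subject; a seeded generator makes the order reproducible.
for trial_number, (adapter, frequency) in enumerate(
        block_randomized_order(ADAPTERS, DESIRED_FREQUENCIES_HZ, random.Random(1)),
        start=1):
    print(trial_number, adapter, frequency)
```

Keeping each adapter in one contiguous block limits the number of adapter changes per session while still varying the order between subjects.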

Before each trial, subjects could familiarize themselves with the current combination of desired frequency and adapter type for 10 s, followed by a phase of 15 s of collecting drumming data. The 10 s of training data were not recorded. After each trial, subjects were asked to fill in a questionnaire about the combined physical and mental stress level that they felt during playing at the respective combination of desired frequency and adapter type. The question was "How high was the perceived stress while playing each frequency, with respect to physical and mental effort?" (1: very low, 8: very high)<sup>1</sup>.

<sup>1</sup>This is translated from the original questionnaire, which was in German: "Wie hoch war die empfundene Belastung beim Spielen der einzelnen Frequenzen, in Bezug auf körperliche und geistige Anstrengung" 1: sehr wenig, 8: sehr viel.

Before and after the trials, base noise EMG during rest of the arm was measured, as well as EMG and maximum force during maximum voluntary contraction in a lifting-up and a pushing-down task. These measurements were used to check whether the signals looked plausible. Unfortunately, for unknown reasons, the electrodes of pectoralis major and anconeus showed very noisy signals for many subjects. We therefore discarded the results of these two electrodes.

### 2.3. Statistical Design

To answer our research questions and evaluate our hypotheses, we predefined four outcome measures, which are gathered for each subject and trial:

• The inaccuracy, i.e., the difference between the desired time interval and the mean played time interval between two drum strokes, which tells how well the desired frequency could be met; a lower inaccuracy means a better performance:

$$y\_{\text{inaccuracy}} = \left| \frac{1}{f\_{\text{desired}}} - \frac{1}{n\_{\text{strokes}}} \sum\_{s=1}^{n\_{\text{strokes}}} T\_{\text{played},s} \right|; \tag{1}$$

• the imprecision, i.e., the standard deviation of the time between two strokes, which tells how evenly the frequency was played; a lower imprecision means a better performance:

$$y\_{\text{imprecision}} = \underset{s \in \{1, \dots, n\_{\text{strokes}}\}}{\text{SD}} \left\{ T\_{\text{played},s} \right\}; \tag{2}$$

• the perceived stress consisting of physical and mental effort, i.e., the result of the questionnaire; lower stress means higher comfort:

$$y\_{\text{perceived\\_stress}} = \text{questionnaire\\_entry}; \text{ and} \tag{3}$$

• the measured muscular activity, i.e., the mean normalized root-mean-square (RMS) EMG signals, where normalizing means subtracting the per-electrode DC offset and dividing by the per-electrode RMS of the signals of all trials; again, a lower muscular activity means a higher comfort:

$$\text{DC\\_offset}\_{e} = \frac{1}{n\_{\text{trials}} n\_{\text{samples}}} \sum\_{l=1}^{n\_{\text{trials}}} \sum\_{t=1}^{n\_{\text{samples}}} \text{EMG}\_{elt}, \tag{4}$$

$$\text{EMG}\_{\text{RMS,all},e} = \sqrt{\frac{1}{n\_{\text{trials}} n\_{\text{samples}}} \sum\_{l=1}^{n\_{\text{trials}}} \sum\_{t=1}^{n\_{\text{samples}}} \left( \text{EMG}\_{elt} - \text{DC\\_offset}\_{e} \right)^2}, \tag{5}$$

$$y\_{\text{muscle\\_activity}} = \frac{1}{n\_{\text{electrodes}}} \sum\_{e=1}^{n\_{\text{electrodes}}} \sqrt{\frac{1}{n\_{\text{samples}}} \sum\_{t=1}^{n\_{\text{samples}}} \left( \frac{\text{EMG}\_{elt} - \text{DC\\_offset}\_{e}}{\text{EMG}\_{\text{RMS,all},e}} \right)^{2}}, \tag{6}$$

where EMG<sub>elt</sub> is the measured EMG signal of electrode e at time sample t in trial l, with n<sub>electrodes</sub> = 6, n<sub>trials</sub> = 20 and n<sub>samples</sub> = 3,000.
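The two timing measures of Equations (1) and (2) can be computed from the played intervals as in the following sketch. The function names are chosen for this example, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not specify which variant of the SD was used.

```python
from statistics import mean, stdev


def inaccuracy(intervals_s, desired_frequency_hz):
    """Equation (1): |desired interval - mean played interval| in seconds."""
    return abs(1.0 / desired_frequency_hz - mean(intervals_s))


def imprecision(intervals_s):
    """Equation (2): standard deviation of the played intervals in seconds.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return stdev(intervals_s)


# Hypothetical trial at a desired frequency of 4 Hz (0.25 s between strokes).
played = [0.24, 0.26, 0.25, 0.27]
print(f"inaccuracy:  {1000 * inaccuracy(played, 4.0):.1f} ms")
print(f"imprecision: {1000 * imprecision(played):.1f} ms")
```

A subject can thus be accurate on average (small inaccuracy) while still playing unevenly (large imprecision), which is why the two measures are reported separately.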

To statistically analyze the results, we built and applied a mixed-effects regression model with fixed and random effects. This allowed us to include the two hypotheses directly as fixed effects in the model. The formula of the mixed-effects model is:

$$\begin{aligned} y\_{ijklmn} &= \beta\_0 + \beta\_{\text{adapter\\_type},i} + \beta\_{\text{desired\\_frequency},j} + \beta\_{\text{expert\\_status},k} \\ &+ \beta\_{\text{diagonal},l} + \beta\_{\text{diagonal}\times\text{expert\\_status},kl} + \beta\_{\text{adapter\\_type}\times\text{expert\\_status},ik} \\ &+ m\,\beta\_{\text{trial\\_number}} + \epsilon\_{\text{subject},n} + \epsilon\_{mn} \end{aligned} \tag{7}$$

where y<sub>ijklmn</sub> is the response variable, i.e., any of the four above-mentioned outcome measures; i is the adapter type; j is the desired frequency; k is the expert status; l is the diagonal status, which is 1 if the combination of adapter type and desired frequency is on the diagonal and 0 otherwise; m is the within-subject trial number; and n is the subject number. β<sub>0</sub> is the intercept, which is a constant term; β<sub>adapter\_type,i</sub> is the fixed effect of the adapter type; β<sub>desired\_frequency,j</sub> is the fixed effect of the desired frequency; β<sub>expert\_status,k</sub> is the fixed effect of the expert status; β<sub>diagonal,l</sub> is the fixed effect of playing on the diagonal; β<sub>diagonal×expert\_status,kl</sub> is the fixed effect of the interaction between the expert status and playing on the diagonal; β<sub>adapter\_type×expert\_status,ik</sub> is the fixed effect of the interaction between expert status and adapter type; m β<sub>trial\_number</sub> is the trial-number-dependent fixed effect of learning or fatigue; ε<sub>subject,n</sub> is the subject-specific random effect; and ε<sub>mn</sub> is the residual random error. The random effects are assumed to follow normal distributions as follows:

$$
\epsilon\_{mn} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \text{ and} \tag{8}
$$

$$
\epsilon\_{\text{subject},n} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \tau^2). \tag{9}
$$

The factors adapter type, desired frequency, expert status and diagonal status can assume the levels shown in **Table 1**. The zero level of each factor is taken as the reference configuration, for which all categorical βs are zero.
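To make the role of the reference levels concrete, the fixed-effects part of Equation (7) can be evaluated as in the sketch below. All coefficient values are hypothetical and are not the estimates of Table 4; the choice of the stiff connection, the 3.7 Hz frequency and the novice group as reference levels is likewise an assumption, because Table 1 is not reproduced here. The random effects of Equations (8) and (9) are omitted.

```python
# Design frequencies of the adapters in Hz (values from Section 2.1).
DESIGN_FREQUENCY_HZ = {"spring_3.7": 3.7, "spring_4.4": 4.4,
                       "spring_5.0": 5.0, "spring_6.3": 6.3, "stiff": 27.0}

# Hypothetical coefficients; levels that are not listed count as reference
# levels with a coefficient of zero.
COEF = {
    "intercept": -4.0,
    "adapter_type": {"spring_3.7": 0.5},
    "desired_frequency": {6.3: 0.25},
    "expert_status": -0.5,
    "diagonal": 0.25,
    "diagonal_x_expert": -0.75,
    "adapter_x_expert": {"spring_3.7": -0.125},
    "trial_number": 0.0,
}


def on_diagonal(adapter, desired_hz):
    """True if the adapter's design frequency equals the desired frequency."""
    return DESIGN_FREQUENCY_HZ[adapter] == desired_hz


def fixed_effects(adapter, desired_hz, expert, trial_number, coef=COEF):
    """Sum of the fixed-effect terms of Equation (7) for one trial."""
    diagonal = on_diagonal(adapter, desired_hz)
    y = coef["intercept"]
    y += coef["adapter_type"].get(adapter, 0.0)
    y += coef["desired_frequency"].get(desired_hz, 0.0)
    if expert:
        y += coef["expert_status"]
        y += coef["adapter_x_expert"].get(adapter, 0.0)
    if diagonal:
        y += coef["diagonal"]
    if diagonal and expert:
        y += coef["diagonal_x_expert"]
    return y + trial_number * coef["trial_number"]


print(fixed_effects("stiff", 3.7, expert=False, trial_number=0))      # reference
print(fixed_effects("spring_3.7", 3.7, expert=True, trial_number=0))  # on diagonal
```

For the reference configuration only the intercept remains, which is what makes every other coefficient interpretable as a difference relative to that configuration.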

The parameters of the mixed models were fitted to the measured outcome measures using the lmer function of the lme4 library (Bates et al., 2015) of the R statistics software (R Core Team, 2015). It turned out that for inaccuracy and imprecision, the distribution of the residuals ε<sub>mn</sub> could be made much more similar to the assumed normal distribution by transforming these two outcome measures with the natural logarithm of their values in seconds:

$$y\_{\ln(\text{inaccuracy})} = \ln\left( \left| \frac{1}{f\_{\text{desired}}} - \frac{1}{n\_{\text{strokes}}} \sum\_{s=1}^{n\_{\text{strokes}}} T\_{\text{played},s} \right| / \text{s} \right) \tag{10}$$

$$y\_{\ln(\text{imprecision})} = \ln \left( \underset{s \in \{1, \dots, n\_{\text{strokes}}\}}{\text{SD}} \left\{ T\_{\text{played},s} \right\} / \text{s} \right). \tag{11}$$

Hence, these transformed outcome measures were used in the statistical analysis of the experiment. The results of the statistical model were calculated as the numerical values of the parameters of the mixed-effects regression model. The most important effects were plotted as 95% confidence intervals, which allows hypothesis testing at a significance level of α = 0.05.

### 3. RESULTS

The measurement data is summarized in **Table 2**. The measured inaccuracy ranges from less than 0.1 to 39 ms, the imprecision from 4 to 33 ms, the perceived stress from 1 to 8 (the whole range of the questionnaire) and the normalized muscular activity from 0.36 to 2.76. Comparing the imprecision at the highest desired frequency to the results of Fujii et al. (2009, **Table 3**), the experts in Fujii et al. (2009) play slightly better, while their novices play slightly worse.

Fitting the statistical models to the measurement data yields values for the parameters that tell how much each of the factors influenced the outcome measures. Values for all fitted parameters are found in **Table 4**.

The inaccuracy is most strongly influenced by the expert status and the desired frequency. The imprecision is most strongly influenced by the adapter type and expert status. The perceived stress is dominated by the desired frequency. The most important influence factors on muscular activity are desired frequency and adapter type.

The effect of the trial number, which represents learning and fatigue, is between one and three orders of magnitude smaller than the other effects.

The effects of the most important parameters on the outcome measures are shown as 95% confidence intervals in **Figures 6**–**8**. Whenever the 95% confidence interval does not include zero, the effect is statistically significant at a significance level of α = 0.05. Where the confidence interval overlaps the zero level only slightly, we cautiously speak of tendencies.
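The decision rule can be written down in a few lines. The text does not say how the confidence intervals were computed, so the Wald interval below (estimate ± 1.96 standard errors, a normal approximation) is an assumption made for illustration, as are the function names and the numerical values.

```python
def wald_ci(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval (normal approximation, assumed)."""
    return estimate - z * std_error, estimate + z * std_error


def is_significant(ci):
    """An effect is significant at alpha = 0.05 if the 95% CI excludes zero."""
    lower, upper = ci
    return lower > 0.0 or upper < 0.0


# Hypothetical effects: (estimate, standard error).
for name, est, se in [("effect A", 0.5, 0.1), ("effect B", 0.1, 0.1)]:
    ci = wald_ci(est, se)
    print(name, ci, "significant" if is_significant(ci) else "not significant")
```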

**Figure 6** depicts the influence of the desired frequency on the outcome measures. On all of them, increasing the desired frequency has a detrimental effect.

The influence of the factors that are related to the question of which adapter type is most suitable for a drumming prosthesis, namely the adapter type, the diagonal status (i.e., whether the design frequency matches the desired frequency) and their interactions with the expert status, is shown in **Figure 7**.

The first hypothesis, that variable stiffness provides the best performance and comfort, and the second hypothesis, that experts are better able to make use of the variable stiffness, can be tested by regarding the differences between playing on or off the diagonal shown in **Figure 7A**. The effects of the diagonal on the inaccuracy are in opposite directions for experts and novices<sup>2</sup>. For the experts, the effect of the diagonal is a reduction of the inaccuracy, while for novices it is an increase in inaccuracy. The imprecision of novices also increases with playing on the diagonal, and the imprecision of experts shows a trend of increase. The comfort measures were less affected by the diagonal.

<sup>2</sup> Since the logarithm is a monotonically increasing function, a higher logarithmic outcome measure corresponds to a higher plain outcome measure, so that, for example, a higher logarithmic inaccuracy can be described simply as a higher inaccuracy.

#### TABLE 2 | Summary of the measurement data.

Note that for the statistical model, the values inaccuracy and imprecision were transformed by the natural logarithm. Here the values are shown untransformed for easier interpretation.

TABLE 4 | Estimates for the parameters of the fitted linear mixed models in Equation (7).

**Figure 7B** shows the differences between playing with an elastic spring and playing with the stiff connection and helps to answer the question whether stiff or elastic adapters are more suitable. Regarding performance, the springs show a mostly detrimental effect. The imprecision when playing with the springs is higher than when playing with the stiff connection and increases with increasing softness of the springs. The inaccuracy also shows a tendency to increase when using springs instead of a stiff connection, especially for the springs with lower stiffness. Regarding the influence of using elastic springs on the comfort measures, there is a discrepancy between perceived stress and muscular activity. On the one hand, the perceived stress of novices is higher when using the springs, and also the perceived stress of experts tends to be higher. On the other hand, playing with springs rather than a stiff connection reduces the muscular activity of experts and shows a tendency for reduction of the muscular activity of novices.

Since the use of diagonal-type variable stiffness shows a beneficial effect on the accuracy of experts but the use of some of the springs shows a detrimental effect with a similar magnitude, it is interesting to see the differences in inaccuracy of experts between using the springs in diagonal-type variable stiffness mode and using the stiff connection. These effects of the springs on the diagonal are the sum of the effect of the diagonal (**Figure 7A**) and the effects of the springs (**Figure 7B**) and are shown in **Figure 8**. The diagram shows a tendency for reduced inaccuracy when playing with diagonal-type variable stiffness.
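Because inaccuracy enters the model on a logarithmic scale, adding two effects corresponds to multiplying the untransformed inaccuracy by the product of two factors. The sketch below illustrates this with hypothetical effect sizes that are not taken from Table 4; the function names are chosen for this example.

```python
import math


def combined_log_effect(diagonal_effect, spring_effect):
    """Point estimate of the summed effect on the log-transformed measure."""
    return diagonal_effect + spring_effect


def as_ratio(log_effect):
    """Multiplicative change of the untransformed measure for a log effect."""
    return math.exp(log_effect)


# Hypothetical values: the diagonal lowers ln(inaccuracy) by 0.3 and the
# spring raises it by 0.2 relative to the stiff connection.
total = combined_log_effect(-0.3, 0.2)
print(f"combined log effect: {total:.2f}")
print(f"inaccuracy multiplied by: {as_ratio(total):.3f}")
```

Only the point estimates add in this simple way; the confidence interval of a sum also depends on the covariance of the two estimates and cannot be obtained by adding the interval limits.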

### 4. DISCUSSION AND CONCLUSION

In this study, we investigated whether drummers can take advantage of a variable-stiffness joint in a prosthetic wrist. We asked 20 novices and 9 experts to play different frequencies using different elastic elements and one stiff element in a connection between a cuff on the forearm and a drumstick. We hypothesized that subjects will perform best and require the least effort when playing the elastic elements at their resonant frequencies, which is one main argument for variable-stiffness actuation. Moreover, we hypothesized that such an effect will be more obvious for an expert drummer.

Can drummers take advantage of a variable-stiffness joint in a prosthetic wrist? Our experimental design was unable to verify that variable stiffness is useful for a prosthetic wrist. Even if experts showed a trend for a reduced inaccuracy when playing the diagonal, they showed the opposite trend for imprecision, and it is difficult to judge which of the two is more relevant. For novices, both inaccuracy and imprecision increased when using the diagonal-type variable stiffness.

The results showed the expected influence of elasticity on the EMG-measured effort, namely that elasticity reduces the effort, and that experts are better than novices in doing so. Despite that, there was a clear discrepancy between measured and perceived effort for both groups. While the muscular activity decreased with decreasing stiffness of the adapter, the perceived stress increased. We find this effect surprising.

A possible interpretation is that subjects are indeed able to save muscular effort by making use of the energy-storing capabilities of the spring. However, the effort by the brain for controlling the more complex dynamics of the system involving the springs is likely higher than for controlling the system involving the stiff connection. In the answers to the questionnaire, the increased cerebral effort might therefore outweigh the reduced muscular effort.

A further explanation for the lower EMG-measured effort when using the springs lies in the fact that they are more lightweight than the stiff connection and that less muscular effort might have been necessary to accelerate and decelerate them. However, the reduction of inertia (5–6%) is considerably lower than the reduction of EMG-measured effort (16–25%). Assuming that muscular activity is roughly proportional to accelerated inertia, we therefore estimate that the influence of the different weights on the EMG-measured effort is small and that the effect of the adapter types on muscular activity is dominated by their elasticity.

These findings somewhat confirm the study of Fujisawa and Miura (2010), who found that removing the rebound of the drum or adding weight would lead to increased EMG levels. However, this comparison is limited by differing places of energy storage (drum skin vs. elastic wrist) and investigated muscles (wrist muscles vs. elbow and shoulder muscles).

In interpreting the results with respect to variable stiffness, three main limitations of our experimental design also have to be considered. (a) We measured the resonant frequency of the cuff including the elastic element and the drumstick. The influence of the rebound of the drum skin as well as the elasticity of the soft tissue and the (variable) stiffness of the joints due to muscle contraction and cocontraction were not captured when measuring the resonant frequency of the cuff-spring-drumstick combination. This possibly leads to lower resonant frequencies of the whole system including the human arm, which could explain the better performance of the stiffer springs. (b) There might be variable-stiffness strategies other than matching the resonant frequency of the system to the desired frequency. In order to potentially discover those other variable-stiffness strategies, future experiments could ask subjects to play a more fine-grained set of frequencies and look for patterns in the outcome measures. (c) Even the expert drummers were naive to the experiment, and the time to become acquainted with the system for any combination of adapter type and desired frequency was limited to 10 s. While this was helpful for achieving a reasonable experiment duration, allowing a subject to practice using a variable-stiffness drumming system for weeks or months might show long-term learning effects.

Future experiments into variable-stiffness drumming prostheses might use the following improvements to possibly find a beneficial effect of variable stiffness. A new device with continuously variable stiffness instead of separate adapters with different stiffness levels could be built so that subjects can better match the resonant frequency of the whole system including the rebound of the drum to the desired frequency or employ a different variable-stiffness strategy. This would also remove the problem of different masses of the stiff connection adapter and the springs. Furthermore, subjects could be given more time (hours, days or weeks) to become more acquainted with the system in order to learn the more complex dynamics.

In conclusion, our experimental results argue that series-elastic elements can be used to reduce the muscular activity of drumming, but that their stiffness does not need to be variable. However, the elastic elements appear to initially put an increased control burden on the user. While expert drummers seem to be able to deal with their more complex dynamics, novice drummers seem to reach a better performance with a stiff connection. Prosthesis users may benefit from this study if its results or the results of a future, improved study are incorporated into the design of an actual prosthetic device.

### AUTHOR CONTRIBUTIONS

GS, HH, and JV were involved in planning and design of the experiments. HH and GS were involved in the statistical analysis. JS was involved in setting up and conducting the experiments. HN, HH, and GS were involved in the statistical design. HH and GS wrote the manuscript. JV provided corrections to the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.

### ACKNOWLEDGMENTS

We want to thank Prof. Küchenhoff of StaBLab at LMU Munich for his assistance in the statistical design. We want to thank the subjects for participating in the study.

### REFERENCES

force task," in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2011 (Shanghai), 3312–3316.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer NC and handling Editor declared their shared affiliation.

Copyright © 2018 Stillfried, Stepper, Neppl, Vogel and Höppner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Closed-Loop Hybrid Gaze Brain-Machine Interface Based Robotic Arm Control with Augmented Reality Feedback

Hong Zeng<sup>1</sup>\*, Yanxin Wang<sup>1</sup>, Changcheng Wu<sup>2</sup>, Aiguo Song<sup>1</sup>, Jia Liu<sup>3</sup>, Peng Ji<sup>1</sup>, Baoguo Xu<sup>1</sup>, Lifeng Zhu<sup>1</sup>, Huijun Li<sup>1</sup> and Pengcheng Wen<sup>4</sup>

*<sup>1</sup> School of Instrument Science and Engineering, Southeast University, Nanjing, China, <sup>2</sup> College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, <sup>3</sup> Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Sciences and Technology, Nanjing, China, <sup>4</sup> AVIC Aeronautics Computing Technique Research Institute, Xi'an, China*

A brain-machine interface (BMI) can be used to control a robotic arm to assist people with paralysis in performing activities of daily living. However, it is still a complex task for BMI users to control the process of grasping and lifting objects with the robotic arm. It is hard to achieve high efficiency and accuracy even after extensive training. One important reason is the lack of sufficient feedback information for the user to perform closed-loop control. In this study, we propose a method of augmented reality (AR) guiding assistance that provides enhanced visual feedback to the user for closed-loop control with a hybrid Gaze-BMI, which combines an electroencephalography (EEG)-based BMI with eye tracking for intuitive and effective control of the robotic arm. Experiments on object manipulation tasks with obstacle avoidance in the workspace were designed to evaluate the performance of our method for controlling the robotic arm. According to the experimental results obtained from eight subjects, the advantages of the proposed closed-loop system (with AR feedback) over the open-loop system (with visual inspection only) have been verified. The number of trigger commands used for controlling the robotic arm to grasp and lift the objects with AR feedback was reduced significantly, and the height gaps of the gripper in the lifting process decreased by more than 50% compared to trials with normal visual inspection only. The results reveal that the hybrid Gaze-BMI user can benefit from the information provided by the AR interface, improving efficiency and reducing cognitive load during the grasping and lifting processes.

Keywords: brain-machine interface (BMI), eye tracking, hybrid Gaze-BMI, human-robot interaction, augmented reality feedback, closed-loop control

#### Edited by:

*Matteo Bianchi, University of Pisa, Italy*

#### Reviewed by:

*Noman Naseer, Air University, Pakistan Rifai Chai, University of Technology Sydney, Australia Simone Toma, Arizona State University, United States*

> \*Correspondence: *Hong Zeng hzeng@seu.edu.cn*

Received: *30 June 2017* Accepted: *18 October 2017* Published: *31 October 2017*

#### Citation:

*Zeng H, Wang Y, Wu C, Song A, Liu J, Ji P, Xu B, Zhu L, Li H and Wen P (2017) Closed-Loop Hybrid Gaze Brain-Machine Interface Based Robotic Arm Control with Augmented Reality Feedback. Front. Neurorobot. 11:60. doi: 10.3389/fnbot.2017.00060*

## INTRODUCTION

It has been demonstrated that a brain-machine interface (BMI) can be used by people with paralysis to control a robotic arm for object manipulation tasks in activities of daily living (Millan et al., 2010). BMI users can directly control the robot using movement intentions extracted from the brain, without any muscular intervention (Schwartz, 2016). With invasive BMIs, where the neural activity of the brain is measured using electrodes placed on the surface of the cerebral cortex or implanted directly into the gray matter of the brain, users can after training control the robotic arm in three-dimensional space to reach and grasp objects (Hochberg et al., 2012; Downey et al., 2016). However, the electrodes must be placed through a surgical procedure that carries medical risks, so fewer patients can benefit from this method (Morgante et al., 2007). Noninvasive techniques, which measure the brain activity from the external surface of the scalp without surgical implantation, are more valuable than the invasive ones; examples include functional magnetic resonance imaging (fMRI; Gudayol-Ferre et al., 2015), functional near-infrared spectroscopy (fNIRS; Naseer and Hong, 2015), magnetoencephalography (MEG; Fukuma et al., 2016) and electroencephalography (EEG; Moghimi et al., 2013). EEG signals, acquired by placing electrodes on the surface of the scalp, are the most widely studied because of their high time resolution, low risk to the user and less expensive equipment.

For EEG-based non-invasive BMIs, the EEG signals obtained during visual cues or motor imagery are mapped to commands for external devices such as humanoid robots (Duan et al., 2015; Andreu-Perez et al., 2017), virtual helicopters (Doud et al., 2011; Shi et al., 2015), wheelchairs (Kim et al., 2017; Li et al., 2017), locomotion exoskeletons (Lee et al., 2017), telepresence mobile robots (Escolano et al., 2012; Zhao et al., 2017), and even animals (Kim et al., 2016). To obtain a sufficient number of commands for controlling a robotic arm with multiple degrees of freedom, classification of multiple mental states is desirable (Hortal et al., 2015; Kim et al., 2015; Meng et al., 2016). Nevertheless, it is challenging in practice for the BMI user to switch constantly among multiple mental states. It is much easier for a user to switch between two mental states than among several, but two states alone cannot provide a sufficient degree of control flexibility. To overcome this shortcoming, many hybrid methods have been proposed that combine BMI with additional signals, such as eye tracking (Kim et al., 2014), electromyography (Leeb et al., 2011; Bhagat et al., 2016), electrooculography (Ma et al., 2015; Soekadar et al., 2016), fNIRS (Khan and Hong, 2017) and so on, so as to increase the number of commands (Hong and Khan, 2017). Gaze selection has been shown to be natural, convenient and faster than other interaction approaches (Wang et al., 2016). Accordingly, Onose et al. (2012) and McMullen et al. (2014) proposed methods in which the target is selected via eye tracking and the classified result of the EEG signals is used to initiate automatic reaching, grasping and delivering actions by a robotic arm.

Although the hybrid Gaze-BMI system, which combines eye tracking and BMI, has shown its ability to help patients with motor disabilities complete sophisticated motor tasks, recent studies have demonstrated that patients working with assistive devices are not satisfied with fully automatic control by the robot alone (Kim et al., 2012; Downey et al., 2016). In other words, BMI users wish to intervene in the control process when working with assistive devices rather than rely on fully automatic control. Nevertheless, it is still a challenging task for the user to control the process of grasping and lifting objects via a non-invasive BMI (Popović, 2003). High efficiency and accuracy are hard to achieve, even after extensive training (Lampe et al., 2014). An important reason is that usually only visual feedback is provided to the BMI user, and the user relies exclusively on this visual feedback during the grasping and lifting processes, which may contribute to a time-consuming and ineffective control process (Johansson and Flanagan, 2009; Mussa-Ivaldi et al., 2010). Moreover, studies show that the cognitive load increases significantly if the user has to rely on visual inspection alone to find out whether the current control process is completed (Biddiss and Chau, 2007; Antfolk et al., 2013). Therefore, patients desire more intuitive and understandable feedback approaches in BMI-based systems.

To this end, we propose to utilize the AR technique to provide intuitive and effective feedback for a hybrid Gaze-BMI based robotic arm control system, where the eye tracking system is used for the robot position control (i.e., the target selection) and the movement intention is decoded from the EEG signals as the confirmation of the target position selected by the user or the trigger command to be executed on the target. Experiments on object manipulation tasks with avoidance of an obstacle in the middle of the workspace are designed, where the manipulation tasks are divided into five phases: reaching, grasping, lifting, delivering and releasing. For the grasping and lifting tasks, which require fine operations, human supervision is often desired. The less demanding tasks, i.e., reaching, delivering and releasing, can be completed automatically by the robotic arm once the movement intention is detected from the EEG signals. Therefore, our main idea is to maintain as much manual control as possible in the grasping and lifting processes using the hybrid Gaze-BMI, while providing the user with enriched visual information about the gripper status through the AR technique in real time. The performance of the hybrid Gaze-BMI based systems in open-loop mode (with visual inspection only, without AR feedback) and in closed-loop mode (with AR feedback) will be compared in the experiments.

The rest of the paper is organized as follows: section Materials and Methods describes the components of the proposed system as well as the experimental protocols used in this study. The results of the experiments are presented in section Results. The discussion of this study is provided in section Discussion and followed by the conclusion in section Conclusion.

### MATERIALS AND METHODS

### System Architecture

The block diagram of the proposed system is shown in **Figure 1**. The functional modules of BMI, eye tracking, image processing, automatic control and AR interface are integrated in this system to allow the user to perform the object manipulation tasks. Image processing is applied to segment all the potential cuboids from the image of the workspace. The segmented objects can be selected by the subjects via eye tracking. The outputs decoded from the BMI are used to (1) confirm the object selection by the user, or (2) trigger the switching of the action sequence, or (3) continuously control the aperture and height of the gripper during the grasping and lifting processes, respectively. The object intentionally selected by the user, as well as the status of the grasping and lifting operations, is visually fed back to the user via the computer screen using AR techniques in real time. Eventually, the robotic arm implements the reaching, grasping, lifting, delivering and releasing tasks in response to the outputs decoded from the hybrid Gaze-BMI.

The experimental setup used in this study is shown in **Figure 2**. The physical system is composed of an eye tracker, an EEG headset, a PC, a robotic arm, and a USB camera. The participants are seated comfortably in front of the computer, wearing the EEG headset, to perform the object manipulation tasks. The distance from the user to the 23.6-inch LCD monitor is ∼90 cm. The monitor displays the live video captured from the workspace. The subjects interact with the system via the hybrid Gaze-BMI and the AR-enhanced visual feedback.

### Brain-Machine Interface

A low-cost commercial EEG acquisition headset, Emotiv EPOC+ (Emotiv Systems Inc., USA), is used to obtain the user's intention to rest or to perform hand motor imagery. This device consists of 14 EEG channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4) and two reference channels (P3, P4). The data are sent to the computer through Bluetooth with a sampling rate of 128 Hz.

FIGURE 2 | Experimental setup used in this study. The live video of the workspace captured by the camera and the enhanced visual feedback are presented to the user via the monitor. Using the eye-tracking device EyeX, the user can select the object that he/she intends to manipulate. The movement intention can be detected by the BMI device Emotiv EPOC+, which can confirm the user's selection or initiate the control on the selected object. Dobot executes the reaching, grasping, lifting, delivering, and releasing tasks in response to the trigger commands from the user. The enlarged Graphical User Interface, which is programmed in C++ under the Qt framework, is shown on the right side of the picture above.

The OpenVibe toolbox is used for the training session of the BMI decoding model. First, the Graz Motor Imagery BCI Stimulation in the OpenVibe toolbox is used as the EEG signal acquisition paradigm, where the right arrow and the left arrow are shown in a random order to guide the user through the motor imagery tasks, as shown in **Figure 3**. When the right arrow is presented, the user should imagine right hand movements until the green cross in the window disappears, while the user should stay relaxed when the left arrow or no arrow is presented. Participants are asked to remain relaxed to reduce the effects of muscle signals during the EEG recording process. Next, preprocessing and feature extraction are applied to the EEG data. A 5th-order Butterworth band-pass filter with cut-off frequencies of 8 and 12 Hz is used for temporal filtering. The filtered signals are then segmented with a 1 s long sliding window in steps of 62.5 ms. The commonly used common spatial pattern (CSP) method is applied to the signals to extract the features that discriminate between the hand motor imagery and the relax states. Subsequently, a linear discriminant analysis (LDA) classifier is trained to classify the two mental states. Finally, the learned CSP filter and the LDA classifier are applied for the online user intent identification. Two brain states, i.e., rest and motor imagery, are classified from the EEG signals, along with an action power, a unidimensional scalar index ranging between 0 and 1 that represents the detection certainty that the user has entered the "motor imagery" state. To achieve a reasonable trade-off between true positives and false positives, the detection certainty threshold for the "motor imagery" state is set to 0.60 by rule of thumb in our experiments. Namely, a motor imagery state with a detection certainty above 0.60 is used to initiate the execution of a command; otherwise the decoded mental state is deemed to be the "rest" state. The movement intention decoded by OpenVibe is delivered to the robotic arm control engine through the Analog VRPN Server in OpenVibe every 62.5 ms. While the robotic arm is in operation, no new action is executed.
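The windowing and thresholding steps described in this subsection can be illustrated with a minimal Python sketch. It is not the OpenVibe pipeline used in the study; the function names are hypothetical, and only the constants (128 Hz, 1 s window, 62.5 ms step, threshold 0.60) come from the text.

```python
FS_HZ = 128        # sampling rate stated in the text
WINDOW_S = 1.0     # sliding-window length stated in the text
STEP_S = 0.0625    # window step (62.5 ms) stated in the text
THRESHOLD = 0.60   # detection-certainty threshold stated in the text


def window_starts(n_samples, fs=FS_HZ, window_s=WINDOW_S, step_s=STEP_S):
    """Return the start index of every full analysis window (hypothetical helper)."""
    win = int(round(window_s * fs))   # 128 samples per window
    step = int(round(step_s * fs))    # 8 samples between window starts
    return list(range(0, n_samples - win + 1, step))


def decode_state(action_power, threshold=THRESHOLD):
    """Map the scalar action power in [0, 1] to one of the two mental states."""
    # Only a certainty strictly above the threshold counts as motor imagery.
    return "motor_imagery" if action_power > threshold else "rest"
```

At 128 Hz a 62.5 ms step corresponds to exactly 8 samples, so a new decision becomes available at the same rate at which the decoded intention is forwarded to the robotic arm control engine.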

### Image Processing and Eye Tracking

A USB camera with a resolution of 1,280 × 720 pixels is used to capture live video of the workspace and send it to the computer via a USB 2.0 connection. For eye tracking, a commercial desktop eye tracker, EyeX (Tobii AB Inc., Sweden), is used to detect the user's pupil position and map it to the cursor on the monitor. The eye tracker is fixed at the bottom of the computer monitor (cf. **Figure 2**). The data are transmitted to the computer via USB 3.0 at a rate of 60 Hz. The gaze points acquired from the EyeX system are filtered with a 10-point moving average to remove minor gaze fluctuations. The filtered gaze points are then used to update the cursor position on the monitor every 30 ms.
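A 10-point moving average of the gaze samples can be sketched as follows. This is an illustrative Python version (the class name `GazeSmoother` is hypothetical), not the code of the original system; it averages fewer points until the buffer has filled.

```python
from collections import deque


class GazeSmoother:
    """Moving-average filter over the most recent gaze points."""

    def __init__(self, window=10):
        # deque with maxlen drops the oldest point automatically.
        self._points = deque(maxlen=window)

    def update(self, x, y):
        """Add one raw gaze sample and return the smoothed (x, y) position."""
        self._points.append((x, y))
        n = len(self._points)
        mean_x = sum(p[0] for p in self._points) / n
        mean_y = sum(p[1] for p in self._points) / n
        return mean_x, mean_y
```

A longer window suppresses more jitter but makes the cursor lag further behind the eye, so the window length is a trade-off between stability and responsiveness.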

Image processing and eye tracking are used for object identification and selection in the manipulation tasks. Three kinds of cuboids (10 × 20 × 10 mm) with different colors (red, green and blue) are used in the experiment (cf. **Figure 2** right). Cuboids in the workspace are detected by color using image processing techniques. First, the image of the workspace is converted from the RGB space to the HSV space to lessen the effect of natural illumination. Subsequently, the contours of the objects in the image are extracted by thresholding each color. Finally, all the potential cuboids are segmented from the image of the workspace. The eye tracker must be calibrated before the experiment. The calibration procedure lasts <1 min for each subject, during which the user gazes at seven points shown on the computer monitor one by one.
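The idea of color thresholding in HSV space can be shown on a single pixel with Python's standard `colorsys` module. The hue ranges and the saturation and value limits below are hypothetical placeholders; the thresholds used in the study are not reported, and a real implementation would operate on whole images and extract contours.

```python
import colorsys

# Hypothetical hue ranges in degrees; real values depend on camera and lighting.
HUE_RANGES = {
    "red": [(0, 20), (340, 360)],   # red wraps around the hue circle
    "green": [(90, 150)],
    "blue": [(210, 270)],
}
MIN_SATURATION = 0.5   # assumed: reject washed-out pixels
MIN_VALUE = 0.3        # assumed: reject very dark pixels


def classify_pixel(r, g, b):
    """Return 'red', 'green', 'blue' or None for an 8-bit RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < MIN_SATURATION or v < MIN_VALUE:
        return None
    hue_deg = h * 360.0
    for name, ranges in HUE_RANGES.items():
        if any(lo <= hue_deg <= hi for lo, hi in ranges):
            return name
    return None
```

Because hue is largely independent of brightness, a bright and a dim pixel of the same cuboid fall into the same class, which is the reason for leaving RGB space before thresholding.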

The user can move the cursor on the monitor over the target to be manipulated, and visual feedback is then provided by highlighting the target with a surrounding red rectangle (cf. **Figure 7A**). When the object is confirmed by the subject, i.e., when the subject fixates upon the object and the motor imagery state is detected from the EEG signals, the color of the rectangle changes from red to green (cf. **Figure 7B**). Similarly, the switch of the action sequence is triggered when the user fixates on a specific position while the movement intention is detected. For example, when the user fixates on the target position for placing the object and the motor imagery state is decoded from the EEG signals, the action sequence switches from the lifting process to the delivering process.
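The selection and confirmation rule in this paragraph amounts to a hit test on the segmented objects combined with the decoded mental state. A minimal sketch, assuming axis-aligned bounding boxes in pixel coordinates (the `Box` type and `highlight` function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a segmented object, in pixels."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def highlight(boxes, gaze, motor_imagery):
    """Return (label, rectangle color) for the gazed object, or None.

    Red means "gazed at"; green means "gazed at and confirmed by motor imagery".
    """
    for box in boxes:
        if box.contains(*gaze):
            return box.label, ("green" if motor_imagery else "red")
    # Motor imagery without a gazed object is ignored.
    return None
```

Requiring both signals at once means that neither a stray glance nor a spurious motor imagery detection can trigger an action on its own.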

### Robotic Arm

For the actuated system, a desktop robotic arm with 5 degrees of freedom, Dobot (Shenzhen Yuejiang Technology Co Inc., China), is used. The robotic arm controller can directly convert an XYZ position to the corresponding joint positions based on inverse kinematics. Therefore, the user can directly specify the motion end-point in the 3D environment via the hybrid Gaze-BMI, and the controller of Dobot plans the path to the target position automatically. The robotic arm then executes the manipulation tasks in response to the trigger commands from the hybrid Gaze-BMI user.

The workspace is predefined using a rectangle (150 mm × 150 mm) in the real scene. The webcam screen view coordinates are then mapped to the corresponding robot workspace coordinates, as shown in **Figure 4**. First, the coordinates of the vertices (p1, p2, p3, and p4 in **Figure 4A**) in the image plane are acquired. Next, the pose of the robotic arm at the four vertices (P1, P2, P3, and P4 in **Figure 4B**) of the rectangle is obtained. Subsequently, a perspective transform matrix from pixels to the coordinate frame of the robotic arm is calculated based on the calibration data (p1∼p4 and P1∼P4). Finally, the position of the objects in the image plane of the workspace is mapped to the coordinate frame of the robotic arm based on the perspective projection. The commands are sent to the robotic arm engine via its Application Programming Interface (API). Through the same API, the height and the aperture of the gripper can be obtained from the Dobot engine in real time, so that the current state of the tasks can be presented to the user with the AR feedback.
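A perspective transform is fully determined by four point correspondences such as p1∼p4 and P1∼P4. The following self-contained Python sketch solves the resulting 8 × 8 linear system directly; it is an illustration under that standard formulation, not the authors' implementation, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Fit the 3x3 perspective matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hm, x, y):
    """Map an image point (pixels) to workspace coordinates (mm)."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    u = (hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w
    v = (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w
    return u, v
```

For example, with made-up pixel corners `[(100, 100), (500, 120), (520, 400), (90, 380)]` and workspace corners `[(0, 0), (150, 0), (150, 150), (0, 150)]`, `fit_homography` returns a matrix that `apply_homography` can use to convert the pixel centroid of any segmented cuboid into a robot end-point.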

### Augmented Reality Interface

The AR interface is implemented with OpenCV and OpenGL. A marker-based tracking method is used to calculate the camera pose relative to the real world in order to align the real camera and the virtual camera in OpenGL. First, the camera is calibrated using a chessboard. The distortion parameters and the intrinsic parameters of the camera are obtained during the calibration procedure. Then, the extrinsic parameters, which encode the position and the rotation of the camera relative to the 3D world, are solved. To calculate the extrinsic parameters, a square with the same center as the cuboid is used as the simulated marker, as shown in **Figure 5**. The width of the square is 1 mm, which is calibrated in advance. The virtual objects are of the same size as the virtual markers. Therefore, the size of the virtual objects can be controlled by the size of the simulated marker. The center of the square (O) is assumed to be (0, 0, 0) in the 3D world. Then the extrinsic parameters can be solved using solvePnP in OpenCV (Opencv, 2017). Finally, a perspective projection is set up in OpenGL using the field of view and the aperture angle of the camera obtained from the intrinsic parameters, and the virtual camera in OpenGL is placed at the position given by the extrinsic parameters to align the virtual and real objects.
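The link between the calibrated intrinsic parameters and an OpenGL perspective projection can be written out explicitly. The sketch below assumes a pinhole camera with no skew and the principal point at the image center, which the text does not state; the focal lengths in the test values are placeholders, not calibration results from the study.

```python
import math


def vertical_fov_deg(image_height_px, fy_px):
    """Vertical field of view implied by the focal length fy (in pixels)."""
    return math.degrees(2.0 * math.atan(image_height_px / (2.0 * fy_px)))


def aspect_ratio(image_width_px, image_height_px, fx_px, fy_px):
    """Ratio tan(fov_x / 2) / tan(fov_y / 2) of the viewing frustum."""
    return (image_width_px / fx_px) / (image_height_px / fy_px)
```

These two values are the field-of-view and aspect arguments that a perspective projection such as `gluPerspective` expects, so the virtual camera renders with the same viewing frustum as the calibrated real camera.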

In the object manipulation tasks, the AR feedback is provided to the user during the grasping and lifting processes. First, enriched visual information, such as the virtual gripper aperture and the simulated grasping force, is presented to the user on the screen in real time during the grasping process. A virtual box whose length equals the aperture of the gripper is placed near the object to represent the gripper aperture (**Figure 7C**). When the gripper aperture becomes smaller than the width of the object, i.e., the object has been grasped by the gripper, the grasping force is simulated by two arrows normal to the gripper that are overlaid on the cuboid in the image (**Figure 7D**). The greater the difference between the size of the object and the aperture of the gripper, the longer the arrows (i.e., the stronger the grasping force). Second, during the lifting process, the altitude of the gripper is fed back to the subject through the height of the virtual box in the middle of the virtual obstacle (see **Figure 7F**). The altitude of the gripper on the table is calibrated in advance. The height of the virtual box is calculated as the difference between the real-time vertical pose of the robotic arm and the height of the gripper on the table.
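The overlay quantities described above follow directly from the gripper state. A minimal sketch, in which the function name, the dictionary keys and the linear `arrow_gain` scaling are assumptions made for illustration (the text only states that the arrows grow with the difference between object size and aperture):

```python
def overlay_state(aperture_mm, object_width_mm, gripper_z_mm, table_z_mm,
                  arrow_gain=1.0):
    """Compute what the AR layer should draw for the current gripper state."""
    # Arrows appear only once the aperture is smaller than the object width.
    squeeze_mm = max(0.0, object_width_mm - aperture_mm)
    return {
        "box_width_mm": aperture_mm,                   # aperture box (grasping)
        "show_arrows": aperture_mm < object_width_mm,  # simulated grasping force
        "arrow_length": arrow_gain * squeeze_mm,       # assumed linear scaling
        "box_height_mm": gripper_z_mm - table_z_mm,    # altitude box (lifting)
    }
```

Since the aperture and the vertical pose are read back from the robot engine, the overlay reflects the executed state of the gripper rather than the commanded one.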

### Experimental Protocol

Experiments for the object manipulation tasks are designed to evaluate whether hybrid Gaze-BMI users can benefit from the AR feedback in the grasping and lifting processes, where human supervision is involved. The workspace is shown on the right side of **Figure 2**. The user is instructed to select and grasp the object, then deliver it to the target position. The virtual obstacle, which is 15 mm high, should be avoided by the robotic arm during the delivering process. The object should be released in the rectangular area with the same color as the object. The grasping and lifting processes are controlled manually by the BMI user, i.e., the user decides when to stop the grasping process and whether the gripper is high enough for a safe delivery. The complete object manipulation protocol is as follows.

#### Reaching

Several cuboids in different colors are placed randomly in the workspace with different orientations (**Figure 2** right). A cuboid is highlighted with a surrounding virtual red rectangle when the cursor (gaze point) is over it (**Figure 7A**). Once the reaching action is triggered successfully, i.e., the gaze point is over the object and the motor imagery state has been decoded from the BMI, the color of the rectangle changes from red to green, indicating the confirmation of the selected object (**Figure 7B**). The position of the selected object in the workspace is mapped to the coordinate frame of the robotic arm as the end-point information. Then the robotic arm moves to the pre-grasp position over the object. The orientation of the gripper is adjusted automatically according to the angle of the object in the workspace, based on the image processing results. If a motor imagery state is detected from the EEG signal while no object is selected, the command is ignored by the system.

#### Grasping

Subsequently, the aperture of the gripper is controlled manually by the user. The gripper is open in the initial state with an aperture of 25 mm. The aperture decreases by 1 mm at each step of the grasping process if the user maintains the motor imagery state while fixating on the object in the image panel. The aperture of the gripper is mapped to the angle of the servo to control the gripper. The relation between the aperture of the gripper and the angle of the servo is estimated by data fitting, as shown in **Figure 6**. A circle with the letter "G" in it appears at the bottom of the GUI, indicating that the user has arrived at the grasping phase. The width of the virtual box changes with the aperture of the gripper (**Figure 7C**). The arrows shown in the video mean that a grasping force is being generated on the object (**Figure 7D**). If the cuboid has already been grasped tightly while the user keeps generating trigger commands, the gripper continues responding to the commands, and the arrows continue to lengthen to reflect the increasing grasping force.

#### Lifting

Then the user should switch from the grasping process to the next action sequence, which is picking the object up to avoid the obstacle. The user should fixate their gaze on the red circle with the letter "G" inside at the bottom of the GUI and perform motor imagery to initiate the switch. The letter in the red circle changes from "G" to "M", indicating a successful state switch from the grasping process to the lifting process (**Figure 7E**). After that, the user is able to move the robotic arm in the vertical direction to avoid the virtual obstacle in the middle of the workspace. The height of the robotic arm increases by 1 mm in response to each trigger command from the hybrid Gaze-BMI. A virtual box, whose height is equal to the altitude of the robotic gripper obtained from the Dobot engine, is presented right in the middle of the virtual obstacle. In this way, the subject can easily find out whether the robotic gripper is high enough for a safe delivery (**Figure 7F**).

#### Delivering and Releasing

Subsequently, the subject may switch from the lifting process to the delivering process by fixating their gaze on one of the three target rectangular areas in different colors and then performing motor imagery. Dobot then generates a path in the plane at the current height of the gripper and delivers the object to the target position automatically (**Figure 7G**). Finally, once OpenVibe has detected the motor imagery state from the EEG signals, the object is released and the robotic arm returns to the initial position automatically, waiting for the next trial (**Figure 7H**).

Grasping and lifting processes in open-loop (with visual inspection only, without AR feedback) are also implemented for the comparison with the same protocol above. **Figure 7** shows the whole process in the object manipulation tasks both with and without AR feedback. In the open-loop protocol, the user decides when to stop the grasping and lifting processes by visual inspection only, as is shown in **Figures 7I–L**.
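The protocol above can be read as a small state machine driven by the gaze target and the decoded mental state. The Python sketch below is one possible reading, not the control software of the study: the phase and gaze-target names are hypothetical, the gaze target required for a lifting step is assumed to be the object, and the aperture is clamped at 0 mm as a physical assumption.

```python
class ManipulationSession:
    """Phase logic of one trial; automatic motions are treated as instantaneous."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.phase = "select"
        self.aperture_mm = 25   # initial gripper aperture stated in the text
        self.lift_mm = 0        # height gained during the lifting phase

    def step(self, gaze_target, motor_imagery):
        """Process one decoded command and return the resulting phase."""
        if not motor_imagery:
            return self.phase                  # rest state: nothing happens
        if self.phase == "select":
            if gaze_target == "object":
                self.phase = "grasping"        # reaching runs automatically
        elif self.phase == "grasping":
            if gaze_target == "object":
                self.aperture_mm = max(0, self.aperture_mm - 1)  # 1 mm per step
            elif gaze_target == "switch_circle":
                self.phase = "lifting"         # letter changes from "G" to "M"
        elif self.phase == "lifting":
            if gaze_target == "object":        # assumed gaze target for lifting
                self.lift_mm += 1              # 1 mm per trigger command
            elif gaze_target == "target_area":
                self.phase = "delivered"       # delivery runs automatically
        elif self.phase == "delivered":
            self.reset()                       # release and return to start
        return self.phase
```

Counting the calls that change `aperture_mm` or `lift_mm` gives the number of trigger commands spent in the grasping and lifting phases of a trial.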

### Performance Evaluation

Eight participants (all males, 24.5 ± 1.2 years old) are recruited from the campus to perform the object manipulation tasks using the proposed system. All of them are healthy and right handed. This study is carried out in accordance with the recommendations of the Ethics Committee of Southeast University with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

Firstly, the BMI decoding model was trained for each subject in the training session described in the subsection brain machine interface. The training session for each subject was composed of a randomly sorted sequence of 40 trials, 20 for the hand motor imagery tasks and 20 for the relax tasks. The execution of each task lasted for 4 s, and it was spaced from the beginning of the next task with an interval lasting randomly from 1 to 3 s, during which the subject could relax concentration. Each task was triggered through visual cues displayed on the screen. The 5-fold cross-validation BMI decoding performance on the data from the training session is then reported.

Secondly, the online evaluation of the robotic arm control system based on the hybrid Gaze-BMI with or without AR feedback was performed. For each subject, the online evaluation session consisted of a randomly sorted sequence of 30 trials, 15 for the system with AR feedback and 15 for the system without AR feedback (i.e., with normal visual inspection only). The online decoding model of the BMI is obtained by training on all the data from the training session above. For each online trial, the BMI user operates the robotic arm to transfer a cuboid to the target area of the same color while avoiding the virtual obstacle in the middle of the workspace. The subject can rest whenever needed between two trials. We do not limit the task completion time for each trial, and the user is asked to bear successful grasping and safe delivery in mind. As a result, all the subjects can successfully accomplish the object transferring task both with and without the AR feedback.

The online manipulation performance is evaluated with the following two indices. (1) The number of trigger commands used in the grasping and the lifting processes, as used in Tonin et al. (2010) and Kim et al. (2012). The BMI user generates the trigger commands with the hybrid Gaze-BMI, so the number of commands used in the grasping and lifting processes characterizes the effort of the hybrid Gaze-BMI user with or without AR feedback during the object manipulation tasks. When the object has already been grasped tightly while the user still maintains the motor imagery state and fixates on the object, the robotic arm continues to execute the trigger commands. Though the aperture of the gripper may not change dramatically, the contact force on the object increases, which may be harmful to the object and the robotic gripper. Similarly, when the gripper is already high enough for a safe delivery while the user still produces trigger commands, the gripper continues moving in the vertical direction. Those unnecessary mental commands increase the workload of the BMI user and reduce the efficiency of the control process. (2) The height gap of the robotic gripper in the lifting process. This index is used for the following reason. When the BMI user moves their gaze point to the target area and performs motor imagery to finish the lifting process, the robotic arm moves to the target area in the plane at the current height of the gripper. Ideally, the final height of the robotic gripper in the vertical direction (Z) is just sufficient for a safe delivery over the obstacle. Therefore, the height gap of the gripper in the lifting process is defined as the altitude difference between the gripper and the obstacle. These two indices are used to evaluate whether the BMI user can benefit from the AR feedback to successfully complete the delivering task with less effort.
The performance difference between the proposed approach with AR feedback and the one with visual inspection only was evaluated using the one-tailed Wilcoxon rank sum test.
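For readers who want to reproduce this kind of comparison, the sketch below implements a one-tailed rank sum test with the large-sample normal approximation (average ranks for ties, no tie correction of the variance). It is an illustration only: the text does not say whether the authors used an exact or an approximate p-value, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_less(x, y):
    """Return (z, p) for the one-tailed alternative that x tends to be smaller than y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                         # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))    # standard normal CDF at z
    return z, p
```

Here `x` would hold one subject's 15 per-trial values with AR feedback and `y` the 15 values without it; a small `p` supports the claim that the index is lower with AR feedback.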

### RESULTS

### The Classification Performance of the BMI

The 5-fold cross-validation classification accuracy of the BMI for each subject is shown in **Table 1**. The average classification accuracy for the relax state is 85.0 ± 6.3%. An average accuracy of 86.4 ± 6.4% is achieved for the motor imagery state using the BMI decoding model. The aggregated classification accuracy across the subjects is 85.16%, with a standard deviation of 4.83%. The highest accuracy of the BMI, 94.01%, is achieved on subject 6. Subject 7 obtains the lowest performance, with an average accuracy of 77.42%.

### Online Manipulation Performance in Grasping Process

The average number of commands used in the grasping process for each subject is shown in **Figure 8A**. The number of trigger commands used for object grasping with AR feedback is generally lower than that with visual inspection only. In particular, for subject 4, the number of trigger commands is reduced from 33 to 17 when the enhanced visual feedback is provided. With normal visual inspection only (i.e., no AR feedback is provided), it is hard for the users to clearly observe the status of the grasping process, especially when the robotic arm blocks the subject's view of the object (e.g., **Figure 7J**). Furthermore, in order to grasp the object tightly, the user has to generate more control commands with the hybrid Gaze-BMI in the grasping process without AR feedback than with AR feedback. By contrast, with AR feedback the aperture of the gripper and the simulated grasping force between the gripper and the object are shown to the user in real time. Therefore, it is much easier for the user to handle the grasping process. The results reveal that the grasping task can be completed with fewer trigger commands and more consistent performance across the subjects with AR feedback than with visual inspection only. The number of trigger commands used in the grasping task with the AR feedback is statistically lower than that without the AR feedback for each subject (ps1 = 0.0027, ps2 = 0.0022, ps3 = 0.0089, ps4 = 0.0032, ps5 = 0.0025, ps6 = 0.0018, ps7 = 0.0029, ps8 = 0.0010).

FIGURE 7 | The process of the object manipulation tasks with and without AR feedback. The area of the gripper is enlarged in (C–L). Reaching: (A) The robotic arm is in the initial position. An object can be selected by the gaze points of the user, and a red rectangle then appears around the object, indicating that the user is staring at it. (B) The color of the rectangle changes from red to green when the target object is confirmed by the user once the motor imagery state is detected. Next, the robotic arm moves to the position for the subsequent grasping. Grasping (AR): (C) A circle with the letter "G" in it appears at the bottom of the GUI, indicating that the user has arrived at the grasping phase. The orientation of the gripper is adjusted automatically based on the orientation of the object in the workspace. The aperture of the gripper is presented to the user by the AR feedback interface via a virtual box near the object. (D) When the selected object has been grasped tightly, two virtual arrows normal to the gripper are overlaid on the object, simulating the grasping force. Lifting (AR): (E) The letter in the circle changes from "G" to "M", indicating a successful switch of the action sequence from the grasping process to the lifting process. The user can move the gripper in the vertical direction to lift the object. The height of the gripper above the table is represented by that of a virtual box in the middle of the obstacle. (F) When the virtual box is higher than the obstacle, the altitude of the robotic arm is deemed sufficient for a safe delivery. Delivering and Releasing: (G) When the lifting process is completed, the user fixates his/her gaze on the target rectangle and performs motor imagery to trigger the robotic arm to move to the target position automatically. In addition, the color of the rectangle around the object changes from green to cyan, indicating a successful action sequence switch. (H) The object is released at the target position. Then Dobot returns to the initial position automatically, waiting for the next trial. Grasping and Lifting (NoAR): (I–L) The grasping and lifting processes without AR feedback, where the hybrid Gaze-BMI user has to decide when to stop the current process by visual inspection only.


### Online Manipulation Performance in Lifting Process

The average number of commands used in the lifting process is shown in **Figure 8B**. In order to avoid the obstacle in the middle of the workspace, the user should move the gripper in the vertical direction until it is higher than the obstacle for a safe delivery. The number of commands generated from the BMI is reduced significantly with AR feedback. When no AR feedback is provided, it is hard for the user to decide whether the robotic gripper is already higher than the obstacle in the lifting process. Therefore, to ensure a safe delivery, the user tends to generate more control commands with the hybrid Gaze-BMI. In the approach with AR feedback, a virtual box, whose height is equal to the altitude of the robotic gripper obtained from the Dobot engine, is presented right in the middle of the virtual obstacle. Furthermore, the height of the virtual box changes along with the altitude of the gripper in real time. In this way, the user can better perceive the status of the lifting process based on the enhanced visual feedback. The results reveal that all the subjects can finish the lifting task in around 20 trigger commands with AR feedback. By contrast, many more commands are used in the same task with visual inspection only. The number of trigger commands used in the lifting task with the AR feedback is also statistically lower than that without the AR feedback for each subject (ps1 = 0.0054, ps2 = 0.0066, ps3 = 0.0089, ps4 = 0.0039, ps5 = 0.0135, ps6 = 0.0018, ps7 = 0.0036, ps8 = 0.0010).

FIGURE 8 | Comparisons of the number of trigger commands and the height gaps in the objects manipulation tasks between the system with AR feedback and those with visual inspection only. The statistically significant performance difference has been marked by "\*" (*p* < 0.05). (A) The number of trigger commands used in the grasping process for each subject. (B) The number of trigger commands used in the lifting process for each subject. (C) The height gaps of gripper for each subject in the object lifting process. (D) The height gaps of the gripper and the number of trigger commands used in the grasping and lifting processes averaged over all the subjects.

The height gap of the robotic gripper for each user is shown in **Figure 8C**. The height gaps with AR feedback are generally smaller than those with visual inspection only for all subjects, which shows that the subject is capable of finding out when to finish the lifting process in time and with less effort based on the enhanced visual feedback. Moreover, the results also show that the height gaps in the lifting task are much more consistent across the subjects with AR feedback than without AR feedback. The gripper height gaps obtained with AR feedback are statistically smaller than those without AR feedback (ps1 = 0.0040, ps2 = 0.0066, ps3 = 0.0018, ps4 = 0.0040, ps5 = 0.0282, ps6 = 0.0018, ps7 = 0.0015, ps8 = 0.0021).

### Overall Manipulation Performance for All Subjects

**Figure 8D** shows the average height gaps of the gripper as well as the average number of trigger commands used in the grasping and lifting processes for all the subjects, with and without AR feedback. The average height gap of the gripper is <4 mm with AR feedback, whereas it is more than 9 mm when only visual inspection is available, a reduction of more than 50%. The average number of commands for all subjects decreases from 26.75 to 18.28 in the grasping process and from 30.92 to 18.12 in the lifting process. Furthermore, the standard deviation of the number of commands with AR feedback is smaller than that without AR feedback. This is because different subjects may understand the current task status differently with visual inspection only. By contrast, it is easier for all the subjects to perceive the task status with AR feedback, and to take advantage of the feedback information provided by the AR interface in completing the grasping and lifting tasks. Therefore, the performance of all the subjects is more consistent with AR feedback than with visual inspection only, indicating that the AR feedback can indeed enhance the performance of the hybrid Gaze-BMI controlled grasping and lifting processes in the object manipulation tasks.
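The relative savings implied by the reported command counts can be checked with a short calculation. The helper below is only a convenience for this arithmetic; the four averages are taken from the paragraph above.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Average numbers of trigger commands reported in the text.
grasping_reduction = percent_reduction(26.75, 18.28)
lifting_reduction = percent_reduction(30.92, 18.12)
```

Evaluated on the reported averages, this gives a reduction of roughly 32% in the grasping process and roughly 41% in the lifting process.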

## DISCUSSION

### Subject Variability of the Manipulation Performance

Firstly, we explain why the subject variability of the BMI decoding must be removed when evaluating the manipulation performance of the systems with or without the AR feedback. It is well known that BMI decoding performance varies across subjects (Huster et al., 2015; Ouyang et al., 2017), which is also the case for our implementation of the BMI (see subsection The Classification Performance of the BMI). Because the aim of our online experiments is to test whether introducing the AR feedback to the hybrid Gaze-BMI based robotic arm control system improves the manipulation performance, the subject variability associated with the BMI decoding should be removed. To this end, the number of trigger commands used in the grasping and the lifting processes, and the height gap of the robotic gripper in the lifting process, were used as the indices of system manipulation performance, since commands can only be triggered when the motor imagery state has been detected successfully.

Second, we discuss the subject variability of the manipulation performance of the complete system. As can be observed from **Figure 8**, the three manipulation performance indices are almost consistent across subjects when AR feedback is provided, whereas this is not the case for the system without AR feedback. This is mainly because the AR feedback provides timely hints for the user to switch to the next action. For example, once the subject observes the arrows overlaid on the gripper, which represent the simulated grasping force between the gripper and the object, the subject can stop generating trigger commands with the hybrid Gaze-BMI. By contrast, when no AR feedback is provided, users have to rely on their own perception of the grasping status through normal visual inspection only. As a result, the manipulation performance of the system without AR feedback shows significant subject variability.

### AR Feedback vs. Visual Inspection Only

The subjects performed the object manipulation tasks both with AR feedback and with visual inspection only. In this work, AR feedback is presented in the real scene, which helps the user to understand the meaning of the feedback information. The most significant advantage of AR feedback is that it can provide abundant and flexible information for patients in an intuitive way via the visual communication channel. Specifically, a change in the color of the virtual rectangle surrounding an object indicates the user's confirmation of the selected object, the width of a virtual box represents the aperture of the gripper, an arrow stands for the simulated grasping force in the contact phase, and a virtual box whose length equals the altitude of the gripper is overlaid in the middle of the virtual wall. For the object manipulation tasks, the grasping and lifting processes are executed manually by the hybrid Gaze-BMI users with AR feedback. The hybrid Gaze-BMI provides a sufficient degree of flexibility for robotic arm control through the combined gaze selection and BMI control strategy. Meanwhile, the subject can use the enriched visual information provided by the AR interface to establish closed-loop control. The performance of the hybrid Gaze-BMI based system with AR feedback is notably better than that without AR feedback, in terms of both the number of commands used in the control process and the height gap of the robotic gripper.
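The mapping from gripper state to AR cues described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ARCues` and `ar_cues`, the contact rule, and the linear force model with `force_gain` are all assumptions made for illustration. The text only states that the grasping force is simulated and, in the limitations, that the feedback is based on the difference between the object width and the gripper aperture.

```python
from dataclasses import dataclass


@dataclass
class ARCues:
    box_width_mm: float     # virtual box width mirrors the gripper aperture
    box_length_mm: float    # virtual box length mirrors the gripper altitude
    show_force_arrow: bool  # arrow is drawn only in the contact phase
    arrow_length: float     # grows with the simulated grasping force


def ar_cues(aperture_mm: float, altitude_mm: float,
            object_width_mm: float, force_gain: float = 1.0) -> ARCues:
    """Map the gripper state to AR overlay parameters (illustrative only).

    Assumptions: contact begins once the aperture is no wider than the
    object, and the simulated force grows linearly with the difference
    between the object width and the aperture.
    """
    squeeze_mm = max(0.0, object_width_mm - aperture_mm)
    in_contact = aperture_mm <= object_width_mm
    return ARCues(aperture_mm, altitude_mm, in_contact, force_gain * squeeze_mm)


# Gripper still open around a 50 mm wide object: no force arrow yet.
print(ar_cues(aperture_mm=60.0, altitude_mm=0.0, object_width_mm=50.0))
# Gripper closed 2 mm past the object width: the arrow appears.
print(ar_cues(aperture_mm=48.0, altitude_mm=0.0, object_width_mm=50.0))
```

Under these assumptions, the appearance of the arrow is the cue that tells the user to stop issuing closing commands.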

It is necessary to point out that, according to our experimental results, AR feedback is not a strict requirement in the object manipulation tasks, because the subjects can also complete the tasks without it. However, the performance of the proposed method is improved significantly with the enhanced visual feedback. When no AR feedback is provided, the BMI users tend to rely exclusively on normal visual feedback. However, the object may be hidden from view by the robotic gripper. In addition, it is hard to estimate the difference between the altitude of the gripper and the height of the obstacle with normal visual inspection only. Therefore, this approach may lead to time-consuming and ineffective performance, thus increasing the workload on the BMI user. The experimental results have demonstrated that closed-loop control of the grasping and lifting tasks can be achieved by the hybrid Gaze-BMI based system integrated with AR guiding assistance. Furthermore, the performance of the BMI user with the enhanced visual feedback is significantly better than that with visual inspection only.

### Fully Automatic Control vs. Manual Control

Previous studies have demonstrated that subjects can perform object manipulation tasks using a BMI. The object is selected in the workspace using gaze tracking (McMullen et al., 2014) or using the EEG P300-evoked response to a visual cue over the object (Lenhardt and Ritter, 2010; Ying et al., 2017). In those studies, once the object is confirmed by the BMI user, the task is completed by the robotic arm automatically without the user's intervention, which may fail to improve the user's level of gratification (Kim et al., 2012; Downey et al., 2016). Rather than completing the task automatically, we divide the task into five phases. The grasping and lifting phases, which require fine operations for which human supervision is desired, are controlled manually by the BMI users. The less demanding phases, such as reaching, delivering, and releasing, are completed by the robotic arm automatically once the movement intention is detected from the EEG signals.
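The division between automatic and manual phases can be summarized as a small state-transition sketch. This is an illustration under stated assumptions rather than the system's actual controller: the names `Mode`, `PHASES`, and `step` are hypothetical, and the `user_switches` flag stands in for whatever mechanism the user employs to move on to the next action, which this section does not specify.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


# Phase order follows the task description: reach, grasp, lift, deliver, release.
PHASES = [
    ("reach", Mode.AUTOMATIC),
    ("grasp", Mode.MANUAL),
    ("lift", Mode.MANUAL),
    ("deliver", Mode.AUTOMATIC),
    ("release", Mode.AUTOMATIC),
]


def step(phase_idx: int, intention_detected: bool,
         user_switches: bool) -> Tuple[int, str]:
    """Return (next_phase_idx, action) for one decoding cycle.

    Assumption: nothing happens unless a movement intention is detected.
    An automatic phase then runs to completion, whereas a manual phase
    advances by one increment per trigger until the user chooses to switch.
    """
    name, mode = PHASES[phase_idx]
    if not intention_detected:
        return phase_idx, "idle"
    if mode is Mode.AUTOMATIC:
        return phase_idx + 1, f"run {name} automatically"
    if user_switches:
        return phase_idx + 1, f"finish {name}"
    return phase_idx, f"increment {name}"


print(step(0, True, False))  # reach is completed automatically
print(step(1, True, False))  # grasp advances by one manual increment
```

A returned index equal to `len(PHASES)` indicates that the whole task has been completed.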

The main challenge of manual control is that the feedback information from visual inspection alone is not sufficient for the user, which may lead to time-consuming and ineffective grasping and lifting (Johansson and Flanagan, 2009).

In order to achieve effective and efficient manual control in the grasping and delivering processes, AR feedback is used to provide the user with enhanced visual feedback about the current status of the tasks. Specifically, the aperture and the altitude of the gripper are controlled manually by the user, and the user can decide when to stop the current action and switch to the next action by means of the information provided by the AR interface. In this way, the user is able to maintain as much control as possible in the grasping and lifting processes via the hybrid Gaze-BMI, while obtaining feedback information via the AR interface.

### Comparison with Other BMI Systems

It is important for patients working with assistive devices to restore their ability to perform activities of daily living such as object manipulation. Patients with severe motor disabilities cannot fully benefit from assistive devices because of their limited access to the latest assistive products (Millan et al., 2010). To solve this problem, many researchers have focused on BMIs based on both invasive and non-invasive neural signals (Nicolas-Alonso and Gomez-Gil, 2012; Chaudhary et al., 2016).

For invasive BMIs, the neural activity of the brain is measured using electrodes placed on the surface of the cerebral cortex or implanted directly into the gray matter of the brain. The acquired neural signals are then used to control the robotic arm continuously in three dimensions (Hochberg et al., 2012; Collinger et al., 2013; Downey et al., 2016). In Hochberg et al. (2012), the neural activity is collected with an implanted microelectrode array, and the endpoint velocity of the robotic arm is continuously mapped from the decoded neural activity without other assistance. However, it is very difficult to establish a fine continuous mapping for the low-level control of the robotic arm from noisy neural activity: two tetraplegic and anarthric patients could only complete the tasks in about 60% of trials. Moreover, the electrodes have to be implanted via surgical procedures that carry medical risks.

For non-invasive BMIs, various modalities have been proposed, such as fMRI, fNIRS, MEG, and EEG (Nicolas-Alonso and Gomez-Gil, 2012). Although fMRI and MEG have better spatial resolution than EEG, these two methods need expensive, non-portable equipment (Muthukumaraswamy, 2013). fNIRS is a relatively new measurement method that employs infrared light to non-invasively characterize fluctuations in cerebral metabolism during neural activity. Though fNIRS uses low-cost equipment and offers an acceptable temporal resolution, one of the major limitations of fNIRS based BMIs is the inherent delay of the dynamic response (Naseer and Hong, 2015). Therefore, EEG signals, acquired by placing electrodes on the surface of the scalp, are the most studied, owing to their high temporal resolution, low risk to the user, and less expensive equipment.

It has been shown that the EEG signals acquired during multiple types of motor imagery tasks can be decoded for moving the robotic arm in multiple directions (Wang et al., 2012). Nevertheless, it is difficult to achieve accurate classification of multiple mental states using EEG signals with a poor signal-to-noise ratio. Furthermore, it is challenging in practice for the BMI user to switch among multiple mental states constantly. It is much easier to implement a 2-class BMI, but it lacks sufficient flexibility for controlling the robotic arm. Therefore, the hybrid Gaze-BMI is used in our study: the user's gaze points on the monitor are provided by the eye tracker for object selection, and the movement intention of the user is detected by the BMI for confirming the selected object or initiating the control command to be executed on the selected object.
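The division of labor in the hybrid Gaze-BMI (gaze chooses the target, the 2-class BMI confirms or triggers) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the rectangular object regions in monitor pixels, and the Boolean `motor_imagery_detected` input are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical object region on the monitor: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def gazed_object(gaze_xy: Tuple[float, float],
                 objects: Dict[str, Box]) -> Optional[str]:
    """Return the name of the object whose region contains the gaze point."""
    x, y = gaze_xy
    for name, (x0, y0, x1, y1) in objects.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


def hybrid_command(gaze_xy: Tuple[float, float], objects: Dict[str, Box],
                   motor_imagery_detected: bool) -> Optional[str]:
    """Issue a command for the gazed object only when the BMI detects intent."""
    target = gazed_object(gaze_xy, objects)
    if target is not None and motor_imagery_detected:
        return target
    return None


scene = {"cup": (100, 100, 200, 220), "box": (400, 150, 520, 260)}
print(hybrid_command((150, 160), scene, motor_imagery_detected=True))   # cup
print(hybrid_command((150, 160), scene, motor_imagery_detected=False))  # None
```

The point of the design is that the BMI only has to distinguish two mental states, while the spatial choice among objects is delegated to gaze.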

### Limitations and Future Work

One of the drawbacks of our study is that AR feedback is provided to the subjects via a computer monitor. This reduces the user-friendliness of the system and limits its scope of application to communication with assistive devices via the computer monitor. We are also aware that patients may interact with objects of various sizes and colors in activities of daily living, while the objects manipulated in this study are all of the same size. In addition, the AR feedback in our paper is based on the difference between the width of the object and the gripper aperture, which may limit the usability of this method in activities of daily living.

The purpose of our study is to find out whether the hybrid Gaze-BMI user can benefit from AR feedback to perform closed-loop control in the grasping and lifting tasks. This functional ability will be enhanced with the following improvements in our future work. First, the cumbersome computer monitor can be replaced by wearable AR glasses integrated with eye tracking to increase the flexibility and the scope of application. Second, a gripper with pressure sensors will be used to monitor the grasping status, and the real force generated in the contact phase will be presented to the user using AR techniques. Third, because the participants in this study are all healthy individuals, the feasibility of this method will be evaluated on patients with motor impairments after stroke. Lastly, the proposed system will be integrated with other kinds of feedback interfaces, such as haptic and auditory feedback.

In addition, the hybrid Gaze-BMI and the proposed AR feedback method for the assistive robot used in our paper can be seamlessly applied to rehabilitation robots. For example, a patient could use eye gaze to indicate a desired position in a real environment, and a robotic arm exoskeleton could then assist the patient in performing the reaching movement along human-like trajectories generated online when the self-initiated movement intention is detected with the BMI. Besides, wearable AR glasses can be exploited to provide the user with AR feedback on the operation status in order to implement effective closed-loop control.

## CONCLUSION

In this paper, we have proposed a novel AR guiding assistance for closing the hybrid Gaze-BMI based robotic arm control loop. The subjects are trained to reach, grasp, lift, deliver, and release an object while avoiding an obstacle in the workspace, by operating a robotic arm with the hybrid Gaze-BMI. Instead of having the subjects perceive the current state of the task by visual inspection only, an AR interface has been established in the real scene of the workspace to feed back the current gripper status to the subjects. The hybrid Gaze-BMI users are instructed to rely on the AR feedback information while accomplishing the object manipulation tasks.

The experimental evaluation of the complete setup was conducted with eight healthy subjects. The average BMI classification accuracy across the subjects is 85.16 ± 4.83%. The number of trigger commands used for controlling the robotic arm to grasp and lift objects was reduced significantly with AR feedback compared to that without AR feedback, and the height gaps of the gripper in the lifting process decreased by more than 50% compared to those in trials with normal visual inspection only. The results have revealed that the hybrid Gaze-BMI user can benefit from the information provided by the proposed AR interface, improving the efficiency and reducing the cognitive load during the hybrid Gaze-BMI controlled grasping and lifting processes.

### AUTHOR CONTRIBUTIONS

HZ and YW designed the study, analyzed the data, and wrote the manuscript. CW and PJ set up the experiment platform. BX and LZ performed the experiment. AS, JL, HL, and PW were involved in critical revision of the manuscript. All authors read and approved the final manuscript.

### FUNDING

Research supported by the National Key Research and Development Program of China (No. 2016YFB1001302), the National Natural Science Foundation of China (No. 61673105, No. 91648206, No. 61325018, No. 61673114, No. 61403080, No. 61773219), the Aeronautical Science Foundation of China (No. 20141969010), and the Fundamental Research Funds for the Central Universities (No. 2242015R30030).

### ACKNOWLEDGMENTS

The authors would like to thank all the volunteers who participated in the experiments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Zeng, Wang, Wu, Song, Liu, Ji, Xu, Zhu, Li and Wen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Alireza Haji Fathaliyan† , Xiaoyu Wang† and Veronica J. Santos\**

*Biomechatronics Laboratory, Mechanical and Aerospace Engineering, University of California, Los Angeles, Los Angeles, CA, United States*

#### *Edited by:*

*Matteo Bianchi, Università degli Studi di Pisa, Italy*

#### *Reviewed by:*

*Dimitrios Kanoulas, Fondazione Istituto Italiano di Tecnologia, Italy; Sunil L. Kukreja, National University of Singapore, Singapore; Stefanos Nikolaidis, Carnegie Mellon University, United States*

> *\*Correspondence: Veronica J. Santos vjsantos@ucla.edu*

*† These authors have contributed equally to this work.*

#### *Specialty section:*

*This article was submitted to Bionics and Biomimetics, a section of the journal Frontiers in Robotics and AI*

*Received: 01 December 2017; Accepted: 01 March 2018; Published: 04 April 2018*

#### *Citation:*

*Haji Fathaliyan A, Wang X and Santos VJ (2018) Exploiting Three-Dimensional Gaze Tracking for Action Recognition During Bimanual Manipulation to Enhance Human–Robot Collaboration. Front. Robot. AI 5:25. doi: 10.3389/frobt.2018.00025*

Human–robot collaboration could be advanced by facilitating the intuitive, gaze-based control of robots, and enabling robots to recognize human actions, infer human intent, and plan actions that support human goals. Traditionally, gaze tracking approaches to action recognition have relied upon computer vision-based analyses of two-dimensional egocentric camera videos. The objective of this study was to identify useful features that can be extracted from three-dimensional (3D) gaze behavior and used as inputs to machine learning algorithms for human action recognition. We investigated human gaze behavior and gaze–object interactions in 3D during the performance of a bimanual, instrumental activity of daily living: the preparation of a powdered drink. A marker-based motion capture system and binocular eye tracker were used to reconstruct 3D gaze vectors and their intersection with 3D point clouds of objects being manipulated. Statistical analyses of gaze fixation duration and saccade size suggested that some actions (pouring and stirring) may require more visual attention than other actions (reach, pick up, set down, and move). 3D gaze saliency maps, generated with high spatial resolution for six subtasks, appeared to encode action-relevant information. The "gaze object sequence" was used to capture information about the identity of objects in concert with the temporal sequence in which the objects were visually regarded. Dynamic time warping barycentric averaging was used to create a population-based set of characteristic gaze object sequences that accounted for intra- and inter-subject variability. The gaze object sequence was used to demonstrate the feasibility of a simple action recognition algorithm that utilized a dynamic time warping Euclidean distance metric. Averaged over the six subtasks, the action recognition algorithm yielded an accuracy of 96.4%, precision of 89.5%, and recall of 89.2%. 
This level of performance suggests that the gaze object sequence is a promising feature for action recognition whose impact could be enhanced through the use of sophisticated machine learning classifiers and algorithmic improvements for real-time implementation. Robots capable of robust, real-time recognition of human actions during manipulation tasks could be used to improve quality of life in the home and quality of work in industrial environments.

Keywords: action recognition, bimanual manipulation, eye tracking, gaze fixation, gaze object sequence, gaze saliency map, human–robot collaboration, instrumental activity of daily living

## INTRODUCTION

Recognition of human motion has the potential to greatly impact a number of fields, including assistive robotics, human–robot interaction, and autonomous monitoring systems. In the home, recognition of instrumental activities of daily living (iADLs) could enable an assistive robot to infer human intent and collaborate more seamlessly with humans while also reducing the cognitive burden on the user. A wheelchair-mounted robot with such capabilities could enhance the functional independence of wheelchair users with upper limb impairments (Argall, 2015). During bimanual iADLs, humans rely heavily on vision to proactively gather task-relevant visual information for planning (Johansson et al., 2001). For example, task-relevant information for manipulation could include the three-dimensional (3D) location of an object as well as its structure-related and substance-related properties, such as shape and weight, respectively (Lederman and Klatzky, 1987). Saccades typically precede body movement (Land et al., 1999) and reflect one's strategy for successful completion of a task.

The relationships between human vision, planning, and intent have inspired roboticists to adopt similar vision-based principles for planning robot movements and to use human gaze tracking for the intuitive control of robot systems. For instance, gaze fixation data collected during the human navigation of rocky terrain have been used to inspire the control of bipedal robots, specifically for the identification and selection of foot placement locations during traversal of rough terrain (Kanoulas and Vona, 2014). Human eye tracking data have also been used in the closed loop control of robotic arms. Recently, Li et al. (2017) demonstrated how 3D gaze tracking could be used to enable individuals with impaired mobility to control a robotic arm in an intuitive manner. Diverging from traditional gaze tracking approaches that leverage two-dimensional (2D) egocentric camera videos, Li et al. presented methods for estimating object location and pose from gaze points reconstructed in 3D. A visuomotor grasping model was trained on gaze locations in 3D along with grasp configurations demonstrated by unimpaired subjects. The model was then used for robot grasp planning driven by human 3D gaze.

In this work, we consider how human eye movements and gaze behavior may encode intent and could be used to inform or control a robotic system for the performance of bimanual tasks. Unlike repetitive, whole-body motions such as walking and running, iADLs can be challenging for autonomous recognition systems for multiple reasons. For instance, human motion associated with iADLs is not always repetitive, often occurs in an unstructured environment, and can be subject to numerous visual occlusions by objects being manipulated as well as parts of the human body. Prior studies on recognition of iADLs often applied computer vision-based approaches to images and videos captured *via* egocentric cameras worn by human subjects. Video preprocessing methods typically consist of first subtracting the foreground and then detecting human hands, regions of visual interest, and objects being manipulated (Yi and Ballard, 2009; Fathi et al., 2011, 2012; Behera et al., 2014; Nguyen et al., 2016).

A variety of methods have been presented for feature extraction for use in machine learning classifiers. In some studies, hand–hand, hand–object, and/or object–object relationships have been leveraged (Yu and Ballard, 2002; Fathi et al., 2011; Behera et al., 2012). The state of an object (e.g., open vs. closed) has been used as a feature of interest (Fathi and Rehg, 2013). Another study leveraged a saliency-based method to estimate gaze position, identify the "gaze object" (the object of visual regard), and recognize an action (Matsuo et al., 2014). Other studies have employed eye trackers in addition to egocentric cameras; researchers have reported significant improvements in action recognition accuracy as a result of the additional gaze point information (Yu and Ballard, 2002; Fathi et al., 2012).

In the literature, the phrase "saliency map" has been used to reference a topographically arranged map that represents visual saliency of a corresponding visual scene (Itti et al., 1998). In this work, we will refer to "gaze saliency maps" as heat maps that represent gaze fixation behaviors. 2D gaze saliency maps have been effectively employed for the study of gaze behavior while viewing and mimicking the grasp of objects on a computer screen (Belardinelli et al., 2015). Belardinelli et al. showed that gaze fixations are distributed across objects during action planning and can be used to anticipate a user's intent with the object (e.g., opening vs. lifting a teapot). While images of real world objects were presented, subjects were only instructed to mimic actions. In addition, since such 2D gaze saliency maps were constructed from a specific camera perspective, they cannot be easily generalized to other views of the same object. One of the objectives of this work was to construct gaze saliency maps in 3D that could enable gaze behavior analyses from a variety of perspectives. Such 3D gaze saliency maps could be mapped to 3D point clouds trivially obtained using low-cost RGB-D computer vision hardware, as is common in robotics applications. Furthermore, given that all manipulation tasks occur in three dimensions, 3D gaze saliency maps could enable additional insights into action-driven gaze behaviors. Although our experiments were conducted in an artificial lab setting using an uncluttered object scene, the experiment enabled subjects to perform actual physical manipulations of the object as opposed to only imagining or mimicking the manipulations, as in Belardinelli et al. (2015).

The primary objective of this study was to extract and rigorously evaluate a variety of 3D gaze behavior features that could be used for human action recognition to benefit human–robot collaborations. Despite the increasing use of deep learning techniques for end-to-end learning and autonomous feature selection, in this work, we have elected to consider the potential value of independent features that could be used to design action recognition algorithms in the future. In this way, we can consider the physical meaning, computational expense, and value added on a feature-by-feature basis. In Section "Materials and Methods," we describe the experimental protocol, methods for segmenting actions, analyzing eye tracker data, and constructing 3D gaze vectors and gaze saliency maps. In Section "Results," we report trends in eye movement characteristics and define the "gaze object sequence." In Section "Discussion," we discuss observed gaze behaviors and the potential and practicalities of using gaze saliency maps and gaze object sequences for action recognition. Finally, in Section "Conclusion," we summarize our contributions and suggest future directions.

## MATERIALS AND METHODS

### Experimental Protocol

This study was carried out in accordance with the recommendations of the UCLA Institutional Review Board with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the UCLA Institutional Review Board. A total of 11 subjects (nine males, two females; aged 18–28 years) participated in the study, whose preliminary results were first reported in Haji Fathaliyan et al. (2017). According to a handedness assessment (Zhang, 2012) based on the Edinburgh Handedness Inventory (Oldfield, 1971), two subjects were "pure right handers," seven subjects were "mixed right handers," and two subjects were "neutral."

Subjects were instructed to perform bimanual tasks involving everyday objects and actions. In this work, we focus on one bimanual task that features numerous objects and subtasks: the preparation of a powdered drink. To investigate how the findings of this study may generalize to other iADL tasks, we plan to apply similar analyses to other bimanual tasks in the future. The objects for the drink preparation task were selected from the benchmark Yale-CMU-Berkeley (YCB) Object Set (Calli et al., 2015b): mug, spoon, pitcher, and pitcher lid. The actions associated with these objects were reach for, pick up, set down, move, stir, scoop, drop, insert, and pour.

Subjects were instructed to repeat the task four times with a 1 min break between trials. The YCB objects were laid out and aligned on a table (adjusted to an ergonomic height for each subject) as shown in **Figure 1**. The experimental setup was reset prior to each new trial. Subjects were instructed to remove a pitcher lid, stir the contents of the pitcher, which contained water only (the powdered drink was imagined), and transfer the drink from the pitcher to the mug in two different ways. First, three spoonfuls of the drink were to be transferred from the pitcher to the mug using a spoon. Second, the pitcher lid was to be closed to enable the pouring of the drink from the pitcher to the mug until the mug was filled to two-thirds of its capacity. In order to standardize the instructions provided to subjects, the experimental procedure was demonstrated *via* a prerecorded video.

Subjects wore an ETL-500 binocular, infrared, head-mounted eye tracker (ISCAN, Inc., Woburn, MA, USA) that tracked their visual point of regard, with respect to a head-mounted egocentric scene camera, at a 60 Hz sampling frequency. Calibration data suggest that the accuracy and precision of the eye tracker are approximately 1.43° and 0.11°, respectively. Six T-Series cameras sampled at 100 Hz and a Basler/Vue video camera (Vicon, Culver City, CA, USA) were used to track the motion of the subjects and YCB objects (**Figure 1**). Retroreflective markers were attached to the YCB objects, eye tracker, and subjects' shoulders, upper arms, forearms, and hands (dorsal aspects). Visual distractions were minimized through the use of a blackout curtain that surrounded the subject's field of view.

Figure 1 | (A) Each subject was seated in the motion capture area. A blackout curtain was used to minimize visual distractions. (B) The subject wore a head-mounted eye tracker. Motion capture markers were attached to the Yale-CMU-Berkeley objects, the eye tracker, and subjects' upper limbs. Each trial used the object layout shown. (C) Retroreflective markers were placed on a mug, spoon, pitcher, pitcher lid, and table. These objects will be referenced using the indicated color code throughout this manuscript. The subject shown in panels (A,B) has approved of the publication of these images.

### Action Segmentation: Task, Subtask, and Action Unit Hierarchy

Land et al. (1999) reported on gaze fixation during a tea-making task. In that work, a hierarchy of four activity levels was considered: "make the tea" (level 1), "prepare the cups" (level 2), "fill the kettle" (level 3), and "remove the lid" (level 4). Spriggs et al. (2009) reported on a brownie-making task and divided the task into 29 actions, such as "break one egg" and "pour oil in cup." Adopting an approach similar to these prior works, we defined an action hierarchy using a task–subtask–action unit format (**Table 1**). Subtasks were defined similarly to Land et al.'s "4th level activities" while the action units were defined according to hand and object kinematics. All subjects performed all six subtasks listed in **Table 1**, but not all subjects performed all action units. For example, a couple of subjects did not reach for the pitcher during Subtask 2 ("move spoon into pitcher").

The start and end time of each action unit were identified according to hand and object kinematics and were verified by observing the egocentric video recorded from the eye tracker. For example, the angle of the spoon's long axis with respect to the pitcher's long axis and the repetitive pattern of the angle were used to identify the beginning and end of the action unit "stir inside pitcher" (**Figure 2**).
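As an illustration of the kinematic quantity mentioned above, the angle between the long axes of the spoon and the pitcher can be computed from two direction vectors. This is a minimal sketch that assumes each axis has already been estimated as a 3D vector from motion capture markers; the function name `axis_angle_deg` is hypothetical, and the detection of the repetitive pattern that marks the start and end of stirring is not shown.

```python
import math
from typing import Sequence


def axis_angle_deg(axis_a: Sequence[float], axis_b: Sequence[float]) -> float:
    """Angle in degrees between two 3D axis vectors (need not be unit length)."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


pitcher_axis = (0.0, 0.0, 1.0)  # assumed vertical pitcher
spoon_axis = (0.2, 0.0, 1.0)    # spoon tilted slightly away from vertical
print(f"{axis_angle_deg(spoon_axis, pitcher_axis):.1f} deg")
```

Evaluating this angle at every motion capture frame yields a time series like the one that was inspected for repetitive structure during stirring.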

### Gaze Fixation and Saccade Labeling

Table 1 | Six subtasks (bold) were defined for the task of making a powdered drink; action units were defined for each subtask according to hand and object kinematics.

Figure 2 | The repetitive nature of the spoon's kinematics with respect to the pitcher was used to identify the start and end of the action unit "stir inside pitcher." Although the spoon was not manipulated until approximately 6 s had elapsed in the representative trial shown, the full trial is provided for completeness.

Saccadic movements of the eye were discovered by Edwin Landott in 1890 while studying eye movements during reading (Kandel et al., 2000). According to Kandel et al., saccadic eye movements are characterized by "jerky movements followed by a short pause" or "rapid movements between fixation points." In our study, saccades were detected using the angular velocity of the reconstructed gaze vector (see 3D Gaze Vector and Gaze Saliency Map Construction), and intervals between saccades that exceeded 200 ms were labeled as gaze fixations, as in Nyström and Holmqvist (2010). As described previously, the beginning and end of action units were defined based on hand and object kinematics. A heuristic approach, as outlined in **Figure 3**, was used to associate gaze fixation periods and saccades in the eye tracker data with action units. A given gaze fixation period was associated with a specific action unit if the gaze fixation period overlapped with the action unit period ranging from 0.3 to 0.7 *T*, where *T* was the duration of the specific action unit. A given saccade was associated with a specific action unit if the saccade occurred during the action unit period ranging from −0.2 to 0.8 *T*. Saccade to action unit associations were allowed prior to the start of the action unit (defined from hand and object kinematics) based on reports in the literature that saccades typically precede related motions of the hand (Land et al., 1999; Johansson et al., 2001). The results of the approach presented in **Figure 3** were verified through careful comparison with egocentric scene camera videos recorded by the eye tracker.
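The labeling and association rules described in this section can be sketched in a few lines. The sketch is an approximation under stated assumptions, with **Figure 3** remaining the authoritative description: saccades are assumed to be already detected and given as sorted (start, end) times in seconds, a saccade is represented by its onset time when it is matched to an action unit, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def fixations_between(saccades: List[Interval],
                      min_duration_s: float = 0.2) -> List[Interval]:
    """Label gaps between consecutive saccades longer than 200 ms as fixations."""
    return [(end_prev, start_next)
            for (_, end_prev), (start_next, _) in zip(saccades, saccades[1:])
            if start_next - end_prev > min_duration_s]


def fixation_matches(fixation: Interval, unit: Interval) -> bool:
    """True if the fixation overlaps the 0.3 T to 0.7 T window of the action unit."""
    unit_start, unit_end = unit
    duration = unit_end - unit_start
    win_start = unit_start + 0.3 * duration
    win_end = unit_start + 0.7 * duration
    return fixation[0] < win_end and fixation[1] > win_start


def saccade_matches(saccade_onset_s: float, unit: Interval) -> bool:
    """True if the saccade falls in the -0.2 T to 0.8 T window of the action unit."""
    unit_start, unit_end = unit
    duration = unit_end - unit_start
    return (unit_start - 0.2 * duration
            <= saccade_onset_s
            <= unit_start + 0.8 * duration)


action_unit = (10.0, 12.0)  # T = 2 s
print(fixation_matches((11.3, 11.9), action_unit))  # overlaps 10.6 to 11.4 s
print(saccade_matches(9.7, action_unit))            # within 9.6 to 11.6 s
```

The asymmetric saccade window reflects the observation, cited above, that saccades tend to precede the related hand motion.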

### 3D Gaze Vector and Gaze Saliency Map Construction

The eye tracker provided the 2D pixel coordinates of the gaze point with respect to the image plane of the egocentric scene camera. The MATLAB Camera Calibration Toolbox (Bouguet, 2015; The MathWorks, 2017) and a four-step calibration procedure were used to estimate the camera's intrinsic and extrinsic parameters. These parameters enabled the calculation of the pose of the 2D image plane in the 3D global reference frame. The origin of the camera frame was located using motion capture markers attached to the eye tracker. The 3D gaze vector was reconstructed by connecting the origin of the camera frame with the gaze point's perspective projection onto the image plane.
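As a rough illustration of this reconstruction, the sketch below back-projects a gaze pixel through an ideal pinhole camera. It assumes lens distortion has already been removed, that `K` is the 3 × 3 intrinsic matrix, and that `R_world_from_cam` rotates camera-frame vectors into the global frame; these names and conventions are assumptions for the example, not those of the calibration toolbox.

```python
import numpy as np


def gaze_ray(pixel_uv, K, R_world_from_cam, cam_origin_world):
    """Return (origin, unit direction) of the 3D gaze ray in the global frame."""
    u, v = pixel_uv
    # Back-project the pixel onto the normalized image plane: d_cam = K^-1 [u, v, 1]^T.
    d_cam = np.linalg.solve(np.asarray(K, dtype=float), np.array([u, v, 1.0]))
    # Rotate the direction into the global frame and normalize it.
    d_world = np.asarray(R_world_from_cam, dtype=float) @ d_cam
    d_world = d_world / np.linalg.norm(d_world)
    return np.asarray(cam_origin_world, dtype=float), d_world
```

A gaze point at the principal point of the image yields a ray along the optical axis of the camera.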

Using the reconstructed 3D gaze vector, we created 3D gaze saliency maps by assigning RGB colors to the point clouds obtained from 3D scans of the YCB objects. The point cloud for the mug was obtained from Calli et al. (2015a). The point clouds for the pitcher, pitcher lid, and spoon were scanned with a structured-light 3D scanner (Structure Sensor, Occipital, Inc., CA, USA) and custom turntable apparatus. This was necessary because the YCB point cloud database only provides point clouds for the pitcher lid assembly and because the proximal end of the spoon was modified for the application of motion capture markers (**Figure 1C**). Colors were assigned to points based on the duration of their intersection with the subject's 3D gaze vector. In order to account for eye tracker uncertainty, colors were assigned to a 5 mm-radius spherical neighborhood of points, with points at the center of the sphere (intersected by the 3D gaze vector) being most intense. Color intensity for points within the sphere decreased linearly as the distance from the center of the sphere increased. Both gaze fixation and saccades were included during RGB color assignment. For each subtask, the RGB color intensity maps were summed across subjects and then normalized to the [0, 1] range, with 0 as black and 1 as red. The normalization was performed with all task-relevant objects considered simultaneously and not on an object-specific basis. This enabled the investigation of the relative visual importance of each object for each subtask.
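The intensity assignment can be sketched as follows. The sketch assumes point coordinates in meters, one call per 60 Hz frame, and an intensity that falls linearly to zero at the surface of the 5 mm sphere; the function names and the exact falloff endpoint are assumptions made for illustration.

```python
import numpy as np


def accumulate_saliency(points, hit_point, dt, saliency, radius=0.005):
    """Add one frame of gaze intensity to a per-point saliency array (in place).

    points    : (P, 3) point cloud of one object
    hit_point : (3,) intersection of the gaze vector with the object
    dt        : frame duration in seconds
    saliency  : (P,) float array of accumulated intensities
    """
    dist = np.linalg.norm(points - hit_point, axis=1)
    # Weight is 1 at the center of the sphere and decreases linearly to 0 at its surface.
    weight = np.clip(1.0 - dist / radius, 0.0, None)
    saliency += dt * weight
    return saliency


def normalize_jointly(saliency_maps):
    """Scale all object maps by one shared maximum so objects remain comparable."""
    peak = max(float(s.max()) for s in saliency_maps)
    return [s / peak if peak > 0 else s.copy() for s in saliency_maps]
```

Normalizing with a single shared maximum, rather than per object, is what preserves the relative visual importance of each object within a subtask.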

### RESULTS

### Eye Movements: Gaze Fixation Duration and Saccade Size

Gaze fixation duration and saccade size have previously been identified as important features for gaze behaviors during iADLs. As in Morrison and Rayner (1981), we use "saccade size" to refer to the angle spanned by a single saccade. Land et al. (1999) reported overall trends and statistics for the entire duration of a tea-making task. However, information about dynamic changes in gaze behavior is difficult to extract and analyze when eye tracker data are aggregated over a long period of time. In order to address eye movements at a finer level of detail, we investigated trends in gaze fixation duration and saccade size at the action unit level. Gaze fixation duration data were normalized by summing the durations of gaze fixation periods that belonged to the same action unit and then dividing by the total duration of that action unit. This normalization was performed to minimize the effect of action unit type, such as reaching vs. stirring, on gaze fixation duration results. Gaze fixation duration and saccade size were analyzed according to groupings based on six common action unit verbs: "reach," "pick up," "set down," "move," "pour," and "stir" (**Figure 4**). "Drop" and "insert" were excluded, as they occurred infrequently and their inclusion would have further reduced the power of the statistical tests.

We conducted two ANOVA tests with a significance level of α = 0.05. One test compared the distributions of gaze fixation duration across the six action unit verb groups while the other test compared the distributions of saccade size. In both cases, the ANOVA resulted in *p* < 0.001. Thus, *post hoc* pairwise *t*-tests were conducted to identify which verb groups were significantly different (**Table 2**). A Bonferroni correction was additionally applied (α = 0.05/*k*, where *k* = 15, the total number of pairwise comparisons) to limit type I errors when performing the *post hoc* pairwise comparisons. It was found that the average gaze fixation durations for "pour" and "stir" were significantly greater than those of other verbs (**Figure 4A**). Saccade sizes for "move" and "stir" were significantly different from those of other verbs (**Figure 4B**). Saccade sizes for "move" were significantly larger than those of other verbs while those for "stir" were significantly smaller (**Figure 4B**).
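The number of pairwise comparisons and the corrected significance level follow directly from the six verb groups, as the short check below shows. The variable names are chosen for this example only.

```python
from math import comb

n_groups = 6                   # "reach," "pick up," "set down," "move," "pour," "stir"
k = comb(n_groups, 2)          # number of distinct verb pairs
alpha = 0.05
alpha_corrected = alpha / k    # Bonferroni-corrected threshold per pairwise t-test

print(k, round(alpha_corrected, 4))  # 15 0.0033
```

A pairwise *t*-test would then be reported as significant only if its *p*-value fell below this corrected threshold.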

### 3D Gaze Saliency Maps and Gaze Object Percentages

The 3D gaze saliency map for each object is shown for each of the six subtasks in **Figure 5**. We use "gaze object" to refer to the object

Figure 4 | Box and whisker plots are shown for each of the six action unit verb groups for (A) normalized gaze fixation duration and (B) saccade size. The tapered neck of each box marks the median while the bottom and top edges mark the first and third quartiles. The whiskers extend to the most extreme data points that are not considered outliers (black dots). For normalized gaze fixation duration, both "pour" and "stir" were statistically significantly different from the other action unit verb groups, as indicated by underlines. For saccade size, both "move" and "stir" were statistically significantly different from the other action unit verb groups.

Table 2 | The lower left triangle of the table (shaded in gray) summarizes *p*-values for *t*-tests of average normalized gaze fixation duration for different pairs of action unit verbs while the upper right triangle summarizes *p*-values for *t*-tests of saccade size.


*Asterisks indicate the t-tests that were statistically significant for a Bonferroni-corrected α = 0.003.*

that is intersected by the reconstructed 3D gaze vector. This 3D approach is analogous to the use of 2D egocentric camera videos to identify the gaze object defined as the "object being fixated by eyes" or the "visually attended object" (Yi and Ballard, 2009). In the case that multiple objects were intersected by the same gaze vector, we selected the closest object to the subject as the gaze object. We defined the gaze object percentage as the amount of time, expressed as a percentage of the subtask duration, that an object was intersected by a gaze vector. Gaze object percentages, averaged across all 11 subjects, are presented for each of the six subtasks in pie chart form (**Figure 5**). Although the table in the experimental setup was never manipulated, its gaze object percentage exceeded 20% for subtasks that included action units related to "set down."
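The gaze object selection rule and the percentage calculation can be illustrated with the sketch below. It assumes equally spaced frames, with `None` marking frames in which the gaze vector intersected no object, and it takes the percentage over all frames of the subtask; the function names and these conventions are assumptions for the example.

```python
from collections import Counter


def closest_object(hit_distances):
    """Pick the gaze object when one gaze vector intersects several objects.

    hit_distances maps an object name to the distance from the subject
    to the intersection point on that object.
    """
    if not hit_distances:
        return None
    return min(hit_distances, key=hit_distances.get)


def gaze_object_percentages(frame_labels):
    """Percent of the subtask's frames in which each object was the gaze object."""
    n = len(frame_labels)
    if n == 0:
        return {}
    counts = Counter(label for label in frame_labels if label is not None)
    return {obj: 100.0 * c / n for obj, c in counts.items()}
```

Averaging the resulting dictionaries across subjects would give per-subtask values of the kind shown in the pie charts.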

## Recognition of Subtasks Based on Gaze Object Sequences

#### The Gaze Object Sequence

In order to leverage information about the identity of gaze objects in concert with the sequence in which gaze objects were visually regarded, we quantified the gaze object sequence for use in the automated recognition of subtasks. The concept of a gaze object sequence has been implemented previously for human action recognition, but in a different way. Yi and Ballard (2009) performed action recognition with a dynamic Bayesian network having four hidden nodes and four observation nodes. One of the hidden nodes was the true gaze object and one of the observation nodes was the estimated gaze object extracted from 2D egocentric camera videos. In this work, we define the gaze object sequence as an (*M* × *N*) matrix, where *M* is the number of objects involved in the manipulation task and *N* is the total number of instances (frames sampled at 60 Hz) in which at least one of the *M* objects was visually regarded, whether through gaze fixation or saccade (**Figure 6C**). Each of the *M* = 5 rows corresponds to a specific object. Each of the *N* columns indicates the number of times each object was visually regarded within a sliding window consisting of 10 frames (**Figures 6A,B**).

A sliding window was used to filter the raw gaze object sequence to alleviate abrupt changes of values in the matrix. The size of the sliding window was heuristically selected to be large enough to smooth abrupt changes in the object sequence that could be considered as noise, but also small enough so as not to disregard major events within its duration. In preliminary analyses, this sliding window filtration step was observed to improve recognition accuracy.
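One way to build such a matrix is sketched below. The sketch assumes the raw sequence is given as one object index per retained frame and that the 10-frame window is trailing (it covers the current frame and the nine before it); the window alignment and the function name are assumptions, since the text does not specify them.

```python
import numpy as np


def gaze_object_sequence(object_ids, n_objects=5, window=10):
    """Return an (M x N) matrix of windowed gaze counts.

    object_ids : length-N sequence of integer object indices in [0, n_objects),
                 one per frame in which some object was visually regarded
    """
    ids = np.asarray(object_ids, dtype=int)
    n_frames = ids.size
    # Raw sequence: a one-hot column per frame.
    one_hot = np.zeros((n_objects, n_frames))
    one_hot[ids, np.arange(n_frames)] = 1.0
    # Trailing window sum computed from cumulative sums.
    csum = np.cumsum(one_hot, axis=1)
    shifted = np.zeros_like(csum)
    shifted[:, window:] = csum[:, :-window]
    return csum - shifted
```

Once a full window is available, each column sums to the window length, so an abrupt switch between two objects appears as a gradual handover of counts between their rows.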

### Creating a Library of Characteristic Gaze Object Sequences

Intra- and inter-subject variability necessitate analyses of human subject data that account for variations in movement speed and style. In particular, for pairs of gaze object sequences having different lengths, the data must be optimally time-shifted and stretched prior to comparative analyses. For this task, we used dynamic time warping (DTW), a technique that has been widely used for pattern recognition of human motion, such as gait recognition (Boulgouris et al., 2004) and gesture recognition (Gavrila and Davis, 1995).

Dynamic time warping compares two time-dependent sequences *X* and *Y*, where $X \in \mathbb{R}^{S \times U}$ and $Y \in \mathbb{R}^{S \times V}$. A warping path $W_i = [p_{i1}, p_{i2}, \dots, p_{ij}, \dots, p_{iK_i}]$ defines an alignment between pairs of elements in *X* and *Y* by matching element(s) of *X* to element(s) of *Y*. For example, $p_{ij} = (u, v)$ represents the matched pair of $x_u$ and $y_v$. If the warping path is optimized to yield the lowest sum of Euclidean distances between the two sequences, the DTW distance between the two sequences *X* and *Y* can be defined as the following:

$$\text{DTW}\left(X, Y\right) = \min\_{W\_i} \left\{ d\left(W\_i\right) \mid W\_i \in \left\langle W\_1, W\_2, \dots, W\_L \right\rangle \right\},\tag{1}$$

where $d(W_i) = \sum_{j=1}^{K_i} p_{ij}$ and $p_{ij} = \lVert x_u - y_v \rVert_2$.
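Equation 1 is usually evaluated with dynamic programming rather than by enumerating warping paths. The sketch below is a minimal version that assumes the common step pattern (a match, or an advance in either sequence alone) and no windowing constraint; the text does not state which constraints were used, so these are assumptions.

```python
import numpy as np


def dtw_distance(X, Y):
    """Minimum sum of Euclidean distances over all warping paths.

    X : (S, U) array, Y : (S, V) array; columns are time samples.
    """
    U, V = X.shape[1], Y.shape[1]
    # cost[u, v] = Euclidean distance between column u of X and column v of Y.
    cost = np.linalg.norm(X[:, :, None] - Y[:, None, :], axis=0)
    # D[u, v] = best cumulative cost of aligning the first u and v columns.
    D = np.full((U + 1, V + 1), np.inf)
    D[0, 0] = 0.0
    for u in range(1, U + 1):
        for v in range(1, V + 1):
            D[u, v] = cost[u - 1, v - 1] + min(D[u - 1, v - 1], D[u - 1, v], D[u, v - 1])
    return float(D[U, V])
```

Two sequences that differ only in timing, such as one in which a value is held for an extra frame, have a DTW distance of zero under this definition.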

In order to identify a characteristic gaze object sequence for each subtask, we employed a global averaging method called dynamic time warping barycenter averaging (DBA), which performs the DTW and averaging processes simultaneously. This method uses optimization to iteratively refine a DBA (average) sequence until it yields the smallest DTW Euclidean distance (see Recognition of Subtasks Using DTW Euclidean Distances) with respect to each of the input sequences being averaged (Petitjean et al., 2011). The gaze object sequences were averaged across all trials for all subjects for each subtask using an open source MATLAB function provided by the creators of the DBA process (Petitjean, 2016). A total of 43 trials (4 repetitions per each of 11 subjects, less 1 incomplete trial) were available for each subtask. **Figure 7** shows visual representations of the DBA gaze object sequence for each of the six subtasks.
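The iterative refinement can be sketched as follows. This is a simplified illustration, not the MATLAB function used in the study: it initializes the average with the first input sequence and runs a fixed number of iterations, both of which are assumptions made to keep the example short.

```python
import numpy as np


def dtw_path(A, B):
    """Optimal warping path [(a_idx, b_idx), ...] between A (S, U) and B (S, V)."""
    U, V = A.shape[1], B.shape[1]
    cost = np.linalg.norm(A[:, :, None] - B[:, None, :], axis=0)
    D = np.full((U + 1, V + 1), np.inf)
    D[0, 0] = 0.0
    for u in range(1, U + 1):
        for v in range(1, V + 1):
            D[u, v] = cost[u - 1, v - 1] + min(D[u - 1, v - 1], D[u - 1, v], D[u, v - 1])
    # Backtrack from the end of both sequences to their start.
    path, u, v = [], U, V
    while u > 0 and v > 0:
        path.append((u - 1, v - 1))
        step = int(np.argmin([D[u - 1, v - 1], D[u - 1, v], D[u, v - 1]]))
        if step == 0:
            u, v = u - 1, v - 1
        elif step == 1:
            u -= 1
        else:
            v -= 1
    return path[::-1]


def dba(sequences, n_iter=10):
    """Barycenter average of (S, N_k) sequences with differing lengths N_k."""
    avg = np.array(sequences[0], dtype=float)
    for _ in range(n_iter):
        sums = np.zeros_like(avg)
        counts = np.zeros(avg.shape[1])
        for seq in sequences:
            # Collect every input column that DTW aligns with each average column.
            for a, b in dtw_path(avg, seq):
                sums[:, a] += seq[:, b]
                counts[a] += 1
        # Each average column becomes the mean of its aligned input columns.
        avg = sums / counts
    return avg
```

Because every column of the average appears in every warping path at least once, the division in the update step is always defined.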

#### Recognition of Subtasks Using DTW Euclidean Distances

Traditionally, the Euclidean distance is used as a metric for similarity between two vectors. However, the Euclidean distance alone is not an accurate measure of similarity for time series data (Petitjean et al., 2011). Here, we use the "DTW Euclidean distance," which is calculated as the sum of the Euclidean distances between corresponding points of two sequences. The DTW process minimizes the sum of the Euclidean distances, which enables a fair comparison of two sequences. The smaller the DTW Euclidean distance, the greater the similarity between the two sequences. A simple way to associate a novel gaze object sequence with a specific subtask is to first calculate the DTW Euclidean distance between the novel sequence and a characteristic sequence (generated using the DBA process) for each of the six candidate subtasks and to then select the subtask label that results in the smallest DTW Euclidean distance.
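The labeling rule amounts to a nearest-template classifier, sketched below. The compact DTW routine is included only to keep the example self-contained, and the `templates` dictionary, function names, and optional `elapsed_frames` truncation (used to compare both sequences over the same elapsed time) are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(X, Y):
    """Minimum sum of Euclidean distances over warping paths; X is (S, U), Y is (S, V)."""
    U, V = X.shape[1], Y.shape[1]
    cost = np.linalg.norm(X[:, :, None] - Y[:, None, :], axis=0)
    D = np.full((U + 1, V + 1), np.inf)
    D[0, 0] = 0.0
    for u in range(1, U + 1):
        for v in range(1, V + 1):
            D[u, v] = cost[u - 1, v - 1] + min(D[u - 1, v - 1], D[u - 1, v], D[u, v - 1])
    return float(D[U, V])


def label_subtask(novel, templates, elapsed_frames=None):
    """Return (best_label, distances) for a novel gaze object sequence.

    templates maps each subtask label to its characteristic (DBA) sequence.
    If elapsed_frames is given, both sequences are truncated to that many frames.
    """
    distances = {}
    for label, template in templates.items():
        a, b = novel, template
        if elapsed_frames is not None:
            a, b = novel[:, :elapsed_frames], template[:, :elapsed_frames]
        distances[label] = dtw_distance(a, b)
    best = min(distances, key=distances.get)
    return best, distances
```

Calling the function with increasing values of `elapsed_frames` gives the distance to each candidate as a function of elapsed time.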


**Figure 8** shows a novel gaze object sequence and its DTW Euclidean distance with respect to each of the candidate DBA sequences (one for each of six subtasks). The DTW Euclidean distance is reported as a function of the (equal) elapsed times for the novel and DBA gaze object sequences. This enables us to relate recognition accuracy to the percent of a subtask that has elapsed and to comment on the feasibility of real-time action recognition. For instance, for Subtask 4 ("transfer water from pitcher to mug using spoon"), the DTW Euclidean distance between the novel gaze object sequence and the correct candidate DBA sequence does not clearly separate itself from the other five DTW distances until 30% of the novel gaze object sequence has elapsed for the specific case shown (**Figure 8**). Subtask recognition accuracy generally increases as the elapsed sequence time increases. **Figure 8** illustrates how a primitive action recognition approach could be used to label a subtask based on a gaze object sequence alone. However, only one representative novel gaze object sequence was shown as an example.

In order to address the accuracy of the approach as applied to all 43 gaze object sequences, we used a leave-one-out approach. First, one gaze object sequence was treated as an unlabeled, novel sequence. Dynamic time warping barycenter averaging was applied to the remaining sequences. The DTW Euclidean distance was calculated between the novel and candidate DBA sequences, and the pair with the smallest DTW distance was used to label the novel sequence. This process was repeated for each of the gaze object sequences. The DTW distance was calculated using equal elapsed times for the novel and DBA sequences.

Figure 8 | (A) A representative novel gaze object sequence is shown. The colors in the figure correspond to the color-coded objects in Figure 1C. (B) A DBA gaze object sequence is shown for Subtask 4, which is the correct subtask label for the novel gaze object sequence shown in panel (A). (C) The DTW Euclidean distance is shown for the comparisons of a novel gaze object sequence and the DBA sequence for each of the six subtasks. The DTW distance was calculated using equal elapsed times for the novel and DBA sequences. The lowest DTW distance would be used to apply a subtask label. Subtask recognition accuracy generally increases as the elapsed sequence time increases.

The resulting recognition accuracy, precision, and recall for each subtask are reported in **Figure 9** as a function of the percent of the subtask that has elapsed. Accuracy represents the fraction of sequences that are correctly labeled. Precision represents the fraction of identified sequences that are relevant to Subtask *i*. Recall represents the fraction of relevant sequences that are identified (Manning et al., 2008)

$$\text{accuracy}\_{i} = \frac{\text{TP}\_{i} + \text{TN}\_{i}}{\text{TP}\_{i} + \text{TN}\_{i} + \text{FP}\_{i} + \text{FN}\_{i}},\tag{2}$$

$$\text{precision}\_{i} = \frac{\text{TP}\_{i}}{\text{TP}\_{i} + \text{FP}\_{i}},\tag{3}$$

$$\text{recall}\_{i} = \frac{\text{TP}\_{i}}{\text{TP}\_{i} + \text{FN}\_{i}}.\tag{4}$$

TP*i*, TN*i*, FP*i*, and FN*i* represent the number of true positive, true negative, false positive, and false negative sequences when attempting to identify all sequences associated with Subtask *i*. For example, consider the task of identifying the 43 sequences relevant to Subtask 1 out of the total of (43\*6) unlabeled sequences. Using all sequence data, at 100% elapsed time of a novel gaze object sequence, the classifier correctly labeled 36 of the 43 relevant sequences as Subtask 1, but also labeled 10 of the

(A–F). The characteristic gaze object sequence is shown above each subplot. The colors in the sequence correspond to the objects shown in Figure 1C.

(43\*5) irrelevant sequences as Subtask 1. In this case, TP1 = 36, TN1 = 205, FP1 = 10, and FN1 = 7. Using Eqs 2–4, this results in an accuracy of 93.4%, precision of 78.3%, and recall of 83.7% for Subtask 1, as shown in **Figure 9A**.
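The worked example for Subtask 1 can be checked by evaluating Eqs 2–4 directly with the counts given above. The function name is chosen for this example only.

```python
def one_vs_rest_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall for one subtask treated as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


acc, prec, rec = one_vs_rest_metrics(tp=36, tn=205, fp=10, fn=7)
print(f"{acc:.1%} {prec:.1%} {rec:.1%}")  # 93.4% 78.3% 83.7%
```

The four counts sum to 258, which matches the (43\*6) sequences in the example.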

**Figure 10** shows a confusion matrix that summarizes the subtask labeling performance of our simple action recognition algorithm at 100% of the elapsed time for the novel and DBA gaze object sequences. Predictions of subtask labels (columns) are compared to the true subtask labels (rows). Consider again the task of identifying the 43 sequences relevant to Subtask 1. TP1 is shown as the first diagonal element in the confusion matrix (row 1, column 1). FP1 and FN1 are the sums of the off-diagonal elements in the first column and first row, respectively.
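The relationship between the confusion matrix and the per-subtask counts can be sketched as below. The 3 × 3 matrix in the usage lines is a made-up toy example, not the data of Figure 10, and the function name is hypothetical.

```python
import numpy as np


def counts_from_confusion(cm, i):
    """Return (TP, TN, FP, FN) for class index i.

    cm[r, c] is the number of sequences with true label r that were predicted as c.
    """
    cm = np.asarray(cm)
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp   # off-diagonal entries of column i
    fn = cm[i, :].sum() - tp   # off-diagonal entries of row i
    tn = cm.sum() - tp - fp - fn
    return int(tp), int(tn), int(fp), int(fn)


# Toy matrix for illustration only (rows: true labels, columns: predictions).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 3, 7]]
print(counts_from_confusion(toy_cm, 0))  # (8, 18, 2, 2)
```

The returned counts can be passed directly to Eqs 2–4.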

### DISCUSSION

### Gaze Fixation Duration and Saccade Size May Reflect Differences in Visual Attention

Eye movements were investigated at the action unit level through gaze fixation duration and saccade size. For gaze fixation duration, both "pour" and "stir" were statistically significantly different from the other action unit verb groups (**Figure 4A**). The median normalized gaze fixation duration values for "pour" and "stir" were, respectively, 41 and 33% greater than the largest median duration value of the "reach," "pick up," "set down," and "move" verb groups (36% for "move"). The lengthier gaze fixation durations could be due to the fact that pouring and stirring simply took longer than the other movements. The trends could also indicate that more visual attention is required for successful performance of pouring and stirring. For instance, pouring without spilling and stirring without splashing might require greater manipulation accuracy than reaching, picking up, setting down, or moving an object. However, based on the data collected, it is unknown whether subjects were actively processing visual information during these fixation periods. Gaze fixation durations could also be affected by object properties, such as size, geometry, color, novelty, etc. For instance, fixation durations might be longer for objects that are fragile, expensive, or sharp as compared to those for objects that are durable, cheap, or blunt. The effects of object properties on gaze fixation duration and saccade size require further investigation.

Figure 10 | The confusion matrix is shown for 100% of the elapsed time of a novel gaze object sequence for each subtask. Predicted subtask labels (columns) are compared to the true subtask labels (rows). Each subtask has a total of 43 relevant sequences and (43\*5) irrelevant sequences. Each shaded box lists the number of label instances and parenthetically lists the percentage of those instances out of 43 relevant subtasks.

For saccade size, both "move" and "stir" were statistically significantly different from the other action unit verb groups (**Figure 4B**). The relatively large saccade size for "move" was likely a function of the distance by which the manipulated objects were moved during the experimental task. The relatively small saccade size for "stir" (4.7° ± 2.7°) could be due to the small region associated with the act of stirring within a pitcher and the fact that subjects did not follow the cyclic movements of the spoon with their gaze during stirring.

The concept of "quiet eye," originally introduced in the literature with regard to the cognitive behaviors of elite athletes, has been used to differentiate between expert and novice surgeons (Harvey et al., 2014). Quiet eye has been defined as "the final fixation or tracking gaze that is located on a specific location or object in the visuomotor workspace within 3° of the visual angle for ≥100 ms" (Vickers, 2007). It has been hypothesized that quiet eye is a reflection of a "slowing down" in cognitive planning (not body movement speed) that occurs when additional attention is paid to a challenging task (Moulton et al., 2010). Based on the gaze fixation duration trends (**Figure 4A**), one might hypothesize that pouring and stirring require additional attention. Yet, "stir" was the only verb group that exhibited a small saccade size in the range reported for quiet eye. We are not suggesting that stirring is a special skill that can only be performed by experts; we would not expect a wide range of skill sets to be exhibited in our subject pool for iADL. Nonetheless, it could be reasoned that certain action units may require more visual attention than others and that gaze fixation and saccade size could assist in recognition of such action units employed during everyday tasks.

### Gaze Saliency Maps Encode Action-Relevant Information at the Subtask and Action Unit Levels

Gaze saliency maps at the subtask level can be used to represent gaze fixation distribution across multiple objects. The gaze saliency maps for the six subtasks (**Figure 5**) supported Hayhoe and Ballard's finding that gaze fixation during task completion is rarely directed outside of the objects required for the task (Hayhoe and Ballard, 2005). Considering Subtask 4 ("transfer water from pitcher to mug using spoon"), the objects comprising the majority of the gaze object percentage pie chart (**Figure 5D**) were grasped and manipulated (spoon) or were directly affected by an action being performed by a manipulated object (pitcher and mug). While the table was not manipulated, it was often affected by action units that required the picking up or setting down of an object, as for the pitcher lid, spoon, and pitcher in Subtasks 1, 2, and 6 (**Figures 5A,B,F**), respectively. The gaze fixation percentage for the table was dwarfed by the importance of other objects in Subtasks 4 and 5 (**Figures 5D,E**).

In some cases, a gaze saliency map could be easily associated with a subtask. For instance, gaze saliency was uniquely, simultaneously intense on the spoon bowl and tip, inner wall of the mug, and inner wall of the pitcher for Subtask 4 ("transfer water from pitcher to mug using spoon") (**Figure 5D**). In other cases, differences between gaze saliency maps were subtle. For example, the gaze saliency maps were quite similar for the inverse subtasks "remove pitcher lid" and "replace pitcher lid" (**Figures 5A,E**). In both cases, gaze saliency was focused near the handle of the pitcher lid and the upper rim of the pitcher. However, gaze fixation was slightly more intense near the pitcher spout for Subtask 5 ("replace pitcher lid") because subjects spent time to carefully align the slots in the pitcher lid with the spout for the "pour liquid into mug" Subtask 6 that was to immediately follow.

Likewise, the gaze saliency maps for Subtask 2 ("move spoon into pitcher") and Subtask 3 ("stir inside pitcher") were distinguished only by the subtle difference in gaze fixation distribution on the spoon (**Figures 5B,C**). The diffuse and homogeneous distribution across the entirety of the spoon for Subtask 2 was contrasted by a focused intensity on the bowl of the spoon for stirring. This was because the "reach for," "pick up," and "move" action units performed with the spoon were summed over time to produce the gaze saliency map at the subtask level. Given that the details of each action unit's unique contribution to the saliency map become blurred by temporal summation, it is worth considering gaze saliency maps at a finer temporal resolution, at the action unit level. Due to the short duration of action units (approximately 1 s long), the gaze saliency maps at the action unit level only involve one object at a time. A few representative gaze saliency maps for different action units are shown in **Figure 11**. The RGB color intensity maps were summed across subjects and then normalized to the [0, 1] range, with 0 as black and 1 as red, according to the duration of the action unit.

Some gaze saliency maps could also be easily associated with specific action units. For instance, gaze saliency intensity was greatest at the top of the pitcher for the action unit "reach for pitcher," but greatest at the bottom for "set down pitcher" (**Figure 11C**). By contrast, the gaze saliency maps for the pitcher lid were similar for action units "pick up pitcher lid" and "insert pitcher lid into pitcher." Subtle differences were observed, such as more focused gaze intensity near the slots in the lid, in preparation for the "pour liquid into mug" Subtask 6 that was to immediately follow. Gaze saliency maps for different action units were also similar for the mug (**Figure 11A**), possibly due to its aspect ratio. Not only is the mug a relatively small object but also its aspect ratio from the subject's viewpoint is nearly one. During both "reach for mug" and "set down mug," gaze fixation was spread around the mug's centroid. This was surprising, as we had expected increased intensity near the mug's handle or base for the "reach" and "set down" action units, respectively, based on the findings of Belardinelli et al. (2015). There are a couple of possible explanations for this. First, the Belardinelli et al. study was conducted with a 2D computer display and subjects were instructed to mimic manipulative actions. In this work, subjects physically interacted with and manipulated 3D objects. It is also possible that subjects grasped the mug with varying levels of precision based on task requirements (or lack thereof). For instance, a mug can be held by grasping its handle or its cylindrical body. Had the task involved a hot liquid, for example, perhaps subjects would have grasped and fixated their gaze on the handle of the mug for a longer period.

Although 3D gaze saliency maps are not necessarily unique for all subtasks and action units, it is likely that a combination of the gaze saliency maps for a subtask and its constituent action units could provide additional temporal information that would enable recognition of a subtask. While beyond the scope of this work, we propose that a sequence of gaze saliency maps over time could be used for action recognition. The time series approaches presented for the analysis of gaze object sequences could similarly be applied to gaze saliency map sequences.

### Practical Considerations and Limitations of Gaze Saliency Maps

If the dynamic tracking of 3D gaze saliency maps is to be practically implemented, one must address the high computational expense associated with tracking, accessing, and analyzing dense 3D point clouds. In this work, the 3D point clouds for the spoon and pitcher comprised approximately 3,000 and 20,000 points, respectively. At least two practical modifications could be made to the gaze saliency map representation. First, parametric geometric shapes could be substituted for highly detailed point clouds of rigid objects, especially if fine spatial resolution is not critical for action recognition. The use of geometric shapes could also enable one to analytically solve for the intersection point(s) between the object and gaze vector. Second, gaze fixation can be tracked for a select subset of regions or segments, such as those associated with "object affordances," which describe actions that can be taken with an object (Gibson, 1977), or "grasp affordances," which are defined as "object-gripper relative configurations that lead to successful grasps" (Detry et al., 2009). Computational effort could then be focused on regions that are most likely to be task-relevant, such as the spout, rim, handle, and base of a pitcher. Additionally, techniques can be leveraged from computer-based 3D geometric modeling. For example, triangle meshes and implicit surfaces have been used for real-time rendering of animated characters (Leclercq et al., 2001). A similar approach could be used to simplify the 3D point clouds. In addition to tracking the shape and movement of an object, one could track the homogeneous properties (e.g., RGB color associated with gaze fixation duration) of patch elements of surfaces. The spatial resolution of each gaze saliency map could be tuned according to the task-relevant features of the object and reduced to the minimal needs for reliable action recognition.
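As an illustration of the first modification, the intersection of a gaze vector with a parametric shape has a closed-form solution. The sketch below uses a sphere as the simplest case; the choice of a sphere and the function name are assumptions for the example, and a cylinder or box would be handled analogously.

```python
import numpy as np


def ray_sphere_intersection(origin, direction, center, radius):
    """Nearest intersection of a gaze ray with a sphere, or None if there is none.

    Solves |o + t d - c|^2 = r^2 for the smallest t >= 0, with d of unit length.
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    oc = o - np.asarray(center, dtype=float)
    b = float(np.dot(oc, d))
    c = float(np.dot(oc, oc)) - radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = np.sqrt(disc)
    for t in (-b - root, -b + root):
        if t >= 0.0:
            return o + t * d
    return None  # the sphere lies entirely behind the ray origin
```

A closed-form test of this kind replaces a search over thousands of points with a handful of arithmetic operations per object and frame.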

One limitation of this work is that we cannot comment on the subject's true focal point or whether subjects were actively processing visual information. A gaze vector may pass through multiple objects, or even through materials that are not rigid objects (e.g., a stream of flowing water). We calculated the intersection points between a gaze vector and objects in its path and then treated the closest intersection point to the user as a gaze fixation point. This approach may not work if some of the task-relevant objects are transparent and subjects look through one object to visually attend to a more distant object. In this work, objects sometimes passed through the path of a stationary gaze vector, but may not have been the focus of active visual attention. For example, the gaze saliency map for Subtask 3 ("stir inside pitcher") displayed regions of greater intensity on both the bowl of the spoon and the inner wall of the pitcher (**Figure 5C**). However, the egocentric camera attached to the eye tracker revealed that the gaze fixation point remained near the water level line in the pitcher. Since the spoon was moved cyclically near the inner wall of the pitcher, in the same region as the surface of the water, the gaze fixation point alternated between the spoon and the pitcher. As a result, both the spoon and pitcher gaze saliency maps were affected. In one case, a subject's gaze fixation point was calculated as being located on the outer wall of the pitcher during stirring. This interesting case highlights the fact that a direct line of sight (e.g., to the spoon, water, or inner pitcher surface) may not be necessary for subtask completion, and mental imagery ("seeing with the mind's eye") may be sufficient (Pearson and Kosslyn, 2013).

Future work should address methods for enhancing the robustness of action recognition algorithms to occlusions. For example, if a gaze object is briefly occluded by a moving object that passes through the subject's otherwise fixed field of view, an algorithm could be designed to automatically disregard the object as noise to be filtered out. In addition, a more advanced eye tracker and/or calibration process could be leveraged to estimate focal length. Focal length could be combined with 3D gaze vector direction to increase the accuracy of gaze object identification in cases where the 3D gaze vector intersects multiple objects.

Human gaze behavior "in the wild" will differ to some (as yet unknown) extent from the gaze behavior observed in our laboratory setting. Our use of black curtains and the provision of only task-relevant objects enabled the standardization of the experimental setup across subjects. However, this protocol also unrealistically minimized visual clutter, the presence of novel objects, and distractions to the subject. In a more natural setting, one's gaze vector could intersect with task-irrelevant objects in the scene. This would result in the injection of noise into the gaze object sequence, for example, and could decrease the speed and/or accuracy of action recognition. Probabilistic modeling of the noise could alleviate this challenge.

### The Gaze Object Sequence Can Be Leveraged for Action Recognition to Advance Human–Robot Collaborations

During everyday activities, eye movements are primarily associated with task-relevant objects (Land and Hayhoe, 2001). Thus, identification of gaze objects can help to establish a context for specific actions. Fathi et al. (2012) showed that knowledge of gaze location significantly improves action recognition. However, action recognition accuracy was limited by errors in the extraction of gaze objects from egocentric camera video data (e.g., failing to detect objects or detecting irrelevant objects in the background), and gaze objects were not treated explicitly as features for action recognition. Moreover, model development for gaze-based action recognition is challenging due to the stochastic nature of gaze behavior (Admoni and Srinivasa, 2016). Using objects tagged with fiducial markers and gaze data from 2D egocentric cameras, Admoni and Srinivasa presented a probabilistic model for the detection of a goal object based on object distance from the center of gaze fixation. In this work, we propose to leverage 3D gaze tracking information about the identity of gaze objects in concert with the temporal sequence in which gaze objects were visually regarded to improve the speed and accuracy of automated action recognition.

In the context of human–robot collaboration, the gaze object sequence could be used as an intuitive, non-verbal control signal by a human operator. Alternatively, the gaze object sequence could be provided passively to a robot assistant that continuously monitors the state of the human operator and intervenes when the human requires assistance. A robot that could infer human intent could enable more seamless physical interactions and collaborations with human operators. For example, a robot assistant in a space shuttle could hand an astronaut a tool during a repair mission, just as a surgical assistant might provide support during a complicated operation. Maeda et al. (2014) introduced a probabilistic framework for collaboration between a semi-autonomous robot and human co-worker. For a box assembly task, the robot decided whether to hold a box or to hand over a screwdriver based on the movements of the human worker. As there were multiple objects involved in the task, the integration of the gaze object sequence into the probabilistic model could potentially improve action recognition accuracy and speed.

The practical demonstration of the usefulness of the gaze object sequence is most likely to occur first in a relatively structured environment, such as a factory setting. Despite the unpredictability of human behavior, there are consistencies on a manufacturing line that suggest the feasibility of the gaze object sequence approach. The parts and tools used during manual manufacturing operations are uniform in size and shape and limited in number. Although the speed with which a task is completed may vary, the task itself is repetitive. Luo et al. (2017) have demonstrated human–robot collaboration for industrial manipulation tasks for which human reaching motions were predicted to enable robot collaboration without collision in a small, shared workspace. In that work, the robot had access to real-time information about the human collaborator's upper limb kinematics, such as palm and arm joint center positions. Focusing on the safety of human–robot collaboration, Morato et al. (2014) developed a framework that uses a collision avoidance strategy to assist human workers performing an assembly task in close proximity with a robot arm. Numerous RGB-D cameras were used to track the location and configuration of humans within the collaborative workspace. The common theme of such approaches is to track human kinematics and infer intent from kinematic data alone. The additional use of the gaze object sequence could enable human intent to be inferred at an earlier stage and further advance safety and efficiency for similar types of human–robot collaboration tasks.

The gaze object sequence could also be demonstrated in the familiar environment of someone's home if a recognition system were properly trained on commonly used objects, where the objects are typically located (e.g., kitchen vs. bathroom), and how they are used. The performance of household robots will largely depend on their ability to recognize and localize objects, especially in complex scenes (Srinivasa et al., 2012). Recognition robustness and latency will be hampered by large quantities of objects, the degree of clutter, and the inclusion of novel objects in the scene. The gaze object sequence could be used to address challenges posed by the presence of numerous objects in the scene. While the combinatorial set of objects and actions could be large, characteristic gaze object sequences for frequently used subject-specific iADLs could be utilized to quickly prune the combinatorial set.

Up to now, we have focused primarily on the task-based aspects of gaze tracking for human–robot collaboration. However, gaze tracking could also provide much needed insight into intangible aspects such as human trust in robot collaborators (Jenkins and Jiang, 2010). Our proposed methods could be used to quantify differences in human gaze behavior with and without robot intervention and could enhance studies on the effects of user familiarity with the robot, human vs. non-human movements, perceived risk of robot failure, etc. Consider, for example, a robot arm that is being used to feed oneself (Argall, 2015). Such a complicated task requires the safe control of a robot near sensitive areas such as the face and mouth and may also be associated with a sense of urgency on the part of the user. A gaze object sequence could reveal high-frequency transitions between task-relevant objects and the robot arm itself, which could indicate a user's impatience with the robot's movements or possibly a lack of trust in the robot and concerns about safety. As the human–robot collaboration becomes more seamless and safe, the frequency with which the user visually checks the robot arm may decrease. Thus, action recognition algorithms may need to be tuned to inter-subject variability and adapted to intra-subject variability as the beliefs and capabilities of the human operator change over time.
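As a rough illustration of how such visual checking could be quantified, the sketch below counts gaze shifts onto the robot per second of recording. It assumes the gaze object sequence is stored as a list of object labels sampled at a fixed rate; the function name, the `robot_arm` label, and the example data are hypothetical and are not taken from the study.

```python
def robot_check_rate(sequence, sample_rate_hz, robot_label="robot_arm"):
    """Return the number of gaze shifts onto the robot per second.

    sequence: list of object labels, one per sample (assumed format).
    sample_rate_hz: sampling frequency of the sequence (assumed constant).
    A shift is counted each time the label changes from any other object
    to robot_label; a sequence that starts on the robot adds no shift.
    The sequence must be non-empty.
    """
    checks = sum(
        1
        for prev, cur in zip(sequence, sequence[1:])
        if cur == robot_label and prev != robot_label
    )
    duration_s = len(sequence) / sample_rate_hz
    return checks / duration_s


# Hypothetical one-second recording at 10 Hz during a feeding task.
seq = ["spoon", "robot_arm", "robot_arm", "mouth", "robot_arm",
       "spoon", "spoon", "spoon", "spoon", "spoon"]
rate = robot_check_rate(seq, sample_rate_hz=10)
```

Tracking this rate across sessions is one possible way to observe whether a user checks the robot less often as familiarity grows.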

Other potential applications of the gaze object sequence include training and skill assessment. For instance, Westerfield et al. (2015) developed a framework that combines Augmented Reality with an Intelligent Tutoring System to train novices on computer motherboard assembly. *Via* a head-mounted display, trainees were provided real-time feedback on their performance based on the relative position and orientation of tools and parts during the assembly process. Such a system could be further enhanced by, for example, using an expert's gaze object sequence to cue trainees *via* augmented reality and draw attention to critical steps in the assembly process or critical regions of interest during an inspection process. Gaze object sequences could also be used to establish a continuum of expertise with which skill level can be quantified and certified. Harvey et al. (2014) described the concepts of "quiet eye" and "slowing down" observed with surgeons performing thyroid lobectomy surgeries. Interestingly, expert surgeons fixated their gaze on the patient's delicate laryngeal nerve for longer periods than novices when performing "effortful" surgical tasks that required increased attention and cognition. Gaze behavior has also been linked with sight reading expertise in pianists (Truitt et al., 1997). Gaze fixation duration on single-line melodies was shorter for more skilled sight-readers than less skilled sight-readers.

In short, the gaze object sequence generated from 3D gaze tracking data has been demonstrated as a potentially powerful feature for action recognition. By itself, the gaze object sequence captures high-level spatial and temporal gaze behavior information. Moreover, additional features can be generated from the gaze object sequence. For instance, gaze object percentage can be extracted by counting instances of objects in the gaze object sequence. Gaze fixation duration and saccades from one object to another can be extracted from the gaze object sequence. Even saccades to different regions of the same object could potentially be identified if the resolution of the gaze object sequence were made finer through the use of segmented regions of interest for each object (e.g., spout, handle, top, and base of a pitcher).
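The derived features mentioned above can be sketched in a few lines. The example assumes the gaze object sequence is a list of object labels sampled at a constant rate, and it treats an uninterrupted run of one label as a proxy for time spent fixating that object; the function name and example labels are hypothetical, and this is not the processing pipeline used in the study.

```python
from collections import Counter
from itertools import groupby


def gaze_object_features(sequence, sample_rate_hz):
    """Derive simple features from a gaze object sequence.

    sequence: list of object labels, one per sample (assumed format).
    sample_rate_hz: sampling frequency of the sequence (assumed constant).
    """
    # Gaze object percentage: share of samples spent on each object.
    counts = Counter(sequence)
    total = len(sequence)
    percentage = {obj: 100.0 * n / total for obj, n in counts.items()}

    # Collapse consecutive identical labels into (object, dwell time in s).
    runs = [
        (obj, sum(1 for _ in group) / sample_rate_hz)
        for obj, group in groupby(sequence)
    ]

    # Each boundary between runs is a gaze shift from one object to another.
    transitions = Counter((a[0], b[0]) for a, b in zip(runs, runs[1:]))
    return percentage, runs, transitions


# Hypothetical sequence sampled at 10 Hz.
seq = ["pitcher"] * 4 + ["lid"] * 2 + ["pitcher"] * 2 + ["mug"] * 2
percentage, runs, transitions = gaze_object_features(seq, sample_rate_hz=10)
```

If objects were segmented into regions of interest, the same code would apply with labels such as `pitcher_handle` in place of whole-object labels.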

### Practical Considerations and Limitations of Gaze Object Sequences

In this work, we have presented a simple proof-of-concept method for action recognition using a DTW Euclidean distance metric drawn from comparisons between novel and characteristic gaze object sequences. In the current instantiation, novel and characteristic sequences were compared using the same elapsed time (percentage of the entire sequence) (**Figure 8**). This approach was convenient for a *post hoc* study of recognition accuracy as a function of time elapsed. However, in practice, the novel gaze object sequence will roll out in real-time and we will not know *a priori* what percent of the subtask has elapsed. To address this, we propose the use of parallel threads that calculate the DTW Euclidean distance metric for comparisons of the novel sequence with different portions of each characteristic sequence. For instance, one thread runs a comparison with the first 10% of one characteristic gaze object sequence; another thread runs a comparison with the first 20% of the same characteristic gaze object sequence, etc. Such an approach would also address scenarios in which an individual happens to be performing a subtask faster than the population, whose collective behavior is reflected in each characteristic gaze object sequence. For example, it can be seen that the novel gaze object sequence in **Figure 8A** has a pattern similar to that of the characteristic gaze object sequence in **Figure 8B**. However, the individual subject is initially performing the subtask at a faster rate than the population average. The (yellow, blue, black, red, etc.) pattern occurs within the first 10% of the novel sequence, but does not occur until 30% of the characteristic sequence has elapsed. The delayed recognition of the subtask could be addressed using the multi-thread approach described above (**Figure 8**). To further address the computational expense commonly associated with DTW algorithms, one could implement an "unbounded" version of DTW that improves the method for finding matching sequences, which occur arbitrarily within other sequences (Anguera et al., 2010).
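The prefix-comparison idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: each time step is one-hot encoded over a hypothetical object vocabulary, the local cost is the Euclidean distance between time steps, and the helper names (`one_hot`, `dtw_distance`, `best_prefix_match`) are invented for the example. Raw DTW costs are compared without length normalization, which a practical system might need to add.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def one_hot(labels, vocab):
    """Encode a label sequence as rows of one-hot vectors (assumed encoding)."""
    return np.eye(len(vocab))[[vocab.index(x) for x in labels]]


def dtw_distance(a, b):
    """Classic DTW cost with a Euclidean local cost between time steps."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def best_prefix_match(novel, characteristic, percents=range(10, 101, 10)):
    """Compare the novel sequence with growing prefixes of one characteristic
    sequence, one comparison per worker thread, and return the best prefix."""
    prefixes = [
        characteristic[: max(1, len(characteristic) * p // 100)] for p in percents
    ]
    # Each comparison is independent. Threads mirror the proposed structure;
    # in CPython the interpreter lock limits the speed-up for this pure-Python loop.
    with ThreadPoolExecutor() as pool:
        distances = list(pool.map(lambda pre: dtw_distance(novel, pre), prefixes))
    best = int(np.argmin(distances))  # first minimum wins on ties
    return list(percents)[best], distances[best]


# Hypothetical data: the subject reaches the pitcher sooner than the population.
vocab = ["lid", "pitcher", "mug"]
characteristic = one_hot(["lid"] * 3 + ["pitcher"] * 3 + ["mug"] * 4, vocab)
novel = one_hot(["lid", "pitcher"], vocab)
percent, distance = best_prefix_match(novel, characteristic)
```

In this toy case the two-sample novel sequence aligns best with the first 40% of the characteristic sequence, which illustrates how prefix comparisons can absorb differences in execution speed.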

For human–robot collaborations, the earlier that a robot can recognize the intent of the human, the more time the robot will have to plan and correct its actions for safety and efficacy. Thus, practical limitations associated with the computational expense of real-time gaze object sequence recognition must be addressed. At the least, comparisons of a novel sequence unfolding in real-time could be made with a library of characteristic subtask sequences using GPUs and parallel computational threads (one thread for each distinct comparison). The early recognition of a novel subtask is not just advantageous for robot planning and control. The computational expense of DTW increases for longer sequences. Thus, the sooner a novel sequence can be recognized, the less time is spent on calculating the proposed DTW Euclidean metric. Because DTW uses dynamic programming to find the best warping path, its computational complexity is quadratic in sequence length. While not implemented in this work, the computational expense of the DTW process could be further reduced by leveraging a generalized time warping technique that temporally aligns multimodal sequences of human motion data while maintaining linear complexity (Zhou and De la Torre, 2012).
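For reference, the quadratic cost can be read directly from the standard DTW recurrence. The notation below ($x_i$, $y_j$, $D$, $N$, $M$) is introduced here for illustration and assumes a Euclidean local cost between time steps; it is not taken from the methods of this study.

$$
D(i,j) = \lVert x_i - y_j \rVert_2 + \min\bigl\{ D(i-1,j),\; D(i,j-1),\; D(i-1,j-1) \bigr\}, \qquad D(0,0) = 0, \quad D(i,0) = D(0,j) = \infty \ \text{ for } i, j \ge 1
$$

For a novel sequence of length $N$ and a characteristic sequence of length $M$, the DTW cost is $D(N,M)$. Each of the $N \times M$ table entries is evaluated once with a constant number of operations, which gives $O(NM)$ time, or $O(N^2)$ for sequences of similar length.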

### Potential Advancements for a Gaze Object Sequence-Based Action Recognition System

As expected, recognition accuracy increased as more of the novel gaze object sequence was compared with each characteristic gaze object sequence (**Figure 9**). However, the simple recognition approach presented here is not perfect. Even when an entire novel gaze object sequence is compared with each characteristic gaze object sequence, the approach only achieves an accuracy of 96.4%, precision of 89.5%, and recall of 89.2% averaged across the six subtasks. The confusion matrix (**Figure 10**) shows which subtasks were confused with one another even after 100% elapsed time. Although the percentage of incorrect subtask label predictions is low, the subtasks that share the same gaze objects were confused the most. For instance, Subtask 1 ("remove pitcher lid") and Subtask 5 ("replace pitcher lid") were occasionally confused with one another. It is hypothesized that the training of a sophisticated machine learning classifier could improve the overall accuracy of the recognition results, especially if additional features were provided to the classifier. Potential additional features include quantities extracted from upper limb kinematics and other eye tracker data, such as 3D gaze saliency maps.
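One way such averaged metrics can be obtained from a confusion matrix is sketched below. The sketch assumes that accuracy, precision, and recall are each computed per subtask in a one-vs-rest fashion and then averaged; that convention is an assumption, not something stated in the text. The function name and the 3-class matrix are hypothetical and are not data from this study.

```python
import numpy as np


def one_vs_rest_metrics(confusion):
    """Average per-class accuracy, precision, and recall.

    confusion[i, j] = number of trials of true class i predicted as class j.
    Every class must be predicted at least once, otherwise its precision
    is undefined (division by zero).
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp  # true members of the class that were missed
    tn = total - tp - fp - fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy.mean(), precision.mean(), recall.mean()


# Hypothetical 3-class confusion matrix with 10 trials per class.
cm = [[9, 1, 0],
      [0, 10, 0],
      [2, 0, 8]]
acc, prec, rec = one_vs_rest_metrics(cm)
```

Under this convention, averaged accuracy exceeds averaged recall because each class's true negatives count toward its accuracy. With $K$ equally represented classes and overall accuracy $a$, the averaged one-vs-rest accuracy equals $1 - 2(1 - a)/K$; for $K = 6$ and $a = 0.892$ this gives $0.964$, which is consistent with the values reported above if the subtasks were equally represented.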

As with the processing of any sensor data, there are tradeoffs with speed and accuracy in both the spatial and temporal domains. In its current instantiation, the gaze object sequence contains rich temporal information, but at the loss of spatial resolution; entire objects are considered rather than particular regions of objects. By contrast, the 3D gaze saliency map and gaze object percentage contain rich spatial information, but at the loss of temporal resolution due to the convolution of eye tracker data over a lengthy period of time. For practical purposes, we are not suggesting that spatial and temporal resolution should be maximized. In practice, an action recognition system need not be computationally burdened with the processing of individual points in a 3D point cloud or unnecessarily high sampling frequencies. However, one could increase spatial resolution by segmenting objects into affordance-based regions (Montesano and Lopes, 2009), or increase temporal resolution by considering the temporal dynamics of action units rather than subtasks.

While object recognition from 2D egocentric cameras is an important problem, solving this problem was not the focus of the present study. As such, we bypassed challenges of 2D image analysis such as scene segmentation and object recognition, and used a marker-based motion capture system to track each known object in 3D. Data collection was performed in a laboratory setting with expensive eye tracker and motion capture equipment. Nonetheless, the core concepts presented in this work could be applied in non-laboratory settings using low-cost equipment such as consumer-grade eye trackers, Kinect RGB-D cameras, and fiducial markers (e.g., AprilTags and RFID tags).

### CONCLUSION

The long-term objective of this work is to advance human–robot collaboration by (i) facilitating the intuitive, gaze-based control of robots and (ii) enabling robots to recognize human actions, infer human intent, and plan actions that support human goals. To this end, the objective of this study was to identify useful features that can be extracted from 3D gaze behavior and used as inputs to machine learning algorithms for human action recognition. We investigated human gaze behavior and gaze–object interactions in 3D during the performance of a bimanual iADL: the preparation of a powdered drink. Gaze fixation duration was statistically significantly longer for some action verbs, suggesting that some actions such as pouring and stirring may require increased visual attention for task completion. 3D gaze saliency maps, generated with high spatial resolution for six subtasks, appeared to encode action-relevant information at the subtask and action unit levels. Dynamic time warping barycentric averaging was used to create a population-based set of characteristic gaze object sequences that accounted for intra- and inter-subject variability. The gaze object sequence was then used to demonstrate the feasibility of a simple action recognition algorithm that utilized a DTW Euclidean distance metric. Action recognition results (96.4% accuracy, 89.5% precision, and 89.2% recall averaged over the six subtasks) suggest that the gaze object sequence is a promising feature for action recognition whose impact could be enhanced through the use of sophisticated machine learning classifiers and algorithmic improvements for real-time implementation. Future work includes the development of a comprehensive action recognition algorithm that simultaneously leverages features from 3D gaze–object interactions, upper limb kinematics, and hand–object spatial relationships. Robots capable of robust, real-time recognition of human actions during manipulation tasks could be used to improve quality of life in the home as well as quality of work in industrial environments.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the UCLA Institutional Review Board with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the UCLA Institutional Review Board.

### AUTHOR CONTRIBUTIONS

All authors contributed to the conception and design of the study. AF and XW supervised data collection, performed the data analysis, and created the first draft of the figures. All authors wrote sections of the manuscript and contributed to its revision. All authors have read and approved the submitted manuscript.

### ACKNOWLEDGMENTS

The authors thank Allison Walters and Manuel Cisneros for contributions to the initial experimental protocol and equipment setup. The authors thank Sahm Bazargan, Kevin Hsu, John-Pierre Sawma, and Sarah Anaya for assistance with data collection and analysis. Finally, the authors thank Eunsuk Chong, Cheng-Ju Wu, and Eric Peltola for discussions on early drafts of this manuscript.

### FUNDING

This work was supported in part by National Science Foundation Awards 1461547 and 1463960, and Office of Naval Research Award N00014-16-1-2468. Any opinions, findings, conclusions, or recommendations are those of the authors and do not necessarily represent the official views, opinions, or policies of the funding agencies.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at https://www.frontiersin.org/articles/10.3389/frobt.2018.00025/full#supplementary-material.

Video S1 | The Supplemental Video shows the experimental setup and reconstruction of 3D gaze saliency maps for the task of preparing a powdered drink.

*Research*. Available at: http://ycb-benchmarks.s3-website-us-east-1.amazonaws.com/


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Haji Fathaliyan, Wang and Santos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*