# BODY REPRESENTATIONS, PERIPERSONAL SPACE, AND THE SELF: HUMANS, ANIMALS, ROBOTS

EDITED BY: Matej Hoffmann, Alex Pitti, Lorenzo Jamone, Eszter Somogyi and Pablo Lanillos

PUBLISHED IN: Frontiers in Psychology, Frontiers in Neurorobotics and Frontiers in Computational Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-877-2 DOI 10.3389/978-2-88963-877-2


## BODY REPRESENTATIONS, PERIPERSONAL SPACE, AND THE SELF: HUMANS, ANIMALS, ROBOTS

Topic Editors:

Matej Hoffmann, Czech Technical University in Prague, Czechia; Alex Pitti, Université de Cergy-Pontoise, France; Lorenzo Jamone, Queen Mary University of London, United Kingdom; Eszter Somogyi, University of Portsmouth, United Kingdom; Pablo Lanillos, Radboud University Nijmegen, Netherlands

Citation: Hoffmann, M., Pitti, A., Jamone, L., Somogyi, E., Lanillos, P., eds. (2020). Body Representations, Peripersonal Space, and the Self: Humans, Animals, Robots. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-877-2

# Table of Contents

- *How Cognitive Models of Human Body Experience Might Push Robotics* Tim Schürmann, Betty Jo Mohler, Jan Peters and Philipp Beckerle
- *Social Antecedents to the Development of Interoception: Attachment Related Processes are Associated With Interoception* Kristina Oldroyd, Monisha Pasupathi and Cecilia Wainryb
- *Exploring Self-Consciousness From Self- and Other-Image Recognition in the Mirror: Concepts and Evaluation* Gaëlle Keromnes, Sylvie Chokron, Macarena-Paz Celume, Alain Berthoz, Michel Botbol, Roberto Canitano, Foucaud Du Boisgueheneuc, Nemat Jaafari, Nathalie Lavenne-Collot, Brice Martin, Tom Motillon, Bérangère Thirioux, Valeria Scandurra, Moritz Wehrmann, Ahmad Ghanizadeh and Sylvie Tordjman
- *Prerequisites for an Artificial Self* Verena V. Hafner, Pontus Loviken, Antonio Pico Villalpando and Guido Schillaci

# Editorial: Body Representations, Peripersonal Space, and the Self: Humans, Animals, Robots

Matej Hoffmann<sup>1</sup>\*, Pablo Lanillos<sup>2</sup>, Lorenzo Jamone<sup>3</sup>, Alex Pitti<sup>4</sup> and Eszter Somogyi<sup>5</sup>

<sup>1</sup> Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague, Czechia, <sup>2</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands, <sup>3</sup> ARQ (Advanced Robotics at Queen Mary), School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom, <sup>4</sup> Laboratoire ETIS, CY Cergy Paris University, ENSEA, CNRS, UMR8051, Cergy-Pontoise, France, <sup>5</sup> Department of Psychology, Centre for Situated Action & Communication, University of Portsmouth, Portsmouth, United Kingdom

Keywords: body representations, peripersonal space, self, neurorobotics, cognitive developmental robotics, body schema, body image, development of body representations

**Editorial on the Research Topic**

#### **Body Representations, Peripersonal Space, and the Self: Humans, Animals, Robots**

The presence of various "body maps" in the brain has fascinated scientists and the general public alike, spurred by the account of Head and Holmes (1911) and the discovery of the somatotopic representations (the "homunculi") in the primary motor and somatosensory cortices of primates (Leyton and Sherrington, 1917; Penfield and Boldrey, 1937). Neurological conditions and accounts of a whole range of illusions regarding own body perception (e.g., rubber hand illusion, out-of-body experience, apparition) generated both seminal research articles (e.g., Botvinick and Cohen, 1998; Lenggenhager et al., 2007) and public interest. The attention devoted to the representations of the body in the brain has also led to numerous attempts at describing or defining them and proposals of a variety of concepts, such as superficial and postural schema (Head and Holmes, 1911), body schema, body image (Paillard, 1999), corporeal schema, etc. One characteristic common to all these representations is their multimodal nature: they dynamically integrate information from different sensory modalities (visual, tactile, proprioceptive, vestibular, auditory), not excluding motor information (Azañón et al., 2016). However, the concepts of body schema, body image, and many others are umbrella notions for a range of observed phenomena rather than a result of identification of specific mechanisms. The field is thus in a somewhat "chaotic state of affairs" (Berlucchi and Aglioti, 2009), with limited convergence to a common view (Graziano and Botvinick, 2002; Holmes and Spence, 2004). Next to "body space," the space immediately surrounding the body is called peripersonal space. There are two notions associated with this term: (i) a safety margin around the body, and (ii) space within our reach. They may be supported by distinct neuronal substrates (see Cléry et al., 2015, for a survey).
Furthermore, it is not clear to what extent the representation of the "body space" and the space around it are overlapping. They may be "two labels for the same concept" (Cardinali et al., 2009) or rely on a unified representation (Canzoneri et al., 2013). Alternatively, others amass evidence suggestive of their dissociation (Bassolino et al., 2015).

#### Edited by:

Florian Röhrbein, Independent Researcher, Winnenden, Germany

#### Reviewed by:

Manfred Hild, Beuth Hochschule für Technik Berlin, Germany

\*Correspondence: Matej Hoffmann matej.hoffmann@fel.cvut.cz

Received: 08 April 2020 Accepted: 14 May 2020 Published: 16 June 2020

#### Citation:

Hoffmann M, Lanillos P, Jamone L, Pitti A and Somogyi E (2020) Editorial: Body Representations, Peripersonal Space, and the Self: Humans, Animals, Robots. Front. Neurorobot. 14:35. doi: 10.3389/fnbot.2020.00035

This state of affairs calls for collective action by the interdisciplinary research community, and this Research Topic, with articles from Frontiers in Psychology (Cognition), Frontiers in Neurorobotics, and Frontiers in Computational Neuroscience, is an example of such efforts. Infant development constitutes a key viewpoint from which to study body representations. In our collection, this theme is introduced by Philippe Rochat in his review (Rochat) on self-unity constituting the basis of learning and development. Two original research articles target the somatosensory-motor aspects of early infant development: DiMercurio et al. contribute an observation study of spontaneous touches in the first 2 months of life; Chinn et al. study reaching movements to tactile targets. Tamé et al. also focus on somatoperception, this time in adults. The contributions of Banakou et al., Scarpina et al., Nuara et al., and Arnold et al. deal with plasticity and effects of disorders on body representations. Body representations do not develop in isolation but in a social context; these aspects are studied in infants by Drew et al., in adults but in a developmental context by Oldroyd et al. and Keromnes et al., and in adults interacting with a robot, to study the effects of anthropomorphism, by Heijnen et al. Peripersonal space as the frontier of self and modulations thereof are reviewed by Cléry and Ben Hamed. Dürr and Schilling study peripersonal space in stick insects.

The remaining contributions employ robots. The motivation is two-fold. First, following the synthetic methodology ("understanding by building") (Pfeifer and Bongard, 2007; Hoffmann and Pfeifer, 2018), robots can be deployed as embodied computational models of body representations and their development, and can clear up the notoriously muddy waters of the concepts invented to describe body and self representations. This is the general approach of cognitive developmental robotics and neurorobotics (e.g., Asada et al., 2009) and can be applied to body models specifically (see Hoffmann et al., 2010; Schillaci et al., 2016; Lanillos et al., 2017 for surveys). Hafner et al. contribute a conceptual review on the prerequisites for an artificial self. Pugach et al. and Juett and Kuipers present robotic models of peripersonal space representations. Second, the way humans represent their bodies and the space around them provides a proxy for what they expect from a robot collaborator. Hence, a good understanding of these phenomena is the basis for safe and natural human-robot interaction, as studied by Schürmann et al. and also Heijnen et al.

### 1. INFANT DEVELOPMENT

From the first day of life, newborns manifest awareness of their own body as an invariant and organized spatial structure, coupled with an experiential awareness of the self. Reviewing infancy research of the past few decades, Rochat argues that learning and development rest on this primordial and necessary sense of self-unity. In fact, self-unity, as Rochat proposes, could represent important grounding information for artificial learning systems, allowing them to learn rapidly like human children do.

How does an early sense of the body and self manifest itself in infancy? Two research papers in this section studied the question by examining how infants spontaneously touch their own body and how they reach to tactile targets on the skin. In the first paper, DiMercurio et al. show that infants are active explorers of their own body from the first days of life. In a series of observation sessions, few-week-old infants engaged in a high rate of self-touch, contacting about twenty different areas with each hand, frequently moving from one area to the other. The authors propose that early self-generated and deeply embodied sensorimotor experiences form the critical foundation from which future goal-directed behaviors may develop. Chinn et al. investigated the developmental progression of reaching and grasping strategies to vibrotactile targets attached to various parts of the face. In their longitudinal study, they found that infants are more likely to reach to the target with the hand rather than using other effectors or strategies; they also refine their hand postures with age, using the palmar surface or fingers of the hand rather than the dorsum, and grasping the targets more as they become older.

Young infants not only experience their own bodies but also observe other people's bodies and recognize similarities and differences between them. Such interpersonal aspects of body representations may serve to undergird early social learning. In their EEG study, Drew et al. show that the infant brain registers correspondences between infants' own bodies and the bodies of others. Thus, responses to tactile stimulation to the hand or the foot were modulated by simultaneous vision of the corresponding or non-corresponding effector of another person being touched.

### 2. ADULT BODY REPRESENTATIONS, PLASTICITY, AND EFFECT OF NEUROLOGICAL AND MOVEMENT DISORDERS

In their review article, Tamé et al. introduce a new model to describe how tactile processing contributes to a coherent perception of the body as an integrated whole. In a previous model, it was proposed that three types of body representations (the superficial schema, the postural schema, and a model of body size and shape) are required to localize touch in space (Longo et al., 2010). Reviewing the evidence, they now extend this model with two novel dimensions of tactile processing, namely the integration of touch across the two sides of the body and the use of stored proprioceptive information about the location of touch in space (postural priors).

Three clinical research papers examined how the plasticity of the body schema is altered in various neurological conditions. Introducing novel clinical research paradigms based on tool embodiment, graphesthesia tasks, or self-portraits, these articles contribute valuable results to the relatively scarce existing literature concerning the body schema in patients with Parkinson's disease, somatosensory loss, or cerebral palsy. Scarpina et al. explored how the body schema accommodates significant objects or tools in patients with Parkinson's disease, where motor and sensory bodily functions are primarily affected. Following tool-use training, these patients did not show changes in movement parameters that are associated with effective tool embodiment in healthy individuals. The authors propose that altered plasticity of the body schema is one of the key sensorimotor symptoms in Parkinson's disease. Somatosensory information has a crucial role in self-orientation, as shown in the study by Arnold et al., who examined the effect of somatosensory loss in deafferented patients on the adoption of self-centered vs. decentered perspectives. They compared the responses of two deafferented patients with those of age-matched controls in a graphesthesia task, which consisted of identifying ambiguous tactile letters (such as d and b) drawn on various surfaces of the head. Deafferented patients relied on individual cognitive strategies and responded with greater variability across head and trunk orientation conditions. On the other hand, the control group, consistent with earlier studies, reliably adopted self-centered perspectives for tactile letters drawn on the forehead or on side surfaces of the head which were aligned with the front surface of the trunk. How do representations of self and body develop in children with a neurological condition, such as unilateral cerebral palsy? Using self- and peer portraits, Nuara et al. report evidence that body self-representation (more specifically, the children's own experience with their body's functioning) is reflected in their drawings.

Finally, Banakou et al. explored whether our cognitive performance, attitudes, and perhaps behaviors also change when we switch bodies in virtual reality settings. The plasticity of our body schema allows us to easily perceive a virtual life-sized body as our own even when the virtual body is strikingly different from our own. They report exciting results showing that virtual embodiment (adopting the body of Albert Einstein in this case) can cause changes in cognitive processing and also a reduction in age-based discrimination of young adults toward the elderly.

### 3. SOCIAL ASPECTS OF BODY REPRESENTATIONS

Alongside early sensorimotor experiences, early social experiences also have a substantial impact on the areas of the brain responsible for representation of the body. Oldroyd et al. explored how attachment between child and caregiver might be linked to interoception—an individual's ability to detect and track internal bodily cues. They found that an avoidant attachment style was associated with lower interoceptive functioning, whereas an anxious attachment style was associated with heightened interoception. Furthermore, reported parenting was associated with youths' awareness of their physiological and emotional responding.

Keromnes et al. present a historical review of the concept of self-consciousness and provide an overview of the role of body perception in the construction of a sense of self as well as the differentiation of self and other. They demonstrate that a multidisciplinary approach is mandatory to address such a complex concept. The paper highlights the importance of self-image recognition in the mirror to assess self-consciousness but also the role of the other in self-image recognition. Self-image development might be a good indicator of the evolution of self-consciousness.

Heijnen et al. analyzed the impact of movement synchronization on the level of anthropomorphization of a robot. Two competing hypotheses were behind the study: (1) feature overlap, i.e., self-other overlap, will activate features related to humans; (2) autonomy, where unpredictability (unsynchronized condition) will increase anthropomorphization. Results did not show any significant influence of the synchronization manipulation regarding the attributed anthropomorphization of the robot.

### 4. PERIPERSONAL SPACE REPRESENTATIONS AND ROBOTIC MODELS THEREOF

Two articles from the collection deal with the space around the body. Cléry and Ben Hamed in their review summarize recent neuroscience research on peripersonal space (PPS) representations, focusing both on PPS models of individual body parts (e.g., hand, face, trunk) and models of their interaction, and suggesting possible avenues for future studies. The paper discusses how visual and tactile events in the PPS are predicted (both temporally and spatially), how the PPS is modulated (for example, by tool use, by other perceptual stimuli, by social factors), what the relationship is between PPS and Interpersonal Space, and how individual personality traits can affect the PPS. Ultimately, the links between PPS and bodily self-consciousness are discussed. Dürr and Schilling propose a formalization of PPS in insects, and in particular offer a description of what the PPS of a stick insect (*Carausius morosus*) would look like. Whole-body motion capture data of unrestrained walking, climbing and searching behaviors are used to delineate "action volumes" and "contact volumes" for both antennae and all six legs of the insect; the intersection of these volumes is equivalent to a representation of coinciding somatosensory and motor activity, and can therefore be representative of the PPS. Then, overlapping regions of the action spaces of each pair of limbs are deduced and referred to as affordance space, which defines regions of the space in which the motion of one limb influences the possible motion of another limb. Finally, an artificial neural network model is proposed to model the motion interaction between pairs of limbs, based on the aforementioned affordance space.
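As a purely illustrative aside, the idea of intersecting an "action volume" with a "contact volume" can be sketched with sets of grid cells (voxels). The sketch below is not the method of Dürr and Schilling: the function name `voxelize`, the voxel size, and the sample points are all hypothetical and chosen only to show the set operation.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float]], size: float) -> Set[Voxel]:
    """Map 3D points to the set of grid cells (voxels) that contain them."""
    return {(int(x // size), int(y // size), int(z // size)) for x, y, z in points}


# Hypothetical sample positions in arbitrary units (not data from the study).
action_points = [(0.1, 0.2, 0.0), (1.4, 0.3, 0.2), (2.6, 0.1, 0.9)]
contact_points = [(1.2, 0.4, 0.1), (2.9, 0.8, 0.5), (5.0, 5.0, 5.0)]

action_volume = voxelize(action_points, size=1.0)    # where the limb moved
contact_volume = voxelize(contact_points, size=1.0)  # where contacts occurred

# Cells in which motor activity and contacts coincide.
overlap = action_volume & contact_volume
print(sorted(overlap))
```

The same set intersection, applied to the action volumes of two different limbs, would give a toy analogue of the overlap region that the text calls affordance space.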

Two articles employ robotic models to study PPS-related phenomena. Pugach et al. propose a neural model based on Gain-Field neurons for integrating tactile events with arm postures and visual locations for constructing hand- and target-centered receptive fields in the visual space. In robotic experiments using an artificial skin, they show how their neural architecture reproduces the behaviors of parietal neurons for: (1) dynamically encoding the body schema of a robotic arm without any visual tags on it, and (2) estimating the relative orientation and distance of targets to it. By doing so, they demonstrate how tactile information facilitates the integration of visual and proprioceptive signals in order to construct the body space. Juett and Kuipers present a computational model that enables a robot to automatically build a representation of its peripersonal space (PPS) by sensorimotor exploration. Following a developmental approach based on intrinsic motivation, the robot first performs motor babbling and begins to discover patterns of regularities and unusual events in the sensorimotor (visuomotor) space; gradually, this leads to the emergence of goal-directed reaching and grasping abilities. Preliminary results obtained with a Baxter bimanual robot support the validity of this approach, and its applicability to real-world situations.
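For readers unfamiliar with gain fields, the following is a minimal textbook-style sketch of a single gain-modulated unit, in which a visually tuned response is multiplied by a posture-dependent gain. It is not the architecture of Pugach et al.; the function name `gain_field_response`, the Gaussian tuning, the sigmoid gain, and all parameter values are assumptions made for illustration.

```python
import math


def gain_field_response(target_retinal: float, arm_angle: float,
                        preferred_retinal: float = 0.0, sigma: float = 1.0,
                        gain_slope: float = 1.0) -> float:
    """Toy gain-field unit: Gaussian visual tuning scaled by a postural gain."""
    # Visual tuning: peaks when the target is at the preferred retinal location.
    visual = math.exp(-((target_retinal - preferred_retinal) ** 2) / (2 * sigma ** 2))
    # Postural gain: a sigmoid of the (hypothetical) arm angle, between 0 and 1.
    gain = 1.0 / (1.0 + math.exp(-gain_slope * arm_angle))
    # Multiplicative interaction: posture rescales the response without moving its peak.
    return visual * gain


for angle in (-1.0, 0.0, 1.0):
    print(angle, round(gain_field_response(0.0, angle), 3))
```

Because the interaction is multiplicative, the preferred visual location of the unit is unchanged by posture while its amplitude varies, which is the property that lets a population of such units support coordinate transformations.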

### 5. HUMAN-LIKE BODY MODELS IN ROBOTS

Finally, two articles from researchers in robotics provide bridges between the body representations in biology and machine body models. Hafner et al. discuss the minimal requirements for a robot to develop an artificial sense of self. For a minimal self, they focus on sense of body ownership and agency and analyze how an artificial agent could develop these capacities and how that could be measured. Self-exploration behaviors, artificial curiosity, sensorimotor simulations, and predictive processes are discussed in this context. Schürmann et al. in their perspective article take a more pragmatic, application-oriented approach and discuss how inspiration from biological body representations can be exploited in assistive devices and humanoid robots.


### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

MH was supported by the Czech Science Foundation (GA CR), project EXPRO (nr. 20-24186X). PL was supported by the SELFCEPTION project (www.selfception.eu), European Union Horizon 2020 Programme (MSCA-IF-2016), under grant agreement no. 74194. LJ was partially supported by the EPSRC UK (projects MAN<sup>3</sup>, EP/S00453X/1, and NCNR, EP/R02572X/1). AP was supported by EQUIPEX-ROBOTEX (CNRS), chaire d'excellence CNRS-UCP, and project Labex MME-DII (ANR11-LBX-0023-01). ES gratefully acknowledges the support of ERC Grant 323674 FEEL and FET Open Grant 713010 GOAL-Robots.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Hoffmann, Lanillos, Jamone, Pitti and Somogyi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Virtually Being Einstein Results in an Improvement in Cognitive Task Performance and a Decrease in Age Bias

#### Domna Banakou<sup>1,2</sup>, Sameer Kishore<sup>1</sup> and Mel Slater<sup>1,2</sup>\*

<sup>1</sup> Event Lab, Department of Clinical Psychology and Psychobiology, University of Barcelona, Barcelona, Spain, <sup>2</sup> Institute of Neurosciences, University of Barcelona, Barcelona, Spain

The brain's body representation is amenable to rapid change, even though we tend to think of our bodies as relatively fixed and stable. For example, it has been shown that a life-sized body perceived in virtual reality as substituting the participant's real body can be felt as if it were their own, and that the body type can induce perceptual, attitudinal and behavioral changes. Here we show that changes can also occur in cognitive processing and specifically, executive functioning. Fifteen male participants were embodied in a virtual body that signifies super-intelligence (Einstein) and 15 in a (Normal) virtual body of similar age to their own. The Einstein body participants performed better on a cognitive task than the Normal body participants, taking prior cognitive ability (IQ) into account, with the improvement greatest for those with low self-esteem. Einstein embodiment also reduced implicit bias against older people. Hence virtual body ownership may additionally be used to enhance executive functioning.

#### Edited by:

Alex Pitti, Université de Cergy-Pontoise, France

#### Reviewed by:

Silvia Serino, Università Cattolica del Sacro Cuore, Italy; H. Henrik Ehrsson, Karolinska Institutet (KI), Sweden

\*Correspondence:

Mel Slater melslater@ub.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 27 February 2018 Accepted: 18 May 2018 Published: 11 June 2018

#### Citation:

Banakou D, Kishore S and Slater M (2018) Virtually Being Einstein Results in an Improvement in Cognitive Task Performance and a Decrease in Age Bias. Front. Psychol. 9:917. doi: 10.3389/fpsyg.2018.00917

Keywords: body ownership, embodiment, rubber hand illusion, virtual reality, executive functioning, age bias, implicit association test, Tower of London test

## INTRODUCTION

It has been demonstrated that it is quite straightforward to induce in healthy individuals the perceptual illusion that an object or fake body part is part of their own body (a body ownership illusion), illustrating the surprising plasticity of the brain's body representation. For example, the rubber hand illusion (RHI) (Botvinick and Cohen, 1998) has shown that tapping and stroking a rubber hand placed in an anatomically plausible position on a table in front of a person, while synchronously tapping and stroking the corresponding occluded real hand, usually leads to the illusion that the rubber hand is their own. This illusion is reported subjectively and can also be measured objectively through "proprioceptive drift": when asked to blindly point toward their hand, participants will point more toward the rubber than the real hand. Similarly, if the rubber hand is threatened, then there are strong physiological and cortical responses to the perceived threat (Armel and Ramachandran, 2003; Zhang and Hommel, 2015). This illusion has been shown to work in immersive virtual reality (VR), where instead of a rubber arm, a virtual arm is seen in stereo 3D as coming out of the participant's real shoulder (Slater et al., 2008). Moreover, a threat to the virtual hand results in a motor response (Kilteni et al., 2012), including motor cortex activation (González-Franco et al., 2013).
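To make the notion of proprioceptive drift concrete, here is a minimal sketch of one way it could be scored. It assumes (this is not stated in the text above) that drift is taken as the change in the pointed-to hand position from before to after the induction, signed so that positive values mean a shift toward the rubber hand. The function name and all numbers are hypothetical.

```python
def proprioceptive_drift(pre_estimate_cm: float, post_estimate_cm: float,
                         real_hand_cm: float, rubber_hand_cm: float) -> float:
    """Signed shift of the pointing estimate; positive means toward the rubber hand."""
    # Direction along the measurement axis from the real hand to the rubber hand.
    direction = 1.0 if rubber_hand_cm >= real_hand_cm else -1.0
    # Assumed scoring: post-induction estimate minus pre-induction estimate.
    return direction * (post_estimate_cm - pre_estimate_cm)


# Hypothetical trial: real hand at 0 cm, rubber hand at 15 cm on the same axis.
drift = proprioceptive_drift(pre_estimate_cm=1.0, post_estimate_cm=4.5,
                             real_hand_cm=0.0, rubber_hand_cm=15.0)
print(drift)  # 3.5
```

Signing the score by the rubber hand's side keeps the measure comparable when the rubber hand is placed to the left in some sessions and to the right in others.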

Body ownership illusions have also been shown to occur at the whole body level (Petkova and Ehrsson, 2008). In VR a virtual body as seen from first-person perspective (1PP) through a head-tracked head-mounted display (HMD) can be programmed to spatially substitute a person's real body, with motion capture of participants' body movements being mapped to the virtual body in real time. When the person looks down toward their own body, they see the virtual body instead, and when they look toward a virtual mirror, they see a reflection of their virtual body (Slater et al., 2010).

These results demonstrate the high degree of brain plasticity in our body representation, but it is also interesting that the type of virtual body has been found to induce perceptual, attitudinal and behavioral changes in experimental participants, a result first reported in Yee and Bailenson (2007). One characteristic example of this is that when adults are embodied in a small virtual body (van der Hoort et al., 2011) they overestimate the sizes of objects, and when the small virtual body depicts that of a child they also have implicit attitudes and behavioral changes toward becoming child-like (Banakou et al., 2013; Tajadura-Jiménez et al., 2017). However, when they are placed in an adult body that is scaled down to match the size of the child one, then they do not exhibit such changes. In other examples, when White participants experience the RHI over a black rubber hand, or a body ownership illusion over a Black virtual body in VR, this leads to a reduction of their implicit racial bias against Black people (Peck et al., 2013; Farmer et al., 2014; Maister et al., 2015), an effect that has been found to last at least 1 week (Banakou et al., 2016). It was recently shown that when White participants are embodied in a White or Black body, and interact with a Black or White virtual human, the skin color of their virtual body, rather than their real body, influenced which virtual partner they mimicked more (Hasler et al., 2017).

These changes that the body type seems to carry may also apply at higher levels of cognitive processing rather than only at a perceptual and behavioral level. In the study described in Osimo et al. (2015) people were embodied in a virtual body that represented a famous counselor, Dr. Sigmund Freud, or alternatively a virtual look-alike representation of themselves. It was found that a strong body ownership illusion when the counselor's body was that of Dr. Freud allowed them to find a more satisfactory solution to a personal problem, and positively influenced their mood compared to when the counselor was a double of themselves. Being embodied as Freud had an effect over and above being embodied as a copy of themselves, as if some of the cognitive attributes of a famous therapist mapped over to the participants.

In this paper we investigated whether embodiment of people in a virtual body that is strongly associated with high performing cognitive abilities would result in them exhibiting enhanced cognitive performance. Specifically, we tested whether embodiment in a body that signified super-intelligence, Albert Einstein, would lead to measurable short-term changes in cognitive abilities. In order to accomplish this we used the Tower of London task (Shallice, 1982), which was designed to specifically assess executive functioning, and is linked to fluid intelligence and working memory (Unterrainer et al., 2004; Zook et al., 2004; D'Antuono et al., 2016). We were interested to see whether people virtually represented as Einstein would show greater performance on this test, compared to pre-exposure baseline performance. Furthermore, since the virtual body of Einstein was older (**Figure 1A**) than our experimental group that consisted of young males, we addressed a second issue, that of implicit bias against older people. Specifically, we were interested in examining whether embodiment in an older looking virtual body could lead to a reduction of implicit age-based discrimination in young adults as was found in Oh et al. (2016).

FIGURE 1 | The experimental setup. The body of the participant was substituted by a gender-matched VB, viewed from 1PP, onto which body and head movements were mapped in real time. (A) The Einstein virtual body. (B) The Normal virtual body. (C) Participants were fitted with an HTC VIVE head-mounted display, and their body movements were tracked by 37 OptiTrack markers.

To test this, we ran an experiment with adult males who were embodied in the body of Einstein or, as a control, that of a young adult (Normal). The participants saw their assigned virtual body from 1PP where the eyes of the virtual body coincided with the person's real eyes, and the virtual and real body were spatially coincident. Body ownership over their virtual body was enhanced using the technique of visuomotor synchrony, so that through real-time motion capture, the movements of the participant were mapped to the movements of their virtual body, following earlier examples (Banakou et al., 2013; Banakou and Slater, 2014; Osimo et al., 2015; Tajadura-Jiménez et al., 2017).

### MATERIALS AND METHODS

### Ethics

The experiment was approved by Comissió Bioètica of Universitat de Barcelona. All participants gave their written informed consent prior to participating. The study was performed according to institutional ethics and national standards for the protection of human participants. Ethical considerations included informed consent, right to withdraw, and confidentiality. Exclusion criteria were epilepsy, use of medication, recent consumption of alcohol, intellectual disability and mental health difficulties (e.g., requiring medication). Following completion of the experiment, participants were debriefed with an explanation about the purpose of the study.

### Materials

The experiment was conducted in a Virtual Reality laboratory (width: 3.5 m; length: 4.0 m, back wall to curtain; height: 2.5 m). Participants were fitted with an HTC VIVE head-mounted display (HMD) (**Figure 1C**). This is stereo and has a nominal field-of-view of 100°, with a resolution of 2,160 × 1,200 pixels per eye displayed at 90 Hz. Participants were also required to wear an OptiTrack full-body motion capture suit with 37 markers, used with the Motive software to track their body movements in real time (**Figure 1C**). The infrared technology was implemented with a 12-camera truss by OptiTrack. The virtual environment was implemented on the Unity3D platform. The animation-enabled model of the Normal virtual body was purchased from Rocketbox Libraries and the Einstein model was created with Mixamo Fuse and customized appropriately for the purposes of the study using Mudbox 2016 and Maya 2016 academic versions.

### Participants

Thirty healthy adult male participants aged 18–30 years (28 students and 2 unemployed; mean ± SD age 22.0 ± 2.81), with normal or corrected-to-normal vision, were recruited by advertisement and email around the campus of the University of Barcelona. They had no prior knowledge of the experiment, and little or no prior experience of virtual reality. The experimental groups were comparable across a number of variables, including previous experience of VR, and time spent playing computer games (**Table 1**). Participants were compensated for their participation with €15 (€5 after the end of the first phase, and the remaining €10 after the end of the second phase).

For each case the total number of participants, mean of ages, median and IQR values for participants' experience in VR and hours per week of playing video games (1 = 0, 2 = "<1," 3 = "1–3," ..., 6 = "7–9," 7 = ">9"). Codes refer to a 1–7 Likert scale. For previous VR experience and hours spent playing video games, 1 means the least and 7 the most. The Word Accentuation Test (WAT) scores are converted to full scale IQ estimates (fsiq), and Self-Esteem scores refer to Rosenberg's scale with higher scores indicating higher self-esteem. Details of these scores are given below.

### Experimental Design

The experiment was conducted as a between-groups design with a single factor referred to as "Body," with levels Einstein (they had the Einstein body) (**Figure 1A**) or Normal (they had a young male adult body) (**Figure 1B**). The size of the virtual environment and proportions of the content were equivalent to real-life sizes and proportions, and identical in both conditions (Einstein, Normal). Participants were randomly allocated to one of the two conditions. The experimental design can be seen in **Table 1**. Participants visited the laboratory twice: first to complete some baseline measurements (see Response Variables below), and then a week later for their virtual exposure and the collection of further post-exposure data.

#### TABLE 1 | Experimental design and distribution of participants by condition.

### Procedures

Participants attended the experiment at pre-arranged times. Upon arriving, they were given an information sheet to read, and after they agreed to continue with the experiment, they were given a consent form to sign, and completed a demographics questionnaire. Participants were first assessed with the Word Accentuation Test (WAT) (Del Ser et al., 1997), which is used to estimate intelligence. This test is an adaptation of the North American Adults' Reading Test (NART) (Blair and Spreen, 1989) for Spanish speakers. The WAT utilizes low-frequency Spanish words with all accents removed to make the pronunciation ambiguous, and it has been shown that it gives a reliable estimate correlated with IQ in healthy adults (Gomar et al., 2011). Participants were also assessed on Rosenberg's self-esteem scale (Rosenberg, 1965). The IQ estimates and self-esteem scores for each experimental group can be seen in **Table 1**.

Participants were then seated in front of a desktop computer and completed an age bias Implicit Association Test (IAT) (Greenwald et al., 1998, 2003), and a Tower of London Task (Shallice, 1982), and the results were recorded (variables: preIAT, scorepre). After a period of 1 week they returned for the main experiment.

The VR exposure took place in a laboratory where the position of all participants was controlled through Velcro strips on the floor that were used to mark where they should stand during the experiment. When ready to start, the participants were fitted with a head-mounted display (HMD), and the body-tracking suit (**Figure 1C**). Initially participants were instructed to turn and move their heads and bodies and to walk a maximum of two steps away from their starting point, to prevent them from hitting the walls in the restricted laboratory space.

#### TABLE 2 | Questionnaire items.


All questions were scored on a −3 to +3 scale, where −3 meant least and +3 meant most agreement with the statement.

Upon entering the virtual environment, participants found themselves in a virtual room where their body was visually substituted by the life-sized Einstein or a young adult virtual body (Normal), seen from 1PP (**Figure 1**). Their head and body movements were mapped in real-time to the virtual body. They could see this body both by looking directly toward their real body, and also in a virtual mirror. A series of instructions were then given to them from a pre-recorded audio. First, they were asked to perform a simple set of stretching exercises in order to explore the capabilities and real time motion of the virtual body, including movements of their arms, legs and feet. They were asked to continue performing these exercises by themselves and also look around the virtual room in all directions, where they were asked to state and describe what they saw.

After this 5-min orientation period, participants were instructed through audio that they had to complete a task. They were told that a series of numbers (either positive or negative numbers, fractions, or decimals) would appear around them on the walls or floor and that their task was to locate these numbers and order them in ascending order by selecting them with their hands (for details refer to Movie S1, in Supplementary Material). They were shown 11 number combinations in total (4 different numbers at a time), and the task lasted between 5 and 7 min, depending on how fast they were at selecting the numbers. The reason for choosing this task was to engage participants for the total time required for them to stay in the virtual environment, and to constantly reinforce visuomotor synchrony, since by turning around and pointing they would continually be aware of their virtual body and that its movements were their own.

Finally, the HMD was removed, and all participants completed the age IAT and TOL task again (postIAT, scorepost), along with a post-experience questionnaire (**Table 2**). The whole procedure lasted approximately 35 min. Two experimental operators (one female, one male) were present throughout the whole experiment. Further information is given in Movie S1 (Supplementary Material).

### Response Variables Implicit Association Test (IAT)

The IAT (Greenwald et al., 1998) was administered on a desktop screen a week before participants' virtual exposure (preIAT), and then immediately after their virtual exposure (postIAT). The IAT was completed on the same desktop computer screen both times. The age IAT followed the standard IAT procedure (Nosek et al., 2005) where participants are required to rapidly categorize faces (young or old) and words (positive or negative) into groups. Implicit bias is calculated from the differences in accuracy and speed between categorizations (e.g., young people's faces, positive words and old people's faces, negative words compared to the opposite groups). Higher IAT scores are interpreted as greater implicit age bias, as they signify longer reaction times and greater inaccuracies in categorizing old people's faces with positive words, and young faces with negative words. Here the response variable of interest was dIAT = postIAT − preIAT, used to examine whether the VR exposure led to any change in bias against older people. Positive values indicate greater bias. It has been shown that mean IAT scores tend to show slightly stronger associations corresponding to the pairings of the combined block that is completed first (Nosek et al., 2005). To control for this effect, the order of the combined blocks was counterbalanced between participants as proposed by Nosek et al. (2007). The IAT used was downloaded from the Millisecond Test Library and modified with the Inquisit software by Millisecond.

#### Tower of London Task (TOL)

The TOL task is designed to assess executive functioning and specifically, planning and problem solving skills (Shallice, 1982), and its reliability has been shown for test-retest purposes (Köstering et al., 2015). In this test, participants are presented with a model where three beads (red, green, blue) are strategically positioned on three rods of descending heights. They are asked to manipulate the beads from a predetermined starting position on a different set of pegs to match the position of beads in the model. There are 12 different problems of graded difficulty, of 2, 3, 4, and 5-move examples, and only 3 trials are allowed per problem. A problem is classified as correct if the end position is achieved in the minimum number of prescribed moves. The algorithm, based on the procedural details adapted from Krikorian et al. (1994), gives 3 points for a successful solution on the first trial, 2 points on the second, 1 on the third, and 0 points if all trials are failed. The total score is the sum of points on all 12 problems, with a maximum possible score of 36. The TOL was administered on a desktop screen a week before participants' virtual exposure (scorepre), and then immediately after their virtual exposure (scorepost). It was completed on the same desktop computer screen both times. The response variable of interest was dscore = scorepost − scorepre, which showed the degree of improvement (positive values) or decline (negative values) in score after the exposure compared with before. The TOL was downloaded from the Millisecond Test Library and modified with the Inquisit software by Millisecond.
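The scoring rule described above can be illustrated with a minimal Python sketch. It is not the Inquisit script used in the study; the function names and the encoding of each problem's outcome (the trial on which it was solved, or `None` if all three trials failed) are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Points awarded by the trial on which a problem was solved (3, 2, 1);
# a problem failed on all three trials scores 0.
POINTS_BY_TRIAL = {1: 3, 2: 2, 3: 1}
N_PROBLEMS = 12


def problem_points(solved_on_trial: Optional[int]) -> int:
    """Points for one problem; None means all three trials were failed."""
    if solved_on_trial is None:
        return 0
    if solved_on_trial not in POINTS_BY_TRIAL:
        raise ValueError("solved_on_trial must be 1, 2, 3 or None")
    return POINTS_BY_TRIAL[solved_on_trial]


def tol_score(outcomes: Sequence[Optional[int]]) -> int:
    """Total TOL score over the 12 problems (range 0 to 36)."""
    if len(outcomes) != N_PROBLEMS:
        raise ValueError(f"expected {N_PROBLEMS} problem outcomes")
    return sum(problem_points(o) for o in outcomes)


def change_in_score(scorepre: int, scorepost: int) -> int:
    """dscore = scorepost - scorepre; positive values mean improvement."""
    return scorepost - scorepre


if __name__ == "__main__":
    # Hypothetical outcomes for one participant (illustrative, not study data).
    pre = [1] * 6 + [2] * 3 + [3] * 2 + [None]
    post = [1] * 8 + [2] * 3 + [3]
    print(tol_score(pre), tol_score(post),
          change_in_score(tol_score(pre), tol_score(post)))
```

With these made-up outcomes the pre-exposure score is 26, the post-exposure score is 31, and dscore is 5.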

#### Post-experience Questionnaire

After each exposure a 5-statement post-questionnaire was administered to assess participants' subjective experience (**Table 2**). A 7-point scale was used ranging from −3 to +3, with "0" indicating a neutral response on each question (with the scale varying from Strongly Disagree, −3, to Strongly Agree, +3). These questions were related to the strength of body ownership (vrbody, mirror) and agency (agency) over the virtual body—here we require that the levels of body ownership and agency are the same between the two conditions—while others served as control questions (features, twobodies).

### Statistical Methods

The major interest is to examine whether there are differences between the Normal and Einstein groups on the two response variables: the IAT for age bias (dIAT), and dscore (the change in score with respect to the problem solving). These comparisons are premised on there being a strong body ownership illusion.

We adopt a Bayesian approach where we can treat both response variables simultaneously in one model. As can be seen in **Table 1** the selfesteem variable, which had been elicited prior to the VR exposures, differs between the Experimental and Control groups by chance, and thus must be included as a covariate in the model. Similarly, for fsiq.

The overall model is as follows:

$$dscore_i \sim t(df, \eta_i, \sigma_{score})$$

where

$$\eta_i = \beta_{score,0} + \beta_{score,1}X_i + \beta_{score,2}F_i + \beta_{score,3}(X_i \cdot F_i) + \beta_{score,4}S_i + \beta_{score,5}(X_i \cdot S_i)$$

$$diat_i \sim N\left(\beta_{iat,0} + \beta_{iat,1}X_i, \sigma_{iat}\right), \quad i = 1, \ldots, 30 \tag{1}$$

$t(df, \mu, \sigma)$ refers to a Student-t distribution with degrees of freedom $df$, mean (or median) $\mu$ and scale parameter $\sigma$. Similarly, $N(\mu, \sigma)$ refers to a normal distribution with mean $\mu$ and standard deviation $\sigma$. Here $X_i = 1$ if the $i$th individual is in the Einstein group and 0 if in the Normal group, $F_i$ is the fsiq score and $S_i$ denotes the selfesteem score.

These express a linear model akin to an ANOVA or regression model, which can also be written in statistical model notation as:

dscore = Condition + fsiq + Condition×fsiq + selfesteem + Condition×selfesteem

diat = Condition

where dscore has a t-distribution and diat is normally distributed with the stated standard deviation. A t-distribution for dscore was chosen because inspection of the data suggested a fat-tailed distribution. However, this parameterisation also allows for the possibility that dscore has a normal distribution, which would occur were df about 30 or more.
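To make the role of the interaction terms concrete, the following Python sketch evaluates the linear predictor for dscore and the condition-specific slopes that it implies. The class and function names are hypothetical, and the coefficient values are arbitrary placeholders chosen for easy arithmetic, not estimates from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreCoefficients:
    b0: float  # intercept
    b1: float  # Condition (Einstein = 1, Normal = 0)
    b2: float  # fsiq
    b3: float  # Condition x fsiq interaction
    b4: float  # selfesteem
    b5: float  # Condition x selfesteem interaction


def linear_predictor(beta: ScoreCoefficients, x: int,
                     fsiq: float, selfesteem: float) -> float:
    """Location parameter of the t-distribution for dscore."""
    return (beta.b0 + beta.b1 * x
            + beta.b2 * fsiq + beta.b3 * x * fsiq
            + beta.b4 * selfesteem + beta.b5 * x * selfesteem)


def fsiq_slope(beta: ScoreCoefficients, x: int) -> float:
    """Change in the predictor per unit of fsiq within condition x."""
    return beta.b2 + beta.b3 * x


def selfesteem_slope(beta: ScoreCoefficients, x: int) -> float:
    """Change in the predictor per unit of selfesteem within condition x."""
    return beta.b4 + beta.b5 * x


if __name__ == "__main__":
    # Placeholder coefficients (assumed for illustration only).
    beta = ScoreCoefficients(b0=1.0, b1=0.5, b2=-0.0625,
                             b3=0.125, b4=0.0, b5=-0.125)
    for x, label in ((0, "Normal"), (1, "Einstein")):
        print(label, fsiq_slope(beta, x), selfesteem_slope(beta, x))
```

In this parameterisation the fsiq slope is the fsiq coefficient alone in the Normal group and the sum of the fsiq and Condition×fsiq coefficients in the Einstein group, so a sufficiently large positive interaction coefficient can reverse the sign of the slope between conditions.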

In a first model fit, dIAT was given the same model as dscore (i.e., including fsiq and selfesteem), but no relationship was found between dIAT and these variables. Hence these were removed for simplicity.

The prior distributions of the parameters $\beta_{*,j}$ are conservatively chosen to be Cauchy with median 0 and scale parameter s = 10, in order to allow for wide variation. The Cauchy distribution (Student-t with 1 degree of freedom) has undefined mean and variance, and 95% of this distribution is between ±127. The $\sigma_*$ have the same Cauchy distributions but restricted to the range (0, ∞). Ninety-five percent of this distribution lies between 0.4 and 254. The prior distribution of df is also the same Cauchy, but restricted to the range 0–30. Ninety-five percent of this distribution is between 0.3 and 27.
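The quoted 95% prior intervals follow from the closed-form quantile function of the Cauchy distribution. The short Python sketch below reproduces them under that assumption; the function names are illustrative and are not taken from the original analysis code.

```python
import math


def cauchy_quantile(p: float, scale: float) -> float:
    """Quantile of a Cauchy distribution with median 0."""
    return scale * math.tan(math.pi * (p - 0.5))


def half_cauchy_quantile(p: float, scale: float) -> float:
    """Quantile of a Cauchy(0, scale) restricted to (0, infinity)."""
    return scale * math.tan(math.pi * p / 2.0)


def truncated_half_cauchy_quantile(p: float, scale: float, upper: float) -> float:
    """Quantile of a Cauchy(0, scale) restricted to (0, upper)."""
    return scale * math.tan(p * math.atan(upper / scale))


if __name__ == "__main__":
    s = 10.0
    # Regression coefficients: central 95% interval, about +/-127.
    print(cauchy_quantile(0.025, s), cauchy_quantile(0.975, s))
    # Scale parameters: about 0.4 to 254.
    print(half_cauchy_quantile(0.025, s), half_cauchy_quantile(0.975, s))
    # Degrees of freedom, truncated to (0, 30): about 0.3 to 27.
    print(truncated_half_cauchy_quantile(0.025, s, 30.0),
          truncated_half_cauchy_quantile(0.975, s, 30.0))
```

Changing `s` to 1, 5 or 20 shows how the intervals narrow or widen with the scale parameter.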

The results are not sensitive to changes in the prior: for example, if the scale parameter s = 20 then the same results are obtained (see Results).

### RESULTS

### Questionnaire Responses

First we consider the responses to the post-experience questionnaire on body ownership and agency (**Table 2**). The variable vrbody refers to the degree to which participants felt as if the body they saw when looking toward themselves was their own body, and mirror refers to the body they saw in the mirror. Agency refers to the extent to which participants affirmed that the virtual body's movements were their own, whereas the control question twobodies refers to the extent to which they felt they had two bodies, and features refers to the extent to which participants affirmed that the virtual body had similar physical characteristics to themselves. In **Figure 2** it can be seen that the lower quartiles of vrbody, mirror and agency are all at least 1 in both conditions. The control question twobodies always has its upper quartile at most 1. The score for features has the upper quartile at 1 in the case of the Einstein body, but 2 in the case of the Normal body, with greater variance. This is not surprising since the Einstein body, being older, had features that would have been most unlike those of the participants. Overall the body ownership and agency scores are very high. This is a prerequisite for the validity of the study. No further statistical analysis is required here, since we only need to know for this particular sample of people whether or not the body ownership manipulation was successful.

### TOL Change

**Figure 3** shows that the mean change in dscore, which was calculated as the difference in scores before and after the exposure (scorepost − scorepre), was greater in the Einstein than in the Normal condition. The means and standard errors are 0.67 (SE 0.33) and 1.73 (SE 0.38) for the Normal and Einstein conditions, respectively, with Cohen's d = 0.38, which is a small to medium effect size. However, this does not take into account the prior baseline "intelligence" of the participants. **Figure 4** shows the scatter diagram of dscore on fsiq, by condition. Apart from one outlier, dscore is negatively associated with fsiq in the Normal condition and positively associated with it in the Einstein condition. This outlier was removed for subsequent analysis.

### IAT Change

**Figure 5** shows the mean and standard error of the change in IAT by condition. The means and standard errors are −0.03 (SE 0.060) for the Normal body and −0.24 (SE 0.036) for the Einstein body (Cohen's d = 0.54), which is a medium effect size. **Figure 6** shows that the bias decreases for those in the Einstein condition, but hardly changes for those in the Normal condition. Hence bias does not change from positive to negative, but rather decreases (in the Einstein condition). This is in line with the findings on racial bias reported in Peck et al. (2013) and Banakou et al. (2016), where we found that embodiment of "White" people in a dark-skinned body reduces but does not flip implicit racial bias.

[Box plot caption, truncated in the source] ... and the whiskers range from max (min value, lower quartile − 1.5\*IQR) to min (max value, upper quartile + 1.5\*IQR).

### Posterior Distributions

**Table 3** summarizes the posterior distributions. It can be seen from the posterior distribution for dIAT that the probability that the Einstein condition results in a smaller dIAT is 1 − 0.068 = 0.932. The posterior probability of the interaction effect for dscore seen in **Figure 4** is 1.000, indicating that for those in the Einstein condition greater fsiq scores are associated with a greater problem solving result, whereas the opposite is the case in the Normal condition. However, selfesteem has an effect, where greater esteem in the Einstein condition is associated with lesser score (probability = 1 − 0.008 = 0.992).

The first six columns show the means and standard errors, 2.5th, 50th, and 97.5th percentiles of the posterior distributions of the parameters of the model. The seventh column shows the posterior probability of the parameter being positive. The prior 95% credible intervals are −127 to 127 for each of the $\beta_*$ parameters and 0.4–254 for the $\sigma_*$. For df the prior 95% credible interval is 0.3 to 27.

If instead of using s = 10 as the scale parameter for the Cauchy distributions we use s = 1, 5 or 20, i.e., priors that are narrower (s = 1, 5) or wider and thus even more conservative (s = 20), the results hardly change: the equivalents to **Table 3** are almost identical.

**Figure 7** shows the bar chart of dscore by Body and a median split on the selfesteem score (median = 32.5). It can be seen that generally those with lower self-esteem had greater improvement in the score compared with those with higher self-esteem. However, the difference between these two is most pronounced in the Einstein condition, and generally the mean change in dscore amongst those with high self-esteem is close to zero. This accounts for the apparent negative relationship between dscore and esteem for those in the Einstein condition.

### Goodness of Fit

Using the posterior distributions of the model we generated 8,000 pseudo-random observations on each of the response variables, for each individual, in order to obtain fitted values $\widehat{diat}_i$ and $\widehat{dscore}_i$ for each individual $i$. The result, referred to as the predicted posterior, is the posterior distribution of each of $\widehat{diat}_i$ and $\widehat{dscore}_i$. We used the mean (over the 8,000) as a point estimate of the individual values. These could then be compared with the originally observed values of the corresponding variables.
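The fitted-value procedure for diat can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the posterior draws here are synthetic stand-ins generated from arbitrary values, whereas in the actual analysis they would come from the fitted model.

```python
import random
from statistics import fmean
from typing import List, Tuple

Draw = Tuple[float, float, float]  # (beta_iat0, beta_iat1, sigma_iat)


def fitted_diat(x: int, draws: List[Draw], rng: random.Random) -> float:
    """Mean of one simulated diat observation per posterior draw.

    x is 1 for the Einstein group and 0 for the Normal group.
    """
    simulated = [rng.gauss(b0 + b1 * x, sigma) for b0, b1, sigma in draws]
    return fmean(simulated)


def synthetic_draws(n: int, rng: random.Random) -> List[Draw]:
    """Stand-in posterior draws (assumed values, not the study's posterior)."""
    return [(rng.gauss(0.0, 0.1),          # intercept
             rng.gauss(-1.0, 0.1),         # condition effect
             abs(rng.gauss(0.5, 0.05)))    # positive standard deviation
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = synthetic_draws(8000, rng)
    print(fitted_diat(0, draws, rng), fitted_diat(1, draws, rng))
```

Because diat depends only on the binary condition, every individual in the same group shares the same predictive distribution, which is why the fitted values form two clusters.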

In the case of IAT the predicted values fall into two clusters, since they are dependent on one binary factor (condition). Therefore, for comparison we compared the means of the observed and fitted values, as shown in **Table 4**. In the case of dscore **Figure 8** shows the scatter plot comparing the fitted values $\widehat{dscore}_i$ and the observed values $dscore_i$. In each case the model suggests a good fit to these data.

### DISCUSSION

The first result is that although the participants were young men, they clearly had overall a strong illusion of body ownership over a much older body as well as over a body representing one of approximately their own age. Body ownership over bodies profoundly different to the real one has been repeatedly demonstrated. For example, in Slater et al. (2010) the participants were all men, but their virtual body was that of a young girl. In Normand et al. (2011), although participants were thin males, they had the illusion of owning a virtual body with a fat belly. Similarly, Preston and Ehrsson (2016) found that healthy individuals reported illusory ownership over virtual obese or slim bodies during functional magnetic resonance imaging. In Kilteni et al. (2013), Peck et al. (2013), Banakou et al. (2016), and Hasler et al. (2017) all participants were white, but the level of body ownership did not differ among white, black, and even purple-skinned virtual bodies. In the study reported by Banakou et al. (2013), and a recent replication study by Tajadura-Jiménez et al. (2017), it was found that young and older adults felt equally high ownership over a virtual child body and the body of a scaled-down adult. Osimo et al. (2015) reported a study where young adults experienced ownership over a virtual body that was a 3D scan of their real body and looked very much like themselves, and also a virtual body that was much older and depicted Sigmund Freud, without overall differences in ownership between the conditions.

In line with earlier findings, in the current paper we show that it is possible to induce in young adults a subjective body ownership illusion with respect to a much older virtual body, representing Albert Einstein. Specifically, we show that embodiment induced through 1PP and synchronous visuomotor correlations between the participants' movements and those of their virtual bodies leads to equally high ownership and agency ratings for both those embodied as Albert Einstein and those embodied in a younger looking virtual body. Notably, there was a difference in the subjective report of physical resemblance between participants and their virtual body, which was lower for those in the Einstein condition. As reported in the results, this finding is not surprising since the Einstein body, being older, had features that would have been unlike those of the participants.

FIGURE 7 | [caption truncated in the source] ... sample (32.5) and Higher Esteem to those with selfesteem > median.

Our results also show that embodiment in Einstein leads to changes in implicit attitudes. Specifically, embodiment of young adults in the older Einstein body led to a reduction of implicit bias against the elderly, resulting in overall lower IAT scores compared to the control condition (Normal body). Recent evidence suggests that the type of body can indeed have an impact on how the world is perceived and on attitudes and behaviors of the participant (Banakou et al., 2013, 2016; Kilteni et al., 2013; Peck et al., 2013; Maister et al., 2015; Bailey et al., 2016; Tajadura-Jiménez et al., 2017). Regarding stereotyping against the elderly, Yee and Bailenson (2006) used virtual reality to embody participants in a virtual body of a much older person or a body of a young adult. The results showed that negative stereotyping of the elderly was reduced when participants were embodied in the virtual body of an old person compared to those embodied in a younger one.

#### TABLE 4 | Means of the observed diat and the estimated values $\widehat{diat}$ from the posterior distribution.

Our finding expands on these previous findings, demonstrating that the body type carries meaning, and that this meaning has implications for the perceptual processing, attitudes and behaviors of the person experiencing it. This was argued in detail in Banakou et al. (2013, 2016) and Llobera et al. (2013) in the frame of the "cortical body matrix" (Moseley et al., 2012), which maintains not only a multi-sensory representation of the space around the body, but also aspects of the self and corresponding psychological correlates. Moreover, in Banakou et al. (2016) we explained how the IAT is used as a measure of association between categories for any individual, based on statistical associations from the social environment. Similar to there being negative associations with the concept of "Black" people (Greenwald and Krieger, 2006), the elderly also face both implicit and explicit forms of age-based discrimination (Hummert et al., 2002; North and Fiske, 2012; Harwood et al., 2015). Nonetheless, as argued in Maister et al. (2015) and Banakou et al. (2016), during body ownership illusions, the similarity in appearance between the transformed self and the out-group (here Einstein depicting an older person) results in the disruption of associations between the out-group and negative valence items, which are substituted by positive associations with the self. However, we cannot rule out the possibility that the changes in age-bias scores in our experiment were caused not by the fact that the virtual body depicted an older person, but by the fact that it depicted a highly eminent, universally known person (Einstein). This remains an interesting question to be addressed in future work.

Furthermore, there has been recent evidence that the type of the owned body can result in changes beyond perceptual, attitudinal and behavioral, including also cognitive processing. As introduced earlier, Osimo et al. (2015) used virtual reality to embody people in a virtual body depicting Sigmund Freud. A strong body ownership illusion over that body improved participants' mood and happiness after the experience, and allowed them to find a more satisfactory solution to a personal problem, compared to those who experienced a control body (virtual representation of themselves). The authors explained their findings in terms of activation of perspective-taking mechanisms and the "self" concept. Since the self is associated with attributes of the new transformed body, this allows the participant to access mental resources that are normally inaccessible due to their familiar modes of thinking about themselves. In our case, this generalization of body ownership to higher level capabilities is linked to enhanced performance in cognitive tasks. We show that embodiment in the Einstein virtual body led participants to better performance in a TOL task, which has been linked to fluid intelligence (Unterrainer et al., 2004; D'Antuono et al., 2016). Interestingly, we found that participants' problem solving performance was associated with a measure of IQ and self-esteem scores that differed depending on the embodiment condition (Einstein vs. Normal). We discuss these below and offer possible explanations.

Past studies have shown that most cognitive tasks tend to show improvement with higher IQ (Duncan et al., 1996; Conway et al., 2003; Zook et al., 2004), but also repetition (Strauss et al., 2006; Calamia et al., 2013). However, it has been suggested that performance specifically related to the TOL test is uncorrelated with IQ (Welsh et al., 1991; Bishop et al., 2001; Bechara and Martin, 2004; Huizinga, 2006). Even in samples that were characterized by above-average full-scale IQs, it was found that associations between TOL performance and IQ were not significant, and that even IQs ranging from 80 to 150 were weakly associated with perfect solutions (Luciana et al., 2009). Moreover, Köstering et al. (2015) showed that the TOL can be reliably used for test and re-test in group-based studies and with individual participants.

In our experiment, taking into account the baseline "intelligence" scores of participants, we find that higher IQ is associated with a greater problem-solving result, but only for those embodied as Einstein. However, for those in the Normal condition, IQ and performance are negatively associated, with participants with higher IQ showing weaker results. Therefore, the difference in baseline IQ between experimental and control groups that varied by chance could not have itself accounted for differences between the conditions. But how can it be explained that people with higher IQ in the Normal condition performed worse compared to those in the Einstein condition who performed better?

The authors in Köstering et al. (2015) suggested that the relationship between IQ and cognitive task performance may be strengthened when the task is made more challenging or unpredictable. Similarly, Pekrun et al. (2010) proposed that "boredom" experienced during a task can be expected to reduce both motivation to perform and the effort invested. According to Stankov's hypothesis (Stankov, 1983) individuals with higher scores of intelligence (or higher general ability) might perform worse in simple tasks due to low arousal (boredom), concluding that in such cases intelligence correlates negatively with task performance. We suggest that a similar explanation could apply to our results. Although the task was identical between the experimental and control conditions, the main difference was the type of body participants experienced. Participants in the control condition saw themselves embodied in a young-looking body, with age, and possibly physical characteristics, similar to their own, thus resulting in no additional levels of excitement during the experimental session. This, in conjunction with the relative simplicity of the task, might have caused a lack of interest, thus driving them to perform poorly. In contrast, for those participants in the experimental condition there is some new important evidence about the self: "I am Einstein." This piece of evidence is not a trivial one, since it is linked to "super-intelligence." Seeing oneself as Einstein could have caused participants to reach a higher level of their cognitive abilities (in a way "living up to their name"), thus resulting in better task performance.

The second question that remains is how self-esteem could have affected task performance in the Einstein condition. Concretely, we found that self-esteem scores were negatively associated with task performance for those participants embodied as Einstein. This negative correlation is caused by the change in performance (dscore) being high for participants with low self-esteem but with little change in performance by participants with high self-esteem. In other words, there is an increase in TOL score for those with low self-esteem, whereas for those with high self-esteem there is not much change.

Previous research has shown how higher self-esteem is generally associated with higher mental and physical health (Taylor and Brown, 1988; Baumeister et al., 2003; Taylor et al., 2003). Various clinical techniques and standard self-esteem enhancement programs are extensively used to improve self-esteem (Bednar et al., 1989; Frey and Carlock, 1989; Burns, 1993; Mruk, 2006), amongst which are learning techniques of social approval and acceptance (Kernis, 2006), and perspective-taking (Peterson et al., 2015). For example, regarding intimate-partner relationships, it has been shown that low self-esteem participants report increased esteem and closeness toward their partner after going through a traditional perspective-taking technique, whereas participants with more favorable self-views are not affected by the perspective-taking instructions (Peterson et al., 2015). Perspective-taking methods are similar to the technique of embodiment used in our study. The critical difference is that the former is imaginal, whereas virtual embodiment leads to a perceptual illusion of body ownership, without requiring participants to imagine what it would be like to have a different body: they simply experience it. Therefore, as in the above example, it could be argued that giving participants the experience of being Einstein might lead to the greatest benefits on cognitive performance for those who have room for improvement: those with low self-esteem. Moreover, since Einstein can generally be considered a socially approved and highly accepted personality, one could argue that this leads to an improvement of self-esteem in low self-esteem people, which is in turn reflected in better cognitive performance.

In line with the above, it has also been suggested that self-esteem can be affected by mood, with lower self-esteem people more likely to evaluate themselves positively when they are in a good mood (Brown and Mankowski, 1993). Regarding the underlying process of how moods affect cognition, it has been suggested that self-relevant positive thoughts become more accessible when people are happy, and negative when people are sad (Forgas et al., 1984; Bower, 1987; Brown and Mankowski, 1993). Studies on body representation have shown the impact that one's body can have on emotional state and self-esteem, with participants who feel more positive also showing enhanced self-esteem (Tajadura-Jiménez et al., 2015). In the studies of Osimo et al. (2015) and Tajadura-Jiménez et al. (2017) participants also reported feeling happier after experiencing embodiment in Dr Sigmund Freud or in a child body, respectively, compared to control groups; however, no data on self-esteem were recorded. Similarly, a potential increase in self-esteem could have affected participants' stress levels as previously demonstrated (Juth et al., 2008). Additionally, stress has been shown to impair cognitive abilities, including selective attention, working memory, and verbal or visual problem solving (Keinan et al., 1987; Braunstein-Bercovitz et al., 2001; Luethi, 2008; Tiferet-Dweck et al., 2016). It is therefore possible that embodying the Einstein body led low self-esteem participants to increase their self-confidence, thus decreasing any experienced task-related stress, which in turn led to better performance.

Although we cannot draw any conclusions on improvement of self-esteem, motivation, mood, or stress levels based on our data, our speculation is that our findings can be associated with experiencing enhanced self-reassurance, provided that the critical role of the body is taken into account. This is to an extent supported by our findings, as there were changes in cognitive performance only for those people in the experimental condition, and only for those with low self-esteem. Since, as argued earlier, in that condition the "self" is now associated with Einstein, this gives participants access to their own internal mental resources that they might associate with that body and that would otherwise be inaccessible. In our case this is specifically linked to enhanced cognitive performance as measured by the TOL task. Along these lines, it is currently unclear whether such changes are the product of virtual embodiment in a personality known for intelligence, or whether the effect is due to an increase in self-esteem, arousal, motivation or mood from virtually embodying any famous, universally respected character. Also, we cannot conclude that similar changes would take place were different cognitive abilities to be tested. It remains unclear whether embodiment as Einstein has a specific effect on cognitive processing related only to problem-solving, or whether the effect can carry over to different cognitive or other functions. Hence additional tasks to control and account for these effects should be further tested. Certainly we do not claim that embodiment in a different body, no matter how prestigious and important a personality this body represents, could give people access to entirely new knowledge (e.g., quantum mechanics, physics). However, it could make them more open to acquiring such new knowledge.

Additionally, further research is required to understand the contributions of body ownership and agency to these effects. For example, previous research has shown that experiencing agency over the virtual body's movements is an essential factor for the illusion to result in changes in behavior, perception, and implicit attitudes (Banakou et al., 2013; Osimo et al., 2015; Banakou and Slater, 2017). The significance of agency was explicitly addressed in Banakou and Slater (2017), where we found that body ownership in itself cannot account for behavioral after-effects (illusory agency over speaking), and that it is necessary that body ownership be primarily induced by visuomotor synchrony between movements of the participants and movements of the virtual body (always in the context of 1PP over the virtual body). In this work we did not study how asynchronous visuomotor correlations, leading to a reduction of body ownership, might have influenced the results; however, in future studies we aim to replicate and extend these types of findings, and specifically address the agency factor. Although additional research is needed in this direction and to understand the extent to which body ownership can generalize to higher level capabilities, this method could prove useful in the improvement of cognitive performance, especially in people who have low self-esteem.

## CONCLUSIONS

The results of this experiment, in conjunction with earlier work discussed above, show that virtual embodiment can be used to generate an illusion of body ownership of a virtual body that substitutes the participant's own body, through first-person perspective and visuomotor correlations over real and virtual body movements. The main focus here is that embodiment does not only lead to perceptual, attitudinal and behavioral correlates as previously shown, but can also cause changes in cognitive processing. Specifically, our findings suggest that embodiment in a virtual body that is associated with high cognitive abilities, such as Albert Einstein, results in better performance in a TOL task, and also a reduction in age-based discrimination of young adults toward the elderly. There is evidence that participants' baseline "intelligence" and self-esteem correlate with the above findings, taking into account, however, the critical role of the body in which embodiment occurs. Nonetheless, the present study comes with a number of limitations and alternative hypotheses in the interpretation of the results that we discuss above, which point to the need for further research to understand the exact mechanisms resulting in such effects.

## AUTHOR CONTRIBUTIONS

MS designed the original concept. All authors designed the experiment. DB and SK implemented the scenario and carried out the experiment. MS carried out the analysis. DB and MS wrote the paper.

## ACKNOWLEDGMENTS

This work was supported by PSI2014-56301-R Ser Einstein: La Influencia de Internalizar un Cuerpo Virtual en la Inteligencia, Ministerio de Economía, Industria y Competitividad of Spain. We would like to thank Rodrigo Pizarro for the implementation of the virtual environment, and José Valenzuela for the design of the virtual characters.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00917/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Banakou, Kishore and Slater. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Frontier of Self and Impact Prediction

#### Justine Cléry\* and Suliann Ben Hamed\*

UMR5229, Institut des Sciences Cognitives Marc Jeannerod, CNRS-Université Claude Bernard Lyon I, Bron, France

The construction of a coherent representation of our body and the mapping of the space immediately surrounding it are of the highest ecological importance. This space has at least three specificities: it is a space where actions are planned in order to interact with our environment; it is a space that contributes to the experience of self and self-boundaries, through tactile processing and multisensory interactions; last, it is a space that contributes to the experience of body integrity against external events. In the last decades, numerous studies have been interested in peripersonal space (PPS), defined as the space directly surrounding us and which we can interact with (for reviews, see Cléry et al., 2015b; de Vignemont and Iannetti, 2015; di Pellegrino and Làdavas, 2015). These studies have contributed to the understanding of how this space is constructed, encoded and modulated. The majority of these studies focused on subparts of PPS (the hand, the face or the trunk) and very few of them investigated the interaction between PPS subparts. In the present review, we summarize the latest advances in this research and we discuss the new perspectives that are set forth for future investigations on this topic. We describe the most recent methods used to estimate PPS boundaries by means of dynamic stimuli. We then highlight how impact prediction and approaching stimuli modulate this space through social, emotional and action-related components involving principally a parieto-frontal network. In a next step, we review evidence that there is not a unique representation of PPS but at least three subsections (hand, face and trunk PPS). Last, we discuss how these subspaces interact, and we question whether and how bodily self-consciousness (BSC) is functionally and behaviorally linked to PPS.

Keywords: visual, tactile, looming stimuli, prediction, multisensory integration, peripersonal space

### PERIPERSONAL SPACE

In everyday life, we are solicited by multiple stimuli in our environment. The space around us is filled with conspecifics, animals and objects, often animated by their own goals. Most of the time, this implies interacting with these elements of the environment along a very rich and complex repertoire that depends on the context and the very nature of this environment. This requires the construction of a coherent representation of our body and the selective encoding of the space immediately surrounding it, the so-called peripersonal space (PPS), both in order to estimate the consequences of the environment and the consequences of our own actions onto our body. Interestingly, the PPS is subserved in the brain by specific neuronal mechanisms embedded in a well identified cortical network that specifically processes visual or auditory information occurring in the space that directly surrounds us as well as the tactile information occurring on the body.

#### Edited by:

Matej Hoffmann, Czech Technical University in Prague, Czechia

#### Reviewed by:

Jean-Paul Noel, Vanderbilt University, United States; Michela Bassolino, École Polytechnique Fédérale de Lausanne, Switzerland

#### \*Correspondence:

Justine Cléry (jclery@uwo.ca; justine.clery@isc.cnrs.fr); Suliann Ben Hamed (benhamed@isc.cnrs.fr)

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 18 April 2018; Accepted: 06 June 2018; Published: 27 June 2018

#### Citation:

Cléry J and Ben Hamed S (2018) Frontier of Self and Impact Prediction. Front. Psychol. 9:1073. doi: 10.3389/fpsyg.2018.01073


### Visuo-Tactile Neurons as a Substrate for PPS Encoding in the Cortex

Numerous studies in non-human primates have shown that multisensory cues, and specifically those recruiting the body through touch, are integrated by a specialized neural system representing PPS (**Figure 1A**). While much of the work has focused on visuo-tactile interactions, audio-tactile properties of PPS have also been explored. Specific populations of multisensory neurons respond both to tactile information on the body (arm, face or trunk) and to visual or auditory stimuli occurring in PPS, i.e., close to the body. These multisensory neurons were first described in the macaque brain, in a network composed of specialized parietal and frontal areas: the ventral premotor cortex (vPM; F4, Rizzolatti et al., 1981a,b; or polysensory zone PZ, Graziano et al., 1994, 1997, 1999; Fogassi et al., 1996; Graziano and Cooke, 2006; Guipponi et al., 2015), the ventral intraparietal area on the fundus of the intraparietal sulcus (VIP, Hyvärinen and Poranen, 1974; Duhamel et al., 1997, 1998; Avillac et al., 2005; Schlack et al., 2005; Graziano and Cooke, 2006; Guipponi et al., 2013, 2015), and parietal area 7b, as well as in subcortical regions such as the putamen (Graziano and Gross, 1993). Though the response properties of these neurons are modulated by eye position, their visual receptive fields (RFs) are anchored to specific body parts. This suggests that the multisensory representation of PPS they hold is body-part centered, for example on the head for area VIP neurons (Duhamel et al., 1997; Avillac et al., 2005) or on the arm for premotor PZ neurons (Graziano et al., 2000; Graziano and Cooke, 2006).
While these studies point toward a functional convergence between PPS processing and multisensory convergence processes, very few of them have explicitly tested whether these multisensory neurons actively integrate sensory information from different modalities (Avillac et al., 2007), and even fewer have explicitly probed a direct link between multisensory visuo-tactile or audio-tactile integration and PPS processing. In a recent study performed in epileptic patients, Bernasconi et al. (2018) recorded, for the first time, surface intracranial electroencephalography signals (ECoG) while tactile and/or approaching auditory stimulations were presented to the subjects. The authors show that PPS processing most often coincides with multisensory integration processes.

### Clinical Evidence for Visuo-Tactile Interactions in PPS

Extinction is a neurological condition in which patients fail to detect contralesional stimuli only when challenged in their sensory processing by the presentation of a double simultaneous stimulation, both on the ipsilesional and contralesional sides (Bender, 1952; Mattingley et al., 1997; Làdavas and Serino, 2008). This condition is observed both when the concurrent stimuli are from the same sensory modality (e.g., both visual, this condition is referred to as unimodal extinction) and when the concurrent stimuli are from two different modalities (e.g., one is visual and the other is tactile, this condition is referred to as cross-modal extinction). In such right brain-damaged patients with tactile extinction, visual or auditory stimulations on the ipsilesional side exacerbate contralesional tactile extinction. In contrast, if the visual and tactile stimuli are both presented on the same contralesional side, then, the clinical deficit is reduced, the processing of one sensory stimulus benefiting from the processing of the other one (Làdavas et al., 1998a). Therefore, cross-modal extinction depends on the spatial arrangement of the stimuli relative to the patient's body (Farnè et al., 2005a,b; for review, see Làdavas, 2002). Importantly, this modulation is most systematic when visuo-tactile interactions occur in the space near to the patients' body, as compared to the space far away (di Pellegrino et al., 1997; Làdavas et al., 1998a, 2000). This finding is taken as evidence for the existence of a PPS in the human brain, relying on the integration of visual and tactile information in the space close to the body, in a way very similar to that described in monkeys (Làdavas, 2002). Most of these studies place the bimodal stimuli close to the hand. 
Subsequent studies confirmed that this visuo-tactile integration was not specific to PPS around the hand but could also be reported around other body parts, such as the face (Làdavas et al., 1998b; Farnè and Làdavas, 2002; Farnè et al., 2005a). From a neuroanatomical point of view, studies have shown that lesions of the frontal, temporal and parietal cortex in the right hemisphere are the ones that most commonly lead to extinction (Mattingley et al., 1997; Driver and Vuilleumier, 2001; Farnè et al., 2005b; Vossel et al., 2011; Kamtchum-Tatuene et al., 2017), at locations considered as the human homologues of the monkey cortical regions involved in PPS processing and described above. In particular, this neurological disorder appears most often in patients with focal inferior parietal lesions. Lesions of the temporo-parietal junction (TPJ), a region crucially involved in self-processing, also induce a disruption of PPS processing (Blanke et al., 2002; Blanke, 2012). The monkey homologue of TPJ is uncertain. A recent fMRI study suggests that the monkey homologue of human TPJ could actually lie midway along the ventral temporal sulcus (Mars et al., 2013) at a location where face and body patches are identified (Perrett et al., 1992; Tsao et al., 2003, 2008; Tsao and Livingstone, 2008; Rushworth et al., 2013; Popivanov et al., 2014; Premereur et al., 2016) and where impact prediction to the body produces strong neuronal activations (Cléry et al., 2017).

### Behavioral Evidence for the Existence of PPS

The above clinical evidence in favor of the existence of a PPS system in the human brain is corroborated by behavioral studies in healthy participants (Spence et al., 2004; Macaluso and Maravita, 2010; Occelli et al., 2011). These studies showed that the modulation of tactile perception by visual or auditory stimuli is more pronounced when these are presented close, as compared to far, from the body. Neuroimaging studies using EEG (Sambo and Forster, 2008), TMS (Serino et al., 2011) and fMRI (Bremmer et al., 2001; Makin et al., 2007; Brozzoli et al., 2011, 2013; Gentile et al., 2011) demonstrated that multisensory representation of PPS occurs in both parietal and prefrontal areas (**Figure 1B**) where PPS neurons have been identified in the homologous macaque regions (for reviews, see Cléry et al., 2015b; di Pellegrino and Làdavas, 2015).

Figure legend abbreviations: IFS, inferior frontal sulcus; IPS, intraparietal sulcus; LS, lateral sulcus; LuS, lunate sulcus; MTS, middle temporal sulcus; PoCS, postcentral sulcus; PrCS, precentral sulcus; PS, principal sulcus; SFS, superior frontal sulcus; STS, superior temporal sulcus; OTS, occipito-temporal sulcus.

There is no physical separation between PPS (near space) and the extrapersonal space (far space) in the real world. However, the brain does represent, at least as assessed behaviorally, a boundary between these two spaces, that is to say between what is close to our bodies, which can potentially impact, interact with or attack us, and what is further away, at a distance that we cannot act upon except by a full displacement of the body. Importantly, this boundary is not fixed and can vary within and across individuals (Maravita and Iriki, 2004; Farnè et al., 2005a,b; Cléry et al., 2015b; de Vignemont and Iannetti, 2015). Indeed, the limits between PPS and far space can be very different from one subject to the other, as can the sharpness of the representational gradient between these two spaces (**Figure 2**). Likewise, within a given subject, these limits can vary as a function of the sensory, cognitive or social context, and appear to be reliably skewed under certain psychiatric conditions (see for review Cléry et al., 2015b). Nevertheless, even if PPS can be modified in certain conditions, under specific controlled conditions and in a homogeneous sample (e.g., no phobia), it is possible to estimate PPS boundaries at least at group level.

### Possible PPS Functions

Objects approaching us or a predator may generate a threat or harm us, and induce the need to initiate defensive behavior. As a result, looming stimuli often indicate an intrusion or a risk of intrusion in our PPS. This correlates with an enhanced tactile processing as assessed both by d'-sensitivity measures and reaction time (RT) measures (Canzoneri et al., 2012; Cléry et al., 2015a; Kandula et al., 2015; De Paepe et al., 2016). As a result, PPS has been proposed to define a safety boundary around the body (Graziano and Cooke, 2006; Sambo and Iannetti, 2013; Cléry et al., 2015a,b, 2017, 2018; de Vignemont and Iannetti, 2015). However, PPS is also, by definition, the space that is close to our body, or self. Accordingly, recent studies and reviews highlight the link between PPS and body self-consciousness. For example, Grivaz et al. (2017) propose a meta-analysis of human studies, comparing the cortical bases of PPS and body self-consciousness, with a specific focus on their overlap and their respective specificities.

Figure caption (fragment): …boundary. The limits between peripersonal space, closest to us, and far space, can vary within individuals as a function of sensory, cognitive or social context. These limits can also vary across individuals as a function of their own experiences and state (phobia, type of social interaction, etc.).

In the following, we will first review the different methods developed to measure PPS (see Measuring Peripersonal Space), the role of impact prediction in the definition of PPS (see Looming Stimuli and Touch or Impact Prediction to the Body), evidence for modulations of PPS (see Modulations of Peripersonal Space), a discussion on the modular nature of PPS (see Different Representations of Body-Related PPS) and last, the functional link between PPS and body self-consciousness (see Peripersonal Space and Bodily Self-Consciousness).

### MEASURING PERIPERSONAL SPACE

Both in the human brain and in the monkey brain, the neurons that represent PPS are more strongly driven by dynamic stimuli approaching the body than by static stimuli. This is for example the case for the bimodal and trimodal neurons that can be recorded both from the ventral intraparietal area (Colby et al., 1993; Duhamel et al., 1997) and the premotor cortex (Graziano et al., 1994, 1997, 1999; Fogassi et al., 1996). The firing rate of some of these neurons increases as a function of the velocity of the looming stimulus, suggesting that these neurons might be computing the time to impact on the body (Fogassi et al., 1996). This is also observed behaviorally, as the velocity of looming audio stimuli has been recently shown to dynamically resize PPS (Noel et al., 2018a). This observation is suggested to be an emergent property of visuo-tactile recurrent neuronal networks proposed to mimic PPS parietal and prefrontal functions (Noel et al., 2018a). Looming stimuli have also been used to probe PPS in more complex designs. For example, Finisguerra et al. (2015) use TMS (transcranial magnetic stimulation) in order to quantify changes in hand cortico-motor excitability as a function of the position of a looming stimulus with respect to the subject's hand.

Based on these findings, a method has been developed to estimate the boundary of PPS using dynamic stimuli. Indeed, these stimuli have a higher ecological relevance than static stimuli when it comes to studying PPS. Besides, this approach is more similar (though not identical) to the conditions used in monkey neurophysiology experiments, and thus makes it possible to directly compare the results across species (Canzoneri et al., 2012).

The idea behind this paradigm is to measure the behavioral responses in humans that are expected to reflect the properties and putative function of the RFs of PPS primate neurons. The paradigm relies on using a dynamic multisensory (audio-tactile or visuo-tactile) integration task in order to assess the limits of PPS (defined as the inflection point where a notable increase in multisensory integration can be observed) and is considered as a functionally and ecologically more relevant paradigm than previous designs. Specifically, participants have to respond as fast as possible to tactile stimuli presented somewhere on their body, while task-irrelevant heteromodal cues (auditory or visual stimuli) looming toward or receding from the body part stimulated by the tactile stimulus are presented (Canzoneri et al., 2012, 2013a,b, 2016; Teneggi et al., 2013; Galli et al., 2015; Noel et al., 2015a,b). On each trial, tactile stimuli are presented at different times with respect to the trajectory of the sound/visual dynamic stimuli. In other words, the tactile stimulus is delivered when the sound or visual dynamic stimulus is perceived at a variable distance from the body of the subject. PPS limits are inferred from the function relating the RTs measured for the tactile stimulus at the body part of interest (the hand, the face or the trunk) to the distance at which the visual or auditory dynamic stimulus was presented.

Reaction times to tactile stimuli progressively slow down as a function of the distance at which the sound/visual looming stimulus is presented; and inversely, RTs progressively speed up as a function of the distance at which the sound/visual receding stimulus is presented. The authors propose that this function describes the link between tactile processing and the location of auditory or visual stimuli in space and makes it possible to estimate the critical distance at which an external stimulus starts to affect tactile processing. This distance, along a spatial continuum between far space and the external surface of the body, provides an approximation of the boundary of PPS representation in humans (**Figure 2**). In a recent study, we use a visuo-tactile version of this paradigm to demonstrate that PPS is not only characterized by a speeding up of RTs but also by an anticipated enhancement of tactile processing as assessed by changes in tactile sensory d' measures, in prediction of an impact to the body (Cléry et al., 2015a). We show that this enhanced tactile processing in anticipation of an impact to the body happens according to spatial and temporal coincidence laws very similar to those proposed to subserve multisensory integration processes (Stein and Meredith, 1993; Rowland and Stein, 2014).
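The boundary estimate described above can be sketched numerically. The following is a minimal illustration, not the analysis code of the cited studies: it assumes that RTs follow a sigmoid function of the distance of the looming stimulus and takes the central (inflection) point of that sigmoid as the PPS boundary. The function names (`sigmoid_rt`, `fit_pps_boundary`), the coarse grid-search fit, and the synthetic distances and RTs are all hypothetical.

```python
import math

def sigmoid_rt(distance, rt_near, rt_far, center, slope):
    """Assumed sigmoid: RT is close to rt_near near the body and rt_far far away."""
    return rt_near + (rt_far - rt_near) / (1.0 + math.exp(-(distance - center) / slope))

def fit_pps_boundary(distances, rts, n_grid=100):
    """Least-squares grid search for the sigmoid's central point and slope.

    The two asymptotes are fixed to the fastest and slowest observed RTs
    (a simplifying assumption made for this sketch only).
    """
    rt_near, rt_far = min(rts), max(rts)
    lo, hi = min(distances), max(distances)
    span = hi - lo
    best = None
    for i in range(n_grid + 1):
        center = lo + span * i / n_grid
        for j in range(1, n_grid + 1):
            slope = span * j / n_grid
            sse = sum(
                (sigmoid_rt(d, rt_near, rt_far, center, slope) - rt) ** 2
                for d, rt in zip(distances, rts)
            )
            if best is None or sse < best[0]:
                best = (sse, center, slope)
    return best[1], best[2]

# Synthetic example: distances in cm, RTs in ms, generated from a known sigmoid.
distances_cm = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
rts_ms = [sigmoid_rt(d, 300.0, 360.0, 60.0, 10.0) for d in distances_cm]

boundary_cm, slope_cm = fit_pps_boundary(distances_cm, rts_ms)
print(f"Estimated PPS boundary: {boundary_cm:.1f} cm")
```

In this sketch the fitted central point plays the role of the critical distance at which the external stimulus starts to affect tactile processing, and the slope parameter gives a rough index of how sharp the transition between near and far space is.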

This new paradigm was first developed and used in the context of a dynamic audio-tactile interaction task to investigate hand-related PPS, with tactile stimulations presented on the hand (Canzoneri et al., 2012, 2013a,b). This paradigm was also used to investigate the effect of social variables on face-anchored PPS, using a dynamic audio-tactile interaction task with tactile stimulations delivered to the face (Teneggi et al., 2013). Recently, this paradigm was also adapted to studies investigating the full body illusion (Noel et al., 2015a,b; Serino et al., 2015b). More recently, this protocol was used to study and measure the spatial extent of human PPS in real, virtual, and mixed-reality environments. More complex versions of this task are also under investigation, whereby three sensory modalities are used (visual, auditory and tactile), thus experimentally approaching richer and more ecological sensory environments (Serino et al., 2018).

Overall, this paradigm opens new perspectives in the study of PPS and how it is modulated by the context (top–down information, bottom–up evidence, social cues etc.), experience (learning, priors etc.) and action.

### LOOMING STIMULI AND TOUCH OR IMPACT PREDICTION TO THE BODY

The ecological significance of static stimuli close to our body (e.g., a wall, a desk) differs from that of dynamic stimuli looming toward us (e.g., a mosquito, a ball). Approaching stimuli are potentially more hazardous than other visual stimuli, even when they do not predict a direct impact to the body. A predator, a dominant conspecific, or a mere branch coming up at high speed are dangerous if one does not detect them fast enough to produce the appropriate escape motor repertoire. Such looming stimuli are known to trigger stereotyped defense responses (in monkeys: Schiff et al., 1962; in human infants: Ball and Tronick, 1971). Interestingly, looming stimuli which are explicitly threatening are perceived as having a shorter time-to-impact latency in comparison to objects moving at the same objective speed and which are not threatening (Vagnoni et al., 2012). This underestimation for approaching stimuli is also influenced by one's motor abilities, and is for example increased if subjects have their heads constrained by a chin rest compared to when standing freely (Vagnoni et al., 2017), the former condition possibly indicating, due to the constraint, an increased threat relative to the unconstrained condition. The neuronal basis of this observation is, to our knowledge, completely unexplored.

### Temporal Prediction

In a visuo-auditory context, looming visual stimuli have been shown to generate evident orienting behavior toward simultaneous and congruent auditory cues compared with receding stimuli, both in 5-month-old human infants (Walker-Andrews and Lennon, 1985) and in non-human primates (Maier et al., 2004). Looming structured sounds can specifically benefit visual orientation sensitivity (Romei et al., 2009; Leo et al., 2011). In a recent study (Cléry et al., 2015a), we show that subjects have an enhanced tactile sensitivity in the presence of looming visual stimuli as compared to receding visual stimuli, confirming the idea that looming stimuli are more relevant than receding stimuli to the body, and trigger enhanced and anticipated tactile processes. Indeed, while both size and depth cues most likely contribute to the tactile sensitivity modulation on the face, this study indicates that the movement vector cue (away from or toward the subject) is the main cue affecting tactile detection. Moreover, slower looming stimuli lead to a delayed predicted time of impact on the face, and consequently to a delayed time at which tactile sensitivity is maximally improved (Cléry et al., 2015a). In other words, the trajectory and speed of the looming visual stimuli fully account for the temporal and dynamic predictive cues that are exploited by the brain to anticipate touch or impact to the body (Cléry et al., 2015a; Huang et al., 2018). Likewise, other audio-tactile or visuo-tactile integration studies (Canzoneri et al., 2012; Kandula et al., 2015) have shown that RTs are shorter when a tactile stimulus is delivered at the impact time of the looming stimulus and suggest that looming stimuli predictively speed up tactile processing.
Specifically, the speed of the looming stimulus seems to guide the nervous system in defining a high touch/impact probability window not unlike the multisensory temporal binding window described during the physiological and perceptual binding of two stimuli into the representation of a same and single external source and defining the degree of temporal tolerance of the brain in this binding process (De Paepe et al., 2016; Noel et al., 2016, 2018b; for review, see Wallace and Stevenson, 2014).
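The relationship between stimulus speed and predicted impact time described above can be illustrated with a minimal sketch. It assumes a stimulus approaching the face at constant speed along a straight line, so that the predicted time to impact is simply distance divided by speed. The function names, the distances and speeds, and the symmetric window half-width are hypothetical illustration values, not parameters taken from the cited studies.

```python
def time_to_impact(distance_m: float, speed_m_per_s: float) -> float:
    """Predicted time (s) until a constant-speed looming stimulus reaches the body."""
    if speed_m_per_s <= 0:
        raise ValueError("a looming stimulus must have a positive approach speed")
    return distance_m / speed_m_per_s


def impact_window(distance_m: float, speed_m_per_s: float,
                  half_width_s: float = 0.1) -> tuple[float, float]:
    """Hypothetical window of high touch/impact probability around the predicted impact.

    The half-width is an arbitrary illustration value (assumption).
    """
    t = time_to_impact(distance_m, speed_m_per_s)
    return (max(0.0, t - half_width_s), t + half_width_s)


# Same starting distance, two approach speeds (hypothetical values).
fast = time_to_impact(0.6, 1.2)
slow = time_to_impact(0.6, 0.6)
window_slow = impact_window(0.6, 0.6)
```

Under these assumptions, halving the approach speed doubles the predicted time to impact, which is the sense in which a slower stimulus shifts the moment of maximal tactile enhancement later in time.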

In this context, it is suggested that a visual stimulus looming onto the body and predicting an impact with a tactile stimulation onto the skin can be used to recalibrate PPS representation in an anticipated manner. A recent modeling study captures this idea whereby the training of a recurrent neural network results in a prediction of the anticipated tactile stimulation, the prediction error increasing with the distance of the visual stimulus from the skin, and the confidence of the prediction decreasing with distance (Straka and Hoffmann, 2017).

Overall, an enhanced processing of time to collision to the body can thus be observed and modeled within PPS. However, this might actually reflect a general enhancement in the processing of time to collision. Indeed, the prediction of collision between two objects placed within PPS appears to be highly dependent on temporal variations (e.g., differences in object velocities, Iachini et al., 2017). This possibly suggests an adaptive function of PPS to anticipate and prepare the appropriate overt behavior in response to external events happening within PPS, whether interacting with the body or not (Iachini et al., 2017).

### Spatial Prediction

Besides, we found that tactile d', a direct measure of sensitivity, is improved not only at the predicted time but also at the predicted location of impact of an approaching visual stimulus to the face (Cléry et al., 2015a), fully mirroring the expected subjective consequences of the visual stimulus onto the tactile modality. This observation is suggested to be an emergent property of visuo-tactile recurrent neuronal networks proposed to mimic PPS parietal and prefrontal functions (Noel et al., 2018a). Importantly, this enhancement is also observed for stimulus trajectories that do not predict a direct impact to the face but rather brush past it, suggesting that the prediction of intrusion of a visual stimulus into PPS triggers the same tactile enhancement mechanisms whether a direct touch/impact on the body is actually expected or "just" an intrusion into PPS.
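For readers unfamiliar with d', the sketch below shows the standard signal detection computation, d' = z(hit rate) − z(false alarm rate), using only the Python standard library. The trial counts are hypothetical, and the log-linear correction (adding 0.5 to each count) is one common convention for avoiding rates of exactly 0 or 1; it is an assumption here and not necessarily the procedure used in the cited study.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int,
            false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' from yes/no detection counts.

    A log-linear correction (assumption) keeps both rates strictly
    between 0 and 1 so that the inverse normal CDF is defined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: tactile detection with a congruent looming cue
# versus a baseline condition, at an identical false alarm rate.
d_cued = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
d_baseline = d_prime(hits=60, misses=40, false_alarms=20, correct_rejections=80)
```

Because d' separates sensitivity from response bias, a higher value in the cued condition at an unchanged false alarm rate reflects a genuine gain in detection rather than a greater tendency to respond "yes".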

### Possible Neural Mechanisms

In addition to a baseline multisensory enhancement, tactile sensitivity thus appears to be further improved by the predictive components of the heteromodal auditory or visual stimuli. By definition, this process involves cross-modal influences, and it was suggested that the cortical regions processing this multisensory touch/impact prediction mostly overlap with the functional network supporting multisensory convergence and integration. While this has never been explicitly investigated in these terms, early observations are in full agreement with this hypothesis. First, the visual response observed in parietal tactile neurons was initially interpreted as an "anticipatory activation," predicting touch in the matching skin (Hyvärinen and Poranen, 1974). Second, some neurons in the ventral intraparietal area (VIP) integrate vestibular and proprioceptive self-motion cues and visual motion cues to encode self-motion relative to the environment (Bremmer et al., 1997, 2000, 2002a,b; Duhamel et al., 1997). Along the same lines, vestibular inputs are shown to dynamically influence the multisensory PPS boundary and spatial self-representations in humans (Pfeiffer et al., 2018). These VIP neurons have been shown to be activated by both visual and tactile stimuli (Duhamel et al., 1997; Guipponi et al., 2013, 2015) and show non-linear (sub- or super-additive) as well as additive multisensory integration operations (Avillac et al., 2004, 2007). Recent fMRI studies in the non-human primate confirm that area VIP is involved in impact prediction to the face in a visuo-tactile context (Cléry et al., 2015b, 2017). As a result, this area appears to process both the consequences of one's own whole-body movements onto the environment and the consequences of the movement of objects within the environment, relative to the body. Last, premotor area F4, an area highly connected with parietal area VIP, is also robustly and bilaterally activated by impact prediction (Cléry et al., 2015b, 2017). Most importantly, in both parietal area VIP and premotor area F4, these activations are systematically significantly larger when the approaching stimulus is spatially and temporally predictive of the tactile stimulus than when these two stimuli are presented at the same time. This strongly suggests that these two areas are indeed predictively processing temporal and spatial cues at the neuronal level, possibly via non-linear integrative neuronal mechanisms (Cléry et al., 2015b, 2017).
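The distinction between sub-additive, additive, and super-additive integration mentioned above can be made concrete with a minimal sketch that compares a bimodal response to the sum of the two unimodal responses. The index definition, the tolerance band, and the firing rates are hypothetical illustration choices; this is a generic textbook-style contrast, not the specific index or statistical test used in the cited single-cell studies.

```python
def additivity_index(visual: float, tactile: float, bimodal: float) -> float:
    """Relative deviation of the bimodal response from the unimodal sum.

    Responses are assumed to be baseline-corrected firing rates (spikes/s).
    """
    unimodal_sum = visual + tactile
    if unimodal_sum <= 0:
        raise ValueError("sum of unimodal responses must be positive")
    return (bimodal - unimodal_sum) / unimodal_sum


def classify_integration(visual: float, tactile: float, bimodal: float,
                         tolerance: float = 0.1) -> str:
    """Label the integration regime; the tolerance band is an arbitrary assumption."""
    index = additivity_index(visual, tactile, bimodal)
    if index > tolerance:
        return "super-additive"
    if index < -tolerance:
        return "sub-additive"
    return "additive"


# Hypothetical responses of three neurons (visual, tactile, bimodal).
labels = [
    classify_integration(10.0, 10.0, 30.0),
    classify_integration(10.0, 10.0, 20.0),
    classify_integration(10.0, 10.0, 12.0),
]
```

In practice, any such classification would rest on trial-by-trial statistics rather than a fixed tolerance; the threshold here only serves to separate the three regimes in the example.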

As seen in Section "Peripersonal Space," areas VIP and F4 are proposed to play a key role in the definition of PPS. In a recent monkey fMRI study we assess the neural bases of near and far space coding during naturalistic 3D moving objects (Cléry et al., 2018). This study clearly confirms the involvement of both VIP and F4 for PPS encoding (Cléry et al., 2015b:Figures 1B,C, 3; Cléry et al., 2018: Figures 4, 8). This confirms the prior observations from single neuron studies in monkeys (Rizzolatti et al., 1981b; Colby et al., 1993; Graziano et al., 1997; Bremmer et al., 2002a,b, 2013). However, two important observations need to be highlighted at this point. First, our fMRI data show that within an area VIP anatomically defined as the fundal intraparietal sulcus region (IPS), and functionally identified as the cortical region activated by large field visual stimulation (Colby et al., 1993; Bremmer et al., 2002a,b; Guipponi et al., 2013), only a small portion is activated by visuo-tactile convergence (Guipponi et al., 2013: Figure 5; Cléry et al., 2015b: Figures 2B, 3), impact prediction to the face (Cléry et al., 2015b: Figure 3B; Cléry et al., 2017: Figure 3) and near space processing (Cléry et al., 2015b: Figures 1B, 3; Cléry et al., 2018: Figures 4, 8). Importantly, the very same voxels are activated by visuo-tactile convergence, prediction of touch/impact to the body and selective near space encoding, suggesting that these different functions are possibly implemented by unique neuronal computations (see Cléry et al., 2015b, for discussion).

This set of monkey fMRI studies also allows the identification of the larger cortical network involved in touch/impact prediction to the body and near space processing. In addition to subsectors of the classically defined VIP, this network encompasses a subsector of premotor area F4, corresponding to the polysensory zone Pz, as well as the fundus of superior temporal sulcus FST and early striate and extra-striate areas. This extremely strong overlap between the touch/impact prediction to the body network and the near space processing network provides strong support to the idea that, functionally, PPS includes the skin as a frontier of self, or alternatively, that the frontier of self is defined not only by the skin but also by PPS (these two views being, functionally speaking, equivalent).

In **Figure 1**, a very good agreement can be seen between the premotor and intraparietal human and monkey PPS regions of interest (ROIs), as identified from a meta-analysis of the literature. In contrast, the monkey homologues of the human-specific TPJ PPS ROIs are not described. In a recent study based on the analysis of functional connectivity patterns, Mars et al. (2013) propose that the monkey homologue of human TPJ actually lies within the superior temporal cortex, at a location often associated with the processing of faces and other social stimuli (Perrett et al., 1992; Tsao et al., 2003, 2008; Tsao and Livingstone, 2008; Rushworth et al., 2013; Popivanov et al., 2014; Premereur et al., 2016). Importantly, this same region is found to be activated in our impact prediction to the face study (Cléry et al., 2017), as well as by objects looming toward PPS (Cléry et al., 2018), or placed within PPS (Cléry et al., 2018). **Figure 3** captures this functional overlap. As a result, we propose to expand the functions of this monkey STS region beyond the perception of faces and bodies to the processing of PPS in relation to one's own body, homologous to one of the multiple functions of human TPJ.

### A Putative Defense PPS

A visual stimulus entering the PPS close to one's cheek enhances tactile processing on that cheek, more than a visual stimulus which predicts an impact to the other cheek (Cléry et al., 2015a). This suggests that intrusion into PPS predicts touch or impact to the nearby body surface. Canzoneri et al. (2012) demonstrate that the presence of a looming sound predicting an impact on the hand or within a well-defined distance from the hand, i.e., within a hand-referenced PPS, accelerates tactile processing on this hand. In monkeys, the electrical microstimulation of neurons in areas VIP and F4 induces a repertoire of whole-body defense and avoidance movements, indicating that these two regions are involved in the coding of a defense PPS (Graziano et al., 2002; Cooke and Graziano, 2004; Graziano and Cooke, 2006). The size of this defensive space increases as the velocity of a potentially dangerous stimulus approaching the face increases (Bisio et al., 2017). Likewise, the size of PPS also increases as the probability that the looming threat stimulus impacts and harms the face increases (Bufacchi, 2017). Taken together, these findings suggest the existence of a dynamic security margin around the face and the body.

One aspect of somato-sensation is nociception. In two studies, De Paepe et al. (2014, 2015) used temporal order judgment tasks to assess whether the perception of nociceptive stimuli and their localization was influenced by proximal visual stimuli, thus contributing to the construction of an integrated representation of PPS as has been described for touch. Participants were requested to judge which of two nociceptive stimuli was presented first, one stimulus being delivered to each hand. Each dual nociceptive stimulation was preceded by visual cues presented either unilaterally or bilaterally, and either close to the subject's body or far from it. The authors further requested the participants to either cross their hands over their body's midline or not. They found that the unilateral visual cue prioritized the processing of nociceptive stimuli delivered on the hand adjacent to the unilateral visual cue. This effect increased when the cue was displayed near the participant's hand (De Paepe et al., 2014), irrespective of posture. This demonstrates that visuo-nociceptive interactions occur predominantly in a hand-anchored frame of reference, i.e., in a hand-anchored PPS, and not in a body-anchored frame of reference (De Paepe et al., 2015; Filbrich et al., 2017). In a third study (De Paepe et al., 2016), participants were required to indicate as fast as possible on which hand they felt the nociceptive stimulus, while a visual stimulus with different temporal onset asynchronies was either looming or receding with respect to the left or right hand of the participants. RTs were fastest when the visual stimulus was close to the stimulated hand, and this effect was more pronounced for looming visual stimuli. Taken together, these three studies confirm an interaction between the coding of nociceptive information and a peripersonal frame of reference. They bring additional support to the proposal that PPS may contribute to the definition of a safety margin representation around us, whose goal is to keep us safe from any potential physical danger.

A recent review (Van der Stoep et al., 2015) suggests that, depending on their distance to the body, different combinations of sensory information might be more or less relevant. For example, touch and vision interactions are expected to dominate in PPS, as they correlate with an interaction between the body and the environment (e.g., for grasping or defense). In contrast, auditory and visual information may be more relevant in extrapersonal space away from the subject's body, as they provide information about far away objects and contribute to spatial orienting, navigation and interaction with others (e.g., during conversation). As tactile stimuli can only be processed when applied to the body, audiotactile and visuotactile interactions (e.g., in the case of touch or impact to the body) by definition take place close to the body, and the PPS margin can thus be rationalized as the spatial alignment of different stimulus modalities with respect to the body. A more recent review from the same group (Van der Stoep et al., 2016) focuses on whether multisensory integration follows the same rules throughout the whole of 3-D space. Their meta-analysis highlights the fact that the region of space in which stimuli are displayed, e.g., the distance to the body, modulates multisensory interactions, and that the space around us is separated into specific functional regions, defined by the body part they are mostly related to (e.g., the hand, the face or the trunk). Future studies on PPS, and notably on impact prediction onto the body, need to take into account the several constraints that are expected to influence multisensory integration processing: the spatial and temporal dynamics of the stimuli, the distance from the different body parts, the incidence of looming trajectories with respect to the body, the effects of body posture, the ongoing or planned movement of the subject, as well as the social, valence and sensory nature of the environment and its organization with respect to the subject.

### MODULATIONS OF PERIPERSONAL SPACE

Peripersonal space appears to have a singular function in our representation of space, associated, as described above, with an enhanced processing of sensory information as assessed behaviourally (RTs, sensory sensitivity) or functionally (single cell recordings, fMRI). In recent years, there has been a growing interest in the flexibility and plasticity of PPS (for review, see Cléry et al., 2015b; de Vignemont and Iannetti, 2015; Chris Dijkerman, 2017).

### Early Evidence for a Tool-Induced Reorganization of PPS

Several studies show that the use of a tool to reach objects in far space can extend the limits of PPS representation. In nonhuman primates, Iriki et al. (1996) demonstrated that, after training on the manipulation of a rake to access a reward located beyond arm reach, hand-centered visual RFs of intraparietal neurons enlarged so as to encompass the rake. In humans, neuropsychological (Farnè and Làdavas, 2000; Maravita et al., 2001) and psychophysical (Holmes et al., 2004; Maravita and Iriki, 2004; Serino et al., 2007; Galli et al., 2015) studies showed that, after manipulating a tool, cross-modal interactions between visual or auditory stimuli presented in far space and tactile stimuli at the hand increase. This is all the more pronounced at the location where the tool has been used. Taken together, these results bring support to the idea that the extent of PPS representation is dynamically reshaped by repeated experience and learning, allowing for an extension of the domain of action of the body beyond its structural limits (Maravita and Iriki, 2004; Gallese and Sinigaglia, 2010; Costantini et al., 2011). Early studies on this topic suggest that an active use of the tool is necessary for extending PPS representation. Persistent use, as in professional athletes (e.g., tennis players) or persons with disabilities (e.g., blind cane users), leads to a long-lasting incorporation of the tool into PPS even in the absence of the manipulation of the tool (Serino et al., 2007; Biggio et al., 2017). Last, tool-induced PPS plasticity is observed whether the tool is in physical interaction with the body (hammer, rake, etc.) or not (mouse cursor, remote control of a sensory stimulus in far space, etc.; Goldenberg and Iriki, 2007; Bassolino et al., 2010; Serino et al., 2015a), indicating complex interactions between body schema and PPS for action. The immobilization of the right arm for 10 h reduced PPS representation around this arm without affecting its metric representation, whereas the overuse of the left arm affected the metric representation but not the PPS representation of this overused arm (Bassolino et al., 2015). This confirms that the body schema and PPS interact in complex ways while being behaviourally dissociable.

### Sensory Synchrony as a Possible Trigger of Tool-Induced Reorganization of PPS

Serino et al. (2015a) propose the alternative hypothesis that the dynamic re-organization of PPS might arise from the integration of the experienced sensory feedback. Specifically, using a recurrent neural network model mimicking parietal multisensory neuronal organization, they show that the plasticity of PPS representation following tool-use arises neither from the function of the tool nor from the actions performed when using it, but is rather triggered by the experienced sensory feedback, i.e., the synchronous tactile stimulation of the hand when holding the tool and the heteromodal (auditory or visual) stimulation in the far space where the tool is being manipulated (for a review on tool-use, see Martel et al., 2016). In other words, temporal synchrony between (auditory or visual) sensory inputs in far space and tactile input arising from object manipulation by the hand in near space is suggested to have a major role in the functional definition of PPS from an action-driven perspective. In a recent study (Cléry et al., 2018), we show that large cortical sectors are activated by both near and far space stimulations. We propose that these depth "non-specific" functional regions might support these dynamic associative mechanisms between far space and near space sensory stimulations.

### Non-motor Driven-Reorganization of PPS

Several studies show that tool use induces a remapping of PPS. This defines PPS from a "goal-directed action" perspective, in which we want to reach for something and grasp it (for review, see de Vignemont and Iannetti, 2015). However, recent evidence shows that factors other than action, such as fear, anxiety and social engagement, can also remap this space and contribute to a "protective and defensive" view of PPS. These are reviewed below.

### Bottom–Up Driven Reorganization of PPS

It is now well established that certain categories of bottom-up signals drive an instantaneous resizing of PPS. This is the case of threatening stimuli. For example, tactile processing is facilitated when physically threatening pictures (for instance a snake or a knife) are presented in PPS, leading to quicker responses than when such pictures are displayed in far space (Poliakoff et al., 2007; Van Damme et al., 2009). Likewise, sounds that elicit a negative emotion (e.g., screaming woman) or sounds that have a negative ecological connotation (e.g., barking dog) induce faster reaction times when they appear close to the subject as compared to neutral or positive valence sounds (Taffou and Viaud-Delmon, 2014; Ferri et al., 2015). In addition, the distance from a visual stimulus to the body has a stronger influence on RTs to a tactile stimulus on the skin if the visual stimulus is perceived as threatening. This indicates that not only is PPS resized by a threatening object, but the information relative to its distance from the body is also enhanced relative to that of a non-threatening one (de Haan et al., 2016).

Importantly, whatever the estimated level of threat represented by a visual object, the observed expansion of PPS is reduced when the threatening part of dangerous objects is oriented toward participants, as compared to when oriented away (Coello et al., 2012). This suggests that the interpretation of the higher order context in reference to the body is crucial in affecting the boundary of PPS. In other words, the resizing of PPS is due to both bottom–up and top–down factors. Taken together, these studies show that the emotional aspects and characteristics of the threat in relation to the body influence the defensive PPS and the body safety margin. Quite surprisingly, the neural bases of these observations and the functional networks they involve are unknown to date.

### Top–Down Driven Reorganization of PPS: Social Factors

Top–down factors are also shown to resize PPS. For example, the presence of an observer and the nature of the interaction with her/him reshape PPS representation (Teneggi et al., 2013). Indeed, PPS boundaries shrink when a neutral observer is standing in far space. This is not observed when the observer is replaced by a mannequin. This thus suggests that one's PPS resizes in the presence of conspecifics. Importantly, this resizing depends on the nature of the social interaction with these observers. For example, PPS boundaries between self and an observer merge (i.e., expand) after an economic game with this person, but only if this person has behaved cooperatively (Teneggi et al., 2013). PPS is thus shaped by our valuation of other people's behavior and is modulated by social interactions. A recent study (Pellencin et al., 2017) shows that not only the nature of social interactions (as constructed on the basis of past experience and information) but also the first impression of the person facing us, i.e., our social perception about this person (on the bases of immediate "bottom–up" perceptual cues: appearance, size, facial features, age, body posture etc.) affects our own multisensory PPS representation. This thus reflects a modulation of low-level 3D visual information processing by high-level cognitive variables and both automatic and constructed social cues.

The extension and shrinkage of our PPS representation may not be the only change triggered by the presence of others. Indeed, several studies suggest that the observed sensory and motor experiences of others, whether humans or animals, are remapped onto our own bodily representations, thanks to a so-called "mirror system" that has been described both in the monkey and human brain (Rizzolatti et al., 2001; Rizzolatti and Craighero, 2004; Sinigaglia and Rizzolatti, 2011; Rizzolatti and Fogassi, 2014; Rizzolatti and Sinigaglia, 2016; Rizzolatti and Rozzi, 2018). This system is activated when we are touched on our own body, when we view another person being touched, and when events occur in the space near the other's body (Blakemore et al., 2005; Serino et al., 2008; Caggiano et al., 2009; Keysers and Gazzola, 2009; Cardini et al., 2010). Ishida et al. (2009), using single cell recordings in monkeys, show that bimodal parietal neurons which are activated by sensory events taking place in the space close to the monkey's own hand also respond to events taking place in the space close to another monkey's hand. Similar functional activations are observed in premotor cortex in humans (Brozzoli et al., 2013; Holt et al., 2014).

A review by Ishida et al. (2015), based on monkey neurophysiology as well as human fMRI studies, reports shared self-other body representation coding in multiple brain areas, including visuo-tactile neurons in parietal cortex (Ishida et al., 2009), secondary somatosensory cortex (Keysers et al., 2004, 2010; Blakemore et al., 2005; Ebisch et al., 2008; Keysers and Gazzola, 2009) and insular cortex (Fitzgibbon et al., 2010, 2012; Lamm and Singer, 2010; Krahé et al., 2013), associated with affective touch and interoception. Importantly, Maister et al. (2015) show that synchronous tactile stimulation on one's own face and visual stimulation close to another person's face results in a functional interaction between both PPSs, such that events taking place near the other person's face improved the salience of stimuli occurring in one's own PPS. Nicely complementing these observations, Teramoto (2018) shows that the detection of tactile stimulation on one's own hand is faster when a visual stimulus is approaching the hand of another person than when it is placed far away from this same person. All this brings support to the idea of shared inter-personal PPS representations. The neuronal and network computations underlying these behavioral observations remain to be explored.

The discussion above mostly addresses the effect of the presence of a conspecific onto PPS. However, more complex social factors might be at play, such as the location of others with respect to ourselves, as well as their orientation or inferred displacement trajectory. This would predict that the neural networks involved in the coding of self with respect to the environment also code the spatial contingencies between oneself and others, possibly along a coding schema resembling what has been described in bat and rodent hippocampal neurons (Danjo et al., 2018; Omer et al., 2018).

### Interactions Between an Action-Based Peripersonal Space and Interpersonal Space

Recent studies have investigated the link between PPS for action, defined as the space around us onto which we can act, and interpersonal space (InterPS), defined as the space in which we maintain a distance around our bodies and in which any intrusion by others may cause discomfort. As seen above, this space can be modified by emotional and socially relevant interactions, including complex social information such as perceived morality or cooperativeness of another person, age and gender (Iachini et al., 2015, 2016). PPS for acting and interpersonal space share a common motor nature and are sensitive, to different degrees, to social modulation. Hence the proposal that social processing might be embodied and grounded in the "body acting in space" (Iachini et al., 2014). The evidence in this respect is mixed. Indeed, in the hands of Patané et al. (2016), tool-use remaps the action-related PPS, estimated by a reaching-distance toward another person, but does not alter the social-related interpersonal space estimated by a comfort-distance task. Besides, after a positive social interaction with another individual, the estimated interpersonal space is reduced whereas, at the same time, the estimated PPS is extended, suggesting that these two space representations do not fully overlap functionally (Patané et al., 2017). Along the same lines, the introduction of invisible body illusions results in dissociable changes in InterPS and PPS sizes (D'Angelo et al., 2017). In contrast, in the hands of Quesque et al. (2016), using a different paradigm in which participants observed a point-light walker approaching them from different directions and passing near them at different distances from their right or left shoulder, comfortable interpersonal distance is found to be linked to the representation of PPS. This indicates that enlarging PPS through tool manipulation also enlarges the comfortable interpersonal distance with respect to another person, corroborating the hypothesis that interpersonal-comfort space and peripersonal-reaching space share a common motor nature (Iachini et al., 2014, 2016; Coello and Fischer, 2015). Further investigations will be needed in order to reconcile these two views.

### Interaction Between PPS and Personality Traits

Peripersonal space size can be related to some key personality traits. The study of defensive reflex responses is instrumental to address this question. Indeed, these defensive reflex responses can be precisely adjusted by the location of the stimulus within PPS. An important aspect of this modulation is that it is specific to the body part for which the reflex response gives protection (Sambo et al., 2012a,b). For example, subcortical defensive responses like the hand-blink reflex (HBR) are enhanced when the threat approaching the face is one's own stimulated hand or another person's hand, and when the hand of the participant enters the PPS of another person. Importantly, the interaction between these defensive reflexes varies from one individual to another, as a function of several personality traits. For example, the enhancement of the HBR is stronger in participants with a strong empathic tendency when observing another person from a third person perspective, suggesting that interpersonal interactions modulate perception of threat and defensive responses, and more so in empathic participants (Fossataro et al., 2016). Along the same lines, the size of an individual's PPS is associated with trait anxiety, with an enlarged PPS in more anxious individuals (Sambo and Iannetti, 2013; for review, see de Vignemont and Iannetti, 2015). Passive listening to a conversation also affects the size of PPS/InterPS of a third person not involved in the conversation. Indeed, his/her PPS expanded if the conversation had an aggressive content compared to a neutral content, thus resulting in an increase in the peripersonal safety boundary in the face of a potentially aggressive confrontation (Vagnoni et al., 2018). Likewise, PPS size in claustrophobic subjects is different from that of non-claustrophobic subjects. Claustrophobia is a situational phobia characterized by intense anxiety in relation to enclosed spaces and physically restrictive situations (American Psychiatric Association, 2000). Lourenco et al. (2011) investigated whether the size of near space relates to individual differences in claustrophobic fear, as estimated from the reported anxiety in enclosed spaces and physically restrictive situations, and show that claustrophobic fear is associated with an enlarged size of the close space directly around us. Vagnoni et al. (2012) show the same results and expand them by demonstrating that emotions, in addition to altering the perception of space as a static entity, also affect the perception of dynamically moving objects, such as those on a collision course with the observer. Importantly, claustrophobia is not only associated with an increased PPS relative to non-claustrophobic subjects, but it is also characterized by a less flexible PPS. Indeed, when using a stick during a line bisection task, individuals low in claustrophobic fear demonstrate the expected expansion of PPS, whereas individuals high in claustrophobic fear show less expansion following tool-use (Hunley et al., 2017).

In summary, PPS is not a fixed space but a dynamic space which is continuously modulated by our environment (social, emotional, functional). The dynamic adjustment of this "boundary" of self may be related to an optimization of the behavioral outcome and repertoire (protective, pro-active) to the outside environment, based on online estimation of bottom-up information (visual, tactile, auditory, proprioceptive...) as well as of top-down cognitive information (context, emotion, social interactions...) (Cléry et al., 2015b; de Vignemont and Iannetti, 2015). PPS can thus be viewed as the output computation of the integration of multiple sources of information dynamically linking the body with its environment. This predicts that the properties and specificities of PPS will depend on the body part it is referring to, including in the non-motor domains.

### DIFFERENT REPRESENTATIONS OF BODY-RELATED PPS

Most studies on PPS have targeted the hand and, to a lesser extent, the face. We have seen that this "boundary" of PPS representation is modulated both by action (for example after tool-use) and by emotional/social context (fear, anxiety, cooperation). Besides, these modulations can vary within individuals as a function of the context. A strong inter-individual variation is also observed. The question we address here is whether the representation of PPS follows the same constraints and rules for all body parts.

Measuring the influence of looming stimuli presented at different distances from a given body part on the RTs to a tactile stimulus (Canzoneri et al., 2012, 2013a,b; Teneggi et al., 2013; Galli et al., 2015; Noel et al., 2015a,b), Serino et al. (2015b) characterize PPS from a body-referenced perspective. In a first experiment, they test the effect of looming and receding auditory stimuli in relation to the trunk on tactile detection on this body part. As previously described for the hand and the face, they show that looming sounds modulate tactile processing depending on the distance of the sound from the body and that this effect is specific to looming sounds and is not observed for receding sounds. The majority of experiments on PPS are done only in the front space of the subject. Therefore, in a second experiment, the authors also introduce looming and receding auditory stimuli from the front or back of the peritrunk PPS. They confirm that only sounds looming toward the trunk are mapped into the representation of the trunk-PPS. No notable difference can be observed between a frontal trunk-PPS and a hind trunk-PPS. In a third experiment, the authors test the effect of looming and receding auditory stimuli from the hand-PPS. They show that sounds modulate tactile processing according to the distance of the sound from the hand. This effect is observed not only for the looming sounds but also for the receding sounds, though the speeding of tactile detection on the hand is more pronounced for looming stimuli than for receding stimuli. Importantly, the distance at which the sounds start to have a significant effect on tactile processing is shorter for the hand-PPS than for the trunk-PPS, indicating that the trunk-PPS is larger than the hand-PPS. The authors then compare the representations of the hand-PPS and trunk-PPS and examine how they interact. For this, while using looming and receding sounds from the stimulated body part, they apply tactile stimulations either to the trunk or to the hand placed close to the trunk (Experiment 4) or to the hand placed far from the trunk (Experiment 5). The authors show that when the hand is close to the trunk, the trunk-PPS and its properties dominate over the hand-PPS, while this is not the case when the hand is far away from the trunk. In summary, two different PPS representations can be distinguished: one anchored to the hand and sensitive to both looming and receding stimuli at close distance from the hand, and another anchored to the trunk, sensitive only to looming stimuli and encompassing a larger space (in terms of distance to the body) than the hand-PPS. Importantly, these two representations are not independent. To further investigate the nature of the interaction between sub-PPSs, the authors test the effect of looming and receding stimuli (auditory or visual) from the trunk or the face PPS while tactile stimuli are presented either to the face or the trunk. Tactile processing on the trunk is enhanced by stimuli looming toward either the face or the trunk, indicating that the trunk-PPS encompasses the face-PPS. The reverse is, however, not true, as tactile processing on the face is not enhanced by stimuli looming toward the trunk. Recently, the authors showed that the velocity of looming auditory stimuli not only shapes the peri-hand space, but also modulates the peri-face and the peri-trunk spaces (Noel et al., 2018a). They propose a neural network involving reciprocal connections between unisensory areas and higher-order multisensory neurons, with a neural adaptation to persistent stimulation, to account for these several behavioral observations characterizing PPS and its sub-PPS components (for details, see Serino et al., 2015a; Noel et al., 2018a).
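The logic of comparing hand-PPS and trunk-PPS sizes can be sketched as follows. This is a minimal illustration only: the function name `pps_boundary`, the fixed speed-up criterion (standing in for the statistical tests used in the actual experiments), and all numbers are hypothetical and are not taken from any of the cited studies.

```python
def pps_boundary(rt_by_distance, baseline_rt, min_speedup):
    """Return the farthest sound distance at which tactile RT is speeded.

    rt_by_distance maps the distance of the sound from the body part (cm)
    to the mean tactile RT (ms) measured at that distance. A distance counts
    as inside PPS when RT is faster than baseline_rt by at least min_speedup.
    This fixed threshold is an assumption replacing a significance test.
    """
    speeded = [d for d, rt in rt_by_distance.items()
               if baseline_rt - rt >= min_speedup]
    return max(speeded) if speeded else None


# Made-up illustrative values, not data from any study.
hand_rt = {15: 350, 30: 360, 45: 395, 60: 400}
trunk_rt = {15: 350, 30: 355, 45: 365, 60: 398}

hand_boundary = pps_boundary(hand_rt, baseline_rt=400, min_speedup=20)
trunk_boundary = pps_boundary(trunk_rt, baseline_rt=400, min_speedup=20)
```

With these invented values the boundary lies farther from the body for the trunk than for the hand, which is the pattern described above for trunk-PPS versus hand-PPS.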

To summarize this exhaustive study, Serino et al. (2015a) show that the size of PPS representation varies as a function of the stimulated body part, increasing from the hand to the face and being maximal for the trunk (**Figure 4A**). Tactile processing on these specific body segments is modulated by looming stimuli in a space-dependent manner. Most importantly, while the size of PPS representation around the trunk is relatively constant, PPS representation around the hand or the face varies according to their position relative to the rest of the body and to the trajectory of the stimulus relative to the body (**Figure 4B**). These observations are confirmed by more recent studies (Aggius-Vella et al., 2017) and also generalize to lower body segments (Stone et al., 2017). Indeed, Stone et al. observed that participants have speeded RTs to a tactile stimulus applied to the feet when a visual stimulus approaches the legs. In addition, they showed that, similar to what is observed for the hand, the leg is, in this condition, highly distorted (i.e., perceived to be wider or shorter than its actual physical dimension, Stone et al., 2018). These results are in agreement with the function of PPS as a multisensory-motor interface for body-object interaction (Brozzoli et al., 2012b).

This first extensive mapping of human PPS representation opens new perspectives in PPS research. For example, how are these body-part specific PPS representations incorporated in a "goal-directed action" or a "protective/defensive" view of PPS?

### PERIPERSONAL SPACE AND BODILY SELF-CONSCIOUSNESS

The trunk-PPS representation integrates both body-related signals (proprioceptive, tactile) and information related to stimuli from the outside world (visual and auditory) that can potentially interact with the body, in a global, egocentric frame of reference. This representation may thus form a basic neural representation that is relevant for the definition of self, self-consciousness and self-consciousness in relation to the outside world (Tsakiris et al., 2007; Blanke and Metzinger, 2009; Tsakiris, 2010; Blanke, 2012; Blanke et al., 2015; Serino et al., 2015b). In the following, we briefly review the growing evidence for a possible link between PPS and self-consciousness.

Bodily self-consciousness (BSC), that is, the feeling that the physical body and its parts belong to us (i.e., our own body), is proposed to be one of the main characteristics of subjective experience, i.e., binding whatever external or internal experience to self (Gallagher, 2000; Blanke and Metzinger, 2009). In recent years, multisensory bodily illusion paradigms have been used to investigate BSC in the laboratory, demonstrating, for example, the behavioral mechanisms underlying the perception of ownership of the hand using the rubber hand illusion (Botvinick and Cohen, 1998), or of the face using the enfacement illusion (Tsakiris, 2008; Sforza et al., 2010), or of the entire body using the full-body illusion, the out-of-body illusion or the body-swap illusion (Ehrsson et al., 2007; Lenggenhager et al., 2007; Petkova and Ehrsson, 2008). These illusions are based on the application of synchronous stimulations binding the body (or body part) of the participants, stimulated by touch, to a virtual body (or fake body part), stimulated visually. This type of experimental paradigm results in an illusory feeling of ownership toward the virtual body or body parts. These studies have resulted in a general agreement that ownership over hands, face, and body in general depends on the integration of multiple bodily signals in the brain, including tactile, proprioceptive, visual and auditory signals (Ehrsson et al., 2004; Makin et al., 2008; Tsakiris, 2010; Blanke, 2012; Ehrsson, 2012; Serino et al., 2013; Blanke et al., 2015). As a result, there seems to be a direct relationship between the neural mechanisms underlying multisensory PPS processing and BSC. However, until recently, these two processes and their underlying neuronal mechanisms had been investigated separately.

In a recent study, Grivaz et al. (2017) conduct an extensive meta-analysis of functional neuroimaging studies to find the key neural structures for PPS and for BSC, and to identify their possible functional overlaps in humans. The authors thus performed a systematic quantitative coordinate-based meta-analysis on human functional neuroimaging studies (Turkeltaub et al., 2002; Eickhoff et al., 2009, 2012). They selected 35 PET or fMRI studies: 18 studies assessing brain regions activated by the encoding of unisensory and multisensory stimuli within PPS (whether the hand, the face or the trunk PPS); 17 studies assessing brain regions activated by the BSC of the body or a part of the body. They identified a bilateral PPS network composed of superior parietal, temporo-parietal and ventral premotor regions. As discussed above, these regions play a key role in sensory-motor processes, mediating interactions between the subject and his/her direct environment, integrating sensory information and driving potential motor responses (Graziano and Cooke, 2006; Làdavas and Serino, 2008; Cléry et al., 2015b; Grivaz et al., 2017). On the other hand, the BSC network includes the posterior parietal cortex (IPS bilaterally), the superior parietal lobule (SPL), the right ventral premotor cortex, and the left anterior insula. These regions are involved in multisensory integration, attention and awareness. In particular, the insula plays a key role in the integration of exteroceptive body-related cues and interoceptive signals that are proposed to be crucial for subjective experience (Craig, 2009; Damasio and Meyer, 2009; Tsakiris, 2010; Seth, 2013; Park and Tallon-Baudry, 2014; Seth and Friston, 2016). Although BSC and PPS representations are not associated with the exact same functions, they do activate common fronto-parietal regions. Indeed, the conjunction analysis performed by Grivaz et al. (2017) shows that PPS and BSC tasks anatomically overlap in only two clusters located in the left parietal cortex (dorsally at the intersection between the SPL, the IPS and area 2, ventrally between area 2 and IPS). The activation of this dorsal SPL/IPS region supports the hypothesis that multisensory integration of bodily cues contributes to the construction of both PPS and BSC (Brozzoli et al., 2012a; Gentile et al., 2013; Grivaz et al., 2017). A recent study by Salomon et al. (2017) shows that the integration of multisensory bodily inputs for PPS construction does not necessarily require conscious awareness, while BSC is, by definition, a conscious process. This might correspond to a major hallmark differentiating these two processes.

Thus, overall, PPS and BSC are subserved by only partially overlapping functional networks, supporting the idea that they correspond to two distinct functions, whereby PPS possibly implements a multisensory-motor interface for body-object interaction and BSC is related to bodily awareness and self-consciousness. Importantly, although they are not activated in PPS studies, the premotor and insular clusters implicated in BSC are systematically co-activated with the parietal clusters activated by PPS processing during numerous cognitive tasks, suggesting that these regions are functionally interconnected.

### CONCLUSION

PPS representation is a complex psychological and functional construct that can be subdivided into multiple entities referenced to different body parts and whose exact configuration depends on multiple factors. This complex PPS representation continuously changes depending on the incoming bottom–up sensory information, motor experience (e.g., during tool use), or top–down factors, including context, social interactions, personality or psychiatric traits (**Figure 4**). PPS representation is subserved by a well-identified parieto-temporo-frontal network that has some degree of overlap with the bodily self-consciousness network, and one may predict that impairments in PPS representation or self-consciousness might have consequences on the other process. This opens new research directions for the coming years.

### REFERENCES


Bender, M. (1952). Disorders of Perception. Springfield, IL: Charles C. Thomas.


### AUTHOR CONTRIBUTIONS

JC and SBH outlined the review, wrote the manuscript, and designed the figures.

### FUNDING

JC was funded by the Fondation pour la Recherche Médicale and by the Fondation Berthe Fouassier. SBH was funded by the French Agence Nationale de la Recherche (Grant #ANR-05-JCJC-0230-01).






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Cléry and Ben Hamed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Transfer of Spatial Contact Information Among Limbs and the Notion of Peripersonal Space in Insects

#### Volker Dürr<sup>1,2</sup>\* and Malte Schilling<sup>2</sup>

<sup>1</sup> Biological Cybernetics, Faculty of Biology, Bielefeld University, Bielefeld, Germany, <sup>2</sup> Cluster of Excellence Cognitive Interactive Technology (CITEC), Bielefeld University, Bielefeld, Germany

Internal representation of far-range space in insects is well established, as it is necessary for navigation behavior. Although it is likely that insects also have an internal representation of near-range space, the behavioral evidence for the latter is much less evident. Here, we estimate the size and shape of the spatial equivalent of a near-range representation that is constituted by somatosensory sampling events. To do so, we use a large set of experimental whole-body motion capture data on unrestrained walking, climbing and searching behavior in stick insects of the species Carausius morosus to delineate 'action volumes' and 'contact volumes' for both antennae and all six legs. As these volumes are derived from recorded sampling events, they comprise a volume equivalent to a representation of coinciding somatosensory and motor activity. Accordingly, we define this volume as the peripersonal space of an insect. It is of immediate behavioral relevance, because it comprises all potential external object locations within the action range of the body. In a next step, we introduce the notion of an affordance space as that part of peripersonal space within which contact-induced spatial estimates lie within the action ranges of more than one limb. Because the action volumes of limbs overlap in this affordance space, spatial information from one limb can be used to control the movement of another limb. Thus, it gives rise to an affordance as known for contact-induced reaching movements and spatial coordination of footfall patterns in stick insects. Finally, we probe the computational properties of the experimentally derived affordance space for pairs of neighboring legs. This is done by use of artificial neural networks that map the posture of one leg into a target posture of another leg with identical foot position.

Keywords: affordance, spatial coordination, limb movement, touch, peripersonal space, stick insect, whole-body kinematics, artificial neural network

#### Edited by:

Alex Pitti, Université de Cergy-Pontoise, France

#### Reviewed by:

Andrey Olypher, Georgia Gwinnett College, United States

Guido Schillaci, Humboldt-Universität zu Berlin, Germany

#### \*Correspondence:

Volker Dürr volker.duerr@uni-bielefeld.de

Received: 07 September 2018 Accepted: 03 December 2018 Published: 18 December 2018

#### Citation:

Dürr V and Schilling M (2018) Transfer of Spatial Contact Information Among Limbs and the Notion of Peripersonal Space in Insects. Front. Comput. Neurosci. 12:101. doi: 10.3389/fncom.2018.00101

### INTRODUCTION

Like humans, animals have internal representations of space (Jeffery, 2003). In humans, internal representations of space have been categorized in conjunction with distinct spatial volumes, which correspond to different sensory cues about the ambient space, often with correspondingly distinct neuronal substrates (for review see Previc, 1998; Holmes and Spence, 2004). Such representations directly sub-serve behavior and play a functional role as internal models in control of goal-directed movements in humans (Kawato, 1999) and in robot motor control (Schillaci et al., 2016). In particular, peripersonal space is defined as a near-range area on which humans can directly act, i.e., which is "within reach." While there is considerable debate about how sharp the boundary of human peripersonal space is (Bufacchi and Iannetti, 2018), there is agreement that it differs functionally from the space further away and is connected to specific neuronal substrates in parietal and premotor areas (e.g., Cléry and Hamed, 2018).

Whereas in non-primate mammals and, potentially, other vertebrate groups such as birds, the existence of homologous neuronal substrates suggests the existence of similar, multiple internal representations of space as in humans, the situation is much less clear in invertebrates. One reason for this may be the conceptual problem that the distinction of internal representations of space must be linked to behavioral performance, for example as distinct skills or differential use of spatial cues related to different spatial volumes. In insects, at least two kinds of spatially coordinated behavior can be discerned that, most likely, are linked to distinct internal models: The first of these concerns the spatially coordinated movement of limbs and body parts, for example during locomotion on or manipulation of the near-range environment. A corresponding internal representation of near-range space is required whenever spatial information has to be shared by multiple body parts. Potential neural substrates of internal near-range representations are topological afferent projections such as those described for the cricket cercal system (Jacobs et al., 2008) or for mechanoreceptor afferents of locust legs (e.g., Mücke and Lakes-Harlan, 1995; Newland et al., 2000). A recent systematic inventory of somatosensory projections in fruit flies suggests parallels to the somatosensory system of mammals (Tsubouchi et al., 2017). The second type of spatially coordinated behavior concerns course control and navigation in far-range space, i.e., space beyond the immediate action range of the limbs and body parts. In insects, the spatial representation of far-range cues has been studied intensely in the context of visually guided locomotion. An example is the self-motion dependent modulation of visual interneurons (Chiappe et al., 2010) that gives rise to a representation of walking direction in the optic lobes of walking fruit flies (Fujiwara et al., 2017). Also, the central complex is well known to be involved in behaviors relying on estimates of distance and direction. Prominent examples include the encoding of celestial direction cues in locusts (Heinze and Homberg, 2007) and of heading direction in walking fruit flies (Green et al., 2017; Turner-Evans et al., 2017).

Thus, with regard to behavioral relevance of spatial sensory cues, an obvious boundary is defined by the volume that is "within reach" of any body part, the limbs in particular. This is plausible because sensory modalities such as touch or taste depend on contact cues on the body surface and therefore cannot be experienced beyond the spatial range spanned by all possible movements of the body trunk and limbs. In contrast, vision, audition and smell transduce the energy from photons, sound pressure waves or volatile chemicals, most of which typically originate from locations beyond the animal's own body. They are "beyond reach." The present study combines behavioral and computational considerations about the spatial volume "within reach" in walking and climbing insects. We will argue that this is in many ways equivalent to what is called peripersonal space in humans. The spatial volume "within reach" of the human body is perceived in a way that relates to our ability to act and interact within that spatial volume. In order to capture this, internal models must be grounded in sensorimotor representations that relate body posture and movement to the corresponding part of space. At their core, internal models reflect the functional, modular organization of the body (Davidson and Wolpert, 2004; Cothros et al., 2006) with redundancy. As an example, Patané et al. (2017) showed a dissociation between peripersonal and interpersonal space, which they found to be largely overlapping, though clearly dissociable, with the peripersonal space being delimited as the space reachable with a tool. Other hallmarks of human internal models are their flexibility, e.g., in case of tool use (Cardinali et al., 2009) and their multimodal organization, e.g., when estimating hand position from somatosensory, proprioceptive, visual and even auditory information (Makin et al., 2008). 
Despite its multimodal nature, most experimental work on human peripersonal space has focused on vision, often in relation to eye-hand coordination. However, since peripersonal space occurs in congenitally blind humans (Ricciardi et al., 2017), it must develop independently of vision. Ricciardi et al. suggested that, therefore, internal models in humans directly relate to the configurations of limbs relative to each other, thus forming an internal body model.

Whether or not insects may have an internal body model with similar properties to those in humans is unknown. It is clear, however, that insects readily climb about in spatially cluttered environments, thus demonstrating their capacity for flexible and reliable spatial coordination of a multi-limbed body with many degrees of freedom. An important component of this ability is the transfer of spatial information from one limb to another. Essentially, this transfer turns the spatial knowledge acquired by one limb into an affordance for another limb. For example, the physical contact of one limb with an obstacle may be used to guide the movement of another limb, in order to exploit prior knowledge about foothold/grip locations and to achieve contact at a nearby location. Our use of the term affordance follows the definition by J. J. Gibson, as a behavioral option of an animal that is signaled by a combination of sensory features (Gibson, 1977, p. 79: "an affordance [. . . ] is a combination of physical properties of the environment that is uniquely suited to a given animal – [. . . e.g. its] locomotor system."). Behavioral evidence suggests that spatial coordination of limbs in insects ranges from pre-programmed, open-loop behaviors, to closed-loop control of limb posture, and to complex coordinate transfer among neighboring limbs. For example, grooming movements are often considered pre-programmed rhythmical limb movements, as in eye-cleaning behavior of the cricket (Honegger et al., 1979), or in grooming of various body locations in locusts (Berkowitz and Laurent, 1996) and fruit flies (Seeds et al., 2014). At least in the case of locusts, so-called grooming movements of the forewing have been shown to form a continuum of movements (Dürr and Matheson, 2003), consistent with the idea of a continuous encoding of the wing surface location by an array of mechanoreceptors (Page and Matheson, 2004). Although the neuronal substrate underlying these aimed limb movements is still largely unknown, within-trial adjustment of limb posture suggests that they are subject to feedback control (see **Figure 6** in Matheson, 1998), and plasticity of proprioceptive encoding of limb posture shows that the corresponding neural representation is adaptive (Page and Matheson, 2009).

Regarding coordinate transformation among limbs, several studies have demonstrated this to occur in stick insects, including standing (Cruse, 1979), walking (Dean and Wendler, 1983) and climbing animals (Theunissen et al., 2014). Targeting behavior of legs has been transferred into models of motor control. These demonstrate qualitatively how such mappings can be realized using a local transformation (Dean et al., 1999) or, in the case of more complex walking behavior, by applying an internal body model (Schilling and Cruse, 2012). In stick insects, the ipsilateral transfer of postural cues not only works between pairs of walking legs, but also between the antennae and front legs (Schütz and Dürr, 2011). In the latter case, antennal contact cues can elicit fast re-targeting of on-going swing movements, effectively turning a swing movement into an aimed reach-to-grasp movement of a front leg (for review, see Dürr et al., 2018). Visual estimates of distance "within reach" have been shown to occur in gap crossing behavior in fruit flies (Pick and Strauss, 2005), suggesting that these insects also have a reliable estimate of their own body size and/or action range (Strauss et al., 2011; Krause, 2015). Visually mediated coordinate transformations allow for targeted front leg movements in locusts (Niven et al., 2010) and horsehead grasshoppers (Niven et al., 2012). In this kind of behavior, locusts combine monocular visual inputs with mechanosensory inputs from their antennae before the onset of a step, i.e., during motion planning. Similar to spatially targeted grooming movements as mentioned above, visually induced reaching in locusts requires proprioceptive sensory information from the femoral chordotonal organ. Finally, a very fast, ballistic, visually induced type of leg movement is the front leg strike of praying mantises (Maldonado et al., 1967; Corrette, 1990) and mantispids (Kral et al., 2000) that strike to catch prey.

Given this body of evidence on spatially targeted limb movements, their plasticity and multimodal control, we claim that the insect body is surrounded by an ambient volume that is functionally equivalent to peripersonal space in humans. With particular reference to the coordinate transfer among limbs in stick insects, we suggest that the peripersonal space in insects may be defined by the shared use of spatial information among two or more body parts. Accordingly, the objectives of this study are (i) to determine the size, shape and locations of action volumes from whole-body motion capture data on unrestrained climbing stick insects; (ii) to investigate the relative size of contact volumes, i.e., the regions where contacts are particularly likely to occur during natural locomotion; and (iii) to determine the size and shape of affordance volumes, i.e., the overlap of contact volumes of pairs of limbs. In our case, a contact event at one limb, together with the corresponding proprioceptive information about the posture of this limb, generates the behavioral option for another limb to reach for the contact location. The underlying coordinate transformation is a basic functional property of motor control systems in limbed animals in general. Therefore, our final objective is to (iv) understand the computational complexity of such transformations in an insect. Using artificial neural network models of different complexities, we assess the performance of the reciprocal spatial mappings among pairs of legs that share an affordance volume. By doing so, we provide a basic notion of an internal model for near-range space in insects. This may serve as a computational ground plan for spatial coordination in other limbed animals.
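Objective (iii) can be illustrated with a minimal sketch, assuming that the contact events of two limbs have already been counted on a common, body-fixed 3D grid with 1 mm spacing. The names `contact_volume` and `affordance_volume` and the `min_count` threshold are hypothetical and are not taken from the study; the overlap is computed here as a plain set intersection.

```python
def contact_volume(counts, min_count=1):
    """Grid nodes at which a limb made contact at least min_count times.

    counts maps a grid node (x, y, z) in a common body-fixed frame to the
    number of contact events recorded there (hypothetical data layout).
    """
    return {node for node, n in counts.items() if n >= min_count}


def affordance_volume(volume_a, volume_b):
    """Overlap of the contact volumes of two limbs."""
    return volume_a & volume_b


# Toy counts for two neighboring legs (invented for illustration).
front_counts = {(0, 0, 0): 3, (1, 0, 0): 1, (2, 0, 0): 5}
middle_counts = {(1, 0, 0): 2, (2, 0, 0): 1, (3, 0, 0): 4}

shared = affordance_volume(contact_volume(front_counts),
                           contact_volume(middle_counts))
# On a 1 mm grid, each node stands for 1 cubic millimeter.
shared_size_mm3 = len(shared)
```

Raising `min_count` restricts each contact volume to nodes that were contacted repeatedly, which shrinks the overlap accordingly.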

## MATERIALS AND METHODS

### Experimental Data Set

All experimental data used in this study were acquired in behavioral experiments on unrestrained walking and climbing, adult, female stick insects of the species Carausius morosus (de Sinéty, 1901). Animals were bred at the animal facility of the Biological Cybernetics Department of Bielefeld University, where they were kept in a 12:12 h light:dark cycle and room temperature around 24 °C. All data used for the calculation of spatial volumes were acquired with a marker-based motion capture system (Vicon MX10 equipped with eight T10 cameras, **Figure 1**) as described by Theunissen and Dürr (2013). Temporal resolution was 200 frames per second and spatial precision of the 3D marker position measurements was approximately 0.1 mm. Three different types of setups were used to record a variety of walking, climbing and searching movements of the legs and the antennae. In all cases, the animals walked along a flat horizontal walkway that was 40 mm wide.

In the stair-climbing setup, a set of two stairs was placed on the distal third of the walkway (**Figure 2**, left). The stairs were of different height (8, 24, or 48 mm), so that animals had to adapt their climbing behavior to different obstacles, resulting in height-dependent changes in body inclination (Theunissen et al., 2015) or the relative frequency of short correction steps (Theunissen and Dürr, 2013). A flat walkway was used as reference condition. A total of 365 stair-climbing trials from ten animals were included in the present analysis. In each trial, motion capture analysis yielded the joint position and joint angle time courses of all six legs, along with the position time courses of all segment boundaries of the thorax and the head. Thirty-four trials of one animal also comprised the joint angle and tip position time courses of both antennae. These stair-climbing data have been used before in original research publications on distinct step types (Theunissen and Dürr, 2013), spatial coordination of foot contacts (Theunissen et al., 2014) and an inter-species comparison of whole-body kinematics of walking and climbing insects (Theunissen et al., 2015).

In the rod-climbing setup, a horizontal rod was mounted above and perpendicular to a flat walkway (**Figure 2**, middle). The height of the rod varied between 5 and 50 mm above the walking surface, with heights of 18 mm or 36 mm used in the motion-capture experiments using the Vicon system. Animals were either video-recorded by a set of synchronized, orthogonally arranged, digital cameras (Basler 601af; this concerns **Figure 6** only), by a single, top view, analog video camera and a slanted mirror next to the setup (Cohu; this concerns **Figure 7** only), or motion-captured by the Vicon system mentioned earlier, as the animals touched the rod with their antennae and subsequently climbed it. A total of 262 motion capture trials from eight animals were included in the present analysis. As in searching trials, rod-climbing trials focused on the coordination of antennae and front legs. Accordingly, only joint position and joint angle time courses of both front legs and both antennae, along with the position time courses of the prothorax and head were recorded.

motion capture system (bottom row). Animals (A) were labeled with small retro-reflective markers and their whole-body kinematics recorded by means of a marker-based motion capture system with eight Vicon cameras (VC, numbered 1–8 in right bottom panel) and an additional digital video camera (DV). The motion capture data yielded sets of labeled marker trajectories (top right panel: markers) that allowed geometrical reconstruction and kinematic analysis of the animal posture (top right panel: video) at 200 frames per second. Note that the setup (S) shown here was only one of three variants used in this study.

FIGURE 2 | Three types of setups were used to acquire experimental data. In all paradigms, stick insects were motion-captured as they walked along a 40 mm wide walkway. Two recorded postures are shown, one at the beginning of the trial and another near the end of the trial. Gray spheres show marker locations. Only the tracked body segments are shown. Colored lines show the trajectories of the tibia-tarsus joint of the right front leg (red) and of the head (blue). Left: In the stair-climbing paradigm the animals encountered two stairs of different height (here 24 mm) which they climbed readily. In trials of this paradigm all legs and thorax segments were recorded. In some trials, the head and antennae were also recorded. Middle: In the rod-climbing paradigm the animals encountered a horizontal rod held across the walkway at different heights. In trials of this paradigm, only the antennae and front legs were recorded, along with the head and thorax segments. Right: In the searching paradigm, animals stepped across the far edge of the walkway and engaged in rhythmic searching movement of the antennae and front legs. In trials of this paradigm, only the antennae and front legs were recorded, along with the head and prothorax.

In the searching setup, only the flat walkway was used and animals were motion-captured as they approached the end of the walkway, stepped across the distal edge and engaged in bilateral searching movements of both front legs and both antennae (**Figure 2**, right), similar to the experiments described by Dürr (2001). A total of 69 trials from three animals were included in the present analysis. In each trial, the motion capture analysis yielded the joint position and joint angle time courses of both front legs and both antennae, along with the position time courses of the prothorax and head. The same computational procedures were used as described by Theunissen and Dürr (2013).

In summary, the volume density estimates calculated for the limbs in the present study are based on 365 trials from 10 animals in case of hind and middle legs, 696 trials from 21 animals in case of the front legs, and 385 trials from 12 animals in case of the antennae.

### Body-Centered, Standardized Limb Coordinates

All volume density estimates were calculated using a standardized body shape on a 3D grid. In a first step, limb position coordinates were calculated separately for each limb and relative to the thorax- or head-fixed coordinate systems of the corresponding segment of the main body axis. From the original kinematic analysis (as described in detail by Theunissen and Dürr, 2013), each trial comprised absolute position coordinates of the limb segment boundaries (coxa, femur, and tibia of the legs, scape and pedicel/flagellum of the antennae), along with the six degrees of freedom of position and orientation of their carrying body segments, i.e., of the pro-, meso-, and metathorax for the front, middle, and hind legs, respectively, and of the head for the antennae. Whereas the position of the body segment was used to calculate the relative position of the limb coordinates, the segment orientation gave the body-fixed, segment-specific coordinate system into which the corresponding relative limb coordinates were projected. The resulting body-centered positions were scaled to the limb size of a standardized body shape, rounded to the nearest full millimeter, and counted on a 3D grid with 90<sup>3</sup> nodes, centered on the base of the limb (i.e., the thorax-coxa joints in case of the legs, and the head-scape joints in case of the antennae).
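The projection and counting step described above can be illustrated with a minimal Python sketch. It is not the original Matlab implementation: the function names, the input format (three unit axis vectors per segment frame), the choice of the limb base as frame origin, and the grid index of the limb base (the grid centre) are assumptions made for illustration only.

```python
from collections import Counter

GRID_SIZE = 90          # nodes per axis, as stated in the text
HALF = GRID_SIZE // 2   # assumed grid index of the limb base (grid centre)

def to_segment_frame(point, origin, axes):
    """Project an absolute 3D point into a body-fixed segment frame.

    origin: frame origin in absolute coordinates (assumed: the limb base).
    axes:   three unit vectors (x, y, z) of the segment frame, expressed
            in absolute coordinates (hypothetical input format).
    """
    rel = [p - o for p, o in zip(point, origin)]
    return [sum(r * a for r, a in zip(rel, axis)) for axis in axes]

def grid_index(local_mm, scale):
    """Scale to the standard body, round to full mm, shift to grid indices."""
    return tuple(int(round(c * scale)) + HALF for c in local_mm)

def count_positions(points, origin, axes, scale, counts=None):
    """Accumulate counts per grid node; points outside the grid are skipped."""
    counts = Counter() if counts is None else counts
    for p in points:
        idx = grid_index(to_segment_frame(p, origin, axes), scale)
        if all(0 <= i < GRID_SIZE for i in idx):
            counts[idx] += 1
    return counts
```

Calling `count_positions` repeatedly with the same `counts` object would accumulate all frames of all trials of one limb on a single grid.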

The standardized body shape was determined from the mean segment length and width measurements of the adult female specimens that contributed to the motion-capture data set. For each pair of limbs, a scaling factor Bref/Bcurr was determined, where Bref was the sum of standardized segment lengths of both femora, both tibiae and the carrying body segment in case of legs, and the sum of both standardized antenna lengths and the head length in case of the antennae. Bcurr was the corresponding sum of segment lengths of the specimen that contributed the current trial. Thus, the scaling factor was adjusted for each pair of limbs, in order to account for variation of relative limb length among animals. The body-centered, standardized volume data grids of the eight limbs were then aligned in order to match the body segment lengths and location of the limb bases of the standard body shape. For this, the main body was assumed to be stiff and straight, neglecting movement of the thoracic joints and neck. The corresponding standardized body shape was used in all volume plots presented in this study in order to provide a 3D reference structure. Limb postures of this reference structure were set according to an arbitrary single instant of an experimental trial. The reference structure also includes the six tarsi. Since the motion-capture data did not comprise measurements of the tibia-tarsus angle, only the standardized tarsus length is drawn for reference. For the calculation of "action volumes" and "contact volumes" of the legs, the tarsi were assumed to be straight extensions of the tibia (see below).
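As a small worked illustration of the scaling factor Bref/Bcurr for a pair of legs, the following Python sketch sums the segment lengths named above. The function names and all numerical lengths are hypothetical and serve only to show the arithmetic.

```python
def leg_pair_length_sum(femur_left, femur_right, tibia_left, tibia_right, body_segment):
    """Sum of both femora, both tibiae and the carrying body segment (in mm)."""
    return femur_left + femur_right + tibia_left + tibia_right + body_segment

def scaling_factor(b_ref, b_curr):
    """Factor that maps a specimen's limb coordinates onto the standard body."""
    return b_ref / b_curr

# Hypothetical lengths in mm: standardized body vs. a slightly larger specimen.
b_ref = leg_pair_length_sum(15.0, 15.0, 14.0, 14.0, 12.0)
b_curr = leg_pair_length_sum(16.0, 16.0, 15.0, 15.0, 13.0)
factor = scaling_factor(b_ref, b_curr)
```

A factor below 1 shrinks the coordinates of a specimen that is larger than the standardized body, and a factor above 1 enlarges those of a smaller specimen.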

### Tip, Contact, Action, and Affordance Volumes

One goal is to define an "affordance volume" that delimits a volume in which multiple limbs can act. In this volume, positions of one limb potentially provide an affordance for other limbs through an internal model. In order to find such an intersection volume, first the working ranges of the individual legs had to be charted. The physiological movement ranges of the eight limbs were calculated as density distributions across an orthogonal 3D grid of 1 mm spacing. Depending on the part of the limb considered, three types of volumes were calculated per limb: (1) the "action volume" comprised the movement range covered by the entire limb, i.e., the entire flagellum of an antenna, or the entire set of femur, tibia and tarsus of a leg. (2) The "contact volume" comprised the distal fraction of the flagellum in case of the antennae, or of the distal part of the tibia and entire tarsus in case of the legs. The default proximal limit of contact volume was 2/3 of the flagellum or tibia. The distal limits of contact volumes were determined separately for each leg, and ranged between 1.33 and 1.34 tibia lengths in front and hind legs, and between 1.38 and 1.39 tibia lengths in middle legs. These numbers correspond to the factor by which the tibia needed to be scaled in order to reach the tip of the tarsus. (3) Finally, the "tip volumes" were calculated from the movement ranges of the most distal points of the tracked limb segments, i.e., the antennal tips and tibia-tarsus joints of the legs.

In all cases, the volumes were calculated for a discrete set of points along the limbs. **Figure 3** shows the distribution of these points for the three types of volumes calculated. In case of the contact volume, ten equidistant points were calculated along the tibia and tarsus as determined by a scaling factor. For antennae this scaling factor ranged between the proximal limit, i.e., 0.67, and the distal limit of 1.0. For legs, the scaling factor ranged between 0.67 and a distal limit between 1.33 and 1.39 (see above). Whereas the flagellum can be considered reasonably straight (at least when it does not contact anything), the angle of the tibia-tarsus joint varies throughout a step with an approximate range between 90° (abducted) and 0° (aligned with the tibia). Since we had no information about the tibia-tarsus joint angle, we always assumed an angle of 0°, thus maximally extending the radial working range of the tibia. Given the difference in distance of the 10 points that were considered for each frame, increasingly distant points traveled increasingly longer arcs for a given excursion of the limb. To compensate for this effect, i.e., to avoid an overestimation of volume densities in proximal parts of the working ranges, each point was weighted with a factor. In case of n points (n = 10 for contact volumes), the weights were 2k/(n(n + 1)), with k = 1 ... n. As a consequence, the sum of weights per frame was always 1. These volume densities provide a likelihood estimate for a limb to pass through that specific part of body-centered space, i.e., the grid.
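The distance-dependent weights can be checked with a few lines of Python. This is an illustrative sketch only; the function name is hypothetical, and exact fractions are used so that the sum can be verified without rounding error.

```python
from fractions import Fraction

def point_weights(n):
    """Weights 2k / (n (n + 1)) for k = 1..n, increasing with distance from the base."""
    return [Fraction(2 * k, n * (n + 1)) for k in range(1, n + 1)]

contact_weights = point_weights(10)   # n = 10 points for contact volumes
action_weights = point_weights(20)    # n = 20 points for action volumes
```

Because 1 + 2 + ... + n = n(n + 1)/2, the weights sum to exactly 1 for any n, so every frame contributes the same total count to the grid regardless of how many points represent the limb.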

Action volumes were calculated differently for antennae and legs. In antennae, the calculation followed the same principle as for the contact volume, except that the proximal limit was set to 0.1 and n = 20. In legs, eight equidistant points were distributed along the femur (also starting at a proximal limit of 0.1), a further eight along the tibia, and another four along the tarsus (**Figure 3**). Thus, 20 points per frame were used for the calculation of an action volume. In case of the antennae, these points were distributed equidistantly along the flagellum. The same weight distribution applied as explained above (n = 20) when updating the counts on the grid.

In order to estimate volume densities from absolute frequency distributions across the 3D grid, the count numbers per grid node were smoothed with a cubic kernel spanning 5<sup>3</sup> grid nodes. This kernel had a Gaussian weight distribution with standard deviation of 1 and a sum of weights equal to 1. To obtain reasonably smooth volume boundaries, we chose a volume density threshold that was equivalent to 1% of the maximum density per limb and volume type. This threshold limited the volume to a range of 95.3 to 98.6% of the summed density values, depending on the type of volume and limb. The detailed values are listed in **Supplementary Table 1**.
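The smoothing and thresholding steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the original Matlab code: counts are held in a sparse dictionary keyed by grid node, grid borders are ignored, and all function names are hypothetical.

```python
import math
from itertools import product

def gaussian_kernel(size=5, sigma=1.0):
    """Cubic kernel (size^3 nodes) with Gaussian weights that sum to 1."""
    half = size // 2
    raw = {
        off: math.exp(-sum(o * o for o in off) / (2.0 * sigma ** 2))
        for off in product(range(-half, half + 1), repeat=3)
    }
    total = sum(raw.values())
    return {off: w / total for off, w in raw.items()}

def smooth(counts, kernel):
    """Spread each node's count over its neighbourhood (sparse convolution)."""
    density = {}
    for (i, j, k), c in counts.items():
        for (di, dj, dk), w in kernel.items():
            node = (i + di, j + dj, k + dk)
            density[node] = density.get(node, 0.0) + c * w
    return density

def threshold_volume(density, fraction=0.01):
    """Nodes at or above a fraction of the maximum density, plus the share
    of the summed density that these nodes retain."""
    limit = fraction * max(density.values())
    inside = {node for node, d in density.items() if d >= limit}
    covered = sum(density[n] for n in inside) / sum(density.values())
    return inside, covered
```

On a 1 mm grid, the number of nodes in `inside` equals the volume in cubic millimeters, and `covered` corresponds to the retained fraction of summed density that the text reports as 95.3 to 98.6%.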

Finally, affordance volumes were calculated as the intersecting volume of two neighboring limbs, e.g., the right middle and hind legs, or the left antenna and front leg. All calculations and volume visualization were done in Matlab R2018a (The Mathworks, Natick/MA), including the Geom3D toolbox of David Legland. Transparent volume surfaces were calculated by use of the Matlab function boundary(), using a convexity scaling factor of 0.8, with 1.0 being no convexity between the supporting polygon nodes.
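Once each limb volume is available as a set of grid nodes above threshold, the intersection described above reduces to a set operation. The following Python sketch is illustrative only; the function names are hypothetical and the two cubes are toy data, not measured volumes.

```python
from itertools import product

def affordance_volume(nodes_a, nodes_b):
    """Grid nodes that belong to the volumes of both limbs."""
    return nodes_a & nodes_b

def volume_ccm(nodes, spacing_mm=1.0):
    """Volume in ccm: one grid node represents spacing_mm^3 cubic millimeters."""
    return len(nodes) * spacing_mm ** 3 / 1000.0

def overlap_fraction(overlap, reference):
    """Overlap expressed as a fraction of a reference volume."""
    return len(overlap) / len(reference)

# Toy example: two overlapping 10 x 10 x 10 mm cubes on the 1 mm grid.
limb_a = set(product(range(0, 10), repeat=3))
limb_b = set(product(range(5, 15), repeat=3))
shared = affordance_volume(limb_a, limb_b)
```

The same fraction can be taken relative to either limb, which is why the overlap percentages reported for a pair of limbs depend on the limb chosen as reference.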

### Artificial Neural Network Simulations

Pairs of non-spiking Artificial Neural Networks (ANN) were used to learn mappings between joint angle spaces of neighboring legs. A foot position in space that can be reached by two neighboring legs corresponds to a set of joint angles for each one of these legs. We used neural networks of passive summation elements to transform the joint angles of one leg to the corresponding set of joint angles of the neighboring leg for identical foot positions in space. The training data were obtained from the grid points contained by the affordance volumes spanned by any one of the four ipsilateral pairs of legs. For each point within an affordance volume, the corresponding sets of joint angles were calculated for both legs, using the inverse kinematics calculation as deduced by Cruse and Bartling (1995). Accordingly, we assumed fixed and slanted rotation axes for the thorax-coxa joints, such that protraction/retraction about the thorax-coxa joint correlated with pronation/supination of the leg plane. This simplification is justified also in freely walking and climbing stick insects, as protraction/retraction and pronation/supination angles are strongly correlated in these conditions (see Figure 11 of Theunissen et al., 2015). The corresponding Euler angles of the ThCx joint axis are given as yaw and pitch angles of the resting coxa in **Table 1**.

As a result, each point within an affordance volume yielded 2x3 joint angles, i.e., protraction, levation and extension angles of two neighboring legs (e.g., the right front leg and the right middle leg). The ANNs were trained to map three of these angles, i.e., the posture of a "sender leg," to the other three angles, i.e., the posture of a neighboring "receiver leg." The input of such a feedforward ANN can be considered the posture of the sender leg, the output can be considered the corresponding target posture of the "receiver leg." An affordance is thus generated in the following way: if the receiver leg was moved so as to assume this target posture, the position of its tibia-tarsus joint would coincide with that of the "sender leg." Two reciprocal mappings were learned for each affordance volume. Each one of the two legs was once used as the sender leg (joint angles were used as an input to the ANN) and once as the receiver leg (joint angles were used as training values for supervised learning of the appropriate output of the ANN).


TABLE 1 | Standardized body shape: segment lengths, insertion coordinates, roll and pitch angles of the coxae as used for inverse kinematics.

Based on the experimental data, the affordance volumes of the left front and middle legs comprised 5,500 matching pairs of leg postures (8,382 on the right side). In case of the left middle and hind legs, the affordance volume comprised 2,918 matching pairs of leg postures (3,741 on the right side). For training and evaluation of each ANN, the corresponding data set was split into a training part (80%, 4,400 samples for the left side and 6,705 for the right side) and a testing part (the remaining 20%). The testing part of the data set was used to evaluate the generalization capabilities of a trained ANN, assessing how well it could interpolate for data points it had never encountered during training.
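The sample counts of the 80/20 split can be reproduced with a short Python sketch. Whether and how the original data were shuffled before splitting is not stated above, so the shuffling, the seed, and the function name are assumptions made for illustration.

```python
import random

def split_indices(n_samples, train_percent=80, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)             # assumed: random assignment
    n_train = n_samples * train_percent // 100       # rounds down to whole samples
    return indices[:n_train], indices[n_train:]

train_left, test_left = split_indices(5500)    # left front and middle legs
train_right, test_right = split_indices(8382)  # right front and middle legs
```

Rounding down gives 4,400 and 6,705 training samples for the left and right front and middle leg pairs, in agreement with the numbers stated above.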

Feed-forward ANNs were used with systematic changes of the network complexity. As a baseline, a feed-forward NN without a hidden layer was used. Since this network structure is equivalent to a regression problem, an optimal solution was found analytically using the normal equation and through calculation of the pseudo-inverse. In all other cases, the ANNs contained a single hidden layer. The size of this single hidden layer was changed systematically in order to assess mapping performance for different network complexities. The Keras framework (https://keras.io/) was used for ANN training, with sigmoid activation functions in hidden layer neurons and linear activation functions in the output layer neurons. Networks were trained in batches of ten, using the optimizer ADAM (Kingma and Ba, 2015). ADAM implements an adaptive gradient descent method that includes a momentum term and has the advantage that it does not require any additional hyperparameters. Weight matrices were initialized at random, using the Glorot uniform initialization (Glorot and Bengio, 2010). Training was repeated in five individual runs for the data of the left legs. ANNs for the right leg pairs were trained only once for comparison. Training runs lasted for 5000 epochs, which proved to be sufficient for convergence. Sample data and ANN training code are publicly available at https://pub.uni-bielefeld.de/record/2932236 (Schilling and Dürr, 2018).
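The baseline without a hidden layer can be illustrated with a minimal NumPy sketch of a least-squares fit via the pseudo-inverse. It is not the published training code: the function names are hypothetical, the inclusion of a bias term is an assumption, and the data below are synthetic stand-ins for sender and receiver joint angles.

```python
import numpy as np

def add_bias(angles):
    """Append a constant column so that the linear map can include an offset."""
    return np.hstack([angles, np.ones((len(angles), 1))])

def fit_linear_baseline(sender_angles, receiver_angles):
    """Least-squares weights W such that receiver ~ [sender, 1] @ W."""
    return np.linalg.pinv(add_bias(sender_angles)) @ receiver_angles

def predict(weights, sender_angles):
    """Receiver joint angles predicted from sender joint angles."""
    return add_bias(sender_angles) @ weights

# Synthetic example: three sender angles mapped to three receiver angles.
rng = np.random.default_rng(0)
sender = rng.uniform(-1.0, 1.0, size=(50, 3))
true_map = rng.normal(size=(3, 3))
offset = np.array([0.1, -0.2, 0.3])
receiver = sender @ true_map + offset

W = fit_linear_baseline(sender, receiver)
```

Because the synthetic mapping is exactly affine, the fit recovers it up to numerical precision. The true mapping between leg postures is nonlinear, so such a baseline would only indicate how much of it a linear map can capture.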

## RESULTS

Based on our considerations about peripersonal space as the volume within which the body and its limbs may physically interact with the environment, we first calculated the action volumes of all legs and antennae. The combination of these action volumes then delineated the boundary of what we propose to call the peripersonal space of an insect. In contrast, the intersection of each pair of action volumes was equivalent to the joint working range of two neighboring limbs. This was termed the affordance volume of a pair of limbs.

### The Combined Action Volumes of All Limbs Delineate Peripersonal Space

The action volume of a limb was defined as that part of space, where this particular limb could contact an external object, irrespective of which part of the limb was making contact. Action volumes were calculated from a large motion capture data set, comprising a total of 6061.5 s (1 h 41 min) of movement sequences from 365 to 696 experimental trials (depending on the kind of limb, see Material and Methods) of the Indian stick insect Carausius morosus. The experimental data had been acquired in three different locomotion experiments, including climbing and searching episodes (**Figure 2**). Two hundred single limb postures per second were sampled, so that even fast limb movements were broken down into a reasonable set of discrete postures. For example, a typical swing movement of a leg was represented by some 40 limb postures. For simplification, the movements of the neck and of the two thorax joints were neglected, so that the insertion points of the limbs were fixed before calculating the body-centered coordinates of each limb segment. Furthermore, the volumes of the limbs themselves were neglected and each limb posture was treated as a set of 20 points on a 1 mm grid. As a consequence, the antennal posture was treated as a set of points along a single line, and each leg posture was treated as a set of points on a pair of lines: one line for the femur and another line for the tibia and tarsus (**Figure 3**). To estimate the shape of an action volume, we first approximated the likelihood of the limb to pass through a particular point in body-centered space, and then set a density threshold to determine the volume boundary (for details on the likelihood approximation, in particular the spatial smoothing procedure and the compensation of decreasing likelihood with increasing distance from the insertion point, see section Tip, Contact, Action, and Affordance Volumes). 
As a consequence, the actual shape of the action volume strongly depended on the particular choice of density threshold. In all figures shown in this study, we applied limb-specific thresholds equivalent to 1% of the maximum density recorded for a particular limb. **Supplementary Table 1** lists the limb-specific threshold values and the corresponding fraction of the total volume density comprised by the action volume (which was always > 95%). The combined action volumes of all eight limbs are shown in **Figure 4**. The orthogonal projections of the grid points reveal that the action volumes of the left and right limbs of the same segment have similar, though not identical, shapes. Throughout this study, we did not pool data for limbs of the same segment.

The action volumes of the front legs were the largest of all limbs, amounting to more than 60 ccm. This was approximately twice the action volume of the antennae and approximately three times that of the middle and hind legs (**Table 2**). The action volumes of the middle legs were the smallest of all limbs, amounting to 88 and 93% of the hind leg action volumes in left and right legs, respectively. The order of action volume size was the same as the order of limb length, with the front legs being the longest and the middle legs being the shortest (**Table 1**). However, the ratio of front leg length over antenna length was only 112%, which is substantially smaller than the corresponding volume ratio of about 200%. Similarly, the ratio of front leg length over middle leg length was 130%, compared to about 300% for the volume ratio. We conclude that the front legs were the most agile limbs and covered much larger ranges than any other limb. Since the leg length ratios of middle and hind legs (83 and 84% for left and right legs, respectively) were smaller than the corresponding volume ratios, middle legs proved to be more agile than hind legs.

TABLE 2 | Comparison of the derived limb volumes (using the 99% threshold for the likelihood as explained in section Tip, Contact, Action, and Affordance Volumes) of the four limb pairs.


Rows indicate volumes in ccm and fraction of the action volume for tip volume (top), contact volume (middle), and action volume (bottom).

FIGURE 4 | Action volumes of the eight limbs. Orthogonal projections of the action volumes of all six legs and two antennae, depicted as colored points on the 1 mm grid that was used to calculate volume densities. Red and dark blue dots show the volumes of left limbs, magenta and light blue dots those of right limbs. Red/magenta show volumes of antennae and middle legs, dark/light blue dots show volumes of front and hind legs. Top, side and frontal views (as indicated by the standardized insect in the background) are aligned and scaled to match. Top right: The combined action volume of all limbs, delimited by a transparent envelope surrounding the non-zero grid points shown in the orthogonal projections. Note that volumes for left and right limbs were calculated separately. As a result, they are similar but not the same.

The shape of the combined action volumes of all limbs reveals a nearly hemispheric region of about 35 mm radius around the head (spanned by antennae and front legs), and a dorsoventrally compressed volume ranging from mid-mesothorax rearward along the first three quarters of the abdomen. Note that **Figure 4** conceals the overlap of neighboring action volumes. These overlap volumes proved to cover substantial fractions of the action volumes (**Figure 5**). For example, the overlap between antennal and front leg action volumes amounted to 14 and 10% of the action volumes of left and right front leg, respectively (for volume sizes in ccm, see **Table 3**). This means that 10–14% of possible contact locations of a front leg may be contacted also by the ipsilateral antenna. In other words, bidirectional transfer of spatial information from one limb to another is possible in these overlap volumes, thus potentially giving rise to affordances. Accordingly, we chose to call these overlap volumes affordance volumes. The affordance volumes of front and middle legs shown in **Figure 5** covered 32 and 45% of the left and right middle leg action volumes, respectively. The affordance volumes of middle and hind legs corresponded to 20 and 24% of the left and right hind leg action volumes. The lower left side view in **Figure 5** reveals that the affordance volumes of ipsilateral leg pairs are located mostly below the body axis. This is not the case for the affordance volumes of antennae and front legs which appear almost centered on the horizontal plane through the body axis. Note that the top and frontal views in **Figure 5** reveal a zone of bilateral overlap between the left (red) and right (blue) affordance volumes of antennae and front legs. This narrow, elongate region in front of the insect head indicates that both antennae and both front legs could transfer contact information among each other. 
This region comprises the volume that is covered by the outstretched front legs aligned with both antennae, as it occurs in the posture that Carausius morosus assumes for its camouflaging twig mimesis.


Volumes were calculated as overlap of the corresponding tip (top row), contact (middle row), or action volumes (bottom row) in ccm. All volumes are indicated as a fraction of the corresponding affordance volume based on action volumes.

### Behavioral Relevance of Contact Location

Affordance volumes, as defined here, comprise positions suitable for coordinate transfer among ipsilateral limbs. This leads to the question whether these volumes were not just computationally plausible but also behaviorally relevant. After all, the affordance volumes shown in **Figure 5** had been calculated based on the action volumes of entire limbs, including parts of the limb which would be at least awkward, if not unlikely contact locations in natural behavior. For example, whereas it is trivial to observe that an insect regularly contacts obstacles with one of its feet, this is not clear at all for more proximal parts of the limb, such as the femur. To address this question, we observed stick insects as they climbed a horizontal rod that was held across the walkway, and recorded the contact locations along the antennae and front legs. In order to have independent position records from contact to contact, only the location of the initial limb contact was recorded per trial. **Figure 6** shows the result for the antennae, including 500 single trials from 10 animals and 10 different rod heights. The results clearly show that initial antennal contacts with a horizontal rod occur almost exclusively in the distal half, and approximately 90% occur in the distal third of the antennal flagellum (**Figure 6**, top left). This is largely independent of the height of the rod (**Figure 6**, lower left), as the median relative contact location along the flagellum ranged between 0.8 and 0.9 in almost all cases, and shifted distally only for very high rods (43 mm and above). Accordingly, most initial contact locations were at least 20 mm away from the head, irrespective of whether the rod was located above or below the body axis (**Figure 6**, right).

The situation was more variable in case of leg contacts. Stick insects are known to respond to antennal contact with altered swing movements of the front legs (Schütz and Dürr, 2011). Two kinds of responses can be distinguished, depending on the state of the front leg at the time of contact by the ipsilateral antenna. If the front leg is in stance phase at the instant of antennal contact, the front leg completes the stance movement and then lifts off to execute a reaching movement that often is considerably higher than normal. If the front leg is in swing phase at the instant of antennal contact, the front leg often executes a retargeting movement with a distinct upward kink in the trajectory. In the latter cases, the leg can be very close to the object as the antenna makes contact, leaving little reaction time before hitting the object with a part of the leg. Accordingly, our results showed that the distribution of initial contacts along a front leg depended in large part on whether antennal contact had been made during swing or stance (**Figure 7**, compare black with blue lines). In comparison, the effect of rod height was small.

As can be seen in **Figure 7**, the probability of initial contact on the femur was very low in case of stance-initiated movements, and zero for swing-initiated movements. Contact probability was highest in the distal third of the tibia and on the tarsus. Initial contacts were recorded in this region in 58% of trials with swing-initiated movements, and in 84% of trials with stance-initiated movements. Following leg contacts with the tarsus, the animal typically grasped hold of the rod. When the rod was contacted with the distal tibia, the leg was typically retracted until the tarsus achieved firm grip. For other contact locations, the leg was lifted and retracted until another contact was achieved.

FIGURE 6 | Initial antennal contacts occur in the distal third of the antennal action volume. Top left: Location of the first antennal contact along the flagellum, as a stick insect walks toward a horizontal rod that is reaching across the walkway at height h (see insert). Histogram of five trials per 10 rod heights per 10 animals. Blue line shows the cumulative sum. Most initial contacts with an obstacle of this kind occur along the distal third of the flagellum. Bottom left: Box-whisker plots show medians, IQR and min/max ranges of distributions of contact locations, separately for each rod height (n = 50, each). Open circles show outliers. Except for the highest obstacle heights, medians and IQR are very similar. Right: Contact locations in head-centered coordinates (side view), with different colors corresponding to different rod heights. Most initial contacts were located in the distal third of the action volume of the antenna, here approximately between 26 and 39 mm away from the antennal base.

FIGURE 7 | In reaching movements, the front leg contacts a horizontal rod most often with the distal tibia or tarsus. Cumulative probability plots of initial leg contact location along the length of the leg (location shown on x-axis is standardized to 50% femur + 50% tibia; contacts by the tarsus are counted as 100% leg length). Trials were separated according to rod height (squares: 12 mm, circles: 24 mm) and depending on whether the initial antennal contact occurred during swing (black) or stance (blue) of the subsequently reaching front leg. The targeting quality of reaching movements initiated during stance is superior to those initiated during swing. 84% of initial contacts occurred in the distal third of the tibia or on the tarsus when reaching followed a first contact during stance movement (blue). When reaching required re-targeting of an ongoing swing movement (black), 58% of first contacts occurred in the distal third of the tibia or at the tarsus.

### Limb Contacts and Affordance Space

Given the results shown in **Figures 6**, **7**, we wanted to know how the shapes of the action volumes would change if only those parts of a limb were considered that were likely to contact an obstacle. To test this, we calculated the "contact volumes" for all limbs. The computational procedure was the same as for the calculation of action volumes, except that only 10 points per limb posture and frame were considered for volume density estimates. These 10 points were placed along the distal third of the antenna or along the distal third of the tibia and the entire tarsus. For immediate comparison of action and contact volumes, **Figure 8** shows the volume envelopes of the right antenna, right middle leg and left hind leg within the peripersonal space. In case of the antenna, the neglect of the proximal two thirds resulted in a fairly wide gap between the head and the contact volume. As a consequence, the antennal contact volumes comprised only 82 or 86% of the corresponding action volumes in left and right limbs, respectively (**Table 2**). For comparison, we also calculated the volumes for the most extreme reduction of contact sites on a limb, i.e., a single point. Such tip volumes (**Figure 8**, right column) were calculated from the volume densities of the most distal point of the motion-captured limb segment (the tip of an antenna or the tibia-tarsus joint of a leg, see **Figure 3**). As expected, the tip volumes of the antennae were very narrow, curved, convex regions (see **Figure 8**, lower right). Despite their small width, antennal tip volumes still comprised 38 and 35% of the left and right antennal action volumes, respectively.

Compared to the relatively strong size reduction of antennal contact volumes, the contact volumes of the legs were of nearly the same size and shape as their corresponding action volumes (**Figure 8** and **Table 2**). In fact, three of the six contact volumes turned out to be even slightly larger. We attribute this apparent increase in volume to slight weighting differences of the discretized limb postures for the calculation of the volume densities for action and contact volumes. These differences lead to different threshold values and, as a consequence, to variation of volume size and shape. Comparing the action and contact volumes of the hind and middle legs in **Figure 8** reveals that the gap between the contact volumes and the body is relatively small. This can be explained by strongly flexed leg postures that let the distal tibia and tarsus come very close to the base of the leg. As a consequence, much of the volume that is traversed by the femur may also be traversed by the foot and distal tibia. The most pronounced difference between action and contact volumes of the legs appears to be the region traversed by the "knees" (femur-tibia joint) and the nearby distal femur and proximal tibia. A foot could only reach knee positions of postures with moderate levation of the femur. This is because the foot can move to the previous knee position only by a combination of strong levation of the coxa-trochanter joint and strong flexion of the femur-tibia joint.

The strong effect of flexed leg postures becomes evident when comparing the tip volumes of the legs (**Figure 8**, right column) with their corresponding action volumes. Unlike the antennal tip, which cannot be moved close to the head, the tibia-tarsus joint can be moved very close to the base of the leg, allowing this joint to traverse a substantial fraction of the action volume of the entire leg. Accordingly, **Table 2** lists the ratios of tip volume over action volume of the legs as ranging between 60 and 76%, which is approximately twice the ratio for an antenna (35–38%).

Having established similar properties for action and contact volumes, we reasoned that the overlap of contact volumes for ipsilateral pairs of legs should not differ much from the overlap of action volumes. In other words, the affordance volume for a given pair of legs should remain the same even if the underlying volume density estimates were calculated from a subset of points per limb posture. Indeed, this was the case. **Figure 9** juxtaposes affordance volumes based on action, contact and tip volumes, revealing strong similarity between all leg affordance volumes, particularly of those based on action and contact volumes. The absolute sizes of affordance volumes and their relative size compared to the corresponding action volumes are listed in **Table 3**. The data show that the affordance volumes of antennae and front legs were affected much more strongly by the restriction to contact regions than the affordance volumes of leg pairs. This is because the antenna maintains a fairly straight posture during movement, such that the distal part of the antenna can only be reached by relatively strong extension of the front leg. As a consequence, that part of the front leg contact volume that required a flexed leg posture was excluded from the affordance volume, despite the fact that the contact volume of a front leg changed only little compared to its action volume. For the same reason there is no overlap of antennal and front leg tip volumes at all. The tibia-tarsus joint of a front leg cannot reach the tip of the ipsilateral antenna.

Much as was observed for the comparison of contact and action volumes in **Table 2**, affordance volumes based on contact volumes proved to be even larger than those based on action volumes (between 114 and 122%). However, as outlined in conjunction with **Table 2**, differences in weighting entail relatively small differences of the volume density threshold used to delimit the boundary, causing a variation of volume size. Since affordance volumes are considerably smaller than contact volumes, the relative variation in size was larger for the affordance volumes (**Table 3**) than for the contact volumes (**Table 2**).

In summary of the experimental results, we propose to distinguish two kinds of spatial regions surrounding the insect body that differ in their behavioral relevance. The first of these is what we called peripersonal space. In analogy to the use of that term in human psychology and neuroscience, it comprises that part of the ambient space that is "within reach" of any body parts, the limbs in particular. In the present study we defined it as the combination of all action volumes of the limbs, as shown in **Figure 4**, top right. The second region is what we propose to call affordance space and defined as the intersection of action volumes of all limb pairs. The functional significance of this distinction is that the affordance space is "within reach" of at least two limbs and therefore allows a coordinate transfer that is suitable for the control of aimed limb movements based on a physical contact of another limb. Based on our considerations about behavioral relevance, we suggest that affordance volumes should be related to those regions where spatial contacts are likely to occur in natural behavior.
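The two definitions above can be illustrated with a minimal sketch. It assumes that each action volume has already been discretised into a set of voxel indices; the study itself derived its volumes from thresholded volume densities, so the set representation, the function names and the toy one-dimensional data below are illustrative assumptions only.

```python
# Sketch only: action volumes are assumed to be sets of (x, y, z) voxel indices.
# Function names and the toy data are hypothetical, not taken from the study.

def peripersonal_space(action_volumes):
    """Union of the action volumes of all limbs."""
    space = set()
    for voxels in action_volumes.values():
        space |= voxels
    return space


def affordance_volumes(action_volumes, pairs):
    """Overlap (intersection) of the action volumes of each listed limb pair."""
    return {(a, b): action_volumes[a] & action_volumes[b] for a, b in pairs}


def affordance_space(action_volumes, pairs):
    """All voxels that lie within reach of at least one listed limb pair."""
    space = set()
    for voxels in affordance_volumes(action_volumes, pairs).values():
        space |= voxels
    return space


# Toy example: four ipsilateral limbs arranged along one body axis.
action_volumes = {
    "antenna": {(i, 0, 0) for i in range(0, 6)},
    "front_leg": {(i, 0, 0) for i in range(4, 10)},
    "middle_leg": {(i, 0, 0) for i in range(8, 14)},
    "hind_leg": {(i, 0, 0) for i in range(12, 18)},
}
ipsilateral_pairs = [
    ("antenna", "front_leg"),
    ("front_leg", "middle_leg"),
    ("middle_leg", "hind_leg"),
]

pps = peripersonal_space(action_volumes)
aff = affordance_space(action_volumes, ipsilateral_pairs)
print(len(pps), len(aff))
```

By construction, the affordance space in this sketch is always a subset of the peripersonal space, which mirrors the distinction drawn in the text.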

### Modeling Coordinate Transfer Within the Affordance Space

Given the definition of affordance space above, we wanted to know how complex a computational mapping would have to be that mediates coordinate transfer within the experimentally derived affordance volumes as shown in **Figure 5**. To this end, we studied the computational properties of the transformation of postures between neighboring legs (in both directions: backwards, from an anterior leg to a posterior leg, and forwards, i.e., in the opposite direction). We used two different methods, both related to feed-forward Artificial Neural Network (ANN) simulations, but of different complexity. For an immediate mapping of a set of three joint angles (the posture of the sender leg) to another set of three joint angles (the target posture of the receiver leg), we calculated an optimal linear regression. This then served as a benchmark for comparison with more complex ANN structures that included a hidden layer of variable size. Our goal was to determine how the accuracy of the posture mapping depends on the complexity of the underlying neuronal network structure.

… front leg × middle leg; rear: middle leg × hind leg). All views are aligned and scaled to match. Note that, if the tip volumes were considered, there were no affordance volumes for antennae and front legs because the tibia-tarsus joint never reached as far as the tip of the ipsilateral antenna.

For the two affordance volumes of left leg pairs (front-to-middle-leg, middle-to-hind-leg) a simple regression provided only a coarse approximation of the target values (**Table 4**): for the front-to-middle-leg transformation the mean squared error (MSE) was 61.0, equivalent to a mean error of around 7.8° per leg joint. The middle-to-hind-leg transformation achieved a smaller MSE of 10.0, equivalent to around 3.2° per leg joint. This difference in mapping accuracy can be explained by the larger size of the affordance volume of front and middle legs, making the approximation of joint angle transformations by a simple hyperplane more error-prone. **Table 4** lists the joint angle working ranges and the MSE for each degree of freedom. For the transformation in the opposite direction, i.e., from a posterior sender leg to an anterior receiver leg, the MSE dropped for the middle-to-front leg pairing to 34.3 (average of 5.9° per leg joint) and rose for the hind-to-middle leg pairing to 18.2 (average of 4.3° per leg joint).
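The per-joint errors quoted above are consistent with taking the square root of the reported MSE. The short check below assumes that the MSE is expressed in squared degrees and that the "mean error" is the root mean squared error; the dictionary keys are illustrative labels.

```python
import math

# MSE values as reported in the text (assumed to be in squared degrees).
reported_mse = {
    "front-to-middle": 61.0,
    "middle-to-hind": 10.0,
    "middle-to-front": 34.3,
    "hind-to-middle": 18.2,
}


def rms_error(mse):
    """Root mean squared error, in the same unit as the joint angles."""
    return math.sqrt(mse)


rms = {name: round(rms_error(value), 1) for name, value in reported_mse.items()}
for name, value in rms.items():
    print(f"{name}: about {value} degrees per joint")
```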

Overall, these results show that a regression yields a poor approximation of a joint angle mapping. The variability of the different mappings further stresses the high non-linearity of the space. Therefore, we employed more complex models, including a hidden layer of varying size.
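A regression benchmark of the kind described above can be sketched as follows. The source does not state how the regression was computed, so this sketch assumes ordinary least squares with a bias term, solved through the normal equations. The data are synthetic and the `toy_mapping` function is a hypothetical stand-in, not the insect leg kinematics.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b (square a, matrix b) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear(x, y):
    """Least-squares fit of y ≈ [x, 1] @ w via the normal equations."""
    xa = [row + [1.0] for row in x]  # append a constant bias input
    d, k = len(xa[0]), len(y[0])
    xtx = [[sum(r[i] * r[j] for r in xa) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[c] for r, t in zip(xa, y)) for c in range(k)] for i in range(d)]
    return solve(xtx, xty)


def predict(w, row):
    ra = row + [1.0]
    return [sum(ra[i] * w[i][c] for i in range(len(ra))) for c in range(len(w[0]))]


def mse(w, x, y):
    errs = [(p - t) ** 2 for row, tgt in zip(x, y) for p, t in zip(predict(w, row), tgt)]
    return sum(errs) / len(errs)


def toy_mapping(q):
    """Hypothetical non-linear sender-to-receiver posture mapping (degrees)."""
    return [
        0.8 * q[0] + 10.0 * math.sin(math.radians(q[1])),
        0.5 * q[1] - 0.2 * q[2],
        q[2] + 0.01 * q[0] ** 2,
    ]


rng = random.Random(0)
sender = [[rng.uniform(-40.0, 40.0) for _ in range(3)] for _ in range(200)]
receiver = [toy_mapping(q) for q in sender]
linear_receiver = [[0.8 * q[0], 0.5 * q[1] - 0.2 * q[2], q[2]] for q in sender]

mse_nonlinear = mse(fit_linear(sender, receiver), sender, receiver)
mse_linear = mse(fit_linear(sender, linear_receiver), sender, linear_receiver)
print(mse_nonlinear, mse_linear)
```

In this toy setting, the hyperplane reproduces the purely linear targets almost exactly but leaves a clear residual error on the non-linear targets, which is the qualitative pattern the benchmark is meant to expose.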

### Comparison of Different Model Complexities

For a systematic investigation of the required model complexity, we trained ANNs with two kinds of architectures (**Figure 10**). The first of these architectures was a three-layered feed-forward ANN with varying number of hidden neurons (**Figure 10A**). The second architecture additionally included skip connections that shortcut the hidden layer (**Figure 10B**). As before, all simulations were done for the two left affordance volumes of leg pairs.
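The two architectures can be sketched as a single forward pass with optional skip weights. The activation function, the use of bias terms, the linear output units and the random initialisation are assumptions of this sketch, since none of them are specified in the text; training is omitted.

```python
import math
import random


def init_network(n_in, n_hidden, n_out, skip=False, seed=0):
    """Random weights for a three-layered feed-forward network (assumed layout)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    net = {
        "w_hidden": matrix(n_hidden, n_in),
        "b_hidden": [0.0] * n_hidden,
        "w_out": matrix(n_out, n_hidden),
        "b_out": [0.0] * n_out,
    }
    if skip:
        # Direct input-to-output weights that shortcut the hidden layer.
        net["w_skip"] = matrix(n_out, n_in)
    return net


def forward(net, x):
    """Map a sender posture x to a target posture for the receiver leg."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)  # tanh is an assumption
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    out = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(net["w_out"], net["b_out"])
    ]
    if "w_skip" in net:
        out = [o + sum(w * xi for w, xi in zip(row, x)) for o, row in zip(out, net["w_skip"])]
    return out


sender_posture = [0.1, -0.2, 0.3]  # three normalised joint angles (toy values)
plain = forward(init_network(3, 4, 3), sender_posture)
with_skip = forward(init_network(3, 4, 3, skip=True), sender_posture)
print(plain, with_skip)
```

With all skip weights set to zero, the second architecture reduces to the first, so the skip connections only add a linear input-to-output term on top of the hidden-layer pathway.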

When evaluated on a set of previously unseen test data, the mean performance of the three-layered ANNs as a function of hidden layer size is shown in **Figure 11**, along with the benchmark accuracy achieved by regression. The blue shaded area shows the standard deviation over five repetitions per hidden layer size. Variation was quite small for repeated learning experiments, suggesting that training time was sufficient for a good comparison.

#### TABLE 4 | Joint angle range inside the affordance volume.

In case of the front-to-middle-leg mapping (**Figure 11A**), small hidden layers introduced a bottleneck into the network, such that the performance of these networks was worse than linear regression. Only when four or more hidden neurons were used did the network performance improve continuously with increasing hidden layer size. Beyond a hidden layer size of 32 neurons, the MSE decreased only slightly, suggesting that additional complexity of larger ANNs would not pay off in terms of accuracy. Finally, the similar MSE curves for front-to-back and back-to-front projections suggested that a pair of reciprocal ANNs would work equally well in both directions. The top right subfigure adds the training time as the second independent axis, revealing that the networks converged nicely and that the training time of 5,000 epochs is sufficient for convergence, even for the more complex models. In general, the mapping problem appears sufficiently simple for continuous improvement with increasing model complexity.

Results on a test data-set for the middle-to-hind-leg mapping were similar to those for the front-to-middle-leg mapping in that a minimum of four hidden neurons was necessary to achieve better performance than a linear regression. Also, the mapping accuracy improved continuously (the MSE decreased) with hidden layer size and the learning curve (top right insert) was equally smooth and monotonically decreasing as before. However, two results differed for the two mappings. First, near-optimal accuracy for the ANN with 32 hidden neurons was approximately tenfold higher for the middle-to-hind-leg mapping (**Figure 11B**) than for the front-to-middle-leg mapping, reaching a root mean squared error below half a degree. The second difference concerned the difference in mapping accuracy for the two directions, the back-to-front mapping reaching the level of mapping accuracy of the front-to-middle leg mappings only. To analyse this further, we turned toward the data for the right affordance volumes (lower right inserts in **Figures 11A,B**). For those examples, the difference between front-to-middle-leg and middle-to-hind-leg mappings was less pronounced than for the corresponding left leg pairs.

… accuracy for front-to-back (blue) and for the reciprocal back-to-front projection (orange). Top right inserts show accuracy as a function of both hidden layer size and training duration, illustrating how the test error improved over time. Lower right inserts compare accuracy-complexity functions of the right pair of legs (green, front-to-back projection) with that of the left pair of legs (blue).

Since already small to medium hidden layers proved to be sufficient for a good approximation of the mapping, especially in the case of the affordance volumes of middle and hind legs, we wondered whether another kind of ANN architecture could work equally well with even fewer neurons. This is because the number of 32 neurons in the hidden layer was still high compared to possible candidate neural structures in an insect. Therefore, we further extended the model by direct skip connections from joint angle inputs to target outputs. Our results showed that skip connections may introduce a significant improvement in the case of very small hidden layers which had previously introduced a bottleneck effect (see **Supplementary Figure 1**, where the MSE for an ANN with 2 hidden neurons was as low as 7.4 compared to 10.0 for the regression approach). However, the positive effect of skip connections vanished for more complex models. Probably, this was because skip connections only introduced a small number of additional connections compared to the growing number of connections toward and from the hidden layer. We conclude that for each affordance volume of ipsilateral leg pairs, very small feed-forward neural networks can achieve a better mapping performance than a linear regression, and that very high accuracy may be achieved with hidden layer sizes around 32 neurons.
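The argument about the relative contribution of skip connections can be made concrete by counting weights. This sketch assumes three inputs and three outputs (one per joint angle), full connectivity between layers, and ignores bias terms; these are assumptions for illustration rather than details confirmed by the text.

```python
def connection_count(n_hidden, n_in=3, n_out=3, skip=False):
    """Number of weights in a fully connected three-layered network (biases ignored)."""
    count = n_in * n_hidden + n_hidden * n_out
    if skip:
        count += n_in * n_out  # direct input-to-output connections
    return count


def skip_share(n_hidden):
    """Skip connections as a fraction of all connections in the network."""
    total = connection_count(n_hidden, skip=True)
    return (total - connection_count(n_hidden)) / total


for n_hidden in (2, 4, 8, 32):
    print(n_hidden, connection_count(n_hidden), connection_count(n_hidden, skip=True),
          f"{skip_share(n_hidden):.2f}")
```

Under these assumptions the nine skip weights make up a large share of a network with two hidden neurons, but only a small share of a network with 32 hidden neurons, in line with the explanation offered above.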

### DISCUSSION

Using the stick insect as an example, our study proposes a method to delineate distinct, behaviourally relevant spatial volumes in the near-range environment of the insect body, based on experimental data. The first of these volumes is equivalent to what is typically referred to as peripersonal space in humans and comprises the action volumes of all eight limbs of the insect (six legs and two antennae; **Figure 4**). Essentially, our method assumes that this volume is defined by motor activity, as it is the volume traversed by any kind of limb movement that is likely to be observed during the behavioral paradigms considered. Nevertheless, it is important to note that this volume is also a volume of distinct sensory activity in that any contact-induced sensory activity can only occur within reach of a limb. Therefore, the boundary of peripersonal space can be viewed as the boundary beyond which motor activity cannot coincide with mechanosensory cues of physical contact. As a corollary, peripersonal space must be represented by distinct patterns of neural activity within the somatosensory and motor system of an insect. The behavioral relevance of the second volume, the affordance space, is given by the spatial correspondence of contact points that may be reached by two or more limb postures, either sequentially or simultaneously. The affordance space was therefore defined as that part of peripersonal space that fulfills the following two criteria: (i) it must be traversable by at least two different limbs (as judged by overlap of two action volumes in **Figure 5**) and (ii) the part of the limb that traverses must be likely to experience physical contact in natural behavior (e.g., the distal third of the limb, as justified by **Figures 6**, **7**).

In our study, the first of these criteria (overlap) was applied only for ipsilateral pairs of limbs. Contralateral overlap was not considered because there are no dedicated experimental studies on bilateral spatial coordination of limbs in insects that could possibly contribute sufficient experimental data. Owing to the data-driven calculation of affordance space, the applicability of our method critically depends on the suitability of available motion capture data. In our case, the choice of climbing and searching paradigms would have been appropriate to estimate contralateral overlap for front legs and antennae, but much less for middle and hind legs. Since all experimental setups (**Figure 2**) were based on a horizontal walkway, the likelihood of middle and hind legs to cross the sagittal plane was limited to very rare and brief episodes of cyclic searching movements if a swing movement missed the obstacle. Future experimental studies will be needed to address contralateral coordinate transfer. Suitable behavioral paradigms would likely be gap-crossing with increased likelihood of searching movements of middle and hind legs (e.g., see Dürr, 2001), or climbing along narrow substrates (e.g., see Cruse et al., 2009).

### The Role of Contact

Contact events are of particular relevance to both peripersonal space and affordance space. This is justified by the certainty of the sensory event of physical contact, and by the immediate behavioral relevance. An important factor contributing to the certainty of contact cues is "resisted movement" that is known to cause shear forces that stimulate strain-sensing campaniform sensilla in the cuticle (Zill et al., 2012). A second factor is the experience of coincident motor and sensory activity through proprioceptive postural feedback, strain-induced feedback, and, potentially, further sensory activity caused by exteroception of contact cues (touch). The immediate behavioral relevance of contact cues is related to the presence of an external object within the action range of the body.

So far, our considerations on likely contact locations (criterion 2 for affordance space) are restricted to antennae (**Figure 6**) and front legs (**Figure 7**). Future studies will need to record contact locations at the other limb pairs, in order to test whether the results on contact locations on front legs can be transferred to middle and hind legs. Moreover, it could be intriguing to distinguish distinct movement types subsequent to limb contacts, for example re-positioning in case of inappropriate foothold, or retraction in case of proximal contact sites. Although there is some evidence that physically interrupted swing movements of walking stick insects always follow a default retraction-levation-flexion response (Ebeling and Dürr, 2006), existing studies did not control for the contact site along the limb. To date, contact-induced limb-movements have been described qualitatively in stick insects (e.g., Bläsing and Cruse, 2004; Theunissen and Dürr, 2013) and cockroaches (e.g., Ritzmann et al., 2000), but were not related to preceding contact locations.

Another limitation of the existing experimental data concerns the restriction to initial contacts. Potentially, this leads to underestimating the likelihood of proximal contacts. In case of the antennae, this can be expected from results of Krause and Dürr (2012), who studied antennal tactile sampling behavior of stick insects that climbed a stair of varying height. That study categorized antennal contact locations on the obstacle as "along the frontal wall" and "on the upper edge" of the stair and found that the former category occurred predominantly near the tip (a region corresponding very well to that shown in **Figure 6**), whereas the latter category occurred predominantly along more proximal parts of the flagellum. In case of the legs, the relatively small difference of the affordance volume sizes for action volumes vs. contact volumes (**Figure 9**) suggests that the inclusion of more proximal contact locations would have little effect on the affordance volumes of leg pairs. In case of the antenna, however, the effect would be much stronger, as the gap between the head and the antenna-front-leg affordance volume would shrink.

### Modeling Affordance Space

Since our definition of affordance space is based on the transfer of spatial information among limbs, we probed the computational properties of the mappings within pair-wise affordance volumes with Artificial Neural Networks (ANN) of differing complexity (**Figure 10**). ANNs were trained to map the posture of a sender leg to the corresponding posture of a neighboring receiver leg with equal foot position. The output of the ANN can be viewed as a target posture that can be used to control the movement of the receiver leg. A first model for such leg targeting behavior was introduced by Dean (1990) to simulate the spatial coordination of lift-off and touch-down locations of ipsilateral leg pairs in stick insects. This model was later included as the so-called target net in Walknet. Walknet is a behavior-based, distributed ANN control model of multi-legged locomotion in animals and walking robots (Cruse et al., 1995; Schmitz et al., 2008; for the most recent version, see Schilling et al., 2013). In Walknet, spatial control of foot position, i.e., targeting, was originally realized by a simple feed-forward neural network that only consisted of one hidden layer with three hidden neurons. In later versions of Walknet, the target net also included skip connections (Cruse et al., 1998; Dean et al., 1999), as tested by the present study (**Supplementary Figure 1**). Already this small network could simulate spatial targeting behavior of an insect walking on a plane. The original target net was analyzed only qualitatively and postures were restricted to much smaller working ranges (i.e., action volumes). Moreover, the resulting walking behavior of Walknet was quite regular. In contrast to the mentioned studies on Walknet, we provide a quantitative analysis of the complexity of that part of this control network that deals with spatial inter-limb coordination (the target net). 

Major differences between our present model and the target net are (i) the considerably larger action volumes of the limbs, owing to the much larger behavioral variability, (ii) the consideration of all three spatial dimensions, and (iii) the systematic and quantitative evaluation of mapping accuracy as a function of network complexity. Our results show that the ANN structure used by Dean (1990) was insufficient to achieve an accurate mapping for our experimentally derived affordance volumes (being in the bottleneck range of **Figure 11**, with lower accuracy than a linear regression). However, highly accurate mappings can be learnt with more hidden neurons. For the middle-to-hind-leg mappings, appropriate network structures were still small. Since the affordance volume was much larger for the front-to-middle-leg mappings than for the middle-to-hind-leg mappings, equal accuracy required more neurons. However, in both cases accuracy was very high for moderately sized network topologies (between 8 and 32 hidden neurons). These numbers are in the range of what would be plausible for a physiological neural network realized in the insect. For example, von Uckermann and Büschges (2009) described twelve non-spiking premotor interneurons for the mesothoracic ganglion of the stick insect, all of which are candidates for being involved in the local control of leg movements. As mentioned, Dean et al. (1999) employed skip connections for their target network approach to improve accuracy. Our results confirm that skip connections can lead to an improvement of targeting accuracy for small networks (fewer than 4 hidden neurons; **Supplementary Figure 1**), but not for larger networks (which would be required for high accuracy). A next step could be to extend the ANN toward more hidden layers and deeper architectures. However, this should be related to known properties of the neural organization of the insect sensorimotor system.

In insects, both the terminal arborisations of leg proprioceptor afferents, and the dendrites of motoneurons of a leg are always confined to the ganglion of the same thorax segment. As a consequence, the transfer of limb posture information from one leg to another requires at least one layer of intersegmental neurons that mediates the afferent input from one segment to the efferent output neurons in the next segment. Intersegmental neurons that mediate postural information have been described for stick insects (Brunn and Dean, 1994). However, whether these intersegmental neurons connect to motor neurons directly (corresponding to one hidden layer), or to local premotor interneurons (corresponding to two hidden layers) or both (two hidden layers with skip connections that shortcut hidden layer 2) is unknown. In case of the antenna-to-front-leg mapping, skip connections would be plausible because proprioceptive afferents from antennal joints have collateral projections to the brain and to the suboesophageal (gnathal) ganglion (e.g., Goldammer and Dürr, 2018). However, these skip connections would not connect to the output layer (motoneurons of the front leg), but to at least one further hidden layer.

With regard to the asymmetry of backward and forward projections described in **Figure 11**, it appears that the sampled data for the left middle-to-hind-leg mapping (and to a limited extent for the right middle-to-hind-leg mapping as well) contain some underlying regularity which makes it easier to learn the mapping in one direction than in the opposite direction. A possible explanation for this could be related to nested trigonometric functions involved in the mapping of limb postures, where small changes in the input or output ranges could either favor or prevent successful inversion. For example, consider approximating a sine function: when considering the range around zero only, this function can be linearly approximated and inverted. By contrast, the function values around π/2 are all close to one and inversion is impossible.
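The sine example can be checked numerically. The sketch below is only an illustration of the general argument, not an analysis of the leg kinematics; the perturbation size and the sample points are arbitrary choices.

```python
import math


def inversion_error(x, output_noise=0.01):
    """Error in the recovered input when sin(x) is lowered by a small output_noise."""
    y = math.sin(x) - output_noise  # lowering keeps y inside [-1, 1] near the peak
    return abs(math.asin(y) - x)


# Around zero, sin(x) is close to x, so a linear map and its inverse work well.
linear_gap = abs(math.sin(0.1) - 0.1)

# The same small output perturbation causes very different input errors.
near_zero = inversion_error(0.0)
near_peak = inversion_error(math.pi / 2 - 0.01)

# Around pi/2, two different inputs give (numerically) the same output.
ambiguity = abs(math.sin(math.pi / 2 - 0.1) - math.sin(math.pi / 2 + 0.1))

print(linear_gap, near_zero, near_peak, ambiguity)
```

Near zero the recovered input is off by roughly the size of the perturbation, whereas near π/2 the error is several times larger, and inputs on either side of the peak cannot be told apart from the output alone.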

We conclude that ANNs provide a good model for the affordance space defined here. The model could account for information transfer about footholds among limbs. In humans, Magosso et al. (2010) realized a model for—what they call more generally—peripersonal space through artificial neural networks. Their work is comparable, as it is based on trained mappings between different spatial representations that relate locations of limbs among each other. As a key difference, their work focuses on visuo-tactile representations and takes inspiration from human cortical representations, whereas our work aims at simpler models. But while the function of the sub-components is comparable, they further show how these can be interconnected, thus giving rise to a body model. In another example, Braud et al. (2018) introduced an anticipatory model for grasping that aims to learn the combination of actions and their associated perceptual effects. This is then exploited for motor planning by a form of mental simulation. Like our study, Braud et al. focus on behavioral relevance by directly relating sensory information to the action capabilities of the system. In general, body model representations are used widely in robotics (e.g., Lallee and Dominey, 2013, for review see Schillaci et al., 2016). They are assumed to be quite flexible in both humans and animals and allow for cognitive abilities such as movement planning. So far, most existing models in robotics deal with visuo-tactile coordination and the control of reaching or grasping movements (Hoffmann et al., 2010). These approaches could benefit from including further modalities.

The basic mappings as used to model our affordance space may be arranged to constitute a body model too, e.g., by application of the "Mean of Multiple Computations" (MMC) principle as done by Schilling (2011). The MMC principle breaks down the complexity of a sensorimotor system into multiple local relationships, each of which expresses a relatively simple transformation. The mappings analyzed here are examples of local relationships for pairs of parallel kinematic chains. As such, they could be integrated into an MMC model of an entire insect body or of any other body scheme including multiple limbs.

### CONCLUSION

In summary, we argue that invertebrates have at least two internal representations of space: one far-range representation of the space "beyond reach" that is required for orientation and navigation behavior (e.g., see Heinze and Pfeiffer, 2018, and the corresponding special issue), and one near-range representation of the space "within reach" that is required for spatial coordination of limbs. With regard to the latter, we demonstrate that the joint action ranges of two neighboring legs are almost equivalent to the overlap regions in which physical contact with the environment is likely to occur. We call these joint action ranges affordance volumes. Finally, we propose basic computational elements that relate the posture of one limb to that of another and, thus, serve as models for spatial inter-limb coordination in general. Since each one of these elements is experimentally grounded in a database of natural movement sequences, they model behaviourally relevant coordinate transformations within the natural action range of an insect. Owing to the directedness of the transformations, i.e., the property that one (sender) leg informs another (receiver) leg how to reach the same foot position, they implement affordances for spatially coordinated limb movements. We argue that these affordances for spatial inter-limb coordination define a subspace of peripersonal space that is essential for any behavior that requires spatial control of footfall patterns (in climbing this may be vital) or bimanual coordination. Given the ubiquity of spatial inter-limb coordination behavior in animals, this affordance space must be a fundamental property of motor systems with multiple limbs.

### AUTHOR CONTRIBUTIONS

VD conceived the study and did experimental analysis, MS did neural network simulations, VD and MS discussed the results, prepared the figures, and wrote the paper.

### FUNDING

This work was supported by the cluster of excellence EXC 277 Cognitive Interaction Technology, funded by the German Research Foundation, DFG.

### ACKNOWLEDGMENTS

Experimental data were collected by Leslie Theunissen (stair-climbing experiments), Ago Mesanovic (searching experiments), Christina Neumann, Anna Vavakou and Christine Brenninkmeyer (rod-climbing experiments). The authors thank Yannick Günzel for technical assistance with data management and some figures, and Holk Cruse and Anke Fleischer for helpful comments on the manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2018.00101/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dürr and Schilling. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Naturalistic Observation of Spontaneous Touches to the Body and Environment in the First 2 Months of Life

Abigail DiMercurio, John P. Connell, Matthew Clark and Daniela Corbetta\*

Department of Psychology, The University of Tennessee, Knoxville, Knoxville, TN, United States

Self-generated touches to the body or supporting surface are considered important contributors to the emergence of an early sense of the body and self in infancy. Both are critical for the formation of later goal-directed actions. Very few studies have examined in detail the development of these early spontaneous touches during the first months of life. In this study, we followed four infants weekly in two naturalistic 5-min sessions (baseline and toys-in-view) as they lay alert in supine from the age of 3 weeks until they acquired head control. We found that throughout the 2 months of observation, infants engaged in a high rate of touch and spent about 50% of the time moving their hands from one touch location to the next. On most sessions, they produced up to 200 body/surface contacts and touched as many as 18 different areas (mainly upper body and floor) with both hands combined. When we did not consider the specific areas touched, the rates of touches were higher to the body than to the floor, but the duration of contacts and the most touched areas were higher for the supporting surface than for the body. Until the age of 9 weeks, we found no consistent differences in the rate of touch between head and trunk. Infants also did not display significant differences in their rate of touch between right and left hand or between conditions. However, we discovered that in the earlier weeks, infants engaged more often in what we called "complex touches." Complex touches were touches performed across several body/floor areas in one continuous bout while the hand maintained contact with the body or floor. Single touches, in contrast, corresponded to one touch to one single body or floor area at a time. We suggest that infants are active explorers of their own body and peripersonal space from day 1 and that these early self-generated and deeply embodied sensorimotor experiences form the critical foundation from which future behaviors develop.

Keywords: touch, self-touch, infancy, embodiment, sensorimotor experience, emerging self

#### Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Lisa K. Chinn, Tulane University, United States; Jenni M. Karl, Thompson Rivers University, Canada

> \*Correspondence: Daniela Corbetta dcorbett@utk.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 08 September 2018 Accepted: 05 December 2018 Published: 18 December 2018

#### Citation:

DiMercurio A, Connell JP, Clark M and Corbetta D (2018) A Naturalistic Observation of Spontaneous Touches to the Body and Environment in the First 2 Months of Life. Front. Psychol. 9:2613. doi: 10.3389/fpsyg.2018.02613

## INTRODUCTION

Developing a sense of our body is an essential prerequisite for our interactions with the world. Sensing our body entails knowing where our limbs are in space and time, being aware of how fast or how far our limbs can move, or even knowing how much space our body occupies in our proximal environment. Indeed, knowing the limits and extent of our peripersonal space is fundamental for navigating our social and physical world, and for situating and orienting ourselves in our everyday activities. Little is known about how such sense of the body develops in early infancy. Many pioneers in developmental psychology like Piaget (1936/1952) and Wallon (1941) initially assumed that newborns lived in complete adualism during the first months of life, meaning that newborns were assumed unable to differentiate their own body from their surrounding world. According to these pioneers, the development of a sense of the body and awareness of a person's self would take months and even years to build. Nowadays, however, many researchers are acknowledging that an emerging sense of the body – a precursor of the sense of self – already begins to form in the womb (Reissland and Austen, 2018), can be observed at birth (Meltzoff and Moore, 1977; Filippetti et al., 2015) and is certainly present and detectable at 2 months of age (Bahrick, 1995; Rochat, 1995, 1998). Yet, systematic assessments of how such emerging sense of the body develops from birth to 2 months of life are generally lacking.

The goal of this study is to begin to examine infants' spontaneous contacts with their own body and supporting surface from 3 weeks to 2 months of life. This early period of development is critical because this is a time during which infants perform self-generated activities almost entirely within their peripersonal space and because these initial free-form behaviors and contacts may be important contributors to the formation of body map representations and later goal-directed behaviors (Corbetta et al., 2014; Marshall and Meltzoff, 2015; Thomas et al., 2015). Indeed, prior to the onset of reaching, which occurs around 3–5 months of age, infants are greatly restricted in what they can do. As a result, they are often perceived as passive and dependent organisms that are limited to the sensory and physical experiences that happen in their immediate vicinity. Despite these limitations, development and learning are at work from day 1, and discovering the body via self-generated movements of the arms and legs is probably one of the earliest behaviors to which infants attend around the clock. Infants and even fetuses experience touch through spontaneous limb activity resulting in contact with their own bodies or their immediate environments (Thomas et al., 2015; Fagard et al., 2018). In fact, many early experiences center around touch, as touch is often considered one of the first senses to develop (Field, 2014; Reissland and Austen, 2018).

In this paper, we document the spontaneous touch activities of 4 infants that we followed longitudinally, every week, until they developed head control (between 9 and 13 weeks of age). Our goal was to provide a detailed description of the early development of infants' touches to their own body and supporting surface in order to gain a better understanding of how these early body/environment-oriented sensorimotor experiences might contribute to the development of an initial sense of the body, a necessary precursor to the emergence of reaching and subsequent goal-directed behaviors.

### The Emerging Sense of the Body

The ability to sense our body is intimately linked to self-perception. Self-perception can be proprioceptive as we move our limbs and head in space and time, and it can be haptic as we touch a surface or our own body. These deeply embodied sensorimotor experiences can occur in conjunction with other perceived information, such as turning the head toward a sound or looking at and/or tracking a moving object. Newborn infants spontaneously perform these activities from birth, on a day-to-day, second-by-second basis. At each moment, they are recipients of proprioceptive and haptic feedback that informs them about their posture, any changes in limb position, and contact with themselves or other surfaces, thereby allowing them to discover not only their limbs and their range of motion, but also the limits of their peripersonal space. These self-generated movements, as newborns move their limbs freely, clearly provide a foundation for exploratory behavior from which body representation and a basic, implicit form of self-knowledge build (Hoffmann, 2017). Particularly, touches to the body may provide redundant information about the limb posture in space, the part of the limb making contact with the body, and the body area being touched (Rochat and Hespos, 1997). These particular touches may differ from those where the limbs only touch the supporting surface on which the body lies. Indeed, these latter touches may provide more specific information about limb extensions and their range of activity within the infant's peripersonal space.

Studies that have examined spontaneous and exploratory motor activities in early development have mainly focused on the prenatal, neonatal, and later months of the 1st year of life. Studies on prenatal development have shown that very early on, fetuses already direct their arms toward their body and face (Piontelli, 1987). Self-touches on the body and face increase in the last months of gestation and are accompanied by increasing gross body movement activity and increasingly complex and frequent limb movements (Andonotopo et al., 2004). Some other studies even provided evidence that an emerging sense of the body may already exist in the womb (Zoia et al., 2007; Reissland et al., 2014). These latter studies reported that fetuses generate ample spontaneous limb movements; however, when the limbs approached the mouth, the speed of the limb movements decreased compared to limb movements directed toward other parts of the face, like the eyes (Zoia et al., 2007). Fetuses were also found to open their mouth in anticipation of their hand making contact with the mouth, suggesting that they may have had a basic body representation of where the hand was being directed (Myowa-Yamakoshi and Takeshita, 2006). These authors often argue for an early form of action planning based on spatial awareness and body knowledge (see also Reissland and Austen, 2018). However, the mouth may be a unique case and caution may be required in interpreting these prenatal movements as "goal-directed" or "prospective" (see Delafield-Butt et al., 2018). Indeed, other evidence has shown that infants at 3 months of postnatal age do not succeed in reaching toward their arms, hands, legs, or feet when prompted by visuo-tactile stimuli (Somogyi et al., 2018).
In sum, observations of prenatal behavior reveal that self-touch is already very active in the womb and that these body-oriented spontaneous behaviors, providing both proprioceptive and haptic information within the same time frame, may already begin to contribute to the emergence of an early sense of the body (Bahrick, 1995).

At birth, a slight regression in motor activity can occur as neonates adjust to the new ambient gravitational field, compared to when motor activity was performed in the amniotic fluid (Fagard et al., 2018). This transition typically translates into a slight decrease in nearly all hand-to-body self-touch activities, aside from hand-to-mouth movements, which increase in postnatal life (Kurjak et al., 2004). Prematurely born infants also tend to move their hands to their head, the only part of their body not covered in clothing (Durier et al., 2015). The authors suggested that this postnatal increase in self-touch activity to the head could be related to self-soothing responses, which again could be interpreted as evidence of an emerging sense of the self. Neonates have also been shown to imitate certain gestures (e.g., tongue protrusion, hand opening/closure) when modeled by an adult (Meltzoff and Moore, 1977; Vinter, 1986), and very recently, researchers have identified movements of the arms in few-hour-old full-term neonates that presented kinematic profiles consistent with those of movements performed prospectively, that is, similar to goal-directed patterns (Delafield-Butt et al., 2018). However, in this study, the authors also found that 25% of the responses did not meet their criteria for movement "prospectiveness," which caused them to caution about the functionality of these early motor responses. Interestingly, in this same study, the researchers found that prospective arm activities were much disrupted in infants born preterm.

Clearly, more studies are needed to understand how prenatal motor activity relates to postnatal motor activity. Furthermore, to fully understand the functional context of self-touch activity and possible movement prospectiveness, these early behaviors should be studied from a dynamic systems perspective, that is, within the realm of multiple developing systems such as hunger, comfort, motor ability, environmental stimulation, caregiver presence, and more, to assess variations in behavior and gain deeper insights into the meaning of these early movement activities.

When we turn to studies performed with older infants, reports evidencing a form of awareness of the body and limb movements become more frequent, especially in studies performed with infants aged 2 months and older. Investigations using contingent reinforcement in the mobile kicking paradigm have revealed that infants as young as 10 weeks old can modify their rate of kicking to increase the motion of a mobile that is tied to one of their legs (Rovee and Rovee, 1969). This change in kicking response indicates that they are capable of recognizing the contingency between their leg movements and the action of the mobile. Three-month-old infants were also shown to increasingly choose harder-to-produce simultaneous kicking of both legs to receive the contingent reinforcement (Thelen, 1994). In another variation of this leg kicking paradigm, Angulo-Kinzler and colleagues demonstrated that 3-month-old infants could even discover how to adopt specific leg postures or specific hip and knee angles to increase the mobile activity (Angulo-Kinzler, 2001; Angulo-Kinzler et al., 2002). Similar contingency discovery was additionally observed in 2-month-old infants when the mobile was attached to their arms, instead of their legs (Watanabe and Taga, 2006).

Other studies have shown that infants can detect incongruences between leg movements they produce and the filmed images of their own leg movements (Rochat and Morgan, 1995; Rochat, 1998). In those studies, when infants were shown inverted recordings of their actual leg movements (e.g., the right leg was moving on the TV monitor while infants were, in fact, moving their left leg), infants as young as 3 months old looked longer at the incongruent video than the congruent one. In the same vein, studies on tactile stimulation (Bremner et al., 2008; Begum Ali et al., 2015) have revealed that between the ages of 4 and 6 months, infants are more likely to rely on haptic stimulation to select a limb when the limbs are crossed, compared to older infants who are slower and prone to more errors in limb selection. Presumably, the older infants are confused by the fact that haptic sense and the spatial representation of the source of stimulation do not match when the limbs are crossed. This is a puzzling finding, especially knowing that the body maps for the hands in the somatosensory area of the brain of 60-day-old and 7-month-old infants appear to be lateralized (Marshall and Meltzoff, 2015; Saby et al., 2015; Meltzoff et al., 2018a,b).

These studies as a whole clearly suggest that infants have developed a basic sense of their body by the age of 2 or 3 months. They can select and activate the limb that creates an interesting event or that corresponds to a lateralized haptic source of stimulation, they demonstrate a sense of agency, and attend more to the events that do not match the outcome of their actions. In a more recent study, however, using a different paradigm, body self-knowledge around that same age range appeared to be lacking. This study used tactile stimuli in the form of "pancake buzzers" that were placed on specific limbs or body areas of infants with widely varied reaching experiences (Somogyi et al., 2018). The buzzers produced small vibrations on the infants' skin and were also clearly visible to the infant depending on the body placement. In 3-month-old infants, the buzzer generated increased, generalized body activity that was non-specific to the location of the buzzer. Based on the studies reviewed above, one could assume that by 3 months of age, infants have acquired sufficient self-touch and limb movement experience to differentiate limb activity. Yet, from those findings, it remains unclear why at that age undifferentiated activity occurred.

From this brief review, it appears that self-touch activity takes place well before birth and intensifies as the fetus reaches the last gestational period. Observations of the limb movements of fetuses and neonates suggest that they may have begun to acquire an initial sense of their body. This sense of the body becomes more evident from the age of 2 months and beyond, when infants demonstrate that they are capable of producing more targeted movement reactions in response to specific stimulations or contexts involving specific parts of their body. However, studies examining in detail the limb movement activity of young babies in the first 2 months of their life are lacking. This is an important omission, as this period marks a time during which infants are adapting to their new airborne environment (Fagard et al., 2018). Newborn vision is also very poor, limiting their apprehension of the more distant extrapersonal space. Therefore, much of their sensory and motor experience is centered on their body and the surface surrounding their body's limits. These deeply embodied first 2 months of life not only provide continuity between the early body sensations experienced in the womb and the more targeted responses of older infants, but also contribute greatly to the infants' journeys of discovering what they can do with their body and how they can situate themselves in the environment.

### This Study

The present study aims to examine the naturalistic progression of infants' spontaneous touches to their body and supporting surface from the time they are 3 weeks old until they have acquired head control (between 9 and 13 weeks of age). We observed infants weekly, while in supine, over two 5-min sessions varying only by the presence or absence of objects in their visual field. During that age span, most infants in western cultures spend a large amount of time in the supine position while in their cribs or play-pens. Therefore, studying self-touch in this context allowed us to examine the behaviors that infants would most likely exhibit and experience during that early age range.

This is a descriptive study that is part of a larger longitudinal study where we followed a few infants at close weekly intervals until they were able to reach for objects. In this report, we focused specifically on the first 2 months of life preceding the emergence of head control. The emergence of head control marks an important transition in the perceptual and motor development of the infants and provides a critical foundation to the formation of eye, head, and trunk control that is needed for object reaching (Bertenthal and von Hofsten, 1998). In our study design, when infants demonstrated head control, we no longer observed them while in supine; we moved them into a different paradigm, where they were supported on a seat, in order to capture reaching onset. In the present report, we concentrate on two supine conditions: a baseline condition, and a toys-in-view condition, where colorful toys were placed on the side of the infants' preferred head turn. For each session, we asked how many touches infants performed during each 5-min observation, which part of their peripersonal space they touched most – their body or the supporting surface – or if they spent more time moving their arms in the air than touching their body. We documented which and how many different areas of their body they touched in one session, for how long, and if there were differences between right and left arms.

To our knowledge, only one study examined self-touch behavior in infants from birth to 24 weeks of age (Thomas et al., 2015). These researchers observed self-touch over 21 s (on average) of video recordings and limited their behavioral analyses to the first 10 self-touches observed. They also divided the body into 3 major areas: the head, torso, and legs, with detailed analyses of the hand posture during self-touch (i.e., palmar or dorsum contacts). The authors found that infants followed a cephalocaudal progression with more touches to the head and torso at first, followed by more touches to the legs by 12 weeks of age (Thomas et al., 2015).

The present study complements this prior work by providing detailed behavioral observations of fewer infants, but over segments of 5-min weekly observations. We coded every touch performed in relation to a more detailed map of the body using a transition network to track where the hands moved from place to place on the body, including contacts with the supporting surface. We also controlled the position of the infants by using the posture that seemed the most ecologically valid for the age range studied and the one in which infants naturally explore and experience their body the most. Finally, we manipulated the environment of the infants by introducing colorful objects in the infants' view in one condition. The introduction of colorful objects in the infants' view was aimed at assessing whether perceiving objects would affect the patterns of touches to the body and surface. In particular, we thought that as infants developed visual attention, especially in the later weeks when they approached 2 months of age, we could eventually observe a slowing down of movement of the arms since at that time infants could be expected to stare more at the objects (Colombo, 2001).

### MATERIALS AND METHODS

### Participants

Four infants (3 males, 1 female) were followed weekly from the age of 3 weeks up to the time they acquired 5 weeks of reaching experience. The method and data in this report focus only on the touch activity that occurred while infants were in supine during the pre-reaching period, that is, the period spanning from 3 weeks old until infants acquired head control (between 9 and 13 weeks of age). Potential participants were referred to us via an OB/GYN practice at the University of Tennessee Medical Center in Knoxville, TN, United States before the infants were born. The principal investigator (DC) met with the expecting parents to explain the goal of the study and methods used. If parents agreed to participate in the study with their infant, they signed a consent form and began to come to our Infant Perception-Action laboratory 3 weeks after their infant was born. One infant (♀) started the study at 4 weeks old and that same infant dropped from the study when she was nearing head control, because her parents were no longer able to bring her to the weekly sessions. Thus, this infant only provided touch data during the pre-reaching period. Also, all infants had one missing data collection session at some point in the study due to sickness. All four infants were born full-term, two via C-section. They weighed between 2693 and 3629 g at birth. Three of the four infants had APGAR scores of 8 and 9 at 1 and 5 min, respectively, after birth; one infant (♂) had APGAR scores of 3 and 5 at 1 and 5 min after birth but showed no neurological disorders or developmental complications during his follow-up. Three infants were White, one was of Hispanic descent. Parents received a \$25 gift card at each visit and on their last visit, they also received a copy of all video records and a baby book of pictures of their child taken while in the laboratory. This study was approved by the Institutional Review Boards of the University of Tennessee and Medical Center.

### Materials

An all-white foam, uniformly flat and padded surface measuring 126 cm × 129 cm was placed on the laboratory floor to support the infants during the recordings. Two vertical white panels (91.5 cm × 122.5 cm) standing on each side of the infants were used to block distractions from the surroundings (see **Figure 1**). Two digital video cameras (Panasonic PV-GS39), one recording from above and the other recording from the front, were fed into a Digital Video Switcher SE-500 (Datavideo Corp., Whittier, CA, United States) providing a split-screen image of the two video images. These video recordings were captured on a Dell Optiplex 9020 via an Osprey 820e digital video capture card (ViewCast Corp., Plano, TX, United States) and recorded with the Debut Video Capture software (NCH Software Pty Ltd., Australia). Together, the split images provided a simultaneous full view of the infant's body.

Objects used for the toys-in-view conditions were a fairy doll, a giraffe, and a ring stacker all made of soft cloth and colorful material (see **Figure 1B**). These objects measured between 19 and 23 cm in height and between 10 and 24 cm in width. The giraffe could play infant lullabies, but only when pressed on the tummy. A set of Dr. Seuss books and an infant mobile were also used for some of the testing conditions, but those conditions will not be reported in this manuscript.

FIGURE 1 | Recording setup. (A) Baseline condition, (B) toys-in-view condition. Written informed consent was obtained from the legal guardian of the infant for the publication of these images.

In addition, the infants were wearing 8 mm markers attached to the dorsal side of their wrists with hypoallergenic Johnson and Johnson soft cloth tape. The markers were part of an electromagnetic motion analysis system (Flock of Birds, Ascension Technology Corp., Burlington, VT, United States) that was used to record the infants' arm movements. However, because the analyses reported in this manuscript focus mainly on touch activities, no movement kinematics analyses are included in this report.

### Procedure

The data collection sessions were scheduled at regular times during the weeks that were convenient for the parents and corresponded to wake times for the infants. Infants were brought to the laboratory following feeding times to ensure that they were alert during testing. Parents were not instructed to alter the clothing of the infant, and the observation proceeded with the clothing that the infant wore into the lab. The clothing varied depending on the season, ranging from onesies and dresses to long sleeves with long pants. After birth, infants typically wear clothing throughout most of the day; thus, leaving the clothing on the infants during our observations provided a naturalistic context closer to how infants normally experience their body on a daily basis. The current report focuses on the development of touch patterns in a baseline condition and a condition with objects in view. Recordings always began with the baseline condition, during which the infants were placed in supine in the middle of the padded surface. No stimuli were presented during this condition (see **Figure 1A**). The toys-in-view condition immediately followed the baseline condition: the three objects (doll, giraffe, and ring stacker) were placed parallel to the infants, at an out-of-reach distance of 43 cm (so as not to obstruct infants' hand paths), on the side of the infants' preferred head turn (see **Figure 1B**). The side of the infants' preferred head turn was determined during the baseline condition. During recording, infants were free to move their arms and legs at will. If they started to show signs of fussiness, parents were allowed to give them a pacifier, although in general, the use of a pacifier was avoided as much as possible. Giving or adjusting the pacifier were the only instances where parents were allowed to intervene during the recordings.
Each condition lasted 5 min, except for 1 week for one infant, and 1 week for two other infants where recordings were shortened to 3 and 4 min, respectively, in response to infant fussiness.

Three additional conditions were collected (a musical condition with the objects on the non-preferred head turn, a parent reading condition, and an overhead mobile condition). Touch in these conditions has not been analyzed yet. They were introduced mainly for the purpose of measuring changes in overall movement activity as a function of parental/musical sound, which is not the focus of this paper. On two out of the 32 weeks video recorded, infants received the mobile condition after the baseline, instead of the toys-in-view condition. This switch in condition was done in response to infant fussiness.

### Touch Coding and Analyses

fpsyg-09-02613 December 15, 2018 Time: 15:10 # 6

The coding of the videos was performed with the video coding software Datavyu v1.2 (Datavyu Team, Databrary Project, New York University). The videos were scored continuously for the onsets and offsets of touches on the body and on the floor, respectively. Self-touches to the body were identified according to a body map of 20 areas (see **Figure 2**). The floor (or supporting surface on which the baby lay) was sectioned into three additional areas (see **Figures 2**, **3B**). Each hand was coded in separate passes. From the onset/offset of touches, we derived the duration of the touches (in milliseconds), as well as the duration when the hands were not touching the body or floor; we also identified the area(s) of the body or floor where the touches occurred, and their frequency. For this coding, if a touch occurred in a continuous manner over more than one area of the body or the floor (for example, if the hand moved from head to trunk while maintaining contact with the body), it was counted as a single "complex" touch, but the different body/floor areas covered during such more complex touches were recorded. Likewise, depending on the analyses, the duration of those more complex touches was either considered as a single continuous touch with one duration, or the touch duration was split evenly across areas touched. Touches were not considered if they were shorter than 280 ms (7 video frames), or if they occasionally occurred in contact with the parent's hand (for example when the parent adjusted or gave the pacifier to their infant). Infants' hand contact with the parent's hand occurred rarely. A code of unknown was also used for times when the infant's hand could not be seen, and it was impossible to determine if a touch occurred. Unknown codes only represented 2.2% of the total video footage recorded across the 4 infants.
Finally, touches to the mouth (as opposed to touches to other areas of the head) and touches on bare skin (as opposed to touches on clothing) were coded in separate passes.
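
The duration rules described above can be illustrated with a minimal Python sketch. It is not the authors' Datavyu workflow: the `Touch` record, the function names, and the sample values are hypothetical, and only the 280 ms threshold and the even split of complex-touch durations come from the text.

```python
from dataclasses import dataclass

MIN_TOUCH_MS = 280  # threshold reported in the text (7 video frames)


@dataclass
class Touch:
    """Hypothetical record for one coded touch of one hand."""
    onset_ms: int
    offset_ms: int
    areas: list  # body/floor areas covered, in order of contact

    @property
    def duration_ms(self) -> int:
        return self.offset_ms - self.onset_ms

    @property
    def is_complex(self) -> bool:
        # A complex touch covers more than one area without losing contact.
        return len(set(self.areas)) > 1


def valid_touches(touches):
    """Drop touches shorter than the minimum duration."""
    return [t for t in touches if t.duration_ms >= MIN_TOUCH_MS]


def duration_by_area(touches):
    """Split each touch's duration evenly across the areas it covered."""
    totals = {}
    for t in touches:
        share = t.duration_ms / len(t.areas)
        for area in t.areas:
            totals[area] = totals.get(area, 0.0) + share
    return totals


# Made-up example values, for illustration only.
touches = [
    Touch(0, 200, ["head"]),               # shorter than 280 ms, discarded
    Touch(1000, 2000, ["head", "torso"]),  # complex touch lasting 1000 ms
    Touch(3000, 3500, ["floor_X"]),        # simple touch lasting 500 ms
]
kept = valid_touches(touches)
print(len(kept))
print(duration_by_area(kept))
```

Treating a complex touch as a single continuous touch, the other option mentioned in the text, would simply use `duration_ms` without the split.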

FIGURE 2 | Map of body and floor areas used for the coding of the touch locations. The body was divided into 20 areas corresponding to specific body parts. The floor was divided into 3 broad areas (X, Y, Z) respective to the head, trunk, and legs of the infants. Body and floor were divided vertically into a right and a left side.

The touch coding was performed by 3 trained coders who worked independently. They each coded a third of the entire video footage while ensuring that 20% of the videos were coded independently by all three pairs of coders to assess coding reliability among them. The weeks and infants were assigned randomly among coders. Interrater reliability scores for onset/offset of touches (with a 7-frame margin of error) were 80.3% for the left hand and 79.42% for the right hand (r = 0.980). Interrater reliabilities for the areas touched were 83.62% for the left hand and 85% for the right hand (r = 0.875). Touches to the mouth corresponded to 98.77% interrater agreement and touches to the skin yielded a 93.16% agreement.
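
The text does not spell out how the percent agreement was computed. The following Python sketch shows one plausible way to score onset/offset agreement between two coders with a 7-frame margin. The matching rule, the choice of denominator, and the sample frame numbers are assumptions made for illustration, not the authors' procedure.

```python
FRAME_MARGIN = 7  # margin of error reported in the text, in video frames


def percent_agreement(coder_a, coder_b, margin=FRAME_MARGIN):
    """Percent of touches on which two coders agree.

    Each touch is an (onset_frame, offset_frame) pair. Two touches match
    when both onsets and both offsets differ by at most `margin` frames.
    Each touch of coder B can be matched at most once. The denominator is
    the larger of the two touch counts (an assumption of this sketch).
    """
    unmatched_b = list(coder_b)
    matches = 0
    for on_a, off_a in coder_a:
        for candidate in unmatched_b:
            on_b, off_b = candidate
            if abs(on_a - on_b) <= margin and abs(off_a - off_b) <= margin:
                matches += 1
                unmatched_b.remove(candidate)
                break
    total = max(len(coder_a), len(coder_b))
    return 100.0 * matches / total if total else 100.0


# Made-up frame numbers: the second touch differs by 15 frames at offset.
coder_a = [(10, 40), (100, 160), (300, 330)]
coder_b = [(12, 45), (100, 175), (298, 331)]
print(round(percent_agreement(coder_a, coder_b), 2))
```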

We used the Social Network Analysis and Visualization software SocNetV v2.4 (Dimitris V. Kalamaras©, 2005–2018) to quantify the number of transitions (or connections) between body areas and supporting surface locations touched (nodes), to determine the centrality node (the body area from which most touches left), and to measure the network density, which captures the proportion of potential connections in a network that correspond to actual connections (see **Figure 3**). As the number of connections across nodes increases, so does the network density. For this analysis, each area covered by complex touches was represented on the network map.
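
As an illustration of the density measure, here is a minimal Python sketch that derives directed transitions from a sequence of touched areas and computes density as observed connections over possible connections. It does not reproduce SocNetV's implementation: treating the network as directed, ignoring repeated touches to the same area, and the area labels are all assumptions of the sketch.

```python
from collections import Counter


def transitions(sequence):
    """Count directed transitions between consecutively touched areas.

    Consecutive touches to the same area are ignored (assumption).
    """
    return Counter(
        (src, dst) for src, dst in zip(sequence, sequence[1:]) if src != dst
    )


def network_density(sequence):
    """Distinct observed connections divided by possible directed connections."""
    n = len(set(sequence))
    if n < 2:
        return 0.0
    return len(transitions(sequence)) / (n * (n - 1))


# Made-up sequence of areas touched by one hand in one session.
seq = ["floor_X", "head", "torso", "head", "floor_X", "torso"]
print(transitions(seq))
print(network_density(seq))
```

With three nodes there are six possible directed connections; the example sequence uses five of them, so the density is 5/6.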

Our data met normality distribution assumptions. However, given the few missing weeks and the fact that not all infants were followed for the same duration, we used Linear Mixed Model (LMM) ANOVAs with a Bonferroni adjustment for multiple contrasts to analyze the trends in the data. All 4 infants provided data up to week 9 of age, one infant provided data up to week 12, and one infant was followed until week 13 of age. Weeks 10 and 13 ended up being excluded from our analyses because those weeks only had data for one infant. However, for the purpose of visualizing the data, these weeks are represented in our graphs. The symbols and lines corresponding to those specific weeks excluded from the statistical analyses appear in gray on our graphs.

### RESULTS

### Durations of Hands on Body, Floor, or in the Air

Our first analysis was to assess where infants kept their hands the longest: in contact with their body, in contact with the supporting surface (floor), or in the air while transitioning from place to place. **Figure 4** shows the average percent of time all four infants spent in each of these broad locations by week. A Condition (2) × Hand (2) × Location (3) × Week (9) Linear Mixed-Model ANOVA revealed a main effect of location [F(2,228) = 45.249, p < 0.0001], but no main effects of condition, hand, or week. Infants spent significantly more time with their hands in the air, moving them from one location to another, than either touching their bodies or the floor. This was true during both the baseline and toys-in-view conditions, and occurred similarly with either hand. An interaction between weeks and touched areas was also significant [F(16,228) = 2.77, p = 0.001]. Pairwise comparisons indicated that in the earlier and later weeks, infants spent relatively more time with their hands in the air compared to touching their body or the floor; however, in the middle period those differences were much smaller (p < 0.0001). Of the touches that infants made to their body, 45.11% were on bare skin locations, while the remaining 54.89% were on parts of the body covered with clothing.
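
A percent-of-time measure of this kind can be sketched as follows in Python. The sketch assumes that touches of one hand do not overlap, that any time not coded as body or floor contact counts as time in the air, and that there are no "unknown" intervals; the function name and sample values are hypothetical.

```python
def percent_time_by_location(touches, session_ms):
    """Percent of session time one hand spent on the body, floor, or in the air.

    `touches` is a list of (onset_ms, offset_ms, location) tuples, with
    location being "body" or "floor". Time not covered by a touch is
    attributed to "air" (assumption of this sketch).
    """
    totals = {"body": 0, "floor": 0}
    for onset, offset, location in touches:
        totals[location] += offset - onset
    totals["air"] = session_ms - totals["body"] - totals["floor"]
    return {loc: 100.0 * ms / session_ms for loc, ms in totals.items()}


# Made-up values for a 5-min (300,000 ms) session.
session = [
    (0, 60_000, "body"),
    (100_000, 190_000, "floor"),
]
print(percent_time_by_location(session, 300_000))
```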

### Network Density, Number of Nodes Touched, and Point of Centrality

In order to understand the complexity of how the infants distributed touches to their body and the floor, we created a network map to analyze the areas contacted by each hand, their densities and transitions. **Figure 3** provides an example of a network of touches that were exhibited independently by the right (**Figure 3A**) and left hand (**Figure 3C**) in one infant during the same week and condition (infant DJ, week 6, baseline condition). **Figure 3B** represents a map of frequency of touches as distributed across the 20 areas of the body and three areas of the floor. On this frequency map, each dot represents a touch to an area and the color indicates if the touch occurred with the left hand (red) or right hand (blue). Transitions between these touched areas were obtained from the temporal sequence of touches coded through Datavyu and subsequently entered in the SocNetV program to create a network map. In **Figures 3A,C**, each dot (on the body) or triangle (on the floor) is a "node" and represents an area where contacts occurred. The size and color of the nodes reflect how often those areas were touched: larger and "warmer" colored nodes reflect more touches to those areas. The arrows, and their directionality and thickness, represent the transitions from one node to the next. These are called "transitional arcs." Thicker lines correspond to more frequent transitions between nodes. Measures of network density and transitional arcs by week and by hand were obtained by dividing the number of observed connections in the network by the total number of possible connections between nodes. Network densities and transitional arcs express similar trends using different scales.

**Figure 5** shows the averaged network density and transitional arcs (per minute) across all four infants by week and by hand. The Linear Mixed Model [condition (2) × hand (2) × week (9)] revealed no significant main effects of condition [F(1,76) = 0.265, p = 0.608] or hand [F(1,76) = 0.957, p = 0.331] on density (and transitional arcs); however, it revealed a main effect of week [F(8,76) = 2.627, p = 0.014]. **Figure 5** shows that density (or transitional arcs) declined as weeks passed, indicating that infants' range of transitions across nodes lessened. On average, infants' transitions between nodes declined from 27.25 transitions per minute (SD = 9.016, range = 13–42) on week 3 to an average of 19 transitions per minute (SD = 7.89, range = 10–35) by week 12. This decline in transition number, however, did not significantly affect the number of nodes visited in the network over time [F(8,38) = 1.521, p = 0.183]. **Figure 6** shows that infants transitioned on average across 9 nodes (or body/floor areas) by hand (M = 9.05, SD = 3.16, range = 6.75–11.17) in both conditions from week 3 until they acquired head control. Statistical analyses on this measure reported no significant effects of condition or hand. Thus, the number of nodes visited over time did not change, but the routes that each hand took to transition to those nodes did.

Another measure that can capture variations in the network is the point of centrality. The point of centrality corresponds to the point on the network map with the greatest frequency of movements arriving and departing (that is, the "warmest" and largest node in the network). For example, for week 6 of infant DJ, displayed in **Figure 3**, the point of centrality is the upper torso node for both hands. The points of centrality for each infant, by week, hand, and condition, are reported in **Table 1**. This table shows that the floor was the most frequent point of centrality for all infants on most weeks, followed by the torso, then the head, and the arm on some weeks for some infants. Specifically, the floor was the point of centrality 21 times (66.67%) for MA, 19 times (83.33%) for KP, 15 times (40.56%) for LN, and 14 times (56%) for DJ when we combine both hands and conditions. The torso was the point of centrality 15 times (40.54%) for LN, 9 times (36%) for DJ, 4 times (16.67%) for KP, and 5 times (14.7%) for MA. MA was the only infant with the head as the second most frequent point of centrality (N = 7, 19.44%), compared to LN, DJ, and KP, who had the head as point of centrality only 5 (13.31%), 1 (4%), and 1 (4.16%) times, respectively. The arms were the point of centrality only twice for LN (5.41%), once each for DJ and MA (4% and 2.78%, respectively), and never for KP. An exploratory chi-square test performed on these frequencies by week, condition, and hand revealed no effects.
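As a hedged illustration of this definition, the sketch below picks the area with the largest combined number of arrivals and departures in a sequence of contacts. The sequence and area names are hypothetical, and how ties were resolved in the study is not stated; here the first area to reach the maximum is returned.

```python
from collections import Counter


def point_of_centrality(sequence):
    """Return the area with the most arrivals plus departures, or None."""
    traffic = Counter()
    for src, dst in zip(sequence, sequence[1:]):
        if src != dst:
            traffic[src] += 1  # one departure from src
            traffic[dst] += 1  # one arrival at dst
    return traffic.most_common(1)[0][0] if traffic else None


# Hypothetical contact sequence for one hand in one session.
contacts = ["floor", "upper_torso", "head", "upper_torso", "floor", "upper_torso"]
central = point_of_centrality(contacts)  # upper_torso: 5 arrivals/departures
```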

### Complex Touches

We mentioned earlier that sometimes touches were not limited to one single body area. Fairly frequently, infants moved their hand while remaining in contact with their body, thus crossing more than one of our defined body/floor areas. We called these touches "complex." We were curious to know how often these complex touches occurred in the 2-month period examined, as these touches may express a deeper and more extensive haptic exploration of the body and surrounding space.

TABLE 1 | Point of centrality by infant, hand, week, and condition.

**Figure 7** reveals that overall, the complex touches that contacted two or more body/floor areas represented on average 32.47% (SD = 18.01, range = 19.83–42.81%) of all touches. Touches to three or more areas represented 14.8% of all touches. For the two or more touched areas, a Condition (2) × Hand (2) × Week (9) Linear Mixed Model revealed a main effect of week [F(8,76) = 2.246, p = 0.033]. For the three or more touched areas, the Linear Mixed Model main effect of week remained [F(8,76) = 2.549, p = 0.016]. No other significant effects were found. The developmental trend observed was a declining one. **Figure 7** shows that complex touches represented 36.6% of the touches on week 3, while they declined to 29.36% on week 12. The high percentage point observed on week 9 was due to one infant (DJ) who performed an unusually high number of complex touches on that particular week. Analyses comparing the duration of those complex touches with those of simple touches (those limited to only one specific area) revealed no differences. In other words, touches to one area were as long as touches to several areas.
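The proportion of complex touches can be sketched as follows. The session data are hypothetical, and the sketch assumes that each coded touch carries the list of areas the hand crossed while it remained in contact; the real coding was done in Datavyu and is not reproduced here.

```python
def complex_touch_share(touches, min_areas=2):
    """Percentage of touches that crossed at least `min_areas` distinct areas."""
    if not touches:
        return 0.0
    n_complex = sum(1 for areas in touches if len(set(areas)) >= min_areas)
    return 100.0 * n_complex / len(touches)


# Hypothetical session: each touch lists the areas crossed while in contact.
session = [
    ["head"],
    ["upper_torso", "lower_torso"],
    ["floor"],
    ["head", "upper_torso", "arm"],
    ["floor"],
]
two_plus = complex_touch_share(session, min_areas=2)    # 2 of 5 touches
three_plus = complex_touch_share(session, min_areas=3)  # 1 of 5 touches
```

Each complex touch counts as one touch in the denominator while contributing several contacted areas to the network map, which is the distinction the Discussion relies on.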

### Frequency and Duration of Touches Between Body and Floor

How many touches altogether did infants perform in the 5 min observations? **Table 2** reports the average number of individual contacts to the body and floor per week (collapsed across areas and hands) for all 4 infants and between conditions. This table shows that regardless of condition, on any week, infants maintained an overall high level of touches [baseline grand average = 113.17, SD = 48.90, range = 85–162.5, M rate (per minute) = 23.40, SD = 10.2, range = 17–35.71; toys-in-view grand average = 105.23, SD = 45.27, range = 66.75–158, M rate (per minute) = 21.72, SD = 9.13, range = 15.3–29.1].

Note to Table 1: U, upper; L, lower; italic indicates contralateral point of centrality.

When we distinguished touches between those performed on the body and those performed on the floor, we found that touches were more frequently directed to the body (rate per minute: M = 13.35, SD = 8.11, range = 8.7–18.7) than to the floor (rate per minute: M = 7.61, SD = 4.31, range = 5.13–11.85; see **Figure 8**, left graph). The Linear Mixed Model [condition (2) × location (2) × week (9)] revealed that location yielded a significant main effect [F(1,76) = 18.704, p < 0.0001]. No other main effect or interaction was significant.

When we examined the average duration of each touch (not the frequency of touches), we found the opposite trend. Touches to the floor, although relatively less frequent than touches to the body, were on average of longer duration (milliseconds: M = 4777.95, SD = 4910.736, range = 1731.67–8432.15) than touches to the body (milliseconds: M = 2880.27, SD = 2587.96, range = 1409.06–4079.65). Again, a Linear Mixed Model [condition (2) × location (2) × week (9)] revealed that the durations of touches between body and floor areas were significantly different [F(1,78) = 4.549, p = 0.036; see **Figure 8**, right graph]. No other main effect or interaction was significant. Thus, while touches to the body were more frequent, they were of lesser duration.
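The two descriptive measures contrasted here, rate per minute and mean duration per touch, can be derived from the same list of coded touches. The sketch below uses hypothetical touch records and a hypothetical `summarize` helper; it only illustrates how a location can have a higher rate yet a shorter mean duration, and does not reproduce the Linear Mixed Model.

```python
from statistics import mean


def summarize(touches, session_minutes):
    """Return {location: (touches per minute, mean duration in ms)}."""
    by_location = {}
    for location, duration_ms in touches:
        by_location.setdefault(location, []).append(duration_ms)
    return {
        loc: (len(durs) / session_minutes, mean(durs))
        for loc, durs in by_location.items()
    }


# Hypothetical touches as (location, duration in ms) from a 5-min session.
touches = [("body", 1500), ("body", 2500), ("floor", 6000), ("body", 2000)]
summary = summarize(touches, session_minutes=5)
body_rate, body_duration = summary["body"]     # more frequent, shorter
floor_rate, floor_duration = summary["floor"]  # less frequent, longer
```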

### Frequency and Duration of Touches Between Left and Right Hand

When we computed the number and duration of all touches (body and floor combined) by hand and across infants, the Linear Mixed Model analyses [Condition (2) × Hand (2) × Week (9)] revealed no lateral differences by week or condition. There were no significant differences in the rates of touches performed between the left hand (M = 12.03, SD = 5.90, range = 9.03–16.65) and the right hand [M = 10.56, SD = 4.52, range = 7.42–15.40; F(1,76) = 1.916, p = 0.17]. Likewise, the durations of touches between the left (milliseconds: M = 3673.22, SD = 3852.13, range = 1483.97–6001.88) and the right hand (milliseconds: M = 3384.06, SD = 2923.75, range = 1167.02–5131.24) were not different [F(1,76) = 0.185, p = 0.669]. However, when we collapsed the rate of touch across hands, the Linear Mixed Model revealed a main effect of week [F(8,76) = 3.35, p = 0.002]. **Figure 9** shows that the overall rate of touch, whether to the body or the floor, increased over time.

### Frequency and Duration of Self-Touches to the Body: Head vs. Torso

Our last analysis focused on the body alone and compared self-touch between the head and the torso. The Linear Mixed Model [condition (2) × body area (2) × week (9)] performed on the rates of touch to those areas revealed that overall infants contacted their torso at a significantly higher rate (N per minute: M = 6.33, SD = 6.1, range = 2.8–9.48) than their head [N per minute: M = 4.10, SD = 4.21, range = 0.63–8.00; F(1,76) = 6.071, p = 0.015]. This body area main effect was accompanied by a significant main effect of week [F(8,76) = 2.62, p = 0.014], but no effect of condition. A significant area by week interaction further indicated that the rate of touches directed to the torso increased significantly in the last weeks of the study compared to those directed to the head [F(8,76) = 4.11, p < 0.001; see **Figure 10**]. This effect was driven primarily by the two infants who were followed beyond the age of 9 weeks. Until week 9, infants' rates of touch to the head and to the torso did not differ.

In relation to the duration of touches directed at the torso versus the head, the Linear Mixed Model [condition (2) × body area (2) × week (9)] revealed no major main effects or interactions. Touches directed to the torso lasted on average 2165.94 milliseconds (SD = 3427.51, range = 1064.21–8532.34) and those directed to the head lasted on average 2755.60 milliseconds (SD = 3795.51, range = 1028.89–5370.25).

Finally, we examined how many touches to the head resulted in contacts to the mouth. The Linear Mixed Model [condition (2) × hand (2) × week (8)] performed on the proportion of touches to the mouth out of total touches to the head revealed no major effects nor interactions. In this analysis, week 11 was excluded because one of the two infants did not contact the head at all. Of the touches that occurred to the head, only 17.16% of these touches resulted in a contact to the mouth (SD = 0.25, range = 0.10–0.42).

### DISCUSSION

The aim of this descriptive study was to examine the spontaneous touch activity of a few infants every week, over two 5-min time windows, from the age of 3 weeks up to the time they acquired head control – a developmental period relatively understudied. Our results revealed that from 3 weeks of age, infants actively contacted their body and the supporting surface, and they continued to do so until our observations ended. The numbers we report are in fact quite stunning. During our observations, on most weeks, infants produced nearly 200 contacts on their bodies and the supporting surface in a cumulated 10-min time period. They also spent about 50% of that time moving their arms in the air, going from one place of contact to another. This time with the hands away from any contact was significantly more than the time used to contact either the body or the supporting surface. If we multiply these numbers by the number of hours and days infants spend in a crib or playpen over a 2-month period, it becomes clear that from very early on infants receive a great deal of haptic and proprioceptive experience through their own self-generated activity.

As mentioned in the introduction, such activities are fundamental for developing an early sense of the body and for discovering the boundaries of the peripersonal space in which future developing goal-directed actions will take place. The active touches we observed were not only expressed by the high number of contacts performed, they were also indexed by the many areas that were being contacted on a weekly basis. Infants contacted roughly 8–10 (out of 20) different areas on each side of their body with each hand on most weeks. The number of body areas contacted doubles if we combine the touched areas from both hands. The untouched areas were the 2 nodes on the bottom of the legs and the bottom floor areas, the only places that infants of those ages could not reach. In other words, combining both hands, infants contacted all the possible body areas that were within their hand reach. Each hand mostly contacted body and floor areas that were ipsilateral to the hand making the contact, although contralateral touches occurred occasionally.

Our skewed coding scheme, which divided the floor into only three broad areas, compared to the body that was divided into a more detailed map, may give the false impression that touches to the body were more numerous than those to the floor. But the analysis comparing the overall rate of touches between floor and body independently of the area divisions confirmed that the body was touched at a higher rate than the floor. Interestingly, however, the durations of the contacts on the floor were longer than those performed on the body. The floor was also a frequent point of centrality for all four infants, followed by the torso as the second point, indicating that contacts to the body were frequently interspersed with contacts to the floor. Thus, infants explored their body with frequent touches, they explored their bodies widely by touching many body areas (mainly their head and torso, which were areas within arm's length), and they explored their body in relation to the supporting surface.

The meaning of more frequent touches to the body compared to longer touches to the floor is hard to discern given the descriptive nature of our study. But one can speculate on the range of explanations that could account for such findings. More frequent but shorter touches to the body could be more self-stimulating or self-soothing. These body-oriented touches indeed provide redundant information between the hand and body part that are being simultaneously contacted. Self-stimulation could have a significant value for young babies who initially have poor visual acuity and whose sensorimotor experiences are mainly centered around their body. Touches to the floor, on the other hand, may be more novel. These are contacts with a foreign surface never experienced before birth, whereas self-contacts with the body have clearly been experienced extensively before birth, particularly during the last months of gestation when space in the womb is tighter. Novelty with floor contacts may also entail novel arm and body postures, causing at times the stretching of muscles, compared to the more familiar limb flexions. Thus, stretching may cause new body sensations that were seldom experienced in the last months of gestation. Less frequent but longer touches to the floor may also express more relaxed states, periods of rest in between periods of active body exploration. Clearly more studies will be needed to better understand the nature of these differences in touch between surface and body.

We observed few developmental changes in the above measures, indicating that the frequent touches and active motion of the arms from place to place were an ongoing constant in those infants during their first 2 months of life. Thomas et al. (2015), who followed infants from birth to 24 weeks of age, also did not report many developmental changes during the early period. Most changes in self-touch that they observed seemed to start occurring between 12 and 14 weeks of age. In our study, however, where we documented every single touch over 5-min recording periods that were much longer than those of Thomas et al. (2015), we found an increase in total number of touches from 3 to 12 weeks of age. During that time period, the network density decreased. In fact, this increase in the number of touches and decrease in network density over time appeared to be related to a particular type of touch that we categorized as complex. We defined complex touches as those touches that transitioned over more than one body area while the hand remained in contact with the body/floor. We found that infants produced proportionately more of those complex touches in the early weeks than the later ones, which accounted for the higher network densities and lower touch count observed in the early weeks. Indeed, according to our coding scheme, complex touches counted for one touch when they occurred, but they translated into more than one area of contact when we tracked the spatial areas where contact occurred. The fact that infants produced proportionately more of those complex touches during the earlier weeks of life could possibly reflect a different kind of body exploration where the arm movements and haptic feedback being received simultaneously offer redundancy or an enhanced sensory experience that could well contribute to initially defining the body, its different parts, and their position in relation to the supporting surface. It is also possible that the greater flexor activity of young infants in their first weeks of life is at the origin of this greater number of complex touches observed early in life (Gesell, 1946). As infants progressively learn to extend their arms away from their body, complex touches, in turn, are expected to decline in number.

In line with Thomas et al. (2015), we saw a developmental change in the rate of touches to the torso and the head areas. Initially, the infants touched the torso and head equally frequently, but the two infants who were followed beyond 9 weeks of age displayed a significant change in their distribution of touches between head and torso at weeks 11 and 12. Touches to the torso increased while touches to the head decreased. This transition coincides with the observations of Thomas et al. (2015). At around that same age range, these researchers noticed that touch became more caudal and was directed more to the lower body areas. Clearly, more observations with more babies during this age period are needed to further substantiate this transition. For now, we can only speculate as to what may have caused this transition. One possible explanation could be linked to the significant changes in the visual system and visual attention that occur during this age period (Colombo, 2001; see also Corbetta et al., 2018, for a review). As infants direct more visual attention to the surrounding world, they may direct their hands or relax their limbs more frequently along their torsos.

Infants touched body areas covered by clothing nearly as much as bare skin areas; thus, wearing clothing did not appear to influence self-touch activity to bare skin areas. Further, the most represented point of centrality on the body was the torso, an area covered by clothing. We were also surprised by the low rate of touches to the mouth. Given the literature on hand-to-mouth behavior (Butterworth and Hopkins, 1988; Rochat et al., 1988; Rochat, 1993) and the recent demonstration that the mouth in 60-day-old infants is indeed a very sensitive area of the face (Meltzoff et al., 2018b), we expected touches to the mouth to be much higher. One possible explanation for our result is that studies on hand-to-mouth focused specifically on that particular behavior, while our observations documented all touches to all reachable body areas. When considering the rate of mouth touches within the realm of all touches performed to the head, we found that touches to the mouth were not as frequent as one would expect. This finding, however, should be put to further scrutiny.

We found no discernible effects of condition or laterality. Infants moved their arms and touched their bodies and supporting surface as much when toys were in view as when not in view. As a group, they also displayed no evidence of lateral differences between hands in self-touch, surface contact, or time spent with the arms moving from place to place. By 60 days old, infants already show hemispheric somatosensory responses from haptic stimulation to the contralateral hand, indicating that body lateralization is already somehow represented in their brain (Meltzoff et al., 2018b). But spontaneous movements of the arms are different from receiving a local stimulation on the skin surface of the hand, and it is possible that despite contralateral somatosensory representation of the hand, infants have not yet established a selective motor dominance for hand use by 2 months of age. Studies that have examined lateral differences in hand movements during the pre-reaching period have reported no preference in arm activity, whether activity differences were assessed by movement count or kinematic recordings (Lynch et al., 2008; Jacobsohn et al., 2014). Further, even though infants in our study displayed head turn preferences, especially during the first weeks, these head turns, as a group, did not seem to have affected touches differentially between hands. However, it is possible that individual head turns may have had an impact. This is something we are planning to examine in future analyses.

The fact that toys in view did not affect touch patterns, their rate, duration, or location during the 2-month period was more surprising. Our objects were brightly colored and stood out from the uniform white background. But it is possible that during this very early period, when vision is poor and arm activities are mainly centered around the body, colored objects in the visual field may not be so relevant. Furthermore, our objects were static and as a result may have failed to capture the attention of the infants at an age range where object motion is important to trigger a behavioral response (von Hofsten, 1982).

The present study provided detailed information on the touch activity of infants while in supine during the first 2 months of life – a period that has not previously been extensively studied. It is assumed throughout the manuscript that these early touch activities directed to the body and the supporting surface play an important role in providing a sense of the body and an emerging sense of the self that are essential for the development of future interactions with the physical and social world. We found that from the age of 3 weeks, infants engaged in extensive touch activities of their bodies and the supporting surface on which they lay and continued to do so until they attained head control. This study was limited to intensive observations of 4 typically developing infants followed weekly. Future studies could expand on these initial observations by documenting this activity in non-typically developing populations and over larger samples of infants to assess the impact of these early embodied experiences on the formation of future goal-directed behaviors. Future studies could also track infants over a more extended developmental period, more postures, and varied conditions to obtain a comprehensive depiction of how touch experiences may contribute to infant development. Also, in this study, we shifted infants to a different paradigm as soon as they acquired head control to capture the emergence of reaching, but it remains to be seen if the intensity of their touch activities would have changed more readily after this 2-month transitional period where vision, head control, and attention all show important changes. We encourage researchers to examine in more depth the behaviors of infants in the first months of life, as they are foundational to future sensory, motor, and cognitive development.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Boards of the University of Tennessee and Medical Center in Knoxville, TN, United States. All parents of the infants gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Boards of the University of Tennessee and Medical Center in Knoxville, TN, United States.

### AUTHOR CONTRIBUTIONS

DC, lead principal investigator of this project, designed and oversaw every aspect of the study, including data collection and analysis, and was majorly involved in the writing of the manuscript. AD, JC, and MC collectively contributed to data collection, provided major contributions in refining the coding scheme for the touches, coded the touches from the videos in Datavyu, helped with many aspects of data processing, and provided first drafts of sections of the manuscript.

### FUNDING

This project was funded by a seed grant from the Neuroscience Network of Eastern Tennessee (NeuroNET) at the University of Tennessee, Knoxville.

### ACKNOWLEDGMENTS

We thank the parents and their infants for their amazing commitment to this study. We could not have performed this study without their strong support. Dr. Craig Towers, M.D., and Beth W. Weitz, R.N., at the University of Tennessee Medical Center helped in recruiting the participants. We thank Ashley Forster and Elizabeth Steward for their hard work and dedication in drafting an initial coding scheme for analyzing infants' touches. We also thank Drs. Rebecca Wiener, Sabrina Thurman, and many undergraduate students for their assistance with data collection. Dr. Marianne Jover from the University of Aix-Marseille, France, provided valuable feedback on an earlier draft of this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 DiMercurio, Connell, Clark and Corbetta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Interpersonal Influences on Body Representations in the Infant Brain

Ashley R. Drew<sup>1</sup> \*, Andrew N. Meltzoff<sup>1</sup> and Peter J. Marshall<sup>2</sup>

<sup>1</sup> Institute for Learning and Brain Sciences, University of Washington, Seattle, WA, United States, <sup>2</sup> Department of Psychology, Temple University, Philadelphia, PA, United States

Within cognitive neuroscience, there is burgeoning interest in how the body is represented in the adult brain. However, there are large gaps in the understanding of neural body representations from a developmental perspective. Of particular interest are the interconnections between somatosensation and vision, specifically infants' abilities to register correspondences between their own bodies and the bodies of others. Such registration may play an important role in social learning and in engendering feelings of connectedness with others. In the current study, we further explored the interpersonal aspects of neural body representations by examining whether responses to tactile stimulation in 7-month-old infants are influenced by viewing another's body. During EEG recording, infants (N = 60) observed a live presentation of an experimenter's hand or foot being touched. During the presentation of touch to the adult's hand or foot, the infant received a brief tactile touch to their right hand or right foot. This resulted in four conditions: (i) receive hand stimulation/observe hand stimulation, (ii) receive hand stimulation/observe foot stimulation, (iii) receive foot stimulation/observe hand stimulation, and (iv) receive foot stimulation/observe foot stimulation. Analyses compared responses overlying hand and foot regions when the observed limb matched the stimulated limb (congruent) and did not match (incongruent). In line with prior work, tactile stimulation elicited a somatotopic pattern of results in the somatosensory evoked potential (SEP) and the sensorimotor mu rhythm (6–9 Hz). Cross-modal influences were observed in the beta rhythm (11–13 Hz) response and in the late potential of the SEP response (400–600 ms). Beta desynchronization was greater for congruent compared to incongruent conditions. Additionally, tactile stimulation to the foot elicited larger mean amplitudes for congruent compared to incongruent conditions. 
The opposite was true for stimulation to the hand. This set of novel findings suggests the importance of considering cross-modal effects in the study of neural body representations in the infant brain. Continued work in this new area of infant neuroscience research can inform how interpersonal aspects of body representations may serve to undergird early social learning.

#### Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Ric Dalla Volta, Università degli Studi Magna Græcia di Catanzaro, Italy

Daniela Corbetta, University of Tennessee, Knoxville, United States

\*Correspondence: Ashley R. Drew, ashdrew@uw.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 31 August 2018; Accepted: 04 December 2018; Published: 21 December 2018

#### Citation:

Drew AR, Meltzoff AN and Marshall PJ (2018) Interpersonal Influences on Body Representations in the Infant Brain. Front. Psychol. 9:2601. doi: 10.3389/fpsyg.2018.02601

Keywords: infants, EEG, somatosensory, touch, self, social perception, attention, interpersonal engagement

## INTRODUCTION

The term "body representations" can refer to several different kinds of body-related constructs. One prominent approach to studying body representations has been to examine the neural mechanisms involved in the organization and maintenance of somatosensory processing in the brain. Most commonly, this pertains to the somatotopic representation of the body surface in primary somatosensory cortex, sometimes called a "body map." A burgeoning aspect of this neuroscientific literature concerns the question of whether representations of one's own body are connected with the representations of the body of others. In studies of human adults, it has been well documented that motor and sensory cortices allowing the control of movement and the registration of touch are also activated while observing others moving or being touched (Keysers et al., 2004; Rizzolatti and Craighero, 2004; Singer et al., 2004) and efforts have been made at modeling this (Pitti et al., 2013). This vicarious aspect of sensorimotor processing may draw on interconnections between vision and somatosensation, the study of which could provide insights into the origins and maintenance of interpersonal connectivity in early childhood (Marshall and Meltzoff, 2015; Meltzoff and Marshall, 2018).

Identifying self-other correspondences plays a role in social perception across the lifespan, and may be especially important for infants prior to language (Meltzoff, 2007). According to the "Like-Me" hypothesis (Meltzoff, 2007), the development of social cognition in infancy is grounded in the process of observing that others are similar to me at the level of bodily acts (Meltzoff, 2013). Such bodily connections between self and other may provide a foundation on which interpersonal relationships are built (Marshall and Meltzoff, 2014, 2015). One avenue for studying these aspects of body representations is the examination of how vision of others' bodies influences the processing of tactile stimulation to one's own body. The work presented here examines how brain responses to tactile stimulation of infants' hands and feet are influenced by the vision of another person's hand or foot being touched. At the highest level, the present work examines multimodal representation of the body in the infant brain.

In the current study, we use the infant electroencephalogram (EEG) to investigate body representations. One advantage of employing EEG in the study of infant body representations is the temporally fine-grained way it allows for the examination of the processing of somatosensory stimulation. In turn, this temporal precision provides a window into different stages of somatosensory processing. In the current work, the influence of vision of others' hands and feet being touched was tested for two aspects of infant EEG responses to tactile stimulation: (i) sensorimotor EEG oscillations, specifically the infant mu (6–9 Hz) and low beta (11–13 Hz) rhythms, and (ii) somatosensory evoked potentials (SEPs) elicited by touch. Both kinds of responses were examined at electrode sites overlying cortical sensorimotor regions, specifically central electrode sites. The three central electrode sites of interest in the current study were electrodes Cz (medial central) and C3 and C4 (left and right lateral central).

Previous work with infants has established that, in line with the somatotopic organization of somatosensory cortex found in adults, tactile stimulation of the right hand elicits the largest response in the infant EEG signal over the contralateral (left) electrode C3, stimulation of the left hand elicits a response over the contralateral (right) electrode C4, and tactile stimulation of the foot elicits a response at the midline central electrode (Cz) (Saby et al., 2015; Meltzoff et al., 2019). Further insights about somatotopy come from an EEG study of 6- to 7-month-old infants showing that the amplitude of the somatosensory mismatch negativity in infants is sensitive to the somatotopic arrangement of the body in primary somatosensory cortex (Shen et al., 2018).

The mu rhythm has frequently been employed to examine neural linkages between the execution of actions and the observation of similar actions (Fox et al., 2016). The infant mu rhythm occurs at a lower frequency range (6–9 Hz) than in adults (8–13 Hz) (Stroganova et al., 1999; Marshall et al., 2002). The beta rhythm (13–30 Hz in adults, lower in infants as noted below) also demonstrates consistent responses related to sensorimotor activity (for a review see Kilavik et al., 2013). In adults, beta power decreases during movement, tactile stimulation, and observation of actions, followed by a characteristic increase in beta power 300–1000 ms after stimuli completion that is known as the beta "rebound" (Gaetz and Cheyne, 2006; Caetano et al., 2007; Kilavik et al., 2013).

In the current study, we examined both mu and beta rhythms in response to tactile stimulation of infant body parts. While the boundaries of the infant beta rhythm have not yet been clearly established, visual inspection of our time-frequency plots showed time-locked activity in a low beta band (11–13 Hz). This aligns with expectations of rhythms occurring at lower frequency ranges during infancy compared to adulthood, although there is variability in approaches to delineating infant beta. Early and late windows of oscillatory responses were analyzed to account for power rebounds that are regularly observed in adults.

There is an established body of literature on EEG and MEG evoked responses to tactile stimulation in infants (Gondo et al., 2001; Nevalainen et al., 2008; Saby et al., 2015; Meltzoff et al., 2018, 2019). EEG studies reporting on the SEP response to tactile stimulation have found a large positivity occurring between 100 and 300 ms post-stimulus. For example, in a study of 7-month-old infants, Saby et al. (2015) observed a peak in the SEP at around 175 ms post-stimulus onset. In line with a somatotopic response pattern, the largest mean amplitudes of the early positivity to foot stimulation were found at electrode Cz, which overlies the foot region of sensorimotor cortex. Following hand stimulation, the largest responses were found over more lateral hand regions, with the response strongest at the site contralateral to tactile stimulation (C3 for right hand stimulation and C4 for left hand stimulation). A similar somatotopic pattern has been found in an EEG study of infants as young as 60 days of age (Meltzoff et al., 2019).

A series of recent studies has gone beyond unimodal tactile perception alone and provided evidence of a mapping between infants' representations of their own body and the bodies of others by examining the effect of body-specific visual stimuli on sensorimotor EEG responses. In a live observation protocol, 14-month-old infants observed actions of an adult reaching toward and touching a toy using her hand or her foot (Saby et al., 2013). The infant mu rhythm response displayed a somatotopic pattern during the observation of the hand and foot actions, with greater mu desynchronization occurring over sensorimotor areas corresponding to the observed body part (i.e., a lateralized event-related desynchronization (ERD) response for hands and a medial response for feet). In a converging study using older infants, 12-month-old infants viewed videos of a human hand being touched or not touched (i.e., no contact was made) by an object (Müller et al., 2017). The extent of desynchronization of the infant mu rhythm over central-parietal sites was significantly greater when the human hand was touched. In a detailed MEG study of 7-month-old infants using source analysis, regions of cortex that were activated when the infant received a touch to the hand or foot were also found to be activated when watching a video of another person's hand or foot being touched (Meltzoff et al., 2018). Taken together, these findings provide evidence for connections between the representation of the infant's own body and the bodies of others.

We believe that a promising path toward enriching our understanding of infant body representations is to develop new paradigms for examining the multisensory integration of bodily information in young infants (e.g., Meltzoff and Marshall, 2018; Somogyi et al., 2018). Of particular interest are the temporal interactions between vision and somatosensation.

Adult studies have shown cross-modal effects in which SEP responses to tactile stimulation are modulated while viewing the same part of one's own body (Taylor-Clarke et al., 2002; Sambo et al., 2009; Cardini et al., 2012) and (to a lesser extent) while viewing the relevant part of another person's body (Deschrijver et al., 2015; Adler et al., 2016). A similar body-specific visual modulation of neural responses to touch was demonstrated in a study of 3–4-year-old children using MEG (Remijn et al., 2014). To date, only one study has examined neural responses to simultaneous visual and tactile stimuli during infancy (Rigato et al., 2017). In this EEG study, 4-month-old infants viewed videos of a paintbrush touching an experimenter's hand or the table surface next to the hand. The visual and tactile stimuli were synchronized such that infants received a vibrotactile pulse to the hand for 200 ms when the paintbrush made contact with the hand or the table. A positive peak in the SEP occurred within the first 200 ms after tactile stimulus onset, with significant differences in the amplitude of this peak occurring between the two conditions.

In the current study, we extended existing work by manipulating the correspondence of limbs in visual-tactile events in order to examine specificities of self-other body mappings in infancy. One novelty of the current study is that it used live visual presentations instead of video recordings, in order to attain greater ecological validity. Using a between-subjects design, 7-month-old infants received tactile stimulation to either their right hand or right foot. These touches occurred while infants observed an experimenter's hand or foot being touched. This resulted in four conditions: (i) receive hand stimulation/observe hand stimulation, (ii) receive hand stimulation/observe foot stimulation, (iii) receive foot stimulation/observe hand stimulation, and (iv) receive foot stimulation/observe foot stimulation. We tested whether there were differences in the sensorimotor EEG rhythms (mu, beta) and SEP responses when the site of tactile stimulation to the infant was congruent with the site of observed stimulation, compared to when these sites were incongruent. Furthermore, we examined infants' looking time to cross-modally congruent vs. incongruent displays.

### MATERIALS AND METHODS

### Participants

Eighty-six infants were recruited from a diverse urban environment using commercially available mailing lists. All participating infants were born within 3 weeks of their due date and had not experienced serious developmental delays or illness. Infants taking long-term medication or who had two left-handed parents were excluded from the study. Twenty-six infants were not included in analyses due to insufficient trials remaining after rejection for movement artifact and/or lack of attention to the visual stimulus. The final participant sample comprised 60 infants (mean age = 6 months, 20 days; SD = 17 days). Within the final sample, 29 infants received stimulation to the right hand (19 females) and 31 infants received stimulation to the right foot (15 females).

### Tactile Stimulation

Tactile stimulation was delivered to the right hand or right foot of infants using an inflatable membrane mounted in a plastic casing (10 mm diameter; MEG International Services, Coquitlam, BC, Canada). A similar device for producing tactile stimulation has been used in prior EEG (Saby et al., 2015) and MEG studies (Pihko and Lauronen, 2004; Pihko et al., 2009; Meltzoff et al., 2018). Via flexible polyurethane tubing (3 m length, 3.2 mm outer diameter), the membrane was inflated by a short burst of compressed air controlled by STIM stimulus presentation software and a pneumatic stimulator unit (both from James Long Company, Caroga Lake, NY, United States).

For the delivery of tactile stimulation, a keypress by an experimenter triggered a solenoid to be opened on the pneumatic stimulator for 10 ms. This elicited an expansion of the membrane beginning 15 ms after trigger onset and peaking around 35 ms after trigger onset. The total duration of the membrane movement was around 100 ms. The 15 ms delay between trigger and membrane movement was corrected for in the timing of the events so that the time of 0 ms was the onset of membrane movement. The experimenter and pneumatic stimulator were located in an adjacent room behind a closed door to minimize audible solenoid operation in the testing room.
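The latency correction described above can be illustrated with a minimal sketch. It assumes event times are stored in milliseconds as recorded at the trigger; the function name `correct_event_times` and the example values are hypothetical and are not taken from the study's software.

```python
# Illustrative sketch only: shift trigger timestamps so that time zero
# corresponds to the onset of membrane movement rather than the trigger.
MEMBRANE_DELAY_MS = 15.0  # trigger-to-movement delay reported in the text


def correct_event_times(trigger_times_ms, delay_ms=MEMBRANE_DELAY_MS):
    """Return membrane-onset times by adding the mechanical delay to each trigger time."""
    return [t + delay_ms for t in trigger_times_ms]


# Hypothetical trigger times (ms from the start of the recording)
triggers = [1000.0, 3000.0, 9500.0]
onsets = correct_event_times(triggers)
print(onsets)
```

Epochs would then be cut around the corrected onsets, so that 0 ms in each epoch marks the start of membrane movement.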

### Procedure

While the infant was seated on the caregiver's lap, the infant's head was measured and the infant was then fitted with an appropriately sized EEG cap. Tactile stimulators were attached at the midpoint of the dorsal surface of the right hand and right foot of the infant. The stimulators were attached using double-sided adhesive electrode collars in combination with medical tape, and then covered with a tubular bandage to hold them firmly in place, following the procedure used by Saby et al. (2015). A between-groups design was used to maximize the number of trials per condition. Infants were randomly assigned to one of two conditions: to receive stimulation to their hand or to receive stimulation to their foot. Infants sat on their caregiver's lap throughout the experimental procedure. The caregiver was given instructions to prevent infants from putting objects in their mouth and to try to minimize extra movements.

### Visual Stimuli

The protocol involved the coordinated work of three experimenters in order to achieve a well-controlled live 3-D display. Sitting behind a curtain, Experimenter 1 began by reaching beyond the curtain to display a spinning toy to attract the infant's attention (∼56 cm away from the infant). Once the infant's attention was obtained, Experimenter 1 retracted the toy and held out either her right hand or her right foot. Experimenter 2 (who was completely out of sight of the infant) touched Experimenter 1's hand or foot with a feather duster (see **Figure 1**) for approximately 3–4 s. While the feather duster was touching the hand or foot, Experimenter 3 (who was sitting in an adjacent room and observing a live video feed) twice triggered the opening of the solenoid, allowing the infant to receive two successive tactile stimulations (∼2 s apart). This process was carried out five times per block, yielding 10 tactile stimulations per block. The blocks alternated between the display of the hand and foot of Experimenter 1 to the infants. The protocol contained a maximum total of 160 tactile stimuli (16 blocks), although the procedure was terminated if the infant could no longer maintain attention to the visual stimuli or became overly fussy.

### Video-Recording of the Test Session

The experimental session was recorded on video for the purpose of coding infant attention and movement. A vertical interval time code (VITC) was placed on the video signal that was aligned with EEG collection at the level of one video frame. For each tactile stimulus, the epoch from 250 ms before to 250 ms after stimulus onset was coded offline for infant attention toward the experimenter's hand or foot and for large movements of the infant. Attention was coded if the infant maintained looking toward the hand or foot for the entirety of the epoch. Epochs were coded as containing large movements if they included gross body movements or large, repetitive movement of a limb (e.g., kicking a leg or batting a hand). Only trials in which the infant was attending to the visual stimulus were included in the final EEG analyses. In addition, trials containing large movements were excluded from the analyses. The video recording was also used to score the amount of time each infant spent looking at either the hand or foot of the experimenter during the experimental session.
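The two video-coding criteria can be combined into a simple trial filter. The sketch below is illustrative only: the `CodedTrial` record and the `usable_trials` function are hypothetical names, and the coded values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CodedTrial:
    """One tactile stimulus with its offline video codes (hypothetical record layout)."""
    trial_id: int
    attended: bool        # looked at the limb for the entire coded epoch
    large_movement: bool  # gross body movement or large, repetitive limb movement


def usable_trials(trials):
    """Keep the IDs of trials that were attended and free of large movements."""
    return [t.trial_id for t in trials if t.attended and not t.large_movement]


# Invented codes for four trials
coded = [
    CodedTrial(1, attended=True, large_movement=False),
    CodedTrial(2, attended=False, large_movement=False),
    CodedTrial(3, attended=True, large_movement=True),
    CodedTrial(4, attended=True, large_movement=False),
]
print(usable_trials(coded))  # [1, 4]
```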

### EEG Collection and Preprocessing

The EEG signal was recorded using a lycra stretch cap (Electro-Cap International) or a mesh stretch cap (ANT Neuro) with 21 electrodes (Fp1, Fp2, F3, F4, Fz, F7, F8, C3, C4, Cz, T7, T8, P3, P4, Pz, P7, P8, O1, O2, M1, M2) placed according to the 10–20 system. Scalp electrode impedances were accepted if they were at or below 35 kΩ. The signal from each electrode was amplified using optically isolated, custom bioamplifiers with high input impedance (>1 GΩ; SA Instrumentation) and digitized using a 16-bit A/D converter (±2.5 V input range). Bioamplifier gain was set at 4000 with hardware filter (12 dB/octave rolloff) settings at 0.1 Hz (high pass) and 100 Hz (low pass). During collection, the signal was referenced to the vertex (Cz) with an AFz ground.

The EEG Analysis System (James Long Company) and the EEGLab toolbox for MATLAB (Delorme and Makeig, 2004) were used for data processing. EEG signals were re-referenced to an average of the left and right mastoids. The signal was then low pass filtered at 30 Hz and segmented into 750 ms epochs for SEP computation and 2000 ms epochs for computation of event-related spectral perturbation (ERSP) in the mu (6–9 Hz) and low beta (11–13 Hz) bands. Epochs were visually inspected and excluded if they contained ocular or muscle artifact. Epochs were also excluded if amplitudes at central sites (C3, Cz, C4) exceeded ±250 µV. Participants with fewer than nine trials within a condition after trial rejection were excluded from further analyses. After trial rejection, a 2 (congruency) × 2 (between-subjects limb stimulated) repeated-measures analysis of variance (ANOVA) was carried out on the number of trials included for the SEP and ERSP analyses. For each, there was a significant main effect of congruency. There was a greater number of congruent trials compared to incongruent trials included in the SEP analyses [F(1,58) = 13.50, p = 0.001; Congruent: M = 18.52; SE = 0.91; Incongruent: M = 15.92; SE = 0.89] and also for the ERSP analyses [F(1,52) = 6.19, p = 0.016; Congruent: M = 18.15; SE = 0.91; Incongruent: M = 16.36; SE = 0.96].
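The amplitude criterion and the minimum-trial rule can be sketched as follows. This is not the authors' pipeline (which used the EEG Analysis System and EEGLab, with visual inspection in addition); the function names and the dictionary-based epoch layout are assumptions made for illustration.

```python
AMPLITUDE_LIMIT_UV = 250.0          # rejection threshold at central sites (from the text)
MIN_TRIALS = 9                      # minimum trials per condition (from the text)
CENTRAL_SITES = ("C3", "Cz", "C4")


def epoch_is_clean(epoch, limit_uv=AMPLITUDE_LIMIT_UV, sites=CENTRAL_SITES):
    """True if no sample at any central site exceeds +/- limit_uv.

    `epoch` is assumed to map a channel name to a list of samples in microvolts.
    """
    return all(abs(v) <= limit_uv for ch in sites for v in epoch[ch])


def keep_condition(epochs, min_trials=MIN_TRIALS):
    """Return the clean epochs, or None if fewer than min_trials survive."""
    clean = [e for e in epochs if epoch_is_clean(e)]
    return clean if len(clean) >= min_trials else None


# Invented example epochs
good = {"C3": [10.0, -20.0], "Cz": [5.0, 0.0], "C4": [-249.0, 30.0]}
bad = {"C3": [10.0, 300.0], "Cz": [0.0, 0.0], "C4": [0.0, 0.0]}
print(epoch_is_clean(good), epoch_is_clean(bad))  # True False
```

A condition that returns `None` here corresponds to a participant who would be dropped for that analysis.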

**Figure 1** | […] by a live person and thus involved dynamic stimuli. A feather duster came into view and touched the experimenter's hand or foot for the duration of two tactile stimulations to the infant's right hand or foot (see text for stimulus-parameter details).

SEPs were computed for each participant relative to a prestimulus baseline of −100 to 0 ms, with time zero corresponding to the onset of membrane expansion at the skin surface. Participants with extreme SEP values (beyond ±40 µV) were not included in analyses. ERSP was calculated for the frequency range of 5–30 Hz using 100 overlapping windows starting with a 4-cycle wavelet at the lowest frequency, relative to a prestimulus baseline of −500 to 0 ms. ERSP values for the mu (6–9 Hz) and beta (11–13 Hz) bands were then extracted. Extreme values (beyond 1.5× the interquartile range) of the mu rhythm and beta rhythm ERSP responses for each condition, window, and electrode were not included in analyses.
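As a rough illustration of two steps in this paragraph, the sketch below subtracts a prestimulus baseline from a waveform and screens a set of values with a 1.5× interquartile-range rule. Both functions are hypothetical. The outlier rule is implemented as Tukey's fences (values more than 1.5× IQR below the first quartile or above the third), which is one common reading of the criterion and is an assumption here, as is the quartile method used by Python's `statistics.quantiles`.

```python
import statistics


def baseline_correct(samples, times_ms, baseline=(-100.0, 0.0)):
    """Subtract the mean of the prestimulus interval [start, end) from every sample."""
    start, end = baseline
    base = [s for s, t in zip(samples, times_ms) if start <= t < end]
    offset = statistics.fmean(base)
    return [s - offset for s in samples]


def tukey_inliers(values, k=1.5):
    """Keep values within k * IQR of the quartiles (assumed reading of the 1.5x IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Invented waveform: two baseline samples followed by two post-stimulus samples
print(baseline_correct([2.0, 2.0, 5.0, 7.0], [-100, -50, 0, 50]))

# Invented ERSP values for one condition, window, and electrode
ersp = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 5.0]
print(tukey_inliers(ersp))
```

With these invented values, the single large value (5.0) falls outside the fences and is dropped, while the remaining seven values are retained.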

### Statistical Analysis Plan

fpsyg-09-02601 December 20, 2018 Time: 18:24 # 5

Analyses were time-locked to the onset of the tactile stimulation. During the window of analysis, the participants received the tactile stimulation while viewing the hand or foot of the experimenter being touched by a feather duster. The EEG analyses focused on a central region of interest (ROI) overlying sensorimotor regions, specifically electrodes Cz, C3, C4 (Saby et al., 2013, 2015). ERSP analyses examined an early (0–500 ms) and late (500–1000 ms) window of the mu and beta responses. SEP analyses examined the early positivity peaking between 100 and 300 ms and a late potential peaking within the window of 400–600 ms after the onset of tactile stimulation. Repeated-measures ANOVAs were carried out for each time window that included factors of limb-visual congruency (congruent/incongruent) × electrode (C3, Cz, C4) with a between-subjects factor of limb stimulated (hand/foot). The Greenhouse-Geisser correction factor was applied as appropriate. A repeated-measures ANOVA including the factors of limb-visual congruency and the between-subjects factor of limb stimulated (hand/foot) was also computed for infant looking time.
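The window-based SEP measures can be illustrated with a short sketch that averages a waveform over the early and late windows. The function `mean_amplitude`, the inclusive window boundaries, and the toy waveform are assumptions for illustration; the resulting per-participant values would then feed the ANOVAs described above.

```python
SEP_WINDOWS_MS = {"early": (100, 300), "late": (400, 600)}  # windows named in the text


def mean_amplitude(samples, times_ms, window):
    """Average the samples whose time stamps fall inside the window (inclusive bounds assumed)."""
    start, end = window
    selected = [s for s, t in zip(samples, times_ms) if start <= t <= end]
    return sum(selected) / len(selected)


# Toy averaged waveform at one electrode, sampled every 100 ms for readability
times = [0, 100, 200, 300, 400, 500, 600]
waveform = [0.0, 2.0, 6.0, 4.0, 1.0, 3.0, -1.0]

measures = {name: mean_amplitude(waveform, times, win) for name, win in SEP_WINDOWS_MS.items()}
print(measures)  # {'early': 4.0, 'late': 1.0}
```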

### RESULTS

### Behavioral (Looking Time)

A repeated-measures ANOVA of infant looking time was conducted by calculating the percentage of time the infants were looking at the limb when it was visible (both congruent and incongruent limbs) as opposed to looking elsewhere about the room when a limb was visible. The ANOVA revealed no significant main effect of limb-visual congruency [F(1,51) = 1.54, p = 0.22] or infant limb stimulated [F(1,51) = 0.07, p = 0.79]. There was also no significant interaction [F(1,51) = 0.07, p = 0.79].
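The looking-time measure is a simple proportion, sketched below. The function name and the durations are hypothetical; in the study, the durations came from the offline video coding.

```python
def percent_looking(looking_s, limb_visible_s):
    """Percentage of the time a limb was visible that the infant spent looking at it."""
    if limb_visible_s <= 0:
        raise ValueError("limb_visible_s must be positive")
    return 100.0 * looking_s / limb_visible_s


# Invented durations (seconds) for one infant
congruent = percent_looking(looking_s=90.0, limb_visible_s=120.0)
incongruent = percent_looking(looking_s=60.0, limb_visible_s=120.0)
print(congruent, incongruent)  # 75.0 50.0
```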

### Mu Rhythm (6–9 Hz)

Tactile stimulation to the infant's right hand and right foot elicited responses in the mu frequency band over the electrode sites of interest (C3, Cz, and C4).

#### Early Window (0–500 ms)

The repeated-measures ANOVA revealed a main effect of the limb stimulated on the infant [F(1,52) = 8.78, p < 0.01]. Mu desynchronization was significantly greater for hand stimulation [M = −0.31; SE = 0.12] than foot stimulation [M = 0.18; SE = 0.12]. There were no other significant effects or interactions for the early time window.

#### Late Window (500–1000 ms)

The repeated-measures ANOVA for the late window revealed a main effect of electrode [F(1.72,85.87) = 5.35, p = 0.01]. Pairwise comparisons revealed significantly greater mu desynchronization (p = 0.01) at C3 (M = −0.35; SE = 0.15) compared to C4 (M = 0.11; SE = 0.16). There was a main effect of the infant limb stimulated [F(1,50) = 6.24, p = 0.02], with mu desynchronization being greater for hand stimulation (M = −0.46; SE = 0.18) than foot stimulation (M = 0.20; SE = 0.19). Finally, there was a significant interaction between electrode and the stimulated limb of the infant [F(1.72,85.87) = 3.31, p = 0.05]. In the infant group receiving hand stimulation, greater desynchronization occurred at C3 (M = −0.87; SE = 0.21) than Cz (M = −0.46; SE = 0.21; p = 0.01) and C4 (M = −0.04; SE = 0.22; p = 0.001) and at Cz compared to C4 (p = 0.04). No other effects were significant.

### Beta Rhythm (11–13 Hz)

Tactile stimulation of infants' right hands and right feet elicited responses in the beta frequency band at C3, Cz, and C4 (**Figure 2**).

#### Early Window (0–500 ms)

A repeated-measures ANOVA revealed a main effect of limb-visual congruency for the early window [F(1,50) = 5.20, p = 0.03]. There was significantly greater beta desynchronization for the touch of the visual limb of the experimenter that was congruent with tactile stimulation on the infant's own body (M = −0.23; SE = 0.11) compared to the touch of the visual limb that was incongruent (M = 0.15; SE = 0.14). **Figure 2** shows the mean beta ERSP responses for each of the four conditions. No other effects were significant.

#### Late Window (500–1000 ms)

No effects or interactions were significant in a repeated-measures ANOVA examining the late window.

### Somatosensory Evoked Potentials

Tactile stimulation of the right hand and right foot of the infant elicited SEP responses that were examined at the central electrode sites C3, Cz, and C4. The SEP responses consisted of an early response at 100–300 ms and a later response at 400–600 ms. **Figures 3**, **4** show the SEP responses at the three electrode sites of interest for infant hand (**Figure 3**) and foot (**Figure 4**) stimulation.

#### Early SEP Positivity (100–300 ms)

The repeated-measures ANOVA for the early positivity revealed a significant main effect of electrode [F(1.88,95.91) = 13.88, p < 0.001]. Pairwise comparisons revealed that mean amplitude at Cz (M = 6.78; SE = 0.68) was significantly greater than the mean amplitude at C3 (M = 3.97; SE = 0.85; p < 0.001) and C4 (M = 4.50; SE = 0.74; p < 0.001). There was a significant main effect of the stimulated limb of the infant [F(1,51) = 6.54, p = 0.01]. The mean amplitude was greater for infant foot stimulation (M = 6.83; SE = 0.99) compared to infant hand stimulation (M = 3.34; SE = 0.94). There was also a significant interaction between electrode and stimulated limb of the infant [F(1.88,95.91) = 16.98, p < 0.001]. Specifically in the infant group receiving foot stimulation, the mean amplitude at Cz (M = 10.25; SE = 0.99) was significantly greater than the mean amplitude at C3 (M = 4.16; SE = 1.23; p < 0.001) and C4 (M = 6.09; SE = 1.07; p < 0.001). In addition, the mean amplitude at C4 was significantly greater than the mean amplitude at C3 (p = 0.04). No other effects or interactions were significant.

#### Late SEP Potential (400–600 ms)

The repeated-measures ANOVA for the late potential revealed a main effect of electrode [F(1.90,97.09) = 7.16, p < 0.01]. Pairwise comparisons revealed that the mean amplitude at C3 (M = −0.05; SE = 1.18) was significantly lower than at Cz (M = 2.80; SE = 1.09; p = 0.001) and C4 (M = 2.28; SE = 1.11; p = 0.01). There was a significant main effect of the stimulated limb of the infant [F(1,51) = 4.31, p = 0.04], with the mean amplitude being greater for infant foot stimulation (M = 3.80; SE = 1.49) compared to infant hand stimulation (M = −0.45; SE = 1.41). There was a significant interaction between the limb-visual congruency and the stimulated limb of the infant [F(1,51) = 11.31, p = 0.001]. For the infant group receiving foot stimulation, pairwise comparisons revealed a significant difference (p = 0.01) between the mean amplitudes of the congruent (M = 7.13; SE = 1.99) and incongruent (M = 0.48; SE = 1.90) conditions. This means that there was a greater mean amplitude for the touch of the visual limb of the experimenter that was congruent with the tactile stimulation on the infant's own body (received foot/observed foot). For the infant group receiving hand stimulation, there was a significant difference (p = 0.04) between congruent (M = −2.93; SE = 1.88) and incongruent conditions (M = 2.03; SE = 1.79). In this case, the mean amplitude was greater for the touch of the visual limb of the experimenter that was incongruent with the tactile stimulation on the infant's own body (received hand/observed foot). There was also a significant interaction between electrode and stimulated limb of the infant [F(1.90,97.09) = 4.316, p = 0.02]. Specifically, in the infant group receiving hand stimulation, the mean amplitude at C3 (M = −2.81; SE = 1.62) was significantly lower than at Cz (M = −0.06; SE = 1.49; p = 0.02) and C4 (M = 1.52; SE = 1.52; p = 0.001). In the infant group receiving foot stimulation, mean amplitude at Cz (M = 5.65; SE = 1.58) was significantly greater than at C3 (M = 2.71; SE = 1.71; p = 0.01) and C4 (M = 3.05; SE = 1.61; p = 0.02). No other effects were significant. **Figure 5** shows bar graphs of the average mean amplitudes of the late potential for each condition.

### DISCUSSION

The current study examined whether infant neural responses to tactile stimulation of a specific body part were modulated by vision of the corresponding effector of another person. The primary aim of this work was to shed light on the suggestion that the infant brain registers correspondences between infants' own bodies and the bodies of others (e.g., Marshall and Meltzoff, 2015; Meltzoff et al., 2018). This neuroscience work is relevant to theories of infant behavioral development and social perception. For example, an early-developing neural ability to detect interpersonal bodily similarities (e.g., between your own hand and the hand of another) may undergird imitative learning from others and promote social engagement between infants and caregivers by engendering feelings of connectedness.

The theorizing of Meltzoff (2007, 2013) posits that infants' realization that others are "Like-Me" is a building block of early social cognition. It is hypothesized that this preverbal "Like-Me" recognition is supported by the fact that body parts can be mapped as similar between self and other (Meltzoff and Moore, 1997, p. 186, Figure 2). The current study was aimed at harnessing methods from developmental cognitive neuroscience to further examine the interpersonal nature of infant body representations, primarily at the level of mapping the similarity between the observed body parts of others and felt body parts of self. More specifically, we tested infant neural responses to simultaneous visual and tactile stimuli, and examined whether the patterning of these responses was indicative of an interpersonal aspect of early body representations.

Independent groups of infants received tactile stimulation to either their right hand or right foot. Both groups observed a live-action presentation of an adult's hand and foot being touched with a feather duster. This resulted in four conditions varying in visual-tactile congruency: (i) receive hand stimulation/observe hand stimulation, (ii) receive hand stimulation/observe foot stimulation, (iii) receive foot stimulation/observe hand stimulation, and (iv) receive foot stimulation/observe foot stimulation. A novel aspect of this study is that the tactile events were modeled by real people in a well-controlled live presentation. A comprehensive set of neuroscientific measures was used to investigate the temporal interactions between vision and somatosensation. Three types of neural responses were recorded: (i) the mu rhythm, (ii) the beta rhythm, and (iii) SEP responses at central electrode sites (C3, Cz, and C4). Analyses of the mu (6–9 Hz) and low beta (11–13 Hz) rhythms were split between early (0–500 ms) and late (500–1000 ms) windows following the onset of the tactile stimulation. SEP analyses focused on an early positivity occurring between 100 and 300 ms post-stimulus and a late potential between 400 and 600 ms. The discussion below first reflects on the somatosensory rhythm findings (mu and beta), then the SEP responses.

The mu rhythm response did not show an effect of congruency between stimulated and observed body parts in either the early or the late window. The main significant finding concerning mu was that in the late window, the mu rhythm showed a somatotopic pattern in which there was greater desynchronization at the contralateral electrode C3 for stimulation to the right hand and at the medial electrode Cz (compared to C4) for stimulation to the foot. Few studies have reported on the post-stimulus response of the mu rhythm following delivery or observation of a tactile stimulus. In adults, there is generally an initial decrease (ERD) in mu power that is characterized by a somatotopic scalp distribution. Mu rhythm desynchronization contralateral to the stimulated hand has been reported in MEG studies following punctate tactile stimulation in adults (Cheyne et al., 2003; Gaetz and Cheyne, 2006), sustained tactile stimulation (van Ede et al., 2014), and median nerve stimulation (Della Penna et al., 2004). The mu rhythm also shows contralateral desynchronization in anticipation of tactile stimulation to the hand, a finding that has been documented both in adults (Haegens et al., 2012) and children (Weiss et al., 2018). The current findings add a developmental perspective from infancy to work on mu rhythm responses elicited by tactile stimulation, and are also consistent with previously reported somatotopic mu rhythm patterns in older infants (12- and 14-month-olds). In these prior studies, the infant mu rhythm showed a somatotopic response during observation of another's hand being touched (Müller et al., 2017), or another person reaching toward and touching a toy with their hand or foot (Saby et al., 2013).

We did not observe a somatotopic response pattern of the beta rhythm to tactile stimulation. However, there was an overall effect of limb-visual congruency in the early window of the beta rhythm response. This finding of differential modulation of the mu and beta bands may be related to reports in adults that both felt and observed touch activate a network of beta rhythm activity, but mu rhythm activity is more specific to felt touch (Pisoni et al., 2018). In the current study, there was greater beta desynchronization across the central region when infants were seeing a body part congruent with their body part being touched compared to seeing a different body part. This effect did not continue into the late window. The beta desynchronization elicited by the congruent condition resembles the desynchronization observed in adult studies following motor movement, action observation, and tactile stimulation (Cheyne et al., 2003; Gaetz and Cheyne, 2006; Kilavik et al., 2013). In the current study, the modulation of the early beta response by the congruency of the visual and tactile stimuli is notable. However, this modulation did not specifically vary by electrode.

The early positive peak of the SEP elicited in the current study is similar to the peak observed in a previous study of 7-month-old infants, which also showed a somatotopic response to hand and foot stimulation (Saby et al., 2015). In the current study, there was some evidence of a somatotopic response pattern for this peak when infants were being stimulated on the foot. However, a somatotopic pattern was not observed during hand stimulation, inasmuch as differences in mean amplitude of the early positivity were not significant between central sites. The lack of an observable somatotopic pattern in response to hand stimulation may be due to aspects of the experimental protocol that reduced somatotopic SEP responses. For instance, one important difference between the current study and prior infant EEG work (e.g., Saby et al., 2015) was the occurrence of tactile stimulation on different limbs in the prior study. In the current case, only one hand (the right hand) was stimulated throughout the entire experiment for the infants receiving hand stimulation. In the prior study, the right and left hand were stimulated as well as the right and left foot for each infant. It is possible that somatotopic responses in the prior work were more readily elicited by the variation (contrast) in the location of tactile stimulation and that in the current procedures, neural adaptation to hand stimulation may have occurred over the course of the experiment.

Examining a later window of the SEP response in infants showed findings for a late potential occurring between 400 and 600 ms post-stimulus. Effects of congruency on mean amplitude were observed in the late potential, with the specific pattern of effects being dependent on the infant limb stimulated. For stimulation of the foot, more positive mean amplitudes were elicited for congruent trials (i.e., during observation of the experimenter's foot) than for incongruent trials. When the infant was receiving stimulation to the hand, more positive mean amplitudes of the late SEP potential were elicited for incongruent trials (i.e., during observation of the experimenter's foot). These results are discussed further below.

To date, only two prior studies investigating SEP responses in infancy have reported a response at 400 ms or later after the onset of a tactile stimulus. The response observed in the studies by Rigato et al. (2014, 2017) showed positive peaks clearly between 400 and 600 ms, matching the timing of the current study but having a different appearance (as more of a positive peak). The late peak observed in the prior studies may be due to the use of long-duration, intense vibrating tactile stimulation lasting for 200 ms on the palms of the infants. Therefore, the late peak in the work of Rigato and colleagues may be a response to the termination of the tactile stimulation about 200 ms later.

The study of Rigato et al. (2017) also reported a difference between conditions when an infant observed a video of a hand being touched by a paintbrush vs. the paintbrush making contact with the table to the side of the hand. Unlike the current findings, this effect was observed in a much earlier window of the SEP, around 100 ms following tactile stimulus onset. Differences in the SEP between the current study and that of Rigato et al. (2017) could be due to a body-specific contrast (i.e., observing hand vs. foot, or observing hand vs. table) or, as previously mentioned, could be due to differences in tactile stimulation characteristics (vibrotactile stimulation). Similar to our current limb-visual congruency effect for infant hand stimulation, Rigato et al. (2017) found larger SEP responses at contralateral electrode sites when infants viewed the table being touched rather than the hand. In both studies, seeing another's hand being touched while receiving tactile stimulation to the hand resulted in a suppression of the SEP response. Both studies therefore show that infant SEP responses can be affected by the observation of another's body.

Differences found in the infant late potential related to the congruency between observed and stimulated body parts may be related to findings reported in adult work (Sambo et al., 2009; Longo et al., 2012; Deschrijver et al., 2015). In these adult studies, congruency effects were present in the SEP response after 200 ms post-stimulus bilaterally over central sites. It is conceivable that the late potential in infants (emerging at 400 ms) could be related to or even develop into the late positivity in the adult SEP response (emerging between 200 and 300 ms), reflecting a late stage of somatosensory processing.

The results of the current study suggest a discrepancy between infant foot stimulation and infant hand stimulation in the direction of the late potential modulation by limb congruency. While the late potential showed a larger positivity for congruent trials during foot stimulation, it was smaller for congruent trials during hand stimulation. One relevant factor could be the different SEP morphology observed in response to hand and foot stimulation (**Figures 3**, **4**). The SEPs in response to foot stimulation show a very strong positivity particularly while infants were also viewing a foot, a pattern that persisted throughout the entirety of the SEP response. The SEPs in response to hand stimulation were slightly weaker and were less prominent across the overall time period analyzed. The reasons for these differences are uncertain, since they were not observed in Saby et al. (2015). One possible contributing factor is that the mean numbers of trials per condition were lower for the current study compared to the prior work. The protocol in Saby et al. (2015) had greater numbers of trials per limb because the tactile stimulation in that study was not systematically accompanied by congruent or incongruent visual input. Other possible explanations may be a novelty effect for tactile stimulation occurring on the dorsal area of the foot, or differential distortion effects occurring as the elicited electrical activity moves through the skull from the underlying sources. At a more psychological level, there are experiential differences between hands and feet for young infants. During the first year of life, infants are far more familiar with their own hands and viewing the hands of other people than they are with feet – infants regularly engage in own-hand regard, and the feet of others are more rarely viewed than their hands. 
The extent to which these and other developmental and experiential factors may contribute to the observed differences are topics for future research.

We also wish to draw attention to another aspect of the infant neuroscientific literature which is possibly relevant to the current work on infant neural body representations. Interestingly, studies examining ERP responses to visual stimuli in infancy have reported a component often referred to as the Nc (negative central) which occurs between 400 and 600 ms post-stimulus (Nelson and Salapatek, 1986; Richards, 2003; Reynolds and Richards, 2005; Wiebe et al., 2006; Ackles and Cook, 2007; Ackles, 2008). Although the morphology of the late potential observed in the current study does not necessarily resemble the large negative peak of the Nc, the onsets of the two potentials bear a resemblance to each other. Studies on the Nc have shown that it is modulated by factors such as frequency, and familiarity or novelty of the visual stimuli, such that the Nc is more negative for infrequent or novel stimuli. Attention toward the visual stimuli has also been shown to facilitate the Nc response in 4.5- and 7.5-month-old infants (Richards, 2003). Similarly, differences observed in the late potential of the current study may be due to body-specific attentional differences between infants' viewing of congruent and incongruent limbs.

Despite the importance of looking time measures in infant research more generally, neuroscience studies of infants have rarely included such measures. The current study included the scoring of infant looking time to the visual displays, as a complement to the electrophysiological measures. The results showed no significant difference in infants' tendency to look at the experimenter's limb when it was congruent vs. incongruent with the infant's limb receiving tactile stimulation. Previous studies investigating body perception in younger infants found longer looking times toward congruent visual-tactile stimuli (Filippetti et al., 2013, 2015). In these studies, infants were touched on the face with a paintbrush while observing another face being touched. Infants looked longer when the observed face was being touched synchronously with the infant and in the same location (cheek or forehead). Therefore, very young infants may demonstrate increased visual attention to body-part correspondence between visual-tactile events under specific eliciting conditions (and perhaps using particular body parts, such as the face), which were not used in our current study.

The current neuroscience work can be connected, at least at a theoretical level, to three prominent lines of infant behavioral research, which also provide information about the role of the body in self-perception and interpersonal engagement. First, previous research has demonstrated infants' ability to detect correspondences between their own seen and felt leg movements (Bahrick and Watson, 1985; Rochat and Morgan, 1995), which is compatible with the current findings of multimodal aspects of body perception. Second, research on infant facial and manual imitation suggests that infants can recognize correspondences between specific body parts of self and others (Meltzoff and Moore, 1997). In order to imitate with high fidelity, infants first need to identify which body part to use (tongue, fingers, lips) to generate the matching response, thus successful imitation provides a nonverbal indicator of interpersonal connectivity (Meltzoff and Marshall, 2018). Third, infant research also shows interpersonal coordination and adjustments to the body movements of others, for example, the findings that young infants make bodily adjustments in anticipation of a person approaching them in order to pick them up (Reddy et al., 2013). Taken together, these studies strongly suggest that infants' coordination between their own body and those of others – which integrates tactile, proprioceptive, and visual domains in a multimodal fashion – is a fundamental and pervasive aspect of early development.

### CONCLUSION

The present findings contribute insights into how correspondences between vision and somatosensation may be processed by preverbal infants. This is a complex area that will benefit from detailed investigations of how different stimulus parameters influence infants' neural responses. Based on the research reported here, some key factors that should be systematically manipulated in future neuroscience studies include: whether live or videotaped displays are shown, whether vibrotactile or punctate stimulation is used to provide tactile stimulation of the infant's own body, whether effects differ by age and functional experience (e.g., differential experience between hands and feet), and whether one type of stimulus is repeatedly presented or infants have an opportunity to experience variation and contrasts between the stimuli.

We favor the idea that the body, even in infancy, is a multimodal, rather than unimodal, construct (Meltzoff and Marshall, 2018). Young infants not only experience their own bodies but also observe other people's bodies and recognize similarities and differences between them. The neural representation of the body in the infant's brain is a topic that addresses important issues in human development and promises to illuminate key aspects of social perception prior to language. Future work in this area will contribute to grounding the field of developmental social neuroscience, an area of research whose time has come.

### ETHICS STATEMENT

All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Temple University IRB.

### AUTHOR CONTRIBUTIONS

AD, AM, and PM were all involved in the conceptualization, design, analysis, and writing of the original research in the submitted manuscript. AD was involved in the collection of the data.

### FUNDING

This work was supported by National Institutes of Health grant R21HD083756 (PM) and National Science Foundation grants BCS-1460889 (PM) and SMA-1540619 (AM).

### ACKNOWLEDGMENTS

We thank V. Cordero, Z. Kearns, S. Khantsis, R. Laconi, M. Puzio, M. Salvatore, J. Schoenhard, and M. Shambaugh for their helpful assistance with recruitment and data collection.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Drew, Meltzoff and Marshall. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Defective Tool Embodiment in Body Representation of Individuals Affected by Parkinson's Disease: A Preliminary Study

*Federica Scarpina<sup>1</sup>\*, Nicola Cau<sup>2</sup>, Veronica Cimolin<sup>2</sup>, Manuela Galli<sup>2,3</sup>, Lorenzo Priano<sup>1,4</sup> and Alessandro Mauro<sup>1,4</sup>*

*<sup>1</sup> Istituto Auxologico Italiano, IRCCS, Divisione di Neurologia e Neuroriabilitazione, Ospedale San Giuseppe, Piancavallo (VCO), Italy, <sup>2</sup> Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy, <sup>3</sup> IRCCS San Raffaele Pisana, Tosinvest Sanità Roma, Rome, Italy, <sup>4</sup> Department of Neuroscience "Rita Levi Montalcini", University of Turin, Turin, Italy*

#### *Edited by:*

*Lorenzo Jamone, Queen Mary University of London, United Kingdom*

#### *Reviewed by:*

*Michela Bassolino, École Polytechnique Fédérale de Lausanne, Switzerland Michela Candini, Università degli Studi di Bologna, Italy*

> *\*Correspondence: Federica Scarpina f.scarpina@auxologico.it; federica.scarpina@gmail.com*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 30 August 2018 Accepted: 22 November 2018 Published: 07 January 2019*

#### *Citation:*

*Scarpina F, Cau N, Cimolin V, Galli M, Priano L and Mauro A (2019) Defective Tool Embodiment in Body Representation of Individuals Affected by Parkinson's Disease: A Preliminary Study. Front. Psychol. 9:2489. doi: 10.3389/fpsyg.2018.02489*

When efficiently used for action, tools become part of the body, with effects on spatial-temporal movement parameters and body size perception. To date, no investigation of tool embodiment in Parkinson's disease (PD), a neurological disease characterized by several sensory and motor symptoms affecting body and action, has been reported. We enrolled 14 individuals affected by PD and 18 healthy individuals as controls. We studied the spatial-temporal parameters of a self-paced free pointing movement task, *via* an optoelectronic system, before and after a short training in which a 27-cm long rod was used to point toward a far target. Moreover, we investigated changes in estimation of arm length through the Tactile Estimation Task. After the tool-use training, controls showed changes in spatial-temporal parameters: they were slower to perform movements and showed a higher value of deceleration than at baseline. However, such a difference did not emerge in the PD individuals. In the Tactile Estimation Task, no difference emerged before and after the tool-use training in either group. Our results suggest possible difficulties in the tool embodiment process in PD. We discuss our results in relation to aberrant multisensory integration as well as in terms of the effect of PD sensory and motor symptoms on body schema plasticity. The present study points to a novel way to conceive PD sensory-motor signs and symptoms in terms of their effect on individuals' body representation.

Keywords: Parkinson's disease, tool embodiment, body representation, action, multisensory integration, body schema

## INTRODUCTION

One of the most peculiar characteristics of human beings is the capability to use tools for acting in the environment. For example, we can use a rod to indicate something that is out of our reaching space: the tool makes near what would otherwise be unreachable. When efficiently used, tools become part of our body; in other words, they are *embodied* (Maravita and Iriki, 2004; Longo and Serino, 2012; Miller et al., 2018), with effects on action, perception, and cognitive capacities (Cardinali et al., 2009). For instance, after tool-use training, healthy individuals perceive their arm as longer than before; moreover, changes in spatial-temporal parameters of motor behavior are observed (Cardinali et al., 2009). These "changes are compatible with the notion of the inclusion of tools in the 'Body Schema,' as if our own effector (e.g. the hand) were elongated to the tip of the tool" (Maravita and Iriki, 2004), where the term *body schema* (Gallagher, 2005) refers to the dynamic sensory-motor body representation that is derived from the integration of multiple sensory bodily inputs and is used to plan and execute actions (Gallagher, 2005; Dijkerman and De Haan, 2007). Body schema is known to be a plastic representation (Gallagher, 2005; Giummarra et al., 2008): not only is it constantly updated in relation to the online incoming sensory input, but it also changes in order to embody significant objects (Dijkerman and De Haan, 2007; Giummarra et al., 2008). Accordingly, experimental paradigms grounded on tool embodiment have made it possible to investigate the plasticity of body schema (Martel et al., 2016) in healthy individuals (Cardinali et al., 2009; Canzoneri et al., 2013) and in pathological conditions (Giummarra et al., 2008), such as amputees who use a prosthesis (Mayer et al., 2008), individuals with spinal cord injury who use a wheelchair (Pazzaglia et al., 2013), and brain-damaged patients (Garbarini et al., 2015) (for details, see Giummarra et al., 2008).

In the present work, we aimed to provide a first, preliminary investigation of tool embodiment in Parkinson's disease (PD). PD is a neurological syndrome characterized by several motor and sensory symptoms, such as akinesia and bradykinesia, tremor and rigidity, and postural instability (Bereczki, 2010). These symptoms are due to the dysfunction of neural structures responsible for movement selection, coordination, and execution (see Moustafa et al., 2016 for a review); thus, in PD, body and action are primarily affected. Interestingly, preliminary but not conclusive evidence has been reported in the literature about changes in sensory bodily function in PD, ranging from primary sensory perception to the complex integration of multiple sensory and motor inputs (Abbruzzese and Berardelli, 2003; Avanzino et al., 2018); however, research on the effect of PD symptoms on the bodily self (i.e., body awareness, sense of agency, and proprioception) (Blanke et al., 2015) and body representation is still in its infancy.

In order to verify whether a tool can be efficiently embodied in body representation in PD, we studied motor parameters of self-paced free pointing movements before and after a short training in which a rod was used as a tool to point toward a far target. Moreover, we verified whether tool embodiment changed the cognitive representation of the arm used to handle the tool through the Tactile Estimation Task (Scarpina et al., 2014): in this task, participants estimated the distance between two tactile stimuli presented simultaneously on the arm. This judgment allows us to infer the internal body representation of the physical size of the arm (Serino and Haggard, 2010), namely, how long participants estimate their arm to be. If the tool is correctly embodied, the arm should be represented as longer than its physical dimension, and consequently, the distance between the two tactile inputs might be perceived as larger than the real gap.

Considering the previous studies (Maravita and Iriki, 2004; Cardinali et al., 2009, 2011; Longo and Serino, 2012), if PD affects the tool embodiment process, we might expect no changes in affected individuals' motor parameters of pointing movements, while in the healthy individuals, such a change should emerge. Similarly, in the Tactile Estimation Task, no difference might be found before and immediately after the tool-use training in affected individuals' performance, whereas the healthy individuals might judge their arm as longer after the tool-use training than in the baseline condition, as an effect of a correct embodiment of the tool in their body representation. Nevertheless, possible dissociations might emerge between the two tasks, since they rely on different components (one devoted to action and the other to perceptual description) of body representation (Gallagher, 2005; Dijkerman and De Haan, 2007).

### MATERIALS AND METHODS

The study was approved by the ethical committee of the IRCCS Istituto Auxologico Italiano, and it was performed in compliance with the ethical principles of the Declaration of Helsinki (World Medical Association, 1991). All participants were volunteers who gave informed written consent, were free to withdraw at will, and were naïve to the rationale of the experiment.

### Participants

Fourteen individuals affected by PD (seven patients with the right body side predominantly affected by PD and seven with the left body side; *age* in years *M* = 66; *SD* = 8; *education* in years *M* = 9; *SD* = 3) were recruited at the Division of Neurology and Neurorehabilitation, IRCCS Istituto Auxologico Italiano, San Giuseppe Hospital in Piancavallo (VCO, Italy).

All participants were right-handed. They had been diagnosed as having PD (mean years from diagnosis *M* = 7, *SD* = 3) according to Hoehn and Yahr's (1967) classification. The PD group reported a mean score of 30 (*SD* = 13) on the unified Parkinson's disease rating scale (UPDRS) (Fahn and Elton, 1987). Exclusion criteria were evidence of other neurological conditions (e.g., ictus, traumatic brain injury, dementia) or other pathological conditions (e.g., psychiatric syndromes, POTS). Moreover, a threshold of 24 (Lezak et al., 2004) for the Mini Mental State Examination (MMSE) (Folstein et al., 1975) was adopted as an inclusion criterion. Details are reported in **Table 1**.

All individuals with PD were tested when they were in a self-reported '*on*' state of medication, that is, when symptoms were efficiently managed by drugs, albeit with negative effects on movement control (Cenci, 2007) and proprioception (O'Suilleabhain et al., 2001). In fact, when individuals are in an '*off*' state, symptoms such as tremor, rigidity, and slowness, as well as difficulty in attention, a feeling of being completely blocked, anxiety, and pain emerge or worsen (Ahlskog and Muenter, 2001; Fahn et al., 2004), limiting not only the interpretation of the results but also the patient's compliance in performing the task.

Eighteen healthy right-handed participants (*age* in years *M* = 48; *SD* = 14; *education* in years *M* = 15; *SD* = 3) without sensory, neurological, or psychiatric impairments were recruited through personal contact with the researchers or word-of-mouth.

#### TABLE 1 | Demographical and clinical details of individuals affected by PD.


*M = male and F = female. Age, education and duration of disease expressed in years. L = left body side. R = right body side. Means and standard deviations (SD) are reported in the lower part of the table.*

Individuals with PD were significantly older than healthy controls [*age U*(32) = 212; *p* < 0.001], and they had significantly fewer years of *education* [*U*(32) = 37.5; *p* < 0.001].

### Experimental Task

In **Figure 1**, a timeline of the experiment is shown.

### Pointing Movement Tasks

Participants were comfortably seated at the table, with their body midline aligned with the central midline of the table. The experiment had three phases: a *pre*-tool-use session (i.e., the baseline) and a *post*-tool-use session, separated by the *tool-use* session (**Figure 1**). In the *pre*- and *post*-tool-use sessions, participants performed six reach-to-point movements. The target was a black dot placed at a distance equal to 80% of the arm length from the body. To this end, the arm length of each participant was recorded: participants were required to extend their arms in the straight-ahead direction, at shoulder height, and the horizontal distance between the acromion and the middle finger was measured. During the pointing movement task, the other hand was placed in the rest position in line with the corresponding starting point. The time of the movements was self-paced. The experimenter visually checked that participants completed the six movements.

In the *tool-use* session, participants were asked to perform six movements using a stick, 27 cm long and weighing 4 g, in order to reach the visual target with the same arm used in the *pre*- and *post*-tool-use sessions. In this condition, the dot was placed 27 cm (i.e., the length of the stick) beyond the target used in the *pre*- and *post*-tool-use sessions, farther away from the body; thus, the target was placed outside the arm-reaching distance. Participants were instructed to reach the target and to touch it, before going back to the starting point (i.e., the rest position in which the individual's forearm made a 90° angle with the arm and the shoulder). During the *tool-use* session, the other hand was placed in the rest position in line with the corresponding starting point. The time of the movements was self-paced. The experimenter visually checked that participants completed the six movements.
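The target placement described above reduces to simple arithmetic on the measured arm length. The following is a minimal sketch, not the authors' procedure: the function name and the 70 cm arm length are hypothetical, and only the 80% fraction and the 27 cm rod length come from the text.

```python
TARGET_FRACTION = 0.80   # target at 80% of arm length (from the text)
TOOL_LENGTH_CM = 27.0    # rod length (from the text)


def target_distances_cm(arm_length_cm: float) -> dict:
    """Return the target distance from the body for each session type.

    arm_length_cm is the acromion-to-middle-finger distance with the arm
    extended straight ahead at shoulder height.
    """
    if arm_length_cm <= 0:
        raise ValueError("arm length must be positive")
    baseline = TARGET_FRACTION * arm_length_cm
    return {
        # Pre- and post-tool-use sessions: inside reaching space.
        "pre_post_tool_use": baseline,
        # Tool-use session: shifted away from the body by the rod length.
        "tool_use": baseline + TOOL_LENGTH_CM,
    }


# Hypothetical participant with a 70 cm arm.
distances = target_distances_cm(70.0)
print(distances)
```

Under these assumptions, the tool-use target always lies beyond the participant's arm length whenever the arm is shorter than 135 cm, which is consistent with the statement that the target was outside arm-reaching distance.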

The 3D-movement acquisition was conducted using an optoelectronic system with passive markers (VICON, Oxford Metrics Ltd., Oxford, UK) for kinematic movement evaluation. The optoelectronic system performed a real-time processing of images from six fixed infrared cameras (a sampling rate of 100 Hz) to extract the reflectance of a passive marker (with a diameter of 15 mm) that was positioned on the participants' index fingers (Cimolin et al., 2007).

#### Tactile Estimation Task

After each of the three sessions (*pre*-tool use, *tool-use* training, and *post*-tool use) of the Pointing Movement Task, a modified version of the Tactile Estimation Task (Scarpina et al., 2014) was performed. Participants kept their eyes closed for the duration of the task. The experimenter lightly pressed the two pointers of a caliper on the ventral side of the participant's forearm, following the longitudinal axis. Participants were asked to estimate the distance between the two tactile stimuli by varying the separation between the thumb and the index finger of the non-stimulated arm. The distance between the two pointers was set at 7 cm in all repetitions. The tactile stimulation was repeated seven times per session, for a total of 21 trials: 7 immediately after the *pre*-tool-use session (**Figure 1** – I), 7 immediately after the *tool-use* training (**Figure 1** – II), and 7 after the *post*-tool-use session (**Figure 1** – III). In line with previous studies (Cardinali et al., 2009, 2011), the embodiment of tools might emerge as a larger error in the second measurement, i.e., immediately after the *tool-use* training, with respect to the baseline, meaning the first measurement done after the *pre*-tool-use session. In other words, after the *tool-use* training, participants might evaluate their arms as longer than at baseline. The third measurement, i.e., after the second pointing movement session (the *post*-tool-use session), gave us the opportunity to verify whether possible changes in tactile estimation observed in Session II (after *tool-use* training) could still be observed when the movements were performed without any tool or whether, on the other hand, the last action restored the original bodily estimation.

The entire experimental task was performed twice, with both right and left hands. The order of hands was counterbalanced between participants.

### Analyses

A *post hoc* power analysis was conducted using the software package G\*Power 3.0.1. A sample size of 28 was used (14 participants for two groups); moreover, the alpha level used for this analysis was 0.05. The *post hoc* analyses revealed that the statistical power for this study was 0.99 for detecting a medium effect size (*d* = 0.5), whereas it was 0.1 for a large effect size (*d* = 0.8).

#### Pointing Movement Task

For each trial, spatio-temporal parameters relative to the pointing movements were measured in the *pre*- (i.e., the baseline) and *post*-tool-use conditions. Each parameter referred to the going phase and was calculated using the 3D coordinates of the index finger marker. During the going phase, the distance between the finger marker and the target decreases (**Figure 2A**), and its value is close to zero once the participant has reached the target. The velocity profile increases until it reaches a peak (its maximum value). Then, the velocity decreases quickly to guarantee proper accuracy during the adjustment phase (**Figure 2B**). Velocity and acceleration profiles are strictly related: the latter is the derivative of velocity with respect to time, and velocity itself is the derivative of displacement with respect to time. Acceleration reaches its maximum during the increasing phase of the velocity and is zero at the peak of velocity. Then, as the velocity profile decreases, the acceleration changes sign (taking a negative value), and we observe a deceleration phase (**Figure 2C**).

Thus, the following parameters were defined: *movement time* from the starting point to the target, expressed in s; *mean velocity*, defined as the average velocity of the finger marker during the going phase, and *peak of velocity*, defined as the maximum velocity of the finger marker during the going phase, both in m/s; *mean acceleration* and *peak of acceleration* in m/s<sup>2</sup>; and *mean deceleration* and *peak of deceleration* in m/s<sup>2</sup>. The data relative to the six trials for each condition and hand were collapsed together, since preliminary analyses revealed no difference between right and left arms for healthy controls as well as no difference between the affected and non-affected arm for PD patients when the lateralization of symptoms was taken into account. A repeated-measures analysis of variance with the within-subject factor of *Time* (pre-tool use vs post-tool use) and the between-subject factor of *Group* (PD group vs control group) was performed for each motor parameter. If the interaction was significant, Bonferroni-corrected estimated marginal mean comparisons were applied.
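To make the parameter definitions concrete, the sketch below derives them from a sequence of 3D marker positions sampled at 100 Hz. It is an illustration under stated assumptions rather than the authors' processing pipeline: velocity is taken as the tangential speed obtained by first differences, acceleration as the first difference of that speed, mean acceleration and mean deceleration as the means of the positive and negative acceleration samples respectively, and no filtering is applied. The function name and the synthetic trajectory are hypothetical.

```python
import math


def pointing_kinematics(positions_m, sampling_rate_hz=100.0):
    """Summarise one going phase from 3D marker positions (in metres)."""
    if len(positions_m) < 3:
        raise ValueError("need at least three samples")
    dt = 1.0 / sampling_rate_hz

    def _mean(values):
        return sum(values) / len(values) if values else 0.0

    # Tangential speed: distance between consecutive samples divided by dt.
    speed = [math.dist(p, q) / dt for p, q in zip(positions_m, positions_m[1:])]
    # Acceleration: first difference of the speed profile.
    accel = [(v2 - v1) / dt for v1, v2 in zip(speed, speed[1:])]
    positive = [a for a in accel if a > 0]   # acceleration phase
    negative = [a for a in accel if a < 0]   # deceleration phase
    return {
        "movement_time_s": (len(positions_m) - 1) * dt,
        "mean_velocity_m_s": _mean(speed),
        "peak_velocity_m_s": max(speed),
        "mean_acceleration_m_s2": _mean(positive),
        "peak_acceleration_m_s2": max(positive, default=0.0),
        "mean_deceleration_m_s2": _mean(negative),
        "peak_deceleration_m_s2": min(negative, default=0.0),
    }


# Synthetic going phase (an assumption, not recorded data): a smooth 0.5 m
# reach lasting 0.7 s along one axis, with a bell-shaped velocity profile.
DISTANCE_M, N_INTERVALS = 0.5, 70
trajectory = []
for i in range(N_INTERVALS + 1):
    s = i / N_INTERVALS
    x = DISTANCE_M * (10 * s**3 - 15 * s**4 + 6 * s**5)
    trajectory.append((x, 0.0, 0.0))

summary = pointing_kinematics(trajectory)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With real optoelectronic data, low-pass filtering before differentiation would normally be needed, because finite differences amplify measurement noise.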

### Tactile Estimation Task

The difference between the estimated distance and the physical distance between the two pointers of the caliper (7 cm) was computed for each trial, representing the *error*. A negative error indicated an underestimation of the perceived distance; a positive error indicated an overestimation of the perceived distance. A repeated measure ANOVA with the within-subject factor of *Time* (pre-tool use, tool-use training, and post-tool use) and the between-subject factor of *Group* (PD group vs control group) was performed, applying Bonferroni-corrected estimated marginal mean comparisons in the case of significant interaction.
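As an illustration of the error score, the sketch below computes the signed error per trial and its mean per session. The estimates are invented for the example and the function names are hypothetical; only the 7 cm caliper distance and the seven trials per session come from the text.

```python
from statistics import mean

CALIPER_DISTANCE_CM = 7.0  # physical distance between the pointers (from the text)


def estimation_errors(estimates_cm):
    """Signed error per trial: negative = underestimation, positive = overestimation."""
    return [estimate - CALIPER_DISTANCE_CM for estimate in estimates_cm]


def mean_error_by_session(estimates_by_session):
    """Mean signed error for each session label."""
    return {
        session: mean(estimation_errors(estimates))
        for session, estimates in estimates_by_session.items()
    }


# Hypothetical estimates (cm) for one participant, seven trials per session.
example_estimates = {
    "pre_tool_use": [6.0, 6.5, 7.0, 6.5, 6.0, 7.0, 6.5],
    "tool_use_training": [7.5, 8.0, 7.5, 7.0, 8.0, 7.5, 7.0],
    "post_tool_use": [7.0, 6.5, 7.0, 7.5, 6.5, 7.0, 7.5],
}
session_errors = mean_error_by_session(example_estimates)
print(session_errors)
```

In this invented example, the shift from a negative mean error at baseline to a positive one after training is the pattern that would be read as the arm being represented as longer.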

### The Role of Age

Considering that the two groups were significantly different in terms of *age*, with possible effects on embodiment (Costello et al., 2015; Costello and Bloesch, 2017), for both tasks (Pointing Movement Task and Tactile Estimation Task), the analysis was run again introducing the factor *Age* as a covariate for those parameters for which a significant main effect of *Group* or interaction with *Time* was found in the previous analyses.

### The Role of Clinical Characteristics

Only for the group of individuals affected by PD, the possible relationship between the clinical characteristics of *Duration of Disease* and *UPDRS motor score* and the spatio-temporal parameters relative to the pointing movements measured in the pre- and post-tool-use conditions was explored through Spearman's rank correlation coefficient. Moreover, the possible difference in all spatio-temporal parameters between PD individuals with a left lateralization of symptoms and those with a right lateralization was explored through the Mann–Whitney *U* test. The same analyses were conducted for the three experimental sessions (*pre*-tool use, *tool-use* training, and *post*-tool use) of the Tactile Estimation Task.
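For readers unfamiliar with the rank correlation mentioned above, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which handles tied values. It is a generic textbook implementation, not the statistical software used in the study, and the data values are invented for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical values: disease duration (years) and movement time (s).
duration_years = [3, 5, 7, 7, 10, 12]
movement_time_s = [0.78, 0.80, 0.85, 0.83, 0.90, 0.95]
print(round(spearman_rho(duration_years, movement_time_s), 3))
```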

### RESULTS

All participants completed the task as well as the tool-use training.

### Pointing Movement Task

*Movement time*: A significant main effect of *Group* (PD group *M* = 0.859; *SD* = 0.03; control group *M* = 0.69; *SD* = 0.02) emerged [*F*(1, 30) = 20.82; *p* < 0.001; partial *η*<sup>2</sup> = 0.99]: PD patients required significantly more time to perform movements than the controls. Moreover, a main effect of *Time* (pre-tool use *M* = 0.739; *SD* = 0.142; post-tool use *M* = 0.794; *SD* = 0.093) emerged [*F*(1, 30) = 7.3; *p* = 0.011; partial *η*<sup>2</sup> = 0.196]: in the post-tool-use condition, individuals required more time to perform movements than at baseline. Interestingly, the *Group × Time* interaction was significant [*F*(1, 30) = 4.72; *p* = 0.038; partial *η*<sup>2</sup> = 0.55]; while the healthy individuals required significantly more time in the post-tool-use condition than at baseline (*p* = 0.001), this difference did not emerge in PD patients' performance (*p* = 0.72). Moreover, a significant difference emerged between the two groups in the pre-tool-use (*p* < 0.001) and post-tool-use (*p* = 0.011) conditions (**Figure 3**); in both conditions, PD patients required more time to perform the movements.
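As a reading aid for the effect sizes, partial *η*<sup>2</sup> can be recovered from an *F* ratio and its degrees of freedom through the standard identity partial *η*<sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies this identity to the main effect of *Time* on movement time; the function name is hypothetical, and the calculation is offered as an illustration rather than as the authors' analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Time on movement time: F(1, 30) = 7.3.
time_effect = partial_eta_squared(7.3, 1, 30)
print(round(time_effect, 3))
```

For *F*(1, 30) = 7.3, the identity gives approximately 0.196, which matches the value reported for this effect.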

*Mean velocity*: A main effect of *Group* [*F*(1, 30) = 9.55; *p* = 0.004; partial *η*<sup>2</sup> = 0.84] emerged: PD patients (*M* = 0.399; *SD* = 0.09) were significantly slower than the healthy participants (*M* = 0.475; *SD* = 0.07). Neither a main effect of *Time* (pre-tool use *M* = 0.442; *SD* = 0.09; post-tool use *M* = 0.46; *SD* = 0.1) [*F*(1, 30) = 2.85; *p* = 0.1; partial *η*<sup>2</sup> = 0.087] nor a significant *Group × Time* interaction [*F*(1, 30) = 3.52; *p* = 0.07; partial *η*<sup>2</sup> = 0.1] emerged.

*Peak of velocity*: A main effect of *Group* [*F*(1, 30) = 12.64; *p* = 0.001; *η*<sup>2</sup> = 0.93] emerged: PD patients (*M* = 0.68; *SD* = 0.03) reported a significantly lower peak of velocity than the control group (*M* = 0.89; *SD* = 0.02). Moreover, a main effect of *Time* [*F*(1, 30) = 16.71; *p* < 0.001; partial *η*<sup>2</sup> = 0.35] emerged: in the post-tool-use condition (*M* = 0.838; *SD* = 0.165), a significantly higher peak of velocity was observed than at baseline (*M* = 0.79; *SD* = 0.16). The *Group × Time* interaction was not significant [*F*(1, 30) = 0.36; *p* = 0.55; partial *η*<sup>2</sup> = 0.12].

*Mean acceleration*: A main effect of *Group* [*F*(1, 30) = 22.89; *p* < 0.001; partial *η*<sup>2</sup> = 0.43] was found: PD patients (*M* = 1.997; *SD* = 0.09) reported a significantly lower acceleration than the control group (*M* = 2.719; *SD* = 0.12). Moreover, a main effect of *Time* emerged [*F*(1, 30) = 6.72; *p* = 0.015; partial *η*<sup>2</sup> = 0.18], since in the post-tool-use condition (*M* = 2.48; *SD* = 0.62), the acceleration was higher than at baseline (*M* = 2.32; *SD* = 0.52). No significant *Group × Time* interaction [*F*(1, 30) = 0.32; *p* = 0.85; partial *η*<sup>2</sup> = 0.01] emerged.

*Peak of acceleration*: A main effect of *Group* emerged [*F*(1, 30) = 44.87; *p* < 0.001; *η*<sup>2</sup> = 0.59]: PD patients (*M* = 5.45; *SD* = 0.22) reported a significantly lower peak of acceleration than the control group (*M* = 8.271; *SD* = 0.35). No significant main effect of *Time* (pre-tool use *M* = 6.99; *SD* = 1.99; post-tool use *M* = 7.08; *SD* = 1.81) [*F*(1, 30) = 0.47; *p* = 0.49; *η*<sup>2</sup> = 0.59], (*p* = 0.015) or significant *Group × Time* interaction [*F*(1, 30) = 3.123; *p* = 0.087; partial *η*<sup>2</sup> = 0.094] emerged.

*Mean deceleration*: A main effect of *Group* emerged [*F*(1, 30) = 13.97; *p* = 0.01; partial *η*<sup>2</sup> = 0.31] (PD group *M* = −1.11; *SD* = 0.08; control group *M* = −1.675; *SD* = 0.11). A main effect of *Time* [*F*(1, 30) = 14.21; *p* = 0.001; partial *η*<sup>2</sup> = 0.32] was found: indeed, in the post-tool-use condition (*M* = −1.5; *SD* = 0.56), the deceleration was higher than at baseline. Interestingly, the *Group × Time* interaction was significant [*F*(1, 30) = 10.2; *p* = 0.003; partial *η*<sup>2</sup> = 0.25]: while the healthy individuals reported a significantly higher value of deceleration in the post-tool-use condition than in the pre-tool-use condition (*p* = 0.001), this difference did not emerge in PD patients' performance (*p* = 0.703). Moreover, a significant difference emerged between the two groups in the pre-tool-use (*p* < 0.005) and post-tool-use (*p* < 0.001) conditions; in both experimental sessions, PD patients showed lower deceleration than controls (**Figure 4**).

FIGURE 3 | *Movement Time* (in s): mean values and standard errors (vertical lines) in the pre-tool-use and post-tool-use conditions by group (dark grey = healthy individuals; light grey = individuals with PD). Asterisks denote *p* < 0.05 in the *post hoc* comparisons.
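For readers who wish to relate the reported *F* values to the partial *η*<sup>2</sup> effect sizes, a minimal sketch follows. It uses the standard identity partial *η*<sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>), which holds for a single effect in a fixed-effects ANOVA; the function name `partial_eta_squared` is hypothetical, and whether the statistical software used in the study computed its effect sizes this way is an assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values taken from the Movement time and Mean deceleration results above.
time_effect = partial_eta_squared(7.3, 1, 30)    # main effect of Time, Movement time
interaction = partial_eta_squared(10.2, 1, 30)   # Group x Time, Mean deceleration

print(round(time_effect, 3), round(interaction, 2))  # prints "0.196 0.25"
```

Both values agree with those reported for these two effects.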

*Peak of deceleration*: A main effect of *Group* [*F*(1, 30) = 24.73; *p* < 0.001; partial *η*<sup>2</sup> = 0.04] emerged: PD patients (*M* = −3.24; *SD* = 0.1) reported a significantly lower peak of deceleration than the control group (*M* = −4.456; *SD* = 0.21). Also, a main effect of *Time* [*F*(1, 30) = 9.87; *p* = 0.004; partial *η*<sup>2</sup> = 0.24] emerged: in the post-tool-use condition (*M* = −4.05; *SD* = 0.96), a higher peak of deceleration was found than at baseline (*M* = −3.79; *SD* = 0.91). The *Group × Time* interaction [*F*(1, 30) = 0.49; *p* = 0.48; partial *η*<sup>2</sup> = 0.016] was not significant.

### The Role of Age on the Performance in the Pointing Movement Task

We reran the analyses, controlling for the effect of *Age*.

*Movement time*: The main effect of *Group* remained significant [*F*(1, 29) = 13.73; *p* = 0.001; partial *η*<sup>2</sup> = 0.32]: PD patients (adjusted *M* = 0.868; *SD* = 0.03) were significantly slower than the healthy participants (adjusted *M* = 0.688; *SD* = 0.02). Instead, neither the main effect of *Time* [*F*(1, 29) = 0.78; *p* = 0.38; partial *η*<sup>2</sup> = 0.026] nor the *Group × Time* interaction [*F*(1, 29) = 2.23; *p* = 0.1; partial *η*<sup>2</sup> = 0.072] was significant.

*Mean velocity*: The main effect of *Group* remained significant [*F*(1, 29) = 4.37; *p* = 0.045; partial *η*<sup>2</sup> = 0.13], since PD patients (adjusted *M* = 0.406; *SD* = 0.026) were significantly slower than the healthy participants (adjusted *M* = 0.487; *SD* = 0.023). The main effect of *Time* was not significant [*F*(1, 29) = 0.98; *p* = 0.32; partial *η*<sup>2</sup> = 0.033]; however, the *Group × Time* interaction was significant [*F*(1, 29) = 5.51; *p* = 0.026; partial *η*<sup>2</sup> = 0.16]: indeed, while the healthy individuals reported higher mean velocity in the post-tool-use condition (adjusted *M* = 0.508; *SD* = 0.025) than at baseline (adjusted *M* = 0.465; *p* = 0.023) [*p* = 0.005], this difference did not emerge in PD patients' performance (pre-tool use: adjusted *M* = 0.412; *SD* = 0.027; post-tool use: adjusted *M* = 0.399; *SD* = 0.029) [*p* = 0.434]. Moreover, a significant difference emerged between the two groups in the post-tool-use condition (*p* = 0.014) but not in the pre-tool-use condition (*p* = 0.18).

*Peak of velocity*: The main effect of *Group* remained significant [*F*(1, 29) = 4.92; *p* = 0.034; partial *η*<sup>2</sup> = 0.14], since PD patients (adjusted *M* = 0.737; *SD* = 0.043) reported a significantly lower peak of velocity than the healthy participants (adjusted *M* = 0.876; *SD* = 0.037). The main effect of *Time* was no longer significant [*F*(1, 29) = 0.046; *p* = 0.83; partial *η*<sup>2</sup> = 0.002]; the *Time × Group* interaction was confirmed as not significant [*F*(1, 29) = 0.009; *p* = 0.92; partial *η*<sup>2</sup> < 0.001].

*Mean acceleration*: The main effect of *Group* remained significant [*F*(1, 29) = 13.15; *p* = 0.001; partial *η*<sup>2</sup> = 0.31], since PD patients (adjusted *M* = 2.01; *SD* = 0.133) reported a significantly lower acceleration than the healthy participants (adjusted *M* = 2.71; *SD* = 0.114). The main effect of *Time* was no longer significant [*F*(1, 29) = 0.008; *p* = 0.93; partial *η*<sup>2</sup> < 0.001], and the *Time × Group* interaction was confirmed as not significant [*F*(1, 29) = 0.16; *p* = 0.68; partial *η*<sup>2</sup> = 0.006].

*Peak of acceleration*: The main effect of *Group* remained significant [*F*(1, 29) = 27.95; *p* < 0.001; partial *η*<sup>2</sup> = 0.59], since PD patients (adjusted *M* = 5.44; *SD* = 0.36) reported a significantly lower peak of acceleration than the healthy participants (adjusted *M* = 8.28; *SD* = 0.31). The main effect of *Time* [*F*(1, 29) = 0.76; *p* = 0.39; partial *η*<sup>2</sup> = 0.026] and the *Time × Group* interaction [*F*(1, 29) = 0.6; *p* = 0.44; partial *η*<sup>2</sup> = 0.02] were again confirmed as not significant.

*Mean deceleration*: The main effect of *Group* remained significant [*F*(1, 29) = 8.64; *p* = 0.006; partial *η*<sup>2</sup> = 0.23], since PD patients (adjusted *M* = −1.11; *SD* = 0.13) reported a significantly lower mean deceleration than the healthy participants (adjusted *M* = −1.67; *SD* = 0.11). The main effect of *Time* [*F*(1, 29) = 0.15; *p* = 0.69; partial *η*<sup>2</sup> = 0.005] was no longer significant. Interestingly, the *Time × Group* interaction was confirmed as significant [*F*(1, 29) = 7.47; *p* = 0.011; partial *η*<sup>2</sup> = 0.2]: indeed, the healthy group reported a significantly higher mean deceleration in the post-tool-use condition (adjusted *M* = −1.8; *SD* = 0.12) than in the pre-tool-use condition (adjusted *M* = −1.54; *SD* = 0.1) [*p* < 0.001], while no difference emerged in the performance of PD patients (pre-tool use adjusted *M* = −1.113; *SD* = 0.12; post-tool use adjusted *M* = −1.113; *SD* = 0.14) [*p* = 0.88]; moreover, the two groups were significantly different in both the pre-tool-use (*p* = 0.026) and the post-tool-use (*p* = 0.002) conditions.

*Peak of deceleration*: The main effect of *Group* remained significant [*F*(1, 29) = 15.83; *p* < 0.001; partial *η*<sup>2</sup> = 0.35], since PD patients (adjusted *M* = −3.22; *SD* = 0.21) reported a significantly lower peak of deceleration than the healthy participants (adjusted *M* = −4.47; *SD* = 0.18). The main effect of *Time* [*F*(1, 29) = 0.049; *p* = 0.82; partial *η*<sup>2</sup> = 0.002] was no longer significant, and the *Group × Time* interaction was confirmed as not significant [*F*(1, 29) = 1.19; *p* = 0.28; partial *η*<sup>2</sup> = 0.04].

### The Role of Clinical Characteristics on the Performance in the Pointing Movement Task

For the PD group only, we studied the relationship between the spatio-temporal parameters and the clinical characteristics of Duration of Disease and UPDRS motor score. The results, reported in **Table 2**, indicated the absence of any significant relationship, suggesting that motor performance was not related to the considered clinical characteristics. Moreover, no difference emerged between the PD patients with a left lateralization of symptoms and those with a right lateralization of symptoms [*p* ≥ 0.081].

### Tactile Estimation Task

Neither a main effect of *Group* (PD group *M* = 0.58; *SD* = 0.44; controls *M* = 1.38; *SD* = 0.38) [*F*(1, 29) = 1.86; *p* = 0.18; partial *η*<sup>2</sup> = 0.06] nor an effect of *Time* (*pre*-tool-use session *M* = 1.12; *SD* = 0.26; *tool-use* session *M* = 0.97; *SD* = 0.31; *post*-tool-use session *M* = 0.87; *SD* = 0.33) [*F*(2, 58) = 1.19; *p* = 0.31; partial *η*<sup>2</sup> = 0.039] was found. Moreover, no significant *Time × Group* interaction [*F*(2, 58) = 1.92; *p* = 0.15; partial *η*<sup>2</sup> = 0.062] emerged from the analyses. Given this pattern of results, no further analyses were conducted to control for the effect of *Age*.

### The Role of Clinical Characteristics on the Performance in the Tactile Estimation Task

For the PD group only, we studied the relationship between the error reported in the experimental conditions of the Tactile Estimation Task and the clinical characteristics of *Duration of Disease* and *UPDRS motor score*. The results indicated the absence of any significant relationship, suggesting that the tactile estimation judgment was not related to the considered clinical characteristics. Specifically, considering the *Duration of Disease* in years, the relationship was not significant with the error reported in the *pre*-tool-use session [*ρ*(14) = −0.98; *p* = 0.73], in the *tool-use* session [*ρ*(14) = −0.98; *p* = 0.73], and in the *post*-tool-use session [*ρ*(14) = 0.056; *p* = 0.82]. Regarding the *UPDRS motor score*, no significant relationship emerged with the error reported after the *pre*-tool-use session [*ρ*(14) = 0.86; *p* = 0.77], after the *tool-use* session [*ρ*(14) = 0.22; *p* = 0.44], and after the *post*-tool-use session [*ρ*(14) = 0.2; *p* = 0.47]. No difference emerged in the error after the *pre*-tool-use session [*U* = 23; *p* = 0.89], after the *tool-use* session [*U* = 24; *p* = 1], and after the *post*-tool-use session [*U* = 23; *p* = 1] between the PD patients with a left lateralization of symptoms (*pre*-tool-use session *M* = 0.6, *SD* = 0.71; *tool-use* session *M* = 0.48, *SD* = 0.75; *post*-tool-use session *M* = 0.58, *SD* = 0.77) and those with a right lateralization (*pre*-tool-use session *M* = 0.67, *SD* = 0.49; *tool-use* session *M* = 0.45, *SD* = 0.52; *post*-tool-use session *M* = 0.73, *SD* = 0.49).

In summary, in all considered spatial-temporal parameters, PD patients were significantly slower than the healthy individuals, as expected (Abbruzzese and Berardelli, 2003); this pattern also emerged when *Age* was taken into account in the analyses. In almost all spatial-temporal parameters (*Movement Time*, *Peak of velocity*, *Mean acceleration*, *Mean deceleration*, and *Peak of deceleration*), a significant difference between the *pre*-tool-use condition (i.e., the baseline) and the *post*-tool-use condition emerged, suggesting an effect of tool-use training on motor behavior. Interestingly, while the healthy individuals reported higher values of *mean velocity* and higher values of *deceleration* after the *tool-use* training, suggesting that the tool was embodied (Cardinali et al., 2009), such a difference did not emerge in the individuals with PD; this pattern of behavior also emerged when *Age* was taken into account in the analyses, suggesting that the difference in tool embodiment was not explained by age-related effects (Costello et al., 2015; Costello and Bloesch, 2017). On the other hand, tool use did not affect the tactile perceived length of the forearm, as suggested by the results in the Tactile Estimation Task.

TABLE 2 | Correlational analyses between the clinical characteristics of Duration of Disease and UPDRS motor score and the spatial-temporal parameters of the performance of PD patients.

*n = 14.*

### DISCUSSION

In this experimental study, we sought to investigate whether a tool can be efficiently embodied in the body representation, affecting action, of individuals with a diagnosis of PD. According to our results, no changes in spatial-temporal parameters were observed in individuals affected by PD after tool-use training, mirroring the absence of an effective tool embodiment into body representation. In contrast, healthy controls showed changes in velocity components, and specifically in the parameter of deceleration, that is, the phase in which individuals approach the target after having reached the peak velocity of their movement. This modification might reflect a change in the movement trajectory, as suggested by the changes observed in temporal parameters relative to the amount of time to perform the going movement as well as in the mean velocity parameter. Critically, such a difference did not emerge in the PD patients.

As we have reported in the Introduction, tool embodiment allows investigating the peculiar characteristic of plasticity in body schema (Giummarra et al., 2008; Cardinali et al., 2009; Martel et al., 2016): a tool can be efficiently embodied in body schema, since it is an adaptable and plastic body representation. Multiple pieces of evidence indicate that body schema is altered in different pathological conditions (Berlucchi and Aglioti, 1997; Gallagher, 2001; Haggard and Wolpert, 2005) because of the influence of aberrant peripheral input, such as in the case of pain (Schwoebel et al., 2001) or hemiplegia (Garbarini et al., 2015). PD is a disease characterized by a multitude of sensory and motor symptoms, which mostly affect the body and action (Bereczki, 2010; Moustafa et al., 2016). We hypothesize that experiencing motor symptoms (such as tremor, bradykinesia, or rigidity) as well as sensory symptoms (such as pain or numbness of body parts) might alter body schema representation and specifically its plasticity. Indeed, the brain processes somatosensory and motor information, which could be altered in PD, to build the complex body representation. To our knowledge, no previous study has investigated body schema in PD; thus, our hypothesis needs to be further explored and supported by future research. For example, it would be very interesting to observe which motor or sensory symptom might have a large impact on body representation. In our sample, we did not find any relationship between motor performance and clinical characteristics measured by the UPDRS (Fahn and Elton, 1987), which is the most widely used clinical rating scale for PD in clinical and research settings; moreover, no difference emerged in terms of which body side was most affected by PD symptoms and signs.

Another possible explanation of this result can be traced to the description of body representation: it is grounded in the integration of multiple sensory inputs (Gallagher, 2005; Dijkerman and De Haan, 2007). Through the central mechanism of multisensory integration, the different sensory inputs are coordinated to create a unified and coherent internal representation of the external world (Stein and Meredith, 1993) and of our body (Ehrsson et al., 2012); importantly, the process of multisensory integration (and specifically of visual, tactile, and proprioceptive input) also allows tool embodiment, so that the tool becomes part of the body-part-centered representation of space (i.e., peripersonal space) and extends the reachable area (Maravita and Iriki, 2004). Thus, following this hypothesis, the cognitive process of multisensory integration should be intact for a tool to be efficiently embodied. From an anatomical point of view, the basal ganglia play a pivotal role in the multisensory integration process, and specifically in the integration of proprioceptive and visual information (Adamovich et al., 2001; Nagy et al., 2006). However, the basal ganglia are part of a network primarily affected by the degeneration of the dopaminergic neurons of the substantia nigra in PD (Blandini et al., 2000). Indeed, it is not surprising to observe difficulties in the integration of multiple and different sensory inputs in PD (Adamovich et al., 2001; Almeida et al., 2005; Fearon et al., 2015; Ding et al., 2017; Avanzino et al., 2018). For example, Ding et al. (2017) recently hypothesized that the defective integration of proprioceptive-tactile and visual input in PD might impede the emergence of the traditional body illusion of the Rubber Hand in affected individuals. Thus, the possible difficulties in tool embodiment in PD described in the present work might be due to an alteration of the multisensory integration process, given the anatomical dysfunction of the basal ganglia in PD. 
Future research needs to explore this topic, adopting more traditional methodological approaches (Calvert and Thesen, 2004; Scarpina et al., 2016) to study multisensory integration in PD. Moreover, the role of PD ideomotor slowness in this capability should be defined (Talsma et al., 2010): indeed, in our study, patients were systematically slower than controls, both in the *pre*- and *post*-tool-use session, which may have masked any change induced by tool use.

Regarding the Tactile Estimation Task, adopted in the present study to investigate modifications in the cognitive representation of the arm's length, no difference emerged between the different experimental sessions. However, this result was observed not only in PD patients but, against our hypothesis, also in the healthy controls. According to the traditional dualistic model of body representation (Gallagher, 2005; Dijkerman and De Haan, 2007), the Tactile Estimation Task refers to the component of *body image*, that is, the perceptual body representation relative to cognition and beliefs, which is not specifically involved in action and motor control (Dijkerman and De Haan, 2007). Following this hypothesis, tool use might specifically affect the body representation involved in action (i.e., *body schema*), investigated through the spatial-temporal analyses of *online* movement characteristics, but not the more stable representation of body image (Kammers et al., 2009; Cardinali et al., 2011). However, Cardinali et al. (2009) clearly reported that tool use can modify the perceived length of the arm. In their experiment, participants were asked to point toward different landmarks on the arm to study changes in perceptual body representation after tool-use training. Considering that PD individuals generally show poor accuracy in pointing movements (e.g., Flash et al., 1992; Adamovich et al., 2001; Pfann et al., 2001), this task might not be completely suitable in this clinical condition. Therefore, we adopted the Tactile Estimation Task, which allows investigating the body representation through tactile size perception (Longo, 2015), in the absence of any movement. 
Nevertheless, both tasks refer to the same mechanism: participants use the representation of their own arm when they estimate the distance between two targets (Tactile Estimation Task) or point toward a target (Cardinali et al., 2009) perceived on the skin surface; thus, in the light of the previous consideration, we would have expected to find a significant difference between the experimental conditions in the Tactile Estimation Task, at least immediately after the tool training condition. However, it should be noted that in our experiment, the participants performed a considerably lower number of movements in all experimental conditions than in the study of Cardinali et al. (2009), perhaps too few to induce a change in the very stable body representation of the body image (Longo and Haggard, 2012). Moreover, it should be noted that we adopted a very short tool, compared to what was done in previous studies (Serino et al., 2007; Cardinali et al., 2009, 2011; Sposito et al., 2012) in healthy participants; thus, even though our tool was long enough to allow pointing toward a target otherwise unreachable, affecting body schema representation in healthy individuals, it might be too short to change a stable body representation such as the body image (Farnè and Làdavas, 2000; Sposito et al., 2012). As for the nature of the Tactile Estimation Task, it is grounded in tactile perception, and specifically in secondary tactile perception, that is, the process by which extracting metric information from the skin surface requires additional computational processes beyond primary tactile perception (i.e., when an external object presses on the skin) (Dijkerman and De Haan, 2007; Spitoni et al., 2010). However, we underline that no previous study has measured the tactile threshold (Moseley, 2008) or the secondary tactile discrimination (Spitoni et al., 2010; Scarpina et al., 2014) in PD. 
Nevertheless, difficulties in sensory discrimination (Sathian et al., 1997; Nolano et al., 2008; Zambito Marsala et al., 2011) have been reported in the PD population, calling for future investigation on this topic. Finally, even though it is beyond the scope of the present manuscript, we underline that there are multiple theories about how many body representations are in the brain (De Vignemont, 2007), with consequences for the interpretation of the behavioral data. In the present work, we refer to the traditional dyadic model of *body schema/body image* (Gallagher, 2005; Dijkerman and De Haan, 2007). However, considering the other theoretical frames, we underline that the Tactile Estimation Task might be read as referring to a body structural description (*triadic taxonomy*, e.g., Longo and Haggard, 2010, 2012), and it is a task grounded in the implicit metric body representation that underlies position sense and external tactile localization (Longo, 2015, 2018).

Given the preliminary nature of this investigation, some limitations can be recognized. First of all, as previously stated, the number of movements and measurements should be increased, even though we need to deal with the negative effect of the well-known non-motor PD symptom of fatigue (Lou et al., 2001; Shulman et al., 2001). Moreover, the task was self-paced; it would be interesting to perform the tasks in different (self-paced vs. externally paced) modalities, but it should be taken into account that the overall accuracy and stability of movements can be negatively influenced by attentional processing enhanced by the presence of external cueing in PD (Almeida et al., 2005). Finally, the possible effect related to the lateralization of symptoms, meaning which body side was the most affected by the disease, in relation to hand dominance, as well as the role of cognitive difficulties in PD (Litvan et al., 2011) and specifically in cognitive estimation (D'Aniello et al., 2015a,b; Scarpina et al., 2017) should be considered. Future research needs to overcome these limitations, where possible.

This study suggests a novel way to conceive PD sensory-motor signs and symptoms: the disease might affect tool embodiment in the cognitive body representation, as a possible secondary effect of altered plasticity of the body schema due to the sensory and motor symptoms, or of an altered multisensory integration process due to the degeneration of dopaminergic neurons in the basal ganglia. Tool embodiment in body representation can extend an individual's potential for action; however, if deficient, it might have remarkable consequences and implications (Giummarra et al., 2008; Mayer et al., 2008; Pazzaglia et al., 2013) for motor behavior, specifically in clinical conditions like PD, in which the body and action are primarily affected by symptoms.

### AUTHOR CONTRIBUTIONS

FS conceived the study, collected data, performed the analyses and wrote the main manuscript. NC collected data and performed the kinematic analyses. VC and MG supervised the kinematic analyses. LP recruited patients and performed the neurological examination. AM supervised the recruitment and the neurological examination. All authors reviewed the manuscript.

### FUNDING

The authors declare no competing financial interests. This study was supported by Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) project "Dipartimenti di Eccellenza 2018–2022" of the Department of Neuroscience "Rita Levi Montalcini" University of Turin, Italy.

### ACKNOWLEDGMENTS

The authors would like to thank Erica Scerbo and Ilenia Quartini for their support in recruiting healthy individuals and Giuditta Andreoletti for her support in analyzing the kinematic data.

### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Scarpina, Cau, Cimolin, Galli, Priano and Mauro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The Impact of Human–Robot Synchronization on Anthropomorphization

Saskia Heijnen<sup>1,2</sup>\*, Roy de Kleijn<sup>1,2</sup> and Bernhard Hommel<sup>1,2</sup>

<sup>1</sup> Cognitive Psychology Unit, Leiden University, Leiden, Netherlands, <sup>2</sup> Leiden Institute for Brain and Cognition, Leiden, Netherlands

To elucidate the working mechanism behind anthropomorphism, this study investigated whether human participants would anthropomorphize a robot more if they move synchronously versus non-synchronously with it, and whether this is affected by which of the two initiates the movements. We tested two competing hypotheses. The feature-overlap hypothesis predicts that moving in synchrony would increase perceived self-other feature overlap, which in turn might spread activation to codes of features related to humans—which should increase anthropomorphization. In contrast, the autonomy hypothesis predicts that unpredictability increases anthropomorphization, and thus that whenever the robot initiates movements, or when the human initiates movements to which the robot moves non-synchronously, there is an increased perception of the robot as a more human-like, intentionally acting creature, which in turn should increase anthropomorphization. We performed a study with synchrony as within-subjects factor, and initiator (robot or human) as between-subjects factor. To study the impact of synchrony on self-other overlap and perception of human likeness, participants completed two tasks that served as implicit measures of state anthropomorphization, and two questionnaires that served as explicit measures of state anthropomorphization toward the robot. The two implicit measures were the joint Simon task and one-shot Dictator Game. Additionally, participants filled in a trait anthropomorphization questionnaire, to enable correction for baseline tendencies to anthropomorphize. The synchrony manipulation did not affect the joint Simon effect, although there was an effect on average reaction time (RT), where in the group in which the robot initiated the movement, RTs were slower when the human and robot moved non-synchronously. The Dictator Game offer and the state anthropomorphization questionnaires were not affected by the synchrony manipulation. 
There was, however, a positive correlation between current anthropomorphization of the robot and amount of money offered to it. Given that most measures were not systematically affected by our manipulation, it appears that either our design was suboptimal, or that synchronization does not affect the anthropomorphization of a robot.

Keywords: anthropomorphization, robot, synchrony, Simon task, Dictator Game, imitation, agency, self-other overlap

Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Barbara C. N. Müller, Radboud University Nijmegen, Netherlands Bruno Lara, Universidad Autónoma del Estado de Morelos, Mexico

> \*Correspondence: Saskia Heijnen s.heijnen@fsw.leidenuniv.nl

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 19 June 2018 Accepted: 04 December 2018 Published: 08 January 2019

#### Citation:

Heijnen S, de Kleijn R and Hommel B (2019) The Impact of Human–Robot Synchronization on Anthropomorphization. Front. Psychol. 9:2607. doi: 10.3389/fpsyg.2018.02607

### INTRODUCTION


Anthropomorphization is commonly defined as the attribution of human mental states and characteristics to non-human animals and objects. There are two components to anthropomorphization: attributing human physical features to non-human animals and objects (e.g., seeing a face in the clouds), and attributing a human mind to non-human animals and objects (Waytz et al., 2010). This attribution encompasses not only emotions (e.g., my cat is grumpy), but also higher-order mental states such as intentions, desires, self-reflection, consciousness, and agency (e.g., my cat is proud of the jump he just made). Anthropomorphization has a strong impact on how humans perceive, appreciate, and interact with artificial systems and robots in particular. Ample evidence suggests that a greater tendency to anthropomorphize, be it due to individual traits or situational factors, increases the acceptability of robots and the degree to which people enjoy interacting with and trust robots (for reviews, see Hancock et al., 2011; Fink, 2012; Zlotowski, 2015).

The present study investigated the effect of synchronous movement on the anthropomorphization of robots. In humans, moving in synchrony has been shown to have a strong impact on the interpersonal relationship and self-other representation. For instance, individuals who synchronized their behavior felt more connected and thought the other was more similar to themselves (Valdesolo et al., 2010). Synchronized behavior also led to increased similarity ratings, compassion, and a higher tendency to display altruistic behavior by helping the person that had been synchronized with (Valdesolo and DeSteno, 2011). We consider the possibility that synchronized movement might not only affect the relationship and perceived similarity between two humans but also between a human and a robot. As we will explain more elaborately below, this might depend on the overlap of the representations of the self and the other. That such representations can be extended to non-biological objects is consistent with findings that non-biological objects can become part of one's own body representation: Ma and Hommel (2015) have demonstrated ownership illusions for a balloon and a rectangle in the condition in which the object moved in synchrony with the participant's hand.

We hypothesized that synchronous movement could affect anthropomorphization of a robot in two opposing ways. On the one hand, synchrony may increase the perceived similarity between human and robot, because moving in the same way would be an event feature that human and robot share, and this might increase anthropomorphization (the feature overlap hypothesis). On the other hand, one may also argue that non-synchronous behavior of a robot increases the perception of its autonomy. Because perceived autonomy (or agency) may contribute to anthropomorphization, non-synchronous behavior may lead to stronger anthropomorphization (the autonomy hypothesis).

### The Feature Overlap Hypothesis

The first hypothesis is derived from the Theory of Event Coding (TEC: Hommel et al., 2001), which assumes that the same codes are used to represent perception and action features (Prinz, 1990). Thus, watching someone ride a bike involves the activation of codes that largely overlap with those activated by actually riding a bike oneself. Events are thus represented by networks of feature codes referring to the perceptual and action-related aspects of the event, weighted by the contextual relevance of the involved feature dimensions (Memelink and Hommel, 2013). Two implications of this approach are important for our hypothesis. First, the activation of features follows a pattern-completion logic: if one code of an event representation is activated, activation will spread to the remaining members of the representational code network, so that seeing a bike wheel will not only activate the feature bike wheel but will also spread to the codes representing bike, chain, saddle, and pedal, whether these are currently visible or not. Second, TEC does not distinguish between social and non-social events (Hommel et al., 2009), suggesting that it can be applied to humans and non-humans alike.

Combining these two implications allows us to derive a straightforward prediction with respect to the possible impact of synchronous movement. If a human participant and a robot are instructed to move synchronously, as compared to non-synchronously, they share a salient, task-relevant feature. This would render the self-representation of the human and his/her representation of the robot more similar, which should reduce self-robot discriminability. Reducing the discriminability between the representations of two events is likely to allow for feature migration from one representation to the other. For simple objects, this has been first demonstrated by Treisman and Gelade (1980), who found that distracting attention increases the probability of attributing the features of one object to another, simultaneously visible object. Extending the logic of this approach to social situations, Ma et al. (2016) have shown evidence of feature migration from a virtual face that moved in synchrony with the movements of a human participant: in contrast to a condition with non-synchronous movements, the synchrony condition led to more positive mood and better performance in a mood-sensitive creativity task when the avatar started smiling—suggesting that the avatar's mood migrated to the participant. If we assume that feature migration goes both ways—i.e., features of the other may affect features I associate with myself; features I associate with myself may affect features I associate with the other—it is possible that a synchronously moving robot leads human participants to attribute more human features to the robot, which in turn should lead to stronger anthropomorphization.

Empirical support for this consideration can be found in Morewedge et al. (2007), who demonstrated that when an animal, a robot, or an animated blob move at a speed that is closer to the average human speed of moving, it is anthropomorphized more. Along the same lines, a gender-neutral robot talking in a human-like voice (but not one talking in a robot-like voice) was anthropomorphized more when the gender of the participant matched the gender of the voice (Eyssel et al., 2012). Given that in-group members are seen to overlap with the self more than out-group members (Tropp and Wright, 2001), it is interesting to note another study which showed that a robot was anthropomorphized and liked more when it was presented as in-group, as compared to out-group (Kuchenbrandt et al., 2013). Participants were primed with either a picture of the robot or a picture of a computer, and then had to indicate whether a target word was a primary (e.g., happy) or secondary (e.g., hopeful) emotion, or no emotion at all. When the participants were told that they were in the same group as the robot, being primed with the robot coincided with quicker responses to secondary emotions, than being primed with the computer did. Given that secondary emotions are considered exclusively human (Leyens et al., 2000, 2001), the authors interpreted this as meaning that the in-group robot activated the concept "human."

### The Autonomy Hypothesis

fpsyg-09-02607 December 26, 2018 Time: 19:0 # 3

In their three-factor theory of anthropomorphization, Epley et al. (2007) suggest that unpredictability leads to increased anthropomorphization of a target. Humans generally like to interact effectively with the environment, and this is easier when the environment is predictable. The authors suggest that by attributing human characteristics, such as goals and intentions, to the unpredictably behaving non-human target, people become better able to predict its behavior, and thus resolve the tension between the desire to predict and the actual unpredictability. From that perspective, one might argue that the tendency to anthropomorphize should increase with the degree of unpredictability of another agent. Indeed, Epley et al. (2007) suggest that unpredictability of another agent induces the impression that this agent is more autonomous (a feature that characterizes humans), which in turn should facilitate the attribution of other human characteristics to the agent. This implies that a robot that moves non-synchronously with a human participant, or that initiates unpredictable movements, should elicit a stronger tendency to anthropomorphize than a robot that moves synchronously with the human.

### The Current Study

To test the feature-overlap hypothesis against the autonomy hypothesis (for the first time, to the best of our knowledge), we exposed human participants to a robot with whom the participants were to interact. This interaction entailed making head movements that were the same as or different from the action partner's before a computer task that required head movements for responses—thus rendering head movements task-relevant. Three kinds of dependent measures were taken to assess various aspects and implications of anthropomorphization.

First, we used the joint Simon task as a measure of spatial self-other discrimination, and thereby as an implicit measure of anthropomorphization. In the regular Simon task (Craft and Simon, 1970; Simon, 1990), one person responds to the identity of one of two different stimuli with a left or right button press on each trial. The stimuli are randomly presented either to the left or the right of a fixation cross, which consistently yields faster and more correct responses if the location of the stimulus and location of the response correspond (the congruent trials, i.e., the stimulus is presented on the right and the correct response is the right-hand button) than if they do not correspond (the incongruent trials). This difference is called the Simon effect. Interestingly, the effect is also obtained if only one of the two keys is operated by the participant while the other is operated by another agent, whether this is another human being (Sebanz et al., 2003), a wooden hand or a Japanese waving cat (Dolk et al., 2013; Stenzel and Liepelt, 2016). In this version, called the 'joint Simon task,' the task is essentially a go/no-go task, requiring a response only when one of the two stimuli appears. The congruency effect in this paradigm is called the 'joint Simon effect.' Importantly for our purposes, the joint Simon effect was also obtained in a study where a human participant worked side-by-side with a robot (Stenzel et al., 2012), and the effect was larger when participants were told that the robot was programmed in a "biologically inspired, autonomous way" than when they were told that it was programmed in a "purely deterministic way." Another recent study found a joint Simon effect in virtual reality both when the co-actor was a human hand and when it was a robotic hand (Bunlon et al., 2018). Stenzel et al. (2013) similarly found a joint Simon effect for a robotic co-actor, but failed to find a relationship between the size of the effect and explicit self-other inclusion, as measured by asking participants which of six images ranging from widely separated to highly overlapping circles best described the relation between the participant and the robot (the Inclusion of the Other in the Self scale, IOS). The latter is striking from the point of view of the feature overlap hypothesis, though it might be accounted for by the fact that there was no manipulation of self-other similarity. Likewise, Wen and Hsieh (2015) have found a joint Simon effect in participants undergoing fMRI who believed they were performing the task together with a robot, although they found reduced activation in areas associated with thinking about beliefs and intentions of others when compared to neuronal activation of participants who believed they were performing the task with another human. Here too, a manipulation of self-other similarity may have made a difference.
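To make the congruency measure concrete, the following is a minimal sketch of how a (joint) Simon effect could be computed from go-trial reaction times. The record layout, the function name `simon_effect`, and the reaction times are illustrative assumptions and are not taken from the study.

```python
from statistics import mean


def simon_effect(trials):
    """Return mean RT on incongruent trials minus mean RT on congruent trials.

    `trials` holds (stimulus_side, response_side, rt_ms) tuples for go trials
    only. This record layout is an assumption made for illustration.
    """
    congruent = [rt for stim, resp, rt in trials if stim == resp]
    incongruent = [rt for stim, resp, rt in trials if stim != resp]
    return mean(incongruent) - mean(congruent)


# Made-up reaction times (ms) for an agent who always responds to the left.
example_trials = [
    ("left", "left", 410.0),    # congruent: stimulus on the response side
    ("left", "left", 430.0),
    ("right", "left", 455.0),   # incongruent: stimulus on the other side
    ("right", "left", 465.0),
]

effect_ms = simon_effect(example_trials)
print(effect_ms)  # 40.0
```

A positive value means slower responses when stimulus and response locations do not correspond, which is the pattern described above.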

In line with Dolk et al. (2014), we interpret the joint Simon effect as the degree to which the presence of another agent is considered in (i.e., related to) one's own representation of the task. Since this makes it more difficult to determine whose turn it is on a given trial, the more the other is considered in one's own task-representation, the greater is the need to distinguish between oneself and the other. An obvious way to distinguish oneself from the other is via location, which makes location task-relevant. This increases attention to location, the feature that produces the Simon effect. Greater self-other similarity, and the subsequent greater reliance on location information that is required to deal with this similarity in order to perform the task, leads to a more pronounced Simon effect. Hence, a larger joint Simon effect indicates larger self-other similarity, which, according to the feature-overlap hypothesis, is grounds for migration of self-related features to the other, resulting in increased anthropomorphization. However, in a task that requires two agents to take turns, greater self-other similarity might impair response selection even independently from the Simon effect proper. If so, one would not (or not only) expect synchrony between human and robot to increase the size of the Simon effect; it may also affect reaction time (RT) in general.

Second, we used the Dictator Game to assess altruism, and thereby as an implicit measure of anthropomorphization. Originally a method in experimental economics (Kahneman et al., 1986), the Dictator Game is often used to study fairness, rejection, and altruism, among other things (List, 2007), although it should be noted that other factors such as experimental demand characteristics and social norms play a role as well (Bardsley, 2008). In the Dictator Game, one person is the "dictator" who decides how a given amount of money will be distributed between him- or herself and another player. The other player has no choice but to accept the proposed distribution, hence the term "dictator" to characterize the former player. Since the human and robot were performing the Simon task together, and no competitive elements were present or highlighted, we expected that the participant would consider the robot as a collaborator. Ben-Ner and Kramer (2011) have shown that collaborators are given higher stakes than neutral and competitive opponents, so we expected that more anthropomorphization of the robot would go along with more money given to it. One might object that giving money to a robot could seem counterintuitive (after all, what is it going to use it for?), but previous studies suggest that people are not entirely reluctant to give money to robots (Torta et al., 2013; de Kleijn et al., 2019). Given that synchronization promotes altruism (Valdesolo and DeSteno, 2011), we predicted that synchronized movement with a robot would lead to more money given to it in a one-shot Dictator Game.

Third, three questionnaires were used, two to assess state anthropomorphization (e.g., "Overall, do you believe QBo is capable of having intentions?"; Kozak et al., 2006; Epley et al., 2007; Torta et al., 2013) and one designed to assess trait anthropomorphization (e.g., "To what extent does the average reptile have consciousness?"; Waytz et al., 2010). The state anthropomorphization questionnaires served as explicit measures of anthropomorphization toward the robot, whereas the joint Simon task and Dictator game served as implicit measures of anthropomorphization toward the robot.

In sum, we tested how human participants would be affected by synchronously and non-synchronously moving with a robot in terms of explicit anthropomorphization and implicit measures that would be expected to relate to the degree of anthropomorphization. We distinguished between explicit and implicit measures due to evidence that these may diverge (Kim and Sundar, 2012). Based on the feature-overlap hypothesis, we expected that synchronous movement, compared to non-synchronous movement, would result in a larger joint Simon effect, higher stakes offered in the Dictator Game, and higher state anthropomorphization scores. In contrast, the autonomy hypothesis would predict that synchronous and/or predictable movement (i.e., the robot synchronizing with human-initiated movement) should lead to lower anthropomorphization scores, a smaller joint Simon effect, and lower stakes in the Dictator Game, as compared to non-synchronous and/or unpredictable movement (i.e., the robot moving differently compared to the human-initiated movement, or the robot initiating movements).

### MATERIALS AND METHODS

### Participants

An a priori power analysis using G*Power 3.1.9.2 (Faul et al., 2007) indicated a required sample size of 52 participants, based on an expected effect size of d = 0.4, informed by an informal review of the literature. Fifty-four participants were recruited (35 female), most of whom (36) were Leiden University students. They were recruited through advertisements, word of mouth, and via e-mail invitations. One participant was excluded from analysis due to evident failure to understand the instructions. The mean age was 23.3 years (total range: 19–30). Inclusion criteria were: healthy adults between 18 and 30 years of age with normal or corrected-to-normal vision. Exclusion criteria were: autism spectrum disorder and the use of psychoactive medication. The study was approved by the Leiden University Psychology Research Ethics Committee. All participants gave written informed consent before participation, following the Declaration of Helsinki, and were given monetary compensation for their time and efforts.
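As a rough plausibility check on the order of magnitude of such a sample size, the sketch below applies the textbook normal approximation for a two-sided within-subject comparison. The test family, alpha = 0.05, and power = 0.80 are assumptions, since the text reports only the effect size and the resulting N, and the function name is hypothetical. The approximation omits the t-distribution correction applied by exact methods, so it comes out slightly below what such methods return.

```python
from math import ceil
from statistics import NormalDist


def approx_n_within(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample/paired test.

    alpha and power are assumed defaults, not values reported in the text.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value, two-sided
    z_power = z.inv_cdf(power)            # quantile for the target power
    return ceil(((z_alpha + z_power) / d) ** 2)


print(approx_n_within(0.4))  # 50
```

Under these assumed settings the approximation lands in the same range as the reported requirement of 52 participants.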

### Manipulation

All participants completed two sessions, during one of which they moved in synchrony with the robot, i.e., mirroring movements, while in the other (order counterbalanced), participants and robot moved non-synchronously, i.e., avoiding mirroring or copying the other's movements. For half of the participants, the robot was the initiator of the movements in both sessions, with the participant as the follower. The other half of the participants were the initiator themselves in both sessions, with the robot as the follower. This distinction was made because it was thought that there may be differential effects depending on who initiates the movement. No specific direction was predicted. The design resulted in four scenarios: (I) human initiator, synchronous condition; (II) human initiator, non-synchronous condition; (III) robot initiator, synchronous condition; and (IV) robot initiator, non-synchronous condition.

In scenario (I), the participant was instructed to start making movements with his or her head, left and right at various speeds and to various degrees, which the robot would then copy. This copying was accomplished by use of a motion tracker sewn onto a cap that the participant wore throughout the session, which communicated with the computer that controlled the robot's movement. In scenario (II), the participant was instructed to make any of those movements with his/her head, and was told the robot would avoid copying the movements. In scenario (III), the participant was instructed to copy exactly the head movements that the robot made. In scenario (IV), the participant was instructed to avoid moving his/her head in the exact way the robot was at the time the robot was making the movement. The robot's head movements in these latter three scenarios were randomly generated. It was stressed that in the non-synchronous condition, participants should not make the exact opposite of the robot's movements, as that is really just like copying. Participants could thus freely move to the opposing or same direction, as long as they moved with a different speed and/or to a different angle compared to the robot at any specific point in time. See **Figure 1** for a sketch of the manipulation. Participants either went through scenarios (I) and (II) (those in the human initiator condition), or they went through (III) and (IV) (those in the robot initiator condition), order counterbalanced.

### Measurements


#### Joint Simon Task

The task was presented on a 21-inch monitor. Each trial started with a fixation cross presented for 500 ms. After this, a blue or a red solid square was presented at either the left or the right of a fixation cross until a response was recorded. Depending on the session's instructions either the robot or the participant had to respond to the stimulus by turning their head to either the left or the right (see **Figure 2**). Color and side were counterbalanced between participants, so where one participant may have received the instructions to respond to red squares with a head turn to the right, another may have received instructions to respond to red squares with a head turn to the left, and yet another to respond to blue squares with a head turn to the left. Participants were always informed that the robot would respond to the other color, and with a head motion to the opposite side. Following the response, the next trial was initiated. Participants wore an InterSense InertiaCube4 motion tracker stitched to a cap on their head, which recorded the response onset (i.e., head turn to the left or right) in relation to the stimulus onset in milliseconds, which was used as RT measurement. Participants first completed a practice block of 8 trials, followed by 4 blocks of 64 trials, which made for 256 recorded trials in total.
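The block structure described above (4 blocks of 64 trials) can be sketched as a trial-list generator. The text does not state how colors and sides were distributed within a block, so the equal number of trials per color-by-side cell used here is an assumption, as are all function and variable names.

```python
import random
from itertools import product

COLORS = ("red", "blue")
SIDES = ("left", "right")


def build_block(trials_per_block=64, rng=None):
    """Return one shuffled block with equal counts of each color x side cell.

    Balancing the cells within a block is an assumption for illustration.
    """
    rng = rng or random.Random()
    cells = list(product(COLORS, SIDES))            # 4 stimulus combinations
    repeats, remainder = divmod(trials_per_block, len(cells))
    if remainder:
        raise ValueError("block size must be a multiple of 4")
    block = cells * repeats
    rng.shuffle(block)
    return block


def build_session(n_blocks=4, trials_per_block=64, seed=None):
    """Return a list of blocks; the seed only makes the shuffle repeatable."""
    rng = random.Random(seed)
    return [build_block(trials_per_block, rng) for _ in range(n_blocks)]


session = build_session(seed=1)
total_trials = sum(len(block) for block in session)
print(total_trials)  # 256
```

In a joint go/no-go setup, each agent would respond only to trials showing its assigned color, so roughly half of the generated trials are go trials for the participant.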

#### Dictator Game

After the joint Simon Task, participants performed a one-shot Dictator Game with the robot as the opponent. They were presented with a stake, which could be 2, 5, 8, 10, or 20 EUR (randomly drawn each session), and they were asked to decide how much of this stake, if any, they would want to give to the robot. The stakes were varied to control for size, and the outcome measure was the proportion of the stake participants were willing to give to the robot.
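The outcome coding can be illustrated with a short sketch: dividing the offer by the stake puts offers made under different stake sizes on a common 0 to 1 scale. The function name and the example offers are hypothetical; only the list of stakes comes from the text.

```python
STAKES_EUR = (2, 5, 8, 10, 20)   # stake sizes listed in the text


def offer_proportion(stake_eur, offer_eur):
    """Return the share of the stake given to the robot (0.0 to 1.0)."""
    if stake_eur not in STAKES_EUR:
        raise ValueError("unknown stake size")
    if not 0 <= offer_eur <= stake_eur:
        raise ValueError("offer must lie between 0 and the stake")
    return offer_eur / stake_eur


# Made-up offers: different absolute amounts, identical proportions.
print(offer_proportion(20, 5))  # 0.25
print(offer_proportion(8, 2))   # 0.25
```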

#### Questionnaires

At the end of both sessions, participants were asked to fill out three questionnaires: the Individual Differences in Anthropomorphism Questionnaire (IDAQ; Waytz et al., 2010) to assess trait anthropomorphization; the Mind Attribution Scale (MAS, Kozak et al., 2006) to assess state anthropomorphization; and another state anthropomorphization scale taken from Torta et al. (2013), which is based on Epley et al. (2007; henceforth: Torta state questionnaire). The IDAQ trait anthropomorphization questionnaire inquires into general tendency to anthropomorphize, with questions such as "To what extent does a car have free will?". We expected that people with a high, compared to low, tendency to anthropomorphize would show a larger joint Simon effect and offer more money in the Dictator Game, hence we wanted to be able to control for its effects. The state anthropomorphization questionnaires assess anthropomorphization toward something recently interacted with, and were modified to inquire about the robot, TheCorpora's QBo, rather than "your opponent" or "this person," i.e., "Overall, do you believe the opponents you have encountered have free will" became "Overall, do you believe QBo has free will" in the Torta state questionnaire; and "This person has complex feelings" became "QBo has complex feelings" in the MAS. All questionnaires were answered on a 7-point Likert scale. The IDAQ is originally rated on a 10-point Likert scale, but to increase consistency between the questionnaires, and because there is evidence that there is not much difference in answers to Likert scales of seven or more options (Cox III, 1980; Weng, 2004; Dawes, 2008), the response options were reduced to seven. Participants completed all questionnaires in both sessions. After the second session, participants answered five open questions that would give us insight into their experience. All questionnaires can be found in **Appendix A**.

#### FIGURE 1 | Continued

initiator, non-synchronous condition. The participant initiates a large head movement to the left, the robot does not mirror this, but makes a small head movement to the right. (C) Robot initiator, synchronous condition. The robot initiates a large head movement to the right, with which the participant synchronizes with a minimal delay by making a large head movement to the left. (D) Robot initiator, non-synchronous condition. The robot initiates a large head movement to the right, the participant avoids mirroring this by making a small head movement to the right.

### The Robot

TheCorpora's QBo was used, which is a small, semi-humanoid robot, 45.6 cm high, 31.4 cm wide and 29.25 cm deep, with a curved trunk and round head. It has two large wheels on its sides and one small wheel on the front (reminiscent of a vacuum cleaner), no limbs, but it does have a head that can move in all directions (see **Figure 3**). The head has two webcams for eyes, an LED light for a nose, and 20 LED lights for a mouth. QBo is mostly white, with elements of green. In scenarios (II), (III), and (IV), the robot received instructions from the computer controlling it for randomly determined head movements. The data from the participant's motion tracker were disregarded in these scenarios. For scenario (I), the computer controlling the robot received input from the motion tracker, which was translated for the robot to mirror the motion the participant made in real time. During the joint Simon task, at the start of every trial to which the robot was to respond, a message was sent from the experiment computer (E-Prime in Windows) to the robot computer (Linux) to initiate the appropriate response shortly after stimulus onset. There was no variation in the robot's response latency, and there were no pre-programmed erroneous responses, although there was a sporadic miscommunication between the computers leading to some response omissions on the robot's end. The number of trials to which the robot failed to respond was not recorded.

### Design and Procedure

The current investigation was a two-session, 2 × 2 mixed design study, with synchrony as within-subjects factor (synchronous vs. non-synchronous) and initiator as between-subjects factor (human initiator vs. robot initiator). Participants completed two sessions of 60 min, 1 week apart. Assignment to conditions and group was performed using http://www.randomization.com/. The study was single-blind: the experiment leader was aware of the condition the participant was in. This was deemed unavoidable due to the novelty of the procedure and the necessity of the experimenter to observe the procedure to ensure it was executed correctly. This could only be accomplished by knowing which movements were required according to the current condition.

FIGURE 2 | Joint Simon task setup. In this example of the joint Simon task, the participant has to respond to red stimuli, whereas the robot has to respond to blue stimuli. The participant has to respond with a large head movement to the left; the robot has to respond with a large head movement to the right. (A) Congruent trial for the participant (stimulus on the participant's side of the screen); no-go trial for the robot. (B) Incongruent trial for the participant (stimulus on the robot's side of the screen); no-go trial for the robot. (C) Congruent trial for the robot (stimulus on robot's side of the screen); no-go trial for the participant. (D) Incongruent trial for the robot (stimulus on the participant's side of the screen); no-go trial for the participant.

At the start of each session, the participant and robot practiced both the synchronous and non-synchronous movements for half a minute, so the participant could experience what the alternative was like. Subsequently they performed the synchrony manipulation that belonged to the current session for 4 min. After every block of 64 trials in the joint Simon task, the manipulation was repeated for 2 min to ensure that the effect did not wear off.

Upon arrival in the first session, participants were informed verbally and by means of an information letter about the study they were about to take part in. After giving written informed consent, they were taken into a room with a one-way mirror, where the experiment took place. The experiment leader was stationed behind the mirror and monitored whether the robot and the program were functioning appropriately, and that the synchronization procedure was executed correctly, not explicitly observing the participant's behavior for other purposes. Participants were informed of this fact, so as to minimize any effects of observation. The remaining procedure was the same for both sessions, the only difference being the synchronization type. Participants started with a practice session of the synchronization manipulation as explained above, followed by 4 min of the manipulation. After this, they received instructions for the joint Simon task and went through an 8-trial practice block. Thereafter they started on the joint Simon task, with repeats of the manipulation after every block. After the joint Simon task, they completed the one-shot Dictator Game. Finally, they filled out the trait and state anthropomorphization questionnaires. After the second session they filled out some additional questions about their experience of the experiment, which was followed by debriefing and payment.

### RESULTS

### Data Preparation

Before analysis, the data were prepared and filtered in the following way. The Dictator Game offer was coded as a proportion of the total stake, which was used for further analyses.

Principal component analyses with direct oblimin rotation were performed on the three questionnaires to ascertain the structure of the measures. For the Torta state anthropomorphization questionnaire, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was above the recommended 0.6 with a value of 0.864, and the determinant was satisfactory as well (0.065). The communalities were good, ranging from 0.547 to 0.781. All items were significantly correlated to one another, but none of the correlations were extremely high, indicating there was no redundancy in the items (range of correlations: 0.479 to 0.766). The results indicated a one-factor solution, explaining 68.2% of the variance, for the Torta state anthropomorphization questionnaire. Hence, the ratings of each session were added to form one Torta state anthropomorphization score (one for synchronous, one for non-synchronous).

For the MAS state anthropomorphization questionnaire, the KMO was satisfactory as well with a value of 0.839, as was the determinant (0.015). The communalities were good, ranging from 0.508 to 0.758. Most items were significantly correlated (36 out of 45), with a range of 0.177 to 0.655 in correlation coefficients among the significant correlations. The results indicated a two-factor solution, explaining 60.1% of the variance, contrary to the three-factor solution suggested by the authors (Kozak et al., 2006). Factor loadings are presented in **Appendix B**. The first factor seemed to be related to ascription of phenomenal consciousness, combining all but one of the items of the emotion and cognition scales that Kozak et al. (2006) identified, whereas the second seemed to reflect ascription of agency, and included one of the cognition scale items ("QBo has a good memory") in addition to the items that Kozak et al. (2006) found to be in the intentionality scale. Based on these analyses, two MAS scores were calculated for each session: an overall MAS score (for both the synchronous and non-synchronous session), and an agency MAS score (for both the synchronous and non-synchronous session).

Waytz et al. (2010) reported a two-factor solution to best suit the IDAQ items, using both the anthropomorphization and control items in the factor analysis. In our sample, however, a two-factor model explained only 29.6% of the variation. Based on the criterion of Eigenvalue > 1, a 10-factor model emerged, with a low KMO value (0.664), low communalities (ranging from 0.007 to 0.640), and an unsatisfactory determinant (0.0000014), indicating that our sample was not large enough to support this model. It was therefore decided to use only the anthropomorphization items. Based on the criterion of Eigenvalue > 1, a four-factor model emerged explaining 63% of the total variance, one factor representing all items related to technology, one representing all items related to animals, and the items about nature distributed over two factors. A satisfactory KMO value of 0.728 and higher communalities (ranging from 0.295 to 0.829) indicated that this solution was better compared to the model using all items. Waytz et al. (2010) suggested a distinction between anthropomorphization toward animate versus inanimate targets, hence we also ran a two-factor analysis of the data. The animate versus inanimate distinction was, however, not reflected in the two-factor model, nor was any other pattern evident. This model explained only 46.3% of the variance. The KMO was satisfactory (0.728), but the communalities were lower (ranging from 0.179 to 0.744). All things considered, a three-factor model seemed to most sensibly capture the data, one factor representing items related to technology, one representing items related to animals, and one representing items related to nature, in total explaining 56.1% of the variance. KMO was satisfactory (0.728), as was the determinant (0.002), and the communalities were better than for the two-factor model (ranging from 0.293 to 0.744). Having established the structure of the questionnaire, and having assured that all items contributed to the scale, a total IDAQ score was computed for each session.

The session 1 (M = 41.7, SD = 9.6) and session 2 (M = 40.8, SD = 10.4) IDAQ trait scores were combined and averaged, assuming that the average of two moments in time of filling in a questionnaire gives a better indication of general anthropomorphization tendencies than does a single one, and the resulting score was used in further analyses. Indeed, the correlation between the two showed a good test-retest reliability [r(53) = 0.841, p < 0.001]. Please refer to **Table 1** for descriptive statistics for all measures.
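The two steps in this paragraph (averaging the two sessions and checking their agreement) can be sketched as follows. The scores are made up for illustration, the helper name `pearson_r` is hypothetical, and a Pearson coefficient is assumed because the statistic is reported as r.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up total IDAQ scores for five participants in each session.
session1 = [38, 45, 52, 30, 41]
session2 = [36, 47, 50, 33, 40]

# Per-participant trait score: the average of the two sessions.
trait_scores = [(a + b) / 2 for a, b in zip(session1, session2)]
retest_r = pearson_r(session1, session2)

print(trait_scores)        # [37.0, 46.0, 51.0, 31.5, 40.5]
print(round(retest_r, 2))  # 0.96
```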

The data of the Torta questionnaire, the joint Simon task, and the Dictator Game did not meet the assumption of normally distributed residuals. A log transformation did not sufficiently eliminate this problem. However, given that there is evidence that ANOVAs are robust against violations of this assumption (Blanca et al., 2017a,b), the planned mixed ANOVAs were performed. To determine the mean RTs, we used a recursive outlier detection method with a moving criterion (Selst and Jolicoeur, 1994), which has been shown to be relatively insensitive to sample size and skew compared to non-recursive methods (e.g., 2.5 SD from the mean) and recursive methods without moving criterion. This is appropriate due to the natural skewness of RT data.
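The idea behind recursive trimming with a moving criterion can be sketched as follows. This is a simplified illustration, not the published procedure: the cutoff function `placeholder_criterion` is a hypothetical stand-in (the original method uses a tabulated, sample-size-dependent criterion that is not reproduced here), and the function names and example RTs are assumptions.

```python
from statistics import mean, stdev


def placeholder_criterion(n):
    """Hypothetical cutoff in SD units that grows as the sample shrinks.

    This is NOT the published criterion table; it only mimics its shape.
    """
    return 3.5 + 4.0 / n


def recursive_trim(rts, criterion=placeholder_criterion, min_n=4):
    """Repeatedly drop values beyond a size-dependent SD cutoff."""
    kept = sorted(rts)
    while len(kept) > min_n:
        # Set the largest value aside so it cannot inflate the mean and SD.
        rest = kept[:-1]
        centre, spread = mean(rest), stdev(rest)
        cut = criterion(len(kept))
        low, high = centre - cut * spread, centre + cut * spread
        trimmed = [x for x in kept if low <= x <= high]
        if len(trimmed) == len(kept):
            break  # nothing removed on this pass, so stop
        kept = trimmed
    return kept


# Hypothetical reaction times (ms) with one obvious outlier.
rts = [420, 430, 440, 450, 435, 445, 425, 2000]
kept = recursive_trim(rts)
print(kept, mean(kept))
```

Because the cutoff is recomputed from the current sample size on every pass, small cells are trimmed more leniently than large ones, which is the property that makes this family of methods less sensitive to sample size.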

To determine whether IDAQ trait anthropomorphization interacted with the independent variables and thus whether it could be used as a covariate for the mixed ANOVAs on RT, Dictator Game offer, and the state anthropomorphization questionnaires, we performed linear regression analyses. We ran separate analyses per outcome variable and per timepoint (synchrony condition), and used IDAQ and the interaction between initiator and IDAQ as predictors. For each of the analyses, the interaction was significant [RTsynchronous: F(3,6036) = 134.345, p < 0.001, R<sup>2</sup> = 0.063; RTnon−synchronous: F(3,6042) = 95.722, p < 0.001, R<sup>2</sup> = 0.045; DGsynchronous: F(3,49) = 144.786, p < 0.001, R<sup>2</sup> = 0.899; DGnon−synchronous: F(3,49) = 172.051, p < 0.001, R<sup>2</sup> = 0.913; MASsynchronous: F(3,49) = 155.929, p < 0.001, R<sup>2</sup> = 0.905; MASnon−synchronous: F(3,49) = 159.343, p < 0.001, R<sup>2</sup> = 0.907; Tortasynchronous: F(3,49) = 229.920, p < 0.001, R<sup>2</sup> = 0.934; Tortanon−synchronous: F(3,49) = 191.557, p < 0.001, R<sup>2</sup> = 0.921], meaning that the intended covariate was not independent of the initiator factor and thus could not be added without violating the assumption. We therefore ran the mixed ANOVAs without IDAQ trait anthropomorphization, and added Spearman correlation analyses to inquire into the relationship between IDAQ trait anthropomorphization and the dependent variable. The correlations reported below are all Spearman's rho (ρ), since they all involve questionnaire data. Results of all mixed ANOVAs described below are displayed in **Table 2**. All mixed ANOVAs were backed up by Bayesian mixed ANOVAs, the results of which are to be found in **Table 3**.
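As a reminder of what a Spearman correlation computes, the following minimal sketch ranks both variables (averaging tied ranks) and then takes the Pearson correlation of the ranks. The scores are hypothetical and the function names are introduced here for illustration; this is not the software used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical trait and state questionnaire scores for five participants.
trait = [30, 41, 52, 38, 45]
state = [20, 39, 40, 22, 26]
rho = spearman_rho(trait, state)
print(round(rho, 3))
```

Working on ranks makes the coefficient insensitive to the non-normal, ordinal character of questionnaire totals, which is the motivation given above for preferring it.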

#### TABLE 1 | Descriptive statistics.

#### TABLE 2 | Results for all mixed ANOVAs.

#### TABLE 3 | Bayes Factors for inclusion of specified terms compared to models without those terms.

### Joint Simon Effect

We ran a mixed ANOVA on mean RT, with two within-subjects factors (synchrony: synchronous vs. non-synchronous; and congruency: congruent vs. incongruent) and one between-subjects factor (initiator: robot vs. human). There was a significant main effect of congruency [F(1,51) = 30.306, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.373], where responses on congruent trials (M = 431, SD = 8.2) were faster than on incongruent trials (M = 446, SD = 9.0), thus replicating the joint Simon effect with a robotic partner. Additionally, there was a significant synchrony ∗ initiator interaction [F(1,50) = 7.148, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.123], where those in the human initiator group had shorter RTs in the non-synchronous (M = 432, SD = 12.6) than in the synchronous condition (M = 439, SD = 11.9), whereas those in the robot initiator group had shorter RTs in the synchronous (M = 435, SD = 12.2) than in the non-synchronous condition (M = 451, SD = 12.8) (see **Figure 4**). Follow-up paired t-tests showed that this effect was driven by a significant synchrony effect in the robot initiator group [t(25) = 2.644, p = 0.014]; the difference did not reach significance in the human initiator group [t(26) = 1.212, p = 0.236]. Notably, a number of participants in the robot initiator group reported difficulty during the non-synchronous session's manipulation. They found it taxing to simultaneously monitor the robot's movements, plan their own movements, and make sure they were not the same. However, as this was spontaneous self-report, and not systematically assessed, we cannot take this into account in analyses.

To provide stronger evidence for the current results, a Bayesian mixed ANOVA with the same factors was conducted. To determine which effects are likely predictors of RT, we looked at the Bayes Factors for addition of each of the terms to a model without that specific term (Jarosz and Wiley, 2014; Wagenmakers et al., 2018), i.e., the Inclusion Bayes Factor based on matched models. The results concurred with the regular mixed ANOVA and provided very strong evidence that congruency (BFinclusion = 2843.892) and the synchrony ∗ initiator interaction (BFinclusion = 111.516) were predictors in explaining the RT data. The correlation between the congruency effect (synchronous and non-synchronous averaged) and IDAQ trait anthropomorphization was not significant [ρ(53) = 0.187, p = 0.181], indicating that there was no relationship between baseline tendency to anthropomorphize and the congruency effect. Similarly, and following up on the synchrony ∗ initiator interaction, there was no significant correlation between the synchrony effect (average synchronous RT – average non-synchronous RT) and the IDAQ trait anthropomorphization for either the human or robot initiator group [human initiator: ρ(27) = 0.237, p = 0.234; robot initiator: ρ(26) = −0.025, p = 0.904].
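How an inclusion Bayes factor based on matched models is assembled from model probabilities can be sketched as below. This is an illustration under stated assumptions: a two-factor design rather than the three-factor design analyzed here, hypothetical prior and posterior model probabilities, and a function name (`inclusion_bf_matched`) introduced only for this example. Models containing an effect, but no higher-order interaction involving it, are compared with the same models stripped of that effect.

```python
def inclusion_bf_matched(models, effect):
    """models maps frozenset(terms) -> (prior_prob, posterior_prob)."""
    parts = set(effect.split("*"))

    def is_higher_order(term):
        # True for interactions that contain the effect, e.g. "A*B" for "A".
        return term != effect and parts <= set(term.split("*"))

    pairs = []
    for terms in models:
        if effect in terms and not any(is_higher_order(t) for t in terms):
            stripped = terms - {effect}
            if stripped in models:
                pairs.append((terms, stripped))

    prior_with = sum(models[w][0] for w, _ in pairs)
    prior_without = sum(models[s][0] for _, s in pairs)
    post_with = sum(models[w][1] for w, _ in pairs)
    post_without = sum(models[s][1] for _, s in pairs)
    # Posterior inclusion odds divided by prior inclusion odds.
    return (post_with / post_without) / (prior_with / prior_without)


S, I, SI = "synchrony", "initiator", "synchrony*initiator"
# Hypothetical (prior, posterior) probabilities for each candidate model.
models = {
    frozenset(): (0.2, 0.05),
    frozenset({S}): (0.2, 0.15),
    frozenset({I}): (0.2, 0.05),
    frozenset({S, I}): (0.2, 0.15),
    frozenset({S, I, SI}): (0.2, 0.60),
}
bf_main = inclusion_bf_matched(models, S)
bf_interaction = inclusion_bf_matched(models, SI)
print(round(bf_main, 2), round(bf_interaction, 2))
```

Values above 1 favor including the term and values below 1 favor leaving it out, which is how the BFinclusion values in this section are read.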

We also looked into the correlations between the intention subscale of the MAS state anthropomorphization questionnaire and the joint Simon effect for the synchronous and non-synchronous sessions, since this subscale seems to give an indication of ascription of agency to the robot ("QBo is capable of doing things on purpose"; "QBo is capable of planned action"; "QBo has goals"; "QBo has a good memory"). There was no significant correlation [synchronous: ρ(53) = 0.121, p = 0.389; non-synchronous: ρ(53) = −0.099, p = 0.480], indicating that there was no relationship between ascription of agency and congruency effect. The same held for the correlations between the subscale of the MAS and the overall RTs, indicating that the interaction reported above was not driven by (explicit) ascription of agency.

Finally, we looked into order effects, since the nature of the initial interaction with the robot may affect the perception of the robot in the later session: it may be sticky, so to speak. To this end, we performed the mixed ANOVA on mean RT as above, with synchrony and congruency as within-subjects factors, and in addition to initiator, also order (synchronous-first vs. non-synchronous-first) as between-subjects factor. Order was involved in a significant interaction with congruency and synchrony [F(1,49) = 6.240, p = 0.016, η<sub>p</sub><sup>2</sup> = 0.113], where both order groups showed a numerical decrease in joint Simon effect in the second session, which was rather pronounced in the group that had the non-synchronous session first (session 1: M = 20.33, SD = 26.9; session 2: M = 8.50, SD = 19.3), and rather negligible in the group that had the synchronous session first (session 1: M = 17.54, SD = 23.4; session 2: M = 12.80, SD = 21.5). However, a follow-up paired t-test comparing the session 1 and session 2 joint Simon effects for both of the order groups showed that the difference was non-significant in both cases [synchronous first: t(26) = 1.463, p = 0.156; non-synchronous first: t(25) = 2.048, p = 0.051], so that we are reluctant to interpret the interaction.
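The partial eta squared values that accompany the F statistics can be recovered from F and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). A minimal sketch, with a function name chosen here for illustration, applied to two of the statistics reported in this section:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Order interaction on RT: F(1,49) = 6.240
order_effect = round(partial_eta_squared(6.240, 1, 49), 3)
# Main effect of congruency: F(1,51) = 30.306
congruency_effect = round(partial_eta_squared(30.306, 1, 51), 3)
print(order_effect, congruency_effect)
```

Such a recomputation is a quick consistency check between a reported F value, its degrees of freedom, and the accompanying effect size.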

### Dictator Game

A mixed ANOVA on Dictator Game offer was performed, with one within-subjects factor (synchrony: synchronous vs. non-synchronous) and one between-subjects factor (initiator: robot vs. human). There were no significant effects, all ps > 0.067. The correlation between IDAQ and the difference between the DG offers (synchronous – non-synchronous) was not significant [ρ(53) = −0.103, p = 0.462], indicating that there was no relationship between tendency to anthropomorphize and altruism toward the robot. To confirm that the offer did not differ as a result of our manipulation, we conducted a Bayesian mixed ANOVA. The results indicate weak support for the synchrony ∗ initiator interaction (BFinclusion = 1.187).

As with the joint Simon effect, we looked at the correlations between the intention subscale of the MAS state anthropomorphization questionnaire and the Dictator Game offer for the synchronous and non-synchronous sessions. There was no significant correlation [synchronous: ρ(53) = 0.265, p = 0.055; non-synchronous: ρ(53) = 0.223, p = 0.109], indicating that there was no relationship between ascription of agency and proportion offered in the Dictator Game.

The Dictator Game offer thus did not vary as a function of our manipulation. It did, however, correlate positively with anthropomorphization of the robot: the offer on the synchronous session correlated positively with both state anthropomorphization questionnaires [MAS: ρ(53) = 0.361, p = 0.008; Torta: ρ(53) = 0.423, p = 0.002], and the offer on the non-synchronous session correlated positively with the Torta state anthropomorphization questionnaire [ρ(53) = 0.336, p = 0.014]. There were no correlations with the MAS intention subscale, suggesting that this relationship did not depend on perceived autonomy.

Finally, we looked into order effects by running the aforementioned mixed ANOVA with synchrony as within-subjects factor, and initiator and order as between-subjects factors. Order was not a significant contributor, indicating that the Dictator Game offer was not affected by the order of the manipulation.

### Trait Anthropomorphization

The IDAQ trait anthropomorphization questionnaire showed good internal consistency (αsynchronous = 0.724; αnon−synchronous = 0.760) in addition to the aforementioned good test-retest reliability.
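Internal consistency of this kind is conventionally summarized with Cronbach's alpha, α = k/(k − 1) · (1 − Σ s<sub>i</sub><sup>2</sup> / s<sub>total</sub><sup>2</sup>), where k is the number of items. The sketch below computes it for a tiny hypothetical data set (three items, four respondents); the scores and the function name are assumptions for illustration and are unrelated to the IDAQ responses.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of per-item score lists, one score per respondent."""
    k = len(items)
    # Total scale score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three items answered by four respondents.
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Alpha rises when items covary strongly relative to their individual variances, so a value in the 0.7 to 0.8 range, as reported here, indicates that the items largely measure a common tendency.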

An independent samples t-test was run to compare the two initiator groups on IDAQ trait anthropomorphization, to make sure there were no baseline differences between the groups. The result was non-significant [Mhuman = 41.0, SDhuman = 11.1; Mrobot = 41.5, SDrobot = 8.0; t(51) = 0.180, p = 0.858], meaning there were indeed no baseline differences in terms of tendency to anthropomorphize between the two initiator groups.
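The baseline comparison can be approximately reconstructed from the summary statistics alone. The sketch below assumes a pooled-variance (Student) t-test and group sizes of 27 (human initiator) and 26 (robot initiator), which are inferred from the degrees of freedom of the paired tests reported earlier rather than stated directly; because the means and SDs are rounded, the result only approximates the reported t(51) = 0.180. The function name is introduced here for illustration.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Robot initiator group first, human initiator group second.
# Group sizes (26 and 27) are an assumption inferred from reported dfs.
t_value, df = pooled_t(41.5, 8.0, 26, 41.0, 11.1, 27)
print(round(t_value, 2), df)
```

The recomputed statistic lands close to the reported one and shares its degrees of freedom, which is what one would expect if the rounding of the group means accounts for the small difference.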

### State Anthropomorphization

The Torta state anthropomorphization questionnaire showed high internal consistency (αsynchronous = 0.787; αnon−synchronous = 0.892). A mixed ANOVA on the Torta state anthropomorphization questionnaire was performed, with one within-subjects factor (synchrony: synchronous vs. non-synchronous) and one between-subjects factor (initiator: robot vs. human). There were no significant effects, meaning that state anthropomorphization as measured by the Torta questionnaire did not differ as a result of our manipulation.

To confirm that Torta state anthropomorphization did not vary as a function of our manipulation, we ran a Bayesian mixed ANOVA with the same factors as above. The inclusion Bayes Factors were very low (all below 0.508), indicating that the null model was the best explanation of the data, and confirming that this measure did not vary as a function of our manipulation.

Here too we looked into order effects by running the above mixed ANOVA with synchrony as within-subjects factor, and initiator and order as between-subjects factors. The absence of significant effects of order indicated that the order of the manipulation did not affect anthropomorphization of the robot as measured by the Torta state anthropomorphization questionnaire.

The MAS state anthropomorphization questionnaire showed high internal consistency (αsynchronous = 0.801; αnon−synchronous = 0.824). A mixed ANOVA on the MAS state anthropomorphization questionnaire was performed, with one within-subjects factor (synchrony: synchronous vs. non-synchronous) and one between-subjects factor (initiator: robot vs. human). This too yielded no significant effects, indicating that state anthropomorphization as measured by the MAS did not differ as a result of our manipulation.

To confirm that the MAS state anthropomorphization questionnaire did not vary as a function of our manipulation, we ran a Bayesian mixed ANOVA with the same factors as above. The inclusion Bayes Factors were very low (all below 0.594) in this case as well, indicating that the null model was the best explanation of the data. Neither of the state anthropomorphization questionnaires thus showed an effect of the manipulation.

Order effects were examined by running a mixed ANOVA on the MAS state anthropomorphization questionnaire scores with synchrony as within-subjects factor, and initiator and order as between-subjects factors. There was a significant synchrony ∗ order interaction [F(1,49) = 16.967, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.257]. For both order groups, the robot was numerically anthropomorphized less in the second session. This seemed to be more pronounced in the non-synchronous-first group (session 1: M = 28.73, SD = 6.2; session 2: M = 24.50, SD = 7.0) compared to the synchronous-first group (session 1: M = 29.70, SD = 10.2; session 2: M = 28.11, SD = 11.6). A follow-up t-test indicated that this was significant in the non-synchronous-first group [t(25) = 3.784, p = 0.001], while it failed to reach significance in the synchronous-first group [t(26) = 1.736, p = 0.094]. Hence, while the Torta state anthropomorphization questionnaire was not affected by the order of the manipulation, the MAS state anthropomorphization questionnaire was. This may be explained by a difference between the two questionnaires: while they have largely overlapping items, the MAS state anthropomorphization questionnaire has some additional items that are not captured by the Torta state anthropomorphization questionnaire ("QBo is capable of planned actions," "QBo has a good memory," "QBo can engage in a great deal of thought," and "QBo has goals"), which might be interpreted as related to rational cognitive function.

A correlation analysis was performed with IDAQ trait anthropomorphization and each of the state anthropomorphization scores (synchronous and non-synchronous scores for each questionnaire separately). The IDAQ was positively correlated to each [Tortasynchronous: ρ(53) = 0.595, p < 0.001; Tortanon−synchronous: ρ(53) = 0.584, p < 0.001; MASsynchronous: ρ(53) = 0.373, p = 0.006; MASnon−synchronous: ρ(53) = 0.392, p = 0.004], meaning that the higher the tendency to anthropomorphize, the higher the actual anthropomorphization of the robot in the experiment.

### DISCUSSION

This study investigated whether anthropomorphization of a robot could be influenced by moving with it either synchronously or non-synchronously, and whether this would be affected by who initiated the movements. We pitted two hypotheses against each other: the feature-overlap hypothesis and the autonomy hypothesis. The former predicted that the robot would be anthropomorphized more following synchronous movement while the latter predicted the robot would be anthropomorphized more following unpredictable movement, i.e., non-synchronous when the human initiated the movements, or either synchrony condition when the robot initiated the movements.

In the joint Simon task, we replicated the joint Simon effect with a robotic co-actor, consistent with previous studies (Stenzel et al., 2012; Stenzel et al., 2013; Wen and Hsieh, 2015; Bunlon et al., 2018). Contrary to expectations, the size of the joint Simon effect was not affected by our manipulation. The manipulation did, however, affect RTs overall: for the group in which the human initiated the movements, the RTs were larger when the robot synchronized with the human than when the robot did not synchronize with the human. Conversely, for the group in which the robot initiated the movements, the RTs were larger when the human was instructed not to synchronize with the robot compared to when the human was told to synchronize with it. This pattern of results fits neither of the advanced hypotheses. The autonomy hypothesis would have predicted the opposite pattern in the group in which the human initiated the movements, and additionally there should not have been a difference in the group in which the robot initiated the movements, which there is. Additionally, there was no relationship between the questionnaire items assessing ascription of agency and the joint Simon effect, nor with overall RT, which leaves the autonomy hypothesis with even less support.

The feature-overlap hypothesis would have predicted the increase in RT in the synchronous condition when the human initiated the movements, but would also have predicted the same increase in the group in which the robot initiated the movements. It thus seems that neither hypothesis is sufficient to explain the results. Perhaps they can be explained by a difference in difficulty between the manipulations: In the human initiator condition, participants could safely ignore the behavior of the robot, which also did not overlap with their own action or action planning. In the robot initiator condition, however, they had to take the behavior of the robot into account, and it makes sense to assume that this required less cognitive effort in the synchronous as compared to the non-synchronous condition, where the behavior had to be mentally "inverted" to specify one's own action plan. This may have made the non-synchronous condition cognitively incompatible, which is known to impair action planning and response selection (Proctor and Vu, 2006). The potential asymmetry in the manipulation in terms of difficulty is therefore an unforeseen shortcoming of the current design.

The Dictator Game offer seemed entirely unaffected by our manipulation. Like the joint Simon task, there was no relationship with ascription of agency either. There was, however, a correlation between anthropomorphization of the robot and the size of the offer: the more the participant anthropomorphized the robot, the larger the proportion of the stake the participant offered. While this is consistent with previous findings suggesting a connection between trust and anthropomorphization (Hancock et al., 2011), it does not suggest a moderating role of synchrony. It may shed some light on inconsistent findings reported by Müller et al. (2014). After watching either a fragment of Pinocchio or a Dutch romantic comedy in one study, and after watching a fragment of Pinocchio or a documentary in which a wooden puppet is made in another study, participants were asked to choose a seat in a row of chairs with a wooden doll on the one end and a backpack (implying a human) on the other end, and were then asked to distribute seven lottery tickets worth €5 each between the human and a wooden puppet. In the second study, participants then also filled in a few questions about their perception of Pinocchio. In both studies, they found that participants sat closer to the wooden doll following the Pinocchio fragment compared to the other fragment. Additionally, they found an effect of movie fragment upon distribution of money in the former (with seating distance as covariate), but not in the latter study. Finally, they report negative correlations between seating distance and ascription of intentionality and will to the wooden doll, indicating that the more the participant perceived the doll as having an own will and intentionality, the closer they decided to sit to it.

To link these findings to the current study, two things are to be noted. (i) The studies differ in that in the former, only those in the Pinocchio fragment condition are exposed to a wooden doll prior to selecting the chairs, whereas in the latter study, both groups of participants are exposed to a wooden doll. (ii) The negative correlations reported pertain to the whole sample, thus not only to those in the condition in which the wooden doll might be expected to be perceived more human-like (i.e., the Pinocchio fragment). Linking our findings to (i), in our study, all participants were exposed to the robot in both sessions, rendering our design analogous to the second experiment. One explanation of our null findings based on synchronization condition thus is that exposure to the robot is all that determines altruism toward it. Since exposure is equal, no difference is to be expected. Linking our findings to (ii), the Müller et al. (2014) study leaves open the possibility that anthropomorphization of the wooden doll affects the amount of money allocated to it: they report that higher ascription of agency relates to closer seating next to the doll; and since closer seating next to the doll is taken as covariate in analyzing the allocation of money, possible variation due to anthropomorphization of the doll is taken out. Hence, their findings may be taken together with ours to suggest that mere exposure as well as baseline tendency to anthropomorphize affect altruism toward inanimate objects, so that differences in altruism can be found when comparing differential exposure to and/or differences in levels of anthropomorphization of the inanimate object.

Our manipulation had no effect on explicit anthropomorphization of the robot, as indicated by a lack of difference on the state anthropomorphization measures (Kozak et al., 2006; Torta et al., 2013). We did, however, find that a higher tendency to anthropomorphize (as measured by the IDAQ, Waytz et al., 2010) translated into more actual anthropomorphization of the robot, lending credibility to both measures. Additionally, we have found all questionnaires to have good internal consistency, and have found that the trait anthropomorphization questionnaire showed good test-retest reliability.

Interestingly, there was some indication that the order in which the synchronization manipulations were experienced affected anthropomorphization of the robot. Although this was not the case for all measures, the joint Simon effect and the MAS state anthropomorphization questionnaires showed an order effect that followed a similar pattern: anthropomorphization was reduced in the second as compared to the first session, and this was particularly so for the group that had the non-synchronous session first. We may draw two tentative conclusions from this: that more exposure to the robot does not lead to more anthropomorphization and that having had a non-synchronous interaction before a synchronous interaction leads to a stronger reduction of anthropomorphization in the latter.

Where do the results leave the two possible hypotheses? Unfortunately, it seems that either our manipulation was not ideal for testing the hypotheses, or that neither of the mechanisms has an effect: most measures showed similar results across conditions. There was an effect of our manipulation on RTs in the joint Simon task, but this may have been due to a difference in difficulty of the manipulation. The agency-related items of one of the questionnaires did not relate to this effect, leaving that hypothesis with less support, at least at the level of self-report. However, since we have not used a similar measure of self-reported self-other overlap (a shortcoming of the current design), we cannot make any similar claims about the feature-overlap hypothesis. Other possible reasons for the lack of an effect include sample size, the distinct non-human appearance of the robot, and more interestingly: the motion patterns of the robot. The movement of the robot, though superficially mimicking human motion, has a monotonic speed, whereas human (and other biological) motion does not. On the one hand, previous findings do not suggest that monotonic speed as such stands in the way of social interactions with robots: for instance, van den Brule et al. (2014) found no impact of motion style on the trustworthiness of robotic agents. On the other hand, however, our synchrony manipulation might have increased the salience of the non-biological nature of the robotic movements, which in turn might have emphasized the perceived dissimilarity between the human and the robot. Future studies might overcome this possible obstacle by using humanoid robots programmed to move in a more biologically plausible way. For the time being, however, our findings do not point to a strong role of behavioral synchrony in human-robot interaction.

### DATA AVAILABILITY STATEMENT

Datasets are available on request.

### AUTHOR CONTRIBUTIONS

All authors developed the study concept, contributed to the study design, and provided critical revisions to and approved the final version of the manuscript. SH programmed the tasks, collected and analyzed the data, and drafted the manuscript. RdK programmed the robot and motion tracker.

### FUNDING

This work was supported by an Advanced Grant of the European Research Council (ERC-2015-AdG-694722) to BH.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02607/full#supplementary-material

### REFERENCES

partners: co-representation of robotic actions. J. Exp. Psychol. Hum. Percept. Perform. 38, 1073–1077. doi: 10.1037/a0029493


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Heijnen, de Kleijn and Hommel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of Infant Reaching Strategies to Tactile Targets on the Face

Lisa K. Chinn<sup>1</sup> , Claire F. Noonan<sup>1</sup> , Matej Hoffmann<sup>2</sup> and Jeffrey J. Lockman<sup>1</sup> \*

<sup>1</sup> Department of Psychology, Tulane University, New Orleans, LA, United States, <sup>2</sup> Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czechia

Infant development of reaching to tactile targets on the skin has been studied little, despite its daily use during adaptive behaviors such as removing foreign stimuli or scratching an itch. We longitudinally examined the development of infant reaching strategies (from just under 2 to 11 months) approximately every other week with a vibrotactile stimulus applied to eight different locations on the face (left/right/center temple, left/right ear, left/right mouth corners, and chin). Successful reaching for the stimulus uses tactile input and proprioception to localize the target and move the hand to it. We studied the developmental progression of reaching and grasping strategies. As infants became older the likelihood of using the hand to reach to the target – versus touching the target with another body part or surface such as the upper arm or chair – increased. For trials where infants reached to the target with the hand, infants also refined their hand postures with age. As infants became older, they made fewer contacts with a closed fist or the dorsal part of the hand and more touches/grasps with the fingers or palm. Results suggest that during the first year infants become able to act more precisely on tactile targets on the face.

Keywords: reaching, tactile localization, prehension, motor development, multisensory coordination, hand-to-mouth coordination

#### Edited by:

Claudia Gianelli, Universität Potsdam, Germany

#### Reviewed by:

Verónica C. Ramenzoni, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

Elias Manjarrez, Benemérita Universidad Autónoma de Puebla, Mexico

#### \*Correspondence:

Jeffrey J. Lockman lockman@tulane.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 31 August 2018 Accepted: 04 January 2019 Published: 21 January 2019

#### Citation:

Chinn LK, Noonan CF, Hoffmann M and Lockman JJ (2019) Development of Infant Reaching Strategies to Tactile Targets on the Face. Front. Psychol. 10:9. doi: 10.3389/fpsyg.2019.00009

### INTRODUCTION

The ability to act on one's own body by reaching to specific locations on the body is critical for many tasks of daily living. Although most individuals reach to body locations automatically and with apparent ease, this act involves a coordinated set of perceptual and motor skills. Reaching to a stimulus on the body uses perceptual inputs including touch, proprioception, and sometimes vision to localize a stimulus and to guide a motor action to that location (Longo et al., 2010; Heed et al., 2015). Even though reaching to the body is performed habitually, most reaching studies to date have focused on extending the hand to objects in external peripersonal space. In contrast, little research has addressed reaching to targets on the body or how this ability develops. Here, we longitudinally examine the motor strategies that infants use across the first year as they reach to and grasp a vibrating target placed at different locations on the face.

### Reaching to External Space Versus the Body

Previous work on reaching during infancy has mainly involved the presentation of objects in peripersonal space, external to the body (e.g., Morange and Bloch, 1996). For example, infants from 9 to 20 weeks can first reach to objects placed in front of the ipsilateral shoulder (100% at 9 weeks), then to the midline (33% at 9 weeks and 93% at 17 weeks), and then to contralateral objects (0% at 9 weeks to 71% at 17 weeks; Provine and Westerman, 1979). By 18–20 weeks all infants studied by Provine and Westerman (1979) made contralateral reaches. Studies have also shown that reaching to an object in external space becomes faster, more efficient, and more direct during the first year (Thelen et al., 1993; Berthier and Keen, 2006; Rönnqvist and Domellöf, 2006; D'Souza et al., 2017; Corbetta et al., 2018). Furthermore, reaching is not limited to stimuli that are perceived visually. Infants are also capable of reaching to auditory targets in external space (Clifton et al., 1991).

In contrast to reaching to targets in external space, much less is known about the development of reaching to tactile targets on the body. How is reaching to tactile stimulation on the skin accomplished? Neural research has shown that somatosensory (tactile and proprioceptive) stimulation leads to activations in the somatosensory cortex of the brain, which has been referred to as the "sensory homunculus" (Penfield and Boldrey, 1937). Infants evidence at least a rudimentary somatotopy in these brain regions. By 2 months, stimulation of the hand, foot, and lip leads to different locations of peak somatosensory-evoked potentials recorded with EEG (Saby et al., 2015; Meltzoff et al., 2018). However, such activation per se does not mean that the infant localizes the stimulus in the sense that she can reach to it. For that, the stimulation needs to be associated with other sensorimotor laws or contingencies (O'Regan, 2011; Hoffmann et al., 2017).

Reaching to tactile stimuli may initially also be reflexive and controlled in part by spinal or subcortical circuitry. A wiping/scratch reflex has been demonstrated in frogs (Fukson et al., 1980; Berkinblit et al., 1986) and cats (Tapia et al., 2013). Although the existence of such a reflex is debated in humans (MacKay-Lyons, 2002), we cannot exclude the possibility that early reaches to the face – the mouth region in particular – may be brought about by similar mechanisms. However, even if this were the case early in infancy, we would expect these behaviors to become progressively more complex and voluntary over time.

### Processes of Tactile Localization

To localize a tactile stimulus placed on the skin, somatosensory information is "remapped" to an external reference frame (such as body-centered or gaze-centered) in order for a person to reach to the target (e.g., Medina and Coslett, 2010; Heed et al., 2015). The distinctiveness of these skin-based and external representations of the stimulus location can be demonstrated in crossed-limb paradigms where, for instance, the anatomically left hand is located in the right side of external space. Such conflicts are often examined using the temporal order judgment task (Heed and Azañón, 2014) in which adults are slower at identifying the order of touches when the hands are crossed versus uncrossed. Furthermore, in the first half year, a developmental progression occurs in the response to tactile stimuli in crossed feet postures, suggesting that by 6 months infants are beginning to code the position of the crossed feet with respect to external space (Begum Ali et al., 2015). Neural responses associated with limb mapping in external space continue to develop between 6 and 10 months (Rigato et al., 2014).

The previous lines of research associated with tactile perception and body location have mainly focused on behavioral or neural responses that do not involve direct reaching to tactile targets on the body surface. Less is known, however, about the functional ability to reach to target locations on the body and how this sensorimotor skill becomes refined throughout infancy. One sensorimotor ability that may provide a foundation for reaching to some tactile targets, particularly on the face, is the hand-to-mouth transport system (Lew and Butterworth, 1997). Research suggests that hand-mouth coordination is already evident to some degree in the prenatal period, but becomes more skilled and direct in the months immediately following birth (Rochat et al., 1988; Rochat, 1989; Lew and Butterworth, 1997).

### Reaching and Grasping

Being able to transport the hand to the mouth or, more generally, to contact a stimulus on the face is only one element of the reaching act. Successful reaching to a stimulus, whether it is located on the body or in external space, typically involves the coordination of at least two different action systems: reaching and grasping (Jeannerod, 1996). Effective reaching requires individuals not only to extend their hands to the location of a stimulus, but also to open and orient the hand in preparation for grasping the stimulus. Developmentally, research indicates that the reaching system comes online before the grasping system (Piaget, 1952; Bruner, 1973), reflecting a proximodistal sequence in the development of prehension (Lockman and Ashmead, 1983). In particular, before 4 months, infants develop the ability to extend their hand to the location of an object (Piaget, 1952; Bruner and Koslowski, 1972), but during this period the hand is often fisted when it contacts the object. By 4 months, however, infants begin to open the hand in advance of contacting the object. Likewise, with regard to self-touch, closed hand contacts prevail in the first 2 or 3 months, and open hand contacts begin to increase in frequency between 3 and 5 months (Thomas et al., 2015). Nevertheless, it is important to note that with respect to the goal of the reaching act, research on self-touch, where infants spontaneously contact a part of their body with their hand, might not be directly comparable to research on reaching, where infants are presented with a discrete stimulus to reach to in external space.

### The Current Study

In the present work, we consider the problem of reaching to discrete tactile stimuli on the face. We conducted a longitudinal study during the first year in which we placed vibrating targets, one at a time, at different locations on the infant's face. Because the targets were not accessible to vision, infants had to execute reaches on the basis of tactile and associated proprioceptive information.

In this work, we addressed two main issues. One centered on the different effector systems available to infants for reaching to stimuli on the body and whether infants privilege different effector systems to contact different areas of the face. Specifically, we asked when the manual effector system becomes the dominant mode for contacting stimuli on the face. In principle, other movable parts of the body can be used to contact face stimuli. The tongue has the potential to touch external stimuli located near the mouth. Both the head and shoulder can move to establish contact with stimuli on the lower side of the face or the ears. The manual effector system, however, might embody a more effective means for reaching to face stimuli because of the extent to which the arms can move and the precision afforded by fingers that can grasp. To explore these ideas, we asked to what extent infants recruit other parts of the body (e.g., tongue, head, and shoulders) to contact stimuli on the face. If infants, especially at younger ages, touch targets on the face with effectors other than the arms and hands, this would suggest an early awareness at some level of the affordances of the body for reaching to other parts of the body.

The second set of issues that we focused on centered on the manual effector system alone. Specifically, we asked how grasping becomes adapted for reaching to targets on the face. We describe how infants' hand postures when contacting tactile targets, and their grips on the targets on the face, vary with age. We expected that closed fist contacts would decrease with age, while open handed contacts and grips would increase with age. This prediction would be consistent with the idea that infants' reaching to the face becomes more skilled and that infants attempt to grasp these stimuli, which were not a permanent part of the body. Additionally, we were interested in the possibility that different locations on the face might call forth different hand postures depending upon ease or comfort. Modulation of grip strategies or hand postures based on the location of the target might suggest that infants adjust hand posture according to the demands associated with carrying out the reach.

### MATERIALS AND METHODS

### Participants

A total of 24 infants (10 female; starting age just under 2 to 6 months) were recruited from local daycares, the psychology department of the University, and family-oriented events in the greater New Orleans area. The racial/ethnic backgrounds of participants were Caucasian (N = 16), Black/African American (N = 3), more than one race (N = 3), American Indian (N = 1), and Asian (N = 1). Three infants did not complete every study visit (one family moved, one had schedule conflicts arise, one dropped out without providing a reason). These infants are included in the data analyses, which were able to accommodate missing data.

### Materials

During the task, gently vibrating targets were fixed to eight locations on infants' faces/heads one at a time using double-sided skin-safe tape. The target was a disk shape approximately 1.25 cm in diameter, 0.75 cm in height, and 3.5 grams in weight. Inside the target was a flat coin 3-volt DC 70 mA 12000 RPM micro motor that provided vibration similar to vibrating teething rings or mobile phones. The stimuli were coated with black liquid tape to provide a soft and smooth texture. Each testing session was recorded with two mounted video cameras. The experimenter also recorded target location and target contact success on paper, but coding of data analyzed here was done entirely from the videos.

### Procedure and Design

Parents of all subjects provided written informed consent, in which they consented to participating in the study and having the sessions videotaped. They could also choose whether or not to allow images/videos from testing to be used in presentations and written products. The research was approved by the Tulane University Institutional Review Board (reference number 153903) and was consistent with the Declaration of Helsinki. Families were invited to come in for the study every second week until the infant was able to reach to all eight target locations in one visit. Adherence to a schedule with two visits per month was not always possible due to parent schedules or illness (average time between visits = 21.7 days; see **Figure 1**). The target locations were the left/right corner of the mouth, below the left/right earlobe, on the center of the chin, on the center temple (i.e., forehead), and on the left/right temples (see **Figure 2**). The order of trials was randomized. When the experimenter applied the lateralized targets, the opposite side of the infant's face was touched simultaneously at the corresponding location so as not to draw attention to one side of the face over another. For the midline targets, there was no opposite side so the target was placed at midline with no other touch to the face. During each visit, each target was left on the infant's face until the infant removed it or for approximately 30 s, whichever came first.

Videos of each testing session were coded for factors of target location (chin, left/right mouth, below the left/right ear, and left/right/center temple), whether or not the infants successfully contacted each target (yes, no), how the target was first contacted (left/right hand, left/right arm, head-to-torso, tongue, and head-to-chair), and hand posture when they grasped or contacted the target (closed fist, dorsal hand, palm, finger touch, pincer grasp, and four-finger opposing thumb grasp). When first contact was coded as "head-to-chair," infants turned the head toward the chair and rubbed the target against the chair. Contacts were not coded as "head-to-chair" if the infant was moving the head in a seemingly random fashion both before and after target placement and then appeared to accidentally graze the chair with the target. In the hand posture coding, the dorsal hand code included only the area on the back of the hand between the wrist and the knuckles; contact with the back of the fingers was coded as finger touch. The palm code included the area of the palm between the wrist and the base of the fingers. Grasps were coded as pincer grips when the index finger and thumb grasped the target.

Here we focus on the development of manual strategies used for successful reaches. In order to accommodate binary outcome variables (e.g., whether the target was first contacted with the hand or not), missing data, and data clustered within each subject, generalized estimating equations (GEEs) were used (see Hardin and Hilbe, 2012). A binomial distribution, a logit link function, and an exchangeable correlation matrix were used. GEEs allow significance testing, while also providing predicted average responses. For example, when a binomial GEE reveals a significant age effect on performance for a measure with a 0/1 scale, it also produces a curve showing the predicted average probability of scoring a 1 on the 0/1 scale as age increases.
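As an illustration of how a logit link turns a fitted linear predictor into the kind of predicted-probability curve described above, the following minimal sketch may help. It does not fit a GEE and does not reproduce the analysis reported here; the function name and both coefficients are hypothetical values chosen only to show the shape of the transformation.

```python
import math


def predicted_probability(intercept: float, age_slope: float, age_months: float) -> float:
    """Inverse logit: map the linear predictor to a probability in (0, 1)."""
    linear_predictor = intercept + age_slope * age_months
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only (not the fitted values).
INTERCEPT = -1.0
AGE_SLOPE = 0.5

# Predicted average probability of scoring 1 on a 0/1 measure at each age.
curve = [(age, predicted_probability(INTERCEPT, AGE_SLOPE, age)) for age in range(2, 13)]

for age, probability in curve:
    print(f"{age:2d} months: predicted P(score = 1) = {probability:.2f}")
```

With a positive age coefficient the curve rises monotonically toward 1, which is the pattern a significant positive age effect would produce on a 0/1 outcome.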

### RESULTS

### Preliminary Analyses

A primary coder coded 100 percent of the data, and a secondary coder coded an overlapping 20 percent. Inter-rater reliability was achieved for all categorical variables analyzed (mean Cohen's κ = 0.87, range = 0.71–1.00). Preliminary analyses found no significant effect of sex or laterality (left versus right target placement and left versus right hand use) on reaching success, so these variables were excluded from further analyses. Further, the age at which an infant started the study was not significantly correlated with the age at which the infant graduated from the study, indicating that enrolling at a younger age did not result in learning the task earlier.
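For readers unfamiliar with the reliability statistic reported above, the sketch below shows one common way to compute Cohen's kappa for two coders who assign a categorical code to the same trials. The function name and the example codes are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders gave the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six trials (illustration only).
primary = ["hand", "hand", "hand", "hand", "other", "other"]
secondary = ["hand", "hand", "hand", "other", "other", "other"]
print(f"kappa = {cohens_kappa(primary, secondary):.2f}")
```

Kappa discounts the agreement that two coders would reach by chance given how often each uses every code, which is why it is usually preferred to raw percent agreement for categorical coding schemes.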

### How Do Infants Contact Targets?

The first set of analyses looked at whether infants chose to use the hand (versus another body part or object) to make contact with the target. Specifically, the first GEE analysis examined the effects of age and target location and an Age x Target Location interaction on hand versus non-hand contact with the target. For 779 out of 1763 total trials (44.19%) infants successfully contacted the target, either using the hand or something else, such as the arm, head-to-torso, tongue, or head-to-chair. Out of the 770 trials where the target contact strategy was visible in our recordings, 599 (77.79%) of the initial contacts were made with the hand; nine trials that were recorded as successful reaches by the experimenter on paper had a video recording error and are excluded here. As infants became older, the likelihood of first contacting the targets directly with the hand versus some other body part or external surface increased significantly (Wald χ<sup>2</sup><sub>1</sub> = 41.46, p < 0.001; **Figure 3**). Further, the GEE-predicted likelihood of hand contact varied by location (Wald χ<sup>2</sup><sub>4</sub> = 10.17, p < 0.05; **Figure 4**). The GEE predicted the highest percentage of hand contact for the center temple (92%), followed by the mouth (86%), chin (82%), lateral temples (66%), and ears (63%). The Age x Target Location interaction was not statistically significant.
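The reported significance levels can be checked against the chi-square distribution directly. The sketch below is an illustrative check, not the software used for the analysis: it computes the upper-tail probability of a Wald chi-square statistic using closed forms that exist for 1 degree of freedom and for even degrees of freedom, which covers the tests reported above. The function name is hypothetical.

```python
import math


def chi2_upper_tail(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal variable.
        return math.erfc(math.sqrt(statistic / 2.0))
    if df > 0 and df % 2 == 0:
        # For even df the tail is a finite sum:
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = statistic / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df = 1 or even df")


# Statistics reported in the text for the hand versus non-hand analysis.
p_age = chi2_upper_tail(41.46, 1)       # main effect of age
p_location = chi2_upper_tail(10.17, 4)  # main effect of target location

print(f"age: p = {p_age:.2e}")
print(f"location: p = {p_location:.3f}")
```

Both values fall below the thresholds stated in the text (p < 0.001 for age and p < 0.05 for location).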

Although the numbers of each type of non-hand contact were too low to analyze statistically (**Figure 4**), the non-hand strategies used seemed to vary based on the location of the target on the body. Most trials where the head moved to rub the target on the chair involved trials in which the target was placed at the ears or lateral temples and were thus closest to the chair. The head and torso (shoulders or upper chest) came together most often for targets that were placed at the chin or ears – the locations most accessible to the torso. Finally, only mouth and chin targets could be contacted by the tongue, given anatomical constraints of the body.

### Hand Posture and Grips

Next we looked at how hand posture changed with age for trials where infants achieved target contact with the hand. Specifically, we considered the effects of age and target location on whether or not the hand was fisted when it contacted the target, whether the dorsal part of the hand contacted the target, whether the palm/fingers contacted the target, and whether infants used the finger(s) and opposing thumb to grasp the target.

### Fisted Target Contacts

The first analysis looked at whether infants became less likely to use a closed hand posture, specifically a closed fist, to contact targets as they became older. Out of 599 trials where initial target contact was made with the hand, 96 contacts (16%) were made with a closed fist. A GEE testing the effects of age, target location, and the Age x Target Location interaction revealed that infants became significantly less likely to contact targets with a closed fist as they became older (Wald χ<sup>2</sup><sub>1</sub> = 28.69, p < 0.001; **Figure 5**). The Age x Target Location interaction was statistically significant (Wald χ<sup>2</sup><sub>4</sub> = 10.26, p < 0.05). However, this interaction was difficult to interpret because it largely stemmed from the center temple location, where there were only six fisted target contacts.

FIGURE 5 | The effect of age on fisted versus non-fisted contact with the targets. The solid line represents the GEE-predicted probability of fisted hand contact with the targets, and the open circles represent the raw data (fisted hand contact or non-fisted hand contact).

### Dorsal Target Contacts

The next analysis looked at whether infants became less likely to use the dorsum of the hand to contact targets as they became older, suggesting that infants were attempting instead to touch or grasp the target with the fingers and/or palm. Out of 599 trials where the hand contacted the target, 138 contacts (23%) were with the dorsal part of the hand. A main effect of age showed significantly less dorsal contact as infants became older (Wald χ<sup>2</sup><sub>1</sub> = 15.03, p < 0.001). This main effect, however, was qualified by a significant Age x Target Location interaction on dorsal hand contact (Wald χ<sup>2</sup><sub>4</sub> = 10.08, p < 0.05; **Figure 6**). Follow-up tests showed that dorsal hand contact became significantly less likely at the lateral temples, center temple, ears, and chin (ps < 0.01–0.001) as infants became older. For the mouth, age did not significantly affect whether infants used the dorsal part of the hand to contact the target, suggesting that infants may have been attempting another goal with targets located at the mouth.

### Palm and Fingers

In this section, we look at whether palm and finger (ventral or dorsal) contacts versus other forms of contact increase with age. Out of 599 trials where initial target contact was made with the hand, 365 (60.93%) contacts used the palm or fingers, and 234 (39.07%) did not use the palm or fingers. As infants became older, they were significantly more likely to make contact with the fingers or palm (Wald χ<sup>2</sup><sub>1</sub> = 40.44, p < 0.001; **Figure 7**). The main effect of target location and the Age x Target Location interaction were not statistically significant.

### Opposing Thumb Grasps

Next, we looked at whether infants became more likely with age to grasp the target with the opposing thumb and finger(s). Only 63/599 trials (10.52%) involved an opposing thumb grasp, indicating that this strategy was not particularly common in the age range under study. We divided opposing thumb grasps into two different types that we saw infants in this study use: pincer grips (37 trials) and grips with all four fingers opposing the thumb (26 trials). Because there were a limited number of grasp trials, we could only analyze the effect of age in GEE and not the effects of target location and the Age x Target Location interaction. A GEE testing the effect of age on whether the infants used a pincer grip to contact the target showed that pincer grips became more common with age (Wald χ<sup>2</sup><sub>1</sub> = 24.85, p < 0.001). Four-finger grips also increased with age (Wald χ<sup>2</sup><sub>1</sub> = 7.52, p < 0.01).

FIGURE 7 | The effect of age on palmar/finger versus non-palmar/finger contact with the targets. The solid line represents the GEE-predicted probability of palmar or finger hand contact with the targets, and the open circles represent the raw data (palmar/finger target contact or non-palmar/finger target contact).

### DISCUSSION

The ability to reach to a source of stimulation on the face is highly adaptive but little studied. Infants reach to stimulation on the face to scratch an itch, but also to remove foreign, potentially dangerous stimuli. Although recent work has shown that the ability to contact vibrotactile targets on the body improves during the first and second years of life (Chinn et al., 2017), this previous work focused on whether infants were able to reach to targets, without examining specific motor strategies through which they do so. Little is known about the motor strategies that infants use to reach to the face. To address this question, we conducted a longitudinal study over the first year in which vibrotactile targets were placed one at a time at different locations on the face. Because the locations of these targets were not accessible to vision, infants had to rely on tactile and proprioceptive information to localize and reach to these targets.

In this study, we found that when a vibrotactile target is applied to the face, infants are more likely to reach to the target with the hand rather than using other effectors or strategies (e.g., rubbing the target on the chair) and that hand versus non-hand use increases with age. They also become more likely to use the palmar surface or fingers of the hand than the dorsum, and they grasp the targets more as they become older. We consider these findings in more detail below.

### Motor Strategies for Target Contact

A primary goal of the current study was to look at the motor strategies that infants used to contact tactile targets on the face. Most studies on reaching to objects in external space focus on the arm and hand. The predominance of the hand and arm in the reaching literature makes sense given that other parts of the body are not configured to grasp targets as well as the hand. However, it is possible to use other body parts or external surfaces to contact a target location on the body.

Here we addressed whether infants contact stimuli on the face with external objects or body parts other than the hand and whether these strategies change with age. We found that the hand was used for most reaches throughout the age range under study, but nevertheless, and as hypothesized, its use relative to non-hand contact options increased with age. This result suggests that across the first year infants are becoming more likely to reach with the effector best at grasping.

Although the percentages of each type of non-hand contact were low (**Figure 4**), the non-hand strategies used seemed to vary based on the location of the target and the anatomy of the body. Most trials where infants turned their heads to rub the target on the chair were trials in which the target was placed to the side of the face (ears or lateral temples) and was, therefore, closest to the chair. The head and torso (shoulders/upper chest) came together most often for targets that were placed at the chin or ears – the locations most accessible to the torso. Only mouth and chin targets were contacted by the tongue, which makes sense given anatomical constraints. For tongue contacts of the mouth and chin targets, however, we cannot entirely rule out the possibility that the rooting reflex, although very weak during the age range under study, contributed to this response.

A possible future research direction would be to study localization strategies in infants or children with disorders affecting sensory processing and/or motor skills. For example, children with autism spectrum disorder are sometimes less responsive to tactile stimulation versus neurotypical controls (Tomchek and Dunn, 2007). It is possible that differences in performance on this task for children with known sensory or motor deficits could help our understanding of the processes involved in reaching to vibrotactile targets on the face.

### Reaching to the Body and Arm/Hand Movements

Another main goal of the current study was to look at hand postures and grasping strategies used for target contacts made with the hand. We found that – as predicted – across the first year, infants became less likely to contact targets with a closed fist and with the dorsal part of the hand. Conversely, they became more likely to use the palm or fingers to contact targets, versus the dorsal hand or a closed fist. They also became more likely to grasp the targets with the finger(s) and opposing thumb with age, although this strategy did not predominate by the end of the age range under study. It is known that the pincer grasp for reaching to external objects is beginning to emerge near the end of the first year, consistent with the results of this study on reaching to targets on the face.

More generally, the developmental patterns we saw for hand postures during body reaches are similar to developmental changes in hand posture during reaches to objects in external space (Piaget, 1952; Bruner, 1973). During the first half-year, reaching motions with the arm develop before the ability to open the hand and then grasp an object in external space. These patterns of development also mirror the order of developmental changes in previous findings on hand position during self-touch by Thomas et al. (2015). They found that fist contacts were common in early infancy, followed by an increase in palmar hand contacts, and then followed by a decrease in grasps on the clothing or body parts during self-touch in the first half-year.

Although the order of changes in reaching posture in our study was similar to Thomas et al. (2015) (decreasing dorsal and closed fist, whilst palmar and grasps increased), our results were not identical. Infants in their study appeared to make grasping motions during self-touch at a younger age than the age at which infants in our study grasped discrete vibrotactile targets on the face. For example, at just 20–24 weeks (∼5–6 months), their infants were on average using grasping motions slightly over 15% of the time. At this age infants in our study were still grasping targets less than 10% of the time. One explanation is that because it is uncertain whether spontaneous self-touch is directed toward a specific location, the demands of planning and executing a reach are less than when reaching to a discrete target on the body. Future work could directly test self-touch, reaching to body targets, and reaching to external objects in the same infants to study whether they use different motor strategies during these different types of reaching.

In some instances, hand posture varied based on target location. Specifically, dorsal contacts of targets decreased with age for all locations except the mouth. One explanation is that motions directed toward the mouth may have been more defensive in nature than reaches to other locations on the face. If infants reacted to mouth targets by wanting to rapidly contact them, they may have been more likely to reach to them with a strategy that they were already familiar with (dorsal hand contact) than one that involved orienting and using the fingers for grasping. Future work may use motion tracking to compare the development of reaching speeds for different tactile target locations, in part to determine whether infants are reacting to some tactile stimuli defensively and trying to remove or brush them aside quickly. Kinematic motion tracking is able to measure details of arm movements such as spatial location, acceleration of movement, and velocity in the age range under study (e.g., Ouss et al., 2018). In our current paradigm, motion tracking markers or cables, however, would have interfered with reaching to our targets. In the future, a markerless technology might be used to overcome this challenge and examine reaching trajectories to tactile face targets.

### CONCLUSION

This study provides new information about the motor strategies infants use to contact stimuli on the face. Our results suggest that early in the first year the hand is already the preferred effector for contacting the face, and it predominates even more as infants become older. At the same time, when infants use non-hand motor strategies to contact face targets, these strategies appear to be based on the location of the targets. For example, targets on the sides of the face, such as near the ears, can be rubbed on the chair or shoulder but cannot be accessed by the tongue. Furthermore, when infants reach with the hand, motor strategies become better adapted for grasping as infants become older. We found that closed fist and dorsal contacts decrease with age, whereas palm and finger contacts and grasping increase with age. These findings on reaching to the face thus support Jeannerod's (1996) distinction between reaching and grasping as constituting separate but integrated systems that underlie prehension. Finally, the results reported here raise questions regarding the mechanism(s) that underlie developmental changes in effector use and hand posture strategy use when infants contact targets on their faces. One possibility is that these changes are driven in part by experience associated with reaching to the face. For instance, selection of the hand to reach to the face might be reinforced because the hand can manipulate and explore objects better than other effectors. Likewise, opening the hand during reaching, although in part driven by central nervous system and maturational changes, may also be influenced by experience. Although we cannot easily vary experiential input to infants, often because of ethical issues, studies that use modeling with artificial agents in which input is systematically varied might help to provide answers to these questions (Hoffmann et al., 2017).

### AUTHOR CONTRIBUTIONS

LC, MH, JL, and CN conceptualized the questions addressed in this study. JL and CN designed the study procedure. LC and CN performed the data collection. LC, MH, and JL conceptualized the statistical models. LC performed the data analysis and made figures under the guidance of JL. LC, MH, and JL drafted the manuscript. CN provided revisions.


### FUNDING

This research was supported by the National Institutes of Health Award 5R01HD067581. MH was supported by the Czech Science Foundation under Project GA17-15697Y.

### ACKNOWLEDGMENTS

We would like to thank the families who participated in our research and the undergraduate research assistants who helped with behavioral data coding.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Chinn, Noonan, Hoffmann and Lockman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Body Representation in Children With Unilateral Cerebral Palsy

Arturo Nuara<sup>1,2</sup>\*, Pamela Papangelo<sup>1</sup>, Pietro Avanzini<sup>1</sup> and Maddalena Fabbri-Destro<sup>1</sup>

<sup>1</sup> Consiglio Nazionale delle Ricerche, Istituto di Neuroscienze, Parma, Italy, <sup>2</sup> Dipartimento di Scienze Biomediche, Metaboliche e Neuroscienze, Università di Modena e Reggio Emilia, Modena, Italy

Drawings produced by children provide insights about their physical and psychological status. In children suffering from unilateral cerebral palsy (UCP), self-portraits constitute a unique opportunity to study whether and how their disease affects self-body representation. The aim of the present study is to evaluate self-body representation in UCP children, comparing it to the way they portray both healthy and hemiparetic peers. Ten UCP children were asked to produce three drawings: a self-portrait, a portrait of their best classmate, and finally a portrait of a hemiparetic peer who had joined them in a child-to-child rehabilitation protocol. As controls, 16 typically developing children were asked to produce a self-portrait and a portrait of their best classmate. The asymmetry index (AI), consisting of the difference between the lengths of the upper limbs expressed as a percentage of their average, was greater in the self-portraits of UCP children than in those of controls. More interestingly, UCP children portrayed themselves more asymmetrically than they portrayed their classmates and hemiparetic peers. No difference in AI was found between self-portraits and classmate portraits in the control group. This study provides evidence that UCP affects body self-representation, but not body representation in general. In fact, the asymmetry in upper limb representation observed in children with UCP does not constitute a mere picturing of the hemiparesis, but rather reflects the experienced status of functioning, which is valid only for one's own body. The inclusion of portraits in pediatric neurorehabilitation programs might enable clinicians to collect additional evidence about children's self-perceived functioning, information that is not easily obtainable in pediatric patients.
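The asymmetry index described above can be sketched as a short calculation. This is a minimal illustration under stated assumptions: the abstract specifies only a difference between upper limb lengths expressed as a percentage of their average, so the use of the absolute (unsigned) difference, the function name, and the example lengths are all assumptions made here for illustration.

```python
def asymmetry_index(left_length: float, right_length: float) -> float:
    """Difference between two drawn limb lengths as a percentage of their mean."""
    mean_length = (left_length + right_length) / 2.0
    if mean_length <= 0:
        raise ValueError("limb lengths must be positive")
    # Assumption: the unsigned difference is used, so AI is always >= 0.
    return abs(left_length - right_length) / mean_length * 100.0


# Hypothetical drawn arm lengths in centimeters (illustration only).
print(asymmetry_index(6.0, 4.0))  # arms of unequal length
print(asymmetry_index(5.0, 5.0))  # perfectly symmetric drawing
```

Normalizing by the mean limb length makes the index independent of the overall size of the drawing, so large and small figures can be compared on the same scale.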

Keywords: childhood stroke, perinatal stroke survivors, self-body representation, self-portrait, body image, perinatal stroke

### INTRODUCTION

Children have been using drawings to express themselves since ancient times (Wittmann and Barber, 2013). The idea that spontaneous drawing of young children may reflect their physical, cognitive and affective status, led psychologists to exploit drawings as a useful tool for assessing child development, personality and emotional adaptation (Cooke, 1885; Goodenough, 1975; Matthews, 2003).

One of the most used methods to measure the level of development through drawing is the DAM test (Draw-a-man) (Goodenough, 1975), which is a projective test using portraits: drawing a person, a child "projects himself in all of the body meaning and attitudes that have come to be represented" (Machover, 1949).

#### Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Herbert Heuer, Leibniz Research Centre for Working Environment and Human Factors (IfADo), Germany; Daniela Bulgarelli, University of Turin, Italy

> \*Correspondence: Arturo Nuara arturo.nuara@gmail.com

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 31 July 2018 Accepted: 05 February 2019 Published: 19 February 2019

#### Citation:

Nuara A, Papangelo P, Avanzini P and Fabbri-Destro M (2019) Body Representation in Children With Unilateral Cerebral Palsy. Front. Psychol. 10:354. doi: 10.3389/fpsyg.2019.00354


The body image, regarded as the conscious representation of the body parts and their relative position, involves both the subject's perceptual body experience with the body limits and conceptual understanding of the body in general (Gallagher, 2005). Parallel to the body image is the so-called body schema, i.e., the subconscious ideas about the shape and size of the body and the relationship of the parts of the body to each other. While both these aspects affect the human figure drawing, deficits specific for body schema or body image are very difficult to separate (de Vignemont, 2010). For this reason, several studies refer to overall disorders of body representation to collectively describe these concepts (Lampe et al., 2016).

Among neurological conditions, cerebral palsy (CP) is the one in which the effects of brain injury on body representation have been most extensively investigated by means of human figure drawing (see for example Lampe et al., 2016). Abercrombie and Tyson (1966) used the DAM test to investigate body representation in CP, finding frequent anthropometric deviations and missing body parts in a subset of drawings performed by hemiplegic children, probably reflecting the children's projection of their own specific physical impairment. However, these observations were not translated into quantitative terms, nor did the authors systematically require a self-portrait.

The view that the representation of the "self" in the generic DAM test is not firmly established (Harris, 1963) led some authors to prefer the self-portrait as the pictorial tool of choice for investigating children's self-body representation. Indeed, Morin and coworkers have shown that the self-portrait may give access to imaginary and symbolic aspects of subjectivity in normal subjects (Morin and Bensalah, 1998), and to the subjective effects of alterations in body representation in patients with brain lesions (Morin, 1998; Morin et al., 2001). In this regard, Morin et al. (2003) collected 161 portraits performed by hemiplegic stroke patients. Interestingly, these authors reported in a subset of right brain injured patients a dissociation between self- and other-portraits: while drawing a "neglected" self-portrait, they spontaneously drew a complete image of others. These discrepancies persuaded the authors to embrace the idea that unilateral defects of portraits may selectively reflect the subjective alteration of one's own body representation.

Asymmetrical self-portraits were not a constant feature in adult hemiplegic patients (Morin, 1993; Morin et al., 2003). This finding led the authors to support a brain-damage onset-dependent hypothesis, postulating that body representation (in particular its sensorimotor side, i.e., body schema) mostly forms in early development (Lacan, 1966; Morin et al., 2003). Thus, the relative timing between the stroke onset and the development of body schema/image could be a key determinant for the presence of asymmetrical features in self-portraits. In this regard, an ideal model is represented by perinatal stroke survivors, whose injury certainly precedes the body schema/image instantiation. Within such a population, it is possible to evaluate whether the motor impairment selectively impacts self-body representation, rather than body representation in general.

By enrolling a population of children suffering from UCP due to perinatal stroke showing isolated and unilateral motor deficit with prevalent upper limb involvement, we accounted for: (1) the influence of symbolic disturbances or neglect on self-portraying abilities, (2) the impact of motor impairment on the ability to perform a drawing, and (3) the "unawareness" of the impairment due to the hemiparesis onset posterior to body schema/image establishment processes. Using the test of the human figure, we asked children to draw a self-portrait, a portrait of a hemiparetic peer whom they joined in a child-to-child rehabilitation protocol, and a portrait of a healthy classmate. As controls, 18 age- and sex-matched typically developing children were asked to perform a self-portrait and a portrait of their best classmate. We finally compared the drawings by evaluating the asymmetry in the representation of the upper limbs, thus providing, for the first time to our knowledge, a quantitative index of self-portrait asymmetry.

In this study, we hypothesized that children with UCP present a larger asymmetry in self-portraits relative to other portraits, and also relative to self-portraits of typically developing children. In addition, the direct comparison between self-portraits and the hemiplegic peer-portraits should reveal whether this asymmetry is specific to self-representation, or whether it is instead associated with the representation of the "hemiplegic condition."

### MATERIALS AND METHODS

The study was approved by the Local Ethical Committee (Comitato Etico Area Vasta Emilia Nord) and was conducted according to the Helsinki Declaration. Subjects belonging to the clinical group were recruited in cooperation with the "Fight The Stroke" association<sup>1</sup>, in the framework of a broader clinical rehabilitative protocol involving children with cerebral palsy. The families of the controls were enrolled as part of another study conducted in our Center on primary school children. Written informed consent was obtained from the parents of each child involved. Nineteen UCP children undergoing a child-to-child rehabilitative protocol (clinical group) and 18 typically developing children (control group) were enrolled in the study. The rehabilitative protocol in which children with UCP were involved consisted of 30 daily sessions based on child-to-child interaction, with each participant interacting with another hemiparetic child while performing specific hand exercises. The interacting pairs of children remained the same throughout the whole program, thus facilitating a social relationship between them.

Inclusion criteria of the clinical group were: age between 5 and 10; confirmed diagnosis of UCP; evidence of ischemic mono-hemispheric damage at brain MRI; Upper limb Modified Ashworth Scale (MAS) sum score < 2; Total IQ ≥ 70. Exclusion criteria were: attentive or sensory impairments; seizures not controlled by therapy; previous orthopedic surgery or botulinum toxin A injection in the upper limb within 6 months prior to study entry. Eighteen age- and sex-matched typically developing children were selected as controls. Evaluation of UCP and controls was conducted during a single session, in a clinical setting, according to the following procedures.

<sup>1</sup>www.fightthestroke.org

During the clinical evaluation, the following data were collected in children with UCP: complete neurological examination (also verifying the absence of body representation disorders in body-part pointing and naming, awareness of spatial notions and left-right orientation), global hand motor skills using the Besta Scale Global Score (Besta GS; Rosa-Rizzotto et al., 2014), upper limb spasticity by means of the Modified Ashworth Scale (MAS) (Bohannon and Smith, 1987), hand manipulative pattern classification (HC) according to Ferrari et al. (Bassi and Ferrari, 2016) and total Intelligence Quotient (IQ) by the WISC-IV battery (Wechsler, 2012). Then, visuospatial constructional ability and visual memory were evaluated with the Rey-Osterrieth Complex Figure Test (ROCF) (Shin et al., 2006), administered in both copy and early recall conditions (the latter performed 10′ after figure visualization).

All children were asked to sit comfortably on a height-adjusted chair placed in front of a table and were provided with a set of pencils and white sheets. Children with UCP were asked to perform 3 drawings in the following order: a self-portrait (SP), a portrait of the best classmate-friend (FP), and a portrait of the hemiparetic child who joined them in the child-to-child rehabilitation program (HP). Controls were asked to perform a self-portrait and a portrait of the best classmate. To ensure a spontaneous body representation, no specific indication was given to the children.

From the initial set of drawings, 9 triads performed by UCP children and 2 dyads performed by controls were excluded due to the presence of non-anthropomorphic representations or non-measurable body parts. Drawings by 10 UCP children and 16 controls were finally considered for analyses. The length of each represented upper limb was measured as the inter-joint distance between the shoulder and the wrist. An asymmetry index (AI), consisting of the absolute difference between the upper limb lengths expressed as a percentage of their average, was computed according to the following formula:

$$\mathrm{AI} = \left| \frac{\mathrm{Left} - \mathrm{Right}}{\mathrm{Left} + \mathrm{Right}} \right| \times 2 \times 100$$

For example, for a portrait with left and right arm lengths of 5 and 4 cm, respectively, AI = |(5 − 4)/(5 + 4)| × 2 × 100 = 22.22%.
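The asymmetry index can be sketched in a few lines of Python. This is only an illustration of the formula, taking the absolute value as in the worked example; the function name `asymmetry_index` and the input check are assumptions, not part of the original study.

```python
def asymmetry_index(left_cm: float, right_cm: float) -> float:
    """Upper-limb asymmetry index (AI), in percent.

    Absolute left-right difference in drawn arm length, expressed as a
    percentage of the mean of the two lengths. Hypothetical helper name.
    """
    total = left_cm + right_cm
    if total <= 0:
        raise ValueError("arm lengths must be positive")
    return abs(left_cm - right_cm) / total * 2 * 100


# Worked example from the text: left arm 5 cm, right arm 4 cm.
print(round(asymmetry_index(5, 4), 2))  # 22.22
```

Because of the absolute value, the index does not encode which side is drawn shorter: swapping the two lengths gives the same AI, and a perfectly symmetric drawing gives 0.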

After verifying that the normality assumption was not met by AI data, a Kruskal-Wallis H test was conducted to investigate between-group differences in AI across portrait types. Within-group AI differences across portrait types were investigated through a non-parametric repeated measures analysis of variance by ranks (Friedman test). Post hoc comparisons were conducted through a non-parametric test (Wilcoxon), and effect size was computed by means of Eta squared and Kendall's W for between- and within-group analyses, respectively. Subsequently, we tested whether asymmetry was correlated with age and/or with clinical variables indexing motor and cognitive functioning. By means of the Spearman (ranked) test, the correlations between the AI and age, IQ, Besta GS and HC were tested. This set of regressors was chosen to test whether age, intelligence level or motor functioning could impact the AI. The significance threshold was set at 5%.

### RESULTS

The demographic data, clinical features and brain imaging findings of children with UCP are shown in **Table 1**. The mean age of the 10 analyzed subjects with UCP (7 males, 3 females) was 7.06 ± 1.90 years. Overall, they presented mild hemiparesis with a mild level of spasticity (total MAS = 1.95 ± 1.34) and a prevalent upper limb involvement associated with a significant hand motor deficit (Besta GS = 0.48 ± 0.38). According to the HC, 2 subjects belonged to type I ("integrated hand"), 2 to type II ("semifunctional hand"), 3 to type III ("synergic hand"), and 3 to type IV ("imprisoned hand"). Visuospatial abilities evaluated with the ROCF test showed values within ± 2 z-scores for both copy and recall conditions (copy: mean z-score = −0.09, range [−2, +1.65]; recall: mean z-score = −0.44, range [−1.92, +0.91]), according to the Italian pediatric normative data (Rey, 1968) (see **Table 1** for individual ROCF z-scores collected in the copy condition).

M, male; F, female; AH, affected hand; ROCF, Rey-Osterrieth Complex Figure Test; Besta GS, Besta global score; ∆BestaGS to peer, difference in Besta Global Score relative to peer (positive values indicate better hand functioning relative to peer); SP-AI, self-portrait asymmetry index; HP-AI, hemiparetic portrait asymmetry index; FP-AI, healthy classmate portrait asymmetry index; n.a., not available. Although not part of the analysis, MRI findings have been added to the table in order to demonstrate the presence of a predominantly unilateral brain lesion in the enrolled subjects, as well as to enrich the clinical data of our sample.

Neurological examinations showed neither neglect nor hemiasomatognosia. All children were able to name their body parts correctly, no orientation abnormalities were detected, and spatial concepts were preserved. No children were excluded due to their clinical profile. Overall, drawings were highly heterogeneous in terms of graphic style, with the precision and richness of details varying according to age. However, an internal consistency was evident within subjects, with the three drawings presenting recurrent elements and a common graphical style (see **Figure 1A**).

The control group was composed of 16 typically developing children (10 M, mean age 7.37 ± 1.75). As expected, the ROCF test performed in controls returned normal values for both copy and recall conditions (mean z-score = 1.51, range [−0.5, 2.5] and mean z-score = 0.92, range [−0.86, 1.85], respectively).

The Kruskal-Wallis H test showed a statistically significant difference in AI in SP between two groups [χ<sup>2</sup>(1) = 11.025, p = 0.001, effect size: η<sup>2</sup> = 0.418]. Post hoc contrasts indicated a significantly greater AI in self-portraits by UCP children relative to Controls (p < 0.001, see **Figure 1B**).

Within the UCP group, the Friedman test applied to the AI yielded a chi-square value of 11.4, indicating a significant effect of portrait type (p = 0.003, effect size: Kendall's W = 0.57). In particular, UCP children represented upper limbs more asymmetrically in self-portraits relative to other drawings (mean AI for SP: 39%, FP: 14%, HP: 22%). Post hoc contrasts indicated a significantly greater AI in self-portraits in comparison to both FP (p = 0.005) and HP (p = 0.013) (see **Figure 1B**). In the control group, no significant AI difference between SP and FP was found.
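As a consistency check, the reported effect sizes can be recomputed from the reported test statistics. The text does not state which formulas were used, so the sketch below assumes two common rank-based definitions: η² = (H − k + 1)/(n − k) for the Kruskal-Wallis test, and W = χ²/(N(k − 1)) for the Friedman test. The function names are hypothetical; the sample sizes are those of the analyzed drawings (10 UCP children, 16 controls).

```python
def eta_squared_from_h(h: float, n_groups: int, n_total: int) -> float:
    """Eta squared derived from a Kruskal-Wallis H statistic (assumed formula)."""
    return (h - n_groups + 1) / (n_total - n_groups)


def kendalls_w_from_friedman(chi2: float, n_subjects: int, n_conditions: int) -> float:
    """Kendall's W derived from a Friedman chi-square statistic (assumed formula)."""
    return chi2 / (n_subjects * (n_conditions - 1))


# Between groups: H = 11.025, 2 groups, 10 UCP + 16 control self-portraits.
eta2 = eta_squared_from_h(11.025, n_groups=2, n_total=10 + 16)

# Within UCP group: chi-square = 11.4, 10 children, 3 portrait types.
w = kendalls_w_from_friedman(11.4, n_subjects=10, n_conditions=3)

print(round(eta2, 3), round(w, 2))  # 0.418 0.57
```

Under these assumptions, both values agree with those reported above.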

The study of clinical-demographical regressors on AI of self-portraits did not show any significant correlation. Besides, differential regressors related to the hemiparetic peer did not show significant correlations with the difference between SP and HP asymmetry indexes.

### DISCUSSION

The aim of the present study was to evaluate self-body representation in hemiparetic children affected by UCP with predominant upper limb involvement and to compare this pictorial representation to portraits of both hemiparetic and healthy peers. For this purpose, we evaluated the upper limb asymmetry in the three portrait types, which was significantly higher in self-portraits than in portraits of both hemiparetic and healthy peers. Of note, self-portraits produced by typically developing children showed no significant difference in asymmetry, either in comparison to portraits of others performed by the same group or relative to the portraits of others performed by children with UCP. This finding led us to regard the asymmetry of upper limbs in self-portraits as a specific signature of hemiparetic children.

The detection of asymmetries in own upper-limb representation in children with UCP is coherent with a previous work conducted by Abercrombie and Tyson (1966) on children suffering from cerebral palsy, in which the occurrence of unbalanced representations of upper limbs was reported in children with a unilateral brain damage. However, these authors used the Draw-a-Man test (Goodenough, 1975) as a projective test, implicitly making children represent their own body image. Unlike these authors, we explicitly asked children to produce both self- and classmate-portraits. The possibility to directly compare these drawings allowed us to verify whether upper limb asymmetry reflects an alteration of one's own body image rather than a deviant representation of the human body in general. This approach has two major strengths. On the one hand, the within-subject comparison allowed us to rule out the contribution of individual peculiarities in drawing. On the other hand, despite being diagnosed with UCP, our clinical sample was free from visuospatial and symbolic disturbances, hemiasomatognosia and neglect, and was thus controlled for major disorders affecting pictorial representation.

FIGURE 1 | (A) Example of portraits performed by a child: self-portrait, portrait of the hemiparetic peer with similar clinical conditions (5 years old, unilateral cerebral palsy with prominent upper-limb motor impairment), portrait of best classmate. Note, only in the self-portrait, the asymmetrical representation of the upper limbs, with the paretic hand smaller than the contralateral one and without fingers. (B) Asymmetry index differences across portrait types in children with UCP and controls. SP, self-portrait; FP, portrait of the best classmate-friend; HP, portrait of the hemiparetic peer. Bars indicate SEM; ∗p < 0.05, ∗∗p < 0.01.

The finding of a roughly threefold higher level of asymmetry in self- vs. classmate-portraits is in line with a previous work by Morin et al. (2003). These authors conducted a multivariate analysis evaluating 161 portraits performed by adult stroke patients (including both self-portraits and portraits of others). As expected, the authors reported frequent "unilateral lacks" in right brain injured patients' drawings, attributing these difficulties to several aspects of hemineglect. However, some right-hemiparetic patients, despite drawing a "neglected" self-portrait, spontaneously drew a complete image of others, leading the authors to postulate that unilateral defects of portraits may selectively reflect an alteration of body self-representation.

Although the findings of Morin et al. (2003) are in line with ours, they leave unclear whether this deviant representation constitutes a signature of self-representation or rather a more general representation of the hemiparetic condition. To address this issue, we also required participants to portray a hemiparetic peer with whom they had been interacting daily during the previous month. This condition allowed us to demonstrate that the asymmetrical picturing of upper limbs constituted a signature of self-representation, favoring the view that self-portrait features are grounded in a first-person, sensorimotor bodily experience.

No correlation was found between the asymmetry in upper limb representation and indices of motor functioning. However, given the small sample size and the heterogeneity of the investigated population in terms of brain lesions, further studies are required to reveal a possible link between these two domains.

### CONCLUSION

In conclusion, our data indicate that UCP with predominant upper limb deficit affects body self-representation, but not body representation in general. We suggest that the upper limb asymmetry does not constitute a mere picturing of the pathological condition, but rather may reflect the experienced status of motor functioning, which is valid only for one's own body. We propose that evaluating self-portraits in hemiparetic children undergoing pediatric neurorehabilitation programs and quantifying the asymmetry of the self-representation could provide a valuable index of self-perceived functioning. Such a procedure, well-suited for pediatric age, would enrich the clinical picture of the patient by adding psychometric information to clinical outcomes, enabling clinicians to collect information not easily obtainable in pediatric patients.

### AUTHOR CONTRIBUTIONS

AN collected the data and drafted the manuscript. PP and PA revised the manuscript. MF-D interpreted the data and revised the manuscript.

### ACKNOWLEDGMENTS

We would like to thank Francesca Fedeli and Roberto D'Angelo (www.fightthestroke.org) for their help in recruiting participants and for their contribution in collecting drawings.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Nuara, Papangelo, Avanzini and Fabbri-Destro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Learning and Acting in Peripersonal Space: Moving, Reaching, and Grasping

Jonathan Juett\* and Benjamin Kuipers\*

*Computer Science and Engineering, University of Michigan, Ann Arbor, MI, United States*

The young infant explores its body, its sensorimotor system, and the immediately accessible parts of its environment, over the course of a few months creating a model of peripersonal space useful for reaching and grasping objects around it. Drawing on constraints from the empirical literature on infant behavior, we present a preliminary computational model of this learning process, implemented and evaluated on a physical robot. The learning agent explores the relationship between the configuration space of the arm, sensing joint angles through proprioception, and its visual perceptions of the hand and grippers. The resulting knowledge is represented as the peripersonal space (PPS) graph, where nodes represent states of the arm, edges represent safe movements, and paths represent safe trajectories from one pose to another. In our model, the learning process is driven by a form of intrinsic motivation. When repeatedly performing an action, the agent learns the typical result, but also detects unusual outcomes, and is motivated to learn how to make those unusual results reliable. Arm motions typically leave the static background unchanged, but occasionally bump an object, changing its static position. The reach action is learned as a reliable way to bump and move a specified object in the environment. Similarly, once a reliable reach action is learned, it typically makes a quasi-static change in the environment, bumping an object from one static position to another. The unusual outcome is that the object is accidentally grasped (thanks to the innate Palmar reflex), and thereafter moves dynamically with the hand. Learning to make grasping reliable is more complex than for reaching, but we demonstrate significant progress. Our current results are steps toward autonomous sensorimotor learning of motion, reaching, and grasping in peripersonal space, based on unguided exploration and intrinsic motivation.

Keywords: sensorimotor learning, autonomous robot learning, peripersonal space, reaching and grasping, intrinsic motivation, developmental robotics

### 1. INTRODUCTION

### 1.1. What Is the Problem?

We observe that human infants are born without the ability to reach, grasp, and manipulate nearby objects. Their motions are seemingly aimless, but careful research has established that infants are biased toward moving objects and toward keeping the hands in view (von Hofsten, 1982, 1984; van der Meer et al., 1995; van der Meer, 1997). After a few months of unguided experience, human infants can reach deliberately to contact nearby objects, and after a few more months, they can grasp nearby objects with a reasonable degree of reliability (Berthier, 2011).

#### Edited by:

*Matej Hoffmann, Faculty of Electrical Engineering, Czech Technical University in Prague, Czechia*

#### Reviewed by:

*Elio Tuci, University of Namur, Belgium; Lorenzo Jamone, Queen Mary University of London, United Kingdom; Alessandro Roncone, University of Colorado Boulder, United States*

#### \*Correspondence:

*Jonathan Juett jonjuett@umich.edu Benjamin Kuipers kuipers@umich.edu*

Received: *11 August 2018* Accepted: *04 February 2019* Published: *22 February 2019*

#### Citation:

*Juett J and Kuipers B (2019) Learning and Acting in Peripersonal Space: Moving, Reaching, and Grasping. Front. Neurorobot. 13:4. doi: 10.3389/fnbot.2019.00004*

During the early process of learning to reach, children's arm trajectories are quite jerky, suggesting the underdamped behavior of partially tuned control laws (Thelen et al., 1993). A tempting hypothesis about early reaching is that visual servoing brings the images of the hand and the target object close together. However, an elegant experiment (Clifton et al., 1993) refutes this hypothesis by showing that young children's reaching behavior is unaffected when they can see the target object, but not their own hands. During later reach learning, children and adults move the arm and hand more smoothly and directly to the target object, and they start depending on visual access to the moving hand (Berthier, 2011).

We abstract this developmental psychology problem to a problem in robot learning (**Figure 1**): How can a robot learn, from unguided exploratory experience, to reach and grasp nearby objects? We use the term peripersonal space (PPS) for the space immediately around the robot, accessible to its arms and hands for the manipulation of objects.

Peripersonal space includes multiple representations to accommodate different sensors and effectors. Proprioceptive sensors in the joints of the arm and hand provide information about the degrees of freedom of the manipulator, typically six or more. These degrees of freedom define the dimensions of the configuration space, in which a point determines the configuration of the arm, including the pose (position and orientation) of the hand. Vision provides sensory access to the 3D workspace, some but not all of which is within reach. To reach and grasp successfully, the robot needs to learn useful representations for the configuration space and the workspace, and for mappings between their different representations of peripersonal space.

Peripersonal space is also accessed by other sensory modalities such as touch and sound, and via other activities such as self-touch and tool use (Canzoneri et al., 2012, 2013; Roncone et al., 2016; Mannella et al., 2018). This paper focuses on learning from unguided exploration the functional relations linking proprioception and vision, two sensory modalities central to the representation of knowledge of peripersonal space. We hope to extend our approach to include touch and sound in the future.

### 1.2. Why Is the Problem Important?

Consider the computational problem faced by the newborn agent (human or robot), trying to make sense of the "blooming, buzzing confusion" of its sensory input, and learning to act with predictable and eventually useful results (Pierce and Kuipers, 1997). (Some of this learning could take place over evolutionary time, the learned knowledge being innate to the individual.)

Reaching and grasping are among the earliest actions learned by a human infant, and they help it achieve control over its immediate environment, by being able to grasp an object, take control of it, and move it from one position to another. Reaching the desired object is a prerequisite for grasping. Moving the arm from one pose to another is a step toward learning to reach. All of this learning takes place through unguided exploration, without explicit instruction or reward.

From the early days of artificial intelligence, planners and problem-solvers (e.g., STRIPS, Fikes and Nilsson, 1971) assumed the existence of primitive actions for grasping and moving objects. This research contributes to showing how such primitives can be learned from very early experience.

### 1.3. Overview

A fundamental question about developmental learning is how an agent, without prior knowledge of its body, its sensors, its effectors, or its environment, can build a useful representation for the state of its world, and then can use this representation to learn reliable actions to change that state.

In our approach, the learning agent uses its unguided experience to define the peripersonal space (PPS) graph. Each node of the PPS graph represents a state of the arm, defined in terms of its joint angles, so it represents a point in configuration space. An edge linking two nodes is included when direct motion is safe between those two configurations. Each node is also annotated with the perceptual image(s) of the hand and arm in the otherwise empty workspace.
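The structure described above can be illustrated with a minimal sketch. This is a toy data structure under stated assumptions, not the authors' implementation: the class name `PPSGraph`, its method names, the placeholder image annotation, and the use of breadth-first search to find a path are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PPSGraph:
    """Toy peripersonal-space graph (hypothetical structure for illustration)."""
    joint_angles: dict = field(default_factory=dict)  # node id -> tuple of joint angles
    hand_images: dict = field(default_factory=dict)   # node id -> stored percept of the hand
    neighbors: dict = field(default_factory=dict)     # node id -> set of adjacent node ids

    def add_node(self, node_id, angles, image=None):
        # A node is a point in configuration space, annotated with a percept.
        self.joint_angles[node_id] = tuple(angles)
        self.hand_images[node_id] = image
        self.neighbors.setdefault(node_id, set())

    def add_safe_edge(self, a, b):
        # An edge records that direct motion between two configurations is safe.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def safe_path(self, start, goal):
        """Breadth-first search: a path with the fewest edges, or None."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in sorted(self.neighbors[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None


g = PPSGraph()
g.add_node("home", (0.0, 0.0, 0.0))
g.add_node("mid", (0.3, 0.1, 0.0))
g.add_node("far", (0.6, 0.4, 0.2))
g.add_safe_edge("home", "mid")
g.add_safe_edge("mid", "far")
print(g.safe_path("home", "far"))  # ['home', 'mid', 'far']
```

In this sketch, a trajectory is simply a sequence of nodes joined by safe edges, so following it never requires a motion that was not recorded as safe during exploration.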

In this paper, we describe two applications of a general process for learning reliable actions. After creating the PPS graph, the process collects data about an initial action, learning its typical results, identifying unusual results, and then adding new preconditions or parameterizations to define a novel action that makes those unusual results reliable. We assume that a kind of intrinsic motivation (Baldassarre and Mirolli, 2013) drives this learning cycle. We use intrinsic motivation as a tool, but this paper is not intended as a contribution to the literature on intrinsic motivation.
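The first steps of this cycle, learning the typical result of an action and flagging unusual ones, can be sketched as follows. The outcome labels, the function name `split_outcomes`, and the rule that the most frequent outcome counts as typical are assumptions made for illustration; the later step of finding preconditions that make an unusual result reliable is not shown.

```python
from collections import Counter


def split_outcomes(outcomes):
    """Return (typical outcome, {unusual outcome: count}) from repeated trials."""
    counts = Counter(outcomes)
    typical, _ = counts.most_common(1)[0]
    unusual = {o: c for o, c in counts.items() if o != typical}
    return typical, unusual


# Hypothetical log of 10 repetitions of the same action.
trials = ["no_change"] * 8 + ["block_moved"] * 2
typical, unusual = split_outcomes(trials)
print(typical, unusual)  # no_change {'block_moved': 2}
```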

The first application of the process observes the arm moving to configurations described by randomly-selected nodes in the PPS graph. The typical result is no change at all to the perceived images of blocks on the table; the main unusual result is a quasi-static change to the image due to the arm pushing or bumping (i.e., reaching) the block. Given a block to reach, the learning process finds preconditions that identify a target PPS node corresponding to that block, so that moving to that target node reliably reaches the intended block.

In the second application of the same process, the agent observes the result of reaching to randomly-selected blocks. Since the reach action is now reliable, the typical result is to cause a quasi-static change to the image of the selected block. The unusual result is for the block to move dynamically with the hand, rather than remaining static in a new position: the hand has grasped the block.

The conditions for making the grasp action reliable are more complex than for the reach action, but fortunately, they can still be expressed in terms of the PPS graph and the continuous spaces it approximates. Human infants, for several months after birth, exhibit the Palmar reflex, in which a touch on the palm causes the fingers to close tightly, automatically (and unintentionally) grasping an object (Futagi et al., 2012). As a result, the unusual event of an accidental grasp becomes frequent enough to provide sufficient data for a learning algorithm.

In this paper, we describe this process for learning increasingly reliable reach and grasp actions, without externally provided feedback or instruction. This paper improves, extends, and unifies results presented in our previous papers (Juett and Kuipers, 2016, 2018).

### 2. RELATED WORK

### 2.1. The Human Model: Evidence From Child Development

There is a rich literature in developmental psychology on how infants learn to reach and grasp, in which the overall chronology of learning to reach is reasonably clear (e.g., Berthier, 2011; Corbetta et al., 2014). From birth to about 15 weeks, infants can respond to visual targets with "pre-reaching" movements that are generally not successful at making contact with the targets. From about 15 weeks to about 8 months, reaching movements become increasingly successful, but they are jerky with successive submovements, some of which may represent corrective submovements (von Hofsten, 1991), and some of which reflect underdamped oscillations on the way to an equilibrium point (Thelen et al., 1993). For decades, early reaching was generally believed to require visual perception of both the hand and the target object, with reaching taking place through a process of bringing the hand and object images together ("visual servoing"). However, a landmark experiment (Clifton et al., 1993) showed that the pattern and success rate of reaching by young infants is unaffected when the hand is not visible. Toward the end of the first year, vision of the hand becomes important for configuring and orienting the hand in anticipation of contact with target objects. The smoothness of reaching continues to improve over early years, toward adult reaches which typically consist of "a single motor command with inflight corrective movements as needed" (Berthier, 2011).

Theorists grapple with the problem that reaching and grasping require learning useful mappings between visual space (two- or three-dimensional) and the configuration space of the arm (with dimensionality equal to the number of degrees of freedom).

Bremner et al. (2008) address this issue under the term, multisensory integration, focusing on sensory modalities including touch, proprioception, and vision. They propose two distinct neural mechanisms. The first assumes a fixed initial body posture and arm configuration, and represents the positions of objects within an egocentric frame of reference. The second is capable of re-mapping spatial relations in light of changes in body posture and arm configuration, and thus effectively encodes object position in a world-centered frame of reference.

Corbetta et al. (2014) focus directly on how the relation is learned between proprioception ("the feel of the arm") and vision ("the sight of the object") during reach learning. They describe three theories: vision first; proprioception first; and vision and proprioception together. Their experimental results weakly supported the proprioception-first theory, but all three had strengths and weaknesses.

Thomas et al. (2015) closely observed spontaneous self-touching behavior in infants during their first 6 months. Their analysis supports two separately-developing neural pathways, one for Reach, which moves the hand to contact the target object, and a second for Grasp, which shapes the hand to gain successful control of the object.

These and other investigators provide valuable insights into distinctions that contribute to answering this important question. But different distinctions from different investigators can leave us struggling to discern which differences are competing theories to be discriminated, and which are different but compatible aspects of a single more complex reality.

We believe that a theory of a behavior of interest (in this case, learning from unguided experience to reach and grasp) can be subjected to an additional demanding evaluation by working to define and implement a computational model capable of exhibiting the desired behavior. In addition to identifying important distinctions, this exercise ensures that the different parts of a complex theory can, in fact, work together to accomplish their goal.

The model we present at this point is preliminary. To implement it on a particular robot, certain aspects of the perceptual and motor system models will be specific to the robot, and not realistic for a human infant. To design, implement, debug, and improve a complex model, we focus on certain aspects of the model, while others remain over-simplified. For example, our model of the Peri-Personal Space (PPS) Graph uses vision during the creation of the PPS Graph, but then does not need vision of the hand while reaching to a visible object (Clifton et al., 1993). The early reaching trajectory will be quite jerky because of the granularity of the edges in the PPS Graph (von Hofsten, 1991), but another component of the jerkiness could well be due to underdamped dynamical control of the hand as it moves along each edge (Thelen et al., 1993), which is not yet incorporated into our model.

### 2.2. Robot Developmental Learning to Reach and Grasp

#### 2.2.1. Robotic Modeling

Some robotics researchers (e.g., Hersch et al., 2008; Sturm et al., 2008) focus on learning the kind of precise model of the robot that is used for traditional forward and inverse kinematics-based motion planning. Hersch et al. (2008) learn a body schema for a humanoid robot, modeled as a tree-structured hierarchy of frames of reference, assuming that the robot is given the topology of the network of joints and segments and that the robot can perceive and track the 3D position of each end-effector. Sturm et al. (2008) start with a pre-specified set of variables and a fully-connected Bayesian network model. The learning process uses visual images of the arm while motor babbling, exploiting visual markers that allow extraction of 6D pose for each joint. Bayesian inference eliminates unnecessary links and learns probability distributions over variable values. Our model makes weaker assumptions about the variables and constraints included in the model, and uses much weaker information from visual perception.

#### 2.2.2. Neural Modeling

Other researchers structure their models according to hypotheses about the neural control of reaching and grasping, with constraints represented by neural networks that are trained from experience. Oztop et al. (2004) draw on empirical data from the literature about human infants, to motivate their computational model (ILGM) of grasp learning. The model consists of neural networks representing the probability distributions of joint angle velocities. They evaluate the performance of their model with a simulated robot arm and hand, assuming that reaching is already programmed in. Their model includes a Palmar reflex, and they focus on learning an open-loop controller that is likely to terminate with a successful grasp.

Chinellato et al. (2011) propose an architecture consisting of two radial basis function networks linking retinotopic information with eye movements and arm movements through a shared head/body-centered representation. Network weights are trained through experience with a simulated 2D environment and 2 dof arm. Experiments demonstrate appropriate qualitative properties of the behavior.

Savastano and Nolfi (2013) describe an embodied computational model implemented as a recurrent neural network, and evaluated on a simulation of the iCub robot. They demonstrate pre-reaching, gross-reaching, and fine-reaching phases of learning and behavior, qualitatively matching observations of children such as diminished use of vision in the first two phases, and proximal-then-distal use of the arm's degrees of freedom. The transitions from one phase to the next are represented by manually adding certain links and changing certain parameters in the network, leaving open the question of how and why those changes take place during development.

Caligiore et al. (2014) present a computational model of reach learning based on reinforcement learning, equilibrium point control, and minimizing the speed of the hand at contact. The model is implemented on a simulated planar 2 dof arm. Model predictions are compared with longitudinal observations of infant reaching between ages of 100 and 600 days (Berthier and Keen, 2006), demonstrating qualitative similarities between their predictions and the experimental data in the evolution of performance variables over developmental time. Their focus is on the irregular, jerky trajectories of early reaching (Berthier, 2011), and they attribute this to sensor and process noise, corrective motions, and underdamped dynamics (Thelen et al., 1993). By contrast, we attribute part of the irregular motion to the irregularity of motion along paths in the PPS graph (rather than to real-time detection and correction of errors in the trajectory, which would be inconsistent with Clifton et al., 1993). We accept that other parts of this irregularity are likely due to process noise and underdamped dynamics during motion along individual edges in the PPS graph, but that aspect of our model is not yet implemented. At the same time, the graph representation we use to represent early knowledge of peripersonal space can handle a realistic number of degrees of freedom in a humanoid robot manipulator (**Figure 1**).

#### 2.2.3. Sensorimotor Learning

Several recent research results are closer to our approach, in the sense of focusing on sensorimotor learning without explicit skill programming, exploration guidance, or labeled training examples. Each of these (including ours) makes simplifying assumptions to support progress at the current state of the art, but each contributes a "piece of the puzzle" for learning to reach and grasp.

Our work is closely related to the developmental robotics results of Law et al. (2014a,b). As in their work, we learn graph-structured mappings between proprioceptive and visual sensors, and thus between the corresponding configuration space and work space. Like them, we apply a form of intrinsic motivation to focus the learning agent's attention on unusual events, attempting to make the outcomes reliable. A significant difference is that Law et al. (2014a,b) provide as input an explicit schedule of "constraint release" times, designed to follow the observed stages identified in the developmental psychology literature. Our goal is for the developmental sequence to emerge from the learning process as pre-requisite actions (e.g., reaching) must be learned before actions that use them (e.g., grasping).

Jamone et al. (2012, 2014) define a Reachable Space Map over gaze coordinates (head yaw and pitch, plus eye vergence to encode depth) during fixation. The control system moves the head and eyes to place the target object at the center of both camera images. Aspects of this relationship between retinal, gaze, and reach spaces were previously investigated by Hülse et al. (2010). In the Reachable Space Map, R = 0 describes unreachable targets; intermediate values describe how close manipulator joints are to the physical limits of their ranges; and R = 1 means that all joints are well away from their limits. The Reachable Space Map is learned from goal-directed reaching experience trying to find optimal reaches to targets in gaze coordinates. Intermediate values of R can then be used as error values to drive other body-pose degrees of freedom (e.g., waist, legs) to improve the reachability of target objects. Within our framework, the Reachable Space Map would be a valuable addition (in future work), but the PPS Graph (Juett and Kuipers, 2016) is learned at a developmentally earlier stage of knowledge, before goal-directed reaching has a meaningful chance of success. The PPS Graph is learned during non-goal-directed motor babbling, as a sampled exploration of configuration space, accumulating associations between the joint angles determining the arm configuration and the visual image of the arm.

Ugur et al. (2015) demonstrate autonomous learning of behavioral primitives and object affordances, leading up to imitation learning of complex actions. However, they start with the assumption that peripersonal space can be modeled as a 3D Euclidean space, and that hand motions can be specified via starting, midpoint, and endpoint coordinates in that 3D space. Our agent starts with only the raw proprioceptively sensed joint angles in the arm and the 2D images provided by vision sensors. The PPS graph represents a learned mapping between those spaces. The egocentric Reachable Space Map (Jamone et al., 2014) could be a step toward a 3D model of peripersonal space.

Hoffmann et al. (2017) integrate empirical data from infant experiments with computational modeling on the physical iCub robot. Their model includes haptic and proprioceptive sensing, but not vision. They model the processes by which infants learn to reach to different parts of their bodies, prompted by buzzers on the skin. They report results from experiments with infants, and derive constraints on their computational model. The model is implemented and evaluated on an iCub robot with artificial tactile-sensing skin. However, the authors themselves describe their success as partial, observing that the empirical data, conceptual framework, and robotic modeling are quite disparate, and not well integrated. They aspire to implement a version of the sensorimotor account, but they describe their actual model as much closer to traditional robot programming.

### 3. BUILDING THE PERIPERSONAL SPACE GRAPH

### 3.1. Methods

A baby begins to explore its environment and the range of motion of its arms with seemingly random movements and no clear external goal.

There is a physical relationship between the configuration **q** of the arm in configuration space, and the resulting pose **p** of the hand in the workspace. This relationship, forward kinematics, is not known to the baby.

$$f(\mathbf{q}) = \mathbf{p} \tag{1}$$

The physical structure of the robot and its perceptual system also define a mapping from the pose of the hand to a visual representation (e.g., a binary image) of the hand. (Note that I<sub>**p**</sub> is simply an identifier for an image, and does not allow the agent to obtain an explicit representation of the pose **p**.)

$$I(\mathbf{p}) = I\_{\mathbf{p}} \tag{2}$$

Composing these defines a (partial) function g that the robot can learn about, by simultaneously using proprioception to sense the configuration **q**, and visual perception to sense the image I<sub>**p**</sub>.

$$g(\mathbf{q}) = I(f(\mathbf{q})) = I\_{\mathbf{p}} \tag{3}$$

This observation (**q**, I<sub>**p**</sub>) is one point on the function g.

The Peripersonal Space (PPS) graph P is a collection of nodes and edges, representing a state of knowledge about the mapping g.<sup>1</sup> A node n ∈ P represents an observation (**q**, I<sub>**p**</sub>). An edge (n<sub>i</sub>, n<sub>j</sub>) = e<sub>ij</sub> ∈ P represents an affordance (i.e., an opportunity) for safe motion between **q**(n<sub>i</sub>) and **q**(n<sub>j</sub>).

The robot learning agent creates a PPS graph P of N nodes by sampling the configuration space of its arm. From an initial pose **q**<sub>0</sub> in an empty environment, the robot samples a sequence of perturbations Δ**q** from a distribution D to generate a sequence of poses:

$$\mathbf{q}\_{i+1} = \mathbf{q}\_i + \Delta \mathbf{q}\_i \text{ while } i \in [0, N-1] \tag{4}$$

While the motor babbling of human infants may appear random, it does exhibit biases toward moving objects and toward keeping the hand visible (von Hofsten, 1982, 1984; van der Meer et al., 1995; van der Meer, 1997). We use rejection sampling to enforce these biases, and constraints against collisions with the table or the robot's own body. If either condition is violated, the proposed configuration is rejected and a new **q**<sub>i+1</sub> is sampled.

<sup>1</sup> Strictly speaking, a graph P = ⟨N, E⟩ consists of two sets, one for nodes and one for edges. For notational simplicity, we will use n<sub>i</sub> ∈ P and e<sub>ij</sub> ∈ P as abbreviations for n<sub>i</sub> ∈ N(P) and e<sub>ij</sub> ∈ E(P).

At this point, the arm is physically moved from its current configuration **q**<sub>i</sub> to the new configuration **q**<sub>i+1</sub>. After each new pose has been safely reached by physical motion of the arm, a corresponding perceptual image I<sub>**p**,i+1</sub> is collected, and the node n<sub>i+1</sub> = (**q**<sub>i+1</sub>, I<sub>**p**,i+1</sub>) and the undirected edge e<sub>i,i+1</sub> = (n<sub>i</sub>, n<sub>i+1</sub>) are added to P. The length of an edge is the Euclidean distance between the configurations at its endpoint nodes, considered in joint space.

$$||e\_{ij}|| = d(n\_i, n\_j) = ||\mathbf{q}\_i - \mathbf{q}\_j||\_2 \tag{5}$$

At this point, the graph is a linear chain, so between any two nodes there is a single path, typically very long. In addition to inefficiency, having a single path through the graph does not provide options for avoiding obstacles or selecting the most reliable approach for a learned action. The graph needs much higher connectivity, by adding new edges linking existing nodes in P.

It is not feasible to test every pair of unconnected nodes, so we apply a heuristic. Let the length of an edge be the Euclidean distance between the configurations at its endpoint nodes, considered in joint space.

$$||e\_{ij}|| = d(n\_i, n\_j) = ||\mathbf{q}\_i - \mathbf{q}\_j||\_2 \tag{6}$$

and let µ<sub>e</sub> be the mean length of all the edges in the current (linear) graph. The heuristic is that when d(n<sub>i</sub>, n<sub>j</sub>) < µ<sub>e</sub>, the average length of edges known from exploration to be safe, then the edge e<sub>ij</sub> can be added to P, if it is not already present. With the inclusion of these edges, we expect that P will support planning of multiple trajectories between any given pair of nodes. Because P is still a sparse approximation to the configuration space, trajectories across the environment will tend to be jerky.
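The chain construction and the edge-adding heuristic can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes configurations are plain tuples of joint angles, the names `edge_length` and `densify` are hypothetical, and the exhaustive loop over node pairs is chosen for clarity rather than efficiency.

```python
import math
from itertools import combinations


def edge_length(q_i, q_j):
    """Euclidean distance between two configurations in joint space."""
    return math.dist(q_i, q_j)


def densify(configs):
    """Return (chain_edges, added_edges) for a babbling sequence of configurations.

    Edges are index pairs (i, j) with i < j. The chain links consecutive
    samples; an extra edge is added when two nodes are closer in joint space
    than the mean length of the chain edges.
    """
    chain = [(i, i + 1) for i in range(len(configs) - 1)]
    mean_len = sum(edge_length(configs[i], configs[j]) for i, j in chain) / len(chain)
    chain_set = set(chain)
    added = [
        (i, j)
        for i, j in combinations(range(len(configs)), 2)
        if (i, j) not in chain_set
        and edge_length(configs[i], configs[j]) < mean_len
    ]
    return chain, added
```

Only joint-space distances are compared here; no added edge is physically tested, which matches the role of the heuristic as a proxy for safety.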

Any path ⟨n<sub>1</sub>, . . . , n<sub>m</sub>⟩ in a PPS graph P corresponds to a safe trajectory ⟨**q**<sub>1</sub>, . . . , **q**<sub>m</sub>⟩ of the arm. The agent designates a home node, n<sub>h</sub>, where the arm rests naturally and that allows relatively unoccluded observation of the environment. By convention, trajectories begin at n<sub>h</sub>, and eventually return there, too. We will also define the terms n<sub>f</sub> for the final node of a trajectory, and n<sub>p</sub> for the penultimate node.

The PPS graph P can then be used as "scaffolding" to learn increasingly expert ways to reach and grasp. By searching the information in the PPS graph P, we can define a function h that provides a discrete approximation to g<sup>−1</sup> from Equation (3):

$$C(I\_b) = \{ (\mathbf{q}, I\_{\mathbf{p}}) = n \in \mathcal{P} \, : \, \operatorname{match}(I\_b, I\_{\mathbf{p}}) \}\tag{7}$$

$$h(I\_b) = \mathbf{q}^\* = \operatorname{select}\_{\mathbf{q}} \, C(I\_b) \tag{8}$$

Given a current visual image I<sub>b</sub> of an object (e.g., a block) in the environment, we can identify nodes (**q**, I<sub>**p**</sub>) = n ∈ P whose stored image I<sub>**p**</sub> of the hand matches (e.g., overlaps with) the currently sensed image I<sub>b</sub> of the object. The generic operator select<sub>**q**</sub> stands for a criterion for selecting among matching nodes, for example by maximizing the overlap between binary images I<sub>b</sub> and I<sub>**p**</sub>, or by minimizing the distance between their centers.
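As an illustration of Equations (7) and (8), the sketch below instantiates the match test as a non-empty pixel overlap and the selection criterion as maximum overlap, one of the two example criteria mentioned in the text. It assumes binary masks are stored as Python sets of (row, column) pixels and nodes as dictionaries with hypothetical keys `"q"` and `"mask"`.

```python
def candidates(object_mask, nodes):
    """C(I_b): nodes whose stored hand mask overlaps the object mask."""
    return [n for n in nodes if n["mask"] & object_mask]


def select_config(object_mask, nodes):
    """h(I_b): configuration of the best-overlapping node, or None if no node matches."""
    matches = candidates(object_mask, nodes)
    if not matches:
        return None
    best = max(matches, key=lambda n: len(n["mask"] & object_mask))
    return best["q"]
```

Swapping the `key` function for a distance between mask centers would give the other selection criterion without changing the rest of the sketch.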

### 3.2. Experiment 1: Creating the Peripersonal Space Graph

For our experiment, we apply the methods described above (section 3.1) to learn to control the left arm of our Baxter Research Robot (**Figure 1**), providing specific instantiations for the generic aspects of the method. The state of this arm can be given by eight degrees of freedom: a set of seven joint angles, **q** = ⟨q<sup>1</sup>, . . . , q<sup>7</sup>⟩ = ⟨s0, s1, e0, e1, w0, w1, w2⟩, and the aperture a between the gripper fingers, described by a percentage of its maximum width.

For the Baxter Research Robot, each visual percept I<sub>**p**</sub> is taken by a fixed-viewpoint RGB-D camera, providing an RGB image I<sub>RGB</sub> and a depth-registered image I<sub>D</sub> (**Figure 2**). During the construction of P, the agent may save a percept P(n<sub>i</sub>) taken while it is paused at n<sub>i</sub>.

For our experiment, the robot begins with an empty PPS Graph P, and the arm is initially at the home configuration **q**<sub>h</sub> = **q**(n<sub>h</sub>). The random motor babbling search described in Equation (4) is instantiated for our robot in a straightforward way. For each joint angle k, the displacement to add is sampled from a normal distribution with a standard deviation equal to a tenth of the full range of that joint.

$$q\_{i+1}^k = q\_i^k + \Delta q^k \quad \text{where } \Delta q^k \sim \text{N}(0, \sigma\_k) \text{ and } \sigma\_k = 0.1 \cdot \operatorname{range}(q^k) \tag{9}$$

We impose a bias using a form of rejection sampling, requiring that the resulting end-effector pose must fall within the field of view, and must not collide either with the table or with the robot's own body. If either condition is violated, the proposed configuration is rejected and a new **q**<sub>i+1</sub> is sampled. As noted previously, human infants exhibit a bias toward keeping the hand visible (von Hofsten, 1982, 1984; van der Meer et al., 1995; van der Meer, 1997). Human infants are also soft and robust, so they can detect and avoid collisions with minimal damage. To prevent damage to our Baxter Research Robot, we implement these checks using a manufacturer-provided forward kinematics model that is below the level of detail of our model, and is used nowhere else in its implementation. In future work, we will consider biasing this sampling to resemble human infants' pre-reaching motions toward objects, or to move in a cyclic fashion, often returning to the center of the field of view.
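One babbling step with rejection sampling, following Equation (9), might look like the sketch below. It is an illustration under stated assumptions: `is_acceptable` is a hypothetical predicate standing in for the visibility and collision checks (which the text delegates to a manufacturer-provided kinematics model), and the `max_tries` guard is an added safety measure not described in the text.

```python
import random


def babble_step(q, joint_ranges, is_acceptable, rng=random, max_tries=1000):
    """Propose q_next = q + dq with dq^k ~ N(0, 0.1 * range_k) until one is accepted.

    joint_ranges is a list of (low, high) limits, one pair per joint.
    """
    sigmas = [0.1 * (high - low) for low, high in joint_ranges]
    for _ in range(max_tries):
        proposal = tuple(qk + rng.gauss(0.0, s) for qk, s in zip(q, sigmas))
        if is_acceptable(proposal):
            return proposal
    raise RuntimeError("no acceptable configuration found")
```

Calling `babble_step` repeatedly, each time starting from the previously accepted configuration, yields the chain of poses in Equation (4).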

To move along an edge e<sub>ij</sub> from n<sub>i</sub> to n<sub>j</sub>, in the current implementation, the agent uses linear interpolation of each joint angle q<sup>k</sup> from its value in **q**<sub>i</sub> to its value in **q**<sub>j</sub>.
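Joint-wise linear interpolation along an edge can be sketched as below. The number of waypoints (`steps`) is an assumed parameter; the text does not say how finely the motion is discretized.

```python
def interpolate(q_i, q_j, steps):
    """Return steps + 1 waypoints from q_i to q_j, interpolating each joint angle linearly."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [
        tuple(a + (b - a) * s / steps for a, b in zip(q_i, q_j))
        for s in range(steps + 1)
    ]
```

For example, `interpolate((0.0, 1.0), (1.0, 3.0), 2)` passes through the midpoint `(0.5, 2.0)`.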

For this experiment, the total number of nodes created and added to P is N = 3,000.

### 3.3. Experiment 1 Results

The Peripersonal Space graph P is a sparse approximation of the configuration space of the robot arm (**Figure 3**). It is evident that random sampling through unguided exploration has distributed N = 3,000 nodes reasonably well throughout the workspace, with some localized sparse patches and a region in the far right corner that is generally out of reach of the robot's left hand. The display in **Figure 3A** overlays information available to the robot in the individual nodes of P. The information in **Figure 3B** is not available to the robot.

FIGURE 2 | An example of the agent's visual percept and stored representation for a node *n<sub>i</sub>*. (A) A single RGB image *I<sub>RGB</sub>*, scaled down to 120 × 160 resolution, taken while the arm configuration is set to **q**<sub>*i*</sub> = **q**(*n<sub>i</sub>*). (B) The registered depth image *I<sub>D</sub>* taken at the same time. Note that the depth values are a measure of disparity, so smaller values are further from the camera. (C) The full representation the agent stores for the node *n<sub>i</sub>*. Aided by the yellow block held between the gripper fingers, the agent segments the palm mask, corresponding to the grasping region of the hand. The larger hand mask includes the palm mask (shown in yellow) and parts of the robot image segment near the block, typically the gripper fingers and lower wrist (shown in red). The range of depth image values within each mask is also stored, as are the center of mass and mean depth value for each mask. Finally, to estimate the direction the grippers are pointing, a vector is drawn from the hand mask center through the palm mask center.

Random exploration of the configuration space with N = 3,000 creates 3,000 nodes, in a chain with 2,999 edges. Of the original 2,999 edges, 1,614 have length less than the mean length µ<sub>e</sub> of all 2,999 edges. The heuristic that creates a new edge between n<sub>i</sub> and n<sub>j</sub> when d(n<sub>i</sub>, n<sub>j</sub>) < µ<sub>e</sub> adds 108,718 new edges, so that P now has 3,000 nodes and 111,717 edges. By comparison, the complete graph with 3,000 nodes has 3,000 · 2,999 / 2 = 4,498,500 edges, so the PPS graph P has the same number of nodes and about 2.5% as many edges as the complete graph.

### 4. LEARNING A RELIABLE REACH ACTION

In our model, learning the reach action takes place in three stages. First, the agent must learn to detect the unusual event of bumping a block, causing a quasi-static change in the environment, against the background of typical arm motions that leave the environment unchanged. Second, the agent learns criteria for selecting nodes from the PPS graph, such that moving to one of those nodes increases the likelihood of bumping a specified block. Third, the agent learns how to interpolate in continuous space between the nodes of the PPS graph to further increase the likelihood of bumping a target block.

Since these three learning stages have different character, depend on different knowledge, and apply different methods, we describe our research on each of them with its own Methods-Experiments-Results description.

### 4.1. Observing the Unusual Event of a Bump

#### 4.1.1. Methods

During the construction of the PPS Graph, the agent's perceptual input can be easily factored into a static background, and a highly variable foreground corresponding to the robot's hand and arm. This allows the nodes of the PPS Graph to be characterized by the perceptual image of the robot's hand. By detecting a correlation between "random" motor output and perceived hand motion, the agent can diagnose that the hand is part of the agent's "self."

Once the PPS Graph has been completed, additional objects are placed into the workspace. The objects used for this work are rectangular prism blocks with a single long dimension. The blocks are placed upright at randomly generated coordinates on the table in front of the robot, with the requirement that each placement leaves all blocks unoccluded and fully within the field of vision. The objects have distinctive colors not present in the background, making it easy to create a binary image mask for each object in the RGB image. This image mask can be applied to the depth image to determine the range of depth values associated with the object.

The agent creates binary image masks as more efficient representations of its own hand and of foreground objects that may be targets of actions. For each n<sub>i</sub> ∈ P, the agent finds the end effector in I<sub>RGB</sub>(n<sub>i</sub>) and records two binary masks that describe its location in the image. The palm mask p<sub>i</sub> is defined to be the region between the gripper fingers, which will be most relevant for grasping.<sup>2</sup> The hand mask h<sub>i</sub> includes this region as well as the gripper fingers and the wrist near the base of the hand. h<sub>i</sub> reflects the full space occupied by the hand, which is most useful to identify and avoid nodes with hand positions that may collide with obstacles. The state representation for a node also includes the range of depths the end effector is observed to occupy. This range is found by indexing into I<sub>D</sub>(n<sub>i</sub>) with either mask, and determining the minimum and maximum depth values over these pixels. That is, the depth range of the palm D(p<sub>i</sub>) ≡ [min(I<sub>D</sub>(n<sub>i</sub>)[p<sub>i</sub>]), max(I<sub>D</sub>(n<sub>i</sub>)[p<sub>i</sub>])], and the depth range of the full hand D(h<sub>i</sub>) is defined analogously. An edge can also be associated with a binary mask for the area swept through during motion along it, s<sub>i,i′</sub>, approximated by a convex hull of the hand masks of the endpoint nodes, h<sub>i</sub> and h<sub>i′</sub>. The depth range of motion along an edge is the full range between the minimum and maximum depths seen at either endpoint, D(s<sub>i,i′</sub>) ≡ [min(D(h<sub>i</sub>), D(h<sub>i′</sub>)), max(D(h<sub>i</sub>), D(h<sub>i′</sub>))].
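The depth-range definitions above can be sketched in a few lines. The representation is an assumption made for illustration: a depth image is a list of rows, a mask is a set of (row, column) pixels, and the function names are hypothetical. The convex-hull sweep mask is not reproduced here.

```python
def depth_range(depth_image, mask):
    """D(mask): (min, max) of the depth values at the pixels selected by the mask."""
    values = [depth_image[r][c] for r, c in mask]
    return (min(values), max(values))


def edge_depth_range(range_i, range_j):
    """Depth range swept along an edge: the span covering both endpoint ranges."""
    return (min(range_i[0], range_j[0]), max(range_i[1], range_j[1]))
```

Storing only two numbers per mask keeps the per-node state compact, at the cost of over-estimating the occupied volume when the hand spans a wide range of depths.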

<sup>2</sup>We use the word "palm" for this region because of its functional (though not anatomical) similarity to the human palm, especially as the site of the Palmar reflex (Futagi et al., 2012).

The 2,999 edges in the original chain from motor babbling are shown as dotted red lines. Not shown are the edges added according to the safe motion heuristic.

Many (but not all) motions of the arm leave the other objects unaffected, so the new objects typically behave as part of the static background model. However, occasionally the hand bumps into one of the objects and knocks it over or shifts its position. This is defined as a bump event, and is detected by the agent as a quasi-static change to the perceptual image of the object.

When an image of an object is characterized by a binary mask, the difference between two images A and B can be measured by the Intersection Over Union measure:

$$IOU(A,B) = |A \cap B| / |A \cup B|. \tag{10}$$

Comparing the images of an object A at times t<sub>1</sub> and t<sub>2</sub>, when IOU(A(t<sub>1</sub>), A(t<sub>2</sub>)) ≈ 1 the object has remained static. In case we observe IOU(A(t<sub>1</sub>), A(t<sub>2</sub>)) ≪ 1, the object may have moved, but we take care to exclude the case of a temporary occlusion of an object by the hand or arm.
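Equation (10) translates directly into code when masks are represented as sets of pixels, as in this minimal sketch. The handling of two empty masks is an assumption added to avoid division by zero; the text does not discuss that case.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        # Assumption: two empty masks are treated as identical (no change).
        return 1.0
    return len(mask_a & mask_b) / len(union)
```

An unchanged mask gives a value of 1, and a mask that disappears entirely (an empty final mask) gives 0.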

We define a reach as the action of following a trajectory resulting in a bump event with a target object. Even without knowing how to make a reach action reliable, the IOU criterion will allow the agent itself to distinguish between successful and unsuccessful reach actions. In subsequent stages, the agent will learn how to reach reliably.

#### 4.1.2. Experiments

The agent continues to practice its new capability to perform motions allowed by the PPS Graph and observe the results of these motions.

#### **4.1.2.1. Experiment 2: Exploration**

The agent follows this procedure:

- If an apparent change at intermediate node n<sub>i</sub> that triggered an immediate return is not confirmed (i.e., IOU ≈ 1), then repeat the trajectory, continuing past n<sub>i</sub>, to search for a subsequent bump event.

By clustering the results of the IOU criterion, the agent learns to discriminate between the typical outcome of a trajectory (no change) and an unusual outcome (a bump event). These outcomes are defined as the unsuccessful and successful results of a reach action, respectively. Subsequent stages will identify features to allow increasingly reliable reach actions.
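The text does not say which clustering method is used. As one possible illustration, the sketch below separates IOU values with a simple one-dimensional two-means procedure; the function names `two_means` and `is_bump` and the choice of algorithm are assumptions, not the authors' method.

```python
def two_means(values, iterations=100):
    """Split 1-D values into a low and a high cluster; return the two cluster centers."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def is_bump(iou_value, centers):
    """Classify an outcome as a bump if it lies closer to the low-IOU center."""
    lo, hi = centers
    return abs(iou_value - lo) < abs(iou_value - hi)
```

With well-separated clusters, as the results report, almost any two-cluster method would draw the same boundary, so the choice of algorithm matters little here.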

#### **4.1.2.2. Experiment 3: Reach reliability**

To quantify this improvement, we establish a baseline level of performance for the policy of selecting a random final node n<sub>f</sub> and then following the shortest path in the PPS Graph to n<sub>f</sub>. This second experiment consists of 40 trials with a single, randomly-placed target block.

#### 4.1.3. Results

Following this procedure, with three new objects added to the environment, the agent moved along 102 trajectories and gathered 306 IOU values between initial and final object masks. Where t is the target object mask prior to the motion, and t ′ is the target object mask following the motion, the IOU values fell into two well-separated clusters.


Intuitively, a trajectory to a random final node is unlikely to interact with an object on the table. However, in a rare event the hand bumps the object, knocking it over, or sliding it along the table and sometimes off the table (the resulting absence of a final mask leads to an IOU of 0, so no special case is necessary).

The strategy of returning to the home node to observe the final mask allows the agent to rule out occlusion by the hand as the source of the perceptual change. This has not been observed to make false positive bump classifications. This is important so that the agent will not learn incorrect conditions for a bump. There are a small number of false negatives where the hand and object do collide, but without lowering the IOU enough to fall into the smaller cluster. The agent is still able to learn the conditions from the reduced number of observed bumps, and may even favor actions that cause larger, more reliable bumps as a result.

The agent can classify all future motions in the presence of an object by associating the resulting observed IOU with one of the two clusters. While we human observers can describe the smaller cluster as a bump event, the robot learning agent knows only that the smaller cluster represents an unusual but recognizable event, worth further exploration. The agent has no knowledge of what makes a reach succeed. The following stages will help fill that gap.

The quantitative baseline experiment gives a reliability of 20% for the reach action to a random final node, which will be compared with other methods in **Figure 7**.


### 4.2. Identifying Candidate Final Nodes

#### 4.2.1. Methods

The agent has identified the rare event of a bump, and has defined reach as the action that can cause this event. Choosing a target node n<sub>f</sub> randomly from the PPS graph gives a baseline reliability of 20%. The agent is now intrinsically motivated to search for ways to improve the reliability of the reach action. This can be done by identifying one or more features that discriminate between the cases that result in a bump, and those that do not.

The PPS graph stores a visual percept of the hand on each node, and the agent has a current visual percept of the target object. Comparing these percepts is straightforward, since they have the same frame of reference, and the agent has the RGB masks and the depth ranges from each image. Any nonempty intersection predicts that the hand and the target object will occupy the same region of the RGB image, or the same depth, or both.
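The two intersection tests can be sketched as follows, assuming masks are sets of (row, column) pixels and depth ranges are (min, max) pairs treated as closed intervals. The function names are hypothetical.

```python
def ranges_overlap(range_a, range_b):
    """True if two closed depth intervals (min, max) share at least one value."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


def intersection_group(hand_mask, hand_depth, target_mask, target_depth):
    """Return (rgb_overlap, depth_overlap) for a stored hand percept and a target object."""
    return (bool(hand_mask & target_mask), ranges_overlap(hand_depth, target_depth))
```

The returned pair identifies which of four groups a node falls into: overlap in the RGB image only, in depth only, in both, or in neither.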

The stored visual percepts also allow the agent to derive the image-space center of mass of the end effector at a given node. Centers and directions will have three components, two for the (u, v)-coordinates in the RGB image, and one (d) for depth values in the Depth image. For a node n<sub>i</sub>, the center of the palm c<sup>p</sup><sub>i</sub> is composed of the center of mass of p<sub>i</sub> and the average depth, mean(PD(n<sub>i</sub>)[p<sub>i</sub>]), and the center of the hand c<sup>h</sup><sub>i</sub> is derived from h<sub>i</sub> and PD(n<sub>i</sub>)[p<sub>i</sub>] in the same manner. The center c<sup>t</sup> for a target object with mask t and depth range D(t) in the current percept is also found analogously.
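A center of this kind could be computed as in the sketch below. It assumes, for illustration only, that u indexes image columns and v indexes rows, and that the depth image is aligned pixel for pixel with the mask; `percept_center` is a hypothetical name.

```python
import numpy as np

def percept_center(mask, depth_image):
    """Return the (u, v, d) center of a masked region.

    u, v: center of mass of the mask (assumed column, row order).
    d:    mean depth over the masked pixels.
    """
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    u = cols.mean()
    v = rows.mean()
    d = np.asarray(depth_image, dtype=float)[mask].mean()
    return np.array([u, v, d])
```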

Using the PPS graph, the agent improves reach reliability in three steps.


### 4.2.2. Experiments

### **4.2.2.1. Experiment 4: Which intersection property is best?**

By further analysis of the data reported in section 4.1.3 from 102 reaching trajectories, the agent can determine which binary image mask, and which intersection property, best predict whether a trajectory will produce a bump event.

The agent compares binary masks b representing the palm (p<sub>f</sub>) or the hand (h<sub>f</sub>) at its final pose or throughout its final motion (s<sub>p,f</sub>). For each binary mask b and the mask t representing the target object, the trajectories are placed in four groups according to whether b ∩ t and/or D(b) ∩ D(t) are empty or nonempty. Counts of observed bumps and the total number of trajectories within each group allow the conditional probabilities of a bump to be computed.
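The grouping and counting can be sketched as follows, assuming each observation has already been reduced to three booleans (mask intersection nonempty, depth intersection nonempty, bump observed). The function name and the toy data in the usage note are illustrative and are not the experimental data.

```python
from collections import defaultdict

def bump_probabilities(observations):
    """Estimate P(bump | group) for each intersection condition.

    observations: iterable of (mask_hit, depth_hit, bumped) booleans.
    Returns a dict mapping (mask_hit, depth_hit) to bumps / trajectories.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [bumps, total]
    for mask_hit, depth_hit, bumped in observations:
        entry = counts[(mask_hit, depth_hit)]
        entry[0] += int(bumped)
        entry[1] += 1
    return {group: bumps / total for group, (bumps, total) in counts.items()}
```

For example, `bump_probabilities([(True, True, True), (True, True, False)])` yields a probability of 0.5 for the group where both intersections are nonempty.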

The set of PPS graph nodes that satisfy the selected mask intersection property, with the best choice of mask, will define the set of candidate final nodes for a reach trajectory.

### **4.2.2.2. Experiment 5: Using the candidate final nodes**

An improved reach action policy can be created by selecting the target node n<sub>f</sub> as a random member of the candidate final node set, rather than a random node from the entire PPS graph. The shortest graph path is found from the home node n<sub>h</sub> to this final node n<sub>f</sub>. This policy is evaluated using the same method as Experiment 2 in section 4.1.2: reaching for 40 blocks, presented individually at randomly assigned locations on the table.
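A minimal sketch of this policy is given below. It assumes the PPS graph is available as an adjacency dictionary and treats every edge as having equal cost, so breadth-first search stands in for whatever shortest-path method is actually used; the names are hypothetical.

```python
import random
from collections import deque

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

def plan_reach(adjacency, home, candidates, rng=random):
    """Pick a random candidate final node and plan a path to it from home."""
    if not candidates:
        return None  # no node intersects the target in both mask and depth
    final_node = rng.choice(sorted(candidates))
    return shortest_path(adjacency, home, final_node)
```

Returning `None` for an empty candidate set mirrors the case of a placement for which no node satisfies both intersection conditions.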

#### **4.2.2.3. Experiment 6: Selecting the best candidate node**

In spite of every candidate node having non-empty intersections between both RGB and D masks of hand and target object, the reliability of this reach action is still only 52.5%. One reason is that the RGB and D masks taken together over-estimate the space occupied by the hand or an object, so the intersection may take place in empty space. Another reason is that some non-empty intersections may be very small, resulting in an imperceptible bump event.

To address these issues, we identify a distance measure between hand and target object, and then select, from the set of candidate nodes, the node that minimizes that distance measure. Once this node is chosen, the rest of the path is planned as before. This improved policy is evaluated the same way as Experiment 5.
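Selecting the minimizing node could look like the sketch below. It assumes an unweighted Euclidean distance over the (u, v, d) center coordinates, which is one possible distance measure rather than a confirmed detail; `nearest_candidate` is a hypothetical name.

```python
import numpy as np

def nearest_candidate(candidate_centers, target_center):
    """Return the candidate node whose palm center is closest to the target.

    candidate_centers: dict mapping node id -> (u, v, d) palm center.
    target_center:     (u, v, d) center of the target object.
    """
    target = np.asarray(target_center, dtype=float)

    def distance(node):
        center = np.asarray(candidate_centers[node], dtype=float)
        return np.linalg.norm(center - target)

    return min(candidate_centers, key=distance)
```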

FIGURE 5 | Given percepts for hand and target object, the agent searches for the feature *f* that will maximize the conditional probability *P*(*Bump* | *f*). Each feature considers the centers of the palm and target in (*u*, *v*, *d*) image-space. *fu*, *fv*, and *fd* evaluate to true if the absolute difference in one coordinate is less than a variable threshold *k*, and *fc* is true if the distance between centers is less than *k*. The probabilities shown in this graph are based on the 102 trajectories used previously, and their outcomes. For all values of *k*, *P*(*Bump* | *f*) is maximized when *f* = *fc*. The agent therefore selects as *n<sup>f</sup>* the candidate node where the hand is closest to the target object, thereby minimizing *k* and maximizing *P*(*Bump* | *f*).

#### 4.2.3. Results

#### **4.2.3.1. Experiment 4 results**

The set of groups where b = p<sub>f</sub> contains the group with the highest conditional probability.

Each array represents the four possible intersection conditions, and each entry holds the conditional probability of a bump event in a trajectory satisfying that intersection condition, expressed as the ratio of bump events to trajectories. Recall that three objects were present for each trajectory, so the total number of observations reflected in the denominators is 306.




A bump is most likely (64%) to occur at a final node n<sub>f</sub> where the palm percept has a nonempty intersection in both mask and depth range with the target percept, that is, where

$$p_f \cap t \neq \emptyset \quad \land \quad D(p_f) \cap D(t) \neq \emptyset. \tag{11}$$

The process of identifying a node as a candidate is demonstrated in **Figure 4**.

### **4.2.3.2. Experiment 5 results**

For the same 40 placements as the baseline (Experiment 2), 39 have at least one node with both mask and depth range intersections with the target (i.e., have a non-empty candidate final node set), and the policy of moving to one of these nodes bumps the target 21 times. Attempting a reach to the placement where no node has both RGB and Depth intersections was not successful. Overall, the reach action is now 52.5% reliable. The comparison in **Figure 7** shows that reaching to an arbitrary candidate node is more than twice as reliable as the baseline action of moving to a random final node.

#### **4.2.3.3. Experiment 6 results**

**Figure 5** shows the results of comparing several different distance measures between the center positions of the hand and of the target object. This result supports the use of the final node candidate with the smallest center-to-center distance to the target, ||c<sup>t</sup> − c<sup>p</sup><sub>f</sub>||. This result is also included in the comparison in **Figure 7**. Attempting the 40 reaches again, the agent now considers the reach action to be 77.5% reliable, with 31 successes, 7 false negatives, and 2 actual failures to bump the object.

Tabulated results from experiments 3, 5, and 6:


This method, for identifying candidate target nodes that increase the probability of bumping a specified block, can be extended to avoid bumping specified blocks.

### 4.3. Interpolating Between PPS Nodes

#### 4.3.1. Methods

Recall that the first improvement to the reach action was to identify a set of candidate final nodes, all nodes where the stored hand representation and the current percept of the target intersect in both the RGB and depth images. Moving to an arbitrary candidate final node instead of a random node from the PPS graph more than doubles the rate at which bumps are successfully caused. However, **Figure 5** demonstrates that the success rate for reaches increased as ||c<sup>p</sup><sub>f</sub> − c<sup>t</sup>|| decreased. Choosing the candidate node nearest to the target object improved the reliability of the reach to 77.5%, but this method is limited by the density of the PPS graph near the target. Especially in relatively sparse regions of the graph, even the nearest node may not be close enough for a reliable reach. The agent must learn to make small moves off the graph to reach closer to the object than the nearest node.

The PPS graph P is a discrete, sampled approximation to a continuous mapping between the continuous configuration space of the arm, and a continuous space of perceptual images. The full Jacobian model J(q) relating joint angle changes Δq to changes in hand center coordinates Δc is a nonlinear mapping, dependent on the current state of the arm q, a seven-dimensional vector. The full Jacobian is therefore prohibitively difficult for the agent to learn and use. However, P does contain sufficient data for making linear approximations of the relationship between Δq and Δc local to a particular q<sub>i</sub> = q(n<sub>i</sub>). This estimate is most accurate near the configuration q<sub>i</sub>, with increasing error as the distance from q<sub>i</sub> increases.

The linear approximation at a node n<sub>i</sub> is derived using the neighborhood N(n<sub>i</sub>) ≡ {n<sub>i′</sub> | ∃ e<sub>i,i′</sub>}, the set of all nodes n<sub>i′</sub> connected to n<sub>i</sub> by an edge for feasible motion. The local Jacobian estimate Ĵ(n<sub>i</sub>) considers all edges e<sub>i,i′</sub> such that n<sub>i′</sub> ∈ N(n<sub>i</sub>). Each edge provides an example pair of changes Δq = q<sub>i′</sub> − q<sub>i</sub> and Δc = c<sup>p</sup><sub>i′</sub> − c<sup>p</sup><sub>i</sub>. If there are m neighbors, and thus m edges, these can be combined as an m × 7 matrix ΔQ and an m × 3 matrix ΔC, respectively. Ĵ(n<sub>i</sub>) is the least squares solution of

$$
\Delta Q \,\hat{J}(n_i) = \Delta C. \tag{12}
$$

For a given change Δq in arm configuration, Δq Ĵ(n<sub>i</sub>) = Δc gives a local linear estimate of the resulting change Δc in the appearance of the hand. Conversely, given a desired change Δc in the appearance of the hand, the pseudo-inverse Ĵ<sup>+</sup>(n<sub>i</sub>) makes it easy to compute the change Δq in arm configuration that will produce that result.
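The least squares estimate of Equation (12) can be sketched with NumPy as below, using the row-vector convention of the text (each row of ΔQ is one edge's Δq, each row of ΔC the matching Δc). The function names are hypothetical. Note that with fewer than seven independent edges the system is underdetermined, and `numpy.linalg.lstsq` then returns the minimum-norm solution.

```python
import numpy as np

def local_jacobian(q_i, c_i, neighbor_qs, neighbor_cs):
    """Estimate the 7 x 3 local Jacobian at a node from its graph neighbors.

    q_i:         joint configuration of the node, shape (7,)
    c_i:         palm center (u, v, d) at the node, shape (3,)
    neighbor_qs: configurations of the m neighbors, shape (m, 7)
    neighbor_cs: palm centers of the m neighbors, shape (m, 3)
    """
    dQ = np.asarray(neighbor_qs, dtype=float) - np.asarray(q_i, dtype=float)
    dC = np.asarray(neighbor_cs, dtype=float) - np.asarray(c_i, dtype=float)
    J_hat, *_ = np.linalg.lstsq(dQ, dC, rcond=None)  # solves dQ @ J = dC
    return J_hat

def predicted_center_change(dq, J_hat):
    """Local linear estimate of the change in palm center for a change dq."""
    return np.asarray(dq, dtype=float) @ J_hat
```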

**Figure 6** shows an example graph neighborhood and a visualization of the information contained in each edge. The resulting Ĵ(n<sub>i</sub>) is a 7 × 3 matrix where the element at (row, col) gives the rate of change of coordinate col of the palm center (the u, v, or d coordinate of the palm's center of mass) for each unit change to q<sup>row</sup>. A possible adjustment Δq to q<sub>i</sub> may be evaluated by determining if the predicted new palm center ĉ<sup>p</sup><sub>i</sub> ≡ c<sup>p</sup><sub>i</sub> + Δq Ĵ(n<sub>i</sub>) and the palm mask p<sub>i</sub> translated by Δq Ĵ(n<sub>i</sub>) have desirable features. Rotations and shape changes of p<sub>i</sub> that will occur during this motion are not modeled, but are typically small.

Where n<sub>f</sub> is the final node of the planned trajectory for a reach, the agent can use the local Jacobian Ĵ(n<sub>f</sub>) and its pseudo-inverse Ĵ<sup>+</sup>(n<sub>f</sub>) to improve the accuracy of its final motion, and the likelihood of causing a bump event.

Where c<sup>p</sup><sub>f</sub> is the center of the palm in the percept in node n<sub>f</sub>, and c<sup>t</sup> is the center of the target object, the desired change in the palm percept is Δc = c<sup>t</sup> − c<sup>p</sup><sub>f</sub>. Then the updated final configuration is

$$q_f^* = q_f + (c^t - c_f^p)\,\hat{J}^+(n_f) \tag{13}$$

When the agent moves to the configuration q<sup>∗</sup><sub>f</sub>, the palm center should be approximately aligned with the target's center. A motion that aligns the centers should increase the size of the intersection, making the action robust to noise, and increasing the likelihood of the resulting bump event.
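Equation (13) can be sketched as below, again under the row-vector convention, with `numpy.linalg.pinv` providing the pseudo-inverse of a 7 × 3 local Jacobian estimate. The function name and argument names are hypothetical.

```python
import numpy as np

def adjusted_final_configuration(q_f, c_palm_f, c_target, J_hat):
    """Move the final configuration so the palm center approaches the target.

    q_f:      stored configuration of the final node, shape (7,)
    c_palm_f: palm center (u, v, d) at the final node, shape (3,)
    c_target: center (u, v, d) of the target object, shape (3,)
    J_hat:    local Jacobian estimate at the final node, shape (7, 3)
    """
    J_pinv = np.linalg.pinv(J_hat)  # shape (3, 7)
    dc = np.asarray(c_target, dtype=float) - np.asarray(c_palm_f, dtype=float)
    return np.asarray(q_f, dtype=float) + dc @ J_pinv
```

When the estimate has full column rank, the predicted change in palm center, (q<sup>∗</sup><sub>f</sub> − q<sub>f</sub>) Ĵ(n<sub>f</sub>), equals the desired Δc exactly under the linear model; the real arm will deviate as the distance from the node grows.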

While the ability to make a small move off of the graph to q<sup>∗</sup><sub>f</sub> increases the robustness of the reach, it does not eliminate the need for a set of candidate final nodes, or for the decision to use the nearest node to the target as n<sub>f</sub>. As Ĵ<sup>+</sup>(n<sub>f</sub>) is a local estimate, if ||c<sup>t</sup> − c<sup>p</sup><sub>f</sub>|| is large, the error in the recommended Δq will also tend to be large. Choosing the nearest candidate n<sub>f</sub> minimizes the factor by which natural errors in Ĵ<sup>+</sup>(n<sub>f</sub>) will be multiplied, giving the best accuracy for the final position of the reach. Adding the use of the inverse local Jacobian gives the final reaching procedure below.

FIGURE 6 | (A) The agent considers the graph neighborhood around a node *n<sub>i</sub>* to estimate the change in appearance for small changes in configuration near *n<sub>i</sub>*. The predictions will be made by a local Jacobian estimate *Ĵ*(*n<sub>i</sub>*) (see Equation 12). *n<sub>i</sub>* is near the center of P and has a large number of neighbors. Each edge is relatively short in configuration space, where edge feasibility is measured, even though some neighbors appear distant in image space. The furthest neighbors tend to be those where most of the edge length comes from a difference in proximal joint angles that have a larger effect on workspace position. (B) The images of the node *n<sub>i</sub>* and one of its neighbors are superimposed with a representation of the edge, drawn between their centers of mass. This example illustrates a change in configuration Δ*q* and the resulting change in center locations Δ*c* along one edge.

### 4.3.2. Experiment 7: Reaching to Target Adjusted by Local Jacobian

The final improvement in the reach action starts with the trajectory planned to the closest candidate node n<sub>f</sub> to the target object. The configuration q<sub>f</sub> in that node is then adjusted according to the local Jacobian for the neighborhood of n<sub>f</sub>. The final motion in the trajectory then goes to q<sup>∗</sup><sub>f</sub>, rather than q<sub>f</sub>. In effect, the PPS graph supports a local linear approximation to the full Jacobian over the continuous configuration space, based in the neighborhood of each node.

This improved policy is evaluated the same way as Experiments 3, 5, and 6.

### 4.3.3. Experiment 7 Results

Using this procedure on the training set of target placements, the agent perceives bumps at the final node of all 40 trajectories. This 100% result demonstrates that the reach action has become reliable, and is a significant improvement from the previous methods shown in **Figure 7**.


### 5. LEARNING A RELIABLE GRASP ACTION

In our model, after the intrinsic motivation pattern has resulted in a reliable reach action, the pattern may be applied a second time to learn a grasp action. As the reach action toward a target object becomes more reliable, the result of causing a quasi-static change in the image of that object becomes more typical. However, there is an unusual result: during the interaction with the target object, the hand may reflexively close, providing sensorimotor experience with attempted and successful "accidental grasps."

Driven by intrinsic motivation, the grasp action becomes more reliable, toward becoming sufficient to serve as part of a pick and place operation in high level planning. In this case, additional requirements may be learned in a more flexible order, so we present the learning stages of our agent according to the order in which it considered the concepts. The agent must begin with the Palmar reflex to observe the unusual results of a reliable reach action without consciously closing the hand with correct timing. Our agent then learned: how to most reliably set the gripper's aperture during the grasp approach, how to best align the hand, target, and final motion, and how to preshape the hand by orienting the wrist. Each stage is presented with a Methods-Experiments-Results description.

### 5.1. Reaching With an Innate Palmar Reflex

#### 5.1.1. Methods

Human infants possess the Palmar reflex, which closes the hand as a response to contact of an object to the palm. Our work assumes that the Palmar reflex is innate and persistent during at least early stages of learning to grasp. Within our framework, the primary importance of this reflex is to enable the observation of accidental grasps as an unusual event while reaching. While the closing of the hand is unconscious, the agent learns the motor commands and sensations of closing the hand.

When conditions are correct, the Palmar reflex causes an accidental grasp, where the object is held tightly in the hand and becomes a temporary part of the self. This gives a much greater level of control over the pose of the object, as it can be manipulated with the agent's learned scheme for moving the hand until the relationship ends with an ungrasp, opening the fingers to release the object. The variety of outcomes possible with the level of control a grasp provides implies a high potential reward from learning to predict the outcomes and the actions that cause them, but grasps occur too rarely to be learned immediately after learning to reach. Without enough examples, learning the conditions for a grasp may prove too difficult, leading to a modest rate of improvement and a low reward. In our model, the agent focuses next on an intermediate rare event.

The activation of the Palmar reflex is such an event that may be observed as an unusual result of successful reaches. When the hand's final approach to the target meets all necessary conditions of openness, alignment, and orientation, the target object passes between the grippers in a way that activates the simulated Palmar reflex, and the gripper fingers close. The openness of the grippers is a degree of freedom for the robot's motion, and is continually sensed by proprioception. As a result, accurate detection of when the Palmar reflex has been triggered does not rely on the visual percept, and can be observed in a rapid decrease of openness to a new fixed point.
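A simplified proprioceptive check along these lines is sketched below. The thresholds `min_drop`, `settle_tol`, and `settle_len` are hypothetical values chosen for illustration, and the check looks only for a sufficiently large decrease followed by a settled reading at the end of the trace; it does not measure how quickly the decrease happened.

```python
def reflex_triggered(aperture_trace, min_drop=20.0, settle_tol=1.0, settle_len=3):
    """Detect a drop in gripper openness that settles at a new fixed point.

    aperture_trace: sequence of openness readings (0-100), oldest first.
    min_drop:       smallest decrease counted as a closing event (assumed).
    settle_tol:     largest spread allowed among the final readings (assumed).
    settle_len:     number of final readings that must agree (assumed).
    """
    if len(aperture_trace) < settle_len + 1:
        return False
    tail = aperture_trace[-settle_len:]
    settled = max(tail) - min(tail) <= settle_tol
    dropped = max(aperture_trace[:-settle_len]) - max(tail) >= min_drop
    return settled and dropped
```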

The closing of the grippers, either by reflex or conscious decision, is necessary for the agent to gain a higher level of control over the object with a grasp. In some cases, the initial interaction between the hand and object does not lead to the grippers closing around the object, and the attempt to gain control fails immediately. We refer to this event as a Palmar bump, as it often involves knocking away the object before the grippers can close on it. Like other bumps, this is a quasi-static change with an observably low IOU value between masks, and it is the result of a successful reach. While the Palmar bump is not a successful grasp, it serves as a useful near-miss example, promoting use of the conditions that allowed the reflex to trigger in future grasp attempts.

When a grasp occurs, the activation of the Palmar reflex is followed by the object shifting from its initial quasi-static state to a new dynamic state. Now held between the gripper fingers, the object begins to follow the hand with continued motion correlated with the motion of the hand. The agent can identify this corresponding motion by comparing masks and depth ranges during the return trajectory. A grasp is successful if and only if the stored masks and depth ranges for each node of the trajectory intersect with those of the target object in the visual percepts during the return to the home node. Note that the full hand masks and depth ranges are used since the gripper fingers, once closed, may obscure the portion of the object in the palm region. If all nodes of the trajectory have an empty mask or depth range intersection, control was never gained and the result is a Palmar bump. If at least one node fails the intersection check, but not all nodes, the grasp is considered to be a weak grasp. Here the grasp was initiated, but due to a loose or poor placement, did not persist through the return trajectory. Note that the loss of control of the object in a weak grasp does not involve an opening of the grippers, as an intentional ungrasp action would. **Figure 8** provides an example of the agent's visual percepts of a trajectory that produced each type of result.
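The outcome categories can be summarized in a short decision procedure. The sketch below assumes the per-node intersection checks for the return trajectory have already been computed as booleans, and that a bump flag and a reflex flag are available; the function name is hypothetical.

```python
def classify_result(bumped, reflex_activated, node_checks):
    """Classify a grasp attempt as miss, bump, Palmar bump, weak grasp, or grasp.

    bumped:           True if the reach caused an observed bump.
    reflex_activated: True if the Palmar reflex closed the gripper.
    node_checks:      one boolean per node of the return trajectory, True when
                      the stored hand mask and depth range both intersect the
                      target's percept at that node.
    """
    if not bumped:
        return "miss"
    if not reflex_activated:
        return "bump"
    if not node_checks:
        raise ValueError("a return trajectory needs at least one node")
    if all(node_checks):
        return "grasp"        # object followed the hand the whole way
    if not any(node_checks):
        return "Palmar bump"  # control was never gained
    return "weak grasp"       # control was gained, then lost
```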

Since the Palmar bump and weak grasp cases fail to gain or maintain control of the object, both are successful reaches but failed grasps. By considering both situations to be failures, the successful grasps that emerge from this learning process are more likely to facilitate subsequent learning of higher order actions that require a grasp. However, Palmar bumps, weak grasps and grasps share the sensed result of reflexively closing the hand, and may be assumed to share similar preconditions as well. Until a sufficient number of successful grasps are observed, the agent will draw information from all cases where the Palmar reflex was activated to learn to grasp.

FIGURE 8 | … images for the portion to return to *n<sub>h</sub>* are shown in the second rows. Images for some nodes in the middle of trajectories with more than five nodes have been omitted. The agent classifies the result of the grasp attempt by observing the state of the target object during the trajectory. In all cases but *miss*, there is a substantial change between the first and last observations, and the reach component of the grasp attempt was successful. Further classification depends on the state throughout the return trajectory and on whether the Palmar reflex was activated, as discussed in section 5.1.1. Only the result of the final example is considered to be a successful grasp.

### 5.1.2. Experiment 8: Monitoring the Palmar Reflex During Reaching

We first attached a break-beam sensor between the tips of the Baxter robot's parallel gripper fingers to provide the agent with a simulated innate Palmar reflex. Then our agent repeated all trials of Experiments 3, 5, 6, and 7 in section 4, using the same target placements and planned trajectories. For each trial, the agent records if the Palmar reflex was activated, and which category of result (grasp, weak grasp, Palmar bump, bump, or miss) it observed.

### 5.1.3. Experiment 8 Results

It is clear that learning to reach more reliably and with greater precision allows more Palmar reflex activations and grasps to occur. With the random trajectories of Experiment 3, one of 40 activated the Palmar reflex, and this was a successful grasp. Using the final reaching method of Experiment 7, the agent observed that the Palmar reflex was activated in 12 out of the 40 trials. Of these 12, 5 were successful grasp trajectories. These provide a baseline reliability of grasping with random motion trajectories (2.5%) and of grasping with a reliable reach trajectory (12.5%). These results and those for intermediate reach methods are tabulated below, and also shown alongside the rest of the results for this section numerically in **Figure 11** and spatially in **Figure 12**.

Tabulated results from experiment 8:


### 5.2. Initiating Grasps With the Gripper Fully Open

#### 5.2.1. Methods

While exploring PPS and performing reaches, the agent is motivated to keep the hand fully open (a = 100). This presents the largest silhouette of the hand to keep in view, as desired, and the full extension allows for more interactions with objects when the extremities collide with them. As the PPS Graph was created, this setting also allowed a brightly colored block to be placed spanning the full width of the grippers, simplifying visual tracking of the "palm."

With the new event of a Palmar reflex activation during the interaction, the agent may choose to investigate its degrees of freedom. Each of the joint angles in q has an understood role in the placement of the hand, but a does not appear to significantly affect the location of the hand's center of mass and does not differentiate graph nodes. This allows it to be freely modified to investigate its influence on the frequency of Palmar reflex activations.

### 5.2.2. Experiment 9: Which Gripper Aperture Setting Is Most Reliable?

While it is intuitively desirable for the agent to approach targets with the grippers open for a Palmar bump or grasp, the agent does not yet have sufficient data to reach this conclusion. These data are gathered by repeating the trajectories of Experiment 7, the final reaching method, with the Palmar reflex active and with the gripper aperture set to 0, 25, 50, and 75% open. These four sets of results can be compared with those for the fully open gripper that were already obtained in Experiment 7.

### 5.2.3. Experiment 9 Results

Two conclusions may be drawn from the results of this experiment, which are visualized in **Figure 9**. First, it is clear that the probability of activating the Palmar reflex increases with the openness a of the gripper during the approach. As a decreases, the opening of the hand narrows, and the object is less likely to pass inside with an approach of equal precision, so there are fewer activations. Once a is sufficiently low that the object cannot fit in the hand, the Palmar reflex never triggers. The agent will continue using the fully open setting a = 100 in future attempts to maximize its expected success rate.

Second, we see that the openness of the gripper has almost no effect on the probability of a bump. In fact, only one trial was perceived to fail with any setting, and this was a false negative. We claim that this demonstrates the agent could have learned the reach action with the same process and ending reliability for any gripper setting, and at that point would learn to prefer 100% open. It is therefore not necessary for our model to assume any initial setting a for the gripper opening while learning to reach.

### 5.3. Planning the Approach With Cosine Similarity Features

#### 5.3.1. Methods

When reaching, it is important that the candidate final nodes satisfying Equation (11) are identified, and n<sub>f</sub> is chosen to minimize ||c<sup>t</sup> − c<sup>p</sup><sub>f</sub>||. To plan reaches that activate the Palmar reflex, additional features are needed to ensure not only that the final position is correct, but also that the hand orientation and the direction of final motion are suitable. These must be compatible during the approach, and must also be effective for the current target object. To learn to use satisfactory relationships between these vectors, the agent constructs the following set of vectors using information from its stored and current visual percepts:

**gripper vectors:** pointing outward, near parallel to the gripper fingers.

$$
\begin{array}{l}
\vec{g}_p \equiv \text{drawn from } c_p^h \text{ through } c_p^p \\
\vec{g}_f \equiv \text{drawn from } c_f^h \text{ through } c_f^p
\end{array}
$$

**motion directions:** direction of motion along an edge or toward a target

$$
\begin{array}{l}
\vec{m}_{p,f} \equiv \text{the direction of the edge-based final motion from } c_p^p \text{ to } c_f^p \\
\vec{m}_{p,t} \equiv \text{the direction of the modified final motion from } c_p^p \text{ to } c^t \\
\vec{m}_{f,t} \equiv \text{the direction of displacement from } c_f^p \text{ to } c^t
\end{array}
$$

**object orientation:** the perceived major axis of the target object

$$
\vec{o} \equiv \text{drawn along the major axis of } t. \tag{14}
$$

The agent learns cosine similarity criteria for the vectors of final motions that most reliably cause Palmar reflex activations in Experiment 10. In Experiment 11, the agent plans trajectories with final motions that satisfy these criteria to improve the reliability of Palmar reflex activations and grasps.

FIGURE 9 | The portion of attempted reach trajectories that produce observed bumps (orange), ground truth bumps (yellow), and Palmar bumps, or bumps which also trigger the Palmar reflex (purple) for varying gripper apertures *a*. The high reliability of the reach action is independent of *a*, indicating it could be learned and executed with any setting. By contrast, triggering the Palmar reflex is much more likely as *a* increases, and is learned as a prerequisite for the Palmar bump event and later for the grasp action.

### 5.3.2. Experiments

### **5.3.2.1. Experiment 10: Learning reliable cosine similarities**

To discover the best relationship between these vectors for repeating the Palmar reflex activation event, the agent uses the data from repeating the final reach trajectories of Experiment 7 in Experiment 8 with the Palmar reflex enabled. For each trajectory, it considers the cosine similarity C(v⃗<sub>1</sub>, v⃗<sub>2</sub>) of each pair v⃗<sub>1</sub>, v⃗<sub>2</sub> ∈ {g⃗<sub>p</sub>, g⃗<sub>f</sub>, m⃗<sub>p,f</sub>, m⃗<sub>p,t</sub>, m⃗<sub>f,t</sub>, o⃗}, together with the trajectory's result. The cosine similarities are discretized to the nearest value in {−1, −0.5, 0, 0.5, 1}. The rate of Palmar reflex activations is observed for trajectories grouped by their discretized C values.
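The discretization and grouping can be sketched as follows. The function names are hypothetical, and one detail is an assumption: a similarity exactly halfway between two levels is assigned to the lower level, because `numpy.argmin` returns the first minimum.

```python
import numpy as np

LEVELS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def cosine_similarity(v1, v2):
    """Cosine of the angle between two nonzero vectors."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def discretize(c):
    """Snap a cosine similarity to the nearest value in LEVELS."""
    return float(LEVELS[np.argmin(np.abs(LEVELS - c))])

def activation_rates(samples):
    """Rate of Palmar reflex activations per discretized similarity level.

    samples: iterable of (cosine, activated) pairs for one pair of vectors.
    """
    totals = {}
    for cosine, activated in samples:
        level = discretize(cosine)
        hits, count = totals.get(level, (0, 0))
        totals[level] = (hits + int(activated), count + 1)
    return {level: hits / count for level, (hits, count) in totals.items()}
```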

#### **5.3.2.2. Experiment 11: Planning well-aligned final motions**

The agent uses the results of Experiment 10 to plan the next set of trajectories to interact with the target. At this time, the agent does not have the ability to change any g⃗<sub>i</sub> to a particular direction to be perpendicular to o⃗. Therefore, instead of the nearest candidate final node, n<sub>f</sub> is selected from the candidates such that |C(g⃗<sub>f</sub>, o⃗)| is minimized. As before, Ĵ<sup>+</sup>(n<sub>f</sub>) is computed and used to modify the final configuration to a more reliable q<sup>∗</sup><sub>f</sub> by Equation (13). The agent may apply Ĵ<sup>+</sup>(n<sub>f</sub>) again to create a preshaping position, a copy of the final position translated in the direction of −g⃗<sub>f</sub>. This image-space translation has a magnitude of 21, the mean length of the final motion for all Palmar bumps and grasps previously observed. The preshaping position has configuration

$$q_p^* = q_f^* + 21\left(-\vec{g}_f / \|\vec{g}_f\|\right)\hat{J}^+(n_f) \tag{15}$$

and will replace q<sub>p</sub>. With this use of Ĵ<sup>+</sup>(n<sub>f</sub>), it is expected that g⃗<sub>p</sub> ≈ g⃗<sub>f</sub>, and the motion from q<sup>∗</sup><sub>p</sub> to q<sup>∗</sup><sub>f</sub> should be in the direction of g⃗<sub>f</sub>, opposite of the translation. In place of m⃗<sub>p,f</sub>, m⃗<sub>p,t</sub>, and m⃗<sub>f,t</sub>, the direction of this motion is parallel to the gripper vector and near perpendicular to the target major axis. The three steps of choosing n<sub>f</sub>, adjusting to q<sup>∗</sup><sub>f</sub> to match centers with the target, and translating to create a well-aligned preshaping position with q<sup>∗</sup><sub>p</sub> are visualized in **Figure 10**.

The agent must plan a trajectory that ends with this approach. q<sup>∗</sup><sub>p</sub> is not stored in P, so to find a feasible path to q<sup>∗</sup><sub>p</sub>, the agent first identifies the nearest node n<sub>n</sub> ∈ P, the node that minimizes ||q<sup>∗</sup><sub>p</sub> − q<sub>n</sub>||. A graph search then yields the shortest path from the home node to n<sub>n</sub>. After visiting n<sub>n</sub>, the arm will be moved from q<sub>n</sub> to q<sup>∗</sup><sub>p</sub>, and then make the final motion to q<sup>∗</sup><sub>f</sub> to complete the trajectory.
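The preshaping step and the lookup of the nearest stored node can be sketched as below. The sketch assumes, following the description in the text, that the image-space translation of length 21 along −g⃗<sub>f</sub> is mapped into joint space through the pseudo-inverse of the local Jacobian estimate; the function and argument names are hypothetical.

```python
import numpy as np

def preshape_configuration(q_f_star, g_f, J_hat, length=21.0):
    """Configuration whose palm center sits `length` behind the final one.

    q_f_star: adjusted final configuration, shape (7,)
    g_f:      gripper vector at the final node in (u, v, d) image space
    J_hat:    local Jacobian estimate at the final node, shape (7, 3)
    """
    g = np.asarray(g_f, dtype=float)
    dc = -length * g / np.linalg.norm(g)  # image-space translation
    return np.asarray(q_f_star, dtype=float) + dc @ np.linalg.pinv(J_hat)

def nearest_graph_node(node_configs, q_query):
    """Stored node whose configuration is closest to q_query.

    node_configs: dict mapping node id -> configuration, shape (7,)
    """
    q = np.asarray(q_query, dtype=float)

    def distance(node):
        return np.linalg.norm(np.asarray(node_configs[node], dtype=float) - q)

    return min(node_configs, key=distance)
```

Under the linear model, the motion from the preshaping configuration to the final configuration then moves the palm center by 21 units along g⃗<sub>f</sub>.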

The reliability of the grasp action using this method for planning trajectories with aligned final motions is evaluated using the same layout of target placements as Experiment 7, with the Palmar reflex enabled as in Experiment 8. The agent also continues to record the frequency of all types of Palmar reflex activations.

#### 5.3.3. Results

#### **5.3.3.1. Experiment 10 results**

When v⃗<sub>1</sub> ≠ o⃗ and v⃗<sub>2</sub> ≠ o⃗, the highest rate of Palmar reflex activations occurs in the C(v⃗<sub>1</sub>, v⃗<sub>2</sub>) ≈ 1 group. For any v⃗<sub>1</sub> ≠ o⃗, the trajectories where C(v⃗<sub>1</sub>, o⃗) ≈ 0 have the highest rate. The agent concludes that the ideal approach for the Palmar reflex activation event should use matching directions for all vectors describing the motion and orientation of the hand, {g⃗<sub>p</sub>, g⃗<sub>f</sub>, m⃗<sub>p,f</sub>, m⃗<sub>p,t</sub>, m⃗<sub>f,t</sub>}, and all of these parallel vectors should be perpendicular to the target's major axis o⃗.

### **5.3.3.2. Experiment 11 results**

Using trajectories planned in this manner, 39 of 40 reaches are successfully completed and 21 of these activate the Palmar reflex. 14 of these activations result in a grasp. By choosing the best aligned candidate final node instead of the closest candidate node and then adjusting the entire final motion to match its gripper vector, the reliability of grasping is nearly tripled to 35%. **Figures 11**, **12** provide additional comparisons with results from other learning stages.

Tabulated Results from experiments 8 and 11:


### 5.4. Orienting the Grippers With the Wrist

#### 5.4.1. Methods

For our Baxter robot, the joint angle setting q<sup>7</sup>, which controls the most distal twist joint, "wrist 2" or w2, affects only a small portion of the wrist with a roll of the hand relative to the axis of the forearm without changing this axis. This alters the orientation and perceived shape of the gripper opening, but leaves the position largely unchanged. The primary modification is to the plane in which the gripper fingers open and close. Adjusting this is analogous to a human's preshaping techniques to ready the hand for grasping an object, though simpler, as there are fewer ways to configure parallel grippers than an anthropomorphic hand. For a grasp to be successful, the cross section of the object in the gripper plane must be smaller than the space between the grippers. Additionally, the angle at which the plane and the object meet must not be so steep as to squeeze the object out of the grip. Intuitively, the most reliable grasp approach rotates w2 so that the gripper plane is perpendicular to the target object's major axis.

### 5.4.2. Experiment 12: Copying Successful Wrist Settings

Without intuition for the correct orientation, the agent must find another criterion for predicting the wrist orientation that will be most reliable. By this time, the agent has observed that, like the gripper aperture $a$, $q^7$ does not have a significant impact on the hand's location in the image. This allows the agent to modify $q^7$ without treating the graph nodes visited as changed. In the same way, these changes do not conflict with the learned requirements for reaching or with the previous grasping method of choosing $n_f$ such that $\vec{g}_f$ and $\vec{o}$ are approximately perpendicular. To avoid new failures from introducing large, sudden rotations of the hand near the target, when a new $q^7$ is chosen it is used in place of the stored $q^7$ value of every node in the trajectory nT<sup>j</sup>.

To begin, the agent repeats each successful grasp, with a linear search over values of $q^7$ to identify the longest continuous range where the attempt still succeeds. The center of this range is saved as the ideal $q^7$ value for this example grasp. The agent then retries each trajectory from Experiment 11. For each of these grasp attempts, the adjusted final configuration $q^*_f$ is computed by Equation (13), as before. Using the Euclidean distance over all other joint angles, $\langle q^1_f, \ldots, q^6_f \rangle$, the nearest neighbor example grasp is found for the current trial. The grasp is attempted with the ideal $q^7$ value from this example and all other angles unchanged.
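The two steps of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used on the robot: the search is assumed to visit a sorted list of candidate wrist angles, each example grasp is assumed to be stored as a dictionary with the hypothetical keys `q` (seven joint angles) and `ideal_q7`, and all function names are invented for this sketch.

```python
import math


def ideal_wrist_angle(q7_values, succeeded):
    """Center of the longest run of consecutive successful wrist angles.

    q7_values: sorted candidate wrist angles tried in the linear search.
    succeeded: parallel list of booleans, one per candidate.
    Returns None if no candidate succeeded; ties keep the first run.
    """
    best_start, best_len, start = None, 0, None
    # A trailing False acts as a sentinel that closes a run ending at the last value.
    for i, ok in enumerate(list(succeeded) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    low = q7_values[best_start]
    high = q7_values[best_start + best_len - 1]
    return (low + high) / 2.0


def joint_distance(q_a, q_b):
    """Euclidean distance over the first six joint angles (wrist roll ignored)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_a[:6], q_b[:6])))


def configuration_with_copied_wrist(q_final, examples):
    """Keep joints 1-6 of q_final and copy the ideal wrist angle
    from the nearest stored example grasp."""
    nearest = min(examples, key=lambda ex: joint_distance(ex["q"], q_final))
    return list(q_final[:6]) + [nearest["ideal_q7"]]
```

Ignoring the wrist angle in the distance mirrors the description above, in which the nearest neighbor is chosen from the other six joint angles and only the wrist setting is copied.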

### 5.4.3. Experiment 12 Results

Over the same set of 40 object placements from previous experiments, this technique increases the number of Palmar reflex activations (Palmar bumps, weak grasps, and grasps) to 30 (75%), and grasps to 20 (50%), as shown in **Figures 11** and **12**. These increases come at the cost of one bump, where the target is now missed because the rotation of the hand prevents a collision that used to narrowly occur. In principle, any time new successes are achieved, they can be treated as new example grasps with ideal $q^7$ values to consider for trials with nearby target placements, allowing for further improvements to the success rate. However, in this training set only two still-unsuccessful grasp attempts have different nearest neighbor examples than previously, and neither changes to a success with the new $q^7$ value. Iterations of using new nearest neighbors therefore end, but may be returned to in future work once more examples are available.

Tabulated Results from experiments 8, 11, and 12:


### 6. CONCLUSIONS

We have demonstrated a computational model of an embodied learning agent, implemented on a physical Baxter robot, exploring its sensorimotor space without explicit guidance or feedback, constructing a representation of the robot's peripersonal space (the PPS graph), including a mapping between the proprioceptive sensor and the visual sensor.

We make use of a specific form of intrinsic motivation. After learning the typical result of an action, and identifying an unusual outcome, the agent is motivated to learn the conditions that make the unusual outcome reliable. We apply this process once to learn reliable reaching, and again to learn (relatively) reliable grasping.

This work makes several contributions to developmental learning:

### 6.1. The Peripersonal Space (PPS) Graph

By unguided exploration of the proprioceptive and visual spaces, and without prior knowledge of the structure or dimensionality of either space, the learning agent can construct a graph-structured skeleton (the PPS Graph) that enables manipulator motion planning by finding and following paths within the graph. The graph representation requires only limited knowledge of the attributes of the nodes, and no knowledge of the dimensionality of the embedding space.

### 6.2. Learning Reliable Reaching

By learning conditions to make a rare action (i.e., reaching to cause a bump of a block) reliable, the agent learns a criterion on perceptual images (stored and current) that allows it to select a suitable target node in the PPS Graph. Motion to that target node accomplishes a reliable reach. The PPS Graph representation accounts for reaching in a way that matches striking qualitative properties of early human infant reaching: jerky motion, and independence from vision of the hand.

By interpreting the target node and its neighborhood as a sample from a continuous space, the agent can approximate the local Jacobian of the hand pose in perceptual space with respect to the joint angles. This allows it to adjust the trajectory to make reaching more reliable.
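One standard way to form such an estimate, given here as a sketch under stated assumptions rather than as the procedure used in this work, is a least-squares fit over the neighborhood. The symbols below are introduced only for this illustration: $q_0$ and $x_0$ are the joint angles and perceptual hand pose at the target node, $q_k$ and $x_k$ ($k = 1, \ldots, K$) are those of its neighbors, and the rows of $\Delta Q$ and $\Delta X$ are the differences $\Delta q_k^{\top}$ and $\Delta x_k^{\top}$.

$$
\Delta x_k \approx J \, \Delta q_k, \qquad \Delta q_k = q_k - q_0, \qquad \Delta x_k = x_k - x_0
$$

$$
\hat{J} = \arg\min_{J} \sum_{k=1}^{K} \left\lVert \Delta x_k - J \, \Delta q_k \right\rVert^2, \qquad \hat{J}^{\top} = \left( \Delta Q^{\top} \Delta Q \right)^{-1} \Delta Q^{\top} \Delta X
$$

The closed form assumes that $\Delta Q^{\top} \Delta Q$ is invertible, that is, the neighbors vary along every joint direction. A small desired change $\Delta x$ in the perceived hand pose then suggests the joint adjustment $\Delta q \approx \hat{J}^{+} \Delta x$, where $\hat{J}^{+}$ is the pseudoinverse of $\hat{J}$.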

### 6.3. Learning Reliable Grasping

At this point, reaching reliably displaces the target block. Occasionally, instead of being quasi-statically displaced, the block continues to move, following the subsequent motion of the hand. Making this result reliable requires several distinct conditions. The innate Palmar reflex makes these rare events common enough to learn from. Conditions on gripper opening, wrist orientation, and approach direction can all be learned based on positive feedback from the unusual block motion.

### 6.4. Future Research Directions

Our current model is very simple, yet it supports learning of reliable reaching and grasping. We hypothesize that an improved dynamical model of hand motion will better explain early jerky motion. We also hypothesize that progress toward smooth, directed, adult reaching will build on approximated interpolation methods exploiting information in the PPS graph, such as the local Jacobian. Finally, we expect to be able to model improvements in the visual system, allowing observations of the size and shape of the target object to influence pre-shaping of the hand.

### 6.5. Significance for Developmental Learning

There have been recent impressive results from unguided end-to-end learning of multiple games (Silver et al., 2017, 2018). While these results are very exciting, some limitations come from the need for vast amounts of training experience, and the lack of transparency and explainability of the learned knowledge.

We hope that our work on reaching and grasping in peripersonal space can illuminate the kinds of intermediate states that a developmental learner goes through. Those intermediate states make the structure of the knowledge more comprehensible, and the learning stages between them more efficient. Combining the strengths of these approaches could be important.

### AUTHOR CONTRIBUTIONS

BK contributed the initial conception. BK and JJ collaborated on the development of the model, the design of the study, and the analysis of the data; both wrote sections of the manuscript, contributed to manuscript revision, and read and approved the submitted version. JJ created the robot implementation, carried out the experiments, and collected the data.

### FUNDING

This work has taken place in the Intelligent Robotics Lab in the Computer Science and Engineering Division of the University of Michigan. Research of the Intelligent Robotics lab is supported in part by a grant from the National Science Foundation (IIS-1421168).

### REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Juett and Kuipers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Conceptual Model of Tactile Processing across Body Features of Size, Shape, Side, and Spatial Location

*Luigi Tamè<sup>1,2</sup>\*<sup>†</sup>, Elena Azañón<sup>3,4,5</sup>\*<sup>†</sup> and Matthew R. Longo<sup>1</sup>*

*<sup>1</sup> Department of Psychological Sciences, Birkbeck University of London, London, United Kingdom, <sup>2</sup> School of Psychology, University of Kent, Canterbury, United Kingdom, <sup>3</sup> Institute of Psychology, Otto von Guericke University Magdeburg, Magdeburg, Germany, <sup>4</sup> Center for Behavioral Brain Sciences, Magdeburg, Germany, <sup>5</sup> Department of Behavioral Neurology, Leibniz Institute for Neurobiology, Magdeburg, Germany*

#### *Edited by:*

*Matej Hoffmann, Czech Technical University in Prague, Czechia*

#### *Reviewed by:*

*Elisa Magosso, University of Bologna, Italy; Ashley R. Drew, University of Washington, United States*

#### *\*Correspondence:*

*Luigi Tamè, luigi.tame@gmail.com; Elena Azañón, eazanyon@gmail.com*

*† These authors have contributed equally to this work and share first authorship*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 05 September 2018; Accepted: 29 January 2019; Published: 26 February 2019*

#### *Citation:*

*Tamè L, Azañón E and Longo MR (2019) A Conceptual Model of Tactile Processing across Body Features of Size, Shape, Side, and Spatial Location. Front. Psychol. 10:291. doi: 10.3389/fpsyg.2019.00291*

The processing of touch depends on multiple factors, such as the properties of the skin and type of receptors stimulated, as well as features related to the actual configuration and shape of the body itself. A large body of research has focused on the effect that the nature of the stimuli has on tactile processing. Less research, however, has focused on features beyond the nature of the touch. In this review, we focus on some features related to the body that have been investigated for less time and in a more fragmented way. These include the symmetrical quality of the two sides of the body, the postural configuration of the body, as well as the size and shape of different body parts. We will describe what we consider three key aspects: (1) how and at which stages tactile information is integrated between different parts and sides of the body; (2) how tactile signals are integrated with online and stored postural configurations of the body, regarded as priors; and (3) how tactile signals are integrated with representations of body size and shape. Here, we describe how these different body dimensions affect integration of tactile information as well as guide motor behavior by integrating them in a single model of tactile processing. We review a wide range of neuropsychological, neuroimaging, and neurophysiological data and suggest a revised model of tactile integration on the basis of the one proposed previously by Longo et al.

Keywords: somatosensory processing, space, body representation, laterality, body shape

## INTRODUCTION

There are multiple factors that determine how tactile stimuli on our body are processed to produce coherent tactile experiences and guide motor behavior. A large body of research over the past decades has focused on the effects that direct changes in the nature of the stimuli, such as texture (Johnson and Hsiao, 1992), inter-stimuli delays (Craig, 1983), duration (Gescheider and Migel, 1995), frequency (Gescheider et al., 2002), and intensity (Craig, 1974), have on the somatosensory response. Less research, however, has focused on body features that critically affect tactile processing beyond the physical parameters of the touch. These features include the size, shape, and spatial configuration of the body part stimulated, as well as the integration across different parts and sides of the body. In this review, we will focus on these features and describe: (1) how and at which stages tactile information is integrated between different parts and sides of the body; (2) how tactile signals are integrated with online and stored postural configurations of the body and/or locations in space; and (3) how tactile signals are integrated with stored models of body size and shape. We will describe how these different body dimensions affect integration of tactile information to produce a coherent representation of touch and perception of the body as an integrated whole.

Several years ago, two of us proposed a model of somatosensory information processing (Longo et al., 2010). The central premise of this model was that the processing of tactile information goes beyond primary *somatosensation,* by integrating immediate sensory signals with stored representations of the body. This type of higher order somatosensory processing, or somatoperception, contributes to somatic perceptual constancy, providing a coherent tactile percept on the body and contributing to the formation of the bodily self. In this model, we described how information from the body surface is remapped into an egocentric reference frame, how information about the shape and size of the body interacts with tactile processing, and the role that exteroceptive perception (i.e., perception of objects in the external world through their contact with the body) and interoceptive perception (i.e., percepts about the nature and state of the body itself) have in tactile perception. As described in the original papers (Longo et al., 2010, 2015b), the model is consistent with a wide range of neuropsychological, neuroimaging, and neurophysiological data.

At the core of this model is the claim that many aspects of higher level perceptions about somatosensory stimuli require that sensory signals be integrated with stored representations about the body itself. Specifically, Longo et al. (2010) postulated three distinct mental body representations: the superficial schema, the postural schema, and the body model. The superficial and postural schemas were first postulated by Head and Holmes (1911) on the basis of their studies of brain-damaged patients. One group of patients could detect that they had been touched, but could not perceive *where* on their skin the touch had been applied. Another group of patients could perceive the location of touch, but could not tell where their affected limb was in space when they could not see it. Head and Holmes postulated the existence of the superficial and postural schemas to account for the impairments of these two groups of patients, respectively. In the model of Longo et al. (2010), the superficial schema is described as a mapping between locations within primary somatotopic maps and locations on the skin surface. The postural schema, in contrast, is a more dynamic representation of current body posture (i.e., joint angles), incorporating both afferent proprioceptive signals and efferent copies of motor commands. Finally, Longo et al. (2010) proposed a third representation of the metric properties (i.e., size and shape) of the body, which they called the body model.

In this paper, we address some further factors, which were not addressed by the model of Longo et al. (2010). A first aspect is the fact that the body is bilaterally symmetric, with homologous locations on the right and left sides of the body. A second aspect is the use of prior locations and stored postural configurations of the body when localizing touch. Here, we attempt to integrate laterality into their model as well as the use of prior information, with the aim of describing how touch is processed given the duality of the body (i.e., left and right side) and brain structures (i.e., left and right hemispheres), which goes hand in hand with the perception of the body as a single unit. Finally, we review recent advances in understanding the integration of touch and higher level representations of body size and shape, an issue at the core of the model.

### INTEGRATION OF TACTILE INFORMATION BETWEEN THE TWO SIDES OF THE BODY

Coordination between the two hemispheres is paramount for perception and motor control of the body. Indeed, early processing of tactile signals occurring on the two sides of the body is critical for performing appropriate goal-directed bimanual motor tasks. This notion seems to clash with the classical view that unilateral tactile stimuli are represented only in the contralateral primary somatosensory cortex (SI) (Penfield and Boldrey, 1937; Nelson and Chen, 2008). Indeed, the somatosensory and motor systems require continuous and sudden switches between lateralized and joint interhemispheric processing. Such processing includes the execution of simple actions, as well as more complex goal-directed motor behaviors. The stage of tactile sensory processing at which the interhemispheric transfer of tactile information occurs is still a matter of debate (Allison et al., 1989; Kanno et al., 2003; Hlushchuk and Hari, 2006; Sutherland, 2006; Tommerdahl et al., 2006; Jung et al., 2012; Tamè et al., 2016). In this section, we will describe some recent evidence in humans suggesting an early interhemispheric integration of tactile signals between the two hemispheres, possibly serving the execution of appropriate motor behavior.

### Behavioral Evidence of Tactile Interhemispheric Communication in Healthy Subjects

The first stage of bilateral integration of tactile information, at the cortical level, is generally thought to occur in brain areas beyond the primary somatosensory cortex (SI; Eickhoff et al., 2010); however, recent evidence has shown that SI contributes to such processing (Kanno et al., 2004; Tan et al., 2004; Tommerdahl et al., 2006; Tamè et al., 2012). In macaques, bilateral receptive fields have been described as early as somatosensory area 2 (Iwamura et al., 1994, 2002), an area considered to be the homologue of Brodmann area 2 (BA 2) of human primary somatosensory cortex. Furthermore, interhemispheric interactions have been observed for stimuli presented to both paws, even in the core area of SI (area 3b) of owl monkeys (Lipton et al., 2006; Reed et al., 2010, 2011).

In humans, there is growing evidence about how and when this exchange of tactile information between the two hemispheres is likely to occur (Tamè et al., 2016). For instance, Tamè and colleagues developed a paradigm of double simultaneous tactile stimulation (DSS; Tamè et al., 2011, 2013). In this paradigm, participants were instructed to detect the presence of a tactile stimulus on a target finger. Depending on the condition, the target finger was stimulated in isolation or concurrently with another finger (i.e., masker finger). The masker was a stimulus on a finger of the same or a different hand (i.e., index and middle fingers of both hands). In accordance with previous literature, results showed that when a masker was present there was an interference effect regardless of the stimulated hand. Critically, however, the amount of interference varied as a function of the stimulated finger rather than the hand (i.e., which hemibody was touched; see **Figure 1**). The same interference was present when the non-homologous finger, with respect to the target, was the masker regardless of the hand. By contrast, such interference was significantly reduced when the masker was the homologous finger of the other hand. Therefore, the information is processed differently for homologous body parts (compared to non-homologous ones), as if they were coming from the same side of the body (for similar evidence on finger homology interactions across sides using a different paradigm, see Rusconi et al., 2014). This somatotopic organization provides indirect evidence that SI is involved in integrating touch across the two sides of the body. Such integration is altered when the spatial relationships between the hands/fingers change (Tamè et al., 2011). These last findings are in agreement with those reported by Haggard et al. (2006), who showed that under tactile stimulation, identification of the hand is affected by changes in hand posture, whereas this is not the case for the identification of the finger. Specifically, these authors suggested that tactile detection and finger identification occur at a somatotopic representational level, whereas hand identification occurs at a higher level in which postural information is taken into account. The role of the postural configuration in tactile processing is discussed in detail in the next section.

### Neuroimaging Evidence of Tactile Interhemispheric Communication in Healthy Subjects

Tamè et al. Integration of Tactile Information

opposite hand). Moreover, in different blocks, participants assumed different postures (i.e., hands palm down or hands palm up). Unfilled circles: Stimulation at the target finger; filled black circles: stimulation at the non-target finger. Bar plots show percent errors as a function of stimulation condition and hands' posture. Error bars represent the standard error of the mean (±SEM). Adapted from Tamè et al. (2011). © 2011 by Elsevier. Permission for the use of the image has been obtained from Elsevier.

Using functional magnetic resonance imaging (fMRI), Tamè et al. (2012) identified the neural bases of bilateral integration of touch on homologous and non-homologous fingers of the two hands. In particular, Tamè and colleagues used an fMRI tactile adaptation paradigm in which pairs of vibrotactile stimuli were delivered on the left and right index and middle fingers. The adaptation paradigm relies on the reduced response of certain neurons that results from the repeated presentation of a specific feature to which these neurons are selective. On this basis, Tamè et al. (2012) hypothesized that if there are neurons that have finger-specific selectivity (i.e., index and middle fingers) a greater adaptation should emerge when the index finger (i.e., same finger) is stimulated twice compared to when different fingers are stimulated (i.e., index and middle fingers). They expected that such a pattern should emerge in SI, which is known to hold somatotopic representations. Critically, if SI is also capable of integrating stimuli that come from the two sides of the body, such a pattern should be present regardless of the side of stimulation (i.e., fingers of the left and right hand). Tamè et al. (2012) found that the BOLD response was indeed greatly reduced in SI, as well as in SII, when the same finger was stimulated twice (index-index) compared to when different fingers were stimulated (middle-index), both when stimuli were delivered on the same and on different hands. This result showed that SI can integrate tactile stimuli coming from the two sides of the body. The most likely subarea(s) of SI responsible for mediating such processing can be identified as areas BA1 and BA2. Indeed, using the SPM (Statistical Parametric Mapping) anatomy toolbox, Tamè et al. (2012) identified the origin of their BOLD response in such areas. This is also compatible with studies on monkeys which showed the presence of bilateral receptive fields in area 2 (Iwamura et al., 2002). In order to overcome the limited temporal resolution of fMRI, in a subsequent study, Tamè and colleagues used a magnetoencephalography (MEG) adaptation paradigm to investigate whether the integration of bilateral tactile stimuli in SI occurred at early or late stages of tactile processing (Tamè et al., 2015).
The results showed that when tactile stimuli were delivered on different hands, neural responses were somatotopically constrained, being smaller for stimulation of homologous than non-homologous fingers. Importantly, neural responses of the tactile stimuli of the two sides of the body interacted in SI at short delays (i.e., 25 ms). This is most likely due to the fact that the temporal integration window in SI is short (Mauguière et al., 1997) and long in SII (Wühle et al., 2011), suggesting that selective interaction for short delays is likely to occur within SI, rather than deriving from modulatory effects from higher level brain areas. Therefore, this pattern of results provides substantial evidence that integration of bilateral tactile stimuli on the hands cannot solely derive from higher stages of the tactile representation processing (i.e., SII and beyond) as previously suggested by other reports (Jung et al., 2012; Chung et al., 2014). The discrepancy between these results and some previous studies can be ascribed to different factors. A first possibility is that Tamè et al.'s (2015) adaptation approach has a greater sensitivity to detect changes in the neural activity in the somatosensory cortex under bilateral stimulation (Tamè et al., 2016). Indeed, this is not a trivial problem given the overwhelming response generated in the contralateral hemisphere following unilateral tactile stimulation. Another possibility, not mutually exclusive with the one just described, is the different type and locus of stimulation they used in their study compared to other works. Tamè et al. (2015) used a mechanical piezo tactile stimulator (i.e., a matrix of 2 × 5 rods; 1 mm in diameter) applied on the first phalange of the index and middle fingers for 12 ms. Instead, Cheng et al. 
(2014) stimulated the right index finger using a band-type MR-compatible device that pressed the whole ventral skin surface of the finger for 3 seconds, a rather long stimulation compared to that of Tamè et al. (2015). Moreover, Jung et al. (2012) used constant-current square-wave pulse stimulation with a very short duration (i.e., 0.2 ms), though they stimulated the median nerve of both hands at the level of the wrist, rather than the fingers as Tamè et al. (2015) did.

Overall, this result suggests that tactile stimuli from the two sides of the body (i.e., fingers) interact at an early stage of the tactile representation processing in the primary somatosensory cortex, most likely through transcallosal pathways which connect SI in the two hemispheres (see also the graphical representation of the transcallosal pathways model, **Figure 3** in Tamè et al., 2016).

### Sensorimotor Interhemispheric Communication in Healthy Subjects

A recent study by Tamè and Longo (2015) provided behavioral evidence of the role of topographical organization of callosal connections in the integration of sensorimotor (i.e., touch) stimuli across the two sides of the body. Using a classical behavioral paradigm to quantify sensorimotor transfer between hemispheres, i.e., the Poffenberger paradigm (Poffenberger, 1912), the study revealed a modulation of the sensorimotor interhemispheric integration time as a function of the body part stimulated. The Poffenberger paradigm relies on the logic that sensorimotor information is integrated and processed within the same hemisphere when a motor effector and the sensory signal are on the same side of the body (uncrossed). This behavioral paradigm is based on the fact that people respond faster (lower reaction times: RTs) when sensory stimuli are presented in the hemifield (for visual or auditory stimuli) or hemibody (for tactile stimuli) ipsilateral to the hand used to respond (i.e., sensory stimulus and motor response occur in the same hemisphere: uncrossed) than contralateral (sensory stimulus and motor response occur in different hemispheres: i.e., crossed). Poffenberger proposed that the time required for signals to transfer between the two cerebral hemispheres is reflected by the crossed-uncrossed difference (CUD) (Poffenberger, 1912; Marzi, 1999). By contrast, if sensory input and motor effector belong to different sides of the body, the information has to be integrated across hemispheres (crossed). In their study, the authors showed that the crossed-uncrossed difference in processing time was larger on the finger (2.6 ms) and forearm (1.8 ms) than on the forehead (0.9 ms; Tamè and Longo, 2015). The callosal connections and density of bilateral receptive fields (RFs) are consistent with such temporal difference. 
Indeed, it has been shown that regions that represent the periphery of the body have less dense callosal connections compared to regions that represent the center (Pandya and Vignolo, 1969; Caminiti and Sbriccoli, 1985; Iwamura et al., 2001). This result suggests that the interhemispheric integration of sensorimotor stimuli, at least in the tactile domain, varies as a function of the strength of callosal connections of the body parts (Tamè and Longo, 2015). Interestingly, the cost that is paid when processing a stimulus on the side contralateral to the effector can vanish when touch is delivered on a seen hand. Therefore, the interhemispheric integration of tactile-motor responses can be improved by vision of the body (cf. Tamè et al., 2017a). Which mechanisms could account for this result? A first possibility is that seeing the hand enhances participants' motor performance. Indeed, it has been shown that when participants have to perform a goal-directed action, seeing the starting position of their own hand enhances their performance in the motor task (Prablanc et al., 1979; Rossetti et al., 1994; Blanchard et al., 2013). Similarly, another study has shown that manual responses are primed by the vision of the participant's own hand (Longo and Haggard, 2009). A second possibility is that attentional mechanisms mediate this effect. Indeed, when participants see their own hand, a facilitatory effect occurs, which improves the selection of spatial tactile information on the body and/or attenuates the conflict in response coding between the stimulus and effector when they belong to different body sides (Pierson et al., 1991). Note that these two possibilities may not be mutually exclusive. The neural substrate of such processing is unclear; therefore, future studies should try to provide empirical evidence to define such mechanisms.
Nevertheless, we know that when non-informative vision of the body is present, participants give faster responses to touch compared to when vision of the body is absent, a phenomenon named "visual enhancement of touch" (VET; Tipper et al., 1998; Kennett et al., 2001). The neural correlates of this effect are thought to derive from a multisensory modulatory effect from the parietal cortex (Ro et al., 2004), where there are bimodal neurons (Graziano et al., 1994) that preactivate the somatosensory cortex, improving tactile performance. Alternatively, in the study of Tamè et al. (2017a), the primary somatosensory cortex may have processed such information through a coupling with the visual areas. Indeed, it has been suggested that the "low-level" sensory areas may be multisensory in nature (Ghazanfar and Schroeder, 2006; Macaluso, 2006; Bruno and Pavani, 2018; Convento et al., 2018; Holmes and Tamè, 2018). However, the effect reported by Tamè and colleagues (Tamè et al., 2017a) cannot be solely explained by such a perceptual mechanism, given that they found faster responses to touch when vision of the body was present only in the contralateral condition (i.e., stimulus and effector on different sides of the body), but not in the ipsilateral one. Therefore, further studies are needed to clarify the mechanisms as well as the neural correlates of the improvement of interhemispheric integration of tactile-motor responses by vision of the body, possibly through the integration of the perceptual and motor perspectives.
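The crossed-uncrossed difference (CUD) used in the Poffenberger paradigm earlier in this section is simple arithmetic: the mean reaction time on crossed trials minus the mean on uncrossed trials. The sketch below illustrates only that calculation. The trial format, the function name, and the reaction times are hypothetical and are not data from Tamè and Longo (2015).

```python
from statistics import mean


def crossed_uncrossed_difference(trials):
    """CUD = mean RT on crossed trials - mean RT on uncrossed trials.

    trials: iterable of (stimulus_side, response_hand, rt_ms) tuples,
    where both sides are "left" or "right". A trial is crossed when the
    stimulated side and the responding hand differ.
    """
    trials = list(trials)
    crossed = [rt for stim, hand, rt in trials if stim != hand]
    uncrossed = [rt for stim, hand, rt in trials if stim == hand]
    return mean(crossed) - mean(uncrossed)


if __name__ == "__main__":
    # Made-up reaction times in milliseconds, for illustration only.
    example_trials = [
        ("left", "left", 250.0),
        ("right", "right", 254.0),
        ("left", "right", 255.0),
        ("right", "left", 253.0),
    ]
    print(crossed_uncrossed_difference(example_trials))
```

A positive value means that crossed responses were slower on average, which is the pattern Poffenberger interpreted as the time needed for interhemispheric transfer.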

Moreover, other research has demonstrated that task demands can modulate tactile perception and processing as well as the brain areas involved (e.g., Pritchett et al., 2012; Romo et al., 2012; Tamè and Holmes, 2016). In particular, relevant to the present context, finger-specificity interactions for tactile stimuli delivered on the two sides of the body are present only when complex tactile tasks (i.e., tactile detection in a go/no-go context, tactile localization, and discrimination) have to be accomplished (e.g., Tamè et al., 2011, 2017c; Dempsey-Jones et al., 2015), but not when simpler tactile tasks (i.e., tactile detection in a two-interval forced-choice design) have to be solved (e.g., Tamè et al., 2014). Indeed, in the latter case, Tamè et al. (2014) showed that tactile interference is the same regardless of the stimulated fingers of the two hands. Therefore, the topographic organization in the bilateral interaction is modulated by the specific task demands (Tamè et al., 2016).

### Neuropsychological Evidence of Tactile and Motor Interhemispheric Communication

Sensory interhemispheric communication has also been studied in brain-damaged patients. A typical neuropsychological example of bilateral integration is provided by patients with tactile extinction. Such individuals are perfectly capable of detecting a single tactile stimulus on one or the other side of the body. However, when two tactile stimuli are delivered simultaneously on the two body sides, patients fail to report the stimulus contralateral to the locus of the lesion (Bender, 1945). Other neuropsychological examples are provided by mislocalization or reduplication phenomena. Mislocalization of touch across body sides has been termed *allochiria* (Obersteiner, 1881), whereas reduplication has been termed *synchiria* (Jones, 1908). Arm amputees and brain-damaged patients with hemiparesis and hemisensory loss are cases in which *allochiria* has been described (Bisiach and Berti, 1995) and in which these individuals can report contralateral referral of tactile sensations to the phantom body part (Ramachandran et al., 1995) or to the hand rendered anesthetic by stroke (Sathian, 2000).

Medina and Rapp (2008) described a case of tactile *synchiria* in which an individual who suffered from left frontoparietal damage experienced bilateral tactile sensations after unilateral stimulation. The authors ascribed this effect primarily to a deficit in the inhibitory mechanisms that, in healthy individuals, naturally suppress the bilateral percept. This interesting interpretation would support the notion that unilateral tactile stimulation is capable of producing signals in both hemispheres.

Other conditions in which tactile referral to other body parts emerges are provided by patients who show mirror movements across homologous body parts. For instance, Farmer et al. (1990) studied a patient who suffered from Klippel-Feil syndrome, a skeletal abnormality that is typically associated with mirror movements of the hand muscles (Bauman, 1932), in which voluntary activation of a muscle is replicated by an identical involuntary movement in the homologous muscle of the opposite hand. Interestingly, the authors found that unilateral electrical stimulation of the index finger produced an excitatory response in the stimulated side as well as a bilateral excitatory response of approximately equal size and latency, whereas in healthy subjects such a response was only present in the stimulated side (Farmer et al., 1990). Compatible with the idea of similarity between homologous parts of the two sides of the body, a recent study investigating the contribution of proprioceptive signals from the two sides of the body in the control of joint movements suggests the existence of a control programme that is common and uses proprioceptive information from the same joints of the two sides of the body (Han et al., 2013).

Based on these findings, Tamè et al. (2016) suggested that tactile information is integrated through transcallosal pathways connecting SI of the two hemispheres. Here, we aim to integrate this proposal into the model of somatoperceptual information processing developed by Longo et al. (2010, 2015b). In particular, we suggest that afferent tactile inputs from the two sides of the body reach Brodmann areas (BA) 3a and 3b of the contralateral primary somatosensory cortex, then continue to areas 1 and 2 (which also receive direct inputs from the thalamus), where the signals from the two sides of the body are integrated. At this point, tactile laterality is communicated to other brain areas within (i.e., 3a and 3b) and beyond (parietal areas as well as motor and premotor cortices) SI. Such an integration process can have an important advantage. Indeed, it would be inefficient to maintain double representations of each body part along the whole tactile processing pathway, given that the structure of the body is homologous on either side of the body midline. Therefore, at higher-level representation stages, beyond *somatosensation* in Longo et al.'s (2010) nomenclature, tactile inputs are processed using a single body model, which does not distinguish between the left and right body sides.

The presence of a single body representation, for both sides of the body, is further suggested by neuropsychological evidence in patients suffering from left parietal lesions. For instance, it has been proposed that the *body structural representation* (BSR) is a critical component in mediating the knowledge about the spatial configuration of bodies. This notion relies on the fact that damage to such a representation results in conditions such as autotopagnosia (Ogden, 1985; Sirigu et al., 1991) and finger agnosia (Kinsbourne and Warrington, 1962). Studies of neurological patients (Schwoebel and Coslett, 2005) and healthy adults (Felician et al., 2004; Corradi-Dell'Acqua et al., 2009; Rusconi et al., 2014) provide evidence that the bilateral parietal cortex may mediate the structural representations of the body. A study by Rusconi and colleagues, using a bi-manual version of the in-between task (i.e., participants estimate the number of unstimulated fingers between two touched fingers), suggests that the left and right posterior parietal cortices contribute to the on-line sensorimotor representations (Pisella et al., 2000). Instead, they suggest that the connections between the left anteromedial inferior parietal lobe (a-mIPL) and the precuneus (PCN) provide the core substrate of an explicit bilateral BSR for the fingers that, when disrupted, can produce the typical symptoms of finger agnosia (Rusconi et al., 2014). Therefore, this study supports the notion of a single body model: a lateralized neural structure provides information about the representation of the body parts in space relative to each other that applies to both sides of the body. Similarly, patients who suffer from *synchiria* are no longer able to distinguish the side from which the tactile input is coming, given that they perceive the sensation as occurring on both sides (Jones, 1908).

Furthermore, the study by Han et al. (2013), which we described above, may suggest that a similar integration flow is occurring also for the proprioceptive signals, though further evidence is needed to assess it. Indeed, proprioceptive signals for the control of joint movements may be controlled by a common programme that is the same for the left and right sides of the body. Such a possibility is compatible with the idea that tactile inputs are processed using a single body model, which does not distinguish between the two sides of the body.

Overall, the psychophysical, neurophysiological, neuroimaging, and neuropsychological evidence we described suggests that integration of the tactile signal between the two sides of the body (i.e., the hands) is likely to occur at early stages of the tactile representation processing within the primary somatosensory cortex, as depicted in **Figure 2** (for an extensive review on this topic, see Tamè et al., 2016). Therefore, the afferent flow of tactile information from the thalamus reaches BA areas 3a and 3b of SI in the hemisphere contralateral to the locus of stimulation, which themselves project to areas 1 and 2 (which also receive a direct inflow of information from the thalamus). We propose that the side integration occurs in areas 1/2 of SI through transcallosal connections, as shown by the neuroimaging studies in humans we described (Tamè et al., 2012, 2015; for a review see Tamè et al., 2016). Following this process, information about tactile laterality is communicated to other brain areas within SI (i.e., 3a, 3b), parietal areas, as well as the motor and premotor cortices (Sutherland, 2006). We do not have a specific prediction about the nature of such a signal, i.e., excitatory or inhibitory, which most likely depends on the specific task demands. Future studies should provide further empirical evidence that can support, rectify, or reject this hypothesis. We believe that a sensitive approach to pursue this goal can be to perform a series of tactile tasks with different levels of complexity that involve bilateral tactile stimulation of the body and require side-dependent or side-independent representations of the body. Ideally, such an approach should be combined with state-of-the-art neuroimaging techniques such as, for instance, fMRI (where in the brain this is occurring), EEG (when it is occurring), and TMS.

### INTEGRATION OF TACTILE INFORMATION WITH POSTURE

The previous section has dealt with the integration across body sides, explicitly neglecting the role that posture has on tactile processing. However, even in tasks such as the ones reported so far, in which the goal is to report the exact finger that has been stimulated, proprioceptive information would still play a fundamental role. This is because localizing touch on the body surface is not by itself sufficient to interact with the environment (Driver and Spence, 1998). As we move, our bodies and limbs change position, and the relative location of each touch varies with respect to the body and other objects in the environment. It is because of these countless combinations of tactile and proprioceptive signals, each indicating different locations in external space, that the brain needs to consider posture when processing touch. This integration allows representing touch beyond skin space, i.e., in an external reference frame, making it available for goal-directed actions (Driver and Spence, 1998; Yamamoto and Kitazawa, 2001). There is now a consensus in the literature that this integrative process of tactile remapping occurs by default, weighting each reference frame according to task demands, even in situations where postural integration is unnecessary (Azañón and Soto-Faraco, 2008a; Azañón et al., 2010a; Badde et al., 2015; Heed et al., 2015).

In the present section, we will focus on this integration and describe evidence suggesting not only the integration of touch and online proprioceptive signals but also between touch and *a priori* information regarding specific locations in space (i.e., spatial priors) and/or canonical postural representations (i.e., prototypical postural configurations). These prior configurations or locations in space might enable faster motor responses to spatial locations where the occurrence of touch is more probable, allowing faster integration with other modalities, for instance, to avoid threatening stimuli.

### The Role of Vision and Development in Tactile Spatial Perception

Studies of children provide evidence that the process of tactile remapping is acquired during development, probably through active interaction with the environment (Bremner et al., 2008a). Tactile remapping develops with age (Bremner et al., 2008b; Pagel et al., 2009; Begum et al., 2014; Rigato et al., 2014); it is not present in infants younger than 6–10 months (Bremner et al., 2008b; Rigato et al., 2014; Begum Ali et al., 2015), and it has been associated with the ability to perform the first reaches to objects across the body midline, which suggests a tight relation with experience (Bremner et al., 2008a; Rigato et al., 2014). Furthermore, studies of the congenitally blind provide further support for the role of early visual experience in the processing of tactile stimuli later in life (Röder et al., 2004). For instance, congenitally blind individuals, who have never experienced visual input, do not show a detriment in tactile localization performance when the hands are crossed as compared to uncrossed (Röder et al., 2004; Collignon et al., 2009). This is not the case, however, for sighted participants or people who have become blind later in life, even after many years of having lost sight: performance with hands crossed is largely impaired as compared to uncrossed, even in situations where posture is irrelevant (Röder et al., 2004). This suggests that extensive visual experience during the first years of life leads to a default encoding of touch in terms of external space, even in cases where taking posture into account is detrimental. In support of this idea, the deprivation of visual input during the first years of life, by congenital dense bilateral cataracts in humans, hinders the normal development of a default remapping of touch in external space (Ley et al., 2013; Azañón et al., 2018).

Through acting in the world, sighted individuals are exposed to continuous sensorimotor contingencies across signals from the various modalities. Tactile spatial perception might therefore emerge as the repeatedly experienced correlation of specific activity of skin receptors with proprioceptive and visual information about limb position and the object touching the skin (Heed et al., 2015). This idea comes across clearly in Nissen et al. (1951), where a chimpanzee was raised from birth with pads covering arms and legs. These pads allowed the chimpanzee to move but prevented climbing and any manipulative behavior. The lack of opportunity for manipulation and for association of visual with tactile-kinesthetic sensations compromised to a large extent basic tactile orienting responses later in life, such as orienting the head to the location of single touches presented to either hand. This suggests a large degree of impairment in basic tactile spatial processing after sensorimotor deprivation.

### Spatial Priors and/or Canonical Postural Representations

Under a framework in which tactile spatial perception emerges through active exploration with the environment, it is plausible that with experience, initially uncorrelated distributions of locations in space across tactile, proprioceptive, and visual signals become correlated during development. For instance, given the morphology and physical constraints of the arm, touches on the right hand would occur more often on the right side and around the center of the body, with respect to the body midline. This frequent co-occurrence of sensory signals in particular locations of space might promote the emergence of visual *spatial priors*, serving as reference points for localization of tactile events, analogous to the use of spatial prototypes, or Bayesian priors in other forms of spatial representation (Huttenlocher et al., 1991; Körding and Wolpert, 2004). Similarly, frequent occurrence of touch while adopting particular body configurations might promote the emergence of proprioceptive *canonical postures* (i.e., prototypical postural configurations).

Note that spatial priors and canonical proprioceptive configurations could produce similar behavioral effects but correspond to two separate concepts. Spatial priors, as defined in this review, do not require stored proprioceptive information, but rather stored representations about the most plausible locations of touch in visual space (e.g., touches on the right hand would occur more often on the right side). To our knowledge, this is the first time that the concept of spatial prior, as defined in visual space, has been linked to tactile remapping. The concept of canonical posture, more widespread than the concept of spatial prior in the literature on remapping (Yamamoto and Kitazawa, 2001; Azañón and Soto-Faraco, 2008a; Bremner et al., 2008a,b; Longo et al., 2010), assumes the existence of stored proprioceptive representations, which contain the most plausible body configurations for a given touch (i.e., for a touch on the hand, the canonical configuration assumes uncrossed arms).

The existence of spatial priors is clear in vision. It has been shown that memories of spatial locations are biased towards particular locations of space in a highly stereotyped manner and across individuals. For instance, when recalling the location of a dot inside a circle, participants' responses are biased towards the centroids of each quadrant (Huttenlocher et al., 1991, 2004). A widespread assumption from this type of result is that by integrating the memory for the actual stimulus with categorical information about where stimuli are expected to be, perceptual accuracy can be increased, though at the expense of introducing systematic bias (Cheng et al., 2007). Similarly, spatial priors in touch might provide accurate and faster tactile localization performance, pulling in nearby stimuli (as shown for visual priors), but also increase errors when large mismatches occur between the spatial prior (defined in visual space) and online tactile-proprioceptive signals. This could explain why crossing the hands produces more tactile localization errors than when the hands are at their anatomical and, therefore, expected location (see **Figure 3D**; Yamamoto and Kitazawa, 2001; Shore et al., 2002).
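The trade-off described above (lower variability at the cost of systematic bias) can be illustrated with a standard Gaussian combination of a prior and a noisy sensory estimate. The following is only a minimal sketch, not a model fitted to any of the cited data: the function name `combine_with_prior`, the positions (in cm from the body midline), and the standard deviations are all hypothetical values chosen for illustration.

```python
def combine_with_prior(sensed, sensed_sd, prior_mean, prior_sd):
    """Precision-weighted combination of a sensory estimate and a Gaussian prior.

    Returns (posterior_mean, posterior_sd) for one spatial dimension.
    """
    prior_precision = 1.0 / prior_sd**2
    sensed_precision = 1.0 / sensed_sd**2
    w_prior = prior_precision / (prior_precision + sensed_precision)
    posterior_mean = w_prior * prior_mean + (1.0 - w_prior) * sensed
    posterior_sd = (1.0 / (prior_precision + sensed_precision)) ** 0.5
    return posterior_mean, posterior_sd


# Hypothetical numbers: the spatial prior for right-hand touches sits 15 cm
# to the right of the midline; sensory and prior noise are assumed equal.
PRIOR_MEAN, PRIOR_SD, SENSED_SD = 15.0, 10.0, 10.0

# Uncrossed posture: the right hand really is at +15 cm (prior and input agree).
uncrossed_mean, uncrossed_sd = combine_with_prior(15.0, SENSED_SD, PRIOR_MEAN, PRIOR_SD)

# Crossed posture: the right hand is at -15 cm (large prior/input mismatch).
crossed_mean, crossed_sd = combine_with_prior(-15.0, SENSED_SD, PRIOR_MEAN, PRIOR_SD)

print(f"uncrossed: estimate {uncrossed_mean:+.1f} cm, sd {uncrossed_sd:.1f} cm")
print(f"crossed:   estimate {crossed_mean:+.1f} cm, sd {crossed_sd:.1f} cm")
```

Under these assumed values, the combined estimate is less variable than the sensory input alone in both postures, but only the crossed posture yields a systematic error, because the estimate is pulled away from the true hand position towards the prior.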

In line with the idea that frequent co-occurrence of sensory signals can lead to the establishment of priors, Azañón et al. (2015) have recently shown that repetition of touch in the same crossed posture, even if unattended, can lead to an improvement in tactile localization, which increases with the number of preceding trials. These results hence confirm that recent tactile-proprioceptive co-occurrences can influence future tactile perception. Furthermore, the authors did not find evidence of a general improvement across the course of the experiment, as performance with hands crossed returned to initial levels of impairment every time posture changed (i.e., from crossed, to uncrossed and back). This detriment in performance following changes in posture might suggest that the brain initializes a fixed, default localization process with every new crossed posture, assuming that touches are located at the anatomical side. Thus, a few co-occurrences over the course of an experiment cannot override lifelong priors.

A beautiful example of how powerful and long-lasting priors can be when processing touch comes from the Aristotle illusion, first described by Aristotle (384–322 B.C.) in the essay "On dreams". In this illusion, a single object is touched with crossed fingers, but strikingly, the individual perceives two objects rather than one (Benedetti, 1985). The illusion probably occurs because our brain fails to account for the actual crossed posture of the fingers and processes the sensations arising from the touched object as if the fingers were in their usual uncrossed posture (or, similarly, as if the touch was coming from the anatomical side). Only after months of exposure to this unusual configuration of the fingers does touch take the real posture into account, and the illusion disappears (Benedetti, 1991). Closely related to this, when two taps are applied in sequence to crossed hands at short intervals, many participants systematically report the first stimulus as occurring on the opposite hand (Yamamoto and Kitazawa, 2001; Kóbor et al., 2006; Heed and Azañón, 2014). This can be interpreted as people initially perceiving the location of the touch from the visual side where the hand usually is in space. For instance, for a right-hand touch, the right side of space, which now is occupied by the left hand, would serve as a prior spatial location. Evidence for this comes from visuotactile attention paradigms. When a touch is presented on a crossed hand, quickly followed by a light (<60 ms later), participants are faster in responding to the light in opposite-side (i.e., anatomically congruent) trials than in same-side (i.e., spatially congruent) trials. Thus, touches to the left hand, now placed on the right side, facilitate processing of left hemispace visual events and vice versa (see **Figures 3A,B**; Azañón and Soto-Faraco, 2008a,b; Azañón et al., 2010a). In a similar fashion, a proportion of saccades or reaches directed towards a touch on a crossed limb are initially directed towards the opposite limb, as if the limbs were uncrossed, and then corrected online, several hundred ms later (see **Figure 3C**; Groh and Sparks, 1996; Overvliet et al., 2011; see Brandes and Heed, 2015 for reaching trajectories). Finally, it has been shown that disruption of tactile-proprioceptive integration by transcranial magnetic stimulation (TMS) in humans, over the putative right ventral intraparietal cortex, induced participants to underestimate the height of touches delivered to the arm (Azañón et al., 2010b). In this study, participants placed their left arm upright, close to the face, and discriminated the location of a touch on the arm with respect to a touch on the face. The touches on the arm were perceived as coming from a lower position. This could suggest that disruption of tactile-proprioceptive integration by parietal TMS forced touch to rely on an offline proprioceptive representation, in which the arms are represented in their prototypical position, with hands below the face (Azañón et al., 2010b).

In Longo et al. (2010), we introduced the idea that at early stages of tactile processing, and hence before touch is integrated with an up-to-date proprioceptive representation, the brain assumes, for each touch, a stored representation of a canonical posture for that touch. Later, this *a priori* information is overtaken by the actual proprioceptive information or simply weighted less. However, the evidence put forward for this claim (and reviewed in the previous paragraph) does not differentiate between spatial visual priors and canonical postural representations. From a spatial prior perspective, touch is referred in these examples to the location in visual space where the hand normally is (i.e., the right side of space for the right hand, or below the face in the TMS example of Azañón et al., 2010b), without the need to account for a particular proprioceptive configuration. From a canonical posture perspective, however, this effect would be driven by a stored representation of the prototypical layout of the limbs (i.e., a default proprioceptive condition that assumes that the hands are not crossed and are placed below the face; see for instance Yamamoto and Kitazawa, 2001).

Regardless of whether these effects are driven by purely visual or by purely proprioceptive priors or a combination of the two, definite and direct evidence for the existence of priors in touch is needed. Note that some direct hypotheses arise from the previous discussion: (1) If tactile stimuli are processed taking into account prior information (in particular, a priori spatial location), one might expect tactile localization biases to occur. (2) If the same skin area is stimulated under different postures, localization biases for that skin area should converge to particular areas of space. Thus, it should be possible to track experimentally these priors touching the same body areas across changes in posture. (3) If tactile stimuli are first processed using a priori information and this a priori information is subsequently adjusted based on the actual spatial location of body parts, then, larger biases should be found at early stages of tactile processing, as compared to later. With regard to possible neural substrates, multimodal neurons with "intermediate" receptive fields in the posterior parietal cortex, and whose activity is gain modulated by the position of the eyes in the orbit, the hand or the head (Pouget et al., 2002; Avillac et al., 2005; Chang and Snyder, 2010) might be able to encode visual priors. Similarly, area PE in the superior parietal lobule (equivalent to BA 5 in the human brain) might be involved in the processing of proprioceptive priors. Some PE neurons in the monkeys react to complex body postures involving several joints (Sakata et al., 1973), and some also respond to tactile stimuli, but only if the limbs and joints are placed in certain positions. Indeed, Sakata and co-workers already suggested that such neurons would be able to encode the spatial position of the touching object relative to the body axis (Sakata et al., 1973).
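Hypothesis (3) above, that biases towards the prior should be larger at early than at late stages of tactile processing, can be made concrete with a toy model. The sketch below is purely illustrative and is not taken from any of the cited studies: it assumes that the weight given to the prior decays exponentially after touch onset, and both the function name `localization_estimate` and the time constant `tau_ms` are hypothetical choices, not values estimated from data.

```python
import math


def localization_estimate(t_ms, actual_pos, prior_pos, tau_ms=100.0):
    """Estimated touch location t_ms after touch onset.

    The weight on the prior is assumed to decay exponentially with a
    hypothetical time constant tau_ms; the remaining weight goes to the
    actual (posture-corrected) location.
    """
    w_prior = math.exp(-t_ms / tau_ms)
    return w_prior * prior_pos + (1.0 - w_prior) * actual_pos


# Hypothetical crossed-hands case: the right hand lies 15 cm to the LEFT of
# the midline (-15), whereas its prior location is 15 cm to the right (+15).
ACTUAL, PRIOR = -15.0, 15.0

times_ms = [0, 30, 60, 200, 500]
biases = [localization_estimate(t, ACTUAL, PRIOR) - ACTUAL for t in times_ms]

for t, bias in zip(times_ms, biases):
    print(f"t = {t:3d} ms: bias towards prior = {bias:5.1f} cm")
```

In this toy model the estimate starts on the anatomical (prior) side and crosses over to the actual side as time passes, so the bias shrinks monotonically. Whether real biases follow such a time course, and with what time constant, is exactly what the proposed experiments would need to establish.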

It is worth noting that the idea of canonical representations of the body is not new. Already in the 1970s, Bromage and Melzack observed that during the induction of reversible upper and lower limb deafferentation, *via* brachial plexus and epidural anesthesia, participants reported highly stereotyped postures, with arms and legs at their anatomical side, with joints approximately midway through their range of flexion (above the abdomen or lower chest for the arms, and with the legs semiflexed at the hips and knees; Melzack and Bromage, 1973; Bromage and Melzack, 1974; see also Gross et al., 1974; Gross and Melzack, 1978). More recent studies have shown that a fully extended finger, wrist, and elbow become a flexed phantom after ischemic anesthesia, though some aspects of the induced phantom sensation change according to the posture held at the time of anesthesia (Inui et al., 2011, 2012a,b). Even though Bromage and Melzack considered these canonical representations outside the frame of tactile processing, the type of proprioceptive priors proposed here might be fundamentally equivalent. Indeed, the authors assumed that this postural archetype may arise from the activity of neural cell assemblies that are developed by earlier sensorimotor activities encountered in a lifetime, therefore including touch (Melzack and Bromage, 1973). Similarly, a recent study has shown preferential associations between the thumb and the index finger and the relative spatial positions of "top" and "bottom," suggesting that body parts and spatial locations are stably associated (Romano et al., 2017). In this study, participants were exposed to touches on either the thumb or index fingers. Both hands were placed in front of the body, one on top of the other, with the four stimulated fingers shaping the vertices of an imaginary square and with each pair of homologous fingers (index and thumb) facing each other without touching. In this way, the thumb could be in a relative top position or in a bottom position, and vice versa for the index finger. Participants received a single tactile stimulation at one of the four possible locations and were asked to discriminate as quickly as possible whether the top or bottom finger had been touched. The authors found consistent preferential associations between the index finger and the top position and between the thumb and the bottom position, both with and without vision. In this paper, the authors speculated that a canonical postural representation might contribute to somatosensory spatial processing and associated this representation with the fact that for many common grasping actions the index finger is placed in a relatively higher location than the thumb (Romano et al., 2017). This is in agreement with the idea that long-term sensorimotor experience, such as grasping, can create specific functional categories in the brain, which can modulate early stages of somatosensory processing (Shen et al., 2018).

### Examples of Integration of Touch and Online Proprioceptive Information

The idea put forward in this section is that at early stages of tactile processing, possibly before the brain had time to incorporate an online representation of current posture, touch is integrated with (or influenced by) stored representations. This is, however, independent of two facts, i.e., touch necessarily relies on up-to-date proprioceptive information to generate locations in external space, and localization of body parts is tightly linked to visual processing (Limanowski and Blankenburg, 2016). Thus, integration between touch and proprioception for tactile localization often co-occurs with vision (note that other forms of interactions, e.g., with motor commands, are omitted for the sake of brevity; Hermosillo et al., 2011).

The fact that tactile localization is affected by changes in posture (such as hand crossing) is evidence of the integration of touch with online proprioceptive information (Yamamoto and Kitazawa, 2001). There are many other examples in the literature showing effects of posture on somatosensory processing, even when these are visually induced (highlighting the role of vision in body parts localization; Gallace and Spence, 2005; Azañón and Soto-Faraco, 2007; Folegatti et al., 2009). For example, localizing the order of two touches, applied one to each uncrossed hand, becomes easier when the horizontal distance between the two hands increases (Shore et al., 2005). This improvement is observed, even if the separation is not physical, but visually introduced by mirror reflection (Soto-Faraco et al., 2004; Gallace and Spence, 2005). This is the case also for tactile localization with hands crossed (Roberts et al., 2003), which also improves when the separation spans other spatial dimensions (vertical and depth; Azañón et al., 2016a).

Studies on tactile spatial attention further demonstrate the strong interconnection between online postural information and touch (Lakatos and Shepard, 1997; Aglioti et al., 1999; Heed and Röder, 2010). For example, tactile attention to one hand in healthy individuals improves when the arms are separated (e.g., Driver and Grossenbacher, 1996; Soto-Faraco et al., 2004). When the task requires switching attention from one hand to the other, participants' performance instead improves when the distance between the arms is reduced (Lakatos and Shepard, 1997). Furthermore, when participants discriminate the elevation of a tactile target applied to the index finger or thumb of one hand, there is facilitation from a simultaneous touch on the unattended hand when it is presented in a congruent (e.g., both up) rather than in an incongruent elevation, regardless of the orientation taken by the hand, and therefore the actual finger stimulated (e.g., whether both index fingers are placed on top of the thumbs or a single hand is rotated, and the thumb is on the top of the index finger; Soto-Faraco et al., 2004). Altogether, these results suggest that tactile attention is affected by the posture of the touched body part, given that performance is modulated by the distance and orientation of the body parts even though the somatotopic relationship across the involved skin sites is kept constant in the brain (see also Rinker and Craig, 1994; though see Evans and Craig, 1991; Evans et al., 1992; Röder et al., 2002; Haggard et al., 2006, and Kuroki et al., 2010 for evidence regarding a somatotopic dominance in tactile localization).

Research on patients provides further evidence of the influence of posture in tactile processing. This is the case, for instance, of tactile extinction, already defined in the previous section, or tactile hemineglect, in which tactile stimulation of the contralesional limb (usually the left) is neglected (Vallar, 1997; Driver and Vuilleumier, 2001). The strength of tactile inattention is reduced by the location of the affected body part in space. Thus, some patients improve tactile detection at the contralesional hand when it crosses the midline to the ipsilesional side (Smania and Aglioti, 1995; Moro et al., 2004) or even within the same hemispace when the affected hand crosses the other hand (Aglioti et al., 1999; Moro et al., 2004). Further support comes from patients with extinction anchored to different body parts. In particular, these patients extinguish touches that are presented at the left-most side region of the stimulated body part in external space, say the limb, the hand, or the finger (with respect to their long axis), regardless of the spatial orientation taken by them (e.g. palm up or down; Moscovitch and Behrmann, 1994; Tinazzi et al., 2000; see Medina and Rapp, 2008 for an example in other neurological patients).

Overall, these studies show the impact of postural information in tactile localization. It is important to stress, however, that postural information arises not only from proprioception, but in many instances also from vision. The role of vision in body part localization is evident when a conflict between proprioception and vision is introduced (Rossetti et al., 1995). For instance, in a recent study, Lohmann and Butz (2017) introduced a virtual dissociation of proprioceptive and visual hand position information by combining immersive virtual reality with online motion capturing. They showed that participants unknowingly shifted their hands to compensate for the visual shift. Perhaps the most classical approach to induce visuo-proprioceptive conflict, however, is the rubber hand illusion (RHI, Botvinick and Cohen, 1998). In this classical illusion, participants observe a fake hand being stroked while their real (unseen) hand is synchronously touched. After several seconds of simultaneous stroking, participants tend to perceive the felt tactile sensation as originating from the rubber hand. This usually results in a feeling of ownership and a relocation of the perceived position of the real hand towards the rubber hand (Botvinick and Cohen, 1998; see also Tsakiris and Haggard, 2005). By combining the rubber hand illusion with temporal order judgments with hands crossed, Azañón and Soto-Faraco (2007), found that observing a pair of uncrossed rubber hands reduces the deficit of localizing touches at the hands when crossed. Interestingly, this modulation was mostly observed when visual information about the rubber hands could be attributed to one's own actions (i.e., when movements of the real hand were mirrored by movements of the rubber hand, in an anatomical fashion), highlighting the role not only of visual information in tactile remapping but also of motor information and the sense of agency.

In summary, we have shown the profound effect that postural information has on tactile processing. However, we have also shown that this is not always the case. Early during development, and in individuals deprived of vision, touch is unaffected by the configuration of the limbs (Röder et al., 2004; Bremner et al., 2008b). Thus, active interaction with the environment and presence of visual inputs seem to modify the way we process and localize touch later in life. As a result of this same interaction, some postural configurations and spatial locations might become associated with particular touches over time, producing what we called canonical postural and spatial priors. We argued that these priors could serve as reference points for localization of tactile events, producing more accurate and faster tactile responses, although biased towards the prior location or proprioceptive configuration. The hypothesis that canonical priors might influence tactile processing is still speculative; however, a growing body of results, some of which have been reviewed here, provides increasing evidence of biases in tactile localization that fit well with the existence of such priors.

### INTEGRATION OF TACTILE INFORMATION WITH REPRESENTATIONS OF BODY SIZE AND SHAPE

The final form of integration we will discuss is integration of immediate tactile signals with stored representations of body size and shape. Several forms of perception involve referencing sensory signals to models of the body itself. For example, the use of convergence angles for visual depth perception requires that the distance between the two eyes be known (Banks, 1988), while the use of temporal differences when sounds reach the two ears for auditory localization requires that head width be known (Aslin et al., 1983). Other studies have shown, for example, that representation of eye-height affects perception of the passability of doorways (Warren and Whang, 1987; Leyrer et al., 2015), hand size affects visual size perception (Linkenauger et al., 2010, 2014), and arm length affects the size of peripersonal space (Longo and Lourenco, 2007; Lourenco et al., 2011) and perception of visual distance (Linkenauger et al., 2015). These issues are especially acute in touch, given that the primary receptor surface (i.e., the skin) is physically co-extensive with the body itself.
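The dependence of these cues on body dimensions can be made concrete with simplified textbook geometry. The sketch below is an illustration only: it assumes symmetric fixation straight ahead for the vergence cue and a straight-line path-difference approximation for the interaural delay. The function names, the default speed of sound, and the example values are assumptions introduced here, not taken from the studies cited above.

```python
import math


def fixation_distance(interocular_m: float, vergence_deg: float) -> float:
    """Distance to a fixated point straight ahead, from the vergence angle.

    Simplified geometry (assumption): the two eyes and the fixated point form
    an isosceles triangle whose base is the interocular distance.
    """
    half_angle = math.radians(vergence_deg) / 2.0
    return (interocular_m / 2.0) / math.tan(half_angle)


def interaural_delay(ear_separation_m: float, azimuth_deg: float,
                     speed_of_sound: float = 343.0) -> float:
    """Arrival-time difference between the ears for a distant source.

    Simplified model (assumption): path difference = separation * sin(azimuth),
    ignoring diffraction around the head.
    """
    return ear_separation_m * math.sin(math.radians(azimuth_deg)) / speed_of_sound


# The same sensory signal maps to different distances for different body sizes.
for eye_gap in (0.055, 0.065):
    print(eye_gap, fixation_distance(eye_gap, vergence_deg=4.0))
```

In both functions the sensory signal (angle or delay) only becomes a spatial estimate once a body dimension is supplied, which is the point made in the paragraph above.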

### The Role of a Body Model in Tactile Distance Perception

A central part of the model of somatoperceptual information processing proposed by Longo et al. (2010) was therefore a stored representation of body size and shape, what they called the *body model*. Stimulation of even single mechanoreceptive afferent fibers in the human median nerve can produce clearly localized tactile sensations (Schady et al., 1983). Imagine, however, that two distinct points on the hand are touched. There is nothing in either of the two resulting signals or their combination that specifies how far apart the two stimuli are. Perceiving the distance between two stimulus locations on opposite sides of the hand effectively reduces to the problem of knowing how big one's hand is. Longo et al. (2010) proposed that this is achieved by combining the location of touch within primary somatotopic maps in somatosensory cortex with the body model.

Evidence in support of this interpretation comes from studies showing that illusions which alter the perceived size or shape of the body produce corresponding changes in the perception of tactile distance. Taylor-Clarke et al. (2004), for example, showed participants a magnified video image of their forearm alongside a minimized image of their hand. Subsequently, the relative perceived distance between two touches was expanded on the forearm and compressed on the hand. Similarly, de Vignemont et al. (2005) explored this issue using the so-called vibrotactile illusion. In the vibrotactile illusion, vibration applied to a muscle tendon produces an illusion of muscle lengthening and a corresponding illusion of proprioceptive limb displacement (Goodwin et al., 1972). Lackner (1988) showed that when this illusion was generated while the affected limb was in continuous contact with another part of the body, illusory changes of experienced body part size could be produced (i.e., the "Pinocchio illusion"). De Vignemont et al. (2005) used this method to produce the illusion that the index finger was longer or shorter than its actual size and showed that such changes affected the perceived distance between touches on the finger, compared to a control skin location (the forehead). Similar results have also been reported in other studies (Bruno and Bertamini, 2010; Tajadura-Jiménez et al., 2012).

Further evidence that higher level representation of the body shapes the perception of tactile distance comes from studies showing that the segmentation of the body into discrete parts produces categorical perception effects, with perceived tactile distances being expanded across joint boundaries (de Vignemont et al., 2009; Le Cornu Knight et al., 2014, 2017; Shen et al., 2018). Similarly, tool use, which can be interpreted as a functional extension of the body (e.g., Maravita and Iriki, 2004), has recently been shown to produce systematic changes in the perception of tactile distance on the arm wielding the tool (Canzoneri et al., 2013; Miller et al., 2014, 2017a,b). Moreover, the nature of these effects is determined by the relation between the tool and the body: a long stick altered touch on the forearm but not the hand, whereas a hand-shaped tool altered touch on the hand but not the forearm (Miller et al., 2014).

### Baseline Distortions of Tactile Distance Perception and the Pixel Model

Intriguingly, even at baseline, there are large misperceptions of tactile distance, which have been investigated since the 19th century. In his classic work, Weber (1996) noticed that as he moved the two points of a compass across his skin it felt like the distance between them increased as they moved from a region of relatively low sensitivity (e.g., the forearm) to a region of higher sensitivity (e.g., the palm of the hand). Subsequent research has replicated these results and found that the perceived distance between touches on the skin has a systematic relation to the relative sensitivity of different skin regions (Goudge, 1918; Cholewiak, 1999; Taylor-Clarke et al., 2004; Anema et al., 2008; Miller et al., 2016), an effect now known as *Weber's illusion*.

Interestingly, similar results have also been found comparing the perceived distance between points aligned in different orientations on a single skin surface. For example, Longo and Haggard (2011) found that the perceived distance between touches on the hand dorsum was about 40% larger when the touches were oriented across the width of the hand, than along hand length. Other studies have reported similar results (Longo and Sadibolova, 2013; Calzolari et al., 2017; Longo and Golubova, 2017; Longo, 2017b; Tamè et al., 2017b), and similar anisotropies have been described on a number of skin regions, including the forearm (Green, 1982; Le Cornu Knight et al., 2014), the thigh (Green, 1982), the shin (Stone et al., 2018), and the forehead (Longo et al., 2015a; Fiori and Longo, 2018). Intriguingly, the direction of this effect appears to be the same on all skin regions where anisotropy has been reported, with distances aligned with body width overestimated compared to those aligned with body length or height. However, the magnitude of anisotropy appears to differ systematically across the skin, suggesting that it arises from factors specific to each skin surface rather than a more general perceptual or cognitive bias.

In previous work, we have suggested that such effects may arise from the geometry of the receptive fields (RFs) of neurons in somatosensory cortex, based on what we called the "pixel model" (Longo and Haggard, 2011; Longo, 2017a). The central idea of this model is that tactile RFs in a somatotopic map are treated like the pixels of a two-dimensional spatial image of the body, with distances calculated by counting the number of unstimulated RFs between two activation peaks. Because the RFs representing sensitive skin regions are smaller than those representing less-sensitive regions (Powell and Mountcastle, 1959; Sur et al., 1980), any given stimulus will have more unstimulated RFs between peaks if applied on a sensitive than a less-sensitive surface, potentially accounting for the classic form of Weber's illusion. Similarly, the RFs of neurons representing the hairy skin of the limbs are generally oval-shaped (rather than circular), with the long axis of the oval aligned with the proximo-distal limb axis (Powell and Mountcastle, 1959; Brooks et al., 1961; Alloway et al., 1989). This anisotropy of RF geometry can potentially account for the perceptual anisotropies described above, given that the spacing between the RFs of adjacent neurons in somatotopic maps is known to be a constant proportion of RF size (Sur et al., 1980). Recent results have been consistent with this model in showing that tactile distance anisotropies can be well characterized by geometrically simple deformations (e.g., stretches) of tactile space (Longo and Golubova, 2017; Fiori and Longo, 2018).
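The counting logic of the pixel model can be illustrated with a toy calculation. The sketch below is a minimal reading of the idea, not the authors' implementation. It assumes that RF spacing is a fixed proportion of RF size along each axis (the hypothetical `spacing_ratio` parameter), and every numerical value is invented for illustration.

```python
def rf_count(distance_mm: float, rf_size_mm: float,
             spacing_ratio: float = 0.5) -> float:
    """Number of RF 'pixels' spanned by a physical distance along one axis.

    Assumption: adjacent RF centres are separated by spacing_ratio * rf_size.
    """
    return distance_mm / (spacing_ratio * rf_size_mm)


def across_to_along_ratio(distance_mm: float, rf_length_mm: float,
                          rf_width_mm: float, spacing_ratio: float = 0.5) -> float:
    """Perceived across/along distance ratio for oval RFs.

    rf_length_mm is the RF extent along the proximo-distal axis and
    rf_width_mm the extent across the limb; both are hypothetical values.
    """
    across = rf_count(distance_mm, rf_width_mm, spacing_ratio)
    along = rf_count(distance_mm, rf_length_mm, spacing_ratio)
    return across / along


# Weber's illusion: the same 30 mm spans more small RFs than large RFs.
sensitive = rf_count(30.0, rf_size_mm=5.0)
less_sensitive = rf_count(30.0, rf_size_mm=20.0)
print(sensitive > less_sensitive)

# Anisotropy: RFs elongated along the limb make across-limb distances feel larger.
print(across_to_along_ratio(30.0, rf_length_mm=14.0, rf_width_mm=10.0))
```

Under this reading, the across/along ratio equals the RF length-to-width ratio and does not depend on the stimulus distance or on the spacing proportion, which is why a simple stretch of tactile space is the predicted form of distortion.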

### Tactile Distance Perception and Clinical Disorders of Body Image

A number of recent studies have reported disruption of tactile distance perception in clinical disorders (e.g., Keizer et al., 2011, 2012; Scarpina et al., 2014; Spitoni et al., 2015; Mölbert et al., 2016; Engel and Keizer, 2017). For example, Keizer and colleagues (Keizer et al., 2011, 2012) found that in comparison with healthy controls, patients with anorexia nervosa overestimated tactile distances on both the belly and hand. In a subsequent study, Spitoni et al. (2015) compared tactile distances on the belly and sternum. Patients with anorexia overestimated distances on the belly compared to the sternum, but only when stimuli were aligned with the width of the body and not when they were aligned with body length. This effect is intriguing in that it shows specificity in the distortions of tactile distance perception shown by the patients that mirror their subjective body image (i.e., the fact that they experience their body as fatter than it actually is). Thus, this result provides further evidence for a deep relation between the experience of tactile distance and higher level representation of the body (cf. Longo, 2015).

There is also some evidence that the illusions of tactile distance perception we have described above mirror distortions of body perception in other domains [for review, see (Azañón et al., 2016b; Longo, 2017a)]. For example, studies investigating body representations underlying proprioceptive position sense have reported similar distortions, with overestimation of hand width relative to length (Longo and Haggard, 2010, 2012a; Ganea and Longo, 2017). Similarly, other studies of the explicit body image have also revealed overestimation of body width, using a range of measures including visual comparison (Shontz, 1969; Longo and Haggard, 2012b), the image marking procedure (Meermann, 1983), the moving caliper procedure (Halmi et al., 1977; Dolan et al., 1987), the adjustable light beam apparatus (Thompson and Thompson, 1986; Dolce et al., 1987), and several others (Bianchi et al., 2008; Fuentes et al., 2013; D'Amour and Harris, 2017). The distortions described above of tactile distance perception thus appear to be just one reflection of a broader perceptual bias to overestimate body width, which appears in many types of task.

### DISCUSSION

In this review, we have explored two aspects of tactile processing that were not considered in the model proposed by Longo et al. (2010), i.e., the integration of touch across the two sides of the body and the use of stored proprioceptive information about the location of touch in space. In addition, we have reviewed recent results concerning the integration of tactile signals with representations of body size and shape since we developed the model.

Regarding the integration of touch across body sides, a large body of evidence, as discussed in the first section, suggests that the integration of tactile signals between the two sides of the body is likely to occur at early stages of tactile processing, i.e., within the primary somatosensory cortex. This line of evidence challenges the textbook account that SI supports only unilateral tactile representations of the contralateral side of the body, whereas structures beyond SI, in particular SII, support bilateral tactile representations. Therefore, in the construction of the somatic percept, the interhemispheric transfer of tactile information occurs very early in time and depends on the spatial and temporal characteristics of the stimuli (Tamè et al., 2012, 2015), the type of task (Tamè et al., 2011, 2014, 2016), as well as the relative position of the parts of the two sides of the body in space (Tamè et al., 2011, 2017c). We propose that such integration occurs in areas 1 and 2 of the primary somatosensory cortex through transcallosal connections as shown by the neuroimaging studies we described (e.g., Tamè et al., 2012, 2015). Following this integrative process, information is then sent to other brain areas within SI (i.e., 3a, 3b), parietal areas, as well as the motor and premotor cortices.

In our previous model (Longo et al., 2010), we proposed that three different types of body representations were required to process touch. Namely, the superficial schema, mediating localization of somatic sensations on the body surface; the model of body size and shape, which was discussed in the last section of this review; and the postural schema, an online and up-to-date proprioceptive representation of the limbs in space. Nonetheless, several considerations converge to support the idea that the processing of touch also involves an offline representation of the most plausible spatial locations for a given touch (Azañón and Soto-Faraco, 2008a; Overvliet et al., 2011) or the most likely configurations of the body in space (Yamamoto and Kitazawa, 2001; Romano et al., 2017). We suggest that these representations or stored information are tightly linked to the postural schema, especially in the particular case of canonical proprioceptive priors. Minor deviations from this template are maximally informative for comparing current body posture and, in this way, retrieving the up-to-date body schema in a dynamic way. In this hypothetical framework, online sensory information about the tactile stimuli on a body part in a given posture (postural schema) would be combined with information about this offline proprioceptive standard, every time a touch is presented. Consequently, when online information is accurate, both schemata are combined to increase accuracy and speed of tactile processing, as the prior should be seen as the statistical mean for all co-occurrences between touch and this particular body configuration encoded throughout a lifetime.
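One common way to formalize the combination of an online estimate with a stored prior is precision-weighted averaging of two Gaussian estimates. The text above does not commit to this formalism, so the sketch below is an assumption made purely for illustration. The function name, the one-dimensional position variable, and the numerical values are all hypothetical.

```python
from typing import Tuple


def combine_with_prior(online_mean: float, online_var: float,
                       prior_mean: float, prior_var: float) -> Tuple[float, float]:
    """Fuse an online (postural schema) estimate with a canonical prior.

    Assumption: both sources are Gaussian and independent, so each is weighted
    by its precision (inverse variance).
    """
    online_precision = 1.0 / online_var
    prior_precision = 1.0 / prior_var
    total_precision = online_precision + prior_precision
    weight_online = online_precision / total_precision
    fused_mean = weight_online * online_mean + (1.0 - weight_online) * prior_mean
    fused_var = 1.0 / total_precision
    return fused_mean, fused_var


# Hypothetical hand position in cm: online estimate at 10, canonical prior at 0.
mean, var = combine_with_prior(10.0, 4.0, 0.0, 4.0)
print(mean, var)  # 5.0 2.0
```

The fused estimate has a lower variance than either source alone but is pulled toward the prior, which matches the qualitative pattern described above: faster and more precise localization that is nevertheless biased toward the canonical configuration.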

**Figure 4** shows an updated depiction of Longo et al.'s (2010) model where we have included the notion that touch is necessarily integrated across the two sides of the body. In **Figure 4**, we suggest that touch is integrated between the two sides of the body before the processing that constructs percepts and experiences of somatic objects and events and of one's own body (i.e., somatoperception). We have also included a fourth body representation, a canonical prior, to denote the use of priors in the localization of touch. This prior would interact mostly with the postural schema to produce a fast and accurate, though sometimes biased, localization of touch in space.

Taken together, with the inclusion of the concepts of body laterality and prior information, this review provides a more comprehensive conceptualization of tactile processing than our previous model (Longo et al., 2010, 2015a,b). Furthermore, with the revision of a wide range of recent neuropsychological, neuroimaging, and neurophysiological data, we provide evidence that the claims we made 8 years ago are still up-to-date.

### AUTHOR CONTRIBUTIONS

The three authors contributed equally.

### FUNDING

This paper was supported by a European Research Council grant (ERC-2013-StG-336050) under the FP7 to MRL.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Tamè, Azañón and Longo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Brain-Inspired Coding of Robot Body Schema Through Visuo-Motor Integration of Touched Events

#### Ganna Pugach<sup>1</sup> \*, Alexandre Pitti <sup>1</sup> \*, Olga Tolochko<sup>2</sup> and Philippe Gaussier <sup>1</sup>

<sup>1</sup> ETIS Laboratory, University Paris-Seine, CNRS UMR 8051, University of Cergy-Pontoise, ENSEA, Cergy-Pontoise, France, <sup>2</sup> Faculty of Electric Power Engineering and Automation, National Technical University of Ukraine Kyiv Polytechnic Institute, Kyiv, Ukraine

Representing objects in space is difficult because sensorimotor events are anchored in different reference frames, which can be either eye-, arm-, or target-centered. In the brain, Gain-Field (GF) neurons in the parietal cortex are involved in computing the necessary spatial transformations for aligning the tactile, visual and proprioceptive signals. In reaching tasks, these GF neurons exploit a mechanism based on multiplicative interaction for binding simultaneously touched events from the hand with visual and proprioceptive information. By doing so, they can infer new reference frames to represent dynamically the location of the body parts in the visual space (i.e., the body schema) and nearby targets (i.e., its peripersonal space). In this line, we propose a neural model based on GF neurons for integrating tactile events with arm postures and visual locations for constructing hand- and target-centered receptive fields in the visual space. In robotic experiments using an artificial skin, we show how our neural architecture reproduces the behaviors of parietal neurons (1) for encoding dynamically the body schema of our robotic arm without any visual tags on it and (2) for estimating the relative orientation and distance of targets to it. We demonstrate how tactile information facilitates the integration of visual and proprioceptive signals in order to construct the body space.

Keywords: body schema, multimodal integration, artificial skin, parietal cortex, gain-field neurons, peri-personal space, visual reaching, non-linear mixed-selectivity

### 1. INTRODUCTION

The body schema is the perception that each individual has of their own body in space. The acquisition of this body schema during infancy helps to learn a structural organization of the body parts and their visual shape, to establish the boundaries of the body and to better situate its physical limits (Gliga and Dehaene-Lambertz, 2005; Klaes et al., 2015; Marshall and Meltzoff, 2015; Bhatt et al., 2016; Jubran et al., 2018). Gradually, the body schema grows to enhance spatial awareness of objects (reaching and grasping) (Van der Meer, 1997; Corbetta et al., 2000) and of others (self-other differentiation, eye-gaze; Deák et al., 2014).

#### Edited by:

Frank Van Der Velde, University of Twente, Netherlands

#### Reviewed by:

Solaiman Shokur, Alberto Santos Dumont Association for Research Support, Brazil Fernando Perez-Peña, University of Cádiz, Spain

#### \*Correspondence:

Ganna Pugach ganna.pugach@gmail.com Alexandre Pitti alexandre.pitti@u-cergy.fr

Received: 24 August 2018 Accepted: 06 February 2019 Published: 07 March 2019

#### Citation:

Pugach G, Pitti A, Tolochko O and Gaussier P (2019) Brain-Inspired Coding of Robot Body Schema Through Visuo-Motor Integration of Touched Events. Front. Neurorobot. 13:5. doi: 10.3389/fnbot.2019.00005

In order to guide the movement of the body in space and to allow interaction with the immediate environment, the brain must constantly monitor the location of each body part at different postures and analyze the spatial relationship between body parts and neighboring objects. This process requires the integration of proprioceptive, tactile, visual, and even auditory information to align the different reference frames with each other; for instance, eye-, hand-, torso-, or head-centered reference frames. Although a large amount of data has been collected in neuroscience, the mechanisms behind multimodal integration from raw input for aligning the different reference frames and for constructing this body schema are still under investigation, and several models and mechanisms have been proposed; cf. (Taira et al., 1990; Burnod et al., 1992; Sakata et al., 1995; Caminiti et al., 1998; Avillac et al., 2005; Borra et al., 2017). For robotics, endowing robots with a body schema could help in reaching and grasping tasks or in developing a sense of spatial awareness in order to interact physically and socially with persons.

Many neuroscience studies have focused on how various sensory modalities can be combined and integrated to achieve the perception of limb location and the representation of space immediately around the body (i.e., the peripersonal space). Graziano and Botvinick (2002) presented in one study two views of how the brain represents the body, from neurophysiology and from psychology. The psychological approach emphasizes the multisensory nature of body representation and has shown that touch and proprioception are combined in a sophisticated mental schema of the body. In contrast, neurophysiology focuses on proprioception, a component of the representation of the body, and primarily on the use of proprioception in movement control.

In a dynamic environment, the characterization of the peripersonal space of a complex animal is fundamental for reacting appropriately when an object enters it. The natural reaction could be either grasping or approaching the object if it is of interest, or avoiding it if it represents a danger (Graziano and Aflalo, 2007). Therefore, the brain integrates different information from visual, auditory or somatosensory systems to ensure an effective representation of the body and peripersonal space (Holmes and Spence, 2004).

The peripersonal space is defined as the space that immediately surrounds our body (Rizzolatti et al., 1997). The neuronal representation of the peripersonal space is constructed through a network of cortical and subcortical brain zones. To represent the space around the body and the individual parts of the body that can be reached with the hands, the brain must, in particular, calculate the position of the arms in space (Kakei et al., 2003). Neuroscientific studies suggest that such a representation can be instantiated in a variety of different reference frames, relative to the eye's reference frame, with respect to the hand's reference frame, or with respect to the reference frame of an arbitrary point between these two (Gross and Graziano, 1995; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; Galati et al., 2010; McGuire and Sabes, 2011). The term "reference frame" (RF) is used to refer to the center of a coordinate system to represent objects, including the body itself, and the relationships between objects (Cohen and Andersen, 2002).

In the study of peripersonal space, Rizzolatti et al. (1997) found that there are bimodal neurons that respond to the tactile stimulus on a limb but also to visual stimuli near this body part, regardless of the location of the limb in space and its posture. In addition, Làdavas (2002) established psychophysical evidence of how the visual perception of the peripersonal space is modulated by the motor representations acquired during the execution of the action.

In macaque monkeys, the posterior parietal cortex (PPC) is involved in the integration of multimodal information to construct a spatial representation of the outside world (relative to the body of the macaque or parts of it) for planning and executing object-centered movements (Sakata et al., 1995; Andersen, 1997; Murata et al., 2016). In particular, the intraparietal sulcus (IPS) serves as an interface between perceptual and motor systems to control the movement of arms and eyes in space. Observations have shown that multimodal integration in these areas is based on a multiplicative integration, i.e., a gain-modulation or gain-field (GF) mechanism (Andersen et al., 1985; Pouget and Sejnowski, 1997; Salinas and Thier, 2000; Salinas and Sejnowski, 2001; Blohm and Crawford, 2009). For example, Bremner and Andersen (2012) have proposed that gain-field neurons compute a fixation-centered reference frame by subtracting the vector between the eye location and the hand position to derive the hand position relative to the target in a reference frame centered on the eye (see also Baraduc et al., 2001; Ustun, 2016). Nonetheless, the details of how these steps can be processed by parietal neurons using tactile input and how spatial transformation can be processed in a real physical system have never been expressed or explained in earlier works. In particular, most modeling works have assumed that the location of the hand in the visual space and the visual shape of the arm configuration are known. It is noteworthy that roboticists have started to consider this research problem for robots, as we will present below.
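To give an intuition for what multiplicative integration means, the sketch below models a single gain-field unit as a Gaussian tuning curve over one variable (here a retinal position) whose amplitude is scaled by a sigmoid function of a second variable (here a posture signal). This Gaussian-times-sigmoid form is a common textbook idealization and is assumed here for illustration only. It is not the architecture developed in this paper, and all function names and parameter values are hypothetical.

```python
import math


def gaussian_tuning(x_deg: float, preferred_deg: float, width_deg: float = 10.0) -> float:
    """Bell-shaped tuning to the first variable (e.g., retinal position)."""
    return math.exp(-((x_deg - preferred_deg) ** 2) / (2.0 * width_deg ** 2))


def sigmoid_gain(posture_deg: float, slope: float = 0.1, offset_deg: float = 0.0) -> float:
    """Monotonic gain set by the second variable (e.g., eye or arm posture)."""
    return 1.0 / (1.0 + math.exp(-slope * (posture_deg - offset_deg)))


def gain_field_response(retinal_deg: float, posture_deg: float,
                        preferred_deg: float = 0.0) -> float:
    """Multiplicative interaction: posture rescales, but does not shift, the tuning."""
    return gaussian_tuning(retinal_deg, preferred_deg) * sigmoid_gain(posture_deg)


# The tuning curve keeps its shape across postures; only its amplitude changes.
for posture in (-20.0, 0.0, 20.0):
    curve = [round(gain_field_response(x, posture), 3) for x in (-10.0, 0.0, 10.0)]
    print(posture, curve)
```

Because the two inputs interact through a product, a downstream population reading out many such units can recover combinations of the two variables, which is the property that makes gain fields a candidate mechanism for reference-frame transformations.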

The details of this gain-modulation mechanism will be presented in section 2, but in order to have a better understanding of how it works, we present the data recorded by Bremner and Andersen (2012) of PPC units when a macaque performs a reaching task. The authors found that area 5d encodes the position of the hand relative to the eye before the presentation of the target to be grasped. But just after the presentation of the target, these neurons were sensitive to the location of the target relative to the position of the hand, independent of the position of the hand or target locations as well as the direction of the eye gaze. That is, the most relevant information for a successful task was the location of the target relative to the hand as soon as the target is presented. Moreover, this representation is dynamic and constructed during the approach of the hand toward the target. This mechanism is particularly interesting in terms of computational efficiency, because not all the spatial combinations between the hand, the eye and the target need to be learned for estimating novel and unseen relative locations.

In **Figure 1**, we reproduce an excerpt of this work by Bremner and Andersen (2014) for a reaching task with different locations of the Target T, of the Eye E and of the relative distance to the Hand H. The Target location in Eye coordinates is denoted as (T) and Hand location in Eye coordinates is denoted as (H), whereas the location of Target-in-Hand coordinates is denoted as (T − H) and its opposite direction is denoted as (T + H). In **Figure 1A**, the eye fixation is expressed with the red cross located at +10° horizontally, the initial position of the hand is visualized with the plain green circle and the targets are shown with green crosses and the dashed green circle. In this work, Bremner and Andersen (2014) performed an analysis of the neuronal population response for different coordinate systems (Target-Eye, Target-Hand, Hand-Eye) oriented in three directions of a pie chart. Bremner and Andersen made the single-unit recordings from the posterior portion of dorsal area 5 (area 5d), in the surface cortex adjacent to the medial bank of the intraparietal sulcus (IPS). Recorded neural activity was passed through a headstage, then filtered, amplified, and digitized and saved for off-line sorting and analysis. As for the analysis, they used a gradient analysis to determine which variable within a pair [Target-Hand (TH), Target-Eye (TE), or Hand-Eye (HE)] exerted the most influence on the firing rate of a cell, or whether both had equivalent influence. In conjunction with the gradient analysis, Bremner and Andersen used an SVD (Singular Value Decomposition) analysis to assess whether the relationship between pairs of variables was separable (in other words, a multiplicative, gain relationship) or inseparable (vector relationship). They also performed a time-step analysis to calculate the resultant length and angle of the coordinate framework gradient for each cell. **Figure 1B** presents the evolution of one neuronal population response for the target location at −10°. The pie chart at the top indicates the proper interpretation of the direction of the arrow for the pair of variables considered. The length of the arrow indicates the activity level and the orientation of the arrow indicates the sensitivity to one coordinate system. We can see from the graph that before the presentation of the target, the neuronal population codes the position of the hand relative to the eye gaze (H on the circular diagram at the top). When the target is presented, however, this population changes to code the location of the target relative to the hand (T − H on the pie chart). This result indicates the flexibility of parietal neurons to change the coordinate system dynamically to represent one spatial information. This is in line with recent observations of parietal neurons found sensitive to different spatial coordinates centered in the shoulder RF, the elbow or a mixture of them with respect to the context; a phenomenon referred to as nonlinear-mixed selectivity to designate this dynamic calculation made by parietal neurons (Zhang et al., 2017). The gain-field mechanism is one of the few computational mechanisms that can support these types of dynamical transformation necessary for spatial representation by fusing the What and Where pathways.
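The logic of an SVD-based separability test can be sketched as follows. A response matrix indexed by two variables is separable when it is well approximated by a single outer product, so that its first singular component carries nearly all of the squared response energy. The code below is a generic illustration and not the procedure or threshold used by Bremner and Andersen. To stay within the standard library it estimates the leading singular value by power iteration, whereas a real analysis would call a library SVD routine. The function name and the toy matrices are assumptions.

```python
from typing import List


def separability_index(matrix: List[List[float]], iterations: int = 200) -> float:
    """Fraction of squared response energy in the first singular component.

    Values near 1 indicate a separable (multiplicative) response matrix.
    Limitation: the power iteration starts from an all-ones vector and would
    fail if that vector were orthogonal to the leading singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])
    total = sum(v * v for row in matrix for v in row)
    if total == 0.0:
        return 0.0
    vec = [1.0] * cols
    for _ in range(iterations):
        # One step of power iteration on M^T M.
        mv = [sum(matrix[i][j] * vec[j] for j in range(cols)) for i in range(rows)]
        new = [sum(matrix[i][j] * mv[i] for i in range(rows)) for j in range(cols)]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in new]
    mv = [sum(matrix[i][j] * vec[j] for j in range(cols)) for i in range(rows)]
    return sum(x * x for x in mv) / total


# Toy multiplicative (gain-like) responses: outer product of two tuning curves.
gain_like = [[a * b for b in (0.5, 1.0, 2.0)] for a in (1.0, 2.0, 3.0)]
# Toy additive (vector-like) responses: sum of two centred variables.
vector_like = [[a + b for b in (-1.0, 0.0, 1.0)] for a in (-1.0, 0.0, 1.0)]

print(separability_index(gain_like))
print(separability_index(vector_like))
```

In this toy case the multiplicative matrix yields an index close to 1, while the additive matrix splits its energy across two singular components.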

In robotics, Hoffmann et al. (2010) presented one of the rare reviews of the state of the art on the body schema from the perspective of robotics. Most of the review focused on integrating visual and proprioceptive information. For instance, most of the robotic experiments were designed using linear combinations of basis functions for visuomotor transformations (Halgand et al., 2010; Chinellato et al., 2011; Schillaci et al., 2014). However, in these works, tactile information was not considered at all, and it would have been interesting to use an artificial skin to contribute to the representation of the body schema and the space around it as an additional modality with respect to the visual and proprioceptive modalities.

Hikita et al. (2008) proposed a bio-inspired model of the body representation of the robot through these three modalities. They used tactile information to trigger a Hebbian learning to associate the position of the arm with the focus point of visual attention when the robot touches the target with its hand or with a tool. This model allows taking into account the behavior of parietal bimodal neurons observed by Iriki et al. (1996).
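The idea of touch-triggered Hebbian association can be sketched with a small weight matrix linking arm-posture units to visual-location units, updated only when a contact is detected. This is a generic illustration of a gated Hebbian rule and not the implementation of Hikita et al. (2008). The function name, the learning rate, and the unit activities are hypothetical.

```python
from typing import List


def hebbian_update(weights: List[List[float]], posture: List[float],
                   visual: List[float], touched: bool,
                   rate: float = 0.1) -> List[List[float]]:
    """Return updated posture-to-visual weights.

    The Hebbian term (pre * post) is applied only when a tactile event gates
    learning; otherwise the weights are returned unchanged.
    """
    if not touched:
        return [row[:] for row in weights]
    return [[w + rate * p * v for w, v in zip(row, visual)]
            for row, p in zip(weights, posture)]


# Two posture units and two visual units, all weights initially zero.
w = [[0.0, 0.0], [0.0, 0.0]]
# The arm is in posture 0 while attention is on visual location 1, with contact.
w = hebbian_update(w, posture=[1.0, 0.0], visual=[0.0, 1.0], touched=True)
# The same co-activation without contact leaves the weights untouched.
w = hebbian_update(w, posture=[0.0, 1.0], visual=[1.0, 0.0], touched=False)
print(w)  # [[0.0, 0.1], [0.0, 0.0]]
```

After repeated contacts, the strongest weights link each arm posture to the visual location where the hand was seen when touch occurred, which is one simple way a visual receptive field can become anchored to the hand.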

The work of Roncone et al. (2015) also focuses on the representation of the body and peripersonal space using an artificial skin. They concede, however, that their approach relies instead on existing engineering solutions and targets practical functionalities compared to the studies presented by Hikita et al. (2008). They associated each touch unit with a spatial receptive field extending in 3D space around the surface of the skin. Stimulations in the form of motor or visual events are detected and recorded. The developed architecture estimates the probability of contact with any part of the body, i.e., it predicts tactile contact and adapts the robot behavior to avoid or grasp an object (Roncone et al., 2016).

More recently, robotics studies with artificial skin have been developed to investigate biologically motivated models of peripersonal space. For instance, Roncone et al. (2016) and Hoffmann et al. (2017) focused on the topological organization of visuo-tactile receptive fields in cortical maps to organize actions for an avoidance or reaching movement. Born et al. (2017) proposed a model of invariance learning based on Hebb's rule for the development of hand-centered visual representations. Lanillos et al. (2017) instead emphasized a predictive coding approach for discovering causal relationships in visual, tactile and motor streams to discriminate ego-motion and body parts.

In this paper, we propose a neural architecture of body and peripersonal space representation that relies on the integration of multiple feedback signals from the robot body, i.e., its proprioception, its tactile input and its vision. Our contributions are the use of (1) gain-field neuromodulation as the main mechanism for integrating modalities from different reference frames and (2) an artificial skin developed for robotic purposes. The model makes it possible to reconstruct the location of the arm in the visual field and the location of objects relative to the somatosensory field by aligning the different modalities with each other. Most importantly, the results obtained are close to the behavior of the neurons recorded in parietal cortex area 5d by Bremner and Andersen (2014): in comparison with Bremner and Andersen's work, our architecture represents the object location relative to the moving arm as soon as the object is presented, by combining proprioceptive, visual and tactile inputs from the three different reference frames. This representation is dynamic and is constructed during the approach of the hand toward the target. We present two robotic experiments with a similar protocol in sections 3.2 and 3.3.

Our experiments can contribute to the understanding of the biological principle of the peripersonal space representation. In this respect, they reinforce our previous works on spatial representation (Pitti et al., 2012, 2017; Mahé et al., 2015; Abrossimoff et al., 2018).

### 2. MATERIALS AND METHODS

### 2.1. Material

In our experiments, we use the Jaco robot arm from Kinova covered with an artificial skin that we developed; its properties are presented extensively in Pugach et al. (2013, 2015, 2016). The visual input is acquired by a static FireWire camera fixed at a height from which it can view the full arm moving; see **Figure 2**.

### 2.1.1. Artificial Skin

The artificial skin is a rectangular conductive fabric of dimensions 250 × 320 mm with sixteen electrodes attached uniformly along the perimeter. The fabric resistance decreases when pressure is applied. In our previous works, we used it to develop a low-cost system based on Electrical Impedance Tomography (EIT) for data acquisition from the conductive fabric. EIT is a non-invasive technique, used in particular in medical imaging, that reconstructs the internal spatial distribution of conductivity/resistivity by iteratively measuring the voltages for different current injection locations through electrodes placed around the circumference of the investigated object. The electronic hardware and the neural reconstruction are detailed in Pugach et al. (2013, 2015), and a touch-based control of the Jaco arm covered with our artificial skin is detailed in Pugach et al. (2016). The spatial patterns of the tactile contact can be acquired and localized at a frequency of 40 Hz.

### 2.1.2. Vision System

The camera provides a video stream of 30 frames per second at a resolution of 160 × 120 pixels. The arm is in the center of the camera's visual field. For the sake of simplicity, we have limited the arm to a single degree of freedom in the visual plane of the camera. The maximum angle of joint movement is 100°.

### 2.2. Methods

### 2.2.1. Gain-Field Mechanism

The principle of integration behind gain-field neurons for spatial transformation is based on the product of the neural fields' activity between two or more modalities (Blohm and Crawford, 2009; Ustun, 2016), e.g., modalities X and Y. For instance, **Figure 3** shows the multiplicative binding X × Y between two neural fields X and Y, which can then serve to construct a relative metric to transpose signals from one reference frame to another. The amplitude level of the resulting neural field indicates their vicinity, whereas its shape indicates their relative orientation (arrow). Such computation is similar to sigma-pi networks or radial basis function networks and has been rediscovered recently in computer vision as gated networks for categorizing transformations (Memisevic, 2011). In robotics, gated networks have been emphasized recently by Sigaud et al. (2016), Sigaud and Droniou (2016), and Memisevic (2010), but they have been used mostly for categorization and not for spatial transformation as performed by gain-field networks, for which the activity of each unit is meaningful and corresponds to a metric value and not a label.
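As a minimal sketch of the multiplicative binding described above, the following example builds two one-dimensional population codes and multiplies them unit by unit. The Gaussian tuning, the number of units and the helper names `gaussian_population` and `multiplicative_binding` are illustrative assumptions, not details taken from the model.

```python
import math


def gaussian_population(center, n_units, sigma):
    """Population code: unit i responds most when its index equals `center`."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
            for i in range(n_units)]


def multiplicative_binding(x, y):
    """Gain-field layer: unit (i, j) carries the product X_i * Y_j."""
    return [[xi * yj for yj in y] for xi in x]


# Hypothetical example: X encodes the value 3 and Y the value 5 on 8 units each.
x = gaussian_population(center=3, n_units=8, sigma=1.0)
y = gaussian_population(center=5, n_units=8, sigma=1.0)
field = multiplicative_binding(x, y)

# The most active unit of the product field identifies the (X, Y) pair.
peak = max(((i, j) for i in range(8) for j in range(8)),
           key=lambda ij: field[ij[0]][ij[1]])
```

In this toy setting, the product field is only strongly active where both inputs are active, which is what lets its peak position and amplitude be read as a relative metric.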

In our case, gain-field networks serve two computations: learning where the arm is in the eye field (e.g., eye-centered RF, combining touch, visual and proprioceptive information) and learning where the target is relative to the arm (e.g., arm-centered RF); see **Figure 3B**. We first explain the mechanism of gain modulation and its equation in section 2.2.2; we then present in detail how the spatial transformation is done in the case of arm reaching in section 2.2.3.

### 2.2.2. Gain-Field Networks

Gated or gain-modulated networks are an instance of sigma-pi networks made of radial basis functions, either pre-defined parametrically or learned, that produce a weighted sum of joint probability distributions as output (Pouget and Sejnowski, 1997). The output terms $Z$ are a linear combination of the products of the input variables $X$ and $Y$. With $n_Z$, $n_X$ and $n_Y$ denoting the cardinalities of $Z$, $X$ and $Y$, respectively, predicting $\hat{Z}$ consists in computing, for all values $Z_k$ of $Z$, $k \in \{1, \dots, n_Z\}$:

$$\forall k, \quad Z_k = \sum_{i=1}^{n_X} \sum_{j=1}^{n_Y} W_{ijk} \, (X_i \times Y_j), \tag{1}$$

with $W$ the synaptic coefficients, of dimension $n_X \times n_Y \times n_Z$. Since this matrix can be quite large, one way to drastically reduce the dimensionality of gain-field networks is to multiply term by term each $X_i$ and $Y_i$, with $i \in \{1, \dots, n_X\}$, but this is not done in this work.

The global error $E$ is defined as the Euclidean distance between $Z$ and $\hat{Z}$ over all the input examples. The synaptic weights of the output layer $Z$ are learned with classical stochastic gradient descent. This is in line with our previous works (Pitti et al., 2012; Mahé et al., 2015; Abrossimoff et al., 2018), and differs slightly from Memisevic (2011), who applied the algorithm to image problems only, not to robotics.
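The forward pass of Equation (1) and a stochastic gradient step can be sketched as follows. This is a toy illustration under stated assumptions: the inputs are one-hot vectors, the per-example loss is the squared error (a common surrogate for the Euclidean distance), and the task of mapping positions `i` and `j` to position `(i + j) % n` is a hypothetical stand-in for a coordinate shift. The names `predict`, `sgd_step` and `one_hot` are illustrative.

```python
import random


def predict(W, x, y):
    """Equation (1): Z_k = sum_i sum_j W[i][j][k] * x[i] * y[j]."""
    n_z = len(W[0][0])
    return [sum(W[i][j][k] * x[i] * y[j]
                for i in range(len(x)) for j in range(len(y)))
            for k in range(n_z)]


def sgd_step(W, x, y, target, lr):
    """One in-place gradient step on 0.5 * ||target - Z||^2 for one example."""
    z = predict(W, x, y)
    err = [t - zk for t, zk in zip(target, z)]
    for i in range(len(x)):
        for j in range(len(y)):
            gain = x[i] * y[j]  # multiplicative (gain-field) term
            for k in range(len(err)):
                W[i][j][k] += lr * err[k] * gain
    return sum(e * e for e in err)


def one_hot(index, n):
    return [1.0 if a == index else 0.0 for a in range(n)]


# Hypothetical toy task: the output position is the sum of the input positions.
n = 4
W = [[[0.0] * n for _ in range(n)] for _ in range(n)]
rng = random.Random(0)
pairs = [(i, j) for i in range(n) for j in range(n)]
for epoch in range(20):
    rng.shuffle(pairs)  # stochastic presentation order
    for i, j in pairs:
        sgd_step(W, one_hot(i, n), one_hot(j, n), one_hot((i + j) % n, n), lr=0.5)

z = predict(W, one_hot(1, n), one_hot(2, n))
```

After training on this toy task, `z` is close to a one-hot vector at index 3. The weight array has `n_X × n_Y × n_Z` entries, which illustrates why the text notes that it can become large.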

### 2.2.3. Neural Architecture for Spatial Representation

Using the gain-field mechanism presented earlier, it is then possible to exploit its computational capabilities to represent the arm in the visual field (i.e., the body schema) as well as the location of the target relative to the arm (i.e., the peripersonal space).

**Figure 4A** shows this computational process decomposed into three steps: (1) location of the hand (tactile information) in the eye field from visuo-motor integration (Hand in Eye), (2) location of the target in the visual field (Target in Eye), (3) detection of the target position relative to the robotic arm (Target in Hand). Note that in this figure the eye is fixed and only the arm is moving.

We now detail the implementation steps of our computational model. The first part aims at learning the spatial location of the arm in the visual reference frame from the tactile input; see the left panels of **Figures 4A,B**. Here, various experiences of tactile feedback for different visual target positions and motor/proprioceptive configurations make it possible to learn the visual location of a 'touched' target together with the arm configuration (the motor angle); see section 2.2.4.1. This stage builds a visual reference frame centered on the arm. The second part aims at estimating the relative distance in the visual field between the arm-centered RF computed previously and the target RF; see the right panels of **Figures 4A,B**. This makes it possible to compute the peripersonal space and pre-attentive tactile sensation.

### 2.2.4. Implementation

For simplicity, the vision system is based on color recognition. The input image is in RGB format with a resolution of 160 × 120 pixels. This image is first converted to HSV (Hue, Saturation, Value) in order to retrieve exemplars for which we vary the Hue. These variations make it possible to extract the regions where a chosen color predominates within the image. We then binarize the image: the initial image is transformed into a black-and-white image in which every pixel takes one of two values, 0 or 1. We later project this image onto neural fields of the same dimension.
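The color-based binarization can be sketched with the standard-library `colorsys` module. The tolerance and threshold values (`hue_tol`, `min_sat`, `min_val`) and the function name `binarize_by_hue` are assumptions chosen for illustration; the text does not specify them.

```python
import colorsys


def binarize_by_hue(image_rgb, target_hue, hue_tol=0.05, min_sat=0.5, min_val=0.3):
    """Return a 0/1 mask marking pixels whose hue is close to `target_hue`.

    `image_rgb` is a list of rows of (R, G, B) tuples with values in 0..255;
    hues are expressed in [0, 1) as in `colorsys`.
    """
    mask = []
    for row in image_rgb:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            dh = abs(h - target_hue)
            dh = min(dh, 1.0 - dh)  # hue is circular
            is_target = dh <= hue_tol and s >= min_sat and v >= min_val
            mask_row.append(1 if is_target else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2 x 2 image: pure red, pure green, dark gray, slightly off red.
image = [[(255, 0, 0), (0, 255, 0)],
         [(10, 10, 10), (250, 20, 20)]]
red_mask = binarize_by_hue(image, target_hue=0.0)
```

Thresholding on saturation and value in addition to hue is a design choice of this sketch: it keeps gray or very dark pixels, whose hue is poorly defined, out of the mask.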

#### **2.2.4.1. Part 1, arm in the eye-centered RF (HE)**

In order to determine where the arm is in the visual space, we use tactile information as a conditioning signal to combine proprioceptive information and visual information, as explained in the previous section; see **Figure 5-1a**. Tactile input modulates the learning rate as a "Go signal," meaning that the absence of tactile input induces no learning at all.

The learning stage uses Equation (1) to associate the tactile and proprioceptive information with visual information; see **Figure 5-1b**. We fix the arm in an angular position and touch the artificial skin with an object (the focal point of visual attention). Whenever the object touches the arm, the visual neuron associated with the tactile receptive field learns the combination of the touched visual position with the angular configuration of the arm joint. Note that in the case of a bimanual robot, we may achieve tactile self-stimulation and thus provide self-calibration of the robotic body with artificial skin.

Recall that the learning algorithm of a neural network with Perceptron units consists in modifying the synaptic weights W until the mean squared error between the network output, computed from the input X (i.e., the joint distribution between the motor angle and the tactile input), and the desired output D (i.e., the visual location of the target on the robotic arm) is minimal. The equations of the learning rule and the output of each neuron are the same as the ones presented in section 2.2.2.
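The role of touch as a "Go signal" for learning can be illustrated with a single linear unit trained by the delta rule. This is a simplified sketch: the scalar `touch` value, the `base_lr` parameter and the function name `gated_update` are assumptions made for illustration, and the actual model uses the gain-field units of Equation (1).

```python
def gated_update(weights, inputs, desired, touch, base_lr=0.1):
    """Delta-rule update whose learning rate is scaled by the tactile signal.

    With touch == 0 the effective learning rate is 0, so nothing is learned.
    """
    lr = base_lr * touch
    output = sum(w * x for w, x in zip(weights, inputs))
    error = desired - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]


# Hypothetical example: same input and desired output, with and without touch.
w_without_touch = gated_update([0.0, 0.0], [1.0, 0.5], desired=1.0, touch=0.0)
w_with_touch = gated_update([0.0, 0.0], [1.0, 0.5], desired=1.0, touch=1.0)
```

Here `w_without_touch` stays at its initial value, whereas `w_with_touch` moves toward the desired output.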

Furthermore, in order to model a spatial receptive field around the arm, i.e., the peripersonal space, we apply a 2D Gaussian mask to the network output (see **Figure 5-1c**). This operation creates a soft and smooth outline around the arm.
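One possible reading of the Gaussian mask is a convolution of the predicted arm map with a normalized 2D Gaussian kernel. The sketch below follows that reading; the kernel radius, the value of `sigma` and the helper names are assumptions, and border pixels are handled by ignoring out-of-range neighbors.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized (2 * radius + 1) x (2 * radius + 1) Gaussian kernel."""
    kernel = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
               for dx in range(-radius, radius + 1)]
              for dy in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def apply_mask(field, kernel):
    """Convolve `field` with `kernel`; neighbors outside the field count as 0."""
    height, width = len(field), len(field[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[dy + r][dx + r] * field[yy][xx]
            out[y][x] = acc
    return out


# Hypothetical arm map: a short horizontal bar in a 7 x 7 field.
arm = [[0.0] * 7 for _ in range(7)]
for col in (2, 3, 4):
    arm[3][col] = 1.0
soft_arm = apply_mask(arm, gaussian_kernel(radius=2, sigma=1.0))
```

In `soft_arm`, the activity falls off gradually with the distance to the bar, which gives the smooth outline described above.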

#### **2.2.4.2. Part 2, target in the eye-centered RF (TE)**

After having learned the representation of the robotic arm in the visual field (HE), we use a simpler attention mechanism that exploits visual information only to represent the target in the eye-centered reference frame (TE). The position of the target is determined by color recognition. An RGB image of the same size, 160 × 120 pixels, is converted to HSV and then binarized according to the color of the object. We then project this binarized image onto a neural field of the same dimension. Finally, we locate the x and y coordinates of the object's center in the visual field by selecting the most active neural position (see **Figure 5-2.2a**). This competition follows a Winner-Takes-All (WTA) rule (Rumelhart and Zipser, 1985; Carpenter and Grossberg, 1988): the winning neuron outputs 1 and the other neurons are set to 0. The target representation in the eye-centered RF is computed by multiplicative neurons, i.e., by multiplying the WTA vectors with a Gaussian curve centered on x and y (see **Figure 5-2b**).
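The WTA step and the Gaussian target field can be sketched as follows. The sketch assumes that the "WTA vectors" are obtained from the column and row sums of the binarized image, which is one plausible reading of the text rather than a confirmed detail; the value of `sigma` and all function names are likewise illustrative.

```python
import math


def winner_takes_all(vector):
    """One-hot vector with a 1 at the most active position (first one on ties)."""
    winner = max(range(len(vector)), key=lambda i: vector[i])
    return [1.0 if i == winner else 0.0 for i in range(len(vector))]


def gaussian_around_winner(one_hot, sigma):
    """Multiply the WTA vector with Gaussian curves: a bump centered on the winner."""
    n = len(one_hot)
    return [sum(one_hot[c] * math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
                for c in range(n))
            for i in range(n)]


def target_in_eye(mask, sigma=1.5):
    """Eye-centered target field from a binarized image (rows of 0/1 values)."""
    col_activity = [sum(row[x] for row in mask) for x in range(len(mask[0]))]
    row_activity = [sum(row) for row in mask]
    gx = gaussian_around_winner(winner_takes_all(col_activity), sigma)
    gy = gaussian_around_winner(winner_takes_all(row_activity), sigma)
    center = (gx.index(max(gx)), gy.index(max(gy)))
    field = [[vy * vx for vx in gx] for vy in gy]
    return center, field


# Hypothetical binarized image containing a small diamond-shaped blob.
mask = [[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
center, te_field = target_in_eye(mask)
```

For this blob, `center` is the (x, y) position of its middle pixel and `te_field` peaks there with amplitude 1.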

#### **2.2.4.3. Part 3, target in the arm-centered RF (TH)**

Once we have processed the position of the target and of the arm in the visual field, it is possible to compute their relative distance using the gain-field framework as presented in **Figures 3**, **4**, which corresponds to the third part in **Figure 5**. This final layer is similar to the previous layers using basis functions. Through the product between the two neural fields, the neurons perform a mutual information encoding between the two modalities, i.e., between the reference frame centered on the arm and the reference frame centered on the target. To derive the location of the target relative to the hand, we take the difference between the vector of the target location in the eye field (position x, y of the focal point of attention, cf. **Figure 5-2a**) and the vector of the mutual center point (coordinates x′, y′ defined by WTA). The proximity of the target to the arm is defined by the amplitude level at the mutual center point, obtained with the argmax function (see **Figure 3**), and is converted to a value between 0 and 1. A value of 0 indicates that the target is far from the arm and is not in the peripersonal space. A value of 1 indicates that the target is touching the arm, which is confirmed by tactile feedback.
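The Target-in-Hand computation can be sketched as the pointwise product of two fields followed by an argmax. In this illustration, both the arm and the target are modeled as single Gaussian bumps that peak at 1 (so the product already lies in [0, 1]), and the offset is taken as target minus mutual center point. These choices, the field size and the names `gaussian_field` and `target_in_hand` are assumptions; in the model, the arm field is the learned, extended arm prediction.

```python
import math


def gaussian_field(center, width, height, sigma):
    """2D Gaussian bump with amplitude 1 at `center` = (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def target_in_hand(arm_field, target_field, target_xy):
    """Return (proximity, offset, orientation) of the target relative to the arm."""
    height, width = len(arm_field), len(arm_field[0])
    product = [[arm_field[y][x] * target_field[y][x] for x in range(width)]
               for y in range(height)]
    # Mutual center point (x', y'): the most active unit of the product field.
    cy, cx = max(((y, x) for y in range(height) for x in range(width)),
                 key=lambda p: product[p[0]][p[1]])
    proximity = product[cy][cx]
    dx, dy = target_xy[0] - cx, target_xy[1] - cy
    return proximity, (dx, dy), math.atan2(dy, dx)


W, H, SIGMA = 30, 20, 4.0
arm = gaussian_field((10, 10), W, H, SIGMA)
near = target_in_hand(arm, gaussian_field((18, 10), W, H, SIGMA), (18, 10))
far = target_in_hand(arm, gaussian_field((28, 10), W, H, SIGMA), (28, 10))
touch = target_in_hand(arm, gaussian_field((10, 10), W, H, SIGMA), (10, 10))
```

In this toy configuration, the proximity value grows from `far` to `near` and reaches 1 for `touch`, when the target field coincides with the arm field.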

### 3. RESULTS

In this section, we present the results of three experiments using the proposed model of tactile, visual and proprioceptive integration to represent the body schema and the peripersonal space. The purpose of the first experiment is to show how the neural architecture represents the space around the body centered on the arm. The second and third experiments aim at reproducing the behavior of parietal neurons that code information about the arm in the visual space and about the target in the arm-centered reference frame.

### 3.1. Experiment 1 - Representation of Space Around the Body

As explained in section 2.2.4, the first part of the learning stage consists of associating the proprioceptive information of the robotic arm with the visual location of a target in order to reconstruct its visual mapping. This is done for various arm configurations with tactile information as a conditioning signal for calibration.

FIGURE 6 | Visual prediction of the complete tactile RF distribution and of the whole arm location. Visual perceptron units estimate the visual location of each tactile-motor GF unit (i.e., the body schema). The learning is done only when a visual target touches the arm and depends on the specific tactile location on the artificial skin and on the specific arm motor angle. By virtually activating all the tactile units for a specific motor configuration, it is possible to display the density distribution of the arm location in the visual field. The results are presented for four different motor positions: 20°, 50°, 70°, and 100°, respectively (A–D).

We note that it is possible not to use tactile input for the visual reconstruction, as we did in Abrossimoff et al. (2018); without tactile information, however, the learning phase can take a long time because there is a very large number of possible combinations between the pixel values and the angular positions of the arm. Tactile information makes this phase easier by establishing the correspondence between the visual location of a stimulus on the artificial skin and the spatial configuration of the arm only when the arm is touched. Each motor angle is discretized into 100 units by population coding with a Gaussian kernel centered on the current motor angle. We record the activity of the visual neural network for all motor angles, conditionally on the tactile activity.
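The population coding of the motor angle can be sketched as follows. The sketch assumes that the 100 units evenly span the joint range [0°, 100°] and uses a kernel width of 3 units; neither value is given in the text, and the decoding function is added only to show that the code preserves the angle.

```python
import math


def encode_angle(angle_deg, n_units=100, max_angle=100.0, sigma_units=3.0):
    """Gaussian population code of a motor angle over `n_units` units."""
    center = angle_deg / max_angle * (n_units - 1)
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma_units ** 2))
            for i in range(n_units)]


def decode_angle(activity, max_angle=100.0):
    """Read the angle back as the activity-weighted mean of preferred angles."""
    n_units = len(activity)
    mean_index = sum(i * a for i, a in enumerate(activity)) / sum(activity)
    return mean_index / (n_units - 1) * max_angle


code = encode_angle(20.0)
recovered = decode_angle(code)
```

With these assumed settings, `recovered` is very close to the encoded angle of 20°.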

After the learning phase, the output neurons of the network in **Figure 5-1.1b** are able to predict the visual representation of the arm even if the tactile information is not provided. The visual representation of the tactile units can simply be retrieved from the learned model by activating all the tactile units of the network in **Figure 5-1a**. By doing so, it is possible to estimate the spatial distribution of the receptive fields of all the tactile units; that is, we can reconstruct the spatial location of the whole arm in the visual scene, while losing the information specific to each RF.

The direction of the arrows changes with respect to the motor angle and their length is non-linearly proportional to the distance to it.

We present in **Figures 6A–D** the estimation of the full-arm posture after the learning stage for four different motor angles: 20°, 50°, 70°, and 100°. We can observe that the estimation, although noisy, represents the arm configuration well, albeit for a simple transformation such as a rotation. In order to eliminate the noise in the spatial density distribution of the arm location, we applied a mean filter followed by a binary thresholding of the neurons, twice; see **Figure 7**. In image processing, the mean filter is defined as the average of all pixels within a local region of an image. The same process is applied to neural populations: the neurons included in the averaging operation are specified by a mask. In a first step, we used a larger filtering mask to remove large noise components and, in a second step, a smaller filtering mask to remove small noise components. This reconstruction may make it unnecessary to use vision to determine the position of the arm in the visual field when the arm is occluded or in the dark, or to determine the relative distance of multiple locations on the arm (e.g., hand, forearm, elbow) to the target.
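The two-pass denoising can be sketched with a square mean filter followed by a binary threshold, applied first with a larger mask and then with a smaller one. The mask radii, the threshold level and the zero padding at the borders are assumptions of this sketch, as are the function names.

```python
def mean_filter(field, radius):
    """Average over a (2 * radius + 1)^2 square mask, with zero padding."""
    height, width = len(field), len(field[0])
    area = float((2 * radius + 1) ** 2)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += field[yy][xx]
            out[y][x] = total / area
    return out


def threshold(field, level):
    """Binary thresholding: 1.0 where the activity reaches `level`, else 0.0."""
    return [[1.0 if v >= level else 0.0 for v in row] for row in field]


def denoise(field, radii=(2, 1), level=0.5):
    """Mean filter then threshold, once per mask radius (larger mask first)."""
    for radius in radii:
        field = threshold(mean_filter(field, radius), level)
    return field


# Hypothetical noisy arm map: a solid 5 x 5 block plus one isolated noise unit.
noisy = [[0.0] * 9 for _ in range(9)]
for row in range(2, 7):
    for col in range(2, 7):
        noisy[row][col] = 1.0
noisy[0][8] = 1.0
cleaned = denoise(noisy)
```

In this example, the isolated unit is removed while the core of the block is kept. The block is also eroded at its edges, which is a side effect of this simple thresholding choice.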

**Figure 8** shows the receptive fields of the visuo-tactile neurons computed for four different positions of the robotic arm and for all the locations of the target in the visual space; see the output network of **Figure 5-3**. This image was obtained by collecting the spatial orientation and distance between the skin and the target computed from the neurons' activity in the output network. For every visual position of the target around the arm, an arrow is drawn with a length proportional to the amplitude level of the neural field and pointing in the direction of the target, as explained in **Figures 4**, **5-3**.

Without any target near the arm, the receptive fields represent where the arm is. In the presence of a target within reach, however, the receptive fields serve to compute where the target is relative to the arm. This property of body representation has been observed by Graziano and Aflalo (2007).

With respect to the distance to the arm, the neural activity that computes the receptive fields is non-linear: the activity of the cells is higher when a target is placed near the skin and decreases following a power-law scale as the distance to the arm increases. This is a consequence of the multiplication of the two Gaussian fields. Thus, the further a target enters the receptive fields, the more precisely they encode its spatial distance and orientation. They are therefore more sensitive to nearby objects.

Furthermore, we can see also that our architecture is able to correctly predict the body schema as well as to represent its peripersonal space with respect to the arm position. This property of dynamic encoding has been observed for instance by Iriki et al. (1996).

### 3.2. Experiment 2 - Estimation of Visual Distance and Orientation of Target-Centered GF Neurons When the Arm Moves and the Target Is Fixed

The second experiment aims at replicating the observation by Bremner and Andersen (2014) of hand-centered parietal neurons sensitive to the relative distance and orientation of the hand (in our case, the arm) to the target. Their activity level depends on both the position of the arm (proprioception) and the location of the target in the eye field. **Figure 9** presents the scenario of the experiment. We set the target to four positions (**Figures 9A–D**) and we move the arm within the interval [0°, 100°]. Every 10°, we record the activity of the multiplicative neuron that computes the relative distance between the arm and the visual target, as explained in **Figure 4** and **Figure 5-3**.

We draw in **Figure 10** the relative visual distance and orientation of the receptive field computed with respect to target location **C** and the most active tactile neuron retrieved. The arm moves toward the target, touches it and goes beyond it. The length of the arrow indicates the sensitivity of the RF, whereas its orientation points to the nearest tactile point. The details of the neural activity retrieved for the four locations are presented in **Figure 11** and **Supplemental Data**. The left chart displays the amplitude level of the neuron, taken with the argmax function from the resulting spatial RF between the arm and the target, which gives an estimate of the relative proximity. The middle chart displays the relative orientation angle in radians with respect to the motor position normalized to [0, 1], and the right chart presents the same information in polar coordinates centered at the target location. The colors correspond to the angular motor positions. The length of the vector indicates the relative distance, as in the previous experiment.

For location **A**, we observe that the target is in the peripersonal space during the entire movement of the arm and, most of the time, in an area of high activity. The activity of the multiplicative neuron, which encodes the location of the target in a mutual reference frame, varies between 0.05 and 1 (see **Figure 11A**). The maximum activity corresponds to the motor positions from 30° to 40°. This means that the focal point of attention is above the position of the artificial skin, which is confirmed by the orientation graphs.

In these graphs, the orientations for the arm positions 30° and 40° are missing. The neuron does not encode orientation at maximal activity because the target is within the visual location of the skin. Note that the experiment was organized so that the target did not touch the skin, in order to obtain a stable visual response for the target's location.

For location **B**, the focal point of attention is quite far away from the arm, which corresponds to a weak activity of the neuron. The maximum activity does not exceed 0.02, but it is still possible to estimate the relative visual orientation from the resulting neural field. The neuronal activity varies in a narrow range (between 150° and 210°) relative to the previous location of the target. For location **C**, we find a small variation in the neural activity for motor positions from 0° to 50° because of the large relative distance. The contact with the skin coincides with the motor position at 70°. For location **D**, we see in **Figure 11D** that the orientation is absent at the initial position: the neural activity is 0, which means that the focal point of attention is outside the peripersonal space. Once the neuron activity becomes different from zero, however, the relative orientation of the target can be retrieved even when the activity is very low and does not exceed 0.025.

In summary, these results show that our neural architecture can encode information about the relative proximity and orientation of a target with respect to the arm in a mutual reference frame. Neurons react independently of the location of the arm and of the target in the visual field. The results obtained are therefore close to the recordings made by Bremner and Andersen (2014) of the parietal neurons in area 5d.

### 3.3. Experiment 3 - Estimation of the Visual Distance and Orientation of Arm-Centered GF Neurons When the Target Moves to the Arm

The third experiment is an alternative version of experiment 2, except that we fix the arm position at a certain location and move the targets toward it. The aim of experiment 2 was to analyze the change in the estimated relative distance and orientation of the arm toward targets during a reaching task. In contrast, the aim of experiment 3 is to analyze the change in the estimated relative distance and orientation of approaching targets when the arm is fixed. It is not clear a priori whether the two experiments would give the same results; this experiment nevertheless aims at replicating the results of Graziano and Botvinick (2002) and Bremner and Andersen (2014) showing that the activity level of the parietal neurons depends on the position of the arm (proprioception) and the location of the object in the visual field.

For this experiment, we fix the arm with the motor angle at 30°. **Figure 12** shows the three starting points of the targets moving toward the robotic arm. The paths stay within the peripersonal space area and each trajectory ends with contact with the skin. We plot in **Figure 13** the estimated relative visual orientation in radians over time and in log-polar coordinates in the top and middle charts, respectively, as well as the estimated relative proximity to the arm in the bottom chart.

Figure 9 computed by the compound GF units from the arm- and target-centered RFs when the arm moves in the interval [0°, 100°]. The arrows' length corresponds to the amplitude level of the neurons, i.e., the proximity of the RF, and their orientation corresponds to the orientation. The color code represents the different motor positions. For better visualization, we have increased the length of the arrows by a factor of 10.

For the three paths, the relative visual orientation does not change when the target is distant from the arm, which corresponds to a low activity of the neurons (between 0.01 and 0.4). Bringing the targets closer to the arm, however, induces a more precise estimation of their orientation. Thus, in accordance with section 3.2, the precision of the orientation estimate increases as the distance to the arm decreases. This is also true for the estimation of the targets' direction: as seen in the middle charts, the big arrows, which correspond to the closest target positions, indicate the optimal direction for the targets to reach the arm.

For a better understanding, we present in **Figure 14** the changes of the spatial receptive fields and the corresponding relative visual orientation of arm-centered GF neurons in detail for trajectory A. The arm-centered RF is calculated by multiplying the arm prediction in the eye-centered RF by the target location in the eye-centered RF, and the relative visual orientation is taken from the argmax function; see Part 3 in section 2.2.4.3. In the beginning, the orientation hardly changes while the receptive field is homogeneous, as seen in the first four subplots. When the object is close to the arm, however, the orientation changes in correspondence with the more active neurons.

Once more, the analysis of the results shows that the representation of the target location with respect to the arm is dynamic and is constructed during the approach of the target toward the arm.

### 4. DISCUSSION

In this paper, we have proposed a brain-inspired model of multimodal neurons in the parietal cortex for the body representation of the Jaco robot arm and its peripersonal space. The neural model makes it possible to encode the location of the arm, the target and the relative distance between them in three different reference frames. This model is based on the integration of different modalities such as touch, vision and proprioception using the neural mechanism known as gain modulation, which performs multiplicative interaction between variables. Such a framework permits the dynamic coding of the body posture and targets in multiple coordinate systems even when the two systems are moving. This mechanism is particularly important for spatial interaction with objects and for solving spatial tasks online, e.g., tool use, manipulation, dynamic coordination, or interacting with someone else.

Before any target enters the peripersonal space of the robot, the arm and the target are coded in separate receptive fields: one receptive field centered on the artificial skin and another centered on the target in the visual space. As soon as the target enters the peripersonal space, the interaction between the two neural fields computes a resulting (mutually referential) receptive field, which makes it possible to estimate the relative distance and the relative visual orientation between the arm and the target. This behavior is similar to the one found in the parietal neurons recorded by Bremner and Andersen (2012) and Bremner and Andersen (2014) for reaching tasks and by Iriki et al. (2001) and Graziano and Botvinick (2002) for body image.

For instance, as soon as the robot moves toward or away from the target, the spatial receptive fields of the neurons change and so does the way targets are represented: in eye-centered, hand-centered or target-centered coordinates. Thanks to the multiplication between the neural fields, the spatial resolution anchored at the arm becomes proportional to the vicinity of the target. Such computation may ease motor control and also help to create a sense of spatial awareness around the body, which is useful for constructing a notion of agency (Pitti et al., 2009a,b), of self and of intersubjectivity (Murata et al., 2016; Pitti, 2017).

FIGURE 11 | Spatial receptive fields and relative visual orientation of target-centered GF neurons. The left charts correspond to the estimated proximity of the four target-centered GF neurons at fixed positions, respectively at locations (A–D). The y-axis corresponds to the amplitude level of the neurons, whereas the x-axis and the color code represent the different motor positions in [0°, 100°], normalized to [0, 1]. Location A is the nearest to the arm and location D is the farthest. Each target-centered neuron shows a different type of receptive field with respect to the distance to the arm. The higher the amplitude level, the closer the arm is to the target. When the amplitude level reaches 1, the target is above the arm in the visual space. The middle and right charts represent the estimated relative visual orientation between the target and the robot arm. The middle plot displays orientation vectors coming from equally spaced points along a horizontal axis. It expresses the orientation vector components relative to the origin of the respective orientation vector. The x-axis and the color code represent the different motor positions, the arrows' length corresponds to the amplitude level of the neurons, i.e., the proximity of the RF, and their orientation corresponds to the orientation. The y-axis represents the y components in relative coordinates. The right chart presents the same information in polar coordinates centered at the target location.

As the GF mechanism serves the encoding of dynamical events, its lability due to multiplicative interaction across heterogeneous events may be advantageous for the construction of an infant's plastic body image during development (Gliga and Dehaene-Lambertz, 2005; Marshall and Meltzoff, 2015; Bhatt et al., 2016) as well as for other cognitive tasks such as tool use and body extension (Iriki et al., 1996; Murata et al., 2016), perspective-taking to obtain person-centered viewpoints (Iriki et al., 2001; Murata et al., 2016), or during perceptual illusions such as the rubber hand illusion, in which mismatched visuo-tactile events lead to a confused body-centered representation (Botvinick and Cohen, 1998; Tsakiris et al., 2007).

In our previous research, we modeled visuo-tactile integration with neural networks using our artificial skin in order to study the rubber hand illusion, although we did not have motor information at that time (Pitti et al., 2017). We think it is theoretically possible to simulate it, as there would be a fast readaptation of the motor position to the seen visual position of the fake hand, as during the first phase of visuo-tactile-based learning in our experiment. The learning between visual and proprioceptive information would be fast because it would be actively modulated by tactile stimulation, as we proposed in **Figure 4**.

Regarding the integration of an external tool into the body image (see Iriki et al., 1996), we think the adaptation mechanism may also be similar to the first phase of the visuo-tactile based learning in our experiments. If we connect a tool to any tactile position on the artificial skin (a natural location would be the robot hand, if it has tactile sensors) and a target touches the tool, visuo-tactile integration will occur not on the skin surface but where the target is (at the tool location). We suggest that some 'tool' neurons may rapidly modify the third circuit in **Figure 3** to model the "target-in-tool" centered reference frame when the tool is in hand or, even better, that other maps similar to this third circuit may be created, each one specialized for a particular tool (Braud et al., 2018).

We think that our results are in line with observations showing that the peripersonal space increases when the subject is in motion (Noel et al., 2015; Bufacchi and Iannetti, 2018). Because gain-field neurons encode relative spatial information, they are effective either when objects are moving or when the body moves. Consequently, such a mechanism may describe well the spatial position of objects surrounding a body in motion. Since this construction is dynamic and depends on the context, peripersonal space remapping can work only within certain limits, and spatial estimation may change accordingly. For instance, while the body moves, speed integration might make it difficult to stabilize incoming signals. Our framework may explain how the third circuit in **Figure 3** can be remapped (as explained earlier for tool-use) to enlarge the peripersonal space to the new context.

A different prediction can be made on phantom limbs with the observation that many amputees are feeling their phantom limb moving (Ramachandran and Blakeslee, 1998). If we think that tactile, visual and proprioceptive information are missing but the circuits for spatial representations are still there as after the first phase of visuo-tactile based learning, we may simulate the position of the arm moving in different coordinate systems (HE or TH).

Moreover, we suggest that the Gain-Field mechanism strongly supports the body schema construction during development.

Many studies suggest that body knowledge emerges early in life and that the different modalities conspire to represent the body structure and nearby targets. Hock et al. found that infants as young as 3 months old are sensitive to the overall organization of body parts (see Zieber et al., 2015; Hock et al., 2016; Jubran et al., 2018). Meltzoff et al. (2018) report that the contralateral hand areas of the somatosensory cortex in 7-month-olds are active during contact with the hands, suggesting that neural structures represent the hands early in life. Bremner et al. (2008) showed how 9-month-olds use different strategies to perform reaching and grasping tasks by choosing the most effective modality (vision or touch) and RF.

Although we rely strongly on vision for representing space, tactile information greatly enhances the calibration of a multi-centered referential system by connecting the visual and the proprioceptive information. This aspect is often neglected in neural models of reaching and motor control, such as the ones proposed in Ajemian et al. (2001), Chang et al. (2009), Brayanov et al. (2012), Blohm (2012), and Ustun (2016), or only lightly emphasized, as in Andersen (1997) and Baraduc et al. (2001).

The same is true in robotics, where tactile information has only recently been taken into consideration. For instance, a color code (or a QR code) is often used to disambiguate between the target and the robot arm and to compute the relative distance between them. Robot architectures that take tactile information into account, by contrast, need a visual marker on the target only and can reconstruct the visual position of the arm from tactile information. In our model, the position of the arm in the visual field is calculated via visual neurons (Perceptron units) that conditionally fire for conjunctive tactile and motor patterns. By doing so, they integrate tactile, visual, and proprioceptive information so that after the learning phase, these visual units are able to predict the visual location of the arm even without tactile input, just from the motor angle, where the arm has been touched by the target. This result is particularly interesting for motor simulation, to anticipate contacts and to estimate the arm location even if the arm is occluded and in the dark.

FIGURE 14 | Spatial receptive fields and relative visual orientation of arm-centered GF neurons. Activity of arm-centered GF neurons relative to the target location for trajectory A.

For instance, one conclusion drawn by the Darpa Robotic Challenge is that all teams in the challenge failed to use aspects of the physical space to help their robots move (Atkeson et al., 2015). More contacts make tasks mechanically easier, but algorithmically more complicated. One full body artificial skin, however, is expected to be extremely useful as part of an early warning system to avoid errors and external disturbances.

Another use of tactile information is to ease motion control: as multiplicative neurons dynamically encode the location of objects relative to the robotic arm, the control task may be facilitated. The tactile sense may serve robots to perceive depth and calibrate the representation of the physical space relative to visual and motor modalities.
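The multiplicative encoding mentioned above can be illustrated with a minimal sketch. It assumes Gaussian population codes over values normalized to [0, 1], and it assumes that the normalized motor angle maps directly onto the visual position of the arm. The function names, the tuning width, the number of units, and the readout are illustrative assumptions and not the parameters of the network described in this paper.

```python
import math

def population_code(value, centers, width):
    """Gaussian tuning: activity of each unit for a scalar input."""
    return [math.exp(-0.5 * ((value - c) / width) ** 2) for c in centers]

def gain_field(visual_pos, motor_angle, centers, width=0.1):
    """Multiplicative interaction: one GF unit per (visual, motor) pair."""
    visual = population_code(visual_pos, centers, width)
    motor = population_code(motor_angle, centers, width)
    return [[v * m for m in motor] for v in visual]

def decode_offset(gf, centers):
    """Hypothetical readout: activity-weighted mean of (visual - motor) centers."""
    total = sum(sum(row) for row in gf)
    weighted = sum(
        gf[i][j] * (centers[i] - centers[j])
        for i in range(len(centers))
        for j in range(len(centers))
    )
    return weighted / total

# Assumed example values: target seen at 0.7, arm (motor angle) at 0.4
centers = [k / 10 for k in range(11)]   # preferred values in [0, 1]
gf = gain_field(0.7, 0.4, centers)
offset = decode_offset(gf, centers)     # approximates 0.7 - 0.4
```

Because each unit responds to a conjunction of a visual and a motor value, the same readout gives the target position relative to the arm whether the target or the arm has moved, which is the property that may simplify the control task.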

In our experiments, the camera was fixed and only the arm was moving. We think, however, that a moving camera can be integrated in the future. We did so partly in an earlier work based on audiovisual integration for eye-to-head change of reference frame with the head moving (Pitti et al., 2012). We think we can embed this feature using a network similar to the ones proposed in Andersen et al. (1985) and Salinas and Thier (2000) for visual and proprioceptive integration using GF neurons. Vestibular information can be useful as well, in line with evidence from neuroscience.

Because gain-field neurons encode relative spatial information, they are effective either when objects are moving or when the body moves. Consequently, our architecture may describe well the spatial position of objects when the body is in motion.

Although our experiments are currently performed in 2D space with a single degree of freedom, and without taking account of the object shape (its affordance), we do not see any constraints to extending this framework to 3D reaching and grasping. The orientation of the hand, depth perception and the object shape are known to be required for 3D grasping, and many results emphasize the role of gain modulation for these as well. For instance, Kakei and colleagues found that the control of the forearm muscles for pronation/supination is coded by parieto-motor neurons sensitive to visual directions (Kakei et al., 2003), as it is for arm motion. Experiments studying the hand orientation with oriented grippers also showed the importance of gain modulation for dynamically aligning the hand to the target orientation in the vertical plane (Baumann et al., 2009; Fluet et al., 2010) or for reaching objects aligned in various 3D orientations (Sakata et al., 1997; Murata et al., 2016). Furthermore, "depth neurons" have been found in the parietal cortex for the visual control of hand action (Rizzolatti et al., 1997; Sakata et al., 1997; Filippini et al., 2018). Sakata et al. (1997) suggested that depth movement is encoded from the associative interaction between size change and disparity change in the visual field, and Ferraina et al. (2009) proposed further that the GF mechanism supports the integration of hand movement depth for encoding of hand position and movement in 3D space.

Some recent robotic results show that it is possible to reconstruct the 3D information of objects (Eslami et al., 2018) or to estimate their physics through observation, without interaction and from huge amounts of visual data only (Yildirim et al., 2017). Despite these impressive results, we nonetheless believe that embodiment (that is, the sensorimotor information structure of agents) is mostly missing in these works, and that it is needed for an agent to construct a unified and amodal spatial representation of the body. In future work, we will attempt to extend our framework to 3D space, toward learning the affordances of objects and interacting with them.

### AUTHOR CONTRIBUTIONS

GP designed and made the experiments and wrote the paper. AP designed experiments and wrote the paper. OT and PG supervised the research.

### FUNDING

This research project was partially funded by the Chaire d'Excellence CNRS-UCP, the French consortium EQUIPEX ROBOTEX, and the project Labex MME-DII (ANR11-LBX-0023-01).

### SUPPLEMENTAL DATA

Some complementary videos of Experiments 1 and 2 are provided at: http://perso-etis.ensea.fr/alexpitt/PPS\_files.

### REFERENCES

Andersen, R. (1997). Multimodal integration for the representation of space in the posterior parietal cortex. Philos. Trans. R Soc. Lond. B Biol. Sci. 353, 1421–1428.


locations in parietal cortex. Nat. Neurosci. 8, 941–949. doi: 10.1038/nn1480


visuomotor transformations and embodied simulation. Neural Net. 62, 102–111. doi: 10.1016/j.neunet.2014.08.009


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Pugach, Pitti, Tolochko and Gaussier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Somatosensory Loss Influences the Adoption of Self-Centered Versus Decentered Perspectives

#### *Gabriel Arnold1,2 \*, Fabrice R. Sarlegna3 , Laura G. Fernandez2 and Malika Auvray2*

*1Caylar, Villebon-sur-Yvette, France, 2Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS UMR 7222, Sorbonne Université, Paris, France, 3Aix Marseille Univ, CNRS, ISM, Marseille, France*

#### *Edited by:*

*Eszter Somogyi, University of Portsmouth, United Kingdom*

#### *Reviewed by:*

*Michael Barnett-Cowan, University of Waterloo, Canada Francois Quesque, INSERM U1028 Centre de Recherche en Neurosciences de Lyon, France*

> *\*Correspondence: Gabriel Arnold gabriel.arnold@caylar.net*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 10 August 2018 Accepted: 12 February 2019 Published: 11 March 2019*

#### *Citation:*

*Arnold G, Sarlegna FR, Fernandez LG and Auvray M (2019) Somatosensory Loss Influences the Adoption of Self-Centered Versus Decentered Perspectives. Front. Psychol. 10:419. doi: 10.3389/fpsyg.2019.00419*

The body and the self are commonly experienced as forming a unity. Experiencing the external world as distinct from the self and the body strongly relies on adopting a single self-centered perspective which results in integrating multisensory sensations into one egocentric body-centered reference frame. Body posture and somatosensory representations have been reported to influence perception and specifically the reference frame relative to which multisensory sensations are coded. In the study reported here, we investigated the role of somatosensory and visual information in adopting self-centered and decentered spatial perspectives. Two deafferented patients who have neither tactile nor proprioceptive perception below the head and a group of age-matched control participants performed a graphesthesia task, consisting of the recognition of ambiguous letters (b, d, p, and q) drawn tactilely on head surfaces. To answer which letter was drawn, the participants can adopt either a self-centered perspective or a decentered one (i.e., centered on a body part or on an external location). The participants' responses can be used, in turn, to infer the way the left-right and top-bottom letters' axes are assigned with respect to the left-right and top-bottom axes of their body. In order to evaluate the influence of body posture, the ambiguous letters were drawn on the participants' forehead, left, and right surfaces of the head, with the head aligned or rotated in yaw relative to the trunk. In order to evaluate the role of external information, the participants completed the task with their eyes open in one session and closed in another one. The results obtained in control participants revealed that their preferred perspective varied with body posture but not with vision. Different results were obtained with the deafferented patients who overall do not show any significant effect of their body posture on their preferred perspective. 
This result suggests that the orientation of their self is not influenced by their physical body. There was an effect of vision for only one of the two patients. The deafferented patients rely on strategies that are more prone to interindividual differences, which highlights the crucial role of somatosensory information in adopting self-centered spatial perspectives.

Keywords: body and self, proprioception, spatial perspectives, somatosensory loss, individual differences

### INTRODUCTION

One of the key roles of bodily self-consciousness consists in experiencing the body and the self as forming a unity. The experiential self, the locus of our sensations and commands of action, is typically felt as being located within the body and as being delimited by the boundaries of the body (see De Vignemont, 2018, for analyses of the concepts of bodily self-awareness and bodily ownership). The self can also be subjectively located within one specific body part, predominantly the head or the trunk (Bertossa et al., 2008; Limanowski and Hecht, 2011; Alsmith and Longo, 2014). In addition, experiencing the external world as distinct from the self strongly relies on integrating external multisensory sensations (e.g., visual, auditory, tactile) and internal somatosensory sensations into one egocentric body-centered reference frame, which results in perceiving single objects as being located at specific locations relative to the body. For instance, the simultaneous appearance of a car in the left visual field and the hearing of an engine noise in the left auditory field commonly result in a unique percept of one single moving car located to the left of the body.

The multisensory integration of external and internal information into a common body-centered reference frame is thought to rely on the adoption of a single self-centered spatial perspective (Blanke, 2012; Arnold et al., 2017). However, in the same way as body posture influences the perception of visual, auditory, and tactile sensations, it can also influence the definition of the egocentric reference frame into which these perceptions are integrated (Harris et al., 2015). For instance, deciding whether an object is oriented with its top up and its bottom down, which can be called perceptual upright, requires integrating visual and somatosensory cues. This process has been reported to be influenced by full-body rotations in roll or in pitch relative to gravity (Dyde et al., 2006; see **Figure 1**, for definitions of body rotation axes). In addition, egocentric and allocentric judgments of verticality have been reported to rely on both visual and somatosensory cues, with, however, a greater weight given to body reference for egocentric judgments such as indicating the vertical axis of our own head (Barnett-Cowan and Harris, 2008).

When standing upright with the head straight ahead as in **Figure 1**, there is little ambiguity when deciding where the left, right, top, bottom, front, and back of the body are. However, ambiguities appear when the different body parts are not aligned. Misalignments of the head and trunk, such as when the head is rotated in yaw relative to the trunk or bent forward, create left-right or top-bottom ambiguities, which have been reported to bias perception. For instance, when the head is rotated in yaw relative to the trunk, localization judgments of tactile stimulation on the trunk are biased toward the direction of the head (Ho and Spence, 2007; Pritchett et al., 2012). Gaze orientation, which consists of the combination of head and eye orientation, has also been reported to bias touch localization (Harrar and Harris, 2010). This influence of body posture and specifically the influence of head or gaze orientation on touch can reflect the existence of a reference frame transformation, from a body to a visual reference frame. A gaze-based visual reference frame would be particularly adapted for multisensory integration during perception and action (Cohen and Andersen, 2002; Harris et al., 2015). The importance of reference frame transformation into a unified head-centered or gaze-centered reference frame also reflects the important role of the head in defining the self. The use of a unified head-centered perspective allows the observer to perceive a unified external world, distinct from the self (Arnold et al., 2017).

Regarding spatial perspectives, when judging whether an object is located to the left or to the right of another person who has an ambiguous body posture (i.e., head rotated in yaw relative to the trunk), the reference frame used to make left-right judgments has been reported to result from a weighted combination of the person's head and trunk reference frames (Alsmith et al., 2017). The influence of body posture on the location and orientation of the self (Alsmith and Longo, 2014; Alsmith et al., 2017) and on other spatial processes, such as mental rotation (Amorim et al., 2006) or perspective-taking (Kessler and Thomson, 2010; Arnold and Auvray, 2017), has indeed been described to reflect the involvement of embodied processes. According to this view, spatial cognition involves not only spatial representations but also motor and somatosensory representations of the body (Renault et al., 2018). More specifically, mentally displacing the self to adopt a decentered perspective would involve both a mental change in body posture and an emulation of the movements that would be necessary to physically place the body in a novel position and orientation.

Given the important role of somatosensory information and somatosensory representations in spatial cognition, what happens when bodily sensations are deficient? Previous studies have reported that somatosensory loss has profound consequences on spatial cognition in two rare cases of massive yet selective deafferentation. These two patients have lost proprioceptive and tactile afferents from below the neck (IW) and from the nose down (GL) due to a sensory neuropathy (for a more elaborate description of the patients, see Cole and Paillard, 1995; Miall et al., 2018). First, somatosensory loss has been reported to affect judgments of self-orientation as well as object orientation (Bringoux et al., 2016). For instance, to judge the orientation of external objects relative to gravity, GL is more influenced by the visual surroundings than controls in a classic rod-and-frame test (Oltman, 1968) in which participants have to align a rod with the gravitational vertical. In addition, contrary to controls, GL is insensitive to self-rotation in pitch relative to gravity up to 18°. Second, somatosensory loss has been reported to impact imagery processes (Ter Horst et al., 2012). Compared to controls, IW has impaired motor imagery but enhanced visual imagery performance in mental rotation tasks. For instance, when judging the orientation of seen corporeal objects (e.g., hands rotated in roll relative to gravity), contrary to controls, IW's mental rotation processes are not influenced by the orientation of his own hands, suggesting the use of a visual strategy, rather than a motor one. Taken together, these results show that deafferented patients differ from controls in spatial cognition, both with respect to the used perceptual cues (which is obvious considering the patients' somatosensory loss) and to the individual strategies that are involved (see also Renault et al., 2018).

to front), lateral (left to right), and vertical (foot to head) body axes, respectively. For rotations of specific body parts (e.g., the head), axes of rotation can be defined relative to the same reference frame but with an additional reference to the body part relative to which the moving body part rotates. For instance, turning the head toward the left shoulder when keeping an upright posture relative to gravity can be defined as rotating the head in yaw relative to the trunk.

In the present study, we investigated the role of somatosensory information and the impact of somatosensory loss when making spatial judgments directly relative to oneself. The graphesthesia task, which consists of recognizing tactile ambiguous letters (e.g., b, d, p, and q) drawn on the body surface, is an optimal tool to evaluate the spatial perspectives that are adopted to interpret tactile stimulation (Natsoulas and Dubanovski, 1964; Parsons and Shimojo, 1987; Sekiyama, 1991; Ferrè et al., 2014; for a review, see Arnold et al., 2017). When ambiguous letters are drawn on the body surface, different spatial perspectives can be adopted, either self-centered (i.e., centered on one body part) or decentered (i.e., centered on a location external to the body). The participants' responses can be used to infer the spatial perspective they have adopted. For instance, when the letter "b" is drawn on a participant's forehead (from the experimenter's viewpoint), the recognition of the letter "b" requires the participant to adopt the experimenter's perspective, hence a decentered perspective. However, if the participant adopts a self-centered perspective, centered on the forehead, the letter may be recognized as the mirror-reversed letter "d," as if the letter was mentally projected forward from the participant (see **Figure 2A**).

Individual differences in the adoption of spatial perspectives have been reported. Most people spontaneously adopt a self-centered perspective, whereas some people adopt a decentered one (approximately 20% for the latter, Arnold et al., 2016). The adoption of spatial perspectives is also influenced by the physical body posture (Natsoulas and Dubanovski, 1964). For instance, when ambiguous letters are drawn on the left and right sides of the head, self-centered and decentered perspectives are adopted equally often when the head is oriented looking forward in the same direction as the trunk. However, the adoption of self-centered versus decentered perspectives varies with the orientation of the head in yaw relative to the trunk. Indeed, when the head is rotated leftward or rightward (i.e., toward the left or right shoulder), the sides of the head are aligned with the front of the trunk and people mostly adopt a self-centered perspective (see **Figure 2B**). Taken together, these results can be interpreted as reflecting the role of both trunk and head orientations in spatially defining the self relative to the body (see also O'Brien and Auvray, 2016, for the role of hand orientation on the adopted perspective).

perspective whose origin is located inside the head. *Source:* Arnold et al., 2017. (B) Illustration of the results reported by Natsoulas and Dubanovski (1964) showing that the adoption of self-centered versus decentered perspectives on the sides of the head depends on the orientation of the head in yaw relative to the trunk. With an ambiguous posture (head and trunk misaligned), the left-right axis of the observer's egocentric reference frame may be assigned with respect to the head or the trunk. When a tactile letter is drawn on the side of the head, with such an ambiguous posture, the left-right axis of the letter is assigned with respect to the trunk. Figure 2A is reprinted from Consciousness and Cognition 56. Arnold, G., Spence, C., and Auvray, M.

The first aim of the present study was to evaluate the impact of somatosensory loss in adopting self-centered versus decentered perspectives. To do so, the performance of two well-characterized deafferented patients and 20 age-matched controls in the graphesthesia task was compared. Ambiguous letters (b, d, p, and q) were manually drawn on the participants' forehead, left side, and right side of the head, with the head aligned or rotated in yaw relative to the trunk. For control participants, previous work led us to expect that adopting one or the other perspective should be influenced by the orientation of the head in yaw relative to the trunk, specifically when the ambiguous letters are drawn on the sides of the head (Natsoulas and Dubanovski, 1964). More specifically, the self-centered perspective should be adopted more often on the left side when the head is oriented rightward rather than forward and on the right side when the head is oriented leftward rather than forward.

Following previous work on the impact of sensory loss on spatial cognition (Ter Horst et al., 2012; Bringoux et al., 2016), we hypothesized that, due to their massive sensory loss, the two deafferented patients' responses should be less influenced by their body posture than those of controls. However, as their locus of somatosensory loss differs (from the neck down and from the nose down), the two patients should differ in the influence of body posture on the adopted perspective. As the crucial manipulation in our experiment is the orientation of the head in yaw relative to the trunk, proprioception of the neck should play a specific role. For instance, neck proprioception has been reported to play a role in posture stability, allowing the central nervous system to consider misalignment between the head and trunk (Blouin et al., 2007). Consequently, as proprioception of the neck is preserved for IW but not for GL, IW should be more influenced by body posture than GL. Finally, considering that somatosensory sensation is crucial to perform egocentric judgments (Lackner, 1988; Bringoux et al., 2016), we also hypothesized that deafferented patients may preferentially rely on a decentered perspective. However, any preference in perspective is likely mediated by strategies developed as a function of individual characteristics (Arnold et al., 2017).

The second aim of the present study was to investigate the influence of visual information on the adoption of self-centered versus decentered perspectives to interpret tactile stimulation. Judgments of self-orientation rely on both visual and somatosensory cues (Dyde et al., 2006; Barnett-Cowan and Harris, 2008; Barnett-Cowan et al., 2010). In the graphesthesia task, adopting a decentered perspective can be considered as adopting the perspective of the experimenter who is drawing the tactile letter or more generally the perspective of another person who is facing the participant. The adoption of decentered perspectives has been reported to be influenced by the presence (Arnold et al., 2017) and position (Cohen and Lewin, 1986) of the experimenter. More generally, the presence of another person has been reported to influence to a large extent the tendency to adopt a decentered perspective (Tversky and Hard, 2009), even when the person is not relevant for the task (Quesque et al., 2018). In the present study, the participants completed the graphesthesia task both with their eyes open and with their eyes closed, that is, with or without seeing the experimenter. We expected the decentered perspective to be adopted more often when the eyes are open than when they are closed, in particular for the deafferented patients who are reported to rely more on visual information than control participants (Blouin et al., 1993; Bringoux et al., 2016).

### MATERIALS AND METHODS

### Participants

Two deafferented participants with severe somatosensory loss (GL, a 70-year-old woman; IW, a 65-year-old man) and 20 age-matched control participants (mean age = 68.2 years, range = 60–78; 10 men and 10 women) completed the experiment. To summarize their impairment, GL and IW suffered from an acute sensory neuronopathy when they were 31 and 19 years old, respectively. This resulted in the specific loss of large-diameter myelinated afferents. Since then, they have lost all somatosensory modalities (kinesthesia, tendon reflexes, touch, vibration, and pressure) in their body from the nose down for GL (trigeminal division 3) and from the neck down for IW (C3 root level). Small sensory fiber functions, such as pain and temperature perception, were not affected and neither were the motor nerves. The somatosensory loss is massive in these two patients and results in severe motor deficits: for instance, they both use a wheelchair and they are severely impaired in the absence of vision (Blouin et al., 1993; Sainburg et al., 1993; Miall et al., 2018). There was no significant difference in age between each deafferented patient and the control participants (z-score IW = −0.50; z-score GL = 0.28). None of the control participants reported having any neurological or sensorimotor disorder. This study was specifically reviewed and approved by the institutional review board of the ISIR, and it was conducted in accordance with its recommendations. All the participants gave their written informed consent in accordance with the Declaration of Helsinki. They were all naive to the purpose of the experiment.
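The two age z-scores can be cross-checked against each other with a few lines of arithmetic. The standard deviation of the control ages is not reported here, so the sketch below backs it out from IW's z-score; the resulting value is an inference from rounded numbers, not a reported statistic.

```python
control_mean_age = 68.2    # reported mean age of the controls
age_iw, z_iw = 65, -0.50   # reported age and z-score for IW
age_gl, z_gl = 70, 0.28    # reported age and z-score for GL

# z = (x - mean) / sd, so sd = (x - mean) / z (inferred, not reported)
implied_sd = (age_iw - control_mean_age) / z_iw

# Recompute GL's z-score with the SD implied by IW's z-score
z_gl_check = (age_gl - control_mean_age) / implied_sd
```

Under this assumption the implied standard deviation is about 6.4 years, and the recomputed z-score for GL agrees with the reported one to two decimal places.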

### Stimuli

The four ambiguous lowercase letters b, d, p, and q were manually drawn by the experimenter on the participants' head surfaces with a rubber-tipped stylus pen. The letters were drawn in one continuous stroke, beginning from the stem and ending with the loop. The letters were as close as possible to 5 × 5 cm in size. The experimenter was trained to draw the letters with a constant speed and pressure. The duration for tracing each letter was approximately 2 s. The letters were drawn on the center of the forehead and on the left and right temples. The tactile perception of the two deafferented participants was tested for these head surfaces before the experiment, and they both confirmed perceiving the letters correctly.

### Procedure

Each participant was comfortably seated on a chair during the experiment. On each trial, one of the four letters was drawn on one of the participant's head surfaces. The participants were instructed to verbally report the letter they perceived as spontaneously as possible. They were informed that each letter could be recognized in different ways, depending on how they assigned the left-right and top-bottom axes of the letter, and that there were consequently no correct or incorrect responses. The reported response was recorded by the experimenter before drawing the next letter.

The participants' head orientation in yaw relative to the trunk varied according to three different conditions: forward (i.e., aligned with the trunk), leftward (i.e., turned toward the left shoulder), and rightward (i.e., turned toward the right shoulder). For the leftward and rightward orientations, the participants were instructed to turn the head as close as possible to a 90° rotation in yaw, without feeling any discomfort. In practice, the participants' head was rotated around 60–70° in yaw relative to the trunk. The degree of head rotation was similar for the two patients and the controls and for the two directions of rotation (i.e., leftward and rightward). Throughout the experiment, the experimenter corrected the participants' head position if the rotation they performed did not match the one they achieved in the first set of four trials or if they performed head rotation in roll or in pitch. For each condition, the participants held their head rotated for four consecutive trials (i.e., approximately 20 s, corresponding to the tracing of the four letters plus the participants' answers). After this delay, the participants were asked to move their head to the next position. The experimenter frequently asked the participants about their fatigue or discomfort and encouraged them to take a break between two conditions whenever they felt tired. Note, however, that neither the control participants nor the patients reported neck fatigue due to the different head positions.

During the session with eyes closed, the participants were asked to close their eyes before turning the head and to keep their eyes closed during the four consecutive trials of each condition. However, they could open their eyes between two conditions. For some participants, head turning with eyes closed differed from head turning with eyes open, not only in yaw but also in roll or pitch. In these cases, the experimenter corrected the head position. The degree of head rotation was thus similar in the sessions with eyes closed and open.

### Design

The experiment was divided into two sessions, one with eyes open and the other with eyes closed. The two deafferented patients performed the graphesthesia task with eyes open first and then with eyes closed. To control for any order effect, half of the control participants began with eyes open, whereas the other half began with eyes closed. Each of the two sessions was divided into two blocks of 36 trials, with a short break in between, resulting in a total of 144 trials for the entire experiment. Note that, due to fatigue, IW did not complete the last block of the experiment (i.e., the second block of trials of the session with eyes closed; he thus completed a total of 108 trials out of 144). In each block of trials, there were nine conditions resulting from the combination of the three head surfaces and the three head orientations. Thus, in each of the two sessions, there were eight trials for each condition (two presentations of each of the four letters). In each block of trials, the four letters were drawn consecutively with the same head surface and head orientation. The order of the nine conditions (3 head surfaces × 3 head orientations) in one block and the order of the four letters for each of the nine conditions were randomized for each participant.
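The trial structure described above can be checked with a short sketch. The following Python code is only an illustration of the design, not the procedure actually used to generate the trial lists: the function name `build_schedule`, the dictionary keys, and the `seed` parameter are hypothetical, and the randomization method is an assumption.

```python
import random
from itertools import product

SURFACES = ["forehead", "left side", "right side"]
ORIENTATIONS = ["forward", "leftward", "rightward"]
LETTERS = ["b", "d", "p", "q"]
BLOCKS_PER_SESSION = 2


def build_schedule(session_order=("eyes open", "eyes closed"), seed=None):
    """Return one participant's list of trials (hypothetical helper).

    Within each block, the nine surface x orientation conditions are
    shuffled, and the four letters are shuffled within each condition,
    so that the four letters of a condition are drawn consecutively.
    """
    rng = random.Random(seed)
    schedule = []
    for session in session_order:
        for block in range(1, BLOCKS_PER_SESSION + 1):
            conditions = list(product(SURFACES, ORIENTATIONS))
            rng.shuffle(conditions)
            for surface, orientation in conditions:
                letters = LETTERS[:]
                rng.shuffle(letters)
                for letter in letters:
                    schedule.append({
                        "session": session,
                        "block": block,
                        "surface": surface,
                        "orientation": orientation,
                        "letter": letter,
                    })
    return schedule


schedule = build_schedule(seed=0)
print(len(schedule))  # 2 sessions x 2 blocks x 9 conditions x 4 letters
```

With this structure, each block contains 36 trials and each condition is presented eight times per session, which matches the counts given in the text.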

### Data Analysis

Each of the participants' responses was categorized as resulting from the adoption of a self-centered perspective (e.g., response d for the letter b from the experimenter's point of view) or a decentered one (e.g., response b for the letter b). The responses corresponding to vertical inversions (e.g., response p or q for the letter b) represented only 2.7% of trials overall. They were considered as errors and were excluded from subsequent analyses. After excluding the errors, the proportion of self-centered responses was computed for each participant and each condition. To compare the results of GL and IW with those of control participants, t-test comparisons of a single value to a population sample were used (Nougier et al., 1996; Sarlegna et al., 2010). 95% confidence intervals are also provided.
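The categorization rule described above can be sketched in a few lines of Python. This is an illustrative reading of the rule rather than the analysis code used in the study: the names `MIRROR`, `classify_response`, and `self_centered_proportion` are hypothetical, and letters are assumed to be coded from the experimenter's point of view, as in the examples given in the text.

```python
# Left-right mirror image of each letter (b <-> d, p <-> q).
MIRROR = {"b": "d", "d": "b", "p": "q", "q": "p"}


def classify_response(drawn, reported):
    """Classify one response given the letter drawn (experimenter's view)."""
    if reported == drawn:
        return "decentered"
    if reported == MIRROR[drawn]:
        return "self-centered"
    # Remaining cases involve a vertical inversion (e.g., p or q for b).
    return "error"


def self_centered_proportion(trials):
    """Percentage of self-centered responses after excluding errors.

    `trials` is an iterable of (drawn, reported) pairs.
    Returns None if every trial was an error.
    """
    labels = [classify_response(drawn, reported) for drawn, reported in trials]
    valid = [label for label in labels if label != "error"]
    if not valid:
        return None
    return 100.0 * valid.count("self-centered") / len(valid)


example = [("b", "d"), ("b", "b"), ("p", "q"), ("b", "q")]
print(self_centered_proportion(example))  # two of the three valid trials
```

In this example, the last trial is a vertical inversion and is dropped before the proportion is computed, so that the denominator counts only the three valid trials.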

### RESULTS

### Global Preferences for Self-Centered Versus Decentered Perspectives

All the participants, including the two deafferented patients, clearly felt the stimulation (b, d, p, or q) on their forehead and on the sides of their head. Most of the control participants' responses corresponded to the adoption of a self-centered perspective (68.3%, *SD* = 39.9). **Figure 3** represents the participants' global proportion of self-centered responses (median = 71.5%, *Q1* = 51.6%, *Q3* = 94.5%, min = 6.9%, max = 100.0%). It shows a clear bias toward the adoption of self-centered perspectives for control participants, albeit with substantial interindividual variability. Only five control participants reported decentered responses most of the time (i.e., on more than 50% of trials). Among them, only two participants reported more than 75% of decentered responses. Regarding the two deafferented patients' responses, GL reported a strong majority of self-centered responses (96.5%, *SD* = 7.2), whereas IW reported mostly decentered responses (83.3%, *SD* = 15.3). Their proportions of self-centered responses were both significantly different from the control participants' proportion (*t*(19) = 4.56, *p* < 0.001, *η*<sup>2</sup> = 0.523, for GL; *t*(19) = 8.01, *p* < 0.001, *η*<sup>2</sup> = 0.772, for IW) and beyond the 95% confidence interval of the control participants' proportion [95% *CI* = (55.4, 81.3)]. For control participants, the slightly greater proportion of self-centered responses in females (75.3%, *SD* = 25.6) than in males (61.3%, *SD* = 29.2) was not significant (*t*(18) = 1.14, *p* = 0.269, *η*<sup>2</sup> = 0.067).

### Effects of Body Posture and Vision

To evaluate the effect of body posture on the adoption of self-centered versus decentered perspectives in control participants, an ANOVA was conducted on the proportion of self-centered responses with orientation of the head (forward, leftward, rightward), stimulated surface (forehead, left side, right side), and vision (eyes open, eyes closed) as within-participant factors and order between eyes open and eyes closed as a between-participant factor. There was a significant effect of the stimulated surface [*F*(2,38) = 14.28, *p* < 0.001, *η*<sup>2</sup> = 0.429]. The proportion of self-centered responses was significantly greater for the forehead (mean = 84.1%, *SD* = 26.7) than for the two sides of the head [*F*(1,19) = 19.78, *p* < 0.001, *η*<sup>2</sup> = 0.510], with no significant differences [*F*(1,19) = 1.34, *p* = 0.262, *η*<sup>2</sup> = 0.066] between the left (mean = 62.8%, *SD* = 31.1) and right (mean = 58.1%, *SD* = 33.8) sides of the head.
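The effect sizes reported in this section are numerically consistent with the standard conversions *η*<sup>2</sup> = *t*<sup>2</sup>/(*t*<sup>2</sup> + *df*) for *t* statistics and *η*<sup>2</sup> = *F* · *df*<sub>1</sub>/(*F* · *df*<sub>1</sub> + *df*<sub>2</sub>) for *F* statistics. The text does not state which formula was used, so the following Python sketch rests on that assumption; the function names are hypothetical.

```python
def eta_squared_from_t(t, df):
    """Effect size from a t statistic: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


def eta_squared_from_f(f, df_effect, df_error):
    """Effect size from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


# Values taken from the statistics reported in the text.
print(round(eta_squared_from_t(4.56, 19), 3))      # GL versus controls
print(round(eta_squared_from_f(14.28, 2, 38), 3))  # effect of stimulated surface
print(round(eta_squared_from_f(19.78, 1, 19), 3))  # forehead versus sides
```

Under this assumption, the three calls return 0.523, 0.429, and 0.510, which are the values reported for the corresponding tests.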

Importantly, there was a significant interaction between the orientation of the head and the stimulated surface [*F*(4,76) = 8.87, *p* < 0.001, *η*<sup>2</sup> = 0.318]. **Figure 4** shows that the proportion of self-centered responses on the forehead was not influenced by the orientation of the head in yaw relative to the trunk (83.9%, *SD* = 27.4, for the head oriented forward; 83.3%, *SD* = 27.5, for the head oriented leftward; 85.0%, *SD* = 27.2, for the head oriented rightward). In contrast, the proportion of self-centered responses on the sides of the head was influenced by the orientation of the head in yaw relative to the trunk. When the head was oriented forward, the proportion of self-centered responses did not significantly differ from chance level (59.7%, *SD* = 33.8, *t*(19) = 1.28, *p* = 0.216, *η*<sup>2</sup> = 0.079, for the left side; 55.9%, *SD* = 36.6, *t*(19) < 1, *ns*, for the right side). When the head was oriented leftward, the proportion of self-centered responses on the right side (67.1%, *SD* = 32.5) was significantly greater than when the head was oriented forward [*F*(1,19) = 7.56, *p* < 0.05, *η*<sup>2</sup> = 0.285] and it became significantly greater than chance level [*t*(19) = 2.35, *p* < 0.05, *η*<sup>2</sup> = 0.225]. Finally, when the head was oriented rightward, the proportion of self-centered responses on the left side (70.1%, *SD* = 33.1) was significantly greater than when the head was oriented forward [*F*(1,19) = 5.98, *p* < 0.05, *η*<sup>2</sup> = 0.240] and it became significantly greater than 50% [*t*(19) = 2.72, *p* < 0.05, *η*<sup>2</sup> = 0.280]. The adoption of self-centered versus decentered perspectives in control participants was therefore influenced by the stimulated surface and the body posture. However, **Figure 4** also shows that there was substantial interindividual variability in the adopted perspective in every condition.

Regarding the deafferented patients, the adoption of self-centered versus decentered perspectives does not appear to be influenced by the orientation of the head relative to the trunk. GL was influenced neither by the orientation of the head nor by the stimulated surface, as she almost systematically adopted a self-centered perspective (see **Figure 4**). More specifically, when the sides of her head were stimulated, she adopted a self-centered perspective, whatever the orientation of her head. Contrary to control participants, for the forward orientation, her proportion of self-centered responses (100%) was well above chance level. It was significantly different from the control participants' proportion [*t*(19) = 5.67, *p* < 0.001, *η*<sup>2</sup> = 0.629] and beyond the 95% confidence interval of the control participants' proportion [95% *CI* = (42.2, 73.4)]. Thus, contrary to control participants, she adopted a self-centered perspective even when the tactile letters were drawn on a side surface of the head, which was not aligned with the front surface of the trunk.

IW's results reflect a preference for a decentered perspective. Consequently, **Figure 4** shows fewer self-centered responses for him than for GL and control participants. Even though his preference was not as strong and systematic as that of GL, the preference for a decentered perspective was clear for the forehead (93.8%), left (73.2%), and right (83.0%) sides of the head. Importantly, when the head was oriented forward, the proportion of self-centered responses (25.0%) on the sides of the head was significantly different from the control participants' proportion [*t*(19) = 4.41, *p* < 0.001, *η*<sup>2</sup> = 0.506] and was beyond the 95% confidence interval of the control participants' proportion [95% *CI* = (42.2, 73.4)]. Thus, contrary to control participants and similarly to GL, IW adopted a constant perspective across conditions even when the tactile letters were drawn on a side surface of the head, which was not aligned with the front of the trunk. Taken together, these results show that GL was clearly not influenced by the orientation of her head in yaw relative to her trunk when adopting a self-centered perspective, whereas IW adopted mostly a decentered perspective, with more variability than GL, but without showing the same pattern of responses as control participants.

Finally, there was no significant main effect of vision in control participants (68.9%, *SD* = 26.2, for eyes open; 67.8%, *SD* = 30.1, for eyes closed; *F*(1,19) < 1, *ns*), but there was a significant interaction between vision and the orientation of the head [*F*(1,19) = 3.76, *p* < 0.05, *η*<sup>2</sup> = 0.165]. This interaction showed a significant effect of vision only when the head was oriented leftward [*F*(1,19) = 5.66, *p* < 0.05, *η*<sup>2</sup> = 0.230], with a greater proportion of self-centered responses with eyes open (72.5%, *SD* = 26.2) than with eyes closed (66.7%, *SD* = 29.8), but not when the head was oriented forward [*F*(1,19) = 1.87, *p* = 0.188, *η*<sup>2</sup> = 0.089] or rightward [*F*(1,19) < 1, *ns*]. Regarding the deafferented patients, GL showed no effect of vision (95%, for eyes open; 97.2%, for eyes closed), whereas IW reported a slightly greater proportion of decentered responses when his eyes were open (86.1%) than when they were closed (80.6%). This difference of 5.5 percentage points is significantly different from the control participants' difference [*t*(19) = 2.63, *p* < 0.05, *η*<sup>2</sup> = 0.267] and beyond the 95% confidence interval of the control participants' vision effect [95% *CI* = (−6.4, 4.2)]. This result suggests a greater bias toward adopting the experimenter's perspective when the experimenter is visible than when he is not.

### DISCUSSION

The present study investigated the role of somatosensory and visual information in the adoption of self-centered versus decentered perspectives. Two deafferented patients (GL and IW) and 20 age-matched control participants performed the graphesthesia task with ambiguous symbols drawn on the forehead, left side, and right side of their head. The orientation of the head in yaw relative to the trunk and whether the eyes were open or closed were also manipulated to assess the influence of body posture and vision. Regarding control participants, the adoption of a self-centered versus decentered perspective depended on head orientation relative to the trunk. Regarding the deafferented patients, the orientation of the head in yaw relative to the trunk did not influence the adopted perspective, suggesting that somatosensory loss impacts self-orientation. Contrary to controls, deafferented patients adopted a self-centered or a decentered perspective even for side surfaces of the head which were not aligned with the front surface of the trunk. Finally, only IW showed a slight effect of vision, with a greater preference for a decentered perspective when the eyes were open than when they were closed, that is, when the experimenter was visible than when he was not. Neither the control participants nor GL showed a significant effect of vision.

Self-centered perspectives were adopted in controls for tactile letters drawn on the forehead or on side surfaces of the head which were aligned with the front surface of the trunk. In these conditions, the left-right axis of the tactile letter is aligned with the left-right axis of the head or the trunk. This result confirms the previously reported role of both head and trunk orientations in making spatial judgments relative to the body and the self (Natsoulas and Dubanovski, 1964; Alsmith et al., 2017). The head likely plays a specific role due to the presence of several sensory systems in this body part. The trunk may also be important due to its central place in the body. Head and limb orientation can thus be easily defined relative to the trunk. The transformation of multisensory reference frames into a unified body-centered reference frame, which allows the observer to adopt a unique self-centered perspective on the external world, perceived as being distinct from the self, strongly relies on somatosensory information (see Arnold et al., 2017). For instance, neck proprioception is important to consider the orientation of the head relative to the trunk.

The results obtained with the two deafferented patients clearly show that somatosensory loss impacts the spatial perspectives that are adopted to interpret ambiguous tactile stimulations. Contrary to control participants, the perspective adopted by the patients did not depend on the orientation of their head in yaw relative to their trunk. Although IW has access to proprioceptive information about his neck, which GL does not (Cole and Paillard, 1995), he was not strongly influenced by his head orientation in yaw. However, IW's access to neck proprioceptive information may explain why his results are more variable than GL's in the graphesthesia task. A possible explanation for this variability is that his global somatosensory loss does not encourage him to efficiently use his preserved neck proprioceptive information. It would be interesting to evaluate further the role of neck proprioception, for instance, with head rotation in roll or in pitch relative to the trunk. As the letters b, d, p, and q are ambiguous not only along the horizontal axis but also along the vertical axis, the graphesthesia task with head rotation in roll relative to the trunk would be particularly interesting, as it allows manipulating the vertical head axis relative to both the vertical trunk axis and gravity. With this manipulation, patients might be influenced more by gravity than by body posture, compared to controls.

Our results do not support the hypothesis that deafferented patients rely mostly on a decentered perspective due to a deficit in adopting an egocentric reference frame. While IW mainly adopted a decentered perspective, GL clearly preferred a self-centered one. IW seemed to adopt a strategy based on external information, which consisted in taking the experimenter's perspective, and he confirmed this strategy during a debriefing following the experimental session. IW's strategy, which relies on imagining how the letter could be seen by the experimenter, is compatible with his great reliance on visual imagery (Ter Horst et al., 2012). In contrast, GL adopted a more internal strategy, with a systematic choice for a head-centered perspective. Note, however, that GL's internal strategy may also be visual, as she indicated during the post-experiment debriefing that she had mentally projected the letters outside her body, in front of her eyes. Thus, GL's systematic adoption of a self-centered perspective is compatible with her previously reported dependence on visual information (Bringoux et al., 2016).

Using the graphesthesia task, Ferrè et al. (2014) have shown that self-centered perspectives are mostly adopted when the processes anchoring the self to the body are reinforced, highlighting the important role the body plays in the sense of self. The massive somatosensory loss in deafferented patients has the consequence that the self is less anchored to the body. Thus, self-orientation may rely more on external information, with an important visual dominance, and it may involve more cognitive strategies. During the debriefing, the two patients indicated that they had chosen a given perspective and kept it during the entire experiment. Such use of a cognitive strategy may explain why their responses were less variable than those of control participants. It might be the case that when the information coming from the body is no longer accessible, the sense of bodily self is more thought than felt. This view is consistent with the hypothesis that the body schema, involving a set of motor abilities and habits that enable movements and the maintenance of body posture, is deficient in deafferented patients, whereas the body image, which consists of a set of intentional states and mental representations of one's own body, is preserved (Gallagher and Cole, 1995).

It remains to be understood why the two patients adopted such different individual strategies. A recent study with these two deafferented patients, investigating their ability to develop and use spatial maps, suggests that individual differences, and thus strategies, may influence their spatial cognition even more than visual or somatosensory signals (Renault et al., 2018). Studies using the graphesthesia task have indicated several perceptual, cognitive, personal, and interpersonal factors that induce individual differences in the adoption of self-centered versus decentered perspectives (see Arnold et al., 2017, for a review). Differences between the two patients may be explained by gender. Males have been reported to adopt decentered perspectives more often than females (Krech and Crutchfield, 1958; Duke, 1966; Deroualle et al., 2017; but see Allen and Rudy, 1970). However, the control participants' results, similar for the two genders in the present study, do not confirm this gender effect. Some of the two patients' personality traits may have induced differences in their choice to adopt a self-centered versus a decentered perspective. More generally, the results of both controls and patients in the graphesthesia task show substantial interindividual variability in the perspective that was overall adopted as well as in the influence of body posture. These results highlight the existence of high-level cognitive processes such as decision criteria or consistency bias, in addition to the lower level perceptual and spatial processes underlying the task. Whereas the latter are influenced by somatosensory information, the former might be similar in deafferented patients and in controls.

To conclude, the present study confirms and extends the previously reported influence of head and trunk orientations in making spatial judgments relative to the body and the self (Natsoulas and Dubanovski, 1964; Alsmith et al., 2017). This result highlights the important role the body plays in perception and self-consciousness. Adopting a self-centered perspective, which is crucial for the multisensory integration underlying self-consciousness, or a decentered one, which is crucial to understand how the world is perceived by other persons, both involve processes that are anchored to the body. When internal information coming from the body is lacking, more cognitive strategies are adopted, based on thinking about the body rather than on feeling it.

### AUTHOR CONTRIBUTIONS

GA, FS, LF, and MA designed the experiment, performed the statistical analyses, and wrote the paper. GA and LF conducted the experiments.

### FUNDING

This work was supported by the Labex SMART (ANR-11-IDEX-0004-02) and the Mission pour l'Interdisciplinarité (CNRS, Auton, Subilma Grant).

### ACKNOWLEDGMENTS

We thank GL, IW, and the control participants for participating in the study.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Arnold, Sarlegna, Fernandez and Auvray. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Self-Unity as Ground Zero of Learning and Development

#### *Philippe Rochat\**

*Department of Psychology, Emory University, Atlanta, GA, United States*

Contrary to the suggestion that we are born in a state of confusion and primordial state of a-dualism with the environment, infancy research of the past 40 years shows that from the outset, infants are objective perceivers guided by rich evolved survival values of approach and avoidance in relation to specific resources in the environment such as faces, food, or smell. This starting-state competence drives and organizes their behavior. Evidence-based ascription of self-unity at birth is discussed. Selected findings are presented suggesting that self-unity is a primordial human experience, the main organizer of behavior from the outset. Self-unity is the necessary ground zero enabling the rapid learning and development taking place early in human life.

#### *Edited by:*

*Pablo Lanillos, Technische Universität München, Germany*

#### *Reviewed by:*

*Daniela Corbetta, The University of Tennessee, Knoxville, United States; Norbert Zmyj, Technical University Dortmund, Germany; Stephan Alexander Verschoor, Leiden University, Netherlands*

> *\*Correspondence: Philippe Rochat psypr@emory.edu*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 10 August 2018 Accepted: 11 February 2019 Published: 28 March 2019*

#### *Citation:*

*Rochat P (2019) Self-Unity as Ground Zero of Learning and Development. Front. Psychol. 10:414. doi: 10.3389/fpsyg.2019.00414*

Keywords: self, self-unity, development, infancy, early cognition, self-awareness

Are we born disorganized and in need of building an awareness of the self as an organized entity among other entities? Or, on the contrary, are we born with experiential self-unity and awareness that with maturation and experience become conceptual? Much progress in infancy research of the past four decades suggests that the latter is most likely and in particular that without an initial sense of self-unity, infants would and could not develop the way they do.

Self-unity as the embodied sense of self as an *organized and differentiated entity among other entities* is ground zero of learning and development. This is true for both empirical and common sense reasons. Without such experiential unity at the origins of development, it is difficult to conceive how consciousness in general might develop, and in particular, how self-consciousness could develop the way it is described by current child studies, emerging from around 18 months of life with social emotions like embarrassment or shame (Rochat, 2009).

The driving argument here is that learning and development early in life and beyond would rest on a primordial and necessary sense of self-unity. The question is not anymore whether such experiential unity exists from the get-go, but rather what it is made of and how it manifests itself early in the life of the individual. This, I would assume, could represent important grounding information for designers of complex artificial learning systems trying to mimic human children in their rapid and rather canalized development as this article tries to show.

## ASCRIBING EXPERIENTIAL SELF-UNITY AT BIRTH

Immanuel Kant over two centuries ago already proposed that the sense of an embodied unity is a primordial foundation of being phenomenally conscious about something. According to Kant in his *Critique of Pure Reason* (Kant, 1781/2007), impinging sensations from the world, including sensations from one's own body as an entity among other entities in the world, become synthesized into patterns of representations eventually forming higher concepts (Brook, 1994). Current infancy research demonstrates that infants from birth do manifest unity in a Kantian sense. In particular, research shows that from birth on, infants are responsive to more than discrete, isolated sensations. From birth, they differentiate sensations that originate either from within or without the body (Rochat, 2011).

Infants are born objective perceivers and actors, not simply reflex machines (Rochat and Senders, 1991; Rochat, 2001). An abundance of recent empirical evidence calls for radical revisions of strongly held beliefs and premises from which highly influential theories were built. Newborns display much more than reflexes (Piaget, 1936), a-dualism (James, 1890), or blind auto-eroticism and primary narcissism (Freud, 1905/2000). From the get-go, they behave as differentiated and organized embodied entities among other entities. We are not born in a primordial state of un-differentiation or confusion with the environment (see Rochat, 2011 for further discussion).

It appears that newborns are not just bombarded by meaningless sensory stimulations. If that were the case, we would expect newborns' behavior to be fundamentally disoriented, a mere collection of responses that would jerk them around in a disorganized manner. Ample evidence demonstrates that this is not the case (Rochat, 2001). Newborns learn and actively explore their environment, even showing evidence that pre-natal experience and learning are transferred into post-natal life (Prechtl, 1984; Hepper, 2002; Hata et al., 2010).

In recent years, researchers have established striking evidence demonstrating, for example, that newborns only a few hours old show an active preference for hearing their mother's voice compared to another female voice (DeCasper and Fifer, 1980), or that they tend to orient toward the scent of their mother's amniotic fluid experienced in the womb compared to the scent of the amniotic fluid of a female stranger (Marlier et al., 1998a,b). Newborns transfer prenatal experience and learning into postnatal life. They memorize and recall procedural knowledge over time, orienting head and mouth significantly more when, for example, the stimulation is food or any event associated with food and comfort (faces, posture, or certain tastes as well as smells, e.g., Marlier et al., 1998a).

In short, newborns' behavior shows plasticity and is not limited to the here and now of random stimulation but includes systematic self-exploration. As a case in point, Van der Meer and Lee (1995) demonstrated that neonates engage in systematic exploration of their own arms and hands when plunged in the dark with just a thin beam of light cutting across their visual field. These findings, among many others (see Rochat, 2001), point to an experiential awareness from the outset that is organized within a stable spatial and temporal organization.

### BODY SCHEMA AT BIRTH

In relation to the body as a whole, hand-mouth coordination systematically associated with the engagement of the feeding system, as in the case of a drop of sucrose placed on the tongue of the infant (Blass et al., 1989), is in itself suggestive that newborns do possess rudiments of a body schema (Gallagher and Meltzoff, 1996; see also Butterworth, 1992 for a similar argument). This primitive body schema is not rigid, changing and being re-calibrated as a function of rapid motor and postural progress in the weeks following birth (e.g., developing use of hands to reach, grasp, and explore objects). The organized behavior expressed in hand-mouth coordination implies some mapping of the body whereby regions and parts of one's own body are actively and systematically (as opposed to just randomly) put in contact with each other, in this case hands and mouth with a straight and orchestrated spatiotemporal trajectory. Hand-mouth coordination is also well documented in fetuses. Already during the last trimester of gestation (Hata et al., 2010), hands and mouth move in an organized and coordinated fashion, following predictable spatiotemporal patterns with signs of motor anticipation (i.e., mouth opening in anticipation of manual contact with the mouth, without any visual guidance; see also Butterworth and Hopkins, 1988).

More recent observations vindicate the existence of a body schema at birth and in the first week of life. Filippetti et al. (2013) observe that healthy newborns aged between 12 and 100 h, presented visually with a pair of faces of another infant being stroked with a brush, prefer to look at the face touched in perfect synchrony with strokes applied by an experimenter on their own cheek. Most striking is the fact that this significant preference vanishes when the two faces of the other infant being stroked are inverted by 180 degrees (i.e., upside down presentation). These findings demonstrate that newborns detect multisensory (i.e., visual-tactile) synchrony, but only to the extent that it is related to their own body schema (canonical right side up face orientation). These observations show that infants from birth do engage in body perception guided by a canonical spatial representation of their own body, i.e., a body schema (Filippetti et al., 2013).

Other data using novel experimental paradigms further support the idea of early body perception, particularly evidence of an interoceptive sensitivity. Maister et al. (2017) observe that 5-month-old infants prefer to look at an animated character that moves on a screen out of synchrony with their own heartbeat, when presented side by side with another character moving in exact synchrony with their own heartbeat. Interestingly, infants who demonstrated the strongest visual preference were also those showing brain (EEG) signals that correspond to the heart evoked potential typically reported in adult studies. Maister et al. also report that infants' interoceptive sensitivity is particularly salient when infants are presented with animated characters displaying negative emotions, which presumably increases their autonomic cardiac response.

Meltzoff et al. (2018) report new electroencephalographic data collected on 60-day-old infants demonstrating that the neural representations of tactile stimulations applied on different parts of the infant's body are topographically analogous to the well-documented somatosensory cortex organization of adults. These data further support the idea of an organized body schema from the outset of development, or at least early in the first weeks following birth (i.e., 2 months, Meltzoff et al., 2018). Finally, although remaining controversial, evidence of neonatal imitation would be another expression of an implicit body awareness and representation (body schema) whereby the sight of active bodily regions in another person (the model) is mapped onto homologous regions of one's own body (Meltzoff and Moore, 1977).

In all, body schema and the active propensity of neonates to bring sense modalities and regions of their own body in relation to each other are now well documented. This, in itself, supports the idea that infants sense their own body from birth as an invariant spatial structure, albeit rudimentary and in need of further refinement. This structure is obviously not Euclidean in the sense of not synthesized (represented) in the mind of the young infant as a precise map of accurate spatial coordinates and configurations. It does not yet entail that the infant already has a re-cognizable image of her own body (a body image). This structure is essentially topological in the sense that it is made of focal attractor regions on the body surface that have great degrees of freedom and a high concentration of sensory receptors such as mouth and fingers. This topology is embodied in action systems that are functional from birth and drive early behavior.

### IMPLICIT SELF-AWARENESS IN NEONATES

Evidence of a body schema at birth provides some theoretical ground for the ascription of implicit self-awareness from the outset (Rochat, 2009, 2011). Neonates behave in relation to their own body in ways that differ from how they behave in relation to other physical bodies that exist independently of their own (Lee and Aronson, 1974; Butterworth and Hicks, 1977; Jouen and Gapenne, 1995). They feel and demonstrate from birth a distinct sensitivity to their own bodily movements *via* proprioception and internal (vestibular) receptors in the inner ears. New data also demonstrate that newborn perception can be modulated by a sensitivity to their own heartbeat (Maister et al., 2017). Interoceptive, proprioceptive, and vestibular sensitivities are well developed and operational at birth. They are sense modalities of the self par excellence.

As an expression of self-world discrimination, neonates root significantly more with head and mouth toward a tactile stimulation from someone else's finger than from their own hand touching their cheek (Rochat and Hespos, 1997). Rather than being in a state of fusion and confusion with the environment, infants only a few hours old pick up visual information that specifies movements of their own body or ego-motion while they in fact remain stationary. Like adults sitting in a stationary train while watching another train moving, neonates experience the illusion of moving. Research demonstrates that, like us, they adjust their bodily posture according to changes in direction of an optical flow that is presented in the periphery of their visual field (Jouen and Gapenne, 1995). These kinds of observations point to the fact that from birth, infants are endowed with the perceptual, qua inter-modal capacity to pick up and process *self-specifying* information (Butterworth, 1992; Rochat, 2001).

Neonates experience the body as an invariant locus of pleasure and pain, with a particular topography of hedonic attractors, the mouth region being the most powerful of all, as noted by Freud years ago. Within hours after birth, in relation to this topography, infants learn and memorize sensory events that are associated with pleasure and novelty: they selectively orient to odors associated with the pleasure of feeding and they show basic discrimination of what can be expected from familiar events that unfold over time and that are situated in a space that is embodied, structured within a body schema. But if it is legitimate to posit an a-priori "embodied" spatial and temporal organization of self-experience at birth, what might be the content of this experience aside from pleasure, pain, and the excitement of novelty?

The proprioceptive sense of the body is, from birth on, a necessary correlate of most sensory experiences of the world. As proposed by Gibson (1979), to perceive the world is to *co-perceive* oneself in this world. In this process, proprioception or the muscular and skeletal sense of the body in reference to *itself* is indeed the sense modality of the self. From birth, proprioception alone or in conjunction with other sense modalities specifies the own body as a differentiated, situated, and eventually also agent entity among other entities in the world. This corresponds to what Neisser (1988, 1991) first coined as the "ecological self," a self that can be ascribed to infants from birth. As pointed out by Neisser (1995), criteria for the ascription of an ecological self rest on the behavioral expression by the individual of both an awareness of the environment in terms of a layout with particular affordances for action and an awareness of the own body as a motivated agent to explore, detect, and use these affordances (Neisser, 1995; Rochat, 2011).

Newborns appear to meet the criteria for such awareness. They also seem to possess an a-priori awareness that their own body is a distinct entity that is bounded and substantial, as opposed to disorganized and "airy" (Rochat, 2001, 2011, 2012). Immediately after birth, infants perform self-oriented acts by systematically bringing hand to mouth, as already mentioned. In these acts, the mouth tends to open in anticipation of manual contact and the insertion of fingers into the oral cavity for chewing and sucking (Blass et al., 1989; Watson, 1995; Rochat, 2011). What is instantiated in such systematic acts is what would amount to an *organized body schema* (Rochat, 2012). These acts are not just random and cannot be reduced to reflex arcs. Hand and mouth are coordinated and not automatically triggered. It is a systematically orchestrated activity oriented toward an oral goal. It constitutes an open-looped and flexible system in contradistinction to the basic constitution of reflexes that are triggered and automatic, fundamentally closed-loop systems.

Hand-mouth coordination in neonates needs to be construed as functionally self-oriented acts proper. Because they bring body parts in direct relation to one another, as in the case of hand-mouth coordination, they provide neonates with invariant sensory information specifying the own body's quality as bounded substance, with an inside and an outside, specified by particular texture, solidity, temperature, elasticity, taste, and smell.

As discussed in previous works on the origins of self-perception and self-consciousness (Rochat, 2011, 2012), the a-priori awareness of the own body as a bounded substantial entity is evident in neonates' postural reaction and gestures when, for example, experiencing the impending collision with a looming visual object, an event that carries potentially life-threatening information. In a classic study performed years ago, it was reported that neonates aged 2–11 weeks manifest head withdrawal and avoidant behavior when exposed to the explosive expansion of an optic array that specifies the impending collision of an object. When viewing expanding shadows specifying an object either receding or on a miss path in relation to them, infants do not seem to manifest any signs of upset or avoidant behavior (Ball and Tronick, 1971). In a follow-up experiment, Carroll and Gibson (1981) report that 3-month olds facing a looming object with a large aperture do not show signs of avoidant behavior. Rather, they are reported leaning forward as if they wanted to look through the aperture. These observations indicate that very early on infants manifest what seems to be an a-priori awareness of their own body as substantial: a unified entity among other entities occupying space, thus potential obstacles and source of collisions.

### CONCLUSION: SELF-UNITY AND DEVELOPMENT

I tried to show that behavioral research over the past 40 years and current studies based on novel physiological and behavioral recording techniques (Maister et al., 2017; Meltzoff et al., 2018) demonstrate that the human neonate has rudiments of an experiential self-awareness that has unity, this unity justifying ascription of an implicit and embodied self-unity at birth, in other words the ascription of a minimal and necessary self-awareness from the outset of development.

In relation to development, the question is not how we eventually become mindful and self-aware from a starting state of confusion. It is not how we eventually become endowed with a strong mind pulling out of a primitive state of computational weakness, non-differentiation, and selflessness. Rather, based on what we now know about neonates, the question is how the implicit awareness of the embodied self expressed already at birth comes to be also explicit and conceptual by the second year, as children become self-conscious proper. How does the experiential *I* eventually become the conceptual *Me*, and what might drive such development?

That is the perennial question of developmental psychology that not only infant and child researchers but also evolutionary and comparative psychologists keep tackling on all fronts (see Rochat, 2018). This effort is based on a new generation of behavioral paradigms trying to capture self-consciousness in human ontogeny, using as proxies, for example, the first physiological signs of embarrassment (Lewis et al., 1989) and the sense of being potentially evaluated by others (emergence of evaluative audience perception – EAP, see Botto and Rochat, 2018).

In recent years, developmental cognitive neuroscience research has yielded new neural markers of experiential awareness at birth, and even during the fetal stages of development. For example, first evidence of consciousness might be correlated with the development of functional neural pathways that link thalamus and sensory cortex already by the third trimester of gestation, or even earlier with the emergence of functional pathways necessarily involved in conscious pain perception (Lee et al., 2005). If there is a renewed effort in mapping pre- and postnatal brain growth, using neural markers that would correlate with levels of consciousness achieved by children in their development, we are still far from explaining the actual mechanisms that would drive such development. If there is a positive correlation between brain growth and levels of consciousness, including levels of embodied self-consciousness achieved by the child (see Zelazo et al., 2007), we are still far from a causal explanation.

The argument of unity and selfhood at birth rests on the idea that the development of self-awareness, from the implicit *I* to the conceptual *Me*, presupposes a representation to begin with, what Zelazo (2004) labels "minimal consciousness" in his model of consciousness development. It is this minimal "embodied" consciousness in the newborn that I tried to emphasize in this article. However, aside from the empirically informed depiction of a starting state awareness and the distinction between various levels of experiential awareness and representation expressed by children in their development, the question of what might be the causes or developmental triggers of processes such as the spontaneous representational re-description mechanism proposed some years ago by Annette Karmiloff-Smith (1992) remains wide open. This is particularly true in light of the fact that such a process appears to exist prior to language, which is often considered the major determinant of reflexive consciousness and meta-cognitive capacities, what Vygotsky (1978) viewed as internalized thinking derived from language acquisition.

Language and its progressive mastery do certainly play a causal role in the development of new explicit levels of consciousness. We do not have to assume that language shapes the mind to recognize that language use by the child in interaction with scaffolding others, and its progressive mastery, does unquestionably contribute to children's reaching new levels of abstraction and representational re-description. But to a large extent, we are still very much agnostic as to what might trigger such re-description prior to language and what might lead infants in particular to re-describe their starting-state unity and sense of selfhood to eventually become explicit and conceptual about it. We can assume, however, that from the outset, social interactions with more advanced and linguistically competent others play a central role in infants' advances toward more abstract levels of embodied self-awareness (Vygotsky, 1978; Tomasello, 2019).

These developmental issues form a challenge that is worth embracing because the way children develop and what develops in their experience of the world, including their own body, can reveal much of the building blocks and layers of human consciousness in general and human self-consciousness in particular (Rochat, 2003).

Those designing and building learning machines could gain from evidence regarding the self-unifying starting state of newborns, the "ground zero" of rapid learning and development in infancy and beyond.

### AUTHOR'S NOTE

This article is a novel elaboration and synthesis of similar ideas developed previously (e.g., Rochat, 2001, 2011, 2012, 2016, see reference list).

### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

### REFERENCES

Rochat, P. (2001). *The infant's world*. Cambridge, MA: Harvard University Press.

Rochat, P. (2003). Five levels of self-awareness as they unfold early in life.

Rochat, P. (2012). "Primordial sense of an embodied self-unity" in *Early development of body representations, Cambridge studies in cognitive and perceptual development*. eds. V. Slaughter and C. Brownell (N.Y.: Cambridge University Press).

Tomasello, M. (2019). *Becoming human*. Cambridge, MA: Harvard University Press.

Van der Meer, A., and Lee, D. (1995). The functional significance of arm movements in neonates. *Science* 267, 693–695. doi: 10.1126/science.7839147

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Rochat. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# How Cognitive Models of Human Body Experience Might Push Robotics

#### Tim Schürmann<sup>1</sup>\*, Betty Jo Mohler<sup>2</sup>, Jan Peters<sup>3,4</sup> and Philipp Beckerle<sup>5,6</sup>

<sup>1</sup> Work and Engineering Psychology Research Group, Human Sciences, Technische Universität Darmstadt, Darmstadt, Germany, <sup>2</sup> Amazon, Tübingen, Germany, <sup>3</sup> Intelligent Autonomous Systems Group, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany, <sup>4</sup> Max Planck Institute for Intelligent Systems, Tübingen, Germany, <sup>5</sup> Elastic Lightweight Robotics, Department of Electrical Engineering and Information Technology, Robotics Research Institute, Technische Universität Dortmund, Dortmund, Germany, <sup>6</sup> Institute for Mechatronic Systems, Mechanical Engineering, Technische Universität Darmstadt, Darmstadt, Germany

In the last decades, cognitive models of multisensory integration in human beings have been developed and applied to model human body experience. Recent research indicates that Bayesian and connectionist models might push developments in various branches of robotics: assistive robotic devices might adapt to their human users aiming at increased device embodiment, e.g., in prosthetics, and humanoid robots could be endowed with human-like capabilities regarding their surrounding space, e.g., by keeping safe or socially appropriate distances to other agents. In this perspective paper, we review cognitive models that aim to approximate the process of human sensorimotor behavior generation, discuss their challenges and potentials in robotics, and give an overview of existing approaches. While model accuracy is still subject to improvement, human-inspired cognitive models support the understanding of how the modulating factors of human body experience are blended. Implementing the resulting insights in adaptive and learning control algorithms could help to tailor assistive devices to their user's individual body experience. Humanoid robots that develop their own body schema could consider this body knowledge in control and learn to optimize their physical interaction with humans and their environment. Cognitive body experience models should be improved in accuracy and online capabilities to achieve these ambitious goals, which would foster human-centered directions in various fields of robotics.

#### Edited by:

Pablo Lanillos, Technische Universität München, Germany

#### Reviewed by:

Anja Kristina Philippsen, Center for Information and Neural Networks (CiNet), Japan; Guido Schillaci, Humboldt-Universität zu Berlin, Germany

#### \*Correspondence:

Tim Schürmann schuermann@psychologie.tu-darmstadt.de

Received: 30 October 2018 Accepted: 21 March 2019 Published: 11 April 2019

#### Citation:

Schürmann T, Mohler BJ, Peters J and Beckerle P (2019) How Cognitive Models of Human Body Experience Might Push Robotics. Front. Neurorobot. 13:14. doi: 10.3389/fnbot.2019.00014

Keywords: cognitive models, human body experience, multisensory integration, robotics, assistive devices, humanoids

### 1. INTRODUCTION

Multisensory integration is a key cognitive function for human body experience (Giummarra et al., 2008; Christ and Reiner, 2014) and cognitive modeling research suggests that it is performed in a Bayesian manner (Deneve and Pouget, 2004; Körding et al., 2007; Orbán and Wolpert, 2011; Clark, 2013). Sun (2008) defines cognitive models as computational models relating to one or multiple cognitive domains or functionalities. While this model class is occasionally referred to as computational models, the authors rely on the term "cognitive models" to reduce ambiguity with relation to Marr's (1982) computational level of analysis, to which cognitive models need not be limited. Cognitive models of the aforementioned integration processes consider sensorimotor precision with respect to the corresponding individual modalities (Berniker and Körding, 2011) and can determine posterior estimates based on prior knowledge and sensory information.

From the authors' perspective, modeling and simulating multisensory integration mathematically could potentially help to endow (humanoid) robots with more human-like capabilities and improve scenarios with tight physical human-robot interaction, e.g., in assistive devices. The increased interest and progress made toward such capabilities has stimulated research in this direction, so that we can draw on a variety of works on robotic self-perception (Sturm et al., 2009; Ulbrich et al., 2009; Lanillos et al., 2017; Lanillos and Cheng, 2018), reviews analyzing connections between human body experience and robotics (Hoffmann et al., 2010; Schillaci et al., 2016; Beckerle et al., 2017) as well as recent works that propose cognitive models of bodily illusions using Bayesian approaches (Samad et al., 2015). Such illusions rely on targeted modulations of multisensory stimulation and make participants perceive artificial limbs as their own (Botvinick and Cohen, 1998; Giummarra et al., 2008; Christ and Reiner, 2014).

Obviously, such effects are of utmost interest for assistive robotics since exploiting them by means of control could help to integrate such devices into their user's body schema (Ehrsson et al., 2008; Christ and Reiner, 2014; Beckerle et al., 2017). Moreover, the body schema is directly connected to the sense of agency (Longo et al., 2008; Kannape et al., 2010), i.e., the feeling of having control over one's own body. In assistive robotics, it is important to account for changes in each user's body schema to foster their sense of agency. Meanwhile, endowing humanoids with a body schema is promising for control reasons, e.g., keeping safe distances or reaching for targets (Roncone et al., 2015, 2016). As a psychological concept, the body schema can be understood as an adaptable (Somogyi et al., 2018), subconscious representation of the body's characteristics (Gallagher and Cole, 1995; Mayer et al., 2008), e.g., its kinematics and dynamics, which makes it promising for hand/tool-eye coordination in humanoid robots (Ulbrich et al., 2009). Psychological studies suggest that the representation of the human body itself and the representation of the environment in reach, i.e., the peripersonal space, are closely linked (Serino et al., 2007; Cléry and Ben Hamed, 2018). This appears to enable a flexible discrimination between the self and the environment including adaptation when using tools (Holmes and Spence, 2004; Hoffmann et al., 2010), a capability that is rather underdeveloped in contemporary humanoid robots (Hoffmann et al., 2010). Therefore, cognitive models that go beyond models describing the kinematic structure or dynamic properties of a robot, as reviewed in Nguyen-Tuong and Peters (2011), seem to be required.

### 2. COGNITIVE MODELS

Among the existing cognitive models, we assume Bayesian and connectionist approaches to be most suitable for achieving human-like body representations in robots. In this section, we detail how we arrive at this assumption by considering conceptual foundations and empirical applications of the modeling approaches. Interesting examples of their application are bodily illusion experiments, where the distance between the perceived position of the real limb and its indicated position, i.e., the proprioceptive drift, is understood as an objective, but also debated, measure of embodiment (Giummarra et al., 2008; Pazzaglia and Molinari, 2016). The assumption that participants could fuse multisensory information in a Bayesian process (Berniker and Körding, 2011) motivated the development of computational models that aim to estimate the proprioceptive drift from empirical input data (Samad et al., 2015). Accordingly, these Bayesian cognitive models compute estimations of the proprioceptive drift (Samad et al., 2015) and thereby propose quantitative approximations to the generative process of human sensorimotor integration. However, these models exhibit limited estimation accuracy and are constrained to offline application to the experimental population as a whole (Samad et al., 2015).
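To make the idea of Bayesian fusion concrete, the following minimal sketch applies textbook precision-weighted combination of two Gaussian cues to a bodily illusion setting. It is not the model of Samad et al. (2015); published models are richer and may, for instance, also weigh whether the cues share a common cause. The class `Cue`, the functions `fuse` and `predicted_drift`, and all numerical values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """A Gaussian position estimate from one sensory modality."""
    mean_cm: float  # estimated hand position along one axis (cm)
    sd_cm: float    # sensory noise of this modality (cm)


def fuse(visual: Cue, proprioceptive: Cue) -> Cue:
    """Precision-weighted fusion of two independent Gaussian cues."""
    w_v = 1.0 / visual.sd_cm ** 2          # precision of vision
    w_p = 1.0 / proprioceptive.sd_cm ** 2  # precision of proprioception
    mean = (w_v * visual.mean_cm + w_p * proprioceptive.mean_cm) / (w_v + w_p)
    sd = (1.0 / (w_v + w_p)) ** 0.5        # fused estimate is less noisy than either cue
    return Cue(mean, sd)


def predicted_drift(visual: Cue, proprioceptive: Cue) -> float:
    """Shift of the fused hand estimate away from the proprioceptive cue."""
    return fuse(visual, proprioceptive).mean_cm - proprioceptive.mean_cm


# Hypothetical numbers: real hand felt at 0 cm, artificial hand seen at 15 cm.
seen = Cue(mean_cm=15.0, sd_cm=3.0)
felt = Cue(mean_cm=0.0, sd_cm=1.5)
drift = predicted_drift(seen, felt)  # fused estimate is pulled toward the seen hand
```

In this sketch, the more precise modality dominates the fused estimate, so the predicted drift grows as vision becomes more reliable relative to proprioception.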

Marr (1982) defines three general levels of analysis for cognitive models: the computational, algorithmic, and implementational levels. The aforementioned research describing Bayesian cognitive models of multisensory information (Berniker and Körding, 2011; Samad et al., 2015) tends to define these inferential problems on the computational level. Here, modelers define the logic and structure of a computational problem. Yet, cognitive models of human body experience might also benefit from extension to deeper modeling levels (Griffiths et al., 2012), e.g., the algorithmic level, defining the processes and representations involved in solving the computational problem. Combined model specifications on the computational and algorithmic level can foster the prediction and explanation of seemingly error-prone or paradoxical behavior, as observed in research on causal reasoning (Tenenbaum et al., 2007) or decision making (Srivastava and Vul, 2015).

As a separate school of thought, connectionism commonly employs artificial neural networks to represent information in patterns of activation. While artificial neural networks do not need to be implemented in a neurally plausible way by human standards, connectionism is historically inspired by the idea of creating "brain-like" systems (Thomas and McClelland, 2008). This aspect ties connectionist models to the implementational level of analysis (Marr, 1982), which concerns the physical realization of a model's computation in biological or technological hardware. Similarly to Bayesian approaches, multisensory integration can be approached in a connectionist fashion (Quinlan, 2003; Zhong, 2015). In fact, interpreting the weights of an artificial neural network as conditional probability relations creates a strong similarity between connectionist and Bayesian models of cognition (Thomas and McClelland, 2008). If a connectionist implementation mimics the close-to-optimal sensorimotor integration that humans seem to perform (Körding and Wolpert, 2006), its prediction of body experience should thus resemble Bayesian estimations.
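The claimed correspondence can be illustrated with a deliberately small toy case, which is our own sketch and not taken from the cited works. A single linear unit is trained by gradient descent to predict a hidden state from two noisy cues, and its learned weights are compared with the weights of the Bayesian posterior mean for Gaussian noise and a zero-mean Gaussian prior. A single linear unit is of course far simpler than a typical connectionist model, and the noise levels, sample size, and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
sd_prior, sd_a, sd_b = 1.0, 0.5, 1.0   # hypothetical prior and cue noise levels

s = rng.normal(0.0, sd_prior, n)        # hidden state, e.g., a limb position
x = np.column_stack([s + rng.normal(0.0, sd_a, n),   # noisy cue A
                     s + rng.normal(0.0, sd_b, n)])  # noisy cue B

w = np.zeros(2)                          # weights of one linear unit
for _ in range(500):                     # batch gradient descent on squared error
    err = x @ w - s
    w -= 0.1 * (2.0 / n) * (x.T @ err)

# Weights of the Bayesian posterior mean: each cue is weighted by its
# precision, normalized by the total precision including the prior.
prec = np.array([1.0 / sd_a ** 2, 1.0 / sd_b ** 2])
w_bayes = prec / (1.0 / sd_prior ** 2 + prec.sum())
```

Under these assumptions the learned weights `w` end up close to `w_bayes`, i.e., the unit weights the more reliable cue more strongly without being told the noise levels explicitly.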

While there are other schools of cognitive modeling (Sun, 2008), we focus on Bayesian approaches due to their relation to human sensorimotor behavior (Körding and Wolpert, 2006; Franklin and Wolpert, 2011) and connectionism because of its relation to developmental psychology (Shultz and Sirois, 2008) and developmental robotics (Lungarella et al., 2003). Being conceptually similar, both approaches can be used either to investigate the generative process behind human sensorimotor behavior or to control sensorimotor capacities in artificial systems. Yet, connectionism appears to be employed mostly without a direct relation to human performance (Katić and Vukobratović, 2003; Metta et al., 2010, 2017; Pasquale et al., 2015; Lakomkin et al., 2018), although some examples draw commendable design references from human neurobiology (Morse et al., 2010).

### 3. APPLICATIONS IN ROBOTICS

We expect that cognitive models of human body experience will improve the capabilities of robotic systems and discuss potentials and challenges of their implementation and utilization. Specifically, assistive robotic devices and humanoid robots are taken as examples that highlight the possibilities and their prospective effects.

Hoffmann et al. state that robots, which could include humanoids and assistive devices, need two things to perform a goal-directed action: a certain knowledge about their physical self and the mapping between their sensory and motor modalities (Hoffmann et al., 2010). In their review, they distinguish different kinds of kinematic body representations that are either fixed, self-calibrate to geometry changes, or are generated automatically, while only specific body representation models comprise dynamics (Hoffmann et al., 2010). In contrast to these explicit models, they describe implicit ones that represent the sensorimotor mappings, self-recognition, and temporal effects (Hoffmann et al., 2010). A more recent review by Schillaci et al. (2016) describes how explorative behaviors could drive motor and cognitive developments. Schillaci et al. describe such behaviors as a very ingenious method to acquire and maintain internal body representations in artificial agents, e.g., through MOdular Selection And Identification for Control (MOSAIC) models (Haruno et al., 2001).

### 3.1. Assistive Devices

Achieving a seamless integration of assistive robotic devices in supporting users' movements requires a better understanding of both human body schema integration and knowledge representation about the users' motor capabilities. A crucial point is to avoid excessive device activity, which might hinder body schema integration due to being perceived as external activity. By establishing the underlying processes of multisensory integration as elements of cognitive models, we propose that effects of robotic assistance can be predicted in multiple movement scenarios. These predictions can be used to adjust sensory feedback to the user by comparing estimated and required forces and torques to solve motor tasks over time. In case of a mismatch between actual and desired value, the need for changing motor behavior might be communicated to the user through (modulated) sensory feedback, which could also be used to foster co-adaptation of user and device (Beckerle et al., 2017, 2018).

Hence, such models could facilitate user- and application-specific assistance, provided as needed by the individual and in different situations. We argue that online models of required users' motor activities could help to complement and adjust assistance, easing both habituating to and weaning from it.
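As a purely illustrative sketch of the comparison described above, the function below maps the mismatch between a required and an estimated joint torque to a normalized feedback intensity. The function name `feedback_intensity`, the deadband, and the gain are hypothetical and are not taken from any of the cited devices or studies; a real system would obtain both torque values from task and user models.

```python
def feedback_intensity(required_nm: float, estimated_nm: float,
                       deadband_nm: float = 0.5,
                       gain_per_nm: float = 0.2) -> float:
    """Map a torque mismatch (in Nm) to a feedback intensity in [0, 1]."""
    mismatch = abs(required_nm - estimated_nm)
    if mismatch <= deadband_nm:
        # Small deviations stay silent so the device does not become
        # conspicuously active.
        return 0.0
    # Linear growth beyond the deadband, saturating at full intensity.
    return min(1.0, gain_per_nm * (mismatch - deadband_nm))
```

The deadband reflects the concern that excessive device activity might be perceived as external; how large it should be for a given user is exactly the kind of parameter a cognitive model would have to inform.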

While assistance-as-needed might also be implemented through inverse dynamics models, cognitive models could help to tune factors that modulate the user's body experience. Human-in-the-loop experiments, e.g., robot-aided bodily illusions, could help to reveal those factors and how they influence embodiment (Beckerle et al., 2018). With this knowledge, not only force/torque or motion control, but also human-machine interfaces could be optimized with respect to embodiment of the assistive device, e.g., providing appropriate tactile feedback to shape the representation of the artificial limb (Giummarra et al., 2008; Beckerle et al., 2017). Through in-depth knowledge of the human cognitive body representation and a corresponding model-based control of the assistive device, co-adaptation might be systematized to achieve a congruent representation. Additionally, improper operation of the device by the user might be anticipated automatically and compensated for by means of control. While representing a great potential, the vision of assistive devices that understand their user's body experience and adapt to it, individually and online, also outlines the requirement for radical improvements of contemporary models.

### 3.2. Humanoid Robots

While assistive devices should interact seamlessly with their users, humanoid robots are intended to autonomously behave in a human-like manner. We expect that endowing humanoid robots with their own body schema and peripersonal space could tackle various recent issues. For instance, humanoids that have an understanding of their physical properties and environment could adapt their behavior to humans and the environment during physical, cognitive, and social interaction. Consider the example of standing in a crowded elevator: humans would adapt their relative positions, i.e., keep certain distances to others, while contemporary humanoid robots might not. The relation between knowledge about one's own body, obstacle avoidance, and social norms in interacting with humans highlights the potential of providing humanoid robots with a sense of their body and its environment.

While humanoid robots might be expected to produce human-like behavior regardless of the behavior generation process, this process itself might be required to be human-like. Developmental robotics research draws its appeal at the edges of engineering, developmental psychology, and cognitive science by potentially improving the capabilities and autonomy of robots. Moreover, it promises to simultaneously reveal how developmental models may perform when implemented in a robotic body (Lungarella et al., 2003; Asada et al., 2009). Recent research enables humanoid robots to develop several forms of body representation (Martinez-Cantin et al., 2009; Lara et al., 2016; Hoffmann et al., 2018) or learn movement generation (Metta et al., 2017). While achieving flexible, autonomous behaviors, most contemporary studies do communicate about the human-likeness of the behavior generation, but lack a formal evaluation method comparing it to human behavior.

Although these methods may be sufficient to improve autonomous behavior, we suspect differences between the robotic and human behavior generation processes. Specifically, these differences may show when observed human performance exhibits a variability that is not strictly required by the kinematic or dynamic properties of the task at hand. We hypothesize that complementing established kinematics and dynamics models through psychologically motivated cognitive models will help to approach a human-like behavior generation process and improve the design of behaviors and interactions in robots. While we believe that both Bayesian and connectionist modeling approaches could be employed for this, a comparison to actual human behavior is mandatory for evaluation. An appropriate example might be the sensorimotor task presented in Körding and Wolpert (2004): participants were asked to point at a target in virtual reality while their cursor underwent a lateral shift relative to the actual location of the finger controlling it. In this human experiment, Körding and Wolpert conclude that participants internally represented the statistical properties of the task manipulation in consistency with Bayesian inference. Exposing a humanoid robot to a comparable task, three stages might finally lead to human-like performance. Firstly, precise sensors could measure the lateral shift to enable the robot to execute a corrected trajectory. Secondly, a more human-like behavioral variability might be reached by artificially restricting the corrected trajectory through an arbitrary error term. Finally, we postulate that control adaptation through cognitive models could intrinsically yield fully human-like behavior generation and might result in similar observations as those found by Körding and Wolpert (2004). **Figure 1** sketches how this might be implemented for the example of multisensory integration during sensorimotor manipulation, which applies to assistive devices similarly.
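The three stages can be contrasted in a minimal sketch, assuming a one-dimensional lateral shift, Gaussian sensory noise, and a Gaussian prior over shifts. The function names and all numerical values are hypothetical illustrations and are not the parameters or the analysis of Körding and Wolpert (2004); stage 3 uses the standard posterior mean for a Gaussian prior and likelihood as a stand-in for a cognitive model.

```python
import random


def stage1_compensation(measured_shift_cm: float) -> float:
    """Stage 1: trust a precise sensor and cancel the measured shift fully."""
    return measured_shift_cm


def stage2_compensation(measured_shift_cm: float, error_sd_cm: float,
                        rng: random.Random) -> float:
    """Stage 2: add an arbitrary error term to mimic human variability."""
    return measured_shift_cm + rng.gauss(0.0, error_sd_cm)


def stage3_compensation(sensed_shift_cm: float, sensed_sd_cm: float,
                        prior_mean_cm: float, prior_sd_cm: float) -> float:
    """Stage 3: posterior mean of the shift under a Gaussian prior and likelihood."""
    w_sensed = prior_sd_cm ** 2 / (prior_sd_cm ** 2 + sensed_sd_cm ** 2)
    return w_sensed * sensed_shift_cm + (1.0 - w_sensed) * prior_mean_cm


# Hypothetical values: prior over shifts with mean 1.0 cm and SD 0.5 cm,
# and a sensed shift of 2.0 cm under clear versus blurred feedback.
clear = stage3_compensation(2.0, sensed_sd_cm=0.1,
                            prior_mean_cm=1.0, prior_sd_cm=0.5)
blurred = stage3_compensation(2.0, sensed_sd_cm=0.5,
                              prior_mean_cm=1.0, prior_sd_cm=0.5)
noisy = stage2_compensation(2.0, error_sd_cm=0.3, rng=random.Random(0))
```

Only stage 3 produces errors with structure: the noisier the feedback, the more the compensation is pulled toward the prior mean, whereas the error term in stage 2 is unrelated to the statistics of the task. Comparing such structured deviations with human data is one way the evaluation called for above could be made formal.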

Pioneering work shows how the iCub robot can learn a peripersonal space model from data acquired via a whole-body artificial skin and physical contact with the environment (Roncone et al., 2015, 2016). While this approach is still rather engineered and does not try to approximate human behavior generation, it achieves sampling rates that enable online combination with control and is capable of predicting contacts between the whole body of the robot and its environment. This information is used to design a controller that can either implement a safety margin around the body of the robot or support reaching objects in the robot's vicinity (Roncone et al., 2015, 2016).

reached. In an iterative process, human cognitive function might be researched fundamentally and, in turn, models could be advanced through behavioral evaluation based on human data.

### 4. CONCLUSION

Current developments of cognitive body models, Bayesian as well as connectionist ones, have the potential to push assistive robotic devices by making them understand their users' body experience and humanoid robots by endowing them with their own body knowledge. Assistive devices might utilize this knowledge by adaptive control improving their integration into their users' body schemes, i.e., devices could foster their embodiment themselves. Further, we postulate that such models might give humanoid robots a feeling for their own body and its surroundings that can be qualitatively comparable to human body perception, should the situation demand it. In both cases, we deem machine learning to be very helpful: assistive devices might learn how to improve their embodiment user-specifically, while humanoid robots could not only model their environment, but also improve their motions based on extensive body knowledge.

Future research should therefore improve models with respect to accuracy, specifications for individual users, and online capabilities. To this end, experiments to determine modulating factors as well as prior knowledge about sensory precision should be improved, e.g., by human-in-the-loop approaches. A next step might be an integration of cognitive models with higher-level self-perception architectures as proposed by Lanillos et al. (2017), Asada et al. (2009), and Morse et al. (2010) and their application for purposes of control (Roncone et al., 2015, 2016) or hand/tool-eye coordination (Ulbrich et al., 2009). For this purpose, the discussed cognitive models might be combined with established kinematic or dynamic models, which could be driven by model learning of an integrated body representation (Haruno et al., 2001; Nguyen-Tuong and Peters, 2011; Schillaci et al., 2016). Thereby, humanoids and assistive devices might be provided with more human-like behavior and improved capabilities to interact with human partners.

### AUTHOR CONTRIBUTIONS

TS and PB coordinated the development of the paper and the integration of individual contributions. BM and JP contributed content, opinions, and references. All authors revised the manuscript. All work for BM was done at TU Darmstadt before she began working at Amazon.

### FUNDING

This work was supported by the DFG project Users' Body Experience and Human-Machine Interfaces in (Assistive) Robotics (no. BE 5729/3&11).

### REFERENCES

reaching with whole body surface," in IEEE/RSJ International Conference on Intelligent Robots and Systems (Hamburg), 3366–3373.


**Conflict of Interest Statement:** BM is employed by Amazon, Tübingen, Germany.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Schürmann, Mohler, Peters and Beckerle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Social Antecedents to the Development of Interoception: Attachment Related Processes Are Associated With Interoception

#### Kristina Oldroyd\*, Monisha Pasupathi and Cecilia Wainryb

Social Development Laboratory, Department of Psychology, The University of Utah, Salt Lake City, UT, United States

#### Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Karen Lisa Bales, University of California, Davis, United States

Lane Beckes, Bradley University, United States

> \*Correspondence: Kristina Oldroyd kris.oldroyd@psych.utah.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 11 July 2018 Accepted: 14 March 2019 Published: 24 April 2019

#### Citation:

Oldroyd K, Pasupathi M and Wainryb C (2019) Social Antecedents to the Development of Interoception: Attachment Related Processes Are Associated With Interoception. Front. Psychol. 10:712. doi: 10.3389/fpsyg.2019.00712

Current empirical work suggests that early social experiences could have a substantial impact on the areas of the brain responsible for representation of the body. In this context, one aspect of functioning that may be particularly susceptible to social experiences is interoception. Interoceptive functioning has been linked to several areas of the brain which show protracted post-natal development, thus leaving a substantial window of opportunity for environmental input to impact the development of the interoceptive network. In this paper we report findings from two existing datasets showing significant relationships between attachment related processes and interoception. In the first study, looking at a sample of healthy young adults (n = 132, 66 males), we assessed self-reported interoceptive awareness with the Multidimensional Assessment of Interoceptive Awareness (Mehling et al., 2012) and attachment style with the Experiences in Close Relationships Scale-Short (Wei et al., 2007). We found relationships between aspects of interoception and attachment style such that avoidant individuals reported lower interoceptive functioning across several dimensions [r's(130) = −0.20 to −0.26, p's < 0.05]. More anxious individuals, on the other hand, reported heightened interoceptive functioning across several dimensions [r's(130) = 0.18 to 0.43, p's < 0.05]. In the second study, we examined the congruence between a youth's self-reported negative emotion and a measure of sympathetic nervous system arousal, skin conductance level (SCL). The congruence score was positively associated with parental rejection of negative emotion. These results suggest that parenting style, as reported by the mother, is associated with a youth's ability to coordinate their self-reported emotional and physiological responding across a series of independent assessments, r(108) = −0.24, p < 0.05. In other words, the more parental rejection of youth negative emotions the mother reported, the less congruent a youth's self-reports and physiological reports of distress.

Keywords: interoception, attachment, development, body awareness, interoceptive accuracy

## INTRODUCTION

### The Development of Interoceptive Functioning

Interoception refers to an individual's ability to detect and track internal bodily cues (Garfinkel et al., 2015) and has been demonstrated to have important implications for psychological and physical health (Craig, 2004; Pollatos and Schandry, 2008; Paulus et al., 2009; Füstös et al., 2012; Herbert and Pollatos, 2014; Stern, 2014). While the literature is steadily revealing some of the biological underpinnings of interoception (Craig, 2004; Li et al., 2016), the social antecedents to interoception have been largely ignored. In this paper we propose that the development of interoception may be influenced by attachment related processes.

### A Brief Overview of Attachment Theory and the Importance of Attachment Related Processes

Bowlby (1982) posited that human infants develop attachment bonds with their caregivers. These bonds are characterized by specific patterns of cognition and behavior in children that influence a range of functioning from emotion regulation to how they experience their close relationships (Fraley and Roisman, 2018). When a child feels loved, secure and confident in their relationship with their caregiver, they will use the caregiver as a "secure base" from which to explore their environment (Ainsworth et al., 2015). These children can manage anxiety with some degree of trust and are able to use others to help resolve unpredictable, threatening, or novel life events (Bowlby, 1973; Bretherton, 1985; Ainsworth et al., 2015). An infant is most likely to develop a secure attachment when a parent consistently provides sensitive and attentive caregiving (Bowlby, 1973; Bretherton, 1985; Ainsworth et al., 2015).

When a parent avoids responding to the child's immediate needs, makes them wait for relief and comfort, or responds frighteningly or inconsistently to their needs, children may develop an avoidant attachment (Ainsworth et al., 2015). Individuals with an avoidant attachment feel the need to be self-reliant, and emotionally strong, as others are perceived as only conditionally available (Hazan and Shaver, 1987). They tend to be rather isolated and place tremendous value on being independent. Avoidant individuals become enraged or highly anxious when forced to rely on others for help (Main et al., 1985; Dozier and Kobak, 1992).

An anxious attachment style often develops from the belief that the parent is available, but only conditionally, and that the parent is likely to withdraw that comfort and support if the child no longer meets certain standards – such as being well behaved or co-operative (Ainsworth et al., 2015). Individuals with an anxious attachment style tend to be overly clingy and become excessively upset when separated from their mothers or significant others (Hazan and Shaver, 1987).

While Bowlby was primarily focused on understanding the nature of infant-caregiver relationships, a plethora of research demonstrates that attachment characterizes human experience "from the cradle to the grave" (Bowlby, 1979, p. 129) (see Fraley and Roisman, 2018 for a review). From this perspective, attachment style affects more than just interpersonal functioning in infancy; it has enduring implications throughout the lifespan on emotion regulation, parenting practices, and health-related behaviors (Feeney and Collins, 2001; Waters and Waters, 2006; Raby et al., 2017). The present paper expands on these findings and introduces the idea that attachment related processes may have implications for the development of interoceptive functioning as well.

### Attachment Related Processes Influence a Person's Physiological Response Patterns

While neuroanatomy provides the hardware necessary for interoception, the strength of the signal produced by the body is also important. In some individuals, there may be a stronger or weaker interoceptive signal available to detect. The strength of the interoceptive signal produced and the ease with which this signal is transmitted from sensory modalities to the interoceptive centers of the brain may depend upon hypothalamic-pituitary-adrenal (HPA) axis functioning. A large body of extant work links HPA axis functioning to attachment related processes.

### Attachment and HPA Axis Functioning

Strong evidence exists that individual differences in attachment are characterized by differential HPA reactivity to stress (Allen and Miga, 2010; Diamond and Fagundes, 2010). In general, individuals with anxious and avoidant attachment styles exhibit dysregulated HPA axis activity in response to stress across the lifespan (Bush et al., 2011; Hill et al., 2011; Browne and Jenkins, 2012; Hackman et al., 2012; Lovallo, 2013; Palmer et al., 2013). Given that stress and interoceptive functioning utilize the same anatomical pathways to facilitate communication between the brain and the body, attachment related processes that affect the stress response system (the descending brain-body connection) could also affect the interoceptive system (the ascending brain-body connection) (Seth, 2013). We believe that dysregulation of the HPA axis could affect interoception in two ways: by affecting the strength of the interoceptive signal and/or by affecting the processing of the interoceptive signal.

One example of an interoceptive signal that changes with HPA axis functioning is the stroke volume of the heart, defined as the amount of blood pumped by the heart in one contraction (Schächinger et al., 2001). The activation of the HPA axis results in the release of several hormones, including epinephrine. Epinephrine causes increased contractility of the heart muscle, increased heart rate and increased depolarization of the heart, all of which lead to an increase of stroke volume (SV). Increased stroke volume has been empirically associated with increased interoceptive accuracy (Schandry et al., 1993) such that the more blood the heart pumps per beat, the better people are at estimating the number of times their heart beats during a timed trial. Thus, increased SV is thought to function as a "stronger" interoceptive signal. A chronically increased sympathetic outflow has been suggested to be one variable contributing to the establishment of high interoceptive accuracy (Paulus and Stein, 2010).

In addition to changing the strength of the signal, increased HPA activation may affect interoception by changing how the interoceptive signal is processed. Cortisol, the final product of HPA axis activation, has been shown to modulate interoceptive signal processing such that the brain becomes increasingly attuned to interoceptive signals in its presence (Rief et al., 1998). For example, a dose of 4 mg of intravenously administered cortisol has been demonstrated to increase performance on tests of Interoceptive Accuracy (IAcc) (Schulz et al., 2013). These data suggest that cortisol may lower the threshold for interoceptive signal processing within the brain (Schulz et al., 2013). This finding is supported by fMRI data indicating that the parts of the brain responsible for the attentional processing of interoceptive signals (e.g., the ACC and OFC) show greater activation in the presence of cortisol (Cameron, 2002; Critchley et al., 2004; Pollatos et al., 2007). Thus, when the HPA axis is activated it may alter how the brain deals with incoming bodily cues (Craig, 2002; Schulz and Vögele, 2015). If a person experiences chronic dysregulation of the HPA axis, this may permanently induce altered perception of bodily cues (Schulz and Vögele, 2015).

### The Interoceptive Network of the Brain Is Affected by Attachment Related Processes

The basic architecture of the brain is constructed through an ongoing process that begins before birth and continues into adulthood (e.g., Goddings and Giedd, 2014). Attachment related processes have been demonstrated to affect the quality of that architecture (Cozolino, 2014). From this perspective, normal brain development, including development of the interoceptive network, may be dependent upon a satisfactory attachment (Emde, 1988; Schore, 2000).

The interoceptive network is composed of three major brain regions: the anterior insular cortex (AIC), the anterior cingulate cortex (ACC) and the orbitofrontal cortex (OFC) (Craig, 2002, 2009; Kurth et al., 2010; Uddin, 2015). Each of these regions may be influenced by attachment related processes in ways that could be important for the development of interoception. For example, the AIC is the interoceptive center of our brain (Craig, 2004). Attachment related processes have been found to be correlated with insular anatomy such that children who are classified as having an anxious or an avoidant attachment style demonstrate markedly lower insular volume and smaller surface area than control groups (Kühn and Gallinat, 2013; Sheffield et al., 2013; Lim et al., 2014). Additionally, attachment related processes have been shown to be related to electrical activation in the insula such that people with an avoidant attachment style show less insular activation in response to stimuli than do securely attached individuals (DeWall et al., 2011).

### The Development of a Bodily Self and Interoception

Lastly, we believe that attachment related processes may influence the development of interoception by influencing the development of the bodily self (Gallagher, 2000; Tsakiris, 2017). Because young infants have limited resources, they cannot use action to collect evidence about the causes of their own interoceptive experiences. Instead, infants rely on caregivers' reactions to their behaviors to inform conceptualizations of interoceptive states. For example, when a young infant becomes fussy they may not understand the source of their own discomfort. It is only when a caregiver provides the child with a nipple and the act of eating begins to alleviate the infant's discomfort that the infant starts to learn about the feeling of hunger. Insensitive caregiving, characterized by slow or intermittent responsiveness to the infant's needs and rejection of infant distress, may impair the child's ability to form accurate representations of bodily sensations.

### Summary

To summarize, thus far we have outlined the idea that links may exist between attachment related processes and the development of interoception. Attachment may influence the development of interoception by modifying functioning of the HPA axis, by affecting the growth of neural architecture, and by influencing development of the bodily self. In Study 1 we examine whether attachment style is correlated with an individual's interoceptive functioning in a sample of young adults. In Study 2 we consider how parenting style may be related to a youth's coordination of physiological and self-reported aspects of emotional distress.

### STUDY 1: INTRODUCTION

Attachment related processes may lay the foundation for the development of interoception. Just as different attachment styles are associated with distinct behavioral responses to interpersonal cues, we propose that attachment styles will also be associated with distinct behavioral responses to bodily cues. Thus, people with an anxious attachment who exaggerate the seriousness of relationship threats, over-emphasize their sense of helplessness and vulnerability in relation to their partners (Mikulincer and Shaver, 2012), and overly attend to internal indicators of emotional distress (Cassidy and Kobak, 1988) may respond similarly to bodily cues. This would include paying hypervigilant attention to bodily sensations. By contrast, those high in attachment avoidance, who tend to minimize experiences of negative affect and to direct attention away from threat cues in interpersonal situations (see Diamond and Fagundes, 2010 for a review), may divert attention from bodily cues, suppress emotion-related action tendencies, or inhibit and mask bodily cues.

Traditional methods of assessing interoception focus solely on a person's ability to accurately detect bodily cues such as the heartbeat (e.g., Whitehead et al., 1977; Schandry, 1981). However, we believe interoception to be a more nuanced concept than the simple ability to track heartbeats. The Multidimensional Assessment of Interoceptive Awareness Scale (MAIA) was developed in order to allow researchers to assess a person's attitudes and beliefs about their bodily cues and to parse beneficial from maladaptive functions of interoception (Mehling et al., 2012). The MAIA consists of eight subscales: not-worrying, emotional awareness, attention regulation, trusting, body listening, noticing, not-distracting, and self-regulation. Preliminary research suggests that each of these subscales is associated with distinct neural patterns of activation in the interoceptive network (Stern et al., 2017). These patterns of neural activation may underlie distinct behavioral responses to bodily cues: hyperarousal for individuals with an anxious attachment and hypoarousal for individuals with an avoidant attachment. In Study 1 we examine correlations between self-reported attachment style and self-reported interoceptive functioning. We predict that self-reported attachment style will be associated with the following patterns of scores:

> H1a: Avoidant individuals will manifest lower scores on the subscales of noticing, not-distracting, not-worrying, emotional awareness, body listening and trusting.

> H1b: Avoidant individuals will score higher on the attention regulation scale.

> H1c: Anxious individuals will have higher scores on noticing, emotional awareness, body listening.

> H1d: Anxious individuals will show lower scores on the not-distracting, not-worrying, self-regulation and attention regulation subscales.

### STUDY 1 METHODS

### Participants

This study made use of existing data from previously conducted research (Oldroyd et al., unpublished) designed to examine the link between embodied narration and caloric consumption. Participants were 135 students (68 male) drawn from the participant pool at a large Rocky Mountain university. The average age was 23.5 years (SD = 5.9). The majority of participants identified as white (77%) with others identifying as Asian (17%), Pacific Islander (0.05%), Black (0.02%) and Latino (0.02%). 5% of participants chose to not report.

### Procedure

Upon arriving at the lab, participants provided written informed consent. Next, participants played a video game on an iPad, wrote narratives about their experience playing the video game, and then completed a 20-min questionnaire session during which time they had ad libitum access to snack foods. The order of the questionnaires was randomized, with the exception of the demographics questionnaire and a questionnaire that asked participants about the snack foods that they had been offered. These were presented last, so as to not influence participants' eating behaviors. Upon completion of the questionnaires, participants were debriefed and dismissed.

### Measures

Twelve questionnaires were administered. Five of these questionnaires were theoretically important to the questions being asked in the original study examining the effect of embodied narration on caloric consumption (Oldroyd et al., unpublished): the MAIA, ECR-S, NASA-TLX, Emotional Responding, and Barrett Impulsivity Scale. Two of these questionnaires were of theoretical importance to the questions presented in this paper and will be discussed in detail below: the Multidimensional Assessment of Interoceptive Awareness (MAIA; Mehling et al., 2012) and the Experiences in Close Relationships Scale-Short (ECR-S; Wei et al., 2007). The other questionnaires were time fillers given in order to extend the period of time that participants spent in the lab with snack foods available to them. All of the questionnaires listed were scored and an ANOVA was run on each one to make sure that scores did not differ by assigned narrative condition. Results for the two key questionnaires are as follows: MAIA: F(3,129) = 11.96, p = 0.41 and ECR-S: F(3,131) = 42.20, p = 0.94.

### Measures Used in This Study

### **Multidimensional assessment of interoceptive awareness (MAIA)**

The MAIA is a multifaceted body awareness questionnaire that is designed to measure interoceptive awareness. The MAIA is composed of 32 items on a 6-point Likert scale, with ordinal responses coded from 0 ("never") to 5 ("always"). This multidimensional instrument results in eight subscales: (1) Noticing, the awareness of one's body sensations (4 items, Cronbach's α = 0.64); (2) Not-distracting, the tendency not to ignore or distract oneself from sensations of pain or discomfort (3 items, Cronbach's α = 0.64); (3) Not-worrying, the tendency not to experience emotional distress or worry with sensations of pain or discomfort (3 items, Cronbach's α = 0.69); (4) Attention regulation, the ability to sustain and control attention to body sensation (7 items, Cronbach's α = 0.73); (5) Emotional awareness, the awareness of the connection between body sensations and emotional states (5 items, Cronbach's α = 0.73); (6) Self-regulation, the ability to regulate psychological distress by attention to body sensations (4 items, Cronbach's α = 0.72); (7) Body listening, the tendency to actively listen to the body for insight (3 items, Cronbach's α = 0.65); and (8) Trusting, the experience of one's body as safe and trustworthy (3 items, Cronbach's α = 0.74). The score for each scale is calculated by averaging the scores of its individual items, and thus can vary in the 0–5 range.
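The averaging rule described above can be sketched as follows. This is only an illustration: it assumes that the 32 responses are stored in subscale order with the item counts given in the text, which is a hypothetical layout, and it does not handle any reverse-keyed items because the text does not describe them. The published scoring key should be consulted for actual scoring.

```python
from statistics import mean

# Hypothetical item layout: responses are assumed to be stored in subscale
# order, using the item counts from the text (4+3+3+7+5+4+3+3 = 32).
SUBSCALE_SIZES = [
    ("noticing", 4),
    ("not_distracting", 3),
    ("not_worrying", 3),
    ("attention_regulation", 7),
    ("emotional_awareness", 5),
    ("self_regulation", 4),
    ("body_listening", 3),
    ("trusting", 3),
]


def score_maia(responses):
    """Return the mean item response (coded 0-5) for each subscale."""
    if len(responses) != 32:
        raise ValueError("expected 32 item responses")
    if any(not 0 <= r <= 5 for r in responses):
        raise ValueError("responses must be coded 0-5")
    scores, start = {}, 0
    for name, size in SUBSCALE_SIZES:
        scores[name] = mean(responses[start:start + size])
        start += size
    return scores


# Made-up respondent who answers 3 on every item except the last three.
example = score_maia([3] * 29 + [5, 4, 3])
```

Because each subscale score is a mean of responses coded 0 to 5, every score returned by this sketch also lies between 0 and 5.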

### **Experiences in close relationships-short (ECR-S)**

Attachment anxiety and avoidance were assessed using the 12-item short version of the Experiences in Close Relationships Scale (ECR-S; Wei et al., 2007). The ECR-S has two 6-item subscales evaluating people's attachment anxiety (e.g., "I worry that romantic partners won't care about me as much as I care about them") and avoidance (e.g., "I try to avoid getting too close to my partner"). Each of the 12 items was scored on a 7-point scale ranging from 1 (disagree strongly) to 7 (agree strongly). Low scores on both anxiety and avoidance represent attachment security. Cronbach's α was 0.72 for the Anxiety Scale and 0.74 for the Avoidance Scale.
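The internal-consistency values reported for both instruments are Cronbach's α coefficients. As a minimal sketch of how such a coefficient is obtained, the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores) can be computed as below. The function name and the toy responses are hypothetical and are not data from this study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-response lists."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses from five people on a three-item subscale.
toy = [[1, 2, 2], [3, 3, 4], [2, 2, 3], [5, 4, 5], [4, 4, 4]]
alpha = cronbach_alpha(toy)
```

The ratio of variances is the same whether population or sample variances are used, as long as the choice is applied consistently to items and totals.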

#### Measures Collected but Not Used in This Study

The following questionnaires were administered during the course of the original study. They were not of theoretical interest to either the original study or to this one. They were instead used as time fillers to extend the amount of time that participants spent in the lab and with the proffered snack foods. These questionnaires are: Berkeley Expressivity Questionnaire (Gross, 2013), Basic Needs Scale (Johnston and Finney, 2010), Flourishing Scale (Diener et al., 2010), Ryff Scale of Well Being (Ryff, 1989), Big Five Personality (John and Srivastava, 1999), demographic questionnaire, and snack questionnaire.

### STUDY 1 RESULTS

Correlations between the subscales of the MAIA are reported in **Table 1**. Descriptive statistics of the subscales of the MAIA are reported in **Table 2**. Tests of the a priori hypotheses were conducted using Bonferroni-adjusted alpha levels. Correlations between attachment style and self-reported interoception indicated that individuals who score high in attachment anxiety also tend to score high on the noticing scale, r(133) = 0.18, p < 0.05, and on the emotional awareness scale, r(133) = 0.18, p < 0.05 (see **Table 3**). Individuals who score higher in attachment anxiety also tend to score lower on the 'not-worrying' scale, r(133) = −0.43, p < 0.001, indicating that the more anxious a person's attachment style, the more they notice and worry about their bodily cues.

Individuals who scored high in attachment avoidance scored lower on the scale of attention, r(133) = −0.20, p < 0.05, and trust, r(133) = −0.26, p < 0.001. This means that the more avoidant a person's attachment style, the less attention they paid to their bodily cues and the less they tended to trust those cues. Scatterplots of the data are presented in **Figure 1**.
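The statistics reported above follow the usual r(df) notation, where the degrees of freedom for a Pearson correlation are the number of paired observations minus two (135 participants give df = 133). The following minimal sketch shows how a correlation, its t statistic, and a Bonferroni-adjusted threshold can be computed. The helper names, the toy scores, and the choice of eight tests are assumptions made for illustration; they do not reproduce the study's data or its exact correction.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """Convert r from n paired observations to a t statistic with n - 2 df."""
    df = n - 2
    return r * sqrt(df / (1.0 - r ** 2)), df


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Made-up scores for six people; not data from the study.
anxiety = [2.0, 3.5, 4.0, 5.5, 6.0, 6.5]
noticing = [2.5, 2.0, 3.5, 3.0, 4.5, 4.0]
r = pearson_r(anxiety, noticing)
t_stat, df = r_to_t(r, len(anxiety))
# Hypothetical family of eight tests, e.g., one per MAIA subscale.
threshold = bonferroni_alpha(0.05, 8)
```

A p-value would then be obtained by comparing the t statistic against a t distribution with the given degrees of freedom and judged against the adjusted threshold rather than the nominal one.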

### STUDY 1 DISCUSSION

These findings offer support for the idea that attachment related processes and interoceptive functioning are correlated and suggest that people's responsivity to bodily cues may mirror their responsivity to interpersonal cues. For example, individuals with an anxious attachment style who often demonstrate hyperreactivity to social/relationship stimuli, are vigilant in detecting potential interpersonal threats, persistently signal their distress, and seek excessive reassurance/support in social situations (Noriuchi et al., 2008; Vrticka and Vuilleumier, 2012), may repeat this pattern of behavior with regard to bodily cues. This would explain the positive correlation between an anxious attachment and the noticing subscale. For these individuals, hypervigilance may manifest as excessively attentive monitoring of bodily sensations for threat. Attentional vigilance for bodily symptoms results in a greater chance of detecting potential sources of threat, exacerbating pain, deterioration in physical health, and social isolation (Salkovskis and Kobori, 2015). These individuals may misinterpret normal body symptoms as indicators of a serious or threatening health problem. This would account for the negative correlation between anxious attachment and the 'not-worrying' subscale.

Avoidant individuals have often been described in the psychophysiology literature as manifesting a disconnect between their bodily cues and their physiological responses (e.g., Diamond et al., 2006). For example, a person with an avoidant attachment style may present as if they are very calm while in a distressing situation, when in fact psychophysiological measures show an elevated heart rate and cortisol levels (Spangler and Grossmann, 1993; Diamond and Fagundes, 2010). Over time, avoidant individuals learn to suppress their behavioral responses and to overregulate their affect, resulting in the appearance that they are unaffected by stressful situations. Given this, the negative correlation between body attention and avoidant attachment style makes sense. In the face of potentially negative bodily cues, avoidant individuals may minimize, dismiss and suppress them (as they do with problematic social cues) (Fraley and Shaver, 1997; Allen et al., 1998). This is in line with the notion that avoidant individuals are 'preemptive' in the avoidance of stress such that they disengage their attention from potentially distressing experiences before negative affect has been encoded and experienced (Fraley et al., 2000).

#### TABLE 3 | Pearson correlation matrix among subscales of MAIA and attachment style.

<sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01.

In addition to displaying decreased negative affect, avoidant individuals also demonstrate lower levels of trust: trust in close, personal relationships (Mikulincer, 2004) and trust in themselves (Cassidy, 2001). The negative correlation between body trusting and avoidant attachment follows this pattern. Thus, a person who has not developed trust in a loving caretaker and learned to trust their own decisions about who is safe and who is not, may not expect their body to give them reliable and important signals that warrant their attention.

Although these data provide support for our proposition regarding the relation between attachment related processes and interoception, Study 1 was limited in that both variables of interest (i.e., attachment and interoception) were obtained via self-report questionnaires in a collegiate sample. A second set of available data in our laboratory allowed us to examine the extent to which individuals were coherent across physiological and psychological responding, a different take on interoception, and how this coherence related to their mothers' general responses to their emotional distress.

### INTRODUCTION TO STUDY 2

Autonomic nervous system (ANS) activity reflects an awareness of and a responsiveness to the environment and supports behavioral and emotional regulation (Beauchaine, 2001; Porges, 2007). One parameter of ANS reactivity, skin conductance level (SCL), is considered to be an objective marker of the sympathetic branch of the ANS (Beauchaine, 2001). Thus, greater SCL responding is indicative of greater emotional arousal. Recent work suggests that positive correlations between subjective and objective reports of arousal are associated with greater activity in the interoceptive network of the brain such that the more congruence between a person's self-reported emotional arousal and SCL activity, the better their interoceptive functioning may be (Kleckner et al., 2017). In Study 2 we examine the congruence between youth's self-reported emotional arousal and SCL arousal and consider how attachment related processes may be related to a child's congruence scores.
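This section does not specify how congruence was computed. One possible operationalization, assumed here purely for illustration, is the within-person Pearson correlation between a youth's self-reported arousal and their SCL across repeated assessments. The function names and the toy values below are hypothetical and are not the study's procedure or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def congruence_score(self_reports, scl_values):
    """Within-person correlation between ratings and SCL (assumed definition)."""
    if len(self_reports) != len(scl_values):
        raise ValueError("need one SCL value per self-report")
    if len(self_reports) < 3:
        raise ValueError("need at least three assessments")
    return pearson_r(self_reports, scl_values)


# Made-up SCL values (microsiemens) over five assessments for two youths.
scl = [2.1, 2.4, 3.0, 3.3, 3.9]
congruent = congruence_score([1, 2, 3, 4, 5], scl)   # ratings track SCL
discordant = congruence_score([5, 4, 3, 2, 1], scl)  # ratings oppose SCL
```

Under this assumed definition, a score near 1 indicates that self-reported arousal rises and falls with physiological arousal, whereas a score near or below 0 indicates a mismatch between the two.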

Caregiving is an example of an attachment-related process, and empirical work shows that the attachment and caregiving systems are often activated simultaneously (Doinita and Maria, 2015). Thus, the type of care that a child receives can be predictive of the type of attachment that they will form (George and Solomon, 1999). Caregiving that is responsive and accepting is positively associated with a secure attachment style and has also been indicated as a precursor to the development of a core bodily self (Fotopoulou and Tsakiris, 2017). Thus, an infant who is "affectively attuned" with their caregiver (Stern, 1985) may develop stronger interoceptive abilities. From this perspective, the development of interoception is a generative model wherein a caregiver's actions combine with an infant's perceptions of bodily cues, and the origins of core subjective feelings such as hunger and satiation, or cold and warmth, are social, not biological (Fotopoulou and Tsakiris, 2017).

One hallmark of sensitive parenting is the parental acceptance of negative emotion. By contrast, insensitive parenting can include the rejection of negative emotion. Parental rejection of negative emotion occurs when caregivers reject, ignore, or fail to respond to a child's signs of distress.

In Study 2 we examine the congruence between youth's self-reports of negative emotional responding to an angry memory and physiological reports of negative emotional arousal as a function of mother's caregiving style. According to attachment theory, children with mothers high in acceptance will feel more comfortable acknowledging their own distress and thus have greater congruence between their subjective reports and physiological manifestation of arousal. In contrast, admitting distress may be difficult for children with an avoidant attachment, who often adopt a deactivating, minimizing strategy toward negative cues. Given that dismissing children may have an interest in downplaying their distress, we would expect them to report less emotional responding than their physiological responding would suggest. This would result in lower congruence scores. Thus, we hypothesized that youth with a mother who is more accepting of negative emotions will manifest a greater congruence score between self-reported distress and physiological measures of distress. Further, we present the idea that parental rejection of children's negative emotions may also affect children's development of interoception.

> H2a: Mothers' scores on a measure of parental rejection of negative emotion will be related to lower congruence scores between 0 and −1 in their children.

> H2b: Mothers' scores on a measure of parental acceptance of negative emotion will be related to higher congruence scores between 0 and 1 in their children.

### STUDY 2 METHODS

### Participants

Participants in this study included 108 youth and their mothers. Youth (53 male) were drawn from a community of a medium-sized Rocky Mountain city. Participants ranged in age from 8 to 17 years old, evenly distributed across the age continuum. The majority identified as white (88%) with other participants identifying as Asian (1.5%), Pacific Islander (2.2%), Black (3.6%), Native American (2.9%), and Latino (5.8%). Mothers' (N = 108) ages ranged from 27 to 61 years (M = 42.54, SD = 6.36). This was an educated sample, with all of our mothers reporting that they had graduated high school, 27% having completed some college, 45% having a college degree, and 18% having a post-graduate degree. The majority of mothers identified as white (91%) with others identifying as Asian (1.5%), Pacific Islander (1.5%), Black (1.5%), Native American (2.9%), and Latino (4.4%).

Participants in this study were a subset of participants used in a larger NIH-funded study on narration and emotion regulation (Wainryb et al., 2018). Participants were included in this study if they had been randomly assigned during the first study to the narrate condition. Limiting our study to those in the narrate condition standardized the experimental protocol, allowing us to examine the relevant questions for this study without the confounding variable of experimental condition. The original paper did not report on physiological data or on mothers' questionnaire data.

### Procedure

Youths and their mothers arrived together and, following assent/consent procedures wherein both participants completed written, informed consent, were separated for the remainder of the study. Youth were taken into a private room and were hooked up to physiological recording equipment. Once the equipment was in place, youth completed several baseline tasks: a 3-min vanilla baseline, an easy task designed to relax and orient the participant to the lab environment while giving the equipment time to calibrate; a 2-min talking baseline, which allowed us to get baseline physiological readings while the participant was conversing with a research assistant; and a 4-min paced respiration task designed to obtain baseline respiratory sinus arrhythmia (RSA) (Diamond and Otter-Henderson, 2007). Next, youth were asked to "Think of a time when someone said or did something and you ended up feeling really angry at that person." Once a memory had been retrieved, youth were told to spend 3 min thinking about that memory (exposure). After a 1-min rest period, participants narrated their story to a trained research assistant (regulate). Following another 1-min rest period, participants were asked to think again about the nominated memory (re-exposure). Following each task, participants filled out a questionnaire to assess self-reported emotional responding. After re-exposure, the physiological equipment was removed. Participants completed a manipulation check, provided a title for their angry memory, and reported how long ago the memory occurred. Participants were then compensated for their time and excused. For full details on the experimental protocol see the original publication (Wainryb et al., 2018).

Following assent and consent procedures, mothers were placed in a room by themselves and asked to complete a computerized survey consisting of 11 questionnaires. Of interest to this study was the Emotion Related Parenting Questionnaire-Short (Gottman and DeClaire, 1997), which is described in detail below. Other questionnaires administered but not part of this study are listed in the Measures section below.

### Measures

#### Youth

### **Self-reported emotional responding when recalling and narrating about past events**

Following each of the five tasks in the study (vanilla baseline, talking baseline, exposure, regulation, re-exposure) participants reported the extent to which they felt angry, scared, ashamed, sad and guilty on a scale of 1 (not at all) to 7 (extremely). In recent physiological research, the vanilla baseline technique has replaced the resting baseline period with a simple, minimally demanding task to maintain consistent alertness and baseline stability (Jennings et al., 1992). Scores for all emotions were summed and divided by 5 to compute a "Negative Emotional Responding" variable for each task. Scores were standardized within person by finding the standard deviation of all five scores and then dividing each score's difference from the participant's mean by that standard deviation.
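The scoring just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the example ratings are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

EMOTIONS = ("angry", "scared", "ashamed", "sad", "guilty")


def negative_emotional_responding(ratings):
    """Average the five 1-7 emotion ratings reported after one task."""
    return sum(ratings[e] for e in EMOTIONS) / len(EMOTIONS)


def standardize_within_person(task_scores):
    """Z-score one participant's task scores against their own mean and SD."""
    m = mean(task_scores)
    sd = stdev(task_scores)  # sample SD (n - 1): an assumption, not stated in the text
    if sd == 0:
        raise ValueError("no variation across tasks; z-scores are undefined")
    return [(score - m) / sd for score in task_scores]


# Hypothetical ratings for one participant, one dict per task, in task order:
# vanilla baseline, talking baseline, exposure, regulation, re-exposure.
ratings_by_task = [
    {"angry": 1, "scared": 1, "ashamed": 1, "sad": 1, "guilty": 1},
    {"angry": 2, "scared": 1, "ashamed": 1, "sad": 1, "guilty": 1},
    {"angry": 6, "scared": 2, "ashamed": 2, "sad": 4, "guilty": 2},
    {"angry": 4, "scared": 1, "ashamed": 2, "sad": 3, "guilty": 1},
    {"angry": 5, "scared": 2, "ashamed": 2, "sad": 3, "guilty": 2},
]

raw_scores = [negative_emotional_responding(r) for r in ratings_by_task]
z_scores = standardize_within_person(raw_scores)
for raw, z in zip(raw_scores, z_scores):
    print(f"raw = {raw:.2f}, within-person z = {z:+.2f}")
```

The same `standardize_within_person` helper could be applied to a participant's five task-average SCL values, since the text describes the identical standardization for that measure.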

### **Skin conductance level (SCL)**

Skin conductance level (SCL) was measured and analyzed using a Biopac MP 150 system. Skin conductance was recorded continuously throughout each session, with task onsets and offsets also recorded. Average SCL was computed as an index of sympathetic nervous system arousal for each task. Scores were standardized within person by finding the standard deviation of all five scores and then dividing each score's difference from the participant's mean by that standard deviation.

### **Self-report and physiological congruence score**

For the youth participants, the primary measure of interest for this study was a congruence score between self-reported emotional responding and physiological measures of emotional responding. This congruence score was derived by computing the correlation between two measures: subjective emotional responding across the two baselines and the exposure, regulation, and re-exposure epochs of the experiment, and the physiological reports of sympathetic nervous system arousal during the same time periods. We correlated youth's self-reported negative emotional responding with the task-average SCL reading across the five tasks: vanilla baseline, talking baseline, exposure, regulation, and re-exposure. Higher correlations are indicative of more congruence between a youth's self-report and physiological responding. Thus, for a participant with a positive congruence score, when their self-reported negative emotional responding increased, so did SCL. For a participant with a negative congruence score, when SCL increased, self-reported emotional arousal decreased. Correlations in this sample ranged from r = −0.84, indicating that when participants' SCL increased, self-reported emotional responding decreased, to r = 1.0, indicating that when SCL increased, self-reported emotional responding also increased.
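As a rough sketch of how such a per-participant congruence score could be computed, the following Python snippet calculates a Pearson correlation across the five task scores. The function name and the example values are hypothetical, and treating the correlation as Pearson's r is an assumption consistent with the reported range of r values rather than something the text states explicitly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for one participant across the five tasks, in order:
# vanilla baseline, talking baseline, exposure, regulation, re-exposure.
self_report = [1.0, 1.2, 3.2, 2.2, 2.8]   # negative emotional responding
scl_average = [4.1, 4.6, 6.3, 5.9, 6.0]   # task-average SCL (arbitrary units)

congruence = pearson_r(self_report, scl_average)
print(f"congruence score r = {congruence:.2f}")
```

Because Pearson's r is unchanged by shifting or positively rescaling either series, the score is the same whether it is computed on raw task scores or on the within-person standardized scores.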

### Mothers

Mothers completed the Emotion Related Parenting Scale, Short (ERPS-S; Gottman and DeClaire, 1997), a 20-item questionnaire that results in four scale scores, each representing a different parenting style as described in meta-emotion theory (Gottman and DeClaire, 1997). Responses for each scale are summed and divided by the total number of items for that subscale. Each scale is comprised of three items. The resulting four scales are labeled (1) emotion coaching scale (Cronbach's alpha = 0.73), (2) feelings-of-uncertainty/ineffectiveness scale (Cronbach's alpha = 0.73), (3) parental rejection of negative emotion (Cronbach's alpha = 0.64), and (4) parental acceptance of negative emotion scale (Cronbach's alpha = 0.74). Measures obtained but not used in this study include: Children's Reports of Parental Behavior Inventory (CRPBI-30; Schuldermann and Schuldermann, 1970), The Berkeley Expressivity Questionnaire (Gross and John, 1997), The 10-item Emotion Regulation Questionnaire (Gullone and Taffe, 2012), BRIEF, Strategies of Anger Regulation in Adolescents (SAR-C) (von Salisch and Vogelgesang, 2005), and about themselves: Strategies of Anger Reduction (SAR-A) (von Salisch and Vogelgesang, 2005), SARI (Sadness and Anger Rumination Index, Peled, 2006), Buss Perry aggression questionnaire (Buss and Perry, 1992), Big Five Inventory (Goldberg D. T., 1993; Goldberg L. R., 1993); Test of Self-Conscious Affect, TOSCA (Tangney, 1991), Experiences in Close Relationships, ECR-S (Wei et al., 2007).
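For readers unfamiliar with the scoring and reliability figures above, the sketch below shows a mean-of-items scale score and the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function names, the three-item layout, and the example responses are hypothetical; they do not reproduce the actual questionnaire items or the study data.

```python
from statistics import variance


def scale_score(item_responses):
    """Sum one respondent's item responses and divide by the number of items."""
    return sum(item_responses) / len(item_responses)


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    `responses` holds one list of item scores per respondent.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([resp[i] for resp in responses]) for i in range(k)]
    total_variance = variance([sum(resp) for resp in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five mothers on a three-item scale.
rejection_items = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 4],
]

scores = [scale_score(resp) for resp in rejection_items]
alpha = cronbach_alpha(rejection_items)
print("scale scores:", [round(s, 2) for s in scores])
print(f"Cronbach's alpha = {alpha:.2f}")
```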

### STUDY 2 RESULTS

A multiple regression was run wherein mothers' scores on the parental acceptance of negative emotion and the parental rejection of negative emotion subscales of the Emotion Related Parenting Scale, Short (ERPS-S; Gottman and DeClaire, 1997) were entered as predictors of a youth's congruence score. Youth's age and gender were also entered in the model. Correlations are reported in **Table 4**. R<sup>2</sup> for the overall model was 8.4%, with an adjusted R<sup>2</sup> of 4.7%. The model was statistically significant, F(1,107) = 2.92, p = 0.05. Results indicate that a mother's score on the 'Rejection of Negative Emotion' subscale significantly predicted a youth's congruence score, such that the more a mother endorses items indicating that she rejects her youth's negative emotions, the less her youth's self-reported emotional responding scores are congruent with their physiological measures of responding. The regression equation was: Congruence score = 0.33<sup>∗</sup> (−0.03) (Mother's score on parental rejection of emotion scale). Regression coefficients and standard errors are reported in **Table 5**.
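To make the structure of this analysis concrete, the following is a minimal ordinary least-squares sketch in plain Python with four predictors (rejection, acceptance, age, gender) and the congruence score as outcome. It is an illustration under stated assumptions only: the statistical software used in the study is not named in the text, the helper names are hypothetical, and the eight data rows are invented for demonstration and are not the study data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Returns (coefficients with the intercept first, R^2, adjusted R^2).
    """
    n = len(outcome)
    p = len(predictors[0])
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)  # normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    mean_y = sum(outcome) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2


# Invented rows: [rejection score, acceptance score, age in years, gender (0/1)].
rows = [
    [1.0, 4.3, 8, 0],
    [1.7, 3.7, 10, 1],
    [2.0, 4.0, 12, 0],
    [2.3, 3.0, 9, 1],
    [3.0, 3.3, 15, 0],
    [3.3, 2.7, 13, 1],
    [4.0, 2.3, 17, 0],
    [4.3, 3.0, 11, 1],
]
congruence_scores = [0.62, 0.40, 0.55, 0.10, 0.05, -0.20, -0.35, -0.10]

coefficients, r_squared, adjusted_r_squared = fit_ols(rows, congruence_scores)
names = ["intercept", "rejection", "acceptance", "age", "gender"]
for name, value in zip(names, coefficients):
    print(f"{name:>10}: {value:+.3f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}")
```

The adjusted R<sup>2</sup> line applies the usual correction 1 − (1 − R<sup>2</sup>)(n − 1)/(n − p − 1), which penalizes the fit for the number of predictors p.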

### STUDY 2 DISCUSSION

Study 2 investigated whether attachment-related processes, operationalized as parental acceptance or rejection of negative emotion, could predict congruence between youth's objective and subjective measures of emotional responding. This congruence score operates as a measure – albeit imperfect – of interoceptive functioning (see Kleckner et al., 2017). We found that greater maternal rejection of negative emotion was associated with lower congruence between a youth's objective and subjective measures of emotional responding.

The results of Study 2 support hypothesis H2a by demonstrating that the higher a mother's score on the Rejection of Negative Emotion Scale, the lower a youth's congruence score. This means that the more rejecting a mother is of negative emotion, the lower the relation between her youth's emotional and physical responding. This pattern is reminiscent of the classic pattern of psychophysiological responding typical of avoidantly attached individuals (Diamond, 2001; Gross and Thompson, 2007) wherein they minimize self-reports of distress while demonstrating higher than average levels of physiological distress (Dozier and Kobak, 1992; Roisman et al., 2007). This makes sense when interpreted from within the attachment literature. Children with a secure attachment should feel more comfortable acknowledging their distress. In this situation we would expect self-reports and physiological reports to display a higher level of congruence. By contrast, a child with an avoidant attachment may minimize their distress and a child with an anxious attachment may maximize their distress, resulting in lower congruence scores.

Unexpectedly, mothers' acceptance of negative emotion did not predict higher congruence scores. The absence of a relation between these two suggests that parental acceptance of negative emotion may not be the best tool to assess sensitive caregiving. Further analyses showed that the parental acceptance of negative emotion scale was not significantly correlated with mothers' self-reported maternal warmth. By contrast, parental rejection of negative emotion was significantly correlated with maternal warmth. Specifically, the more rejecting of negative emotion a mother reported being, the less self-reported maternal warmth (r = −0.18, p = 0.04).

### GENERAL DISCUSSION

Although a great deal of research has transpired in the last decade advancing our understanding of interoception, the literature has not considered how interoception develops; and yet, there has been indirect evidence that supports the notion that interoception, a very embodied phenomenon, has social origins. From this perspective, interoception develops initially in the context of interpersonal relationships. To the extent that caregivers recognize, honor, and respect their children's bodily experiences, the child will develop more accurate interoception. To the extent that a child's bodily experiences are denied, devalued, ignored, or punished by parents, the child will find ways to avoid feeling them, and develop a distorted sense of interoception.

In this paper we have demonstrated that interpersonal relationships (e.g., attachment styles) are associated with later interoceptive functioning, such that an attuned caregiver lays better groundwork for future interoception. In Study 1 we show that attachment style is linked to interoception generally. We also begin to tease apart some of the more nuanced and interesting ways in which non-attuned caregiving can result in problematic interoception. These results suggest that different types of non-attuned caregiving may result in distinct patterns of interoceptive functioning later in life. While not addressed in this paper, this question will be an important one for future researchers to ask. In Study 2 we show these same links, but with a more direct assessment of caregiving and a more direct assessment of interoception. Primarily, in Study 2, we examine the effects of parental rejection of negative emotion. Within the attachment literature this type of dismissive parenting is associated with an avoidant attachment style in children. Why would dismissive parenting be associated with lower interoceptive awareness?

Attachment theorists like Stern (1985) and Fonagy (2001) have argued that for the child to know their own mind, they need to see it reflected in a sensitive caregiver. Here, we contend that for the child to know their own body, they also need to see it reflected in a sensitive caregiver. For example, when a child who is learning to walk falls down and feels physical pain, a parent who acknowledges the child's discomfort with a statement along the lines of "Ouch! That must have hurt" is arguably promoting greater interoceptive awareness in their child than a parent who exclaims, "You're fine! That didn't hurt! Get back up!" The mirroring received by the child in the first instance should allow a child to become confident in their ability to detect bodily cues and comfortable with the acknowledgment and expression of them. This promoting of interoception arises from the parent noticing what the child is experiencing, drawing joint attention to the feeling, and labeling it – processes that can be examined in greater detail in future work considering social antecedents to interoception.

Finally, while this study was unable to examine the neurobiological links between attachment-related processes and interoception directly, the extant findings in the literature provide ample evidence that early attachment-related experiences, including trauma, shape the neural structure that underlies interoception, including the anterior cingulate cortex (van der Werff et al., 2013; Teicher et al., 2014) and the orbitofrontal cortex (Schore, 2005). For example, Schore said in 2005, "The orbitofrontal cortex is the hierarchical apex of the limbic system and is identical to Bowlby's control system of attachment" (Schore, 2005, p. 216). Thus its functioning is correlated with early caregiving experiences. The orbitofrontal cortex (OFC) is also the critical brain region for the subjective evaluation of bodily stimuli (Bechara et al., 2000; O'Doherty et al., 2001; Schoenbaum et al., 2003, 2006; Kringelbach, 2005; Rolls and Grabenhorst, 2008). Once bodily cues are felt and noticed, the OFC may be responsible for how an individual interprets them. The OFC also plays an inhibitory role in autonomic functioning, allowing it to be a central player in the process of affect regulation (Fuster, 2001; Schore, 2005). People with an underdeveloped OFC demonstrate greater distress in the face of novel or aversive stimuli (Schore, 2005) and more anxiety-related hyperactivation of the interoceptive network. Thus, an underdeveloped OFC typically corresponds with an exaggeration of the importance of bodily cues and the tendency to attribute deleterious implications to benign physical cues.

In closing, we establish the idea that a link exists between attachment related processes and the development of interoception. Attachment related processes are thought to affect the development of interoception by influencing the growth of neural architecture and by modifying functioning of the HPA axis. Further, the idea that caretaking behaviors affect children's development of interoception is presented. We argue that by continuing to examine the links between social and biological factors, we will begin to build a foundational understanding of how interoception develops.

Future research investigating the relation between interoception and attachment-related processes could address the following issues. The first refers to the association between self-reported emotional responding and physiological measures of responding, and the extent to which the congruence between the two can be considered a proxy for interoceptive functioning. The second relates to how early social experiences with a primary caregiver could influence the development of the interoceptive network of the brain. The third focus for future research should be to establish how parenting style, specifically in relation to the socialization of bodily cues, could account for variations in interoceptive functioning. Finally, it is of crucial importance in all of this work that we develop a reliable method of quantifying interoception across the lifespan that will facilitate longitudinal developmental studies.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board at the University of Utah with written informed consent from all subjects. After meeting with study staff, both mothers and youth gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board at the University of Utah.

### AUTHOR CONTRIBUTIONS

KO developed the theoretical idea and performed the analytic calculations. KO, MP, and CW contributed to the final version of the manuscript.

### REFERENCES


relationship to dimensional measures of body awareness. Hum. Brain Mapp. 38, 6068–6082. doi: 10.1002/hbm.23811


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Oldroyd, Pasupathi and Wainryb. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploring Self-Consciousness From Self- and Other-Image Recognition in the Mirror: Concepts and Evaluation

Gaëlle Keromnes<sup>1</sup>\*, Sylvie Chokron<sup>2</sup>, Macarena-Paz Celume<sup>1,3</sup>, Alain Berthoz<sup>4</sup>, Michel Botbol<sup>5</sup>, Roberto Canitano<sup>6</sup>, Foucaud Du Boisgueheneuc<sup>7</sup>, Nemat Jaafari<sup>8,9</sup>, Nathalie Lavenne-Collot<sup>5</sup>, Brice Martin<sup>10</sup>, Tom Motillon<sup>1</sup>, Bérangère Thirioux<sup>4,8</sup>, Valeria Scandurra<sup>6</sup>, Moritz Wehrmann<sup>11</sup>, Ahmad Ghanizadeh<sup>12</sup> and Sylvie Tordjman<sup>1,2</sup>\*

<sup>1</sup> Pôle Hospitalo-Universitaire de Psychiatrie de l'Enfant et de l'Adolescent (PHUPEA), Centre Hospitalier Guillaume Régnier, Université de Rennes 1, Rennes, France, <sup>2</sup> Laboratoire de Psychologie de la Perception (LPP), Université Paris Descartes, CNRS UMR 8242, Paris, France, <sup>3</sup> Laboratoire de Psychologie et d'Ergonomie Appliquées (LaPEA), Université Paris Descartes, UMR T7708, Boulogne Billancourt, France, <sup>4</sup> Laboratoire de Physiologie de la Perception et de l'Action, Collège de France, CNRS UMR 7152, Paris, France, <sup>5</sup> CHU de Brest – Service Hospitalo-Universitaire de Psychiatrie, CHU de Brest, Hôpital de Bohars, Bohars, France, <sup>6</sup> Child and Adolescent Neuropsychiatry, University Hospital of Siena, Siena, Italy, <sup>7</sup> Département de Neurologie, Centre de Mémoire de Ressource et de Recherche, CHU de Poitiers, Poitiers, France, <sup>8</sup> Université de Poitiers, Unité de Recherche Clinique Intersectorielle en Psychiatrie à Vocation Régionale Pierre-Deniker du Centre Hospitalier Henri Laborit, Poitiers, France, <sup>9</sup> INSERM U 1084, Experimental and Clinical Neurosciences Laboratory, Groupement de Recherche, CNRS 3557, Poitiers, France, <sup>10</sup> Service Universitaire de Réhabilitation, Hôpital du Vinatier, Université Lyon 1, CNRS UMR 5229, Lyon, France, <sup>11</sup> International Research Institute for Cultural Techniques and Media Philosophy, Bauhaus-Universität Weimar, Weimar, Germany, <sup>12</sup> Department of Neuroscience, Research Center for Psychiatry and Behavioral Sciences, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran

A historical review of the concepts of self-consciousness is presented, highlighting the important role of the body (particularly, body perception but also body action), and the social other in the construction of self-consciousness. More precisely, body perception, especially intermodal sensory perception including kinesthetic perception, is involved in the construction of a sense of self allowing self-other differentiation. Furthermore, the social other, through very early social and emotional interactions, provides meaning to the infant's perception and contributes to the development of his/her symbolization capacities. This is a necessary condition for body image representation and awareness of a permanent self in a time-space continuum (invariant over time and space). Self-image recognition impairments in the mirror are also discussed regarding a comprehensive developmental theory of self-consciousness. Then, a neuropsychological and neurophysiological approach to self-consciousness reviews the role of complex brain activation/integration pathways and the mirror neuron system in self-consciousness. Finally, this article offers new perspectives on self-consciousness evaluation using a double mirror paradigm to study self- and other-image and body recognition.

Keywords: self, self-consciousness, body-self, body image, body perception, intermodal sensory perception, body action, development

#### Edited by:

Eszter Somogyi, University of Portsmouth, United Kingdom

#### Reviewed by:

Shuichi Nishio, Advanced Telecommunications Research Institute International (ATR), Japan

J. Scott Jordan, Illinois State University, United States

#### \*Correspondence:

Gaëlle Keromnes, g.keromnes@ch-guillaumeregnier.fr

Sylvie Tordjman, s.tordjman@ch-guillaumeregnier.fr; s.tordjman@yahoo.fr

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 03 September 2018; Accepted: 15 March 2019; Published: 07 May 2019

#### Citation:

Keromnes G, Chokron S, Celume M-P, Berthoz A, Botbol M, Canitano R, Du Boisgueheneuc F, Jaafari N, Lavenne-Collot N, Martin B, Motillon T, Thirioux B, Scandurra V, Wehrmann M, Ghanizadeh A and Tordjman S (2019) Exploring Self-Consciousness From Self- and Other-Image Recognition in the Mirror: Concepts and Evaluation. Front. Psychol. 10:719. doi: 10.3389/fpsyg.2019.00719

### INTRODUCTION

Self-consciousness can be defined for an individual as the awareness of his/her own body in a time-space continuum and its interactions with the environment – including others. It also encompasses the awareness that the individual has of his/her own identity, built over time in interaction with others. It is at the root of higher level processes, such as the theory of mind or empathy, processes that allow us not only to be aware of others but also to differentiate ourselves from them, from their image and from their perceptive and emotional experiences (Decety and Sommerville, 2003; Rochat, 2003).

Self-consciousness is at the intersection of different disciplines, such as neurophysiology, psychiatry, psychology/neuropsychology, psychoanalysis and philosophy, which puts it at the center of many research topics. Many authors have highlighted the crucial role of the body in the development of self-consciousness, both as an interface with the environment and as an actual part of the self (body-self) (Damasio et al., 2000; Ionta et al., 2011). In addition, the importance of the mirror in psychoanalytic and psycho-developmental models of self-consciousness strengthens the place of body-self in the construction of self-image recognition that implies self-consciousness (Wallon, 1934).

We propose, in this article, first to conduct a historical review of research on the concepts of self-consciousness, including the development of self-consciousness and the role of the body (especially, body perception but also body action) and the social other in the construction of self-consciousness. In this perspective, neuropsychological and neurophysiological approaches to self-consciousness will be developed. Second, the importance of self-image and self-recognition in the mirror will be underlined, especially with regard to the interest of the mirror in the evaluation of self-consciousness.

### CONCEPTS OF SELF-CONSCIOUSNESS AND SELF-IMAGE

### Theoretical Bases: Conceptualizing the Self

For centuries, theorists have sought to understand and define self-consciousness (Maine de Biran, 1834; Piaget, 1936; Wallon, 1959b; Merleau-Ponty, 1964; Vygotsky, 1978; Neisser, 1991; Rochat, 2003). One of the theoretical models, developed especially by Piaget (1936) and Merleau-Ponty (1964), is that of an innate self-consciousness, at least in its bodily dimension. In this model, the subject is born a subject and knows him/herself as a subject. His/her subsequent psychic development concerns how he/she will build and shape the world around him/her. According to Piaget (1936), the interactions between the child and the environment around him/her are governed by the rhythm at which the maturation of his/her central nervous system takes place. Piaget's fundamental work in developmental psychology has influenced many of his successors, including Merleau-Ponty (1964), whose philosophical work has partly focused on the phenomenology of perception. Some of the notions Merleau-Ponty developed converged with Piaget's ideas. Notably, he insisted on the importance of the other for the individual, while assuming a certain degree of innate self-consciousness, especially a body schema already present in the child at a very young age, thanks to which he/she will be able to interact with others.

At the beginning of the twentieth century, different currents of thought appeared, principally developed by Vygotsky (1933). Vygotsky's ideas departed from Piagetian theories by questioning the Piagetian egocentric stage, which places the child at the center of the development of the self and of representations of the surrounding world. For Vygotsky, the child is not the main agent of this construction; he/she is a learner, and it is the presence of the other and of the external environment that will allow him/her to build him/herself as an individual (Vygotsky, 1978).

These ideas were taken up by many contemporaries. In France, the influence of Wallon (1959b) was particularly important. Through his work, Wallon explores a possible co-construction of self-consciousness and consciousness of the other based on the individual's interactions with the external environment. Wallon focuses on the importance of the other's presence alongside the child in his/her self-construction. The child learns attitudes from others, first by simple mimicry and reciprocal emotional contagion. According to Wallon, this emotional reciprocity signals the initial impossibility for the child to dissociate him/herself from others during the first months of life. He/she would not be aware of being a separate individual from his/her parents. It is through a game of reciprocal stimulation and alternation that the child finally would become aware of the boundary that separates him/her from the other, from his/her own ego. For Wallon, this development of self-consciousness ends at the age of 3, an age of crisis at which the child can affirm him/herself and set his/her own desires and ideas against those of others. This approach dominated currents of thought during the following decades.

Neisser (1991) described two distinct ways of building the self: (a) Through body perceptions and interactions with the environment, and (b) Through the relationship with others. Furthermore, other authors such as Prinz (2013) discussed the notion of social mirroring and proposed that social mirroring is a prerequisite for the constitution of mental selves. Finally, Webb and Graziano (2015) highlighted the important role of attention processing in the development of self-consciousness with regard to external and internal environmental stimuli.

A summary of conceptualizations of the self is presented in **Table 1**.

More global approaches to the concept of self-consciousness have been proposed more recently, notably by Damasio et al. (1999), whose work focuses on the neural basis of cognition and behavior. Damasio has been one of the most active researchers in the field of awareness/consciousness, exploring the mechanisms that underlie it. In this work, he summarizes ideas developed over his various studies by proposing three distinct levels of awareness/consciousness that ultimately lead to self-consciousness: (a) primary consciousness, (b) reflexive consciousness, and (c) self-consciousness. Primary consciousness is a vigilant core consciousness that develops between 6 months and 1 year of age and allows the infant to function in his/her environment even when he/she is not yet able to differentiate between him/herself and the rest of the world. Primary consciousness is shared by other animal species with sensory organs and a complex brain (Jones and Mormede, 2002). Reflexive consciousness allows the individual to understand that he/she is the one who directs his/her own actions and thoughts, and that he/she controls his/her

#### TABLE 1 | Conceptualizing the self.

fpsyg-10-00719 May 4, 2019 Time: 16:17 # 3


reasoning and behavior. It corresponds to the consciousness of not being the other and would be shared by humans and great apes. Self-consciousness, the highest level of consciousness, refers to the ability to appropriate one's own history and to be aware of a unity of the self that persists despite the passage of time and environmental changes. According to Damasio, this does not appear before the age of 2 in human beings. Interestingly, the age of 2 also corresponds to the development of language in the child.

The notion of primary consciousness developed by Damasio et al. (1999) can be compared to the concept of "minimal self." The minimal self represents the most basic level of the self. It refers to self-consciousness as a subject of an immediate experience and pre-reflexive origin of action, experience and thought (Gallagher, 2000). The pre-reflexive "it is mine" (or feeling of belonging; of "mineness") of a conscious experience, is a central characteristic of the minimal self. Therefore, the minimal self can be differentiated from more elaborate aspects of the self, such as the reflexive self (explicit consciousness of an "I") and the narrative self (experience of a self with specific characteristics and one's personal history).

More recently, Decety and Sommerville (2003) presented their conceptualization of self in a literature review. This conceptualization reflects in some ways the stratification model described by Damasio. Accordingly, the construction of the self is a multidimensional and evolving process that takes place from infancy and develops throughout the first years of life. This process involves physical, psychological and social factors and allows the development of different types of consciousness with different levels. Decety and Sommerville highlighted the cognitive dimension of self-development involving shared self-other representations, ultimately leading to self-other differentiation.

The different levels, types, contents and alterations of self-consciousness are summarized in **Table 2**.

In light of these different approaches, two fundamental concepts can be highlighted. The first is that self-consciousness is acquired mainly through the differentiation between oneself and the other, with the recognition of each other's identities. The second is that the body, an interface between oneself and the other, is one of the essential keys to this process. This second concept is developed in the next section.

Concerning the first concept, self-consciousness would be built in relation to the other (relational dyad), through relational and emotional synchronization, and through the other's eyes (Wallon, 1984; Feldman, 2007; Haag et al., 2005, 2010). The mother looks at the baby and the baby looks at the mother, but he/she also sees his/her reflection in the eyes of the mother. Self-consciousness, with the integration of body image, also develops through imitation of the other, ranging, for example, from the newborn's simple imitation of sticking out the tongue to more complex developments including the child's verbal language (Nadel-Brulfert and Baudonnière, 1982; Nadel et al., 1983; Nadel, 2011).

Here we could hypothesize that self-consciousness is built up through the imitation of the other, with the representation of what is identical through synchronization, but also with the representation of what is different. Later, the appearance of a gendered body, which refers to sexual differentiation, probably gives this process a new impetus in adolescence.

## The Role of the Body in Self-Consciousness

#### Body Perception and Self-Consciousness

Today, links between body and mind seem to be well established. Clinical practice is a daily reminder of this, given the frequency of psychosomatic manifestations observed in patients with psychiatric disorders (Testa et al., 2013). The body appears here both as reflecting psychological problems and as central to the psychological process of self-consciousness (Gernet, 2007).

For Wallon (1959a), the newborn child sees him/herself as dislocated, made of distinct parts and limbs, and only gradually comes to see them unified in a coherent body. The consciousness of a body self would be an indispensable prerequisite for the construction of the child's personality. This concept was first introduced in 1794 under the name of cenesthesia. Hübner defined cenesthesia as a general sensibility that represents to the soul the state of its body, whereas the sensibility informs the soul about the external world and the internal sense provides representations, judgments, ideas and concepts (Starobinski, 1977). Wallon later described it more simply, designating under this term two types of sensibility: an internal and visceral sensibility, and a proprioceptive and postural sensibility, whose joint action would be responsible for kinesthetic sensations.

This concept of cenesthesia evolved toward the concept of body schema at the end of the nineteenth century, following

TABLE 2 | Levels, types, contents, and alterations of self-consciousness (based on Damasio et al., 1999; Decety and Sommerville, 2003; Rochat, 2003; Parnas and Henriksen, 2014; Keromnes et al., 2017).


<sup>∗</sup>Five levels (Rochat, 2003) in contrast to a level zero corresponding to a level of confusion with absence of self-consciousness.

Bonnier's (1904) clinical observations on impairments in the perception of one's own body resulting from certain neurological lesions or benign disorders such as vertigo. The existence of a normal perceptual system to which these anomalies would be related was then hypothesized. This perceptual system would correspond to what Head and Holmes et al. (1911–1912) conceptualized shortly thereafter under the term body schema. This body schema would gradually form in the first months of life, when the child seems to have a keen interest in exploring his/her own body through touch, as well as his/her immediate environment. Through this exploration, the child gradually learns the boundaries between his/her body and his/her surroundings. Although several studies have underlined the risk of body schema deficits in children with severe visual deficits (for a review, see Lueck and Dutton, 2015), Head and Holmes rejected any participation of the optical pathways in the acquisition of this body schema, which reflects an overall intuition concerning the present position of the body in space.

This last remark emphasizes the fundamental difference between this body schema concept and the self-image concept introduced by Schilder (1968). The first one is based on postural elements whereas the other refers to the symbolic and affective experience based above all on a visual perception of oneself. The two, however, are not dissociated given that they contribute together to the constitution of the body-self.

Like the body schema, self-image is not innate but is acquired gradually. The concept of self-image is affectively and symbolically charged. Self-image is not just an observed image. It also includes how individuals represent their own body in their mind and how others represent that body. This representation is initially formed as part of the interactions between the child and the other, as if the child could first see him/herself through the eyes of the other before being able to imagine his/her own body. Various authors have described the important role of the mirror in the construction of self-image and, more broadly, the self. This aspect will be detailed later.

This overview of ideas which were developed for over a hundred years shows the importance of the body in major theories of developmental psychology. Indeed, more recent authors, such as Damasio et al. (1999), have highlighted the central place occupied by the body in the phenomenon of consciousness. They support the idea that conscious thinking is primarily based on our visceral perceptions. In their model, they developed different possible levels of self-consciousness, placing bodily perception before any level of consciousness. The perception of the external world described in primary consciousness becomes possible only if this fundamental bodily perception is operational.

In recent years, many authors have also helped improve our understanding of the pathways involved in self-consciousness, with a very specific focus on "body self-consciousness" (Aspell et al., 2009; Pasqualini et al., 2013). This concept is divided into three dimensions: self-localization, first-person perspective, and self-identification.

Different experimental studies have shown that bodily self-consciousness is malleable. Sforza et al. (2010), for example, studied facial recognition in healthy individuals using a simple experimental paradigm. An examiner touched the subject's face while the subject observed the same action being applied simultaneously to another person's face. The results of this study showed that subjects frequently misidentified the image of the other as their own. Similarly, by manipulating visual-tactile inputs, an illusory feeling of ownership can be induced for an artificial hand (rubber hand illusion; Botvinick and Cohen, 1998). Indeed, viewing another person's hand or face being stroked in synchrony with strokes applied to our own corresponding non-visible hand or face can induce illusory self-attribution of the visible hand or face. Moreover, participants perceive their hand to be at a position that is displaced toward the fake hand's position (proprioceptive drift) or judge another person's face as similar to their own face. Many experimental paradigms have shown similar results, suggesting that bodily self-consciousness may waver under contradictory sensory stimulation. In addition, as discussed below, our motor actions may also contribute to self-consciousness.

#### Body Action and Self-Consciousness

As recently underlined, the motor control system in the brain not only controls complex actions but is also involved in body representation (Murata et al., 2016). More precisely, Murata et al. (2016) showed that the motor control system contributes to the perception of the hands as part of one's own body. According to Gallagher (2005), the perception of one's own body is the fundamental process of self-recognition. In this way, the hands are not only effectors in movement, but could be considered as a link between the mind and motor control. Along those lines, what is now called the sense of agency can be viewed as the subjective awareness that a generated action is attributed to oneself. The sense of agency occurs when an executed action is recognized as being generated by one's own body. The sense of agency is thus expected to occur exclusively during voluntary movement. According to Blakemore et al. (1999), a copy of the motor command, the efference copy, is passed to a forward model that predicts the sensory feedback expected from that command. The comparison between the actual sensory feedback and this prediction (the corollary discharge) contributes both to the precision of the movement and to the recognition of who generated the observed action. In turn, the sense of agency contributes to the construction of self-consciousness through the production and control of motor actions.
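
The comparator logic described above can be illustrated with a toy sketch. The Python snippet below is not taken from Blakemore et al. (1999): the linear forward model, the gain, the threshold, and all function names are hypothetical choices made only to show how a predicted feedback (derived from the efference copy) might be compared with the actual feedback to decide whether an action is attributed to oneself.

```python
from dataclasses import dataclass


@dataclass
class ForwardModel:
    """Toy forward model: predicts sensory feedback from a motor command.

    The linear mapping and the gain value are illustrative assumptions.
    """
    gain: float = 1.0

    def predict(self, efference_copy: float) -> float:
        # Corollary discharge: the feedback expected from the issued command.
        return self.gain * efference_copy


def prediction_error(predicted: float, actual: float) -> float:
    """Absolute mismatch between predicted and actual sensory feedback."""
    return abs(actual - predicted)


def attribute_agency(motor_command: float, sensory_feedback: float,
                     model: ForwardModel, threshold: float = 0.2) -> bool:
    """Return True when the feedback is close enough to the prediction
    to be attributed to one's own action (the threshold is hypothetical)."""
    predicted = model.predict(motor_command)
    return prediction_error(predicted, sensory_feedback) <= threshold


model = ForwardModel(gain=1.0)
self_generated = attribute_agency(1.0, 1.05, model)    # small mismatch
externally_caused = attribute_agency(1.0, 1.8, model)  # large mismatch
print(self_generated, externally_caused)
```

In this sketch, a small mismatch leads to self-attribution, whereas a large mismatch leads to attribution to an external cause. Real comparator models are probabilistic and multisensory; the single scalar signal is used here only for readability.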

Several neuropsychological and neuroimaging experiments have revealed that the inferior parietal cortex is involved in the sense of agency (for a review, see Murata et al., 2016). Indeed, as demonstrated by Sirigu et al. (1999), patients with lesions of the inferior parietal cortex present deficits in agency recognition. Similarly, a few human brain imaging studies have reported that the inferior parietal cortex is involved in detecting the agency of an action in healthy participants (Farrer et al., 2003, 2008; Decety and Grèzes, 2006; Chambon et al., 2013). Interestingly, as we will discuss below, parietal lesions are also responsible for spatial neglect as well as somatoparaphrenia, two neuropsychological deficits that can also affect self-consciousness.

## Neuropsychological Approach to Self-Consciousness

#### Disturbed Sense of Agency and Disturbed Body Ownership

As discussed above, the sense of body ownership, as well as the awareness of being causally involved in an action, the sense of agency, have been mostly investigated in healthy participants by using experimental as well as functional neuroimaging methods (Farrer et al., 2003; Tsakiris et al., 2007). A complementary approach is the study of neurological patients showing specific neuropsychological disturbances of these senses after brain damage.

As discussed below, brain-damaged patients, especially after a stroke, may present anosognosia for hemiplegia (AHP) affecting the sense of ownership and therefore self-consciousness. Brain-damaged patients with AHP typically deny the weakness of their paretic or plegic contralesional limb and are convinced that they move properly. These patients may also show a disturbed sense of ownership with respect to the paretic/plegic limb. They experience their contralesional limb as not belonging to them and may even attribute it to other people. This deficit is often called "somatoparaphrenia." AHP is thus characterized by the patients' false belief that they are not paralyzed. Their feeling of being or not being causally involved in an action, their sense of agency, is thus dramatically disturbed. As incredible as it may appear, despite the obvious fact that the contralesional limb is severely paralyzed, these patients behave as if the disorder did not exist. When they are asked to move the paretic/plegic arm or leg, they may do nothing or may move the limb of the opposite side. However, in both situations, they are either convinced that they have successfully executed the task or may argue that they can move in a generic manner. Interestingly, although they are unable to move their contralesional limb when asked to do so, they may explain this inability either by confabulations ("I could move it yesterday, but my arm is now tired") or by external causes ("the ground is slippery, and I cannot walk on it") (Nathanson et al., 1952). Regarding the neuroanatomical correlates of AHP, several studies have suggested that the right insular cortex might be a crucial anatomical region for integrating input signals related to self-awareness about the functioning of body parts (for a review, see Karnath and Baier, 2010).
In addition, in line with this hypothesis, converging evidence has been reported that the anterior insular cortex is also a central structure for pain mechanisms and temperature regulation (Craig et al., 1996; Kong et al., 2006). In this way, the anterior insular cortex could well represent an important correlate of human "interoception" as well as a crucial cortical area for body ownership, for the sense of agency and more generally for self-consciousness (Craig, 2002, 2009). Moreover, the anterior insular cortex was suggested to be involved in other cognitive and emotional processes that could well contribute to self-consciousness, such as feelings of anger or anxiety (Phillips et al., 1997; Damasio et al., 2000), craving (Contreras et al., 2007; Naqvi et al., 2007), and visual self-recognition (Devue et al., 2007).

Finally, anosognosia and disturbed sense of body ownership are often associated with another neuropsychological deficit consecutive to a right parietal lesion and affecting spatial representation: unilateral spatial neglect (USN).

#### Personal Neglect

Unilateral spatial neglect is a disorder in which patients are unaware of the hemispace contralateral to the lesion. Usually a left USN is observed after a right parietal lesion. As Schilder (1968) proposed, space can be divided into extrapersonal space, peripersonal space, and personal space. Along those lines, patients suffering from USN may ignore either the extrapersonal space (near or far) or the personal space contralateral to the lesion. In the latter case, when suffering from personal neglect, patients ignore their own contralesional body parts. Indeed, patients may not use their contralesional hemibody although it is not paralyzed. In some cases, patients may exhibit somatoparaphrenia and explain that their contralesional arm or leg is behind the closet, or that their husband or wife took it with them. Importantly, many patients are unaware that they have these problems (anosognosia).

A neuropsychological approach can thus help assess the role of the different cortical and subcortical structures involved in self-consciousness and decipher its neurophysiological basis, as presented in the next section.

## Neurophysiological Approach to Self-Consciousness

Evolutionary psychology postulates that self-consciousness, as well as other higher cognitive faculties, would be unique to human beings and would thus distinguish us from other animal species, even the most evolved ones (Rochat, 2018). These theories are now challenged by neurobiological advances, which highlight the involvement of certain brain structures in the process of self-consciousness and report such a process in some primates.

#### Complex Activation and Integration Pathways

Schilder, psychiatrist and psychoanalyst, departed from psychodevelopmental theories of self-consciousness in 1935 to question the neurophysiological mechanisms allowing an individual to be situated in a given space-time (Schilder, 1968). His ideas opened new perspectives on various research projects. Since that time and still today, many experimental paradigms have been designed and developed to better understand the neurological pathways of self-consciousness.

Lhermitte (1939) was one of the first to publish his research on the neurophysiological mechanisms of self-consciousness. He described a probable activation of right parietal cerebral structures related to the process of acquisition of the self-image. Numerous studies later supported this hypothesis, and it now appears well established that right brain structures, particularly parietal ones, are involved in global self-consciousness (Taylor, 2001).

More specifically, besides the involvement of the inferior parietal cortex in the sense of agency described above in the "Body action and self-consciousness" section, the key role of the temporoparietal junction has also been reported (Ionta et al., 2011; Graziano, 2018). It corresponds to a zone of integration of multimodal sensory information that may play a key role in the first-person perspective and the distinction between oneself and the other, as well as in some more complex mechanisms of the theory of mind, which includes the ability to understand the intentions, desires, and beliefs of the other. Aspell et al. (2012) studied its activation more specifically during out-of-body experiences, both through continuous electroencephalographic monitoring during such phenomena and by observing that such experiences could be triggered in healthy individuals by transcranial magnetic stimulation (TMS) of the temporoparietal junction (Blanke et al., 2005).

The role of frontal cortical structures has also been discussed, more specifically an activation of the prefrontal cortex that would intervene in the process of differentiation between self and others (Van Veluw and Chance, 2014). It is noteworthy that several fMRI studies on the theory of mind have highlighted the key role of the medial prefrontal cortex (Van Veluw and Chance, 2014).

Finally, the role of the vestibular system has also been described in the development of spatial bodily self-consciousness (Pfeiffer et al., 2014). One of its functions is to provide information on the position of the body relative to the Earth's gravitational field, which is essential for the brain's encoding of the body's spatial orientation in the environment. Some studies have hypothesized that the vestibular system could be part of a larger network involved in spatial exploration, including the parietal lobes and the anterior insular cortex already mentioned in the section on "Disturbed sense of agency and disturbed body ownership" (Brandt et al., 1994; Karnath et al., 2004). This could well explain how caloric vestibular stimulation may reduce somatoparaphrenia. Indeed, it was demonstrated that such stimulation applied in right brain-damaged patients can induce transitory remission of anosognosia for hemiparesis as well as permanent disappearance of somatoparaphrenia (Cappa et al., 1987; Bisiach et al., 1991; Rode et al., 1992; Vallar et al., 2003). Caloric vestibular stimulation can also reduce AHP and USN (Cappa et al., 1987; Bisiach et al., 1991; Rode et al., 1992; Vallar et al., 2003), which supports the role of the vestibular-parietal network in body awareness and self-consciousness.

It is noteworthy that integrated models of self-consciousness, involving multimodal sensory and motor integration, are related to ideas already developed by Sherrington about a century ago. Sir C.S. Sherrington (1906) was an English neurologist who received the Nobel Prize in medicine with Adrian (1932) for their work on the neural system. According to their work, self-consciousness in the here and now is based on visuo-musculo-labyrinthic or tactile-muscular-labyrinthic perceptions (Wallon, 1959a). This model also echoes some of Piaget's ideas, as he described within his developmental stages a first sensorimotor stage during which the child exists only through movement and sensation (Piaget, 1956). This description still shows today the importance of sensory stimulation and how it can be integrated in the brain toward the construction of self-consciousness, especially in very young children.

Within this process of integration and complex activation, the role of a particular neuron system, the mirror neurons, is the subject of much debate.

#### The Mirror Neuron System


Mirror neurons were first described by Rizzolatti's group (Rizzolatti and Sinigaglia, 2008). They are a system of motor neurons whose particularity is to be active both when we perform a given action and when we see someone else performing the same action, or even when we think or speak about performing it without initiating it. They were first detected in monkeys after it was observed that the animals frequently performed an action immediately after seeing it performed by a conspecific, as if mirroring the other.

These descriptions might suggest that the neurons involved in such reactions are located in the optical pathways and are activated by visual stimulation. However, Rizzolatti's functional brain imaging studies suggest that this "mirrored" reaction corresponds to brain activation in the premotor frontal cortex, the superior temporal sulcus, and certain parietal areas (Rizzolatti and Sinigaglia, 2008). Other recent functional MRI studies confirm these data, without showing any activation in the occipital visual areas (Calvo-Merino et al., 2005). Studies of the mirror neuron system in primates showed activation of area F5, which corresponds in humans to Broca's area (i.e., the inferior part of F3, corresponding to Brodmann's areas 44 and 45). It could be hypothesized that these mirror neurons also play a role in the production of verbal language and in the ability to communicate with others.

It is recognized that these neurons partly underlie our ability to connect with each other. This finding placed them at the center of social cognition. Their role has been discussed particularly in the ability to differentiate oneself from the other, but also in interactions with others at bodily, affective, and cognitive levels (as in the phenomenon of empathy, for example).

A parallel can be drawn here between the functioning of mirror neurons and the gestural and emotional reciprocity described by Wallon. In Wallon's theory (1959b), the infant reproduces many actions that he/she sees in adults (a smile, for example) as if he/she were facing a mirror. It is through these experiences, more precisely through identification and differentiation processes, that self-consciousness can develop.

At present, the mirror neuron system is criticized for its lack of specificity, and more recent research suggests that it was given too much credit too quickly (Hickok, 2014). Its discovery nevertheless stimulated a wave of research that has contributed a great deal to science.

Despite constant efforts to better understand the specific paths involved in self-consciousness, these are not yet well established. Limits appear given the complexity of the information to be processed. For some authors such as Damasio et al. (1999), Tononi and Edelman (2000), or Seth et al. (2006), there is no fixed structure responsible for the existence of self-consciousness at its different levels. For these authors, self-consciousness responds in fact to a much more comprehensive brain activation that allows an individual to locate him/herself in the here and now, and to take into consideration his/her personal history with associated affects.

### SELF-IMAGE AND THE MIRROR

### A Brief History of the Mirror

Omnipresent in our homes today, the mirror is an ancient object whose form and use have changed over the centuries. The appearance of the first mirrors is difficult to date. Descriptions can be found from ancient times, when mirrors were already known for their reflective capacity. The first mirrors were made of glass and lead. Their preparation from these materials involved considerable work, which made the object scarce and expensive. It remained a privilege of the richest for centuries. In the Middle Ages, for example, some of the most precious materials, such as gold or silver, could be used to manufacture mirrors (Melchior-Bonnet, 2011).

It was not until the end of the seventeenth century that mirrors became more affordable, spread into homes, and became an everyday necessity. The introduction of mirrors into our lives has probably greatly changed the way people perceive their own image (Melchior-Bonnet, 2011).

Alongside these historical and societal considerations, it is interesting to note that the mirror has long been seen as a fascinating object. In the study of the Inca civilization, we find descriptions of mirrors used as "fire lighters" (Nordenskiôld, 1926). This very special ability, although based on simple optical properties, made the mirror an almost magical object and, in fact, a luxury item. In the Middle Ages, the most beautiful and expensive mirrors were installed in the castles of nobles and royalty, and poets and storytellers of the time celebrated their magical power of reflection. It is precisely this fascinating power of reflection that is found in many ancient myths. Since ancient times, self-reflection has been the subject of many works. The best known are the myths of Narcissus and Perseus. In both cases, the mirror or the reflection leads to a tragic death. Narcissus pays the price for his vanity and his love for the reflection he sees in the water. Perseus uses the reflectivity of a mirror to defeat and kill his enemy Medusa. This might be seen as the beginning of a reflection on self-consciousness.

### The Importance of the Other in the Construction of Self-Consciousness: The Theories of Self-Recognition in the Mirror

In his work on building self-consciousness, Wallon (1934) explains how young children, interacting with their environment, gradually become aware of their own body. One of the major parts of his work is the description of the child's reaction to the mirror. Wallon observed that between 6 months and 2 years of age, the child develops a fascination with his/her reflection in the mirror, even after having understood that it is fictional. The child can contemplate this reflection for a long time, enjoying looking at this "other" who is not "a real self." Starting with the first reflections, when looking at the mirror with his/her parents, he/she turns to them in search of reassurance, as a validation that the image he/she is seeing is actually him/herself.

As we mentioned earlier, it is through this game of alternation and sharing between him/herself and the other that the child becomes aware of his/her individuality. The mirror is, for Wallon, one of the major mediators in this process, because it is through it that the child can interact with others.

Zazzo (1948) followed some of Wallon's ideas. He published his observations of the reactions of an infant confronted with his/her image in a mirror, as well as in photographs and films. He states that when the child faces the mirror or other visual aids, recognition of the other is far ahead of self-recognition. He also establishes that, for the three types of images (mirror, photo, and film), there is a first period during which the subject appears not to recognize or even look at his/her own image. In addition, he explains that recognition of the other begins much earlier in the mirror than in a picture or movie. However, although the self-image is recognized much earlier in the mirror, it long remains tinged with uncertainty and anxiety, unlike still images such as photographs.

His findings showed that between 2 and 3 years of age, the child becomes increasingly aware of his/her body image and is able to understand that he/she is alone facing his/her own image and that his/her reflection is not somebody else or a double. According to Zazzo, this development occurs in parallel with the explosion of language, an additional tool that helps the child to understand the distinction between him/herself and the other and between him/herself and his/her reflection.

Yet the essential work of Wallon and Zazzo appeared to be neglected when, some years later, Lacan took up the topic and brought it to psychoanalysis by publishing "The mirror stage as formative of the function of the I as it is revealed in psychoanalytic experience." In a replay of the theories developed by Wallon, Lacan describes how the mirror stage helps the child to give up his/her fragmented body and become aware of his/her identity through the mirror image (Lacan, 1966). Lacan's description of the mirror stage is now a cornerstone of the psychoanalytic approach to the construction of the self. Lacan himself traced the origin of his theory to the work of the American psychologist James Mark Baldwin, who also influenced Piaget and Vygotsky. Baldwin's theories focused on the progressive distinction between self and other through social interactions. He was one of the first to see the way the child behaves in front of his/her reflection as an indicator of the construction of the self (Müller and Runions, 2003).

Many other authors were concerned with this topic in the aftermath of Lacan, notably Françoise Dolto, who published "The child of the mirror" with Nasio in 1987. Although she partially adhered to Lacan's theories, her approach differs on an essential point. Unlike Lacan (and Wallon before him), who described the infant as a fragmented being with no containment in the months preceding the identification with the reflection, Dolto highlights a primary narcissism that makes the child a cohesive whole, maintained through basic external and visceral sensations. Therefore, the infant cannot be fully satisfied with this image as it is incomplete, reflecting only one side of his/her body, whereas he/she "feels as a whole in his/her being" (Dolto and Nasio, 2002).

The weight given to the mirror in psychoanalytic and psycho-developmental theories stresses the importance of body-self and self-image, but also the importance of the other in the construction of self-consciousness.

### Impairments in Self-Image Recognition in the Mirror

Disorders of self-consciousness are related to various disturbances of the pathways involved in self-consciousness. Different components of self-consciousness can be impaired and produce various clinical syndromes. Impairments in bodily self-consciousness lead to somatognosic disorders, including out-of-body experiences, heautoscopy, and autoscopic hallucinations. These phenomena have been widely studied in disorders of neurological origin, particularly in certain dementia syndromes (Blanke, 2007). Their mechanisms are still poorly understood, although Blanke has been able to demonstrate a dysfunction of the temporoparietal areas previously described, with the additional involvement of certain occipital areas in the case of autoscopic hallucinations (Blanke and Mohr, 2005).

Heautoscopy was described by Lhermitte (1939) in his work «The image of our body» as an almost hallucinatory experience during which the patient suddenly sees his/her image appearing in front of him/her. He explains it schematically by two essential components: a visual hallucination and a disturbance of bodily self-consciousness. The latter causes the individual a feeling of partial depersonalization that makes it difficult for him/her to locate him/herself – either in his/her own body or projected in the autoscopic image. One of the variations of heautoscopy (or illusion of the double) is the phenomenon of negative heautoscopy, in which the individual no longer sees his/her image in a reflective surface, and which Maupassant described so well when sinking into madness. This illustration highlights an example of a situation in which the relationship to the mirror is disturbed. It also shows that such disruptions in the recognition of self-image are not confined to neurological disorders. They are also common in psychiatric disorders, especially psychotic disorders.

It is currently accepted that early disturbances of the child's psychological development can be associated with various psychiatric disorders which may eventually become fixed in adulthood (Jones, 1997; Tackett et al., 2009). Given the important role of the mirror during psychological development, we can suppose that the relation to the mirror, and therefore to the reflection, is disturbed in children with atypical psychological development. Salem Shentoub was one of the first to study self-recognition in children with developmental disabilities, including intellectual disability. He described the reaction of "mentally retarded" children in front of the mirror and observed differences in their behaviors compared to typically developing children (Rustin et al., 1954). The reported reactions are varied, ranging from an apparent absence of self-image recognition to complex affective manifestations, including preliminary interactions with the reflection or various stereotyped behaviors. However, the children's reactions in front of the mirror appear to be an extension of their usual behavior, and the confrontation with their image does not trigger more specific reactions in these children, even in the most severe cases. The most alarming observation was the apparent absence of self-recognition, which raises the question of a possible absence of recognition of the other as well as of self-consciousness impairments. Nevertheless, Shentoub observed that the repetition of mirror experiences in the same child allowed him/her to become acquainted with the other and with his/her image, and this was associated with an overall positive behavioral evolution. He was thus already raising the possibility of a remediation accomplished through the mirror image.

These intellectually disabled children observed by Shentoub did not show a psychotic disorder marked by an internal disorganization or a rupture with reality. However, it cannot be ruled out that young patients with schizophrenia might also exhibit some atypical behaviors when facing mirrors due to their own perceptual and/or cognitive developmental disturbances.

The psychoanalytic theories presented briefly in this article allow us to consider these disturbances of self-image recognition as an indicator or even a possible marker of a disturbance of psychological development occurring upstream of the mirror stage. Lacan, himself, mentions in his work the absence of reaching the mirror stage in psychotic patients, thus preventing their symbolic identification with their own image.

On this topic, Abely (1930) described in the early 1930s "a need for certain individuals to examine themselves at length and frequently in front of a reflective surface" (quoted in Meaulle, 2007). He reported here a phenomenon he observed at the dawn of the appearance of early dementia in some of his patients. In his descriptions, this fixation on the mirror can be accompanied by a search for dialogue with the reflection which is considered as a distinct other. Shortly after him, Delmas (1929) published similar observations.

The observations Abely and Delmas developed, almost simultaneously but separately, now refer to the same concept, the "mirror sign." This sign would be for them a clinical marker of psychotic disorganization. The mirror sign, for these authors, more than a fascination for the patient's reflection, corresponds in fact to a search for the patient's own image in the mirror – an image that disintegrates more and more permanently with the onset of psychotic disorders. It is noteworthy that in the interpretation they make of this mirror sign, this sign would be part of the prodromal phase, and would disappear once the disorder developed.

It is noteworthy that the mirror can be used to study impairments in self-image recognition. A new double mirror paradigm for the study of self-other differentiation, self-identity and self-image recognition, and manipulation of spatial reference frames in social interactions, was proposed by Alain Berthoz (Collège de France, Paris), using the "Double Mirror" designed by Moritz Wehrmann (Alter Ego System©, which includes a set of white computer-controlled light emitting diodes/LEDs fixed on the frame of the mirror on both sides), and studied in healthy participants by Thirioux et al. (2016). Previous studies (Harrington and Spitzer, 1989; Caputo et al., 2012; Bortolon et al., 2017) have used the mirror to explore self-image recognition in schizophrenia, but the Alter Ego System©, which combines the facial images of two individuals sitting on each side of the mirror, offers a new double mirror paradigm to examine self-other recognition impairments in individuals with schizophrenia. Self-other recognition impairments in schizophrenia have also been examined by Slowinski et al. (2017), but they used a "mirror game" (without a real mirror) based on interactions between the patient and an artificial agent, a computer avatar or a humanoid robot, which cannot be compared to self-other recognition involving only human individuals. The paradigm of the double mirror was used for the first time to study self-other differentiation in individuals with schizophrenia compared to typically developing controls (TDC) (Keromnes et al., 2018). The visual recognition task consisted in recognizing predominantly either the other's face through the mirror (as through a transparent window) or one's own face reflected in the mirror, according to the light intensity of the LED set (the higher the light intensity, the more visible the image).
The results showed that individuals with schizophrenia, independently of age and schizophrenia severity, were centered on their own image, with both significant earlier self-recognition, and delayed other-recognition compared to TDC during the visual recognition task. In addition, there was no significant effect of intermodal sensory stimulation (visual-tactile or visual-kinesthetic stimulation) on self–other recognition in individuals with schizophrenia, whereas self-centered functioning was significantly increased by visual–tactile stimulation and decreased by visual–kinesthetic stimulation in TDC. The findings suggest that self–other recognition impairments might be a possible endophenotypic trait of schizophrenia. It would be of interest to conduct the same experimental study using the double mirror on individuals with childhood onset schizophrenia and catatonia, characterized by a very early onset of schizophrenia but also severe clinical impairments and longer episodes of schizophrenia (Bonnot et al., 2008), to verify if similar results are observed in this population.

### CONCLUSION

The literature review presented in this article emphasizes the role of body perception, body actions and of the self-image in the construction of self-consciousness. Of importance, we demonstrated here that a multidisciplinary approach is mandatory to address such a complex concept. We aimed also to highlight the interest of self-image recognition in the mirror to assess self-consciousness but also the role of the other in self-image recognition. Self-image development might be a good indicator of the evolution of the self-consciousness process, especially through self- and other-image recognition in the mirror (Tordjman and Maillhes, 2009). Self-consciousness can be impaired in one or several of its components (identity, body, etc.). Self-recognition, and notably self-image recognition, can be disturbed in various disorders, especially neurodevelopmental disorders (dementia, psychiatric disorders, etc.) (Blanke, 2007). Considering impairments in self-consciousness and self-image recognition may open important perspectives, especially for early diagnosis and therapeutic strategies in neurodevelopmental disorders. However, a limit of such phenomenological inquiry remains the detection of these disturbances that relies on patients' verbal reports (Martin et al., 2014). These patients' reports should indeed be interpreted with caution, especially because body-self is related to non-verbal aspects of consciousness. Thus, a challenge consists in finding a way to objectify such self-disturbances in individuals with a non-verbal approach (Mishara et al., 2014). The double mirror, mentioned previously in this article, might be a useful instrument to investigate further self–other recognition impairments in self-consciousness disorders in general and neurodevelopmental disorders such as schizophrenia or Autism Spectrum Disorder. Self–other face identification in the mirror may improve bodily self-consciousness and sustain self–other differentiation in these disorders. Future studies are required to explore this perspective. In particular, the double mirror system could be useful for early diagnosis, follow-up, and therapeutic perspectives based on cognitive remediation helping individuals with self-consciousness disorders to improve self–other differentiation.

### AUTHOR CONTRIBUTIONS

GK and ST wrote the first draft of the article. SC contributed in a significant way to this article by adding notably the part on the neuropsychological approach to self-consciousness. M-PC revised the first draft of the article. AB, MB, RC, FDB, NJ, NL-C, BM, TM, BT, VS, MW, and AG reviewed and approved the final version of the article.

### REFERENCES

Bonnier, P. (1904). Le Sens Des Attitudes. Paris: C. Naud.

Rustin, E., Soulairac, A., and Shentoub, S. A. (1954). Comportement de l'enfant arriéré devant le miroir. Enfance 7, 333–340. doi: 10.3406/enfan.1954.1469

Schilder, P. (1968). L'image Du Corps. Paris: Gallimard.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Keromnes, Chokron, Celume, Berthoz, Botbol, Canitano, Du Boisgueheneuc, Jaafari, Lavenne-Collot, Martin, Motillon, Thirioux, Scandurra, Wehrmann, Ghanizadeh and Tordjman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Prerequisites for an Artificial Self

Verena V. Hafner<sup>1</sup>\*, Pontus Loviken<sup>2,3</sup>, Antonio Pico Villalpando<sup>1</sup> and Guido Schillaci<sup>1,4,5</sup>

<sup>1</sup> Adaptive Systems Group, Computer Science Department, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>2</sup> Softbank Robotics, Paris, France, <sup>3</sup> Centre for Robotics and Neural Systems (CRNS), University of Plymouth, Plymouth, United Kingdom, <sup>4</sup> The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy, <sup>5</sup> Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy

Traditionally investigated in philosophy, body ownership and agency—two main components of the minimal self—have recently gained attention from other disciplines, such as brain, cognitive and behavioral sciences, and even robotics and artificial intelligence. In robotics, intuitive human interaction in natural and dynamic environments becomes more and more important, and requires skills such as self-other distinction and an understanding of agency effects. In a previous review article, we investigated studies on mechanisms for the development of motor and cognitive skills in robots (Schillaci et al., 2016). In this review article, we argue that these mechanisms also build the foundation for an understanding of an artificial self. In particular, we look at developmental processes of the minimal self in biological systems, transfer principles of those to the development of an artificial self, and suggest metrics for agency and body ownership in an artificial self.

Keywords: artificial self, developmental robotics, sense of agency, predictive processes, sense of body ownership, minimal self

#### Edited by:

Pablo Lanillos, Technical University of Munich, Germany

#### Reviewed by:

Keyan Ghazi-Zahedi, Max Planck Institute for Mathematics in the Sciences, Germany; Dalila Burin, Tohoku University, Japan

#### \*Correspondence:

Verena V. Hafner, hafner@informatik.hu-berlin.de

Received: 07 September 2018; Accepted: 17 January 2020; Published: 21 February 2020

#### Citation:

Hafner VV, Loviken P, Pico Villalpando A and Schillaci G (2020) Prerequisites for an Artificial Self. Front. Neurorobot. 14:5. doi: 10.3389/fnbot.2020.00005

### 1. INTRODUCTION

People can usually easily recognize their own body and the results of their own actions. This apparently simple skill likely contributes to what makes us feel as separate entities in the world (Van Den Bos and Jeannerod, 2002) and it is indeed fundamental for interacting with the environment and with other individuals. A current research trend suggests that the minimal self (the pre-reflective experience of being a self, or the awareness of oneself as a subject of experience; Blanke and Metzinger, 2009) would be characterized by two important aspects: a sense of body ownership (I feel corporal sensations as uniquely belonging to my own body) and a sense of agency (I feel in control of my own actions) (Gallagher, 2000).

Topics such as body ownership and agency that have traditionally been investigated in philosophy have recently gained attention from other disciplines, such as brain, cognitive and behavioral sciences, and even robotics and artificial intelligence. Some neuroscientists, for example, interpret certain human mental disorders, such as schizophrenia, as the result of a disrupted sense of the self (Frith et al., 2000; Nelson et al., 2014; Klaver and Dijkerman, 2016; Sterzer et al., 2016). In robotics, intuitive human interaction in natural and dynamic environments becomes more and more important, and requires skills such as self-other distinction and an understanding of agency effects (Holthaus and Wachsmuth, 2012; Belpaeme et al., 2018). Developmental psychologists study the emergence of self-awareness from very early stages of development. Self-awareness would unfold already during the first months of life, when infants seem to start having a sense of how their own body is situated in relation to other entities in the environment (Rochat, 2003). Infants at 5 months of age, for example, are able to distinguish their own leg movements from those of another infant when they are displayed in a mirror (Rochat, 2003). These action-effects have been studied in infants using different modalities, including sound (Paulus et al., 2012).

These findings represent a valuable source of inspiration for roboticists, whose aim is to develop autonomous robots capable of living in and interacting with human society. Developmental robotics addresses this challenge by implementing methods and algorithms for motor and cognitive development in artificial systems inspired by infant development (Cangelosi and Schlesinger, 2015). In developmental robotics, state-of-the-art machine learning techniques are applied to computational models, creating artificial systems that can adapt to new situations and learn in an open-ended fashion. The emergence of the self represents a key step in cognitive development. Therefore, there is a growing interest in the developmental robotics community in implementing processes capable of enabling the experience of the self, with phenomena such as the sense of body ownership and agency, in artificial agents.

On the other hand, robots can represent valuable tools to investigate phenomena of subjective experience typical of humans. In fact, robots are equipped with sensors and actuators that can be inspected and controlled during their operations. What the robot sees and perceives, as well as its internal states, can be logged and further analyzed, which is obviously not possible in humans. If robots were capable of detecting and recognizing their own body and movements, their interaction with the environment and with people would be much more efficient and natural. However, the questions of which computational processes are needed to implement a primitive sense of body ownership and agency in robots, and of how the ontogenetic process of the individual shapes the development of the self, are still open.

This manuscript follows up on a previous review paper (Schillaci et al., 2016), in which we investigated studies on mechanisms for the development of motor and cognitive skills in robots. In this review paper, we argue that the same mechanisms also build the foundation for the development of an artificial self. In fact, in infants, the self seems to emerge along with the motor and cognitive development of the individual (Lagercrantz and Changeux, 2009). Implementing similar processes in artificial systems may also provide insights into the possibility of developing an artificial self. In this work, we address the role of developmental processes in the emergence of an artificial self, and we suggest the concept of self-manifolds in artificial systems and the use of metrics for establishing the boundaries of an artificial self.

The review paper is structured as follows. First, in section 2, we revisit the concepts addressed in our previous review (Schillaci et al., 2016) and frame them within the context of the development of an artificial self. In particular, we present advances in the study of behavioral and computational components that allow autonomous motor and cognitive development in artificial systems. We discuss how these components can build the foundation for an artificial self. In order to do so, we ask whether and how the minimal self is affected during the ontogenetic process of the individual, and how open-ended learning and social interaction can shape the development of an artificial self, and then review robotic studies addressing this question. In section 3, we review studies on metrics and boundaries of the human self, and propose their use also for artificial systems. Finally, in section 4, we provide our conclusions and open challenges in the quest for the development of an artificial self.

### 2. BEHAVIORAL AND COMPUTATIONAL COMPONENTS

In the robotics literature, the study of the artificial minimal self is young and fragmented. Unfortunately, a study presenting a comprehensive overview of the robotic investigations on this topic is missing. Nonetheless, many articles can be found providing interesting insights on aspects and prerequisites that can be related to the development of an artificial self. Two recent papers highlight aspects of both the human minimal self and an artificial minimal self. Georgie et al. (2019) look at developmental indices and behavioral measures of the minimal self, and Lanillos et al. (2019) look into computational models of neurological disorders related to the minimal self. In particular, they look into the balance between sensed and predicted sensory effects in ASD and schizophrenia.

In a previous review paper (Schillaci et al., 2016), we investigated studies on mechanisms for the development of motor and cognitive skills in robots. In particular, we identified three main behavioral and computational components that can enable autonomous acquisition of motor skills and the implementation of basic cognitive capabilities: (1) exploration behaviors; (2) internal body representations; (3) sensorimotor simulations. In this review, we extend the review provided in Schillaci et al. (2016) by creating links to the topic of the development of an artificial self, besides introducing more recent robotic studies on related topics. We particularly focus on those that propose strategies to scale up motor and cognitive development. We extend exploration behaviors with artificial curiosity, and sensorimotor simulations with predictive processes, in order to strengthen the aspects relevant to the development of a minimal self. All three components are processes or cognitive skills that run in parallel and independently of each other and can be seen as building blocks of the minimal self, as discussed later.

### 2.1. Self-Exploration Behaviors and Artificial Curiosity

Human fetuses seem to already have some limited control of their body, as they react to touch, sound, smell, and pain, and even show facial expressions responding to external stimuli (Lowery et al., 2007). Some researchers (Lagercrantz and Changeux, 2009), though, believe that these reactions may have a subcortical, non-conscious origin and that newborns show signs of basic self-awareness only shortly after birth. In fact, developmental studies provide evidence of infant behaviors displaying some level of self-awareness in their first weeks of life (Rochat, 2011). Regardless of whether, and to what extent, self-awareness is present at birth, developmental researchers believe that it would unfold during early stages of development [see Rochat (2003) for empirical evidence and proposals]. However, why and how exactly self-awareness would emerge during infancy are still open questions, and in particular there are no thorough theories or computational models explaining its function. Hart and Scassellati (2011) argue that self-identification algorithms are the first step toward a more comprehensive model of the robotic self.

There is a general consensus that the perceptual experiences that toddlers undergo when exploring and playing with their surroundings play an important role in the development of self-awareness. The self would emerge through the active interaction with one's physical and social environment (Verschoor and Hommel, 2017). Indeed, exploration behaviors are recognized as the means for motor and cognitive development in infants, as well as in robots [see Schillaci et al. (2016) for a review]. Several studies investigate the cognitive mechanisms and drives behind exploration and play in infancy. In infants, curiosity, which is usually inferred through their use of prolonged visual attention to stimuli (Benson and Haith, 2010, p. 157–167; Grgic et al., 2016), is thought to drive the emergence of ordered developmental trajectories, including in domains such as vocal development, imitation and tool use discovery (Acevedo-Valle et al., 2018; Oudeyer, 2018). This is contrary to the earlier belief that infants learn through random actions; rather, their actions appear to be goal-directed from the very start (Von Hofsten, 2004).

Infants' curiosity, play and exploration (and the likely goal-directed nature of their actions) have attracted the interest of developmental roboticists. In fact, studies on artificial curiosity have demonstrated how mechanisms for goal-directed exploration can be used to efficiently learn robot dynamics, even if the artificial system is characterized by complex high-dimensional embodiments. Artificial curiosity goes beyond novelty detection, which would drive the agent to novel but not necessarily predictable regions of its sensorimotor space. In contrast, artificial curiosity drives the agent toward regions where the learning progress can be maximized (Oudeyer et al., 2007). The main difference from typical machine learning scenarios is that the agent creates its own training samples for a desired learning trajectory.
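The contrast between novelty seeking and learning-progress maximization can be made concrete with a minimal sketch. It is not the algorithm of Oudeyer et al. (2007); the `Region` class, the fixed error window, the epsilon-greedy choice and the synthetic error histories are all assumptions made for illustration. Learning progress is approximated here as the drop in mean prediction error between the older and the newer half of a sliding window.

```python
from collections import deque
import random


class Region:
    """A region of the sensorimotor space with a sliding window of prediction errors."""

    def __init__(self, name, window=10):
        self.name = name
        self.window = window
        self.errors = deque(maxlen=2 * window)

    def add_error(self, error):
        self.errors.append(error)

    def learning_progress(self):
        """Drop in mean error from the older to the newer half of the window."""
        if len(self.errors) < 2 * self.window:
            return float("inf")  # assumption: unexplored regions are tried first
        errs = list(self.errors)
        older = sum(errs[:self.window]) / self.window
        newer = sum(errs[self.window:]) / self.window
        return older - newer


def choose_region(regions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice of the region with the highest learning progress."""
    if rng.random() < epsilon:
        return rng.choice(regions)
    return max(regions, key=lambda r: r.learning_progress())


def build_demo_regions():
    """Synthetic histories: one mastered, one unlearnable (noisy), one learnable region."""
    mastered, noisy, learnable = Region("mastered"), Region("noisy"), Region("learnable")
    for i in range(20):
        mastered.add_error(0.01)                     # already well predicted
        noisy.add_error(0.9 if i % 2 == 0 else 1.1)  # high error that never improves
        learnable.add_error(1.0 - 0.04 * i)          # error steadily decreasing
    return [mastered, noisy, learnable]


if __name__ == "__main__":
    regions = build_demo_regions()
    # Curiosity based on learning progress picks the region that is improving.
    print("learning progress:", choose_region(regions, epsilon=0.0).name)
    # A purely error- or novelty-driven choice is attracted by the unlearnable region.
    print("highest error:", max(regions, key=lambda r: r.errors[-1]).name)
```

In this toy setting the noisy region keeps a high error but yields no progress, so a learning-progress criterion ignores it, whereas a criterion based on raw error would keep returning to it.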

The first studies on artificial curiosity and exploration in robots were limited in scope. Although promising and demonstrating that curiosity-driven and exploration behaviors can efficiently solve inverse and forward kinematics problems, they mostly focused on relatively simple tasks, such as reaching actions for robot manipulators. Prolonged and incremental learning, until recently, was not a main priority in these studies. Indeed, it is still a great challenge in the whole robotics community. Likewise, assuming that, in infants, self-awareness is a result of complex and prolonged interactions and experiences, the study of the development of an artificial self also has to address how self-awareness would unfold along incremental learning in robots.

Recently, interesting studies have been published on topics close to this line of thought. For instance, studies in the literature on goal-directed exploration in artificial systems proposed ways to scale up learning to multiple task spaces (Forestier and Oudeyer, 2016; Forestier et al., 2017) or to domains where exploration of a task space requires action planning in multiple steps (Loviken and Hemion, 2017; Loviken et al., 2018). **Figure 1** shows the results of a curiosity-based learning method for humanoid robots, where the sensory space was partitioned into a disjoint set of finite elements. In this space, every element was seen as an independent goal-babbling problem and a planning module could be added by observing transitions between the different elements (Loviken and Hemion, 2017; Loviken et al., 2018).
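The idea of partitioning the sensory space into elements and planning over observed transitions can be sketched as follows. This is an illustration only and not the implementation of Loviken and Hemion (2017): the uniform grid, the `cell_of` helper, the `TransitionGraph` class and the breadth-first planner are hypothetical choices, and the per-element goal babbling itself is left out.

```python
from collections import defaultdict, deque


def cell_of(observation, cell_size=0.5):
    """Map a continuous sensory observation to the index of its grid cell."""
    return tuple(int(x // cell_size) for x in observation)


class TransitionGraph:
    """Directed graph of cell-to-cell transitions observed during exploration."""

    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size
        self.edges = defaultdict(set)

    def observe(self, obs_before, obs_after):
        """Record that some action moved the robot from one cell to another."""
        a = cell_of(obs_before, self.cell_size)
        b = cell_of(obs_after, self.cell_size)
        if a != b:
            self.edges[a].add(b)

    def plan(self, start, goal):
        """Breadth-first search for the shortest known sequence of cells, or None."""
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in self.edges.get(path[-1], ()):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(path + [nxt])
        return None


if __name__ == "__main__":
    graph = TransitionGraph()
    # Two transitions observed while exploring a 2-D sensory space.
    graph.observe((0.1, 0.1), (0.6, 0.1))
    graph.observe((0.6, 0.1), (1.2, 0.1))
    print(graph.plan((0, 0), (2, 0)))  # multi-step plan through the middle cell
    print(graph.plan((2, 0), (0, 0)))  # no reverse transition has been observed
```

Each cell visited by the plan would then be handed to its own local goal-babbling learner; keeping the graph directed reflects that a transition observed in one direction does not guarantee the reverse is achievable.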

Acevedo-Valle et al. (2018) studied intrinsic motivation systems in the context of early vocal development which further develop through social reinforcement. An artificial agent was endowed with a proprioceptive mechanism, which was used to prevent the execution of unreachable motor configurations or invalid (painful) configurations. Moreover, the authors introduced an expert instructor which produced correct utterances whenever the exploring autonomous learner was emitting similar (although still not correct) sounds. This resulted in a social reinforcement, which provided clues to the learner of interesting sensorimotor regions to explore.

Interesting advances have also been made in the context of goal generation. For instance, Mannella et al. (2018) show how an artificial system can autonomously generate goals to be used in an intrinsic motivation system to explore and to gather knowledge about its own body. In Schillaci et al. (2020), the authors present an architecture for curiosity-driven goal-directed exploration behaviors on a camera-equipped robot arm. A combination of deep neural networks for offline unsupervised learning of low-dimensional features from images, and of online learning of shallow neural networks was used. The artificial curiosity system assigned interest values to a set of pre-defined goals, and drove the exploration toward those that were expected to maximize the learning progress. Moreover, the authors proposed the integration of an episodic memory system to face catastrophic forgetting issues, typically experienced when performing online updates of artificial neural networks. The results showed that adopting an episodic memory system not only prevented the computational models from quickly forgetting knowledge that had been previously acquired, but also provided new avenues for modulating the balance between plasticity and stability of the models.
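One common way to use an episodic memory against catastrophic forgetting is to mix stored samples into every online update. The sketch below illustrates that general idea and is not the memory system of Schillaci et al. (2020): the `EpisodicMemory` class, the reservoir-sampling storage policy and the `replay_ratio` parameter are assumptions introduced for illustration, and the model update itself is omitted.

```python
import random


class EpisodicMemory:
    """Fixed-size memory filled by reservoir sampling over the stream of samples."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self.rng = random.Random(seed)

    def store(self, sample):
        """Keep each sample seen so far with equal probability (reservoir sampling)."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def replay(self, k):
        """Return up to k stored samples, drawn without replacement."""
        return self.rng.sample(self.samples, min(k, len(self.samples)))


def make_training_batch(new_samples, memory, replay_ratio=0.5):
    """Combine fresh samples with replayed ones, then store the fresh samples."""
    n_replay = int(len(new_samples) * replay_ratio)
    batch = list(new_samples) + memory.replay(n_replay)
    for sample in new_samples:
        memory.store(sample)
    return batch


if __name__ == "__main__":
    memory = EpisodicMemory(capacity=5)
    for step in range(3):
        fresh = [(step, i) for i in range(4)]  # stand-ins for sensorimotor samples
        batch = make_training_batch(fresh, memory, replay_ratio=0.5)
        print(step, len(batch), len(memory.samples))
```

Under these assumptions, `replay_ratio` is one possible knob for the plasticity-stability balance mentioned above: a higher ratio weights previously acquired experience more heavily, and a lower one favours fast adaptation to new data.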

In humans, the self develops along the ontogenetic process of the individual. This is closely related to mechanisms of open-ended learning and social interaction, but also to the establishment and refinement of plastic body representations. The next section will provide an overview of recent studies on body representations in artificial systems.

### 2.2. Body Representations

Many researchers have suggested theories to explain the experience of body ownership and agency, and self-awareness in general. Sense of agency and sense of body ownership seem to be strongly linked, but many empirical studies still investigate them separately from each other. The appearance of the first signs of self-awareness in newborns seems to depend on the establishment of thalamocortical connections (Lagercrantz and Changeux, 2009). In general, the sense of body ownership seems to be strongly intertwined with an internal representation of the body maintained by our brain. Here we adopt the conceptual clarification by Gallagher (1986) between body image and body schema, where body image is a conscious representation or image of the body, whereas body schema is a non-conscious representation of sensorimotor skills. While we interact with the environment, we generate a rich set of multi-modal sensory and motor experiences (Schillaci et al., 2016). This information has been proposed to be integrated into a sort of body schema in our brain, which would keep an up-to-date representation of the positions and configurations of the different body parts in space (Maravita et al., 2003; Hoffmann et al., 2010). Moreover, the body schema very likely undergoes a continuous process of adaptation, as humans and animals follow an ontogenetic process where corporal dimensions and morphology change over time. The way in which we represent and feel our body seems to strongly rely on these representations, which would integrate inputs from different sensory modalities (Azañón et al., 2016). Scientists have carried out experiments to explore how the brain combines information from the flow of sensory input data to create a feeling of body ownership, such as the famous rubber hand illusion experiment, where the participant is confused by the sight of a fake hand and synchronized sensory stimulation (Botvinick and Cohen, 1998).

Some researchers in cognitive development link the construction of the self to the experience encoded in a sort of autobiographical memory (Nelson, 2003). Pointeau and Dominey (2017) review a range of robotic experiments that address different aspects of the self and relate them to the definition of the self given by Neisser (1995). Ulric Neisser proposed five types of self-knowledge that correspond to five distinct components of the self: ecological, interpersonal, conceptual, temporally extended, and private. The ecological self, that is "the individual situated in and acting upon the immediate physical environment" (Neisser, 1995), is perhaps the level that is most interesting here, and it is rather easy, given current robot technologies, to design robotic experiments addressing it. Ecological proprioception is integrated with different modalities of sensory information concerning one's own body as it interacts with the environment (Gallagher, 2007). The tactile modality has received particular interest from researchers studying subjective experiences and their impairments in patients with brain disorders. Van Stralen et al. (2011), for instance, studied how self-touch influences the structural representation of one's own body and found that self-touch may modulate impairments in body ownership.

Developmental roboticists have also focused their attention on the role of the tactile modality in the formation and maintenance of body representations. For instance, Zenha et al. (2018) studied how a body schema can be adapted incrementally in a humanoid robot based on touch events. Hoffmann (2017) studied the role of self-touch experiences in the formation of a self. Self-touch would provide redundant information that would facilitate the formation of a body representation. Timing and synchrony have also been identified as important features supporting the integration of information from multiple modalities within a body representation. Nabeshima et al. (2005) present a robotic study in support of this.

Hoffmann et al. (2018) studied a self-organizing model for body representation on an iCub humanoid robot with an artificial pressure-sensitive skin. In particular, the proposed framework was used to learn a topographic representation of the robot's body surface from experience, that is, by receiving tactile stimulations all over its artificial skin, including multi-touch stimulations.

### 2.3. Sensorimotor Simulations and Predictive Processes

A growing number of scientists now consider the brain an active organ of inference (De Ridder et al., 2013; Picard and Friston, 2014; Kirchhoff, 2018). Self-awareness and self-recognition are thought to depend also on predictive processes, or sensorimotor simulations, implemented by the brain (Hohwy, 2013; Apps and Tsakiris, 2014; Friston, 2018). Predictive processes may have several functions, but one important function is sensory attenuation. Pyasik et al. (2019) showed that felt ownership of a fake hand in the rubber hand illusion experiment caused attenuation of somatosensory stimuli generated by its movements, comparable to the attenuation of self-generated stimuli. Burin et al. (2018) also investigated the influence of timing on the effect of agency.

Similar computational models can be implemented in robots to provide them with predictive capabilities. Sensorimotor predictions and prediction errors can be recorded and analyzed as well. In humans, in contrast, such properties cannot be directly observed and controlled. Bechtle et al. (2016) and Lang et al. (2018) implemented internal models in a humanoid robot to study how body representations can emerge from sensorimotor experience, and how predictive processes can be run through these computational tools. They found that prediction errors can serve as a cue to distinguish between self-generated perceptual events and those generated by other subjects. Moreover, they showed how predictive processes can be used to attenuate self-body perception (see **Figure 2**). Lang et al. (2018) adopted a convolutional neural network for implementing a forward model, which generates image predictions from low-dimensional proprioceptive and motor states (see **Figure 3**).
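As a minimal sketch of how a prediction error could serve as a self/other cue, the toy example below trains a one-dimensional linear forward model by random motor babbling and then labels an event as self-generated when the model predicts it well. This is not the implementation of Bechtle et al. (2016) or Lang et al. (2018); the class `LinearForwardModel`, the helper functions, the dynamics coefficients, and the threshold are all assumptions made for illustration.

```python
import random


class LinearForwardModel:
    """Toy forward model (assumed 1-D setup): predicts the next sensor value
    from the current sensor value and the motor command."""

    def __init__(self, learning_rate=0.1):
        self.w_state = 0.0
        self.w_motor = 0.0
        self.learning_rate = learning_rate

    def predict(self, state, motor):
        return self.w_state * state + self.w_motor * motor

    def update(self, state, motor, next_state):
        """Apply one delta-rule step; return the absolute prediction error."""
        error = next_state - self.predict(state, motor)
        self.w_state += self.learning_rate * error * state
        self.w_motor += self.learning_rate * error * motor
        return abs(error)


def body_dynamics(state, motor):
    """Assumed ground-truth sensorimotor mapping of the toy body."""
    return 0.9 * state + 0.5 * motor


def train_by_babbling(model, steps=2000, seed=0):
    """Random motor babbling: sample states and commands, learn their effect."""
    rng = random.Random(seed)
    for _ in range(steps):
        state = rng.uniform(-1.0, 1.0)
        motor = rng.uniform(-1.0, 1.0)
        model.update(state, motor, body_dynamics(state, motor))


def is_self_generated(model, state, motor, observed, threshold=0.1):
    """Label an event as self-generated when the prediction error is small."""
    return abs(observed - model.predict(state, motor)) < threshold


model = LinearForwardModel()
train_by_babbling(model)

state, motor = 0.2, 0.6
own_effect = body_dynamics(state, motor)      # consequence of the agent's own action
pushed = body_dynamics(state, motor) + 0.8    # same action plus an external push

print(is_self_generated(model, state, motor, own_effect))
print(is_self_generated(model, state, motor, pushed))
```

The same error signal could in principle be subtracted from the incoming sensation to mimic sensory attenuation, although that step is not shown here.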

Pico et al. (2016) demonstrated that a two-wheeled mobile robot was capable of detecting unexpected changes in the environment and of classifying motor behaviors by comparing the ego-noise generated by its motors with the ego-noise prediction of its internal model. In a first experiment, several ego-noise prediction models were trained, each with a different motor command pattern. All models were then fed with a particular motor sequence, yielding a series of ego-noise predictions. The robot was able to determine the correct motor command pattern by selecting the model with the lowest ego-noise prediction error. In a second experiment, one ego-noise forward model was trained by implementing random motor babbling on the robot in a flat arena. The model was tested by making the robot do a series of runs from side to side of the arena while calculating ego-noise predictions. A ramp was then added in the middle and the runs were repeated. A comparison between the ego-noise prediction errors generated in the flat arena and those generated in the arena with the ramp in the middle showed that the ego-noise prediction error increased when the robot was on the ramp. This demonstrated that the robot was able to detect changes in the inclination of the surface on which it moves solely by making ego-noise predictions.
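The two uses of prediction error described for the ego-noise experiments (picking the best-matching model, and flagging a change in the environment) can be sketched as follows. This is only an illustration under strong simplifications: the "models" are hypothetical hand-written functions mapping a motor command to a single scalar ego-noise feature, the recordings are made-up numbers, and the threshold `factor` is an assumption rather than a value from Pico et al. (2016).

```python
def mean_squared_error(predicted, observed):
    """Average squared difference between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def select_model(models, motor_sequence, observed_noise):
    """Return the name of the model with the lowest ego-noise prediction error."""
    errors = {
        name: mean_squared_error([predict(m) for m in motor_sequence], observed_noise)
        for name, predict in models.items()
    }
    return min(errors, key=errors.get), errors


def surface_changed(error, baseline_error, factor=3.0):
    """Flag a change when the error clearly exceeds the flat-arena baseline."""
    return error > factor * baseline_error


# Hypothetical stand-ins for trained ego-noise models.
models = {
    "straight": lambda m: 0.8 * m,
    "turning": lambda m: 0.3 * m + 0.2,
}

motor_sequence = [0.2, 0.4, 0.6, 0.8]
observed_noise = [0.27, 0.31, 0.39, 0.43]  # made-up recording, close to "turning"

best, errors = select_model(models, motor_sequence, observed_noise)
print(best, errors)

# Made-up error levels for a flat run and a run over a ramp.
print(surface_changed(error=0.002, baseline_error=0.0005))
```

In a real setup, each entry of `models` would be a learned predictor operating on audio features rather than a one-line function.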

Predictive models can also be used for robot imitation. Pico et al. (2017) utilized robot ego-noise as a means of communicating intended actions among robots. In an experiment, a robot generated a series of ego-noise audio signals (emulated by a loudspeaker) representing an intended motor command sequence and conveyed it to another robot. The receiver robot obtained auditory features from the ego-noise through a convolutional autoencoder. These audio features were then fed into an inverse model in order to obtain motor command predictions, which were similar to the motor commands that generated the audio produced by the sender robot.

Winfield (2018) describes a range of different experiments with artificial agents running internal simulations of themselves, others, and the environment, and compares these skills to an artificial Theory of Mind: "Theory of mind is the term given by philosophers and psychologists for the ability to form a predictive model of self and others" (Winfield, 2018). These internal simulations show how to increase robot safety (Blum et al., 2018) by anticipating self and other behavior (Winfield and Hafner, 2018).

Predictive processes have also been studied by Hinz et al. (2018) in the context of the rubber hand illusion. The authors analyzed the drift in the perception of the real limb toward the fake limb, which would suggest an update of body estimation resulting from stimulation. In particular, they compared the limb drift patterns of human participants with the end-effector estimation displacement of a multisensory robotic arm equipped with predictive processing perception. They observed similar drift patterns in both human and robot experiments, suggesting that the perceptual drift is due to prediction error fusion rather than hypothesis selection.

Touch seems to be a more direct sense, which could be trusted more for prediction than distant senses such as vision. It also concerns sense of agency and sense of body ownership equally. Ciaunica (2019) emphasizes the developmental aspects of touch, self-touch and intersubjective touch. An interesting aspect of predicting the sensory consequences of touch is the feeling of ticklishness, which has been addressed by Sarah Blakemore in a paper with the title "Why can't you tickle yourself?" (Blakemore et al., 2000). This phenomenon of ticklishness has also recently been shown in mice (Ishiyama and Brecht, 2016). In a preliminary study on touch prediction in artificial systems, Stiehler and Hafner (2017) showed how a predictive model learns to predict the sensory consequences of touch. The sensory consequences of self-touch are usually more predictable than those of being touched by someone else. The sensation of ticklishness might be triggered by specific changes in prediction error over time, but there is little work so far on this topic. Quantitative studies showed that self-generated forces are perceived in the tactile modality as weaker than externally generated forces of the same magnitude, suggesting again that the sensory consequences of a movement are anticipated and attenuated (Shergill et al., 2003).

Vicente et al. (2016) showed how predictive processes can also support the adaptation of body schemas. The authors combined predictions made by a learned internal model with the actual visual feedback to improve the perceptual skills of a humanoid robot.

The aforementioned studies suggest that predictive processes—as simulations of sensorimotor activities—are important tools for implementing basic cognitive capabilities in artificial systems, and may represent necessary building blocks for providing robots with subjective experiences, such as those typical of the minimal self.

### 3. METRICS FOR AN ARTIFICIAL SELF

As mentioned before, the minimal self is often described by two major building blocks: a sense of body ownership and a sense of agency. Both are subjective measures (articulated by the word "sense"), and can vary between individuals, over time, and depending on the situation. As has been shown in various experiments, for example in the rubber hand illusion (Botvinick and Cohen, 1998), and in virtual reality studies (Blanke and Metzinger, 2009; Banakou et al., 2018), both the sense of body ownership and the sense of agency can be altered in humans. This points toward a certain plasticity of the brain's body representation. Predictive capabilities play a major role in maintaining a consistent minimal self. Based on our self-models, we as humans anticipate the effects of our own actions and can thus monitor them. Longo et al. (2008) for example take a psychometric approach to the question of embodiment and sense of agency based on introspective reports of the rubber hand illusion.

In artificial agents, a similar measure for a sense of body ownership and a sense of agency might be identified. As discussed in the previous sections, most models related to agency and ownership rely on forward models and internal simulations, and have permanent access to a prediction error. When such a model is embodied in an artificial agent, the agent also has direct access to this measure. Michel et al. (2004), for instance, showed in a robotics study that extensions of the self in the visual field can be identified by learning the time delay between actions and their effects.
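The idea of identifying one's own effects through a learned action-effect delay can be illustrated with a toy example on binary time series. The sketch below is not the method of Michel et al. (2004); the functions `estimate_delay` and `self_motion_mask`, the signals, and the exact-match criterion are assumptions chosen to keep the example small.

```python
def estimate_delay(actions, motion, max_lag):
    """Return the lag (in time steps) at which motion best follows actions."""
    def overlap(lag):
        return sum(a * m for a, m in zip(actions, motion[lag:]))
    return max(range(max_lag + 1), key=overlap)


def self_motion_mask(actions, motion, delay):
    """Mark motion samples that occur exactly `delay` steps after an action."""
    return [
        bool(m) and t >= delay and bool(actions[t - delay])
        for t, m in enumerate(motion)
    ]


# Made-up binary signals: 1 marks an action onset or an observed motion.
actions = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
own_motion = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]    # follows actions by 2 steps
other_motion = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]  # unrelated to the actions

delay = estimate_delay(actions, own_motion, max_lag=4)
mask_own = self_motion_mask(actions, own_motion, delay)
mask_other = self_motion_mask(actions, other_motion, delay)
print(delay, sum(mask_own), sum(mask_other))
```

A practical version would tolerate jitter around the learned delay instead of requiring an exact match.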

What could be the necessary requirements of measuring selfness in artificial agents? In analogy to prediction and anticipation in the human minimal self, a sense of agency and a sense of body ownership should be linked to changes in the prediction error in artificial agents over time as well. Preliminarily ignoring the complex dynamics of the prediction error, we could say that the lower the error in the prediction of the consequences of self-generated actions, the stronger a sense of agency and body ownership.
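The preliminary relation stated above (lower prediction error, stronger sense of agency and ownership) can be made concrete with a simple score. The following is a sketch under our own assumptions, not a proposed definition: the exponential mapping, the running average, and the parameters `smoothing` and `tolerance` are illustrative choices.

```python
import math


def agency_score(prediction_errors, smoothing=0.2, tolerance=0.5):
    """Map a stream of prediction errors to scores in (0, 1].

    A running average of the absolute error is kept, and the score decays
    exponentially with it: zero error gives 1.0, large error approaches 0.
    """
    scores = []
    average = None
    for error in prediction_errors:
        magnitude = abs(error)
        if average is None:
            average = magnitude
        else:
            average = (1 - smoothing) * average + smoothing * magnitude
        scores.append(math.exp(-average / tolerance))
    return scores


# Two well-predicted steps followed by two poorly predicted ones.
scores = agency_score([0.0, 0.0, 1.0, 1.0])
print(scores)
```

Here `tolerance` plays the role of a context-dependent error tolerance; making it vary with the state of the agent and its environment is left open.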

Given the considerations taken above, we can characterize a self-manifold in sensorimotor space with the following properties: It is dynamic, as it can change with body growth and the acquisition of new skills; it is adaptive, where the error tolerance can vary according to the specific context and the states of the system and of the surrounding environment.

The self-manifold outlines the boundaries of the self, both related to body ownership and agency, which cannot be clearly separated. A concrete example of learning manifolds in sensorimotor space, however not related to the concept of self, can be found in Laflaquière et al. (2015). The boundaries of the self related to body ownership are closely related to notions of peripersonal space (PPS) (Clery and Hamed, 2018). The same can hold for agency if we consider multisensory channels including tactile information and assume temporal and crossmodal predictions (Clery and Hamed, 2018).

Prediction errors, such as those produced by forward models, may be used for determining the boundaries of the self-manifold in the sensorimotor space of artificial agents. We therefore encourage further robotics investigation along this research line, as it may provide insights into the understanding of the human self and the implementation of the artificial self.

This idea follows the argument of Gallese and Sinigaglia (2010), who envision the bodily self as a manifold of action possibilities that cannot be reduced to any form of proprioceptive awareness. Action possibilities necessarily require a system that is able to make predictions about the consequences of its own actions. Actions include not only physical body movements and changes of posture, but also interaction with the external world, including interaction with objects and with other agents (see the notion of the interpersonal self in Neisser, 1995).

For simplification, we only consider prediction errors caused by actions affecting the peripersonal space of the agent. A self-metric for an artificial agent is a systematic way to assign a value to each suitable instance of an agent self. It should allow us to compare the self-ness of one agent at a certain instant in time to the self-ness of another agent or of the same agent at another instant in time.

Nonetheless, there are still open issues that need to be resolved before deciding on such a metric: what timing issues arise; which modalities to include or exclude; and which computational models are suitable for multimodal integration. Such a metric will also allow us to decide the balance of predicted vs. perceived information and might ultimately shed light on the mechanisms of disturbances of the self in humans.

Similarities to the concept of the self-manifold can be found in that of the Markov blanket (Kirchhoff et al., 2018). Organisms tend to self-organize within a coherent whole, maintaining a boundary that separates their internal states from the external world. A Markov blanket has been theorized as defining the boundaries of such systems in a statistical sense. From the theoretical standpoint of the Free Energy Principle, as proposed by Friston (2013), this would mean that organisms maintain their integrity by minimizing variational free energy (an upper bound on surprise) over their internal states. That is, they maximize evidence for their own models, i.e., their own existence (Kirchhoff et al., 2018). In predictive coding, free energy is associated with prediction errors. The free-energy bound, or Markov blanket, can be associated with a prediction error boundary. A self-manifold may thus be formalized as a Markov blanket around the sensorimotor states of an agent.
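For reference, the standard relation between variational free energy and surprise that underlies this argument can be written as follows. The notation is assumed here and not taken from the cited works: $s$ denotes sensory states, $\psi$ the hidden external causes, $p$ the generative model of the agent, and $q$ the recognition density encoded by its internal states.

$$
F = \mathbb{E}_{q(\psi)}\left[\ln q(\psi) - \ln p(s, \psi)\right] = D_{\mathrm{KL}}\left[q(\psi)\,\|\,p(\psi \mid s)\right] - \ln p(s) \geq -\ln p(s)
$$

Because the Kullback-Leibler divergence is non-negative, $F$ is an upper bound on the surprise $-\ln p(s)$, so minimizing $F$ maximizes a lower bound on the log model evidence $\ln p(s)$.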

### 4. CONCLUSIONS

In this manuscript, we studied the literature on developmental processes for an artificial self. We reviewed a number of works addressing the self in artificial systems and suggesting basic behavioral and computational components that may serve for the implementation of subjective experiences in robots. However, many questions and challenges in the development of an artificial self still remain open.

In section 2, we reviewed the behavioral and computational components necessary to develop an artificial self - inspired by models of the human self - in the three areas "Self-exploration behaviors and artificial curiosity," "Body representations," and "Sensorimotor simulations and predictive processes." These ingredients of an artificial self have been studied extensively in robotics and computational modeling, and will need to be integrated for a full understanding of the self using computational methods.

A common trend in both analytic sciences such as psychology and neuroscience and synthetic sciences such as robotics is to look more into the developmental processes that shape the self. This allows us to identify prerequisites and test existing theories of the self.

In section 3, we pointed out that besides the challenging task of implementing such mechanisms in artificial systems, there is a need for defining and designing metrics for an artificial self. We suggested requirements for such a self-metric and identified properties of a self-manifold as being adaptive and dynamic. Although we are far from establishing whether artificial agents can ever undergo subjective experiences, these metrics may provide support and insights in the investigation of the self, in both robots and humans.

To conclude this review, we suggest a number of open challenges for the artificial self. In particular, there is a need to integrate the three main behavioral and computational components mentioned above: self-exploration behaviors and artificial curiosity, body representations, and sensorimotor simulations and predictive processes.

Moreover, further investigation is required in addressing the following overall challenges: designing models for multimodal integration in lifelong learning robotics setups; working on a refinement of self-metrics; identifying difference and complementarity between agency and body ownership; realizing the integration of temporal and intentional binding effects within predictive computational models; and resolving synchronization as well as conceptual issues.

In robotics, we can access internal states and inspect sensorimotor and prediction information. However, to what extent can this privileged point of view allow us to state—if ever possible—that a robot is undergoing subjective experience? Indeed, there is a need for further debating the possibility of phenomenological experience in artificial systems.

### AUTHOR CONTRIBUTIONS

VH and GS produced most of the text within this manuscript. PL and AP contributed to section 2, in particular discussing studies on goal-directed exploration (PL) and ego-noise representation and imitation (AP).

### FUNDING

This work of GS, VH, and AP was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 773875 (EU-H2020 ROMI, Robotics for Microfarms) and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 402790442 (Prerequisites for the Development of an Artificial Self). PL has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 674868 (APRIL), where VH is also an associate partner. The work of GS has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 838861 (Predictive Robots).

### ACKNOWLEDGMENTS

We acknowledge support by the German Research Foundation (DFG) and the Open Access Publication Fund of Humboldt-Universität zu Berlin.

### REFERENCES

Hoffmann, M., Straka, Z., Farkaš, I., Vavrečka, M., and Metta, G. (2018). Robotic homunculus: learning of artificial skin representation in a humanoid robot motivated by primary somatosensory cortex. IEEE Trans. Cognit. Dev. Syst. 10, 163–176. doi: 10.1109/TCDS.2017.2649225

Hohwy, J. (2013). The Predictive Mind. Oxford: Oxford University Press.


International Conference on Development and Learning and Epigenetic Robotics (Tokyo), 119–124.

**Conflict of Interest:** PL was employed by SoftBank Robotics.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Hafner, Loviken, Pico Villalpando and Schillaci. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.