# BEYOND EMBODIED COGNITION: INTENTIONALITY, AFFORDANCE, AND ENVIRONMENTAL ADAPTATION

EDITED BY: Zheng Jin, Maurizio Tirassa and Anna M. Borghi PUBLISHED IN: Frontiers in Psychology

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-815-8 DOI 10.3389/978-2-88945-815-8

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

## Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# BEYOND EMBODIED COGNITION: INTENTIONALITY, AFFORDANCE, AND ENVIRONMENTAL ADAPTATION

Topic Editors:

Zheng Jin, Zhengzhou Normal University, China

Maurizio Tirassa, University of Turin, Italy

Anna M. Borghi, Sapienza University of Rome, Italian National Research Council, Italy

Image: Bug\_Fish/Shutterstock.com

Citation: Jin, Z., Tirassa, M., Borghi, A. M., eds. (2019). Beyond Embodied Cognition: Intentionality, Affordance, and Environmental Adaptation. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-815-8

# Table of Contents

*Editorial: Beyond Embodied Cognition: Intentionality, Affordance, and Environmental Adaptation*

Zheng Jin, Maurizio Tirassa and Anna M. Borghi

## CHAPTER 1

## SENSORIMOTOR PROCESS AND AFFORDANCE

*The Sense of 1PP-Location Contributes to Shaping the Perceived Self-location Together With the Sense of Body-Location*

Hsu-Chia Huang, Yen-Tung Lee, Wen-Yeo Chen and Caleb Liang


Maxwell J. D. Ramstead, Samuel P. L. Veissière and Laurence J. Kirmayer

## CHAPTER 2

## EMBODIMENT, LANGUAGE PROCESSING AND ENVIRONMENTAL ADAPTATION


David Martínez-Pernía, David Huepe, Daniela Huepe-Artigas, Rut Correia, Sergio García and María Beitia

## CHAPTER 3

## METHODOLOGICAL ISSUES ON EMBODIED COGNITION RESEARCH

*White Lies in Hand: Are Other-Oriented Lies Modified by Hand Gestures? Possibly Not*

Katarzyna Cantarero, Michal Parzuchowski and Karolina Dukala

*Commentary: Is There any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action*

Thomas J. Faulkenberry and Luca Tummolini

## CHAPTER 4

## NOT ONLY THE EMBODIED BUT ALSO THE ENMINDED APPLIED TO MODERN TECHNOLOGY


# Editorial: Beyond Embodied Cognition: Intentionality, Affordance, and Environmental Adaptation

Zheng Jin<sup>1</sup>\*, Maurizio Tirassa<sup>2</sup> and Anna M. Borghi<sup>3,4</sup>

*<sup>1</sup> International Joint Laboratory of Behavior and Cognitive Science, Zhengzhou Normal University, Zhengzhou, China, <sup>2</sup> Department of Psychology, University of Turin, Turin, Italy, <sup>3</sup> Department of Dynamic and Clinical Psychology, Sapienza University of Rome, Rome, Italy, <sup>4</sup> Institute of Cognitive Sciences and Technologies, Italian National Research Council, Rome, Italy*

Keywords: embodied cognition, intentionality, affordance, environmental adaptation, ecology

### **Editorial on the Research Topic**

### **Beyond Embodied Cognition: Intentionality, Affordance, and Environmental Adaptation**

Considering that humans must use external tools to solve problems, any account of human cognition should incorporate such intentional tool-using processes into its models of environmental adaptation. In the traditional ecological paradigm and in embodied cognitive science, affordances (i.e., possibilities for action that are available for an agent to perceive directly and act upon) are nested in the environment (Gibson, 1979). Intentionality is defined as a power of minds that simultaneously coordinates with multiple affordances (e.g., Kiverstein and Rietveld, 2015). Exploring the potential mechanisms responsible for intentionality would open up new avenues for developing alternative paradigms of psychology based on differing assumptions about the relationship among mind, body, and environment. This Research Topic is devoted to a particular question: how embodied cognitive processes contribute, through intentionality, to adaptation to a given environment.

One of the interests of this Research Topic was to explain the relationship between sensorimotor processes and one's interaction with the environment. Huang et al. aimed to explore the potential dissociation between the sense of 1PP-location (1PP: first-person perspective) and the sense of body-location. In doing so, they approached a topic of great interest in the field of self-consciousness and self-perception. Since the sense of self-location is crucial for one's interaction with the environment, recognizing the distinctive roles of 1PP-location and body-location would contribute to a better picture of environmental adaptation. Their data showed that, under different manipulations of movement, the spatial unity between 1PP-location and body-location could be temporarily interrupted. Interestingly, they also observed a "double-body effect" and further suggested that body-location and 1PP-location are better considered as interrelated but distinct factors that jointly support the sense of self-location. Their conclusion may help to explain the tremendous flexibility of our bodily experiences in coping with novel environmental challenges.

By recruiting patients with schizophrenia, Sevos et al. examined whether adding a more salient action context can promote the emergence of the affordance effect during the perception of everyday objects. Participants performed two stimulus–response compatibility tasks in which they were presented with semantic primes related to sense of property or goal of action prior to viewing each graspable object. Controls responded faster when their response hand and the graspable part of the object were compatibly oriented, but only when the context was congruent with the individual's needs and goals. When the context operated as a constraint, the affordance effect was disrupted. These results support the view that object affordance is flexible and not merely intrinsic to an object. The authors also noted that the lack of sensorimotor facilitation in patients with schizophrenia would require extensive use of higher cognitive processes even for the simplest routine activities of daily life. Their conclusion is informative for understanding the specific mechanisms behind schizophrenia.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

> \*Correspondence: *Zheng Jin jinzheng@zznu.edu.cn; zhjin@ucdavis.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *05 December 2018* Accepted: *11 December 2018* Published: *21 December 2018*

#### Citation:

*Jin Z, Tirassa M and Borghi AM (2018) Editorial: Beyond Embodied Cognition: Intentionality, Affordance, and Environmental Adaptation. Front. Psychol. 9:2659. doi: 10.3389/fpsyg.2018.02659*

Another appealing question in this Topic is how affordances are perceived. Human infants are not born knowing how to perceive affordances. According to Iran-Nejad and Bordbar, two corequisite sets of control processes are needed to explain how affordances are learned. For conceptual understanding (CU), knowers have deliberate attention-allocation control over their first-person "knowthat" and "knowhow" content, combined as mutually coherent corequisites. For biofunctional understanding (BU), knowers have attention-allocation control only over their knowthat content, whereas knowhow control content is ordinarily conspicuously absent. With a thematic focus on embodiment science and an eye toward systematic consensus in systemic cohesion, Iran-Nejad and Bordbar's study explored the roles of biofunctional and conceptual control processes in the wholetheme spiral of biofunctional understanding. They tested a hypothesis concerning the difference between CU and BU. Their findings supported the notion that individuals are capable of engaging in mind-body cohesion-sensing and consensus-seeking practices. These findings are also discussed in terms of the predicted differences between BU and CU control processes, their roles in regulating the physically unobservable flow of systemic cohesion in the wholetheme spiral, and a proposal that systematic consensus in systemic cohesion serve as the second guiding principle of biofunctional embodiment science, alongside physical science's first guiding principle of systematic observation.

Ramstead et al. extended the notion of affordance to encompass the sociocultural level and the scaffolding it provides to cognition. They investigated the ways in which people perceive and engage with cultural affordances. Aiming to account for the relationship between cultural content and normative practices on the one hand and immersive participation on the other, they focused on the social practices that regulate joint attention and shared intentionality.

We received an opinion article from Schilhab noting that the multifunctional nature of smart technology leads to noticeable changes in affordances and embodiment. She addressed the question of how social interaction (e.g., deep conversations) facilitates the development of off-line cognition (e.g., mental imagery, stream of consciousness) that enables the (self-)regulation of on-line cognition and of interactions with technology. This opinion article took us back to 2002, when Wilson published her much-cited six claims about embodied cognition, one of which concerns off-line cognition (Wilson, 2002). To some extent, this article, together with the results of Lee et al. (2012), again invites us to take more seriously the philosophical issue of the "natural kind" (e.g., Millikan, 1999; Ellis, 2001) for which off-line aspects of embodied cognition are a proxy in on-line interactions with the environment. In the traditional view, behavior is thought to be a means to control the environment, a view that ignores the fact that the target object can be perceived through activity. For example, some people take their seat after checking the space between the table and the chair; others first sit down and then reevaluate or adjust the distance. Therefore, even though the initial stimulus is always perception, the interaction continues back and forth between perception and action. Cognition serves to guide the behaviors that acquire the perceptions needed for new behaviors. The two behaviors, before and after reappraisal, communicate information. Gibson (1979) conceptualized this information as survival-related symbols given by the environment to an organism. Shaw extended this definition with the concept of intention (e.g., Shaw, 2001). In brief, to survive, organisms coordinate with their environment, communicate information, and realize intentions. This conception of living (or survival) is similar to the Gih of Eastern philosophy. In this Topic, Lee et al. attempt to build a meta-theory and to demonstrate empirical designs for Gih, discussing the problems of mind and body, or subject and object, in comparison with the concept of "affordance" proposed by ecological approaches. They claim that Gih should not remain in the domain of mysticism; the concept may be addressed by psychological science so as to make use of valuable insights from Eastern philosophy through empirical research.

Three studies investigated whether and how characteristics of the sensory, motor, and emotional systems are reflected in language processing. Marino et al. report two experiments on the relationship between language and affordances. Participants were presented with short sentences composed of verbs referring to motor chains and nouns of tools, and were required to decide whether the image following the sentence had been mentioned in it. The results showed that the grasp-verb motor chain activated volumetric information, while the functional motor chain activated information related to tool use. Overall, the studies demonstrate the influence of the motor system, and of its chained organization, on language processing. Buccino et al. investigated the embodiment of a second language (L2) and found that embodied cognitive processes appear to be substantially the same in L2 as in the first language (L1). Starting from the available evidence that language processing relies on the same sensory, motor, and emotional structures that are involved when individuals experience the contents of language material, they found that the processing of English nouns by native speakers of Italian who also speak English recruits the same neural substrates as the processing of their Italian equivalents. Baumeister et al. investigated whether the link between language and emotion is reduced in L2. Late Spanish-English bilinguals were required to categorize a set of English and Spanish words as "associated to emotion" or "not associated to emotion," and then completed a surprise recognition task (old/new word). Electromyography (EMG) and skin conductance (SC) were recorded; in particular, activity of the corrugator and zygomaticus muscles was measured in response to happy and angry emotional words in both L1 and L2. Results indicated a stronger memory enhancement for emotional over neutral stimuli in L1 than in L2; furthermore, the EMG and SC recordings indicated a slight reduction of facial motor resonance and of SC responses to emotional stimuli in L2. In line with embodied cognition views, the authors suggest that the processing of emotional L2 words is less grounded in the motor, sensory, and autonomic nervous systems than the processing of L1 words.

In addition, one original research article and one commentary were concerned with methodological issues in embodied cognition research. Cantarero et al. focused on the relationship between gestures and moral behavior and investigated whether body gestures commonly associated with (dis)honesty influence white lies. Participants were asked to give feedback directly to an artist about his work, which they did not like, facing the dilemma of telling him the truth or lying to him to spare him from feeling bad (an other-oriented lie). During the conversation they had to hold either the hand-over-heart gesture, typically related to honesty, or the fingers-crossed or hand-over-elbow gesture. In the first experiment they found that the hand-over-heart gesture was associated with fewer other-oriented lies. In the pre-registered Experiment 2 they did not replicate this result: the hand-over-heart gesture did not prevent participants from telling other-oriented white lies. The authors discuss their results in the framework of research on embodied cognition, arguing that high methodological standards are necessary, particularly when effect sizes are small. Using the data of Sevos et al., Faulkenberry and Tummolini's commentary pointed out the issues that arise when trying to interpret non-significant results within the traditional null hypothesis significance testing framework, and offered a quick example of how to use a Bayesian approach to quantify evidence for object-affordance effects and other action-specific influences on perception in the study of embodied cognition.
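To make the Bayesian point concrete, the following is a minimal sketch, under stated assumptions, of one widely used way to turn a conventional test statistic into a Bayes factor: the BIC approximation. It is not taken from the commentary, which may use a different procedure; the function name `bf01_from_paired_t` and the example t values and sample size are hypothetical. It assumes a paired (within-subject) comparison such as a compatible-versus-incompatible affordance effect, with `n` participants, and returns BF01, the evidence for the null relative to the alternative.

```python
import math


def bf01_from_paired_t(t: float, n: int) -> float:
    """Approximate BF01 (null over alternative) for a paired t test via BIC.

    Hypothetical helper for illustration only. Assumes a within-subject
    design with n participants, so the error df is n - 1, and an
    alternative model with one more free parameter than the null.
    """
    if n < 2:
        raise ValueError("need at least two participants")
    df = n - 1
    # Proportion of variance explained by the effect (partial eta squared).
    eta_p2 = t * t / (t * t + df)
    # Delta BIC = BIC(H1) - BIC(H0) = n * ln(1 - eta_p2) + 1 * ln(n).
    delta_bic = n * math.log(1.0 - eta_p2) + math.log(n)
    # BF01 is approximately exp(Delta BIC / 2); values above 1 favor the null.
    return math.exp(delta_bic / 2.0)


if __name__ == "__main__":
    # Hypothetical t values for a sample of 24 participants.
    for t_value in (0.5, 1.2, 3.0):
        print(t_value, round(bf01_from_paired_t(t_value, n=24), 2))
```

Unlike a non-significant p value, a BF01 well above 1 can be read as positive evidence for the absence of an effect, which is the interpretive gap the commentary addresses. The BIC method implicitly assumes a unit-information prior, so its values should be treated as approximations.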

This Topic also comprises articles from other distinctive perspectives that speak to the multifaceted research in this field. Einarsson and Ziemke's contribution to this Research Topic illustrates, using the case of interactive music, how seemingly highly abstract, disembodied, and unsituated activities, such as the composition of musical works, can in fact be strongly grounded in concrete embodied and situated activity. Their theoretical perspectives and concrete examples may help to elucidate how situations, and with them affordances, are dynamically constructed through the interactions of biological, contextual, social, and cultural mechanisms as embodied and situated activity unfolds. Martínez-Pernía et al. introduced, in a case study, a level of treatment that precedes behavior and cognition. This theoretical consideration allowed them to establish a better relation between affordance and environmental adaptation, improving behavioral and cognitive performance in their case study.

The final collection of 13 articles provides an overview of current trends and opinions, as well as perspectives on theoretical and methodological questions. As pointed out in our call for papers, psychology has continued to formulate and refine a variety of paradigms to provide solutions to the mind-body problem. A number of contemporary psychologists believe they have avoided dualism by noting the close relationship between certain brain activities and certain cognitive events, and it appears likely that such a relationship will soon be discovered for all mental events. However, replacing the term mind-body with the term mind-brain does little to solve the problem of how the brain can cause something mental. The traditional metaphysics founded on the subject-object dichotomy is still at the basis of the majority of paradigms in psychology. We hope that the reader will find the collected articles both informative and thought-provoking, and that this Research Topic will stimulate scientific debate and contribute to overcoming such a dichotomy.

## AUTHOR CONTRIBUTIONS

ZJ drafted the paper. All authors provided critical comments and additions, and approved it for publication.

## FUNDING

ZJ was sponsored by the National Natural Science Foundation of China (grant No. U1504336); the Aid Program for Science and Technology Innovative Research Team of Zhengzhou Normal University; the Program for Science & Technology Innovation Talents in Universities of Henan Province (HASTIT) (grant No. 2017-cx-023); and the Youth Backbone Teacher Training Project of Henan Province (grant No. 2017GGJS180).

## REFERENCES

Wilson, M. (2002). Six views of embodied cognition. Psychon. Bull. Rev. 9, 625–636. doi: 10.3758/BF03196322

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Jin, Tirassa and Borghi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Sense of 1PP-Location Contributes to Shaping the Perceived Self-location Together with the Sense of Body-Location

#### Hsu-Chia Huang<sup>1</sup>, Yen-Tung Lee<sup>2</sup>, Wen-Yeo Chen<sup>1</sup> and Caleb Liang<sup>1,2</sup>\*

*<sup>1</sup> Graduate Institute of Brain and Mind Sciences, National Taiwan University, Taipei, Taiwan, <sup>2</sup> Department of Philosophy, National Taiwan University, Taipei, Taiwan*

*Self-location*, the sense of where I am in space, provides an experiential anchor for one's interaction with the environment. In studies of full-body illusions, many researchers have defined self-location solely in terms of *body-location*, the subjective feeling of where my body is. Although this view is useful, it is unclear whether it can fully accommodate the role of *1PP-location*, the sense of where my first-person perspective is located in space. In this study, we investigate self-location by comparing body-location and 1PP-location: using a head-mounted display (HMD) and a stereo camera, the subjects watched their own body standing in front of them and received tactile stimulations. We manipulated their senses of body-location and 1PP-location in three different conditions: having the participants stand still (Basic condition), asking them to move forward (Walking condition), and swiftly moving the stereo camera away from their body (Visual condition). In the Walking condition, the participants watched their body moving away from their 1PP. In the Visual condition, the scene seen via the HMD was systematically receding. Our data show that, under different manipulations of movement, the spatial unity between 1PP-location and body-location can be temporarily interrupted. Interestingly, we also observed a "double-body effect." We further suggest that it is better to consider body-location and 1PP-location as interrelated but distinct factors that jointly support the sense of self-location.

#### Edited by:

*Zheng Jin, Zhengzhou Normal University, China*

#### Reviewed by:

*Asghar Iran-Nejad, University of Alabama, USA*

*Antonella Maselli, Fondazione Santa Lucia (Istituti di Ricovero e Cura a Carattere Scientifico), Italy*

> \*Correspondence: *Caleb Liang yiliang@ntu.edu.tw*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *13 October 2016* Accepted: *27 February 2017* Published: *14 March 2017*

#### Citation:

*Huang H-C, Lee Y-T, Chen W-Y and Liang C (2017) The Sense of 1PP-Location Contributes to Shaping the Perceived Self-location Together with the Sense of Body-Location. Front. Psychol. 8:370. doi: 10.3389/fpsyg.2017.00370*

Keywords: self-location, body-location, first-person perspective, body ownership, double-body effect

## INTRODUCTION

The sense of self-location has been regarded as a key component of bodily self-consciousness, especially in the research on full-body illusions (Ehrsson, 2007; Lenggenhager et al., 2007; Blanke and Metzinger, 2009; Serino et al., 2013; Maselli, 2015). How is self-location defined in this research field? As a first approximation, the sense of self-location is the subjective feeling of where I am in space (Ionta et al., 2011, p. 363; Blanke, 2012, p. 556; Pfeiffer et al., 2014, p. 4021). This understanding is natural, but can only serve as a starting point for investigation. To go further, many researchers specify the sense of self-location in terms of the sense of body-location: the sense of where my body is. In the study by Lenggenhager et al. (2007, p. 1096), participants watched their virtual body in front of them while receiving tactile stimulations on the back. Many of them mislocalized themselves "toward the virtual body" during the synchronous condition (2007, p. 1096). Other studies confirmed the findings using different kinds of measures, from "the mental ball dropping test" in Lenggenhager et al. (2009) and the proprioceptive drift measurement by Aspell et al. (2009) to the measures of peripersonal space by Noel et al. (2015). In the review by Serino et al. (2013), self-location was defined as "the experience of being a body with a given location within the environment" (2013, p. 1239). Applying virtual reality techniques to study bodily illusions, Maselli and Slater (2014) also depicted self-location as "the experience of the body occupying a given portion of space in the environment" (2014, p. 1). Finally, in the fMRI study by Guterstam et al. (2015), self-location was characterized as "the experience that the body is located somewhere in space" (2015, p. 1416). Overall, this definition identifies self-location with body-location, or at least regards the former as determined by the latter.

To be sure, it is very useful to specify self-location in terms of body-location because doing so not only prevents a Cartesian chasm between self and body, but also makes the notion of self-location experimentally operational. Still, there is a concern: does this way of understanding do justice to another key factor in the sense of self-location, i.e., first-person perspective (1PP)? In this study, we assume that the most relevant aspect of 1PP with regard to self-location is its location. It is via its location that 1PP contributes to the sense of self-location. So we will speak about the sense of 1PP-location: the sense of where my first-person perspective is located in space. Also, in this study both body-location and 1PP-location refer to participants' subjective experiences rather than the physical locations of their real body or eyes. Thus, during an out-of-body illusion, a subject could feel his/her body-location to be in a place different from the location of his/her real body (Lenggenhager et al., 2009). Similarly, under experimental manipulations, one's sense of 1PP-location could be separated from where one's eyes are physically located in space.

Most studies of self-location, including those just mentioned, recognize that 1PP plays an important role in the sense of self-location. In the study by Ehrsson (2007), the participants were stroked on the chest, which was blocked from view, and saw the stroking applied to a position slightly below the camera. The participants felt as if they were sitting behind their physical body and were looking at it from the location of their "illusory body" (2007, p. 1048). Notice that, in this study, the location of the illusory body was determined by the location of the manipulated 1PP (i.e., the location of the camera; cf. also the chest-stroking case in Lenggenhager et al., 2009). In the fMRI study by Ionta et al. (2011), the participants used a cursor to indicate the direction of their 1PP: they felt that they were either looking upward or looking downward. In our terms, Ionta et al. defined the direction in terms of the location of 1PP ("From where do I perceive the world," cf. 2011, p. 363). The results showed that "temporo-parietal junction (TPJ) activity reflected experimental changes in self-location that also depended on the first-person perspective" (2011, p. 363, cf. also pp. 370, 371). Thus, Serino et al. (2013) suggested that "perspective is not wholly distinct from self-location" (2013, p. 1240, authors' emphasis). Finally, in studying judgments about self-location, Starmans and Bloom (2012) found that "children and adults intuitively think of the self as occupying a physical location within the body, close to the eyes" (2012, p. 317). Bertossa et al. (2008) also suggested that "Human volunteers generally seem to find it easy and natural to locate their center of self, the place 'I am' or the I-that-perceives. With considerable consistency, sighted or blind, Western or non-Western, it is placed somewhere near the center of their head" (2008, p. 333). Another study by Alsmith and Longo (2014) found that most self-location judgments pointed to either the upper face or the upper torso. All of these studies indicate a close connection between self-location and the location of 1PP.

Now, if the role of 1PP-location can be incorporated into the role of body-location, then there is probably no need to include the notion of 1PP-location in the definition of self-location. But is this indeed the case? Before articulating this issue, we will make a few remarks to clarify our terminology. First, although 1PP often refers to one's visual perspective, there is more to it. Other types of information, such as tactile, proprioceptive, and vestibular signals, also contribute to one's egocentric reference frame. On the other hand, in order to make the notion of 1PP experimentally operational, many studies consider 1PP as referring to visual perspective. This is reasonable since vision often plays a dominant role relative to other sensory modalities, which is important in the research on full-body illusions. In this study we will operate with the visual notion of 1PP in our experiments, but will take non-visual information into consideration as well. Second, in a recent review, Maselli (2015) defined visual-perspective as "the point from which visual information from the environment is gathered" (2015, p. S309). She chose the term "visual-perspective" instead of "1PP" to avoid confusion with "first-person visual perspective over the fake body" (2015, p. S309). In this study, we will continue to use "1PP" with this caution in mind.

Both body-location and 1PP-location are maintained and influenced by visual, proprioceptive, somatosensory, and vestibular information. Both are forms of subjective spatial awareness that usually match and are integrated with each other. For example, while watching a live baseball game in a stadium, as I move from an outfield seat to an infield seat, my sense of body-location changes and my sense of 1PP-location changes accordingly. However, we think that there are at least two reasons to suggest that 1PP-location plays a role in self-location that is distinct from body-location, and that a better characterization of self-location should include both body-location and 1PP-location. First, out-of-body experiences (OBE) have been described as a type of abnormal self-location, characterized by a sense of disembodiment and an experience of looking at one's own body from an elevated and distanced 1PP (Blanke and Mohr, 2005, p. 186; Serino et al., 2013, p. 1243). For example, an OBE subject reported that "she saw her whole body as if she were outside, from an external and superior point of view" (Maillard et al., 2004). Another subject said that "she felt she was floating above it and could view her body and its surroundings from above" (Greyson et al., 2014). Blanke and Mohr said that "During an OBE people seem to be awake and feel that their 'self,' or center of awareness, is located outside of the physical body and somewhat elevated. It is from this elevated extrapersonal location that the subjects experience seeing their body and the world" (2005, p. 186). These descriptions clearly suggest that, in the case of OBE, the sense of self-location is dissociated from the sense of body-location and tied to the sense of 1PP-location. If self-location is depicted only in terms of body-location, it becomes unclear how to characterize OBE. Hence, 1PP, or more precisely the location of 1PP, is important for specifying self-location, and its role is not the same as that of body-location.

To see the second reason, consider what Maselli (2015) calls the front-stroking and back-stroking paradigms in studies of full-body illusions. As mentioned earlier, participants in the front-stroking paradigm felt themselves to be at the location of the unseen illusory body (Ehrsson, 2007), and they experienced ownership of that illusory body (Guterstam and Ehrsson, 2012). Here, it is crucial to note that the location of the illusory body was determined by the location of the manipulated 1PP (i.e., the location of the camera), not the other way around. In the back-stroking case, participants mislocalized themselves toward the virtual body, and some (but not all) of them also experienced it as their own (Lenggenhager et al., 2007). It is worth emphasizing that the virtual body was seen as located 2 m in front of the subject precisely because the camera was positioned 2 m behind the subject. These observations suggest that the role of 1PP-location cannot be replaced by that of body-location. A better picture of self-location seems to be the following: in both the front-stroking and the back-stroking paradigms, self-location requires interaction between body-location and 1PP-location, and body-location and 1PP-location are likely to be distinct factors in the sense of self-location.

If 1PP-location and body-location are not the same, does this lend any support to a dualism between self and body? The answer is no. In everyday life, we experience ourselves as being at the location from which we perceive the world. Our sense of self-location seems to lock onto the 1PP-location given by ordinary experience. Moreover, this ordinary 1PP-location is not an abstract geometric point. There is a sense of embodiment tied to it: we feel that we have a body at (or in line with) that location, from which we can touch and act upon the world. Hence, recognizing the role of 1PP-location in the sense of self-location does not risk falling into Cartesian dualism. In addition, in our previous study on the "self-touching illusion" (Liang et al., 2015), we observed a double-body effect: we manipulated the participant's visual perspective while letting him/her interact with the experimenter, such that the subject was touching someone and being touched at the same time, while also watching his/her own body in front of him/herself. In the two synchronous full-body conditions, many participants felt not only that "I was brushing my own hand" but also that "It felt that I had two bodies" (2015, pp. 3–5, Supplementary Materials). If the double-body effect is a robust phenomenon, it would support the view that 1PP-location is embodied, and hence that acknowledging its role carries no tendency toward dualism.

In this study, we investigate self-location by addressing two questions. First, can the spatial integration of body-location and 1PP-location be temporarily modified? Second, is it possible for healthy subjects to have the illusory experience of owning two bodies? In most previous studies, in both the back-stroking and the front-stroking paradigms, body-location and 1PP-location remained fixed throughout the experimental procedures. In this study, we used a back-stroking set-up and added various forms of movement to study body-location and 1PP-location. We aim to propose a refinement of the current picture, which characterizes self-location solely in terms of body-location.

Four experiments were conducted to address these questions. The participants wore an HMD connected to a stereo camera placed behind them, so that they watched their own body standing in front of them while receiving tactile stimulation. Depending on the experiment, the subjects stood still (Basic condition); walked straight ahead, so that they watched their body moving away from the position of their visual perspective (Walking condition); or stood still while the experimenter moved the stereo camera away from their body, so that their visual scene systematically receded (Visual condition). Experiment 1 implemented the Basic condition. The goal was to verify whether we could induce a bodily illusion similar to the one reported by Lenggenhager et al. (2007); the results also provide a baseline for comparison with the data collected in the other conditions. Experiment 2 implemented the Walking condition to test (1) whether a variant of the body-ownership illusion could be induced in this condition, and (2) whether the walking movement modifies the participant's sense of body-location. Experiment 3 implemented the Visual condition to test (1) whether another version of the body-ownership illusion could be induced in this set-up, (2) whether moving the stereo camera influences the participant's sense of 1PP-location, and (3) whether healthy subjects can feel as if they have more than one body. Finally, in Experiment 4 we ran the synchronous conditions of all three experiments above. This allowed us to compare the three major conditions and so investigate the relationship between body-location and 1PP-location.

With these experiments, we tested the following hypotheses: (1) the spatial unity between body-location and 1PP-location can be temporarily interrupted in some experimental conditions; and (2) the illusory experience of owning two bodies can be induced. If both hypotheses were supported, this would show, first, that body-location and 1PP-location are two distinct factors in the sense of self-location, and that a better characterization of self-location should include both. Second, the double-body effect would support the view that the sense of 1PP-location is essentially embodied. Hence, recognizing the role of 1PP-location would not raise the worry about dualism. We will discuss the implications of our experimental results and address the issues raised above.

## METHODS

## Participants

All four experiments in this study adopted within-subjects designs. In total, we recruited 86 healthy volunteers (see **Table 1** for participant details). All participants gave written consent prior to the experiments. All experiments were conducted in accordance with the Declaration of Helsinki. This study was approved by the Research Ethics Committee of National Taiwan University (NTU-REC: 201501HS009).

#### TABLE 1 | Overview of experiments.

## Materials and Procedures

We used a head-mounted display (HMD, Sony HMZ-T1) and a stereo camera (Sony HDR-TD20V) in all four experiments. The questionnaires used a Likert scale from "strongly disagree" (−3) to "strongly agree" (+3), and the statements were presented in random order. They fall into the following categories: 1PP-location, body-location, body-ownership, 1PP-location vs. body-location, double-body effect, and positive control (**Table 3**). Since the purpose of Experiment 1 was to compare our results with those of Lenggenhager et al. (2007), we adjusted the questionnaire as follows: Q5 was reformulated as "It felt as if the body in front of me was mine"; Q2, Q4, Q6, and Q7 were removed; and two statements about touch referral were added (see **Table 2**). We also used a screen switch (ATEN VM5808H, Taiwan) to toggle between the images from the stereo camera and other computer images, which allowed us to present the questionnaires on the HMD.

Skin conductance responses (SCR) were recorded with an MP35 Data Acquisition Unit (Biopac Systems, Inc., USA). SCR was measured in the synchronous and asynchronous conditions of Experiments 1–3, in which a knife was shown on the HMD scene and then made a cutting motion toward the participant's physical body. To measure SCR, two single-use foam electrodes (Covidien, Inc., Mansfield, USA) were attached to the lower edge of the participant's right palm on the volar surfaces of the medial phalanges. Data were sampled at 200 Hz and analyzed with the Biopac software AcqKnowledge v. 3.7.7. We defined SCR amplitude as the difference between the maximal and minimal values of the response within 5 s of the threat (Dawson et al., 2007).

TABLE 2 | The questionnaire statements in Experiment 1.

*The questionnaires were in Chinese when presented to the participants. Here and in* Table 3 *we present the English translations.*

FIGURE 1 | Experimental set-ups. (A) Experiment 1 and the Basic condition of Experiment 4. The participant wore an HMD connected with a stereo camera positioned 2 m behind and received tactile stimulations for 70 s. (B) Experiment 2 and the Walking condition of Experiment 4. The participant wore an HMD connected with a stereo camera positioned 30 cm behind and received tactile stimulations for 70 s. At the 20th s, the subject was instructed to walk straight ahead for about 2 m. (C) Experiment 3 and the Visual condition of Experiment 4. The participant wore an HMD connected with a stereo camera positioned 30 cm behind and received tactile stimulations for 70 s. At the 20th s, the experimenter swiftly moved the stereo camera away from the participant's body for about 2 m.

All subjects were informed beforehand that after the experiment they would orally answer a questionnaire presented on their HMD. They were advised to answer spontaneously, based on their subjective feelings rather than on reasoning. Subjects who did not show any SCR amplitude and those who did not pass the positive control (i.e., answered negatively to Q11) were excluded from the analyses. In total, we excluded the data of three participants, including both their SCR and questionnaire data. The procedures of each experiment are described below.
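The SCR amplitude measure defined in this section (maximum minus minimum within 5 s of the threat, sampled at 200 Hz) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AcqKnowledge procedure the study actually used: the function name `scr_amplitude` is hypothetical, the signal is assumed to be a plain list of skin-conductance samples, and the window is assumed to start at the sample where the knife first appears (the 60th s, i.e. sample 12,000 at 200 Hz).

```python
def scr_amplitude(signal, threat_index, fs=200, window_s=5.0):
    """Peak-to-trough SCR amplitude in the window following a threat.

    Hypothetical helper, not the study's software.
    signal       -- skin-conductance samples, one every 1/fs seconds
    threat_index -- index of the sample at which the threat begins (assumed onset)
    fs           -- sampling rate in Hz (200 Hz in this study)
    window_s     -- post-threat window in seconds (5 s in this study)
    """
    n_window = int(round(fs * window_s))        # 5 s at 200 Hz -> 1000 samples
    window = signal[threat_index:threat_index + n_window]
    if not window:
        raise ValueError("threat_index lies beyond the end of the recording")
    return max(window) - min(window)            # maximal minus minimal value


# Hypothetical trace: flat baseline, a response inside the 5 s window,
# and a spike after the window that the measure should ignore.
trace = [2.0] * 12200 + [2.8] * 300 + [2.1] * 500 + [5.0] * 10
amp = scr_amplitude(trace, threat_index=60 * 200)   # about 0.8
```

Taking the extreme values inside a fixed window keeps the measure simple, but it also means a slow drift within the window counts toward the amplitude; the study's own software settings are not reported here.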

## Experiment 1: Basic Condition (Sync. vs. Async.)

The participant put on an HMD connected to a stereo camera positioned 2 m behind him/her (**Figure 1A**). The participant also wore mini-headphones playing white noise during the experiment. The participant was then asked to keep his/her eyes closed and wait for the announcement to begin. When the participant opened his/her eyes, he/she saw the back of his/her full body standing in front of him/herself from below the neck. The HMD showed a real-time stream of the video recorded by the stereo camera, with an intrinsic streaming delay of 20–40 ms. The participant was brushed on the back for 70 s. In the synchronous condition, the visual content was synchronized with the tactile stimulation, and the brushing frequency was about one stroke per second. In the asynchronous condition, we played a pre-recorded video on the HMD in which the subject's back was brushed at a constant rate of about one stroke per 2 s. At the same time, the experimenter brushed the participant's back, varying the interval randomly between 1 and 3 s per stroke, so that the touch the participant felt was not consistent with what he/she saw. SCR was measured in both conditions at the 60th s: a knife was first shown on the HMD scene for 1 s and then cut toward the participant's upper back (i.e., toward the participant's adopted 3PP) for another 1 s. After the experiment, the participant orally responded to a questionnaire presented on the HMD.
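The asynchronous brushing schedule described above can be simulated to make the visual-tactile mismatch concrete. This is a minimal sketch with loudly labeled assumptions: the function names are hypothetical, the felt stroke intervals are drawn uniformly from 1–3 s (the text states only that the experimenter varied them randomly within that range, by hand), and the 0.3 s tolerance for calling a felt and a seen stroke "coincident" is an illustrative choice, not a value from the study.

```python
import random


def constant_onsets(duration_s=70.0, interval_s=2.0):
    """Seen stroke onsets in the pre-recorded video: one stroke every interval_s s."""
    onsets, t = [], 0.0
    while t < duration_s:
        onsets.append(t)
        t += interval_s
    return onsets


def jittered_onsets(duration_s=70.0, lo_s=1.0, hi_s=3.0, seed=None):
    """Felt stroke onsets; intervals drawn uniformly from [lo_s, hi_s] (an assumption)."""
    rng = random.Random(seed)
    onsets, t = [], 0.0
    while t < duration_s:
        onsets.append(t)
        t += rng.uniform(lo_s, hi_s)
    return onsets


def fraction_coincident(felt, seen, tol_s=0.3):
    """Share of felt strokes within tol_s seconds of some seen stroke (tol_s is hypothetical)."""
    if not felt:
        return 0.0
    hits = sum(1 for f in felt if any(abs(f - s) <= tol_s for s in seen))
    return hits / len(felt)


seen = constant_onsets()          # what the participant sees on the HMD
felt = jittered_onsets(seed=1)    # what the participant feels on the back
print(len(seen), round(fraction_coincident(felt, seen), 2))
```

In the synchronous condition the felt and seen onsets coincide by construction, so the same measure returns 1.0; in the simulated asynchronous condition only a chance fraction of strokes line up.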

## Experiment 2: Walking Condition (Sync. vs. Async.)

The stereo camera was positioned only about 30 cm behind the participant. In the synchronous condition, the participant received synchronous tactile stimulations. At the 20th s, the subject was instructed to walk straight ahead for about 2 m and then was asked to stop (**Figure 1B**). The average walking velocity was about 0.67 m/s. Since the stereo camera remained in the same position, the walking movement caused changes in the subject's proprioception and visual content: the subject proprioceptively felt that his/her body was moving ahead, while at the same time watching his/her own body moving away from his/her visual perspective. The procedure of the asynchronous condition was the same, except that the brushing was asynchronous. In both conditions, the participant received tactile stimulations on the back for 70 s, followed by the same SCR measurement and questionnaires.

## Experiment 3: Visual Condition (Sync. vs. Async.)

The stereo camera was again positioned about 30 cm behind the participant, who was brushed on the back either synchronously or asynchronously for 70 s. The new factor was that, at the 20th s, while the subject was standing still, the experimenter swiftly moved the stereo camera away from the subject's body for about 2 m (**Figure 1C**). The average velocity with which the camera was moved back was about 1.33 m/s. This was to change the location of the participant's 1PP, such that the scene that the subject saw via the HMD systematically receded. The rest of the procedure was the same as in the above two experiments.
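The reported distances and average speeds of the Walking and Visual conditions imply how long each manipulation lasted within the 70 s stimulation period. The short calculation below assumes constant speeds equal to the reported averages (actual movements surely varied), and the helper name is hypothetical. It shows that the camera displacement took roughly half as long as the walk (about 1.5 s vs. about 3 s) and that both ended well before the knife threat at the 60th s.

```python
def displacement_window(start_s, distance_m, speed_m_per_s):
    """(start, end) in seconds of a displacement at an assumed constant speed."""
    return start_s, start_s + distance_m / speed_m_per_s


THREAT_ONSET_S = 60.0   # knife threat used for SCR in Experiments 1-3

walking = displacement_window(20.0, 2.0, 0.67)   # participant walks about 2 m
visual = displacement_window(20.0, 2.0, 1.33)    # camera moved back about 2 m

for name, (start, end) in [("Walking", walking), ("Visual", visual)]:
    print(f"{name}: {start:.1f}-{end:.1f} s, {THREAT_ONSET_S - end:.1f} s before the threat")
```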

## Experiment 4: Basic, Walking, and Visual Conditions (Sync.)

In this experiment, we conducted the Basic, Walking, and Visual conditions (**Figures 1A–C**) with only synchronous brushing and did not measure SCR. In each of these conditions, the participant saw via the HMD the back of his/her full body standing in front of him/herself from below the neck, and was brushed on the back for 70 s, followed by a questionnaire.

## Data Analyses and Statistics

Shapiro–Wilk tests showed that the questionnaire and SCR data collected in Experiments 1–3 were not normally distributed, so we used nonparametric Wilcoxon matched-pairs signed-rank tests to compare the synchronous and asynchronous conditions. For Experiment 4, we conducted Friedman's analyses of variance by ranks to determine whether there were significant differences among the three conditions, followed by Wilcoxon signed-rank tests with Bonferroni correction as post-hoc analyses. Wilcoxon signed-rank tests were also carried out to compare Q6 between the Walking and Basic conditions, and Q7 between the Visual and Basic conditions. We adopted relatively strict standards when interpreting the questionnaire data: in addition to requiring that differences be statistically significant (α = 0.05), the absolute value of the median of a major factor (such as 1PP-location, body-location, or double-body effect) had to be at least 1 (cf. Kalckert and Ehrsson, 2012). More precisely, for an effect on 1PP-location, the median of the positive statement Q1 had to be at least +1, and the median of the negative statement Q2 had to be at most −1. Likewise, for an effect on body-location, the median of Q3 had to be at least +1 and that of the negative statement Q4 at most −1. All other statements were formulated in positive terms, so their medians had to reach at least +1 before we claimed a genuine effect. The rationale is that if the absolute value of a median was below 1, the group of participants was considered uncertain about the statement.
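The two-part interpretation criterion above (a significant difference plus a median of the right sign with absolute value at least 1) can be expressed as a small decision rule. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, treating p ≤ α as significant follows the significance levels in the figure captions, and requiring both statements of a location factor to pass each condition is our reading of the text. The ratings in the example are invented for illustration.

```python
from statistics import median


def statement_passes(scores, p_value, direction=+1, alpha=0.05):
    """Two-part criterion for one questionnaire statement (hypothetical helper).

    scores    -- Likert ratings (-3 to +3) in the condition of interest
    p_value   -- p-value of the corresponding Wilcoxon comparison
    direction -- +1 for positively phrased statements (e.g. Q1, Q3),
                 -1 for negatively phrased ones (e.g. Q2, Q4)
    """
    significant = p_value <= alpha                      # assumed convention: p <= alpha
    strong_enough = direction * median(scores) >= 1     # |median| >= 1 with the right sign
    return significant and strong_enough


def factor_effect(pos_scores, neg_scores, p_pos, p_neg, alpha=0.05):
    """A location factor (e.g. 1PP-location: Q1 and Q2) requires both statements to pass."""
    return (statement_passes(pos_scores, p_pos, +1, alpha)
            and statement_passes(neg_scores, p_neg, -1, alpha))


# Hypothetical ratings, not data from the study.
q1 = [2, 1, 3, 1, 2, 0, 2]
q2 = [-2, -1, -3, 0, -1, -2, -1]
print(factor_effect(q1, q2, p_pos=0.004, p_neg=0.010))   # True
```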

## RESULTS

## Experiment 1

In this section, we report only the significant comparisons. The median values and interquartile ranges (IQRs) of the questionnaire statements of Experiment 1 are shown in **Table 4**. Significant differences were observed in Q5 (z = −3.662, p < 0.001), Q8 (z = −2.695, p = 0.007), Q12 (z = −3.935, p < 0.001), and Q13 (z = −3.413, p = 0.001; **Figure 2A**). The SCR values were significantly higher in the synchronous than in the asynchronous condition (z = −1.964, p = 0.050; sync. median = 2.750, async. median = 2.190; **Figure 2B**). These results suggest that in the synchronous condition the participants felt that their 1PP was behind their body (Q8). More importantly, they felt that the virtual body in front of them was theirs (Q5). The tactile stimulation was felt where they saw the virtual body being touched (Q12), and it was felt to be caused by the brush touching the virtual body (Q13).
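The z-values reported for these Wilcoxon matched-pairs comparisons come from the normal approximation to the signed-rank statistic. The sketch below illustrates that approximation under stated assumptions: zero differences are dropped, tied absolute differences receive average ranks with the usual variance correction, and no continuity correction is applied. The study does not say which settings its software used, so this hypothetical function need not reproduce the reported values exactly. Its sign follows W+ (the rank sum of positive differences); many statistics packages report a negative z based on the smaller rank sum, so the magnitude is what should be compared.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Normal-approximation z for paired samples x, y (hypothetical helper)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]    # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1                      # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t                          # tie correction for the variance
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (w_plus - mean) / math.sqrt(var)


x = [2, 3, 4, 5, 6]   # hypothetical synchronous ratings
y = [1, 1, 1, 1, 1]   # hypothetical asynchronous ratings
print(round(wilcoxon_signed_rank_z(x, y), 3))   # 2.023
```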


*The data marked in red color represent significant comparisons in the experiments.*

FIGURE 2 | … conditions regarding body ownership (Q5), 1PP-location vs. body-location (Q8), and touch referral (Q12 and Q13). (B) SCR results. The SCR values were significantly higher in the synchronous than in the asynchronous condition when the knife threats were applied to the participant's physical body (which was viewed via the HMD from the adopted 3PP). Significance levels: \**p* ≤ 0.05; \*\**p* ≤ 0.01; and \*\*\**p* ≤ 0.001.

## Experiment 2

The median values and IQRs are presented in **Table 4**. The value of Q5 was significantly higher in the synchronous than in the asynchronous condition (z = −2.619, p = 0.009, **Figure 3A**), as was also true for the SCR values (z = −3.621, p < 0.001; sync. median = 3.061, async. median = 1.342, **Figure 3B**). The results indicate that, compared with the asynchronous condition, the participants in the synchronous condition experienced ownership of the virtual body in front of them.

FIGURE 3 | … When the knife threats were applied to the participant's physical body, the SCR values were significantly higher in the synchronous than in the asynchronous condition. Significance levels: \**p* ≤ 0.05; \*\**p* ≤ 0.01; and \*\*\**p* ≤ 0.001.

## Experiment 3

See **Table 4** for the median values and IQRs. The value of Q5 was significantly higher in the synchronous than in the asynchronous condition (z = −3.308, p = 0.001, **Figure 4A**), and the SCR values also followed this pattern (z = −3.920, p < 0.001; sync. median = 3.210, async. median = 1.175, **Figure 4B**). This also indicates that illusory ownership of the virtual body was induced in the synchronous condition.

## Experiment 4

As in the previous experiments, median values and IQRs are presented in **Table 4**. Friedman's analyses revealed significant effects in Q1 (χ<sup>2</sup> = 16.333, p < 0.001), Q2 (χ<sup>2</sup> = 13.547, p = 0.001), Q3 (χ<sup>2</sup> = 30.644, p < 0.001), Q4 (χ<sup>2</sup> = 23.741, p < 0.001), and Q10 (χ<sup>2</sup> = 6.206, p = 0.045). We then conducted Wilcoxon signed-rank tests with Bonferroni correction (α = 0.05/3 = 0.017). The results are presented in **Table 5** (**Figures 5A,C,E**). Finally, paired Wilcoxon signed-rank tests showed two other significant differences, in Q6 (Walking vs. Basic: z = −2.049, p = 0.040; **Figure 5B**) and Q7 (Visual vs. Basic: z = −3.202, p = 0.001; **Figure 5D**).
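The post-hoc threshold used here and the effect sizes reported in Table 5 are simple arithmetic. The sketch below assumes the common conversion r = |z| / √N for Wilcoxon tests; Table 5 does not state which N was used (number of participants or number of observations), and the sample size in the example is hypothetical, not the study's. Both function names are illustrative.

```python
import math


def bonferroni_alpha(alpha=0.05, n_comparisons=3):
    """Per-comparison threshold after Bonferroni correction."""
    return alpha / n_comparisons


def cohens_r(z, n):
    """Effect size r = |z| / sqrt(N) for a Wilcoxon signed-rank test.

    Taking N as the number of participants is an assumption; conventions differ.
    """
    return abs(z) / math.sqrt(n)


alpha_post_hoc = bonferroni_alpha()      # 0.05 / 3 = 0.0167, reported as 0.017
N_HYPOTHETICAL = 20                      # hypothetical sample size, not from the study
r_q7 = cohens_r(-3.202, N_HYPOTHETICAL)  # Q7, Visual vs. Basic
print(round(alpha_post_hoc, 3), round(r_q7, 3))
```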

Significance levels: \**p* ≤ 0.05; \*\**p* ≤ 0.01; and \*\*\**p* ≤ 0.001.

## DISCUSSION

In this study, we investigated self-location through a series of full-body experiments. The findings of Experiment 1 were all consistent with the results reported by Lenggenhager et al. (2007), indicating that we successfully induced a version of the out-of-body illusion in the synchronous condition, in which the participants stood still. In addition to synchronized visual-tactile stimulation, Experiments 2 and 3 introduced different types of movement to induce two further versions of the full-body illusion. In the Walking condition, the participants experienced illusory full-body ownership during their walking movement. In the Visual condition, ownership of the virtual body was induced while the participants felt that their 1PP-location was systematically receding. These three experiments provided a good basis for comparing body-location and 1PP-location in Experiment 4.

The results of Experiment 4 allow us to address the two questions raised in the Introduction. First, can the spatial unity between body-location and 1PP-location be temporarily modified? Our results indicate that it can, and that the two are distinct subjective experiences. Compared with the Basic condition, the Walking condition significantly changed the participants' sense of body-location without affecting their sense of 1PP-location, and they felt as if their body had left the position of their 1PP (**Figures 5A,B**). Likewise, compared with the Basic condition, the Visual condition modulated the sense of 1PP-location such that the participants felt as if their 1PP had left their body (**Figures 5C,D**). Finally, the Visual and Walking conditions differed significantly in both the sense of 1PP-location and the sense of body-location (**Figure 5E**). These results strongly suggest that the sense of where my 1PP is positioned and the subjective feeling of where my body is located are distinct experiences.

Second, is it possible for healthy subjects to have the illusory experience of owning two bodies? This question can be addressed with the Experiment 4 data on the double-body effect. The score of Q10 in the Visual condition was significantly higher than in the Basic condition (**Figure 5C**, **Table 4**), indicating that illusory ownership of two bodies is indeed possible. This finding fits well with the report by Lenggenhager et al. (2007) that "None of the participants reported sensations of overt disembodiment" (2007, p. 1097). Although the participants felt as if they were watching themselves from a position separated from their body-location, their sense of 1PP-location remained embodied. Hence, given the data on the double-body effect, recognizing the distinct role of 1PP-location in the sense of self-location does not risk falling back into a dualism between self and body.

We think that body-location and 1PP-location are interrelated but distinct factors that jointly support the sense of self-location. Based on our findings, we suggest that, instead of defining self-location only in terms of body-location, the sense of self-location is better characterized as the subjective experience of where I am in space that results from the interaction between body-location and 1PP-location. Below we discuss the implications of our experimental data and compare them with other studies.

**(1)** Petkova et al. (2011) argued that viewing the virtual body from 1PP is absolutely crucial for body-ownership illusions to occur. They criticized 3PP set-ups on the grounds that, since watching the virtual body from 3PP is similar to recognizing oneself on a monitor, the outcome could be just visual self-recognition "without necessarily experiencing a somatic illusion of ownership" (Petkova et al., 2011, p. 5; cf. also Ehrsson, 2008). In both Lenggenhager et al. (2007) and our experiments, the subjects watched the virtual body from 3PP via an HMD; hence, both studies face this criticism. However, in Experiments 1–3 we measured the participants' SCR to obtain physiological evidence. Since Lenggenhager et al. (2007) did not do this, our SCR data supplement their pioneering work and help to respond to the criticism. The significant differences in SCR values between the synchronous and asynchronous conditions in Experiments 1–3 suggest that the participants' experiences went beyond mere visual self-recognition. Although alternative interpretations are possible and the issue remains open, the SCR data reported here provide new support for the view that 3PP set-ups can induce body-ownership illusions.

TABLE 5 | Experiment 4: Paired comparisons of questionnaire scores.

*All values were rounded to three decimal places. The values in red represent significant differences. Z-values and p-values are from matched-pairs Wilcoxon tests, and effect size is reported as Cohen's r.*

**(2)** Our findings on the double-body effect are consistent with the study by Heydrich et al. (2013), in which two different methods (an HMD-camera set-up and virtual reality techniques) were used to induce the experience of owning two bodies. Also, as mentioned in the Introduction, our previous study on the "self-touching illusion" demonstrated that the double-body effect is possible: the subject sat face to face with the experimenter, and each used the right hand to touch the other's left hand with a paintbrush. Under synchronous visual-tactile manipulation, many subjects felt as if they had two bodies (Liang et al., 2015, pp. 3–5, Supplementary Materials). We therefore think that it is possible to induce the double-body effect in healthy subjects.

The set-up of our previous study was similar to that of the body-swap illusion study by Petkova and Ehrsson (2008). In one of their experiments (Experiment 5), using visual-tactile manipulation, the participant and the experimenter faced each other and squeezed each other's hands synchronously (cf. their Figure 6). Many subjects reported that "I was shaking hands with myself!," a report supported by SCR measurements. In another experiment (their Experiment 1), the double-body effect was measured by questionnaire, but no such effect was observed (cf. their Figure 2). Petkova and Ehrsson interpreted these results as showing that the participants felt that their body had swapped with someone else's. On the face of it, the body-swap illusion and the double-body effect seem to be different phenomena. Do these experimental results count against our view? We do not think so. Although their Experiment 5 involved a subject-experimenter interaction, no questionnaire measurements were conducted, so the double-body effect was not actually tested. In their Experiment 1, the participants only passively received tactile stimulation while viewing a mannequin, and the camera remained still throughout the procedure (cf. also Petkova et al., 2011). This differs substantially from the set-up of our current study: compared with the Basic condition, the data of the Visual condition showed that moving the camera significantly enhanced the double-body effect. Hence, our view stands: when the camera is moved away from the participants, the experience of owning two bodies can be induced.

**(3)** We have suggested that there is a sense of embodiment associated with the sense of 1PP-location. We would further suggest that this sense of embodiment in the 1PP-location is distinct from the sense of 3PP body-location. In both the front-stroking and the back-stroking paradigms, while the participants see their body in front of them via the HMD, the sense of embodiment in the 1PP-location does not rely on viewing the body. In our experiments, the virtual body was seen from the adopted 3PP. The synchronized visual-tactile manipulation caused vision to dominate over tactile sensation and proprioception, such that an illusory sense of self-location was induced. This is consistent with the study by Lenggenhager et al. (2007), in which many participants "mislocalized themselves toward the virtual body" (2007, p. 1096).

In contrast, the sense of embodiment in the 1PP-location is part of everyday experience. We feel that we have a body at (or in line with) the 1PP-location, from which we can perceive, touch, and act upon the world. This sense of embodiment in the 1PP-location is natural and does not depend on seeing one's own body. Moreover, in the Visual condition, the participants' self-location was manipulated by moving the stereo camera, which changed the optic flow registered from the 1PP such that the 1PP-location was felt to be receding. Although the participants stood still, the change in optic flow modified the vestibular sense and elicited an illusory sense of moving backward (illusory self-motion). Previous studies have suggested that vestibular signals can contribute to the sense of self-motion (MacNeilage et al., 2012; Lopez et al., 2013; Barry and Burgess, 2014), and that optic flow can elicit illusory self-motion (DeAngelis and Angelaki, 2012). Also, Lenggenhager and Lopez (2015) suggested that the vestibular system could influence full-body ownership and self-location (2015, pp. 17–19). We therefore think that in the Visual condition the visual 1PP dominated the vestibular signals, such that a sense of embodiment was tied to the participant's 1PP-location. Thus, both daily experience and our experimental set-up suggest that the sense of embodiment in the 1PP-location is different from the sense of body-location experienced from the 3PP.

FIGURE 5 | … the double-body effect was induced. (D) Comparison of the Visual and Basic conditions for Q7. The significant difference for Q7 indicated the discrepancy between 1PP-location and body-location in the Visual condition. (E) Comparison of the Visual and Walking conditions. There were significant differences in 1PP-location (Q1 and Q2), and body-location (Q3 and Q4), indicating that the senses of 1PP-location and body-location were distinct between the Visual and the Walking conditions. Significance levels: \**p* ≤ 0.05; \*\**p* ≤ 0.01; and \*\*\**p* ≤ 0.001.

**(4)** In a review article, Blanke (2012) remarks that "In rare instances, however, self-location and first-person perspective can be experienced at different positions, suggesting that it may be possible to experimentally induce similar dissociations in healthy subjects." Blanke cites the OBE study by De Ridder et al. (2007) for empirical support, in which a 63-year-old patient was described as follows: "His perception of disembodiment always involved a location about 50 cm behind his body and off to the left... The environment was visually perceived from his real-person perspective, not from the disembodied perspective" (2007, p. 1830). As we see it, two different notions of 1PP were involved in this rare case: the "real-person perspective" and the "disembodied perspective." The notion of 1PP in Blanke's remark refers to the "real-person perspective," which was tied to the patient's body-location. What makes this case perplexing is that the patient's sense of self-location split and was linked to both the "real-person perspective" and the "disembodied perspective." Nonetheless, the patient's self-location still involved both the sense of body-location and the sense of 1PP-location, albeit in an unusual way, which is compatible with our view.

**(5)** Finally, a very useful account of self-location was recently proposed by Maselli (2015), in which she compared the front-stroking and the back-stroking paradigms. In our terms, this account proposes that in both paradigms self-location is intrinsically connected with and influenced by an embodied 1PP-location, but in very different ways. In the front-stroking paradigm, the experimental manipulation was designed to affect the participant's perceived self-location coded in an allocentric framework. In some studies within this paradigm (Ehrsson, 2007; Guterstam and Ehrsson, 2012), the visual and tactile sensations were both felt in the embodied 1PP-location, such that "the illusory self-location corresponds to the position of the visual-perspective" (Maselli, 2015, p. S310, author's emphases). In the back-stroking case, the multisensory conflicts can cause a re-coding of the peripersonal space (touch referral) and induce "a spatial dissociation between visual-perspective and self-location" (Maselli, 2015, p. S310, author's emphases). Thus, Maselli suggests that the sense of self-location can be regarded as "the blending of two parallel representations: the abstract allocentric coding of the position occupied in the environment, mainly associated with the visual-perspective, and the egocentric mapping of somatosensory sensations into the external space, mainly associated with peripersonal space" (2015, p. S310, author's emphases).

We fully agree with Maselli that both allocentric and egocentric representations are required to account for self-location. We also welcome the emphasis on the role of 1PP in her account. However, her view and ours differ in one respect. Maselli (2015) describes self-location as "the experience of occupying a given position in the environment" (2015, p. S309). This is the natural understanding mentioned at the beginning of the Introduction. But she further characterizes self-location as "the perceived position of the body in space" (2015, p. S309). So she also understands self-location in terms of body-location. As the case of OBE and our experimental results indicate, we think it is insufficient to characterize self-location only via body-location. In this regard, our view differs from Maselli's. We propose the following picture: body-location and 1PP-location are two distinct factors that are spatially integrated most of the time, but this integration can be temporarily interrupted in a pathological case or an experimental set-up. Even when the spatial unity of body-location and 1PP-location is temporarily modified, as in the back-stroking paradigm, both factors continue to interact with each other to maintain an illusory sense of self-location. In our picture, the sense of body-location and the sense of 1PP-location are interrelated factors that jointly support the sense of self-location. On the one hand, both the "3PP body-location" in the back-stroking paradigm and the "illusory body-location" in the front-stroking paradigm are anchored in the subject's 1PP-location. On the other hand, 1PP-location is not an abstract geometric point; rather, it is a subjective experience essentially tied to a sense of embodiment. Self-location results from the interaction between body-location and 1PP-location. In fact, we do not consider our picture to be fundamentally different from Maselli's. We do think, however, that her specification of self-location as a blending of allocentric and egocentric representations is more congenial to our proposal than a construal of self-location exclusively in terms of body-location.

## CONCLUSION

This study investigated self-location by manipulating 1PP-location and body-location. The new methods introduced here (the participants' walking movement vs. the displacement of the stereo camera) generated different subjective experiences. Since the sense of self-location is crucial for one's interaction with the environment, we believe that recognizing the distinctive roles of 1PP-location and body-location would contribute to a better picture of environmental adaptation. We would like to make three concluding remarks.

First, to situate our study in a broad picture, consider the two different paradigms reviewed by Rosch (2000). One is "analytic science": according to Rosch, "The analytic picture offered by the cognitive sciences is this: the world consists of separate objects and states of affairs ... it deals with isolated units" (2000, pp. 189–190). The other is "biofunctionalism": as Rosch characterizes it, in daily life there is "a powerful intuition of wholeness which goes beyond conceptual analysis into isolated units" (2000, p. 190). As Gibson suggested, "the words animal and environment make an inseparable pair. Each term implies the other" (Gibson, 1979, p. 8). In our experiments, the visual perspective was manipulated such that it felt as if the participant's 1PP was separated from his/her body. This was not an ordinary context, and in this sense we agree that our experiments fall within the paradigm of analytic science. Our achievement is therefore modest: we have only demonstrated that the sense of 1PP-location and the sense of body-location can be manipulated selectively in specific settings. We do not claim that our experimental results automatically apply to ordinary contexts.

Second, based on the findings about the double-body effect, we have suggested that 1PP-location is essentially embodied. Hence, both the sense of 1PP-location and the sense of body-location are embodied experiences. We think that both are inherent in the subjective experience of self-location, and that they jointly shape one's experience of self-location.

Finally, we would like to suggest an issue for further study. The double-body effect certainly deserves closer examination, and it would be significant to investigate the neural mechanisms responsible for self-location as well as for the double-body effect. Such mechanisms may help to explain the tremendous flexibility of our bodily experiences in coping with novel environmental challenges. We think that our experiments, especially the Walking and the Visual conditions, could contribute to this endeavor.

## AUTHOR CONTRIBUTIONS

HH and CL designed all experiments; HH, YL, and WC conducted the experiments and analyzed the data; HH and CL wrote the manuscript.

## ACKNOWLEDGMENTS

The authors would like to thank Wei-Yun Chen and Iris Yang for their assistance in our lab. We would also like to thank Professor Chen-gia Tsai from the Graduate Institute of Musicology for the SCR equipment. Finally, this study was supported by Taiwan's Ministry of Science and Technology (project: MOST 104-2410-H-002-205-MY3).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Huang, Lee, Chen and Liang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action

Jessica Sevos<sup>1,2</sup>\*, Anne Grosselin<sup>1,2</sup>, Denis Brouillet<sup>3</sup>, Jacques Pellet<sup>1,3</sup> and Catherine Massoubre<sup>1,2</sup>

<sup>1</sup> Department of Psychiatry, University Hospital of Saint-Étienne, Saint-Étienne, France, <sup>2</sup> TAPE Laboratory, EA7423, University of Jean Monnet, Saint-Étienne, France, <sup>3</sup> Epsylon Laboratory, EA4556, Department of Psychology, University of Montpellier III, Montpellier, France

The simple perception of an object can potentiate an associated action. This affordance effect depends heavily on the action context in which the object is presented. In recent years, psychologists, psychiatrists, and phenomenologists have agreed that subjects with schizophrenia may not perceive the affordances of people or objects, which could lead to a loss of ease in their actions. We examined whether adding contextually congruent elements during the perception of everyday objects could promote the emergence of object-affordance effects in subjects with schizophrenia and in controls. Participants performed two Stimulus–Response-Compatibility tasks in which they were presented with semantic primes related to sense of property (Experiment 1) or goal of action (Experiment 2) prior to viewing each graspable object. Controls responded faster when their response hand and the graspable part of the object were compatibly oriented, but only when the context was congruent with the individual's needs and goals. When the context operated as a constraint, the affordance effect was disrupted. These results support the view that object affordance is flexible and not merely intrinsic to an object. However, the absence of this object-affordance effect in subjects with schizophrenia suggests a possible impairment of their ability to experience the internal simulation of motor action potentialities. In such a case, all activities of daily life would require the involvement of higher cognitive processes rather than lower level sensorimotor processes. The study of schizophrenia requires the consideration of concepts and methods that arise from the theories of embodied and situated cognition.

Keywords: context, embodiment, goals of action, object-affordance effect, schizophrenia, sense of property, sensorimotor simulation, Stimulus–Response-Compatibility

## INTRODUCTION

Embodied theories of cognition address the physical, motivational, and environmental dimensions of an individual's daily experience (Varela et al., 1992). Such a view posits that cognition emerges from the cooperation and co-evolution of perceptual and motor systems that allow sensorimotor patterns to be implemented. Perception is therefore more proactive toward the individual's environment than reactive to it (Berthoz and Petit, 2003; Rizzolatti and Matelli, 2003; Gallese, 2007; Barsalou, 2008, 2009; Vandevoorde, 2011): it is an internal simulation of action that serves to understand the meaning of one's environment (Berthoz, 2003). The subject builds the world in which he lives and acts in accordance with his perception of that environment (Glenberg et al., 2013) and the actions he performs within it. Consequently, he perceives through active exploration and intentional activity according to his goals, tempered by the intrinsic constraints of perceptual systems and context, and not simply by interpreting sensory messages (Haggard, 2005). This binding of action and perception allows the most economical solution to emerge from a set of possible actions. Gibson (1977, 1979) described "affordance" as the subject's faculty to guide his or her behavior according to the perception of what the environment offers in terms of action potentialities.

#### Edited by:

Anna M. Borghi, University of Bologna, Italy

#### Reviewed by:

Thomas J. Faulkenberry, Tarleton State University, USA; Luca Tummolini, National Research Council, Italy

> \*Correspondence: Jessica Sevos, j.sevos@hotmail.fr

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 26 May 2016; Accepted: 22 September 2016; Published: 05 October 2016

#### Citation:

Sevos J, Grosselin A, Brouillet D, Pellet J and Massoubre C (2016) Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action. Front. Psychol. 7:1551. doi: 10.3389/fpsyg.2016.01551

In cognitive psychology, numerous studies have operationalized this concept of affordance (e.g., Tucker and Ellis, 1998, 2001, 2004; Ellis and Tucker, 2000; Borghi et al., 2012). In their seminal study, Tucker and Ellis (1998) related the orientation of an object's typical graspable part to the hand participants used to respond to instructions regarding the object, and observed that responses were facilitated when the two were compatible. They showed that perception of an object automatically potentiates related actions (via simulation mechanisms) even in the absence of any instruction or explicit intention to act.
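The compatibility logic behind this paradigm can be illustrated with a minimal sketch. It assumes a hypothetical trial record holding the side toward which the object's graspable part points, the hand used to respond, and the response time; a trial counts as compatible when the two sides match, and the affordance effect is taken as mean incompatible RT minus mean compatible RT. The `Trial` fields and RT values are invented for illustration and are not data from any of the studies cited.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    handle_side: str    # side the graspable part points to: "left" or "right"
    response_hand: str  # hand used to respond: "left" or "right"
    rt_ms: float        # response time in milliseconds


def is_compatible(trial: Trial) -> bool:
    """A trial is compatible when the responding hand matches the handle side."""
    return trial.handle_side == trial.response_hand


def compatibility_effect(trials: list[Trial]) -> float:
    """Mean incompatible RT minus mean compatible RT (positive = facilitation)."""
    compatible = [t.rt_ms for t in trials if is_compatible(t)]
    incompatible = [t.rt_ms for t in trials if not is_compatible(t)]
    return mean(incompatible) - mean(compatible)


# Hypothetical RTs, for illustration only.
trials = [
    Trial("right", "right", 640.0), Trial("left", "left", 650.0),
    Trial("right", "left", 690.0), Trial("left", "right", 700.0),
]
print(compatibility_effect(trials))  # 50.0
```

A positive value means that compatible trials were answered faster, which is the signature of an object-affordance effect in this kind of task.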

However, others describe an implicit motor intentionality to act on an object as the basis of the affordance effect. Naming an object automatically elicits action potentialities not elicited by its passive viewing alone; naming evokes gestural knowledge of the object's form and function (Bub and Masson, 2006). The behavioral impact of perceived object affordance seems to depend heavily on the action context in which the object is presented, which carries the subject's motor intentionality.

These action potentialities could correspond to behavioral or attitudinal sketches ("covert behaviors") that the individual reenacts as he perceives environmental stimuli, and these simulations of action (Berthoz and Petit, 2006) might represent a third component that must be incorporated into the relation between perception and action (Garbarini and Adenzato, 2004). At the neuronal level, the brain employs similar neural resources and dynamic representations to execute, imagine, and perceive actions (Jeannerod, 2006; Barsalou, 2008). This ability to simulate an action without actually performing it, which is underpinned by "canonical" and "mirror" neural networks, gives meaning to the surrounding world (subjects and objects). In this case, knowledge is enacted and carries the implicit meaning of the perceived world.

## Disembodiment in Schizophrenia

"This tacit or enacted knowledge is also the basis of "common sense" (Blankenburg, 2001; Fuchs, 2001): it provides a fluid, automatic, and context-sensitive pre-understanding of everyday situations, thus connecting self and world through a basic habituality and familiarity" (Fuchs, 2015, p. 199).

Embodied theories of cognition highlight the dependence of cognition on the subject's experience of the world without addressing mental pathologies, especially schizophrenia.

However, some approaches to psychopathology that consider the phenomenological dimension of embodiment (Sass and Parnas, 2003; Fuchs, 2005) describe schizophrenia as a disturbance of the individual's relationship with the world and others that constitutes a "lack of common sense" (Stanghellini, 2000).

From a clinical perspective, therapists report that their patients with schizophrenia experience some perceptual or cognitive fragmentation of the world accompanied by a certain loss of ease in their actions. As a result, they may experience a disintegration of habits or automatic practices. Due to an alteration of body-based involvement, patients have to "think" deliberately about each action before performing it. Sass (2004, p. 136) describes the failure of patients with schizophrenia to perceive the affordances of people, things, or actions that give the objects "practical significance that, for example, make a chair a thing-to-sit-on, a hammer something-to-pound-with, or a human body something to be approached, feared, or caressed."

Until now, experimental studies have focused on the impaired motor understanding of others' behaviors in schizophrenia, as revealed by reduced activation of the mirror neural network. Numerous studies in patients point to an inherent deficit of this neural network (Mehta et al., 2014a). Using different paradigms, most studies showed in patients reduced mirror neuron activity (MNA) together with greater deficits in theory of mind (Mehta et al., 2014b), emotion recognition (Mehta et al., 2014b) and expression (Varcin et al., 2010), and action imitation (Park et al., 2008; Thakkar et al., 2014) and observation (Enticott et al., 2008). If this mirror neural network enables the internal simulation of behaviors, it is not surprising that individuals with schizophrenia have difficulties relating to the world in interpersonal relationships (Mehta et al., 2014b).

However, to our knowledge, only a few studies have extended this line of research to object perception. For example, Delerue and Boucart (2012) recorded eye movements during an active visual scanning task and observed the absence of a perception-action link in schizophrenia, that is, a decoupling of an object's perception from its potential action. They demonstrated that the visual exploration of control subjects varied according to the instructions given: to name the object visually presented or to name the action inferred from the object. When participants had to name the object, they explored only the part useful for its identification, focusing, for example, on the tines of a fork. However, when they had to name the action, they explored the whole object, extending their visual exploration, in this case, to the handle, the graspable part of the fork. In contrast, the visual exploration of patients was similar for the two tasks; in each case, they focused essentially on the part useful for identifying the object (the tines of the fork). Though patients with schizophrenia demonstrated no abnormalities in naming objects, the absence of facilitation in naming the actions of objects could reflect impaired perception of affordance.

Our team has also studied affordance in schizophrenia (Sevos et al., 2013). Since some authors suggested that "[i]f sensory and motor processes are basic to all other cognition, as much research in embodied cognitive science posits, then disorders that have traditionally been viewed as dysfunctions of higher cognitive processes could in fact be explained by lower level sensorimotor processes" (Drayson, 2009, p. 338), we proposed to study a potential deficit of sensorimotor integration, rather than a higher cognitive dysfunction, in this pathology. We evaluated whether perceived objects automatically evoke corresponding action processes (sensorimotor integration) using a Stimulus–Response-Compatibility (SRC) paradigm, in which responses are faster when the stimulus and response share the same properties (Sevos et al., 2013). In our first experiment (Experiment 1), we observed faster response times when the spatial location of a stimulus and that of the motor response were compatible (Simon effect), and patients with schizophrenia showed no impairment of visuo-spatial integration in this task. In our second experiment (Experiment 2), we replicated the tasks of Tucker and Ellis (1998), which measured the effect of compatibility between the orientation of common graspable objects and the hand with which the subject was to respond (object-based affordance effect). The absence of this effect, even in patients with mild symptom severity, suggested that perception and action are not automatically bound in this population.

If a relationship between the features of a motoric object and the action to be carried out with it does not occur automatically, is it possible that adding contextual elements, making the action more relevant to the patient's needs and wishes, could induce this automatic link between perception and action?

Indeed, in controls, it is known that the context in which an object is observed influences how the object is perceived. The activation of action potentialities, such as grasp, is not completely automatic but depends rather on how attention is oriented toward the action-relevant features of an object. For example, the perception of the same object (door handle) could trigger different sensorimotor simulations (Tipper et al., 2006). Affordance effects were obtained only when subjects had to discriminate properties of an object linked to action (a shape); they did not occur when they had to discriminate color.

## The Influence of Variations in Context on Object-Affordance Effects

An individual's range of potentially available motor actions also depends on his unique "history" of interactions between the object perceived and the actions carried out with it. Moreover, the context acts as either a resource or a constraint according to the situation, the subject's ability to exploit environmental resources, and the subject's goals of action (Creem and Proffitt, 2001). It is therefore not surprising that there is widespread interest in the modulation of affordance effects in various experimental contexts (Borghi et al., 2012; Borghi and Riggio, 2015). Researchers disagree regarding the degree of automaticity of action potentiations versus task- and context-dependent activations (Creem-Regehr and Lee, 2005; Buxbaum and Kalenine, 2010; Borghi and Riggio, 2015). Nevertheless, efforts have been made to clarify these questions by modifying the context of experimental settings. Among such studies, we will emphasize those that address goals of action and sense of property.

## Goals of Action

The context in which an object is observed has been shown to influence how it is perceived. Some authors believe that affordance effects are not immutable but may vary according to the observer's goals and intentions in a given environment: the perception of one object might trigger different sensorimotor simulations. The activation of affordances has recently been shown to be both automatic and flexibly modulated by the task and by the physical and social context (Borghi and Riggio, 2015), and the same object can evoke different affordances (manipulative or functional grip) depending on context (Kalénine et al., 2014). Along the same lines, some authors have demonstrated motor facilitation when two objects are congruent and positioned functionally so as to imply a specific action (Yoon et al., 2010), and that presenting a photo of a hand in a prehensile posture congruent with that of the hand with which the subject was to respond facilitated responses to objects (Borghi et al., 2012). By implying the individual's underlying goal, the photo would more strongly induce interaction between the object and action. A facilitation effect was also reported both for functional congruency between two objects (a bottle presented with a glass rather than with a ball) and for the status of the objects (a bottle presented with an empty glass rather than with a full glass) (De Stefani et al., 2012). The object's state also appears to influence affordances, since larger effects were observed when the perceived object appeared active (a depressed door handle) rather than passive (an apparently inactive door handle) (Tipper et al., 2006). All these studies show that individuals experience objects flexibly and that their perception is modulated by how the objects are used in a given context.

Semantic material has also been used to investigate the modulation of affordance effects according to goals of action. Indeed, like memory and perception, language involves sensorimotor simulation mechanisms linked to the objects or situations to which linguistic expressions refer (e.g., Glenberg and Kaschak, 2002; Zwaan and Taylor, 2006; Gallese, 2009; Borghi and Pecher, 2011). In particular, words evoke object affordances just as visual stimuli do (Gibson, 1979). This functional link between language and motor systems results from the frequently simultaneous occurrence of actions and their referents: neural populations recruited to process a word and the corresponding body movement often fire together and become strongly linked (Pulvermüller, 1999, 2001).

Borghi and Riggio (2009) showed that reading a sentence as a prime to a visual object automatically activates the goal of the action, but only when the sentence includes a verb of action rather than a verb of observation. They also observed an interference effect when the action sentence and the perceived object were incongruent. Using a similar design, Costantini et al. (2011) observed that an action sentence triggered a simulation effect, but only when the objects were close enough to permit bodily interaction (i.e., within peripersonal space). Thus, specifying the conventional use of an object encourages the simulation of a particular pattern of motor response.

On the other hand, some authors have not found that semantic context leads to an automatic and invariant simulation of specific motor programs (Van Dam et al., 2010). Indeed, in a functional magnetic resonance imaging task involving words with both motor and visual characteristics, such as tennis ball and boxing gloves, Van Dam et al. (2012) reported stronger activation of motor areas when subjects thought about the motor characteristics of these words. These findings suggest that the activation of motor-specific information during action-word comprehension is flexible and context-dependent.

## Sense of Property


Affordance effects can depend on the actor's sense of who owns the objects with which he interacts. This sense of property is considered a basic mechanism that emerges automatically, even during tasks not directly related to ownership (Tummolini et al., 2013). Using an SRC paradigm, Constable et al. (2011) showed that the action potentialities evoked during the perception of objects varied according to the individual's understanding of who owned the object. They asked participants to decorate a cup and use it at home to create the feeling that they owned the cup. Fifteen days later, participants performed a computer task using photos of different cups: their own, one decorated by the investigator, and two others with no defined owner. Sensorimotor compatibility effects were shown for all photos except those of the investigator's cup. The authors concluded that a sense of ownership may be embodied in the visuomotor system, which is sensitive to an object's ownership status and favors the inhibition of actions involving another person's objects rather than the facilitation of actions toward self-owned objects. These data fit with the hypothesis of an early developmental sense of property. Indeed, as early as age four, children understand that inappropriate interaction with objects they do not own can result in negative consequences (Neary et al., 2009).

The sense of property is partly determined by one's identity (Dittmar, 1992), and name and surname are essential components of that identity. Because an individual's name is repeated throughout daily life, it could automatically draw auditory and visual attention (Moray, 1959; Wood and Cowan, 1995; Shapiro et al., 1997) and evoke more memories and emotions than other words do. The detection of one's surname among other stimuli has been demonstrated at both the behavioral (Oswald et al., 1960) and cerebral levels (Perrin et al., 1999), even during sleep. One's surname seems to be a pertinent ecological stimulus for reference to the "self" (Perrin et al., 2005). Markman and Brendl (2005) presented participants' own surnames in the middle of a screen and asked participants to categorize the valence of positive or negative adjectives placed near or far from their surname. Half of the participants were instructed to pull a lever toward themselves in response to positive adjectives and to push it away in response to negative adjectives, and the other half were given the reverse instructions. Participants responded faster when positive adjectives were closer to their surname and when negative words were further away, irrespective of whether they pushed or pulled the lever. The authors concluded that the speed of response movements depended more on the representation of the participant's self (the subject's surname on the screen) than on the representation of their body (the physical activity of pushing or pulling the lever).

If attentional processes can be automatically attracted by self-relevant items (such as one's own name: Moray, 1959; Gray et al., 2004; or object ownership: Turk et al., 2011), we can expect such stimuli to affect other cognitive or behavioral processes as well.

## The Present Study

We examined whether the addition of a more salient action context can promote the emergence of affordance effects during the perception of everyday objects in patients with schizophrenia. Indeed, in this population, the simple perception of an object without context does not automatically evoke corresponding action processes, and this lack of sensorimotor integration (Sevos et al., 2013) could be associated with the absence of visual exploration of the action-relevant features of the object (Delerue and Boucart, 2012).

In this study, we explored the emergence of object-affordance effects in schizophrenia using new variations of experimental context in two experiments focusing on action potentiation during the perception of object handling. In the first experiment (Experiment 1), we began with the SRC paradigm of Tucker and Ellis (1998) and added the presentation of primes (surnames) to enrich the context, using either the participant's surname or the imaginary surname "Rani" as a reference for owning or not owning the perceived object. In the second experiment (Experiment 2), we added action-sentence primes that were either congruent or incongruent with the goal of action induced by the conventional use of a given object. For example, we would present the sentence "For watering plants" followed by a photo of a watering can or a remote control.

## EXPERIMENT 1

To evaluate whether the sense of property can modulate object-affordance effects (in controls and patients with schizophrenia), we introduced surnames, known to be ecological stimuli for self-reference (Markman and Brendl, 2005; Perrin et al., 2005), as primes to enrich the context of an object.

Because people are also known to interact with objects differently based on whether the objects belong to them or not (Constable et al., 2011), we considered the affordance effects related to the introduction of surnames to reference the subject's owning (participant's name) or not owning (imaginary name, "Rani") the perceived objects. We expected the emergence of affordance effects in both populations when objects were primed with the participant's surname but not the imaginary name.

## Materials and Methods

### Participants

Participants included 18 patients with schizophrenia (16 men, 2 women) recruited in the psychiatric departments of the University Hospital of Saint-Étienne and 18 healthy comparison subjects (15 men, 3 women) recruited by advertisement in the local newspaper (See **Table 1** for demographic comparisons).



All were volunteers and naive about the hypothesis of the experiment. The local ethics committee of Saint-Étienne approved the study (N° IORG0007394), and written consent was obtained from all participants after the nature of the procedures was fully explained.

Patients were included with a DSM-5 diagnosis of schizophrenia (American Psychiatric Association, 2013) and no change in antipsychotic medication and/or clinical status within 4 weeks prior to the study. The same senior psychiatrist assessed all patients using the positive and negative syndrome scale (PANSS; Kay et al., 1987). All were stable outpatients living in their own accommodations and participating in various psychosocial or professional activities.

Across groups, participants were excluded for (1) a diagnosis of neurological brain disorder or head trauma with loss of consciousness, (2) mental retardation, and/or (3) a history of substance abuse over the last 6 months. All participants were right handed (scores > 14, assessed using the modified Edinburgh Handedness Inventory; Oldfield, 1971).

A power analysis conducted with G∗Power software (Faul et al., 2007), following Cohen's recommendations (Cohen, 1988) and assuming a medium effect size of 0.25 for an ANOVA with one between-subjects factor and three within-subjects factors (eight repeated-measures levels), indicated that a total of 20, 18, and 16 participants were required to achieve 90, 85, and 80% power, respectively (80% being the minimum recommended by Cohen, 1988), to detect a significant effect at a p-value of 0.05. Our sample of 36 participants was therefore more than adequate for the main objective of this study. A second power analysis, assuming a medium effect size of 0.25 for an ANOVA with three within-subjects factors in each group of participants (controls and patients with schizophrenia), indicated that 20, 18, and 16 participants were required to achieve 90, 85, and 80% power, respectively, to detect a significant effect at a p-value of 0.05. In the present study, 18 participants in each group performed all conditions.

### Apparatus and Materials

We employed the same material we used previously (Sevos et al., 2013), adding only a visual prime (participant's surname or the imaginary surname "Rani") before presenting each object in each trial. We chose an imaginary name to avoid reference to any of the participants. Names printed in black 32-point Arial font were presented to subjects in the middle of a white screen.

A total of 88 black-and-white photographs of 22 objects graspable by one hand (Appendix 1) were presented in two horizontal (compatible with either a right- or left-handed grasp) and two vertical (upright or inverted) orientations (**Figure 1**) on the computer screen. The average size of photos was 512 × 384 pixels to maintain the proportions of each object at a distance of 50 cm.

### Design and Procedure

Participants were seated with their heads 50 cm in front of the screen and first shown every photo in both orientations to ensure they could recognize each object upright and inverted.

During the task, participants were required to indicate as quickly as possible whether the object was upright or inverted by pressing the corresponding response key. Each participant carried out two blocks of 88 trials with a 3-min break between blocks. In one block, subjects responded with their right hand to upright objects and with their left hand to inverted objects. In the other block, the mapping was reversed: they responded with their right hand to inverted objects and with their left hand to upright objects. The order of these blocks was counterbalanced across subjects. Within each block, 44 randomized trials were primed by the participant's surname and the other 44 by the imaginary surname "Rani." Before each block, the subject was told that his surname would appear before objects as if he owned them, or that the surname Rani would appear before objects as if Rani owned them, and was reminded to respond according to the object's orientation and not the presented surname. A 1-min break was offered after the first 44 trials. The order of primes was also counterbalanced across participants. Moreover, for each participant, objects primed by the participant's own surname were never primed by Rani, and vice versa, and the order of presentation was counterbalanced across subjects. Throughout the experiment, participants were instructed to keep their right finger on a right response key (L) and their left finger on a left response key (S). The response keys were 15 cm apart and 20 cm in front of the screen, on a standard European (AZERTY) keyboard.

FIGURE 1 | Examples of the stimuli used in Experiment 1: left orientation, upright; right orientation, upright; left orientation, inverted; right orientation, inverted.
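The block structure described above can be sketched as follows, assuming that each of the 88 photographs (22 objects × 2 handle orientations × 2 vertical orientations) appears once per block. Splitting the objects into two disjoint halves of 11 yields exactly 44 trials per prime and guarantees that no object is primed by both surnames. The object labels, the `OWN_NAME` placeholder, and the rule of swapping halves between even- and odd-numbered participants are hypothetical choices for illustration, not the authors' actual counterbalancing scheme.

```python
import random
from itertools import product

HANDLE_SIDES = ("left", "right")    # graspable part oriented for a left- or right-hand grasp
VERTICAL = ("upright", "inverted")  # vertical orientation judged by the participant
OWN_NAME = "<own surname>"          # hypothetical placeholder for the participant's surname
OTHER_NAME = "Rani"                 # imaginary surname used in the study


def build_block(objects, participant_index, seed=None):
    """Return one shuffled block of (prime, object, handle_side, vertical) trials.

    Objects are split into two disjoint halves so that no object is primed by
    both surnames; which half goes with which surname alternates across
    participants (a hypothetical counterbalancing rule).
    """
    if len(objects) % 2:
        raise ValueError("an even number of objects is needed to form two prime sets")
    half = len(objects) // 2
    own_set, other_set = objects[:half], objects[half:]
    if participant_index % 2:
        own_set, other_set = other_set, own_set
    trials = [
        (prime, obj, side, vertical)
        for prime, subset in ((OWN_NAME, own_set), (OTHER_NAME, other_set))
        for obj, side, vertical in product(subset, HANDLE_SIDES, VERTICAL)
    ]
    random.Random(seed).shuffle(trials)  # randomized trial order within the block
    return trials


objects = [f"object_{i:02d}" for i in range(1, 23)]  # 22 hypothetical object labels
block = build_block(objects, participant_index=0, seed=1)
print(len(block))                                       # 88
print(sum(1 for trial in block if trial[0] == "Rani"))  # 44
```

Calling `build_block` once per block with the same `participant_index` keeps the object-to-prime assignment fixed for that participant while reshuffling trial order.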

Each participant received six practice trials for each kind of prime (the participant's surname or Rani) before each block. Each experimental trial started with the prime presented in the center of the screen for 250 ms, followed immediately by a photo of one of the 22 objects. The stimulus remained on the screen until a response was given or for a maximum of 3000 ms. A brief auditory tone from the computer signaled errors to participants.
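The block structure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' experiment script: we read the 1-min break after 44 trials and the counterbalanced "order of primes" as meaning that each prime was run as a 44-trial sub-block, and we infer from 44 = 11 × 4 that each prime covered 11 of the 22 objects in all four orientations. The object indices, the `Trial` record, and `build_block` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Trial:
    obj: int         # hypothetical object index (0..21)
    horizontal: str  # side the object is oriented to: "left" or "right"
    vertical: str    # "upright" or "inverted"
    prime: str       # "own" (participant's surname) or "Rani"


def build_block(own_gets_first_half: bool, own_prime_first: bool,
                n_objects: int = 22, seed: Optional[int] = None) -> List[Trial]:
    """Return one block of 4 * n_objects trials as two prime sub-blocks.

    Assumptions (not stated in the source): each prime covers half of the
    objects, so no object is ever shown with both primes, and the two prime
    sub-blocks run back to back.
    """
    rng = random.Random(seed)
    half = n_objects // 2
    first, second = range(0, half), range(half, n_objects)
    own_objs, rani_objs = (first, second) if own_gets_first_half else (second, first)

    def sub_block(objs: range, prime: str) -> List[Trial]:
        # Cross every assigned object with both horizontal and vertical orientations.
        trials = [Trial(o, h, v, prime)
                  for o in objs
                  for h in ("left", "right")
                  for v in ("upright", "inverted")]
        rng.shuffle(trials)  # randomize order within the sub-block
        return trials

    own, rani = sub_block(own_objs, "own"), sub_block(rani_objs, "Rani")
    return own + rani if own_prime_first else rani + own
```

For example, `build_block(own_gets_first_half=True, own_prime_first=False, seed=1)` yields 88 trials whose first 44 are primed by "Rani". Under this reading, counterbalancing amounts to varying the two flags across participants.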

## Results

For all conditions, participants responded within the required time limit of 3000 ms. An analysis of variance (ANOVA) was conducted on the participants' data (errors and response times [RT]) with group (patients or controls) as the between-subject factor and prime (congruent or incongruent), response (left or right hand), and object orientation (left or right) as the within-subject factors.

#### Errors

Errors were rare (M = 3.64%, standard error [SE] = 0.5). The ANOVA showed no main effect of either between- or within-subject factors: group [F(1,34) = 3.300; P = 0.078; η² = 0.09]; prime [F(1,34) = 0.027; P = 0.871; η² < 0.01]; response [F(1,34) = 0.156; P = 0.695; η² < 0.01]; and orientation [F(1,34) = 0.041; P = 0.840; η² < 0.01] (**Table 2**). Nor did we find any interaction among factors (all F-values were less than 2.152).

### Response Times

The mean RT and standard deviation (SD) were calculated for each participant, and response times more than 2 SDs above the participant's own mean were excluded (4.3%).
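The per-participant trimming rule can be expressed as a short function. It is a sketch that assumes the sample SD (n - 1 denominator) and an upper cutoff only, as the wording "above 2 SDs" suggests; the text does not say whether unusually fast responses were also removed. `trim_slow_rts` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import List, Tuple


def trim_slow_rts(rts: List[float], n_sd: float = 2.0) -> Tuple[List[float], float]:
    """Drop RTs more than n_sd SDs above this participant's own mean.

    Returns the retained RTs (in their original order) and the proportion
    removed. Assumes the sample SD (n - 1 denominator) and an upper cutoff only.
    """
    cutoff = mean(rts) + n_sd * stdev(rts)
    kept = [rt for rt in rts if rt <= cutoff]
    return kept, 1 - len(kept) / len(rts)
```

With illustrative (not experimental) data such as `[600] * 5 + [650] * 5 + [2000]`, the mean is 750 ms and the SD about 415 ms, so the cutoff is about 1581 ms and only the 2000-ms response is removed (1 of 11 responses).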

We found no effect of vertical orientation [F(1,34) = 0.926; P = 0.342; η² < 0.03] or of response mapping [F(1,34) = 1.985; P = 0.168; η² < 0.06].

Overall, patients responded more slowly (M = 783 ms; SE = 24) than controls (M = 671 ms; SE = 23), as reflected by a significant main effect of group [F(1,34) = 11.492; P = 0.002; η² = 0.25]. We found no main effect of prime [F(1,34) = 0.134; P = 0.717; η² < 0.01]: RTs did not differ significantly when the object was primed by the participant's surname (M = 724 ms; SE = 19) or by the surname Rani (M = 729 ms; SE = 17). Nor did we find an effect of response [F(1,34) = 0.233; P = 0.633; η² < 0.01] or orientation [F(1,34) = 0.007; P = 0.935; η² < 0.01]. The only significant interaction was the four-way interaction of group × prime × response × orientation [F(1,34) = 8.536; P = 0.006; η² = 0.21]. For clarity, we present the analyses separately by type of prime (incongruent or congruent) and by group (patients and controls) (**Table 3**).
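The reported η² values, for example 0.25 for the group effect, are consistent with partial eta squared, which for an effect with df_effect numerator and df_error denominator degrees of freedom can be recovered from the F ratio alone as η²p = F·df_effect / (F·df_effect + df_error). The function below is a small consistency-checking sketch; that the reported values are partial rather than classical η² is our inference from this agreement, not something the text states.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Group main effect on RTs, Experiment 1: F(1,34) = 11.492, reported eta^2 = 0.25
print(round(partial_eta_squared(11.492, 1, 34), 2))  # 0.25
```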

In the incongruent prime condition, group did not modulate the compatibility effect, measured by the response × orientation interaction [F(1,34) = 2.638; P = 0.114; η² = 0.07], but in the congruent prime condition, the 3-way interaction of group × response × orientation was significant [F(1,34) = 6.202; P = 0.018; η² = 0.15]. Thus, when the prime was the participant's own name, the pattern of response times reflecting compatibility and incompatibility effects differed significantly between groups.

In the control group, the response × orientation interaction, which measures the compatibility effect, was significant [F(1,17) = 5.255; P = 0.035; η² = 0.24], as was the 3-way interaction of prime × response × orientation [F(1,17) = 6.642; P = 0.020; η² = 0.28]. In the congruent prime condition, the response × orientation interaction was significant [F(1,17) = 9.602; P = 0.007; η² = 0.36]: right-hand responses were faster when the object was also oriented to the right (M = 658 ms; SE = 23) rather than to the left (M = 677 ms; SE = 24) [F(1,17) = 6.582; P = 0.020; η² = 0.28], and left-hand responses were faster when the object was also oriented to the left (M = 647 ms; SE = 24) rather than to the right (M = 678 ms; SE = 31) [F(1,17) = 5.802; P = 0.028; η² = 0.25]. By contrast, in the incongruent prime condition, the response × orientation interaction was not significant [F(1,17) = 0.358; P = 0.558; η² = 0.02] (**Figure 2**).

In the patient group, the interaction of response × orientation was not significant overall [F(1,17) = 0.354; P = 0.560; η² = 0.02], not modified by the prime condition [F(1,17) = 2.584; P = 0.126; η² = 0.13], nor significant in either the congruent [F(1,17) = 0.390; P = 0.541; η² = 0.13] or incongruent prime condition [F(1,17) = 2.458; P = 0.135; η² = 0.13] (**Figure 2**).
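To make the size of the controls' congruent-prime effect concrete, the difference score below averages the four cell means reported above for that condition. It is an illustrative summary only, assuming equal weighting of the cells and reading "orientation" as the side toward which the object's graspable part points; the inferential test in the study is the response × orientation interaction, not this score. `CELL_MEANS` and `compatibility_effect` are hypothetical names.

```python
from typing import Dict, Tuple

# Controls, congruent prime, Experiment 1: mean RTs (ms) keyed by
# (response hand, side toward which the object is oriented).
CELL_MEANS: Dict[Tuple[str, str], float] = {
    ("right", "right"): 658, ("right", "left"): 677,
    ("left", "left"): 647, ("left", "right"): 678,
}


def compatibility_effect(cells: Dict[Tuple[str, str], float]) -> float:
    """Mean RT on incompatible cells minus mean RT on compatible cells (ms).

    A cell is compatible when the response hand matches the object's
    orientation; positive values mean compatible responses were faster.
    """
    compatible = [rt for (hand, side), rt in cells.items() if hand == side]
    incompatible = [rt for (hand, side), rt in cells.items() if hand != side]
    return sum(incompatible) / len(incompatible) - sum(compatible) / len(compatible)


print(compatibility_effect(CELL_MEANS))  # 25.0
```

By the same arithmetic, the advantage is 19 ms for the right hand (677 - 658) and 31 ms for the left hand (678 - 647).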

## Discussion of Experiment 1

Controls' response times were shorter when the graspable part of the object and the response hand were compatible, but only when the participant's own surname was used to prime the object. When an imaginary surname was used as the prime, no compatibility effect was apparent, consistent with the finding of Constable et al. (2011) that specifying the owner of a perceived object modulates affordance effects. Participants might interpret their own surname as a request to perform an action with the object, whereas another name might be interpreted as indicating that the other person should perform the action. Using a participant's own name as a prime therefore seems to be an ecological way to lead participants to build a sense of ownership of the perceived object.

Nevertheless, this kind of prime seems insufficient to create a sense of ownership in patients with schizophrenia. Patients did not respond faster on compatible than on incompatible trials, whether the objects were primed with their own name or with an imaginary one. In this group, even when the prime was the participant's own surname, the visual perception of an object did not potentiate the actions "normally" associated with it. We can therefore assume that, for patients with schizophrenia, this type of prime is not relevant enough to allow sensorimotor compatibility to emerge between an object and the action to be performed.

TABLE 2 | Error rates (SD) based on patient or control group, congruent or incongruent prime, left or right orientation, and left- or right-handed response in Experiment 1.

TABLE 3 | Means (SD) of response times (in ms) according to patient or control group, congruent or incongruent prime, left or right orientation, and left- or right-handed response in Experiment 1.

Overall, patients with schizophrenia responded more slowly than controls, although they made no more errors, whether or not the name used as a prime was their own. Patients thus seem to incur an additional cognitive cost to achieve the same results. When motor simulation does not emerge even though the stimulus and response share sensorimotor characteristics, controlled processes, which are more costly in attentional resources, must be implemented instead.

Using the participant's own name as a prime seems insufficient for patients to perceive the action-relevant features of graspable objects. Indeed, the behavioral impact of perceived object affordances seems to depend heavily on the action context in which the object is presented and on the motor intentions that this context conveys to the subject. Studies in healthy participants have shown, for example, that naming an object evokes gestural knowledge about its form and function that automatically elicits action potentialities, whereas passively viewing the object does not (Bub and Masson, 2006).

In a second experiment, to facilitate the perception of the action-relevant features of an object and to induce an implicit motor intention in patients, we primed objects with action sentences reflecting how the objects are used in everyday life.

## EXPERIMENT 2

If context and goal of action can modulate affordance effects (Borghi et al., 2012), then action sentence primes whose sensorimotor characteristics are congruent with the goal of action implied by the conventional use of a presented object should produce affordance effects in both groups. However, this effect should not emerge when objects are primed with incongruent sentences.

## Materials and Methods

#### Participants

Subjects were 18 patients with schizophrenia (15 men, 3 women) and 18 healthy controls (15 men, 3 women) recruited in the same manner and using the same inclusion and exclusion criteria as those of Experiment 1 (See **Table 4** for demographic comparisons).

The local ethics committee of Saint-Étienne approved the study (N° IORG0007394), and informed written consent was obtained from all participants.

For the power analysis, see Experiment 1.

## Apparatus and Materials

Of the 22 everyday objects graspable with one hand used in the previous experiment, six had been presented as two different exemplars (for example, two different saucepans or two different bottles of detergent). In this second experiment, we kept only one exemplar of each, leaving 16 single objects. We presented all objects in two horizontal orientations (compatible with either a right- or left-handed grasp) but only in the upright orientation. Each object was primed once by an action sentence congruent with the goals of action induced by the use of the object and once by an incongruent sentence (Appendix 2). We formulated the sentences from the photos of objects used in Experiment 1 by asking 80 students to name the action verb and direct object that seemed to them most related to the object in each photo. We used the most frequently cited responses (>80%) and then asked the students to form the picture-sentence pairs that seemed to them most incongruent. The sentences were constructed to be of nearly the same length in French. Overall, 64 pictures were displayed in the middle of a computer screen (with the same characteristics as in Experiment 1), each preceded by a sentence written in black 32-point Arial font.

FIGURE 2 | … and response hand (left or right) in the control group and in the group with schizophrenia. Error bars represent standard errors of the mean.

TABLE 4 | Comparison of ages, years of education, and scores on the Edinburgh Handedness Inventory (SD) between patients and controls in Experiment 2.

#### Design and Procedure

To ensure that participants read the sentences presented as primes, we asked them to indicate as quickly and accurately as possible whether the pictured object (e.g., an iron) was congruent with the action sentence prime (for ironing clothes) or not (for cutting bread). Each participant carried out two blocks of 64 trials with a 3-min break between blocks. The blocks differed in response mapping (right hand for congruent and left hand for incongruent pairs, or left hand for congruent and right hand for incongruent pairs), and their order was counterbalanced between participants. As in Experiment 1, participants were instructed to keep their right finger on the right response key (L) and their left finger on the left response key (S) throughout the experiment. The response keys were 15 cm apart and 20 cm in front of the screen, on a standard European (AZERTY) keyboard.

Each experimental trial started with an action sentence presented as a prime for 2000 ms, followed by a photo of one object, which remained on the screen until a response was given or for a maximum of 3000 ms. A brief auditory tone from the computer signaled errors to participants.

Each participant received eight practice trials using a different set of sentences and photos before each block. Depending on the trial, the response hand could be on the same side as the graspable part of the object (compatible orientation) or on the opposite side (incompatible orientation), and the action sentence prime could be congruent or incongruent with the normal action induced by the object.
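The crossing of factors in one Experiment 2 block can be sketched as follows: 16 objects × 2 orientations of the graspable part × 2 prime types gives the 64 trials per block. This is a sketch under our own assumptions: "orientation" is taken to be the side of the graspable part, each block applies one fixed hand-to-answer mapping, and the sentences themselves are not modeled. `exp2_block` and the dictionary keys are hypothetical.

```python
from itertools import product
from typing import List


def exp2_block(right_hand_for_congruent: bool, n_objects: int = 16) -> List[dict]:
    """Cross objects x graspable side x prime congruency and label each trial.

    Assumptions: "orientation" is the side of the graspable part, and the
    block uses one fixed mapping of hands to the congruent/incongruent answer.
    """
    trials = []
    for obj, side, congruent in product(range(n_objects), ("left", "right"), (True, False)):
        # The correct hand depends only on the congruency answer and the block mapping.
        hand = "right" if congruent == right_hand_for_congruent else "left"
        trials.append({
            "obj": obj,
            "graspable_side": side,
            "congruent": congruent,
            "correct_hand": hand,
            # Compatible when the correct hand is on the same side as the graspable part.
            "compatible": hand == side,
        })
    return trials
```

Because the graspable side is fully crossed with congruency, half of the congruent and half of the incongruent trials are compatible, so compatibility is orthogonal to the correct answer and cannot be produced by the congruency judgement itself.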

## Results

For all conditions, participants responded within the required time limit of 3000 ms. We evaluated data regarding participants' errors and response times using ANOVA with group (patients or controls) as the between-subject factor and prime (congruent or incongruent), response (left or right hand), and object orientation (left or right) as within-subject factors.

#### Errors

Errors were rare (M = 3.46%; SE = 0.4), and the ANOVA showed no significant main effect of either between- or within-subject factors: group [F(1,34) = 2.738; P = 0.107; η² = 0.07]; prime [F(1,34) = 0.386; P = 0.539; η² = 0.01]; response [F(1,34) = 0.461; P = 0.502; η² = 0.01]; and orientation [F(1,34) = 2.400; P = 0.131; η² = 0.07]. **Table 5** shows the error rates. We found no significant interaction among factors (all F-values were less than 2.400).

### Response Times

We calculated the mean response time and standard deviation for each participant and excluded RTs more than 2 SDs above the participant's own mean (4.4%).

We found no effect of response mapping [F(1,34) = 1.227; P = 0.276; η² = 0.03].

Patients' RTs (M = 797 ms; SE = 35) were longer than controls' (M = 594 ms; SE = 23), as reflected in a significant main effect of group [F(1,34) = 22.897; P < 0.001; η² = 0.4]. We also found a main effect of prime [F(1,34) = 28.236; P < 0.001; η² = 0.5]: response times were shorter when the prime was congruent with the goal of action induced by the use of the object (M = 672 ms; SE = 20) than when it was incongruent (M = 719 ms; SE = 23). However, we found no effect of either response [F(1,34) = 0.663; P = 0.421; η² = 0.02] or orientation [F(1,34) = 0.180; P = 0.674; η² < 0.01]. The only significant interaction was the four-way interaction of group × prime × response × orientation [F(1,34) = 4.254; P = 0.047; η² = 0.11]. As in Experiment 1, we present separate analyses according to the congruency of the prime and the group (patients or controls) (**Table 6**).

The 3-way interaction of group × response × orientation was significant when the prime was congruent [F(1,34) = 8.774; P = 0.006; η² = 0.21] but not when it was incongruent [F(1,34) = 0.340; P = 0.563; η² < 0.01]. When the prime was congruent with the goal of action induced by the use of the object, the temporal pattern of responses differed significantly between groups.

In the control group, the response × orientation interaction, which measures the compatibility effect, was not significant [F(1,17) = 2.890; P = 0.107; η² = 0.15], but the 3-way interaction of prime × response × orientation was [F(1,17) = 10.359; P = 0.005; η² = 0.38]. In the congruent prime condition, the response × orientation interaction was significant [F(1,17) = 9.446; P = 0.007; η² = 0.36]. Right-hand responses were faster when the object was also oriented to the right (M = 552 ms; SE = 18) rather than to the left (M = 577 ms; SE = 25) [F(1,17) = 5.428; P = 0.032; η² = 0.24]. Similarly, left-hand responses were faster when the object was also oriented to the left (M = 570 ms; SE = 25) rather than to the right (M = 601 ms; SE = 29) [F(1,17) = 5.357; P = 0.033; η² = 0.24]. By contrast, in the incongruent prime condition, the response × orientation interaction was not significant [F(1,17) = 0.061; P = 0.808; η² < 0.01] (**Figure 3**).

TABLE 5 | Error rates (SD) based on patient or control group, congruent or incongruent prime, left or right orientation, and left- or right-handed response in Experiment 2.

TABLE 6 | Means (SD) of response times (in ms) according to patient or control group, congruent or incongruent prime, left or right orientation, and left- or right-handed response in Experiment 2.

In the patient group, the interaction of response × orientation was not significant [F(1,17) = 0.666; P = 0.426; η² = 0.04] and not modified by the prime condition [F(1,17) = 1.288; P = 0.272; η² = 0.07], whether congruent [F(1,17) = 2.505; P = 0.132; η² = 0.13] or incongruent [F(1,17) = 0.280; P = 0.604; η² = 0.02] (**Figure 3**).

## Discussion of Experiment 2

Controls responded more quickly when the graspable part of the object and the response hand were compatible in orientation, but only when the prime was a congruent action sentence. An incongruent sentence elicited no compatibility effect. These results show that affordance effects can be modulated by variations in context, particularly by the goals of action inferred from the experimental setting. Thus, specifying the conventional use of an object facilitates the simulation of a particular pattern of motor responses. By contrast, incongruency between the action sentence and the perceived object disrupts the affordance effect. These results suggest that if the action implied by a sentence cannot be performed with the object, participants might not activate the affordances that the object usually provides.

Nevertheless, the patients with schizophrenia showed no sensorimotor compatibility effect, regardless of the congruency of the semantic prime. Even when the sentence was congruent with the action implied by the object's use, there was no action potentiation: their responses were not faster when the target orientation and response hand were compatible.

However, the low and similar error rates in the two groups indicate that all participants understood the instructions and were properly engaged in the task. These results also show that patients had no more difficulty than controls in judging the congruency between the sentence prime and the target object, indicating that they understood the function of the everyday objects presented on the screen. Thus, all participants seemed sensitive to the congruency between the semantically cued aim of the action and the perceived object. However, although patients took the semantic context into account, the expression of the goal of the action seemed insufficient to create action potentiation when they perceived the objects.

## DISCUSSION

We hypothesized that enriching the contextual environment could influence affordance effects in healthy subjects and facilitate their emergence in patients with schizophrenia. Adapting a paradigm inspired by Tucker and Ellis (1998) to observe the potential modulation of affordance effects, we conducted two experiments in which the picture of a graspable object was preceded by a semantic prime suggesting either a sense of ownership of the object (Experiment 1) or goals of action for its use (Experiment 2).

The control group demonstrated the emergence of sensorimotor compatibility effects, but only when the prime was congruent with the perceived object. Indeed, the modulation of the environmental context by conceptual priming influenced the sensorimotor compatibility effects. In Experiment 1, they emerged only when the objects were preceded by the participant's surname as a reference for ownership of the perceived object, and in Experiment 2, they emerged when action sentences were congruent with goals of action induced by the conventional use of the perceived object. Action potentialities can emerge through simulation mechanisms when the meaning of a stimulus is relevant to the action and when the expected motor response shares components of this action (Girardi et al., 2010).

In our study, context seems to act as a resource to potentiate action when it is congruent with the action implied by the perceived object and with the subject's intention. The presentation of a congruent prime would enable the preparation of a motor action, resulting in the emergence of the affordance effect. Indeed, Vandevoorde (2011) showed that efficient coupling between the perception of an object that the subject finds beneficial and the action associated with it facilitates the reactivation of this kinesthetic image in similar situations, through reinforcement and motor habituation. In this case, the object becomes a "visuomotor opportunity" (Rizzolatti and Sinigaglia, 2008) that is identified on the basis of its motor potentialities. The brain would then be able to recognize its environment solely according to these potentialities, even without mobilizing higher-level reasoning, so that the motor system would fully participate in identifying and understanding the surrounding world.

Though many studies have focused on the influence of physical context, it seems important to determine how other kinds of context, such as social and functional context, influence the activation of affordances (Van Dam et al., 2010; Borghi and Riggio, 2015). In Experiment 1, we observed, we believe for the first time, the potential influence of one's surname, a piece of social and personal information we grow up with from childhood, on the simulation of an individual's appropriate actions toward an object when the name is used as a prime for the object's perception. In Experiment 2, we measured the influence of functional knowledge by using action sentences ("to drink coffee") to prime the presentation of an object ("a cup"). We wanted to ensure that the participant simulated the expected action: in this example, the most appropriate way to use the cup to drink coffee is to grasp it by its handle. Whereas Van Dam et al. (2010, p. 5) showed "that preparing an action congruent to the typical, functional use of an object, facilitates processing of the word denoting the object," we demonstrated the inverse relationship here.

The results of both our experiments provide further evidence that affordances are both intrinsic to objects and flexible, and that they involve both the subject and the environment. Even if affordances are initiated automatically, they are then selected according to the current task (Borghi and Riggio, 2015). Indeed, we also showed that an incongruent context did not provoke the emergence of affordance effects even when the perceived objects were the same. Buccino et al. (2009) tested the modulation of the motor system when an object's features are violated, such as when the handles of graspable objects are broken, and found no activation of affordances in the absence of the pragmatic conditions needed to perform an action associated with an object. Further studies are needed to detail the mechanisms underlying either a total absence of sensorimotor activation or an inhibition of action potentialities in an incongruent context (Anelli et al., 2012; Borghi and Riggio, 2015).

In our patients with schizophrenia, the selected primes did not seem to share sensorimotor features with the current task in a relevant way. Even priming the perception of a visual object with a semantic context reinforcing the sense of ownership or the goals of action did not automatically potentiate the action associated with its use.

In this study, patients were slower than controls in both tasks. Our previous study (Sevos et al., 2013) revealed no such slowdown when we measured the compatibility between the spatial location of a stimulus and the motor response (Simon task), which seemed to indicate that visuospatial integration is automatic in both patients with schizophrenia and healthy subjects. However, in an affordance task, patients' longer response times suggested an absence of automatic binding between perception and action. In that previous work, we interpreted the increased response time as the time needed to implement controlled processes that are more costly in attentional resources.

Our current findings again challenge the assumption that merely observing graspable objects is sufficient to evoke their affordances because objects elicit components of the motor programs appropriate for interacting with them (Borghi and Riggio, 2015). For example, Yu et al. (2014) failed to replicate compatibility effects when participants were not explicitly instructed to imagine picking up the pictured objects. However, in both of our experiments, patients with schizophrenia made no more errors than controls, reinforcing the idea that they had functional knowledge of the presented objects: whether the object was upright or inverted, or whether one drinks coffee from a cup or a frying pan. Though Garbarini and Adenzato (2004) claim that motor simulation is the only way to develop knowledge of the action possibilities made available by objects, we cannot agree. Indeed, despite the lack of sensorimotor simulation, our patients with schizophrenia usually demonstrated the capacity to use everyday objects appropriately. In this case, simulation could be considered the default procedure, which can occasionally be supplemented or overridden by theoretical considerations, as proposed by Jeannerod and Pacherie (2004).

In pathology, Cattaneo et al. (2007) showed, for example, that children with autism are unable to rely on motor preparation before executing a movement, even when they want to achieve a requested goal and are fully able to carry out the requested actions. The researchers recorded electromyographic (EMG) activity in children with autism and typically developing children as they executed a gesture (arm flexion toward the body) (Experiment 2). The gesture could lead to two different actions and thus involved two different intentions: bringing a piece of food to the mouth (eating action) or putting a piece of paper into a container placed on the shoulder (placing action). In controls, activity increased in the muscles responsible for the final goal of the action (eating a piece of food) as soon as the action began (reaching for the piece of food). In the children with autism, those muscles became active only during the bringing-to-the-mouth phase. In another experiment (Experiment 1), the authors showed an indirect link between sensorimotor simulation and the activity of mirror neurons, which are thought to support the understanding of others' intentions. Using the same procedure, they showed increased activation of jaw muscles when controls observed the eating action but not the placing action, demonstrating links between motor intention and sensorimotor simulations that activate the muscles involved in the final action. In contrast, the children with autism showed no muscle activation while observing either eating or placing actions. The authors interpret these results as a lack of the motor activation underlying action understanding in children with autism: the children may understand others' intentions cognitively (particularly when semantic cues are given by the object itself) but not experientially.

Motor facilitation during action observation, which putatively reflects the activity of mirror neurons, could also be reduced in schizophrenia. Enticott et al. (2008) administered transcranial magnetic stimulation (TMS) while presenting video clips showing the abductor pollicis brevis (APB) of the right hand during different activities (thumb movement, pen grasp, or handwriting), and recorded motor-evoked potentials (MEPs) from participants' right APB muscle. MEP amplitude increased significantly above baseline for all three activities in controls but did not change in patients with schizophrenia, leading the authors to conclude that reduced mirror neuron activation impairs the ability to experience an internal simulation of others' behavior.

Combining our experimental paradigm with physiological measures (e.g., TMS and MEP recordings) could help objectify the sensorimotor processes underlying these effects in schizophrenia. In the absence of such studies, our behavioral results suggest that our patients with schizophrenia also have an impaired ability to experience an internal simulation of motor action potentialities when they perceive graspable objects, which would imply that all activities of daily life require the involvement of higher cognitive processes rather than lower-level sensorimotor processes. Patients reported, for example, knowing how to set a table but needing to think about each step to accomplish the task. Jeannerod (2001), Gallese (2009), and Vandevoorde (2011) agree that it is precisely this sensorimotor simulation that not only enables the subject to anticipate an action but also provides a "motor thought," an automatic, almost intuitive knowledge of his surrounding world. This ability to anticipate his actions should allow the subject to act seamlessly in his environment and to feel familiar with it and with current social situations. At a perceptual level, objects and persons generally appear familiar and intelligible according to our expectations of them from past experience. We postulate that impaired sensorimotor simulation could partly explain the loss of this "common sense" of things sometimes encountered in schizophrenia. If perception appears deprived of its fullness and no longer related to motor actions, becoming more like a purely receptive process (Parnas and Handest, 2003), it is not surprising that patients with schizophrenia can experience a sense of strangeness, that is, "when the meanings of objects in the world (e.g., "What is this chair for?") and of the actions of others (e.g., "Why is he laughing?") appear uncanny" (Stanghellini, 2000, p. 779).

Similarly, Fuchs and Schlimme (2009) claimed that the disintegration of all normally automatic behaviors of everyday life is a major feature of schizophrenia that more broadly reflects a "disembodiment" of the self or of the relation to objects (see also Fuchs, 2005; Stanghellini, 2009; Sass, 2013). Our experimental results seem to converge with clinical observations as well as psychopathological and phenomenological data that demonstrate a dissociation between patients with schizophrenia and their environment.

The study of a pathology such as schizophrenia, which precisely undermines the notion of coherence, requires a more unified, coherent, and comprehensive approach that takes into account concepts and methods based in embodied cognition (Glenberg et al., 2013).

## Limitations and Implications for Future Studies

This pilot study is limited by its small sample size and by the relatively mild positive and negative symptoms of the patients, which might restrict the generalizability of our findings.

Future studies with larger samples of subjects with schizophrenia are warranted to confirm our findings, which suggest the impairment of sensorimotor integration even in patients with milder symptoms.

We explored the emergence of object affordance effects in schizophrenia by varying experimental contexts and found that conceptual priming (making the action more specific or reinforcing the purpose of the action for the patient) seems insufficient to trigger motor simulation when patients perceive objects used in everyday life. In future studies, we propose to enhance context by introducing visuomotor or motor priming.

Indeed, the literature shows that the state of the motor system can influence the perception of objects (Craighero et al., 2002), but a phase of motor training can also be used to facilitate the effect of sensorimotor compatibility (Borghi et al., 2007).

In addition, brain imaging techniques and electrophysiological measures have enhanced our understanding of the cerebral and physiological mechanisms involved when control subjects perform tasks that involve motor simulation (e.g., Grezes and Decety, 2001, 2002; Buccino et al., 2009). Using such techniques in patients could similarly improve our understanding of these phenomena in schizophrenia.

## AUTHOR CONTRIBUTIONS

JS and AG designed the study procedures and methods, collected most of the data, conducted all statistical analyses, interpreted the data, wrote the first draft of the manuscript, and edited

## REFERENCES


subsequent versions. DB, JP, and CM proved advice regarding study procedures, methods, and data interpretation and edited the manuscript several times.

## ACKNOWLEDGMENTS

We would like to thank all participants, Dr. Thibaut Brouillet for his helpful comments, Pr. Vincent Dru for technical assistance and Rosalyn Uhrig, M.A., for the English language-editing of this manuscript.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01551





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer LT and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Sevos, Grosselin, Brouillet, Pellet and Massoubre. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Biofunctional Understanding and Conceptual Control: Searching for Systematic Consensus in Systemic Cohesion

#### Asghar Iran-Nejad\* and Fareed Bordbar

Department of Educational Studies in Psychology, Research Methodology, and Counseling, The University of Alabama, Tuscaloosa, AL, United States

For first-generation scientists after the cognitive revolution, knowers were in active control over all (stages of) information processing. Then, following a decade of transition shaped by intense controversy, embodied cognition emerged and suggested sources of control other than those implied by metaphysical information processing. With a thematic focus on embodiment science and an eye toward systematic consensus in systemic cohesion, the present study explores the roles of biofunctional and conceptual control processes in the wholetheme spiral of biofunctional understanding (see Iran-Nejad and Irannejad, 2017b, Figure 1). According to this spiral, each of the two kinds of understanding has its own unique set of knower control processes. For conceptual understanding (CU), knowers have deliberate attention-allocation control over their first-person "knowthat" and "knowhow" content, combined as mutually coherent corequisites. For biofunctional understanding (BU), knowers have attention-allocation control only over their knowthat content, while knowhow control content is ordinarily conspicuously absent. To test the hypothesis of differences in the manner of control between CU and BU, participants in two experiments read identical-format statements for internal consistency while response time was recorded. The results of Experiment 1 supported the hypothesis of differences in the manner of control between the two types of control processes, and Experiment 2 confirmed the results of Experiment 1. These findings are discussed in terms of the predicted differences between BU and CU control processes, their roles in regulating the physically unobservable flow of systemic cohesion in the wholetheme spiral, and a proposal for systematic consensus in systemic cohesion to serve as the second guiding principle in biofunctional embodiment science, next to physical science's first guiding principle of systematic observation.

Keywords: biofunctional understanding, declarative fact-seeking, procedural knowhow, embodiment science, spiral of biofunctional understanding, systematic observation, systematic consensus, unobservable systemic cohesion

Edited by:

Zheng Jin, Zhengzhou Normal University, China

#### Reviewed by:

Ryan Alverson, Northern Kentucky University, United States

Yuejin Xu, Murray State University, United States

> \*Correspondence: Asghar Iran-Nejad airanne@ua.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 10 April 2017 Accepted: 15 September 2017 Published: 24 October 2017

#### Citation:

Iran-Nejad A and Bordbar F (2017) Biofunctional Understanding and Conceptual Control: Searching for Systematic Consensus in Systemic Cohesion. Front. Psychol. 8:1702. doi: 10.3389/fpsyg.2017.01702

## INTRODUCTION

## The Myth of the Knowledge Stored in Connections

In a panel discussion entitled "The computational conception of mind," with Gilbert Harman, John Haugeland, Jay McClelland, Allen Newell, Dana S. Scott, and Zenon Pylyshyn as participants, the moderator, Scott (1990), asked: "Will there be a theory of comprehension?" The answer to be sought by the audience was hidden in the moderator's statement: "In our view, the implicit knowledge is stored in connections among simple processing units organized into networks" (p. 39). This excerpt circulates in unmistakably similar words throughout the community of second-generation cognitivists, especially in the literature on parallel distributed processing (PDP), a well-known predecessor to embodied cognition (see Harnad, 1990; Iran-Nejad and Homaifar, 2000).

For more than a decade after the cognitive revolution, short-term control processes and long-term storage architectures dominated the field of first-generation cognition (Neisser, 1967; Atkinson and Shiffrin, 1968). Mind connections, frames, and hierarchies ranged from the most concrete sensory to the most abstract conceptual levels (Rumelhart and Ortony, 1977; Rumelhart, 1980). Network metaphors were everywhere, representing unobservable mind connections in concept maps, semantic spaces, and memory taxonomies for saving content inside knowers. The mechanistic vernacular of first-generation cognition, mostly metaphysical in nature and inspired by the computer-program analogy (Neisser, 1967), soon faced challenges from critics (Bransford and Johnson, 1972; Iran-Nejad, 1980/1987) and ran into persistent trouble (Jenkins, 1974; Bransford et al., 1977; Iran-Nejad and Ortony, 1982; Shulman, 1999); before long, its pioneers began scrambling for replacement alternatives.

Metaphysical information-processing scientists took their cues from spatial and computer metaphors (Roediger, 1980) and were accordingly inclined to mechanize human information processing (e.g., spreading activation) and to humanize mechanical architectures, e.g., neural networks (Collins and Loftus, 1975; Beers, 1987; Smolensky, 1987). As a result, when information-processing controls and architectures were still at the peak of their popularity, Neisser (1976), who had written the original book on soft cognitive psychology less than a decade earlier (Neisser, 1967), lamented that information-processing and storage constructs had prestige and momentum but that their computer-inspired control processes and structures ran contrary to human nature (Iran-Nejad and Winsler, 2000). Neisser (1976) himself abandoned the computer metaphor in a hurry and turned first to ecological psychology (Gibson, 1966) and then to biology in search of a more natural human cognition (Neisser, 1987, 1994; Neisser and Winograd, 1988; Neisser and Jopling, 1997). Nevertheless, even today, articles, books, and even entire journals keep spreading Neisser's (1967) metaphysical footsteps in leaps and bounds across the planet (Khemlani et al., 2014). How could one answer Dana S. Scott's question about comprehension, or make room for human understanding, in this unlikely terrain?

As the panel discussion above suggests, the momentum Neisser (1976) saw building at the expense of human nature kept rising for a few more years (Rumelhart, 1975, 1976; Anderson, 1977; Rumelhart and Ortony, 1977; Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977; Anderson et al., 1978; Brewer and Lichtenstein, 1981; Brewer and Nakamura, 1984; Brewer, 2000). Then it faced stiff resistance from supporters of human cognition (Black and Wilensky, 1979; Iran-Nejad, 1980/1987; Thorndyke and Yekovich, 1980; Iran-Nejad et al., 1981/1984; Alba and Hasher, 1983; Beers, 1987). During this transition period, many authors worked on connectionist, computational, and spreading-activation networks, or similar metaphors, that honored, in the words of Beers (1987), "the major tenets of schema theory by making them conform more clearly to an explicitly ecological perspective relating human understanding to the environment—social and otherwise—in which it takes place" (p. 376). Nevertheless, the reality of the human nature that Neisser (1967) had first abandoned and then sought (Neisser, 1976) went his way neither time.

## The Myths of Vegetative Organs and Smart Connections

In the 1990s, the second generation of the science of cognition arrived in a hurry and, in a short decade or two, embodied cognition swept the planet (Wilson, 2002; Adams, 2010), rapidly filling the post-transition void with the new mainstream field of cognition we have today. However, the unsystematic diversity of these embodied cognition networks, as evident in the titles and texts of the related literature reviews, must have been overwhelming. For example, in characterizing this period, Kiverstein (2012), among others (Wilson, 2002; Gärtner, 2011, 2013), asked exactly what researchers had in mind when they claimed that cognition is embodied. He came back empty-handed and described the diverse movement in terms of four cold Es (embodied, embedded, extended, enacted) and one hot A (affective). Incidentally, Kiverstein left out one more E (expanded), which would account for Clark's (2008) Supersizing the mind, apparently by adding more body to the head and more head to the body. Kiverstein uniformly traced the origin of the diverse field of embodied cognition to The embodied mind (Varela et al., 1991), which "sought to bring about nothing short of a paradigm change" (p. 741). Others suggested that the embodied mind perspective may be a revolution away from first-generation disembodied cognition as well as from second-generation embodied cognition, a view with which we tend to agree (see Gärtner, 2013). Embodied cognition is, as Kiverstein (2012) suggested, about how cold and hot cognitions are embodied, how they are tacitly embedded in smart network connections, how they extend storage to external media, and how they enact downloads to the body and the world or uploads back into the head and its supersized networks of smart connections. All of this means that knowledge is stored in smart connections without, at least for now, a trace of D. S. Scott's comprehension, not to mention understanding. Embodied mind researchers, by contrast, lean heavily in favor of the biofunctional wisdom of the biological body to ground the contemplative wisdom of human conceptual understanding (Rosch, 2000, 2001) and find mechanistic embodied cognition hard on the palate.

## Purpose

During second-generation cognition, the focus of embodied cognition remained on knowledge, never on understanding. The overall concern of the present study is with biofunctional understanding and, in that special sense, with the embodied mind (Lakoff and Johnson, 1980; Posner et al., 1982; Shulman, 1991, 2002; Johnson, 2015). To be sure, embodied cognition researchers sought understanding, albeit without the inclination to name it or the pretension to know the word, as they looked for revolutions in expressions such as symbol grounding (Harnad, 1990), PDP connectionism (McClelland and Rumelhart, 1986; Rumelhart and McClelland, 1986; Rumelhart et al., 1986), or deep processing (Craik and Lockhart, 1972; Craik and Tulving, 1972). A closer look at the work of embodied cognition pioneers in neuroscience, such as Edelman (2006), and in cognitive science, such as Clark (2008), revealed that they used the concept of understanding 51 and 71 times, respectively, but invariably in the instrumental sense of making sense of other things, never as a construct under investigation in its own right, as knowledge has been. Perhaps for these embodied cognition scientists, the best the vegetative body could do would be to save its own content knowledge in its intelligent neural networks.

## Understanding in the Transition Era

In the heat of the transition period, two lines of research in cognitive psychology addressed understanding directly, both arising, in part, to challenge the myth that knowers needed abstract deep structures to reach the realm of understanding (see, however, Shulman, 1984, 1999). The first came in a critique by Black and Wilensky (1979) of Rumelhart's (1975) first-generation story grammar (Wilensky, 1983). Commenting on Rumelhart's pioneering claim that knowers had to employ deep structures to understand stories, these researchers reasoned that deep story structures presupposed rather than caused understanding. The second line of research came in the form of the biological embodiment of understanding and the straightforward assumption that understanding is the special and unique function of the biofunctional wisdom in the nervous system, the one and only direct, necessary, and sufficient prerequisite for understanding, just as respiration is the special function of the biological activity in the respiratory system, the one and only direct, necessary, and sufficient condition for breathing (Iran-Nejad, 1980/1987; Iran-Nejad and Ortony, 1984; Iran-Nejad and Irannejad, 2017b). The biofunctional view of understanding holds that biological systems, subsystems, and microsystems (i.e., neurons) take turns serving as the immediate and direct production site for the intellectual performance that their specialty prescribes. Therefore, far from what the botanical metaphor of vegetative organs often applied to them implies, the biological systems of the body are miraculous contributing sources, each in its own marvelously consensual way, to a very special type of wisdom aptly, we believe, called biofunctional understanding (Johnson, 2015).

## Embodying Understanding One Metaphor at a Time with Both Hands Tied in the Back

The transition era provided scarce ground for the kind of evidence, theory, and methodology about biofunctional understanding that is available worldwide today (e.g., Iran-Nejad, 2000; Ziemke et al., 2004; Borghi et al., 2013; Ghorbani et al., 2014; Alverson, 2015; Jin et al., 2015, 2016; Johnson, 2015; Billing et al., 2016; Caligiore et al., 2016; Soylu, 2016; Thill and Twomey, 2016). Therefore, early biofunctional theorizing had to scrape for embodiment one metaphor at a time, struggling breathlessly against the downhill current of prestigious metaphysical cognitive psychology (Iran-Nejad, 1980/1987; Iran-Nejad and Irannejad, 2017b). Biological metaphors were shunned vehemently and rejected out of hand by editors, reviewers, and readers alike. Mechanical metaphors were more likely to be allowed but seldom grabbed attention in the metaphysical world of cognition. Embodied metaphors were used as well: color-coded lightbulbs, representing dynamic diversity in unity and cohesive unity in diversity, stood for distributed constellations of firing neurons; blinking traffic arrows stood for their diverse-content sensemaking behavior; and momentary constellation firing represented the multiple-source nature of the dynamic sensemaking process (Iran-Nejad, 1980/1987, 1984; Iran-Nejad et al., 1981/1984; Iran-Nejad and Ortony, 1984). Vivid as these metaphors were, the experience was nothing less than swimming against a sharply downhill current.

The vivid analogy of the manual camera was used to protect biofunctional theory against the myth of saved prior knowledge, in the form of deep structures or otherwise. The prior knowledge hypothesis assumed that past knowledge is indispensable for new learning to occur. Biofunctional theory explicitly disavowed and abandoned this assumption and used the analogy of the manual camera to show how understanding was possible without saved prior knowledge (Iran-Nejad, 1980/1987). However, the assumption of saved prior knowledge was so deeply entrenched that it kept appearing in reviews of embodied cognition three decades later. Consider the title of the review by Gärtner (2013): "Cognition, knowing and learning in the flesh: six views on embodied knowing in organization studies." This title strongly implies that, in embodied cognition, knowledge is saved in the flesh of the body. In fact, the assumption of saved prior knowledge, indispensable to metaphysical theories, runs contrary to both the letter and the spirit of biofunctional theory.

The analogy of the manual camera was used, desperately, in the late 1970s to show that the saved prior knowledge assumption was unnecessary for biofunctional theory (see below). To picture external objects, a mechanical camera needed no internal blueprints of them; in fact, such blueprints and their hegemonic character would get in the way of accommodating the ubiquitous phenomena of cohesive unity in diversity and productive diversity in unity. To picture a dog, a mechanical camera needed neither a disembodied internal template to match against the abstract shape of the dog, as assumed by first-generation cognition, nor an embodied internal statue to match against the body of the dog, as suggested by second-generation embodied cognition. It needed only its own physical hardware and no saved prior knowledge at all. Dynamic biofunctional embodiment of understanding was proposed to counter the theory of embodiment as saved prior knowledge, in the body or in the head, and to get rid of the assumption altogether. Neither the mechanically crude wisdom of the manual camera nor the organically sophisticated wisdom of the biofunctional body was the wisdom of saved prior knowledge. According to biofunctional theory, biological flesh had the capacity to create knowledge on demand but no capacity whatsoever to save and retrieve it. That much must have been driven home to the proponents of the saved prior knowledge theory in the 1970s, because enough of them rapidly packed their tools, abandoned their so-called structural schema theories, and began scrambling for replacement alternatives (e.g., Rumelhart, 1980, 1984; Anderson, 1984).

The camera metaphor was also used in the late 1970s to bring a second problem to the attention of the proponents of saved prior knowledge, although the problem was aimed directly at deep-structure story grammarians: prior knowledge structures were shown to resist change and, as a result, were more likely to stand doubly in the way of understanding than to foster it (Iran-Nejad, 1980/1987). This was illustrated using a surprise-ending story by Thurmond (1978). The point was that deep-structure templates were static long-term memory patterns and, consequently, stable to the point of allowing no change at all. This was especially so in their embodied long-term memory form, in which static embodied forms were as unchanging as statues (Miller, 1978); they were permanently and inordinately stable. To be sure, surprise-ending stories like Thurmond's also needed the benefit of inordinate stability; but to tolerate radical change, they had to be, paradoxically, inordinately changeable at the same time. Biofunctional systems allow inordinate stability and unrestrained flexibility simultaneously because they can readily create knowledge on demand. Miller (1978) illustrated how dynamic systems do this using the analogy of the shape of a fountain and contrasted it with the change resistance of static structures like the statue (Iran-Nejad et al., 1981/1984).

The Thurmond (1978) story, for example, was about a nurse, Marilyn. One late night in a large city, she leaves work at a hospital, where she had recently attended to patients badly beaten by a mugger in the area. Driving home on the freeway, she notices that she is running out of gas and debates what to do. Thinking about the mugger in the area and scared, she exits the freeway and heads toward the station where she knows the friendly attendant, Gabriel. He fills the tank and cleans the car windows, and when she is about to leave, he insists that she go inside the station office first to see a birthday gift from his sister. She parks as he signals and follows him inside. Once there, he turns around, locks the door, and pulls a gun out of the drawer. Too frightened to defend herself, she begins experiencing the symptoms of shock as she watches him staring haggardly out the window, his lips moving. Finally, she hears him say, "Sorry, I had to scare you like that. I did not know what else to do when I saw that dude hiding on the floor in the back of your car. I will call the cops now."

In this organically sophisticated storyline, up until the moment of surprise, readers entertain an inordinately stable understanding in which the friendly Gabriel is seen as a wolf in sheep's clothing. Then, at the moment of surprise, there is a dramatic, rapid-strike flip-flop in understanding, and readers come to see Gabriel as a Good Samaritan. Remarkably, the storyline elicits two mutually incompatible perspectives on exactly the same text, one immediately after the other, with less than 2 s in between. Administering the rapid-strike flip-flop takes dismissing one understanding and assigning another to the relatively long text of the same story. This happens spontaneously, without having to go back to actively recall and re-allocate attention to every word, phrase, and sentence of the text all over again (Schallert, 1982; Iran-Nejad, 1986, 1989a,b,c). If one were to assume tightly knit deep structures for every phrase and sentence in the story (see Rumelhart, 1975), instantaneous reorganizations like the one in the Thurmond story would be difficult to imagine, let alone to explain and actively enact.

How did the manual camera play the metaphoric role assigned to it, out of desperation, in the late 1970s to shed light on this storyline? As demonstrated then in the form of a challenge to deep-structure story grammarians, adding the long-winded prior knowledge vernacular could do little more than get in the way. Without that vernacular, the metaphoric role of the manual camera was rather straightforward. In the absence of pre-existing frames, it would simply take a well-built manual camera in the hands of a lifelong photographer with flawless professional artistry to snap, in immediate succession, two pictures of a single scene from two different angles. This would not make a perfectly tight metaphor for the manner and nature of the biofunctional embodiment of understanding, but it would be a close enough approximation to show that no saved prior knowledge would be necessary and that any would be in the way.

## Knower Control Processes<sup>1</sup>

It takes two corequisite sets of control processes to explain the manner of the biofunctional embodiment of understanding without resorting to any saved prior knowledge, and to do so beyond what was said above in the context of the manual camera metaphor (Iran-Nejad, 1990; Iran-Nejad and Chissom, 1992; Iran-Nejad et al., 1992), including the rapid-strike feats of multiple-source understanding (Iran-Nejad, 1986, 1989c) and of enjoying (Diener and Iran-Nejad, 1986; Iran-Nejad, 1987; Iran-Nejad and Cecil, 1992) the likes of the Thurmond (1978) surprise-ending story (Iran-Nejad, 1983a,b). The first set, prerequisite for the second, includes processes like realization, recognition, revelation, hearing, seeing, appreciating, grasping, getting, understanding, clicking, apprehending, insight, and the like. Members of this set are knowthat, as opposed to knowhow, processes (Bransford and Schwartz, 1999); they arise spontaneously as a function of the immediate ground of multiple-source biofunctional understanding through the key process of biofunctional knowing by revelation, as opposed to by recall; they are called simply biofunctional understanding (BU) processes because they are caused by the immediate flow of ongoing biofunctional activity; and they present themselves to the unwary knower, unbeckoned and after the fact, all with the extraordinary but characteristic click of unmistakable understanding, albeit at varying degrees of strikingness or surprise (Iran-Nejad, 2000; Prawat, 2000). These processes assume no saved prior knowledge or any other kind of knowledge; it is to these processes that, in part, the immediate analogy of the manual camera applies; and it is these processes that are the pure wisdom operators of the physical intellectual capacity of biofunctional understanding. The second set, post-requisite to the first, includes thinking, concentration, contemplation, meditation, prediction, foresight, hindsight, elaboration, application, evaluation, observing, listening, looking, and so forth. These processes are the source and operators of conceptual understanding (CU); they represent the key process of understanding fresh realizations further by reflection; they make up the wisdom of the intellectual capacity of knowing on demand; and they depend for their operation on the "active I" process, the third and only other source of contribution to the Iran-Nejad wholetheme spiral of biofunctional understanding and critical thinking (Iran-Nejad, 1978, 2000; Iran-Nejad and Gregg, 2001; Iran-Nejad and Irannejad, 2017a,b). Finally, it is this latter set of processes that links the embodied mind and biofunctional theories supportively and turns the theories of embodied cognition and biofunctional understanding into oxymorons. The two sets of understanding processes function differently, albeit complementarily, in the spiral of biofunctional understanding (Iran-Nejad and Irannejad, 2017b).

<sup>1</sup>The theme of this article hinges on the role of the intellectual capacity of knowing in conceptual control. Accordingly, for the sake of wholetheme cohesion, it is deemed advisable to use derivatives like knower over the more ambiguous learner, knowhow over the less concise "knowing how," and knowthat over the less uniform and, for the time being, more standard "knowing that." Therefore, knower is used here interchangeably with the "active I," knowthat with declarative content, and knowhow with procedural content.

BU and CU processes differ in how they relate to spontaneous systemic control relative to the third process, the "active I" or the person of the knower. BU processes sharpen the ground for cohesion sensing (e.g., spontaneous curiosity), and CU processes serve the cause of cohesion seeking (e.g., active questioning) on the part of the agent, or the person of the knower, in an overall physical system of diverse subsystems and microsystems (Caligiore et al., 2017). More specifically, CU processes like thinking and reflection may be described as cohesion-seeking attention-allocation processes, whereas BU processes such as realization and grasping may be described as cohesion-sensing processes. Substantial direct and indirect evidence suggests that embodied systems may contribute to understanding as long as the knower uses available "knowthat" content to keep the cohesion-seeking cursor of attention-allocation going on embodied systems; this is a necessary but insufficient condition for the CU set of control processes to play their role (Iran-Nejad and Irannejad, 2017b). The other necessary condition is that CU processes serve as the corequisite knowhow enabling systematic cohesion seeking on the part of the knower. Accordingly, cohesion sensing and cohesion seeking make up the necessary and sufficient conditions for the person of the knower to stay actively involved in allocating attention to embodied systems, thereby combining the contributions of the available (a) declarative content and (b) procedural content. In short, for CU the knower must (a) knowthat the knower is keeping the cursor of attention-allocation going on fresh revelations caused by embodied systems and (b) also knowhow to do so, systematically.

Thus, BU and CU control processes work corequisitely. BU processes have to do with the contributions of the spontaneous cohesion-sensing ground of the intellectual capacity of biofunctional understanding. This is the spontaneous wisdom of the cohesion-sensing intellectual capacity of the physical biology. By contrast, CU processes represent the deliberate wisdom of the cohesion-seeking intellectual capacity of metaphysical knowing. Knowers may deliberately allocate attention to the clicks of understanding that come in the form of, e.g., realizations. They may do so by means of the knowthat revelation content delivered in those realizations, using that content to allocate attention to the ongoing flow of cohesion while it does whatever it spontaneously does, e.g., causing clicks of BU. However, the knower must also know, paradoxically, that the knower does not have to have the knowhow (e.g., in order to avoid wild-goose chases after non-existent, unnecessary, irrelevant, or even superstitious knowhows). Here, understanding as cohesion sensing must come from the corequisite relation between two types of knowledge. In this case, in some very fundamental way, cohesion seeking works with cohesion-sensing biofunctional understanding much like "fishing" sense or meaning out of ongoing biofunctional activity without, paradoxically, even knowing how to fish, waiting patiently instead for the fish to surprise one by jumping into one's lap.

## The Old and the New in the Long Story Made Short

The long story made short so far in this introduction (Iran-Nejad, 1978, 2000; Prawat, 2000; Johnson, 2015) tells something radically, if not paradoxically, new about something intuitively, if not otherwise, old. Biology is the spontaneous systemic source of the wisdom we have always known and called understanding. Taken as a whole, this idea is, in part, radically new because it dramatically lifts, in our minds, the biofunctionally alive, well, and still-running biological system from the status of the vegetative organ it has always unfairly, if not unethically, held to the new status of the wisdom source it is expected to hold from here on. The idea is, in part, also old because it holds inside something we have always known, namely, the intellectual wisdom capacity we call understanding.

For the sake of experimental biofunctional science, and while we are on the topic of something new and something old, the next BU and CU examples assume that the biofunctional process of understanding works analogously to the biofunctional process of salivation, which houses some of the oldest and most widely recognized and used variables in the experimental-science paradigm of Pavlovian classical conditioning. Conceptually, as in both CU and BU, we knowthat we salivate but, unlike in CU, we also knowthat, paradoxically, the biofunctional process of salivation is not within the reasonable realm of our conceptual knowhow. We know that we get a steady stream of saliva in our mouths because we sense its post-production presence (i.e., its effect) there; but we also know, contentedly, that getting saliva into our mouths does not have to be within the realm of our conceptual doing. Unlike for CU, we are simply content and thankful, so to speak, that it gets there; and we are untroubled by the absence from our conceptual understanding of the "how" of the biofunctional process that makes sure it is there as needed to play its vital corequisite role. The fundamental working assumption of biofunctional theory is that biofunctional understanding occurs in an analogous manner. Knowers know that they understand because they sense the steady stream of its post-production clicks of understanding at varying degrees of strikingness; and their conceptual curiosity is ordinarily as unimpressed by the absence of the "how" of biofunctional understanding as it is by the absence of the "how" of biofunctional saliva production. If so, we predict that our study participants should, at least in principle, tend to agree intuitively not only with statements such as "I know that I salivate [biofunctionally] even though I myself do not really know [conceptually] how to salivate" but also, analogously, with statements such as "I know that I understand people [biofunctionally] even though I do not really know [conceptually] how to understand people."

## MATERIALS AND METHODS<sup>2</sup>

## Purpose and Rationale

The present study tests the a priori prediction, derived from the Iran-Nejad wholetheme spiral of biofunctional understanding, that CU and BU processes differ in how they appeal to knower control in cohesion sensing, and that CU and BU statements may be used, with ample caution, to carry out the test. More specifically, CU statements contain both knowthat and knowhow content as corequisites. By contrast, BU statements carry knowthat content, but knowhow content is conspicuously absent from them. Because (a) the two types of statements are identical in format, (b) the format pits declarative knowthat content and procedural knowhow content against each other, (c) both types of statements carry knowthat content, and (d) knowhow content is absent only from the BU statements, CU statements are expected to be rated in cohesion sensing as less coherent than BU statements. We report two experiments next: Experiment 1 tests our a priori prediction, and Experiment 2 was designed to replicate the results of the first.

## Design

We employed a one-way design with Statement type (two levels: BU, CU) as a within-subjects factor. In two experiments, participants read statements of the kind illustrated below and rated each for internal consistency while their response time was recorded. The two types of statements are identical in format and in some, but not all, other respects; they should therefore be reasonably well suited to this early-stage investigation.


We used a survey comprising 22 BU and 22 CU statements. Cronbach's alpha was 0.88 for BU statements and 0.90 for CU statements. An additional CU example, with both knowthat and knowhow content present in it, was CU3: "I know that I pay more attention to main ideas even though I myself do not really know how to pay more attention to main ideas." This negative statement was therefore expected to represent a false first-person claim and to be rated on the lower end of the internal consistency scale, compared to BU statements. For this CU statement, the claim of phenomenological certainty in knowing that one pays more attention to main ideas must carry corresponding phenomenological certainty about the corequisite knowing how to pay more attention to main ideas. As a result, negating the knowhow is expected to conflict with the assertion of the knowthat and to cause inconsistency. Conversely, the following BU example is expected to represent a true first-person claim and to be rated as more internally consistent than CU statements, BU3: "I know that I experience clicks of understanding inside me every now and then even though I do not really know myself how to experience clicks of understanding inside me every now and then." Knowing that one experiences clicks of understanding has no corequisite knowhow for experiencing those clicks, because those clicks are the work of biofunctional understanding.
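For readers who want to see how the internal-consistency coefficients above are defined, the following is a minimal Python sketch of the standard Cronbach's alpha formula, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), for a respondents × items matrix of ratings. It is not the authors' analysis script; the function name and the toy ratings are hypothetical. It uses sample variances throughout (population variances give the same α, because only their ratio enters the formula).

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix.

    scores[r][i] is respondent r's rating of item i (e.g., a 1-5 Likert rating).
    Hypothetical helper for illustration, not the study's analysis code.
    """
    k = len(scores[0])  # number of items in the scale
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 respondents x 3 items (not data from the study).
toy = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
print(round(cronbach_alpha(toy), 3))  # 0.923
```

Applied separately to the BU items and the CU items, a function like this yields one alpha per statement type, which is how the two coefficients above are reported.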

## Participants

A total of 34 students from the same graduate Educational Psychology course in the College of Education (21 women, 13 men; M age = 25, SD = 3.4) participated, one semester apart, in two studies (Fall semester: N = 17; Spring semester: N = 17) in exchange for course credit. All students who were contacted volunteered to participate and completed the survey with no missing values. A power analysis using the GPower software package (Faul and Erdfelder, 1992) indicated that a sample size of 17 was sufficient. The recommended effect-size benchmarks were small (f<sup>2</sup> = 0.02), medium (f<sup>2</sup> = 0.15), and large (f<sup>2</sup> = 0.35) (see Cohen, 1977), and the alpha level was P < 0.05. The analysis showed a statistical power of 1.00, exceeding 0.99, for detecting an effect at the large effect-size level (0.671). The purpose of the second experiment was to replicate the first.

<sup>2</sup> It should be clear by now that knowthat, knowhow, and the like are special-status variables in content knowledge. It is also clear that time and again investigators have turned and returned to these variables; but treatments have seldom reached beyond content per se (Ryle, 1949; Williams, 2008). In relatively recent years, educational researchers have employed the factor of active control in learning (Brown, 1975; Bransford and Schwartz, 1999) and conceptual understanding (Shulman, 2002) as used in the present article (Iran-Nejad, 1990; Iran-Nejad and Chissom, 1992). It is in the background of this research and the literature we have reviewed in the introduction that we are presenting here what is, to our knowledge, the first experimental study of the relationship between knowing and understanding.

## Procedure

The two sets of statements were presented to participants in Qualtrics (version 2013, available online at www.qualtrics.com). The order of presentation was fully randomized. The instructions asked participants to rate the internal consistency of each statement on a five-point Likert scale: 1 (not consistent at all), 2 (somewhat consistent), 3 (consistent), 4 (very consistent), and 5 (extremely consistent). Participants received the Qualtrics link to the study by e-mail. Clicking on the link took participants first to an IRB-approved informed consent form and then to brief instructions with an example of each type of statement. Participants rated the statements while the program recorded the response time between key presses.
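What "fully randomized" presentation means can be illustrated with a short sketch: each participant receives an independent shuffle of all 44 items, so BU and CU statements are interleaved unpredictably. Qualtrics handled this in the actual study; the item labels (BU1 to BU22, CU1 to CU22) and the per-participant seed below are hypothetical conveniences added for reproducibility.

```python
import random

# Hypothetical item labels: 22 BU and 22 CU statements, as in the survey.
ITEMS = [f"BU{i}" for i in range(1, 23)] + [f"CU{i}" for i in range(1, 23)]


def presentation_order(participant_seed: int) -> list[str]:
    """Return an independent random ordering of all 44 items for one participant."""
    rng = random.Random(participant_seed)  # private generator, reproducible per seed
    order = ITEMS.copy()  # leave the master list untouched
    rng.shuffle(order)
    return order


first = presentation_order(1)
print(first[:5])  # first five items seen by participant 1
```

Using a separate generator per participant keeps each order independent while still allowing any given order to be regenerated later.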

## Data Analysis

For the first analysis, a mean consistency rating (CR) score was calculated over the 22 items of each statement type, yielding two mean scores per participant, one for CU and one for BU statements. Two mean character response time (CRT) scores were calculated similarly. First, for each participant, the time in seconds taken to rate the consistency of each statement was divided by the number of characters (including spaces) in that statement to obtain a response time in seconds per character. Then, a mean CRT was calculated across the 22 BU items and across the 22 CU items. Averaging across statements was deemed reasonable because the items were presented to each participant in a fully randomized order. The four means thus obtained for each participant were used as dependent measures in subsequent analyses. For each of the two studies reported, two one-way repeated-measures ANOVAs were conducted, one for each dependent variable, with statement type (BU, CU) as a two-level within-subjects variable. Subsequently, a set of linear mixed model (LMM) analyses was also conducted for both studies to confirm the findings of the first analyses. According to McCulloch and Searle (2000), in analyses such as repeated measures of survey respondents, the data are commonly correlated, and mixed models are therefore used to extend the repeated-measures models of the GLM. In the present study, correlated data were possible, though less likely given that statement items were presented to each subject in a fully randomized order. Reporting the LMM results was therefore deemed appropriate.
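The per-participant scoring described above can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: each trial is assumed to be a tuple of (statement type, 1-5 rating, rating time in seconds, statement text), and "characters and spaces" is taken to mean the full string length of the statement.

```python
from statistics import mean


def character_response_time(seconds: float, statement: str) -> float:
    """Rating time divided by statement length (letters, punctuation, and spaces)."""
    return seconds / len(statement)


def participant_means(trials):
    """Collapse one participant's trials into per-type means.

    trials: iterable of (statement_type, rating, seconds, statement_text).
    Returns {statement_type: (mean CR, mean CRT)}; with "BU" and "CU" keys
    this gives the four dependent measures per participant.
    """
    ratings, crts = {}, {}
    for stype, rating, seconds, text in trials:
        ratings.setdefault(stype, []).append(rating)
        crts.setdefault(stype, []).append(character_response_time(seconds, text))
    return {t: (mean(ratings[t]), mean(crts[t])) for t in ratings}


# Hypothetical trials for one participant (two BU items, one CU item).
demo = [
    ("BU", 3, 10.0, "x" * 100),
    ("BU", 2, 20.0, "y" * 100),
    ("CU", 1, 30.0, "z" * 60),
]
print(participant_means(demo))
```

Dividing by length before averaging keeps longer statements from dominating the mean response time simply because they take longer to read.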

## RESULTS

## Experiment 1

As predicted, participants rated CU statements as significantly less internally consistent than BU statements (see **Figure 1**, left panel), F(1,16) = 25.643, P < 0.001, η<sup>2</sup> = 0.616 (M<sub>BU</sub> = 2.55, SD = 0.65; M<sub>CU</sub> = 1.58, SD = 0.66). Similarly, the analyses of response time revealed that participants responded significantly more slowly to CU than to BU statements (see **Figure 2**, left panel), F(1,16) = 7.53, P < 0.014, η<sup>2</sup> = 0.32 (M<sub>CRT/BU</sub> = 0.0925, SD = 0.045; M<sub>CRT/CU</sub> = 0.2451, SD = 0.22). Thus, the findings confirmed the a priori predictions that the presence or absence of corequisite content knowledge would produce differences between CU and BU statements in rated internal consistency and response time. Follow-up LMM analyses, with statement type (CU, BU) as a fixed factor and statement item (44 levels) as a random factor, confirmed the results. For consistency ratings, there was a significant effect of statement type, F(1,725) = 47.319, P < 0.001. This effect was also significant for response time, F(1,725) = 12.083, P < 0.001.
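With only two levels of a within-subjects factor, the repeated-measures F(1, n − 1) equals the square of the paired t statistic on the participants' difference scores, and partial η² can be recovered from F and its degrees of freedom as η²ₚ = F·df_effect / (F·df_effect + df_error). The sketch below illustrates both relations. The function names are hypothetical, and it assumes that the η² values reported above are partial η², as repeated-measures output typically reports.

```python
from math import sqrt
from statistics import mean, stdev


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def rm_anova_two_levels(a: list[float], b: list[float]) -> tuple[float, int, int, float]:
    """One-way repeated-measures ANOVA for a two-level within-subjects factor.

    a[i] and b[i] are participant i's mean scores under the two levels
    (e.g., BU and CU). For two levels, F(1, n - 1) is the squared paired t.
    Returns (F, df_effect, df_error, partial eta squared).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired lists with at least two participants")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    f = t * t
    return f, 1, n - 1, partial_eta_squared(f, 1, n - 1)


# Checking the Experiment 1 effect sizes from the reported F values (n = 17).
print(round(partial_eta_squared(25.643, 1, 16), 3))  # 0.616
print(round(partial_eta_squared(7.53, 1, 16), 2))    # 0.32
```

Applied to the Experiment 1 F values, the formula reproduces the reported effect sizes, which makes it a quick consistency check when reading repeated-measures results.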

FIGURE 2 | Mean character response time (in seconds) for biofunctional understanding (BU) and conceptual understanding (CU) statement types for Experiments 1 (blue) and 2 (red).

## Experiment 2

The results of Experiment 1 supported the hypothesis that the knower control processes for CU, such as thinking and contemplation, differ from those for BU, such as realization and revelation. Experiment 2 used exactly the same methodology as Experiment 1 for the purpose of replication. Analysis of the consistency rating (CR) scores confirmed the results of the first study: participants rated BU statements as significantly more internally consistent than CU statements (**Figure 1**, right panel), F(1,16) = 29.65, P < 0.001, η<sup>2</sup> = 0.679 (M<sub>BU</sub> = 2.80, SD = 0.85; M<sub>CU</sub> = 1.43, SD = 0.53). Analysis of the CRT scores also revealed a significant difference between responses to BU and CU statements (**Figure 2**, right panel), F(1,16) = 8.626, P < 0.001, η<sup>2</sup> = 0.381 (M<sub>CRT/BU</sub> = 0.0828, SD = 0.05188; M<sub>CRT/CU</sub> = 0.2964, SD = 0.2717). Follow-up LMM analyses also confirmed the results for both consistency ratings, F(1,725) = 18.012, P < 0.001, and response time, F(1,725) = 11.208, P < 0.001.

A comparison of the left and right panels in the two figures shows that the results of Experiment 2 closely matched those of Experiment 1: the two adjacent panels in each figure reveal the same pattern of results. As noted above, all statements used exactly the same format and ordinary semantic content, and all contained knowthat knowledge. They differed, however, in whether or not they carried knowhow content. In both studies, as expected, CU statements (which contained knowhow content) were rated as more internally inconsistent and elicited slower response times than BU statements, which were characterized by the relative absence of conceptual knowhow content.

## DISCUSSION, CONCLUSION, AND FUTURE DIRECTIONS

## Knowledge Everywhere, and Not a Faint Sign of Understanding Anywhere

Historically, knowing and understanding have been regarded as one and the same intellectual capacity. As a result, studies of understanding have been non-existent in the midst of widespread investigations of knowledge. The default assumption has been that today's accumulation of basic scientific research on knowing, aided by physical science's brand of systematic observation, is allowing us for the time being to separate the relevant grains of fact from the irrelevant chaff of fiction in the realm of knowing and will, in all likelihood, some day serve the cause of tomorrow's understanding. The most characteristic attributes of this attitude, which favors knowledge at the expense of understanding, have been an inordinate stability of thinking about knowledge and a resistance to change in favor of understanding. To add one more example to the literature cited in the introduction, Piaget's developmental research has been all about knowledge and not at all about understanding. For a different example, Grimm (2006) acknowledged that every serious epistemologist has denied the interchangeable relationship between knowing and understanding, but Grimm himself then went on to make the case again for the seductive idea that understanding is a species of knowledge.

The present study sought evidence for the opposite viewpoint: that knowing and understanding are different, indeed mutually corequisite and complementary, intellectual capacities, and that they are embodied in contrasting ways in their causes and consequences. The goal of this article, as described in the introduction, was to present the first original research study of the two main sets of dynamic and active control processes that integrate, in relative cohesion, the intellectual capacities of knowing and understanding into a wholetheme spiral of biofunctional understanding. Backed by evidence of the kind obtained in the research reported in this article, the spiral promises to shed light on the historically dark ground of uncertainty, in both theory and method, surrounding the manner and nature of understanding. With the spiral outlined in the introduction as the intrinsic context for the two sets of control processes targeted here, the evidence from the experiments reported supported the a priori predictions, which were tested in one experiment and replicated in a second. This evidence is the first of its kind, given the non-existent state of the art in the experimental science of understanding at this early stage in the development of embodiment science.

## A Different Kind of Consideration

Perhaps the non-existent state of the art of the experimental science of biofunctional understanding (ESBU) is ominously symptomatic of something altogether too different to expect from our existing state of the art in the experimental science of knowing, which is confined today to the "prison house" of conceptual understanding (ESCU; see, e.g., Prawat, 2000; Iran-Nejad and Irannejad, 2017a). In other words, the sacred run of the mill in ESCU is a comfort zone to which we have grown so accustomed that questioning it is frustrating. Nevertheless, question it we must, before we can find the hitherto-inconceivable way that will give us a new pair of feet for walking across the no man's land that Eleanor Rosch identified between today's ESCU and tomorrow's ESBU. Intriguingly, as we have tried to show in this article, it is too simplistic to blame the bloom in ESCU for the doom in ESBU; nor is it realistic to wish for ESBU to flourish amid the doom of ESCU. The two-horned beast of the challenge we face is analogous to asking future scientists to have their cake and eat it too. Remarkably, that is exactly what the wholetheme spiral of biofunctional understanding promises to make possible. If so, the findings of the two experiments reported here may represent, their limitations in the light of both ESCU and ESBU notwithstanding, a distant ray of light at the end of the dark tunnel of the history of the intellectual capacity of understanding. Therefore, before we take another step toward experimenting with understanding, we must first take a good look at today's experimental science paradigm. In fact, this was the assumption with which the present investigation began; now that the study is complete, it may not be too far-fetched a conclusion to accept.

## Physical Science's Guiding Principle of Systematic Observation

The main problem with today's experimental science is its exclusive reliance on physical science's systematic observation as its guiding principle. With this principle in hand, we join Shulman (1999) in reminiscing fondly about the comforting simplicity of behavioral science and sadly about the dismayingly unmanageable, and possibly pseudo, complexity of cognitivism, as behavioral scientists had predicted and cognitivists had miscalculated. Fortunately, today's cognitive science is interdisciplinary and could include the embodied science of biofunctional understanding. The real problem with today's experimental science is the extent of its overreliance on its one and only guiding principle of sensory observation, which rests on two seductive assumptions, both of which can readily be shown to be flawed. One assumption is that sensory observation offers the most immediate window onto the so-called physically observable world. The second is that the sensory modalities have the widest and most immediate contact with the real world. In fact, it is possible to show that it is the biofunctional modality that has the widest and most immediate contact with the real world, including contact through the senses.

## Systematic Observation and Analytic Fact-Seeking

It may appear otherwise, but it may be fair to say, as Shulman suggested, that behavioral scientists successfully transitioned psychology to science at the expense, fairly or not, of conceptual understanding and other unobservable mental states, which they treated as overly subjective threats to an objective science established according to physical science's powerful guiding principle of systematic observation. Later, cognitive psychologists adopted the principle of hard and observable external objects and, encouraged by the soft and immediately unobservable nature of the computer program analogy (Neisser, 1967; Iran-Nejad and Winsler, 2000), generalized it to the soft and unobservable internal representations of those hard and observable external objects. According to the guiding principle of systematic observation, the hard external world and its soft internal representation are, inherently or through the subjective eyes of the study participants who behold them, shrouded in unsystematic complexity. The goal of the analytic science of cognition was to simplify the complexity of mental representations by isolating observable facts in the form of declarative propositions, e.g., "Mindfulness enhances critical thinking" (Noone et al., 2016). These propositions could then be framed as binary if-question hypotheses and tested, with the aim of separating the significant gold of true propositions from the insignificant chaff of false ones and adding the new truths, as basic scientific knowledge, to the previously stored wealth of basic or pragmatically useful knowledge in external media (e.g., textbooks) or internal long-term memory stores (e.g., hierarchical semantic networks). Subsequently, these soft but storable scientific (i.e., systematically derived) facts might be uploaded or downloaded for the purposes of replication, generalization, and application.

If we assume that knowledge and understanding are one and the same intellectual species, this is the tragic end of the story for human understanding. In the realm of the fact-seeking field of cognition, the difference between the hard and objective and the soft and subjective is stark but confounded, making conceptual understanding abstract, subjective, and a befuddling scientific liability.

For many decades, interested investigators have puzzled over the challenges that understanding-related factors such as motivation and transfer present to the community of experimental cognitive researchers. The reasons behind this challenging state of the art have been diverse, but they are all traceable to the study of cognition or knowledge in isolation. Among these investigators are leading practitioners such as Bransford et al. (1977, 2000), Schön (1983), Bloom (1984), Shulman (1986, 1999), McCombs (1991), Gardner and Boix-Mansilla (1994), Willis (2000), and Salomon (2006). These scholars of science and practice have keenly observed the problem and its dismaying consequences in the trenches of the real world of practice. Shulman (1999), for example, observed:

> After I finished graduate school and first began teaching the psychology of learning, I was confident that I really understood what the process of learning entailed. However, over the past 35 years, I have systematically studied learning and understanding in many contexts, and I have taught many courses on the subject. Alas, my understanding has now become more complex, vague, and somewhat ambiguous.

Having voiced concerns like this, Shulman described their consequences as pathologies, of which he named three: "we forget, we don't understand that we misunderstand, and we are unable to use what we learned. I have dubbed these conditions *amnesia*, *fantasia*, and *inertia*" (italics in original).

If we assume that factual knowledge and biofunctional understanding are different, we enter the new realm of the hard and unobservable biological systems and must deal with the evidence of the kind reported in the present study in this new light.

## Biofunctional Science's Guiding Principle of Systematic Consensus in Systemic Cohesion

The present study made use of participants' subjective reports, a methodological liability if viewed solely through the objective fact-seeking lens of analytic cognition. As reviewed in the introduction, a growing literature now embraces the theory that physical biology is a diverse (color-coded, so to speak) source of special systemic functions (Iran-Nejad, 1980/1987). Among these are the special systemic sources that support embodied-mind functions (Iran-Nejad and Ortony, 1984; Iran-Nejad and Gregg, 2011; Borghi et al., 2013; Alverson, 2015; Jin et al., 2015, 2016; Scorolli and Borghi, 2015; Caligiore et al., 2016; Thill and Twomey, 2016). Chief among these functions are those having to do with the newly discovered idea that physical biology is the direct and immediate source of the hitherto-neglected wisdom of the intellectual capacity of biofunctional understanding, which is the principal contributor to the systemic spiral of biofunctional understanding. Therefore, it is possible to show how human understanding is, by virtue of its fundamentally consensual nature, uniquely characterizable by systemic cohesion sensing, cohesion seeking, and, thereby, systematic science-quality consensus-seeking.

Given this line of reasoning, the newly found direct and immediate wisdom of biofunctional understanding frees embodiment science from the confining prison house of systematic fact-seeking observation (Iran-Nejad and Irannejad, 2017a). In this light, subjective data-gathering of the type done in the present study is a methodological asset rather than a liability. Specifically, the spiral of biofunctional understanding spontaneously delivers its extraordinary clicks of understanding in systemic cohesion with affectively rich revelations (Iran-Nejad, 1987). Subsequently, the "active I" may use the knowthat results of these spontaneous revelations, by the immediate means of direct systemic cohesion sensing and seeking, to engage in further conceptual understanding by reflection (Iran-Nejad et al., 2015). Therefore, in the embodied flow of the revelation-producing spiral of biofunctional understanding, subjective sense-making and sense-reporting find a new, indispensable, and unique methodological role to play (Iran-Nejad and Irannejad, 2017b, p. 3). Thus, in the science of biofunctional embodiment, physical science's principle of systematic observation is a necessary but insufficient basis for science making. What is needed, in addition, is the complementary guiding principle of systematic sensemaking backed by systematic consensus making. Space does not permit a fuller account here, but it is in the light of the unified function of (a) immediate and direct systemic cohesion sensemaking and (b) systematic consensual sensemaking, with (c) potential backing from systematic science-quality consensus-making, that the methodology and findings of this study must be evaluated.

The finding of a difference between CU and BU supports the idea that the paradox of the missing "how" of (physical) biofunctional understanding is real and within the grasp of the systemic cohesion sensing of study participants and of systematic consensus-seeking among professional scientists. There are indications that exemplary scientists such as Einstein and Pasteur made systematic use of this spontaneous capacity for cohesion in their science (Iran-Nejad, 2016). At the level of study participants, college students in the present experiments, unlike in the case of thinking, seemed to be content not knowing how to understand even though they knew, paradoxically, that they did understand. An intriguing implication is that knowers at all levels, from naïve study participants to advanced scientists, may (be encouraged to) engage in systemic body-mind cohesion-sensing as well as consensus-seeking (Caligiore et al., 2016). Further supportive evidence has been reported in a semester-long classroom intervention study in which undergraduate teacher education students were encouraged to seek their own first-person revelations and to reflect on them in writing (Iran-Nejad et al., 2015). There is, therefore, hope for a new embodiment-science methodology (Caligiore et al., 2016; Iran-Nejad and Irannejad, 2017a,b): the physically hard and forbidding black box of the physical body may now have developed windows through which its infinite wisdom can be aired and the light of systemic cohesion-sensing can shine, as directed by science-quality sources of systematic consensus.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of The University of Alabama Institutional Review Board. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by The University of Alabama Institutional Review Board (IRB # 12-OR-392-R1).

## AUTHOR CONTRIBUTIONS

AI wrote the article. FB programmed and ran the experiments, helped with the method and results sections, including the experimental material and data analysis, and read and commented on drafts.

## ACKNOWLEDGMENTS

The authors acknowledge the support of the College of Education and the Department of Educational Studies in Psychology, Research Methodology, and Counseling for this study. Our special thanks go to the participants in the studies.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Iran-Nejad and Bordbar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cultural Affordances: Scaffolding Local Worlds Through Shared Intentionality and Regimes of Attention

#### Maxwell J. D. Ramstead<sup>1,2</sup>\*, Samuel P. L. Veissière<sup>2,3,4,5</sup>\* and Laurence J. Kirmayer<sup>2</sup>\*

<sup>1</sup> Department of Philosophy, McGill University, Montreal, QC, Canada, <sup>2</sup> Division of Social and Transcultural Psychiatry, Department of Psychiatry, McGill University, Montreal, QC, Canada, <sup>3</sup> Department of Anthropology, McGill University, Montreal, QC, Canada, <sup>4</sup> Raz Lab in Cognitive Neuroscience, McGill University, Montreal, QC, Canada, <sup>5</sup> Department of Communication and Media Studies, Faculty of Humanities, University of Johannesburg, Johannesburg, South Africa

#### Edited by:

Maurizio Tirassa, Università di Torino, Italy

## Reviewed by:

Erik Rietveld, University of Amsterdam, Netherlands; Michael David Kirchhoff, University of Wollongong, Australia

#### \*Correspondence:

Maxwell J. D. Ramstead, maxwell.d.ramstead@gmail.com; Samuel P. L. Veissière, samuel.veissiere@mcgill.ca; Laurence J. Kirmayer, laurence.kirmayer@mcgill.ca

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

> Received: 27 April 2016 Accepted: 05 July 2016 Published: 26 July 2016

#### Citation:

Ramstead MJD, Veissière SPL and Kirmayer LJ (2016) Cultural Affordances: Scaffolding Local Worlds Through Shared Intentionality and Regimes of Attention. Front. Psychol. 7:1090. doi: 10.3389/fpsyg.2016.01090

In this paper we outline a framework for the study of the mechanisms involved in the engagement of human agents with cultural affordances. Our aim is to better understand how culture and context interact with human biology to shape human behavior, cognition, and experience. We attempt to integrate several related approaches in the study of the embodied, cognitive, and affective substrates of sociality and culture and the sociocultural scaffolding of experience. The integrative framework we propose bridges cognitive and social sciences to provide (i) an expanded concept of 'affordance' that extends to sociocultural forms of life, and (ii) a multilevel account of the socioculturally scaffolded forms of affordance learning and the transmission of affordances in patterned sociocultural practices and regimes of shared attention. This framework provides an account of how cultural content and normative practices are built on a foundation of contentless basic mental processes that acquire content through immersive participation of the agent in social practices that regulate joint attention and shared intentionality.

Keywords: affordances (ecological psychology), cultural affordances, radical embodied cognition, enactive cognitive neuroscience, free-energy principle, predictive processing, regimes of attention, cognitive anthropology

## INTRODUCTION

The acquisition of culture is notoriously difficult to study. Over 70 years of research on the development of person perception, for example, have made it clear that children as young as 4 years of age have already acquired implicit biases about ethnicity and other socially constructed categories of persons (Clark and Clark, 1939; Clark, 1963; Hirschfeld, 1996; Machery and Faucher, 2005; Aboud and Amato, 2008; Kelly et al., 2010; Huneman and Machery, 2015; Pauker et al., 2016). These biases are consistent with the dominant culture of their societies, but are most often not consciously held or explicitly taught by their caregivers and educators. While most young children express a positive bias toward people they identify as members of their own group, children from minority groups typically show preferences for dominant groups, rather than for persons of their own ethnicity (Clark and Clark, 1939; Kinzler and Spelke, 2011). How such biases are acquired is still an open question. Ethnographic studies of socialization, education, and language acquisition have pointed to broad cross-cultural variations in how children are instructed, spoken to, expected to behave, involved in community activities, and exposed to other socializing agents beyond nuclear or extended families (Mead, 1975; Schieffelin and Ochs, 1999; Rogoff, 2003). However, by age 5, children across cultures have for the most part become proficient in the dominant set of expectations and representations of their cultures, despite the much-discussed poverty of the cultural stimuli to which they are exposed (Chomsky, 1965). These matters point to a human propensity for 'picking up' the broad scripts of culture even without any explicit instruction. In other words, we all come to acquire the shared background knowledge, conceptual frameworks, and dominant values of our culture. The presence of intuitive or implicit, yet stable and widely shared, beliefs and attitudes among children constitutes a challenging problem for cognitive and social science.

In this paper, we outline a framework for the study of the mechanisms that mediate the acquisition of cultural knowledge, values, and practices in terms of perceptual and behavioral affordances. Our aim is to better understand how culture and context shape human behavior and experience by integrating several related approaches in the study of the embodied, cognitive, and affective substrates of action and the sociocultural scaffolding of embodied experience. The integrative framework we propose bridges cognitive and social sciences to provide (i) an expanded concept of 'affordance' that extends to sociocultural forms of life, and (ii) a multilevel account of the socioculturally scaffolded forms of affordance learning and the transmission of affordances in patterned sociocultural practices.

The context of the present discussion is the search for the 'natural origins of content' (Hutto and Satne, 2015). We hope to contribute to the naturalistic account of the emergence of semantic content, that is, of the evolution (in phylogeny) and acquisition (in ontogeny) of representational or propositional content. Cultural worlds seem to be full of meaningful 'content': explicit ways to think about and respond to the world in terms of kinds of agents, actions, and salient events. 'Content,' here, is defined in terms of representational relations with satisfaction conditions: a vehicle x bears some semantic or representational content y just in case there are satisfaction conditions which, when they obtain, tell us that the vehicle is about something. Semantics is an intensional notion (Millikan, 1984, 2004, 2005; Haugeland, 1990; Piccinini, 2015). How do humans acquire this cultural knowledge and the capacity to respond in social contexts in ways that actors and others find meaningful and appropriate?

We hypothesize that agents acquire semantic content through their immersion in, and dynamic engagement with, feedback or looping mechanisms that mediate shared intentionality and shared attention. Semantic content, we suggest, is realized in culturally shared expectations, which are embodied at various levels (in brain networks, cultural artifacts, and constructed environments) and are enacted in 'regimes' of shared attention. We generalize contemporary ecological, affordance-based models of cognitive systems adapting to their contexts over ontogeny and phylogeny to account for the acquisition of cultural meanings and for the elaborate scaffoldings constituted by constructed, 'designer' niches (Hutchins, 2014; Kirchhoff, 2015a; Clark, 2016). We suggest that 'regimes of shared attention' (that is, patterned cultural practices that direct the attention of participant agents; Roepstorff et al., 2010) modulate the acquisition of culturally specific sets of expectations. Recent work in computational neuroscience on predictive processing provides a model of how cultural affordances could scaffold the acquisition of socially shared representational content. In what follows, we sketch a multilevel framework that links neural computation, embodied experience, cultural affordances, and the social distribution of representations.

We begin by specifying a conceptual framework for 'cultural affordances', building on recent accounts of the notion of affordances in ecological, enactivist, and radical embodied cognitive science (**Box 1**). We propose to distinguish two kinds of cultural affordances: 'natural' affordances and 'conventional' affordances. Natural affordances are possibilities for action, the engagement with which depends on an organism or agent exploiting or leveraging reliable correlations in its environment with its set of abilities. For instance, given a human agent's bipedal phenotype and related ability to walk, an unpaved road affords a trek. Conventional affordances are possibilities for action, the engagement with which depends on agents' skillfully leveraging explicit or implicit expectations, norms, conventions, and cooperative social practices. Engagement with these affordances requires that agents have the ability to correctly infer (implicitly or explicitly) the culturally specific sets of expectations in which they are immersed—expectations about how to interpret other agents, and the symbolically and linguistically mediated social world. Thus, a red light affords stopping not merely because red lights correlate with stopping behavior, but also because of shared (in this case, mostly explicit) norms, conventions, and rules. Both kinds of cultural affordances are relevant to understanding human social niches; and both natural and conventional affordances may be socially constructed, albeit in different ways (Hacking, 1999). Human biology is cultural biology; culture has roots in human biological capacities. The affordances with which human beings engage are cultural affordances.

We then assess the tensions between our proposed framework and radical enactivist and embodied approaches, which are typically committed to forms of non- (or even anti-) representationalism. On these views, perception, cognition, and action need not involve computational or representational resources. The scope of this claim varies. For some, it entails a rejection of computational or representational models and metaphors in the study of the mind, that is, a staunch commitment to anti-representationalism (Varela et al., 1991; Gallagher, 2001, 2008; Thompson, 2007; Chemero, 2009). More conciliatory positions instead suggest that basic cognitive processes are without content, but accommodate a place for contentful cognition. They claim that certain typically human forms of cognition involve representations, in the sense that human agents have the dispositions (mechanisms, behavioral repertoires, etc.) that are required to immersively engage with sociocultural content (e.g., patterned symbolic practices, linguistic constructions, storytelling and narration). We argue that contemporary computational neuroscience complements the more conciliatory of these approaches by providing minimal neural-computational scaffolding for the skilled engagement of organisms with the available affordances.

#### BOX 1 | Basic concepts of a framework for cultural affordances

Affordance: A relation between a feature or aspect of organisms' material environment and an ability available in their form of life (Chemero, 2003, 2009; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014).

Landscape of affordances: The total ensemble of available affordances for a population in a given environment. This landscape corresponds to what evolutionary theorists in biology and anthropology call a 'niche' (Rietveld, 2008a,c; Rietveld et al., 2013; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014).

Field of affordances: Those affordances in the landscape with which the organism, as an autonomous individual agent, dynamically copes and intelligently adapts. The field refers to those affordances that actually engage the individual organism because they are salient at a given time, as a function of the interests, concerns, and states of the organism (Rietveld, 2008a,c; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014).

Cultural affordance: The kind of affordance that humans encounter in the niches that they constitute. There are two kinds of cultural affordances: natural and conventional affordances.

Natural affordance: Possibilities for action (i.e., affordances), the engagement with which depends on the exploitation or leveraging by an organism of 'natural information', that is, reliable correlations in its environment, using its set of phenotypical and encultured abilities (roughly what Grice meant by 'natural meaning') (Piccinini and Scarantino, 2011; Piccinini, 2015).

Conventional affordance: Possibilities for action, the engagement with which depends on agents' skillfully leveraging explicit or implicit expectations, norms, conventions, and cooperative social practices in their ability to correctly infer (implicitly or explicitly) the culturally specific sets of expectations in which they are immersed. These are expectations about how to interpret other agents, and the symbolically and linguistically mediated social world (Scarantino and Piccinini, 2010; Tomasello, 2014; Satne, 2015; Scarantino, 2015).

Having done this, we turn to affordances in social and linguistic forms of life. We examine local ontologies, understood as sets of shared expectations, as well as the complex feedback relations (or looping effects) between these ontologies and human modes of communication, shared intentionality, and shared attention. Drawing on the skilled intentionality framework (Bruineberg and Rietveld, 2014), we examine the dynamics of cultural affordance acquisition through patterned cultural practices, notably attentional practices. We hypothesize that feedback mechanisms between patterned regimes of attention and shared forms of intentionality (notably shared expectations and immersion in local ontologies) lead to the acquisition of such affordances. This framework can guide future research on multilevel, recursive, nested cultural affordances and the social norms and individual expectations on which they depend.

## A THEORETICAL FRAMEWORK FOR AFFORDANCES

Much recent work in cognitive science has been influenced by the notion of affordances originally introduced by Gibson (1986). The interdisciplinary framework currently being developed to study affordances provides us with a point of departure for thinking about the evolution and acquisition of semantic, representational content. The aim of this section is to clarify the implications of adopting this framework.

Affordances are central to the emerging 'enactivist' and 'radical embodied' paradigms in cognitive neuroscience. Theorists of enactive cognition model the intelligent adaptive behavior of living cognitive systems as the dynamic constitution of meaning and salience in rolling cycles of perception and action, explicitly recognizing the emergence of meaning and salience in the active, embodied engagement of organisms with their environment (Di Paolo, 2005, 2009; Noë, 2005; Thompson, 2007; Froese and Di Paolo, 2011; Hutto and Myin, 2013; Di Paolo and Thompson, 2014; Hutto and Satne, 2015; Kirchhoff, 2016). Embodied approaches in cognitive science explain the feats of intelligence displayed by cognitive systems by considering the dependence of cognition on the various aspects of the body as it engages with its environment, both internal and external (Barsalou, 2008; Shapiro, 2010). 'Radical embodied' cognitive science extends the theoretical framework of ecological psychology (Gibson, 1986) to the embodied cognition paradigm, providing a phenomenologically plausible account of active, dynamical coping (Thompson and Varela, 2001; Chemero, 2003, 2009; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014). Recently, the enactive, radical enactive, and radical embodied approaches have been extended to 'higher-order' social and cultural systems (Froese and Di Paolo, 2011; Hutto and Myin, 2013; Rietveld and Kiverstein, 2014). This latter branch of enactivist theory will concern us especially.

## Perspectives, Affordances, and Phenomenology

One of the distinctive contributions of ecological, radical embodied, and enactivist theories of cognition is their shared emphasis on the point of view of the organism itself, understood as an intentional center of meaningful behavior. The implication of these 'perspectivist' approaches in cognitive science is that the world is disclosed as a set of 'affordances,' that is, possibilities for action afforded to organisms by the things and creatures that populate their environmental niche, as engaged through their perceptual and sensorimotor abilities (Turvey et al., 1981; Turvey, 1992; Reed, 1996; Heft, 2001; Silva et al., 2013; cf. also Varela, 1999; Thompson, 2007). To paraphrase Wittgenstein, the world is the totality of possibilities of action, not of things. Perspectivist approaches in cognitive science operationalize this view of the organism and propose an account of perception, cognition, and action that is closer to the phenomenology of everyday experience.

Affordances provide an alternative framework for thinking about perception, cognition, and action that dissolves the strict conceptual boundary between these categories in a way that is closer to the phenomenology of everyday life<sup>1</sup>. This approach echoes the kernel insights of the phenomenology of Heidegger (1927/1962) and Merleau-Ponty (1945/2012, 1964/1968) about perception and action. Cognitive agents experience the world perceptually through the mediation of action, as a function of those actions that things in the world afford. For example, my cup of coffee is not first perceived as having such and such properties (size, shape, color), and only then as providing the opportunity for sipping dark roast. Instead, my filled cup is directly perceived as affording the action of sipping. Filled cups of coffee afford sipping; a paved road affords walking; a red traffic light affords stopping. The claim, then, is that cognitive agents typically do not encounter the world that they inhabit as a 'pre-given,' objective, action-neutral set of things and properties, to be reconstructed in perception and cognition on the basis of sensory information, as classical models in cognitive science once suggested (e.g., Fodor, 1975; Marr, 1982; Dawson, 2013). The things that we engage are instead disclosed directly as opportunities for action, that is, as affordances. As Heidegger (1927/1962) famously argued, it is only when my smooth coping breaks down (say, when I run out of coffee, or when the cup breaks) that the objective properties of the cup become salient, or present in perceptual experience at all.

<sup>1</sup>Enactive accounts reject the rigid separation of perception, cognition, and action, emphasizing that organisms cope with their environment in rolling cycles of engagement in which the distinction between action, cognition, and perception is blurred. When such a distinction is made, enactivist thinkers typically resist the traditional picture that subordinates action to perception or cognition. Theorists who draw the distinction nevertheless emphasize the deep connection between perception, cognition, and action. There are good reasons to think that action is a precondition for perception or that perception is a form of action (Clark, 2016; Kirchhoff, 2016). As we shall see in Section "The Neurodynamics of Affordances," free-energy approaches frame perception and action as complementary ways of minimizing 'prediction error.' Our preference is to speak of rolling cycles of 'action-perception' to refer to the complex looping process whereby organisms cope with their environment. These cycles rely on various complementary computational strategies to minimize prediction error, which may (or may not) correspond to the traditional concepts of action, cognition, and perception.

The principal motivation for thinking of perception, cognition, and action in terms of engagement with affordances is that cognitive scientific accounts of these activities ought to be coherent with the phenomenology of action and perception in everyday life. Phenomenology tells us that there are dense interrelations between action and perception, that perception is mainly about the control of action, and that action serves to guide perception (Merleau-Ponty, 1945/2012, 1964/1968). Affordances provide a framework well suited to this task, allowing us to integrate phenomenological experience into our models of explanation in cognitive science (Varela, 1996; Petitot et al., 1999). As the story goes, in the wake of the behaviorist turn, experiential factors and mentalist language were banished from psychology (Watson, 1913; Skinner, 2011). Cognitive science rehabilitated mentalism, at least to some extent, in its postulation of cognitive states and processes (Fodor, 1975; Putnam, 1975). Most contemporary functionalist and mechanistic accounts of cognition, however, contend that it is possible to exhaustively explain a cognitive function by specifying its functional organization or the mechanism that implements that function (e.g., Craver, 2007; Bechtel, 2008). As we shall see presently, the perspectivist emphasis on the dynamics of the phenomenology of everyday life that characterizes enactive and ecological approaches allows us to account for cognitive functions with a conceptual framework that explicitly bridges the phenomenology of action and perception, system dynamics, and functionalist cognitive neuroscience.

## Landscapes and Fields

Affordances, as possibilities for action, are fundamentally interactional. Their existence depends both on the objective material features of the environment and on the abilities of different kinds of organisms. This dependence on interaction does not mean that affordances have no objective reality or generalizability (Chemero, 2003, 2009). Affordances exist independently of specific individual organisms; their existence is relative to sets of abilities available to certain kinds of organisms in a given niche. 'Abilities,' here, refers to organisms' or agents' capabilities to skillfully engage the environment, that is, to modulate their patterns of action-perception so as to couple adaptively to the environment. Without certain abilities, correlative opportunities for action are unavailable. Certain chimpanzees, for instance, are able to use rocks to crack nuts. But for nuts and rocks to afford cracking, the chimp must already be cognitively and physiologically equipped for nut-cracking. In Chemero's model of affordances, objectivity and subjectivity do not have separate ontological status; they co-exist and co-emerge relationally.

Building on Chemero (2003, 2009), Rietveld and Kiverstein (2014) define an affordance as a relation between a feature or aspect of organisms' material environment and an ability available in their form of life. 'Form of life' is a notion adapted from the later Wittgenstein (1953). A form of life is a set of behavioral patterns, relatively robust on socio-cultural or biographical time scales, which is characteristic of a group or population. We might say that each species (or subspecies), adapted as it is to a particular niche and endowed with specific adapted abilities, constitutes a unique form of life. Different human communities, societies, and cultures, with sometimes strikingly different styles of engagement with the material and social world, constitute different forms of life. There are thus at least two ways to change the affordances available to an organism: (i) by changing the material aspects of its environment (which may vary from small everyday changes in its architecture or configuration to thoroughgoing niche construction) and (ii) by altering its form of life or allowing it to learn new abilities already available in that form of life (interacting in new ways with an existing niche by acquiring new abilities through various forms of learning).

Following recent theorizing on affordances (Rietveld, 2008a,c; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014), we consider the distinction between the 'landscape' of affordances and the 'field' of relevant affordances. The claim is that, typically, organisms do not engage with one single affordance at a given time. The world we inhabit is instead disclosed as a matrix of differentially salient affordances with their own structure or configuration. The organism encounters the world that it inhabits as an ensemble of affordances, with which it dynamically copes and which it evaluates, often implicitly and automatically, for relevance. For an affordance to have 'relevance' here means that the affordance in question 'solicits' the individual, concrete organism by beckoning certain forms of perceptual-emotional appraisal and readiness to act. This occurs because affordances are both descriptive and prescriptive: descriptive because they constitute the privileged mode for the perceptual disclosure of aspects of the environment; and prescriptive because they specify the kinds of action and perception that are available, situationally appropriate and, in the case of social niches, expected by others.

The 'landscape' of affordances is the total ensemble of available affordances for a population in a given environment. This landscape corresponds to what evolutionary theorists in biology and anthropology call a 'niche' (Odling-Smee et al., 2003; Sterelny, 2007, 2015; Wilson and Clark, 2009; Fuentes, 2014). A niche is a position in an ecosystem that affords an organism the resources it needs to survive. At the same time, the niche plays a role vis-à-vis other organisms and their niches in constituting the ecosystem as a whole. A typical ecosystem (that is, a physical environment where organisms can live) has multiple niches, which have some degree of internal structure: affordances stand in a variety of dynamic relationships (one possibility for action leads to, depends on, reveals, hides, or enables others; Pezzulo and Cisek, 2016). Thus, the niche is the entire set of affordances that are available, in a given environment at a given time, to organisms that take part in a given form of life. More narrowly, a niche comprises the affordances available to the group of organisms that occupy a particular place in the ecosystem (or, in the case of humans, in the social world) associated with, and partly constituted by, a form of life.

The 'field' of affordances, on the other hand, relates to the dynamic coping and intelligent adaptivity of autonomous, individual organisms. The field refers to those affordances that actually engage the individual organism at a given time. Of those affordances available in the landscape, some take on special relevance as a function of the interests, concerns, and states of the organism. These relevant affordances constitute the field of affordances for each organism. They are experienced as 'solicitations,' in that they solicit (further) affective appraisal and thereby prompt patterns of 'action readiness,' that is, act as perceptual and affective prompts for the organism to act on the affordance (Frijda, 1986, 2007; De Haan et al., 2013; Rietveld et al., 2013). This engagement will vary in complexity, conformity, and creativity from pre-specified or pre-patterned ways of acting to "free" improvisation, as we shall see below<sup>2</sup>.

The field of affordances changes through cycles of perception and action. Changes in the situation that the organism engages give rise dynamically to different solicitations, as a function of the state of the organism, much as a physical gauge field gives rise to different potentials as a function of the local forces (Sengupta et al., 2016). Consider the action of drinking a cup of coffee. The filled cup affords a gradient (grasping, sipping), that is, a potential for coupled engagement. When generated by the organism-environment system, this gradient can be experienced by the organism as a solicitation. The gradient is dissipated through engagement. The experience of satiation that follows drinking, combined with the fact that the cup has been emptied, alters the field of affordances, which, as indicated, changes as a function of the states of organism and niche. Thus, the gradient is 'consumed,' or dissipates, after successful engagement.
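The landscape/field distinction and the coffee-cup example can be made concrete with a toy sketch in Python. This is an illustration under stated assumptions, not a model proposed in the literature cited here: it assumes that an affordance can be represented as a (feature, ability, action) triple, that the landscape is every such relation realized by a niche's features and a form of life's abilities, and that the field is the subset whose salience, given the organism's current state, exceeds a threshold. The names `Affordance`, `landscape`, `field_of`, and `make_salience`, the 0.5 threshold, and all salience values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affordance:
    feature: str  # aspect of the material environment (hypothetical label)
    ability: str  # ability available in the form of life (hypothetical label)
    action: str   # possibility for action afforded by the relation


def landscape(features, abilities, relations):
    """Landscape: every known feature-ability relation realized in this niche."""
    return {
        Affordance(f, a, act)
        for f, a, act in relations
        if f in features and a in abilities
    }


def field_of(landscape_set, salience, threshold=0.5):
    """Field: the subset of the landscape that currently solicits the organism.
    The hard threshold is an assumption made for simplicity."""
    return {aff for aff in landscape_set if salience(aff) > threshold}


# Hypothetical feature-ability-action relations.
RELATIONS = [
    ("filled cup", "grasping", "sip"),
    ("paved road", "walking", "walk"),
    ("nut and rock", "nut-cracking", "crack"),
]


def make_salience(thirst):
    # Hypothetical appraisal: sipping is as salient as the organism is thirsty;
    # walking has a fixed, low salience in this toy scenario.
    def salience(aff):
        return thirst if aff.action == "sip" else 0.3
    return salience


abilities = {"grasping", "walking"}  # no nut-cracking in this form of life

# Before drinking: a filled cup is present and the organism is thirsty.
before = landscape({"filled cup", "paved road"}, abilities, RELATIONS)
print(sorted(a.action for a in before))                                # ['sip', 'walk']
print(sorted(a.action for a in field_of(before, make_salience(0.9))))  # ['sip']

# After drinking: the cup is empty (niche changed), thirst is low (state changed).
after = landscape({"paved road"}, abilities, RELATIONS)
print(sorted(a.action for a in field_of(after, make_salience(0.1))))   # []
```

Emptying the cup changes the niche, and satiation changes the organism's state; both shrink the field, a discrete stand-in for the 'consumed' gradient. A dynamical treatment would replace the hard threshold with continuously varying solicitation strengths.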

## Meaning and Affordances

Not all affordances are of the same kind. Here we draw on Grice's theory of meaning to suggest an approach to the varieties of cultural affordances in terms of their dependence on content-involving conventions. We argue that the affordances in human niches (what we call generally 'cultural' affordances) are of two distinct kinds: 'natural' and 'conventional' affordances.

Grice's theory of meaning, elaborated in a series of papers in the philosophy of mind (Grice, 1957, 1969, 1971, 1989), and later refined by Sperber and Wilson (1986), Levinson (2000), and Tomasello (2014), is often termed 'intention-based semantics' and is closely associated with his account of 'implicature.' On a Gricean account, meaning lies in a speaker's communicative intent, that is, in what she intends to convey through an utterance. Grice elaborated the first formula of his theory of meaning in these terms (using the subscript NN to signify 'non-natural'):

"A meant<sub>NN</sub> something by X" is roughly equivalent to "A uttered X with the intention of inducing a belief by means of the recognition of this intention" (Grice, 1989, p. 19)

Taking this model beyond the dyadic sphere of conversational implicature, Grice later attempted to explain how "timeless" (that is to say, durable and widely shared) conventions of meaning are recognized in a shared cultural repertoire:

"x means<sub>NN</sub> (timeless) that so-and-so" might at a first shot be equated with some statement or disjunction of statements about what "people" (vague) intend (with qualifications about "recognition") to effect by x (Grice, 1989, p. 220)

In the subsequent 'relevance' account, Sperber and Wilson (1986) recast this automatic 'first shot' recognition of conventional meaning as a process in which human minds scan the environment for salient, meaning-generating cues and stop processing once those cues are secured.

Our model draws on Grice to describe the stabilization of cultural cues as affordances. Key to our approach is the implied ontological and epistemic status of other minds (that is, the intentions of 'persons') in the embodied cognitive work required in the 'recognition,' or more precisely, the enactment of meaning. Our proposal, then, is to follow Grice in understanding the thought, affect, and behavior of human agents as determined by implicit expectations about others' expectations. Specifically, we argue that humans behave according to the way they expect others to expect them to behave in a given situation (see **Figure 1**)<sup>3</sup>. As we shall explicate below, we contend that humans operate (often pre-reflectively) within the landscape and field of possibilities for variations in action<sup>4</sup> as a function of their expectations about what others expect of them in specific contexts (see **Figure 2**).

<sup>2</sup>Some might express unease at the mixed language we use, which straddles phenomenology, system dynamics, and cognitive functions. We take this as a virtue of the multilevel nature of the explanatory framework provided by the notion of affordances, which is operative at all of these different descriptive levels. Readers who would prefer to keep phenomenological description distinct from other explanatory levels (i.e., neural, social, cultural levels of explanation) can replace our talk of directly modulating the landscape or field of affordances with a more phenomenologically neutral concept, such as the organism's 'selective openness' (Bruineberg and Rietveld, 2014). With this terminology, we might say that changes in the patterns of activity in the organism (states, interests, etc.) and the environment shape the organism's selective openness to saliencies.

<sup>3</sup>This basic cognitive formula for sociality requires three orders of automatic intentionality; that is to say, an implicit, non-narrative, hypothesis-generating, error-reduction scenario of the "I think they think I think" variety that can be translated as "what would relevant others expect me to think/feel/do in this situation?"

<sup>4</sup>The notion of variation and improvisation in action within a convention is very important. Humans do not simply obey prescribed expectations, but also resist, transgress and transform them. Specific fields of joint-intentional affordances, thus, invariably entail different licenses for improvisation on expected behavior. The background formula for action is not simply "what would others expect me to do here?" but also "how much license or room to improvise do I have here given what the set of local cues tells me about others' expectations and the norms that should otherwise govern my behavior in this specific situation?"

These revisions to Grice's model of meaning matter for our framework because they highlight the dependence of certain kinds of affordances on joint intentionality and on effective social and cultural normativity and conventionality, or equivalently, on the shared expectations (both implicit and explicit) that codetermine the affordance landscape and local field dynamics. Grice (1957) distinguished between natural and non-natural forms of meaning, emphasizing the latter in most of his work. Natural meaning is a relation between two things that are correlated. Smoke 'means' fire because tokens of smoke reliably correlate with tokens of fire. Similarly, (certain kinds of) spots mean measles (understood not as the popular category but as the biomedically recognized infection with a particular virus). Non-natural meaning instead depends on the capacity of individual agents to exploit explicit and implicit social 'conventions' (in the wide sense of locally shared norms, values and moral frames, expectations, ontologies, etc.) to infer the intentional states of other agents and thereby engage them or engage aspects of the environment with them. Red traffic lights, in virtue of convention (and law), 'mean' stop, and hence afford (and mandate) stopping; this is made possible by the specifically human mastery of recursive inferences, both explicit and implicit, that agents make about other agents (Tomasello, 2014).

Recent work on information processing has extended Grice's framework to account for different kinds of information (Scarantino and Piccinini, 2010; Piccinini and Scarantino, 2011; Piccinini, 2015). A token informational vehicle x of kind X (that is, a sign, a pattern of neural activation, or what have you) carries 'natural information' about some information source y of kind Y just in case there are reliable correlations between X and Y. Natural information, in other words, cannot misrepresent, for it is non-semantic; it is not the kind of thing that can be simply true or false. Such information can be exploited and leveraged by a cognitive system to guide intelligent behavior. Conversely, 'non-natural information' (or, as we prefer to put it, 'conventional information') pertains to semantic, content-involving representations that depend on social norms and cultural background knowledge. Non-natural information allows an agent to make a correct inference about some aspect of an intentional system, e.g., other agents, language, and other symbolic systems such as mathematics. Non-natural information is semantic in that it obtains in virtue of satisfaction conditions (e.g., truth conditions). A vehicle carries this kind of information about some state of affairs just in case some (explicit or implicit) shared convention, in the sense outlined above, links the vehicle to what it represents.

In the psychological and anthropological literature, affordances are usually understood as interactional properties between organisms and their environment that can be individually discovered in ontogeny without social learning. Chimpanzees, for example, rediscover how to crack nuts with rocks in each generation without vertical social transmission of skills (Ingold, 2000, 2001; Howes, 2011; Moore, 2013). Most of what humans do, in contrast, is learned socially and requires complex forms of coordination. We suggest, however, that successfully learned human conventions that govern action are also best conceptualized as affordances. Such affordances depend on shared sets of expectations, reflected in the ability to engage immersively in patterned cultural practices, which reference, depend on, or enact folk ontologies, moralities and epistemologies. We might call these 'conventional' affordances.

An empty street affords being walked on or driven on to the lone pedestrian or driver. Yet affordances, especially those depending on conventions, may differ depending on context. A red traffic light, as we have seen, affords stopping, particularly in the presence of others, and especially in the presence (real or imagined) of police who are expected to intervene. But a driver might alter her behavior if she believes that no one is watching. A red traffic light in an empty street at 4:00 AM, thus, might afford transgression of the stopping rule following an inference about the absence of other minds likely to judge the agent. Departing from Grice and earlier theories of information processing (Dretske, 1995), one might understand the notion of information as probabilistic: to carry information implies only the truth of a probabilistic claim (Scarantino and Piccinini, 2010; Scarantino, 2015). Although this account was developed for natural information, we extend it here to conventional information, given the prominence of social improvisation. 'Conventions' need not be explicitly formulated as rules, and may instead originate in the actors' engagement with local backgrounds over time, that is, from non-contentful developmental experiences, learning, or participation in social and cultural practices (Piccinini, 2015; Satne, 2015).
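The probabilistic reading of information can be illustrated with a short Python sketch. It assumes one common way of operationalizing the claim attributed above to Scarantino and Piccinini: a signal carries natural information about a source when observing the signal changes the probability of that source, that is, when P(source | signal) differs from P(source). The function names (`marginal`, `conditional`, `carries_information`) and the observation counts are hypothetical and invented for illustration; they are not data.

```python
from fractions import Fraction

# Hypothetical observation counts, keyed by (source state, signal state).
SMOKE = {
    ("fire", "smoke"): 18,
    ("no fire", "smoke"): 2,
    ("fire", "no smoke"): 2,
    ("no fire", "no smoke"): 78,
}
CLOUDS = {
    ("fire", "cloudy"): 10,
    ("no fire", "cloudy"): 40,
    ("fire", "clear"): 10,
    ("no fire", "clear"): 40,
}


def marginal(joint, source):
    """P(source), estimated from the joint counts."""
    total = sum(joint.values())
    hits = sum(c for (s, _), c in joint.items() if s == source)
    return Fraction(hits, total)


def conditional(joint, source, signal):
    """P(source | signal), estimated from the joint counts."""
    signal_total = sum(c for (_, r), c in joint.items() if r == signal)
    return Fraction(joint.get((source, signal), 0), signal_total)


def carries_information(joint, source, signal):
    """Assumed criterion: the signal changes the probability of the source."""
    return conditional(joint, source, signal) != marginal(joint, source)


print(marginal(SMOKE, "fire"))                        # 1/5
print(conditional(SMOKE, "fire", "smoke"))            # 9/10
print(carries_information(SMOKE, "fire", "smoke"))    # True, though smoke is fallible
print(carries_information(CLOUDS, "fire", "cloudy"))  # False: no correlation
```

Smoke here is not a perfect indicator (sometimes there is smoke without fire), yet it still carries information in this probabilistic sense, which is the point of detaching information from truth. Extending the test to conventional information, as proposed above, would require conditioning on shared expectations, which this sketch does not attempt to model.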

A cultural artifact may have multiple affordances according to its embedding in larger webs of relationships that are part of the individual's history of learning and the expectations for the potential participation of others. Indeed, to operate with conventional affordances, agents must have shared sets of expectations—we must know what others expect us to expect.

Simple rule-governed models of sociality proceed on the assumption that conventions lead to stable, binary affordances, where satisfaction conditions are either met or not. However, cultural symbols and signs are usually polysemous and their interpretation depends on context. Moreover, variations in the way agents engage with affordances in practice often license what we could term 'skilled improvisation.' Rules and conventions can be followed slavishly, selectively ignored, deliberately transgressed, or re-interpreted to afford new possibilities. Natural dispositions for shared intentionality in what Searle (1991, 1992, 1995, 2010) calls the deep background, on this view, give rise to cooperative action not only through convention but also through iterative variations governed by modes of engagement with cultural affordances (Terrone and Tagliafico, 2014).

## THE NEURODYNAMICS OF AFFORDANCES

Some aspects of culture clearly involve content in the representational sense of the term: namely, those affordances that depend on conventions, social normativity, and the ability to improvise from a joint-intentional background enriched by cultural learning. Here, we aim to contribute to the effort to explicate the mechanisms by which basic minds are scaffolded into more elaborate content-involving processes. Explaining agents' engagement with contentful affordances requires a theory of cultural content and representations.

Our hypothesis, to be explicated below, is that feedback loops mediating shared attention and shared intentionality are the principal mechanism whereby cultural (especially conventional) affordances are acquired. Before proceeding, however, we must face an objection stemming from tensions between our enactivist-embodied-ecological framework and our aim of providing a theory for the acquisition of semantic content. We have suggested that conventional affordances depend on shared expectations, perspective-taking, and even mindreading abilities. However, proponents of radical embodiment and enactivism argue that cognition can be understood as the coupling of an organism to its niche through dynamical processes, without any need to invoke representational processes and resources like explicit expectations and mindreading (Varela et al., 1991; Gallagher, 2001, 2008; Thompson, 2007; Chemero, 2009). On these accounts, classical theories of cognition (Fodor, 1975; Marr, 1982), which modeled cognition as the rule-governed manipulation of internal representations, radically misconstrue the nature of agents' intentional engagement with their worlds. The claim, then, is that much cognition can (indeed, must) be explained by appealing only to dynamical coupling between organism and environment.

Rejecting the claim that cognition necessarily involves representations, radical enactivists insist that basic cognitive processes ('basic minds') can function entirely without content (Thompson, 2007; Hutto and Myin, 2013). The argument, then, is that minds, especially basic minds like those of simple organisms (and many of the unreflective embodied engagements of more complex minds), do not require content. They only require adequate forms of coupling, which need bear no content at all. Adequate coupling only requires an organism to leverage correlations that are reliable enough to be exploited for survival. This poses a challenge to a theory like ours, which aims to explicate the acquisition of cultural content in the form of conventional affordances. In this section, we accommodate this radical minimalism about representations and semantic content while sketching a neural computational account of the scaffolding of cultural affordances.

## Computation, Representation, and Minimal Neural Models

Recent work on computation and neurodynamics helps to clarify the scope of radical arguments against content-involving, representational theories of cognition. Although older semantic theories view computation as the processing of representations (with propositional content and satisfaction conditions), more recent theories do not make this assumption. The 'modeling view' of computation (Grush, 2001; Shagrir, 2006, 2010; Chirimuuta, 2014) suggests that computation in physical systems (calculators, digital and analog computers, neural networks) employs a special kind of minimal, structural or analogical model based on statistical correlations (O'Brien and Opie, 2004, 2009, 2015). On this view, a computational process is one that dynamically generates and uses a statistical model of a target domain (say, things in the visual field). The model is said to 'represent' that domain only in the sense that the relations between its computational vehicles (digits, neural activation patterns, or what have you) preserve the higher-order statistical, structural-relational properties of the target domain, which can be leveraged to guide adaptive action. We might call this 'weak' (nonpropositional) content, based on structural analogy between vehicle and target domain (O'Brien and Opie, 2004, 2009, 2015). Such statistical models are much more minimalistic than traditional representational theories of mind, which require that internal representations bear propositional content (Fodor, 1975). Even more minimalistic accounts of computation are available. Computation can be defined mechanistically, as the rule-governed manipulation of computational (rather than representational) vehicles (Milkowski, 2013; Piccinini, 2015). On the mechanistic account, computations (digital, analog, neural) can occur without any form of semantic content (Scarantino and Piccinini, 2010; Piccinini and Scarantino, 2011).
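The modeling view's notion of 'weak' structural content can be illustrated numerically. In the minimal sketch below (our illustration, not O'Brien and Opie's own formalism), vehicle states count as structurally analogous to a target domain when the pattern of distances among vehicles mirrors the pattern of distances among targets. The points and the rotation-plus-scaling 'encoding' are arbitrary assumptions.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def structural_similarity(targets, vehicles):
    """Correlate the inter-item distances of targets with those of their vehicles.

    Only relations between items are compared, never the items themselves,
    which is what makes the resemblance 'second-order' or structural.
    """
    upper = np.triu_indices(len(targets), k=1)
    return np.corrcoef(pairwise_distances(targets)[upper],
                       pairwise_distances(vehicles)[upper])[0, 1]


# Hypothetical target domain: four locations in a 2-D scene.
targets = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])

# A structure-preserving encoding: rotate and rescale. The vehicle values
# differ from the targets, but their relational pattern is intact.
theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
faithful = 2.0 * targets @ rotation.T

# The same vehicles assigned to the wrong targets: relations no longer line up.
scrambled = faithful[[3, 2, 1, 0]]

print(round(structural_similarity(targets, faithful), 6))
print(structural_similarity(targets, scrambled) < 0.9)
```

Nothing in the faithful vehicles "looks like" the targets; what is preserved is only the relational structure, which is the sense of 'representation' the modeling view trades on.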

Thus, some of the newest theories of computation are minimalistic about the representational nature of neural processes. Whether the modeling-structural and the mechanistic minimal statistical models deserve the label 'representation' is debatable (Anderson and Chemero, 2013; Piccinini and Shagrir, 2014; Hutto, 2015; Clark, 2016). To some degree the conflict may be merely terminological. What matters for our purposes is to note that the minimalistic statistical-computational models in the cognitive system can be leveraged to guide skilled, intelligent, context-sensitive, adaptive behavior. This lends additional weight to the claim that basic minds are without strong, propositional, semantic content (Hutto and Myin, 2013; Hutto et al., 2014).

While this may be the case, human societies clearly transact in content-laden representations. We use language replete with images, metaphors and other symbols to tell stories and narrate our lives. We imagine particular scenarios or events, and we think about, describe, elaborate and manipulate these images or models in ways that treat them as pictures or representations of possible realities. Importantly, even on the radical view on offer here, nothing precludes such content-involving cognition. In recent discussions around the natural origins of content, it is hypothesized that neural computations can come to acquire representational content when coupled adequately to a niche or milieu through dense histories of causal coupling (Hutto and Myin, 2013; Hutto and Satne, 2015; Kirmayer and Ramstead, 2016). We suggest that immersive involvement of agents in patterned cultural practices during development, and the subsequent practice of the abilities acquired in enculturation, allows for the acquisition of stable cultural affordances. In the case of human beings, whose learning is mostly social, the function of the neural computations performed by a system becomes that of interfacing adequately with both representational and nonrepresentational aspects of culture so as to guide appropriate behavior.

## Free-Energy and the Neurodynamics of Affordances

The framework we think can account for the acquisition of cultural affordances by agents rests on recent work in computational neuroscience and theoretical biology on the 'free-energy principle.' The free-energy principle is a mathematical formulation of the tendency of autonomous living systems to adaptively resist entropic disintegration (Friston et al., 2006; Friston, 2010, 2012a, 2013a,b; Sengupta et al., 2016). This disintegration can be thought of as the natural tendency of all organized systems (which are by their nature far-from-equilibrium systems) to dissipate, that is, to return to a state of low organization and high entropy or disorder—in other words, to return to (thermodynamic) equilibrium. The free-energy principle states that the dynamics of living organisms are organized to maintain their existence by minimizing the information-theoretic quantity 'variational free-energy.' By minimizing free-energy, the organism resists entropic dissipation and maintains itself in its phenotypical steady-state, far from thermodynamic equilibrium (death).
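To make the quantity concrete: for a discrete generative model with hidden states s and an observation o, variational free-energy under an approximate posterior q(s) can be written F = Σ_s q(s)[ln q(s) − ln p(o, s)]. This equals the surprisal −ln p(o) plus a Kullback-Leibler divergence between q and the true posterior, which is never negative. The toy sketch below uses made-up prior and likelihood values to check numerically that F never falls below surprisal and equals it exactly when q is the true posterior.

```python
import numpy as np

# Hypothetical three-state generative model.
prior = np.array([0.7, 0.2, 0.1])          # p(s)
likelihood = np.array([0.1, 0.6, 0.3])     # p(o | s) for the observation actually received

joint = likelihood * prior                  # p(o, s)
evidence = joint.sum()                      # p(o)
surprisal = -np.log(evidence)               # -ln p(o)
posterior = joint / evidence                # exact p(s | o)


def free_energy(q):
    """Variational free-energy F = sum_s q(s) [ln q(s) - ln p(o, s)]; q must be strictly positive."""
    return float(np.sum(q * (np.log(q) - np.log(joint))))


for name, q in [("prior as q", prior),
                ("uniform q", np.full(3, 1 / 3)),
                ("exact posterior", posterior)]:
    print(f"{name:16s} F = {free_energy(q):.4f}   surprisal = {surprisal:.4f}")
```

Minimizing F with respect to q therefore drives q toward the posterior while keeping F as a computable upper bound on surprisal, a quantity the system cannot evaluate directly.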

One application of the free-energy principle in computational neuroscience is a family of models collectively referred to as 'hierarchical predictive processing' models, which instantiate a more general view of the brain as a 'prediction machine' (Frith, 2007; Friston and Kiebel, 2009; Friston, 2010, 2011, 2012b; Bar, 2011; Hohwy, 2013; Clark, 2016; for empirical evidence, see Adams et al., 2016). In this framework, the brain is modeled as a complex dynamical system, the main function of which is to 'infer' (in a qualified sense) the distal causes of its sensory stimulation, starting only from its own sensory channels. The strategy employed by the brain, according to this view, is to use a 'generative model' of the distal causes and engage in self-prediction (Friston, 2010; Eliasmith, 2005). That is, the system's function is to predict the upcoming sensory state and compare it to the actual sensory state, while minimizing the difference between these two distributions (the prediction error) through ongoing modification of predictions or action on the environment (see **Figures 3** and **4**).
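The prediction-comparison loop just described can be sketched for a single hidden cause. The sketch assumes, purely for illustration, a linear generative mapping g(μ) = w·μ, a Gaussian prior over the cause, and fixed precisions; the estimate μ is revised by gradient descent on the sum of precision-weighted squared prediction errors. This is one standard way of writing predictive coding, not necessarily the formulation of the cited authors, and all parameter values are hypothetical.

```python
def infer_cause(sensation, w=2.0, prior_mean=1.0, pi_sensory=1.0, pi_prior=1.0,
                rate=0.05, steps=200):
    """Estimate a hidden cause by minimizing precision-weighted prediction error."""
    mu = prior_mean                              # start from the prior expectation
    for _ in range(steps):
        eps_sensory = sensation - w * mu         # sensory prediction error
        eps_prior = mu - prior_mean              # deviation from the prior expectation
        # Gradient descent on 0.5*pi_s*eps_s**2 + 0.5*pi_p*eps_p**2 with respect to mu
        mu += rate * (pi_sensory * w * eps_sensory - pi_prior * eps_prior)
    return mu


def exact_cause(sensation, w=2.0, prior_mean=1.0, pi_sensory=1.0, pi_prior=1.0):
    """Closed-form minimizer for the linear-Gaussian case, for comparison."""
    return (pi_sensory * w * sensation + pi_prior * prior_mean) / (pi_sensory * w ** 2 + pi_prior)


print(infer_cause(6.0), exact_cause(6.0))   # both approximately 2.6
```

The iterative version never computes the answer directly; it simply keeps revising its prediction until the weighted errors cancel, which is the sense in which the system "infers" the distal cause.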

'Generative models' are minimal statistical models, of the kind discussed above. The use by a system of generative models need not entail semantic content. Their function is to dynamically extract and encode information about the distal environment as sets of probability distributions. The information involved here can be natural or conventional in kind. The only entailment is that the system or organism must leverage its generative model to guide skilled intentional coupling. The system uses this generative model to guide adaptive and intelligent behavior by 'inverting' that model through Bayesian forms of (computational, subpersonal) inference, allowing it to leverage the probability distributions encoded in the model to determine the most probable distal causes of that distribution and to act in the most contextually appropriate way (Friston, 2010; Hohwy, 2013; Clark, 2016).
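'Inverting' a generative model amounts to applying Bayes' rule: the model specifies how likely each sensory observation is under each candidate distal cause, and inversion returns the probability of each cause given what was observed. In the minimal sketch below, the causes, observations, and likelihood values are all hypothetical, and the posterior after each observation is reused as the prior for the next (plain sequential Bayesian updating).

```python
import numpy as np

CAUSES = ["cat", "dog", "fox"]            # hypothetical distal causes
PRIOR = np.array([1 / 3, 1 / 3, 1 / 3])    # flat initial prior over causes

# Hypothetical generative model: p(observation | cause), one entry per cause.
LIKELIHOOD = {
    "bark": np.array([0.05, 0.80, 0.15]),
    "meow": np.array([0.90, 0.05, 0.05]),
}


def invert(prior, observation):
    """Bayes' rule: the posterior is proportional to likelihood times prior."""
    unnormalized = LIKELIHOOD[observation] * prior
    return unnormalized / unnormalized.sum()


def infer_sequence(observations, prior=PRIOR):
    """Fold a stream of observations into a belief, recycling each posterior as the next prior."""
    belief = prior
    for obs in observations:
        belief = invert(belief, obs)
    return belief


belief = infer_sequence(["bark", "bark", "meow"])
print("most probable cause:", CAUSES[int(np.argmax(belief))])
```

Because the hidden cause is assumed fixed here, the order of observations does not affect the final posterior; a model of changing causes would also need transition probabilities.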

FIGURE 3 | Hierarchical prediction error minimization frameworks. In the predictive processing approach, the main activity of the nervous system is to predict upcoming sensory states and minimize the discrepancy between prediction and sensory states ('prediction errors'). The information propagated upward to higher levels for further processing consists only in these prediction errors.

FIGURE 4 | A diagram of Bayesian inference in predictive processing architectures. The dynamics of such systems conform to the principles of the Bayesian statistical inference framework. The Bayesian statistical framework is central to predictive processing architectures, for the latter assume that neural network interactions operate in a way that maximizes Bayesian model evidence. Bayesian methods allow one to calculate the probability of an event taking place by combining the 'prior probability' of this event (the probability that such an event takes place before considering any evidence) with the 'likelihood' of that event, that is, the probability of the observed evidence given that event. This allows the Bayesian system to calculate the 'posterior probability' of the event, that is, the revised probability given any new available evidence. Prior probabilities are carried by predictions (green arrows) issued by the generative model units (green units). Likelihoods are carried by prediction errors (red arrows) issued by the error units (red units). In the 'empirical Bayes' framework, the system can then use the posterior obtained from one iteration as the prior in the next iteration. Predictions issued from the generative models, which encode prior beliefs, propagate down and across the hierarchy (through backward and lateral connections) and are leveraged to guide intelligent adaptive action-perception. This leveraging is achieved by canceling out (or 'explaining away') discrepancies, which encode likelihood, through rolling cycles of action-perception. This same process allows the system to learn through plastic synaptic connections, which are continuously updated through free-energy minimization in action-perception. The system thus continuously and autonomously updates its 'expectations' (Bayesian prior beliefs) in rolling cycles of action-perception.

How does this inversion take place? Generative models are used to generate a prediction about the upcoming sensory distribution. Between the predicted and actual sensory distributions, there almost always will be a discrepancy ('prediction error'), which 'tracks' surprisal (in the sense that it contributes to the free-energy, which, mathematically, is an upper bound on that quantity). The free-energy principle states that all living systems act to reduce prediction error (and thereby implicitly resist the entropic tendency toward thermodynamic equilibrium, that is, dissipation and death). This can occur in one of two complementary ways: (i) through action, where the best action most efficiently minimizes free-energy by making the world more like the prediction ('active inference'); and (ii) through perception and learning, by selecting the 'hypothesis' (or prediction, which corresponds to the probable distal cause of the sensory distribution) that best minimizes error, or changing the hypotheses when none fits or when one fits better (Friston, 2011, 2013a; Friston et al., 2012a,b; Friston and Frith, 2015a,b). Given that generative models embody fine-grained statistical information about the distal environment at different scales, the top-down prediction signals (produced by higher levels in the processing system) provide crucial contextualizing information for the activity of lower levels in the predictive hierarchy, rendering the feedforward error signal contextually sensitive and adaptive (see **Figure 5**).
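The two complementary routes to reducing prediction error can be contrasted in a deliberately crude toy, in which one number stands for the prediction and another for the sensed state (hypothetical values, e.g., a preferred and a currently sensed body temperature). This is an illustrative assumption, not a model from the cited work: perception moves the prediction toward the sensation, while action changes the world, and hence the sensation, toward the prediction.

```python
def perceive(prediction, sensation, rate=0.2):
    """Perception/learning: revise the prediction toward what is sensed."""
    return prediction + rate * (sensation - prediction), sensation


def act(prediction, sensation, rate=0.2):
    """Active inference (toy version): change the world so the sensation approaches the prediction."""
    return prediction, sensation + rate * (prediction - sensation)


def run(update, prediction=37.0, sensation=30.0, steps=50):
    """Apply one route repeatedly; either way the prediction error shrinks geometrically."""
    for _ in range(steps):
        prediction, sensation = update(prediction, sensation)
    return prediction, sensation


print("perception:", run(perceive))  # prediction drifts to about 30
print("action:    ", run(act))       # sensation is pushed to about 37
```

Both routes end with almost no prediction error, but they leave different things changed: the model in the first case, the world in the second.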

The representational minimalism of embodied generative models nicely complements the representation-sparse phenomenology of affordances. Such minimal models might be described as exploiting (non-semantic) information for affordances, rather than (semantic) information about affordances (van Dijk et al., 2015); that is, the sensory array only carries information given certain uses of it by organisms (i.e., insofar as it serves as a statistical proxy). The 'internal representations' involved here might best be thought of as transiently 'soft-assembled neural ensembles,' adequately coupled to environmental affordances (Anderson, 2014).

It can be argued that predictive processing models complement enactivist and radical embodied approaches and are compatible with minimalism about representations, provided we do not interpret the statistical computations and error signal processes in a strong semantic, content-involving sense (Hutto and Satne, 2015; Kirchhoff, 2015a,b, 2016; Kirmayer and Ramstead, 2016). Generative models are simply embodied statistical models that are dynamically leveraged to guide intelligent adaptive behavior.

Generative models are embodied at different systemic levels and timescales, in different ways. As indicated, at the level of the brain, the predictive hierarchical architecture of neural networks comes to encode statistical regularities about the niche, which allow the organism to engage with the field of affordances in adaptive cycles of action-perception. But the embodiment of generative models does not stop at the brain. Indeed, one radical implication of the free-energy principle is that the organism itself is a statistical model of its niche (Friston, 2011, 2013b). States of the organism (i.e., its phenotype, behavioral patterns, and so forth) come to statistically model the niche that it inhabits over evolutionary timescales (Badcock, 2012). Thus, phylogeny conforms to the free-energy principle as well, because the effect of natural selection is to select against organisms that are poor models of their environments. Those organisms that survive and thrive are those that embody, in this literal sense, the best generative models of their niche. Organism phenotypes can be described as conforming to the free-energy principle over developmental timescales in morphogenesis as well (Friston et al., 2015b). Generative models are thus not only 'embrained,' but embodied in an even stronger sense, over the timescales of phylogeny and ontogeny. This strong embodiment allows one to interpret free-energy approaches in a non-internalist way and to counter some objections raised against earlier formulations of predictive processing approaches (e.g., Hohwy, 2013; Clark, 2016). This multilevel embodiment of the generative model, as we shall argue below, extends to the concrete, material, human-designed milieus (or 'designer environments') in which humans operate.

FIGURE 5 | … processing is layered. The layered (hierarchical) structure of the generative model allows the model to capture the nested structure of statistical regularities in the world. This inferential architecture effectively allows the system to leverage new information dynamically and implement a 'bootstrapping' process, whereby the system extracts its own priors from its dynamic interactions with the environment. Computationally, each individual layer has the function of extracting and processing information leveraged to cope with regularities at a given level or scale. In this example, information about the visual scene is decomposed into high, medium, and low spatial frequency bands. Typically, low spatial frequency features change more slowly than high spatial frequency features. As such, lower spatial frequency information is encoded higher up in the processing hierarchy, to guide lower-level, faster processing of higher spatial frequency information. The hierarchical or layered statistical structure of the generative model enables it to recapitulate the salient statistical structure of those systems to which it is coupled. As discussed in the text, this need not imply semantic content (but does not exclude it either).

Some generative models (in this wide sense) involve semantic content and others do not (they involve something more minimal than satisfaction conditions, i.e., reliable covariation). The study of minds without content is compatible with more extensively content-involving forms of (social and cultural) cognition that are scaffolded on such basic minds through processes of social learning and enculturation.

On the radical enactivist account, content-involving forms of intentionality emerge in the context of certain cultural practices in human forms of life (Hutto and Satne, 2015). Many of these practices involve multi-agent situations in which proper engagement requires forms of implicit perspective-taking and perspective-sharing (Sterelny, 2015). In some cases, such practices can involve explicit 'mindreading' as well, that is, inferring the beliefs, intentions, and desires of other agents as such (Michael et al., 2014). There is a long-running debate among anthropologists over the extent to which inferences about other people's mental states (as opposed to, say, bodily states) may reflect a folk psychology that is more pronounced among modern Western peoples (Robbins and Rumsey, 2008; Rumsey, 2013). This 'transparency of mind' folk psychology is contrasted in the literature with so-called 'opacity doctrines' found in other cultures, in which people's interior states are said to be 'opaque,' or unknowable. As recent multi-systems accounts of social cognition have shown, however, situations involving novel cues or too many orders of intentionality will often trigger 'higher' cognitive resources and compel humans to think about other people's intentions as such (Michael et al., 2014). Engagement with affordances in the human niche also often requires 'mindshaping,' as our interpretation of other agents' intentional profiles in turn shapes those same profiles through interpersonal loops (Sterelny, 2007, 2015; Zawidzki, 2013). Perspective-taking can be implicit and embodied in that organisms can act on situations by leveraging minimal models that encode information about other agents and their behavior without entailing the presence of semantic content (i.e., having satisfaction conditions). This is compatible, however, with the claim that perspective-taking and mindshaping abilities, in the human niche, often involve symbolically and linguistically mediated forms of communication, which substantially change the kind of affordance landscape available to human agents (Kiverstein and Rietveld, 2013, 2015).

Although the perspectivist focus on the dynamic embodied enactment of meaning in a shared social world is central to our understanding of cultural affordances (Gallagher, 2001, 2008; Fuchs and De Jaegher, 2009), our contention is that the acquisition of representational content in 'epidemics' of socially shared representations (Sperber, 1996; Claidière et al., 2014) entails that cognitive agents must be endowed with a neural-computational scaffolding adequate to such activities<sup>5</sup>. Even though basic cognition (and indeed, some forms of 'higher' cognition; Hutto and Myin, 2013) may be without content, given the symbolic and linguistic nature of human experience and culture, the human cognitive system must be equipped with the neural-computational resources needed to adequately couple with shared social representations, if we are to account for how the latter are transmitted stably and reliably. Semantic content is acquired through dense histories of embodied engagement with the environment. For humans, this involves participation in patterned, linguistically and symbolically mediated practices, which include patterns of shared attention and shared intentionality.

## Predictive Processing and Attention

One aspect of the architecture of predictive processing is crucial for our account of cultural affordances: the predictive processing model specifies a deep functional role for attention. Attention, on the predictive processing account, is modeled as 'precision-weighting,' that is, the selective sampling of high-precision sensory data (prediction error with a high signal-to-noise ratio; Feldman and Friston, 2010). The efforts of the cognitive system to minimize free-energy operate not only on first-order, correlational statistical information about the distal environment, but on second-order statistical information about the signal-to-noise ratio or 'precision' (that is, inverse variance) of the prediction error signal as well. This allows the system to give greater weight to less noisy signals that may provide more reliable information. Based on this information, the cognitive system balances the gain (or 'volume') on the units carrying prediction errors at specific levels of the hierarchy, as a function of precision. This gain control, in effect, modulates the influence of encoded prior beliefs on action-perception (Friston, 2010). Greater precision means less uncertainty; the system thus 'ups the volume' on high-precision error signals to leverage that information to guide behavior. Attention, then, is the process whereby synaptic gain is optimized to 'represent' (in the sense of reliably co-varying with) the precision of prediction error in hierarchical inference (Feldman and Friston, 2010; Clark, 2016).
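For the simplest case of a Gaussian prior and a Gaussian sensory signal, precision-weighting has a closed form: the updated estimate is the prior expectation plus the prediction error multiplied by a gain equal to the relative precision of the sensory signal. The sketch below assumes this linear-Gaussian case and uses hypothetical numbers; it shows how 'upping the volume' on a more precise error signal pulls the estimate toward the data.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, sensory_precision):
    """Return (gain, updated estimate) for a Gaussian prior and a Gaussian likelihood.

    Precision is inverse variance. The gain on the prediction error is the
    sensory precision relative to the total precision.
    """
    prediction_error = observation - prior_mean
    gain = sensory_precision / (sensory_precision + prior_precision)
    return gain, prior_mean + gain * prediction_error


for label, pi_sensory in [("noisy signal", 0.25), ("reliable signal", 4.0)]:
    gain, estimate = precision_weighted_update(0.0, 1.0, 10.0, pi_sensory)
    print(f"{label:16s} gain = {gain:.2f}  estimate = {estimate:.2f}")
```

The same prediction error of 10 moves the estimate by 2 when the signal is noisy and by 8 when it is reliable; only the gain, that is, the attentional weighting, differs.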

Precision-weighting is centrally important in these architectures and has been proposed as a mechanism of neural gating. Gating is the process whereby effective connectivity in the brain (Friston, 1995, 2011), that is, the causal influence of some neural units on others, is controlled by the functioning of distinct control units (Daw et al., 2005; Stephan et al., 2008; den Ouden et al., 2010). Clark (1998) calls these 'neural control structures.' For assessments of the empirical evidence, see Kok et al. (2012, 2013) and Friston et al. (2015a). Attention-modulated 'gating' is the central mechanism that allows for the formation of transient task- and context-dependent coalitions or ensembles of neural units and networks (Sporns, 2010; Park and Friston, 2013; Anderson, 2014).

<sup>5</sup>We should note a few limitations of the 'epidemic' metaphor: (i) representations are not merely transmitted through contagion, but through many different means, modes of communication, and practices that are themselves culturally mediated; (ii) they reside not just in individuals, but also in artifacts and institutions; and (iii) they are usually not simply replicated, but modified or transformed by each individual or institution that takes them up.

Thus, in the predictive processing framework, attention is the main driver of action-perception. Clark (2016, p. 148ff) describes possible implementations of this scheme in the brain. As with first-order expectations, the system encodes expectations about precision in the generative model, presumably in the higher levels of the cortical hierarchy (Friston et al., 2014). These signals, which carry context-sensitive second-order statistical information, then guide the balancing act between top–down prediction signals from the generative models and bottom–up error signals in attention (see **Figure 6**).

It has been argued that predictive processing models offer a plausible implementation for the neural-computational realization of affordance-responsiveness in the nervous system (Clark, 2016). As we shall see below, the free-energy model provides a mechanistic implementation of the dynamical gradient generation and consumption conception of affordance engagement examined above (Bruineberg and Rietveld, 2014). Free-energy is minimized through action and perception by the predictive processing hierarchy, which provides a mechanistic implementation of the descriptive-prescriptive aspect of affordances.

## CULTURAL AFFORDANCES AND SHARED EXPECTATIONS

We lack comprehensive accounts of how the conventions that give rise to sociocultural affordances are successfully internalized, both as implicit knowing how and explicit knowing that. As Searle and others (Sterelny, 2007; Tuomela, 2007; Tomasello, 2014; but see Zahavi and Satne, 2015) have shown, and as our model suggests, it takes higher-order levels of intentionality, meta-communication, and perspective-taking for symbolic conventions to be used and manipulated, and for more complicated, self-referential thinking ("I know that she thinks that I believe that she intends to X," etc.), collective intentionality, and multiple orders of mindreading to become possible.

FIGURE 6 | … and to different expectations being encoded in the predictive hierarchy. Based in part on Figure 1 in Friston et al. (2014).

The question for the present essay is how this framework can be scaled up to account for cultural and social cognition and learning. The everyday phenomenology of affordances is one of possibilities for action and their variations; in other words, of expecting certain nested action possibilities and prescriptions for action. In effect, the phenomenology of affordances is a phenomenology of expectations about available and appropriate agent-environment couplings. The neural-computational models derived from the free-energy principle traffic in predictions and conditional probability distributions (called 'beliefs' in Bayesian probability theory, without any claim to correspond to the folk psychological notion). Arguably, the phenomenological correlate of these Bayesian beliefs can, at least at some (presumably higher) levels of the predictive hierarchy, be thought of as (or at least co-determine) agent-level expectations. Our remarks below focus on clarifying how the social scaffolding of agents leads to their acquisition of representational content in regimes of shared attention.

## Skilled Intentionality and Affordance Competition

On the radical embodied view, the central feature of the dynamic relations between organisms and environment is the tendency of the organism to move toward an 'optimal grip' on the situation. The optima in question, as nearly everywhere in biology, are local optima, rather than a single global optimum. Under the free-energy framework, the 'optimal grip' can be understood as the pattern of action-perception that most minimizes variational free-energy. The free-energy minimizing dynamics of the predictive hierarchy might be described as a kind of weighted or biased competition between different affordances, the 'affordance competition' hypothesis (Cisek, 2007; Cisek and Kalaska, 2010; Pezzulo and Cisek, 2016). This model of action selection theorizes that the cognitive system appraises different trajectories for motor action simultaneously during action selection (that is, appraising a whole field of affordances in parallel and dynamically settling on the most salient affordance).
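The parallel appraisal and settling described by the affordance competition hypothesis is often modeled with mutually inhibiting accumulators. The sketch below is a generic leaky, competing-accumulator toy of our own choosing, not Cisek's specific model: each candidate affordance receives input proportional to a hypothetical salience, all candidates are evaluated in parallel, and lateral inhibition lets the system settle on a single winner.

```python
import numpy as np


def affordance_competition(salience, inhibition=0.5, leak=0.2, dt=0.1, steps=1000):
    """Settle a field of candidate actions through leaky, mutually inhibitory accumulation."""
    salience = np.asarray(salience, dtype=float)
    activation = np.zeros_like(salience)
    for _ in range(steps):
        # Each option is inhibited by the summed activation of all the others.
        lateral = inhibition * (activation.sum() - activation)
        change = salience - leak * activation - lateral
        # Euler step; activations are clipped at zero (no negative firing rates).
        activation = np.maximum(activation + dt * change, 0.0)
    return activation


# Hypothetical saliences for three affordances (e.g., jab, hook, uppercut).
final = affordance_competition([0.3, 0.9, 0.5])
print(final.round(3), "winner:", int(np.argmax(final)))
```

Because inhibition is stronger than leak in this parameter choice, the competition is winner-take-all: the most salient option ends up active and the others are silenced, rather than all options being blended.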

Sport science provides an illustration of this tendency toward optimal grip (Hristovski et al., 2006, 2009; Chow et al., 2011). Studies of the dynamic interplay between a boxer's stance and position, and the action possibilities available to them as a function of stance and position, have shown that punching bags afford different kinds of strikes to boxers as a function of the distance between boxer and punching bag. Boxers tend to move their bodies to an optimal distance from the punching bag, specifically, one that affords the greatest variety of strikes. This is a case of moving toward optimal grip. When observing a painting, we also move our bodies and our gazes in a way that maximizes our grip on the scene or details observed. We might call such dynamic adaptive engagement with the field of affordances in rolling cycles of action-perception 'skilled intentionality' (following Merleau-Ponty, 1945/2012; Rietveld, 2008b, 2012; Bruineberg and Rietveld, 2014).

Using the theoretical frameworks of dynamical systems and self-organization, Bruineberg and Rietveld (2014) have conceptualized this skilled intentionality as a kind of coping with the potentials that well up in the field of affordances, as a result of the dynamic relations between organism (with its phenotypical states, its states of action readiness, its concerns, etc.) and environment. More specifically, they suggest that skilled intentionality is the generation and reduction (or 'consumption') by the organism of a 'gradient' or potential tension in the field of affordances (which can be modeled using attractor dynamics). We sketched this approach in Sections "Landscapes and Fields" and "Meaning and Affordances," without the free-energy framework. The full significance of dissipative dynamics in the field of affordances can now be appreciated.

Affordances that are relevant to the organism at a given time (solicitations) drive system dynamics by soliciting rolling loops of action-perception and are prescribed and consumed or dissipated by those very dynamics (Tschacher and Haken, 2007). That is, solicitations are equivalent to potentials in the field of affordances, which act as attractors on the organism-environment dynamics, changing those affordances to which the organism is selectively open and receptive. The solicitation with which the organism engages, on this view, is the one that most effectively minimizes free-energy. Affect, attention, and affordances interact to sculpt a field of solicitations out of the total landscape of available affordances, adaptively and dynamically moving the organism toward an optimal grip on situations through action-perception. As the organism moves along a gradient toward an optimal grip, the gradient dissipates. The field of affordances thus changes dynamically along with perception-action and changes to states of the organism and environment. Responsiveness to the field, informed by states of the organism and environment, prescribes modes of optimal coupling. The radical embodied conception of cognition as skilled intentionality, then, can be modeled using systems theoretical models as a kind of selective responsiveness to salient available affordances or solicitations, modulated by states of the organism (concerns, interests, abilities) and states of the environment. This framework effectively bridges the descriptive levels of phenomenology, system dynamics, and cognitive functions or mechanisms.
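The idea that a solicitation acts like a potential whose gradient is consumed as the organism approaches an optimal grip can be illustrated with the simplest attractor model. The sketch below is our own toy, assuming a one-dimensional quadratic potential and a hypothetical 'optimal distance' borrowed from the boxing example: the state follows the negative gradient, the gradient dissipates as the attractor is approached, and a fresh gradient appears when a change in the situation moves the attractor.

```python
def gradient(position, optimum, stiffness=1.0):
    """Slope of the quadratic potential V(x) = 0.5 * stiffness * (x - optimum)**2."""
    return stiffness * (position - optimum)


def settle(position, optimum, rate=0.1, steps=100):
    """Follow the negative gradient, recording how much gradient remains at each step."""
    remaining = []
    for _ in range(steps):
        g = gradient(position, optimum)
        remaining.append(abs(g))
        position -= rate * g
    return position, remaining


# Hypothetical: a boxer starts 3.0 m from the bag; the soliciting optimum is 1.0 m.
position, trace = settle(3.0, optimum=1.0)
print(f"after settling: position {position:.3f}, gradient {trace[-1]:.5f}")

# The bag swings away: the optimum shifts and a fresh gradient wells up.
print(f"new gradient after the optimum moves: {abs(gradient(position, optimum=1.6)):.3f}")
```

Generation and consumption of the gradient thus appear as two phases of the same dynamics: approach dissipates the tension, and any change in organism or environment that relocates the attractor regenerates it.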

To date, most work on affordances has focused on motor control and basic behaviors related to dynamical embodied coping (e.g., Chemero, 2009; Cisek and Kalaska, 2010; Pezzulo and Cisek, 2016). For a theory of cultural affordances, the notion of affordances must be extended to more complex features of the social and cultural niche inhabited by humans (Heft, 2001; Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014). Quintessential human abilities like language, shared intentionality, and mind-reading/perspective-taking emerge from human forms of life and are patterned by human sociocultural practices (Roepstorff et al., 2010), which in turn involve sophisticated forms of social cognition. We live in a landscape of cultural affordances.

## Shared Expectations, Local Ontologies, and Cultural Affordances

The upshot of our discussion so far is a general concept of skilled intentionality as selective engagement with a field of affordances supported by embodied generative models. Skilled intentionality is a graded phenomenon. At one extreme, skilled intentionality consists in contentless direct coping. It has been suggested that this most basic form of intentionality, which Hutto and Satne (2015) call 'ur-intentionality,' acquires its tendencies for selective targeted engagement with the world in a 'teleosemiotic' process shaped by evolutionary history<sup>6</sup>. At this extreme, the only information (and affordances) needed are of the natural kind (exploitable reliable correlation). At the other extreme, we find stereotypical human intentionality, that is, symbolically dense and strongly content-involving forms of collectively and conventionally rooted intentionality (Kiverstein and Rietveld, 2015), which involve conventional information and affordances. This is a spectrum, and all points between these extremes are viable (at least prima facie). The teleological basis of this variation might be the needs, concerns, and abilities relevant to a given form of life (Bruineberg and Rietveld, 2014; Rietveld and Kiverstein, 2014), in specific social niches with their own idiosyncratic shared representations, symbols, etc.

Our claim here is that cultural affordances (especially conventional ones) form a coordinated affordance landscape, which is enabled by sets of embodied expectations that are shared by a given community or culture. Social niches and cultural practices generally involve not isolated, individual affordances or expectations but local landscapes that give rise to and depend on shared expectations. We submit that these shared expectations—implemented in the predictive hierarchies, embodied in material culture, and enacted in patterned practices—contribute to the constitution of the landscape of affordances that characterizes a given community or culture. Indeed, shared expectations modulate the specific kinds of intentionality that are effective in a given community, determining the forms taken by skilled intentionality, especially the shared skilled intentionality of the kind that constitutes a patterned sociocultural practice.

Patterned practices are specific ways of doing joint activities in domain-specific material-discursive environments (Roepstorff et al., 2010). Echoing recent work on the natural origin of semantic content (Hutto and Satne, 2015; Sterelny, 2015), we hypothesize that such ontologies, as socially shared and embodied expectations, come to be acquired by the individual agent through their participative immersion in specific patterned practices available in multi-agent, symbolically and linguistically mediated forms of social life.

Building on work in cognitive science as well as by Hacking (1995, 1999, 2002, 2004), Kirmayer and colleagues have argued for an embodied, enactivist approach to the study of the multilevel feedback or 'looping' effects involved in jointly mediated narratives, metaphors, forms of embodiment, and mechanisms of attention (Kirmayer, 2008, 2015; Seligman and Kirmayer, 2008; Kirmayer and Bhugra, 2009; Kirmayer and Gold, 2012). In human life, the regularities to which agents are sensitive are densely mediated (and often constituted) by cultural symbols, narratives, and metaphors, which may explicitly reference or tacitly assume particular ontologies. These mechanisms shape social experience and in turn are shaped by broader social contexts.

Elsewhere, we have suggested that local, culturally specific ontologies can be understood as sets of shared expectations (Kirmayer and Ramstead, 2016). A 'local ontology' can be defined as a mode of collective expectation: agents expect the sociocultural world to be disclosed in certain ways rather than others and to afford certain forms of action-perception and nested variations to the exclusion of others. A local ontology, then, is a set of expectations that are shared by members of a cultural community. We claim that these sets of shared expectations are installed in agents through patterned practices that result in enculturation and enskillment. In the framework explored above, these ontologies codetermine the exact affordances that are available in a given niche, for they prescribe specific ways of being, thinking, perceiving, and acting in context that are situationally appropriate.

These local ontologies need not be explicitly formulated as metaphysical theories. They are more often implicit and acquired through participation in patterned practices and the enactment of customs and rituals, or embodied in the social material reality itself (as symbols, places, stories). Such distinctively human practices take place in social niches rich with narratives, symbols, and customs, which enable individuals to respond cooperatively and, at times, to infer other agents' states of mind. Such practices may underlie everyday processes of person-perception. For example, as noted in the introduction, by age 5 children have acquired local ontologies and categories of personhood that reproduce the dominant set of biases, expectations, and representations of their cultures. They show a preference for the dominant group culture, often without being explicitly taught to do so, despite their caregivers not consciously holding such views, and even when these biases are not consonant with their own minority identities (Clark and Clark, 1939; Kinzler and Spelke, 2011). These tacit views of others may arise both from the ways in which local niches are structured by social norms and conventions and from regimes of attention and interpersonal interactions shaped by cultural practices (Richeson and Sommers, 2016). Biases in person-perception will, in turn, influence subsequent social interaction and cooperative niche construction in a cognitive-social loop (Sacheli et al., 2015).

As discussed above, a number of theorists of embodied cognition have criticized the view that intersubjective interactions require that human beings be endowed with the capacity for mind-reading, opting instead for an explanation in terms of embodied practices and coupling (Gallagher, 2001, 2008; Fuchs and De Jaegher, 2009). Although we readily grant the importance of such embodied coping for basic minds on which more elaborate cognition can be scaffolded, we advocate a middle ground that posits both embodied contentless abilities and more contentful mind-reading abilities (Michael et al., 2014; Tomasello, 2014; Sterelny, 2015; Veissière, in review). Indeed, the framework we have proposed, which posits predictive processing hierarchies apt to engage with both natural and conventional information and affordances, can accommodate both modes of cognition. The view that human societies rely on explicit and implicit forms of mind-reading does not commit us to intellectualism or to a strong content-involving view. The shared enactment of meaning, involving expectations about other agents, comes to constitute the shared, taken-for-granted meaning of local worlds, which in turn feeds back, in a kind of looping effect, to developmentally ground and scaffold the enactments of meaning by individual agents, by altering the shared expectations that are embodied and enacted in the social niche (Kirmayer, 2015). These shared ontologies shape experience by changing the abilities and styles of action-perception of encultured agents.

<sup>6</sup> 'Teleosemiotics' is teleosemantics minus the semantics, that is, using the teleosemantic framework developed by Millikan (1984, 2004, 2005) to explain how organisms develop selective intentional response tendencies without trying to provide thereby an account of semantic content (Hutto and Myin, 2013). See also Kiverstein and Rietveld (2015) for a complementary account of minimal intentionality as a contentless form of skilled intentionality.

## Shared Expectations and Implicit Learning

We have already appealed to Grice's theory of meaning to clarify some aspects of affordances. Affordances come in a spectrum, ranging from those that depend only on reliable correlation to those that depend on shared sets of expectations. Grice's account, as improved by others (Sperber and Wilson, 1986; Levinson, 2000; Tomasello, 2014), can help account for how we successfully learn to detect and selectively respond to context in situations that involve higher-order contextual appraisal, including perspective-taking and reading of others' goal-directed intent and actions. In higher-order, rule-governed semiotic contexts, the actual presence of others is not necessary for inferences to be made about the 'correctness' of affordances in terms of their correspondence to others' expectations, norms, or conventions. The general internalized idea of how others would interpret a situation and context (or how a culturally competent actor would respond) suffices for 'meaning' to be derived or inferred.

Most of us have never been explicitly taught precisely how to behave, sit, move, speak, take turns, and interact with others in shared spaces such as metros, elevators, hallways, airplanes, university classrooms, bars, dance floors, janitors' closets, or the many other spaces we know not to enter. As mentioned in the introduction to this essay, children acquire the dominant social norms and appropriate behavioral repertoires and responses without explicit instruction. Although we do occasionally receive explicit instructions, these do not seem necessary for normal social functioning; as Varela (1999) pointed out, we have acquired the implicit 'know-how' to act appropriately. That is, human beings acquire characteristic, stereotypical ways of doing and being in response to social contexts; in a sense, each of these constitutes a habitual 'microself' as we variously engage the world, from our 'getting-on-the-bus-self' to our 'having-lunch-self,' etc., where each self is a style of situationally adequate and socially appropriate coupling to a context. How do we acquire the ability to selectively detect and respond to such sociocultural affordances? Or, to rephrase the question in anthropological terms: how do we come to be socialized or enculturated for participation in shared worlds of expectations?

The highly stable conformity of behavior in all of these contexts goes beyond direct imitation (Michael et al., 2014). Many everyday situations involve coordinated action among many participants. Although some forms of coordinated group action can occur entirely through individual responses to local impersonal affordances (e.g., the swarming of birds), in order to read and master the social cues and scripts in complex human settings, the actors involved need to grasp the situation from the perspective of other actors. This perspective-taking is essential if each actor's appraisal of the situation is to have any counterfactual depth with regard to explicit social norms (e.g., inferring that one's behaving differently would fail to conform to others' expectations about correct behavior). However, as argued above, in some instances this perspective-taking might not involve explicit, content-involving processes; the expectations might simply be encoded and leveraged for the generation of adaptive behavior without mentalistic assumptions being made about agents at an explicit, conscious level. In either case, for a given space to afford the same engagements to a given population, that community must come to share a set of collective expectations: indeed, shared expectations about others' expectations about our expectations, and so forth.

## REGIMES OF SHARED ATTENTION AND SHARED INTENTIONALITY

The framework we have outlined for cultural affordances allows us to reconsider the natural origins of content. We hypothesize that the central mechanism whereby cultural affordances are acquired, especially conventional, content-involving affordances, consists in the looping or feedback relations between shared intentionality and shared attention. Shared intentionality is enacted in various concrete, materially embedded cultural practices and embodied as shared sets of expectations. Shared attention is one such form of shared intentionality. We suggest that shared attention is crucial because directed attention modulates the agent's selective engagement with the field of affordances. Given the nature of the predictive hierarchy, to wit, to extract explicit and implicit statistical information, directing an agent's attention is tantamount to determining which expectations (Bayesian prior beliefs) will be encoded in the hierarchy. This, in turn, leads to different sets of abilities being implemented by the gating mechanisms of the predictive hierarchy. Under the free-energy principle, action-perception is guided by attention (precision-weighting), and the gating process that is realized by attention itself rests on the expectations encoded in the generative models embodied by the organism. These high-level expectations about precision, which modulate allocations of attention (and thereby determine action-perception through gating), are leveraged to guide skillful intentional behavior. The sets of expectations embodied and enacted by organisms change the field of affordances. This mechanism, we submit, is exploited by culture in the acquisition of cultural affordances.
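The claim that directing attention determines which expectations get encoded can be illustrated with a toy learner that builds frequency-based expectations only for the cues it attends to. This is a hypothetical sketch, not a model from the cited literature: the event stream, the cue and outcome names, and the simple count-based estimate of P(outcome | cue) are all illustrative assumptions. Two learners exposed to the same environment end up with different priors when their attention is directed differently.

```python
from collections import Counter


def learn_expectations(events: list[tuple[str, str]],
                       attended: set[str]) -> dict[str, dict[str, float]]:
    """Estimate P(outcome | cue) from (cue, outcome) events, counting attended cues only.

    Hypothetical illustration of attention-gated learning: events whose cue
    is not attended leave no trace in the learned expectations.
    """
    counts: dict[str, Counter] = {}
    for cue, outcome in events:
        if cue in attended:  # attentional gating of what gets encoded
            counts.setdefault(cue, Counter())[outcome] += 1
    return {
        cue: {outcome: n / sum(c.values()) for outcome, n in c.items()}
        for cue, c in counts.items()
    }


# Hypothetical shared environment: the same stream of cue -> outcome pairings.
stream = [("door", "enter"), ("door", "enter"), ("door", "wait"),
          ("gaze", "look_away"), ("gaze", "look_away"), ("gaze", "return_gaze")]

if __name__ == "__main__":
    print(learn_expectations(stream, attended={"door"}))
    print(learn_expectations(stream, attended={"door", "gaze"}))
```

The first learner, whose attention is directed only to the hypothetical "door" cue, acquires no expectations at all about gaze behavior, even though gaze events were present in its environment.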

## Gating, Abilities, and Affordances

In the framework outlined above, we followed Rietveld and Kiverstein (2014) in defining an affordance as a relation between a set of features or aspects of the organism's material environment and the abilities available in that organism's form of life. We are now in a position to better define ability in terms of a gating control pattern, that is, a sequenced or coordinated process. An ability is simply the capability of an organism to coordinate its action-perception loops to skillfully engage an affordance in a way that is optimal under the free-energy principle. An ability, then, in the free-energy framework, includes a pattern of attention, in the specific sense employed by the free-energy framework. We use the term 'attention' not in the folk-psychological sense, as that effort or mechanism that allows us to attend to specific aspects of experience, but as the mechanism of precision-weighting that mediates neural gating and allows the agent to engage with specific affordances in action-perception cycles. Attention, in our technical sense, therefore modulates effective connectivity and, as such, determines the trajectories taken by the rolling cycles of action-perception. Typically, in the case of human agents, such patterns of attention are acquired over development.

We conjecture that we acquire our distinctively human abilities from our dense histories of temporally coordinated social interaction and shared cultural practices (Tomasello et al., 2005; Roepstorff, 2013). Attentional processes are central to this enculturation and installation of shared semantic content. In particular, the landscape of affordances available to the infant is sculpted, through joint-attentional practices that reflect sociocultural norms, into a field of relevant solicitations. Thus, participation in patterned practices allows the installation of socially, culturally, and situationally specific expectations, which, once acquired, determine agent allocations of attention (the acquisition of abilities) and, as a result, guide action-perception.

Joint (and, eventually, shared) attentional processes (Tomasello, 2014) provide a central mechanism through which the individual is molded to conform to specific group expectations and participate in forms of cooperative action. Joint and shared attention alters the field of affordances by directing the agent to engage with specific affordances, marking them out as relevant, and making them more salient. Given the nature of the predictive hierarchy, that is, to automatically extract statistical information about the distal world in its dynamic engagement (in action-perception), the agent will encode the regularities of the solicitations that it engages (that is, the relevant affordances to which it is directed in joint and shared attention). Of course, local practices of joint and shared attention themselves depend on agents sharing sets of expectations: the very expectations that become encoded by agents as they participate in these practices. Through participation in patterned cultural practices that direct attention in specific ways, the agent acquires sets of expectations that gave rise, in the first instance, to (earlier versions of) that very form of cooperative action (see **Figure 6**). Cultural affordances are thus mediated by recursive regimes of shared attention, of which joint-attention is a special, signal case (Tomasello, 2014).

The study of everyday social interactions reveals how regimes of joint attention shape our understanding and sensory experiences of being in our worlds. For example, Goffman, who pioneered studies of face-to-face interaction in modern societies, showed how the 'anonymized,' 'surface character' of life in cities is routinized through what he called 'civic inattention'—that is, through the many ways in which strangers avert their gazes, avoid conversations or physical contact, and reinforce private boundaries in the public sphere (Goffman, 1971, p. 385). We can follow Goffman's lead to consider how different regimes of shared and joint-attention mediate lived experiences of meaning and being. Civic inattention, for example, is a specific regime of attention, but it is certainly not an absence of attention. In Goffman's 'Invisible City' model, attentional resources are mobilized to not pay attention to certain features of the world, particularly other agents caught in a symbolically marked game of allegiances that renders them strange or invisible.

## Looping the Loop: Regimes of Shared Attention and Skilled Intentionality

As we have seen above, in the predictive processing scheme, attention, understood as precision-weighting of prediction error signals, is a central mechanism behind the dynamical trajectory of action-perception. The expectations about precision that guide action-perception are acquired in ontogeny and stored as high-level priors, which have the effect of arbitrating the balancing act between top–down prediction and bottom–up error signals. It follows that one pathway by which cultural affordances may be transmitted is through the manipulation of attention. This may occur in a variety of ways including what we might call 'regimes of shared attention.' In the model of affordances outlined above, this kind of attentional modulation involves carving a local field of affordances out of the larger landscape of available affordances through social practices. Local environments and their associated practices are designed to solicit particular patterns of coordinated attention from participants (Kirchhoff, 2015a; Clark, 2016). In effect, these patterns act as dynamical attractors on the field of affordances, directing action-perception in some ways rather than others (Juarrero, 1999).
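How precision-weighting arbitrates between top-down prediction and bottom-up error can be made concrete in the textbook Gaussian case: the estimate moves from the prior toward the observation in proportion to the relative precision (inverse variance) assigned to the sensory signal. The Python sketch below illustrates that standard Bayesian update under a single-level Gaussian assumption; it is not the authors' implementation or a full predictive processing hierarchy, and the function name and example numbers are hypothetical.

```python
def precision_weighted_update(mu_prior: float, pi_prior: float,
                              observation: float,
                              pi_sensory: float) -> tuple[float, float]:
    """Combine a Gaussian prior with a Gaussian observation (single-level sketch).

    The prediction error (observation - mu_prior) is scaled by the relative
    precision of the sensory signal: high sensory precision ("attending" to
    the input) lets the error revise the estimate; low precision leaves the
    top-down prediction largely intact.
    Returns (posterior mean, posterior precision).
    """
    if pi_prior <= 0 or pi_sensory <= 0:
        raise ValueError("precisions must be positive")
    prediction_error = observation - mu_prior
    gain = pi_sensory / (pi_sensory + pi_prior)  # precision-weighting of the error
    mu_post = mu_prior + gain * prediction_error
    return mu_post, pi_prior + pi_sensory


if __name__ == "__main__":
    # Same prior and observation; only the precision assigned to the input differs.
    print(precision_weighted_update(0.0, 1.0, 10.0, pi_sensory=9.0))    # input weighted heavily
    print(precision_weighted_update(0.0, 1.0, 10.0, pi_sensory=1 / 9))  # input weighted lightly
```

In this toy case the same prediction error moves the estimate most of the way to the observation in the first call and only a small fraction of the way in the second, which is the sense in which expectations about precision gate the influence of bottom-up signals.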

In this light, one can view social norms and conventions as devices to reduce mutual uncertainty, that is, consonant with the free-energy framework, as entropy-minimizing devices (Colombo, 2014). One must know 'what is in the minds' of others (such as what one would see and how one would interpret another's action generally and in context) in order to make a successful inference (whether explicit and content-involving, or implicit and correlational) about other agents in each situation. Goffman (1971) was hinting at similar processes with his comments on the 'faces' we learn to perform when we interact with others in different situations. We can be a mentor in one situation and a mentee in another; a father in one and a friend in another. In Goffman's famous comments on interaction in public, he describes (in other terms) how certain spaces afford more 'backstage,' 'off-screen' performances than others. The privacy of the home affords such relaxed 'offstageness,' and the bedroom and bathroom even more so. All these instances require inferential mind-reading or perspective-taking, that is, inferences about the presence or absence of other agents and their expectations as a normative guide for how one can behave. None of this depends specifically on whether these inferences consist in explicit mind-reading or more implicit forms of embodied coupled enactments; both are compatible with our framework.
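The suggestion that norms and conventions act as entropy-minimizing devices can be illustrated numerically: a shared convention concentrates the probability mass of what one expects another agent to do, which lowers the Shannon entropy (uncertainty, in bits) of that expectation. The Python sketch below is illustrative only; the two distributions are invented, hypothetical expectations over where a stranger entering an elevator will face, not data from Colombo (2014) or any other cited work.

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """Entropy in bits of a discrete distribution; zero-probability terms contribute nothing."""
    total = sum(probs)
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical expectations over four behaviors of a stranger entering an
# elevator (face the door, face the wall, face the other riders, other).
no_convention = [0.25, 0.25, 0.25, 0.25]
with_convention = [0.85, 0.05, 0.05, 0.05]

if __name__ == "__main__":
    print(f"without a shared convention: {shannon_entropy(no_convention):.3f} bits")
    print(f"with a shared convention:    {shannon_entropy(with_convention):.3f} bits")
```

The uniform case is the maximum-uncertainty baseline (2 bits for four options); any convention that makes one behavior much more probable for all parties reduces the uncertainty each agent must resolve about the others.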

Now, we might suppose that the distinctly human abilities with which we are endowed result simply from better evolved predictive machinery, that is, more computationally powerful predictive hierarchies (Conway and Christiansen, 2001). However, as we argued above, in human ontogeny, it is more likely that affordances are learned through regimes of imitation, repetition, positive and negative conditioning, and culturally selective forms of attention (Meltzoff and Prinz, 2002; Whitehouse, 2002, 2004; Roepstorff and Frith, 2004; Banaji and Gelman, 2013; Veissière, 2016). The capacity for cultural learning may itself be a cultural innovation (Heyes, 2012). Indeed, the feedback or looping mechanisms between cultural practices of scaffolding individual attention (what we called regimes of attention) are themselves determined by the local ontologies (shared sets of expectations) and abilities (acquired patterns of attention and gating) of agents in that community. Repetition and reiteration of patterns of social and technological interaction, as well as reward for 'correct' inferences that denote an adequate grasp of relevance, prescription, and proscription (e.g., when a child 'gets' that some X means some Y, or figures out an 'appropriate' combination of meaningful elements in any given context), come to shape attentional mechanisms in ontogeny, and assist the child in successfully inferring a set of rules and categories (the culturally sanctioned sets of shared expectations).

Joint attention is usually understood as occurring in a dyad of two people, or between agents in direct interactional spheres of communication, gaze-following, finger-pointing, or other verbal or non-verbal cues (Vygotsky, 1978; Tomasello, 2014). To address more complex social situations, it is useful to revise current sociocognitive models of joint-attention to encompass fundamentally triadic situations in which 'the third' is the socially constituted niche of affordances, supported by local ontologies and abilities.

Shared human intentionality is sufficient to project joint attention to larger groups in the process of forming joint goals and inferring from joint expectations. Crucially, it commonly takes place without any direct interaction between members, in the many routinized, anonymous, symbolically and linguistically mediated forms of sociality, including engagement with social institutions.

To go beyond the 'toy models' of dyadic joint attention to grasp the process of culture transmission, we need to study the dynamics of 'designer environments' (Goldstone et al., 2011; Salge et al., 2014). Human beings pattern their environments in a process of recursive niche construction, which in turn modulates the allocation of attention in individual agents, leading them to acquire certain sets of priors rather than others, in what Sterelny (2003) has called 'incremental downstream epistemic engineering.' This incremental process of constructing our own collective epistemic niches involves a kind of bootstrapping in which symbolically and linguistically mediated forms of human communication can be modeled as forms of re-entrant processing. Linguistically abled human beings produce patterned, structured outputs that become part of the material environment, and are subsequently picked up and further processed by other agents in ways that stabilize and elaborate a local social world (Clark, 2006, 2008). Indeed, human-constructed environments, which shape agent expectations and guide patterns of attention, can be viewed as another level of the generative statistical model of the niche, which human beings leverage to guide intelligent behavior in their sociocultural, symbolically and linguistically laden niches (Kirchhoff, 2015a; Clark, 2016). The prior knowledge that is leveraged in action-perception is thus encoded at multiple levels and sites: in the hierarchical neural networks, in the organism's phenotype (over phylogeny and ontogeny), and in patterned sociocultural practices and designer environments.

Thus, our suggestion is that regimes of attention, which mediate the acquisition of cultural affordances (both natural and conventional), are enacted through patterned practices (especially those which modulate the allocation of attention) and are embodied in sundry ways: in the predictive hierarchies of individual agents in a community, as encoded sets of expectations, and in the concrete social and cultural world, as constructed human environments, designed to solicit certain expectations and direct attention.

## CONCLUSION

We have outlined a framework for the study of cultural affordances in terms of neural models of predictive processing and social practices of niche construction. This approach can help account for the multilevel forms of affordance learning and transmission of affordances in socially and culturally shared regimes of joint-attention and clarify one of the central mechanisms that can explain the natural origins of semantic content. The concepts of affordance and skilled intentionality in ecological, radical embodied, and enactivist cognitive science can be supplemented with an account of the nature of affordances in the humanly constructed sociocultural niches. Turning to cultural niche construction, we argued in favor of a conception of local ontologies as sets of shared expectations acquired through the immersive engagement of the agent in feedback looping relations between shared intentionality (in the form of shared embodied expectations) and shared attention (modulated by regimes of attention). We elaborated Grice's account of meaning by highlighting the dependence of selective responsiveness to cultural affordances on shared and joint intentionality, modes of conventionality and social normativity. We ended with an account of the patterned regimes of attention and modes of social learning that might lead to the acquisition and installation of such ontologies and affordances, leading to agent enculturation and enskillment. We hope that our proposal of a framework for the study of cultural affordances will spur further research on multilevel, recursive, nested affordances and the expectations on which they depend.

## AUTHOR CONTRIBUTIONS

MR, SV, and LK made substantial contributions to the conception and design of the work. MR, SV, and LK drafted the work and revised it critically for important intellectual content; MR, SV, and LK have provided approval of the version to be published; MR, SV, and LK agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

## FUNDING

Work on this chapter was supported by grants from the Social Sciences and Humanities Research Council of Canada (Have We Lost Our Minds?, MR, award holder) and the Foundation for Psychocultural Research (Integrating Ethnography and Neuroscience in Global Mental Health Research, LK, PI).

## ACKNOWLEDGMENTS

We thank Paul Badcock, Jelle Bruineberg, Paul Cisek, Karl Friston, Michael Kirchhoff, Frank Muttenzer, Kris Onishi, Ishan Walpola, Eric White, and two reviewers for helpful discussions and comments on earlier versions of this paper. Thanks to Marie-Ève Lacelle and Mariana Zarpellon for help designing our figures.

## REFERENCES

Hacking, I. (2002). Historical Ontology. Cambridge, MA: Harvard University Press.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ramstead, Veissière and Kirmayer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Fluent Speakers of a Second Language Process Graspable Nouns Expressed in L2 Like in Their Native Language

#### Giovanni Buccino<sup>1</sup> \*, Barbara F. Marino<sup>2</sup> , Chiara Bulgarelli <sup>3</sup> and Marco Mezzadri <sup>3</sup>

<sup>1</sup> Dipartimento di Scienze Mediche e Chirurgiche, Università Magna Graecia, Catanzaro, Italy, <sup>2</sup> Dipartimento di Psicologia, Università degli Studi di Milano-Bicocca, Milan, Italy, <sup>3</sup> Dipartimento di Discipline Umanistiche, Sociali e delle Imprese Culturali, Università degli Studi di Parma, Parma, Italy

According to embodied cognition, language processing relies on the same neural structures involved when individuals experience the content of language material. If so, processing nouns expressing a motor content presented in a second language should modulate the motor system as if presented in the mother tongue. We tested this hypothesis using a go-no go paradigm. Stimuli included English nouns and pictures depicting either graspable or non-graspable objects. Pseudo-words and scrambled images served as controls. Italian participants, fluent speakers of English as a second language, had to respond when the stimulus was meaningful and refrain from responding when it was not. As predicted by the embodied view, motor responses were selectively modulated by graspable items (images or nouns), as in a previous experiment where nouns in the same category were presented in the native language.

#### Edited by:

Maurizio Tirassa, University of Turin, Italy

#### Reviewed by:

Prakash Padakannaya, University of Mysore, India

Christian Huyck, Middlesex University, United Kingdom

> \*Correspondence: Giovanni Buccino buccino@unicz.it

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 02 March 2017 Accepted: 17 July 2017 Published: 03 August 2017

#### Citation:

Buccino G, Marino BF, Bulgarelli C and Mezzadri M (2017) Fluent Speakers of a Second Language Process Graspable Nouns Expressed in L2 Like in Their Native Language. Front. Psychol. 8:1306. doi: 10.3389/fpsyg.2017.01306

Keywords: embodied cognition, second language, semantics, objects, nouns

## INTRODUCTION

Embodied cognition maintains that language processing involves the recruitment of the same sensory, motor, and even emotional neural substrates recruited when one executes, perceives, or feels the content of language material (Glenberg, 1997; Barsalou, 1999; Pulvermüller, 2001; Gallese and Lakoff, 2005; Zwaan and Taylor, 2006; Jirak et al., 2010; Buccino et al., 2016). If so, then processing graspable objects and the corresponding nouns should modulate the motor system in a similar manner. Observing objects and manipulating them recruit a sensorimotor circuit, including premotor and parietal areas (Jeannerod et al., 1995; Binkofski et al., 1999; Chao and Martin, 2000; Grezes et al., 2003a,b). On the one hand, this circuit codes for the intrinsic features of objects that make them appropriate for manual action; on the other, it selects and implements the most appropriate actions to manipulate those objects. There is evidence that the recruitment of the motor system during object observation is finely tuned to the intrinsic features of objects (Buccino et al., 2009; Makris et al., 2011).

As for nouns, several studies have shown that motor system activity is modulated by the intrinsic features (e.g., size, type of prehension required to interact with them) of the objects the nouns refer to (Glover et al., 2004; Tucker and Ellis, 2004; Lindemann et al., 2006; Myung et al., 2006; Bub et al., 2008; Cattaneo et al., 2010; Gough et al., 2012, 2013). Moreover, a specific modulation of hand motor responses has been shown during the processing of nouns referring to hand-related objects, as compared to foot-related objects (Marino et al., 2013). Two recent studies (Marino et al., 2014; Zhang et al., 2016) showed a similar modulation of the motor system during the processing of objects and nouns belonging to the same category. Additionally, an fMRI study (Desai et al., 2016) found that reading nouns denoting graspable objects activated areas also involved in action performance, thus supporting a grounded view of semantics. Taken as a whole, the current literature supports the notion that processing visually presented graspable objects and processing nouns referring to the same object category recruit common neural substrates, crucially involving the motor system (Ganis et al., 1996; Vandenberghe et al., 1996). In this context, it is worth recalling that pivotal neurophysiological studies (for review see Pulvermüller et al., 2009) showed an early recruitment of the motor system (within 200 ms of stimulus presentation) during language processing. Furthermore, in behavioral studies the modulation of the motor system during language processing may change over time, moving from an early interference (operating between 100 and 200 ms after stimulus onset) to a later facilitation (operating when responses are requested later than 200 ms after stimulus presentation), as maintained by some models (see Chersi et al., 2010; Garcia and Ibanez, 2016). What about the processing of nouns expressing natural graspable objects when presented in a second language (L2)? If language is embodied and grounded in the speaker's sensory, motor and even emotional representations that code for the language content, then fluent L2 speakers processing graspable nouns in L2 should show a modulation of the motor system similar to that found for nouns presented in their mother tongue (L1). In the present study, we assessed the modulation of motor responses in native Italian speakers with a high competence in English as L2 (Level C1 of the Common European Framework of Reference for Languages, CEFR), using the same paradigm as a previous study (Marino et al., 2014) in which verbal stimuli were presented in L1.
In that study, participants were asked to make a semantic judgment, namely whether the presented stimulus was meaningful or meaningless, by pressing a button when a go signal appeared 150 ms after stimulus onset. Native Italian speakers showed a specific modulation of hand motor responses (namely, slower motor responses) during the processing of graspable items (presented as either pictures or nouns) as compared to non-graspable ones. We interpreted these results as a manifestation of the motor system being engaged in two tasks at once (processing the graspable picture or word and giving the hand motor response), thereby supporting the notion that the motor system is necessary to process language material expressing a motor content. The experimental hypothesis underlying the present study was that, in a similar group of participants who were fluent speakers of English as L2, this task would lead to a modulation of motor responses similar to that found with comparable stimuli presented in L1, as predicted by embodiment.

## METHODS

## Participants

Twenty-six right-handed undergraduate students from the University of Parma took part in the study (20 females; mean age = 22.07 ± 1.76 years). They were native Italian speakers who had an English language proficiency at the reference level C1 on the CEFR scale (Common European Framework of Reference for Languages: Learning, Teaching, Assessment). All had normal or corrected-to-normal vision, and reported no history of language disorders. They were unaware of the purpose of the experiment and gave their informed consent before testing. The study was conducted in accordance with the Declaration of Helsinki (1964) and the procedure recommended by the Italian Association of Psychology (AIP).

## Stimuli

Thirty-six English nouns (see **Appendix 1**) referring to natural objects and 36 pseudo-words, as well as 36 digital color photos (see **Appendix 2**) depicting natural objects and 36 scrambled images, were used as stimuli. Eighteen nouns referred to natural graspable objects (e.g., "leaf") and 18 to natural non-graspable objects (e.g., "fog"). The pseudo-words were built by substituting one or two consonants or vowels in each noun (e.g., "leat" instead of "leaf"). With this procedure, pseudo-words contained orthographically and phonologically legal syllables for the English language. The photos depicted 18 graspable objects and 18 non-graspable objects. **Figure 1** shows an example of each category. The scrambled images were built by applying an Adobe Illustrator distorting graphic filter (e.g., zigzag) to the photos depicting natural objects so as to make them unrecognizable and thus meaningless. All the photos and scrambled images were 440 × 440 pixels.
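The pseudo-word procedure described above (substituting one or two letters in each noun) can be sketched as follows. This is an illustrative sketch, not the authors' procedure: the helper name `make_pseudoword`, the rule of replacing vowels with vowels and consonants with consonants, and the toy lexicon used to reject real words are all our assumptions. The sketch does not check phonotactic legality, which would still have to be verified by hand or against a syllable inventory.

```python
import random

VOWELS = set("aeiou")
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")

def make_pseudoword(word, lexicon, n_subs=1, rng=None, max_tries=1000):
    """Replace n_subs letters of word with another letter of the same class
    (vowel for vowel, consonant for consonant), rejecting candidates that
    are real words in the given lexicon (hypothetical helper)."""
    rng = rng or random.Random()
    positions = [i for i, ch in enumerate(word) if ch in VOWELS or ch in CONSONANTS]
    for _ in range(max_tries):
        letters = list(word)
        for i in rng.sample(positions, n_subs):  # distinct positions
            pool = VOWELS if letters[i] in VOWELS else CONSONANTS
            letters[i] = rng.choice(sorted(pool - {letters[i]}))
        candidate = "".join(letters)
        if candidate not in lexicon:
            return candidate
    raise ValueError(f"no pseudo-word found for {word!r}")

# Toy lexicon for illustration only; a real check would use a full word list
lexicon = {"leaf", "leap", "lead", "loaf", "fog"}
print(make_pseudoword("leaf", lexicon, n_subs=1, rng=random.Random(0)))
```

Keeping the letter class fixed preserves the consonant-vowel skeleton of the original noun, which makes legal syllables more likely but does not guarantee them.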

The English nouns used as verbal stimuli and the nouns of the objects depicted in the photos were matched for word length [4.61, 4.72, 4.39, and 5.22 average letter number for graspable nouns, non-graspable nouns, graspable images, and non-graspable images, respectively; F(3, 68) = 1.32, p = 0.27], syllable number [1.17, 1.33, 1.17, and 1.33 average syllable number; F(3, 68) = 0.76, p = 0.52], and written lexical frequency [4.37, 4.68, 4.47, and 4.53 average number of occurrences per million in Google search engine; F(3, 68) = 2.03, p = 0.12]. They were also matched for word imageability [i.e., how easily the word evokes a mental image of its referent; 5.36, 5.01, 5.23, and 5.17 average imageability score; F(3, 68) = 1.53, p = 0.21] and familiarity [i.e., how often one encounters the word referent in natural environments; 4.79, 4.29, 4.28, and 4.32 average familiarity score; F(3, 68) = 1.16, p = 0.33], as rated by 10 graduate and post-graduate students not involved in the experiment (7 females, mean age: 40.5 ± 12.9 years) on a seven-point scale (0 = absent; 6 = extremely present). The English nouns used as verbal stimuli had a reference level ranging between A1 and B1 on the CEFR scale, except for one word at the B2 level, whereas the nouns of the objects depicted in the photos ranged from A1 to C1.

### Experimental Design and Procedure

The experiment was carried out in a sound-attenuated room, dimly illuminated by a halogen lamp directed toward the ceiling. Participants sat comfortably in front of a PC screen (HP 21.5-inch LCD, 1,920 × 1,080 pixel resolution, 60 Hz refresh rate). The eye-to-screen distance was about 57 cm.
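A viewing distance of about 57 cm is a common convention because, at that distance, 1 cm on the screen subtends roughly 1° of visual angle. The sketch below estimates the visual angle of the 440 × 440 pixel stimulus frame. It assumes square pixels and that the nominal 21.5-inch diagonal equals the visible image diagonal; neither assumption is stated in the text, so the result is only an approximation.

```python
import math

def pixel_pitch_cm(diagonal_in, width_px, height_px):
    """Physical width of one pixel, assuming square pixels and that the
    nominal diagonal equals the visible image diagonal."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_in * 2.54 / diagonal_px

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle subtended by an object centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

pitch = pixel_pitch_cm(21.5, 1920, 1080)
frame_cm = 440 * pitch
print(f"{frame_cm:.1f} cm -> {visual_angle_deg(frame_cm, 57):.1f} deg")  # 10.9 cm -> 10.9 deg
```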

**Figure 2** shows the experimental procedure. Each trial started with a black fixation cross displayed at the center of a gray background. After a delay of 1,000–1,500 ms (to avoid response habituation), the fixation cross was replaced by a stimulus item, either a noun/pseudo-word or a photo/scrambled image. The verbal labels were written in black lowercase Courier New bold (font size = 24). Stimuli were centrally displayed and surrounded by a red (RGB coordinates = 255, 0, 0) 440 × 440 pixel frame (20-pixel-wide line). The red frame turned green (RGB coordinates = 0, 255, 0) 150 ms after stimulus onset; this color change was the "go" signal for the response. Participants were instructed to respond as fast and as accurately as possible by pressing a key on a computer keyboard, centered on their body midline, with their right index finger. They had to respond when the stimulus referred to a real object and refrain from responding when it was meaningless (go/no-go paradigm). Stimuli remained visible for 1,350 ms or until the participant's response. A custom program developed in the MATLAB environment was used for stimulus presentation and response time collection.
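The go/no-go rule and timing described above can be summarized as a small trial-scoring function. This is a hypothetical sketch: the function name, the outcome labels, the decision to treat any press before the go signal as an anticipation, and the 1,500 ms response deadline (150 ms go-signal delay plus 1,350 ms of stimulus display) are our assumptions rather than details of the authors' MATLAB program.

```python
from typing import Optional

GO_SIGNAL_MS = 150    # frame turns from red to green 150 ms after stimulus onset
DEADLINE_MS = 1500    # assumed: 150 ms + 1,350 ms of stimulus display

def score_trial(is_meaningful: bool, response_ms: Optional[float]) -> str:
    """Classify one go/no-go trial (hypothetical labels).

    is_meaningful: True for nouns and photos (go), False for pseudo-words
    and scrambled images (no-go).
    response_ms: key-press time from stimulus onset, or None if no press.
    """
    if response_ms is None:
        # Withholding the response is correct only on no-go trials
        return "miss" if is_meaningful else "correct_rejection"
    if response_ms < GO_SIGNAL_MS:
        return "anticipation"       # pressed before the go signal
    if not is_meaningful:
        return "false_alarm"        # responded to a meaningless item
    if response_ms > DEADLINE_MS:
        return "miss"               # too late to count as a response
    return "hit"

print(score_trial(True, 420.0))    # hit
print(score_trial(False, None))    # correct_rejection
```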

The experiment consisted of 1 practice block and 1 experimental block. In the practice block, participants were presented with 32 stimuli (4 photos of graspable objects, 4 photos of non-graspable objects, 8 scrambled images, 4 nouns of graspable objects, 4 nouns of non-graspable objects, and 8 nonsense pseudo-words) which were not used in the experimental block. During the practice block, participants received feedback ("ERROR") after a wrong response (i.e., responding to a meaningless item or refraining from responding to a real one), as well as for responses given before the go signal ("ANTICIPATION") or later than 1.5 s ("YOU HAVE NOT ANSWERED"). In the experimental block, the 144 items selected as stimuli were presented in random order, with the constraint that no more than three items of the same kind (verbal, visual) or referring to objects of the same category (graspable, non-graspable, meaningless) could be presented on consecutive trials. No feedback was given to participants. Thus, the experiment, which lasted about 20 min, consisted of 72 go trials (36 nouns of objects, 50% graspable and 50% non-graspable, plus 36 photographs of objects, 50% graspable and 50% non-graspable), 72 no-go trials (36 nonsense pseudo-words plus 36 scrambled images), and 32 practice trials, for a total of 176 trials. To sum up, the experiment used a 2 × 2 repeated-measures factorial design with Object Graspability (graspable, non-graspable) and Modality (verbal, visual) as the within-subjects variables.
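One simple way to obtain a random order that respects the run-length constraint described above is a greedy draw with restarts. The sketch below is our illustration, not the authors' randomization routine: it represents each trial only by its (modality, category) pair, draws the next trial with probability proportional to the remaining counts among trials that would not create a fourth consecutive repeat of either attribute, and starts over if it reaches a dead end. The trial list reproduces the 144 experimental trials described in the text.

```python
import random
from collections import Counter
from itertools import groupby

def max_run(values):
    """Length of the longest run of identical consecutive values."""
    return max(len(list(group)) for _, group in groupby(values))

def violates(seq, candidate, max_len):
    """True if appending candidate would repeat either attribute
    (modality or category) more than max_len times in a row."""
    if len(seq) < max_len:
        return False
    tail = seq[-max_len:]
    return any(all(prev[k] == candidate[k] for prev in tail)
               for k in range(len(candidate)))

def constrained_order(trials, max_len=3, rng=None, max_tries=10_000):
    """Greedy random ordering weighted by remaining counts; restart on dead ends."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        remaining = Counter(trials)
        seq = []
        while remaining:
            allowed = [t for t in remaining if not violates(seq, t, max_len)]
            if not allowed:
                break  # dead end: start over
            pick = rng.choices(allowed, weights=[remaining[t] for t in allowed])[0]
            seq.append(pick)
            remaining[pick] -= 1
            if remaining[pick] == 0:
                del remaining[pick]
        else:
            return seq  # all trials placed without violating the constraint
    raise RuntimeError("no valid order found; increase max_tries")

# The 144 experimental trials as (modality, category) pairs
trials = ([("verbal", "graspable")] * 18 + [("verbal", "non-graspable")] * 18
          + [("verbal", "meaningless")] * 36    # pseudo-words
          + [("visual", "graspable")] * 18 + [("visual", "non-graspable")] * 18
          + [("visual", "meaningless")] * 36)   # scrambled images
n_practice = 32
print(len(trials), len(trials) + n_practice)   # 144 176

order = constrained_order(trials, rng=random.Random(1))
print(max_run(m for m, _ in order), max_run(c for _, c in order))
```

Plain shuffling followed by rejection would almost never satisfy both constraints at once with 144 trials, which is why the sketch builds the sequence incrementally instead.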

## RESULTS

Trials with errors were excluded without replacement. Errors were not further analyzed given they were extremely rare (<5%). Three participants were excluded from the analysis because their error rate exceeded 10%. Response times (RTs) below 130 ms or above 1,000 ms were omitted from the analysis. This cut-off was established so that no more than 0.5% of correct RTs were removed (Ulrich and Miller, 1994).
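The exclusion steps described above (dropping participants whose error rate exceeded 10%, then trimming correct RTs outside 130 to 1,000 ms) can be sketched as follows, assuming RTs in milliseconds. The function names are hypothetical; returning the removed fraction makes it easy to check the 0.5% criterion mentioned in the text.

```python
def keep_participant(n_errors, n_trials, max_error_rate=0.10):
    """Keep a participant unless their error rate exceeds max_error_rate."""
    return n_errors / n_trials <= max_error_rate

def trim_rts(rts, lower=130.0, upper=1000.0):
    """Return the RTs within [lower, upper] ms and the fraction of RTs removed."""
    kept = [rt for rt in rts if lower <= rt <= upper]
    removed = (1 - len(kept) / len(rts)) if rts else 0.0
    return kept, removed

kept, removed = trim_rts([120, 300, 1200, 450])
print(kept, f"{removed:.0%}")  # [300, 450] 50%
```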

Median values of the remaining RTs were calculated for each combination of Object Graspability (graspable and non-graspable) and Stimulus Type (photo and noun). These data entered a 2-way repeated-measures analysis of variance (ANOVA) with Object Graspability and Stimulus Type as within-subjects factors. Partial eta squared values (ηp²) are reported as an additional measure of effect size for all significant ANOVA contrasts.
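Because both factors have two levels, each effect in this 2 × 2 within-subjects ANOVA has one numerator degree of freedom, and its F value equals the squared one-sample t statistic of the corresponding per-subject contrast, with n − 1 error degrees of freedom. The sketch below uses that identity to go from per-trial RTs to per-subject cell medians and then to the three F tests. The data layout and function names are our assumptions; a dedicated statistics package would normally be used instead.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median, stdev

def cell_medians(trials):
    """trials: iterable of (subject, graspable, verbal, rt_ms) for correct,
    trimmed trials. Returns one 2 x 2 table per subject, indexed
    [graspability: 0 = graspable, 1 = non-graspable][type: 0 = noun, 1 = photo]."""
    cells = defaultdict(list)
    for subject, graspable, verbal, rt in trials:
        cells[(subject, 0 if graspable else 1, 0 if verbal else 1)].append(rt)
    subjects = sorted({key[0] for key in cells})
    # median() raises StatisticsError if a subject has an empty cell
    return [[[median(cells[(s, g, t)]) for t in (0, 1)] for g in (0, 1)]
            for s in subjects]

def rm_anova_2x2(tables):
    """F tests for a fully within-subjects 2 x 2 design via per-subject
    contrasts; each contrast needs non-zero variance across subjects."""
    n = len(tables)
    contrasts = {
        "graspability": [(c[0][0] + c[0][1]) / 2 - (c[1][0] + c[1][1]) / 2 for c in tables],
        "stimulus_type": [(c[0][0] + c[1][0]) / 2 - (c[0][1] + c[1][1]) / 2 for c in tables],
        "interaction": [(c[0][0] - c[0][1]) - (c[1][0] - c[1][1]) for c in tables],
    }
    results = {}
    for name, d in contrasts.items():
        t = mean(d) / (stdev(d) / sqrt(n))  # one-sample t of the contrast
        results[name] = {"F": t * t, "df": (1, n - 1)}
    return results
```

The tables would be obtained with `cell_medians(trials)` and passed directly to `rm_anova_2x2`.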

The ANOVA revealed a main effect of Object Graspability [F(1, 22) = 11.87, p < 0.003, ηp² = 0.35], indicating that participants responded more slowly to stimuli referring to graspable objects (387 ms ± 73) than to stimuli referring to non-graspable objects (371 ms ± 62). There was also a main effect of Stimulus Type [F(1, 22) = 15.72, p < 0.001, ηp² = 0.42], reflecting slower responses to verbal than to visual stimuli (395 ± 62 vs. 360 ms ± 70).
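As a consistency check, partial eta squared can be recovered from each reported F ratio and its degrees of freedom, since ηp² = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The error df of 22 corresponds to the 23 participants retained after the three exclusions (26 − 3 = 23, minus 1). A minimal sketch:

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return F * df_effect / (F * df_effect + df_error)

print(round(partial_eta_squared(11.87, 1, 22), 2))  # 0.35 (Object Graspability)
print(round(partial_eta_squared(15.72, 1, 22), 2))  # 0.42 (Stimulus Type)
```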

**Figure 3** shows the main results. Response times are expressed as means of medians.

## DISCUSSION

During the processing of graspable objects and nouns presented in English, Italian participants who were fluent speakers of English as L2 showed the same kind of modulation of motor responses as participants in a previous experiment (Marino et al., 2014) in which the same kind of stimuli were presented in their L1. In detail, participants responded more slowly during the processing of graspable items as compared to non-graspable ones, independent of presentation modality (noun or picture). We propose that, as for L1, participants relied on the motor representations of potential hand interactions with the object expressed by the noun or depicted in the photo to solve the task. In this way, the motor system was engaged in two tasks at the same time, that is, processing the presented stimuli and performing a motor response. Hence participants paid a cost, as revealed by the slowing down of their motor responses. These results are relevant to the neuroscientific literature on L2. Ullman's differential hypothesis (Ullman, 2001) claims that L2 acquisition cannot depend on the same brain mechanisms that are used to process the native language. Consistently, in earlier studies of bilingual aphasics, the selective recovery of one language was often interpreted as evidence for different neural representations of L1 and L2 (Albert and Obler, 1978). More recently, several brain imaging studies have led to the notion that L1 and L2 are processed by the same neural structures (Perani and Abutalebi, 2005; Abutalebi, 2008). However, differential activations were found when the age of acquisition of L2 and the level of fluency were taken into account (Liu and Cao, 2016).
When considering grammatical and syntactic processing, several studies (Sakai et al., 2004; Dodel et al., 2005; Ruschemeyer et al., 2005, 2006; Golestani et al., 2006; Indefrey, 2006; Jeong et al., 2007) showed stronger activations in L2 speakers within areas classically known to be devoted to syntax (Grodzinsky and Friederici, 2006), including Broca's region and the adjacent left inferior frontal gyrus, left prefrontal cortex, basal ganglia and cerebellum. These studies included late bilinguals, and their findings have been interpreted as reflecting a greater effort in processing L2 as compared to L1. The very few studies that assessed early bilinguals showed that, as compared to late bilinguals, these individuals more strongly recruited the left inferior frontal gyrus and prefrontal cortex (Wartenburger et al., 2003; Hernandez et al., 2007). As for semantic processing, studies in the field show that L2 is essentially processed through the same neural substrates underlying L1 processing, including the anterior inferior frontal cortex and the supramarginal gyrus. Differences related to L2 are found in bilinguals with low proficiency and/or less exposure, in terms of greater engagement of the left inferior frontal gyrus or selective engagement of prefrontal areas. It should be underlined that the age of L2 acquisition seems to play no major role in the semantic domain (Perani and Abutalebi, 2005; Indefrey, 2006). In other words, L2 proficiency seems to be the main, if not the only, determinant in the semantic domain, since late bilinguals with native-like L2 proficiency activate the same areas for both languages. In the present study, we tested a rather homogeneous group of students with a high competence in English as L2 in a semantic task. In keeping with the current literature, the present findings clearly show that motor responses to verbal stimuli presented in L2 are modulated similarly to those presented in L1.
This, in turn, supports at the behavioral level the notion that the neural mechanisms (and possibly the neural substrates) underlying the processing of nouns in different languages are shared, and overlap with those necessary to process the corresponding objects when presented pictorially. As a whole, the experimental evidence supports the neural convergence hypothesis (Green, 2003), according to which the acquisition of an L2 relies on the already specified language system devoted to L1, and potential neural differences between L1 and L2 are overcome as proficiency in L2 increases. It is worth stressing that, unlike the previous study using L1 stimuli (Marino et al., 2014), in the present one we found a different role for modality of presentation. With L1 stimuli, motor responses to graspable objects were faster with nouns than with photos. In the present study, motor responses to graspable items were faster with photos than with verbal stimuli. We have no clear explanation for this finding. However, if it is true, as proposed above, that L2 verbal labels share the same neural representations as L1 ones, then it may be that while processing L2 verbal items participants also re-enacted the corresponding L1 verbal labels. This strategy, in turn, might have led to an additional cost when processing L2 items, as revealed behaviorally by a further slowing down of motor responses.

The present findings are also relevant to the embodiment literature. In a recent paper (Buccino et al., 2016), it has been suggested that meaning is strictly grounded in experience: the same neural mechanisms and neural substrates that underlie sensory, motor and even emotional experience are also recruited and re-enacted when individuals have to attribute a meaning to language expressing those experiences. According to this proposal, the meaning of the word "flower," for example, is not a particular flower or a bunch of flowers, nor even the stereotypical flower as a socially defined entity that each speaker has to grasp in order to understand the meaning of that word. On the contrary, the word "flower" points at a cluster of real and concrete flower-related experiences that speakers have had with that specific object called a flower.

A recent proposal in the field of linguistics (Dor, 2015) seems to reach similar conclusions. This proposal defines a word as a "discrete instructor of imagination" whose basic function is to refer to and describe a set of personal experiences of the speakers. In other words, in communication, words have the primary role of expressing, on the one hand, a set of experiences that the utterer wants to focus on or convey, and of raising, on the other (that is, on the reader's side), an analogous set of personal experiences. Following this theoretical framework, the evidence that motor responses given to nouns presented in L2 were similar to those given to nouns in L1 (and to similar objects presented pictorially) strongly supports the notion that, whatever the language used, at least for highly competent speakers, attributing a meaning to words implies re-enacting the neural substrates where experiences related to those words are coded. Since we used graspable vs. non-graspable items, we found a specific modulation of motor responses common to L1 and L2.

In our opinion, these findings have implications for teaching and learning a second language. As discussed above, the sensorimotor experience to which specific language elements refer appears central to language processing. If so, we believe that this notion is most relevant to second language learning and teaching: when a content has to be expressed and learned in a second language, it should refer to something that the learner has already experienced sensorially or motorically (Buccino and Mezzadri, 2015). This should lead language teachers (and learners) to adopt experience-based teaching methods, whereby the content to be taught is targeted to the learner and revolves around the learner's experience. If experience does not support the language elements to be taught, the teacher should encourage the development of specific sensorimotor experiences, which will then be verbally labeled. In other words, during the language teaching process, the approach to any new language input should start from the (re)activation of pre-existing knowledge and experience. In keeping with this general statement, there is evidence that action may improve the acquisition of a foreign language or, more generally, of new words (Macedonia and von Kriegstein, 2012; Kronke et al., 2013), and may even be effective in language rehabilitation (Marangolo et al., 2010).

## ETHICS STATEMENT

The study was conducted in accordance with the Declaration of Helsinki (1964) and the procedure recommended by the Italian Association of Psychology (AIP). Participants gave their informed consent before testing. The study was approved by the Ethical Committee of the University of Parma.

## AUTHOR CONTRIBUTIONS

GB contributed to planning the experiment, collecting and analyzing data, and writing the manuscript; BM contributed to preparing stimuli and to collecting and analyzing data; CB contributed to preparing stimuli and to collecting and analyzing data; MM contributed to planning the experiment and writing the manuscript.

## REFERENCES


Theory – Research – Application, ed U. M. Lüdtke (Amsterdam: John Benjamins Publishing Company), 191–208.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Buccino, Marino, Bulgarelli and Mezzadri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## APPENDIX 1

List of the English nouns used as verbal stimuli, their related pseudo-words, graspability of their referents, word frequency on the Zipf scale (logarithmic frequency of occurrence; van Heuven et al., 2014), word length (letter and syllable number), and reference level on the CEFR scale verified with the English Vocabulary Profile [based on the Cambridge Learner Corpus (CLC), a multi-billion word corpus of spoken and written current English].
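The Zipf scale used in the appendices is, as defined by van Heuven et al. (2014), the base-10 logarithm of a word's frequency per billion words, which is equivalent to log10(frequency per million) + 3; values of about 1 to 3 indicate low-frequency words and values of about 4 to 7 high-frequency words. A minimal conversion sketch follows. The function names are ours, and the published norms additionally apply a smoothing correction (so that unseen words get a finite value) that this sketch omits.

```python
import math

def zipf_from_per_million(freq_per_million):
    """Zipf value = log10(frequency per billion words)
    = log10(frequency per million words) + 3."""
    return math.log10(freq_per_million) + 3

def zipf_from_counts(count, corpus_size_words):
    """Zipf value from a raw corpus count (no smoothing; count must be > 0)."""
    return zipf_from_per_million(count / corpus_size_words * 1_000_000)

print(zipf_from_per_million(1.0))      # 3.0: one occurrence per million words
print(zipf_from_per_million(100.0))    # 5.0
```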

## APPENDIX 2

List of the nouns of the objects depicted in the photos used as visual stimuli, their word frequency on the Zipf scale (logarithmic frequency of occurrence), word length (letter and syllable number), and reference level on the CEFR scale verified with the English Vocabulary Profile [based on the Cambridge Learner Corpus (CLC), a multi-billion word corpus of spoken and written current English].


# Chained Activation of the Motor System during Language Understanding

#### Barbara F. Marino<sup>1</sup>\*, Anna M. Borghi<sup>2,3</sup>†, Giovanni Buccino<sup>4</sup> and Lucia Riggio<sup>5</sup>\*

<sup>1</sup> Dipartimento di Psicologia, Università di Milano-Bicocca, Milano, Italy, <sup>2</sup> Dipartimento di Psicologia, Università di Bologna, Bologna, Italy, <sup>3</sup> National Research Council (CNR), Istituto di Scienze e Tecnologie della Cognizione, Rome, Italy, <sup>4</sup> Dipartimento di Scienze Mediche e Chirurgiche, Università "Magna Graecia" di Catanzaro, Catanzaro, Italy, <sup>5</sup> Dipartimento di Neuroscienze, Sezione di Fisiologia, Università di Parma, Parma, Italy

Two experiments were carried out to investigate whether and how one important characteristic of the motor system, that is, its goal-directed organization in motor chains, is reflected in language processing. This possibility stems from the embodied theory of language, according to which the linguistic system re-uses the structures of the motor system. The participants were presented with nouns of common tools preceded by a pair of verbs expressing grasping or observational motor chains (i.e., grasp-to-move, grasp-to-use, look-at-to-grasp, and look-at-to-stare). They decided whether the tool mentioned in the sentence was the same as that displayed in a picture presented shortly after. A primacy of the grasp-to-use motor chain over the other motor chains in priming the participants' performance was observed in both experiments. More interestingly, we found that the motor information evoked by the noun was modulated by the specific motor chain expressed by the preceding verbs. Specifically, with the grasping chain aimed at using the tool, the functional motor information prevailed over the volumetric information, and vice versa with the grasping chain aimed at moving the tool (Experiment 2). In contrast, the functional and volumetric information were balanced for those motor chains that comprise at least one observational act (Experiment 1). Overall, our results are in keeping with the embodied theory of language and suggest that understanding sentences expressing an action directed toward a tool drives a chained activation of the motor system.

#### Edited by:

Andriy Myachykov, Northumbria University, UK

#### Reviewed by:

Angela Bartolo, University of Lille Nord de France, France; Yannick Wamain, University of Lille, France

#### \*Correspondence:

Barbara F. Marino, barbara.marino@unimib.it; Lucia Riggio, lucia.riggio@unipr.it

#### † Present Address:

Anna M. Borghi, Dipartimento di Psicologia Dinamica e Clinica, Università di Roma "La Sapienza", Rome, Italy

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 16 November 2016 Accepted: 31 January 2017 Published: 20 February 2017

#### Citation:

Marino BF, Borghi AM, Buccino G and Riggio L (2017) Chained Activation of the Motor System during Language Understanding. Front. Psychol. 8:199. doi: 10.3389/fpsyg.2017.00199

#### Keywords: embodied language, motor chains, motor system, affordances, reaction times

## INTRODUCTION

According to the theory of re-use (Anderson, 2010), evolution works in a conservative way, building on previously formed systems. In line with this general view, the embodied theory of language (Gallese and Lakoff, 2005; Glenberg, 2007; Gallese, 2008; Glenberg and Gallese, 2012) claims that the linguistic system re-uses the structures and the organization characterizing the motor system. From this perspective, language comprehension is rooted in action as it recruits the same neural areas that are active while performing movements.

Thus far, some of the most striking evidence for embodied language comes from psychophysiological and neuroimaging studies documenting an activation of the motor system during the comprehension of nouns referring to manipulable objects (i.e., tools), which parallels the activation of the same system while both actively manipulating and passively viewing these objects (see e.g., Martin et al., 1996; Grafton et al., 1997; Binkofski et al., 1999; Chao and Martin, 2000; Gerlach et al., 2002; Creem-Regehr and Lee, 2005; for a review see Martin, 2007). For example, Cattaneo et al. (2010), using transcranial magnetic stimulation (TMS), found an involvement of the ventral premotor cortex in the processing of nouns referring to tools. Rueschemeyer et al. (2010), using functional magnetic resonance imaging (fMRI), showed that functionally manipulable words (i.e., nouns denoting man-made objects that require manipulation for use, such as "hammer") elicit greater activation in fronto-parietal sensorimotor areas than volumetrically manipulable words (i.e., nouns denoting man-made objects that can be held in the hand but function without regular manipulation, such as "clock"). Similar findings were obtained in a TMS study by Gough et al. (2012).

Converging evidence for an activation of the motor system while processing nouns referring to manipulable objects has also been collected in behavioral studies, such as those investigating the influence of tool noun presentation on planning and executing hand movements in categorization tasks (Tucker and Ellis, 2004), lexical decision tasks (Myung et al., 2006), and gesture imitation tasks (Bub et al., 2008). For example, it has been demonstrated that planning reach-to-grasp movements aimed at using a tool interacts with the semantic activation of nouns related to the action goal of the tool use (Lindemann et al., 2006). More recently, it has been shown that the motor activation driven by nouns referring to manipulable objects is time-locked to the very moment at which the meaning of these nouns is accessed (Marino et al., 2013) and overlaps with the activation driven by passive viewing of the manipulable objects to which the nouns are related (Marino et al., 2014).

Although the evidence of an activation of the motor system by action-related nouns is compelling, little attention has been paid to the specific features of this activation. The embodied theory of language predicts that the basic features characterizing the motor system should be maintained in processing action-related language. The present study aims to investigate this possibility. Specifically, it tests whether and how one important characteristic of the motor system, that is, its goal-directed organization in motor chains, is reflected in language processing.

The chained organization of the motor system has recently been described in neurophysiological studies on monkeys (e.g., Fogassi et al., 2005; Bonini et al., 2011). These studies showed that, in the parietal and premotor cortices, the majority of neurons coding a specific motor act have a different activation pattern depending on the overall goal of the action sequence in which the act is embedded. For example, among the neurons selective for object grasping, some discharge best when the act is executed to place the object into a container, while others discharge best when it is executed to eat the object. These results show that a basic mechanism of the motor system is the structuring of the same motor act in different motor chains. In humans, the chained organization of the motor system has been revealed in a brain imaging study by Iacoboni et al. (2005), who found a significant signal increase in the brain areas where hand actions are represented (the posterior part of the inferior frontal gyrus and the adjacent sector of the ventral premotor cortex) while participants watched a grasping gesture embedded in a motor chain (i.e., grasp-to-drink and grasp-to-clean), as compared to watching the same gesture alone. An important assumption of the present study is that such chained organization can characterize language as well.

A model based on motor chains and language has been proposed by Chersi et al. (2010) to account for the contradictory finding that understanding verbs associated with different effectors can lead to either facilitation or interference in motor responses (e.g., Boulenger et al., 2006). However, to our knowledge, the issue of a chained activation of the motor system during the comprehension of action-related linguistic material has not yet been directly addressed by either behavioral or brain imaging studies. To investigate this possibility, we used the procedure employed by Stanfield and Zwaan (2001) and by Borghi and Riggio (2009). Participants were presented with sentences composed of the noun of a tool and a pair of verbs expressing different grasping motor chains (i.e., grasp-to-move the tool, grasp-to-use the tool) or observational motor chains (look-at-to-grasp the tool, look-at-to-stare at the tool). Each sentence was followed by a picture of a tool graspable with either a precision or a power grip, with its handle oriented to the right and its functional part (i.e., the tip) oriented to the left, or vice versa. Participants were required to decide whether the tool in the picture was the same as the one mentioned in the sentence (i.e., a word-picture matching task). If the chained organization of the motor system is encoded in language, then the priming effect typically observed when the object described in the sentence overlaps with the object represented in the following picture should be shaped by the motor chain expressed by the pair of verbs with which the noun was combined. In particular, given that objects are represented primarily in terms of the actions associated with them, a priming advantage of the motor chains containing the act of grasping over the purely observational motor chain should be observed.
Moreover, since tools are represented primarily in terms of their function (e.g., Costantini et al., 2011), a primacy of the grasp-to-use motor chain over the other motor chains containing the act of grasping should be found. Finally, if a chained activation of the motor system occurs during language processing, then the motor information evoked by the noun of the tool should contain different details depending on the motor chain expressed by the sentence in which the noun is embedded. Specifically, details concerning the relation between the hand and the graspable part of the tool (i.e., the handle) should be evoked by nouns of tools embedded in the grasp-to-move and look-at-to-grasp motor chains. In contrast, details pertaining also to the relation between the hand and the functional part of the tool (i.e., the tip) should be evoked, at least to some extent, by nouns of tools embedded in the grasp-to-use motor chain.

## EXPERIMENT 1

## Materials and Methods

### Participants

Thirty-four students of the University of Parma (14 males and 20 females; mean age ± SD, 21.2 ± 3.8 years) took part in the experiment. All were right-handed native Italian speakers (mean Edinburgh Handedness Questionnaire score ± SD, 0.85 ± 0.13; Oldfield, 1971) and were unaware of the purpose of the study. The participants had normal or corrected-to-normal vision and reported no history of speech and/or motor disorders. All participants gave written informed consent before testing. The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and fulfilled the ethical guidelines recommended by the Italian Association of Psychology. The experimental protocol was also approved by the research ethics committee at the University of Parma.

### Materials

Fourteen digital color photographs of common tools were selected. All the tools had a shape elongated along the vertical axis and were composed of two structurally separate parts: the handle and the tip, which serves to accomplish the tool's function. All tool exemplars were chosen so that neither the handle nor the tip stood out from the background more than the other because of a large difference in color contrast. Half of the tools were graspable with a power grip (e.g., hammer), while the other half were graspable with a precision grip (e.g., pen). A complete list of the tools is provided in **Table 1**. Each tool was scaled to fit within a 130 × 130 mm white frame and was shown in two orientations: with the handle pointing downward to the left and the tip pointing upward to the right, or with the handle pointing downward to the right and the tip pointing upward to the left.

We also created 6 different kinds of Italian imperative sentences in which a pair of transitive verbs, joined by the copulative conjunction "e" (English: and), was followed by a definite article and a noun referring to one of the tools represented in the pictures (i.e., verb 1 + conjunction + verb 2 + definite article + noun). The sentences could include two action verbs ("afferra e sposta", grasp and move; "afferra e usa", grasp and use), an observational verb and an action verb in either order ("osserva e prendi", look at and grasp; "osserva e indica", look at and point; "afferra e fissa", grasp and stare), or two observational verbs ("osserva e fissa", look at and stare). Sentences that included the verb pairs "look at and point" or "grasp and stare" served as catch trials, to induce the participants to process the whole sentence (see below). All the verb pairs included in the critical sentences were composed of 6 syllables and had a similar additive relative lexical frequency ("afferra e sposta" = 1.70 + 20.49 = 22.19, "afferra e usa" = 1.70 + 28.63 = 30.33, "osserva e prendi" = 22.86 + 6.66 = 29.52, "osserva e fissa" = 22.86 + 9.90 = 32.76 occurrences per million; Laudanna et al., 1995, a corpus of ∼3,798,000 words). The nouns referring to tools graspable with a precision grip and with a power grip were matched for number of syllables [mean: 3.00 vs. 3.14 syllables, t(12) = 0.36, p = 0.73] and lexical frequency [mean: 10.59 vs. 1.78 occurrences per million, t(12) = 1.46, p = 0.17].
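The additive lexical frequency of each critical verb pair is simply the sum of the two verbs' per-million frequencies. The short Python sketch below recomputes the four values quoted above from the per-verb frequencies given in the text; the dictionary and function names are our own illustrative choices, and we assume the conjunction "e" does not enter the sum.

```python
# Per-million frequencies taken from the text (Laudanna et al., 1995 corpus).
VERB_FREQ = {
    "afferra": 1.70,   # grasp
    "sposta": 20.49,   # move
    "usa": 28.63,      # use
    "osserva": 22.86,  # look at
    "prendi": 6.66,    # grasp/take
    "fissa": 9.90,     # stare
}

# Verb pairs used in the critical sentences of Experiment 1.
CRITICAL_PAIRS = [
    ("afferra", "sposta"),
    ("afferra", "usa"),
    ("osserva", "prendi"),
    ("osserva", "fissa"),
]


def additive_frequency(verb1: str, verb2: str) -> float:
    """Sum of the two verbs' per-million frequencies (the conjunction is ignored)."""
    return round(VERB_FREQ[verb1] + VERB_FREQ[verb2], 2)


if __name__ == "__main__":
    for v1, v2 in CRITICAL_PAIRS:
        print(f"{v1} e {v2}: {additive_frequency(v1, v2):.2f}")
```

Summing rather than averaging keeps the numbers directly comparable to those reported above; an average would preserve the same ordering of the four pairs.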

#### TABLE 1 | Tools used in Experiments 1 and 2.


### Procedure

The experiment was carried out in a sound-attenuated room, dimly illuminated by a halogen lamp directed toward the ceiling. The participants, tested individually, sat comfortably in front of a computer monitor (a ViewSonic 18-inch flat color CRT monitor with a 1024 × 768 pixel resolution, connected to an Intel® Core™ 2.40 GHz computer equipped with an ATI Radeon HD 2600 Pro video board) with their head supported by a chin rest to maintain a stable eye-to-screen distance of 57 cm.

Each trial started with a black fixation cross displayed at the center of a white background. After 600 ms, the fixation cross was replaced by a sentence, centrally displayed in black lowercase Courier New bold font (point size = 24). The sentence remained visible for 800 ms. At sentence offset, a digital photograph of a tool appeared centrally. The timer started at the onset of the photograph, which remained visible until the participant responded or until 1500 ms had elapsed (see **Figure 1**).
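The trial timeline can be summarized as a fixed sequence of events. The following Python sketch encodes the durations stated above and computes the total length of a trial from the RT. It is an illustration, not the E-Prime script used in the study, and it assumes (as the procedure implies but does not state explicitly) that the 1500 ms feedback screen follows every trial, including catch trials on which no key was pressed.

```python
from typing import Optional

# Event durations stated in the Procedure (ms).
FIXATION_MS = 600          # black fixation cross
SENTENCE_MS = 800          # sentence display
RESPONSE_WINDOW_MS = 1500  # photograph stays up until a response or this deadline
FEEDBACK_MS = 1500         # feedback screen (assumed to follow every trial)


def trial_duration_ms(rt_ms: Optional[float]) -> float:
    """Total trial length; rt_ms is measured from photograph onset, None if no key press."""
    if rt_ms is None or rt_ms > RESPONSE_WINDOW_MS:
        photo_ms = RESPONSE_WINDOW_MS  # photograph removed at the deadline
    else:
        photo_ms = rt_ms               # photograph removed by the response
    return FIXATION_MS + SENTENCE_MS + photo_ms + FEEDBACK_MS


if __name__ == "__main__":
    # Sentence onset to photograph onset (stimulus onset asynchrony) equals the sentence duration.
    print("sentence-to-picture SOA:", SENTENCE_MS, "ms")
    print("trial with a 620 ms response:", trial_duration_ms(620), "ms")
    print("trial with no response:", trial_duration_ms(None), "ms")
```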

The participants were randomly assigned to one of two groups. Those in the first group were asked to press the "p" key with their right index finger when the tool in the photograph was the same as the one mentioned in the sentence, and to press the "q" key with their left index finger when it was not; the participants in the other group were given the opposite mapping. All participants were instructed to withhold their response if the sentence included the verb pair "look at and point" or "grasp and stare" (i.e., catch trials). The keyboard was positioned in front of the participants, with the two response keys placed symmetrically with respect to the participants' body midline. All participants were informed that their response times (RTs) would be recorded and were asked to respond as quickly as possible while maintaining accuracy. They received feedback after pressing the wrong key in a critical trial ("ERROR"), after pressing a key in a catch trial ("ERROR"), after taking more than 1500 ms to respond ("TOO SLOW"), or after responding correctly ("CORRECT"). The feedback remained visible for 1500 ms.
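The response mapping and feedback rules described above can be expressed as a small decision function. This Python sketch is a hypothetical reconstruction: the function names are ours, and treating a withheld response on a catch trial as "CORRECT" is an assumption, since the text only specifies that pressing a key on a catch trial produced "ERROR".

```python
from typing import Optional

RESPONSE_WINDOW_MS = 1500
CATCH_PAIRS = {"look at and point", "grasp and stare"}  # Experiment 1 catch trials


def expected_key(group: int, same_tool: bool) -> str:
    """Group 1: 'p' (right index) = same tool, 'q' (left index) = different; group 2 reversed."""
    same_key, different_key = ("p", "q") if group == 1 else ("q", "p")
    return same_key if same_tool else different_key


def feedback(verb_pair: str, group: int, same_tool: bool,
             key: Optional[str], rt_ms: Optional[float]) -> str:
    """Feedback message for one trial; key and rt_ms are None when no key was pressed."""
    if verb_pair in CATCH_PAIRS:
        # Any key press on a catch trial is an error; withholding is assumed correct.
        return "ERROR" if key is not None else "CORRECT"
    if key is None or (rt_ms is not None and rt_ms > RESPONSE_WINDOW_MS):
        return "TOO SLOW"
    return "CORRECT" if key == expected_key(group, same_tool) else "ERROR"


if __name__ == "__main__":
    print(feedback("grasp and use", 1, True, "p", 580.0))    # correct "same" response
    print(feedback("grasp and stare", 2, False, "p", 640.0))  # key press on a catch trial
```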

In the first phase of the experimental session, the participants performed a block of 48 practice trials, using tools different from those used in the subsequent test phase. If the participants felt confident with the task, the test phase was started; otherwise, another block of practice trials was run. In the test phase, the participants performed a block of 336 trials. The order of stimulus presentation was randomized, and the Verb Pair ("grasp and move," "grasp and use," "look at and grasp," "look at and point," "look at and stare," "grasp and stare"), Tool Handle Orientation (left, right), and Tool Grip (precision, power) factors were fully balanced. Each of the 6 verb pairs was combined with each of the 14 tool nouns, for a total of 84 different sentences. Each sentence was presented 4 times during the test phase (84 × 4 = 336 trials): twice followed by the photograph of the tool mentioned in the sentence (i.e., same-tool trials), with the handle pointing downward once to the left and once to the right, and twice followed by the photograph of a tool not mentioned in the sentence (i.e., different-tool trials), again with the handle pointing downward once to the left and once to the right. In half of the different-tool trials, the pictured tool was graspable with the same kind of grip as the tool mentioned in the sentence; in the other half, it required a different kind of grip.
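The balanced design of the test phase can be reproduced with a short generator. This Python sketch assumes hypothetical placeholder tool names (Table 1 is not reproduced here) and one particular way of balancing the two different-tool trials per sentence: one foil shares the named tool's grip and the other does not, with the pairing between foil type and handle orientation alternating across sentences. The authors' actual randomization in E-Prime may differ.

```python
import random

VERB_PAIRS = ["grasp and move", "grasp and use", "look at and grasp",
              "look at and point", "look at and stare", "grasp and stare"]
CATCH_PAIRS = {"look at and point", "grasp and stare"}

# Hypothetical placeholder names standing in for the 7 power-grip and 7 precision-grip tools.
TOOLS = {f"power_{i}": "power" for i in range(1, 8)}
TOOLS.update({f"precision_{i}": "precision" for i in range(1, 8)})


def pick_foil(noun: str, same_grip: bool, rng: random.Random) -> str:
    """Choose a different tool that does (or does not) share the named tool's grip."""
    grip = TOOLS[noun]
    pool = [t for t, g in TOOLS.items() if t != noun and (g == grip) == same_grip]
    return rng.choice(pool)


def build_trials(seed: int = 0) -> list:
    rng = random.Random(seed)
    trials = []
    for p_idx, pair in enumerate(VERB_PAIRS):
        for n_idx, noun in enumerate(sorted(TOOLS)):
            catch = pair in CATCH_PAIRS
            # Two same-tool trials: handle pointing down-left once and down-right once.
            for handle in ("left", "right"):
                trials.append({"pair": pair, "noun": noun, "picture": noun,
                               "handle": handle, "same_tool": True, "catch": catch})
            # Two different-tool trials: one same-grip and one different-grip foil
            # (assumption); which orientation each foil gets alternates across sentences.
            grip_order = (True, False) if (p_idx + n_idx) % 2 == 0 else (False, True)
            for handle, same_grip in zip(("left", "right"), grip_order):
                trials.append({"pair": pair, "noun": noun,
                               "picture": pick_foil(noun, same_grip, rng),
                               "handle": handle, "same_tool": False, "catch": catch})
    rng.shuffle(trials)  # randomized presentation order
    return trials


if __name__ == "__main__":
    trials = build_trials()
    segments = [trials[i:i + 42] for i in range(0, len(trials), 42)]  # break every 42 trials
    print(len(trials), "trials in", len(segments), "segments")
```

Alternating on the sum of the verb-pair and noun indices spreads the same-grip foils evenly over both handle orientations within each verb pair and within each noun.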

Throughout the test phase, the participants could take a break every 42 trials. For each trial, RTs and errors were recorded. Stimulus presentation and response collection were controlled using E-Prime, version 1.1 (Psychology Software Tools, Inc.).

## Results

All cases in which the participants responded to a critical trial by pressing the wrong key, and all cases in which they responded to a catch trial, were considered errors (i.e., response errors and catch errors, respectively). The grand mean error rate was 16.92% (response errors = 10.72%, catch errors = 6.20%). Catch trials and incorrect critical trials were excluded from the analysis. Six participants (2 from the first group and 4 from the second group) were removed from the analysis because their error rate was a statistical outlier (i.e., more than two standard deviations above the grand mean error rate). Before analysis, the response times (RTs) for the correct critical trials were screened for outliers: RTs more than two standard deviations above or below each participant's mean were omitted (19.23%). Given that there was no speed-accuracy tradeoff, as determined by plotting the error rate across decile RT bins (see **Figure 2A**), the remaining RTs for the same-tool trials and for the different-tool trials were separately submitted to a mixed analysis of variance (ANOVA) with Response Hand (left, right) as a between-subjects factor and Tool Grip (power, precision), Verb Pair ("grasp and move," "grasp and use," "look at and grasp," "look at and stare"), and Hand/Handle Orientation (compatible, incompatible) as within-subjects factors. This last factor was obtained by combining the orientation of the handle of the pictured tool (left, right) with the hand used to give the response (left, right). In the ANOVA on RTs for the different-tool trials, Grip Congruency (congruent, incongruent) between the tool mentioned in the sentence and the pictured tool was included as an additional within-subjects factor, and the levels of the Tool Grip factor referred to the hand posture appropriate for grasping the pictured tool.
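The preprocessing steps above (per-participant ±2 SD trimming, coding Hand/Handle Orientation, and binning error rates by RT decile to check for a speed-accuracy tradeoff) can be sketched in plain Python. The function names are illustrative, and the decile procedure shown (sorting trials by RT and splitting them into ten equal-count bins) is a common way to build such a plot that we assume here rather than take from the paper. A speed-accuracy tradeoff would show up as elevated error rates in the fastest bins.

```python
from statistics import mean, stdev


def trim_rts(rts: list, k: float = 2.0) -> list:
    """Keep RTs within k SDs of the participant's own mean (the text uses k = 2)."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= k * s]


def hand_handle(handle_side: str, response_hand: str) -> str:
    """'compatible' when the handle points toward the side of the responding hand."""
    return "compatible" if handle_side == response_hand else "incompatible"


def decile_error_rates(trials: list) -> list:
    """trials: (rt_ms, is_error) pairs. Sort by RT, cut into 10 equal-count bins,
    and return the proportion of errors in each bin (fastest bin first)."""
    ordered = sorted(trials, key=lambda t: t[0])
    n = len(ordered)
    rates = []
    for b in range(10):
        chunk = ordered[b * n // 10:(b + 1) * n // 10]
        rates.append(sum(err for _, err in chunk) / len(chunk) if chunk else float("nan"))
    return rates


if __name__ == "__main__":
    print(trim_rts([500] * 9 + [5000]))  # the 5000 ms response is dropped
    print(hand_handle("left", "right"))  # incompatible
    fake = [(300 + 10 * i, i < 2) for i in range(20)]  # errors only among the fastest RTs
    print(decile_error_rates(fake))
```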
In both ANOVAs, the skewness and kurtosis of the RT distributions were examined to determine whether the data were normally distributed. Following West et al. (1995), we took an absolute skewness index > 2 or an absolute kurtosis index > 7 as indicating a substantial departure from normality. Violations of sphericity were assessed using Mauchly's test, and either Greenhouse-Geisser or Huynh-Feldt corrections were applied according to Girden (1992): if the Greenhouse-Geisser epsilon was > 0.75, the Huynh-Feldt corrected value was used; otherwise, the Greenhouse-Geisser corrected value was used. Partial eta squared (ηp²) is reported as a measure of effect size for all significant ANOVA contrasts. Duncan's test was used for post-hoc comparisons, with the significance level set at 0.05. Only significant results are reported.
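Three of the analysis rules above are easy to state as code: the West et al. (1995) normality cut-offs, the Girden (1992) choice between sphericity corrections, and partial eta squared. Partial eta squared can be recovered from a reported F ratio and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error); because a sphericity correction multiplies both degrees of freedom by the same epsilon, the value is unaffected by the correction. The Python helpers below are a minimal sketch with names of our choosing; the demonstration recomputes three effect sizes reported for the same-tool trials of Experiment 1 from their F values and degrees of freedom, giving 0.41, 0.59, and 0.10 as reported.

```python
def roughly_normal(skewness: float, kurtosis: float) -> bool:
    """West et al. (1995) cut-offs as used in the text: |skewness| <= 2 and |kurtosis| <= 7."""
    return abs(skewness) <= 2 and abs(kurtosis) <= 7


def sphericity_correction(gg_epsilon: float) -> str:
    """Girden (1992) rule as described: Huynh-Feldt if GG epsilon > 0.75, else Greenhouse-Geisser."""
    return "Huynh-Feldt" if gg_epsilon > 0.75 else "Greenhouse-Geisser"


def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """eta_p^2 = SS_effect / (SS_effect + SS_error) = F*df_effect / (F*df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)


if __name__ == "__main__":
    # F and df values reported for the same-tool trials of Experiment 1.
    reported = {
        "Tool Grip": (18.37, 1, 26),
        "Verb Pair (corrected df)": (36.83, 1.9, 49.46),
        "Verb Pair x Hand/Handle": (2.80, 3, 78),
    }
    for effect, (f, df1, df2) in reported.items():
        print(f"{effect}: partial eta squared = {partial_eta_squared(f, df1, df2):.2f}")
    # Illustrative inputs only, not values from the study.
    print(roughly_normal(0.93, 0.38), sphericity_correction(0.80))
```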

#### Same-Tool Trials

RTs were normally distributed, as the skewness and kurtosis indexes were within the acceptable ranges (skewness = 0.93 ± 0.05; kurtosis = 0.38 ± 0.10). The ANOVA revealed a main effect of Tool Grip [F(1, 26) = 18.37, MSE = 155,733, p < 0.001, ηp² = 0.41], with longer response latencies when the tool was graspable with a power grip than with a precision grip (647 vs. 609 ms). The analysis also showed a main effect of Verb Pair [F(1.9, 49.46) = 36.83, MSE = 540,959, p < 0.0001, ηp² = 0.59]. Post-hoc comparisons indicated that RTs were faster for sentences comprising the verb pair "grasp and use" (582 ms) than for all the other verb pairs ("grasp and move" = 608 ms, p < 0.05; "look at and grasp" = 609 ms, p < 0.05; "look at and stare" = 705 ms, p < 0.001). RTs for the verb pairs "grasp and move" and "look at and grasp" did not differ from each other (p = 0.91), but both were significantly faster than RTs for the verb pair "look at and stare" (all ps < 0.001). In addition, there was a significant two-way interaction between Verb Pair and Hand/Handle Orientation [F(3, 78) = 2.80, MSE = 17,675, p < 0.05, ηp² = 0.10]: in the "grasp and use" trials, the participants were faster when their response hand was spatially incompatible with the tool handle (562 ms) than when the response hand and the tool handle were spatially compatible (601 ms, p < 0.02; see **Figure 2B**). No effect of Hand/Handle Orientation was found for the other verb pairs.

#### Different-Tool Trials

RTs were normally distributed, as the skewness and kurtosis indexes were within the acceptable ranges (skewness = −0.33 ± 0.05; kurtosis = 0.02 ± 0.10). The ANOVA revealed a main effect of Verb Pair [F(3, 60) = 20.34, MSE = 346,782, p < 0.0001, ηp² = 0.50], with longer response latencies for sentences comprising the verb pair "look at and stare" (715 ms) than for the other sentences (all ps < 0.001). In contrast with the same-tool trials, post-hoc comparisons revealed no significant differences between sentences comprising the verb pair "grasp and use" (628 ms) and those expressing the other grasping sequences ("grasp and move" = 619 ms, "look at and grasp" = 654 ms, all ps > 0.07).

### Discussion

Both analyses provided converging support for our hypothesis that the chained organization of the motor system is encoded in language. In line with the idea that objects are represented primarily in terms of the actions they afford, we found that in the same-tool trials the sentences expressing the pure observational motor chain (i.e., look-at-to-stare) primed the participants' responses less efficiently than the sentences expressing a chain in which the motor act of grasping was embedded (for convergent findings on the difference between observation and action sentences, see Borghi and Riggio, 2009; Costantini et al., 2011). In addition, consistent with the idea that tools are represented primarily in terms of their function, the time required to perform the word-picture matching task in the same-tool trials was modulated by the final goal of the action expressed by the sentence. The sentences expressing the grasp-to-use motor chain primed the participants' responses more efficiently than the sentences expressing grasping motor chains not overtly aimed at using the tool (i.e., grasp-to-move and look-at-to-grasp). This finding (see Costantini et al., 2011; Lee et al., 2012, for similar results) suggests that the grasping motor chain aimed at using a tool, compared with the other motor chains, selectively triggers the activation of the most crucial motor information in the conceptual representation of the tool, namely its functional information (i.e., the motor information about how to use an object). As previous studies have shown, functional knowledge associated with manipulable man-made objects is an important component of their conceptual representation (e.g., Kellenbach et al., 2003; Vainio et al., 2008; Jax and Buxbaum, 2010). This knowledge is activated very early when words are identified or read for meaning, and it yields a marked benefit in priming tasks (Moss et al., 1997; Myung et al., 2006; Bub et al., 2008; Bub and Masson, 2010).

The idea that the grasp-to-use motor chain drove a selective activation of functional information during tool noun comprehension is further supported by the third result from the same-tool trials. We found that the grasp-to-use motor chain led to faster responses when the response hand and the handle of the pictured tool were spatially incompatible. Although an inversion of the affordance effect has occasionally been found in previous studies with objects (e.g., Pellicano et al., 2010; Kostov and Janyan, 2015), to our knowledge this is the first time it has been reported in language processing. This remarkable inversion of the classic affordance effect (i.e., faster responses when the response hand and the handle of a graspable object are spatially compatible; see, e.g., Tucker and Ellis, 1998) suggests that the functional information evoked by the grasp-to-use motor chain included details pertaining to the tool tip, which subserves tool use. The activation of these details likely drove the participants' attention toward the functional part of the tool during subsequent processing of the visual stimulus, so that the tip acquired a directional meaning and generated the inverted affordance effect. This parallels the Simon effect that occurs with centrally presented stimuli conveying spatial information, such as arrows (e.g., Tipples, 2002). Notably, recent findings suggest that the way we shift attention to or explore an object is biased toward action-relevant information (e.g., Handy et al., 2003; Roberts and Humphreys, 2011; Ambrosini and Costantini, 2017). In particular, Ambrosini and Costantini (2017) showed that participants mostly fixate the action-related, functional part of tools, regardless of its visual saliency. Crucially, this effect was strongly reduced when participants were required to tie their hands behind their back. These results suggest that action-relevant object information at least partly guides gaze behavior and visual attention.

The lack of a canonical affordance effect for the sentences expressing grasping motor chains not overtly aimed at tool use does not imply that no motor information was evoked during comprehension of the tool noun. Indeed, these sentences primed the participants' responses significantly more effectively than the sentences containing the pure observational motor chain. More likely, the non-functional grasping chains triggered the activation of volumetric information (i.e., the motor information about how to manipulate an object) along with functional information, so that the details associated with the tool tip competed with the details associated with the tool handle, producing no observable bias in the participants' responses. This is in keeping with recent studies (Bub et al., 2008; Bub and Masson, 2010; Jax and Buxbaum, 2010; Pellicano et al., 2010) showing that functional and volumetric information are both activated in parallel by visual tools and that these two kinds of information converge, resulting in conflict or summation depending on their mutual consistency.

The idea of a parallel and converging activation of functional and manipulation information during tool noun comprehension is also supported by the results of the different-tool trials. When the tool in the sentence and the tool in the picture did not share the same function, the advantage of the grasp-to-use motor chain over the other grasping motor chains disappeared. This finding converges with the results of Myung et al. (2006), who showed that nouns denoting man-made objects can prime one another only if the hand gestures related to their conventional use are similar (e.g., a lexical decision on the word "piano" was faster when it was preceded by the word "typewriter" than by a control word). It indicates that presenting the picture of a tool different from the one mentioned in a grasp-to-use sentence suppressed the primacy of the functional information evoked by the noun, thus restoring the balance between functional and manipulation information.

## EXPERIMENT 2

The word-picture matching task used in Experiment 1 turned out to be quite difficult. Indeed, most participants required more than one block of practice trials. Moreover, at the end of the experimental session, the participants often reported that the task was quite demanding, mainly because of the large number of verbs and verb combinations they had to retain in order to accomplish the task. The difficulty of the task was also reflected in the relatively high grand mean error rate (16.92% of total trials) and in the high variability of the data. In Experiment 2, we aimed to reduce task difficulty by decreasing the number of verb pairs used in both critical and catch trials. We also aimed to reproduce the inverted affordance effect observed in Experiment 1.

## Materials and Methods

### Participants

Twenty-four right-handed students of the University of Parma (12 males and 12 females; mean age ± SD, 26.25 ± 3.47 years; mean Edinburgh Handedness Questionnaire score ± SD, 0.75 ± 0.19) took part in the experiment. The selection procedure was the same as in Experiment 1. All participants gave written informed consent before testing. The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and fulfilled the ethical guidelines recommended by the Italian Association of Psychology. The experimental protocol was also approved by the research ethics committee at the University of Parma.

### Materials

The same fourteen digital color photographs of tools used in Experiment 1 served as visual stimuli. Unlike in Experiment 1, we created only 4 kinds of imperative sentences, with the same structure as before. These sentences always included the verb "afferra" (or its synonym "prendi"; English: grasp) followed by an action verb ("afferra e usa", grasp and use; "prendi e muovi", grasp and move; "afferra e disturba", grasp and disturb; "prendi e ostacola", grasp and hinder). The sentences comprising the verb pairs "grasp and use" and "grasp and move" had already been used in Experiment 1 and served as critical trials. Sentences comprising the verb pairs "grasp and disturb" and "grasp and hinder" served as catch trials. Note that combining these two verb pairs with a tool noun produced sentences that did not make sense. We used such sentences as catch trials so that the participants would not have to memorize the verb pairs for which they had to withhold their response. As in Experiment 1, the verb pairs used in the critical sentences were composed of 6 syllables and had a similar additive relative lexical frequency. Moreover, the nouns referring to precision-grip and power-grip tools were matched for number of syllables and lexical frequency (see Materials of Experiment 1).

### Procedure

The procedure was the same as in Experiment 1, except that the participants performed a block of 32 trials in the training phase and a block of 224 trials in the test phase. In both phases, the order of stimulus presentation was randomized, and the Verb Pair ("grasp and use," "grasp and move," "grasp and disturb," "grasp and hinder"), Tool Handle Orientation (left, right), and Tool Grip (precision, power) factors were fully balanced. As in the test phase of Experiment 1, each of the 4 verb pairs was combined with each of the 14 tool nouns, for a total of 56 different sentences. Each sentence was presented 4 times during the test phase (56 × 4 = 224 trials): twice followed by the photograph of the tool mentioned in the sentence (i.e., same-tool trials), with the handle pointing downward once to the left and once to the right, and twice followed by the photograph of a tool not mentioned in the sentence (i.e., different-tool trials), again with the handle pointing downward once to the left and once to the right. In half of the different-tool trials, the pictured tool was graspable with the same kind of grip as the tool mentioned in the sentence; in the other half, it required a different kind of grip. Throughout the test phase, the participants could take a break every 56 trials.

## Results

The grand mean error rate was 3.16% of total trials (response errors = 1.96%, catch errors = 1.17%). Catch trials and incorrect critical trials were excluded from the analysis. Three participants were removed from the analysis because their error rate was a statistical outlier (i.e., more than two standard deviations above the grand mean error rate). Before analysis, the RTs for the correct critical trials were screened for outliers using the same criteria as in Experiment 1 (4.25% of correct RTs were omitted). Given that there was no speed-accuracy tradeoff, as determined by plotting the error rate across decile RT bins (see **Figure 3A**), the remaining RTs for the same-tool trials and for the different-tool trials were separately submitted to a mixed ANOVA with the same between- and within-subjects factors as in Experiment 1. As in Experiment 1, the skewness and kurtosis of the RT distributions were examined to determine whether the data were normally distributed (departure from normality: absolute skewness > 2 or absolute kurtosis > 7; West et al., 1995). Partial eta squared (ηp²) is reported as a measure of effect size for all significant ANOVA contrasts. Duncan's test was used for post-hoc comparisons, with the significance level set at 0.05. As before, only significant results are reported.

#### Same-Tool Trials

RTs were normally distributed, as the skewness and kurtosis indexes were within the acceptable ranges (skewness = 1.42 ± 0.07; kurtosis = 2.38 ± 0.14). As in Experiment 1, the ANOVA revealed a main effect of Tool Grip [F(1, 20) = 29.76, MSE = 44,576, p < 0.001, ηp² = 0.60], with longer response latencies when the tool was graspable with a power grip than with a precision grip (551 vs. 519 ms), as well as a main effect of Verb Pair [F(1, 20) = 12.10, MSE = 28,776, p < 0.003, ηp² = 0.38], with faster RTs for sentences comprising the verbs "grasp and use" (522 ms) than "grasp and move" (548 ms). The ANOVA also revealed a significant interaction between Tool Grip and Verb Pair [F(1, 20) = 4.90, MSE = 6,343, p < 0.04, ηp² = 0.20], indicating that the primacy of the "grasp and use" sentences in priming the participants' responses was confined to tools graspable with a precision grip (precision-grip tools = 499 ms, power-grip tools = 544 ms; p < 0.001). Notably, there was also a significant interaction between Verb Pair and Hand/Handle Orientation [F(1, 20) = 15.67, MSE = 25,068, p < 0.001, ηp² = 0.44], showing an inverted affordance effect for the sentences containing the verbs "grasp and use" (compatible orientation = 531 ms, incompatible orientation = 513 ms; p < 0.05) and a classic affordance effect for the sentences containing the verbs "grasp and move" (compatible orientation = 533 ms, incompatible orientation = 563 ms; p < 0.003; see **Figure 3B**).

#### Different-Tool Trials

RTs were normally distributed, as the skewness and kurtosis indexes were within the acceptable ranges (skewness = 1.72 ± 0.07; kurtosis = 3.47 ± 0.15). The ANOVA revealed a main effect of Hand/Handle Orientation [F(1, 20) = 4.64, MSE = 13,117, p < 0.05, ηp² = 0.19], with faster RTs when the participants' response hand was spatially incompatible with the tool handle (590 ms) than when it was compatible (603 ms).

### Discussion

Decreasing the number of verb pairs used in the critical trials and reducing the cognitive load of the catch trials markedly reduced task difficulty. Indeed, the grand mean error rate was much lower than in Experiment 1 (3.16 vs. 16.92%, respectively).

The results for the same-tool trials converge with and extend those of the previous experiment. First, they confirm our interpretation that, when sentences combining a pair of action verbs with a tool noun are understood, the motor information evoked by the noun is modulated by the specific motor chain expressed by the verbs: the primacy of the grasp-to-use motor chain over the grasp-to-move motor chain in priming word-picture matching was replicated.

Second, we reproduced the inverted affordance effect observed in Experiment 1 for the "grasp and use" sentences. Interestingly, removing the sentences containing at least one observational act (i.e., "look at and grasp" and "look at and stare") allowed a significant classic affordance effect to emerge in the trials in which the grasp-to-move motor chain was used. It is likely that, under this condition, the non-functional grasping motor chain was directly contrasted with the functional grasping motor chain, rather than being assimilated to the observational motor chains. As a consequence, the manipulation information evoked by the noun could prevail over the functional information, producing the affordance effect.

The results for the different-tool trials confirmed those of Experiment 1, since the advantage of the functional grasping motor chain over the non-functional motor chain in priming word-picture matching completely disappeared. In addition, there was an inversion of the affordance effect regardless of the specific motor chain expressed by the sentence, likely reflecting that coding the functional part of the pictured tool supported better recognition when that tool was inconsistent with the one mentioned in the sentence.

## GENERAL DISCUSSION

In two experiments, we investigated the chained activation of the motor system during language understanding by exploring the priming effects that comprehending a tool noun exerts on the recognition of a tool shown in a picture presented shortly afterward. The tool noun was combined with a pair of verbs to form a sentence expressing different grasping and observational motor chains. Overall, our results reveal an important role of sentence context in affordance perception, in line with previous studies documenting the importance of both verbal and visual context in affordance perception (e.g., Kalénine et al., 2012, 2016; Borghi and Riggio, 2015). More specifically, we found that accessing the meaning of the tool noun activated motor information that changed in accordance with the final goal of the motor chain expressed by the verbs. Functional information prevailed over volumetric information when the tool noun was embedded in the grasp-to-use motor chain. Volumetric motor information was likely activated for the grasp-to-move motor chain and when the tool noun was embedded in the look-at-to-grasp motor chain. In contrast, action information appeared to be absent when the pure observational motor chain (i.e., look-at-to-stare) was used, as indicated by the slower RTs.

The advantage of precision-grip tools over power-grip tools that we found in both experiments contrasts with a number of studies showing faster responses to power-grip tools than to precision-grip tools (Ehrsson et al., 2001; Borghi and Riggio, 2009; Kalénine et al., 2014). Our functional motor chains may have selectively activated motor information related to a grasping gesture performed with a precision grip, which would explain the advantage of precision-grip tools over power-grip tools that we observed. This possibility is supported by the results of Experiment 2, which showed that the precision-grip advantage was reliable only in trials in which the grasp-to-use motor chain was used.

The current study is novel in providing evidence that the motor information activated while understanding nouns of common tools has a goal-directed structure. In other words, the motor information these nouns can activate depends on the aim of the overall action performed with the tool, as described by the other linguistic components of the sentence. This is in keeping with the idea that the activation of motor information by graspable objects always requires a selection among competing motor information (e.g., Fagg and Arbib, 1998). The context in which objects are presented is thought to be a crucial factor driving this selection. In an fMRI study, Iacoboni et al. (2005) showed reliable differences in the activity of the inferior frontal region during the observation of a grasping action (i.e., a hand seizing a cup) carried out in different contexts (i.e., "before tea" vs. "after tea"), likely reflecting automatic coding of the intention or goal behind the observed action (i.e., grasping the cup for drinking vs. grasping the cup for cleaning, respectively). More recently, Mizelle and Wheaton (2010), evaluating the neural correlates of tool identification and conceptual understanding with electroencephalography (EEG), found greater activity over the left temporo-parietal junction (thought to be part of a manipulation network) for tools presented in a functionally matching context (i.e., followed by objects upon which the tools can act) than in a functionally mismatching context. Preliminary evidence for a context-guided selection of motor information evoked by graspable objects has also been found with verbal presentation. For example, Lee et al. (2012), using eye-movement recording, found that the time course of activation of functional and volumetric motor information by tool words was modulated by the linguistic context in which the words were embedded (neutral vs. action-relevant context).
Similarly, Marino et al. (2012) found that the motor information evoked during the comprehension of nouns of graspable objects was shaped by the sensorimotor specificity expressed by the sentence in which the nouns are embedded.

Since language, unlike vision, does not present context (mostly expressed by verbs) and objects (expressed by nouns) in parallel, it is an open question how these two factors interact in activating and selecting motor information. Two processes are theoretically possible. According to Bub and Masson (2010), a process of activation-then-selection occurs during sentence understanding. Specifically, different types of motor information become active in response to a noun denoting a manipulable object. This activation is followed by a selection of relevant motor programs, which is determined by the context expressed by the remaining linguistic components of the sentence. The current study provides evidence compatible with the alternative process, that is, selection-then-activation: understanding action-related verbs, which in most Western European languages, such as Italian and English, precede the nouns defining the graspable objects to which they refer, seems to automatically select a motor intention that enhances the activation, by the nouns, of the most contextually relevant set of motor information among those possible.

Our finding of a goal-directed structure of motor information evoked by nouns referring to tools is consistent with the idea of a chained activation of the motor system during the comprehension of action-related linguistic material. This supports the embodied theory of language (Gallese and Lakoff, 2005; Glenberg, 2007; Gallese, 2008; Glenberg and Gallese, 2012), according to which the linguistic system reuses the structures of the motor system (see Rizzolatti and Arbib, 1998, for the basic role of the motor system in language evolution). Taken together, the results of the current work suggest that processing combinations of action-related nouns and verbs involves the activation of the cortical motor system in a manner that parallels the organization of motor behavior, and they provide some hints that the syntax of language may be equivalent to the syntax of action (Dominey et al., 2003; Gallese, 2007, 2008; Clerget et al., 2009; Fazio et al., 2009; Pulvermüller and Fadiga, 2010; Marino et al., 2013).

## AUTHOR CONTRIBUTIONS

BM designed and performed the experiments, analyzed the data, contributed to the discussion of the data, wrote the manuscript, and prepared the figures. AB designed the experiments, prepared the stimuli, contributed to the discussion of the data, and revised the manuscript. GB designed the experiments, contributed to the discussion of the data, and revised the manuscript. LR designed the experiments, contributed to data analysis, contributed to the discussion of the data, and revised the manuscript. BM, AB, and LR reviewed the manuscript.

## ACKNOWLEDGMENTS

This work was supported by the FP7 project ROSSI, "Emergence of communication in Robots through Sensorimotor and Social Interaction," Grant agreement n. 216125.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Marino, Borghi, Buccino and Riggio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Embodiment and Emotional Memory in First vs. Second Language

Jenny C. Baumeister<sup>1</sup>\*, Francesco Foroni<sup>1,2</sup>, Markus Conrad<sup>3</sup>, Raffaella I. Rumiati<sup>1,4</sup> and Piotr Winkielman<sup>5,6</sup>

<sup>1</sup> International School for Advanced Studies (SISSA), Trieste, Italy, <sup>2</sup> School of Psychology, Australian Catholic University, Sydney, NSW, Australia, <sup>3</sup> Department of Cognitive, Social, and Organizational Psychology, Universidad de La Laguna, San Cristobal de La Laguna, Spain, <sup>4</sup> National Agency for the Evaluation of Universities and Research Institutes (ANVUR), Rome, Italy, <sup>5</sup> Department of Psychology, University of California at San Diego, La Jolla, CA, USA, <sup>6</sup> Faculty of Psychology, SWPS University of Social Sciences and Humanities, Warsaw, Poland

Language and emotions are closely linked. However, previous research suggests that this link is stronger in a native language (L1) than in a second language (L2) learned later in life. The present study investigates whether such reduced emotionality in L2 is reflected in changes in emotional memory and embodied responses to L2 in comparison to L1. Late Spanish/English bilinguals performed a memory task involving an encoding and a surprise retrieval phase. Facial motor resonance and skin conductance (SC) responses were recorded during encoding. The results give first indications that the enhanced memory for emotional vs. neutral content (EEM effect) is stronger in L1 and less present in L2. Furthermore, the results give partial support for decreased facial motor resonance and SC responses to emotional words in L2 as compared to L1. These findings suggest that embodied knowledge involved in emotional memory is associated with increased affective encoding and retrieval of L1 compared to L2.

Edited by:

Anna M. Borghi, Sapienza University of Rome, Italy

#### Reviewed by:

Marta Ponari, University of Kent, UK

Adrienne Wood, University of Wisconsin–Madison, USA

#### \*Correspondence:

Jenny C. Baumeister j.c.baumeist@gmail.com

Received: 30 November 2016 Accepted: 01 March 2017 Published: 23 March 2017

#### Citation:

Baumeister JC, Foroni F, Conrad M, Rumiati RI and Winkielman P (2017) Embodiment and Emotional Memory in First vs. Second Language. Front. Psychol. 8:394. doi: 10.3389/fpsyg.2017.00394

Keywords: embodiment, emotional memory, first language, second language, EMG, facial motor resonance, skin conductance

## INTRODUCTION

Theories of grounded, or embodied, cognition propose that the role of the body goes well beyond that of an instrument by which our actions are realized. Instead, these theories argue that bodily processes contribute to our perception, feelings, thoughts, and behavior (for reviews, see Barsalou, 2008; Winkielman et al., 2015). In the embodied view, knowledge about emotions is grounded in internal systems including the motor, sensory, and autonomic nervous systems. For example, upon the mere viewing of an emotional word or picture, a series of bodily reactions is evoked: skin conductance (SC; an indicator of physiological arousal) and heart rate (HR) may increase, and we may produce spontaneous facial expressions reflecting the stimulus' relevant emotional connotation. The zygomaticus major muscle, involved in smiling, has been shown to activate in response to positive stimuli, such as pictures and words (Larsen et al., 2003). This involuntary activation of mimetic muscles, sometimes also called facial muscle resonance, can be measured by electromyography (EMG) recordings from electrodes placed directly above the facial muscle of interest (Cacioppo and Petty, 1981). The range of emotional stimuli to which the mimetic muscles automatically react includes facial expressions (e.g., Dimberg et al., 2000), emotional tone (Quené et al., 2012), as well as emotional words and sentences (e.g., Foroni and Semin, 2009, 2013; Davis et al., 2015; Foroni, 2015; Fino et al., 2016).

However, some recent studies suggest that these embodied activations are less pronounced in a second language (L2) than in a first language (L1; see Pavlenko, 2012, for a review). For example, Eilola and Havelka (2010) showed that L1 speakers reacted with higher SC to negative and taboo words than to neutral and positive words while performing an emotional Stroop task (naming the color of emotional words). No such pattern was observed for L2 speakers. In line with this, other studies have demonstrated higher SC responses in late bilinguals when they listened to or rated emotional words, phrases, or reprimands in L1 but not in L2 (Harris et al., 2003; Harris, 2004; Caldwell-Harris and Ayçiçeği-Dinn, 2009). A recent study extending this line of research indicated that a reduction in embodied responses to L2 might also reflect only partial activation of facial motor resonance (Foroni, 2015).

The idea that facial motor resonance is less developed in an L2 gains importance when considering that theories of embodied cognition regard facial motor resonance as an important source of embodied knowledge. It is thought to facilitate affective processing and help us to understand the emotional connotation carried by a given stimulus (e.g., Niedenthal, 2007; Winkielman et al., 2008; Wood et al., 2016). For example, blocking participants' mimetic muscles, and thereby hindering their spontaneous facial motor resonance, has been shown to impair the recognition of emotional words and other people's facial expressions, and to make emotional sentences more difficult to process (e.g., Oberman et al., 2007; Niedenthal et al., 2009; Havas et al., 2010; Ponari et al., 2012; Davis et al., 2015). In a recent study (Baumeister et al., 2015), we extended this line of research by showing that blocking facial motor resonance during encoding or retrieval not only interferes with initial recognition of emotional words but also impedes their later retrieval. When facial motor resonance had been blocked during the experiment, the usually observed memory advantage for emotional words over neutral ones [Emotional Enhancement of Memory (EEM) effect; see Hamann, 2001, for a review] was hampered.

Considering this latter finding, and bearing in mind that an L2 has been found to evoke facial motor resonance only partially (Foroni, 2015), the question arises whether such a presumed 'disembodiment' of L2 (see also Pavlenko, 2012) might also be reflected in diminished affective processing and a weakened EEM effect for emotional L2 words. Several introspective reports, surveys, interviews, and clinical observations suggest that people remain emotionally distant from an L2 if it was not learned during early childhood<sup>1</sup> (see Caldwell-Harris, 2015, for a recent review). The cognitive and behavioral effects of this emotional distance from an L2 are well documented (e.g., Colbeck and Bowers, 2012; Keysar et al., 2012; Duñabeitia and Costa, 2015). However, it has been debated whether these behavioral differences also extend to the EEM effect.

If facial motor resonance occurring in response to emotion-laden words is decreased in L2 as compared to L1, then this could interfere with an EEM effect in L2. In line with this hypothesis, a study by Anooshian and Hertel (1994) found an EEM effect for L1 but not for L2. Yet, the absence of an EEM effect in L2 is not a consistent phenomenon. Ayçiçeği and Harris (2004) found that both L1 and L2 displayed an EEM effect, and the effect was even stronger in L2. In a follow-up to their research, the authors modified the incidental study task to control how deeply participants processed the emotional connotation of the stimuli. This time, the results revealed an EEM effect for L1 but not for L2, provided that the study phase required deep encoding (Ayçiçeği-Dinn and Caldwell-Harris, 2009). A possible explanation for these different results may be that encoding of L2 is more effortful than encoding of L1. When L2 and L1 words are mixed during encoding, as was the case in the procedures employed by Ayçiçeği and Harris (2004) and Ayçiçeği-Dinn and Caldwell-Harris (2009), the elaborative processes and novelty effects associated with L2 may interfere with the processing of words presented in L1. To avoid such carry-over effects and distracting task-switching, it could be important to present words in separate blocks for each language. Furthermore, if deep and elaborative processing is encouraged by task instructions for both L2 and L1, this may lead to a more comparable encoding depth for L2 and L1. Both aspects were taken into account in the experimental set-up of the present study.

The present study aimed to explore the link between embodied processes and memory for emotional content within the frame of L1 and L2 processing. A group of late Spanish/English bilinguals underwent a classical memory task involving encoding and retrieval of both emotionally charged and neutral words in both languages. Facial muscle EMG activity and SC responses were obtained during the encoding phase, in which participants performed a categorization task that required them to categorize words as "associated to emotion" or "not associated to emotion." This specific categorization task was meant to induce deep word processing, which is thought to encourage participants to internally simulate the words' emotional content (Niedenthal et al., 2009). We predicted that the processing of emotional L2 words would elicit less facial motor resonance and reduced SC responses in comparison to the processing of emotional L1 words. Since embodied simulations have been shown to be modifiable on different levels, including their magnitude, onset, and duration (Simmons et al., 2008), we theorized that the presumably weaker facial motor resonance to an L2 could lead to a lower magnitude, to a delayed and shortened mimetic muscle response, or both. Furthermore, we hypothesized that any reduction of embodied simulations in L2 would impact the initial perception and later memory of emotional words. Regarding the encoding phase, this means that accuracy for categorizing emotional words should be reduced in L2. The presumed link between facial motor resonance and memory for emotional content (Baumeister et al., 2015) raises the expectation that, in terms of memory performance, the EEM effect would be present in L1 but absent or reduced in L2. Finally, we speculated that participants' levels of facial motor resonance would covary with the retrieval accuracy of emotional word types in L1. Since processing information in L2 may depend more than L1 on executive control functions (Keysar et al., 2012), our hypothesis regarding correlations between motor resonance activation and memory for emotional words in L2 was less definite.

<sup>1</sup>This study only reviews, discusses, and draws conclusions from studies in which participants acquired their second language in late childhood, meaning after the age of 8. Whenever we use the term L2 within this study, we refer to an L2 that was learned after the age of 8.

## MATERIALS AND METHODS

## Participants

Thirty-two young healthy late bilinguals (17 females; mean age: 26.4 ± 5.2 years) recruited in Southern California and bordering areas of Mexico participated in the experiment. Participants' first-learned (native) language was either English (13 participants) or Spanish (19 participants), and they spoke their second language (Spanish or English, respectively) at an advanced level. Participants self-assessed (cf. Marian and Neisser, 2000; Pavlenko, 2005) their levels of speaking, reading, and understanding of L2 on a scale from 1 (very poor) to 10 (perfect) as being on average very good [speaking: M = 8.5, SD = 1.1; reading: M = 8.8, SD = 1.1; understanding: M = 8.9, SD = 1.1; ratings ranged from 7 (good) to 10 (perfect)]. Participants reached fluency in their L2 at a mean age of 15 (SD = 6.3; age range from 8 to 32 years). Although most participants had started learning their L2 in a classroom setting, all participants reported having spent a minimum of 12 months in a country where their L2 was the native language (either because of immigration or because of a student exchange) and having reached fluency in their L2 during that time. In a classical 1-min letter fluency task, participants produced on average 11 words in L1 (SD = 3.1) and nine words in L2 (SD = 3.2). A paired-samples t-test showed that this difference was significant, t(28) = 2.5, p = 0.02. Three participants were excluded from all analyses, two because of insufficient fluency in L2 and one due to a self-judged onset of fluency in L2 earlier than age 8. Technical malfunctioning led to a loss of data from three further participants, leaving data of 26 participants for the analyses.

## Stimuli

### Pilot

All final stimuli were selected from a large pool of words composed mainly of word pairs (English–Spanish translations) from an affective database for English and Spanish words (Conrad et al., n.d.) and some emotional words used in a previous research project. A total pool of 345 word pairs was selected for piloting. Words were pre-selected to be associated with the concepts of Happiness or Anger (see also Niedenthal et al., 2009, for the same emotion concepts) or to be neutral according to the previous ratings they had received. To allow comparability, all words were rated again in both languages by 24 native speakers (12 native English speakers rating the English words; 12 native Spanish speakers rating the Spanish words) in an online survey. Not all participants completed all blocks of the pilot, but all words received ratings by at least 10 participants. Ratings were done in regard to the words' association to the concepts of anger, happiness, and overall emotionality on a scale from 1 (not at all associated to) to 9 (very much associated to). Further ratings were done by the same native speakers in regard to imageability and arousal, again on scales from 1 to 9.

## Final Word-Selection

In order to be selected as a stimulus, the happy and angry word pairs were required to be rated across both languages > 3.5 on the respective emotion scale, to be associated with emotional content in general (rating > 3.0), and not to be associated with the other emotion considered (rating < 2.5). Word pairs were considered neutral when they were rated in both languages below 3.3 on both the general and the specific emotionality scales. In this way, 320 word pairs were selected. Happy words were rated higher on the happiness scale (M = 6.8, SD = 1.10) than angry (M = 1.1, SD = 0.13) and neutral words (M = 1.7, SD = 0.52), and angry words were rated higher on the anger scale (M = 6.2, SD = 1.5) than happy (M = 1.5, SD = 0.60) and neutral words (M = 1.5, SD = 0.24). Furthermore, neutral words were rated lowest on overall emotionality (M = 1.6, SD = 0.60) in comparison with both happy (M = 5.7, SD = 1.6) and angry words [M = 6.0, SD = 1.5; all Fs(2,639) > 1006, ps < 0.001]. There were no differences between the languages in either the individual emotion ratings or in arousal, imageability, frequency, or word length (all ps > 0.25). Furthermore, the individual word categories did not differ in frequency, word length, or imageability, both when analyzed separately for each language [all Fs(2,319) < 1.72, ps > 0.18] and when averaged across languages [all Fs(2,639) < 2.0, all ps > 0.13]. As expected, neutral words were less arousing than happy and angry words [F(2,639) = 379, p < 0.001]. Detailed information on individual variable ratings per language and word category is given in Supplementary Table 1. All words were divided into eight word lists (four English and four Spanish word lists). Each word list was composed of 20 happy words (e.g., Joy, Sweets), 20 angry words (e.g., Murderer, Harassment), and 40 neutral words (e.g., Code, Subject).
These word lists were assembled to form four sets, each including one English and one Spanish word list. An English word and its Spanish translation were never assigned to the same set. The four sets (each composed of one English and one Spanish word list and thus containing 160 words) were designed not to differ in word frequency (based on the English and Spanish versions of SUBTLEX; Brysbaert and New, 2009; Cuetos et al., 2011), word length, imageability, and arousal within the English [Fs(3,319) < 2.0, ps > 0.12] and Spanish words [Fs(3,319) < 1.4, ps > 0.26]. Furthermore, English and Spanish emotion-laden words (angry and happy) were separately matched across all four sets in terms of their ratings on the respective emotion scale (anger and happiness scale), whereas neutral words were matched for their average overall emotionality ratings [all Fs(3,156) < 0.34, all ps > 0.80]. Finally, it was ensured that within each set, words differed significantly on happiness, anger, and overall emotionality ratings [all Fs(2,159) > 247, all ps < 0.001]. See Supplementary Table 2 for the full list of words.
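The selection thresholds above can be expressed as a simple classification rule. The following Python sketch is illustrative only: `WordRatings` and `classify_pair` are hypothetical names, and the sketch assumes each threshold had to be met in every language version separately (the phrase "across both languages" could also mean ratings averaged over languages).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class WordRatings:
    """Pilot ratings for one language version of a word (1-9 scales)."""
    happiness: float
    anger: float
    emotionality: float


def classify_pair(ratings: Mapping[str, WordRatings]) -> Optional[str]:
    """Assign a word pair to 'happy', 'angry', or 'neutral'; None if it fails every rule.

    Assumption: each threshold must hold in each language version separately.
    """
    versions = list(ratings.values())
    # Happy: high on happiness, emotional overall, low on anger
    if all(r.happiness > 3.5 and r.emotionality > 3.0 and r.anger < 2.5 for r in versions):
        return "happy"
    # Angry: high on anger, emotional overall, low on happiness
    if all(r.anger > 3.5 and r.emotionality > 3.0 and r.happiness < 2.5 for r in versions):
        return "angry"
    # Neutral: below 3.3 on the general and on both specific emotion scales
    if all(max(r.happiness, r.anger, r.emotionality) < 3.3 for r in versions):
        return "neutral"
    return None


# Hypothetical ratings for one English-Spanish pair
pair = {"en": WordRatings(7.0, 1.2, 6.0), "es": WordRatings(6.5, 1.0, 5.5)}
print(classify_pair(pair))  # happy
```

Pairs that return `None` would simply be dropped from the stimulus pool; the subsequent matching of sets on frequency, length, imageability, and arousal is a separate step not covered here.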

## PROCEDURE


The study protocol was approved by the Ethics Committee of the University of California, San Diego. The experiment involved an encoding phase and a surprise retrieval phase on two consecutive days. Instructions were presented in the participant's native language and described the experiment as investigating the effect of SC on word processing in a native and second language. Participants were informed neither about the hypothesis concerning facial muscle activity nor about the subsequent memory test.

## Encoding Phase

Upon arrival for the first session, participants first read and signed a consent form and then completed the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007) to assess their language ability in L1 and L2. Potential baseline differences in daily affect and mood were assessed by the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988), presented in the respective L1 (for the Spanish version, see Sandín et al., 1999). To lower skin impedance for better EMG signals, the skin areas above the zygomaticus major and corrugator supercilii were prepared with cotton pads and abrasive lotion. Residue of the abrasive lotion was removed with cotton pads soaked in alcohol. Bipolar 4-mm EMG electrodes were attached to the corrugator supercilii and the zygomaticus major on the left side of the face. SC was recorded with gold-plated electrodes on the index and middle fingers of the non-dominant hand.

During the first session, participants performed a classification task, requiring them to categorize words into "associated to emotion" or "not associated to emotion" by pressing one of two buttons on a keyboard. Labels to the left and right of the computer monitor served as reminders of the button-assignment throughout the experiment. The English and Spanish word lists of one of the four sets were presented in separate blocks on a computer screen, divided by a short break (5 min). The order of word-lists (English word list vs. Spanish word list) within each set and the set presentation itself were randomized across participants. Each trial started with a fixation-cross presented for 2 s. This was followed by the randomly ordered display of the word stimulus for 3 s and was concluded with the display of a question mark, all presented via E-Prime software (Psychology Software Tools, Pittsburgh, PA, USA). To avoid movement interference with the EMG recordings, participants were instructed to wait for the question mark to give their response. After the response, the next trial started. Every participant started with a short practice session (six trials).

## Retrieval Phase

The second session took place 24 h later and started with participants filling out the PANAS for a second time. This was followed by a surprise memory task. Words from the old and the new set were mixed and randomly presented within a Spanish and English block, providing an equal number of old and new words. Each word was presented individually on a computer screen. Participants' task was to indicate whether the word had been presented during the encoding phase (old) or whether it was new (new) by pressing one of two buttons. Signs left and right of the computer screen indicated the button-assignment and served as reminders.

## EMG/SC Acquisition

Electromyography and skin conductance signals were recorded using BIOPAC MP150 modules (BIOPAC Systems, Inc., Goleta, CA, USA) set at a sample rate of 2,000 samples per second with gain set to 100. The signals were filtered using a bandpass from 20 to 400 Hz and a notch filter at 60 Hz. The acquisition of the EMG signals was controlled by BIOPAC's AcqKnowledge software Version 3.0.21 (Mindware Technologies LTD., Gahanna, OH, USA).

## RESULTS

## PANAS

The self-ratings of the PANAS revealed no significant differences between the first and second session of the experiment (ps > 0.25), and were thus disregarded in further analyses.

## EMG/SC Data Preparation

Offline processing of the EMG data was performed with software from Mindware Corporation. EMG signals were rectified, integrated, and averaged over a period of 2500 ms after stimulus onset in chunks of 500 ms. Data were cleaned for each participant and each muscle individually by removing data points more than three SD above or below the participant's mean. The baseline (recorded during the presentation of the fixation cross, 500 ms before the target) was subtracted from the mean activity, and the baseline-corrected score was used as the dependent variable. The same cleaning was applied to the SC data, with the only difference that, because SC responses are usually delayed by around 1–3 s after cue onset (see Dawson et al., 2007), SC signals were averaged in chunks of 500 ms from 1500 to 3000 ms after stimulus onset. For the SC, the root mean square was calculated and all data were standardized. All EMG and SC data associated with trials in which the word was incorrectly categorized during the encoding phase were excluded from the analyses (8%).
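The processing steps above were carried out with proprietary Mindware software; as a rough illustration of what they do, here is a minimal Python sketch using only the standard library. All function names are hypothetical, the integration (smoothing) step is not reproduced, and applying the 3-SD cleaning to chunk-level scores rather than raw samples is an assumption, since the text does not specify the order.

```python
import math
import statistics

SAMPLE_RATE_HZ = 2000  # acquisition rate reported in the Methods
CHUNK_MS = 500         # averaging window reported in the Methods


def chunk_means(samples, n_chunks, sample_rate=SAMPLE_RATE_HZ, chunk_ms=CHUNK_MS):
    """Average consecutive 500-ms chunks of a signal."""
    size = sample_rate * chunk_ms // 1000  # samples per chunk
    return [statistics.fmean(samples[i * size:(i + 1) * size]) for i in range(n_chunks)]


def emg_trial_scores(baseline, post_onset, n_chunks=5):
    """Rectify, average in 500-ms chunks, and subtract the mean rectified baseline."""
    base = statistics.fmean(abs(x) for x in baseline)  # 500 ms pre-target fixation
    rectified = [abs(x) for x in post_onset]
    return [c - base for c in chunk_means(rectified, n_chunks)]


def drop_outliers(values, n_sd=3.0):
    """Remove values lying more than n_sd sample SDs from the mean."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]


def rms(values):
    """Root mean square, as described for the SC signal."""
    return math.sqrt(statistics.fmean(v * v for v in values))


def zscore(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]


# Synthetic trial: 500 ms of baseline, then 2500 ms of post-onset signal
baseline = [1.0, -1.0] * 500
post = [2.0, -2.0] * 2500
print(emg_trial_scores(baseline, post))
```

With the 2000 Hz rate, each 500-ms chunk contains 1000 samples, so the five post-onset chunks cover the 0–2500 ms window used for the EMG analyses.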

## Statistical Analysis EMG Data

To examine the relationship between EMG data, language, and word-type, we ran two separate repeated-measures ANOVAs, one for each muscle (zygomaticus and corrugator; see **Figure 1**). The within-subjects factors were language (L1 vs. L2), word-type (happy vs. angry vs. neutral), and time point (0–500 ms, 500–1000 ms, 1000–1500 ms, 1500–2000 ms, 2000–2500 ms). The Greenhouse–Geisser epsilon correction was applied throughout all analyses to adjust the degrees of freedom of the F-ratios when necessary. Furthermore, all p-values of paired-samples t-tests are two-tailed and were Holm–Bonferroni corrected. For the zygomaticus muscle, this analysis revealed a strong effect of word-type [F(2,40) = 5.80, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.19] and a strong interaction of word-type and time point [F(3,67) = 5.32, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.18]. However, neither the interaction between language and word-type [F(2,50) = 0.15, p = 0.87, η<sub>p</sub><sup>2</sup> = 0.006] nor that between language, word-type, and time point [F(4,97) = 0.94, p = 0.49, η<sub>p</sub><sup>2</sup> = 0.04] was significant. For the corrugator muscle, a marginal interaction between language and word-type was present [F(2,47) = 2.47, p = 0.09, η<sub>p</sub><sup>2</sup> = 0.09]. The expected main effect of word-type was not significant despite a medium-sized effect [F(2,39) = 1.29, p = 0.28, η<sub>p</sub><sup>2</sup> = 0.05]. The interaction between language, word-type, and time point was also not significant [F(8,200) = 0.90, p = 0.52, η<sub>p</sub><sup>2</sup> = 0.04].
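The Holm–Bonferroni correction mentioned above is a step-down procedure: the m p-values are sorted in ascending order, the k-th smallest (counting from 1) is multiplied by (m − k + 1), and the adjusted values are forced to be non-decreasing. A minimal standard-library Python sketch follows; `holm_adjust` is a hypothetical name and not part of the authors' analysis code.

```python
def holm_adjust(p_values, alpha=0.05):
    """Holm-Bonferroni step-down adjustment.

    Returns (adjusted p-values, rejection flags) in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted, [p <= alpha for p in adjusted]


# Hypothetical p-values from four pairwise comparisons
adj, rej = holm_adjust([0.01, 0.04, 0.03, 0.005])
print(rej)  # [True, False, False, True]
```

Compared with a plain Bonferroni correction (multiplying every p-value by m), Holm's procedure controls the family-wise error rate at the same level while rejecting at least as many hypotheses.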

Despite the absence of the expected interactions, visual inspection of the data (see **Figure 1**) suggested that the zygomaticus and corrugator muscles were differently activated by L1 and L2, particularly in later time windows. We therefore conducted further exploratory analyses of facial muscle reactivity in response to emotional words in L1 and L2 at each time point. In a series of pairwise comparisons, EMG activity in response to happy vs. angry words was compared over the five time intervals, separately for the zygomaticus and the corrugator muscles.

### Zygomaticus

Significant differences in response to happy vs. angry words in L1 were recorded in the time window 1500–3000 ms after word onset. In L2, by comparison, zygomaticus activity differentiated between happy and angry words only in the time window 2000–2500 ms after word onset. This result indicates a later onset and shorter duration of differential zygomaticus activity in response to happy vs. angry words in L2 compared with L1 (see **Figure 1**). A direct comparison of the averaged differential activity of the zygomaticus in response to happy vs. angry words in L1 vs. L2 did not reach significance [t(25) = 1.02, p = 0.22, Cohen's d = 0.26].

#### Corrugator

Exploring the differential activity of the corrugator in response to angry vs. happy words across time revealed significant results only within L1. Corrugator activity was stronger in response to angry than to happy words in L1, starting from 2000 ms after stimulus onset. No significant differences in corrugator activity in response to angry vs. happy words were observed in L2 (see **Figure 1**). A direct comparison of the averaged differential activity of the corrugator in response to angry vs. happy words in L1 vs. L2 gave some indication that the corrugator activity evoked by L2 words was lower than in L1 [t(25) = 1.25, p = 0.06, Cohen's d = 0.39].

## Statistical Analysis of SC Data

Since the skin conductance response (SCR) is sensitive to arousal but not to valence, it was averaged across both emotional word-types (happy and angry) within each of the three 500-ms time windows from 1500 to 3000 ms after cue onset. SCR scores were submitted to a repeated-measures ANOVA with language (L1 vs. L2), word-type (emotional vs. neutral), and time point (1500–2000 ms, 2000–2500 ms, 2500–3000 ms) as within-subjects factors. The analysis resulted in a marginal three-way interaction [F(1,30) = 3.3, p = 0.07, η<sub>p</sub><sup>2</sup> = 0.12] and a significant interaction between language and emotion [F(1,25) = 4.5, p = 0.05, η<sub>p</sub><sup>2</sup> = 0.15]. We therefore disregarded the factor time point and averaged the SC responses across the time windows from 1500 to 3000 ms post cue onset (**Figure 2**). Pairwise comparisons with these values revealed a significantly stronger SCR to emotional words in L1 (M = 0.03, SD = 0.10) than in L2 [M = −0.01, SD = 0.11, t(25) = 2.2, p = 0.04, Cohen's d = 0.68]. No differences between L1 (M = 0.02) and L2 (M = 0.01) were present for the neutral words [t(25) = 0.43, p = 0.67, Cohen's d = 0.12].
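The pairwise L1 vs. L2 comparisons above are paired-samples t-tests on per-participant scores. A minimal standard-library sketch is shown below; `paired_t` is a hypothetical name, and the effect size it returns is Cohen's d<sub>z</sub> (mean difference divided by the SD of the differences). The text does not state which variant of Cohen's d was reported, so the values this sketch produces need not match those above. A p-value requires the t-distribution CDF, which is not in the Python standard library (SciPy's `scipy.stats.ttest_rel` provides it).

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic, degrees of freedom, and Cohen's d_z.

    d_z = mean difference / SD of differences; other d variants exist.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d / sd_d


# Hypothetical per-participant SC scores, already averaged over 1500-3000 ms
l1 = [1.0, 2.0, 3.0, 4.0]
l2 = [0.0, 1.0, 1.0, 3.0]
print(paired_t(l1, l2))
```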

## Statistical Analysis of the Behavioral Data

### Encoding Phase

The dependent variable was the accuracy in discriminating emotional words from neutral words, expressed by the sensitivity index d' (Green and Swets, 1966/1974). A paired-samples t-test showed that performance was significantly better in L1 than in L2 [t(28) = 2.1, p = 0.05, Cohen's d = 0.38]. Since it could be argued that this was simply due to lower proficiency in L2 rather than to reduced emotionality, we investigated whether L1 and L2 differed in response bias, as indexed by the measure c. The criterion c is defined as the distance, in SD units, between the decision criterion and the neutral point, where neither response is favored. This analysis revealed a subtle bias toward judging words to be neutral in L2 (c = 0.04) and to be emotional in L1 (c = −0.30). The difference between these two response tendencies was marginally significant [t(28) = 2.0, p = 0.09]. To further investigate the suspected reduced emotionality in L2, two paired-samples t-tests compared accuracy (proportion correct) across languages, separately for emotional and for neutral words. As expected, this revealed no significant difference between L1 (M = 0.85, SD = 0.08) and L2 (M = 0.82, SD = 0.11) for neutral words [t(27) = 1.6, p = 0.12, Cohen's d = 0.31]; however, unexpectedly, it also revealed no effect for emotional words [L1: M = 0.84, SD = 0.12; L2: M = 0.80, SD = 0.16; t(27) = 1.3, p = 0.20, Cohen's d = 0.28]. In both cases, L1 showed a tendency toward higher accuracy than L2.
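The sensitivity index d' and the criterion c used in the encoding analysis follow from standard signal detection theory: d' = z(H) − z(FA) and c = −[z(H) + z(FA)]/2, where H is the hit rate, FA the false-alarm rate, and z the inverse of the standard normal CDF. The sketch below treats emotional words as the "signal", so a negative c corresponds to the bias toward answering "emotional" reported for L1. The log-linear correction for rates of 0 or 1 and the function name `sdt_indices` are assumptions for illustration; the section does not say which correction, if any, was applied.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for the emotional-vs-neutral judgment.

    Hit: emotional word judged emotional.
    False alarm: neutral word judged emotional.
    Assumption: a log-linear correction (+0.5 per cell, +1 per total)
    keeps rates strictly between 0 and 1 so that z() is defined.
    """
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = _z(h) - _z(fa)
    c = -(_z(h) + _z(fa)) / 2  # negative c = liberal, favors "emotional"
    return d_prime, c
```

Computing these per participant and language yields the paired values that enter the L1 vs. L2 t-tests on d' and on c.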

Several interpretations of these results seem possible. For example, one might assume that the difference in d' is simply a matter of fluency, which would affect neutral and emotional words equally. However, the marginal response bias toward judging words as emotional in L1 but not in L2 speaks against this explanation. An alternative explanation rests on the fact that the present study used emotional stimuli of varying emotional intensity. Some previous studies and reviews have suggested that the facilitative role of facial motor resonance in the encoding of facial expression stimuli may depend on their emotional intensity (Adolphs, 2002; Oberman et al., 2007; Baumeister et al., 2016). For example, Baumeister et al. (2016) found that blocking facial muscles mainly affected the perception of slightly emotional stimuli, whereas the processing of neutral and strongly emotional stimuli was less affected or not affected at all. This raises the question of whether a specific impairment in encoding emotional L2 words was overshadowed by L2 words of stronger emotional intensity. To investigate this possibility, we divided the emotional words into slightly and highly emotional words. Note that the original selection criteria for the emotional words (described in the Stimuli subsection) admitted a broad range of emotionality ratings, including words with only moderate ratings. Thus, to better understand the impact this broad range might have had, words rated between 3.0 and 6.5 in the pilot, on an overall emotionality scale ranging from 1 to 9, were classified as slightly emotional, and words rated above 7.5 on the same scale were classified as highly emotional. See Supplementary Table 3 for the ratings of the slightly and highly emotional words on all variables and the associated statistical comparisons.
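The post hoc split of emotional words by pilot rating can be expressed as a small classification rule. This sketch assumes the bounds are inclusive at 3.0 and 6.5 and that "above 7.5" means strictly greater than 7.5; words rated below 3.0 or between 6.5 and 7.5 fall into neither group. `intensity_bin` and `split_by_intensity` are hypothetical names, and the word labels in the usage note are placeholders.

```python
def intensity_bin(rating):
    """Classify a pilot emotionality rating (1-9 scale) for an emotional word.

    Assumption: 3.0 and 6.5 are inclusive bounds; 7.5 is exclusive.
    """
    if 3.0 <= rating <= 6.5:
        return "slight"
    if rating > 7.5:
        return "high"
    return None  # outside both ranges: excluded from this post hoc split

def split_by_intensity(ratings):
    """Group words (mapping word -> rating) into 'slight' and 'high' bins."""
    groups = {"slight": [], "high": []}
    for word, rating in ratings.items():
        label = intensity_bin(rating)
        if label is not None:
            groups[label].append(word)
    return groups
```

For example, `split_by_intensity({"word_a": 4.0, "word_b": 8.0, "word_c": 7.0})` places `word_a` in the slight bin, `word_b` in the high bin, and drops `word_c`.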
To test the prediction that slightly emotional L2 words specifically were impaired during categorization, a 2 (emotionality: high vs. low) × 2 (language: L1 vs. L2) repeated-measures ANOVA was conducted, with both factors within subjects. This revealed a significant main effect of emotionality [F(1,27) = 35, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.56] and a marginal interaction between emotionality and language [F(1,27) = 3.31, p = 0.08, η<sup>2</sup><sub>p</sub> = 0.11]. Pairwise comparisons showed that participants performed equally well at categorizing highly emotional words in L1 and L2. In contrast, they exhibited a strong bias to incorrectly categorize slightly emotional L2 words as neutral, a tendency that was significantly less pronounced in L1 (see **Figure 3**).
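For a 2 × 2 design with both factors within subjects, the interaction F(1, n − 1) equals the square of a one-sample t statistic computed on each participant's difference of differences, and partial η² follows as F/(F + df_error); with the reported F(1,27) = 35 this gives 0.56, matching the value above. The sketch below uses that identity. The input layout (one list of per-participant scores per cell, in the same participant order) and the function names are assumptions, and it returns F, degrees of freedom, and partial η² but no p-value.

```python
from math import sqrt
from statistics import fmean, stdev

def partial_eta_squared(f, df_error):
    """Partial eta squared for an effect with one numerator df."""
    return f / (f + df_error)

def interaction_f(l1_high, l2_high, l1_low, l2_low):
    """Language x emotionality interaction in a 2x2 within-subjects design.

    Each argument holds one score per participant (same participant order).
    Returns (F, (df1, df2), partial_eta_squared).
    """
    # Per-participant interaction contrast: language effect for highly
    # emotional words minus language effect for slightly emotional words.
    contrast = [(a - b) - (c - d)
                for a, b, c, d in zip(l1_high, l2_high, l1_low, l2_low)]
    n = len(contrast)
    t = fmean(contrast) / (stdev(contrast) / sqrt(n))
    f = t * t
    df_error = n - 1
    return f, (1, df_error), partial_eta_squared(f, df_error)
```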

### Retrieval Phase

Since we were interested in memory for emotional vs. neutral words, we controlled for participants' individual differences in emotion perception by excluding words that had been incorrectly classified during the encoding phase. This was done for each participant individually and ensured that our analysis of the memory task for emotional and neutral words was run only on words that had been perceived according to their piloted emotional category (see Baumeister et al., 2015, for a similar procedure). The dependent variable was memory performance indexed by d', which here measures how well participants discriminated between old and new words. This d' index was computed separately for neutral and emotional words and submitted to a repeated-measures ANOVA with the within-subjects factors language (L1 vs. L2) and word-type (emotional vs. neutral). The analysis did not yield any main effects or an interaction (all Fs < 2.0, ps > 0.18). However, because of our a priori interest in the EEM effect within L1 and L2, pre-planned pairwise comparisons of emotional vs. neutral words in L1 and L2 were conducted. As expected, participants showed enhanced memory for emotional words (d' = 1.46, SD = 0.49) in contrast to neutral words (d' = 1.26, SD = 0.55) in L1 [t(25) = 2.20, p = 0.04, Cohen's d = 0.38]. Importantly, the EEM effect was absent in L2 [t(25) = 0.10, p = 0.93], where participants' memory for emotional words (d' = 1.44, SD = 0.65) and neutral words (d' = 1.43, SD = 0.59) did not differ (Cohen's d = 0.02). Since visual inspection of the EEM effects (**Figure 4**) suggested that the difference between the EEM effects in L1 and L2 was mainly driven by enhanced memory for neutral words in L2 as compared with L1, a paired-samples t-test tested whether this difference was significant. It was not [t(25) = 1.40, p = 0.17, Cohen's d = 0.30]. Finally, two pairwise comparisons showed that the effect size of the EEM effect in L1 was comparable for angry and happy words, although the p-value reached conventional significance levels for happy words only [angry vs. neutral: t(25) = 1.80, p = 0.08, Cohen's d = 0.44; happy vs. neutral: t(25) = 2.50, p = 0.02, Cohen's d = 0.46; see **Figure 4**].
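The section does not state how Cohen's d was computed for the paired comparisons. Dividing the mean difference by the average of the two condition SDs (often called d_av) reproduces the reported values from the rounded summary statistics: 0.38 for emotional vs. neutral words in L1, 0.02 in L2, and 0.30 for neutral words in L2 vs. L1. The sketch below shows that calculation; treat it as a consistency check, not as the authors' documented procedure. A paired effect size based on the SD of the difference scores (d_z) would generally give different values and cannot be computed from these summaries.

```python
def cohens_d_av(mean_a, sd_a, mean_b, sd_b):
    """Mean difference divided by the average of the two condition SDs.

    Assumption: this d_av variant is inferred from the reported numbers,
    not stated in the source.
    """
    return (mean_a - mean_b) / ((sd_a + sd_b) / 2)

# Reported L1 memory values: emotional d' = 1.46 (SD 0.49), neutral d' = 1.26 (SD 0.55)
print(f"{cohens_d_av(1.46, 0.49, 1.26, 0.55):.2f}")  # prints 0.38
```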

#### Correlation between Motor Resonance and Memory for Emotional Words

We further explored the relationship between facial motor resonance and retrieval performance for emotional words by looking at possible correlations between them. For this, we averaged the muscle activity recorded between 1000 and 2500 ms after stimulus onset separately for each muscle (corrugator and zygomaticus), emotional word-type (happy and angry), language (L1 and L2), and participant. This time window was chosen because the previous analyses had revealed a rather late onset of differential muscle activity (see **Figure 1**). A series of bivariate correlation analyses assessed whether individual differences in the intensity of corrugator and zygomaticus activation correlated with retrieval performance for happy and angry words in L1 and L2. The results did not support any relationship between individual levels of facial motor resonance and memory for emotional words in either L1 or L2 [all rs(26) < 0.31, all ps > 0.13].
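A minimal sketch of this correlation analysis: for one muscle × word-type × language cell, average each participant's EMG trace over 1000–2500 ms after stimulus onset, then correlate those means with the same participants' retrieval scores. The trace layout (first sample at stimulus onset), the sampling rate `fs_hz`, and all function names are assumptions; the degrees of freedom for r are n − 2, and the sketch stops at r without computing a p-value.

```python
from math import sqrt
from statistics import fmean

def mean_in_window(trace, fs_hz, start_ms=1000, end_ms=2500):
    """Mean amplitude in [start_ms, end_ms) after stimulus onset.

    Assumption: trace[0] is the sample at stimulus onset.
    """
    i0 = round(start_ms * fs_hz / 1000)
    i1 = round(end_ms * fs_hz / 1000)
    return fmean(trace[i0:i1])

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def motor_memory_correlation(traces, retrieval_scores, fs_hz):
    """r and df between window-averaged EMG and retrieval, one value per participant."""
    emg_means = [mean_in_window(t, fs_hz) for t in traces]
    return pearson_r(emg_means, retrieval_scores), len(emg_means) - 2
```

Restricting `traces` and `retrieval_scores` to slightly emotional words gives the second series of correlations described next.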

Since previous studies have suggested that facial motor resonance has a stronger impact on the recognition of slightly emotional stimuli (Adolphs, 2002; Oberman et al., 2007; Baumeister et al., 2016), we asked whether similar patterns would emerge for memory of such stimuli. For this, we restricted a second series of bivariate correlation analyses to words that had been rated 3.0–6.5 on the general emotionality scale (see section Statistical Analysis of the Behavioral Data). This time, the analyses revealed marginal correlations between the level of zygomaticus activation and the percentage of correctly retrieved slightly happy L1 words [r(26) = 0.49, p = 0.08], and between the level of corrugator activation and the percentage of correctly retrieved slightly angry L1 words [r(26) = 0.35, p = 0.09]. Thus, the more zygomaticus or corrugator activation a participant showed during encoding in response to slightly happy or slightly angry L1 words, respectively, the more likely she or he was to remember those words later. A similar, non-significant pattern was observed between zygomaticus activation and retrieval of slightly happy L2 words [r(26) = 0.42, p = 0.10]. No relationship was found between corrugator activation and memory for slightly angry L2 words [r(26) = 0.01, p = 0.97].

## DISCUSSION


The goal of the present study was to test hypotheses derived from embodied cognition in the context of memory processes for emotional language in L1 and L2. First, we investigated the hypothesis that processing emotional words in L2 would evoke less embodied simulation than processing them in L1. The results partially supported this hypothesis: although the expected interactions between muscle resonance and language were either non-significant or only marginally significant, visual inspection and pairwise comparisons indicated a tendency toward reduced motor resonance in response to emotional L2 words in both the corrugator and the zygomaticus muscle.

In a second step, we tested the hypothesis that L2 processing would interfere with both the categorization and the later retrieval of emotional words. The results were again not decisive but support the notion that L1, but not L2, evokes an EEM effect. Finally, the correlation analyses gave some indication that participants with strong facial motor resonance during encoding were more likely to retrieve emotional words during the memory task. This pattern, however, seemed to be restricted to slightly emotional L1 words and to apply only partially to L2.

The overall results of the EMG and SC recordings suggest some reductions and differences in the embodied simulation of emotional L2 words compared with emotional L1 words. They extend previous findings on L2 (Foroni, 2015) to SC and to corrugator muscle activity, which had not been investigated in L2 so far. Even though the expected two-way interaction of language and word-type was not significant for zygomaticus activity and only marginal for the corrugator, some interesting results emerged. For example, whereas pairwise comparisons across time indicated that the zygomaticus showed typical activity in response to emotional L1 words, this activation pattern appeared delayed and short-lived in response to emotional L2 words (see **Figure 4**). The difference between L1 and L2 processing was particularly clear for the corrugator muscle, which showed typical response patterns to emotional L1 words but no detectable responses to emotional L2 words. Similarly, SC was significantly increased in response to emotional words presented in L1 as compared with L2. Direct comparisons of facial muscle responsiveness to L1 vs. L2 revealed a significant difference only for the corrugator muscle, not for the zygomaticus. However, although the difference was not significant, the zygomaticus showed a similar pattern of decreased responsiveness in L2. Interestingly, Foroni (2015), in testing the somatic correlates of different linguistic forms, assessed only zygomaticus muscle activation and reported a significant reduction in muscle activation in L2 compared with L1. The present results complement and extend Foroni's results by showing a significant difference for the corrugator muscle; together, they support an embodiment account of emotion processing and reduced embodiment in L2.

An explanation for the difference between zygomaticus and corrugator activation in L2 found here could lie in the emotional content that activates each muscle. Whereas the zygomaticus reacts to positive stimuli, the corrugator muscle predominantly reacts to negative stimuli. Two recent studies argued that negative words in particular may be at risk of emotional disembodiment during L2 reading, potentially reflecting a positivity bias in L2 processing. A positivity bias is thought to occur when second language acquisition coincides with positive life experiences (Conrad et al., 2011; Sheikh and Titone, 2016). Such a bias may also have contributed to the current findings.

Overall, the results of the EMG and SC recordings give some indication that the processing of emotional L2 words is less grounded in embodied simulation than the processing of L1 words. This aligns with previous reports of decreased SC in L2 (Harris et al., 2003; Harris, 2004; Caldwell-Harris and Ayçiçeği-Dinn, 2009; Eilola and Havelka, 2010) and agrees with studies that have shown reduced behavioral responsiveness to emotional language in L2 (Colbeck and Bowers, 2012; Keysar et al., 2012). Such results may indicate that L2 learning in adulthood does not necessarily involve the same affective linguistic grounding as L1 learning in childhood. When conceptual and emotion regulation systems have already reached a relatively stable state, the affective grounding of abstract symbols, such as words, may remain shallow. This aspect is particularly relevant for intercultural communication, where a weaker somatic base of L2 could, on the one hand, create emotional barriers for L2 speakers and, on the other hand, be beneficial if emotional distance helps to counteract biases (Keysar et al., 2012; Caldwell-Harris, 2015).

The secondary hypothesis, which stated that differences in embodied simulation between L1 and L2 would additionally be associated with performance differences during the encoding and retrieval phases, was partially supported. During the encoding phase, participants were more accurate at categorizing L1 than L2 words. Contrary to expectation, this effect reflected difficulty in categorizing both emotional and neutral L2 words. This finding runs against the expected specific difficulty in categorizing emotional L2 words and could indicate participants' generally lower fluency in their L2. Further investigation and a post hoc analysis of this alternative account, however, suggested a specific difficulty in identifying slightly, but not strongly, emotional stimuli in L2, which speaks against a general proficiency effect.

The finding that the processing of slightly emotional words is particularly linked to motor resonance is in line with the assumption that motor resonance functions primarily as an ancillary source of information, providing additional feedback to the cognitive processes in question (e.g., Adolphs, 2002; Oberman et al., 2007; Oosterwijk et al., 2015). Under clear-cut conditions, the information provided by facial feedback (e.g., Strack et al., 1988; but see also Wagenmakers et al., 2016) may be superseded by dominant, established cognitive processes associated with the evaluation of emotional content, reducing its influence on perception and behavior. However, when emotional intensity decreases, interpreting the emotional stimulus by means of memory and logical reasoning becomes more difficult. Consequently, to make a fast and appropriate response to an ambiguous stimulus, we rely more on bodily feedback. In this way, the influence of facial motor resonance on the interpretation of emotional stimuli increases when cognitive resources reach their limits or leave us uncertain. However, given the post hoc nature of this analysis, its results should be interpreted with caution (but see Baumeister, 2015, for a similar account and supporting evidence). Although stimuli were carefully matched across languages and emotions, the split of words into slightly and strongly emotional stimuli was not planned. Therefore, other interpretations of the reported difference cannot be completely ruled out. In fact, other variables that we did not control for here may have played a role, and future replications or extensions of this work should investigate this further.

Regarding memory performance, the expected interaction between language and word-type did not appear. Yet the comparison of memory performance for neutral and emotional words in L1 and L2 supported the initial hypothesis, suggesting the presence of an EEM effect in L1 but not in L2.

Even though the results of the memory task were relatively weak, they align with previous findings reporting no EEM effect in L2 (Anooshian and Hertel, 1994). However, they are at odds with reports of an equal or even stronger EEM effect in L2 (Harris, 2004; Ayçiçeği-Dinn and Caldwell-Harris, 2009). Two factors could account for these discrepant results on emotional memory in L2. First, the differences may stem from dissimilar experimental procedures. For example, whereas both the present study and the study by Anooshian and Hertel (1994) presented L1 and L2 words in separate blocks, the two studies by Ayçiçeği and Harris (2004) and Ayçiçeği-Dinn and Caldwell-Harris (2009) presented L1 and L2 stimuli intermixed. This may have caused a novelty effect for L2, which may in turn have inhibited normal processing of L1 words. Second, the differences may reflect the distinct circumstances in which participants acquired their L2. The Turkish participants in the studies by Ayçiçeği and Harris (2004) and Ayçiçeği-Dinn and Caldwell-Harris (2009) acquired their L2 (English) in the classroom and/or through self-instruction and were generally less fluent in it. They may therefore have been more likely to mentally translate L2 words into L1, which could account for the EEM effect in L2. In contrast, most participants in the present study and in the study by Anooshian and Hertel (1994) had been exposed to their L2 through immigration or study abroad and spoke it with high fluency on a daily basis. It is noteworthy, however, that the current study also found large behavioral variance across participants (see **Figure 4**), which corroborates the assumption that L2 processing is determined by factors other than proficiency alone.

The reported absence of an EEM effect in L2 resembles the interference effect observed in our previous study (Baumeister et al., 2015), in which blocking motor resonance specifically impaired retrieval of emotional words. Those results suggested that facial motor resonance plays a crucial role in the processing of emotional content and contributes to the EEM effect. It is thus of particular interest that the current results give some indication that facial motor resonance may be reduced when processing L2 words. This suggests that both the external blocking of facial motor resonance and its reduction during L2 processing are associated with the absence of an EEM effect.

There are some limitations to this study that should be considered when interpreting the current results. First and foremost, the effects were often subtle, and the interactions did not always reach the expected significance levels. Nevertheless, we proceeded with pairwise comparisons based on visual inspection of the data. The weakness of the effects may stem from both the small sample size and the large variation in the data associated with the heterogeneous sample. As mentioned above, the analysis of slightly vs. highly emotional stimuli was exploratory and should be interpreted as such. Its goal was to draw the reader's attention to emotional intensity as a possibly important factor for future motor resonance studies.

Another question that awaits further investigation is the mechanism underlying such reduced affective processing in L2. Lamendella (1977) proposed that implicit linguistic competence, unlike explicit or semantic knowledge, is integrated within the limbic system, involving the striatum and amygdala. This is especially interesting because the amygdala, a structure known to be involved in both experiencing and processing emotional stimuli, shows attenuated activity when facial muscles are blocked by an external force (Hennenlotter et al., 2009). In line with this and the current results, a recent study (Hsu et al., 2014) demonstrated that emotional content in L2 evokes less amygdala activity than it does in L1. These findings point to a general interdependence between affective processes, embodied responses, and the recruitment of limbic structures in emotional language processing.

Overall, the results suggest that reading emotional words in one's native language provides a deep and embodied emotional experience, which may also support their salient encoding and retrieval and, as suggested by previous work, even modulate subsequent judgments (e.g., Berridge and Winkielman, 2003; Halberstadt et al., 2009; Foroni and Semin, 2011, 2012). Together with the results reported by Foroni (2015), the present results suggest that cognitive processes associated with L2 encoding and retrieval are less closely tied to embodied processes, reinforcing the idea that embodied cognition and emotional memory are linked. Some research already shows how differences in embodiment between L1 and L2 affect individuals (e.g., Puntoni et al., 2009; Keysar et al., 2012). Future research should investigate memory processes, as well as other domains in which the differences between L1 and L2 may have a significant impact, using other paradigms developed to study the impact of emotion on behavior (e.g., Ambron and Foroni, 2015; Ambron et al., 2016).

## AUTHOR CONTRIBUTIONS


JB designed and ran the study when visiting Winkielman's lab at UCSD. JB also analysed and wrote up the study. FF, MC, RR, and PW provided feedback on analyses, interpretation, theory, and write-up.

## FUNDING

This work was supported by the University of Social Sciences and Humanities in Warsaw, Poland, and by UCSD Academic Senate Grants to PW.

## REFERENCES


## ACKNOWLEDGMENT

This work was prepared in partial fulfillment of the requirements for Dr. Baumeister's doctoral thesis at the SISSA Institute, Trieste (Baumeister, 2015).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00394/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Baumeister, Foroni, Conrad, Rumiati and Winkielman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploring the Multi-Layered Affordances of Composing and Performing Interactive Music with Responsive Technologies

Anna Einarsson<sup>1</sup>\* and Tom Ziemke<sup>2,3</sup>

<sup>1</sup> Department of Composition, Conducting and Music Theory, Royal College of Music in Stockholm, Stockholm, Sweden, <sup>2</sup> Cognition and Interaction Lab, Human-Centered Systems Division, Department of Computer and Information Science, Linköping University, Linköping, Sweden, <sup>3</sup> Interaction Lab, School of Informatics, University of Skövde, Skövde, Sweden

The question motivating the work presented here, which starts from a view of music as embodied and situated activity, is how we can account for the complexity of interactive music performance situations. These are situations in which human performers interact with responsive technologies, such as sensor-driven technology or sound synthesis affected by analysis of the performed sound signal. Accounting for them requires investigating the underlying mechanisms in detail, but also a more holistic approach that does not lose track of the complex whole constituted by the interactions and relationships of composers, performers, audience, technologies, etc. The concept of affordances has frequently been invoked in musical research, which has seen a "bodily turn" in recent years, similar to the development of the embodied cognition approach in the cognitive sciences. We therefore begin by broadly delineating its usage in the cognitive sciences in general and in music research in particular. We argue that what is still missing in the discourse on musical affordances is an encompassing theoretical framework incorporating the sociocultural dimensions that are fundamental to the situatedness and embodiment of interactive music performance and composition. We further argue that the cultural affordances framework, proposed by Rietveld and Kiverstein (2014) and recently articulated further by Ramstead et al. (2016) in this journal, constitutes a promising starting point, although it has not previously been applied to music. It captures and elucidates this complex web of relationships in terms of shared landscapes and individual fields of affordances. We illustrate this with examples drawn primarily from the first author's artistic work as a composer and performer of interactive music. This sheds new light on musical composition as a process of construction (and embodied mental simulation) of situations, guiding the performers' and audience's attention in shifting fields of affordances.
More generally, we believe that the theoretical perspectives and concrete examples discussed in this paper help to elucidate how situations—and with them affordances—are dynamically constructed through the interactions of various mechanisms as people engage in embodied and situated activity.

Keywords: affordances, cultural affordances, embodied activity, embodied cognition, composition, interactive music, responsive technology, situated activity

#### Edited by:

Zheng Jin, Zhengzhou Normal University, China

#### Reviewed by:

Andrew D. Wilson, Leeds Beckett University, United Kingdom

Theresa S. S. Schilhab, Aarhus University, Denmark

\*Correspondence: Anna Einarsson annaeinarssonmusic@gmail.com

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 03 March 2017

Accepted: 15 September 2017

Published: 29 September 2017

#### Citation:

Einarsson A and Ziemke T (2017) Exploring the Multi-Layered Affordances of Composing and Performing Interactive Music with Responsive Technologies. Front. Psychol. 8:1701. doi: 10.3389/fpsyg.2017.01701

## INTRODUCTION

Given that this paper deals with music but is submitted to a cognitive science/psychology journal, we assume that most readers are cognitive scientists and only a minority are familiar with music theory. The first question that comes to mind for the average cognitive scientist with some familiarity with Gibson's (1979/1986) notion of affordances might be whether music really has affordances in the first place. After all, Gibson was mainly concerned with the sense of vision and the affordances of concrete physical objects. These were affordances conveyed by the optical array and perceived by agents that are far from stationary, moving about and interacting with those objects, such as the sit-ability of chairs or the graspability of cups. Nonetheless, hearing also deals with concrete objects, since sound carries information about its source. We are always searching for what is causing a sound, thereby learning about environmental occurrences (Gaver, 1993; Windsor, 2000). As Jonas (1966/2001) writes, hearing is related to event and not to existence. Half a century after its formulation, Gibson's ecological psychology is still highly influential, not least in research on embodied cognition (e.g., Varela et al., 1991; Chemero, 2009; Shapiro, 2011), and, although widely debated, the concept of affordance remains in wide use (e.g., Thill et al., 2013; Sakreida et al., 2016), with new conceptual frameworks continuously being developed (e.g., Rietveld and Kiverstein, 2014; Ramstead et al., 2016; Davis and Chouinard, 2017).

In musical research, perhaps contrary to what one may assume, discussing the affordances of music is nothing new. It accords with the more overarching bodily turn in musicology and related fields since the beginning of the twenty-first century (Pelinski, 2005). There is in fact a growing body of support for music as embodied and situated activity. Performing and interacting with musical instruments, for example, is widely recognized as an embodied phenomenon (e.g., Leman, 2007; Windsor and De Bézenac, 2012). Furthermore, Clarke (2005), among others, has discussed the role of embodiment in the experience of music, particularly listening, and there is also support for an activation of the human mirror neuron system when experiencing music (Molnar-Szakacs and Overy, 2006). The concept of affordance has been used in music by a number of authors in recent years (e.g., Windsor, 1995, 2000; Clarke, 2005; Leman, 2007; Krueger, 2011, 2014; Menin and Schiavio, 2012; Windsor and De Bézenac, 2012; Einarsson, in press). It offers unique ways of describing the reciprocal relationship between performer/composer and musical structures, but also, as we will see, the relationship to the performance situation as a whole, in all its complexity. Windsor and De Bézenac (2012), for example, have argued that "the concept of affordances helps to conceptualize the mutual relationships that exist between listeners and sounding objects and events, between performers and their instruments, and between musicians in a manner quite foreign to more cognitive structural approaches to music psychology" (2012, p. 103). That this reciprocity is a topic of great research interest is also emphasized by Geeves and Sutton (2014). However, current interpretations of the concept of affordances in musical research vary considerably.
Most of them also deviate significantly from the Gibsonian notion of affordances, a deviation that is not always acknowledged by the authors (as will be discussed in detail in the next section). As Davis and Chouinard (2017) state in their discussion of the general use of the affordance concept, the challenge for researchers is to delineate their usage of the concept and to adopt it in ways that display its relational, material, and dynamic dimensions. We agree with them that the mechanisms of affordances operate within a situation whose aspects interact and thus affect the efficacy of affordances, a notion highly applicable to music.

Over the last decade, questions of aliveness and embodiment in the light of technological advances (e.g., increased computer processing speed, enabling interactivity between agents and computer systems to be staged and performed live in real time) have been a major concern in artistic fields such as performance studies (Broadhurst and Machon, 2011; Barrett and Bolt, 2013), dance (Kozel, 2007), and music (Emmerson, 2007; Peters et al., 2012). In the field of music, an interesting special case, in our opinion, is music whose composition and performance are aided by computer technology in real time (running time), i.e., live electronic music. We are particularly interested in interactive music utilizing responsive technologies, such as sensor-driven technology or, as the major focus of this article, computer sound synthesis affected by computer analysis of an acoustically performed sound signal. For example, features of a sung input (e.g., vibrato) are analyzed by the computer, and the resulting electronic sound (e.g., a chord) depends on the amount of vibrato. In musical works of this kind, the notion of interacting with "a disembodied other" (Emmerson, 2009), i.e., computer technology, brings questions of embodiment and music to the heart of the discussion. We believe that the notion of affordances, in the broadened sense of cultural affordances discussed in this paper, can play a central role in such endeavors.

Hence, the aim of this paper is threefold. First, to expand on the notion of affordances as it has been used in musical research, by clarifying the diversity of interpretations of the concept but also the limitations of its present use. Second, to suggest an application of the notion of cultural affordances (originally proposed by Rietveld and Kiverstein, 2014, and recently further elaborated by Ramstead et al., 2016, in this journal) to interactive music, where the performers, the audience, and the composer shape, experience, and perform music with and through individual yet overlapping, and dynamically varying, fields of affordances. This will be illustrated with examples from the first author's artistic work as composer and performer of mixed works, in which a combination of acoustic sound sources (singers) and digital sound sources (responsive computer technologies) perform together live. Last, but not least, the focus of this paper is on musical performers' and composers' skill and embodied affective appraisal in dynamic relationship with the environment, situated inside a sociocultural practice. In our opinion, this helps bridge the conceptual gaps between the seemingly disembodied work of the composer, the concrete embodied activity of musical performers, their interaction with more or less "invisible" technologies, and the social and cultural practices, highly abstract according to some, that they are part of.

## ON AFFORDANCES

In order to contextualize the discussion, without any attempt to provide a complete historical account here<sup>1</sup>, we will recapitulate some relevant notions of the concept of affordances in psychology/cognitive science in general and elucidate its use in music research in particular.

## Affordances in the Cognitive Sciences

### The Reciprocity between Organism and Environment

Most of James J. Gibson's ecological psychology and his theory of perception were formulated in the late 1960s and 1970s, i.e., long before embodiment had become a popular topic in the cognitive sciences. His work was a reaction against a mechanistic worldview and a move away from seeing cognitive processing as causation. First and foremost, his work was concerned with visual perception, as in his influential theory of the visual field and the optical array (Gibson, 1979/1986).

Gibson introduced the notion of affordances for what he viewed as action opportunities for humans, or other animals, in their environment. In Gibson's original sense these have a peculiar ontological status: they are neither a property of the environment alone, nor a feature of the animal alone, but rather a property of both, i.e., emerging from the animal's interaction with its environment. In Gibson's own words:

[. . . ] an affordance is neither an objective property nor a subjective property; or it is both if you like. An affordance cuts across the dichotomy of subjective–objective and helps us to understand its inadequacy. It is equally a fact of the environment and a fact of behavior. It is both physical and psychical, yet neither. An affordance points both ways, to the environment and to the observer (Gibson, 1979/1986, p. 129).

Hence, a key aspect of affordances is that they are not just physical properties, but have to be considered relative to the animal. This reciprocity between organism and environment is fundamental to the Gibsonian notion of affordances. Affordances are specified by the pick-up of invariant information from the ambient light, the so-called optical array, whilst the animal—its body, legs, hands and mouth—is co-perceived (Gibson, 1979/1986, p. 141). Thus, information does not equal affordances—information only points toward affordances. Furthermore, affordances, according to Gibson, are permanent and stable. They do not change relative to the organism's varying internal states, such as needs or motives. He writes: "Something that looks good today may look bad tomorrow but what it actually offers the observer will be the same" (Gibson, 1982, p. 410). This is, of course, not uncontroversial, because it means that, for example, a particular staircase is either—in principle—"climbable" for you or it is not, and its "climb-ability" does not vary over time just because on some days you might be, for example, too tired or too drunk to actually climb it. So, to Gibson, affordance is a reciprocal concept between organism and environment, but it is binary and relies on properties that do not change according to changing needs. At this point, you might ask: if affordances are opportunities for behavior, why do we not act on every possibility? What about social and cultural influences? And what about affordances that are not so readily available? These are issues we will come back to in the following.

As is often noted, Gibson's writings are sometimes ambiguous, some would say incomplete, and his theories have been a target of substantial criticism (e.g., Fodor and Pylyshyn, 1981). Nevertheless, his theories have undoubtedly spurred much interesting research and debate in the field. There are different interpretations and reformulations of Gibson's original theory, some of which have focused more on the agent, some more on the object, while others have attempted to stay close to Gibson's original relational concept encompassing both agent and object.

### From Dispositional Properties to Relational Abilities

Turvey, Shaw, and Mace took up the challenge of developing Gibson's ideas into a more philosophically sound and empirically tractable theory through a number of papers (e.g., Turvey et al., 1981). For them, it was dispositional properties in the object and in the organism that enable action. They introduced the concept of effectivities (Shaw et al., 1982), complementary to affordances, intended to specify an animal's means for action, i.e., a combination of the function of its tissues and organs relative to the environment, to realize a specific affordance in a given situation. That is, the dispositional affordance and the effectivity complement one another. Their theory thus relies on ecological laws, which are not universal but relate to a niche.

In particular, this latter aspect has been one of the major criticisms formulated by Chemero (2006, 2009). Although he recognizes Turvey, Shaw, and Mace's contributions to the development of Gibson's ecological theory, his point is that they have left the theory with too little information available for direct perception, ruling out direct perception of individuals and perception of things partly determined by convention. "If information depends on laws," he writes, "there is also no information about individual people available for perception. So although a human infant might have information available about humans, she has none about her mother" (Chemero, 2009). Moreover, ecological laws may structure the way that, for example, light is reflected off an aluminium can, but according to Chemero they cannot account for instances where, for example, there has been a mix-up in the factory between milk and soda, or someone has played a practical joke. Conventions build upon public agreement and are easily violated, he states.

Chemero's (2009) own take on modernizing Gibson is—in a nutshell—to combine Gibson's theory with dynamic systems theory (also employing situational semantics in order to avoid natural laws, instead allowing for constraints connecting situations, which may be cultural or conventional). This is part of the formulation of what he refers to as radical embodied cognitive science. He argues that affordances are relations, in the same sense that "taller than" is a relation between two entities. He also makes an important distinction between feature and property. Perceiving a feature is a matter of perceiving that the situation as a whole has a certain feature, whereas perceiving a property presupposes much more prior knowledge. Perceiving affordances, according to Chemero, is placing features. Second, Chemero argues that instead of talking about an individual's capacities in terms of body scales, we should consider an individual's abilities as relational. Dispositions never fail, but abilities may, which allows us to account for occasions when performance does not live up to, for example, biological expectations (or where musical performances fail!). For example, one day somebody might simply be too tired to climb the steep stairs. Affordances and abilities, according to Chemero, causally interact and are causally dependent. In short, what Chemero refers to as affordance 2.0 is a relation between the abilities of the individual and features of the environment.

<sup>1</sup>More complete historical accounts of the notion of affordances have recently been provided by, for example, Dotov et al. (2012) and Osiurak et al. (2017).

Einarsson and Ziemke Affordances of Interactive Music

## Affordances as Aspects of a Sociocultural Environment

Rietveld and Kiverstein (2014) propose a significantly broader application of affordances than Chemero. They emphasize how the exercise of abilities happens in a context, and that we as humans participate in sociocultural practices. Their two key concerns are: (1) the notion of a form of life, a concept borrowed from Wittgenstein, denoting human patterned behavior, i.e., "normative behaviours and customs of our communities" (ibid, p. 328), and (2) the influence of normativity on our engagement with affordances. Instead of features, they prefer speaking of aspects of an environment, since "in the human case the material environment has been sculpted by our sociocultural practices into a sociomaterial environment" (ibid, p. 335). Accordingly, they suggest the following definition: "Affordances are relations between aspects of a material environment and abilities available in a form of life" (ibid, p. 335). This is very much in line with Chemero's (2009) argument that "the situation as a whole supports (perhaps demands) a certain kind of action" (cf. section Affordances in the Cognitive Sciences). In other words, this view enables us to consider the reciprocity between human and environment as conveyed by learned behaviors under the influence of social niches and conventions.

To Rietveld and Kiverstein, affordances are both relational and a resource (ibid, p. 327). They are relational in that they depend on the material environment and the abilities in the form of life. At the same time, they are resources in that opportunities for action depend on how we create, for example, tools for our projects and concerns, and engage with changing aspects of the material situation. In their reading of Gibson, they give primacy not to affordances but to the ecological niche for a kind of animal with a certain form of life. Accordingly, they introduce the notion of a landscape of affordances, meaning the affordances available in an ecological niche. As Bruineberg and Rietveld (2014) put it: "In our human form of life, these are related to the whole spectrum of abilities available in our socio-cultural practices." Furthermore, the affordances that "stand out more than others" to the individual (cf. Withagen et al., 2012), and which are relevant to a particular individual in a particular situation, are denoted as a field of affordances (Bruineberg and Rietveld, 2014). Sensitivity to a situation, that is, to the landscape of affordances, is achieved through abilities or skills. These are in turn generally acquired through training and experience in sociocultural practices.

But, returning to the question raised in section From Dispositional Properties to Relational Abilities, why do we not act on every affordance available in our field? According to Rietveld and Kiverstein, this is due to an agent's drive to achieve an optimal grip on the situation, a striving for improvement of the situation. The concept of optimal grip stems from the philosopher Merleau-Ponty and is intended to capture how living systems are always simultaneously "in a state of relative equilibrium and in a state of disequilibrium" (Merleau-Ponty in Kiverstein and Rietveld, 2015). Improving one's grip entails a bodily action readiness: "In many real-life situations multiple states of action readiness interact in generating action tendencies and action" (ibid, p. 342). Ramstead et al. (2016) nicely exemplify the concept of optimal grip with the image of a boxer who orients toward the punching bag so as to afford a suitable variety of possible strikes. Optimal grip helps explain the way some affordances in the field, through interaction with affective appraisal and attention, cause action readiness and become solicitations to the individual.

Lastly, Rietveld and Kiverstein (2014) also introduce the concept of skilled intentionality, i.e., "the individual's selective openness and responsiveness to a rich landscape of affordances" (Kiverstein and Rietveld, 2015) in overlapping cycles of action-perception. This ability develops over the years as part of an increasing sensitivity to discriminate between situations. In other words, skilled intentionality is a tendency to act toward an optimal grip on a field of affordances (Bruineberg and Rietveld, 2014).

### Cultural Affordances

Ramstead et al. (2016), in their discussion of cultural affordances, building on Rietveld and Kiverstein (2014) and Kiverstein and Rietveld (2015), among others, have recently raised the question of how culture and context interact with human biology to shape cognition, behavior, and experience. They distinguish between two kinds of cultural affordances: natural affordances, which are possibilities for action that depend on an agent's exploiting reliable correlations in its environment with its set of abilities (similar to Chemero's affordance 2.0), and conventional affordances, which depend on a shared set of expectations, norms and conventions. Importantly, there is a continuum of affordances, ranging from those that depend on reliable correlations (natural affordances) to those that depend on shared sets of expectations (conventional affordances). This view of affordances as graded is also in accordance with Davis and Chouinard's (2017) recent characterization of affordances as determined by, and dependent on, a number of situational cues that are in practice not easily discernible. According to Ramstead et al. (2016), culture underpins both natural and conventional affordances. Herein also lies their definition of culture. In their own words: "Human biology is cultural biology; culture has roots in human biological capacities. The affordances with which human beings engage are cultural affordances." Moreover, in their view, both kinds of affordances may be socially constructed. Hence, according to their theory, an affordance may be changed either by altering aspects of the material environment or by altering the organism's form of life. Affordances may thus be shaped and vary in relation to enculturation, social influence, and skill, which is highly relevant to our discussion of musical practice.

Ramstead et al. (ibid) also adopt the concept of a landscape of affordances (cf. previous section), which for them is relatively static and constituted by the totality of affordances available to a population in a given environment. They also use the notion of a field of affordances, which for them is the subset of "affordances in the landscape with which the organism, as an autonomous individual agent, dynamically copes and intelligently adapts," i.e., "those affordances that actually engage the individual organism because they are salient at a given time, as a function of the interests, concerns, and states of the organism." They argue, similarly to Rietveld and Kiverstein (2014), that an organism does not encounter affordances one by one, but "as an ensemble of affordances, with which it dynamically copes and which it evaluates, often implicitly and automatically, for relevance." These affordances are themselves entangled in various ways and appear as nested, depending on each other, hiding, enabling, or revealing other possibilities for action. Certain affordances, in this view, are also highly influenced by joint intentionality, social and cultural normativity and shared expectations (implicit and explicit), which codetermine the landscape of affordances. The field of affordances is "experienced as 'solicitations,' in that they solicit (further) affective appraisal and thereby prompt patterns of 'action readiness,' that is, act as perceptual and affective prompts for the organism to act on the affordance." This idea of affective appraisal causing readiness to act (cf. Lowe and Ziemke, 2011) is again highly relevant to a performer's or composer's practice. Indeed, Ramstead et al. depart from Gibson in a number of ways, including their argument that the individual experiences affordances as solicitations.

One of the assumptions on which the theoretical framework of conventional affordances rests is their dependence on shared expectations, or, as Ramstead et al. put it, on how behavior is influenced by expectations about others' expectations. Accordingly, the presence of others affects the salience of affordances, due to human conventions. Culturally shared expectations are also embodied at various levels (brain networks, artifacts, constructed environments).

In effect, Ramstead et al. (2016) suggest a predictive processing model, emphasizing that the generative model does not need to entail semantic content. Generative models are embodied at different levels, be it neurally in the brain or in terms of behavioral patterns. Here, attention plays a key role in guiding action and perception, affecting the acquisition of culturally specific sets of expectations.

## Affordances in Music Research

How, then, do affordances in music work? Despite—or maybe due to—the extensive usage of the concept in music theory and related fields in recent years (e.g., Windsor, 1995, 2000; Clarke, 2005; Leman, 2007; Krueger, 2011, 2014; Menin and Schiavio, 2012; Windsor and De Bézenac, 2012; Einarsson, in press), interpretations and applications vary significantly. One thing most music scholars do seem to agree on, though, is that music affords movement (e.g., Clarke, 2005; Windsor and De Bézenac, 2012), although some focus mostly on aspects of synchronization or entrainment (e.g., Leman, 2007; Krueger, 2014), where entrainment, in Leman and Maes's (2014, p. 239) definition, stands for "pre-reflective adaptation of human movement to music."

To begin with, Clarke (2005), who primarily addresses the listener's point of view, regards culture as a vital part of what we perceive. He writes on culture: "once a tradition or convention is established and is embodied in widespread and relatively permanent objects and practices, it becomes as much a part of the environment as any other feature" (ibid). In his view, music, carrying invariant structures, can reveal almost any source in a situation: instruments, media, the social functions in which they participate, emotional states, bodily actions, and spaces. Moreover, according to Clarke, affordances change in accordance with an organism's changing needs. He acknowledges that there are social constraints that make some affordances less likely to be acted upon (for instance, a violin affords burning), but does not elaborate on these aspects.

To Windsor and De Bézenac (2012), in line with Clarke, affordances are not fixed. Their view on culture follows Sanders (1997), entailing a stance according to which direct perception of affordances both may and should be applied in complex cultural contexts (Windsor and De Bézenac, 2012)—again, in line with Clarke. However, unlike Clarke, Windsor and De Bézenac's approach resembles Reed's adaptation of Gibson (cf. Withagen et al., 2012), according to which affordances exert a selective pressure on the behavior of individuals. Windsor and De Bézenac also adopt from Shaw et al. (1982) the concept of effectivities (cf. section From Dispositional Properties to Relational Abilities).

Moreover, they acknowledge the relevance of other people and social/material context to human behavior, which they illustrate with the example of jazz musicians "going with" or "going against" what other musicians' actions afford. It is, however, not clear what the underlying mechanisms are. It is also somewhat contradictory that these affordances, on the one hand, can "determine the characteristics" of a particular music, while at the same time it is emphasized "that while the pianist's actions afford particular behaviours, they do not demand such behaviours" (ibid). Finally, a more controversial stance of theirs is that music affords semiotic acts and the production of particular signs, for example through verbal or textual action (ibid, p. 114). All in all, Windsor and De Bézenac make a substantial contribution to the discussion of different ways of applying direct perception and affordances to music, and include music-making to a larger degree than Clarke. Still, the full picture is lacking, and the concept remains far from well defined.

The focus of Krueger (2014), another influential voice, is instead on emotion regulation. His view is one of distributed cognition (Hutchins, 1995), according to which music is a means of off-loading emotional responses. He equates affordances with the concept of invites (Withagen et al., 2012), but in a manner rather different from what Withagen et al. intended (ibid). He assigns a demand character to the concept of affordance, discussing affordances from the perspective of "the way that we often experience music as affectively irresistible" (Krueger, 2014, p. 2), and draws upon the notion of entrainment (see section Affordances in Music Research). Music, according to Krueger, is part of a distributed system where "musical affordances provide resources and feedback that loop back onto us and in so doing, enhance the functional complexity of various motor, attentional, and regulative capacities responsible for generating and sustaining emotional experience" (ibid). Krueger focuses on the listener's point of view, and although he is more detailed than Clarke (2005) or Windsor and De Bézenac (2012) regarding the theoretical underpinnings of this position—drawing upon, amongst other things, the extended mind hypothesis (Clark and Chalmers, 1998)—his focus is rather narrowly set on solicitations of different emotional experiences. Hence, his theory is difficult to apply on its own to a performance situation. He touches only briefly upon a social dimension in terms of affective synchrony, albeit not particularly in relation to affordances, and culture is addressed only as a consequence of discussing the many contexts in which music can be utilized.

Menin and Schiavio (2012), finally, delimit but also reinterpret the concept of affordances as dealing exclusively with intentional relationships between musical subjects and objects, a relationship grounded in how the motor possibilities of the subject's body can interact with the environment. Therefore, they reject the idea of inferential relationships—such as, for example, a movie trailer "affording" going to the cinema—as being affordances. They draw a parallel to the work of Delalande (in ibid, p. 210) on children's exploratory behavior toward musical objects, concluding that embodiment (and the discovery of musical affordances as intentional acts) arises from sensorimotor modalities of interaction with an object. Thus their stance relies on relationships that emerge during early childhood discoveries such as plunging, hitting and scratching. Accordingly, they propose "an embodied approach that radically diverges from the standard accounts, considering musical objects as entities constituted within the intentional motor-based relation that defines a musical context" (ibid, p. 211). It is not at all clear, however, how—and to what degree (if any)—they consider cultural or social aspects to influence or be part of the embodied musical theory they describe.

## Where Do We Go From Here?

To summarize, in most music theorists' interpretations of affordances, cultural aspects are inevitably included, while the degree to which social aspects are incorporated varies considerably. What is still missing in the field of music research, in our opinion, is a more encompassing theoretical framework that incorporates the sociocultural dimensions fundamental to the situatedness and embodiment of music performance, provides a detailed account of the underlying mechanisms, and also takes a more holistic approach that does not lose track of the complex whole constituted by the interaction of composers, performers, audience, technologies, etc. We believe that Ramstead et al.'s (2016) cultural affordances framework, as discussed in section Cultural Affordances, although not previously applied to music, constitutes a promising starting point for capturing and elucidating this complex web of relationships. In the next section, we will therefore illustrate this, foremost with examples from the first author's artistic works as composer and performer of mixed works, where a combination of acoustic sound sources [singer(s)] and digital sound sources (responsive computer technologies) perform together live (cf. **Figure 1**).

## AFFORDANCES IN INTERACTIVE MUSIC

The mechanisms of affordances in music operate within a situation whose aspects interact and thus affect the efficacy of affordances. Hence, affordances and situation are inevitably intertwined. However, for the sake of analysis, in the following we will address different parameters as if they were separable. Importantly, discussing affordances in terms of aspects of a situation, as Rietveld and Kiverstein (2014) proposed, enables us to address affordances as graded instead of binary, which is much more applicable to the reciprocal dynamics that are crucial to music performance in general, and interactive music in particular.

## The Landscape and Fields of Affordances

What, then, constitutes the shared landscape of affordances in an interactive music work of which a performer is part? The landscape (cf. section Cultural Affordances) is the totality of available affordances in a niche, associated with a form of life, so in most cases it comprises the action possibilities offered by the audience, the concert space, the reciprocal relationship toward sounds generated by the computer technology and possibly other participating performers. As the framework suggests (cf. section Cultural Affordances), there are cultural affordances of both the natural kind and the conventional kind. An example of the former is a chair on stage that affords sitting; an example of the latter is a microphone that affords singing into. In an interactive performance work such as the first author's Metamorphoses (Einarsson, in press; cf. **Figure 1**), the situation holding the landscape is very complex: the music composition is realized only when the computer technology is interacted with, and stage directions are added to the performance, e.g., physical actions such as walking, sitting, standing, and singing while suspended in the air in harnesses. Affordances appear, just as Rietveld and Kiverstein (2014) state, as nested and as an ensemble, where situation and affordances are inevitably intertwined.

Fields of affordances, on the other hand (cf. section Cultural Affordances), are at the level of the individual. What stands out for the individual performer, and thus constitutes their field of affordances, depends on the performer's concerns, needs and abilities. These are in turn under the influence of enculturation, patterned practices, directed attention and shared expectations. Altogether, this will color the performer's detection of possibilities for action nested in the interplay with the computer, other performers, the audience, and the performance space. Again, there typically are cultural affordances of both the natural kind and the conventional kind. An example of the former is the act of turning toward a sound suddenly projected from a specific loudspeaker.

The latter, conventional affordances, may be exemplified by a musical structure containing short sampled sounds lasting 20–100 ms or so (i.e., granular synthesis), implemented at one location in the work Metamorphoses. The structure invites a sort of mimicking, which the score also calls for. This electronic response depends on the length of the sung input (alongside additional parameters), and elicits a way of singing in which space is left for the computer response. Following this, the character of the response pushes the improvisation toward becoming more fragmented, and the denser the response gets, the more the singers pause. Thus, the affordance may gradually be reshaped toward a background texture, increasing the likelihood of soliciting contrasting musical gestures such as silence (Einarsson, in press). Singing itself is an interesting subcase, since sung words evoke emotions, and these in turn will impact the appraisal of the field of affordances.

In some instances this distinction between natural and conventional may be less clear-cut, which at the same time illustrates how natural and conventional affordances are poles on a continuum rather than two distinct categories. For example, an interesting study by Berg et al. (2016) reveals how a classically trained pianist adjusts his playing in relation to the room acoustics. The study took place in a modern concert hall where the ceiling height could be altered, with listeners present. The larger the hall, and hence the longer the reverberation time, the slower the tempo at which the pianist performed. Interestingly, there was also a heightened focus on details in the interpretation when the reverberation was shorter. So, modifications to the material environment, and the impact this had on the sociocultural situation (as constituted by, for example, performance practice, the character of the music and listeners' expectations), influenced the pianist's behavior.

## Striving for Optimal Grip

One challenge that arises in artistic practice is that, compared with many other activities we as humans engage in, the goal, or optimum, is not very clearly defined. Perhaps the goal, to a performer or composer, can be put as ways of being and engaging with/in music. As T.S. Eliot famously stated: "You are the music, while the music lasts." On the other hand, as Bruineberg and Rietveld (2014) write, "the skilled individual does not necessarily have an explicit goal in mind, but rather is solicited by the environment in such a way as to improve her grip on the situation." Striving toward optimal grip is thus, according to them, equivalent to "having an action readiness for dealing adequately with an affordance" (ibid). Our suggested "goal" in terms of ways of being in music is constrained by the demands of the situation, its physical, social and cultural prerequisites. One prerequisite may simply be the artistic work to be performed or composed. There may also be inner constraints derived from the motivations for engaging in music, in particular emotionally laden ones.

The ways of improving grip, as a performer, may therefore involve tending toward having the full palette of artistic expression available, in relation to the situational demands. The performer may optimize feedback monitoring, placement of equipment, and positioning in relation to the audience and/or fellow musicians; control muscular tension and level of anxiety in order to perform at his or her best; minimize possible distractions; rehearse; and acknowledge and adapt to the present room acoustics, to mention just some. The performer may also learn new behaviors (cf. modifying affordances by changing the form of life). For example, in the work PS. I will be home soon by the first author (Einarsson and Friberg, 2015), performers reported that they had to find new listening strategies in order to achieve a satisfactory interaction with the computer. Many aspects applicable to the performer's situation may also be applied to the composer's situation. In addition, the composer can be said to have a goal set in terms of a directedness (let us call it an affective bearing) toward which the artistic course for the work is set. An affective appraisal is always present when acting. So skilled intentionality (cf. above), i.e., striving for optimal grip, is in this case, speaking from the first author's experience, reflected in having concrete tools readily available for composition (computer, instruments, synthesizers etc.), but also in having access to the desired bodily state (as Damasio denotes it), with its pertaining feelings and cognitive processes, in accordance with the idea for the work. The composer, similar to what composer Vaggione (2001) describes, attempts to use his/her own body as a template when shaping and listening to the work in progress, making use of embodied simulation in order to work with expectations and to direct the attention of performers and audience alike as the work proceeds.

Affect, attention and affordances interact to sculpt a field of affordances, as Ramstead et al. (2016) put it. These aspects of skilled intentionality may be seen as ways of unveiling embodied expectations in the landscape of affordances (i.e., shared expectations embodied in material culture, social niches and patterned cultural practices, enabling the landscape of affordances) through hands-on testing and experiencing of sounds and computer responses when composing.

## Attention and Joint Attention

As Ramstead et al. (2016) point out, constructed human environments, to which we suggest a musical work may be likened, work by soliciting certain expectations and directing attention. Attention shapes the ways the performer engages with the field of affordances. How a performer is attentive is shaped over the course of development, as part of enculturation. Ways of relating to computer responses in an interactive piece of music are thus part of a larger picture, in which preconceptions about how to be attentive form part of how the performer attends to the music. Since the parameters for analysis and synthesis often change dynamically throughout the piece, many affordances are highly dynamic.

Drawing upon interviews with singers from two different musical works, it is possible to compare a classically trained vocalist's conceptualisation of the computer (Einarsson and Friberg, 2015) with jazz vocalists' preconceptions of the computer (Einarsson, in press). These differences in sociocultural situation between singers identifying with different genres, i.e., different fields of affordances, show how waywardness in the relationship toward the computer may cause uncertainty in some singers. However, the appraisal of that uncertainty and the subsequent course of action may vary considerably depending on the formal training (enculturation and skill) the singers have and on the connotations the computer (the object) brings along. Uncertainty was experienced as inherently negative by the classically trained vocalist, while to the jazz singers it was at the heart of the practice and to a large degree indispensable (Einarsson and Friberg, 2015; Einarsson, in press).

The singers' accounts in the work Metamorphoses (ibid) also reveal expectations. Examples include listening out for what is not already there, in other words listening out for where the piece of music is heading and trying to anticipate the computer's (re)action, or trying to "un-listen" what other singers or computers are performing in order to execute difficult passages. This relates directly to the agent's selective engagement with the field of affordances, as modulated by directed attention.

According to Ramstead et al. (2016), joint and shared attention mark out some affordances as more salient, and we suggest that this is part of how the composer works, i.e., by guiding the attention of both performers and audience. Particularly with interactive works, the first author's research brings forward performers' experiences of putting the relationship toward the computer on display for the audience or for fellow musicians (Einarsson, in press) in a "look what I found" sense. For example, the violinist in PS. I will be home soon! (Einarsson, 2012), performing in a motion-tracking system, described how she wanted to show the sounds to the audience. She achieved this display through her path across the floor, where the motion detector tracked her movements. Simply put, in one moment the audience afforded the action of putting on display, and the electronic sounds afforded exploration and movement. Yet these affordances can be assumed to interact, similar to what is suggested by Ramstead et al. (2016), which would also be an interesting subject for further study. This also applies to a mechanism only briefly touched upon by Ramstead et al. (ibid): how joint attention, usually applied only to dyadic relationships, may be projected to larger groups. The first author's research suggests that a musical work containing interactive technology may constitute one such case of expanded joint attention, where computer technology is part of the field of affordances holding an ensemble of nested affordances.

## Sociocultural Dimensions

As recurs throughout this discussion, the musical performance situation is indeed a sociocultural environment, but as previously noted in the section Affordances in Music Research, this is surprisingly often not addressed when affordances in music are discussed. For instance, this entails that fellow musicians influence available affordances by directing attention to certain aspects of the landscape. Through expectations based on formal training and experience, this makes some behavioral responses more likely than others.

Gibson himself already spoke of information, or secondary knowledge, as a way of accessing some affordances. By emphasizing similar sociocultural dimensions, as Rietveld and Kiverstein (2014) and later Ramstead et al. (2016) do, the theory makes much more sense in the field of music. For example, in an interactive musical work, having some of the background information contributes to the performer's sense of a whole and to the discovery of affordances, i.e., to how to choose between actions (Einarsson, in press). Such information might include knowing the composer's intentions regarding the relationships between materials.

One mechanism at work, affecting the fields of affordances for all parties involved (performers, audience, composer), is sociocultural normativity. This includes, but is by no means restricted to, two kinds of influence:

(1) cultural artifacts, such as:
- the score;
- enculturation in terms of the singer's formal training;
- the ease with which certain actions are preferred over others (i.e., the ability of the performer);
- the participating institution(s);

(2) social influence, such as:
- the presence and proximity of the audience;
- the presence and proximity of other musicians, the composer, and technicians/staff;
- social identity in terms of members of a social group not present at the moment.

In interactive works it is apparent how emotions, as well as culture and social relations, are part of the interplay between performer and computer technology. Returning to the notion of experiencing waywardness in the relationship toward the computer, situatedness, enculturation, and social influence all shape how this is experienced. According to the singers' accounts (Einarsson, in press), having four singers in Metamorphoses, all with the same sort of "fickle playmate," creates a sense of a shared handling of the situation (social influence).

One kind of computer response commented on by the singers performing Metamorphoses (ibid) was imitation, a driving force that reinforces social liking (Leman, 2007). Engaging with certain responses offers a give and take of imitative gestures between singer and computer. Many of the affordances in the responsive work are thus nested, or of a give-and-take character: taking the musical lead in one direction opens up an array of action possibilities in the next step.

## The Role of the Composer

Given what has been discussed so far, the role of the composer is then to shape dynamical fields of affordances, accounting for their possible interactions, based on a shared landscape of affordances. This is comparable to the subject position in film theory (see Clarke, 2005), which shapes a shared frame of reference for interpretation, but here the emphasis also falls on action, among other things. Within this larger landscape of affordances and the musical performance situation, with all its parties and multiple layers, there are clusters and overlaps. These include the singers' somewhat permeable and overlapping fields of affordances and the listeners' fields of affordances. Considering this, consciously or unconsciously, is part of the composer's practice. Even when composing, we suggest, composers create their own field of affordances to operate within, relying on mechanisms of predictive processing and embodied simulation. Quiet inner listening brings about action cues, and extracts of musical passages or certain sounds projected over loudspeakers in the studio also suggest musical action in an embodied manner. Anticipating and forming relationships, as well as playing with expectations, is often at the core of the composer's practice. This is in line with Ramstead et al.'s statement: "The everyday phenomenology of affordances is one of possibilities for action and their variations; in other words, of expecting certain nested action possibilities and prescriptions for action" (Ramstead et al., 2016, p. 13, our emphasis).

An interesting example of working with the field of affordances is the audiovisual performance work One piece of a shared space (Einarsson, 2015b), where sung vowel sounds affected the localisation of sound in the concert space (i.e., spatialization). The singer experienced the relationship toward the live electronics as quite ephemeral. Viewed through the lens of the cultural affordances framework, this raises some interesting issues. A response in the domain of location does not first and foremost solicit an action of vocalizing. Rather, the suggested action is to turn toward the sound, to approach and examine it (a natural affordance). The concert space where this particular piece was rehearsed did not allow for much movement, which restricted the field. When this kind of action was added as a stage direction, to turn toward the sound (a guidance in the field of affordances), it became more meaningful to watch and also made more sense to the performer. As a continuation, one could hypothesize that this kind of affordance would be better highlighted in an environment allowing for more exploration (changing the "form of life" by manipulating the environment and/or behavior). Alternatively, it might be better highlighted if the system were also responsive to movement, i.e., if the performer's movement were taken into account in the analysis in addition to the sung input (changing the form of life by manipulating the material).

Hence, pertinent to our discussion of music performance is a dynamic between shared landscape and individual fields of affordances, and we suggest that considering this dynamic is at the heart of the music composer's practice. We are, however, not saying that compositional practice is devoid of rationalizations or structured approaches, but rather—following Damasio (1994), Johnson (2007), and Ziemke (2016)—that embodiment is fundamental to every aspect of human life and meaning-making.

## DISCUSSION AND CONCLUSION

One of the driving forces behind this research has been the question of how we can begin to account for the complexity of interactive music performance situations and analyze details without losing track of the whole. We have argued that what is still missing in the discourse on musical affordances is an encompassing theoretical framework incorporating the sociocultural dimensions that are fundamental to the situatedness and embodiment of interactive music performance. Such a framework would facilitate a detailed account of the underlying mechanisms. It would also provide a more holistic approach that does not lose track of the complex whole constituted by the interaction of composers, performers, audience, technologies, etc. We believe that Ramstead et al.'s cultural affordances framework constitutes a promising starting point for capturing and elucidating this complex web of relationships, although it has not previously been applied to music. The framework draws upon the work by Rietveld and Kiverstein (2014). Furthermore, by providing insights into the underlying mechanisms, it also opens new ways of considering the process toward new musical works, as well as the performance situation as such. We hope to have illustrated this in this paper, at least to some degree, with examples from the first author's artistic work as composer and performer of mixed works. In these works, a combination of acoustic sound sources (singers) and digital sound sources (responsive computer technologies) perform together live.

To begin with, Ramstead et al. (2016), echoing Rietveld and Kiverstein (2014), put forward that an ecological niche equals a landscape of affordances: "The total ensemble of available affordances for a population in a given environment. This landscape corresponds to what evolutionary theorists in biology and anthropology call a 'niche'" (Ramstead et al., 2016, p. 3). We then learn that a niche, "[. . .] in the case of humans, the social world [is]—associated with (and partly constituted by) a form of life" (ibid, p. 5). We also learn that "Different human communities, societies, and cultures, with sometimes strikingly different styles of engagement with the material and social world, constitute different forms of life."

Hence, it follows from their account that different forms of life entail different landscapes of affordances. Furthermore, they describe how "local ontologies" also strongly influence the affordances available in a niche. These are collective expectations, installed through specific ways of doing joint activities in domain-specific material-discursive environments (patterned practices). They write: "[. . .] these ontologies codetermine the exact affordances that are available in a given niche, for they prescribe certain ways of being, thinking perceiving and acting in context that are situationally appropriate" (ibid, p. 14). So, local ontologies also influence the affordances available in a niche, i.e., the landscape.

In our analysis, we have seen the need for a way to describe those arenas of a landscape of affordances where local ontologies, derived from social niches and cultural practices, have shaped a community as part of a landscape. Reading Ramstead et al. (2016) closely, they also seem to be grasping for this level of analysis:

"Our claim here is that cultural affordances (especially conventional ones) form a coordinated affordance landscape, which is enabled by sets of embodied expectations that are shared by a given community or culture. Social niches and cultural practices generally involve not isolated, individual affordances or expectations but local landscapes that give rise to and depend on shared expectations. We submit that these shared expectations—implemented in the predictive hierarchies, embodied in material culture, and enacted in patterned practices—contribute to the constitution of the landscape of affordances that characterizes a given community or culture" (ibid, p. 14, our emphasis).

Kiverstein and Rietveld (2015) write: "The human landscape of affordances is one that is tightly interwoven with both material aspects and social and cultural practices local to different regions of this landscape" (ibid, p. 712, our emphasis).

We interpret this as a shared reaching for an intermediate level between landscape and field: a "local landscape" in the words of Ramstead et al., or a "region of the landscape" in the words of Kiverstein and Rietveld (2015). In a similar vein, Kiverstein and Rietveld (ibid) touch upon how a landscape of affordances relies on the possibilities for action available in a particular form of life. These possibilities arise from the patterned and coordinated activities in which members of this form of life are able to partake. We see a need for this level, constituted by clustered fields, which we call an arena of affordances. One example is when discussing performers who identify with different genres and stem from different sociocultural backgrounds. Such performers have different formal training, different repertoire knowledge, different ideals and expectations, and familiarity with different institutions and the patterned practices associated with them. These differences are distinct and relatively stable, although not, we would claim, so distinct as to warrant calling them different forms of life, i.e., different landscapes. We therefore advocate adding an intermediate level to the framework: an arena of affordances, meaning clustered fields of affordances determined by shared local ontologies and social and cultural practices, as part of the landscape of affordances.

Hence, what this paper contributes to the understanding of music as embodied and situated activity, we believe, is the presentation and illustration of a theoretical framework centered on affordances, albeit a broader notion of affordances than previously discussed in the musical context. We argue that this framework is better suited to capturing the social and cultural aspects that are central to musical performances, without losing track of their embodied nature. In our opinion, the crucial departure from the original Gibsonian notion of affordances, and from many later variations and interpretations of it, is the position that it is the situation as a whole that has affordances. As discussed in detail in the previous section, this also sheds new light on musical composition as a process of construction (and embodied mental simulation) of situations, guiding the performers' and audience's attention in shifting fields of affordances.

Finally, what this paper contributes to the research topic "Beyond Embodied Cognition" is an illustration, using the case of interactive music, of how seemingly highly abstract, disembodied, and unsituated activities, such as the composition of musical works, can in fact be strongly grounded in concrete embodied and situated activity. Hence, the contribution to the cognitive sciences in general, beyond the specific application to interactive music, lies in the formulation, discussion, and illustration of a significantly broader notion of affordances as aspects of situations. This builds on the recent work of Rietveld and Kiverstein (2014) and later Ramstead et al. (2016) on cultural affordances. The distinction between a population's relatively static landscape of affordances and individuals' dynamically varying fields of affordances is also in line with recent work on the neural and cognitive mechanisms underlying affordances. That work indicates that affordance perception is less direct and more context- and goal-dependent than Gibson thought 40–50 years ago (Thill et al., 2013), and that there are separate brain pathways for stable and variable affordances (Sakreida et al., 2016). The question of what exactly constitutes a situation is of course not trivial; that discussion goes back at least to Dewey (1938) and Russell (1939) and is beyond the scope of this paper. However, we believe that the theoretical perspectives and concrete examples discussed in this paper help to elucidate how situations, and with them affordances, are dynamically constructed through the interactions of biological, contextual, social, and cultural mechanisms as embodied and situated activity unfolds.

## AUTHOR CONTRIBUTIONS

The work reported here is part of AE's doctoral research, for which TZ has been the main supervisor. Accordingly, most of the text has been written by AE, and much of the material comes from her artistic work as well as her doctoral dissertation. TZ has contributed to framing the discussion from a cognitive science perspective.

## FUNDING

AE was funded in part by the Swedish National Artistic Research School (Konstnärliga forskarskolan). TZ's research is supported in part by the Knowledge Foundation, Stockholm, under the AIR grant (SIDUS grant agreement no. 20140220) and in part by ELLIIT, the Excellence Center at Linköping-Lund in Information Technology.

## ACKNOWLEDGMENTS

The authors would like to thank Per Mårtensson, Royal College of Music, Stockholm, for many comments on the first author's doctoral dissertation drafts, which have indirectly also contributed to this paper.

## REFERENCES

Shapiro, L. (2011). Embodied Cognition. New York, NY: Routledge Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Einarsson and Ziemke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Enactive Approach and Dual-Tasks for the Treatment of Severe Behavioral and Cognitive Impairment in a Person with Acquired Brain Injury: A Case Study

David Martínez-Pernía<sup>1,2,3</sup>\*, David Huepe<sup>1</sup>, Daniela Huepe-Artigas<sup>1</sup>, Rut Correia<sup>4</sup>, Sergio García<sup>2</sup> and María Beitia<sup>2</sup>

<sup>1</sup> Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile, <sup>2</sup> Experiential Neurorehabilitation Research Department, Fundación Polibea, Madrid, Spain, <sup>3</sup> Laboratory of Experimental Psychology and Neuroscience, Institute of Cognitive and Translational Neuroscience, INECO Foundation, Favaloro University, Buenos Aires, Argentina, <sup>4</sup> Faculty of Education, Universidad Diego Portales, Santiago, Chile

One of the most important sequelae in persons who suffer from acquired brain injury is behavioral disorder. To date, the primary approaches for the rehabilitation of this sequela are Applied Behavior Analysis, Cognitive-Behavior Therapy, and Comprehensive-Holistic Rehabilitation Programs. Despite this theoretical plurality, none of these approaches addresses behavioral disorders by considering the relation between affordance and environmental adaptation. To introduce this therapeutic view to neurorehabilitation, we apply the theoretical tenets of the enactive paradigm to the rehabilitation of a woman with severe behavioral and cognitive impairment. Over seventeen sessions, her behavioral and cognitive performance was assessed while she worked on various cognitive tasks. The assessment was made in relation to two seated affordances (seated on a chair and seated on a ball 65 cm in diameter) and to environmental adaptation. These two seated affordances allowed us to incorporate the theoretical assumptions of the enactive approach and to determine how behavior and cognition were modified by the two postural settings and by environmental adaptation. The findings indicate that the woman exhibited better behavioral (physical and verbal) and cognitive (matching success and complex task) performance when she worked on the therapeutic ball than when she was seated on the chair. The enactive paradigm applied in neurorehabilitation introduces a level of treatment that precedes behavior and cognition. In our case study, this theoretical consideration allowed us to discover a better relation between a seated affordance and environmental adaptation for improving behavioral and cognitive performance.

Keywords: enaction, seated affordance, dual-tasks, neurorehabilitation, behavioral disorder, cognitive impairment

#### Edited by:

Zheng Jin, Zhengzhou Normal University, China

#### Reviewed by:

Derrick L. Hassert, Trinity Christian College, USA; Derek A. Barton, Carrick Institute, USA

> \*Correspondence: David Martínez-Pernía davidmpernia@gmail.com

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 16 July 2016; Accepted: 17 October 2016; Published: 01 November 2016

#### Citation:

Martínez-Pernía D, Huepe D, Huepe-Artigas D, Correia R, García S and Beitia M (2016) Enactive Approach and Dual-Tasks for the Treatment of Severe Behavioral and Cognitive Impairment in a Person with Acquired Brain Injury: A Case Study. Front. Psychol. 7:1712. doi: 10.3389/fpsyg.2016.01712

## INTRODUCTION

Persons who suffer from acquired brain injury have multiple impairments that prevent them from performing activities of daily life normally, such as locomotion, self-care, communication, and reasoning. In addition, such persons suffer from mood changes (depression, anxiety). However, one of the most important problems for persons with brain damage is behavioral disorder (Lezak, 1986; Tate, 1986; Brooks et al., 1987; Burke and Wesolowski, 1988; Livingston and Brooks, 1988; Jacobs, 1998). The magnitude and persistence of these behavioral sequelae suggest that the problem extends beyond the individual sphere and affects the person's social and family contexts. Examples include emotional overburden in the family environment (Godfrey et al., 1991; Franulic et al., 2000) and problems with community integration and psychosocial adjustment (Milders et al., 2003; McCabe et al., 2007). In addition to all of these sequelae, behavioral disorder may be a serious obstacle to the rehabilitation process and the patient's recovery (Sohlberg and Mateer, 2001; Wood and McMillan, 2001; van Reekum et al., 2005).

Behavioral disorders are commonly classified into externalizing symptoms (impulsiveness, irritability, aggression, loss of emotional control, hyperactivity) and internalizing symptoms (depression, withdrawal, apathy), in both childhood (Cicchetti and Toth, 2014) and adulthood (John et al., 2008). These symptoms are so persistent that they may be present for many years after brain injury (Kelly et al., 2008). The severity of this impairment depends on various factors, such as premorbid behavior, personal skills, extent of brain injury, and the types of physical, emotional, and cognitive sequelae the person suffers (Eames et al., 1990; Mateer and Ruff, 1990; Davis and Goldstein, 1994).

To date, the primary approach for the treatment of behavioral disorder following acquired brain injury is based on behavioral therapy (McGlynn, 1990; Jacobs, 1993; Wesolowski and Zencius, 1994). According to the majority of the modern literature (Ylvisaker et al., 2005, 2007; Cattelani et al., 2010; Geurtsen et al., 2010), these interventions may be organized into three main categories: Applied Behavior Analysis, comprising contingency management procedures and positive behavior intervention; Cognitive-Behavior Therapy; and Comprehensive-Holistic Rehabilitation Programs. Each of these approaches has its own theoretical and therapeutic assumptions that target specific features of behavioral disorders.

Despite this theoretical plurality, none of these therapeutic approaches addresses behavioral disorder by considering the relation between affordance and environmental adaptation. Current perspectives do not consider how physical structure and the environment constitute the first step in the emergence of behavior and cognition. The theoretical assumptions of conventional behavioral therapies are based on the dichotomy between subject and object. This dichotomy draws a natural division between the person who suffers from the disability and the environment that surrounds that person. For example, Applied Behavior Analysis focuses on manipulating the environment to reduce misbehavior. In Cognitive-Behavior Therapy, the intervention is based on improving self-awareness and learning cognitive strategies. Finally, the third group, the Comprehensive-Holistic Rehabilitation Programs, incorporates manipulation of both the environment and cognition to treat behavioral disorders (Cattelani et al., 2010). In contrast to these therapeutic models, the enactive view is not based on a division between subject and object. The enactive approach assumes a constant and unbreakable process of interaction between the environment and the sensorimotor schemas, from which behavior and cognition emerge. From the perspective of the therapeutic approach introduced in this publication, any clinical intervention must therefore consider the interaction between body and environment in order to improve the neurological sequelae. Below, we briefly explain the enactive approach.

Enaction is a novel paradigm in the cognitive sciences (Di Paolo et al., 2010) that was initially articulated by Varela et al. (1991) in "The Embodied Mind." In this approach, behavior and cognition develop through a dynamic interaction between the physiology of the organism, the sensorimotor systems, and the environment (structural coupling between the body and the world). Human beings enact the world: their embodied actions in the world are the first steps in the development of perception and the basis of cognition. Other theories of the mind consider the subject and the environment to be separate entities. In contrast, enaction claims that the study of action and cognition requires the simultaneous study of the mind, the body, and the environment, because all three are indissolubly intertwined in mental processes (Thompson, 2007). Originally, this paradigm focused primarily on simple cognitive processes such as color perception (Varela et al., 1991). Currently, the enactive approach addresses action and cognition in domains that involve high-level cognitive processes, including mathematics (Núñez, 2010), language (Bottineau, 2010), the human brain (Engel, 2010), social interaction (Di Paolo et al., 2010), and emotion (Colombetti, 2010). However, the enactive approach has not been applied to neurological therapy.

To introduce this therapeutic view to the field of neurorehabilitation, we apply the theoretical tenets of enaction to clinical practice. To study this interacting system of body structure and environment, we assessed the behavioral and cognitive performance of a woman with a severe acquired brain injury (ABI) in two different therapeutic contexts over 17 sessions. Following Gibson's (2014) work on ecological perception, we call these settings seated affordances. In one seated affordance, the woman was required to perform various cognitive tasks in the traditional posture of cognitive rehabilitation, that is, seated on a chair. In the other seated affordance, the woman attempted the same tasks that she had performed in the chair, but this time seated on a therapeutic ball (65 cm in diameter). In both seated affordances, misbehaviors (physical and verbal) and the successes and failures on the cognitive tasks were recorded while she performed the tasks.

We formulated a primary hypothesis and a peripheral hypothesis based on prior clinical experience. The primary hypothesis was that the sensorimotor dynamics between the body and the environment in the ball condition would modulate the externalizing symptoms better than the chair condition would. Enaction claims that motor action is directly linked to cognition. The peripheral hypothesis was therefore that the woman would achieve better cognitive performance in the ball condition than in the chair condition. A multiple schedule design was developed to examine how these bodily and environmental adjustments modified the behavioral and cognitive variables.

## CASE REPORT

## The Subject

We applied the experimental design to a 36-year-old woman who, in 2007, suffered a severe brain injury. The injury occurred 48 h after a normal childbirth, at which time the woman suffered from a severe encephalopathy secondary to fulminant hepatic failure. At the emergency service, cerebral oedema and multiple non-specific lesions in bilateral white matter were observed. One week later, the woman was diagnosed with preeclampsia. Afterward, she received a liver transplant. Ultimately, she began her rehabilitation in a specialized center for neurological therapy.

At the time of this research, 6 years after the ABI, the neuropsychological assessment could only be qualitative because of severe behavioral and cognitive disorders. During the examination, the woman was restless, impulsive, uninhibited, and verbally incoherent. Orientation to person, place, and time was impaired.

Although the neuropsychological impairments were generalized across all cognitive functions, the impairments most relevant to this study involved attention, reasoning, language (receptive and expressive aphasia), and executive functions.

The subject was highly impaired in sustained, selective, alternating, and divided attention. The woman was incapable of paying attention because of extreme distractibility. Her language was verbose and meaningless, with alterations in grammar, use of neologisms, palilalia, and echolalia. The subject made inappropriate comments and numerous perseverations and was unable to maintain social relationships. The woman could understand simple sentences but struggled with the pragmatic elements of language. Her executive functions, such as planning, sequencing, and mental flexibility, were severely affected. The woman could not focus her attention on relevant stimuli or ignore irrelevant stimuli. Because of her impulsivity, she could not inhibit verbal and motor behaviors. She made decisions and solved problems without reflection, because her abstract and complex reasoning was severely impaired.

Physically, the woman could move and walk without any external support and organized both her fine and gross motor movements adequately. She therefore had no difficulty with posture or balance.

This study was carried out in accordance with the recommendations of the ethics committee of Fundación Polibea. The participant's family gave written informed consent in accordance with the Declaration of Helsinki.

## Procedures

The experimental design was applied over seventeen sessions. In each session, two cognitive tasks were administered, and several behavioral and cognitive variables were collected while the woman sat on each of two seated affordances (a chair and a ball). Both affordances were assessed in every session, and their order was counterbalanced across sessions (Session 1: first on the ball, then on the chair; Session 2: first on the chair, then on the ball; and so on). When switching from one seated affordance to the other, the task was paused for 5 min.
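The order is specified above only for Sessions 1 and 2 ("and so on"). A minimal Python sketch, assuming that strict alternation continued across all seventeen sessions (an assumption; the study does not list the full schedule), makes one consequence explicit: with an odd number of sessions, one order necessarily occurs once more than the other. The function name `seating_order` is hypothetical.

```python
def seating_order(session: int) -> tuple[str, str]:
    """Return (first, second) seated affordance for a 1-indexed session.

    Assumes strict alternation: odd sessions start on the ball and
    even sessions start on the chair (Session 1 ball-first, Session 2 chair-first).
    """
    if session < 1:
        raise ValueError("sessions are numbered from 1")
    return ("ball", "chair") if session % 2 == 1 else ("chair", "ball")


schedule = [seating_order(s) for s in range(1, 18)]  # seventeen sessions
ball_first = sum(1 for first, _ in schedule if first == "ball")
chair_first = len(schedule) - ball_first
print(f"ball first: {ball_first}, chair first: {chair_first}")  # ball first: 9, chair first: 8
```

Under this assumption the ball-first order occurs in 9 sessions and the chair-first order in 8, so the counterbalancing is as close to even as seventeen sessions allow.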

In the chair condition, we used the standard posture of a traditional cognitive session: the woman sat on a chair with her feet on the floor and kept her trunk erect without leaning against the chair back, although she was allowed to rest her arms on the table. In the second seated affordance, the chair was replaced by a therapeutic ball (65 cm in diameter). The woman had to keep her trunk straight with both feet on the floor and without her feet touching the ball, so that she could not use her ankles for stabilization. The researcher did not allow her to rest her arms on the table.

In both seated affordances, two cognitive exercises were carried out. The first was a matching task, and the second, a complex task, required greater cognitive resources. The matching task involved visual perception and sustained attention, whereas the complex task involved auditory perception, language comprehension, visual perception, selective attention, and motor skills. In the matching task, the woman was given cards one by one and had to match each card to the card with the identical figure on the table in front of her. This exercise lasted 15 min, during which the therapist recorded the number of successes and failures. The complex task then began: the researcher asked her to point to a specific day of the week on a timetable. This was repeated seven times, and the number of successes was recorded.

Two therapists were present in the room during the sessions. The psychologist conducted the session, seated in front of the woman on the other side of the table, and the physical therapist stood behind the woman, checking whether she touched the ball with her ankles.

All sessions were video-recorded so that several researchers could later assess the behavioral variables. The disruptive behavior score was the total number of laughs, grabs, strikes at the therapists, and looks back toward the second therapist. Self-verbalization was the number of times the woman talked without communicative intention, and verbalization was the number of times she talked to one of the therapists.
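As a minimal illustration of how the three observational scores could be tallied from one rater's coding of a session video, the sketch below assumes the rater produces a list of event codes. The event labels (`laugh`, `grab`, `strike`, `look_back`, `self_talk`, `talk_to_therapist`) and the function `score_session` are hypothetical and are not the study's coding scheme.

```python
from collections import Counter

# Hypothetical event codes; the study's actual coding sheet is not described.
DISRUPTIVE_EVENTS = ("laugh", "grab", "strike", "look_back")


def score_session(events: list[str]) -> dict[str, int]:
    """Tally one rater's coded events for one session into the three scores."""
    counts = Counter(events)  # codes that never occur count as 0
    return {
        "disruptive": sum(counts[e] for e in DISRUPTIVE_EVENTS),
        "self_verbalization": counts["self_talk"],     # talk without communicative intent
        "verbalization": counts["talk_to_therapist"],  # talk addressed to a therapist
    }


coded = ["laugh", "talk_to_therapist", "self_talk", "grab", "look_back", "self_talk"]
print(score_session(coded))
# {'disruptive': 3, 'self_verbalization': 2, 'verbalization': 1}
```

Each rater's per-session scores would then form one column of the sessions-by-raters table used for the reliability analysis.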

## Experimental Design

We applied a multi-treatment design (Kazdin, 1982) and, more specifically, a multi-schedule design (Hersen and Barlow, 1978; Kazdin, 1982). The primary feature of this single-case design is that separate interventions are associated with distinct stimulus conditions. This methodological design is consistent with the goal of this research because "after the stimulus has been associated with its respective intervention, a clear discrimination is evident in performance" (Kazdin, 1982, p. 173).

The data analysis began by assessing the autocorrelation of every variable in both the control phase and the experimental phase. The following variables were not autocorrelated: matching failure (chair, r = −0.18, p = 0.51, ns; ball, r = 0.23, p = 0.38, ns), disruptive behavior (chair, r = −0.014, p = 0.96, ns; ball, r = −0.26, p = 0.33, ns), verbalization (chair, r = 0.318, p = 0.23, ns; ball, r = 0.36, p = 0.17, ns), self-verbalization (chair, r = 0.25, p = 0.35, ns; ball, r = 0.15, p = 0.57, ns), and complex task (chair, r = −0.18, p = 0.50, ns; ball, r = 0.28, p = 0.30, ns). The only autocorrelated variable was matching success, and only in the chair condition (chair, r = 0.60, p < 0.05; ball, r = −0.30, p = 0.26, ns). The analysis therefore depended on whether a variable was autocorrelated. The non-autocorrelated variables were assessed with the Mann–Whitney U test, which is considered the most appropriate and strict analysis for this type of data (Tate and Perdices, 2012). By contrast, the autocorrelated variable was assessed with the C-statistic, following the proposal of Tryon (1982) and DeCarlo and Tryon (1993); this analysis can detect small changes across successive measurements.
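The text does not say how the autocorrelation coefficients were computed. A minimal sketch, assuming a lag-1 Pearson correlation between each session's score and the next one (n − 1 lagged pairs) tested with the usual t statistic on n_pairs − 2 degrees of freedom, is shown below. With 17 sessions there are 16 pairs and 14 degrees of freedom; r = −0.18 then gives t ≈ −0.68, whose two-tailed p is about 0.5, close to the reported p = 0.51 for matching failure on the chair. This agreement suggests, but does not confirm, that the authors used this approach. The function names are hypothetical; if SciPy is available, `scipy.stats.pearsonr(x[:-1], x[1:])` returns r together with a two-sided p-value.

```python
import math


def lag1_autocorrelation(series: list[float]) -> float:
    """Pearson correlation between x[t] and x[t+1] over the n-1 lagged pairs."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation among the lagged pairs")
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n_pairs: int) -> float:
    """t statistic for H0: r = 0, on n_pairs - 2 degrees of freedom."""
    return r * math.sqrt(n_pairs - 2) / math.sqrt(1 - r * r)


# 17 sessions give 16 lagged pairs, hence 14 degrees of freedom.
print(round(t_for_r(-0.18, 16), 2))  # -0.68
```

In the decision rule described above, a variable whose lag-1 coefficient is not significant goes to the Mann–Whitney U test, and one that is significant goes to the C-statistic.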

To increase the internal validity of the observational variables (disruptive behavior, self-verbalization, and verbalization), three researchers measured them independently, and the intra-class correlation coefficient (ICC) was computed for each variable. Reliability was high for all three variables (ICC point estimate [95% CI]): disruptive behavior 0.949 (0.900–0.979), self-verbalization 0.932 (0.865–0.972), and verbalization 0.937 (0.876–0.974).
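The text does not state which ICC form was used. As a sketch only, the code below assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's (1979) notation, computed from the two-way ANOVA mean squares, with rows as rated sessions and columns as raters. The function name `icc_2_1` is hypothetical, and the 95% confidence interval is not computed here. On Shrout and Fleiss's illustrative table (six targets rated by four judges) it gives about 0.29.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Rows are rated targets (e.g. sessions); columns are raters.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout and Fleiss (1979) illustrative data: 6 targets (rows) x 4 judges (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

For the present study the table would have seventeen rows per condition and three columns, one per researcher.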

## RESULTS

We identified significant differences between the performance of the person in the ball condition and in the chair condition, which are summarized in **Table 1**.

All variables assessed with the Mann–Whitney U test showed significant differences. Significantly fewer disruptive behaviors were observed on the ball than on the chair (**Figure 1A**), Z′ = −3.55, p < 0.001; the sum of ranks was 400.50 for the chair condition and 194.50 for the ball condition. For verbalization, the woman communicated less with the therapists when working on the ball (**Figure 1B**), Z′ = −3.08, p < 0.001; the sum of ranks was 387 for the chair condition and 208 for the ball condition.
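The reported rank sums can be checked against the Z′ values with the normal approximation to the Mann–Whitney U test. With n1 = n2 = 17 sessions, each pair of rank sums should total 34 × 35 / 2 = 595, which holds for every variable reported here. The sketch below (hypothetical function name `mann_whitney_z`) omits the correction for ties; passing the smaller of the two rank sums reproduces the negative sign convention of the reported values. Without a tie correction it reproduces Z′ = −3.55 and −3.08, whereas for the complex task it gives about −1.62 rather than the reported −1.74, which may reflect a tie correction applied in the original analysis.

```python
import math


def mann_whitney_z(rank_sum: float, n1: int, n2: int) -> tuple[float, float]:
    """U and normal-approximation z for one group's rank sum (no tie correction)."""
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u


# Ball-condition rank sums (the smaller sums) from the results above.
for label, ball_rank_sum in [("disruptive behavior", 194.5), ("verbalization", 208.0)]:
    u, z = mann_whitney_z(ball_rank_sum, 17, 17)
    print(f"{label}: U = {u}, z = {z:.2f}")
# disruptive behavior: U = 41.5, z = -3.55
# verbalization: U = 55.0, z = -3.08
```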

The woman performed better on the complex task when working on the ball, Z′ = −1.74, p < 0.05; the sum of ranks was 250.50 for the chair condition and 344.50 for the ball condition. The psychologist who guided the sessions did not observe any change in the strategies the woman used to accomplish the task successfully (e.g., a visual strategy or a longer scanning time).

Contrary to our initial hypothesis, both self-verbalization and matching failure were significantly higher in the ball condition than in the chair condition. The woman verbalized to herself significantly more when working on the ball, Z′ = −1.98, p = 0.02; the sum of ranks was 240 for the chair condition and 355 for the ball condition. She also made significantly more matching mistakes when working on the ball, Z′ = −2.82, p < 0.001; the sum of ranks was 217 for the chair condition and 378 for the ball condition.

Finally, the C-statistic was applied to the only autocorrelated variable, matching success. The baseline showed a significant trend: C-statistic = 0.62 (SE = 0.22), z (two-tailed) = 2.82, p < 0.01 (p = 0.0024). Following the analysis protocol, the difference between the treatment line (on the ball) and the baseline (on the chair) was calculated for each session (day). Matching success was better when working on the ball: C-statistic = 0.90 (SE = 0.22), z (two-tailed) = 4.09, p < 0.01 (p = 0.004). Although the statistic was significant, the average difference was small (M<sub>baseline</sub> = 38.53 vs. M<sub>treatment</sub> = 38.94). The result reached significance because the small difference between the experimental conditions was maintained in every session, behaving like a trend.
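The C-statistic as commonly presented following Tryon (1982) is C = 1 − Σ(xᵢ − xᵢ₊₁)² / [2 Σ(xᵢ − x̄)²], with standard error SE = √[(n − 2) / ((n − 1)(n + 1))] and z = C / SE. For n = 17 sessions, SE ≈ 0.228, consistent with the reported SE = 0.22, and 0.62 / 0.22 ≈ 2.82 and 0.90 / 0.22 ≈ 4.09 match the reported z values. The sketch below (hypothetical function name `c_statistic`) implements that formula; since the per-session scores are not reported, the usage line runs on a synthetic linear trend, for which C = 1 − 6 / (n(n + 1)). For the treatment step described above, the statistic would instead be applied to the per-session differences, e.g. `c_statistic([b - c for b, c in zip(ball, chair)])`, where `ball` and `chair` stand for hypothetical per-session score lists.

```python
import math


def c_statistic(series: list[float]) -> tuple[float, float, float]:
    """Return (C, SE, z) for a series of successive measurements."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(series) / n
    ss = sum((x - mean) ** 2 for x in series)  # sum of squared deviations
    if ss == 0:
        raise ValueError("series has no variation")
    successive = sum((series[i] - series[i + 1]) ** 2 for i in range(n - 1))
    c = 1 - successive / (2 * ss)
    se = math.sqrt((n - 2) / ((n - 1) * (n + 1)))
    return c, se, c / se


# Synthetic, perfectly linear series of 17 values (not the study's data).
c, se, z = c_statistic(list(range(1, 18)))
print(round(c, 3), round(se, 3), round(z, 2))
```

A large positive z indicates a systematic trend across sessions, which is why a small but consistent between-condition difference in every session can produce a significant C.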


∗Statistically significant (p < 0.05). ∗∗Statistically significant (p < 0.01).

## DISCUSSION

The present study was designed to assess whether different seated affordances (a ball and a chair) affected the behavioral and cognitive performance of a person with severe acquired brain injury. We hypothesized that the structural coupling of the body and the environment would be better in the ball condition than in the chair condition. The findings indicate that the woman performed better behaviorally and cognitively when working on the therapeutic ball than when working on the chair (the traditional postural setting in cognitive rehabilitation).

The results indicate that the woman managed her disruptive behaviors (physical and verbal) better while working on the ball. We believe this is because working on the ball recruits more automatic bodily resources than working on the chair, helping the person ignore irrelevant stimuli from the environment and focus on both her body and her task. In addition to the improvement in behavioral management, the woman also significantly improved on the cognitive performance variables (matching success and complex task) in the ball condition, suggesting that this therapeutic strategy not only modulates externalizing symptoms but also supports better cognitive function. This cognitive outcome is consistent with other studies showing that cognitive function improves when postural control becomes more difficult (Caldwell et al., 2003; Elliott et al., 2005; Barra et al., 2015).

Contrary to our initial premise, the matching failure and self-verbalization variables did not behave as predicted. These results are consistent with studies reporting that increasing the demands of postural control decreases task success (Andersson et al., 1998; Vuillerme et al., 2000; Brauer et al., 2001; Riley et al., 2003; Barra et al., 2006; Rapp et al., 2006; Simoneau et al., 2008).

Although the most important evidence on behavioral disorders comes from single-case experimental designs (Alderman et al., 1999; Alderman, 2002; Barlow et al., 2009), which avoid the uncontrolled variance produced by the heterogeneous nature of these disorders (Alderman and Wood, 2013), this study has two main limitations. The first is a possible learning effect: the design applied two seated affordances over 17 sessions, which might produce this undesirable effect. Nevertheless, even if the outcome were affected by learning, this would not alter the main result, because the person performed in both seated affordances in every session. The second limitation concerns ecological validity. The study was conducted in a therapeutic context, that is, in a place without noise, disruption, or distortion of the social environment. As a future improvement, we suggest carrying out this research in more ecological environments, such as a rehabilitation center, a day center for people with disabilities, or the family home, to evaluate how the relation between seated affordances and environmental adaptation modulates cognition and behavior.

In recent decades, physical therapies have incorporated dual-task training into motor development theory. This intervention requires an individual to maintain balance while simultaneously performing another task (cognitive or motor). Dual-task training might be confused with the therapeutic intervention we present; however, the two interventions are not, in fact, similar, for two reasons. First, dual-task training is categorized as a therapy based on information processing theory, known as computational therapy (Martínez-Pernía et al., in press), whereas our proposal is based on the enactive approach. Second, most dual-task training focuses on motor development, whereas the aim of the present study is to improve behavioral and cognitive performance. To our knowledge, only three other studies apply a dual-task paradigm to improve cognitive function (Caldwell et al., 2003; Elliott et al., 2005; Barra et al., 2015), and this work is the only study based on the enactive paradigm.

## The Particularities of the Enactive Approach in Neurorehabilitation

In recent decades, new theories have arisen in the cognitive sciences that approach the study of cognition in an innovative manner. They can be summarized as four primary perspectives, or in Gallagher's words, "the 4e approaches of the mind" (Rowlands, 2010): embodied, embedded, enacted, and extended. Despite their differences, all four share one characteristic: they assign major significance to extra-neuronal structures in the study of cognition, emphasizing the importance of the body and the environment in the emergence of cognition. These theoretical stances are currently a primary line of research in the non-Cartesian cognitive sciences (Rowlands, 2010), in fields such as philosophy (Gallagher, 1986, 1995, 2000, 2005; Johnson, 1987; Gallagher and Zahavi, 2008; Shapiro, 2011), neuroscience (Varela et al., 1991; Damasio, 1994, 2003; Thompson and Varela, 2001; Edelman, 2004), psychology (De Jaegher, 2013; McGann et al., 2013), education (van der Schyff, 2015; Lozada and Carro, 2016), and artificial intelligence (Clark, 1998). Although such studies remain scarce in the rehabilitation sciences, some publications are based on this theory, such as studies on the rehabilitation of persons with an experiential disorder called hemiphobia (Martínez-Pernía and Ceric, 2011) and on embodied-enactive clinical reasoning in physical therapy (Øberg et al., 2015).

This work may be relevant in the clinical field, not only because our study shows hints of an effective treatment for persons with severe behavioral and cognitive disorders, but also because it allows people who are usually excluded from conventional therapies [due to a lack of insight and motivation (Burgess and Wood, 1990; Sazbon and Groswasser, 1991) and the severity of their behavioral disorder (Wood, 1987)] to receive rehabilitation treatment. Traditional rehabilitation programs can only be employed with patients who have milder sequelae (Wood and Worthington, 2001); as a consequence, the more serious impairments fall outside any possible treatment. From the point of view of neurorehabilitation based on the enactive approach, the main problem of classical therapeutic interventions is that they aim to increase behavioral management through self-control techniques. Such strategies can succeed in people with high cognitive levels because they are able to understand internal and external instructions. They are not effective, however, in persons with low cognitive levels, who are not aware of what is going on internally or of what the therapist is demanding (Alderman, 2003). By contrast, a therapeutic strategy based on the enactive approach overcomes this drawback because it works on unconscious structures of the mind: the interaction between the body and the environment modifies behavior and cognition without requiring self-awareness.

Surprisingly, discussion of the importance of the body and the environment has been absent from cognitive therapy, even though they are basic elements of the therapeutic setting. This lack of interest arises because cognitive neurorehabilitation is based on functionalism (Martínez-Pernía et al., in press). From the functionalist perspective, the body is reduced to the somatosensory cortex or restricted to perceiving environmental stimuli that are later used by cognition (Gallagher, 1995, 2005). In this view, the body merely provides "raw sensory input" to the brain and makes no contribution to cognition (Gallagher, 2005). This is why most clinical interventions do not consider which bodily posture best improves cognitive impairment. As a consequence of this omission, therapy implicitly accepts as the gold standard that cognitive intervention is carried out with the patient seated on a chair, a posture maintained throughout the session. Although this is the usual posture for recovering from cognitive impairment, some therapeutic strategies treat bodily work in a more innovative way. For instance, cognitive diagnosis based on the dual-task paradigm and some strategies for unilateral neglect integrate the body more pragmatically than simply sitting on a chair. Nevertheless, apart from these scarce interventions, the theoretical assumptions of cognitive neurorehabilitation show no interest in how the body and the environment improve cognitive function. In contrast to the functionalist view, embodied cognition approaches emphasize the importance of the body and the environment in the emergence of consciousness.
Sensorimotor schemas and the environment are the substrate from which cognition emerges and by which perception, attention, memory, thought, reasoning, and so on are shaped. In this way, the structural coupling between the body and the environment provides specific conditions that shape cognition. From this view, cognition can be understood as a dynamic process that is situated prior to brain activity (Gallagher, 1995, 2005; Gallagher and Zahavi, 2008). The body (through its movements and posture) and the particular characteristics of the environment [which afford the action of specific sensorimotor programs (e.g., walking, sitting, swimming)] work together to shape cognition (Gallagher, 2005). Applied to cognitive neurorehabilitation, this theory implies that any therapeutic strategy has to consider which interaction between the body and the environment best favors a person's recovery from cognitive impairment.

## CONCLUSION

The enactive paradigm applied in the rehabilitation of the woman in this case study introduces a level of treatment that precedes behavior and cognition and emerges from the relation between seated affordance and environmental adaptation. This theoretical consideration allowed the discovery of a better seated affordance for improving behavioral and cognitive performance in our case study.

## AUTHOR CONTRIBUTIONS

DM-P: original idea, design, statistics, therapy, writing, and revision of the final paper. DH: design, statistics, writing, and revision of the final paper. DH-A and RC: writing and revision of the final paper. SG and MB: therapy and revision of the final paper.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Martínez-Pernía, Huepe, Huepe-Artigas, Correia, García and Beitia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# White Lies in Hand: Are Other-Oriented Lies Modified by Hand Gestures? Possibly Not

#### Katarzyna Cantarero<sup>1</sup> \*, Michal Parzuchowski<sup>1</sup> and Karolina Dukala<sup>2</sup>

<sup>1</sup> Faculty in Sopot, SWPS University of Social Sciences and Humanities, Sopot, Poland, <sup>2</sup> Department of Philosophy, Institute of Psychology, Jagiellonian University, Cracow, Poland

Previous studies have shown that the hand-over-heart gesture is related to being more honest as opposed to using self-centered dishonesty. We assumed that the hand-over-heart gesture would also relate to other-oriented dishonesty, even though the latter differs markedly from self-centered lying. In Study 1 (N = 79), we showed that performing a hand-over-heart gesture diminished the tendency to use other-oriented white lies and that the gesture of crossing one's fingers behind one's back was not related to higher dishonesty. We then pre-registered and conducted Study 2 (N = 88), which was designed to higher methodological standards than Study 1. Contrary to the findings of Study 1, we found that using the hand-over-heart gesture did not result in refraining from other-oriented white lies. We discuss the findings of this failed replication, indicating the importance of strict methodological guidelines in conducting research, and also reflect on the relatively small effect sizes related to some findings in embodied cognition.

#### Edited by:

Anna M. Borghi, Sapienza University of Rome, Italy

#### Reviewed by:

Robert Calin-Jageman, Dominican University, United States

Franz Mechsner, Independent Researcher, Berlin, Germany

> \*Correspondence: Katarzyna Cantarero kcantarero@swps.edu.pl

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 23 September 2016 Accepted: 03 May 2017 Published: 22 June 2017

#### Citation:

Cantarero K, Parzuchowski M and Dukala K (2017) White Lies in Hand: Are Other-Oriented Lies Modified by Hand Gestures? Possibly Not. Front. Psychol. 8:814. doi: 10.3389/fpsyg.2017.00814

Keywords: other-oriented dishonesty, white lies, body gestures, hand-over-heart, pre-registration, replication

## INTRODUCTION

Chandler Bing: Janice said "Hi, do I look fat today?" So I looked at her...

Ross Geller: Whoa, whoa. You looked at her? You never look. You just answer, it's like a reflex. Do I look fat? No! Is she prettier than I am? No!

Friends (TV Series, 1994-2004)

We started writing this article intending to focus on the possible effects of the hand-over-heart gesture on refraining from other-oriented white lies. The experiments presented in this article were in fact designed to test the idea that body gestures commonly associated with (dis)honesty influence white lies. We drew our hypothesis from previous psychological research and then designed and conducted a study (Study 1) to put it to the test. That study, however, was designed and conducted a few years ago, with our best intentions and the knowledge we had at the time, which, in hindsight, was not flawless. Given the recent replication crisis (see e.g., Open Science Collaboration, 2012, 2015; Klein et al., 2014; Hagger and Chatzisarantis, 2016), and especially the uncertainty regarding some embodied cognition effects (e.g., Ranehill et al., 2015; Wagenmakers et al., 2016), we felt that we should also make an effort to replicate our own findings. As we truly hope not to be chasing noise in our scientific endeavors, we decided to pre-register the design and conduct a replication of Study 1, held to higher methodological standards than the original study. This article therefore first shows why we developed our hypothesis and then, after presenting the research we conducted and the results we found, focuses on both theoretical and methodological issues related to the (possible lack of) hand-over-heart gesture effect on other-oriented white lies.

## EMBODIED COGNITION

The experience of reading a good novel is often thought to involve the feeling of being totally immersed in a multiperceptual reality (by means of visual imagery). Similarly, merely thinking about a concept is argued to involve simulating the relevant perceptual states (Barsalou, 2008; Bergen, 2015). According to the embodied perspective, people represent concepts by means of the same sensations that co-occur with the activation of those concepts (see Riskind, 1984; Chandler and Schwarz, 2009). Our bodies and their modalities can thus be seen as a grounding mechanism for cognitive processes (see Barsalou, 2009).

From very early in our development, we make sense of social situations by associating their meaning with the states of our bodies, their movement, or their orientation in space in those specific conditions. After many exposures, we learn that giving somebody a hug means that you like that person and that pushing something away means that you do not like it. Similarly, a small child jumps around rapidly when faced with exciting stimulation. This kind of situated conceptualisation represents configurations of multimodal components (e.g., visual, auditory, proprioceptive, or interoceptive information) that can be viewed as a specific perceptual pattern (Barsalou, 2009). The embodied cognition perspective suggests that when one component of such a pattern is evoked or triggered, the remaining components are likely to be activated as well, because in the past they have frequently co-occurred with the perceived component. Hence, after many instances in which such modal patterns co-occur in quick succession, they are thought to form a unified situated conceptualisation in memory that plays an important role in social cognition later in life (Barsalou, 2009). For example, numerical cognition among pre-schoolers is often based on operating with fingers that represent numbers in space (e.g., the SNARC hypothesis, spatial-numerical association of response codes; Dehaene et al., 1993; Riello and Rusconi, 2011). After much exposure to such co-occurrence early in life, a pattern of spatial preference can be observed among adults as well: people respond faster to large numbers with the right response key than with the left one, while small numbers are categorized faster with the left key than with the right one (Dehaene et al., 1993).

The grounding theory (Barsalou, 1999, 2009; Niedenthal et al., 2005) proposes that simply increasing the accessibility of a specific concept (say, a physical stimulus appearing on the left side of the visual field) can elicit thoughts, feelings, and judgments related to the concept applicable in this pattern (the stimulus would be appraised, judged, and coded as smaller or of lesser value: Dehaene et al., 1993; Parzuchowski et al., 2016). For example, hugging somebody, apart from signaling liking for that person, also involves a whole perceptual pattern of sensations (the experience of warmth, the person's smell, the softness of their skin, and so on). Thus, when people experience warmth (e.g., when they are placed in a warm room), they are more likely to perceive others as friendly and kind (IJzerman and Semin, 2009, 2010; Szymkow and Parzuchowski, 2013; Szymkow et al., 2013; IJzerman et al., 2016).

It is important to note that these bodily induced associations are thought to be activated unobtrusively, even without awareness of their semantic meaning (see Chandler and Schwarz, 2009; Jostmann et al., 2009). For instance, Chandler and Schwarz (2009) presented their experiment as a study of text comprehension while instructing participants to perform various finger movements. Participants extended either their middle finger (a hostile gesture) or their index finger (a neutral, control gesture) while indicating their impressions of an ambiguously described person. None of the participants noticed that they had, in fact, been performing a valenced gesture. Interestingly, those making the hostility-associated gesture perceived the target person as more hostile than the controls did.

To sum up, many research findings appear to show that merely experiencing a bodily sensation associated with certain concepts is enough to shape subsequent information processing, although the associations evoked by a bodily manipulation should be limited to previously repeated patterns and their recognition. The effects of such bodily manipulations are therefore also sensitive to culture and contextual cues (for a discussion of this topic, see Bialobrzeska and Parzuchowski, 2016). There is thus theoretical and empirical evidence in the psychological literature for the effects of embodied cognition. We next turned to the literature on dishonesty and its link with embodied cognition, as we wanted to focus on gestures related to dishonesty.

## SELF-ORIENTED AND OTHER-ORIENTED DISHONESTY

People in long-term relationships would often agree with the anecdotal advice of the fictitious character Ross Geller, quoted at the beginning of this article. When a partner asks a sensitive question (e.g., "Do I look nice in this dress/suit?"), one should not take time to give an informed response ("Well, let me see"). Many would agree that the best response is a prompt and firm confirmation ("Yes, you look great in everything") that prevents the unwanted discussion a hesitation may trigger. We often lie for the sake of our relationships with others. Yet even though people report lying at least once a day, most of our daily communication is free from deception (DePaulo et al., 1996).

Interestingly, contextual cues may trigger more honest responses. Previous research shows that an emblematic gesture manipulation (namely hand-over-heart) can induce more honest responses regarding the way we behave or judge ourselves and others (Parzuchowski and Wojciszke, 2014; Parzuchowski et al., 2014, 2017). However, the hand-over-heart gesture has been shown to elicit a more honest approach mainly in perceiving others' intentions (Parzuchowski et al., 2014, 2017) or in self-oriented motivational contexts (Parzuchowski and Wojciszke, 2014). Namely, people were less inclined to be dishonest for their own benefit when they posed the hand-over-heart gesture.

In the present article, we were interested in testing whether this gesture also applies when dishonesty is motivated more prosocially. There are many ways of differentiating types of lies (see, e.g., Camden et al., 1984; Arcimowicz et al., 2015). One of the most important categorizations depends on the beneficiary of the lie (e.g., DePaulo et al., 1996; Erat and Gneezy, 2011): the primary beneficiary can be the liar, another person, or both. There is strong evidence that self-oriented (or self-centered) and other-oriented dishonesty differ significantly from each other (e.g., Kashy and DePaulo, 1996; Cantarero and van Tilburg, 2014).

While self-oriented lies are primarily aimed at benefiting the liar (DePaulo et al., 2004), other-oriented lies aim to benefit another person (DePaulo et al., 1996). White lies (or Pareto white lies) are aimed at benefiting both the liar and another person (Erat and Gneezy, 2011). They are related to the willingness to be polite and to care about another person's feelings. For the other person, these lies can bring the benefit of feeling good and being spared an unpleasant truth. On the other hand, white lies also benefit the liar, for example by maintaining good interactions with others, being perceived as a nice, good person, or being liked. Nevertheless, the other-oriented motivation is more important in white lies than in self-oriented dishonesty. In the present article, we wanted to focus on lies that are not primarily aimed at benefiting the liar, that is, on other-oriented white lies that include the interests of others.

The decision of whether to lie or to tell the truth depends on the consequences that a discovered lie would have (e.g., Mazar and Ariely, 2006). Other-oriented lies, when uncovered, can be argued to have less severe consequences (Arcimowicz et al., 2015). Lies aimed at benefiting others are also far more acceptable than those centered on the benefits of the liar (Lindskold and Walters, 1983). As a result, the cost-benefit analysis behind the decision to lie or to tell the truth should differ depending on whether a lie is self-centered or involves other-oriented motivation. Since the psychological costs of lying are much lower for other-oriented lies, we should expect the decision to tell such a lie to be much easier than when a lie is self-centered. Interestingly, though, most of the lies we tell are self-centered rather than other-oriented (DePaulo et al., 1996). It seems that we are willing to accept greater psychological costs of lying in exchange for the personal benefits that a lie might bring.

This poses an interesting dilemma related to truth telling and lying. Previous studies have indicated that self-oriented and other-oriented dishonesty might relate in opposite ways to self-regulation processes (Cantarero and van Tilburg, 2014). Namely, while ego-depletion promotes self-centered dishonesty (Mead et al., 2009), the same conditions should reduce proneness to other-oriented deception (Cantarero and van Tilburg, 2014). This follows from the assumption that other-oriented dishonesty demands more effort and is less of a 'default' option for people. Could using an unobtrusive hand gesture also be related to other-oriented dishonesty? As previously mentioned, targets performing the hand-over-heart gesture appeared more trustworthy than the same targets photographed with both hands down. Using the hand-over-heart gesture also led people to refrain from self-centered dishonesty (e.g., Parzuchowski et al., 2014). Presumably, when the gesture is performed by the agent (unobtrusively, within some other bogus task), it activates an implicit association with honesty and triggers participants to behave accordingly. Similarly, Mazar et al. (2008) showed that swearing an oath of allegiance to a bogus honor code or attempting to recall moral norms (the Ten Commandments) made people act more honestly, presumably because this drew attention to their internal standards of integrity. One could argue, however, that since polite white lies are so socially acceptable, they pose almost no dilemma for the communicator, and thus people will not refrain from telling such lies even when placing their hand over their heart.

In Study 3, Parzuchowski and Wojciszke (2014) showed that placing a hand over the heart caused people to withhold their honest opinion about the (un)attractiveness of the alleged acquaintances of the experimenter. Participants were asked to rate how attractive the people presented in the photos were. This information was not given in the presence of the judged person or in a face-to-face setting. As a consequence, such a situation might not have strongly engaged the interests of the interlocutor. In the present study, we address these issues and add new insights into the role of hand gestures in promoting and preventing dishonesty. Specifically, we wanted to test whether telling a white lie aimed at protecting others from harm depends on performing a gesture related to honesty, namely the hand-over-heart gesture.

To reach this goal, the design of the presented experiments involved the presence of the alleged author of the evaluated artworks. We focused on the 'author's' artworks that participants disliked and asked participants to give feedback about the work to the alleged author's face, similarly to the study by Bell and DePaulo (1996). We introduced this setup to create a dilemma for our participants between telling the blunt truth about their aesthetic dislike and acting politely by altering their expressed ratings. This situation clearly invites other-oriented white lies: social norms should trigger the expression of less extreme preferences when facing the author, in order to spare him/her from feeling bad. This motivation to lie is other-oriented especially because participants did not expect to interact with the alleged author after the experiment, and telling the truth could hurt the author's feelings.

Our goal was to test whether gestures related to (dis)honesty can influence a particular social behavior: the tendency to tell other-oriented white lies aimed at protecting others from harm (namely, to exaggerate one's aesthetic judgments about artworks). Additionally, in Study 1 we wanted to see whether gestures that promote as well as gestures that prevent lying affect telling white lies. For this reason, Study 1 also included the fingers crossed behind one's back gesture, which should augment lying behavior. Previous studies have shown that the hand-over-heart gesture reduces self-centered dishonesty (e.g., Parzuchowski and Wojciszke, 2014). We wanted to test whether the gesture primes other-oriented honesty by measuring whether people are more likely to give true (non-flattering) feedback to others (the unfamiliar author of a work of art). More specifically, we focused on other-oriented white lies and hypothesized that the tendency to use these lies would be diminished when performing the hand-over-heart gesture. The second aim was to check whether the opposite gesture (fingers crossed behind one's back) promotes dishonesty; we hypothesized that it would prime people to give more positive but untrue feedback to others. Additionally, we wanted to control for the degree to which the 'author' was liked, as liking should be related to a higher proneness to use other-oriented lies, as was the case in former studies (e.g., Bell and DePaulo, 1996). To test these hypotheses, we conducted a laboratory experiment (Study 1) and then a replication study (Study 2) that focused solely on the hypothesis regarding the hand-over-heart gesture and other-oriented white lies.

## METHOD

The main aim of the studies was to test whether the hand-over-heart gesture is related to refraining from other-oriented white lies.

## STUDY 1

## Participants and Design

Eighty-three university students (67 women; Mage = 20.92, SD = 1.61) participated in the experiment in exchange for course credit. Participants were recruited via a campaign advertising the study as 'Body posture and perception of artistic work.' The study was run in individual sessions, each lasting around 30 min. All participants gave their informed consent. At the end of the procedure, participants were asked to guess the purpose of the research. Data from the four participants who correctly guessed the hypothesis were excluded from the analysis, leaving a final sample of 79 people.

The procedure of the experiment was adapted from Bell and DePaulo (1996). The laboratory was turned into an 'art gallery' displaying 10 photos (numbered from 1 to 10). Participants were randomly assigned to one of three experimental conditions (hand-over-heart, fingers crossed behind the back, or a control gesture: hand over elbow; see Appendix 1). In each condition, participants were asked to pose the main gesture and two other gestures (hand over arm and hand over hip; see Appendix 1) while evaluating the artwork (the order of gesture use was counterbalanced). We refrained from using the expression 'hand-over-heart' to avoid producing the effect simply by instructing participants to behave according to the meaning of the gesture. In the hand-over-heart condition, participants were asked to place their hand at a given height of their chest. We did this to distract participants from the actual aim of the study. Participants were instructed to pose each gesture after hearing a verbal signal that named it. To standardize the instruction, the verbal signals were pre-recorded and played to participants during the experiment.<sup>1</sup>

There were always two experimenters conducting the study. Experimenter 1 was mostly responsible for conducting the first part of the study, and Experimenter 2 conducted the conversation about the photos after being introduced as the alleged author of some of the artworks. First, in line with the cover story, participants were asked to privately assess, one by one and without any verbal or written statement, all of the photographs presented in the 'gallery,' while rotating gestures in accordance with the pre-recorded verbal signals played from the speakers. Only then were participants given a piece of paper and asked to choose two photos from the gallery, the one they liked the most and the one they liked the least, and to evaluate both on a Likert scale from 1 to 7, where 1 was "Definitely don't like it" and 7 was "Definitely like it." After the participant had evaluated the photos on paper, Experimenter 1 passed this information to Experimenter 2 (unnoticed by the participant), so that s/he knew which photos to discuss with the participant. The participant was then asked to justify in writing why he or she had chosen the 'favorite' and the 'worst' photographs (participants did not pose any gestures while writing these evaluations).

At this phase of the experiment, seconds after writing their evaluations of the photos, participants were asked to talk to the alleged author (Experimenter 2) of some of the photos presented in the 'gallery.' Experimenter 2 was blind to the hypothesis of our study. We said that we were also interested in how people discuss artwork verbally while posing gestures, and that this would be useful feedback for the "author." At this point, the auditory instruction ('chest,' 'fingers,' or 'elbow,' depending on the condition) was played, and we asked the participant to hold the gesture during the whole conversation with "the author" (Experimenter 2). Experimenter 2 (the alleged author) then entered the room and asked participants about three photos. The questions were always the same: whether the participant liked the photo on a 1–7 scale, where 1 was "Definitely don't like it" and 7 was "Definitely like it," followed by two open-ended questions asking them to justify their evaluation and to describe the topic of the photo. The first two photos were random ones that the participant had not indicated in writing a few minutes earlier as the most liked or the most disliked. The third photo was always the one that the participant had chosen as the most disliked in the written evaluation phase. The oral evaluation of this third photo, which we knew to be the disliked one from the written evaluation a couple of minutes earlier, was the main interest of the experiment and our main operationalisation of the dependent variable.

<sup>1</sup>The verbal signal changed every 15 s. The order of the verbal signals was identical across conditions except for the key gesture under study (hand-over-heart vs. fingers crossed behind one's back vs. hand over elbow): hip, arm, (key gesture), arm, (key gesture), hip, arm, hip, (key gesture), arm. That is, the first verbal signal was 'hip'; on hearing this word, participants were to place their hand over their hip while evaluating a photo. When they heard a new verbal signal, they turned to another photo and changed their gesture. For example, a participant in the hand-over-heart condition heard the word 'chest' as the third signal; they would then turn to the next photo and place their hand over their heart. Examples of the gestures are presented in Appendix 1.

Afterward, we asked the participant to evaluate both experimenters (we said that we would appreciate feedback from each participant regarding the experiment). Example items were 'I think that this person was professional' (a mock item, used to make the assessment of the experimenters seem more credible to participants) and 'I think that this person is nice.' Answers were given on a scale from 1 ('Definitely yes') to 7 ('Definitely not'). The items came from Kulesza et al.'s (2015) liking scale.
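The Results report the internal consistency of this liking scale as Cronbach's α. As a minimal sketch of that statistic (the function name and the toy ratings are hypothetical illustrations, not the authors' data or analysis code), α for k items is k/(k − 1) × (1 − sum of item variances / variance of the summed scores):

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper: uses sample variances (n - 1 denominator) for both
    the individual items and the respondents' summed scores.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score across the k items
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data (not study data): 4 respondents x 3 items, 1-7 scale
toy = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [6, 5, 6]]
print(round(cronbach_alpha(toy), 2))
```

Because α depends only on a ratio of variances, using population instead of sample variances would give the same value.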

We then thanked the participants, awarded the course credit, and, after all participants had been tested, debriefed them. We expected that higher ratings in the oral evaluation phase of the experiment (compared with the first, written evaluations) would indicate that the situation indeed promoted lying (i.e., giving excessively positive feedback on artwork to an alleged author). We assumed that participants would be less eager to lie when placing their hand over their heart and more willing to lie in the oral evaluation while performing the fingers crossed gesture.

### Results

The six-item liking scale (after exclusion of the one mock item) reached good reliability, α = 0.76. We then calculated a deception index, defined as the difference between the second, oral evaluation of the least liked photo and the first, written evaluation of the same photo.<sup>2</sup> A one-way ANCOVA was conducted with the deception index as the dependent variable, liking of the alleged author as a covariate, and type of gesture (hand-over-heart, fingers crossed, and hand over elbow) as a between-subjects factor. The results showed a main effect of gesture, F(2,75) = 5.16, p = 0.008, η² = 0.12.<sup>3</sup> Comparison of means with Bonferroni correction showed a significant difference between the hand over elbow gesture (M = 1.04, SD = 0.92) and hand-over-heart (M = 0.27, SD = 0.78), p = 0.006, while the fingers crossed condition (M = 0.67, SD = 1.07) did not differ significantly from either. The lower deception index in the hand-over-heart condition means that participants were indeed less likely to use an other-oriented lie when performing the gesture, which supports our hypothesis. The fingers crossed behind one's back gesture, however, did not promote lying, leaving our second hypothesis without support.<sup>4</sup> These results are presented in **Figure 1**.
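The deception index is a simple paired difference, and the reported effect sizes can be cross-checked from the F statistics. A minimal sketch, assuming the reported η² values are partial η² (as is typical of ANCOVA output); the function names and the three ratings below are hypothetical illustrations, not the authors' data or code:

```python
def deception_index(oral, written):
    """Per-participant deception index: oral rating minus written rating
    of the same least-liked photo (both on the 1-7 scale).

    Positive values mean the photo was rated more favorably aloud,
    in front of the alleged author, than in private writing.
    """
    if len(oral) != len(written):
        raise ValueError("oral and written ratings must be paired")
    return [o - w for o, w in zip(oral, written)]


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)


# Hypothetical ratings for three participants (not study data)
print(deception_index([3, 2, 4], [1, 2, 5]))  # → [2, 0, -1]

# Reported gesture effect in Study 1: F(2, 75) = 5.16
print(round(partial_eta_squared(5.16, 2, 75), 2))  # → 0.12
```

The same conversion applied to the covariate-free ANOVA in footnote 3, F(2,76) = 4.26, gives 0.10, matching the reported value.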

The results also revealed a nonsignificant trend for the likeability of the author, F(1,75) = 2.87, p = 0.094, η² = 0.04. Further analysis showed a trend indicating that the more the alleged author was liked, the higher the deception index, r<sub>s</sub>(79) = 0.16, p = 0.077 (one-tailed).<sup>5</sup> This suggests that people may be more likely to give positive, even if untrue, feedback to those whom they like more. Liking of the alleged author did not differ between the experimental conditions (p = 0.183).

The results of Study 1 suggested that the hand-over-heart gesture is related to refraining from white lies compared with the control gesture. We wanted to replicate these findings and conducted a study of similar design to Study 1. In Study 2 we focused solely on the hand-over-heart gesture and improved both the procedure and the study material. Among other changes, the new design eliminated the step in which Experimenter 1 covertly passed information to Experimenter 2. The following studies were pre-registered at osf.io.<sup>6</sup>

<sup>2</sup>For the sake of clarity and brevity in the main text, we present the findings using a deception index. However, a mixed-model repeated-measures ANOVA, with evaluation type (oral vs. written) as a within-subjects factor and type of gesture as a between-subjects factor, yields practically the same results. The difference between oral (M = 2.38, SD = 1.15) and written (M = 1.72, SD = 0.90) evaluations is significant, F(1,76) = 39.34, p < 0.001, indicating that the experimental setting did promote lying.

<sup>3</sup>An ANOVA without the covariate yields similar results. Namely, the experimental manipulation shows a main effect on the deception index, F(2,76) = 4.26, p = 0.015, η² = 0.10. Comparison of means with Bonferroni correction indicates a significant difference only between the control and the hand-over-heart condition (p = 0.012).

<sup>4</sup>We ran a supplemental study afterward to test the recognisability of the gestures used in the main study. The rationale for this identification test is that the level of gesture recognition could serve as a proxy for the strength of the association between the bodily movement people perform while making the gesture and the simultaneous activation of the concept it represents. Eighty-four pedestrians in Cracow, Poland (48 women, Mage = 23.47; SDage = 4.17) were asked one dichotomous question about the gesture's familiarity ("Are you familiar with the gesture?"; answers were either "yes" or "no" while the experimenter presented the gesture) and one open-ended question about the context in which the gesture is used ("Do you know in which situations this gesture would be used?"). We also asked participants to rate how frequently they use the target gesture ("How often do you use this gesture yourself?") on a scale from 1 (never) to 7 (always). Half of the participants were presented with the hand-over-heart gesture and half with the fingers crossed behind one's back gesture. A χ² test showed a significant difference in familiarity judgments between the gestures, χ²(1) = 6.89, p = 0.009, V = 0.29. When asked about the hand-over-heart gesture, more participants declared knowing it (42.9%; 18 people) than when asked about the fingers crossed gesture (16.7%; 7 people). Next, we coded the open-ended answers on the context in which the gesture is used. Responses were divided into two groups: '1', 'recognized' (i.e., "when lying" for the fingers crossed gesture and "when telling the truth" or "to be perceived more truthful" for the hand-over-heart gesture), and '2', 'not recognized' (any other answers, such as "don't know," "for a laugh," or "doesn't matter").
A χ² test showed that more people correctly recognized the hand-over-heart gesture (76.2%; 32 people) than the fingers crossed gesture (52.4%; 22 people), χ²(1) = 5.19, p = 0.023, V = 0.25. Ratings of frequency of gesture usage were compared with an independent-samples t-test, which showed a significant difference in the declared frequency of using the hand-over-heart and fingers crossed gestures, F(1,82) = 12.99 (equivalent to t(82) = 3.60), p = 0.001, η² = 0.14. Participants declared using the hand-over-heart gesture more often than the fingers crossed gesture (M = 3.57, SD = 1.23 and M = 2.57, SD = 1.31, respectively).

This suggests that the hand-over-heart gesture is better known than the fingers crossed behind one's back gesture. People also seem to be more aware of the context in which it is used. Finally, people declare that they are more likely to use the hand-over-heart gesture than the gesture of fingers crossed behind one's back.
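The two χ² tests above can be recomputed from the reported counts. A minimal sketch, assuming the 84 pedestrians were split evenly (42 per gesture, as "half" implies), so the "no" cells are inferred as 42 minus the reported counts; the helper name is hypothetical. Under these assumptions the reported statistics appear to correspond to the uncorrected Pearson χ² (no Yates correction), with Cramér's V = √(χ²/n) and, for df = 1, p = erfc(√(χ²/2)):

```python
import math


def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-square, Cramer's V, and p for a 2x2 table.

    Hypothetical helper; rows are gesture groups, columns are response types.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    v = math.sqrt(chi2 / n)  # min(rows, cols) - 1 = 1 for a 2x2 table
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df = 1
    return chi2, v, p


# Rows: hand-over-heart, fingers crossed (assumed 42 pedestrians each)
familiarity = [[18, 24], [7, 35]]   # [know it, do not know it]
recognition = [[32, 10], [22, 20]]  # [recognized, not recognized]
for name, table in [("familiarity", familiarity), ("recognition", recognition)]:
    chi2, v, p = pearson_chi2_2x2(table)
    print(f"{name}: chi2(1) = {chi2:.2f}, p = {p:.3f}, V = {v:.2f}")
# prints chi2(1) = 6.89, p = 0.009, V = 0.29 and chi2(1) = 5.19, p = 0.023, V = 0.25
```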

<sup>5</sup>There was also a positive correlation between the liking of the alleged author and the extent to which the oral evaluation of the author's photo was positive, r(79) = 0.24, p = 0.016.

honest evaluations of the same, least liked photo, as communicated to the alleged author of the photo. Error bars present standard errors.

## STUDY 2

We preceded Study 2 with an additional pilot study to choose the appropriate stimuli.<sup>7</sup> The aim of Study 2 was to replicate the main finding from Study 1.

## Participants

Eighty-eight participants (65 women, Mage = 29.27, SDage = 10.21) took part in this study in exchange for course credits.

## Materials and Procedure

We invited participants to join a two-part study. The first part was said to investigate the way people talk about photographs; the second was said to involve further evaluation of research material. Upon arrival at the lab, participants were asked by Experimenter A (blind to the hypothesis of the study) to take part in a supposedly unrelated third study (with a small additional credit given to all participants willing to join), in which a new breathing measurement device needed to be calibrated for an upcoming student project in sport psychology. The device supposedly involved a rubber chest band that had to be tightened either with the use of the shoulder (resulting in a hand over shoulder gesture) or with the right hand (resulting in a hand-over-heart gesture). This cover story allowed participants to perform the gestures without being aware of the actual aim of the study. Participants were then simply asked to have the device placed on their body for 20 s, after which Experimenter A started recording their breathing rhythm for the remaining time spent in the lab (in order to calibrate the device's measurements). Experimenter A then left the participant with Experimenter B.

Experimenter B (also blind to the hypothesis of the study) then explained that she was a fellow student who took a photography class in her free time and that, for her Master's thesis, she was conducting qualitative research on how people talk about photographs. She explained that for the purpose of the study she would ask participants to rate pictures taken by her and by other students who pursue photography as a hobby in a part-time class.

<sup>7</sup>We also conducted a pilot study to choose the best research material for Study 2. Two hundred and seventy-four students (Mage = 24.69, SDage = 7.49) took part in an online study on the evaluation of research material. The sample consisted of 213 women and 59 men. We prepared landscape photographs intended to be of medium to poor quality. After giving informed consent, participants were shown thirty photos in random order and answered four questions about each: the extent to which they liked the photo, considered it of good quality, perceived it as professionally taken, and thought that it had been taken by a professional. All questions were answered on a Likert-type scale from 1 (definitely not) to 7 (definitely yes).

We analyzed the mean responses to the four questions for each of the thirty photos. We looked for the two most disliked pictures that did not differ on the three additional questions. We then picked two photos that were liked at least 1 SD more than the two most disliked ones. We also wanted the more liked photos to be evaluated more favorably than the disliked photos on the other qualities, while being evaluated similarly to each other. This allowed us to pick two photos that were equally negatively evaluated and two photos that were evaluated positively. **Table 1** presents descriptive statistics of the four photos.

We then analyzed mean evaluations of the four chosen photos on the four characteristics using a series of dependent t-tests. The photos chosen as disliked did not differ from each other in liking, quality, perceived professionalism, or perceptions of whether they were taken by a professional. Likewise, the more liked photos did not differ significantly from each other on the four characteristics. We calculated two evaluation indexes: one combining the evaluations of the four characteristics for the two disliked photos (Cronbach's α = 0.85) and one for the two more liked photos (Cronbach's α = 0.86). A final dependent t-test showed that the disliked photos were overall evaluated more negatively (M = 3.03, SD = 0.97) than the more liked photos (M = 4.37, SD = 1.06), t(273) = 20.10, p < 0.001, d = 1.22. These four photos were used in Study 2.

TABLE 1 | Descriptive statistics of the evaluations of the two most disliked photos and the two photos liked at least 1 SD more.

The photos were then displayed on a computer screen one by one. Participants always first saw the two mildly unattractive photos (in random order) and then the two most unattractive photos (in random order). The pictures appeared with their authors and numbers below them, to make them easy to refer to. Experimenter B asked separately about each photo (stating its number and author) whether the participant liked it on a 1–7 scale, where 1 stood for definitely not and 7 stood for definitely yes. She then asked what the participant liked and disliked about the photo. When the third photo came up, Experimenter B informed the participant that she was in fact its author.

Participants were then asked to stop the breathing measurement (and release the gesture) and to answer, on the computer, open-ended questions about how they had felt when talking about art, whether they had used the type of words they typically use when discussing the photos, whether they had felt comfortable talking about the photos, and whether anything unusual had happened during the procedure. These questions were used to probe for hypothesis guessing and to conceal the real aim of the study. Participants then provided demographic data.

One week after the main part of the study, participants were asked to complete the online second part of the study. They were presented with 10 photos: the four pictures from the main part of the study and six others from the pilot study. Of these six, three had been evaluated in the pilot study as the most liked, two as of average liking, and one as of low liking. Participants rated on a scale from 1 (definitely not) to 7 (definitely yes) whether they liked each photo, considered it of good quality, perceived it as professionally taken, thought that it had been taken by a professional, and whether they had ever seen the photo before. We then gathered demographic data and asked about the perceived aim and hypothesis of the study. After the study, participants were fully debriefed.

## Results

We conducted a mixed-design repeated-measures ANOVA with liking of the photo as the dependent variable, type of evaluation (face-to-face vs. online afterward) and authorship of the photo (unknown vs. the interlocutor) as within-subjects factors, and type of gesture (hand-over-heart vs. hand over arm) as a between-subjects factor.

We found a main effect of the type of evaluation, F(1,73) = 44.69, p < 0.001, η² = 0.38. Participants gave more favorable evaluations in person (M = 4.11, SD = 1.41) than in the online setting (M = 3.35, SD = 1.54). There was also a main effect of authorship, F(1,73) = 5.88, p = 0.018, η² = 0.08. When participants were informed that they were talking to the 'author' of a photo, they declared higher liking of that photo (M = 3.89, SD = 1.49) than when they were not talking to its alleged author (M = 3.57, SD = 1.51). There was no significant main effect of gesture, F(1,73) = 0.02, p = 0.900, and no significant interaction between type of gesture and authorship, F(1,73) = 0.08, p = 0.781.

The only significant interaction was between gesture and type of evaluation, F(1,73) = 10.15, p = 0.002, η² = 0.12. Comparison of means with Bonferroni correction revealed that in the hand-over-heart condition participants gave more positive evaluations in person (M = 4.30, SD = 1.31) than online (M = 3.20, SD = 1.44), p < 0.001. Similarly, in the hand over arm condition, evaluations in person (M = 3.91, SD = 1.50) were higher than online ratings of liking (M = 3.51, SD = 1.64), p = 0.016.

We conducted an additional analysis similar to that used in Study 1. We first calculated a deception index, defined as the difference between the face-to-face evaluation and the rating made online. We then conducted an independent-samples t-test with experimental condition as the independent variable and the deception index as the dependent variable. Results showed a significant effect of the experimental manipulation, t(73) = 2.64, p = 0.010, d = 0.61. Surprisingly, the deception index was higher in the hand-over-heart condition (M = 1.32, SD = 1.38) than in the hand over arm condition (M = 0.49, SD = 1.35). These results are graphically displayed in **Figure 2**.
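The reported effect size can be cross-checked in two ways: from the t statistic and the group sizes, and from the group means and SDs with a pooled SD. A minimal sketch, assuming the group sizes given in the Figure 2 caption (hand over arm, N = 37; hand-over-heart, N = 38); the function names are hypothetical:

```python
import math


def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic and group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def d_from_means(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d as the mean difference divided by the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Study 2 deception index: hand-over-heart (n = 38) vs. hand over arm (n = 37)
print(round(d_from_t(2.64, 38, 37), 2))  # → 0.61
print(round(d_from_means(1.32, 1.38, 38, 0.49, 1.35, 37), 2))  # → 0.61
```

Both routes agree with the reported d = 0.61, which suggests that d was computed with the pooled standard deviation.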

## GENERAL DISCUSSION

We hypothesized that participants who used the hand-over-heart gesture would be more likely to give true feedback to "authors" of works of art. The results of Study 1 showed that participants were more honest (less flattering) while performing the hand-over-heart gesture. However, the results of Study 2 showed no difference between the hand-over-heart gesture and the control gesture in the proneness to use other-oriented white lies regarding a photo that was either authored by the interlocutor or of unknown authorship. What is more, when we conducted an analysis similar to that in Study 1, we found the deception index to be significantly higher in the hand-over-heart condition than in the control group. These inconsistent results raise several important methodological and theoretical issues.

N = 37; hand-over-heart, N = 38). The results show the deception index – the difference between the oral, face-to-face, more flattering evaluation and the written and honest evaluations of the same photo. Error bars present standard errors.

## Methodological Concerns

We designed and conducted Study 1 a few years ago, to the best of our knowledge and with the best intentions. We do recognize, however, that the study had imperfections and that we had not pre-registered our experimental protocol. The design was unnecessarily complex, and we did not calculate the required sample size before conducting the study, relying on a rule of thumb instead. We also could not exclude the possibility that posing gestures in the first stage of the experiment influenced evaluations and preferences regarding the photos. A post hoc analysis of the preferences did not show any condition-dependent patterns, which makes this possibility less likely without excluding it. Moreover, it would have been better if Experimenter 2 had not known each participant's preferences while talking to them.

We thus designed and conducted Study 2. We wanted to focus solely on the hypothesis that the hand-over-heart gesture affects other-oriented white lies, and to replicate the previous findings using a better design. Of course, Study 2 is not free from limitations. Most importantly, we failed to recruit the desired number of participants: we aimed for 114 participants but, within the time available, could recruit only 88, of whom 75 took part in the follow-up assessment of the photos. This is clearly a drawback of the study. However, we also conducted an analysis similar to that in Study 1, relying on the deception index. A post hoc power analysis showed that the achieved power was 0.83. This implies that the sample we gathered should have been large enough to detect the effect, had we relied solely on the deception index.

Furthermore, there were other differences between Studies 1 and 2 that might have affected the results, although, based on the taxonomy developed by Hendrick (1991), the methodological similarities between the two studies would still qualify Study 2 as a fair replication. We used a different cover story to introduce the need to pose the hand-over-heart gesture. We do think, however, that if the absence of a gesture effect resulted from the different cover story, this only indicates that the effect, should it exist, is much weaker than we originally thought. Importantly, in Study 2 we collected participants' private evaluations of the photos 1 week after they had stated aloud, in front of the alleged author, how much they liked the photo. It is possible that participants who have just judged a photo as unattractive (as they did in Study 1) are more aware of the conflict between social norms and the norm of honesty, so that such a setting might have a stronger influence. Yet both experimental settings did promote overly positive feedback to the alleged author, which suggests that the timing of the private evaluations should not substantially affect the hypothesized effect.

## Theoretical Implications

The hypotheses presented in this paper were derived from psychological theory and the research findings supporting it. We think it most appropriate to draw cautious conclusions regarding theoretical implications. It is important to note that we conducted only one replication of the effect; a series of multilab studies would provide more solid grounds for assessing its robustness. Above all, we think the results of the studies presented in this article indicate that, should the hand-over-heart gesture indeed influence the tendency to refrain from other-oriented white lies, the effect is probably much weaker. Changes in the setting between Studies 1 and 2 might have altered the results, yet, as stated previously, this would only imply that the original effect is weak.

We need to point to an important theoretical issue regarding the subject of the study. We knew that the hand-over-heart gesture is related to refraining from self-oriented dishonesty (e.g., Parzuchowski et al., 2014). We therefore wanted to verify whether the effect generalizes to other-oriented white lies. These lies are prosocially oriented: their aim is to protect others and make them feel good, or at least spare them from feeling bad. People often judge such lies to be even more ethical than truth telling (Levine and Schweitzer, 2014). Telling such a polite lie is also socially acceptable behavior. A blunt truth is less socially acceptable than a prosocial lie (Levine and Schweitzer, 2014), and research on social influence mechanisms indicates that being liked is a powerful tool in social interactions. It is possible that because other-oriented white lies are so socially acceptable, they are perceived as less of a lie and therefore do not trigger as strong a dilemma as self-oriented lying does. Namely, participants might not feel that giving overly positive feedback is a problem, as it hardly counts as lying when it serves another person. We thus limit the generalization of our findings to other-oriented white lies and note that the type of lie under study might be a significant factor that influenced the results.

We did find an unexpected result: a higher deception index in the hand-over-heart condition than in the control group. We do not find any theoretical support for this result. If anything, such a result suggests that the effect of the hand-over-heart gesture on other-oriented white lies either does not exist or is extremely weak and sensitive to subtle contextual changes.

## GENERAL CONCLUSION


In recent years, several social priming and embodiment effects have come under scrutiny, e.g., cleanliness priming (Johnson et al., 2014), elderly priming (Doyen et al., 2012), and power posing (Garrison et al., 2016); online databases tracking their recent replications include PsychFileDrawer.org and CurateScience.org. Overall, we do not concur with the position that the effects of embodied cognition are in general doubtful. We are convinced that there is a significant body of compelling and replicable evidence (e.g., the switching cost paradigm; Pecher et al., 2003) for the involvement of the sensorimotor system in cognitive processes (see e.g., Wilson, 2002; Pecher and Zwaan, 2005; Fischer and Zwaan, 2008; Glenberg, 2010). It has also been argued that replications of existing effects sometimes produce non-significant results (Cohen, 1969) and that (mis)replications are sensitive to contextual factors (see Van Bavel et al., 2016). That said, we are doubtful of the effect tested in this article: the tendency to use other-oriented lies is possibly not affected by honesty activation. To conclude, at least when effects are small, high methodological standards (e.g., high power, blindness to hypotheses, and probing for hypothesis guessing) are vital for distinguishing between pursuing an interesting hypothesis and chasing noise. Though we cannot be entirely sure, it is possible that in our case it was the latter.

## ETHICS STATEMENT

The study was approved by the Ethics Committee (decision number 1/03/2012) at the SWPS University of Social Sciences and Humanities, II Department of Psychology in Wroclaw. In all of the studies reported here, participants were informed that their participation was entirely voluntary and that they could withdraw from the study at any stage. They were also informed that the data gathered in the study were confidential and would be used for scientific purposes only.

## AUTHOR CONTRIBUTIONS

Study 1: KC designed the work, KD acquired the data, and KC did the analysis. KC, KD, and MP interpreted the data for the work, drafted the work and revised it critically for important intellectual content, and approved the version to be published. KC, KD, and MP agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Study 2: The pilot study was prepared by KC, KD, and MP, run by KC, and its main analyses were conducted by KC. The main study was prepared by MP, KD, and KC, run by MP and his research assistants, and its main analysis was conducted by KC. The write-up of the results was conducted by KC. Each contributor participated in preparing the final form of the research report and the final paper describing the results.

## ACKNOWLEDGMENTS

We thank the reviewers, whose valuable critique made a significant contribution to the present version of the article. We thank Dariusz Dolinski and Olga Bialobrzeska for their comments on the paper. We also thank Roksana Galewska, Agnieszka Małucka, Viktoria Oriekhova, and Marta Szermelek for their help in conducting Study 2.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00814/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Cantarero, Parzuchowski and Dukala. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Commentary: Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action

#### Thomas J. Faulkenberry <sup>1</sup> \* and Luca Tummolini <sup>2</sup>

*<sup>1</sup> Department of Psychological Sciences, Tarleton State University, Stephenville, TX, USA, <sup>2</sup> Institute of Cognitive Sciences and Technologies, Italian National Research Council, Rome, Italy*

Keywords: Bayes factors, null effects, significance testing, p-values, statistical power

#### **A commentary on**

#### **Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action**

Edited by:

*Zheng Jin, Zhengzhou Normal University, China*

#### Reviewed by:

*Victoria Simms, Ulster University, UK*

*Daniel Ansari, University of Western Ontario, Canada*

> \*Correspondence: *Thomas J. Faulkenberry faulkenberry@tarleton.edu*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *07 October 2016* Accepted: *22 November 2016* Published: *08 December 2016*

#### Citation:

*Faulkenberry TJ and Tummolini L (2016) Commentary: Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action. Front. Psychol. 7:1915. doi: 10.3389/fpsyg.2016.01915*

by Sevos, J., Grosselin, A., Brouillet, D., Pellet, J., and Massoubre, C. (2016). Front. Psychol. 7:1551. doi: 10.3389/fpsyg.2016.01551

In their recent article, Sevos et al. (2016) present data indicating that subjects with schizophrenia "have an impaired ability to experience an internal simulation of motor action potentialities when they perceived graspable objects" (p. 12). This lack of sensorimotor facilitation in patients with schizophrenia aligns broadly with other patient studies, and indicates that such individuals would require extensive use of higher cognitive processes even for the simplest routine activities in their daily life. A solid conclusion from these data would certainly be informative for understanding the specific mechanisms behind schizophrenia.

The purpose of this commentary is to raise a point for further discussion. The claim that patients with schizophrenia lack this sensorimotor facilitation is based upon two non-significant effects reported in Experiments 1 and 2 of Sevos et al. (2016). This is problematic, though, since the traditional null-hypothesis testing approach does not allow one to "accept" a null hypothesis. This is because the p-value represents the probability of obtaining a sample statistic at least as extreme as the one obtained from a given sample, assuming that the null hypothesis is true. If this p-value is small, we reject the null hypothesis on the grounds that a statistic this extreme would occur with such low probability under the null that the null hypothesis should be considered implausible. If the p-value is not small, our only decision is to "fail to reject" the null hypothesis. It is important to note that this procedure only results in a decision to either reject or fail to reject; it does not provide any measure of evidence in favor of either the null or the alternative hypothesis.
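To make the definition of a p-value concrete, the following sketch uses a standard normal test statistic as a stand-in. This is an assumption made for simplicity; the tests discussed here are F tests with their own reference distribution. The sketch computes a two-sided p-value analytically, then approximates it by simulating the statistic under the null hypothesis and counting how often the simulated value is at least as extreme as the observed one. The observed value of 1.96 is hypothetical.

```python
import random
from statistics import NormalDist

def two_sided_p(z_obs: float) -> float:
    """Analytic two-sided p-value: P(|Z| >= |z_obs|) when Z ~ N(0, 1) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def simulated_p(z_obs: float, n_sims: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate: share of statistics simulated under H0 that are
    at least as extreme as the observed one."""
    rng = random.Random(seed)
    hits = sum(abs(rng.gauss(0.0, 1.0)) >= abs(z_obs) for _ in range(n_sims))
    return hits / n_sims

z_obs = 1.96  # hypothetical observed statistic, for illustration only
print(f"analytic p = {two_sided_p(z_obs):.4f}, simulated p = {simulated_p(z_obs):.4f}")
# Neither number measures evidence *for* H0: a large p-value only says that the
# observed statistic would not be unusual if H0 were true.
```

Both estimates are close to 0.05. Note that nothing in the computation refers to the alternative hypothesis, which is why a non-significant result cannot, on its own, quantify support for the null.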

One common approach to help mitigate this problem is to report power. Mathematically, a test with sufficient power is less likely to produce a Type II error, and this allows one to feel somewhat assured that reported null effects are not simply false negatives. Though better than nothing, this approach still does not give any direct measure of evidence supporting an obtained null effect. However, recent methods based on Bayesian inference (Wagenmakers, 2007; Rouder et al., 2009) provide a relatively easy solution to this problem.

Though the specifics of Bayesian inference are beyond the scope of this short commentary (see Wagenmakers, 2007 for more details), the basic idea is that one computes a Bayes factor to index the preference for one model over another. The larger the Bayes factor, the more evidence in support of the model in its numerator. One particular advantage to this approach is that it allows a researcher to directly assess evidence in support of the null hypothesis H<sub>0</sub> over another hypothesis H<sub>1</sub>; such a Bayes factor would be denoted BF<sub>01</sub>. When both hypotheses are considered equally likely before the data are observed, this Bayes factor equals the posterior odds in favor of the null hypothesis over the alternative hypothesis. Further, BF<sub>01</sub> can be converted into a posterior probability, which is the probability that the null hypothesis H<sub>0</sub> is true given data D.
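The link between a Bayes factor and posterior belief can be sketched in a few lines. The sketch assumes Bayes' rule in odds form: BF<sub>01</sub> multiplies the prior odds p(H<sub>0</sub>)/p(H<sub>1</sub>) to give the posterior odds, which are then converted to a probability. The function name and the Bayes factor of 3 are hypothetical and chosen only for illustration.

```python
def posterior_prob_h0(bf01: float, prior_odds_01: float = 1.0) -> float:
    """Posterior probability of H0 from BF01 and the prior odds p(H0)/p(H1).

    Bayes' rule in odds form: posterior odds = BF01 * prior odds.
    """
    if bf01 <= 0 or prior_odds_01 <= 0:
        raise ValueError("Bayes factor and prior odds must be positive")
    posterior_odds = bf01 * prior_odds_01
    return posterior_odds / (posterior_odds + 1)

bf01 = 3.0  # hypothetical: the data are 3 times more likely under H0 than under H1
for prior_odds in (0.25, 1.0, 4.0):
    print(f"prior odds {prior_odds}: p(H0|D) = {posterior_prob_h0(bf01, prior_odds):.3f}")
```

With equal prior odds (1.0), the result reduces to BF<sub>01</sub>/(BF<sub>01</sub> + 1), which is 0.75 in this example. Skeptical or favorable prior odds shift the posterior probability down or up for the same data.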

To this end, we will describe how to compute BF<sub>01</sub> and the posterior probabilities for the null effects reported in Experiments 1 and 2 of Sevos et al. (2016).

The first step in the computation is estimating the Bayes factor BF<sub>01</sub>. Following Wagenmakers (2007), Masson (2011) describes one approach to estimating BF<sub>01</sub> that is based on the Bayesian Information Criterion (BIC). BF<sub>01</sub> is estimated as

$$BF_{01} \approx e^{\Delta BIC/2} \tag{1}$$

where

$$\Delta BIC = n \ln(1 - \eta_p^2) + (k_1 - k_0) \ln(n). \tag{2}$$

In Equation (2), n represents the number of subjects, η<sup>2</sup><sub>p</sub> (partial eta squared) is the standard effect size measure in an ANOVA, which represents the proportion of variance accounted for by the independent variable, and k<sub>1</sub> − k<sub>0</sub> represents the difference in the number of free parameters between the two models being compared. Note that in the case of a comparison between a null and an alternative hypothesis for a single two-level factor (i.e., prime, present vs. absent), k<sub>1</sub> − k<sub>0</sub> = 1. Finally, if we assume that the null and alternative hypotheses are equally likely before collecting data (that is, equal priors), the Bayes factor BF<sub>01</sub> can be converted into a posterior probability estimate via the equation:

$$p(H_0|D) = \frac{BF_{01}}{BF_{01} + 1}. \tag{3}$$
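Equations (1) to (3) translate directly into code. The Python sketch below is not Masson's (2011) own software, and its function names are hypothetical. It takes n, partial eta squared, and the difference in free parameters, and returns ΔBIC, the approximate BF<sub>01</sub>, and p(H<sub>0</sub>|D) under equal priors. The usage line plugs in the Experiment 1 values from Sevos et al. (2016).

```python
import math

def delta_bic(n: int, eta_p2: float, k_diff: int = 1) -> float:
    """Equation (2): ΔBIC = n ln(1 - eta_p^2) + (k1 - k0) ln(n)."""
    if not 0 <= eta_p2 < 1:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return n * math.log(1 - eta_p2) + k_diff * math.log(n)

def bf01_from_bic(d_bic: float) -> float:
    """Equation (1): BF01 is approximately exp(ΔBIC / 2)."""
    return math.exp(d_bic / 2)

def posterior_h0(bf01: float) -> float:
    """Equation (3): p(H0 | D) = BF01 / (BF01 + 1), assuming equal priors."""
    return bf01 / (bf01 + 1)

def bic_null_evidence(n: int, eta_p2: float, k_diff: int = 1):
    """Chain Equations (2), (1), and (3); returns (ΔBIC, BF01, p(H0|D))."""
    d_bic = delta_bic(n, eta_p2, k_diff)
    bf01 = bf01_from_bic(d_bic)
    return d_bic, bf01, posterior_h0(bf01)

# Experiment 1 of Sevos et al. (2016): n = 18, eta_p^2 = 0.13, one extra parameter.
d_bic, bf01, p_h0 = bic_null_evidence(18, 0.13)
print(f"ΔBIC = {d_bic:.3f}, BF01 = {bf01:.3f}, p(H0|D) = {p_h0:.2f}")
```

This reproduces the hand calculation that follows: ΔBIC = 0.384, BF<sub>01</sub> = 1.211, and p(H<sub>0</sub>|D) = 0.55. Keeping each equation in its own function makes it easy to swap in a different prior or a different k<sub>1</sub> − k<sub>0</sub>.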

Now, let us compute Bayes factors for the reported null effects in Experiments 1 and 2 of Sevos et al. (2016). In Experiment 1, the authors reported that for the n = 18 patients with schizophrenia, the critical interaction of response and orientation did not differ as a function of name prime (present vs. absent), F(1, 17) = 2.584, p = 0.126, η<sup>2</sup><sub>p</sub> = 0.13. Equation (2) yields

$$\begin{aligned} \Delta BIC &= n \ln(1 - \eta_p^2) + (k_1 - k_0) \ln(n) \\ &= 18 \ln(1 - 0.13) + (1) \ln(18) \\ &= 0.384. \end{aligned}$$

Substituting this into Equation (1) then gives us the estimate

$$\begin{aligned} BF_{01} &\approx e^{\Delta BIC/2} \\ &= e^{0.384/2} \\ &= 1.211. \end{aligned}$$

This means that, given the data, a null interaction is only 1.21 times more likely than a true interaction between response and orientation. According to Jeffreys (1961), Bayes factors falling between 1 and 3 are considered "anecdotal" evidence, whereas a Bayes factor between 3 and 10 represents "moderate" evidence, and a Bayes factor greater than 10 is considered "strong" evidence. As such, the evidence from Sevos et al. (2016) is anecdotal.
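The evidence categories quoted above can be encoded as a small lookup. This sketch uses only the three cut-offs given in the text. It inverts values below 1, which favor H<sub>1</sub> (BF<sub>10</sub> = 1/BF<sub>01</sub>), before labeling them, and it treats values of exactly 3 and exactly 10 as "moderate". Both of these handling choices are assumptions, not part of the original scheme. The function name is hypothetical.

```python
def evidence_label(bf01: float) -> str:
    """Verbal label for a Bayes factor, using only the cut-offs quoted in the text.

    Values below 1 favor H1, so they are inverted (BF10 = 1 / BF01) before labeling.
    Exactly 3 and exactly 10 are treated as 'moderate' (an assumption).
    """
    if bf01 <= 0:
        raise ValueError("Bayes factor must be positive")
    if bf01 == 1:
        return "no evidence either way"
    favored = "H0" if bf01 > 1 else "H1"
    strength = bf01 if bf01 > 1 else 1 / bf01
    if strength < 3:
        label = "anecdotal"
    elif strength <= 10:
        label = "moderate"
    else:
        label = "strong"
    return f"{label} evidence for {favored}"

print(evidence_label(1.211))  # Experiment 1 estimate
```

Applied to the Experiment 1 estimate, this yields "anecdotal evidence for H0", matching the verbal conclusion in the text.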

Additionally, we can use Equation (3) to compute the posterior probability of the null hypothesis:

$$\begin{aligned} p(H_0|D) &= \frac{BF_{01}}{BF_{01} + 1} \\ &= \frac{1.211}{1.211 + 1} \\ &= 0.55. \end{aligned}$$

According to Masson (2011), probability values falling between 0.50 and 0.75 are taken as weak evidence in favor of the null hypothesis. As the Bayes factor and posterior probability are directly related via Equation (3), they both tell the same story; that is, the null effect of name prime reported in Sevos et al. (2016) is not well supported.
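Because Equation (3) is a monotone, invertible mapping, the posterior-probability scale and the Bayes factor scale can be translated into each other. The sketch below uses hypothetical function names and, like Equation (3), assumes equal prior odds. It shows that Masson's "weak" band of posterior probabilities (0.50 to 0.75) corresponds to BF<sub>01</sub> between 1 and 3, the same range labeled "anecdotal" in the Jeffreys scheme.

```python
def posterior_from_bf(bf01: float) -> float:
    """Equation (3): p(H0|D) = BF01 / (BF01 + 1), assuming equal prior odds."""
    return bf01 / (bf01 + 1)

def bf_from_posterior(p_h0: float) -> float:
    """Inverse of Equation (3): BF01 = p / (1 - p), assuming equal prior odds."""
    if not 0 < p_h0 < 1:
        raise ValueError("posterior probability must lie strictly between 0 and 1")
    return p_h0 / (1 - p_h0)

# Edges of Masson's "weak" band (0.50 to 0.75) expressed as Bayes factors.
for p in (0.50, 0.75):
    print(f"p(H0|D) = {p:.2f}  ->  BF01 = {bf_from_posterior(p):.1f}")
```

The two edges map to BF<sub>01</sub> = 1.0 and 3.0, which is why the Bayes factor and the posterior probability lead to the same verdict here.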

A similar computation can be carried out for the effect of action prime in Experiment 2. Sevos et al. (2016) report that for a group of n = 18 patients with schizophrenia, the interaction between response and orientation did not differ as a function of action prime (congruent vs. incongruent), F(1, 17) = 1.288, p = 0.272, η<sup>2</sup><sub>p</sub> = 0.07. Applying Equation (2) gives ΔBIC = 1.584, which implies (via Equation 1) that BF<sub>01</sub> ≈ 2.208; that is, a null interaction is only 2.21 times more likely than a true interaction. Equation (3) yields a posterior probability of p(H<sub>0</sub>|D) = 0.69. As with Experiment 1, evidence for this null effect is weak.
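The Experiment 2 figures can be checked the same way. As an extra step not taken in the text, the sketch first recovers partial eta squared from the reported F ratio using the standard identity η<sup>2</sup><sub>p</sub> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). This is a useful check when a paper reports F but not η<sup>2</sup><sub>p</sub>. The sketch then applies Equations (1) to (3) to the rounded value of 0.07, as the text does. Function names are hypothetical.

```python
import math

def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

def bic_null_evidence(n: int, eta_p2: float, k_diff: int = 1):
    """Equations (2), (1), (3): returns (ΔBIC, approximate BF01, p(H0|D), equal priors)."""
    d_bic = n * math.log(1 - eta_p2) + k_diff * math.log(n)
    bf01 = math.exp(d_bic / 2)
    return d_bic, bf01, bf01 / (bf01 + 1)

# Experiment 2 as reported: F(1, 17) = 1.288, eta_p^2 = 0.07, n = 18.
eta_check = partial_eta_squared(1.288, 1, 17)
print(f"eta_p^2 from F: {eta_check:.2f}")      # consistent with the reported 0.07

d_bic, bf01, p_h0 = bic_null_evidence(18, 0.07)  # rounded value, as in the text
print(f"ΔBIC = {d_bic:.3f}, BF01 = {bf01:.3f}, p(H0|D) = {p_h0:.2f}")
```

The output matches the reported ΔBIC = 1.584, BF<sub>01</sub> ≈ 2.208, and p(H<sub>0</sub>|D) = 0.69. Using the unrounded η<sup>2</sup><sub>p</sub> from F instead would shift BF<sub>01</sub> slightly, so it is worth stating which value is used.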

It is worth noting that this method is not the only approach to computing Bayes factors to assess null effects. The software package JASP (available as a free download from www.jasp-stats.org) contains a Summary Stats module that allows the user to compute Bayes factors from test statistics for a variety of common designs, including t-tests and linear regression. At present, the Summary Stats module does not have an option for ANOVA designs, in which case the method of Masson (2011) that we have presented provides a good solution. One should also note that Bayes factors and posterior probabilities can be computed directly from raw data using JASP or the BayesFactor package (Morey and Rouder, 2015) in R (R Core Team, 2016).

In summary, a Bayesian analysis of these two results indicates that at present, there is not much support for the null effects reported in Experiments 1 and 2. As such, any interpretations of these null effects should be met with caution. It is important to note that the points raised in this commentary are not meant to be unfairly critical of the results obtained by Sevos et al. (2016). On the contrary, the experiments are well-designed and informative, both in the context of embodied cognition as well as in the context of psychopathology. The purpose of this commentary was (1) to point out the issues that are present when trying to interpret nonsignificant results in the traditional null hypothesis statistical testing framework, and (2) to offer a quick example of how to use a Bayesian approach to quantify evidence for object-affordance effects and other action-specific influences on perception in the study of embodied cognition.

## AUTHOR CONTRIBUTIONS

TF wrote the draft of the manuscript and performed the calculations, and LT provided revisions and clarifications.

## REFERENCES

Sevos, J., Grosselin, A., Brouillet, D., Pellet, J., and Massoubre, C. (2016). Is there any Influence of Variations in Context on Object-Affordance Effects in Schizophrenia? Perception of Property and Goals of Action. Front. Psychol. 7:1551. doi: 10.3389/fpsyg.2016.01551

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychon. Bull. Rev. 14, 779–804. doi: 10.3758/BF03194105

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Faulkenberry and Tummolini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Adaptive Smart Technology Use: The Need for Meta-Self-Regulation

Theresa Schilhab\*

Future Technologies, Culture and Learning, Danish School of Education, University of Aarhus, Copenhagen, Denmark

Keywords: smart technology, reading, novelty effects, self-regulation, automatization, attention regulation, smartphones, tablets

## INTRODUCTION

Today, smart technology in the form of tablets and smartphones is a cherished tool for most people. Instant online access that allows for extensive interaction on social media, texting, playing video games and music, and checking news and weather has turned smart technology into "an integral part of the lives of all ages worldwide" (Samaha and Hawi, 2016, p. 321). The multi-functional nature of smart technology makes it attractive as a tool for learning and education (Kucirkova, 2014; Schilhab, 2017a), leading, however, to noticeable changes in affordances and embodiment, and consequently in learning (Mangen and Schilhab, 2012)<sup>1</sup>.

For instance, reading scholars increasingly find that changing the physical reading platform (from a printed book to a digital screen) leads to marked alterations in comprehension of the text read. They point to factors related to the affordances of the reading device, such as haptics, i.e., perception through touch (Mangen and Kuiken, 2014), and lighting conditions (e.g., Benedetto et al., 2013), as aspects that change significantly and result in reduced learning outcomes. This observation is corroborated by studies of accompanying metacognitive processing, which show less accurate prediction of performance and more erratic study-time regulation when reading on screen versus on paper (Ackerman and Goldsmith, 2011).

Such effects on literary reading are not unanimously accepted. When describing human cognition in relation to smart technology use, some researchers emphasize our "native biological plasticity", which among other things entails "bodily reconfiguration" (Clark, 2007, p. 263) and major "re-embodiment" (Ihde, 2010). On this view, reduced comprehension when reading on screen is a novelty effect: subjects are proficient print readers but still lack screen expertise (Hayler, 2015). Over time, people will adjust to the affordances of the new devices, and the comprehension issues apparent today will evaporate as screen-reading abilities are refined and the technology co-evolves for this specific task.

Insofar as screen use is tool use, the plasticity and embodiment perspective prevalent in human-technology interaction studies seems pertinent. The question remains, however, whether prolonged exposure and the subsequent development of embodied skills is all it takes for humans to adapt to the affordances of smart technology. A relevant objection to the novelty claim could be that the comprehension issues associated with screen reading exemplify a need to go beyond automatic embodiment processes and to conceptualize the specific mental processes that account for our adaptation to the environment.

In the following article, I explain why automatic skill learning may not be an exhaustive answer to the affordances provided by smart technology. First, I discuss what characterizes smart technology tools from the perspective of attention, to better determine significant features of the interaction and what adaptive processes would entail. I conclude that section by proposing that the interaction calls for self-regulation. Second, I discuss a socially mediated mechanism that seems especially supportive in building the capacities necessary for environmental coping in general and smart technology use in particular.

#### Edited by:

Zheng Jin, Zhengzhou Normal University, China

#### Reviewed by:

Tom Ziemke, University of Skövde & Linköping University, Sweden

> \*Correspondence: Theresa Schilhab tsc@edu.au.dk

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 22 November 2016 Accepted: 16 February 2017 Published: 02 March 2017

#### Citation:

Schilhab T (2017) Adaptive Smart Technology Use: The Need for Meta-Self-Regulation. Front. Psychol. 8:298. doi: 10.3389/fpsyg.2017.00298

<sup>1</sup>According to Kiverstein and Rietveld (2015, p. 709), affordances are "the multiple possibilities for action that stand out as relevant for an individual in a particular situation because of their needs and concerns". In this paper I compare affordances that we address using 'skilled intentionality' (ibid.) with affordances that attract our conscious attention. See, however, Gibson (1979/1986) for the original reading of the term.

## SMART TECHNOLOGY AND ATTENTION

Part of the controversy over reading performance in literature studies stems from the inevitably complex relations we form with the environment (e.g., Jin et al., 2015). On the one hand, when musicians interact with their instruments (pianos, for instance) or high jumpers with their bars, they adapt to factors of the interaction that, all else being equal, show little variation (e.g., Jabusch et al., 2009). Crudely put, adapting to recurring features of a task consists, among other things, in strengthening the connections among neurons in the relevant neural correlate to increase signaling efficiency (Draganski et al., 2004; Jäncke, 2009). Skilled performance develops partly through the strengthening of automatic bottom-up processes elicited by the repetition of particular elements in the task (Maguire et al., 2006).

This may be exemplified by an fMRI study on how processes underlying imagery differ between novices and experts in a complex motor skill (the high jump), which showed considerable divergence with respect to the involvement of motor areas such as the supplementary motor area (SMA) and the primary motor cortex (Olsson et al., 2008). Subjects were asked to imagine performing a full jump, with special emphasis on certain stages, such as take-off or clearing the bar. Novices with no previous experience of the high jump showed more activation in areas suggesting that they took an external view of the task (watching the jump from outside, as if out of the body), possibly because their previous experiences with high jumps were primarily as spectators. Thus, activation of the SMA, which is suggested to be responsible for internally guided actions both when executed and when imagined, was lower in novices than in expert high jumpers. According to the authors, the use of an internal perspective during motor imagery of a complex skill depends on motor representations of the skill being well established before they can translate into a motor/internal pattern of brain activity (Olsson et al., 2008, p. 5).

On the other hand, we have interactions that are characterized by intrinsic variability. In the course of evolution, our coping with such seemingly unstable factors has been optimized by the development of fine-tuned attentional resources that help us focus on particular aspects of the environment (e.g., Kaplan and Berman, 2010). Why is this important when considering our possibilities for adapting to smart technology? The reason is that part of our interaction with smartphones and tablets, when reading for instance, is defined especially by erratic factors that elicit our vigilance (e.g., Chun et al., 2011). In fact, the multi-functional affordance of the tool may itself drag attentional resources, diminishing the attention we normally allocate to reading in order to comprehend the text (Wolf and Barzillai, 2009; for a recent discussion of stable and variable affordances relevant to the present discussion, see Sakreida et al., 2016). The appeal of diverse activities such as checking emails, surfing the internet, or tapping into social media while reading persists even when notifications and various alerts are deliberately turned off. The mere awareness of putative distraction may reallocate attention away from comprehension processes (Przybylski and Weinstein, 2013; Schilhab et al. submitted).

## THE NEED FOR INTENTIONALITY

Besides its smart-tool features, smart technology affords instant distraction and gratification, such as watching videos, gaming, or establishing social contact online, which drags attention away from other tasks. Thus, those smart technology interactions that drag attentional resources are unlikely to involve automatized processes in the procedural sense of the term. Hence, appropriate adaptation to smart technology as a tool needs to go beyond mere embodiment and to involve some kind of attention regulation, as suggested by studies emphasizing risks of addiction in connection with increased smart technology use (Wei et al., 2012; Tarafdar et al., 2013). When people check for messages and updates not because they need to but out of habit (Lee et al., 2014), and are deeply attracted to the device even in the company of others (Radesky et al., 2014), social relations may become challenged (Turkle, 2015).

But how do we cultivate attention regulation?

Attention regulation is closely connected to executive functions (EFs), a set of functions we use when we concentrate and think. The core functions are inhibition, working memory, and cognitive flexibility, which form the basis for "higher-order" EFs such as reasoning, problem solving, and planning (Diamond, 2013).

Working memory is the function that holds information in mind so that it can be manipulated. It is involved in making sense of linguistic information, deriving general principles, and recognizing novel relations among old ideas (Diamond and Lee, 2011).

Inhibition refers to control of behavior, such as inhibiting habitual responses and resisting short-sighted temptations, for example leaving a task unsolved or incomplete. Inhibition is exercised in attention regulation, to support focused and directed attention, and in emotional self-control.

It is EFs that allow us to perform "offline" tasks (Wilson, 2002), i.e., tasks that depend not on information from the environment but on sustained imagery (e.g., Schilhab, 2015a), while fencing off disturbing stimuli (Vanhaudenhuyse et al., 2010). During reading, for instance, the active construction of meaning (Wolf and Barzillai, 2009) involves maintaining competing interpretations until a final solution to the developing understanding is found (e.g., Smallwood et al., 2008).

Finally, cognitive flexibility manages changes of perspective, for instance switching between different aspects, thinking outside the box, and understanding the perspective of other people (Schilhab, 2015a).

Overall, EFs have been linked to better academic skills, better quality of life, and improved self-assessment and are to some extent trainable (Diamond, 2012).

## THE SOCIAL DIMENSION OF THE "INNER" SYSTEM

Positing attention regulation implicitly assumes that there are mental operations beyond those that evolve as bottom-up, embodiment-bound adaptations to the environment (though some kinds of attention regulation may be closely connected to the action-perception cycle; see, for instance, Jin and Lee, 2013 and Jin et al., 2015 for a discussion of how the training of Kih may lead to affordance control).

Here, I suggest that, although attention regulation exists as a potential operation of the individual mind, its actualization depends on social interaction in a certain kind of conversational exchange (Schilhab, 2015b). Ordinary discourse, in which participants exchange information on the fly, may happen at a superficial level without substantial attentional investment on the part of the interlocutors. Everyday exchanges of words, for instance, need not recruit focal thought to satisfy the purpose of a dialogue. For conversation to result in the acquisition of abstract knowledge, the more knowledgeable interlocutor (the parent or caretaker) must take responsibility for creating mutual comprehensibility by assessing the learner's perspective and filling in the gaps to ensure the coherence of the emerging conversation (a condition also salient in Vygotsky's zone of proximal development; e.g., Hasse, 2014; see also Schilhab, 2015b, 2017b). Learning about abstract referents with which one has had no direct experience places a different stress on the ability to sustain understanding. To convey abstract knowledge, the interlocutor will need to establish metaphors or phrases that immediately capture the concrete meaning of the abstract knowledge. Just as the adult in ostensive learning furnishes the immediate environment, for instance by holding up a cup, pointing to it and exclaiming "cup" (e.g., Pulvermüller, 2012), the interlocutor furnishes the world that is off-line. He or she seeks mutual comprehensibility and creates mental tableaus that are thought to match the child's understanding. In concrete language acquisition, interlocutors merely point to the referent of the conversation, whereas in abstract language acquisition, the interlocutor points by using words.

Thus, the mechanisms that lead to understanding fundamentally change. The cognitive efforts behind this process are comprehensive and advanced and include mastering an attentional switch from monitoring external stimulation to monitoring the internal "stream of consciousness" (e.g., Dennett, 1992).

Language-elicited imagination depends on a certain degree of linguistic competence and is therefore likely to emerge relatively late in language acquisition. Moreover, for the ability to develop fully, it is crucial to have empathic interlocutors. My assertion is that a subject's ability to acquire abstract knowledge, and with it to become trained in monitoring the internal stream of consciousness, evolves most readily through careful guidance and may therefore vary noticeably as an effect of a "master."

## FINAL REMARKS

Some of the reported side effects of smart technology use referred to by contemporary research, such as novelty effects, will definitely vanish as users become more proficient. Here, the embodiment processes work in response to the immediately present environment. However, combating the distractors inherent to the affordances of smart technology, which often operate as attention-grabbers, calls for cognitive metaprocesses elicited independently of the interaction with smart technology. As with any addictive "substance" to which modern Western life has granted almost unrestricted access, such as calories, sedentary life, alcohol, cholesterol, or, on a larger scale, fossil fuels, it is up to the individual to develop a well-functioning, albeit cognitively exhausting, self-control. Though many avenues to achieve this ability are open, I suggest that the individual may quite effectively be gently nudged in the right direction by engaging in deep conversations with interlocutors. Mental mechanisms central to mediating the understanding of what may not be concrete or present simultaneously enhance the mechanisms we need in order to adapt appropriately to smart technology.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Schilhab. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Gih (Qi): Beyond Affordance

Yang Lee<sup>1</sup>, Robert E. Shaw<sup>1</sup> and Zheng Jin<sup>2</sup>\*

<sup>1</sup> Center for the Ecological Study of Perception and Action, University of Connecticut, Mansfield, CT, USA, <sup>2</sup> Institute of Educational Sciences, Zhengzhou Normal University, Zhengzhou, China

Ancient Eastern thought posited the ontological integration of the "mind-body world". The body-mind syncretism was a foundational precept in Eastern philosophy in which "Gih" ("Qi") was considered the basic entity of the universe and the human being. This study attempts to build a meta-theory and to demonstrate empirical designs for Gih, discussing the problems of the mind and body, or the subject and object, compared with the concept of "affordance" proposed by ecological approaches. The notion of Gih extends beyond that of affordance in that Gih activates a psychosomatic process between the physical condition and the mental state that facilitates the bi-directional interactions between subject and object. Therefore, the concept of Gih integrates mind and body, providing a means of comparing Eastern and Western philosophical systems.

Keywords: Gih (Qi), affordance, mind–body problem, eastern philosophy, ecological psychology, perception and action, embodied cognition

#### Edited by:

Tifei Yuan, Nanjing Normal University, China

#### Reviewed by:

Jaehong Ko, Kyungnam University, South Korea; Keonho Shin, Kangnam University, South Korea; Yang-Gyu Choi, Daegu University, South Korea

#### \*Correspondence:

Zheng Jin jinzheng@zznu.edu.cn; zhjin@ucdavis.edu

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 17 December 2016 Accepted: 27 March 2017 Published: 11 April 2017

#### Citation:

Lee Y, Shaw RE and Jin Z (2017) Gih (Qi): Beyond Affordance. Front. Psychol. 8:556. doi: 10.3389/fpsyg.2017.00556

## INTRODUCTION

The concept of "affordance" proposed by James J. Gibson (1904–1979) pioneered the field of ecological psychology, defying the conceptual limitations of indirect realism, which relies on constructive processes of representations or ideas in perception (Gibson, 1968, 1979). Ecological psychologists after Gibson (e.g., Turvey and Shaw, 1979, 1995; Shaw and Turvey, 1981; Turvey, 1992) further revised the concept of affordance (see also Turvey and Carello, 2012, for further discussion). They suggested that affordance underlies the coordination of perception and action. While raising some controversial issues, Lee et al. (2012) conducted experiments demonstrating that perceptual scales can vary with hit-ableness. For example, a shooter can perceive a target as enlarged when he or she hits it (a similar result is reported in soccer by Jin and Lee, 2013). How can this result occur? This question is commonly subsumed under the philosophical issue of the "natural kind or proper observable" (Millikan, 1999; Ellis, 2001), for which perception serves as empirical evidence. With a focus on embodied cognition (e.g., Glenberg, 2010; Davis and Markman, 2012; Glenberg et al., 2013), studies integrating the perspectives of subject and object can advance the concept of affordance. A comparable paradigm from Eastern philosophy explains the relationship between action adjustment and object estimation in a way that might be independent of the physical action that makes affordances available.

This study proposes to use the concept of "Gih (Qi)"<sup>1</sup> in place of "affordance". Gih, as conceptualized in Eastern philosophy, coordinates perception and action (see Lee et al., 2007; Jin et al., 2015; Lee and Shaw, unpublished, for further discussions). For example, Gih can be manifested in practicing a martial art, such as sword matching. In this case, an artist activates Gih, matching his or her opponent to react to attacks and defenses that are concretized where and when he or she catches the opponent's break. The artists are subject to a potential activation, namely Gih, which is attuned to the situation in which perception-action occurs. This paper introduces and refines the concept of Gih to elaborate how perception is coordinated with action within this conceptual framework, compared with that of affordance. The terms "Gih" and "affordance" were originally defined as follows:

<sup>1</sup>The term is spelled "기" and pronounced "Gih" in Korean. It is written "氣" and is pronounced "Qi" in Chinese. In this article, the term Gih corresponds to Qi.

Gih is formed through action for change to make heaven (which refers to nature or object) and human (which refers to organism or subject) coordinated (Choi, 1857).

Affordance transcends the dichotomy of subjective-objective and helps us to understand its inadequacy. It is both physical and psychical yet neither physical nor psychical (Gibson, 1979).

## METAPHYSICAL REVIEW

What is real resides neither only in the mind nor only in the material (i.e., the body or matter), but in the relation between the two. The metaphysics of mind and body has been controversial in philosophy since René Descartes (1596–1650) proposed the notion of "dualism". Philosophical discussions of the limitations of this dichotomy (for a review, see Kim, 1996; Ravenscroft, 2005) were revised to include "parallelism", which was examined by Nicolas Malebranche (1638–1715), and thereafter regressed into two types of "monism", namely "materialism", which was advocated by David Hume (1711–1776), and "idealism", which was advanced by George Berkeley (1685–1753). The metaphysics integrating these paradigms constituted "the third paradigm", proposed by Benedict de Spinoza (1632–1677) and Gottfried Wilhelm Leibniz (1646–1716) in the era of rationalism and subsequently renewed by phenomenologists such as Edmund Husserl (1859–1938) and Martin Heidegger (1889–1976), as well as by analytic philosophers of language such as Peter Frederick Strawson (1919–2006). So-called third paradigms share the premise that mind and body are two aspects of experience that originate from one entity, the third entity.

Given this trajectory of philosophical thought, metaphysics engages with the problems of perception between a subject and an object. Theories of perception have been divided into empiricist and idealist theories. Empiricists assert that perceptual processes are guided by representations that have been acquired through experience. Idealists counter this assertion by arguing that perception is organized by a priori categories in the mind that do not refer to experience. The third paradigm proposes to resolve the controversy between empiricists and idealists. For example, ecological approaches developed the concept of affordance and have embraced certain concepts from quantum theory to characterize, theoretically or empirically, the interaction between psychology and quantum mechanics (Turvey, 2012; Shaw and Kinsella-Shaw, 2015).

Like Western philosophy, Eastern philosophy has discussed the problems between the person and the environment. Here, the person is the subject, and the environment is the object. The subject corresponds to the mind, and the object corresponds to the material. Thus, Eastern thought, which follows two main tracks, Taoism and Confucianism, has been concerned with the problem of how people can live in a person-like way within the environment (Pyung, 1934/1999; Feng, 2009; Tang, 2009; also see Yang, 1993, for further discussions). It is agreed that the environment exhibits natural laws and fates to which people accordingly try to adapt. Furthermore, Eastern philosophy is based on a third paradigm. Confucianism postulates two basic third entities, Gih and Lih (for a review, see Huh, 2004). Hwoang Lee (1501–1570) proposed that "Gih activated to implement Lih". This statement implies that Gih is a force or a potential that motivates an event, and Lih is a reason or a purpose toward which it is directed. Daeseung Gi (1527–1572) revised this understanding of the two entities, writing: "Gih is manipulated, and Lih is inferred". Philosophers thus differentiate between the two basic entities of Gih and Lih.

Hangi Choi (1803–1877) renewed the theory of Gih in more scientific terms than the classic discussions through the discipline of Gih ("Gihology"), which derives from Confucianism and Taoism but is also influenced by Western disciplines such as Newtonian physics and electromagnetic theory (Choi, 1857). First, as the quotation at the end of the Introduction states, Gih is defined as "action for change (potential activated)". Second, Gih is activated before it is "formed to be observed". Third, Gih helps coordinate the human and the environment. According to this discussion, and as elaborated by Lee et al. (2007) and Lee and Shaw (unpublished), the concept of Gih is comparable with, but more advanced than, the concept of affordance developed by ecological approaches (see Jin and Lee, 2013; Jin et al., 2015; Jin et al., 2016, for further suggestions).

## FROM AFFORDANCE TO GIH

The concept of affordance is well known in ecological approaches as a means of explaining the coordination of perception and action. Nonetheless, refining this concept or substituting a new one is more productive for furthering discussion in philosophy and of biological processes. According to Gibson (1979), affordance discloses a relationship between a subject and an object, or the "psychical and physical", respectively. This is a higher-level concept. To explain it at a more concrete level, its two directions must be understood as subject-to-object and object-to-subject. Later, ecological psychologists questioned whether this concept adequately explains a subject's inclinations or an object's properties. Whereas some scholars have criticized the implications of the concept, others have defended it (for a review, see Michaels, 2003), contending that affordance connotes both the object's property, directed as object-to-subject, and the subject's inclination, directed as subject-to-object. Accordingly, the subject's "effectivity", or ability to act, has been proposed (Michaels and Carello, 1981; Shaw et al., 1982). Other scholars have proposed "intentionality" for action (Shaw, 2001) and "relation" (Chemero, 2003) to define an animal's ability to interact with aspects of the environment. Finally, some definitions were extended to include "meaningfulness" but were criticized as too ambiguous to permit sensible testing (Michaels, 2003).

Thus, it remains controversial whether the revisions of affordance can free the concept from its original limitations. First, the means by which potential effectivity can be activated should be determined. Direction indicates what is implied by a bidirectional relation, such as subject-to-object or object-to-subject (e.g., Dotov et al., 2012). Second, how can potential effectivity be activated in a situation? Is effectivity primed by the mind, by matter (or the body), or by both? Third, affordance is understood to be coextensive with information. Is it possible that affordance and information are not differentiated in concept and in reality (cf. Fajen et al., 2009)? In proposing a third paradigm, it is necessary to specify the entity that subordinates both the mind and the material and thus affects both the subject and the object. Therefore, this paper argues that one such third entity is Gih, which, as discussed by Choi (1857) and by Lee and Shaw (unpublished), is defined as potential activation. The argument is threefold. First, Gih is specified not as static potential but as activation without purpose. Second, Gih specifies bidirectional interactions between subject and object. Third, Gih is psychosomatic, thus relating the mind and the body. These three propositions indicate that, when refined, the concept of Gih extends the understanding of the coordination of perception and action beyond that proposed by the concept of affordance.

Gih is demonstrated in the genealogy of the words "blood-Gih" and "mind-Gih" as used in daily life (Lee and Shaw, unpublished). Blood-Gih denotes passion and intention and represents activation from body (blood) to mind (emotion). By contrast, mind-Gih corresponds to health and physical condition as instances of activation from mind to body. Therefore, blood-Gih and mind-Gih possess features that can be characterized as activation and together represent the bidirectionality of Gih, mind-to-body and body-to-mind. The question arises as to how the blood or the mind is compounded with Gih to direct the other. Both the blood and the mind are hypothesized to attain their relative properties. It is through psychosomatics that the body (blood) works with the mind. As the term psychosomatic implies, Gih attempts to incorporate both mental and biological processes in terms of Eastern philosophy and Oriental medical science (Leslie and Young, 1992; Tateno, 1993). Thus, Gih performs work that is psychosomatic (see Lee and Shaw, unpublished, for further discussion).

There is a critical reason to separate Gih from affordance. As noted, Gih should be differentiated from Lih, which, in contrast, implies a kind of reason that serves what is intended (see Hwoang Lee's and Daeseung Gi's discussion in the "Metaphysical Review" section above). As established by Gibson and further developed by later ecological psychologists, affordance is embedded with information, which implies directing the effort of coordinating perception and action toward a purpose. By contrast, Gih is an activation potential and thus is not directed, as it has no purpose. Some philosophical perspectives suggest concepts similar to Gih. For instance, Henri Louis Bergson (1859–1941) proposed "élan vital" as a "living force of no teleology" (Papanicolaou and Gunter, 1987).

Consider a hypothetical example that illustrates how understandings of perception and action differ when the concepts of Gih and affordance are applied. Suppose a dog approaches a girl to bite her. The girl runs away from the dog. Explained with the concepts of affordance and information, the girl is informed that the dog is about to bite her, and the situation affords her running away. However, modeling the situation in terms of Gih, if the girl's Gih is not activated, then she cannot run away and thus cannot prevent herself from being bitten by the dog. Hence, Gih activation is a prerequisite to what follows with purpose, such as running away. Gih is effectively neutral in purpose, though it can be activated. Consider a more dramatic situation in which the girl's Gih is highly activated. Suppose she encounters a high wall while being chased by the dog. She will fight the dog. At this moment, Gih is activated and is directed subject-to-object for the girl and object-to-subject for the dog and the wall, and it also coordinates these mutual directions. Thus, the girl's Gih activation is situational because the wall blocks her retreat. Gih is activated by, and construed as, a psychosomatic process such as fear of being bitten or the valor of combat, manifested respectively as running or fighting. What is psychosomatic has the potential to activate and influence both the mind and the body, and its function is mediated by hormonal and metabolic processes and the autonomic nervous system.

## GIH AS PSYCHOSOMATIC OPEN TO EASTERN DISCIPLINES

Sword matching, the example mentioned in the Introduction, offers a better illustration than the girl and the dog for comprehending Gih as a psychosomatic process. An artist (A) performs a more advanced technique, such as a counterattack on a body part of his or her opponent (B) (for example, B's wrist), just after (B) starts to attack (A) (for example, A's head). When (B) attacks (A)'s head, (B) must expose his wrist, and for (A) this opening unfolds at a pace slow enough to counterattack, even though the wrist is a small part moving quickly. How can this counterattack be explained? Lee et al. (2012) suggested that (A) could enlarge the small space of (B)'s hidden body part, the wrist, and lengthen the time of (B)'s wrist movement to succeed in a counterattack within a split second. Therefore, the perceptual scale of (A) was enhanced (see Jin and Lee, 2013, for generalized empirical evidence). According to a cognitive explanation, (A) is more skilled because of repeated practice. However, this model can explain only A's skills, which are not specified in time and space but are evaluated based on average conditions. In terms of the Gih model, (A) has Gih as the psychosomatic potential that is attuned to the present situation of perception-action, which is activated by spiritual intention and coordinated by physical autonomic processes. Gih can be trained through martial arts practice and conceptualized as a type of physical and spiritual learning. However, in the above case of the weak girl who must fight the cruel dog when blocked by the high wall, the level of Gih activation increases without training. Thus, Gih can be situationally activated for survival to release psychosomatic processes between the mind and body, thereby accommodating the person to the environment.

If Gih is understood in terms of the psychosomatic process, it makes sense that meditation practices and treatments performed in Oriental medical science are examples of the principle of Gih. Meditation practice relies on certain correct postures and a focused mind. For the technique to integrate the mind and body, practitioners are advised to control respiration (e.g., Schure et al., 2008; Jung et al., 2010). Breathing is normally involuntary but can be controlled voluntarily with training. Controlling respiration mediates between physical conditions (such as blood circulation, hormonal metabolism, and nerve activation) and mental states (such as consciousness, cognition, and emotion). Thus, respiratory control is a psychosomatic process. Oriental medical science can also be theorized as a means of circulating Gih. Practitioners diagnose patients by feeling the pulse (ordinarily at the wrists), checking the strength and rate of respiration, observing the colors of the face, and noting other vital signs. Techniques used to treat the psychosomatic variable of Gih include natural medicines, heat stimulation (moxibustion), and needle stimulation (acupuncture). Positive results have been reported in scientific studies, despite some controversies (see Unschuld, 1985; Leslie and Young, 1992; Guan and Fan, 2002; Noble, 2009, for further discussions).

What can the consideration of Gih as a psychosomatic variable contribute to empirical analysis? With respect to experimental procedures, Gih must be measured through its effects on behaviors associated with the mind. The observable phenomena can then be attributed to certain scales in perception and action, which serve as the terminal measures. For these measures of perception and action, studies have analyzed the influences of only some physical variables. In experimental analyses, Lee et al. (2012) manipulated the conditions of an archer's arm control. The treatment in the study conducted by Jin and Lee (2013) varied according to soccer players' running speeds. In the sword-matching example, treatment could similarly be varied based on patterns of footwork. Physical manipulation has been understood to be empirical. Despite their convenience, however, such physical variables are insufficient because they do not account for mental events. Therefore, the intermediate variable must be manipulated at the psychosomatic level, which links physical states and mental processes.

For a psychosomatic design, respiration control can be proposed in the sword-matching example. Martial artists are adept at catching attack points while the space to attack is enlarged or the time to attack is prolonged. These phenomena occur by way of the psychosomatic process of Gih enhancement, which is activated by respiration control. In terms of paradigms, Gih as a psychosomatic model can be construed as a "hologram" model, such as that proposed by Pribram (1991), who argues that classic paradigms have relied excessively on analytic dispositions. Because Gih, which is manipulated by respiration and other available processes, is a psychosomatic variable, it must be reevaluated as holographic rather than analytic. Thus, the discussion of Gih can be extended to arrange the empirical variables in a hierarchy ranging from physical variables through psychosomatic variables to mental variables.

If the psychosomatic variable of Gih becomes established, questions that popular discussions of psychology tend to set aside should be elaborated and further clarified. Problems concerning communication in language, aesthetic evaluation, and social relations represent some instances of mediation between mind and matter, or subject and object. If the discussion of blood-Gih and mind-Gih in this paper is extended, the terms speaker-Gih and listener-Gih, artist-Gih and appreciator-Gih, and social-Gih between subject and object could be tentatively proposed to provoke further discussion.

## CONCLUSION

The concept of Gih is compatible with Gibson's concept of affordance in some respects. In other respects, though, the two concepts differ. Gih is a potential activation that possesses not only physical properties but also a mental disposition, influencing both subject and object, and thus should be considered a third entity.

Along with the subsets "mind-Gih" and "body-Gih" (also called "blood-Gih"), the concept of Gih accounts for the bidirectional interactions between mind and body. Thus, the concept advances the understanding of the coordination of perception and action beyond that enabled by the concept of affordance as initially proposed (Dotov et al., 2012) and conforms to revised notions of intentionality and effectivity (Michaels and Carello, 1981). The role of information in enacting affordances implies that affordances are teleological. By contrast, Gih refers to the potential to activate mental and physical states and thus lacks purpose. Gih activation between the mind and body can be refined with training through meditation or respiration control, which are known to regulate psychosomatic processes, thus influencing an involuntary mechanism through voluntary control. The possibility of such refinement remains a topic for discussion in Eastern philosophy and Oriental medical science.

Gih can be considered a psychosomatic variable located at the mid-level of a hierarchy of variables ranging from physical to mental and thus should be distinguished from affordance. Because the refined term Gih is not synonymous with affordance, it also passes the philosophical test of Occam's razor, which demands no redundancy when scientific terms are created. Nevertheless, further discussion is required to discern whether the Gih concept could incorporate elements of physical psychology (Turvey and Carello, 2012), which would mark a theoretical advancement from the ecological approaches. Looking forward, the Gih concept accommodates communication processes, aesthetic feelings, and social relationships and may offer a way to integrate Eastern and Western traditions of thought concerning the coordination of perception and action.

## AUTHOR CONTRIBUTIONS


YL: substantial contribution to the conception of the work; drafted the work. RS: substantial contribution to the conception of the work; substantial revision of the work. ZJ: substantial contribution to the conception of the work; substantial revision of the work. All authors contributed equally to this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lee, Shaw and Jin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.